Classification and automatic scoring of arousal intensity during sleep stages using machine learning

Han, Hyewon; Seong, Min Jae; Hyeon, Janghun; Joo, Eunyeon; Oh, Junhyoung

doi:10.1038/s41598-023-50653-9

Download PDF

Article
Open access
Published: 12 March 2024

Classification and automatic scoring of arousal intensity during sleep stages using machine learning

Hyewon Han¹^na1,
Min Jae Seong²^na1,
Janghun Hyeon³,
Eunyeon Joo^4,5 &
…
Junhyoung Oh⁶

Scientific Reports volume 14, Article number: 5983 (2024) Cite this article

662 Accesses
Metrics details

Subjects

Abstract

Arousal during sleep can result in sleep fragmentation and various physiological effects, impairing cognitive function and raising blood pressure and heart rate. However, the current definition of arousal has limitations in assessing both amplitude and duration, making it challenging to measure sleep fragmentation accurately. Moreover, there is inconsistency among inter-raters in arousal scoring, which renders it susceptible to subjective variability. Therefore, this study aims to identify a highly accurate classifier for each sleep stage by employing optimized feature selection and machine learning models. According to electroencephalography (EEG) signals during the arousal phase, the intensity level was categorized into four levels. For control, the non-arousal cases were used as level 0 and referred as sham arousal, resulting in five arousal intensity levels. Wavelet transform was applied to analyze sleep arousal to extract features from EEG. Based on these features, we classified arousal intensity levels through machine learning algorithms. Due to the different characteristics of EEG in each sleep stage, the classification model was optimized for the four sleep stages. Excluding sham arousals, a total of 13,532 arousal events were used. The lowest intensity in the entire data, level 1, was computed to be 3107, level 2 was 3384, level 3 was 3472, and the highest intensity of level 4 was 3,569. The optimized classification model for each sleep stage achieved an average sensitivity of 82.68%, specificity of 95.68%, and AUROC of 96.30%. The sensitivity of the control, arousal intensity level 0, was 83.07%, a 1.25% increase over the unoptimized model and a 14.22% increase over previous research. This study used machine learning techniques to develop classifiers for each sleep stage, improving the accuracy of arousal intensity classification. The classifiers showed high sensitivity and specificity and revealed the unique characteristics of arousal intensity during different sleep stages. These findings represent a novel approach to arousal research and have implications for developing more accurate predictive models in sleep research.

Machine learning polysomnographically-derived electroencephalography biomarkers predictive of epworth sleepiness scale

Article Open access 05 June 2023

Classification of sleep apnea based on EEG sub-band signal characteristics

Article Open access 12 March 2021

Discriminating rapid eye movement sleep from wakefulness by analyzing high frequencies from single-channel EEG recordings in mice

Article Open access 13 June 2023

Introduction

Various sleep disorders are closely associated with disruptions in standard sleep patterns, characterized by increased wakefulness or arousal. Arousal can result in sleep fragmentation, leading to a decline in cognitive function and autonomic reflex activation, increasing blood pressure and heart rate^1,2,3,4. The standard definition of arousal is an abrupt shift in electroencephalogram (EEG) to a higher frequency, including alpha, theta, or beta, for at least 3 seconds, with at least 10 s of stable sleep preceding the change⁵.

Respiratory arousals during sleep are a significant predictor of the severity of obstructive sleep apnea (OSA) on polysomnography (PSG). While the severity of OSA is determined by the apnea-hypopnea index (AHI), there is evidence that the microstructure and intensity of arousal are closely correlated with the severity and could be the trait of the disease⁶. Despite the widespread use of the arousal index, which is defined as the frequency of arousals in PSG, to measure sleep fragmentation persists limited, as the standard definition from the American Academy of Sleep Medicine fails to consider variability in the duration and amplitude of EEG changes that meet this definition^7,8,9. Additionally, discrepancies in arousal between scorers during the visual scoring process of PSG are still prevalent. In order to address these issues, researchers have begun to focus on the microstructure of arousal and have introduced the concept of arousal intensity⁷. Since research in this area is still in the emergent stage, there is currently no established definition of arousal intensity among researchers, leading to differences in the methodology used in each study.

In previous studies by Azarbarzin et al., a scorer visually scaled arousal intensity for each of the two EEG channels on a scale of 0 to 9, based entirely on EEG appearance within the most intense region of the arousal duration^8,9. The study used a wavelet transform to extract specific features from the EEG signal related to arousal intensity to overcome the subjectivity and variability of visual scaling. The researchers trained seven different classifiers using different algorithms and values of k for k-nearest neighbor classifiers, with the training set made up of wavelet features and corresponding visually assigned arousal intensity scales. The study also found significant variability in arousal intensity among subjects, with increased heart rate response to arousal most strongly related to arousal intensity. This study introduced a method for automating the measurement of arousal intensity, which provides a tool to investigate the clinical consequences of arousal. A study by Amatoury et al. found that arousal intensity is a distinct pathophysiologic characteristic in OSA using this intensity scoring method¹⁰. In a recent study by Bahr et al., arousal intensity was defined as the amplitude of the electroencephalogram during respiratory arousals¹¹. The distance between the highest and lowest point on the vertical axis of the EEG was defined as the amplitude of each respiratory arousal. The results showed strong positive correlations between the average arousal intensity and the respiratory disturbance and arousal indexes. All of the above studies are the results of efforts to overcome the limitations of the arousal index, such as inter-rater variability and the somewhat ambiguous and imprecise arousal definition.

Compared to the rapidly developing field of machine learning and various EEG analysis techniques, arousal analysis using these techniques is lagging. To the best of our knowledge, no studies have applied different updated machine learning techniques to analyze arousals and improve accuracy. Machine learning has shown remarkable applicability and potential in healthcare due to its ability and robustness to process high-dimensional data, resulting in recent inflation in research applying machine learning to various medical fields. EEG analysis is no exception to this trend, with diverse research exploring the application of machine learning¹². Therfore, this study aims to utilize the concept of arousal intensity and apply it through machine learning techniques in order to address the limitations of traditional methods of measuring arousal.

In addition to medication and the presence of OSA, the sleep stage is also a significant factor affecting arousal¹³. Several studies have reported that arousal exhibits different characteristics depending on the sleep stage^11,14,15. However, the relationship between sleep stage and arousal, in terms of arousal intensity and automated scoring, remains unknown. Recognizing the significant impact different sleep stages have on arousal, our study explores the void in the investigations conducted so far in understanding the relationship between sleep stage and arousal in terms of arousal intensity and automated scoring. Precisely identifying the unique characteristics of arousal intensity during different sleep stages could establish a more reliable measure of sleep fragmentation, which can help diagnose and treat sleep disorders by providing insight into the differences in arousal characteristics associated with various sleep disorders.

Our research aims to advance sleep research by introducing a new way to measure sleep arousal intensity. Automatically measuring sleep arousal intensity through applying machine learning and introducing increased precision through analyzing arousal intensity by each sleep stage will provide insights for diagnosing and treating sleep disorders.

Methods

Data acquisition and ethical statement

Ethical approval was secured from the Institutional Review Board (IRB) at a local hospital (IRB no. 2023-02-078), ensuring the study’s conduct in a secure and ethical manner, with all data being anonymized. The IRB meticulously reviewed the study protocol and informed consent procedures. The study initially involved 101 patients diagnosed with OSA who underwent PSG between January and May 2022. The collected data encompassed PSG, EEG, and electromyography (EMG) features, along with participants’ demographics and sleep quality, as assessed through a structured questionnaire.

The analysis was confined to data interpreted by a single chosen scorer from multiple readers to curtail variability arising from potential scoring differences between scorers. Five of the 101 PSG results were deemed unusable due to the unavailability of EEG signal data, arousal event, and sleep stage data, culminating in a total of 96 valid results. A cumulative of 13,532 arousals were extracted from these 96 PSG results. A visualized workflow of the scoring and classification processes is delineated in Fig. 1.

The study utilized anonymized data and was conducted in compliance with the guidelines of the Declaration of Helsinki. Ethical approval was received from the Ethics Committee of Samsung Medical Center, Sungkyunkwan University (Ethical approval number: 2023-02-078), which carefully reviewed the study protocol and informed consent procedures to ensure the study was conducted in a safe and ethical manner. Informed consent was obtained from all study participants and their legal guardians.

Automatic arousal intensity scoring

The previous study, which introduced the term “arousal intensity,” scored the intensity based on visual assessments of signal graphs of EEG channel C3/4 by an expert⁸. However, in the later research, arousal intensity was defined as the mean distance between the highest and lowest points of the amplitude from the C3 and C4 channels of the EEG signal¹¹. Our study adapted the scoring method from the later study because it allows us to evaluate signals in a manner that most approximates the visual evaluation process while providing a more objective metric. Computed average distances were then binned into four levels of 1 to 4 using quantile. The intervals of the EEG signal and average distance for each level used for binning are shown in Table 1. To score intensity by contrasting stable sleep and arousal, 10 to 14 stable sleep periods of 9 seconds were selected from each PSG result. A total of 980 periods were selected, and we termed those “sham arousal,” as the previous research⁸. The intensity level was set as 0 since sham arousal is a period of stable sleep and not arousal. Therefore, five arousal intensity levels from 0 to 4 were constructed.

Table 1 Summarization of quartile range and intervals of the EEG signal used for scoring arousal intensity level.

Full size table

Classification

Feature extraction

All data were processed to learn and classify the arousal intensity levels generated. The data used were EEG (C3/4, O1/2, F3/4), EMG (Chin), sleep stages, and arousal durations. The overall methods and procedures for signal processing and feature extraction were referred to in the previous work⁸. Discrete wavelet transform (DWT) is a signal processing method that decomposes a given signal into sets, where each set is coefficients and a pre-defined function called wavelet. Unlike the Fourier transform, which represents a signal in the frequency domain, the DWT represents a signal in both the time and frequency domains simultaneously; therefore, it is known as more suitable for processing EEG signals¹⁶. There are several types of these functions, and in this study, the Daubechies order 4 wavelet was used, which is considered the most suitable for analyzing EEG signals¹⁷.

A 5-level DWT was performed on all signals from EEG and EMG for the arousal period and the same period before the arousal onset. Every preprocessing and analysis of the data was conducted with Python(version 3.9.9, Python Software Foundation), including the transformation. Before applying DWT, high-pass filtering with a cutoff frequency of 0.3 Hz was also applied to reduce the influence of the artifacts. 200 Hz EEG signals were resampled to 128 Hz for efficiency in signal processing. Figure 2 shows an example of the result of a 5-level DWT with a Daubechies order 4 wavelet on an EEG signal (C3). The calculated coefficients (D\(_1\)-D\(_5\) and A\(_5\)) were used to calculate the mean power, the mean of absolute values (MABS), the ratio of MABS for all combinations of each signal (e.g. \(\frac{C3/4\; MABS(D_1)}{C3/4\; MABS(D_2)}, \frac{C3/4\; MABS(D_1)}{C3/4\; MABS(D_3)}, \frac{C3/4\; MABS(D_1)}{C3/4\; MABS(D_4)}, \frac{C3/4\; MABS(D_1)}{C3/4\; MABS(D_5)}, \frac{C3/4\; MABS(D_1)}{C3/4\; MABS(A_5)}\), ...), and the total variance. Thus, a total of 132 features were generated, and all those features were normalized by dividing the arousal value by the pre-arousal value.

In addition to the features computed using coefficients resulting from the DWT, four more feature extraction methods widely used in EEG signal analysis to obtain additional features were applied: average power spectral intensity (PSI)¹⁸, average power, average root mean square value (RMS)¹⁹, and average detrended fluctuation analysis (DFA)²⁰. First, PSI provides information about the distribution of power across different frequencies. Second is average power, which provides the overall power of the EEG signal, irrespective of the frequency. Third, RMS shows information about the overall amplitude of the EEG signal. Lastly, DFA provides information about the complexity or irregularity of the EEG signal. Outcomes from the four techniques using the EEG C3/4 signals were used as additional features. As a result, a total of 136 features were generated.

Since there was not enough level 0 compared to other arousal intensity levels, a synthetic minority over-sampling technique (SMOTE) was applied to solve this data imbalance problem²¹. SMOTE is a technique that adopts the k-nearest neighbor algorithm to generate new data for underrepresented classes. It is one of the most common oversampling methods used to address the data imbalance problem in machine learning.

Classification and feature selection methods

Data were randomly selected and divided into a train set (80%) and a test set (20%). Two machine learning classifiers, random forest²² and LightGBM²³, were used. Random forest is a classifier consisting of a combination of decision trees built on random sub-samples of the dataset. LightGBM is a gradient-boosting framework designed to be fast and highly efficient. Both are machine learning algorithms commonly used in various fields when applying machine learning techniques due to their outstanding performance.

To reduce the dimensionality of the data, we included the feature selection process, and recursive feature elimination (RFE)²⁴ was used as the feature selection technique in this study. RFE is a prevailing feature selection method applied in machine learning. The method allows identifying and selecting the most important features in a dataset by recursively removing the least important ones. Cross-validation was employed in conjunction with RFE to enhance precision in this study. The feature selection procedure was performed for each classifier: random forest and LightGBM. The number of cross-validation was set to 5.

Optimization by sleep stages

Due to the different characteristics of each sleep stage, we optimized the classification model by sleep stages. Sleep can be divided into two broad categories: rapid eye movement (REM) sleep and non-REM (NREM) sleep. In REM sleep, diminished EMG tone is observed, and the EEG appears remarkably similar to the waking state with low-voltage, high-frequency²⁵. NREM sleep can be further categorized into three stages: N1, N2, and N3.

In N1, EEG shows moderate amplitude and irregularly spaced bursts of slow waves. N2 is characterized by a further increase in EEG amplitude and the frequent appearance of sleep spindles with K-complexes. Sleep spindles are clusters of occasional high-frequency spikes that occur periodically and last for several seconds. K-complexes are defined as the occurrence of sharp, clear, high-voltage bipolar waves lasting more than half a second. N3 is also known as slow-wave sleep. A larger EEG amplitude also characterizes it compared to the previous stage. As the stage escalates, sleep becomes more profound, and muscle activity decreases.

Table 2 Descriptive Statistics of participants’ demographics and polysomnography.

Full size table

During the classification process, we optimized performance for each of the four sleep stages: REM, N1, N2, and N3. The optimization was conducted through two phases: feature selection and hyperparameter tuning. For the hyperparameter tuning, a tree-structured parzen estimator (TPE) approach²⁶, which is a method based on Bayesian optimization, was utilized to sample the hyperparameter space, and it was pruned with the hyperband algorithm²⁷ for the efficiency of the process.

Table 3 Results of arousal intensity scoring using vertical distances of EEG signals (channel C3/4).

Full size table

Results

Demographic and polysomnographic characteristics of participants

Participants in the study had a mean age of 56.0 ± 14.6 years, and 72 (75%) were male. The polysomnographic data revealed a substantial arousal index at 32.3 ± 15.9, with the majority being respiratory at 24.4 ± 18.3. Notably, arousals were more prominent during non-REM (NREM) sleep, with an index of 33.0 ± 16.7 compared to 29.5 ± 18.0 during REM sleep.

The average AHI was marked at 38.5 ± 23.4, indicative of severe OSA. Correspondingly, the lowest saturation was noted at 82.3 ± 8.4%, and the oxygen desaturation index (ODI) was 32.1 ± 21.9, reflecting a significant desaturation burden attributable to OSA. The arousal index was 32.3 ± 15.9 with subcategories: respiratory 24.4 ± 18.3, spontaneous 4.0 ± 6.2, movement 0.7 ± 1.6, REM 29.5 ± 18.0, and NREM 33.0 ± 16.7. Sleep efficiency was 78.3 ± 11.5% (Table 2).

Arousal intensity level scoring

A total of 13,532 arousal events, excluding sham arousals, were divided into four levels according to the proposed scoring approach. In the total data, the lowest intensity of level 1 was calculated as 3,107, level 2 as 3384, level 3 as 3472, and finally, the highest intensity of level 4 was computed as 3569. Table 3 summarizes the numbers of data for each arousal intensity level, including sham arousals, by the dataset.

Table 4 Determined number of features and classifiers for each sleep stage. Based on the trial result, LightGBM was used for REM stage and the random forest algorithm was used for all the other sleep stages.

Full size table

Optimization

RFE with cross-validation was used to select features and determine the number of features. The feature selection process was conducted for each classifier and each sleep stage. Based on the average accuracy of 5 cross-validations, the combination of features with the highest mean accuracy was chosen. For the REM stage, the highest accuracy was achieved when 31 features were used with the LightGBM classifier. For the N1 and N2 stages, 31 and 36 features were selected, respectively, using the random forest. Lastly, for the N3 stage, 11 features were used using the random forest classifier. Table 4 summarizes the selected classifiers and features. Random forest classifiers were used in all the other stages except for the REM stage.

Classification

Sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV) of the arousal intensity level classification results before optimization by sleep stage are shown in Fig. 3. Figure 4 visualizes the metrics of the results for each sleep stage. In the case of the N3 stage, the number of data belonging to this stage was relatively limited. Therefore, only level 3 and level 4 from the total five levels were present in the test set. The highest tier of arousal intensity, level 4, measured the highest scores for all evaluation metrics, regardless of the optimization. Without the optimization, all the outcome metrics for each intensity level in the classification results were comparable, but with the optimization, they differed marginally. This was especially noticeable in the results for REM sleep.

The optimization was driven by increasing the sensitivity of the control, sham arousal (i.e., arousal intensity level 0). Without the optimization process, the highest sensitivity of intensity level 0 using a single classifier was 81.82%, which was the case when using a random forest classifier and SMOTE simultaneously. When it optimized for each sleep stage, the average sensitivity of level 0 was 83.07%, an increase of 1.25% compared to the previous result. For the N2 stage, the sensitivity of arousal intensity level 0 was the highest among all the sleep stages, with a value of 86.42%.

Confusion matrices of the results are shown in Fig. 5. While the classification is overall well achieved, there was an average of 14.70% of cases where level 1 was classified as level 2 and an average of 13.84% of cases where level 2 was classified as level 1. For an average of 10.39% of the arousals, level 2 was categorized as level 3, and an average of 11.72% of arousal intensity level 3 was allocated as intensity level 2. In the remaining levels, there were minor misclassifications for the continuous levels, but it was less than 10% in all cases.

The average AUC value for the receiver operating characteristic (ROC) curve for sham arousal was 97.07%. As for the sensitivity, the AUC for intensity level 0 was highest at stage N2, with a value of 97.95%. The ROC curves by arousal intensity level for all four sleep stages are illustrated in Fig. 6. All the performance metrics of the optimized classification model by sleep stages for the test set are summarized in Table 5.

Relationship to prior works

While previous studies have primarily focused on using arousal intensity to investigate the relationship between sleep arousal and cardiovascular disease⁸ or to examine the association between sleep arousal severity and OSA¹⁰, this study took a different approach by focusing on developing an automated pipeline for the measurement and classification of arousal intensity. Unlike previous studies that relied heavily on manual assessment by experts to score arousal intensity, we introduced a fully automated pipeline, which not only simplified the process of scoring arousal intensity but also increased its potential for use in various studies.

While this research shares similarities with previous studies, particularly in its data processing techniques, we focused on designing and implementing an automated arousal intensity scoring algorithm. In particular, our study demonstrated that the classification algorithm outperformed previous studies. Since previous studies lacked a quantitative baseline, the only direct comparison in terms of performance is the classification result of sham arousal. In the previous study, 168 received the arousal intensity scale of 0, which was classified as sham arousal out of 244 sham arousal⁸. Therefore, the calculated hit rate of the previous study is 68.85%, and our proposed classifier achieved a hit rate of 83.07%, recording an improvement of 14.22% over previous results. This advancement represents an essential contribution to the field of sleep research and arousal intensity assessment.

Table 5 Summary table of sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), and area under the receiver operating characteristic (AUROC) curves for arousal intensity classification results by the sleep stage-optimized model.

Full size table

Discussion

Sleep disorders, such as OSA, can severely impact an individual’s health, making accurate identification and classification of arousal intensity during sleep essential. However, the existing method of scoring arousal intensity involves the visual reading of all EEG recordings, which not only requires a dedicated human expert to perform the scoring but also makes all scores solely dependent on the subjective assessment of the expert. Therefore, our study aimed to overcome these shortcomings by proposing an automated sleep intensity scoring methodology. In addition, previous studies have utilized traditional machine learning methods to identify arousal precisely; however, this approach needs more accuracy. Therefore, this study aimed to develop a more accurate arousal intensity classification model according to sleep stage.

The classifier was optimized using feature selection and cross-validation techniques for each sleep stage, and the classification performance was evaluated using various metrics such as sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), and area under receiver operating characteristic (AUROC). The results indicated that the proposed model achieved high sensitivity and specificity, particularly for the highest intensity level 4. The optimization process improved the sensitivity of control and sham arousal (level 0), which was essential for minimizing false positives and improving the overall accuracy of the model. The potential benefits of combining feature selection with other advanced techniques, such as machine learning, highlight the importance of ongoing research to identify new and innovative ways to optimize the performance of machine learning models. Previous studies utilizing machine learning models such as naive Bayes, decision tree, support vector machine, generalized linear model, and k-nearest neighbor have reported an accuracy range of approximately 81% to 93% for predictive power and a sensitivity range of 75% to 83% for arousal detection^28,29,30,31. The models utilized in this study included LightGBM and random forest, which have demonstrated effectiveness in various classification tasks. Therefore, the strength of the machine learning model used in our study lies in its ability to provide more precise classification and access to the arousal microstructure.

In addition, our study aimed to develop an accurate arousal intensity classifier based on sleep stage. The authors focus on the macroscopic aspects of arousal that are characteristic of arousal in OSA for each sleep stage, which provides an important foundation for our study. One researcher reported that the arousal threshold was found to be highest during N3 and lowest during N1 and N2, with REM having a lower threshold than N1 and N2¹³. These findings suggest that N3 sleep requires stronger stimuli to induce wakefulness and is less prone to progressing to wakefulness. Moreover, the arousal index is highest during N1 and lowest during N3, consistent with the arousal threshold³².

Although there are limited studies on the differences in the microstructure of arousal across sleep stages, a few studies have reported no significant difference in arousal intensity between healthy individuals and those with OSA, as well as between REM and NREM sleep¹⁰. However, it is difficult to conclude the relationship between arousal intensity and sleep stage based on visual scoring methods used in these studies, as appropriate classifiers for each sleep stage were not utilized.

Our study selected the combination of features with the highest mean accuracy for each sleep stage. The REM stage achieved the highest accuracy using 31 features with the LightGBM classifier, and the N1 stage selected 31 features with the random forest classifier. The N2 stage used 36 features with the random forest classifier, while the N3 stage used 11 features with the same classifier. Assuming that the microstructure of arousal is consistent across all sleep stages, it would be reasonable to expect that the selected feature and the classifier with the highest accuracy would demonstrate common characteristics. This represents a novel perspective in approaching arousal and its related factors and has important implications for developing more accurate and effective predictive models in sleep research.

However, there are some limitations to this study. Experiments could not be conducted using a more diversified set of machine learning models, and the optimization by sleep stages phase took a considerable amount of time since RFE, which selects features by removing features one by one and confirms the performance, was adopted. Another limitation is that since arousal intensity was calculated as the vertical distance from the EEG amplitude, time-frequency factors could not be included in the level calculation. In addition, we recognize that our data collection was limited to a local hospital, which may affect the diversity of participant composition. We also acknowledge that our study’s design, while robust, may not completely eliminate subject-related factors. In future research, exploring additional techniques for subject-independent analyses or conducting larger-scale studies may offer further insights and enhance the generalizability of our findings.

Conclusion

Our finding offers several notable advantages in the field of sleep arousal research. First and foremost, it proposed an automatic scoring method for arousal intensity, which does not require relying on human experts to inspect all EEG recordings and subjectively score them visually. Additionally, our proposed methodology introduces a refined and more granular approach to assessing sleep arousal intensity by analyzing it by each of the four sleep stages, contributing to a more nuanced understanding of sleep fragmentation. By categorizing arousal intensity into five distinct levels and employing machine learning techniques, this method enhances the precision and objectivity of sleep quality assessment. In conclusion, this study successfully developed classifiers optimized for each sleep stage to improve the accuracy of arousal intensity classification using machine learning techniques. The optimized classifiers showed high sensitivity and specificity for each sleep stage, achieving an average sensitivity of 82.68%, specificity of 95.68%, and AUROC of 96.30%. The sensitivity of the control, arousal intensity level 0, was 83.07%, a 1.25% increase over the unoptimized model. The findings shed light on the unique characteristics of arousal intensity during different sleep stages, and the selected features and classifiers with the highest accuracy demonstrated common characteristics. Developing a classifier for each sleep stage based on the concept of arousal intensity represents a novel approach to sleep arousal research. It has important implications for the future development of more accurate and effective predictive models in sleep research. The classification model, which is optimized for each sleep stage, not only provides a reliable means of distinguishing between different arousal intensity levels but also offers a valuable tool for researchers and clinicians seeking to quantify sleep disturbances accurately. Furthermore, the novel approach of this study to arousal research holds the promise of advancing the development of more accurate and helpful models in the field of sleep research, ultimately benefiting individuals affected by sleep disorders and promoting improved overall health.

Data availibility

The data that support the findings of this study are available from Samsung Medical Center, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are available from the authors upon reasonable request and with permission of Samsung Medical Center (eunyeon.joo@gmail.com).

References

Horner, R. Autonomic consequences of arousal from sleep: Mechanisms and implications. Sleep 19, S193–S195. https://doi.org/10.1093/sleep/19.suppl_10.S193 (1996).
Article CAS PubMed Google Scholar
Sforza, E., Jouny, C. & Ibanez, V. Cardiac activation during arousal in humans: Further evidence for hierarchy in the arousal response. Clin. Neurophysiol. 111, 1611–1619. https://doi.org/10.1016/S1388-2457(00)00363-1 (2000).
Article CAS PubMed Google Scholar
Engleman, H., Kingshott, R., Martin, S. & Douglas, N. Cognitive function in the sleep apnea/hypopnea syndrome (sahs). Sleep 23, S102–S108 (2000).
PubMed Google Scholar
Trinder, J. et al. On the nature of cardiovascular activation at an arousal from sleep. Sleep 26, 543–551 (2003).
PubMed Google Scholar
Iber, C., Ancoli-Israel, S., Chesson, A. & Quan, S. The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications (American Academy of Sleep Medicine, 2007).
Google Scholar
Ludwig, K. et al. A novel quantitative arousal-associated EEG-metric to predict severity of respiratory distress in obstructive sleep apnea patients. Front. Physiol. 13, 885270 (2022).
Article PubMed PubMed Central Google Scholar
Younes, M. Role of arousals in the pathogenesis of obstructive sleep apnea. Am. J. Respir. Crit. Care Med. 169, 623–633. https://doi.org/10.1164/rccm.200307-1023OC (2004).
Article PubMed Google Scholar
Azarbarzin, A., Ostrowski, M., Hanly, P. & Younes, M. Relationship between arousal intensity and heart rate response to arousal. Sleep 37, 645–653. https://doi.org/10.5665/sleep.3560 (2014).
Article PubMed PubMed Central Google Scholar
Azarbarzin, A. et al. Arousal responses during overnight polysomnography and their reproducibility in healthy young adults. Sleep 38, 1313–1321. https://doi.org/10.5665/sleep.4916 (2015).
Article PubMed PubMed Central Google Scholar
Amatoury, J. et al. Arousal intensity is a distinct pathophysiological trait in obstructive sleep apnea. Sleep 39, 2091–2100. https://doi.org/10.5665/sleep.6304 (2016).
Article PubMed PubMed Central Google Scholar
Bahr, K. et al. Intensity of respiratory cortical arousals is a distinct pathophysiologic feature and is associated with disease severity in obstructive sleep apnea patients. Brain Sci.https://doi.org/10.3390/brainsci11030282 (2021).
Article PubMed PubMed Central Google Scholar
Hosseini, M.-P., Hosseini, A. & Ahi, K. A review on machine learning for EEG signal processing in bioengineering. IEEE Rev. Biomed. Eng. 14, 204–218 (2020).
Article Google Scholar
Berry, R. B. & Gleeson, K. Respiratory arousal from sleep: Mechanisms and significance. Sleep 20, 654–675 (1997).
Article CAS PubMed Google Scholar
Al Oweidat, K., AlRyalat, S. A., Al-Essa, M. & Obeidat, N. Comparing rem- and nrem-related obstructive sleep apnea in jordan: A cross-sectional study. Canadian Respiratory Journal2018, https://doi.org/10.1155/2018/9270329 (2018).
Ratnavadivel, R. et al. Marked reduction in obstructive sleep apnea severity in slow wave sleep. J. Clin. Sleep Med. 5, 519–524 (2009).
Article PubMed PubMed Central Google Scholar
Akin, M. Comparison of wavelet transform and FFT methods in the analysis of EEG signals. J. Med. Syst. 26, 241–247. https://doi.org/10.1023/A:1015075101937 (2002).
Article CAS PubMed Google Scholar
Adeli, H., Zhou, Z. & Dadmehr, N. Analysis of EEG records in an epileptic patient using wavelet transform. J. Neurosci. Methods 123, 69–87. https://doi.org/10.1016/S0165-0270(02)00340-0 (2003).
Article PubMed Google Scholar
Bao, F. S., Liu, X. & Zhang, C. PyEEG: An open source Python module for EEG/MEG feature extraction. Computat. Intell. Neurosci.https://doi.org/10.1155/2011/406391 (2011).
Article Google Scholar
Abdul-Latif, A. A., Cosic, I., Kumar, D. K., Polus, B. & Da Costa, C. Power changes of EEG signals associated with muscle fatigue: The root mean square analysis of EEG bands. In Proceedings of the 2004 Intelligent Sensors, Sensor Networks and Information Processing Conference, 2004 (ed. Abdul-Latif, A. A.) 531–534 (IEEE, 2004).
Chapter Google Scholar
Lee, J.-M., Kim, D.-J., Kim, I.-Y., Park, K. S. & Kim, S. I. Nonlinear-analysis of human sleep EEG using detrended fluctuation analysis. Med. Eng. Phys. 26, 773–776 (2004).
Article PubMed Google Scholar
Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. Smote: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002).
Article Google Scholar
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Article Google Scholar
Ke, G. et al. Lightgbm: A highly efficient gradient boosting decision tree. Adv. Neural Inform. Process. Syst. 2017, 3147–3155 (2017).
Google Scholar
Guyon, I., Weston, J., Barnhill, S. & Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. 46, 389–422. https://doi.org/10.1023/A:1012487302797 (2002).
Article Google Scholar
Purves, D. et al. Sleep and wakefulness. In Neuroscience, chap. 27, 665–674 (Sinauer Associates, Inc., 2004), 3rd edn.
Bergstra, J., Bardenet, R., Bengio, Y. & Kégl, B. Algorithms for hyper-parameter optimization. In Proceedings of the 24th International Conference on Neural Information Processing Systems, 2546-2554 (2011).
Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A. & Talwalkar, A. Hyperband: A novel bandit-based approach to hyperparameter optimization. J. Mach. Learn. Res. 18, 6765–6816 (2017).
MathSciNet Google Scholar
Shmiel, O., Shmiel, T., Dagan, Y. & Teicher, M. Data mining techniques for detection of sleep arousals. J. Neurosci. Methods 179, 331–337. https://doi.org/10.1016/j.jneumeth.2009.01.035 (2009).
Article PubMed Google Scholar
Espiritu, H. & Metsis, V. Automated detection of sleep disorder-related events from polysomnographic data. In 2015 IEEE International Conference on Healthcare Informatics (eds Balakrishnan, P. et al.) 562–569 (ICHI, 2015). https://doi.org/10.1109/ICHI.2015.105.
Chapter Google Scholar
Shahrbabaki, S. S., Dissanayaka, C., Patti, C. R. & Cvetkovic, D. Automatic detection of sleep arousal events from polysomnographic biosignals. In 2015 IEEE BIOMEDICAL CIRCUITS AND SYSTEMS CONFERENCE (BIOCAS), Biomedical Circuits and Systems Conference, 310–313 (2015).
Coppieters ‘t Wallant, D. et al. Automatic artifacts and arousals detection in whole-night sleep EEG recordings. J. Neurosci. Methods 258, 124–133. https://doi.org/10.1016/j.jneumeth.2015.11.005 (2016).
Article Google Scholar
Fietze, I. et al. Arousals and sleep stages in patients with obstructive sleep apnoea syndrome: Changes under nCPAP treatment. J. Sleep Res. 6, 128–133 (1997).
Article CAS PubMed Google Scholar

Download references

Acknowledgements

This study was funded by Samsung Medical Center Grant (OTC1190671).

Author information

These authors contributed equally: Hyewon Han and Min Jae Seong.

Authors and Affiliations

Department of Computer Engineering, Hongik University, Seoul, 04066, Republic of Korea
Hyewon Han
Department of Neurology, Bundang Jesaeng Hospital, Seongnam, 13590, Korea
Min Jae Seong
Department of Artificial Intelligence Software, Hanbat National University, Daejeon, 34158, Republic of Korea
Janghun Hyeon
Department of Neurology, Neuroscience center, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, 06351, South Korea
Eunyeon Joo
Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, 06355, South Korea
Eunyeon Joo
Division of Information Security, Seoul Women’s University, Seoul, 01797, Republic of Korea
Junhyoung Oh

Authors

Hyewon Han
View author publications
You can also search for this author in PubMed Google Scholar
Min Jae Seong
View author publications
You can also search for this author in PubMed Google Scholar
Janghun Hyeon
View author publications
You can also search for this author in PubMed Google Scholar
Eunyeon Joo
View author publications
You can also search for this author in PubMed Google Scholar
Junhyoung Oh
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

H.H. wrote the first draft of the manuscript and performed analysis and data curation. M.S. performed data curation and data pre-processing. J.H. assisted in designing the model. E.J. administrated the whole project and acquired resources. J.O. supervised the overall research. All authors contributed to the manuscript revision and reading and approved the submitted version.

Corresponding authors

Correspondence to Eunyeon Joo or Junhyoung Oh.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Han, H., Seong, M.J., Hyeon, J. et al. Classification and automatic scoring of arousal intensity during sleep stages using machine learning. Sci Rep 14, 5983 (2024). https://doi.org/10.1038/s41598-023-50653-9

Download citation

Received: 03 March 2023
Accepted: 22 December 2023
Published: 12 March 2024
DOI: https://doi.org/10.1038/s41598-023-50653-9

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.