Introduction

Preterm infants have immature lungs at birth and may need supplemental oxygen for prolonged periods of time [1]. Due to their lung pathology, their blood oxygen saturation (SpO2) fluctuates widely necessitating frequent adjustments to the provided fraction of inspired oxygen (FiO2) by the nursing staff [2]. This may still be associated with variable periods of time spent outside the intended SpO2 target range. To optimize the time spent within the targeted SpO2 range some Neonatal Intensive Care Units (NICUs) around the world have started using a computer algorithm to automatically control FiO2 according to the patient’s measured SpO2. Most of the commercially available systems for this automated FiO2 control measure SpO2 continuously or intermittently, compare values to the set target range and make the FiO2 adjustments inversely related to their difference [2]. Although the automated system may seem conceptually superior, concerns have been raised as to whether the algorithms, in their current forms, efficiently serve the purpose [2].

A number of narrative reviews have previously explored the need for using automated oxygen control systems [2,3,4,5,6,7]. However, none assessed the quality of studies or synthesized the data into a meta-analysis. All reviews focused on short-term outcomes and did not report on long term outcomes. In this systematic review of prospective clinical trials, we explored the following research question: In oxygen-dependent premature infants who are on invasive or non-invasive positive pressure respiratory support, does automated control of FiO2 compared to manual control improve SpO2 targeting, reduce hypoxia and bradycardia events, reduce mortality and/or improve clinical outcomes such as chronic lung disease (CLD), retinopathy of prematurity (ROP), and long-term neurodevelopment?

Methods

Protocol and registration

The protocol was registered in the PROSPERO database (PROSPERO 2016:CRD42016036415) and is available from http://www.crd.york.ac.uk/PROSPERO/display_record.asp?ID=CRD42016036415

Eligibility criteria

We included prospective clinical trials (Randomized Controlled Trials (RCTs) and quasi-RCTs) that compared automated versus manual control of FiO2 in preterm infants. Studies with the following characteristics were included.

Population

Inclusion criteria

Preterm infants receiving invasive or non-invasive positive pressure respiratory support and supplemental oxygen (FiO2 > 21%).

Exclusion criteria

Chromosomal anomalies; major congenital malformations such as congenital respiratory and structural heart disease; and infants on low-flow oxygen supplementation by nasal prongs.

Intervention

Automated control of FiO2 using a device/algorithm which automatically adjusts FiO2 based on output of an incorporated SpO2 monitoring device.

Control

Manual control of FiO2 by NICU personnel.

Outcome(s)

The primary outcome measure for the systematic review was percentage of time that the infants spent within the intended SpO2 range (any range within a lower limit of 85% and an upper limit of 96%). All outcome measures have been summarized in the supplementary information (SI Appendix A).
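The primary outcome can be illustrated with a minimal sketch. It assumes equally spaced SpO2 samples, so that the share of samples in range equals the share of time in range. The function name, the default limits (the outer bounds quoted above) and the sample values are hypothetical and are not taken from any included study.

```python
from typing import Sequence


def percent_time_in_range(
    spo2: Sequence[float], lower: float = 85.0, upper: float = 96.0
) -> float:
    """Percentage of equally spaced SpO2 samples within [lower, upper].

    Assumption: samples are evenly spaced in time, so the proportion of
    samples in range approximates the proportion of time in range.
    """
    if not spo2:
        raise ValueError("at least one SpO2 sample is required")
    inside = sum(1 for s in spo2 if lower <= s <= upper)
    return 100.0 * inside / len(spo2)


# Hypothetical readings (percent); three of the ten fall outside 85-96%.
samples = [88, 91, 93, 97, 84, 90, 92, 95, 79, 94]
print(percent_time_in_range(samples))
```

Individual studies used narrower target ranges, which would be passed in through `lower` and `upper`.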

Information sources and search

We searched the Medline, Embase, Cochrane Central Register of Controlled Trials, and CINAHL databases electronically from inception until December 2016. Registered details of selected trials in the U.S. National Institutes of Health resource (www.clinicaltrials.gov) were sought using the search terms “automated oxygen control”/“oxygen” and “neonate”. The detailed search strategies for these databases are outlined in the supplementary information (SI Appendix B). Additional grey literature was sought through personal communication with experts in the field and by reviewing the reference lists of relevant articles, abstracts and conference proceedings (European Society for Pediatric Research, Pediatric Academic Societies 1990 to 2016).

Study selection and data collection process

The titles and abstracts retrieved followed by the full texts were screened by two independent reviewers in duplicate to assess their eligibility. Primary authors of two studies were contacted for further information during the full text screening and data extraction process. A pre-specified standardized data extraction form was used to extract the data from the eligible studies. Four reviewers carried out the extraction, working independently in pairs and in duplicate. Discrepancies were resolved through discussion or in consultation with a third reviewer.

Risk of bias in individual studies

The risk of bias (ROB) of eligible studies was assessed according to a modified version of the Cochrane Collaboration’s ROB tool [8]. According to this tool, the six criteria that were assessed included sequence generation, allocation concealment, blinding of participants, personnel and outcome assessors, completeness of follow up, selective outcome reporting, and presence of other biases. In anticipation of inclusion of cross-over studies, three additional points were included in the other biases category. These included: appropriate cross-over design, carry-over effect, and unbiased data [9].

Summary measures and synthesis of results

The results were first described narratively and, where possible, the evidence was quantitatively pooled to obtain a summary estimate using a random-effects (RE) model [10]. Effect estimates along with 95% confidence intervals (CIs) were estimated using odds ratio (OR) for binary outcomes, and mean difference (MD) for continuous outcomes (SI Appendix A). Statistical heterogeneity between the studies was estimated by using the I2 statistic [11, 12]. The I2 statistic was interpreted using the thresholds set forth by the Cochrane Collaboration [8].
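The pooling step can be sketched as follows. This is a minimal illustration that assumes the DerSimonian-Laird estimator for the between-study variance; the text above states only that a random-effects model was used, so the choice of estimator is an assumption. The function name and the input values are hypothetical.

```python
import math
from typing import Dict, Sequence


def random_effects_pool(
    effects: Sequence[float], variances: Sequence[float]
) -> Dict[str, float]:
    """Pool study effects (e.g. mean differences) with a random-effects model.

    Assumption: DerSimonian-Laird estimate of tau^2. Returns the pooled
    effect, its 95% CI, tau^2, Cochran's Q and I2 (in percent).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]              # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se,
        "ci_high": pooled + 1.96 * se,
        "tau2": tau2,
        "q": q,
        "i2": i2,
    }


# Hypothetical mean differences (percentage points) and their variances.
print(random_effects_pool([5.0, 15.0], [1.0, 1.0]))
```

I2 is computed here as the share of Cochran's Q in excess of its degrees of freedom, which is why strongly divergent study estimates push it towards 100%.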

Risk of bias across studies

The confidence in the estimates for each outcome across the studies was assessed using the GRADE (Grading of Recommendations, Assessment, Development and Evaluation) approach [13]. For this purpose, two authors did an independent assessment using the GRADEPro software (GRADEPro guideline development tool, McMaster University 2015) [14]. The confidence in the estimates was based on four levels: high, moderate, low, and very low.

Additional analyses

The following potential sources of heterogeneity were proposed a priori: type of automated oxygen control system used, study design, mode of ventilation and age at enrolment. A subgroup analysis of studies predominantly including non-invasively ventilated infants (represented by >50% of enrolled infants) was planned, as these infants were perceived to have the most fluctuations in SpO2 and the potential for greater benefit from an improved technique of SpO2 targeting. Sensitivity analyses were also planned based on the study design and the type of automated system used.

Results

We identified 276 potentially relevant articles from electronic databases and other sources. Seventeen studies were assessed for full text screening. Ten studies including 274 infants met our study requirements [15,16,17,18,19,20,21,22,23,24]. The study selection flow is presented in the form of a PRISMA flow diagram (SI Fig. 1).

Study characteristics

The clinical profile of the included infants and the methodological characteristics of the studies are presented in the supplementary information (SI Table) [15,16,17,18,19,20,21,22,23,24]. All included studies were published in English between 2001 and 2016. Eight out of the ten studies were cross-over RCTs; one study by Zapata et al. was a parallel-design RCT [23] and another study by Plottier et al. was cross-over, but not specifically labeled as a randomized trial [24]. All the studies with a cross-over design had two phases, except those by van Kaam et al. and Plottier et al., which had four and three phases respectively [21, 24].

Population characteristics

All enrolled infants were born prematurely at a gestational age between 23–30 weeks. There was a wide variation in the age of enrollment ranging from within the first week to the 11th week after birth. Five studies included only infants who were invasively ventilated [15, 16, 18,19,20]; two included infants who were exclusively on non-invasive respiratory support [23, 24]; the rest of the studies included both [17, 21, 22]. In the study by van Kaam et al. two separate cohorts of infants were examined, one with a lower SpO2 target (89–93%) and the other with a higher SpO2 target (91–95%) [21]. No infant crossed over between higher and lower SpO2 targets. Hence the two cohorts were entered as separate studies in the analysis, i.e., high SpO2 cohort [van Kaam(a)] and low SpO2 cohort [van Kaam(b)] (SI Table) [21].

Intervention and control

The automated oxygen control system built into the Avea [CareFusion, Yorba Linda, CA] infant ventilator was the one most commonly used [15, 16, 19,20,21,22]. The other automated systems included the FiO2C software linked to Radical 7 (Masimo Inc), the Auto-Mixer algorithm (Centro Medico Imbanaco, Cali, Colombia) and the VDL1.0 algorithm [17, 23, 24]. The automated function was set to adopt the current FiO2 as a basal FiO2 level. The systems subsequently analyzed the measured SpO2 at regular intervals, usually once per second, and the FiO2 was increased or decreased in a stepwise manner if SpO2 fell below or exceeded the intended range. The response time with the Avea system was usually within 10 s of detection of hypoxia and 15–90 s for hyperoxia, depending on the extent of the latter. The averaging interval on the pulse oximeter in the automated systems varied between 1–10 s. The included infants were randomized to each arm for 24 h periods in four studies [17, 20,21,22], 12 h periods in two studies [19, 23], 4 h periods in one study [15], a 4 h (automated control) and an 8 h (manual control) period in one study [24], 2 h periods in one study [16] and a 1.5 h period in another study [18]. For the manual control intervention, the nurse to patient ratio was mostly 1:2 [17,18,19,20,21, 24]. The study by Urschitz et al. had a third arm with enhanced (1:1) nursing care [18].
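The stepwise rule described above can be sketched in a few lines. This is a toy illustration of the general principle only and does not reproduce the logic of the Avea, FiO2C, Auto-Mixer or VDL1.0 systems. The function name, the 1% step size and the 21–100% clamp are assumptions made for the example.

```python
def next_fio2(
    fio2: float, spo2: float, lower: float, upper: float, step: float = 1.0
) -> float:
    """Return the FiO2 (percent) after one stepwise adjustment.

    Assumptions (hypothetical, not from any commercial device):
    a fixed step size and a clamp to the 21-100% range.
    """
    if spo2 < lower:
        fio2 += step      # SpO2 below the target range: give more oxygen
    elif spo2 > upper:
        fio2 -= step      # SpO2 above the target range: give less oxygen
    return min(100.0, max(21.0, fio2))


# Hypothetical one-per-second readings against an 89-93% target range.
fio2 = 30.0
for reading in [88, 87, 90, 95, 94]:
    fio2 = next_fio2(fio2, reading, lower=89, upper=93)
print(fio2)
```

A fixed step ignores how far SpO2 lies from the target and how long it has been there, which is one reason purely rule-based control may leave infants outside the range for appreciable periods.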

Outcomes

All included studies presented data on the primary outcome, i.e., percentage of time spent within the intended saturation range [15,16,17,18,19,20,21,22,23,24]. The definition of target saturation range differed slightly between studies, but always remained between 85–96%, thereby reducing the clinical heterogeneity across the studies. Most of the studies also presented data on time spent outside (above and below) the targeted saturation ranges [15,16,17, 19,20,21,22,23,24]. Time spent below a severe hypoxia cut-off was presented in seven studies [15, 16, 19,20,21,22, 24]. Severe hypoxia was defined as oxygen saturation less than 80% thus maintaining homogeneity in the definition. Four studies presented data on time spent below other hypoxia cut-offs (usually more severe hypoxia) such as below 70 or 75% [15, 16, 20, 22]. Hyperoxia was defined as any SpO2 above the upper limit of the intended SpO2 range. The median upper limit of SpO2 range in the included studies was 95% (range 93–96%). Data on number of hypoxic and bradycardic events (as described in SI Appendix A) as well as number of manual interventions and FiO2 exposure was recorded from the included studies. Zapata et al. also presented data on time spent within and outside targeted regional cerebral, renal, and hepatic oxygen saturation ranges [23]. None of the studies reported mortality or clinical outcomes such as CLD, ROP or neurodevelopmental impairment. Some of the outcomes presented as medians (with interquartile range/range) were transformed to means (and standard deviations) using the Hozo et al. transformation method [25].
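The median-to-mean transformation can be sketched as below for the case of a reported median and range. The sample-size thresholds follow the commonly quoted recommendations of Hozo et al. and should be checked against [25]. How outcomes reported with an interquartile range were handled is not described in the text, so that case is not covered. The function name and the example values are hypothetical.

```python
import math
from typing import Tuple


def hozo_mean_sd(
    minimum: float, median: float, maximum: float, n: int
) -> Tuple[float, float]:
    """Estimate mean and SD from the median, range and sample size.

    Assumption: thresholds as commonly quoted from Hozo et al.
    (mean formula for n <= 25; variance formula for n <= 15,
    range/4 for 15 < n <= 70, range/6 for n > 70).
    """
    a, m, b = minimum, median, maximum
    if n < 1 or not a <= m <= b:
        raise ValueError("need n >= 1 and minimum <= median <= maximum")
    mean = (a + 2 * m + b) / 4 if n <= 25 else float(m)
    if n <= 15:
        sd = math.sqrt(((a - 2 * m + b) ** 2 / 4 + (b - a) ** 2) / 12)
    elif n <= 70:
        sd = (b - a) / 4
    else:
        sd = (b - a) / 6
    return mean, sd


# Hypothetical outcome: median 90, range 80-100, in 10 infants.
print(hozo_mean_sd(80, 90, 100, 10))
```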

Risk of bias within studies

A high risk of bias was noted among the studies in general. Sequence generation and/or allocation concealment was not mentioned in eight out of the ten included studies (SI Table). None of the studies were blinded thus increasing the risk of co-intervention bias. As oxygen saturation changes quickly, the randomized cross-over design seemed to be appropriate without obvious risk of bias due to “contamination” from the previous oxygen control system. There seemed to be no obvious bias in terms of loss of follow-up, incomplete outcome data or selective reporting of outcomes. However, with the lack of clarity on sequence generation, allocation concealment and lack of blinding, the studies were deemed to have a high risk of bias.

Results of individual studies

Synthesis of results

Automated control of FiO2 resulted in significantly more time being spent within the intended target saturation range [MD: 12.8%; 95% CI: 6.5 to 19.2%; I2 = 90%] (Fig. 1a). It also significantly reduced periods of hyperoxia [MD: –8.8%; 95% CI: –15 to –2.7%; I2 = 92%] (Fig. 1b), severe hypoxia (SpO2 < 80%) [MD: –0.9%; 95% CI: –1.5 to –0.4%; I2 = 47%] (Fig. 1c) and the number of hypoxic events [MD: –5.6; 95% CI: –9.1 to –2.1; I2 = 97%] (Fig. 2a). However, there was no significant difference in the time spent below the targeted SpO2 range [MD: –2.3%; 95% CI: –6 to 1.5%; I2 = 80%] (Fig. 2b) or in FiO2 exposure [MD: –1.2%; 95% CI: –3.8 to 1.5%; I2 = 42%] (Fig. 2c).

Fig. 1

Forest plots comparing (a) time spent within the targeted saturation range; (b) time spent above the target saturation range (hyperoxia); (c) time spent in severe hypoxia (SpO2 < 80%)

Fig. 2

Forest plots comparing (a) number of hypoxic events (any event with SpO2 < 80% lasting for 60 s or more); (b) time spent below the target saturation range (hypoxia); (c) FiO2 exposure

Subgroup analysis

The primary outcome was explored in the subgroup of infants with a predominantly non-invasive mode of respiratory support. Five studies (169 infants) were included in the analysis, and infants in the automated control group spent significantly more time within the targeted saturation range than those in the manual control group [MD: 15.2%; 95% CI: 5.4–24.9%; I2 = 94%]. A similar outcome was noted in infants with a predominantly invasive mode of ventilation [5 studies (105 infants); MD: 10.5%; 95% CI: 6.6–14.3%; I2 = 10%].

Sensitivity analyses

To explore the high degree of heterogeneity observed across the studies, a sensitivity analysis was conducted for the primary outcome (time spent within the targeted SpO2 range) based on study design and type of automated device used. On removal of the parallel-design study by Zapata et al. and the non-randomized study by Plottier et al., the high degree of heterogeneity (I2 = 90%) disappeared completely on meta-analysis of the remaining cross-over RCTs [9 studies (234 infants); MD: 8.9%; 95% CI: 6.5–11.5%; I2 = 0%]. Similarly, a sensitivity analysis combining only the studies that used the most common system, the Avea, substantially reduced the heterogeneity [7 studies (187 infants); MD: 8.9%; 95% CI: 5.9–11.8%; I2 = 13%].

Publication bias assessment

Publication bias was explored as another potential source of heterogeneity across studies. Significant publication bias was observed when all the studies were combined for the primary outcome [Egger Regression test (p < 0.001) and Begg & Mazumdar test (p = 0.006)] (SI Figure 2).
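The idea behind the Egger regression test can be sketched as an ordinary least-squares fit of each study's standardized effect (effect divided by its standard error) on its precision (one over the standard error). An intercept far from zero suggests funnel-plot asymmetry. The sketch returns the intercept and its t statistic but leaves the p value to a statistics package. The function name and the inputs are hypothetical, and this is not necessarily the software routine used for the review.

```python
import math
from typing import Dict, Sequence


def egger_regression(
    effects: Sequence[float], std_errors: Sequence[float]
) -> Dict[str, float]:
    """Fit effect/SE = intercept + slope * (1/SE) by least squares.

    Returns intercept, slope, the intercept's standard error, its
    t statistic and the residual degrees of freedom (k - 2).
    """
    k = len(effects)
    if k < 3 or k != len(std_errors):
        raise ValueError("need at least three studies with matching SEs")
    x = [1.0 / se for se in std_errors]                 # precision
    y = [e / se for e, se in zip(effects, std_errors)]  # standardized effect
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_int = math.sqrt(rss / (k - 2) * (1.0 / k + mx ** 2 / sxx))
    t_stat = intercept / se_int if se_int > 0 else math.inf
    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": se_int,
        "t": t_stat,
        "df": float(k - 2),
    }


# Hypothetical studies whose effects grow with their standard errors.
print(egger_regression([3.5, 5.0, 8.0], [1.0, 2.0, 4.0])["intercept"])
```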

Exploratory regression analysis

A regression analysis was conducted to explore the possible reasons for such a significant inconsistency in the effect size of the primary outcome. The four possible sources of heterogeneity that were identified a priori (type of automated system, design of study, mode of ventilation, and age at enrollment) were entered into the model. The regression analysis identified the study design as the only independent predictor for the previously noted variation in effect size (p < 0.05).

Risk of bias across studies and quality assessment

On GRADE assessment, the quality of evidence was judged to be “very low” for the difference in percentage time spent within, above and below the targeted saturation range as well as for the difference in the number of hypoxic episodes. The quality of evidence for the difference in time spent in severe hypoxia and the difference in FiO2 exposure was judged to be “moderate”. The GRADE evidence profile is summarized in Table 1.

Table 1 Summary (GRADE) evidence profile

Discussion

In this systematic review, ten studies including 274 infants were evaluated to compare the efficacy of automated versus traditional manual control of FiO2 for SpO2 targeting in preterm infants. Infants managed with automated control of FiO2 spent significantly more time in the intended target saturation range (12.8%; 95% CI: 6.5 to 19.2%) as compared to when manual control was used.

Maintaining SpO2 within the intended target range has been a major challenge in preterm infants due to a number of factors ranging from severity of lung disease to the logistic challenge of frequent FiO2 titration in a busy NICU setting. Over the past decade, five large randomized clinical trials (RCTs) have been conducted to explore the effects of higher (90–95%) versus lower (85–90%) SpO2 targets in preterm infants [26,27,28]. Even in the more controlled research settings of these large multicenter oxygen trials (Surfactant, Positive Pressure, and Oxygenation Randomized Trial (SUPPORT), Benefits of Oxygen Saturation Targeting (BOOST), Canadian Oxygen Trial (COT)), the enrolled infants spent a considerable time outside the intended SpO2 targets, leading to significant overlap of the reported SpO2 between the two study groups [26,27,28]. Interestingly, in a recent systematic review that included follow-up data from the above-mentioned studies, Manja et al. pointed out that the proportion of time infants spent outside the target range while on supplemental oxygen ranged from 8.2 to 27.4% for SpO2 <85% and from 8.1 to 22.4% for SpO2 >95%, with significant overlap between the two groups [29]. These results indicate that substantial work still needs to be done to consistently achieve desired SpO2 targets [29, 30]. Hence it is imperative for NICUs to implement new strategies that would significantly improve time spent within the targeted SpO2 range and minimize hyperoxic and hypoxic periods. Our meta-analysis shows that automated control of FiO2 significantly improves SpO2 targeting in both invasively and non-invasively ventilated infants and is therefore potentially a better alternative to manual control.

There is increasing evidence that hyperoxia in preterm infants is associated with adverse long-term outcomes such as ROP and CLD. Askie et al. (BOOST study group) showed that targeting a higher SpO2 range (95–98%) in preterm infants resulted in a significant increase in CLD (p < 0.001), while severe ROP was significantly reduced in infants with a lower oxygen target (85–89%), as shown in the studies by Carlo et al. (SUPPORT study group) (RR 0.52; 95% CI, 0.37–0.73; p < 0.001) and the BOOST II United Kingdom collaborative group (RR 0.79; 95% CI, 0.63–1.00; p = 0.045) [26, 27, 31]. Our meta-analysis shows that the automated system significantly reduces the time spent above the intended SpO2 range. Although it is well known that the harmful effects of hyperoxia are more pronounced in preterm infants due to increased oxidative stress, there is still considerable debate on SpO2 thresholds of clinical significance [32]. Therefore, whether an 8.8% (95% CI: 2.7–15%) reduction in time spent in the hyperoxic range translates into improvement in clinical outcomes such as ROP and CLD remains to be seen.

The multicenter oxygen trials exploring higher versus lower SpO2 targets also brought to light some important long-term consequences of potential periods of hypoxia in the preterm population [26,27,28]. The SUPPORT study showed that infants in the lower oxygen target group had a significantly higher risk of mortality (RR 1.27; 95% CI, 1.01 to 1.60; p = 0.04) [26]. The BOOST II UK collaborative group study also revealed that infants with the lower target saturation (85–89%) had a higher risk of mortality (RR 1.45; 95% CI 1.15–1.84; p = 0.002) and NEC (RR 1.31; 95% CI 1.02–1.68; p = 0.04) [27]. Thus, in spite of being a surrogate marker, oxygen saturation targeting seems to have far-reaching impacts on long-term outcomes such as mortality. Our meta-analysis showed that there was no significant difference between automated and manual FiO2 control in terms of time spent below the target range. Moreover, periods of severe hypoxia (SpO2 < 80%) were significantly reduced in the automated group. A recent post hoc analysis of data from the Canadian Oxygen Trial (COT) showed that in infants with prolonged hypoxic episodes (SpO2 < 80% for ≥60 s) the risk of late death or disability at 18 months was increased by 66% (RR 1.66; 95% CI, 1.35–2.05; p < 0.001) [33]. In our analysis, infants in the automated control group had significantly fewer prolonged hypoxic episodes, which seems to be clinically relevant in light of the above evidence.

One of the major limitations in the generalizability of our meta-analysis is the substantial variation in the effect estimate observed across the studies (improvement in time spent within the intended target ranging from 4–35%) [21, 24]. This may temper the enthusiasm and limit its clinical applicability in some centers, especially if the clinicians are not convinced that automated control improves SpO2 targeting by a clinically significant margin. We conducted an exploratory regression analysis of all the potential confounders and effect modifiers to explore the factors which may have caused this variation in effect size. We showed that study design was the only independent predictor of variation in effect estimates. The studies by Plottier et al. and Zapata et al. were not of a randomized cross-over design and hence were excluded in the sensitivity analysis [23, 24]. In the former study the effect size was substantially higher than in the rest, which contributed significantly to the heterogeneity [24]. We speculate that the explanation could range from a technological difference in the automated system to a methodological bias in the study itself. In that study a new algorithm known as VDL1.0 was used, which was reported to be more rapidly responsive and adaptive [24]. In the VDL1.0 algorithm the proportional-integral-derivative (PID) controller was enhanced to mitigate iatrogenic hyperoxia and adapt to the severity of lung disease [24]. This may have produced a faster response to the fluctuations in SpO2 and therefore could have led to improved outcomes. However, since this was the only study that was not reported to have a randomized allocation of subjects, there could have been considerable selection bias. The parallel-design RCT by Zapata et al. was also found to skew the results in favor of the automated device [23]. Again, a different automated algorithm (the Auto-Mixer algorithm) was used in this study, which could have affected the results; alternatively, the difference in study design could have accounted for the differences in effect estimates. After removing these two studies from the analysis the results were more consistent (I2 = 0%), but the effect estimate was more conservative, with only an 8.9% improvement in time spent in the target saturation range in the automated control group, which still remained statistically significant.

The studies that used the Avea system were also found to produce similarly modest results (8.9% improvement in time spent in the target saturation range). Therefore, even with the use of automated FiO2 control, infants do seem to spend a considerable time outside their targets. This may be because most of the currently available automated algorithms are rule-based, responding to any deviation from the target SpO2 according to a set of predefined rules [34]. However, to respond optimally to fluctuations in SpO2, an ideal algorithm should be able to respond to both a gradual change in FiO2 requirement and sudden hypoxia, and should also be able to adapt its response to changes in the infant’s lung function over time [34]. Thus more work seems to be necessary to refine the automation process to achieve better effects.

This systematic review is the first to quantitatively synthesize available evidence from prospective clinical trials and summarize the evidence using the GRADE approach. Although automated control significantly improves the SpO2 targeting in preterm infants, the quality of evidence was deemed “very low” as the methodological qualities of the studies were prone to significant risk of bias (Table 1). Furthermore, even if saturation targeting is a reliable surrogate marker, we have limited knowledge of specific “thresholds” for clinical effects. While refinements in the controlling algorithm would be beneficial, further studies should look at clinical relationships including ROP, CLD, neurodevelopment and death, which were not reported by any of the current studies.

Conclusion

Implications for practice

Automated FiO2 control significantly improves SpO2 targeting and reduces periods of hyperoxia, severe hypoxia, and hypoxic events in preterm infants receiving positive pressure respiratory support. However, the quality of evidence is very low to moderate on GRADE assessment.

Implications for further research

In view of the generally low quality of the evidence, further RCTs, preferably parallel-design and blinded, looking at patient-important outcomes such as mortality, CLD, ROP and long-term neurodevelopment, are needed to establish stronger evidence before the use of automated control of FiO2 for preterm infants in the NICU can be routinely promoted.