Main

Lung cancer is the leading cause of cancer death globally1,2. Findings from the US National Lung Screening Trial (NLST) and the Dutch–Belgian NELSON trial demonstrated that lung cancer screening with low-dose computed tomography (LDCT) of high-risk individuals leads to a reduction of 20% or more in lung cancer mortality3,4. In response to this evidence, lung cancer screening has been recommended for high-risk individuals in the United States by the US Preventive Services Task Force and the Centers for Medicare and Medicaid Services5,6, and, currently, over 4,000 sites are offering screening7. These guidelines have not been universally adopted in jurisdictions outside of the United States. Despite the slow start, global interest in the implementation of lung cancer screening has recently been accelerating. As of 26 September 2023, the Lung Cancer Policy Network’s Interactive Map of Lung Cancer Screening showed that 137 implementation studies were working toward starting programs around the world (North America, 42; South America, four; UK/Europe/Middle East/North Africa, 71; Far East, 13; Australasia, four)8. Current lung cancer screening program performance outside of the United States may differ from that observed in past screening trials and programs in the United States because the US healthcare system is different in structure and performance from that of other high-income countries9,10. Additionally, lung cancer screening has evolved and improved since the two major randomized controlled trials, the NLST and the NELSON trial3,11.

Ontario Health, a government agency in the province of Ontario, Canada, that coordinates the province’s healthcare system, conducted the Ontario Lung Cancer Screening Pilot (Pilot), a multi-center lung cancer screening pilot from June 2017 to August 2019, with the aim of determining how best to implement organized lung cancer screening in Ontario. Here we compare Pilot results, including quality indicators, with those observed in the NLST, NELSON and UK Lung Cancer Screening (UKLS) trials and the Lahey Hospital & Medical Clinic (Lahey Hospital) program. The Pilot aims to provide a real-world case study in lung cancer screening implementation for multiple domains across the lung cancer screening pathway, informing how a risk-based lung screening program can be implemented in clinical practice in a large, diverse, populous geographic area within a universal healthcare system12,13,14,15.

Results

Enrollment, risk assessment and scans

Figure 1 shows the number of individuals at each stage of the Pilot pathway. Between 1 June 2017 and 31 May 2019, the Pilot identified 7,768 potential participants, of whom 86.0% were provider referred and 14.0% self-presented. Of these 7,768 individuals, 7,260 met triage criteria and underwent risk assessment, and 4,918 (67.7%) were eligible for screening (PLCOm2012noRace ≥ 2.0% over 6 years) (model risk factors and beta coefficients are provided in Supplementary Text 1). Including an additional 69 individuals who did not meet triage criteria but underwent risk assessment, the total number of risk-assessed individuals was 7,329, and 4,944 (67.5%) were eligible for screening. Most eligible individuals (n = 4,439, 89.8%; 95% confidence interval (CI) 88.9–90.6%) completed a baseline LDCT scan. Including an additional 12 individuals who did not meet the screening eligibility criteria but completed a baseline scan, the total number of completed baseline scans was 4,451.
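The enrollment proportions above can be re-derived from the reported counts. The following is a minimal sketch, not the authors' analysis code: the text does not state which interval method was used, so the Wilson score interval here is an assumption, and the function and variable names are illustrative.

```python
from math import sqrt


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the text above.
risk_assessed = 7_329      # all individuals who underwent risk assessment
eligible = 4_944           # PLCOm2012noRace >= 2.0% over 6 years
baseline_scanned = 4_439   # eligible individuals with a completed baseline LDCT

eligible_pct = 100 * eligible / risk_assessed
uptake_pct = 100 * baseline_scanned / eligible
ci_low, ci_high = wilson_ci(baseline_scanned, eligible)

print(f"Eligible among risk assessed: {eligible_pct:.1f}%")
print(
    f"Baseline scan uptake: {uptake_pct:.1f}% "
    f"(95% CI {100 * ci_low:.1f}-{100 * ci_high:.1f}%)"
)
```

With these counts, the sketch should agree with the reported 67.5% and 89.8% to one decimal place. An exact (Clopper–Pearson) interval, if that is what was used, may differ slightly in the last digit.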

Fig. 1: Ontario Lung Cancer Screening Pilot flowchart.
The flow diagram shows the number of individuals at each stage of the Pilot pathway.

Participant characteristics

Table 1 describes the characteristics of individuals who were recruited, risk assessed and scanned. Recruitment did not differ substantially by sex (51.3% male). Most individuals recruited were aged 55–64 years (61.2%), and age distribution did not differ between self-presenting and physician-referred individuals. Overall, 50.5% had high school education or less. Recruitment was inversely associated with neighborhood income quintile in a consistent dose–response relationship. Of those recruited, 78.6% resided in urban areas, and 51.0% lived less than 20 km from their Pilot site. Physician-referred individuals differed from self-presenting individuals in many characteristics (Table 2). Compared to self-presenters, a higher proportion of physician-referred participants were male (51.9% versus 46.0%), were current smokers (63.4% versus 47.9%), had high school education or less (52.3% versus 39.8%), lived in neighborhoods in the two lowest income quintiles (44.3% versus 37.4%) and came from non-urban residential neighborhoods (21.7% versus 15.4%) (all comparisons, P < 0.001).

Table 1 Characteristics of individuals recruited, risk assessed and screened with LDCT in the Ontario Lung Cancer Screening Pilot
Table 2 Characteristics of physician-referred compared to self-presenting recruited individuals who met triage criteria

Of individuals who were eligible and received LDCT scans (n = 4,439), 2,355 were males (53.1%); 987 (22.2%) were ages 55–59 years; 1,417 (31.9%) were ages 60–64 years; 1,278 (28.8%) were ages 65–69 years; and 757 (17.1%) were ages 70–74 years. Of the 4,439 individuals, 2,452 (55.2%) had high school education or less. Census-based neighborhood income data were available for 4,424 of 4,439 (99.7%) individuals, and, of these, 2,030 (45.9%) lived in neighborhoods in the two lowest census income quintiles. Of the 4,432 individuals who were eligible and received LDCT scans for whom home geographic locations were known, 3,490 (78.7%) lived in urban communities (≥10,000 population). Of the 4,439, 3,168 (71.4%) were current smokers.

LDCT scan results

The baseline and annual scan results are presented in Table 3. In the Pilot, the American College of Radiology’s Lung Screening Reporting & Data System 1.0 (Lung-RADS) screening results management system was used, in which a Lung-RADS classification of 3 indicates an abnormality that is ‘probably benign’ but requires further, more immediate imaging (that is, 6 months versus 12 months for Lung-RADS classifications of 1 or 2) to monitor one or more nodules. A Lung-RADS classification of 4 indicates results that are suspicious or very suspicious for lung cancer that require more rapid imaging follow-up (Lung-RADS classification of 4A at 3 months) or immediate clinical investigation (Lung-RADS classification of 4B or 4X). From baseline to annual follow-up, significant decreases were observed in the prevalence of Lung-RADS classification of 3 (10.1% versus 2.3%; odds ratio (OR) = 0.21, 95% CI 0.15–0.29, P < 0.0001) and Lung-RADS classification of 4 (7.5% versus 3.3%, OR = 0.42, 95% CI 0.31–0.56, P < 0.0001).

Table 3 Lung-RADS classification distributions at baseline and annual follow-up scan

Among the 4,451 baseline scans, 244 (5.5%) with Lung-RADS 4A, 4B or 4X results were referred to and attended diagnostic assessment. Of these 244 individuals, 78 (32.0%) were diagnosed with lung cancer; 33 (13.5%) were negative for cancer; and 129 (52.9%) had investigations that were inconclusive and required further follow-up. Five or fewer individuals (≤2.5%) attended but did not complete diagnostic assessment, for unknown reasons.

Among the 1,804 individuals who had an annual scan, 23 (1.3%) were referred to diagnostic assessment and attended. Of these 23 individuals, six (26.1%) were diagnosed with lung cancer, and seven (30.4%) were negative. Ten (43.5%) individuals had investigations that were inconclusive and required further follow-up.

Actionable incidental findings

An actionable incidental finding (AIF) was defined as an LDCT scan that had a finding that was not related to lung cancer but was judged by the reading radiologist to be potentially clinically important. More details are provided in the Methods. AIFs were observed in 20.5% (911/4,451) of baseline scans, in 6.5% (10/153) of 3-month follow-up scans, in 10.6% (42/398) of 6-month follow-up scans and in 9.9% (178/1,804) of annual recall scans (Extended Data Table 1). There was a significantly higher proportion of AIFs reported on baseline scans compared to annual follow-up scans (OR = 2.35, 95% CI 1.97–2.80, P < 0.0001). Rates of AIFs at baseline varied across Pilot sites (13.1–33.7%).

Adherence to post-baseline scans

Overall, the adherence rate to annual recall was high (1,101/1,222 (90.1%)) (Extended Data Table 1). Similarly, adherence to 6-month recall (337/366 (92.1%)) and to 3-month recall (≥66/71 (≥93%)) were high. Adherence rates were high across all screening sites (75.0–100.0% for all recall scans).

Adherence to diagnostic investigations

Over the first 2 years of the Pilot, 302 individuals with abnormal screening results (Lung-RADS 4A, 4B and 4X) were referred for diagnostic assessment. Of the 302 referred individuals, most (267, 88.4%) attended diagnostic assessment at Pilot sites. Of the 35 individuals who did not undergo assessment at their screening site, most had a diagnostic assessment within 3 months of their abnormal screening result in Ontario at a non-Pilot facility and five or fewer did not receive follow-up. The proportion of screen-positive individuals who received appropriate assessment was 97.7% (95% CI 95.3–99.1%). Adherence to diagnostic assessment at Pilot sites was high across sites (82.4–100.0%) and within age (82.0–100.0%) and sex (88.3–88.6%) subgroups.

Lung cancers and detection rates

Of 4,451 Pilot participants, 106 (2.4%, 95% CI 2.0–2.9%) were diagnosed with lung cancer during the Pilot evaluation period after a positive screen: 97 cancers were diagnosed at baseline screening, and nine cancers were diagnosed at the annual follow-up screen (Fig. 1), including five or fewer cancers detected outside of the Pilot. Of the 106 lung cancer cases, 22 (20.8%) were diagnosed outside of the Pilot screening sites’ diagnostic assessment program. These 22 reported lung cancer cases are considered screen detected because the diagnosis of lung cancer followed a positive scan. The cancer detection rate was 2.2% (97/4,451; 95% CI 1.8–2.7%) in the baseline screening round and 0.5% (9/1,804, 95% CI 0.2–0.9%) in the annual follow-up round. Lung cancer detection rates were similar across sites (P = 0.39). Histology was available for 96 cancers: 57 (59.4%) were adenocarcinoma; 20 (20.8%) were squamous cell carcinoma; and 19 (19.8%) were other histological types.

Accuracy measures

Pilot accuracy statistics are presented in Table 4. Overall, of participants with Lung-RADS scores of 4 who were referred to diagnostic assessment, 267 attended Pilot site assessment programs. Most other participants were assessed outside Pilot sites; five or fewer individuals were lost to follow-up; and 106 lung cancers were diagnosed in positive individuals who were not lost to follow-up, for a positive predictive value (PPV) of 35.7% (95% CI 30.3–41.4%). PPV was 35.1% (95% CI 29.7–40.8%) for all individuals referred to clinical investigation. With an abnormal scan being defined as one leading to a referral to diagnostic investigation, the Pilot specificity was 95.5% (n = 4,148/4,344; 95% CI 94.8–96.1%; false-positive rate, 4.5%). At the time of analysis, 24 of 145 (16.6%) individuals receiving an invasive diagnostic test remained suspicious for lung cancer, requiring further follow-up. If only individuals with a definitive lung cancer or a benign diagnosis are considered, the PPV is 62.0% (106/171; 95% CI 54.3–69.3%).
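Because the PPV depends on the denominator, it may help to see the three variants computed side by side. This is an illustrative sketch using the counts reported in the text. The denominator of 297 is an assumption: the text reports 302 referrals and "five or fewer" individuals lost to follow-up, and 302 minus 5 is the value consistent with the published 35.7%.

```python
def pct(numerator, denominator):
    """Return a proportion expressed as a percentage."""
    return 100 * numerator / denominator


cancers = 106            # screen-detected lung cancers
referred = 302           # all Lung-RADS 4 referrals to diagnostic assessment
followed = 297           # ASSUMPTION: 302 minus 5 lost to follow-up (text says "five or fewer")
definitive = 171         # referrals with a definitive cancer or benign diagnosis
true_negatives = 4_148   # no referral and no lung cancer
without_cancer = 4_344   # all screened individuals without lung cancer

ppv_followed = pct(cancers, followed)
ppv_referred = pct(cancers, referred)
ppv_definitive = pct(cancers, definitive)
specificity = pct(true_negatives, without_cancer)
false_positive_rate = 100 - specificity

print(f"PPV, not lost to follow-up:   {ppv_followed:.1f}%")
print(f"PPV, all referred:            {ppv_referred:.1f}%")
print(f"PPV, definitive diagnoses:    {ppv_definitive:.1f}%")
print(f"Specificity:                  {specificity:.1f}%")
print(f"False-positive rate:          {false_positive_rate:.1f}%")
```

The spread between the first and third values reflects only the choice of denominator: individuals whose investigations remained inconclusive count against the PPV in the first two definitions and are excluded from the third.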

Table 4 Comparison of accuracy statistics in the Ontario Lung Cancer Screening Pilot, NLST, NELSON, UKLS and Lahey Hospital screening pilot/trials/programs

Interval cancers

In the Pilot cohort, five or fewer individuals (≤4.5% of lung cancers) were diagnosed with interval cancers within 1 year of a ‘negative/benign’ scan (that is, Lung-RADS classification of 1 or 2, n = 2,026) (Table 4 and Extended Data Table 1). The probability that someone with a normal screen was diagnosed with lung cancer within 1 year was very low (negative predictive value > 99%) and indicates that the Pilot had high sensitivity for detecting lung cancer.

Stage shift

Of the 106 lung cancers detected, 96 had been staged at the time of analysis, and 79.2% (76/96, 95% CI 69.7–86.8%) were diagnosed at an early stage (I or II) (Extended Data Table 1). This contrasts with the population proportion of 31.6% (7,652/24,187) of malignant lung cancers in people ages 55–74 years in the Ontario Cancer Registry from 2012 to 2016 (P < 0.0001). The stage shift was observed across sites. Of the lung cancers detected at baseline, 68 of 88 were early stage (I or II) (77.3%) versus eight of eight (100%) at the annual follow-up scan (P = 0.13). Early-stage detection is important for survival. Canadian lung cancer 5-year net survivals (2010–2017, national statistics excluding Quebec) were 61.5% stage I, 39.3% stage II, 16.3% stage III and 3.1% stage IV16.
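The size of the stage shift can be illustrated by comparing the two early-stage proportions directly. The sketch below uses a pooled two-proportion z-test; the text does not specify which test produced the reported P value, so this choice is an assumption made for illustration, and the names are hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided P value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Early-stage (I or II) counts from the text above.
pilot_early, pilot_staged = 76, 96
registry_early, registry_total = 7_652, 24_187

p_pilot = pilot_early / pilot_staged
p_registry = registry_early / registry_total
z, p_value = two_proportion_z(
    pilot_early, pilot_staged, registry_early, registry_total
)

print(f"Pilot early stage:    {100 * p_pilot:.1f}%")
print(f"Registry early stage: {100 * p_registry:.1f}%")
print(f"z = {z:.2f}, P < 0.0001: {p_value < 0.0001}")
```

Under this test the difference of roughly 48 percentage points is far beyond the P < 0.0001 threshold, which is consistent with the reported result whichever test was actually used.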

Treatments

More than 98% of early-stage lung cancer cases received treatment: 73.7% received surgery; 18.4% received radiation therapy; and ≤8% received systemic or combined therapy.

Possible observed harms

Diagnostic assessment for suspicious scans

Of the 267 individuals with Lung-RADS classifications of 4A, 4B and 4X who attended diagnostic assessment at Pilot sites, 145 (54.3%) had invasive diagnostic procedures, including 29 bronchoscopies; 102 biopsies; and 14 thoracotomies, mediastinoscopies and thoracenteses (Fig. 1 and Extended Data Table 1). Image-guided biopsy was the most common invasive procedure, with 72 of the 267 (27.0%) participants with Lung-RADS classification of 4 undergoing image-guided biopsies. Of these 72 participants, 39 (54.2%) received a cancer diagnosis; 16 (22.2%) did not have cancer; and 16 (22.2%) had evaluation results that were inconclusive, requiring further follow-up. Inter-site heterogeneity was observed in the proportion of cancers diagnosed among individuals who underwent image-guided biopsy: one site had a significantly lower proportion of cancers detected (9.1%) compared to other sites combined (51.4%) (P = 0.007). Additionally, 44 of 267 (16.5%) individuals underwent a surgical biopsy or lung resection, and, of these, 38 (86.4%) received a lung cancer diagnosis. Invasive biopsy procedures for benign nodules in the Pilot were rare with 21 or fewer occurrences in 4,451 individuals screened (≤0.5%). The proportion of surgical biopsies for benign disease was 12% or less (≤5/44) when a definitive diagnosis was established.

Unplanned hospital visits in 30 d

Twenty of the 145 individuals (13.8%; 95% CI 8.6–20.5%) with Lung-RADS classification of 4 who had an invasive diagnostic procedure had unplanned hospital visits within 30 d of the invasive procedure (Extended Data Table 1). None of the invasive procedures had an associated complication. Although the rates of unplanned hospital visits varied across sites (range, 10.3–22.2%), the proportions did not differ significantly (P = 0.575).

Potentially serious complications

The proportion of participants who experienced potentially serious complications within 30 d of an invasive procedure was 7.6% (n = 11/145; 95% CI 3.8–13.2%) (Extended Data Table 1). All 11 complications were pneumothoraces after needle biopsies, and all resolved without further complication. The probability of having a complication after an image-guided needle biopsy was 10.8% (n = 11/102).

Deaths

Within 30 d of an invasive diagnostic procedure, five or fewer deaths occurred (Extended Data Table 1). However, none of these deaths was related to a screening-related complication. All deaths occurred in individuals who had received bronchoscopies (n = 29).

Impact on existing hospital services

For all sites combined, non-emergency CT scans used 25,481 h, and Pilot LDCT screening used 1,373 h (5.1% of all CT hours used) (Extended Data Table 1). No differences were observed in wait times to CT scan or to diagnosis between Pilot participants and non-Pilot patients at any of the Pilot sites, and the wait times were similar to those observed for the province. Thus, adding Pilot screening volumes did not appear to impact overall CT wait times at the Pilot sites (Table 5).

Table 5 Wait times: Pilot median and 90th percentile wait times (days) for selected periods through different phases of the screening pathway

Wait times

Seven wait time measures are summarized in Table 5. Overall, the wait times measured were similar to or shorter than those observed provincially or set as provincial benchmarks17,18.

Hub-and-spoke screening model

The hub-and-spoke medical organization design is one in which service delivery is arranged in a hierarchy with an anchor institution (hub) that offers a full array of services, accompanied by secondary establishments (spokes) that offer limited services, routing patients needing more intensive services to the hub (details in the Methods)19. Interviews with stakeholders involved with the hub-and-spoke model found no major challenges with the screening process. Participant interviews reported high or very high satisfaction with the screening process (Supplementary Table 1 and Extended Data Table 2). The Pilot quality indicators at the Cornwall Community Hospital spoke site were, in general, similar to other sites and demonstrated good performance.

Lung-RADS 4A policy modification

Sixteen months after the start of the Pilot, the policy was changed from referring individuals with a Lung-RADS 4A result directly to clinical evaluation to sending them to a 3-month follow-up LDCT scan, which was the standard procedure recommended by the American College of Radiology Lung-RADS 1.0 protocol. Changes in this Pilot policy did not result in a shift to more advanced stage, which had been a concern because the changes had the potential to delay clinical assessment by 3 months (Extended Data Table 3).

Comparisons with other lung cancer screening trials and programs

Comparisons of accuracy statistics in the Pilot versus previous lung cancer screening trials and programs, which include the NLST, the NELSON trial, the UKLS trial and the Lahey Hospital community program, are presented in Table 4. The distribution of Lung-RADS classifications in the Pilot was similar to those observed in many other programs, with Lung-RADS 4 constituting the smallest group. In the Pilot, there were significantly more Lung-RADS classifications of 3 and 4 and AIFs detected in the baseline scans compared to the annual follow-up scans (Lung-RADS 3 10.1% versus 2.3%, Lung-RADS 4 7.5% versus 3.3% and AIF 20.5% versus 9.9%, all P < 0.0001). These patterns have been documented in the literature20. For example, the Lahey Hospital program reported 5.5% and 3.6% Lung-RADS 4 in baseline and first screening rounds, respectively21.

In the Pilot, adherence to follow-up visits was high for all time periods and across all sites. Lopez-Olivo et al.22 conducted a meta-analysis of patient adherence to screening for lung cancer in the United States, which included 15 studies and 16,863 individuals. The adherence rate across all follow-up periods in the meta-analysis was 55%. The overall pooled Pilot adherence combining 12-month, 6-month and 3-month follow-up scans was 90.8% (95% CI 89.3–92.2%), which was significantly greater than the meta-analysis pooled adherence (P < 0.0001).

The lung cancer detection rate in the Pilot baseline screening round using the PLCOm2012noRace risk prediction model (2.2%) was more than double that in the NLST (OR = 2.15, 95% CI 1.68–2.73, P < 0.0001) and the NELSON trial (OR = 2.49, 95% CI 1.77–3.53, P < 0.0001), both of which used categorical age–smoking eligibility criteria. In the NLST baseline LDCT arm scans, 270 lung cancers were detected in 26,309 individuals (1.0%, 95% CI 0.9–1.2%)3. In the NELSON trial baseline CT arm scans, 56 lung cancers were detected in 6,309 individuals (0.9%, 95% CI 0.8–1.2%)11. In the UKLS trial, which used a lung cancer risk prediction model, the LLPv2, to determine eligibility, 42 lung cancers were detected in 1,994 individuals who were scanned (2.1%, 95% CI 1.5–2.8%)23.
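The odds ratios quoted above follow from the baseline counts (97 of 4,451 in the Pilot, 270 of 26,309 in the NLST and 56 of 6,309 in the NELSON trial). A minimal sketch of the calculation is given below. The Wald (log odds ratio) interval is an assumption: the text does not state the interval method, and the bounds it produces differ slightly from the published ones.

```python
from math import exp, log, sqrt


def odds_ratio(events_a, n_a, events_b, n_b, z=1.96):
    """Odds ratio of group A versus group B with a Wald interval on the log scale."""
    a, b = events_a, n_a - events_a   # group A: events, non-events
    c, d = events_b, n_b - events_b   # group B: events, non-events
    estimate = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(estimate) - z * se_log)
    upper = exp(log(estimate) + z * se_log)
    return estimate, lower, upper


# Baseline-round lung cancers and individuals scanned, from the text above.
pilot = (97, 4_451)
nlst = (270, 26_309)
nelson = (56, 6_309)

or_nlst, lo_nlst, hi_nlst = odds_ratio(*pilot, *nlst)
or_nelson, lo_nelson, hi_nelson = odds_ratio(*pilot, *nelson)

print(f"Pilot vs NLST:   OR = {or_nlst:.2f} ({lo_nlst:.2f}-{hi_nlst:.2f})")
print(f"Pilot vs NELSON: OR = {or_nelson:.2f} ({lo_nelson:.2f}-{hi_nelson:.2f})")
```

The point estimates should match the reported 2.15 and 2.49. Both lower bounds remain well above 1, so the conclusion that the Pilot detection rate exceeded that of the two trials does not depend on the interval method.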

The Pilot was highly effective at detecting lung cancer at early stage. Similar yet slightly smaller stage shifts seen in the NLST and the NELSON trial were associated with significant lung cancer mortality reductions. In the Pilot, 79% of lung cancers were early stage compared to 70% in the LDCT arm of the NLST (P = 0.07) and 73% in the CT arm of the NELSON trial (P = 0.23). In the UKLS trial, 36 of 42 (85.7%) lung cancers were early stage23.

PPV depends on the definition of positive. In the Pilot, the PPV estimates ranged from 35.7% to 62.0% depending on whether only definitive diagnoses were included in the denominator. The PPV calculations were limited to Lung-RADS 4 because they were sent to clinical diagnostic evaluation, whereas Lung-RADS 3 was kept within the Pilot’s surveillance imaging scope of work. The Pilot PPV estimates are higher than those previously reported for many trials, such as the NLST (3.2%). Differences in calculation methods prevent direct comparisons with the NLST and the NELSON trial24. Like the Pilot, the Lahey Hospital program reported a suspicious predictive value for Lung-RADS 4 of 33% (ref. 21).

The proportion of interval cancers in the Pilot was very low and was significantly lower than that observed in the NLST (6.0%, in T0 + T1, P < 0.05)3 and the NELSON trial (44/344, 12.8%) (P < 0.01)11. The Lahey Hospital program had four of 85 (4.7%) interval cancers in the baseline year and four of 37 (10.8%) in the T1 screening round21. The Pilot had significantly fewer interval cancers than the Lahey baseline and T1 year interval cancers combined: 8/122 (6.6%) (P < 0.03).

Overall, invasive procedures for proven benign nodules were rare in the Pilot: only 20 cases in 4,451 individuals screened (0.4%, 95% CI 0.3–0.7%). A very small number (≤5) of deaths occurred in 145 individuals having invasive procedures, but they were not preceded by a procedure-related complication. Bach et al.25 summarized the findings of 11 studies in which a total of 287 surgeries were undertaken for benign disease in 1,186 screened individuals (24.2%; 95% CI 21.8–26.7%). The Pilot had significantly fewer surgeries for benign disease than the summary pooled results from the Bach et al. report (OR = 0.31, 95% CI 0.08–0.88, P = 0.021).

Participant satisfaction

Three participant satisfaction surveys (Supplementary Table 1) were administered along the screening pathway. Extended Data Table 2 summarizes survey findings. For all three surveys for all questions, the average proportion responding ‘somewhat agree’/‘strongly agree’ or ‘good’/‘excellent’ was greater than 90%, with the minimum value being 83.9%.

Discussion

The Pilot successfully recruited and enrolled 7,768 individuals at high risk for lung cancer between 1 June 2017 and 31 May 2019. Pilot participants were successfully screened, retained and provided with high-quality follow-up, including appropriate treatment as needed. The Pilot lung cancer detection rate and proportion of early-stage disease were high at 2.4% and 79.2%, respectively, and serious harms were low. The cancer detection rate is conservative as additional lung cancers are expected with longer follow-up. The cancer detection rate was higher than observed in other studies, and this may have been due, in part, to the high proportion of current smokers enrolled, which may have been the result of physician referral preferences and the PLCOm2012noRace model including a term for smoking status. Despite the Pilot sites being heterogeneous according to geographic area, population served and hospital setting, consistent high performance was demonstrated across sites.

A primary goal of lung cancer screening is to identify individuals at risk of developing lung cancer (and at an early stage). The superior Pilot cancer detection rate may be attributable to the eligibility criteria26, imaging sensitivity and nodule management. Significant quality indicator improvements over those observed in the NELSON trial suggest that a volumetric approach to nodule management is not necessarily needed to achieve high screening performance. The high Pilot PPVs may reflect higher sensitivity, specificity and prevalence of lung cancer in the screening cohort due to high-quality LDCT imaging and reading, an effective screening results management system and the use of the PLCOm2012noRace risk prediction model for participant selection. Overall, the proportion of individuals referred for diagnostic assessment through the Pilot was relatively small and had only a modest impact on resource utilization while yielding a high proportion of cancers detected.

Significant site variation was observed in the proportion of lung cancers detected after image-guided biopsy. This may be attributed to several factors, including radiologists’ interpretations, challenges in obtaining appropriate samples during biopsy, pathology interpretations or the result of reasonable variation in clinical practice. Site-level monitoring and follow-up for diagnostic biopsy outcomes are recommended for root cause assessment and quality improvement.

The low interval cancer rate indicates high program sensitivity to detect lung cancers—in other words, a low number of false negatives. The follow-up period for the Pilot was short, and more interval cancers may have occurred with longer follow-up as well as more screen-detected cancers with longer follow-up of those individuals who ‘remained suspicious, requiring further follow-up’.

The low number of Lung-RADS classifications of 3 and 4 and AIFs detected after baseline is reassuring to healthcare providers concerned about system costs over time and to participants whose quality of life may be negatively affected by abnormal screening findings27. Furthermore, relatively high and inconsistent initial AIF proportions became attenuated with experience and led to an expert panel developing the Recommendations for the Management of Actionable Incidental Findings (link in Extended Data Table 4).

A strong driver of the cost-effectiveness of lung cancer screening and successful outcomes is program adherence28. Adherence to surveillance follow-up scans in the Pilot (>85%) was higher than what has been reported for most US programs22. The high Pilot adherence may be attributed to several factors. Healthcare providers were educated on the benefits of lung cancer screening and thought that lung cancer screening could benefit their high-risk patients. Patients followed recommendations made by their healthcare providers. The information about the Pilot and the value of lung cancer screening provided by navigators to prospective participants was effective. Recall reminder systems helped decrease the likelihood of ‘no-shows’. Navigators helped guide participants along the lung cancer screening pathway. Participant satisfaction surveys describe strong bond formations between participants and navigators. Universal healthcare coverage pays for costs of screening-related medical follow-ups and care, so ‘out-of-pocket’ costs were not a deterrent. There are limited published data on adherence to attendance for diagnostic investigation of abnormal screening results. Lopez-Olivo et al.22, in their systematic review and meta-analysis, reported insufficient evidence to evaluate the rate of diagnostic assessment after abnormal screening results. The Pilot found that approximately 98% of individuals referred for diagnostic evaluation attended. Individuals referred for diagnostic assessment were considered high risk for cancer, and almost all of them received assessment.

The proportions of invasive procedures for benign or non-lung cancer disease, potentially serious complications and unplanned hospital visits within 30 d of an invasive procedure were low in comparison with expected benefits. Image-guided biopsy was the only invasive procedure associated with potentially serious complications in the Pilot, as other more invasive procedures, such as surgical lung biopsy, were less commonly performed. Complications are not unexpected, as image-guided biopsy pneumothorax is not uncommon. All pneumothoraces resolved without further complication. The probability of having a pneumothorax after image-guided biopsy was lower than reported in the literature29.

Wait times at the Pilot sites did not deviate from provincial patterns; the additional volumes that were attributed to the Pilot did not impact overall wait times for CT. Access to imaging has been under pressure provincially, and, although the volume of scans from the Pilot was relatively low, the impact to wait times of these additional scans was likely mitigated by having scans and funding allocated specifically for the Pilot.

Pilot facilitators included responsiveness to physician input and providing physician education on lung cancer screening, navigators assisting participants throughout the screening pathway, radiologist training, community outreach, risk assessment by phone, use of risk model, use of opt-out in-hospital smoking cessation programs, follow-up within Ontario Health databases and universal health care preventing financial barriers. Barriers to good Pilot functioning included site distance from potential participants, wait times being sometimes longer than desired due to capacity issues, initial AIF classification and management needing clarification and optimal smoking cessation programs being unknown and best practice programs not being in place at all sites.

Indigenous people represent 2.8% of the total population of Ontario30. The prevalence of commercial tobacco use is about twice as high in First Nations peoples as in other Ontarians31. Thus, having 5.3% of recruited individuals and 5.7% of scanned individuals self-report to be Indigenous appears to correspond with provincial sociodemographic tobacco use patterns in Indigenous peoples. The number of individuals self-reported as Indigenous may be underestimated due to additional considerations, such as recruitment efforts. More effective ways to engage and enroll Indigenous individuals into lung cancer screening programs remain to be identified. Although the Pilot appears to have enrolled a satisfactory number of Indigenous high-risk individuals, it is critical to partner with Indigenous communities and organizations to improve the inclusion of Indigenous peoples within studies and programs.

Pilot findings have encouraged development of lung cancer screening programs in Ontario, elsewhere in Canada and internationally. The PLCOm2012noRace risk model is now used along with the LLPv2 risk model in the large UK National Health Service (NHS) England Targeted Lung Health Check program32. The NHS registered the PLCOm2012noRace model with the Medicines and Healthcare Products Regulatory Agency for clinical management support. The Pilot success has also led to adoption of the Ontario Lung Screening Program, which started on 1 April 2021, and is based in the initial three primary Pilot sites and two spoke sites, along with an additional Toronto site. Additional provincial screening sites are planned.

Pilot findings indicate that implementation of a hub-and-spoke model is feasible. This information may be particularly useful in jurisdictions that are geographically dispersed. Employment of spoke sites can be an effective way of managing demands on CT capacity to ensure timely patient access. Furthermore, the Pilot showed that navigator-administered risk assessment based on a prediction model works well. That the Lahey Hospital program in the United States appeared to perform similarly well as the Pilot in many indicators (except interval cancers) suggests that high-performing lung cancer screening programs are possible in diverse medical systems.

The Pilot identified successful approaches to implementation of lung cancer screening in Ontario. Intra-country and inter-country heterogeneity exists in lung cancer survival, and this may be explained by multiple factors, including differences in the medical care system. However, such heterogeneity for the large part does not negate generalizability of the findings of the Pilot33,34. Many additional aspects remain to be answered to further improve the Ontario Lung Screening Program. Future studies should investigate whether mobile scanning units can help to effectively reach underserved populations and explore centralization of radiologists and risk assessments. Additional studies are required to assess lowering the risk threshold for eligibility, expanding the starting and stopping ages for screen eligibility and extending the screening interval beyond 1 year for negative scans.

A strength of the Pilot was the ability to follow up individuals outside of the screening Pilot. Because Ontario’s health system captures medical procedures in central databases, it was possible to track procedures carried out on participants who went outside of the screening sites to undergo assessments and treatments. We were, therefore, able to obtain a more complete picture of individual follow-ups than is possible in other jurisdictions35.

The Pilot also had limitations. Pilot enrollment duration was 2 years, with a 3-month extension for scan completions, and data collection periods were extended for 10 months to obtain diagnostic investigation results and extended for 8 months to obtain registry-based cancer diagnoses. These data collection periods may have been too short to collect some statistics, such as number of lung cancers and interval cancers. However, this timeframe appeared to be sufficient for the purposes of evaluating the feasibility and scalability of the Pilot to a provincial program and for enhancing program effectiveness. Furthermore, the size of the population from which the Pilot participants came was unknown, and, therefore, population rates could not be determined.

The Pilot demonstrates that a risk-model-based lung cancer screening program can be implemented in a universal healthcare system. It also demonstrates that modern lung cancer screening practices have improved compared to the seminal NLST and NELSON trials that led to general acceptance of lung screening programs.

Methods

Setting

Health services are delivered to Ontario’s population of 15.1 million residents through a publicly funded single-payer provincial healthcare system. The province is subdivided into administrative health regions (West, Central, Toronto, East, North East and North West), and service delivery is overseen by Ontario Health (see Extended Data Fig. 1 for the Pilot screening sites in Ontario39). The sites were selected to achieve diversity based on geography and hospital setting (academic versus community). Two selected sites, the Renfrew Victoria Hospital and the Cornwall Community Hospital, were ‘spoke’ sites and shared resources with The Ottawa Hospital hub site. The Cornwall Community Hospital spoke site of The Ottawa Hospital hub was used to determine the impact of an alternative service delivery model for lung cancer screening, with evening and weekend screening and a shared remote navigator.

Pilot design

The Pilot design was previously described40. Extended Data Fig. 2 presents a simplified Pilot pathway, and Extended Data Table 4 provides a link to an additional pathway map. In brief, individuals could be physician referred or self-presenting for a risk assessment for potential entry into the screening Pilot41. To reduce the referral and risk assessment burden, two triage criteria were used to screen individuals for referral to full risk assessment: ages 55–74 years and ≥20 years of smoking (not necessarily consecutive). Eligibility for screening in the Pilot was based on the PLCOm2012noRace lung cancer risk prediction model with a risk threshold of ≥2.0% over 6 years26,42. Risk assessments were done by phone by trained screening navigators (scripts are available upon reasonable request; Extended Data Table 4). Smoking cessation counseling was a key component of the Pilot and was offered to all individuals who identified that they were currently smoking, irrespective of screening eligibility. Details of the smoking cessation program and evaluation results are presented elsewhere43. Navigators helped guide participants through the screening pathway journey (‘informed participation’) by carrying out risk assessments, discussing risks and benefits, notifying participants of screening results, arranging follow-up visits whether for next screening or for diagnostic assessment and directing to treatment when necessary.
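The two-step entry logic described above (triage, then model-based eligibility) can be sketched as follows. This is a minimal illustration only: the function and constant names are hypothetical, and the 6-year risk is taken as an input rather than computed, because the PLCOm2012noRace model coefficients are not given in this section.

```python
# Thresholds taken from the text; names are illustrative, not the Pilot's own.
TRIAGE_MIN_AGE = 55
TRIAGE_MAX_AGE = 74
TRIAGE_MIN_SMOKING_YEARS = 20
RISK_THRESHOLD = 0.02  # 2.0% risk of lung cancer over 6 years


def passes_triage(age: int, smoking_years: float) -> bool:
    """Triage step: ages 55-74 years and >=20 years of smoking."""
    return (
        TRIAGE_MIN_AGE <= age <= TRIAGE_MAX_AGE
        and smoking_years >= TRIAGE_MIN_SMOKING_YEARS
    )


def is_screen_eligible(age: int, smoking_years: float, six_year_risk: float) -> bool:
    """Eligibility step: triage passed and model risk at or above the threshold.

    `six_year_risk` is assumed to be a PLCOm2012noRace estimate supplied by the
    caller as a proportion (0.02 means 2.0%).
    """
    return passes_triage(age, smoking_years) and six_year_risk >= RISK_THRESHOLD
```

Keeping triage separate from the risk threshold mirrors the text, in which triage only decides who is referred for the full risk assessment.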

Management of scan results was based on the American College of Radiology’s Lung-RADS44. In the Pilot, LDCT scans were carried out at baseline and at 1-year follow-up if the initial scan was non-suspicious (Lung-RADS classification of 1 or 2). If the baseline scan was probably benign yet positive (Lung-RADS classification of 3) or suspicious positive (Lung-RADS classification of 4A), follow-up scans were performed at 6 months and 3 months, respectively. Until 1 October 2018, individuals with a Lung-RADS classification of 4A were referred directly for diagnostic assessment. Afterwards, they were referred for a 3-month surveillance scan.

If scan findings were highly suspicious (Lung-RADS classification of 4B or 4X), individuals were referred for diagnostic assessment. Pilot sites were responsible for facilitating recall and follow-up scans, whereas, outside of the Pilot, referring physicians were responsible for these activities. Lung-RADS 1.0 has since evolved into the current version, Lung-RADS v2022 (ref. 45). A summary of changes and updates is described in ref. 46.
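The results management rules described in the two preceding paragraphs can be summarized as a small lookup. The sketch below is illustrative only: the function name and return strings are hypothetical, and treating scans dated on or before 1 October 2018 as falling under the earlier 4A policy is an assumption, because the text does not say whether that date is inclusive.

```python
from datetime import date

# Date on which management of Lung-RADS 4A changed (from the text).
POLICY_CHANGE_4A = date(2018, 10, 1)


def next_step(lung_rads: str, scan_date: date) -> str:
    """Return the next pathway step for a scan result, per the Pilot rules above."""
    category = lung_rads.strip().upper()
    if category in {"1", "2"}:
        return "annual scan at 12 months"
    if category == "3":
        return "surveillance scan at 6 months"
    if category == "4A":
        # Assumption: the cut-off date itself falls under the earlier policy.
        if scan_date <= POLICY_CHANGE_4A:
            return "diagnostic assessment"
        return "surveillance scan at 3 months"
    if category in {"4B", "4X"}:
        return "diagnostic assessment"
    raise ValueError(f"Unknown Lung-RADS classification: {lung_rads!r}")
```

For example, under these assumptions `next_step("4A", date(2019, 1, 15))` returns the 3-month surveillance step, whereas the same classification on a scan from early 2018 returns a diagnostic assessment referral.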

Globally, many different lung cancer screening results management systems exist, but they have common features in that they guide the participant to the next regularly planned screening, to the early surveillance scan to obtain more information or directly to active clinical investigation when scans are highly suspicious for lung cancer. Extended Data Table 5 provides a simplified summary comparison of different lung cancer screening result management systems for primary findings on baseline scan.

To help ensure radiological reading quality, an expert working group for the Pilot produced the Radiology Quality Assurance Program Manual–Ontario Lung Screening Program47. This included a template for reporting screening results (for link, see Extended Data Table 4). Radiology readers underwent mandatory training, met minimum thoracic image reading requirements and participated in image reading peer reviews.

Incidental findings other than those suspicious for lung cancer were classified as ‘actionable’ by the reading radiologist if they were deemed to be of potential clinical importance. Pilot sites submitted the radiologist description of AIFs to Ontario Health (Cancer Care Ontario), where they were transcribed from the radiology report, as part of their regular data submission process. The AIF descriptions from the Pilot were reviewed and categorized by a Radiology Quality Assurance Working Group, including the Radiology Quality Assurance Lead. The most common categories were then discussed with appropriate specialists (for example, cardiologists and nephrologists) to determine which findings required clinical confirmation or workup and would be considered actionable. Where relevant, thresholds differentiating between actionable and non-actionable findings were provided. The recommendations took into consideration the actual findings as well as the clinical context. The link to the recommendations for the management of actionable incidental findings developed by the expert panel is available in Extended Data Table 4.

Pilot evaluation

A mixed-methods outcome and process evaluation was undertaken for the Pilot. Extended Data Fig. 3 describes the initial Pilot evaluation framework. Interim evaluation (year 1) findings were previously described26,48. The final evaluation assessed the first 2 years of recruitment (1 June 2017 to 31 May 2019, 24-month duration). The analytic period allowed for participants to complete their baseline and follow-up LDCT scans until 31 August 2019 (27-month duration). Diagnostic assessments of outcomes were included to 31 March 2020 (34-month duration). Registry-based cancer diagnoses were ascertained to 31 January 2020 (32-month duration). The outcome evaluation measured indicators across six key areas of the screening pathway: (1) recruitment, risk assessment, enrollment, scanning and retention of a high-risk population for lung cancer screening; (2) smoking cessation or reduction among Pilot participants; (3) appropriate follow-up of participants with abnormal screening results; (4) detection of lung cancers at an early stage; (5) minimization of harms due to diagnostic assessment of screen-detected abnormalities; and (6) participant satisfaction throughout the screening journey. The process evaluation included measures across three additional areas: (1) wait times, (2) impact on existing hospital services and (3) stakeholder feedback.

Ethics statement

Ontario Health is designated as a ‘prescribed entity’ for the purposes of section 45 (1) of the Personal Health Information Protection Act of 2004. As a prescribed entity, Ontario Health is authorized to collect personal health information from health information custodians without the consent of the patient and to use such personal health information for the purpose of analysis or compiling statistical information with respect to the management, evaluation or monitoring of the allocation of resources to or planning for all or part of the health system, including the delivery of services. Because this study is in compliance with national and provincial privacy regulations, ethics review was not required. This was confirmed by the University of Toronto Health Sciences Research Ethics Board.

The Pilot was not a grant-funded study with design described by a single study protocol. Instead, conduct of the Pilot was determined by pathway maps and policy statements. Extended Data Table 6 summarizes 11 policies that determined Pilot operations.

Ethics and inclusion for global research

Ontario Health has an Indigenous Cancer Care Unit that aims to improve cancer outcomes for Ontario’s First Nations, Inuit and Métis (FNIM) peoples by reducing inequities in care and access to cancer services (https://www.cancercareontario.ca/en/cancer-care-ontario/programs/indigenous-cancer-care-unit). The Indigenous Cancer Care Unit provided expertise to guide the design of recruitment, enrollment, data collection, screening and follow-up processes as well as analyses, reporting and publication of data for FNIM peoples. The Cornwall Community Hospital spoke site was chosen to develop a recruitment strategy that would provide equitable access to lung cancer screening for individuals from a neighboring First Nations (Akwesasne Mohawk) community.

Sex was identified by self-report. No weighted sampling of participants occurred by sex. Selected results are presented stratified by sex.

Data sources and measures

Data used to calculate all performance measures were extracted from nine provincial or national administrative databases, hospital records submitted by Pilot sites to Ontario Health and anonymous participant experience surveys. Data sources are described in Extended Data Table 7.

‘Screening eligibility’ was defined as the percentage of individuals who underwent risk assessment and had a PLCOm2012noRace risk score of ≥2.0% over 6 years. ‘LDCT scan completion rate’ was defined as the percentage of individuals who were eligible for screening and completed an LDCT scan.

The ‘Lung-RADS classification distribution’ was defined as the number and percentage of completed baseline and annual follow-up scans according to each Lung-RADS classification (1, 2, 3, 4A, 4B and 4X). As performance differs according to whether a scan is a baseline or follow-up scan, stratified distributions are reported.

The ‘abnormal scan rate’ was defined as the percentage of individuals who had an LDCT scan with a Lung-RADS classification of 4A, 4B or 4X.

An AIF was defined as an LDCT scan that had a finding that was not related to lung cancer but was judged by the reading radiologist to be potentially clinically important. During the Pilot, there was no standard definition or clinical guideline for what constituted an AIF for this screening population, and, thus, the identification and recommendations were at the radiologist’s discretion. As the AIF rate was expected to differ according to scan type, the AIF rates were reported according to whether scans were baseline, surveillance or annual follow-up scans.

‘Screening adherence’ was defined as the percentage of participants who had a baseline scan with a classification of 1 or 2 who completed an annual follow-up scan between 11 months and 15 months after their baseline scan. ‘Adherence to 6-month surveillance’ was defined as the percentage of individuals who had a Lung-RADS classification of 3 who completed a follow-up scan within 5–7 months. ‘Adherence to 3-month surveillance’ was defined as the percentage of individuals who had a Lung-RADS classification of 4A who completed a follow-up scan within 2–4 months.
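The adherence windows above can be checked per participant as in the following sketch. It assumes that windows are counted in calendar months from the index scan date with both end points included; the text does not state how months were measured, so this convention and all names are hypothetical.

```python
import calendar
from datetime import date

# (earliest, latest) follow-up window in months, keyed by Lung-RADS classification.
ADHERENCE_WINDOWS = {"1": (11, 15), "2": (11, 15), "3": (5, 7), "4A": (2, 4)}


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_adherent(lung_rads: str, index_scan: date, follow_up: date) -> bool:
    """True if the follow-up scan falls inside the window for this classification."""
    earliest_months, latest_months = ADHERENCE_WINDOWS[lung_rads]
    window_start = add_months(index_scan, earliest_months)
    window_end = add_months(index_scan, latest_months)
    return window_start <= follow_up <= window_end
```

The adherence percentage is then the number of adherent participants divided by the number with the relevant index classification.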

‘Adherence to diagnostic assessment’ was defined as the percentage of individuals who were referred for diagnostic assessment after an LDCT scan with a Lung-RADS classification of 4A, 4B or 4X who attended diagnostic assessment.

‘Cancer detection rate’ was defined as the percentage of screened participants who had a screen-detected lung cancer, calculated as the number of people diagnosed with lung cancer who had an abnormal screening result (Lung-RADS classification of 3, 4A, 4B or 4X) divided by the number of participants screened in the Pilot.

‘Sensitivity’ was defined as the percentage of lung cancers observed in the Pilot cohort that had an abnormal LDCT scan result (Lung-RADS classification of 3, 4A, 4B or 4X) that preceded the diagnosis (that is, a result that would initiate further investigation, whether or not that investigation led to a diagnosis of lung cancer). ‘Specificity’ was defined as the percentage of individuals in the Pilot cohort who were not diagnosed with lung cancer whose previous LDCT scan had a normal result (Lung-RADS classification of 1 or 2). Youden’s J index was calculated as sensitivity + specificity − 1. PPV was defined as the percentage of individuals with abnormal LDCT results (Lung-RADS classification of 4A, 4B or 4X) who underwent diagnostic assessment and had a definitive diagnosis of lung cancer. False-positive rate was defined as the percentage of individuals who had an abnormal screening result (Lung-RADS classification of 4A, 4B or 4X) and who were not diagnosed with lung cancer.
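As an illustration of the sensitivity, specificity and Youden’s J definitions above, the following sketch computes the three measures from per-participant records. The record layout (a Lung-RADS classification paired with a lung cancer flag) and the function name are hypothetical; PPV and false-positive rate are left out because they use a different abnormal threshold (4A, 4B or 4X) and depend on diagnostic assessment data.

```python
# Abnormal result for sensitivity and specificity, as defined in the text.
ABNORMAL = {"3", "4A", "4B", "4X"}


def accuracy_measures(records):
    """Compute sensitivity, specificity and Youden's J.

    `records` is an iterable of (lung_rads, has_lung_cancer) pairs, one per
    participant, where lung_rads is the classification of the scan that
    preceded the diagnosis (or the previous scan for those without cancer).
    """
    true_pos = false_neg = true_neg = false_pos = 0
    for lung_rads, has_lung_cancer in records:
        abnormal = lung_rads in ABNORMAL
        if has_lung_cancer:
            true_pos += abnormal
            false_neg += not abnormal
        else:
            false_pos += abnormal
            true_neg += not abnormal
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("Need at least one participant with and one without cancer")
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden_j": sensitivity + specificity - 1,
    }
```

Youden’s J ranges from −1 to 1, with 0 corresponding to a test that performs no better than chance.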

The ‘interval cancer rate’ was defined as the percentage of lung cancer diagnoses that occurred within 12 months of an LDCT scan with a normal result (Lung-RADS classification of 1 or 2) (that is, occurred before the next planned annual scan).

‘Stage distribution’ was defined as the number and percentage of screen-detected lung cancers that were diagnosed at TNM stage I, II, III and IV.

The ‘benign’ (non-lung cancer) invasive procedure rate was defined as the percentage of individuals who underwent an image-guided biopsy, surgical lung biopsy or resection after an abnormal screening result (Lung-RADS classification of 4A, 4B or 4X) who were not diagnosed with lung cancer.

The ‘rate of unplanned hospital visits’ was defined as the percentage of individuals who had an unplanned hospital visit within 30 d of an invasive diagnostic procedure (image-guided biopsy, surgical lung biopsy or resection) after an abnormal screening result (Lung-RADS classification of 4A, 4B or 4X).

The ‘potential serious complication rate’ was defined as the percentage of individuals who had a potentially serious complication within 30 d of an invasive diagnostic procedure after an abnormal screening result (Lung-RADS classification of 4A, 4B or 4X).

The ‘death rate’ was defined as the percentage of participants who died within 30 d of an invasive diagnostic procedure.

Impact on existing hospital services

The impact of the Pilot on existing hospital services was evaluated by examining (1) the proportion of total site non-emergency CT scan use in hours that was consumed by Pilot LDCT scans; (2) the wait time in days from CT referral to the date of the CT scan; and (3) the wait time from the date of CT referral to the date of diagnosis. Wait time comparisons were made with provincial standards where available.

The median and 90th percentile wait times (in days) were measured for seven specific time intervals: wait time from Pilot entry (first contact) to risk assessment completed; from Pilot entry to baseline LDCT scan completion; from risk assessment to baseline LDCT scan completed; from LDCT scan order date to LDCT scan completion; from LDCT scan completion to communication of results to screening participant; from lung diagnostic assessment date of referral to first thoracic consultation; and from referral to diagnostic assessment to definitive diagnosis.
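A wait-time summary of the kind described above can be sketched as follows. The percentile interpolation rule used in the Pilot analysis is not stated, so the ‘inclusive’ linear interpolation of Python’s `statistics.quantiles` is an assumption here, and the function names are hypothetical.

```python
import statistics
from datetime import date


def wait_days(start: date, end: date) -> int:
    """Wait time in whole days between two pathway events."""
    return (end - start).days


def summarize_waits(waits):
    """Return the median and 90th percentile of a list of wait times (days).

    Requires at least two observations. The 90th percentile is the ninth
    decile cut point under linear interpolation (an assumed convention).
    """
    deciles = statistics.quantiles(waits, n=10, method="inclusive")
    return {"median": statistics.median(waits), "p90": deciles[8]}
```

Each of the seven intervals would be summarized separately, for example by passing the list of `wait_days(order_date, scan_date)` values for all completed scans.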

Participant satisfaction throughout the Pilot journey was assessed using structured Likert scale surveys administered after risk assessment and Pilot explanation (five questions), baseline scans (11 questions) and annual follow-up scans (nine questions). Supplementary Table 1 provides the survey questions.

Statistical analysis

Descriptive analyses reporting frequencies and percentages were performed using univariate statistical methods. To protect participant privacy, Pilot counts of five or fewer are reported as ≤5. The binomial exact method was used to estimate CIs for proportions. Chi-square or Fisher’s exact tests were used to determine if there were statistically significant differences in proportions reported across screening sites or strata of study groups or with findings of different comparison groups. Non-parametric Wilcoxon rank-sum tests were conducted to assess differences in comparison groups in which data had skewed distributions. The Stata ‘bitesti’ command was used to test if proportions observed in the Pilot were different from pooled summary results in published reports. Youden’s J index (sensitivity + specificity − 1) was used to compare sensitivity combined with specificity between groups. Wait times were reported by the median and 90th percentile in hours or days. Stata 17.1 MP, SAS version 9.4 and Microsoft Excel (2016) software were used for statistical analyses. A two-sided P < 0.05 was used to determine statistical significance.
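For readers without access to Stata or SAS, the exact binomial CI and the one-sample exact binomial test can be sketched in plain Python as below. This is not the analysis code used for the Pilot: the CI is the Clopper–Pearson interval solved by bisection, and the two-sided P value sums the probabilities of all outcomes no more likely than the observed one, a common convention that is assumed rather than confirmed to match the software output.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k); increases monotonically with p."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def _solve_increasing(func, target: float) -> float:
    """Find p in [0, 1] with func(p) == target, for func increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k: int, n: int, alpha: float = 0.05):
    """Clopper-Pearson (binomial exact) confidence interval for k/n."""
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: upper_tail(k, n, p), alpha / 2
    )
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: upper_tail(k + 1, n, p), 1 - alpha / 2
    )
    return lower, upper


def exact_binomial_p(k: int, n: int, p0: float) -> float:
    """Two-sided exact P value for observing k of n against proportion p0."""
    observed = binom_pmf(k, n, p0)
    cutoff = observed * (1 + 1e-7)  # tolerance for floating-point ties
    total = sum(
        prob for prob in (binom_pmf(i, n, p0) for i in range(n + 1)) if prob <= cutoff
    )
    return min(1.0, total)
```

Here `p0` plays the role of the pooled published proportion against which a Pilot proportion `k / n` is compared.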

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.