Introduction

Pulmonary function measures obtained by spirometry are indicators of respiratory health and predict morbidity and mortality1,2. However, these parameters, which include the forced expiratory volume in 1 second (FEV1), forced vital capacity (FVC), and peak expiratory flow (PEF), vary significantly among populations of different ancestral backgrounds3 and show strong evidence of genetic and environmental influences1,4.

During the last decade, large-scale genome-wide association studies (GWASs) of various pulmonary parameters have identified hundreds of variants associated with pulmonary function and related traits5,6,7,8,9,10. These and other studies indicate that genomic loci associated with pulmonary function overlap with those associated with chronic obstructive pulmonary disease, asthma, pulmonary fibrosis, lung cancer, and other pulmonary phenotypes2,8,9,10. For example, a recent GWAS10 based on the UK Biobank cohort (N = 50,008), including heavy smokers and never smokers, identified six loci associated with low FEV1. Another study8 of individuals (N = 48,943) sampled from the extremes of the pulmonary function distribution in the UK Biobank identified 95 variants strongly associated with chronic obstructive pulmonary disease susceptibility. Importantly, these previous studies analysed selected population groups of primarily European ancestry.

The UK Biobank cohort contains data on 389,449 individuals, providing an opportunity to use GWAS approaches to identify variants associated with pulmonary function among individuals of European and recent African descent by allowing large-scale comparisons of lung function parameters11. Furthermore, by integrating the genetic association of FEV1, PEF, and FVC, a list of shared loci that collectively modify pulmonary function could be identified. We hypothesise that different genetic variants are associated with pulmonary function in Africans. Thus, their identification will provide additional information relevant to understanding pulmonary function in physiology and disease in distinct populations. However, to our knowledge, no GWAS has compared the SNPs associated with the full range of FEV1, FVC, and PEF parameters across the entire UK Biobank cohort and separately among Africans and Europeans.

Here, we compare variations in pulmonary function parameters among individuals of African and European ancestry represented in the UK Biobank. First, we used the genome-wide association summary statistics for three UK Biobank-defined continuous pulmonary function parameters: FEV1, FVC, and PEF. Then, we conducted further analyses to identify genes, regions, and gene sets associated with each pulmonary phenotype. Furthermore, we evaluated the candidate phenotype variants in relation to published GWAS results. Overall, this approach allows us to report credible loci associated with pulmonary function among Africans and Europeans, which were enriched across many plausible genes and gene sets involved in pulmonary function or related phenotypes.

Results

UK Biobank pulmonary function demographics

There were 389,449 participants, comprising Europeans (N = 383,471) and Africans (N = 5978). The average participant age at recruitment was 56.8 years (standard deviation = 8.0 years) for Europeans and 51 years (standard deviation = 7.9 years) for Africans. This difference was statistically significant (Welch test: t = −41.3, p = 9.07 × 10−300) (see Supplementary Fig. 1).

Lung function parameters vary between individuals of European and African ancestry

We compared the mean FVC, FEV1, and PEF between Europeans (N = 383,471) and Africans (N = 5978) represented in the UK Biobank datasets. The mean FVC was significantly higher in Europeans (mean = 3.73 L) than in Africans (mean = 2.95 L; Welch test: t = 48.35, p < 1 × 10−320; Fig. 1a). The FEV1 and the PEF were also significantly higher in Europeans (mean FEV1 = 2.82 L, mean PEF = 389.6 L/min) than in Africans (mean FEV1 = 2.28 L, mean PEF = 332.7 L/min; FEV1: t = 42.60, p = 1.0 × 10−291, Fig. 1b; PEF: t = 24.06, p = 1.7 × 10−107, Fig. 1c). According to a recent systematic review, “Whites” have higher pulmonary function parameters than other ethnic groups (including Africans)12. About 50% of the articles in that review cited inherent factors and anthropometric differences to explain the observed differences. However, similar to other studies13,14,15, our findings show that these variations in pulmonary function measures exist across various ages, heights, and BMI percentiles (Fig. 1 and Supplementary Fig. 2a–i). Moreover, using a generalised linear model, we found that the higher FVC, FEV1, and PEF in Europeans compared to Africans are not due to the age difference between the two groups, even though FVC (t = −19.26, p = 1.0 × 10−82), FEV1 (t = −16.68, p = 1.88 × 10−62), and PEF (t = −11.91, p = 1.01 × 10−32) tend to decrease with age (see Supplementary Table 1 and Supplementary Note 1). Recently, a lack of knowledge among healthcare workers concerning variations in pulmonary function measures among ethnic groups has been suggested15 to impact the assessment of minority patients’ recovery from COVID-19. However, no studies have identified major genetic variants that vary by ethnic group and can explain the disparities in lung function15,16.

Fig. 1: Comparison of the pulmonary function parameter among Africans and Europeans.

The boxplots indicate the distribution of (a) FEV1, (b) FVC, and (c) PEF in Europeans (n = 383,471) and Africans (n = 5978). The p-values shown for each comparison were calculated from Welch’s t-test. On each box, the central mark indicates the median, and the left and right edges of the box indicate the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points not considered outliers, and the outliers are plotted individually using the ‘+’ symbol. To make the visualisation clearer, the filled circles showing the distribution include only 1000 randomly sampled points from each group. The error bars show the variation in (d) FVC and (e) FEV1 across BMI percentiles and height, respectively, among Africans and Europeans. The middle point indicates the mean FVC or FEV1/FVC, and the error bars indicate the standard error of the mean at the BMI percentile.

Previous studies show that the FVC, FEV1, and PEF vary with the age, body mass index (BMI), and height of individuals17,18,19,20. Here, we also found that FVC, FEV1, and PEF tend to decrease with age, show an increase around the 50th BMI percentile, and increase with the height of the individuals (Fig. 1d, e, and Supplementary Fig. 2a, i). However, unlike age and height, the relationship between pulmonary function parameters and BMI appears to be associated with overweight/obesity, with a threshold effect rather than a simple linear relationship (see Supplementary Note 1). Furthermore, we observed that the FEV1/FVC ratio is, conversely, higher in Africans than in Europeans across the BMI percentiles (Supplementary Fig. 2f).

Genetic variants associated with FVC, FEV1 and PEF among Europeans and Africans

Since the FVC, FEV1, and PEF values were significantly higher in Europeans than in Africans, we presumed that a genome-wide association analysis would identify the genetic variants associated with each of these pulmonary function parameters in each group. Therefore, we collected the GWAS summary statistics for each pulmonary function parameter within each ethnic group (see the “Methods” section). In these data, we discovered 1 variant in Africans and 67,855 variants in Europeans that were associated (GWA p-values < 5 × 10−8) with FEV1, 6 in Africans and 79,132 in Europeans that were associated with FVC, and zero (0) in Africans and 26,432 in Europeans that were associated with PEF (Supplementary Fig. 3a–c). The total number of significant variants discovered for each pulmonary function parameter, including those in substantial linkage disequilibrium (R2 > 0.4), and the intersection of these variants are displayed in Supplementary Fig. 3a–f.
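The counting step described above can be illustrated with a minimal sketch. It assumes the summary statistics have already been reduced to a list of p-values per trait and ancestry group; the dictionary layout, the group labels, and the toy p-values are hypothetical and are not taken from the UK Biobank data.

```python
# Illustrative sketch only: counts variants passing a strict p-value cut-off.
# The layout of `summary_stats` and all values in it are hypothetical.

GENOME_WIDE = 5e-8   # genome-wide significance threshold used in the text
SUGGESTIVE = 1e-6    # relaxed, suggestive threshold used later in the text


def count_significant(pvalues, threshold=GENOME_WIDE):
    """Return how many p-values are strictly below the threshold."""
    return sum(1 for p in pvalues if p < threshold)


# Toy p-values keyed by (trait, ancestry group).
summary_stats = {
    ("FEV1", "European"): [1e-12, 3e-9, 4e-7, 0.2],
    ("FEV1", "African"): [2e-8, 6e-7, 0.4],
}

for (trait, group), pvalues in summary_stats.items():
    strict = count_significant(pvalues)
    relaxed = count_significant(pvalues, SUGGESTIVE)
    print(f"{trait} / {group}: {strict} genome-wide, {relaxed} suggestive")
```

The comparison is strict (`p < threshold`), matching the inequality used in the text, so a variant with a p-value exactly equal to the threshold is not counted.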

We applied fine mapping21 to identify 310 (credible set) causal variants significantly associated (p-values < 5 × 10−8 and causal probability >0.1; see “Methods” section) with FVC in Europeans and 2 significant associations in Africans (Fig. 2a, b). For FEV1, we found 308 significant causal variant associations in Europeans and 1 in Africans (Fig. 2c, d). Furthermore, for PEF, we identified 374 significant causal variant associations in Europeans and none (0) in Africans (Fig. 2e, f). Overall, we identified 820 unique credible SNPs associated with the three pulmonary function parameters. Surprisingly, the significant SNPs associated with FVC, FEV1, and PEF were unique to each ancestral group (Fig. 2g–i and Supplementary Data 1).

Fig. 2: Manhattan plots and Venn diagrams of the SNP associations.

The Manhattan plots show the SNPs associated with (a) FEV1 in Africans and (b) FEV1 in Europeans, (c) FVC in Africans and (d) FVC in Europeans, and (e) PEF in Africans and (f) PEF in Europeans for each chromosome. The Venn diagrams show the overlap among the significant causal SNPs associated with (g) FVC, (h) PEF, and (i) FEV1 in Africans and Europeans. The distribution of genetic variants associated with the three pulmonary function parameters is shown for (j) Africans and (k) Europeans. Refer to Supplementary Data 1 for details concerning individual SNPs and their frequencies among Africans and Europeans.

Next, we evaluated the independent SNPs associated with the three pulmonary function parameters while considering the population’s linkage disequilibrium structure (see the “Methods” Section). Here, we identified 630 independent SNPs from the 820 credible sets of causal SNPs associated with all three pulmonary function parameters. Finally, we compared the 627 independent SNPs in Europeans with the 3 SNPs in Africans significantly associated with the three pulmonary function parameters and found no common variants between the two sets. Conversely, we found that 164 SNPs were associated with FVC and FEV1 in Europeans (Fig. 2k). However, there was no overlap in the associated SNPs among Africans (Fig. 2j). Finally, it should be noted that smoking impacts pulmonary function, but the effect of smoking was not accounted for in the GWA analyses. Therefore, this is probably a limitation of our findings.

Since the SNPs significantly associated with pulmonary function were unique for Europeans and Africans, we next relaxed the GWAS significance threshold to a suggestive cut-off p-value22 of 1 × 10−6. Then, we compared the significant SNPs in Europeans and Africans for FVC, FEV1, and PEF. For all three pulmonary function metrics, even with this less strict significance criterion, we were unable to discover any shared SNPs between Africans and Europeans (Supplementary Fig. 3d–f). Furthermore, we found that the most statistically significant SNPs in Africans had relatively larger beta estimates in Africans than in Europeans for FVC, FEV1, and PEF (see Supplementary Fig. 4 and Supplementary Notes 2). In addition, we have provided an interactive online visualisation that allows the user to evaluate the significance of SNPs in each group using an arbitrary significance threshold and compare the SNPs on different chromosomes, linkage disequilibrium loci, and genes, for FVC (Supplementary Figs. 5, 6), FEV1, and PEF (see the Supplementary Notes: Comparison of variants associated with pulmonary function).

We compared the minor allele frequency of SNPs in the UK Biobank between Europeans and Africans for the combined 820 SNPs (817 in Europeans plus 3 in Africans) associated with pulmonary function. We found that 788 out of 820 SNPs differed significantly in frequency between Africans and Europeans (Supplementary Data 2). The top three variants with the most significantly higher frequencies in Europeans than in Africans were rs2042395 (frequency in Europeans = 0.77, in Africans = 0.19, Fisher test p-value = 4.94 × 10−323), rs3748400 (Europeans = 0.78, Africans = 0.19, p = 6.92 × 10−323), and rs8045843 (Europeans = 0.78, Africans = 0.17, p = 8.89 × 10−323); see Supplementary Data 2 and Supplementary Fig. 2g. Interestingly, the variants rs2042395 and rs8045843 have been previously associated with the “well-being spectrum”23 and “sensitivity to environmental stress and adversity”24, respectively, in individuals of European ancestry. Conversely, the top variants with higher frequency in Africans than in Europeans were rs143384 (Europeans = 0.40 and Africans = 0.92, p = 2.0 × 10−323), rs3133084 (Europeans = 0.23 and Africans = 0.65, p = 8.4 × 10−323), and rs7853063 (Europeans = 0.20 and Africans = 0.60, p = 6.4 × 10−323); see Supplementary Fig. 2g. Among these, the variant rs143384 has been reported to be associated with FVC, lung function, and PEF25, and with anthropometric traits in Europeans26.
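To make the allele-frequency comparison concrete, the following is a minimal sketch of a two-sided Fisher exact test on a 2 × 2 table of allele counts (rows are ancestry groups, columns are alternate and reference allele counts). It is not the implementation used for the analysis; the function name and the toy counts are hypothetical, and the exhaustive enumeration is practical only for small tables. Note also that p-values of the order of 10−323 lie close to the smallest positive double-precision number (about 4.9 × 10−324), so values of that magnitude are probably best read as effectively zero rather than as exact quantities.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(
        prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


# Toy allele counts (hypothetical): group 1 carries 3 alternate and
# 1 reference allele, group 2 carries 1 alternate and 3 reference alleles.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```

For biobank-scale counts, a statistics library with a numerically stable implementation would be the more sensible choice.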

Altogether, these analyses revealed that different SNPs may be associated with FVC, FEV1, and PEF among Europeans and Africans and that the frequency of these SNPs significantly varies between these populations.

Pathway and GWAS catalog enrichments of the SNPs

We assessed the enrichment of GWAS Catalog27 annotation terms for the genes containing SNPs associated with lung function (suggestive cut-off p-value of 1 × 10−6) in each study population (see Supplementary Data 2).

The GWAS Catalog term analyses revealed that in Europeans, the genes were significantly enriched for GWAS terms associated with “Height” (hypergeometric test; p = 1.06 × 10−93), “Lung function (FEV1)” (p = 5.4 × 10−25), “Pulmonary function interaction” (p = 2.33 × 10−19) among others (Fig. 3a and Supplementary Data 3). In Africans, we found that the genes were significantly enriched for GWAS terms associated with “Subcutaneous adipose tissue” (p = 1.2 × 10−07), “Birth weight” (p = 3.7 × 10−04), “Cognitive decline rate in late mild cognitive impairment” (p = 7.3 × 10−04), among others (Fig. 3b and Supplementary Data 3). Overall, these results show that the SNPs identified among Europeans are in genes known to play roles in many phenotypes, most notably those related to pulmonary function or GWAS phenotypes related to pulmonary function. Conversely, the SNPs we identified associated with pulmonary function among Africans fall within genes that are not enriched for pulmonary function-related terms.
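The hypergeometric test behind such enrichment p-values can be sketched as follows. This is a generic illustration under stated assumptions, not the enrichment tool used here: the function name, the argument names, and the toy numbers are hypothetical, and the p-value is the upper-tail probability of observing at least the given overlap between a query gene list and a term's gene list.

```python
from math import comb


def hypergeom_enrichment_p(universe, term_genes, query_genes, overlap):
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    universe    -- number of genes in the background set
    term_genes  -- number of background genes annotated with the term
    query_genes -- number of genes in the query list
    overlap     -- number of query genes annotated with the term
    """
    upper = min(term_genes, query_genes)
    total = comb(universe, query_genes)
    tail = sum(
        comb(term_genes, k) * comb(universe - term_genes, query_genes - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example (hypothetical numbers): 10 background genes, 4 annotated with
# the term, 3 genes in the query list, 2 of which carry the annotation.
print(hypergeom_enrichment_p(10, 4, 3, 2))
```

In practice the resulting p-values are adjusted for the number of terms tested, as the adjusted p-values plotted in Fig. 3 indicate.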

Fig. 3: GWAS catalog enrichment analysis plots.

Volcano plots of the GWAS Catalog enrichment analysis for genes where significant SNPs are located for (a) Europeans and (b) Africans. All four plots show the adjusted p-value on the y-axis and the odds ratio of the enrichment score on the x-axis. Each circle represents a GWAS Catalog term or Elsevier pathway. The circles are coloured based on the levels of statistical significance, with the redder colours showing a greater degree of significance. Each circle is sized based on the combined enrichment score of the term represented by the circle.

Variant spanning loci associated with pulmonary function among Europeans and Africans

Many of the associated SNPs may simply reflect the linkage disequilibrium structure of the populations28,29 (see Supplementary Data 4). For example, we found 10 variants associated with FEV1 and FVC in Europeans within the locus 12q14.3, and upon fine mapping21, we found that the most likely causal SNP within the locus was rs1351394 (Probabilistic Identification of Causal SNPs21, causal probability value = 0.7243), a 3-prime untranslated region variant located in the gene HMGA2 (Fig. 4a). The variant rs1351394 has previously been associated with traits that affect FEV1, including height30,31 and birth length32. Furthermore, HMGA2 is involved in lung development33.

Fig. 4: Regional association plots for genome-wide significant pulmonary function.

These include the loci for the lead SNPs (a) rs1351394 at locus 12q14.3, (b) rs16909898 at locus 9q22.32, (c) rs147110934 at locus 19q13.42, and (d) rs369476290 on chromosome 22. The genes within the chromosomal loci are shown in the lower panel. The blue line indicates the recombination rate. The filled circles show the position of the SNPs along the region on the x-axis and the negative logarithm of the association p-value on the y-axis. The lead SNP is shown in purple, and the SNPs within the locus are coloured based on the linkage disequilibrium correlation value (r2) with the lead SNP based on the European HapMap haplotype (in panels a, b, and c) and African HapMap haplotype (panel d) from the 1000 Genomes Project.

At locus 19q13.42, we found that the most likely causal SNP is rs147110934 (causal probability = 0.83), associated with FEV1 and FVC in Europeans (Fig. 4b, also see Supplementary Data 4). rs147110934 is a predicted missense variant that falls within the ZNF628 gene. In addition, whilst rs147110934 has not been previously associated with pulmonary function, we found it is associated with height34 and body weight35,36, both of which are associated with FVC and FEV1.

Furthermore, we found several SNPs at the locus 9q22.32 associated with pulmonary function (Fig. 4c). Here, the lead and predicted causal (causal probability = 1) variant is rs16909898, located in the PTCH1 gene, which was previously identified as modifying pulmonary function parameters37,38 and height31.

In addition, for individuals of African ancestry, at the locus 5q32, the lead SNP among the four associated with pulmonary function was rs369476290 (causal probability = 0.67), an intergenic variant located near the gene ISX. rs369476290 has not been previously linked to pulmonary function or disease (Fig. 4d).

Since the SNPs significantly associated with pulmonary function were unique for Europeans and Africans, we next set out to compare the estimated beta values for all SNPs with a GWA significance of <0.05. Here, we found that the most statistically significant SNPs in Africans had relatively larger beta estimates in Africans than in Europeans for FVC, FEV1, and PEF (Supplementary Fig. 4). Overall, this finding showed that the SNPs significantly associated with pulmonary function in Africans demonstrated larger effect sizes than in Europeans. Conversely, we found thousands of variants associated with pulmonary function in Europeans that tended toward statistical significance in Africans (see Supplementary Notes: Comparison of variants associated with pulmonary function).

Furthermore, we aimed to replicate in Africans, at a p-value of less than 0.05, the causal variants associated (p < 5 × 10−8) with pulmonary function in Europeans. Interestingly, we found 56 independent variants that could be associated with pulmonary function in both Europeans and Africans (see Supplementary Note 3). These include, among others, the locus near the gene MECOM, where the causal SNP rs11709963 was associated with FEV1 (p-value = 5.3 × 10−19) in Europeans. There was some evidence for an association within the region in Africans: rs1362771 (r2 = 0.51 with the causal SNP rs11709963 in Europeans) was associated with FVC (replication p = 0.02); see Supplementary Data 4 and Supplementary Fig. 7. Furthermore, a SATB2 variant, rs77064030 (p-value in Europeans = 6.7 × 10−11), is in linkage disequilibrium (r2 = 0.8) with rs78696503, which is associated with FEV1 in Africans (replication p-value in Africans = 0.007); see Supplementary Fig. 8. Among the variants associated with PEF is the FAM132A variant rs79361800 (p-values: Europeans = 9.20 × 10−10 and Africans = 1.02 × 10−5); see Supplementary Fig. 9.

Therefore, we suggest that our findings may be due to both the difference in the sample size (which is associated with the statistical power to identify the causal variants) and the existence of different variants associated with pulmonary function among European and African individuals.

Comparison to variants previously associated with pulmonary function

Next, we aimed to identify the previously described and novel SNPs among the significant SNPs that were also predicted to be causal within a particular linkage disequilibrium block (see the “Methods” section). Here, we grouped the SNPs into five ordinal categories based on confidence: (1) SNPs reported to be associated with pulmonary function, (2) SNPs related to phenotypes correlated to pulmonary function (e.g., height, see Supplementary Fig. 1), (3) SNPs that fall within genes reported to be associated with pulmonary function and/or disease, (4) SNPs that are expression quantitative trait loci (eQTLs) in the lung, and (5) the novel SNPs.
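One way to read this ordinal scheme is as a first-match rule: a SNP is assigned to the highest-confidence category whose criterion it meets. The sketch below encodes that reading. The first-match interpretation, the function name, and the boolean inputs are assumptions made for illustration and are not taken from the analysis code.

```python
def classify_snp(known_pulmonary, correlated_trait, in_pulmonary_gene, lung_eqtl):
    """Return the ordinal category (1-5) of a SNP under a first-match rule.

    Each argument is a boolean flag for one line of evidence; the flags are
    checked in decreasing order of confidence (an assumed interpretation).
    """
    if known_pulmonary:
        return 1  # previously reported for pulmonary function
    if correlated_trait:
        return 2  # reported for a correlated phenotype such as height
    if in_pulmonary_gene:
        return 3  # falls within a gene linked to pulmonary function/disease
    if lung_eqtl:
        return 4  # expression quantitative trait locus in the lung
    return 5      # novel


# A SNP reported for height and located in a pulmonary gene lands in category 2.
print(classify_snp(False, True, True, False))
```

Under this rule each SNP is counted once, so the category totals partition the full set of SNPs.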

Interestingly, we found that among our list, 97 variants in Europeans and none (0) in Africans have been previously associated with pulmonary function (see Table 1 and Supplementary Data 4). These include variants in the genes PLEKHM1, HMGA2, KDM2A, and SYTL2 (Table 2). Likewise, we found that 69 variants in Europeans, and none (0) of the variants in African ancestry individuals had previously been associated with a phenotype correlated to pulmonary function (see Supplementary Data 4). Furthermore, we found that 178 variants in Europeans and 0 variants in Africans are located within genes associated with various pulmonary function phenotypes and diseases, and 4 variants in Europeans and none in Africans are significant eQTLs in the lungs. These four variants affect the expression of CAMLG, PHF15, RNF40, and MLLT6. Finally, we found 206 novel variants in Europeans and 3 in Africans associated with pulmonary function; see Supplementary Data 4 for the complete list of significant variants and the studies reporting the known variants. Among the novel discoveries, in Europeans, 104, 101 and 136 were associated with FVC, FEV1, and PEF, respectively, whereas in Africans, 2, 1 and 0 were associated with FVC, FEV1, and PEF, respectively.

Table 1 Known and novel variants associated with pulmonary function.
Table 2 Top significant variants associated with pulmonary function.

We focused on the genes in which the novel SNPs associated with pulmonary function among Europeans were located to perform enrichment analyses based on the Disease Gene Network database39, and the Phenotype and Genotype Integrator database40. Here, our Disease Gene Network analysis revealed that the novel genes are enriched for terms related to pulmonary function, including “Forced expiratory volume function” (p = 9.7 × 10−13) and body measures that modify pulmonary function, including “Body Height” (p = 1.33 × 10−15), see Supplementary Fig. 10. Similarly, our phenotype and genotype integrator enrichment analysis revealed that the genes are enriched for pulmonary function-related terms, including Forced Expiratory Volume (p = 2 × 10−4) and phenotypes associated with pulmonary function, including Body Height (p = 4.2 × 10−07), see Supplementary Fig. 10. These findings show that despite the SNPs being novel among Europeans, the genes within which the SNPs are located are known to be associated with pulmonary function.

Bias in GWAS studies explains why few SNPs were previously associated with pulmonary function in Africans

Since none of the SNPs we identified as being associated with pulmonary function among Africans has been reported in the literature, we queried the GWAS Catalog27 for previous studies of pulmonary function or phenotypes related to pulmonary function (such as asthma) across various ancestry backgrounds. We found those studies to be significantly biased toward individuals of European ancestry (Fig. 5a). Also, despite the number of studies conducted on individuals of African ancestry increasing over the last five years, the gap is widening between the number of studies reported on Europeans compared to Africans during the same time interval (Fig. 5a). Overall, among the 235 GWAS studies reported on pulmonary function or phenotypes related to pulmonary function, only eight were conducted on Africans or African Americans. In comparison, we found that 120 studies have been conducted exclusively on individuals of European ancestry (Fig. 5b). Furthermore, in the same studies, the cumulative sample size of the Europeans in 2021 (10,633,660 individuals) is approximately 235 times greater than that of the Africans (45,189 individuals; see Fig. 5c).

Fig. 5: The GWAS catalog of pulmonary function and lung phenotypes studies.

a The running sum of GWAS studies reported from 2007 to 2021. The colours show details about the race/ancestry groups: Africans only, Europeans only, Europeans and Africans, Others, and those for which the race/ancestry group is “Not provided”. b The total number of GWAS studies reported for each race/ancestry group combination. The colours depict information about race and ethnic groups. c The trend of the cumulative sum of participants (on the y-axis) of studies from 2007 to 2021. The colours show details about the race/ancestry groups. The marks are labelled by the cumulative sum of participants. The figure inset shows the total number of participants by race/ancestry group.

Discussion

We analysed variations in pulmonary function and the associated genetic variants among individuals of African and European ancestry in the UK Biobank. Here, we report differences in FEV1, FVC, and PEF parameters among Africans and Europeans. Previous studies have examined the pulmonary function parameters between Africans and Europeans, with most reporting the differences we observed3,41,42,43. However, there has been no explanation for the genetic basis of these observed differences.

Here, we showed that the SNPs associated with pulmonary function differed between Europeans and Africans. Others have reported that the genetic variants associated with various phenotypes may differ among individuals of different ancestry44,45,46,47. For example, we found that the SNPs associated (p < 5 × 10−8) with pulmonary function in African individuals were non-significant in Europeans, even at a p-value cut-off threshold of 0.05 (see Supplementary Note 2). Our findings confirmed that different variants might be associated with pulmonary function among Africans and Europeans. Despite this observed difference between the two ancestral groups, we are also cognizant that the number of individuals of African ancestry represented in the UK Biobank is much lower than that of Europeans. To some extent, the smaller calculated beta estimates with larger standard errors in the African group compared to the European group are explained by the difference in the sample size (see the interactive plot). Therefore, the smaller sample size of Africans may have resulted in us missing some common associations among the groups48,49. It would be interesting to evaluate our findings based on a larger sample of individuals of African ancestry.

Given that the frequency of SNPs, primarily those we found associated with pulmonary function, varies between Africans and Europeans, it is apparent why different variants are associated with these traits48. For example, we found that rs12925700 is approximately 21 times more frequent, and rs11205303 is 14 times more frequent in Europeans than Africans, and both SNPs are reported elsewhere50,51 and here as being associated with pulmonary function in Europeans. Furthermore, the frequency of genetic variants among individuals of a particular ancestry affects the penetrance of disease and phenotype associated with the alternate alleles48,52,53,54,55. For example, non-alcoholic fatty liver disease56, serum uric acid levels57, white blood cell count58, fatty acid desaturases59, and other phenotypes60,61,62 are associated with different alleles among Africans and Europeans. These alleles are sometimes located on the same gene, but their frequencies vary between ancestral groups.

Our enrichment analyses demonstrated a link between the significant SNPs and GWAS Catalog terms associated with pulmonary function in Europeans, with several results showing plausible biological mechanisms. Whereas it was apparent that the significantly enriched terms in Europeans were mainly associated with pulmonary function and related phenotypes (Fig. 3), we found that the top-ranking terms among SNPs in Africans are not related to pulmonary function. This finding exemplifies the bias in previous GWAS studies that have not picked up genes associated with pulmonary function in Africans. We believe that more GWAS on larger groups of Africans than those presented here are needed to identify the variants that modify pulmonary function and other traits.

We also showed that genetic association studies of pulmonary function, pulmonary physiology, and pathology are significantly biased toward individuals of European ancestry. Even in cases where individuals of African ancestry are included in the studies or studied separately, the number of participants is lower than that of individuals of European ancestry. Furthermore, the trend shows that this gap has widened vis-à-vis how Africans and Europeans are studied over the last few years (see Fig. 5).

In summary, we have revealed the extent of variations between Africans and Europeans in the pulmonary function parameters: FEV1, FVC, and PEF. In addition, we have identified the different genetic variants associated with pulmonary function among individuals of African and European ancestry. Our integrative analysis of the causal genetic variants, together with the GWAS phenotypes and diseases associated with the genes in which the variants fall, indicates that the significant SNPs are associated with pulmonary function and related phenotypes in Europeans. Therefore, more genetic association studies focusing on people of African ancestry are evidently needed to identify and validate additional causal variants for these traits and other diseases.

Methods

We analysed a UK Biobank11 dataset of 383,471 individuals of European ancestry (designated as White, British, Irish, and “any other white background”) and 5978 individuals of recent African ancestry. The UK Biobank obtained all participant samples and body measurements from consenting individuals. Information on the UK Biobank ethics policy and approval can be found here: https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/about-us/ethics. The demographics of the UK Biobank participants are extensively described elsewhere11. The data elements we analysed include genotyping array data of imputed SNPs, anthropometric measurements, and pulmonary function parameters: FVC, FEV1, and PEF. The ancestry groups were initially defined by self-identification. Then, a principal component analysis was performed, followed by a random forest on the projected principal component analysis data to reassign the initial self-defined ancestries of individuals with a membership posterior probability >0.5. Individuals with a posterior probability of less than 0.5 for every ancestry group were dropped from further analysis.
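The final assignment rule can be sketched as below, assuming the random forest has already produced a posterior probability per ancestry group for each individual. The function name and the group labels are hypothetical; only the 0.5 cut-off comes from the text.

```python
from typing import Dict, Optional


def assign_ancestry(posteriors: Dict[str, float], threshold: float = 0.5) -> Optional[str]:
    """Return the ancestry label whose posterior exceeds the threshold.

    Returns None when no group exceeds the threshold, which corresponds to
    dropping the individual from further analysis.
    """
    label, prob = max(posteriors.items(), key=lambda item: item[1])
    return label if prob > threshold else None


# Hypothetical classifier outputs for two individuals.
print(assign_ancestry({"EUR": 0.93, "AFR": 0.07}))                 # kept as EUR
print(assign_ancestry({"EUR": 0.45, "AFR": 0.40, "OTHER": 0.15}))  # dropped
```

Because the posteriors of one individual sum to at most one, no more than one group can exceed 0.5, so the rule never yields an ambiguous assignment.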

Comparison of pulmonary function parameters in Europeans and Africans

We compared the mean values of the pulmonary function parameters FVC, FEV1, and PEF between 383,471 Europeans and 5978 Africans using the Welch t-test. Furthermore, to evaluate how FVC, FEV1, PEF, and FEV1/FVC values vary with the participant’s body mass index, height, and age, we divided each anthropometric measurement into bins of 10 percentiles and visualised the trend using error bars plotted for each bin.
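For reference, the Welch statistic and its approximate degrees of freedom (Welch–Satterthwaite) can be computed as in the following minimal sketch, which uses only the Python standard library. The function name and the toy measurements are hypothetical; the p-value, which requires the t-distribution, is deliberately left out.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, the two groups may differ in size and in
    variance; each group contributes its own variance estimate.
    """
    n1, n2 = len(x), len(y)
    v1 = variance(x) / n1  # squared standard error of the first mean
    v2 = variance(y) / n2  # squared standard error of the second mean
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Toy data (hypothetical), not UK Biobank measurements.
t_stat, dof = welch_t([1, 2, 3, 4], [2, 4, 6, 8])
print(round(t_stat, 3), round(dof, 2))
```

The unequal-variance form suits this comparison because the two groups differ greatly in size (383,471 versus 5978 participants).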

Genome-wide identification of genetic variants and associations

The methods applied for genotyping participants in the UK Biobank are reported elsewhere11,63. Furthermore, the genotyping quality control implemented for the analyses is described at the following link: https://pan.ukbb.broadinstitute.org/docs/qc. We obtained the GWAS summary statistics computed by the UK Biobank project for each pulmonary function parameter. The methods used to perform the GWA analyses are described elsewhere64,65. Briefly, the GWAS was performed for the pulmonary function phenotypes and ancestry groups using the Scalable and Accurate Implementation of Generalized Mixed Model (SAIGE) approach65, using a linear or logistic mixed model including a kinship matrix as a random effect and covariates as fixed effects. The covariates included the participant’s age, sex, age multiplied by sex, the square of the age, the square of the age multiplied by the sex, and the first 10 principal components calculated from the genotype datasets. The Manhattan plots were produced in MATLAB using the software described here66. Furthermore, we used the Probabilistic Identification of Causal SNPs software with default settings to fine-map SNPs to identify the most credible causal SNPs within each linkage disequilibrium block while conditioning on the lead SNP signal in each locus ±500 kb21.
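The fixed-effect covariates listed above can be assembled per participant as in this short sketch. It only illustrates the covariate terms; it is not part of the association software, and the function name, the 0/1 coding of sex, and the toy values are assumptions.

```python
def covariate_row(age, sex, pcs):
    """Build the fixed-effect covariates for one participant.

    age -- age in years
    sex -- assumed to be coded 0 or 1
    pcs -- the first 10 genotype principal components
    """
    if len(pcs) != 10:
        raise ValueError("expected exactly 10 principal components")
    return [
        age,             # age
        sex,             # sex
        age * sex,       # age multiplied by sex
        age ** 2,        # square of the age
        age ** 2 * sex,  # square of the age multiplied by sex
        *pcs,            # first 10 principal components
    ]


# Hypothetical participant: age 50, sex coded 1, all components set to 0.
row = covariate_row(50, 1, [0.0] * 10)
print(len(row), row[:5])
```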

Identification of unique and common variants

We applied the following approach to identify the unique variants associated with pulmonary function traits in Africans and Europeans. First, we extracted all the credible sets of causal variants associated with pulmonary function (FVC, FEV1, and PEF) within ±500 kb of the most statistically significant variant within a particular linkage disequilibrium block. Then, the linkage disequilibrium structure of the populations was estimated from the UK Biobank data, using the same individuals included in the analysis. If a causal variant associated with one pulmonary function parameter (e.g., FVC) was also associated with another pulmonary function measure (e.g., p-value <5 × 10−8 for FEV1) or was in linkage disequilibrium (r2 > 0.4) with a variant associated with another pulmonary function parameter (e.g., FEV1), we retained only the most statistically significant variant (i.e., the variant with the smallest GWA estimated p-value). This approach allowed us to remove 190 non-independent variants from the 820 (FVC = 310, FEV1 = 309, and PEF = 374) causal variants, leaving 630 independent (credible set) causal variants (FVC = 256, FEV1 = 233, and PEF = 297) associated with pulmonary function.
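
One way to express this filtering rule is as a greedy pass over the variants ordered by p-value. The Python sketch below is a simplified interpretation under stated assumptions, not the exact procedure used in the study: the variant identifiers, p-values, and pairwise r2 values are invented for illustration, and the `r2_lookup` helper stands in for whatever linkage disequilibrium estimates are available.

```python
from typing import Callable, List, Tuple

Variant = Tuple[str, str, float]  # (variant id, trait, GWA p-value)


def independent_variants(
    variants: List[Variant],
    r2: Callable[[str, str], float],
    r2_threshold: float = 0.4,
) -> List[Variant]:
    """Keep the most significant variant among cross-trait duplicates or LD partners."""
    kept: List[Variant] = []
    for cand in sorted(variants, key=lambda v: v[2]):  # smallest p-value first
        redundant = any(
            k[1] != cand[1] and (k[0] == cand[0] or r2(k[0], cand[0]) > r2_threshold)
            for k in kept
        )
        if not redundant:
            kept.append(cand)
    return kept


# Hypothetical pairwise r2 values; unlisted pairs are treated as unlinked.
LD = {frozenset(("rs_a", "rs_b")): 0.8, frozenset(("rs_a", "rs_c")): 0.1}


def r2_lookup(a: str, b: str) -> float:
    return LD.get(frozenset((a, b)), 0.0)


variants = [
    ("rs_a", "FVC", 1e-12),
    ("rs_b", "FEV1", 3e-10),  # in LD with rs_a, which is more significant
    ("rs_c", "PEF", 2e-9),
    ("rs_a", "FEV1", 4e-9),   # same variant as the FVC lead, less significant
]
kept = independent_variants(variants, r2_lookup)
```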

Replication of variants in Africans

We attempted to replicate the significant findings from Europeans in Africans because the variants associated (p < 5 × 10−8) with pulmonary function in Europeans were not associated with pulmonary function in Africans. Here, for each variant significantly associated with a pulmonary function parameter, we extracted all the variants linked (linkage disequilibrium: r2 > 0.4) to the causal variant. The linked variants in Europeans were then assessed for their association with the trait in Africans by extracting the GWA p-values estimated in Africans and adjusting them using the Benjamini and Hochberg procedure. Finally, we considered all variants with adjusted p-values <0.05, within each linkage disequilibrium block, as evidence of local replication.
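
For reference, the Benjamini and Hochberg adjustment applied within each linkage disequilibrium block can be sketched in a few lines of Python. This is a generic implementation of the step-up procedure, shown for illustration rather than as the code used in the study; the function name and the example p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for variants in one linkage disequilibrium block.
block_pvalues = [0.01, 0.04, 0.03, 0.005]
block_adjusted = benjamini_hochberg(block_pvalues)
replicated = [i for i, q in enumerate(block_adjusted) if q < 0.05]
```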

Pathways and enrichment analyses

We used NCBI’s dbSNP67,68 to ascribe the variants associated with pulmonary function in the GWAS (suggestive cutoff p-value of 1 × 10−6)22 to specific genes. This yielded a list of genes associated with pulmonary function in Europeans and a list for Africans. Finally, using these two gene lists, we separately performed gene set enrichment analysis69 using Enrichr70 to identify the Elsevier pathways70, Disease Gene Network database39, Phenotype and Genotype Integrator database, and GWAS Catalog27 ontology terms that are significantly enriched in each gene list (see Supplementary Data 3).

GWAS literature, disease phenotypes, and eQTLs

We retrieved data from previous GWAS of pulmonary function and pulmonary function-related phenotypes from the GWAS Catalog27. This information was subset into two categories: “pulmonary reported” for studies that reported a pulmonary function phenotype, and “pulmonary associated” for studies that reported associations with pulmonary function-related phenotypes (see Supplementary Data 4). To identify novel variants associated with pulmonary function separately for Europeans and Africans, we used the approach described above to flag variants previously reported in the GWAS Catalog as associated with pulmonary function or pulmonary disease. Briefly, for each variant we found associated with pulmonary function, we searched for variants in the GWAS Catalog that are in linkage disequilibrium (r2 > 0.4) with the variant. If any GWAS Catalog variant met this criterion, we considered the associated variant in our study to have been previously reported; otherwise, we considered it novel. Furthermore, we obtained information on diseases associated with the genes in which the variants are located from the Pharos database71. Finally, information on SNPs that are expression quantitative trait loci in the lungs was obtained from the Genotype-Tissue Expression consortium database72.

Statistics and reproducibility

We performed the statistical analyses in the R programming language, MATLAB 2021a, and Bash. We used the Welch t-test, the Wilcoxon rank-sum test, and the one-way analysis of variance to compare continuous measures among groups. Statistical tests were considered significant if the two-sided p-value was <0.05 for single comparisons. Multiple hypothesis tests were corrected by calculating a two-sided q-value (False Discovery Rate) for each group/comparison using the Benjamini and Hochberg procedure73.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.