Tractor uses local ancestry to enable the inclusion of admixed individuals in GWAS and to boost power

Matters Arising to this article was published on 25 November 2021

Abstract

Admixed populations are routinely excluded from genomic studies due to concerns over population structure. Here, we present a statistical framework and software package, Tractor, to facilitate the inclusion of admixed individuals in association studies by leveraging local ancestry. We test Tractor with simulated and empirical two-way admixed African–European cohorts. Tractor generates accurate ancestry-specific effect-size estimates and P values, can boost genome-wide association study (GWAS) power and improves the resolution of association signals. Using a local ancestry-aware regression model, we replicate known hits for blood lipids, discover novel hits missed by standard GWAS and localize signals closer to putative causal variants.

Fig. 1: Painted karyograms of a simulated African American individual showing local EUR and AFR ancestral tracts across data treatments.
Fig. 2: GWAS power gains across sample sizes, ancestral MAF differences, admixture proportions and effect-size differences.
Fig. 3: Tractor accurately estimates ancestry-specific effect sizes.
Fig. 4: Tractor GWAS replicates established hits for total cholesterol in individuals of admixed African–European ancestry and identifies new ancestry-specific loci.
Fig. 5: Tractor better localizes a top hit for total cholesterol.

Data availability

All summary statistics described here for total and LDL cholesterol in ~4,300 admixed UK Biobank individuals can be found at https://github.com/eatkinson/Tractor_ms_results and have been uploaded to the GWAS catalog under accession numbers GCST90012868–GCST90012873 (https://www.ebi.ac.uk/gwas/deposition/bodyofwork/GCP000093). The UK Biobank raw data can be obtained through a data access application available at https://www.ukbiobank.ac.uk. PGC-PTSD data can be obtained through a data access application at https://pgc-ptsd.com/data-samples/access-data/. BioBank Japan summary statistics are available at http://jenger.riken.jp/en/. The 1000 Genomes reference panel is available at ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. The Human Genome Diversity Project dataset is available at https://www.internationalgenome.org/data-portal/data-collection/hgdp.

Code availability

All code is freely available. The automated quality control pipeline to prepare datasets for Tractor and run LAI is located at https://github.com/eatkinson/Post-QC. We provide Tractor code in Python and Hail, as well as implementation examples in a Jupyter notebook, at https://github.com/eatkinson/Tractor alongside a detailed wiki. The specific scripts used to produce the simulated data and results are also provided at https://github.com/eatkinson/Tractor_ms_results.

Acknowledgements

We thank the PGC-PTSD working group, P. Natarajan, S. Gagliano Taliun and many other scientists within and beyond Boston for their intellectual contributions to this work. This project was supported by the National Institute of Mental Health (K01 MH121659 and T32 MH017119 to E.G.A., K99MH117229 to A.R.M., R37 MH107649 to B.M.N., and 2R01MH106595 to C.M.N. and K.C.K.). M.K. was supported by a Nakajima Foundation Fellowship and the Masason Foundation. M.L.S. was supported by the Fundação de Amparo à Pesquisa do Estado de São Paulo (#2018/09328-2). The BioBank Japan Project was supported by the Tailor-Made Medical Treatment Program of the Ministry of Education, Culture, Sports, Science and Technology (MEXT) and the Japan Agency for Medical Research and Development (AMED). This research has been conducted using the UK Biobank Resource under application number 31063.

Author information

Contributions

E.G.A. designed and implemented the pipeline, ran the analyses and drafted the manuscript. A.X.M. designed and ran the analyses. M.K. designed and ran the analyses with the aid of J.C.U., Y.K., Y.O. and H.K.F. A.R.M. contributed code and aided in writing the manuscript. K.J.K. and M.L.S. aided in code implementation. K.C.K., C.M.N., B.M.N. and M.J.D. supervised and advised on the project. All authors reviewed and approved the final draft.

Corresponding author

Correspondence to Elizabeth G. Atkinson.

Ethics declarations

Competing interests

M.J.D. is a founder of Maze Therapeutics. A.R.M. serves as a consultant for 23andMe and is a member of the Precise.ly Scientific Advisory Board. B.M.N. is a member of the Deep Genomics Scientific Advisory Board and serves as a consultant for the CAMP4 Therapeutics Corporation, Takeda Pharmaceutical and Biogen. The remaining authors declare no competing interests.

Additional information

Peer review information Nature Genetics thanks Michelle Daya and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Painted karyograms of a simulated AA individual showing EUR (red) and AFR (blue) ancestral tracts across demographic models.

The left column shows the results for the demographic model of one pulse of admixture 3 generations ago, the middle column shows the realistic model of one pulse 9 generations ago, and the right column shows a pulse 20 generations ago. In all cases, the model involved 84% AFR and 16% EUR ancestry. The rows show the results from treatments of the data across steps of the Tractor pipeline. The top row shows the truth results from our simulations. The second row shows painted karyograms after statistical phasing of this truth cohort. The third row illustrates the recovery, obtained by unkinking, of tracts broken by switch errors in phasing. The bottom row shows the smoothing and further improvement of tracts acquired through an additional round of LAI.

Extended Data Fig. 2 Tractor recovers disrupted tracts, improving tract distributions.

The top row (A–C) shows the improvements to the distributions of the number of discrete EUR tracts observed in simulated AA individuals under demographic models of one pulse of admixture at 3, 9 (realistic for AA population history) and 20 generations ago. The bottom row (D,E) shows the results from different initial admixture fractions of 70% and 50% AFR, respectively, at 9 generations since admixture. These can be compared to the inferred realistic demographic model shown in B. In all panels, the simulated truth dataset is shown in black, the data after statistical phasing in purple, the data immediately after tract recovery procedures in orange, and the data after one additional round of LAI following tract recovery in yellow.

Extended Data Fig. 3 The contribution of absolute MAF and effect size to Tractor power.

All cases assume an 80/20 AFR/EUR admixture ratio, 10% disease prevalence, 12k cases/30k controls with an effect only in the AFR genetic background. In all panels, the solid line uses a traditional GWAS model while the dashed line is our LAI-incorporating Tractor model. (A,B): Equal effect in EUR and AFR with shifted absolute MAF. (C,D): Effect only in the AFR background. (A,C): MAF is set to 10% in both AFR and EUR. (B,D): MAF is set to 40% in both AFR and EUR. (E,F): The heterogeneity in effect sizes required to observe gains in Tractor power over traditional GWAS, assuming 20% MAF in both ancestries and an effect that is stronger in AFR, with a varying difference from the EUR effect.

Extended Data Fig. 4 The interaction of between-ancestry MAF differences and effect sizes on Tractor power.

In all cases, the grey solid line uses a traditional GWAS model while the black dashed line is our LAI-incorporating model, admixture proportions are 80/20 AFR/EUR, disease prevalence is 10%, and the AFR MAF is fixed at 20%. A and E model the same effect size between EUR and AFR while varying the EUR MAF. B,D,F model the case when there is no effect in the EUR background while varying EUR MAF. C models an effect size difference of 30% with the effect being stronger in the EUR background. For comparison, Fig. 2f shows the same effect at matched 20% MAF.

Extended Data Fig. 5 The impact of LAI accuracy on Tractor’s performance as compared to standard GWAS and asaMap.

We modeled perfect accuracy, realistic accuracy as derived from simulations of our AA demographic model (98%), and a lower bound of 90% LAI accuracy. All black lines indicate Tractor runs: the solid line is Tractor’s performance with perfect LAI accuracy, the dashed line is at 98% accuracy, and the dotted line is at 90% accuracy. The red line represents the power obtained from standard GWAS, and the blue line represents the power of the asaMap model for the ancestry in which the effect was modeled (AFR for A, B and C, and EUR for D). In all cases, we included 10 PCs as covariates and ran 1,000 replicates.

Supplementary information

Supplementary Information

Supplementary Discussion, Tables 1–3 and Figs. 1–7.

Reporting Summary

Cite this article

Atkinson, E.G., Maihofer, A.X., Kanai, M. et al. Tractor uses local ancestry to enable the inclusion of admixed individuals in GWAS and to boost power. Nat Genet 53, 195–204 (2021). https://doi.org/10.1038/s41588-020-00766-y

  • DOI: https://doi.org/10.1038/s41588-020-00766-y
