A resource-efficient tool for mixed model association analysis of large-scale data

Jiang, Longda; Zheng, Zhili; Qi, Ting; Kemper, Kathryn E.; Wray, Naomi R.; Visscher, Peter M.; Yang, Jian

doi:10.1038/s41588-019-0530-8

Technical Report
Published: 25 November 2019

Nature Genetics volume 51, pages 1749–1755 (2019)

Abstract

The genome-wide association study (GWAS) has been widely used as an experimental design to detect associations between genetic variants and a phenotype. Two major confounding factors, population stratification and relatedness, could potentially lead to inflated GWAS test statistics and hence to spurious associations. Mixed linear model (MLM)-based approaches can be used to account for sample structure. However, genome-wide association (GWA) analyses in biobank samples such as the UK Biobank (UKB) often exceed the capability of most existing MLM-based tools especially if the number of traits is large. Here, we develop an MLM-based tool (fastGWA) that controls for population stratification by principal components and for relatedness by a sparse genetic relationship matrix for GWA analyses of biobank-scale data. We demonstrate by extensive simulations that fastGWA is reliable, robust and highly resource-efficient. We then apply fastGWA to 2,173 traits on array-genotyped and imputed samples from 456,422 individuals and to 2,048 traits on whole-exome-sequenced samples from 46,191 individuals in the UKB.

**Fig. 1: Median λ of null variants under different simulation scenarios.**

**Fig. 2: Mean χ² of causal variants under different simulation scenarios.**

**Fig. 3: Estimates of genetic variance by fastGWA and BOLT-LMM for 24 traits in the UKB.**

Data availability

The individual-level genotype and phenotype data are available through formal application to the UK Biobank (http://www.ukbiobank.ac.uk). All the summary-level statistics are available at our data portal (http://cnsgenomics.com/software/gcta/#DataResource). Source data for Extended Data Figs. 1–3 are available online.

Code availability

fastGWA is available at http://cnsgenomics.com/software/gcta/#fastGWA. The fastGWA online tool was built on the code modified from the PheWeb project (https://github.com/statgen/pheweb/).

Acknowledgements

We thank H. Wang and J. Sidorenko for assistance in data preparation, A. McRae for organizing computing resources, P.-R. Loh for constructive comments on the manuscript, L. Yengo for helpful discussion, the Neale Lab for making the data processing pipelines publicly available, and Alibaba Cloud Australia and New Zealand for hosting the online tool. This research was supported by the Australian Research Council (DP160101343, DP160101056, FT180100186, and FL180100072), the Australian National Health and Medical Research Council (1078037, 1078901, 1113400, and 1107258), and the Sylvia & Charles Viertel Charitable Foundation. This study makes use of data from the UK Biobank (project ID: 12514). A full list of acknowledgements relating to this data set can be found in the Supplementary Note.

Author information

These authors contributed equally: Longda Jiang, Zhili Zheng.

Authors and Affiliations

Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
Longda Jiang, Zhili Zheng, Ting Qi, Kathryn E. Kemper, Naomi R. Wray, Peter M. Visscher & Jian Yang
Institute for Advanced Research, Wenzhou Medical University, Wenzhou, Zhejiang, China
Zhili Zheng & Jian Yang
Queensland Brain Institute, The University of Queensland, Brisbane, Queensland, Australia
Naomi R. Wray

Contributions

J.Y. conceived and supervised the study. J.Y., L.J., and Z.Z. designed the experiment. Z.Z. developed the software tools. L.J. and Z.Z. performed the simulations and data analyses with the assistance and guidance of J.Y., P.M.V., T.Q., N.R.W., and K.E.K. P.M.V., N.R.W., and J.Y. contributed resources and funding. L.J. and J.Y. wrote the manuscript with the participation of all authors. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Jian Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Comparison between fastGWA-REML and AI-REML.

The phenotypes were simulated based on real genotypes of 100,000 individuals from the UKB with \(V_g = 0.4\) (see part 5 of the Supplementary Note for details of the simulation method and data). Plotted are the \(\hat{\sigma}_g^2\) values estimated by fastGWA-REML against those estimated by the AI-REML in GCTA. Each dot represents one simulation replicate (100 simulations in total). The Pearson’s correlation coefficient of \(\hat{\sigma}_g^2\) between the two methods is >0.9999.

Extended Data Fig. 2 Comparison between the approximate and exact fastGWA tests.

We selected four quantitative traits from the UKB for comparison: height (HT, \(n_{HT}\) = 455,332), forced expiratory volume in 1 second (FEV, \(n_{FEV}\) = 415,931), pulse rate (PR, \(n_{PR}\) = 149,082), and educational attainment (EA, \(n_{EA}\) = 304,998) (see Supplementary Table 4 for more information about the traits). Plotted are the estimated variant effects (a) or χ²-statistics (b) of 8,531,416 variants computed by the exact fastGWA method (fastGWA-Exact) against those computed by the fastGWA test using the GRAMMAR-GAMMA approximation (see part 2 of the Supplementary Note for details). The Pearson’s correlation coefficients of the estimated variant effect or χ²-statistic between the two methods are >0.9999 for all four traits.

Extended Data Fig. 3 The first and second principal components (PC1 and PC2) of all of the UKB participants of European ancestry (n = 456,422) compared to their self-reported ethnicity.

The red dots represent those individuals who self-reported as ‘British’, the green dots represent those who self-reported as ‘Irish’, and the purple dots represent those who self-reported as ‘other-white background’.

Extended Data Fig. 4 Comparison of \(\hat{\sigma}_g^2\) estimated by fastGWA-REML to that estimated by BOLT-REML (used in BOLT-LMM) at different degrees of relatedness in simulations.

The x-axis represents different degrees of relatedness, with (0, 0) representing no common environmental effect, (1st, \(0.1V_p\)) or (1st, \(0.2V_p\)) representing common environmental effects explaining 10% or 20% of the phenotypic variance (\(V_p\)) among 1st-degree relatives, (≥2nd, \(0.1V_p\)) or (≥2nd, \(0.2V_p\)) representing common environmental effects explaining 10% or 20% of \(V_p\) among all pairs of 1st- and 2nd-degree relatives, and (≥2nd, Gradient) representing common environmental effects explaining 20% of \(V_p\) among the 1st-degree relatives and 10% of \(V_p\) among the 2nd-degree relatives. The y-axis represents the value of \(\hat{\sigma}_g^2\). The black dashed line represents the true simulation parameter (h² = 0.4). Each boxplot represents the distribution of \(\hat{\sigma}_g^2\) across 100 simulation replicates. The line inside each box indicates the median value, notches indicate the 95% confidence interval of the median, the central box indicates the interquartile range (IQR), and whiskers indicate data up to 1.5 times the IQR. We also show the Haseman–Elston (HE) regression estimate of \(\sigma_g^2\) in the fastGWA model, with a gray bar to indicate its expected value computed using the approximation theory presented in part 9 of the Supplementary Note.

Extended Data Fig. 5 Comparison of false positive rate (FPR) among different association methods.

We used the simulated data as presented in Figs. 1 and 2 to compute the FPR of each association method across different simulation scenarios with different levels of common environmental effects. Each boxplot represents the distribution of FPR across 100 simulation replicates. The line inside each box indicates the median value, notches indicate the 95% confidence interval of the median, the central box indicates the interquartile range (IQR), whiskers indicate data up to 1.5 times the IQR, and outliers are shown as separate dots. In each simulation replicate, the P value of each variant was calculated based on the reported effect estimate and s.e. using a \(\chi^2_{df = 1}\) test.
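
The P value calculation described in this caption can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `chi2_df1_p_value` and the example numbers are hypothetical, and the sketch assumes the test statistic is the squared ratio of the effect estimate to its s.e. For 1 degree of freedom, the upper tail of the χ² distribution equals the two-sided tail of a standard normal, so only the standard library is needed.

```python
import math
from typing import Tuple


def chi2_df1_p_value(beta: float, se: float) -> Tuple[float, float]:
    """Return (chi-square statistic, P value) for a 1-d.f. test of beta = 0.

    Assumes the statistic is (beta / se)^2, compared against a
    chi-square distribution with 1 degree of freedom.
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    chi2 = (beta / se) ** 2
    # For 1 d.f., P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical effect estimate and standard error for one variant.
example_chi2, example_p = chi2_df1_p_value(beta=0.03, se=0.01)
```

In a simulation replicate, the false positive rate would then be the fraction of null variants whose P value falls below the chosen significance threshold.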

Extended Data Fig. 6 Genomic inflation and power of fastGWA with the sparse GRM thresholded at different genetic relatedness cut-off values.

This simulation was performed based on real genotypes from the UKB (see simulation settings in part 5 of the Supplementary Note). We constructed different sparse GRMs by setting off-diagonal elements below a certain threshold (varying from 0.03 to 0.10) to 0 and performed fastGWA analyses using these sparse GRMs. Each boxplot represents the distribution of estimates (that is, median λ, or mean χ²) across 100 simulation replicates. The line inside each box indicates the median value, notches indicate the 95% confidence interval of the median, the central box indicates the interquartile range (IQR), and whiskers indicate data up to 1.5 times the IQR.
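
The thresholding step described above can be sketched in a few lines. This is an illustrative toy, not the GCTA implementation: the function name `sparsify_grm`, the dictionary storage, and the 3×3 matrix are assumptions made for the example, as is the choice to keep entries exactly equal to the cut-off.

```python
def sparsify_grm(grm, cutoff):
    """Keep the diagonal and any off-diagonal entry >= cutoff.

    `grm` is a symmetric matrix given as a list of lists. The result stores
    only the upper triangle as {(i, j): value}; every pair that is absent
    from the dictionary is treated as having relatedness 0.
    """
    n = len(grm)
    sparse = {}
    for i in range(n):
        for j in range(i, n):
            value = grm[i][j]
            if i == j or value >= cutoff:
                sparse[(i, j)] = value
    return sparse


# Toy GRM: individuals 0 and 1 are close relatives, individual 2 is unrelated.
toy_grm = [
    [1.00, 0.50, 0.01],
    [0.50, 1.02, -0.02],
    [0.01, -0.02, 0.98],
]
toy_sparse = sparsify_grm(toy_grm, cutoff=0.05)
```

With a cut-off of 0.05 (one value inside the 0.03 to 0.10 range examined in this figure), only the diagonal and the single close-relative pair are retained, which is what makes the matrix cheap to store and operate on when most pairs in a biobank are unrelated.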

Extended Data Fig. 7 Comparison of genomic inflation and power between fastGWA, fastGWA-LOCO, and fastGWA-Ped.

Shown are the results from the analyses of a simulated data set based on the simulation strategy described in part 5 of the Supplementary Note (with \(\sigma_g^2 = 0.4V_p\), \(\sigma_c^2 = 0.1V_p\) or \(0.2V_p\) for all 1st- and 2nd-degree relatives, and \(\sigma_c^2 = 0\) for all unrelated individuals). We did not observe any increase in power when applying the LOCO scheme to fastGWA. This is because fastGWA uses a sparse GRM to capture pedigree relatedness, modelling the phenotypic covariance between close relatives due to genetic and/or common environmental effects, and the pedigree relatedness estimated using all autosomes is similar to that estimated using 21 chromosomes under the LOCO scheme. Each boxplot represents the distribution of estimates (that is, median λ, or mean χ²) across 100 simulation replicates. The line inside each box indicates the median value, notches indicate the 95% confidence interval of the median, the central box indicates the interquartile range (IQR), and whiskers indicate data up to 1.5 times the IQR.

Extended Data Fig. 8 Comparison of genomic inflation between BOLT-LMM (estimating the variance components only once using all variants) and BOLT-LMM_fine-tuning (re-estimating the variance components when a chromosome is left out).

The simulation setting was the same as the (0, 0) scenario in Fig. 1. The median λ was computed at the null variants. Each boxplot represents the distribution of median λ across 100 simulation replicates. The line inside each box indicates the median value, notches indicate the 95% confidence interval of the median, the central box indicates the interquartile range (IQR), and whiskers indicate data up to 1.5 times the IQR.
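
For readers unfamiliar with the inflation metric, a small sketch follows. It assumes the conventional genomic-control definition, in which λ is the median of the observed χ² statistics divided by the median of a χ² distribution with 1 degree of freedom (about 0.4549); this page does not spell out the definition, so treat that as an assumption. The function name and the example statistics are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_DF1_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Median-based genomic inflation factor for a list of 1-d.f. statistics."""
    return statistics.median(chi2_stats) / CHI2_DF1_MEDIAN


# Hypothetical chi-square statistics at null variants from one replicate.
null_chi2 = [0.02, 0.31, 0.46, 0.90, 2.75]
lambda_median = genomic_inflation(null_chi2)
```

A value close to 1 at null variants suggests well-calibrated test statistics, whereas values well above 1 point to inflation.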

Extended Data Fig. 9 Genomic inflation of BOLT-LMM-Mix using LD score based on different LD window sizes and references.

a, Results from simulations based on the simulated genotype data (part 5 of the Supplementary Note) using the same setting as in the (0, 0) case in Fig. 1. The LD scores were computed from the sample using three window sizes: 1 Mb (BOLT-LMM-Mix_wind-1Mb), 10 Mb (BOLT-LMM-Mix_wind-10Mb), and 20 Mb (BOLT-LMM-Mix_wind-20Mb). b, Results from simulations based on real genotypes (part 5 of the Supplementary Note) using the same settings as in the (0, 0) and (≥2nd, \(0.1V_p\)) cases in Fig. 1. Two sets of LD scores were tested: LD scores computed from the sample using a window size of 1 Mb (BOLT-LMM-Mix_UKB-LDsc) and LD scores obtained from the BOLT-LMM website (BOLT-LMM-Mix_provided-LDsc). Each boxplot represents the distribution of estimates (that is, median λ, or mean χ²) across 100 simulation replicates. The line inside each box indicates the median value, notches indicate the 95% confidence interval of the median, the central box indicates the interquartile range (IQR), and whiskers indicate data up to 1.5 times the IQR.

Extended Data Fig. 10 Comparison between the reported genetic relatedness and the SNP-derived genetic relatedness of the UKB participants.

The y-axis represents the SNP-derived genetic relatedness computed from GCTA using 565,631 common variants on HapMap3 (175,708 individual pairs with estimated genetic relatedness ≥ 0.05). The x-axis represents the expected genetic relatedness based on the pedigree information provided by the UKB (monozygotic twin, 1; parent-offspring/full sib, 0.5; second-degree relatives, 0.25; third-degree relatives, 0.125; and unlabelled pair, ‘none’). Each circle represents one pair of relatives, the dashed diagonal line represents y = x, and the red horizontal lines represent the mean value of each relatedness group.

Supplementary information

Supplementary Information

Supplementary Figures 1–10, Notes 1–11 and Tables 1–8

Reporting Summary

About this article

Cite this article

Jiang, L., Zheng, Z., Qi, T. et al. A resource-efficient tool for mixed model association analysis of large-scale data. Nat Genet 51, 1749–1755 (2019). https://doi.org/10.1038/s41588-019-0530-8

Received: 03 April 2019
Accepted: 16 October 2019
Published: 25 November 2019
Issue Date: December 2019
DOI: https://doi.org/10.1038/s41588-019-0530-8
