Large uncertainty in individual polygenic risk score estimation impacts PRS-based risk stratification

Abstract

Although the cohort-level accuracy of polygenic risk scores (PRSs)—estimates of genetic value at the individual level—has been widely assessed, uncertainty in PRSs remains underexplored. In the present study, we show that Bayesian PRS methods can estimate the variance of an individual’s PRS and can yield well-calibrated credible intervals via posterior sampling. For 13 real traits in the UK Biobank (n = 291,273 unrelated ‘white British’), we observe large variances in individual PRS estimates which impact interpretation of PRS-based stratification; averaging across traits, only 0.8% (s.d. = 1.6%) of individuals with PRS point estimates in the top decile have corresponding 95% credible intervals fully contained in the top decile. We provide an analytical estimator for the expectation of individual PRS variance as a function of SNP heritability, number of causal SNPs and sample size. Our results showcase the importance of incorporating uncertainty in individual PRS estimates into subsequent analyses.
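The stratification statistic in the abstract can be illustrated with a minimal sketch. It assumes posterior draws of each individual's PRS are already available as plain lists (for example, exported from a Bayesian PRS sampler), that the top-decile cutoff is the 90th percentile of the posterior-mean point estimates, and that "fully contained" means the lower bound of the credible interval is at or above that cutoff. The function names and the toy data are hypothetical and are not taken from the authors' scripts.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval from posterior draws."""
    s = sorted(samples)
    tail = (1.0 - level) / 2.0
    return quantile(s, tail), quantile(s, 1.0 - tail)


def frac_certain_top(posterior, top=0.10, level=0.95):
    """posterior[i] holds the posterior PRS draws of individual i.

    Among individuals whose point estimate (posterior mean) is in the top
    `top` fraction, return the share whose credible interval lies entirely
    at or above the cutoff.
    """
    means = [sum(s) / len(s) for s in posterior]
    cutoff = quantile(sorted(means), 1.0 - top)
    in_top = [s for s, m in zip(posterior, means) if m >= cutoff]
    certain = sum(credible_interval(s, level)[0] >= cutoff for s in in_top)
    return certain / len(in_top)


# Toy data: ten individuals with point estimates 0..9.
narrow = [[i - 0.5, float(i), i + 0.5] for i in range(10)]  # small uncertainty
wide = [[i - 5.0, float(i), i + 5.0] for i in range(10)]    # large uncertainty
share_narrow = frac_certain_top(narrow)
share_wide = frac_certain_top(wide)
```

With narrow posteriors the top-ranked individual is confidently in the top decile; with wide posteriors the same point estimates give no such guarantee, which is the pattern the abstract reports for real traits.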

Fig. 1: LD and finite GWAS sample size introduce uncertainty into PRS estimation.
Fig. 2: Framework for extracting uncertainty from Bayesian methods for probabilistic PRS-based stratification.
Fig. 3: Expected \(\mathrm{s.d.}(\widehat{\mathrm{PRS}}_i)\) estimated as a function of heritability, polygenicity and training GWAS sample size is highly correlated with average \(\mathrm{s.d.}(\widehat{\mathrm{PRS}}_i)\) across testing individuals.
Fig. 4: Genetic architecture (polygenicity, \(p_{\mathrm{causal}}\); SNP heritability, \(h_g^2\)) and GWAS sample size impact uncertainty in PRS estimates in simulations.
Fig. 5: Uncertainty in real data and its influence on PRS-based stratification.
Fig. 6: Stratification uncertainty at different thresholds t and credible set levels ρ.

Data availability

The individual-level genotype and phenotype data are available by application from the UKBB at http://www.ukbiobank.ac.uk.

Code availability

LDpred2 software implementing individual PRS credible intervals: https://privefl.github.io/bigsnpr/articles/prs_uncertainty.html

Scripts for simulations and real data analyses (ref. 67): https://github.com/bogdanlab/prs-uncertainty

The scripts have been archived on Zenodo: https://doi.org/10.5281/zenodo.5527263

References

  1. Torkamani, A., Wineinger, N. E. & Topol, E. J. The personal and clinical utility of polygenic risk scores. Nat. Rev. Genet. 19, 581–590 (2018).

  2. Li, R., Chen, Y., Ritchie, M. D. & Moore, J. H. Electronic health records and polygenic risk scores for predicting disease risk. Nat. Rev. Genet. 21, 493–502 (2020).

  3. Chatterjee, N., Shi, J. & García-Closas, M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat. Rev. Genet. 17, 392–406 (2016).

  4. Sugrue, L. P. & Desikan, R. S. What are polygenic scores and why are they important? JAMA 321, 1820–1821 (2019).

  5. Natarajan, P. et al. Polygenic risk score identifies subgroup with higher burden of atherosclerosis and greater relative benefit from statin therapy in the primary prevention setting. Circulation 135, 2091–2101 (2017).

  6. Lee, A. et al. BOADICEA: a comprehensive breast cancer risk prediction model incorporating genetic and nongenetic risk factors. Genet. Med. 21, 1708–1718 (2019).

  7. Khera, A. V. et al. Polygenic prediction of weight and obesity trajectories from birth to adulthood. Cell 177, 587–596.e9 (2019).

  8. Hindy, G. et al. Genome-wide polygenic score, clinical risk factors, and long-term trajectories of coronary artery disease. Arterioscler. Thromb. Vasc. Biol. 40, 2738–2746 (2020).

  9. Wray, N. R. et al. Research review: polygenic methods and their application to psychiatric traits. J. Child Psychol. Psychiatry 55, 1068–1087 (2014).

  10. Fritsche, L. G. et al. Association of polygenic risk scores for multiple cancers in a phenome-wide study: results from the Michigan Genomics Initiative. Am. J. Hum. Genet. 102, 1048–1061 (2018).

  11. Lambert, S. A., Abraham, G. & Inouye, M. Towards clinical utility of polygenic risk scores. Hum. Mol. Genet. 28, R133–R142 (2019).

  12. Meisner, A. et al. Combined utility of 25 disease and risk factor polygenic risk scores for stratifying risk of all-cause mortality. Am. J. Hum. Genet. 107, 418–431 (2020).

  13. Mavaddat, N. et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am. J. Hum. Genet. 104, 21–34 (2019).

  14. Seibert, T. M. et al. Polygenic hazard score to guide screening for aggressive prostate cancer: development and validation in large scale cohorts. Brit. Med. J. 360, j5757 (2018).

  15. Dai, J. et al. Identification of risk loci and a polygenic risk score for lung cancer: a large-scale prospective cohort study in Chinese populations. Lancet Respir. Med. 7, 881–891 (2019).

  16. Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018).

  17. Harrison, J. W. et al. Type 1 diabetes genetic risk score is discriminative of diabetes in non-Europeans: evidence from a study in India. Sci. Rep. 10, 9450 (2020).

  18. Läll, K., Mägi, R., Morris, A., Metspalu, A. & Fischer, K. Personalized risk prediction for type 2 diabetes: the potential of genetic risk scores. Genet. Med. 19, 322–329 (2017).

  19. Zhang, Q. et al. Risk prediction of late-onset Alzheimer’s disease implies an oligogenic architecture. Nat. Commun. 11, 4799 (2020).

  20. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).

  21. Choi, S. W., Mak, T. S.-H. & O’Reilly, P. F. Tutorial: a guide to performing polygenic risk score analyses. Nat. Protoc. 15, 2759–2772 (2020).

  22. Mak, T. S. H., Porsch, R. M., Choi, S. W., Zhou, X. & Sham, P. C. Polygenic scores via penalized regression on summary statistics. Genet. Epidemiol. 41, 469–480 (2017).

  23. Speed, D. & Balding, D. J. MultiBLUP: improved SNP-based prediction for complex traits. Genome Res. 24, 1550–1557 (2014).

  24. Privé, F., Arbel, J. & Vilhjálmsson, B. J. LDpred2: better, faster, stronger. Bioinformatics 36, 5424–5431 (2020).

  25. Moser, G. et al. Simultaneous discovery, estimation and prediction analysis of complex traits using a Bayesian mixture model. PLoS Genet. 11, e1004969 (2015).

  26. Vilhjálmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 97, 576–592 (2015).

  27. Lloyd-Jones, L. R. et al. Improved polygenic prediction by Bayesian multiple regression on summary statistics. Nat. Commun. 10, 5086 (2019).

  28. Udler, M. S., Tyrer, J. & Easton, D. F. Evaluating the power to discriminate between highly correlated SNPs in genetic association studies. Genet. Epidemiol. 34, 463–468 (2010).

  29. Schaid, D. J., Chen, W. & Larson, N. B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet. 19, 491–504 (2018).

  30. Lynch, M. & Walsh, B. Genetics and Analysis of Quantitative Traits. (Oxford Univ. Press, 1998).

  31. Sorenson, D. & Gianola, D. Likelihood, Bayesian and MCMC Methods in Genetics. (Springer, 2002).

  32. Gorjanc, G., Bijma, P. & Hickey, J. M. Reliability of pedigree-based and genomic evaluations in selected populations. Genet. Sel. Evol. 47, 65 (2015).

  33. Henderson, C. R. Best linear unbiased estimation and prediction under a selection model. Biometrics 31, 423–447 (1975).

  34. Su, G., Guldbrandtsen, B., Gregersen, V. R. & Lund, M. S. Preliminary investigation on reliability of genomic estimated breeding values in the Danish Holstein population. J. Dairy Sci. 93, 1175–1183 (2010).

  35. Misztal, I. & Wiggans, G. R. Approximation of prediction error variance in large-scale animal models. J. Dairy Sci. 71, 27–32 (1988).

  36. Meyer, K. Approximate accuracy of genetic evaluation under an animal model. Livest. Prod. Sci. 21, 87–100 (1989).

  37. Jamrozik, J., Schaeffer, L. R. & Jansen, G. B. Approximate accuracies of prediction from random regression models. Livest. Prod. Sci. 66, 85–92 (2000).

  38. Tier, B. & Meyer, K. Approximating prediction error covariances among additive genetic effects within animals in multiple-trait and random regression models. J. Anim. Breed. Genet. 121, 77–89 (2004).

  39. Hickey, J. M., Veerkamp, R. F., Calus, M. P. L., Mulder, H. A. & Thompson, R. Estimation of prediction error variances via Monte Carlo sampling methods using different formulations of the prediction error variance. Genet. Sel. Evol. 41, 23 (2009).

  40. Klau, S., Martin-Magniette, M.-L., Boulesteix, A.-L. & Hoffmann, S. Sampling uncertainty versus method uncertainty: a general framework with applications to omics biomarker selection. Biom. J. 62, 670–687 (2020).

  41. Bycott, P. & Taylor, J. A comparison of smoothing techniques for CD4 data measured with error in a time-dependent Cox proportional hazards model. Stat. Med. 17, 2061–2077 (1998).

  42. Hart, J. E. et al. The association of long-term exposure to PM2.5 on all-cause mortality in the Nurses’ Health Study and the impact of measurement-error correction. Environ. Health 14, 38 (2015).

  43. Wray, N. R. et al. Pitfalls of predicting complex traits from SNPs. Nat. Rev. Genet. 14, 507–515 (2013).

  44. Grinde, K. E. et al. Generalizing polygenic risk scores from Europeans to Hispanics/Latinos. Genet. Epidemiol. 43, 50–62 (2019).

  45. Zeng, J. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 50, 746–753 (2018).

  46. Faraway, J. J. Practical Regression and ANOVA Using R (University of Bath, 2002).

  47. Dudbridge, F. Criteria for evaluating risk prediction of multiple outcomes. Stat. Methods Med. Res. 29, 3492–3510 (2020).

  48. Kerr, K. F. et al. Net reclassification indices for evaluating risk prediction instruments. Epidemiology 25, 114–121 (2014).

  49. Cox, D. R. Regression models and life-tables. J. R. Stat. Soc. Ser. B Stat. Methodol. 34, 187–202 (1972).

  50. Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun. 10, 1776 (2019).

  51. Hu, Y. et al. Leveraging functional annotations in genetic risk prediction for human complex diseases. PLoS Comput. Biol. 13, e1005589 (2017).

  52. Choi, S. W. & O’Reilly, P. F. PRSice-2: Polygenic Risk Score software for biobank-scale data. GigaScience 8, giz082 (2019).

  53. Kuchenbaecker, K. B. et al. Evaluation of polygenic risk scores for breast and ovarian cancer risk prediction in BRCA1 and BRCA2 mutation carriers. J. Natl. Cancer Inst. 109, djw302 (2017).

  54. Fahed, A. C. et al. Polygenic background modifies penetrance of monogenic variants for tier 1 genomic conditions. Nat. Commun. 11, 3635 (2020).

  55. Pazokitoroudi, A., Chiu, A. M., Burch, K. S., Pasaniuc, B. & Sankararaman, S. Quantifying the contribution of dominance effects to complex trait variation in biobank-scale data. Cold Spring Harbor Lab. https://doi.org/10.1101/2020.11.10.376897 (2020).

  56. Hivert, V. et al. Estimation of non-additive genetic variance in human complex traits from a large sample of unrelated individuals. Am. J. Hum. Genet. 108, 786–798 (2021).

  57. Dahl, A. et al. A robust method uncovers significant context-specific heritability in diverse complex traits. Am. J. Hum. Genet. 106, 71–91 (2020).

  58. Wang, H. et al. Genotype-by-environment interactions inferred from genetic effects on phenotypic variability in the UK Biobank. Sci. Adv. 5, eaaw3538 (2019).

  59. Wojcik, G. L. et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature 570, 514–518 (2019).

  60. Wang, Y. et al. Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations. Nat. Commun. 11, 3865 (2020).

  61. Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).

  62. Martin, A. R. et al. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet. 100, 635–649 (2017).

  63. Vaart, A. W. van der. Asymptotic Statistics. (Cambridge Univ. Press, 1998).

  64. Efron, B. & Tibshirani, R. J. An Introduction to the Bootstrap. (Chapman & Hall/CRC, 1994).

  65. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

  66. Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).

  67. Ding, Y. bogdanlab/prs-uncertainty. R package version 0.1 https://doi.org/10.5281/zenodo.5527263 (2021).

Acknowledgements

This research was conducted using the UKBB Resource under application 33297. We thank the participants of UKBB for making this work possible. This work was funded in part by National Institutes of Health (NIH) awards (nos. R01HG009120, R01MH115676 and R01HG006399, all to B.P.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Author information

Contributions

Y.D., K.H. and B.P. conceived and designed the experiments. Y.D., K.H. and S.L. performed the experiments and statistical analyses. F.P., B.V. and S.S. provided statistical support. K.H. and K.S.B. collected and managed the data. Y.D., K.H., K.S.B. and B.P. wrote the manuscript with the participation of all authors.

Corresponding authors

Correspondence to Yi Ding, Kangcheng Hou or Bogdan Pasaniuc.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Genetics thanks Jian Zeng, Vincent Plagnol and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 GWAS sample size and causal effect size impact the relative ordering of marginal GWAS effects at tag versus true causal SNPs.

We simulated a GWAS of N individuals (\(\mathbf{X}_{N \times 3}\)) for 3 SNPs with LD structure R (SNP2 and SNP3 are in LD of 0.9, whereas SNP1 is uncorrelated with the other SNPs), where SNP1 and SNP2 are causal with the same effect size, \(\boldsymbol{\beta}_c = (\beta, \beta, 0)\), such that the variance explained by this region is \(\mathrm{var}(\mathbf{X}\boldsymbol{\beta}_c) = 0.5/m_{\mathrm{causal}}\), corresponding to a trait with total heritability of 0.5 equally distributed across \(m_{\mathrm{causal}}\) regions in the genome. For each parameter setting we quantified the proportion of times the marginal GWAS effect at SNP3 (tag SNP) is larger than the observed marginal effect at SNP2 (true causal) across 1,000 randomly drawn GWASs. To explore the impact of different causal effect sizes, we varied \(m_{\mathrm{causal}}\) from 1,000 to 10,000 causal regions in the genome.
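A rough sketch of this kind of experiment is given below. It does not simulate genotypes; instead it draws the marginal estimates at SNP2 and SNP3 directly from their approximate sampling distribution, with mean \(R\boldsymbol{\beta}_c\) and covariance \(R/N\). This shortcut assumes standardized genotypes and a residual variance close to 1, and it compares signed estimates. The function name and defaults are hypothetical, and the approach is a stand-in rather than the authors' simulation code.

```python
import math
import random


def prop_tag_beats_causal(n, m_causal, r=0.9, h2=0.5, reps=1000, seed=0):
    """Fraction of simulated GWASs in which the tag SNP (SNP3) has a larger
    marginal estimate than the causal SNP it tags (SNP2)."""
    rng = random.Random(seed)
    # var(X beta_c) = h2 / m_causal, split over two uncorrelated causal SNPs.
    beta = math.sqrt(h2 / m_causal / 2.0)
    mean2, mean3 = beta, r * beta      # expected marginal effects (R beta_c)
    se = math.sqrt(1.0 / n)            # assumes residual variance of about 1
    wins = 0
    for _ in range(reps):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        hat2 = mean2 + se * e1
        # Noise at SNP3 is correlated with noise at SNP2 through LD r.
        hat3 = mean3 + se * (r * e1 + math.sqrt(1.0 - r * r) * e2)
        wins += hat3 > hat2
    return wins / reps


small_gwas = prop_tag_beats_causal(n=1_000, m_causal=1_000)
large_gwas = prop_tag_beats_causal(n=1_000_000, m_causal=1_000)
```

Under these assumptions the tag SNP outranks the causal SNP in a large share of small GWASs and almost never in a very large one, which matches the qualitative message of the figure title.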

Extended Data Fig. 2 Analytical estimator of \(\mathrm{sd}(\widehat{\mathrm{PRS}}_i)\) provides approximately unbiased estimates of the average \(\mathrm{sd}(\widehat{\mathrm{PRS}}_i)\) of testing individuals.

The x-axis is the average \(\mathrm{sd}(\widehat{\mathrm{PRS}}_i)\) in testing individuals within each simulation replicate. The y-axis is the expected \(\mathrm{sd}(\widehat{\mathrm{PRS}}_i)\) computed with equation (1), replacing M and \(h_g^2\) with estimates of the number of causal variants and SNP heritability, respectively, from LDpred2. Each dot is an average of 10 simulation replicates for each \(p_{\mathrm{causal}} \in \{0.001, 0.01, 0.1, 1\}\). The horizontal whiskers represent ±1.96 standard deviations of the average \(\mathrm{sd}(\widehat{\mathrm{PRS}}_i)\). The vertical whiskers represent ±1.96 standard deviations of the expected \(\mathrm{sd}(\widehat{\mathrm{PRS}}_i)\). Note that when \(p_{\mathrm{causal}} = 1\), the independent LD assumption is violated but the analytical form still provides approximately unbiased estimates. When \(p_{\mathrm{causal}} \ne 1\), the infinitesimal assumption is violated, leading to downward bias in the analytical estimator. In these scenarios, since we simply replace M with \(M \times p_{\mathrm{causal}}\), the uncertainty in identifying the causal variants is ignored by equation (1).

Extended Data Fig. 3 Calibration of ρ-level genetic value credible interval with respect to proportion of causal effects and SNP-heritability in testing individuals.

Each row of panels corresponds to one heritability parameter \(h_g^2 \in \{0.05, 0.1, 0.25, 0.5, 0.8\}\) and each column of panels corresponds to one polygenicity parameter \(p_{\mathrm{causal}} \in \{0.001, 0.01, 0.1, 1\}\). The x-axis is the expected coverage of the ρ-GV CI (ρ). The y-axis is the empirical coverage, calculated as the proportion of ρ-GV CIs that contain the true genetic value for one simulation repeat. The dots and error bars are mean ±1.96 s.e.m. of the empirical coverage calculated from 10 simulation repeats.
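The calibration check described in this caption can be sketched on a toy model in which the exact posterior is known. The example assumes a conjugate normal setup (true value drawn from N(0, 1), observed with Gaussian noise), which is a deliberate simplification and not the LDpred2 posterior; all names and defaults are hypothetical.

```python
import math
import random


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def empirical_coverage(rho, n_individuals=2000, n_draws=200,
                       noise_var=1.0, seed=0):
    """Share of rho-level credible intervals that contain the true value."""
    rng = random.Random(seed)
    shrink = 1.0 / (1.0 + noise_var)          # posterior mean = shrink * y
    post_sd = math.sqrt(noise_var * shrink)   # posterior s.d. of g given y
    tail = (1.0 - rho) / 2.0
    hits = 0
    for _ in range(n_individuals):
        g = rng.gauss(0.0, 1.0)                        # true genetic value
        y = g + rng.gauss(0.0, math.sqrt(noise_var))   # noisy observation
        draws = sorted(rng.gauss(shrink * y, post_sd) for _ in range(n_draws))
        lo, hi = quantile(draws, tail), quantile(draws, 1.0 - tail)
        hits += lo <= g <= hi
    return hits / n_individuals


coverage_90 = empirical_coverage(0.90)
```

When the posterior is correct, the empirical coverage should track ρ closely; plotting it against ρ for several levels gives a calibration curve of the kind shown in the figure.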

Extended Data Fig. 4 Distribution of individual PRS absolute standard deviation with respect to polygenicity under different heritability.

Each panel represents simulation with one \(h_g^2\) from \(\left\{ {0.05,0.1,0.25,0.5,0.8} \right\}\). The x-axis is four polygenicity parameters (\(p_{causal} \in \left\{ {0.0001,0.01,0.1,1} \right\}\)). The y-axis is standard deviation in PRS estimation of an individual. Each violin plot represents 21,273 testing individuals across 10 simulations (212,730 values).

Extended Data Fig. 5 Distribution of individual PRS absolute standard deviation with respect to heritability under different polygenicity.

Each panel represents simulation with one polygenicity from \(\left\{ {0.001,0.01,0.1,1} \right\}\). The x-axis is five heritability parameters (\(h_g^2 \in \left\{ {0.05,0.1,0.25,0.5,0.8} \right\}\)). The y-axis is scaled standard deviation in PRS estimation of an individual. Each violin plot represents 21,273 testing individuals across 10 simulations (212,730 values).

Extended Data Fig. 6 Posterior distribution of genetic value is mis-calibrated when causal variants are partially absent in the SNP panel used for PRS training.

For all panels, we performed simulation based on 124,080 SNPs (a union of 36,987 UK Biobank (UKBB) array SNPs and 93,767 HapMap3 SNPs) on chromosome 2. We trained the PRS model on either the HapMap3 + UKBB SNPs (all causal variants are observed in the training data) or UKBB SNP panel (~70% of causal variants are excluded). (a) Calibration of ρ-level genetic value credible interval. The x-axis is the expected coverage of ρ-GV CI (that is ρ). The y-axis is the empirical coverage calculated as the proportion of GV CIs that contain the true genetic value in one simulation replicate. (b) Calibration of ρ-level rank credible interval. The x-axis is the expected coverage of the rank CI (ρ). The y-axis is the empirical coverage calculated as the proportion of ρ-rank CIs that contain the true rank of individual among testing individuals in one simulation replicate. (c) Calibration of probability of GV above threshold t. The x-axis is the expected probability set as middle of each bin. The y-axis is the empirical probability calculated as the proportion of individuals having GV within the lower and upper bound of the bin of one simulation replicate. Different colors represent different prespecified thresholds. (d) Distribution of individual PRS scaled standard deviation. For (a-c), the dots and error bars are mean ±1.96 s.e.m empirical coverage/probability calculated from 10 simulation replicates. For (d), the boxplot center line is the median; the lower and upper hinges correspond to the first and third quartiles, and boxplot whiskers extend to the minimum and maximum estimates located within 1.5 × interquartile range (IQR) from the first and third quartiles, respectively.

Extended Data Fig. 7 Posterior distribution of genetic value is well-calibrated for mixture of normal effect size distribution.

Each column summarizes results for each of the three genetic architectures. Small effects are simulated under \(p_{\mathrm{causal}} = 0.01\), \(h_g^2 = 0.02\); large effects are simulated under \(p_{\mathrm{causal}} = 0.001\), \(h_g^2 = 0.02\); Mixture refers to a half and half mixture of the two simulations (small effects: \(p_{\mathrm{causal}} = 0.0005\), \(h_g^2 = 0.01\); large effects: \(p_{\mathrm{causal}} = 0.005\), \(h_g^2 = 0.01\)). (a) Calibration of ρ-level genetic value credible intervals. (b) Calibration of ρ-level rank credible intervals. (c) Calibration of probability of GV above threshold t. (d) Distribution of individual PRS standard deviations. See Extended Data Fig. 6 for a detailed figure description.

Extended Data Fig. 8 Posterior distribution of genetic value is well-calibrated with external LD.

Each column summarizes the calibration and uncertainty of PRS trained with LD computed from four different cohorts: I. 250K UKBB training individuals; II. 2K held-out UKBB individuals; III. 1K held-out UKBB individuals. (a) Calibration of ρ-level genetic value credible intervals. (b) Calibration of ρ-level rank credible intervals. (c) Calibration of probability of GV above threshold t. (d) Distribution of individual PRS scaled standard deviation. See Extended Data Fig. 6 for a detailed figure description (\(h_g^2 = 0.02\), \(p_{\mathrm{causal}} = 0.01\)).

Extended Data Fig. 9 Calibration of ρ-level rank credible interval with respect to proportion of causal effects and SNP-heritability in testing individuals.

The x-axis is the expected coverage of ρ-Rank CI. The y-axis is the empirical coverage calculated as the proportion of ρ-Rank CIs that contain the true rank of individual among testing individuals for one simulation. The dots and bars are mean ± 1.96 s.e.m of empirical coverage calculated from 10 simulation repeats.
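One way to obtain a ρ-level rank credible interval from posterior draws is sketched below: rank all testing individuals within each draw, then take equal-tailed quantiles of each individual's ranks across draws. This is an illustrative reading of the caption under the assumption that draws are aligned across individuals (draw b for every individual comes from the same posterior sample of effects); the function names are hypothetical.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def rank_credible_intervals(posterior, level=0.95):
    """posterior[i][b] is the b-th posterior draw of individual i's genetic
    value. Returns one (low, high) rank interval per individual, where
    rank 0 is the smallest genetic value."""
    n, n_draws = len(posterior), len(posterior[0])
    ranks = [[0] * n_draws for _ in range(n)]
    for b in range(n_draws):
        # Rank individuals within this single posterior draw.
        order = sorted(range(n), key=lambda i: posterior[i][b])
        for rank, i in enumerate(order):
            ranks[i][b] = rank
    tail = (1.0 - level) / 2.0
    intervals = []
    for r in ranks:
        s = sorted(r)
        intervals.append((quantile(s, tail), quantile(s, 1.0 - tail)))
    return intervals


# Individuals 0 and 1 swap order between draws; individual 2 is always last.
toy = [[0.0, 3.0], [1.0, 2.0], [10.0, 11.0]]
toy_intervals = rank_credible_intervals(toy)
```

Empirical coverage of these intervals can then be computed as the share of individuals whose true rank falls inside the interval, as the caption describes.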

Extended Data Fig. 10 Individual ranking is consistent when ranking by PRS estimates versus probability of genetic value above threshold.

The x-axis is the PRS estimate of each testing individual and the y-axis is the probability that the GV is above threshold t, where t is (arbitrarily) set to the 90th percentile in the testing individuals. For individuals whose PRS estimates are far below or far above the threshold, the probability is 0 or 1, respectively. For individuals close to the stratification threshold, the probability that the GV exceeds the threshold increases as the PRS estimate increases. The histogram on the x-axis is the distribution of PRS estimates in testing individuals and the histogram on the y-axis is the distribution of this probability in testing individuals.
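The probability on the y-axis can be approximated as the share of posterior draws that exceed the threshold. The snippet below is a minimal sketch under that assumption; `prob_above` and the toy draws are hypothetical and only illustrate the three regimes the caption describes.

```python
def prob_above(posterior, t):
    """posterior[i] holds posterior draws of individual i's genetic value.
    Returns, per individual, the fraction of draws strictly above t."""
    return [sum(g > t for g in draws) / len(draws) for draws in posterior]


toy_draws = [
    [-3.0, -2.5, -2.0],        # far below the threshold
    [0.9, 1.0, 1.1, 1.2],      # straddles the threshold
    [4.0, 5.0, 6.0],           # far above the threshold
]
probs = prob_above(toy_draws, t=1.0)   # -> [0.0, 0.5, 1.0]
```

Because the probability is monotone in the point estimate for individuals with similar posterior spread, ranking by either quantity gives a consistent ordering, which is what the figure title states.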

About this article

Cite this article

Ding, Y., Hou, K., Burch, K.S. et al. Large uncertainty in individual polygenic risk score estimation impacts PRS-based risk stratification. Nat Genet 54, 30–39 (2022). https://doi.org/10.1038/s41588-021-00961-5

  • DOI: https://doi.org/10.1038/s41588-021-00961-5
