Skip to main content

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • Article
  • Published:

Gene selection by incorporating genetic networks into case-control association studies

Abstract

Large-scale genome-wide association studies (GWAS) have been successfully applied to a wide range of genetic variants underlying complex diseases. The network-based regression approach has been developed to incorporate a biological genetic network and to overcome the challenges caused by the computational efficiency for analyzing high-dimensional genomic data. In this paper, we propose a gene selection approach by incorporating genetic networks into case-control association studies for DNA sequence data or DNA methylation data. Instead of using traditional dimension reduction techniques such as principal component analyses and supervised principal component analyses, we use a linear combination of genotypes at SNPs or methylation values at CpG sites in a gene to capture gene-level signals. We employ three linear combination approaches: optimally weighted sum (OWS), beta-based weighted sum (BWS), and LD-adjusted polygenic risk score (LD-PRS). OWS and LD-PRS are supervised approaches that depend on the effect of each SNP or CpG site on the case-control status, while BWS can be extracted without using the case-control status. After using one of the linear combinations of genotypes or methylation values in each gene to capture gene-level signals, we regularize them to perform gene selection based on the biological network. Simulation studies show that the proposed approaches have higher true positive rates than using traditional dimension reduction techniques. We also apply our approaches to DNA methylation data and UK Biobank DNA sequence data for analyzing rheumatoid arthritis. The results show that the proposed methods can select potentially rheumatoid arthritis related genes that are missed by existing methods.

This is a preview of subscription content, access via your institution

Access options

Buy this article

Prices may be subject to local taxes which are calculated during checkout

Fig. 1: The true positive rates of the methods based on different gene-level signals for balance case-control studies with DNA sequence data in scenario 1, where there are five rare variants and five common variants in each gene.
Fig. 2: The true positive rates of the methods based on different gene-level signals for balance case-control studies with DNA methylation data in scenario 1.
Fig. 3: Venn diagrams of genes identified by BWS, LD-PRS, OWS, and nPC for DNA methylation data and DNA sequence data in the real data analyses.

Similar content being viewed by others

Data availability

All data analyzed during this study are included in this published article and its supplemental materials.

References

  1. Ritchie MD. Large-scale analysis of genetic and clinical patient data. Annual Review of Biomedical Data. Science. 2018;1:263–74.

    Google Scholar 

  2. Li R, Duan R, Kember RL, Rader DJ, Damrauer SM, Moore JH, et al. A regression framework to uncover pleiotropy in large-scale electronic health record data. J Am Med Inform Assoc. 2019;26:1083–90.

    Article  PubMed  PubMed Central  Google Scholar 

  3. Wang DG, Fan J-B, Siao C-J, Berno A, Young P, Sapolsky R, et al. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science. 1998;280:1077–82.

    Article  ADS  CAS  PubMed  Google Scholar 

  4. Bock C. Analysing and interpreting DNA methylation data. Nat Rev Genet. 2012;13:705–19.

    Article  CAS  PubMed  Google Scholar 

  5. Waldmann P, Mészáros G, Gredler B, Fuerst C, Sölkner J. Evaluation of the lasso and the elastic net in genome-wide association studies. Front Genet. 2013;4:270.

    Article  PubMed  PubMed Central  Google Scholar 

  6. Wang H, Lengerich BJ, Aragam B, Xing EP. Precision Lasso: accounting for correlations and linear dependencies in high-dimensional genomic data. Bioinformatics. 2019;35:1181–7.

    Article  CAS  PubMed  Google Scholar 

  7. Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. J R Stat Soc: Ser B (Stat Methodol). 2006;68:49–67.

    Article  MathSciNet  Google Scholar 

  8. Meier L, Van De Geer S, Bühlmann P. The group lasso for logistic regression. J R Stat Soc: Ser B (Stat Methodol). 2008;70:53–71.

    Article  MathSciNet  Google Scholar 

  9. Kim K, Sun H. Incorporating genetic networks into case-control association studies with high-dimensional DNA methylation data. BMC Bioinforma. 2019;20:1–15.

    Article  Google Scholar 

  10. Li C, Li H. Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics. 2008;24:1175–82.

    Article  CAS  PubMed  Google Scholar 

  11. Sun H, Wang S. Network‐based regularization for matched case‐control analysis of high‐dimensional DNA methylation data. Stat Med. 2013;32:2127–39.

    Article  MathSciNet  PubMed  Google Scholar 

  12. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet. 2011;89:82–93.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Sha Q, Wang X, Wang X, Zhang S. Detecting association of rare and common variants by testing an optimally weighted combination of variants. Genet Epidemiol. 2012;36:561–71.

    Article  PubMed  Google Scholar 

  14. Yan S, Sha Q, Zhang S. Gene-based association tests using new polygenic risk scores and incorporating gene expression data. Genes. 2022;13:1120.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Baker E, Schmidt KM, Sims R, O’Donovan MC, Williams J, Holmans P, et al. POLARIS: Polygenic LD‐adjusted risk score approach for set‐based analysis of GWAS data. Genet Epidemiol. 2018;42:366–77.

    Article  PubMed  PubMed Central  Google Scholar 

  16. Choi J, Kim K, Sun H. New variable selection strategy for analysis of high-dimensional DNA methylation data. J Bioinforma Computational Biol. 2018;16:1850010.

    Article  Google Scholar 

  17. Sun H, Wang S. Penalized logistic regression for high-dimensional DNA methylation data with case-control studies. Bioinformatics. 2012;28:1368–75.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Meinshausen N, Bühlmann P. Stability selection. J R Stat Soc: Ser B (Stat Methodol). 2010;72:417–73.

    Article  MathSciNet  Google Scholar 

  19. Kuhn M, Johnson K. Applied predictive modeling. Springer; 2013.

  20. Liu Y, Aryee MJ, Padyukov L, Fallin MD, Hesselberg E, Runarsson A, et al. Epigenome-wide association data implicate DNA methylation as an intermediary of genetic risk in rheumatoid arthritis. Nat Biotechnol. 2013;31:142–7.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Kular L, Liu Y, Ruhrmann S, Zheleznyakova G, Marabita F, Gomez-Cabrero D, et al. DNA methylation as a mediator of HLA-DRB1* 15: 01 and a protective variant in multiple sclerosis. Nat Commun. 2018;9:1–15.

    Article  CAS  Google Scholar 

  22. Jiang X, Källberg H, Chen Z, Ärlestig L, Rantapää-Dahlqvist S, Davila S, et al. An Immunochip-based interaction study of contrasting interaction effects with smoking in ACPA-positive versus ACPA-negative rheumatoid arthritis. Rheumatology. 2016;55:149–55.

    Article  CAS  PubMed  Google Scholar 

  23. Traylor M, Knevel R, Cui J, Taylor J, Harm-Jan W, Conaghan PG, et al. Genetic associations with radiological damage in rheumatoid arthritis: Meta-analysis of seven genome-wide association studies of 2,775 cases. PloS One. 2019;14:e0223246.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Eyre S, Bowes J, Diogo D, Lee A, Barton A, Martin P, et al. High-density genetic mapping identifies new susceptibility loci for rheumatoid arthritis. Nat Genet. 2012;44:1336–40.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Govind N, Choudhury A, Hodkinson B, Ickinger C, Frost J, Lee A, et al. Immunochip identifies novel, and replicates known, genetic risk loci for rheumatoid arthritis in black South Africans. Mol Med. 2014;20:341–9.

    Article  PubMed  PubMed Central  Google Scholar 

  26. Plenge RM, Seielstad M, Padyukov L, Lee AT, Remmers EF, Ding B, et al. TRAF1–C5 as a risk locus for rheumatoid arthritis—a genomewide study. N. Engl J Med. 2007;357:1199–209.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Bossini-Castillo L, De Kovel C, Kallberg H, van’t Slot R, Italiaander A, Coenen M, et al. A genome-wide association study of rheumatoid arthritis without antibodies against citrullinated peptides. Ann Rheum Dis. 2015;74:e15–e.

    Article  CAS  PubMed  Google Scholar 

  28. Consortium WTCC. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661.

    Article  Google Scholar 

  29. Wei W-H, Viatte S, Merriman TR, Barton A, Worthington J. Genotypic variability based association identifies novel non-additive loci DHCR7 and IRF4 in sero-negative rheumatoid arthritis. Sci Rep. 2017;7:1–7.

    Google Scholar 

  30. Julia A, Ballina J, Canete JD, Balsa A, Tornero‐Molina J, Naranjo A, et al. Genome‐wide association study of rheumatoid arthritis in the Spanish population: KLF12 as a risk locus for rheumatoid arthritis susceptibility. Arthritis Rheumatism: Off J Am Coll Rheumatol. 2008;58:2275–86.

    Article  Google Scholar 

  31. Negi S, Juyal G, Senapati S, Prasad P, Gupta A, Singh S, et al. A genome‐wide association study reveals ARL15, a novel non‐HLA susceptibility gene for rheumatoid arthritis in North Indians. Arthritis Rheumatism. 2013;65:3026–35.

    Article  CAS  PubMed  Google Scholar 

  32. Aterido A, Cañete JD, Tornero J, Ferrándiz C, Pinto JA, Gratacós J, et al. Genetic variation at the glycosaminoglycan metabolism pathway contributes to the risk of psoriatic arthritis but not psoriasis. Ann Rheum Dis. 2019;78:355–64.

    Article  CAS  Google Scholar 

  33. Kochi Y, Okada Y, Suzuki A, Ikari K, Terao C, Takahashi A, et al. A regulatory variant in CCR6 is associated with rheumatoid arthritis susceptibility. Nat Genet. 2010;42:515–9.

    Article  CAS  PubMed  Google Scholar 

  34. Raychaudhuri S, Remmers EF, Lee AT, Hackett R, Guiducci C, Burtt NP, et al. Common variants at CD40 and other loci confer risk of rheumatoid arthritis. Nat Genet. 2008;40:1216–23.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Weyand CM, Goronzy JJ. Association of MHC and rheumatoid arthritis: HLA polymorphisms in phenotypic variants of rheumatoid arthritis. Arthritis Res Ther. 2000;2:1–5.

    Article  Google Scholar 

  36. Dey R, Schmidt EM, Abecasis GR, Lee S. A fast and accurate algorithm to test for binary phenotypes and its application to PheWAS. Am J Hum Genet. 2017;101:37–49.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321–57.

    Article  Google Scholar 

  38. Huber PJ. Robust estimation of a location parameter. Breakthroughs in statistics: Springer; 1992. p. 492–518.

Download references

Acknowledgements

Part of this research has been conducted using the UK Biobank Resource under application number 41722 and the NHGRI-EBI GWAS Catalog. XC was partially supported by the Michigan Technological University Health Research Institute Fellowship program and the Portage Health Foundation Graduate Assistantship.

Funding

No financial assistance was received in support of the study.

Author information

Authors and Affiliations

Authors

Contributions

Formal analysis: XC; Methodology: XC, SZ, XL, and QS; Data curation: XC and XL; Visualization: XC; Writing original draft: XC, XL, and QS; Writing review and editing: XC, SZ, XL, and QS.

Corresponding author

Correspondence to Qiuying Sha.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval

This study used DNA sequence data from the UK Biobank, which has approval from the North West Multi-centre Research Ethics Committee (MREC) as a Research Tissue Bank (RTB) approval (approval number: 11/NW/0382). No specific ethical approval was required for DNA methylation data in this study, which is downloaded from GEO publicly available database with access number GSE42861.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Cao, X., Liang, X., Zhang, S. et al. Gene selection by incorporating genetic networks into case-control association studies. Eur J Hum Genet 32, 270–277 (2024). https://doi.org/10.1038/s41431-022-01264-x

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1038/s41431-022-01264-x

This article is cited by

Search

Quick links