STRavinsky STR database and PGTailor PGT tool demonstrate superiority of CHM13-T2T over hg38 and hg19 for STR-based applications

Abstract

Short tandem repeats (STRs) have long been studied for their possible roles in biological phenomena and are used in multiple applications, such as forensics, evolutionary studies and preimplantation genetic testing (PGT). The two reference genomes most used by clinicians and researchers are GRCh37/hg19 and GRCh38/hg38, both constructed mainly from short-read sequencing (SRS), in which all-STR-containing reads cannot be assembled to the reference genome. With the introduction of long-read sequencing (LRS) methods and the generation of the CHM13 reference genome, also known as T2T, many previously unmapped STRs were finally localized within the human genome. We generated STRavinsky, a compact STR database for three reference genomes, including T2T. Using it, we demonstrated the advantages of T2T over hg19 and hg38, identifying nearly double the number of STRs throughout all chromosomes. Because STRavinsky provides resolution down to a specific genomic coordinate, we were able to demonstrate an extreme abundance of TGGAA repeats in the p arms of acrocentric chromosomes, substantially corroborating early molecular studies that suggested a possible role for these repeats in the formation of Robertsonian translocations. Moreover, we delineated a distinctive abundance of TGGAA repeats specifically at chromosome 16q11.2 and at 9q12. Finally, we harnessed the superior capabilities of T2T and STRavinsky to generate PGTailor, a novel web application that greatly facilitates the design of STR-based PGT tests, in mere minutes.
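To make the idea of locating a repeat such as TGGAA at a specific coordinate concrete, the following is a minimal Python sketch. It is not the STRavinsky pipeline, whose implementation is not described in this preview. The function name `find_motif_repeats`, the `RepeatHit` record, the `min_copies` threshold and the toy sequence are all hypothetical and chosen only for illustration.

```python
import re
from typing import Iterator, NamedTuple


class RepeatHit(NamedTuple):
    """One uninterrupted run of a motif (hypothetical record type)."""
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    motif: str
    copies: int


def find_motif_repeats(sequence: str, motif: str,
                       min_copies: int = 3) -> Iterator[RepeatHit]:
    """Yield runs of `motif` repeated back to back at least `min_copies` times.

    Assumption: only perfect, forward-strand repeats are reported.
    """
    # Builds a pattern such as (?:TGGAA){3,} for motif="TGGAA", min_copies=3.
    pattern = re.compile(
        f"(?:{re.escape(motif)}){{{min_copies},}}", re.IGNORECASE
    )
    for match in pattern.finditer(sequence):
        copies = (match.end() - match.start()) // len(motif)
        yield RepeatHit(match.start(), match.end(), motif.upper(), copies)


# Toy sequence (invented): one run of four copies, then a run of only two.
toy_sequence = "ACGT" + "TGGAA" * 4 + "CCTA" + "TGGAA" * 2 + "G"
hits = list(find_motif_repeats(toy_sequence, "TGGAA", min_copies=3))
for hit in hits:
    print(hit)
```

On the toy sequence only the four-copy run passes the threshold. A real STR catalogue would also have to handle imperfect repeats, the reverse-complement motif and whole-chromosome input, none of which this sketch attempts.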

Fig. 1: Comparison of short-tandem-repeat amounts between hg19, hg38 and T2T reference genomes.
Fig. 2: TGGAA short-tandem-repeats in human acrocentric chromosomes are orders of magnitude more abundant in T2T than in hg38.
Fig. 3: TGGAA short-tandem-repeats of acrocentric chromosomes in T2T.
Fig. 4: TGGAA short-tandem-repeats of chromosomes 9 and 16 in T2T and hg38.
Fig. 5: PGTailor user interface.

Data availability

Code for the generation of STRavinsky is available in the GitHub repository: https://github.com/Noam-Hadar/STRavinsky. PGTailor is available at: https://fohs.bgu.ac.il/birklab/PGTailor.

Funding

The study was funded by the Morris Kahn Family Foundation, the Israel Science Foundation (Grant no. 2034/18) awarded to OSB, and the National Knowledge Center for Rare/Orphan Diseases of the Israel Ministry of Science, Technology and Space, at Ben-Gurion University of the Negev and Soroka Medical Center, Beer-Sheva, Israel.

Author information

Contributions

The manuscript was written by NH with the assistance of OSB. NH planned and performed bioinformatic analysis. Software development was done by NH. GN and SA defined software requirements specification for PGTailor. MV and GCP performed quality assurance for PGTailor. AS assisted in the deployment of STRavinsky and PGTailor. The study was supervised by OSB.

Corresponding author

Correspondence to Ohad S. Birk.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

The study required no ethical approval, as no personal individual data were used.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hadar, N., Narkis, G., Amar, S. et al. STRavinsky STR database and PGTailor PGT tool demonstrate superiority of CHM13-T2T over hg38 and hg19 for STR-based applications. Eur J Hum Genet 31, 738–743 (2023). https://doi.org/10.1038/s41431-023-01352-6

  • DOI: https://doi.org/10.1038/s41431-023-01352-6
