Abstract
Short tandem repeats (STRs) have long been studied for possible roles in biological phenomena and are used in multiple applications, such as forensics, evolutionary studies and pre-implantation genetic testing (PGT). The two reference genomes most used by clinicians and researchers are GRCh37/hg19 and GRCh38/hg38, both constructed mainly using short-read sequencing (SRS), in which reads consisting entirely of STR sequence cannot be assembled into the reference genome. With the introduction of long-read sequencing (LRS) methods and the generation of the CHM13 reference genome, also known as T2T, many previously unmapped STRs were finally localized within the human genome. We generated STRavinsky, a compact STR database for three reference genomes, including T2T. We then demonstrated the advantages of T2T over hg19 and hg38, identifying nearly double the number of STRs throughout all chromosomes. Using STRavinsky, which provides resolution down to a specific genomic coordinate, we demonstrated an extreme propensity of TGGAA repeats in the p arms of acrocentric chromosomes, substantially corroborating early molecular studies suggesting a possible role in the formation of Robertsonian translocations. Moreover, we delineated a unique propensity of TGGAA repeats specifically at 16q11.2 and 9q12. Finally, we harnessed the superior capabilities of T2T and STRavinsky to generate PGTailor, a novel web application that dramatically facilitates the design of STR-based PGT tests, within mere minutes.
Data availability
Code for the generation of STRavinsky is available in the GitHub repository: https://github.com/Noam-Hadar/STRavinsky. PGTailor is available at: https://fohs.bgu.ac.il/birklab/PGTailor.
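To make the idea of coordinate-level STR records concrete, the following is a minimal illustrative sketch, not the STRavinsky pipeline itself (see the repository above for that). It assumes a simple regular-expression search for perfect tandem runs of one motif; the names `RepeatRun` and `find_motif_runs`, the `min_copies` threshold and the toy sequence are all hypothetical and chosen only for illustration.

```python
import re
from typing import Iterator, NamedTuple


class RepeatRun(NamedTuple):
    """One perfect tandem run of a motif (hypothetical record layout)."""
    start: int   # 1-based inclusive coordinate of the first base
    end: int     # 1-based inclusive coordinate of the last base
    motif: str
    copies: int


def find_motif_runs(sequence: str, motif: str = "TGGAA",
                    min_copies: int = 3) -> Iterator[RepeatRun]:
    """Yield perfect tandem runs of `motif` with at least `min_copies` copies."""
    # Example pattern for the defaults: (?:TGGAA){3,}
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_copies},}}")
    for match in pattern.finditer(sequence.upper()):
        copies = (match.end() - match.start()) // len(motif)
        # Convert Python's 0-based half-open span to 1-based inclusive coordinates.
        yield RepeatRun(match.start() + 1, match.end(), motif, copies)


# Toy sequence (assumption, not real genomic data): two runs of TGGAA
# separated by a stretch that contains only a single, isolated copy.
toy = "ACGT" + "TGGAA" * 4 + "CCTGGAATT" + "TGGAA" * 3 + "G"
runs = list(find_motif_runs(toy))
for run in runs:
    print(run)
```

A real STR catalogue would also have to handle imperfect repeats, reverse-complement motifs and chromosome-scale input, none of which this sketch attempts.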
Funding
The study was funded by the Morris Kahn Family Foundation, the Israel Science Foundation (Grant no. 2034/18) awarded to OSB, and the National Knowledge Center for Rare/Orphan Diseases of the Israel Ministry of Science, Technology and Space, at Ben-Gurion University of the Negev and Soroka Medical Center, Beer-Sheva, Israel.
Author information
Contributions
The manuscript was written by NH with the assistance of OSB. NH planned and performed the bioinformatic analysis and developed the software. GN and SA defined the software requirements specification for PGTailor. MV and GCP performed quality assurance for PGTailor. AS assisted in the deployment of STRavinsky and PGTailor. The study was supervised by OSB.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
The study required no ethical approval, as no personal individual data were used.
About this article
Cite this article
Hadar, N., Narkis, G., Amar, S. et al. STRavinsky STR database and PGTailor PGT tool demonstrate superiority of CHM13-T2T over hg38 and hg19 for STR-based applications. Eur J Hum Genet 31, 738–743 (2023). https://doi.org/10.1038/s41431-023-01352-6
DOI: https://doi.org/10.1038/s41431-023-01352-6