  • Review Article

Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations

Key Points

  • The ability to identify low-frequency genetic variants among heterogeneous populations of cells or DNA molecules is important in many fields of basic science, clinical medicine and other applications, yet current high-throughput DNA sequencing technologies have an error rate of between 1 per 100 and 1 per 1,000 base pairs sequenced, which obscures variants present below this frequency.

  • As next-generation sequencing technologies evolved over the past decade, throughput has improved markedly, but raw accuracy has remained generally unchanged. Researchers with a need for high accuracy developed data filtering methods and incremental biochemical refinements that modestly improve low-frequency variant detection, but background errors remain limiting in many fields.

  • The most impactful means of reducing errors, first developed approximately 7 years ago, has been the concept of single-molecule consensus sequencing. This entails redundant sequencing of multiple copies of a given DNA molecule and discounting, as likely errors, variants that are not present in all or most of the copies.

  • Consensus sequencing can be achieved either by labelling each molecule with a unique molecular barcode before generating copies, which allows subsequent comparison of these copies, or by schemes whereby copies are physically joined and sequenced together. Because of trade-offs in cost, time and accuracy, no single method is optimal for every application, and each method should be considered on a case-by-case basis.

  • Major applications for high-accuracy DNA sequencing include non-invasive cancer diagnostics, cancer screening, early detection of cancer relapse or impending drug resistance, infectious disease applications, prenatal diagnostics, forensics and mutagenesis assessment.

  • Future advances in ultra-high-accuracy sequencing are likely to be driven by an emerging generation of single-molecule sequencers, particularly those that allow independent sequence comparison of both strands of native DNA duplexes.
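The barcode-based consensus idea summarized in the key points above can be illustrated with a minimal Python sketch. It is not the method of any particular published protocol: the function name `consensus_by_umi`, the thresholds `min_family_size` and `min_agreement`, and the assumption that reads are already grouped by an error-free barcode and aligned to the same coordinates are all hypothetical simplifications for illustration.

```python
from collections import Counter, defaultdict


def consensus_by_umi(reads, min_family_size=3, min_agreement=0.9):
    """Collapse reads that share a molecular barcode into one consensus each.

    reads: iterable of (barcode, sequence) pairs. Sequences within a family
    are assumed to be aligned to the same start position (an assumption of
    this sketch; real pipelines align reads first).
    Returns a dict mapping barcode -> consensus sequence, with 'N' at
    positions where too few copies agree.
    """
    # Group copies derived from the same original molecule.
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        # Families with too few copies cannot support a consensus call.
        if len(seqs) < min_family_size:
            continue
        length = min(len(s) for s in seqs)
        bases = []
        for i in range(length):
            # Most common base at this position and how many copies carry it.
            base, count = Counter(s[i] for s in seqs).most_common(1)[0]
            # A variant seen in only a minority of copies is treated as a
            # likely error; positions without sufficient agreement become 'N'.
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[barcode] = "".join(bases)
    return consensus
```

In this sketch a change carried by every copy in a family survives into the consensus, whereas a change seen in a single copy is either outvoted or masked, depending on `min_agreement`. Comparing the consensus of the two strands of one duplex would add a further check, which is omitted here.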

Abstract

Mutations, the fuel of evolution, are first manifested as rare DNA changes within a population of cells. Although next-generation sequencing (NGS) technologies have revolutionized the study of genomic variation between species and individual organisms, most have limited ability to accurately detect and quantify rare variants among the different genome copies in heterogeneous mixtures of cells or molecules. We describe the technical challenges in characterizing subclonal variants using conventional NGS protocols and the recent development of error correction strategies, both computational and experimental, including consensus sequencing of single DNA molecules. We also highlight major applications for low-frequency mutation detection in science and medicine, describe emerging methodologies and provide our vision for the future of DNA sequencing.
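To make the detection limit concrete, the following Python sketch compares the expected number of reads carrying a true variant with the expected number of reads showing the same base change through error alone. The function name `signal_to_noise` and the assumption that errors are spread evenly over the three alternative bases are illustrative simplifications, not figures or models taken from the article.

```python
def signal_to_noise(depth, variant_fraction, error_rate):
    """Expected true-variant reads versus error reads at a single position.

    depth: number of reads covering the position.
    variant_fraction: fraction of molecules that truly carry the variant.
    error_rate: per-base sequencing error rate (for example 0.01 to 0.001).
    Assumes, for illustration only, that errors fall equally on each of the
    three possible alternative bases.
    Returns (true_reads, error_reads, ratio).
    """
    true_reads = depth * variant_fraction
    # Only one third of errors are expected to mimic this specific variant.
    error_reads = depth * error_rate / 3
    return true_reads, error_reads, true_reads / error_reads
```

Under these assumptions, a variant present in 1 per 1,000 molecules and sequenced at a depth of 10,000 with a 1 per 100 error rate yields about 10 true reads against roughly 33 error reads, so the variant cannot be distinguished from background without error correction.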

Figure 1: The signal-to-noise problem.
Figure 2: Methods of consensus-based error correction on short-read platforms.
Figure 3: Methods of single-molecule sequencing consensus-based error correction.
Figure 4: Impact of error correction technology on detection sensitivity.
Figure 5: Applications of rare variant detection.


    Article  CAS  PubMed  PubMed Central  Google Scholar 

  131. Stroun, M., Anker, P., Lyautey, J., Lederrey, C. & Maurice, P. A. Isolation and characterization of DNA from the plasma of cancer patients. Eur. J. Cancer Clin. Oncol. 23, 707–712 (1987).

    Article  CAS  PubMed  Google Scholar 

  132. Bettegowda, C. et al. Detection of circulating tumor DNA in early- and late-stage human malignancies. Sci. Transl Med. 6, 224ra24 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  133. Wan, J. C. M. et al. Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nat. Rev. Cancer 17, 223–238 (2017).

    Article  CAS  PubMed  Google Scholar 

  134. Murtaza, M. et al. Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA. Nature 497, 108–112 (2013).

    Article  CAS  PubMed  Google Scholar 

  135. Garcia-Murillas, I. et al. Mutation tracking in circulating tumor DNA predicts relapse in early breast cancer. Sci. Transl Med. 7, 302ra133 (2015).

    Article  PubMed  Google Scholar 

  136. Tie, J. et al. Circulating tumor DNA analysis detects minimal residual disease and predicts recurrence in patients with stage II colon cancer. Sci. Transl Med. 8, 346ra92 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  137. Newman, A. M. et al. An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat. Med. 20, 548–554 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  138. Fujii, T. et al. Mutation-enrichment next-generation sequencing for quantitative detection of KRAS mutations in urine cell-free DNA from patients with advanced cancers. Clin. Cancer Res. 23, 3657–3666 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  139. Wang, Y. et al. Detection of tumor-derived DNA in cerebrospinal fluid of patients with primary tumors of the brain and spinal cord. Proc. Natl Acad. Sci. USA 112, 9704–9709 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  140. Kinde, I. et al. Evaluation of DNA from the Papanicolaou test to detect ovarian and endometrial cancers. Sci. Transl Med. 5, 167ra4 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  141. Maritschnegg, E. et al. Lavage of the uterine cavity for molecular detection of Müllerian duct carcinomas: a proof-of-concept study. J. Clin. Oncol. 33, 4293–4300 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  142. Wang, Y. et al. Detection of somatic mutations and HPV in the saliva and plasma of patients with head and neck squamous cell carcinomas. Sci. Transl Med. 7, 293ra104 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  143. Sidransky, D. et al. Identification of ras oncogene mutations in the stool of patients with curable colorectal tumors. Science 256, 102–105 (1992).

    Article  CAS  PubMed  Google Scholar 

  144. Aravanis, A. M., Lee, M. & Klausner, R. D. Next-generation sequencing of circulating tumor DNA for early cancer detection. Cell 168, 571–574 (2017).

    Article  CAS  PubMed  Google Scholar 

  145. Armitage, P. & Doll, R. The age distribution of cancer and a multi-stage theory of carcinogenesis. Br. J. Cancer 8, 1–12 (1954).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  146. Genovese, G. et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N. Engl. J. Med. 371, 2477–2487 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  147. Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  148. Young, A. L., Challen, G. A., Birmann, B. M. & Druley, T. E. Clonal haematopoiesis harbouring AML-associated mutations is ubiquitous in healthy adults. Nat. Commun. 7, 12484 (2016). A description of the use of a single-strand tag-based error correction technique to identify preneoplastic clones in nearly all adults, which had only 2 years earlier been believed to occur in only a subset of very elderly individuals. It is an important example of how a fundamental biological understanding can change quickly with improved discovery technologies.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  149. Krimmel, J. D. et al. Ultra-deep sequencing detects ovarian cancer cells in peritoneal fluid and reveals somatic TP53 mutations in noncancerous tissues. Proc. Natl Acad. Sci. USA 113, 6005–6010 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  150. Salk, J. J. et al. Duplex Sequencing detects cancer-associated mutations arising during normal aging: clonal evolution over a century of human lifetime [abstract]. Cancer Res. 77, 3041 (2017).

    Google Scholar 

  151. Jee, J. et al. Rates and mechanisms of bacterial mutagenesis from maximum-depth sequencing. Nature 534, 693–696 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  152. Maslov, A. Y., Quispe-Tintaya, W., Gorbacheva, T., White, R. R. & Vijg, J. High-throughput sequencing in mutation detection: a new generation of genotoxicity tests? Mutat. Res. 776, 136–143 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  153. Fielden, M. R. et al.Modernizing human cancer risk assessment of therapeutics. Trends Pharmacol. Sci. https://doi.org/10.1016/j.tips.2017.11.005 (2017).

    Article  CAS  PubMed  Google Scholar 

  154. Kim, D., Kim, S., Kim, S., Park, J. & Kim, J.-S. Genome-wide target specificities of CRISPR-Cas9 nucleases revealed by multiplex Digenome-seq. Genome Res. 26, 406–415 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  155. Caperton, L. et al. Assisted reproductive technologies do not alter mutation frequency or spectrum. Proc. Natl Acad. Sci. USA 104, 5085–5090 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  156. Nelson, J. L. The otherness of self: microchimerism in health and disease. Trends Immunol. 33, 421–427 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  157. Eun, J. K., Guthrie, K. A., Zirpoli, G. & Gadi, V. K. In situ breast cancer and microchimerism. Sci. Rep. 3, 2192 (2013).

    Article  PubMed  PubMed Central  Google Scholar 

  158. Fan, H. C., Blumenfeld, Y. J., Chitkara, U., Hudgins, L. & Quake, S. R. Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood. Proc. Natl Acad. Sci. USA 105, 16266–16271 (2008).

    Article  PubMed  PubMed Central  Google Scholar 

  159. Chiu, R. W. K. et al. Non-invasive prenatal assessment of trisomy 21 by multiplexed maternal plasma DNA sequencing: large scale validity study. BMJ 342, c7401 (2011).

    Article  PubMed  PubMed Central  Google Scholar 

  160. Bianchi, D. W. et al. Noninvasive prenatal testing and incidental detection of occult maternal malignancies. JAMA 314, 162–169 (2015).

    Article  CAS  PubMed  Google Scholar 

  161. Jamuar, S. S. & Walsh, C. A. Somatic mutations in cerebral cortical malformations. N. Engl. J. Med. 371, 2038–2038 (2014).

    Article  CAS  PubMed  Google Scholar 

  162. Poduri, A., Evrony, G. D., Cai, X. & Walsh, C. A. Somatic mutation, genomic variation, and neurological disease. Science 341, 1237758–1237758 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  163. De Vlaminck, I. et al. Circulating cell-free DNA enables noninvasive diagnosis of heart transplant rejection. Sci. Transl Med. 6, 241ra77 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  164. Shugay, M. et al. Towards error-free profiling of immune repertoires. Nat. Methods 11, 653–655 (2014).

    Article  CAS  PubMed  Google Scholar 

  165. DeWitt, W. S. et al. Dynamics of the cytotoxic T cell response to a model of acute viral infection. J. Virol. 89, 4517–4526 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  166. Hsu, M. S. et al. TCR sequencing can identify and track glioma-infiltrating T cells after DC vaccination. Cancer Immunol. Res. 4, 412–418 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  167. Tumeh, P. C. et al. PD-1 blockade induces responses by inhibiting adaptive immune resistance. Nature 515, 568–571 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  168. Goodnow, C. C. Multistep pathogenesis of autoimmune disease. Cell 130, 25–35 (2007).

    Article  CAS  PubMed  Google Scholar 

  169. Qian, J. et al. B cell super-enhancers and regulatory clusters recruit AID tumorigenic activity. Cell 159, 1524–1537 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  170. Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).

  171. Lynch, S. V. & Pedersen, O. The human intestinal microbiome in health and disease. N. Engl. J. Med. 375, 2369–2379 (2016).

    Article  CAS  PubMed  Google Scholar 

  172. Van de Wiele, T., Van Praet, J. T., Marzorati, M., Drennan, M. B. & Elewaut, D. How the microbiota shapes rheumatic diseases. Nat. Rev. Rheumatol. 12, 398–411 (2016).

    Article  CAS  PubMed  Google Scholar 

  173. Rosenbaum, M., Knight, R. & Leibel, R. L. The gut microbiota in human energy homeostasis and obesity. Trends Endocrinol. Metab. 26, 493–501 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  174. Alexander, J. L. et al. Gut microbiota modulation of chemotherapy efficacy and toxicity. Nat. Rev. Gastroenterol. Hepatol. 1805, 105 (2017).

    Google Scholar 

  175. Vindigni, S. M. & Surawicz, C. M. Fecal microbiota transplantation. Gastroenterol. Clin. North Am. 46, 171–185 (2017).

    Article  PubMed  Google Scholar 

  176. Dominguez-Bello, M. G. et al. Partial restoration of the microbiota of cesarean-born infants via vaginal microbial transfer. Nat. Med. 22, 250–253 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  177. Roach, D. J. et al. A year of infection in the intensive care unit: prospective whole genome sequencing of bacterial clinical isolates reveals cryptic transmissions and novel microbiota. PLOS Genet. 11, e1005413 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  178. Cummings, L. A. et al. Clinical next generation sequencing outperforms standard microbiological culture for characterizing polymicrobial samples. Clin. Chem. 62, 1465–1473 (2016).

    Article  CAS  PubMed  Google Scholar 

  179. Grumaz, S. et al. Next-generation sequencing diagnostics of bacteremia in septic patients. Genome Med. 8, 73 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  180. Kim, S. et al. High-throughput automated microfluidic sample preparation for accurate microbial genomics. Nat. Commun. 8, 13919 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  181. Acevedo, A., Brodsky, L. & Andino, R. Mutational and fitness landscapes of an RNA virus revealed through population sequencing. Nature 505, 686–690 (2014).

    Article  CAS  PubMed  Google Scholar 

  182. Eigen, M. The concept of the quasispecies will soon be 50 years old. Introduction. Curr. Top. Microbiol. Immunol. 392, vii (2016).

    PubMed  Google Scholar 

  183. Henn, M. R. et al. Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection. PLOS Pathog. 8, e1002529 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  184. Solmone, M. et al. Use of massively parallel ultradeep pyrosequencing to characterize the genetic diversity of hepatitis B virus in drug-resistant and drug-naive patients and to detect minor variants in reverse transcriptase and hepatitis B S antigen. J. Virol. 83, 1718–1726 (2009).

    Article  CAS  PubMed  Google Scholar 

  185. Svarovskaia, E. S., Martin, R., McHutchison, J. G., Miller, M. D. & Mo, H. Abundant drug-resistant NS3 mutants detected by deep sequencing in hepatitis C virus-infected patients undergoing NS3 protease inhibitor monotherapy. J. Clin. Microbiol. 50, 3267–3274 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  186. Daum, L. T. et al. Next-generation ion torrent sequencing of drug resistance mutations in Mycobacterium tuberculosis strains. J. Clin. Microbiol. 50, 3831–3837 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  187. Katz, M., Hover, B. & Brady, S. Culture-independent discovery of natural products from soil metagenomes. J. Ind. Microbiol. Biotechnol. 43, 129–141 (2016).

    Article  CAS  PubMed  Google Scholar 

  188. Bassil, N. M., Bryan, N. & Lloyd, J. R. Microbial degradation of isosaccharinic acid at high pH. ISME J. 9, 310–320 (2015).

    Article  CAS  PubMed  Google Scholar 

  189. Yamamoto, S. et al. Environmental DNA metabarcoding reveals local fish communities in a species-rich coastal sea. Sci. Rep. 7, 40368 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  190. Mayo, B. et al. Impact of next generation sequencing techniques in food microbiology. Curr. Genom. 15, 293–309 (2014).

    Article  CAS  Google Scholar 

  191. Jäger, A. C. et al. Developmental validation of the MiSeq FGx Forensic Genomics System for targeted next generation sequencing in forensic DNA casework and database laboratories. Forensic Sci. Int. Genet. 28, 52–70 (2017).

    Article  CAS  PubMed  Google Scholar 

  192. Stiller, M. et al. Patterns of nucleotide misincorporations during enzymatic amplification and direct large-scale sequencing of ancient DNA. Proc. Natl Acad. Sci. USA 103, 13578–13584 (2006).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  193. Avery, O. T., Macleod, C. M. & McCarty, M. Studies on the chemical nature of the substance inducing transformation of pneumococcal types: induction of transformation by a desoxyribonucleic acid fraction isolated from pneumococcus type III. J. Exp. Med. 79, 137–158 (1944).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  194. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

    Article  CAS  PubMed  Google Scholar 

  195. Mostovoy, Y. et al. A hybrid approach for de novo human genome sequence assembly and phasing. Nat. Methods 13, 587–590 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  196. Bickhart, D. M. et al. Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome. Nat. Genet. 49, 643–650 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  197. King, D. A. et al. Mosaic structural variation in children with developmental disorders. Hum. Mol. Genet. 24, 2733–2745 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  198. Navin, N. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 90–94 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  199. Vitak, S. A. et al. Sequencing thousands of single-cell genomes with combinatorial indexing. Nat. Methods 14, 302–308 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  200. Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  201. Rosenberg, A. B. et al. Scaling single cell transcriptomics through split pool barcoding. Preprint at bioRxiv https://doi.org/10.1101/105163 (2017).

    Google Scholar 

  202. Ullal, A. V. et al. Cancer cell profiling by barcoding allows multiplexed protein analysis in fine-needle aspirates. Sci. Transl Med. 6, 219ra9 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  203. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  204. Sun, W.-J. et al. RMBase: a resource for decoding the landscape of RNA modifications from high-throughput sequencing data. Nucleic Acids Res. 44, D259–265 (2016).

    Article  CAS  PubMed  Google Scholar 

  205. Wellcome Collection. Charles Robert Darwin. Photograph by L. Darwin. Wellcome Trust https://wellcomecollection.org/works/s6x9wbsj?page=1&query=darwin (2016).

Download references

Acknowledgements

The authors thank R. Risques, J. Hiatt and A. Boswell for critical review; E. Fox and E. H. Ahn for contributions to early drafts; K. Loubet-Senear, R. Risques and M. Emond for graphics ideas; N. Homer and C. Valentine for software information; and members of the Loeb, Kennedy and Risques laboratories at the University of Washington for many lively discussions. This work was supported by National Institutes of Health grants T32CA009515 (J.J.S.) and R01CA193649, P01CA77852, and R33CA181771 (L.A.L.).

Author information

Contributions

J.J.S. and L.A.L. contributed to discussion of content and reviewing and editing the manuscript before submission. J.J.S. was primarily responsible for researching data and writing the manuscript. M.W.S. researched the literature and contributed to writing parts of the initial drafts of the article.

Corresponding authors

Correspondence to Jesse J. Salk or Lawrence A. Loeb.

Ethics declarations

Competing interests

J.J.S., M.W.S. and L.A.L. are equity holders in TwinStrand Biosciences, Inc.

Glossary

Clonal

When referring to a genetic variant or mutation, it is one that is present in all or most molecules in a population being sequenced. The term typically implies that it arose from a common ancestor, such as a fertilized egg in the case of germline variation, or the earliest founder cell of a tumour.

Subclonal

When referring to a genetic variant or mutation, it is one that is present in only a subset of molecules being sequenced. This may refer to either a variant carried by a subpopulation that arose and expanded within a larger population or through mixing of two or more distinct populations.

Sequencing accuracy

The number of errors made per base pair sequenced. It may be stratified by subtype of error, such as a specific type of base substitution.

Sequencing sensitivity

The ability to detect a variant at a particular variant allele frequency. This depends on both the sequencing accuracy and the number of independent DNA molecules successfully sequenced that include the genomic position (or positions) of interest.
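As a rough illustration of the molecule-number half of this definition, the sketch below computes the chance that at least a given number of variant-bearing molecules are sampled at a site, treating each independent molecule as a random draw at the variant allele frequency. The binomial model, the function name `detection_probability` and the `min_variant_molecules` threshold are assumptions made for illustration. The sketch ignores sequencing errors entirely, so it describes an upper bound set by sampling alone, not the sensitivity of any real assay.

```python
from math import comb


def detection_probability(molecular_depth: int, vaf: float,
                          min_variant_molecules: int = 1) -> float:
    """Probability that at least `min_variant_molecules` of the
    `molecular_depth` independent molecules carry the variant.

    Assumes binomial sampling and error-free sequencing (illustrative only).
    """
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must be between 0 and 1")
    # Probability of seeing fewer variant molecules than the threshold.
    p_fewer = sum(
        comb(molecular_depth, k) * vaf**k * (1.0 - vaf) ** (molecular_depth - k)
        for k in range(min_variant_molecules)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # 1,000 independent molecules, variant present at 1 in 1,000.
    print(round(detection_probability(1000, 0.001), 3))  # 0.632
```

Under these assumptions, sequencing 1,000 independent molecules gives only about a 63% chance of sampling a 1-in-1,000 variant even once, which is why the number of molecules matters as much as accuracy.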

Variant allele frequency (VAF)

The fraction of all molecules being sequenced that carry a specific genetic change or mutation at a particular genomic position.

Digital PCR

DNA amplification carried out in single-molecule reaction chambers. Recently, this has most often entailed microscopic aqueous droplets immersed in oil. When DNA input is sufficiently low, only one molecule will seed each reaction. When allele-specific amplification conditions are used, the number of droplets that successfully amplify can be digitally tabulated to determine the variant allele frequency.
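The condition "when DNA input is sufficiently low" can be made concrete with a small calculation. The sketch below assumes that molecules are distributed into droplets at random, so that the number per droplet follows a Poisson distribution; this model and the function names `mean_copies_per_droplet` and `multi_occupancy_fraction` are illustrative assumptions, not part of the definition above.

```python
import math


def mean_copies_per_droplet(positive: int, total: int) -> float:
    """Estimate the mean number of molecules per droplet from the
    fraction of positive droplets, assuming Poisson loading."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def multi_occupancy_fraction(lam: float) -> float:
    """Among droplets holding at least one molecule, the fraction that
    hold two or more, for a Poisson mean of `lam` molecules per droplet."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    p0 = math.exp(-lam)   # empty droplets
    p1 = lam * p0         # droplets with exactly one molecule
    return (1.0 - p0 - p1) / (1.0 - p0)


if __name__ == "__main__":
    # With an average of 0.1 molecules per droplet, roughly 5% of
    # occupied droplets are seeded by more than one molecule.
    print(round(multi_occupancy_fraction(0.1), 3))  # 0.049
```

Lower input reduces multi-molecule droplets at the cost of more empty ones, which is the trade-off behind the phrase "sufficiently low".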

Polony

A population of identical amplification copies that originated from a single founder molecule and are spatially colocalized, such as on the surface of a microbead or as a spot on a surface. It is the biochemical analogue of a bacterial colony on a Petri dish.

Tag-based error correction

Also known as consensus sequencing, an approach for error correction whereby individual DNA molecules are uniquely labelled before amplification and sequencing, and the sequences of the related derivative copies are then compared with each other to exclude errors.
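The comparison step in this definition can be sketched in a few lines. The example below groups reads by their tag and takes a per-position majority vote within each group. It is a minimal sketch, not the implementation of any published pipeline: the function name `call_consensus`, the `min_family_size` and `min_agreement` thresholds and the assumption that all reads in a family are already aligned and of equal length are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def call_consensus(reads, min_family_size=3, min_agreement=0.9):
    """Collapse (tag, sequence) pairs into one consensus sequence per tag.

    Families with fewer than `min_family_size` reads are discarded.
    Positions where the most common base falls below `min_agreement`
    are reported as 'N'. Reads in a family are assumed to be aligned
    and of equal length (illustrative simplification).
    """
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)

    consensus = {}
    for tag, seqs in families.items():
        if len(seqs) < min_family_size:
            continue
        bases = []
        for column in zip(*seqs):  # one column per read position
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(column) >= min_agreement else "N")
        consensus[tag] = "".join(bases)
    return consensus


if __name__ == "__main__":
    example_reads = [
        ("AAC", "ACGT"), ("AAC", "ACGT"), ("AAC", "ACTT"),  # one read has an error
        ("GGT", "ACGA"), ("GGT", "ACGA"), ("GGT", "ACGA"),  # variant in all copies
        ("TTT", "ACGT"),                                    # family too small
    ]
    print(call_consensus(example_reads, min_agreement=0.6))
```

In the example, the isolated `T` in one copy of family `AAC` is outvoted and discarded as a likely error, whereas the `A` shared by every copy of family `GGT` is retained as a candidate true variant.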

Short-read platforms

Next-generation sequencing systems that generate reads that are dozens to several hundreds of nucleotides in length, for example, the current Illumina and Thermo Fisher Scientific Ion Torrent platforms and previously manufactured Roche 454 and ABI SOLiD platforms. Current versions sequence amplified polonies, not single molecules.

Long-read platforms

Next-generation sequencing systems that generate reads that are thousands to tens of thousands of nucleotides in length. These currently include Pacific Biosciences (PacBio) and Oxford Nanopore Technologies, which sequence single molecules, not polonies, and therefore have a higher error rate than short-read platforms.

Molecular barcode

Also known as a unique molecular identifier (UMI). A set of DNA nucleotide codes where each is affixed to only one or a subset of individual DNA molecules within a sample. The purpose is to uniquely label single molecules for consensus-based error correction or molecular counting. These may be informatically combined with molecule fragmentation points for greater label diversity.

Index sequence

A particular DNA nucleotide code affixed to all molecules within a given DNA sample that is used for multiplexing samples on a single sequencer run.

Sequencing depth

The number of sequencing reads that include a particular genomic position in their sequence. Some may be simply PCR copies of the same molecule.

Molecular depth

The number of collapsed consensus reads, each derived from an independent DNA molecule, that include a particular genomic position.

Tag clashes

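The occurrence of two independent molecules being identically labelled by random chance. This may happen if the diversity of the applied molecular barcodes is too low for the number of DNA molecules sequenced. True mutations may then be erroneously excluded, because reads from both molecules are grouped into a single family and a variant carried by only one of them fails to reach consensus.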
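How much barcode diversity is "too low" can be estimated with a simple calculation. The sketch below assumes that every possible tag is equally likely and that all molecules considered would otherwise be indistinguishable (for example, they share the same fragmentation points); the function name `clash_probability` and its parameters are hypothetical and chosen for illustration.

```python
def clash_probability(n_molecules: int, barcode_length: int,
                      alphabet_size: int = 4) -> float:
    """Probability that a given molecule shares its tag with at least one
    of the other `n_molecules - 1` molecules, assuming tags are drawn
    uniformly at random from all sequences of `barcode_length` bases."""
    if n_molecules < 1 or barcode_length < 1:
        raise ValueError("need at least one molecule and one barcode base")
    diversity = alphabet_size ** barcode_length
    return 1.0 - (1.0 - 1.0 / diversity) ** (n_molecules - 1)


if __name__ == "__main__":
    for length in (4, 8, 12):
        print(length, clash_probability(1000, length))
```

Under these assumptions, 1,000 otherwise identical molecules almost always clash with 4-base tags (256 possibilities) but very rarely with 12-base tags (about 16.8 million possibilities).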

False families

Sets of reads derived from a single molecule in which an error during amplification has mutated the shared tag sequence, making it erroneously appear that the reads arose from two independent molecules.

Consensus-making efficiency

The number of raw sequencing reads that are required to form a consensus read. This typically refers to an average: total raw reads divided by total consensus reads.

Molecular conversion efficiency

The fraction of inputted DNA molecules of interest that are recovered as consensus sequences. This is often described in terms of genome-equivalents.
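The two efficiency terms above are simple ratios, and a short worked example may help keep them apart. In the sketch below, the function names are hypothetical, and the conversion from DNA mass to genome-equivalents assumes roughly 3.3 pg per haploid human genome, an approximate figure that is not taken from this article.

```python
HAPLOID_GENOME_MASS_PG = 3.3  # assumed approximate mass of one haploid human genome


def consensus_making_efficiency(raw_reads: int, consensus_reads: int) -> float:
    """Average number of raw reads spent per consensus read."""
    return raw_reads / consensus_reads


def genome_equivalents(dna_mass_ng: float,
                       genome_mass_pg: float = HAPLOID_GENOME_MASS_PG) -> float:
    """Approximate number of haploid genome copies in a DNA input."""
    return dna_mass_ng * 1000.0 / genome_mass_pg


def molecular_conversion_efficiency(molecular_depth: float,
                                    input_genome_equivalents: float) -> float:
    """Fraction of input copies of a locus recovered as consensus reads."""
    return molecular_depth / input_genome_equivalents


if __name__ == "__main__":
    copies_in = genome_equivalents(33.0)  # about 10,000 haploid copies
    print(consensus_making_efficiency(1_000_000, 50_000))
    print(molecular_conversion_efficiency(2_500, copies_in))
```

Here 1,000,000 raw reads yielding 50,000 consensus reads corresponds to 20 raw reads per consensus read, and a molecular depth of 2,500 from about 10,000 input copies corresponds to a conversion efficiency of about 25%.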

Aneuploidies

Abnormal numbers of chromosomes in a cell. This may be inherited, such as trisomy 21, the basis of Down syndrome, or somatically acquired, such as in cancer.

Metagenomics

The study of complex microbial populations encompassing many co-mingling species that form an ecosystem, for example, an individual's gut microbiota.

Phasing

The proper assignment of two or more variants at spatially distant genomic locations to the derivative nucleic acid molecule, for example, the maternal or paternal allele.

Cite this article

Salk, J., Schmitt, M. & Loeb, L. Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations. Nat Rev Genet 19, 269–285 (2018). https://doi.org/10.1038/nrg.2017.117

  • DOI: https://doi.org/10.1038/nrg.2017.117
