Skip to main content

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • Letter
  • Published:

Metagenomic and metabolomic analyses reveal distinct stage-specific phenotypes of the gut microbiota in colorectal cancer

Abstract

In most cases of sporadic colorectal cancers, tumorigenesis is a multistep process, involving genomic alterations in parallel with morphologic changes. In addition, accumulating evidence suggests that the human gut microbiome is linked to the development of colorectal cancer. Here we performed fecal metagenomic and metabolomic studies on samples from a large cohort of 616 participants who underwent colonoscopy to assess taxonomic and functional characteristics of gut microbiota and metabolites. Microbiome and metabolome shifts were apparent in cases of multiple polypoid adenomas and intramucosal carcinomas, in addition to more advanced lesions. We found two distinct patterns of microbiome elevations. First, the relative abundance of Fusobacterium nucleatum spp. was significantly (P < 0.005) elevated continuously from intramucosal carcinoma to more advanced stages. Second, Atopobium parvulum and Actinomyces odontolyticus, which co-occurred in intramucosal carcinomas, were significantly (P < 0.005) increased only in multiple polypoid adenomas and/or intramucosal carcinomas. Metabolome analyses showed that branched-chain amino acids and phenylalanine were significantly (P < 0.005) increased in intramucosal carcinomas and bile acids, including deoxycholate, were significantly (P < 0.005) elevated in multiple polypoid adenomas and/or intramucosal carcinomas. We identified metagenomic and metabolomic markers to discriminate cases of intramucosal carcinoma from the healthy controls. Our large-cohort multi-omics data indicate that shifts in the microbiome and metabolome occur from the very early stages of the development of colorectal cancer, which is of possible etiological and diagnostic importance.

This is a preview of subscription content, access via your institution

Access options

Buy this article

Prices may be subject to local taxes which are calculated during checkout

Fig. 1: Global metagenomic and metabolomic characteristics of the fecal samples.
Fig. 2: Distinct stage-specific taxonomic and metabolomic signatures with cancer progression.
Fig. 3: CRC-associated changes in microbial genes summarized in KO genes and KEGG pathway modules.
Fig. 4: Microbial dynamics and their diagnostic potential during multistep CRC progression.

Similar content being viewed by others

Data availability

The raw sequencing data reported in this paper have been deposited in DDBJ Sequence Read Archive (DRA) as DRA006684 and DRA008156.

References

  1. Brenner, H., Kloor, M. & Pox, C. P. Colorectal cancer. Lancet 383, 1490–1502 (2014).

    Article  Google Scholar 

  2. Fearon, E. R. & Vogelstein, B. A genetic model for colorectal tumorigenesis. Cell 61, 759–767 (1990).

    Article  CAS  Google Scholar 

  3. Jones, S. et al. Comparative lesion sequencing provides insights into tumor evolution. Proc. Natl Acad. Sci. USA 105, 4283–4288 (2008).

    Article  CAS  Google Scholar 

  4. Ashktorab, H., Kupfer, S. S., Brim, H. & Carethers, J. M. Racial disparity in gastrointestinal cancer risk. Gastroenterology 153, 910–923 (2017).

    Article  Google Scholar 

  5. Zeller, G. et al. Potential of fecal microbiota for early-stage detection of colorectal cancer. Mol. Syst. Biol. 10, 766 (2014).

    Article  Google Scholar 

  6. Castellarin, M. et al. Fusobacterium nucleatum infection is prevalent in human colorectal carcinoma. Genome Res. 22, 299–306 (2012).

    Article  CAS  Google Scholar 

  7. Kostic, A. D. et al. Genomic analysis identifies association of Fusobacterium with colorectal carcinoma. Genome Res. 22, 292–298 (2012).

    Article  CAS  Google Scholar 

  8. Yang, Y. et al. Fusobacterium nucleatum increases proliferation of colorectal cancer cells and tumor development in mice by activating Toll-like receptor 4 signaling to nuclear factor-kB, and up-regulating expression of microRNA-21. Gastroenterology 152, 851–866 (2017).

    Article  CAS  Google Scholar 

  9. Louis, P., Hold, G. L. & Flint, H. J. The gut microbiota, bacterial metabolites and colorectal cancer. Nat. Rev. Microbiol. 12, 661–672 (2014).

    Article  CAS  Google Scholar 

  10. Hirayama, A. et al. Quantitative metabolome profiling of colon and stomach cancer microenvironment by capillary electrophoresis time-of-flight mass spectrometry. Cancer Res. 69, 4918–4925 (2009).

    Article  CAS  Google Scholar 

  11. Liao, M. et al. Comparative analyses of fecal microbiota in Chinese isolated Yao population, minority Zhuang and rural Han by 16sRNA sequencing. Sci. Rep. 8, 1142 (2018).

    Article  Google Scholar 

  12. Arumugam, M. et al. Enterotypes of the human gut microbiome. Nature 473, 174–180 (2011).

    Article  CAS  Google Scholar 

  13. Ding, T. & Schloss, P. D. Dynamics and associations of microbial community types across the human body. Nature 509, 357–360 (2014).

    Article  CAS  Google Scholar 

  14. Wong, J. M., de Souza, R., Kendall, C. W., Emam, A. & Jenkins, D. J. Colonic health: fermentation and short chain fatty acids. J. Clin. Gastroenterol. 40, 235–243 (2006).

    Article  CAS  Google Scholar 

  15. Emiola, A. & Oh, J. High throughput in situ metagenomic measurement of bacterial replication at ultra-low sequencing coverage. Nat. Commun. 9, 4956 (2018).

    Article  Google Scholar 

  16. Brestoff, J. R. & Artis, D. Commensal bacteria at the interface of host metabolism and the immune system. Nat. Immunol. 14, 676–684 (2013).

    Article  CAS  Google Scholar 

  17. Zarling, E. J. & Ruchim, M. A. Protein origin of the volatile fatty acids isobutyrate and isovalerate in human stool. J. Lab. Clin. Med. 109, 566–570 (1987).

    CAS  PubMed  Google Scholar 

  18. Devkota, S. et al. Dietary-fat-induced taurocholic acid promotes pathobiont expansion and colitis in Il10 −/− mice. Nature 487, 104–108 (2012).

    Article  CAS  Google Scholar 

  19. Mottawea, W. et al. Altered intestinal microbiota–host mitochondria crosstalk in new onset Crohn’s disease. Nat. Commun. 7, 13419 (2016).

    Article  CAS  Google Scholar 

  20. Xu, H. et al. Isoleucine biosynthesis in Leptospira interrogans serotype lai strain 56601 proceeds via a threonine-independent pathway. J. Bacteriol. 186, 5400–5409 (2004).

    Article  CAS  Google Scholar 

  21. Bui, T. P. et al. Production of butyrate from lysine and the Amadori product fructoselysine by a human gut commensal. Nat. Commun. 6, 10062 (2015).

    Article  CAS  Google Scholar 

  22. Prieto, M. A., Díaz, E. & García, J. L. Molecular characterization of the 4-hydroxyphenylacetate catabolic pathway of Escherichia coli W: engineering a mobile aromatic degradative cluster. J. Bacteriol. 178, 111–120 (1996).

    Article  CAS  Google Scholar 

  23. Teufel, R. et al. Bacterial phenylalanine and phenylacetate catabolic pathway revealed. Proc. Natl Acad. Sci. USA 107, 14390–14395 (2010).

    Article  CAS  Google Scholar 

  24. Russell, W. R. et al. High-protein, reduced-carbohydrate weight-loss diets promote metabolite profiles likely to be detrimental to colonic health. Am. J. Clin. Nutr. 93, 1062–1072 (2011).

    Article  CAS  Google Scholar 

  25. Russell, W. R. et al. Major phenylpropanoid-derived metabolites in the human gut can arise from microbial fermentation of protein. Mol. Nutr. Food Res. 57, 523–535 (2013).

    Article  CAS  Google Scholar 

  26. Windey, K., De Preter, V. & Verbeke, K. Relevance of protein fermentation to gut health. Mol. Nutr. Food Res. 56, 184–196 (2012).

    Article  CAS  Google Scholar 

  27. Attene-Ramos, M. S., Wagner, E. D., Plewa, M. J. & Gaskins, H. R. Evidence that hydrogen sulfide is a genotoxic agent. Mol. Cancer Res. 4, 9–14 (2006).

    Article  CAS  Google Scholar 

  28. Loubinoux, J., Bisson-Boutelliez, C., Miller, N. & Le Faou, A. E. Isolation of the provisionally named Desulfovibrio fairfieldensis from human periodontal pockets. Oral Microbiol. Immunol. 17, 321–323 (2002).

    Article  CAS  Google Scholar 

  29. Feng, Q. et al. Gut microbiome development along the colorectal adenoma–carcinoma sequence. Nat. Commun. 6, 6528 (2015).

    Article  CAS  Google Scholar 

  30. Yu, J. et al. Metagenomic analysis of faecal microbiome as a tool towards targeted non-invasive biomarkers for colorectal cancer. Gut 66, 70–78 (2017).

    Article  CAS  Google Scholar 

  31. Bullman, S. et al. Analysis of Fusobacterium persistence and antibiotic response in colorectal cancer. Science 358, 1443–1448 (2017).

    Article  CAS  Google Scholar 

  32. Carbonero, F., Benefiel, A. C., Alizadeh-Ghamsari, A. H. & Gaskins, H. R. Microbial pathways in colonic sulfur metabolism and links with health and disease. Front. Physiol. 3, 448 (2012).

    Article  CAS  Google Scholar 

  33. Könönen, E. & Wade, W. G. Actinomyces and related organisms in human infections. Clin. Microbiol. Rev. 28, 419–442 (2015).

    Article  Google Scholar 

  34. Kasai, C. et al. Comparison of human gut microbiota in control subjects and patients with colorectal carcinoma in adenoma: terminal restriction fragment length polymorphism and next-generation sequencing analyses. Oncol. Rep. 35, 325–333 (2016).

    Article  CAS  Google Scholar 

  35. Bernstein, H., Bernstein, C., Payne, C. M. & Dvorak, K. Bile acids as endogenous etiologic agents in gastrointestinal cancer. World J. Gastroenterol. 15, 3329–3340 (2009).

    Article  CAS  Google Scholar 

  36. Suzuki, K. & Bruce, W. R. Increase by deoxycholic acid of the colonic nuclear damage induced by known carcinogens in C57BL/6J mice. J. Natl Cancer Inst. 76, 1129–1132 (1986).

    CAS  PubMed  Google Scholar 

  37. Vandeputte, D. et al. Quantitative microbiome profiling links gut community variation to microbial load. Nature 551, 507–511 (2017).

    Article  CAS  Google Scholar 

  38. Tahara, T. et al. Fusobacterium in colonic flora and molecular features of colorectal carcinoma. Cancer Res. 74, 1311–1318 (2014).

    Article  CAS  Google Scholar 

  39. Ito, M. et al. Association of Fusobacterium nucleatum with clinical and molecular features in colorectal serrated pathway. Int J. Cancer 137, 1258–1268 (2015).

    Article  CAS  Google Scholar 

  40. Mima, K. et al. Fusobacterium nucleatum in colorectal carcinoma tissue and patient prognosis. Gut 65, 1973–1980 (2016).

    Article  CAS  Google Scholar 

  41. Nishimoto, Y. et al. High stability of faecal microbiome composition in guanidine thiocyanate solution at room temperature and robustness during colonoscopy. Gut 65, 1574–1575 (2016).

    Article  Google Scholar 

  42. Tsugane, S. & Sawada, N. The JPHC study: design and some findings on the typical Japanese diet. Jpn J. Clin. Oncol. 44, 777–782 (2014).

    Article  Google Scholar 

  43. Furet, J. P. et al. Comparative assessment of human and farm animal faecal microbiota using real-time quantitative PCR. FEMS Microbiol. Ecol. 68, 351–362 (2009).

    Article  CAS  Google Scholar 

  44. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

    Article  CAS  Google Scholar 

  45. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10 (2011).

    Article  Google Scholar 

  46. Mori, H., Maruyama, T., Yano, M., Yamada, T. & Kurokawa, K. VITCOMIC2: visualization tool for the phylogenetic composition of microbial communities based on 16S rRNA gene amplicons and metagenomic shotgun sequencing. BMC Syst. Biol. 12, 30 (2018).

    Article  Google Scholar 

  47. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).

    Article  Google Scholar 

  48. Yarza, P. et al. Update of the All-Species Living Tree Project based on 16S and 23S rRNA sequence analyses. Syst. Appl. Microbiol. 33, 291–299 (2010).

    Article  CAS  Google Scholar 

  49. Yarza, P. et al. Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nat. Rev. Microbiol. 12, 635–645 (2014).

    Article  CAS  Google Scholar 

  50. Sunagawa, S. et al. Metagenomic species profiling using universal phylogenetic marker genes. Nat. Methods 10, 1196–1199 (2013).

    Article  CAS  Google Scholar 

  51. Truong, D. T. et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat. Methods 12, 902–903 (2015).

    Article  CAS  Google Scholar 

  52. Holmes, I., Harris, K. & Quince, C. Dirichlet multinomial mixtures: generative models for microbial metagenomics. PLoS ONE 7, e30126 (2012).

    Article  CAS  Google Scholar 

  53. Asnicar, F., Weingart, G., Tickle, T. L., Huttenhower, C. & Segata, N. Compact graphical representation of phylogenetic data and metadata with GraPhlAn. PeerJ 3, e1029 (2015).

    Article  Google Scholar 

  54. Peng, Y., Leung, H. C., Yiu, S. M. & Chin, F. Y. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420–1428 (2012).

    Article  CAS  Google Scholar 

  55. Besemer, J. & Borodovsky, M. Heuristic approach to deriving models for gene finding. Nucleic Acids Res. 27, 3911–3920 (1999).

    Article  CAS  Google Scholar 

  56. Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).

    Article  CAS  Google Scholar 

  57. Soga, T. et al. Quantitative metabolome analysis using capillary electrophoresis mass spectrometry. J. Proteome Res. 2, 488–494 (2003).

    Article  CAS  Google Scholar 

  58. Mishima, E. et al. Evaluation of the impact of gut microbiota on uremic solute accumulation by a CE-TOFMS-based metabolomics approach. Kidney Int. 92, 634–645 (2017).

    Article  CAS  Google Scholar 

  59. Tibshirani, R. Regression shrinkage and selection via the LASSO. J. R. Stat. Soc. B 58, 267–288 (1996).

    Google Scholar 

  60. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

    Google Scholar 

Download references

Acknowledgements

We thank all patients and their families who participated in this study, S. Goto for technical advice and A. Kaya, C. Shima, K. Igarashi, R. Usui, K. Murakami, I. Take, M. Sezawa, M. Iwahara, M. Komori, Z. Nakagawa, Y. Ohara and K. Kamezaki for expert technical assistance. Computations were partially performed on the NIG supercomputer at the ROIS National Institute of Genetics. This work was supported by grants from the National Cancer Center Research and Development Fund (25-A-4 and 28-A-4 to S.Y., S.F and T. Yamada and 29-A-6 to T. Yamada and T. Shibata); Practical Research Project for Rare/Intractable Diseases from the Japan Agency for Medical Research and Development (AMED) (JP18ek0109187 to S.Y., S.M., Y.S., S.F. and T. Yamada; JP18jk0210009 to S.Y. and T. Shibata); AMED-CREST (JP18gm0710003 to S.Y. and T. Soga.); JST (Japan Science and Technology Agency)-PRESTO (JPMJPR1537 to S.F. and JPMJPR1507 to T. Yamada); JSPS (Japan Society for the Promotion of Science) KAKENHI (16H04901, 17H05654 and 18H04805 to S.F.; 16J10135, 142558 and 221S0002 to T. Yamada); Joint Research Project of the Institute of Medical Science, the University of Tokyo (2017–2107 to S.Y. and T. Shibata); Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University (to S.Y.); the Takeda Science Foundation (to S.Y.) and Suzuken Memorial Foundation (H25–2–11 to S.Y.).

Author information

Authors and Affiliations

Authors

Contributions

S.Y., T.N., Y.S., S.F., T. Shibata and T. Yamada contributed to the study concept and design. S.Y., T.N., T. Sakamoto, S.S., M.M., H.T., M.Y., T.M., M.I., T. Yamaji, T. Yachida and Y.S. collected the clinical samples and information. S.Y., F.H., H.R., T. Soga, A.T., Y.O., T.H. and S.F. performed metagenome and metabolome experiments. S.M., H.S., H.W., K.M., Y.N., M.K. and T. Yamada performed bioinformatics analyses. S.Y., S.M., H.S., K.M. and T. Yamada wrote the manuscript. K.K., M.H., H.N. and T. Shibata supervised the study.

Corresponding authors

Correspondence to Shinichi Yachida or Takuji Yamada.

Ethics declarations

Competing interests

S.F. and T. Yamada are founders of Metabologenomics. The company is focused on the design and control of the gut environment for human health. The company had no control over the interpretation, writing or publication of this work. The terms of these arrangements are being managed by Keio University and Tokyo Institute of Technology in accordance with its conflict of interest policies. S.Y., S.F. and T. Yamada are currently applying for a patent (2018–18134/PCT/JP2019/3825, ‘Method for diagnosing the early stage of colorectal cancers based on the gut microbiome and metabolome profiles’).

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended Data

Extended Data Fig. 1 Overview of the study and metagenomic analysis pipeline.

a, Overview of the study. Fecal samples from 616 subjects were used to collect whole-genome shotgun sequencing data, from which functional and taxonomic profiles were generated. Fecal samples from 406 subjects were used to perform CE-TOFMS analysis to generate metabolite profiles. Samples from 347 subjects were available for both the sequencing and CE-TOFMS data analyses. KO, KEGG orthology. b, Flow chart of the pipeline used for the metagenomics analysis. Our metagenomics pipeline consists of three parts, quality control, functional profiling and taxonomic profiling, in which raw reads first undergo a quality control check and are then used to run several analytical steps to finally generate the KEGG orthology gene-based functional and taxonomic profiles.

Extended Data Fig. 2 Clinical information for the study subjects.

a,b, Distribution of age, gender, BMI, Brinkman index and alcohol consumption in 616 subjects with metagenome data (a) and 406 subjects with metabolome data (b). The boxes represent 25th–75th percentiles, black lines indicate the median and whiskers extend to the maximum and minimum values within 1.5× the interquartile range and dots indicate outliers.

Extended Data Fig. 3 Microbial community structure and human genome content in fecal metagenomes.

a, Change in the fraction of human genome reads out of the total number of raw reads along with colorectal cancer progression. Significant increases (P < 0.005; one-sided Mann–Whitney U test) in the human genome ratio (percentage reads mapped to the human genome) in feces in the MP (n = 67), S0 (n = 73), SI/II (n = 111) and SIII/IV (n = 74) groups are evident compared to the healthy group (H, n = 251). The boxes represent 25–75th percentiles, black lines indicate the median, vertical lines show maximum values within 1.5× the interquartile range and dots indicate outliers beyond the 1.5× interquartile range. b,c, PCAs of genera (n = 251) (b) and metabolites (n = 149) (c) in healthy controls. d, Fitting to the Dirichlet multinomial mixture model indicated optimal classification of fecal metagenomes (n = 616) into four community types. e, Composition of top 30 genera in each of the four community types. fi, Distribution of stages (healthy, MP, S0, SI/II, SIII/IV and HS) (f), tumor locations (right colon, left colon, rectum and double or triple cancers) (g), gender (h) and age (i) in each of the four community types. Patient distribution of the community types are as follows: community type 1, n = 191; community type 2, n = 172; community type 3, n = 129; community type 4, n = 124. The boxes in i represent 25th–75th percentiles, black lines indicate the median, whiskers extend to the maximum and minimum values within 1.5× the interquartile range and dots indicate outliers.

Extended Data Fig. 4 Distributions of tumor locations and gender in the overall structures of metagenomes and metabolomes.

a,b, PCA plots of genus profiles (n = 616) grouped by tumor location (a) and gender (b). c,d, PCA plots of metabolite profiles (n = 406) grouped by tumor location (c) and gender (d).

Extended Data Fig. 5 Comparison of A. parvulum abundance in metagenomic and qPCR analyses.

a,b, Abundance of A. parvulum estimated with whole-genome shotgun metagenomic sequence data (a) and by qPCR using the 16S rRNA gene copy number (b) in samples from 73 patients with S0 CRCs (green) and 73 healthy controls (red). c, Spearman’s correlation coefficient of A. parvulum abundances between the two methods was calculated with P-value computation using asymptotic approximation. d, qPCR demonstrated statistically significant differences in the gene number of A. parvulum between the healthy controls and patients with S0 CRCs (one-sided Mann–Whitney U test). The boxes represent 25th–75th percentiles, black lines indicate the median, whiskers extend to the maximum and minimum values within 1.5× the interquartile range and dots indicate outliers.

Extended Data Fig. 6 Replication rates were estimated using GRiD.

Replication rates were plotted for the 20 species shown in Fig. 2. The y axis (GRiD) is defined as the ratio between coverage at the peak (ori) and trough (ter) for the reference bacterial genome. Samples that had sufficient coverage are plotted (coverage > 0.2) for mapping against the reference genomes. Sample numbers varied with dependent species and are indicated in parentheses. P values were calculated using one-sided Mann–Whitney U tests for each of the stages (MP, S0, SI/II, SIII/IV) and were compared to the healthy controls. Significant elevation or depletion are denoted as follows: +++, elevation with P < 0.005; ++, elevation with P < 0.01; +, elevation with P < 0.05; −−−, depletion with P < 0.005; −−, depletion with P < 0.01; −, depletion at P < 0.05. The boxes represent 25th–75th percentiles, black lines indicate the median, whiskers extend to the maximum and minimum values within 1.5× the interquartile range and dots indicate outliers.

Extended Data Fig. 7 Metabolite changes in colorectal cancer stages.

Sixty-five metabolites with statistically significant (P < 0.005; one-sided Mann–Whitney U test) differences for any of the stages (MP, n = 45; S0, n = 30; SI/II, n = 80; SIII/IV, n = 68 compared to the healthy controls (n = 149) on capillary electrophoresis time of flight mass spectrometer (CE-TOFMS) analysis. Significant changes (elevation and depletion) are denoted as follows: +++, elevation with P < 0.005; ++, elevation with P < 0.01; +, elevation with P < 0.05; −−−, depletion with P< 0.005; −−, depletion with P < 0.01; −, depletion at P < 0.05. The boxes represent 25th–75th percentiles, black lines indicate the median, whiskers extend to the maximum and minimum values within 1.5× the interquartile range and dots indicate outliers.

Extended Data Fig. 8 Metabolomic changes in the tricarboxylic acid (TCA) pathways and metagenomic changes in amino acid metabolism and other representative pathways.

a, Quantified levels of metabolites involved in the tricarboxylic acid (TCA) pathway. The levels of the three TCA metabolites, succinate, fumarate and malate, were significantly higher P < 0.005; one-sized Mann–Whitney U test) in S0 samples (and SIII/IV samples for fumarate) (+++, P < 0.005; ++, P < 0.01; +, P < 0.05) compared to healthy control samples. It is uncertain what is the cause of accumulation of succinate, fumarate and malate in the feces of patients with early colorectal cancer, despite extremely low concentrations of other TCA intermediates such as 2-oxoglutarate. It is known that some bacteria synthesize ATP using a reverse reaction of succinate dehydrogenase and produce succinate as a byproduct, as part of fumarate respiration, in which fumarate rather than molecular oxygen is used as electron acceptor. The boxes represent 25th–75th percentiles, black lines indicate the median, whiskers extend to the maximum and minimum values within 1.5× the interquartile range and dots indicate outliers. The concentration is shown on the y axis (nmol g−1). Healthy (n = 127), MP (n = 45), S0 (n = 30), SI/II (n = 80), SIII/IV (n = 68). N.D., not detected and/or not determined. b, Pathway modules for metabolism types omitted from Fig. 3b. The pathway modules are modified from KEGG pathway maps ‘Alanine, aspartate and glutamate metabolism’, ‘Cysteine and methionine metabolism’, ‘Methane metabolism’ and ‘Arginine and proline metabolism’. ‘Leucine degradation’ is constructed based on leucine metabolism of Clostridium difficile, as the bacterial map is not available in KEGG. For each KO gene, bar plots show KO gene abundances averaged over samples within each of the five groups, healthy (n = 251), MP (n = 67), S0 (n = 73), SI/II (n = 111) and SIII/IV (n = 74) in order of left to right, and colored according to the order of the values. Each KO gene is composed of organism genes represented by circles. The sizes and colors of the circles are proportional to the relative abundances of the organism genes. Organism genes are grouped into one row and indicated by the organism name. The three most abundant organisms in the healthy controls are shown using three letter codes (for example, ova for Oscillibacter valericigenes, kpe for Klebsiella pneumoniae 342). Abbreviations for other organism names can be found in Supplementary Table 4. Gene numbers linked to each of the genes are listed in Supplementary Table 5. Dots in each pathway represent intermediate metabolites. The colors of the boxes of pathway components are marked in red for significant elevation (P < 0.005; one-sided Mann–Whitney U test) for any of the stages (MP, S0, SI/II and SIII/IV) compared to the healthy controls.

Extended Data Fig. 9 Metagenomic and metabolomic potentials as diagnostic markers for early (S0) and advanced (SIII/IV) CRCs.

a, ROC curves as performance evaluation for LASSO logistic regression and random-forest models and to distinguish samples from patients with S0 (left two panels) and SIII/IV (right two panels) CRCs from samples of healthy controls. Models were designed based on species (red), KO genes (blue) or metabolites (black) individually, or a combination (green) of the three features. For the S0 classification, the species-based model used 29 species, the KO gene-based model used 16 KO genes and the metabolite-based model used 24 metabolites. For SIII/IV classification, the species-based model used 55 species, the KO gene-based model used 5 KO genes and the metabolite-based model used 62 metabolites. In the combination models, species, KO gene and metabolite features were selected from the individual models. Classification accuracy was evaluated on the AUC using 10 randomized 10-fold cross-validation testing. For the LASSO logistic regression models, all features that satisfied the abundance thresholds were used to construct both the individual models and the combination model. The discriminant features among all are shown. b, Discriminant features identified from LASSO logistic regression classifiers and random-forest classifiers for distinguishing S0 (n = 27) and SIII/IV (n = 54) cases from the healthy controls (n = 127). The colors of box plots represent significant increases (red) and decreases (light blue) in each group, compared to the healthy controls (P < 0.005; one-sided Mann–Whitney U test). Dark gray boxes indicate features without statistical significance. The x axis shows the percentage contribution of the features to the model in each test (see Methods). The boxes represent 25th–75th percentiles, black lines indicate the median, whiskers extend to the maximum and minimum values within 1.5× the interquartile range and dots indicate outliers. c, Analysis of potential confounding factors that might affect metagenomic and metabolomic classifiers. We analyzed AUCs for factors such as age, gender, BMI, smoking and alcohol exposure. Smoking and alcohol values are indicated as the Brinkman index and the amount of alcohol consumption, respectively. Whereas the gender of the patient and the Brinkman index significantly differed between groups (Supplementary Table 1), neither the random-forest nor the logistic regression model achieved high accuracy.

Extended Data Fig. 10 Correlation between dietary intake and the gut microbiome.

Fusobacterium spp., Akkermansia muciniphila and sulfidogenic bacteria (B. wadsworthia and Pyramidobacter piscolens), which were previously reported to exhibit relationships with dietary intake, were examined for Spearman’s correlation coefficients with dietary fiber (water-soluble, insoluble and total dietary fiber), dietary protein intake (protein and meat), dietary fat (lipid), dietary calcium (calcium), serving of dairy products (milk) and energy intake (energy). +++, correlation with P < 0.005; +, correlation with P < 0.05. Samples that lack dietary data are omitted from the computation of correlation coefficients. Healthy, n = 242; MP, n = 67; S0, n = 72; SI/II, n = 109; SIII/IV, n = 71.

Supplementary information

Supplementary Information

Supplementary Results and Discussion

Reporting Summary

Supplementary Tables

Supplementary Tables 1–15

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Yachida, S., Mizutani, S., Shiroma, H. et al. Metagenomic and metabolomic analyses reveal distinct stage-specific phenotypes of the gut microbiota in colorectal cancer. Nat Med 25, 968–976 (2019). https://doi.org/10.1038/s41591-019-0458-7

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1038/s41591-019-0458-7

Search

Quick links

Nature Briefing Microbiology

Sign up for the Nature Briefing: Microbiology newsletter — what matters in microbiology research, free to your inbox weekly.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing: Microbiology