Abstract
Viruses are crucial in shaping soil microbial functions and ecosystems. However, studies on soil viromes have been limited in both spatial scale and biome coverage. Here we present a comprehensive synthesis of soil virome biogeographic patterns using the Global Soil Virome dataset (GSV) wherein we analysed 1,824 soil metagenomes worldwide, uncovering 80,750 partial genomes of DNA viruses, 96.7% of which are taxonomically unassigned. The biogeography of soil viral diversity and community structure varies across different biomes. Interestingly, the diversity of viruses does not align with microbial diversity and contrasts with it by showing low diversity in forest and shrubland soils. Soil texture and moisture conditions are further corroborated as key factors affecting diversity by our predicted soil viral diversity atlas, revealing higher diversity in humid and subhumid regions. In addition, the binomial degree distribution pattern suggests a random co-occurrence pattern of soil viruses. These findings are essential for elucidating soil viral ecology and for the comprehensive incorporation of viruses into soil ecosystem models.
Data availability
All GSV sequences, GSV database viral information and map TIFF files can be downloaded from Zenodo at https://zenodo.org/records/10463783. The interactive GSV map is available at https://bmalab.shinyapps.io/global_soil_viromes.
Code availability
Scripts used in this manuscript are available on microbma GitHub under project ‘global soil viromes’ (https://microbma.github.io/project/gsv.html).
Acknowledgements
We thank C. Kelly, C. Averill, D. Buckley, D. Goodheart, D. Duncan, D. Myrold, E. Eloe-Fadrosh, E. Brodie, E. Högfors-Rönnholm, H. Cadillo-Quiroz, J. Tiedje, J. Jansson, J. Norton, J. Blanchard, J. Schweitzer, J. Banfield, J. Gladden, J. Raff, K. Peay, K. Gravuer, K. M. DeAngelis, L. Meredith, M. Kalyuzhnaya, M. Waldrop, N. Fierer, P. Dijkstra, P. Baldrian, S. Theroux, S. Tringe, T. Woyke, T. Whitman, W. Mohn and San Diego State University for permission to use their metagenome data. We also thank Amazon Web Services for providing computing resources. This work was supported by the National Natural Science Foundation of China (grants 41721001, 42090060, 42277283 and 41991334), the Key R&D Program of Zhejiang Province (2023C02004, 2023C02015) and the Fundamental Research Funds for the Central Universities (226-2022-00139).
Author information
Contributions
B.M. and J.X. created the study design. Y.W., K.Z., X.T., H.D. and R.X. collected all datasets. B.M., Y.W., K.Z., C.T., C.W. and B.D. performed the data analysis and visualization. J.X., B.M., Y.W., E.S., K.Z., X.L., R.X., X.T., R.A.D., Y.-G.Z., Y.Y., L.H. and H.C. contributed to scientific discussion and wrote the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Ecology & Evolution thanks Kyle Meyer and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Flow diagram of sample identification.
The arrow delineates sequential steps. There are three main stages: identification, screening and inclusion. The number in each box represents the total number of samples involved in the step.
Extended Data Fig. 2 Bioinformatic Workflow.
The red background highlights the software used along with version specifics. The blue background outlines information on data volumes. Arrows illustrate the order of computational procedures, encompassing (A) prediction of viral contigs from metagenome-assembled contigs, (B) creation of OTU tables and conducting biogeography analyses, (C) clustering of genomes for database comparison (a) and detailing phylogenetic levels (b), (D) assignment of viral taxonomy, (E) identification of temperate phages and (F) determination of host assignment.
Extended Data Fig. 3 Viral information.
Virus validation. (a) Density plot of the number of BUSCO hits divided by the total number of genes (BUSCO ratio) for all viruses in the GSV dataset. (b) Histogram of the number of GSV vOTUs with different numbers of viral protein family (VPF) hits. Histograms of the number of (c) vOTUs, (d) viral genus-level vOTUs and (e) viral family-level vOTUs present in different percentages of GSV samples. (f) The proportion of genome populations that are putative prophages for this study (GSV), IMG/VR v3 ‘soil only’ metagenomes (IMGsoil), Phages and Integrated Genomes Encapsidated Or Not database (PIGEON), Global Oceans Viromes 2.0 database (GOV2), Gut Virome Database (GVD), Gut Phage Database (GPD) and Viral Refseq v201 (Refseq). (g) Distribution of sequence quality determined by CheckV. (h) Viral contigs sorted by relative abundance and contig length, with those identified at family level shown in blue.
Extended Data Fig. 4 Host-virus linkages.
Host-virus network wherein nodes indicate species (hosts; blue) or vOTUs (viruses; bronze); edges indicate a host-virus relationship. A small number of viral nodes were responsible for a large number of host-virus relationships in the network. Microbial interaction networks often follow a scale-free structure in which the majority of connections belong to a small number of nodes. As such, keystone (or hub) nodes exert substantial leverage over the community as a whole.
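The concentration of links on a few viral nodes described in this caption can be quantified from an edge list. The following is a minimal sketch, not the authors' method: it assumes the network is available as unique (virus, host) pairs, and the function names `degree_counts` and `top_share` as well as the example identifiers are hypothetical.

```python
from collections import Counter


def degree_counts(edges):
    """Count how many host links each virus has in a (virus, host) edge list."""
    return Counter(virus for virus, _host in edges)


def top_share(edges, top_n):
    """Fraction of all edges that involve the top_n best-connected viruses."""
    if not edges:
        return 0.0
    degrees = degree_counts(edges)
    top = degrees.most_common(top_n)
    return sum(count for _virus, count in top) / len(edges)


# Hypothetical toy network: one virus holds three of the four links.
example_edges = [
    ("vOTU_1", "host_A"),
    ("vOTU_1", "host_B"),
    ("vOTU_1", "host_C"),
    ("vOTU_2", "host_A"),
]
hub_share = top_share(example_edges, top_n=1)
```

A share close to 1 for a small `top_n` would be consistent with a hub-dominated network; a share close to `top_n` divided by the number of viruses would indicate evenly spread links.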
Extended Data Fig. 5 Assessing the Impact of Sequencing Depth on Diversity Results.
(a & b) Correlations between the Shannon index obtained from subsampled reads and that obtained from all reads. Each dot represents a soil metagenome sample, colored by biome type. The lines denote the values predicted by the linear mixed model and the shaded areas flanking the lines indicate the 95% confidence intervals. The numbers in the lower right corner are the Spearman correlation results. (c) Viral Shannon index across varying sequencing depths, with second-order fit for total samples (left upper corner) and for subsamples separated by biomes (upper) and continents (bottom). The lines represent the values predicted by the linear mixed model and the shaded regions indicate the 95% confidence intervals. (d) Correlation between microbial diversity and viral Shannon index normalized by sample read number (Shannon per Read Count). Each dot represents a soil metagenome sample, colored by biome type. (e) Median and interquartile ranges for Shannon per Read Count, with whiskers extending to ≤1.5× interquartile range. Significant differences were assessed using one-way ANOVA with LSD test; biomes with different lowercase letters are significantly different at α = 0.05 (n = 620 (Agricultural Land), n = 42 (Artificial Surfaces), n = 40 (Bare Land), n = 310 (Wetland), n = 293 (Grassland), n = 56 (Tundra), n = 417 (Forest), n = 21 (Shrubland)). (f) Correlation between microbial diversity and viral Shannon index for samples with sequencing depths ≥100 million reads. (g) Median and interquartile ranges for viral Shannon index at species level for samples with sequencing depths ≥100 million reads, with whiskers extending to ≤1.5× interquartile range. Significant differences were assessed using one-way ANOVA with LSD test; biomes with different lowercase letters are significantly different at α = 0.05 (sample sizes as in (e)).
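For readers unfamiliar with the metric in these panels, the Shannon index can be sketched as follows. This is an illustrative example only: it uses the standard definition with the natural logarithm, and it assumes that "Shannon per Read Count" means the index divided by the sample's read count, which the caption does not spell out. The function names are hypothetical.

```python
import math


def shannon_index(abundances):
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with non-zero abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for abundance in abundances:
        if abundance > 0:
            p = abundance / total
            h -= p * math.log(p)
    return h


def shannon_per_read_count(abundances, read_count):
    """Assumed normalization: Shannon index divided by the sample read count."""
    if read_count <= 0:
        raise ValueError("read_count must be positive")
    return shannon_index(abundances) / read_count


# Four equally abundant vOTUs give the maximum index for four taxa, ln(4).
even_sample = shannon_index([25, 25, 25, 25])
```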
Extended Data Fig. 6 Expanded viral diversity across biomes (including paddy soil and coastal soil).
Median and interquartile ranges for viral Shannon index at species level, with whiskers extending to ≤1.5× interquartile range. Significant differences were assessed using one-way ANOVA with LSD test; biomes with different lowercase letters are significantly different at α = 0.05. The numbers in the figure represent sample sizes (n).
Extended Data Fig. 7 Model validation, accuracy assessment and extent of interpolation across all terrestrial pixels for the 10 environmental covariate layers.
(a) Clustering tree of covariates (main effects circled with a red box). (b) Leave-One-Out cross validation result of the models forecasting viral alpha diversity (Shannon index). Linear regression was used to analyze the relationship between observed and predicted Shannon indices, assuming a two-sided test. (c) Percentage of pixels falling within the convex hulls of the first 5 principal component spaces (covering >80% of the sample space variation collectively). Prediction outliers occurred at latitudinal extremes. The limited sample footprint in equatorial sites, Sahara Desert area, middle Asia and Australia resulted in lower forecast confidence for these regions. (d) Bootstrapped (100 iterations) coefficient of variation (standard deviation divided by the mean predicted value) results represent prediction accuracy of Shannon index. Sampling was stratified by biome. The Shannon predictions had low certainty in Sahara Desert area, middle Asia and areas between the Tropic of Capricorn and the Equator.
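The bootstrapped coefficient of variation in panel (d) can be illustrated with a small sketch. It assumes resampling with replacement within each biome (one reading of "stratified by biome") and uses the sample standard deviation; the toy predictor `grand_mean` is a hypothetical stand-in for the fitted diversity model, and all function names are illustrative rather than taken from the authors' scripts.

```python
import random
import statistics


def stratified_resample(values_by_biome, rng):
    """Resample with replacement within each biome, keeping stratum sizes."""
    return {
        biome: [rng.choice(values) for _ in values]
        for biome, values in values_by_biome.items()
    }


def bootstrap_cv(values_by_biome, predict, iterations=100, seed=0):
    """Coefficient of variation (SD / mean) of a prediction across bootstrap runs."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(iterations):
        resampled = stratified_resample(values_by_biome, rng)
        predictions.append(predict(resampled))
    mean_prediction = statistics.fmean(predictions)
    return statistics.stdev(predictions) / mean_prediction


def grand_mean(values_by_biome):
    """Toy stand-in for a fitted model: the mean of all resampled values."""
    pooled = [v for values in values_by_biome.values() for v in values]
    return statistics.fmean(pooled)


# Hypothetical Shannon values grouped by biome.
toy_data = {"Forest": [3.1, 2.8, 3.4], "Grassland": [4.0, 4.2]}
toy_cv = bootstrap_cv(toy_data, grand_mean, iterations=100, seed=1)
```

In a mapping context the same calculation would be repeated per pixel, with a low value indicating that the prediction is stable across resampled training sets.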
Extended Data Fig. 8 Accumulation curves.
Accumulation curves for total samples (left upper corner) and for subsamples separated by biomes (upper) and continents (bottom). The curves depict mean values, and the shaded regions around these curves represent the standard deviation (SD).
Supplementary information
Supplementary Tables 1–7
Supplementary Table 1. Metagenomes used in the GSV dataset, including sample ID (NCBI number, JGI ID), longitude, latitude, biome, sequencing size, continent, data contributor and library strategy.
Supplementary Table 2. Host–virus linkage information.
Supplementary Table 3. The 84 associated environmental factors used to analyse viral biogeography.
Supplementary Table 4. A total of 84 global covariate layers used in model establishment. The 7 Nadir Reflectance Band layers (that is, MCD43A4.005 BRDF-Adjusted Reflectance 16-Day Global 500 m) are grouped as one item in the table.
Supplementary Table 5. Effect size results of 84 global covariate layers, biome, latitude and longitude on α-diversity and β-diversity.
Supplementary Table 6. Overview of metagenomic data size from previous studies (Sheet 1). Analysis of read count thresholds and their implications on viral diversity (Sheet 2).
Supplementary Table 7. Spearman’s correlation results after spatial regression for environmental factors and for viral and microbial community diversity.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ma, B., Wang, Y., Zhao, K. et al. Biogeographic patterns and drivers of soil viromes. Nat Ecol Evol 8, 717–728 (2024). https://doi.org/10.1038/s41559-024-02347-2
DOI: https://doi.org/10.1038/s41559-024-02347-2