Main

The continental subsurface is the planet’s largest carbon reservoir1, housing up to 19% of its total biomass2,3 and 99% of its freshwater4. Despite accounting for only 6% of total stores, modern groundwater (the fraction accrued in aquifers over the past 50 years) is the single most significant source of potable water. Carbonate karst aquifers alone are thought to supply people with nearly 10% of their drinking water5. Unfortunately, modern groundwater is also the most vulnerable to anthropogenic and climatic impacts4. While subsurface ecosystems have long fascinated ecologists6, and more recently microbiologists7, accessibility, enormous spatial heterogeneity and challenges in interpreting process rate measurements have obscured a meaningful understanding of their contributions to global biogeochemical cycles8.

The widespread recognition that Earth’s biosphere extends deep into the subsurface occurred only recently9. Historically, carbon supply in such environments was thought to be limited to the trickling of surface-produced organic matter into the shallow subsurface10 or what was stored within sedimentary rocks11. By stark contrast, a wealth of compelling genetic evidence suggests that in situ carbon fixation is critical for sustaining highly diverse microbial metabolic networks in groundwater, in both the shallow and the deep subsurface12,13,14,15,16,17,18,19. Despite the implications of gene-based surveys, the empirically derived activity measurements required to corroborate such inferences, constrain biogeochemical fluxes, understand system dynamics and integrate processes into regional and global models have yet to be reported. Here we report our use of a novel radiocarbon method to derive empirical carbon fixation rates and place them in the context of global groundwater.

Groundwater CO2 fixation rates resemble marine waters

In the shallow subsurface (groundwater wells 5–90 m deep), experimental carbon fixation rates varied from 0.043 ± 0.01 to 0.23 ± 0.10 μgC l−1 d−1 (mean ± 1 standard deviation (s.d.); Fig. 1a, Extended Data Table 1 and Supplementary Information). The ultra-low level 14C-labelling approach developed in this investigation exploits the high sensitivity of accelerator mass spectrometry, thereby minimizing impacts to groundwater hydrochemical equilibria and affording shorter incubation times. This method is particularly useful within a carbonate geological setting, where high dissolved inorganic carbon (DIC) backgrounds and a scarcity of microbes warrant greater sensitivity than is achievable via scintillation-based 14C-labelling approaches. Rates resulting from our new labelling technique probably approximate net primary productivity rather than gross productivity, as has been reported for marine systems20,21, and we further expect them to be conservative estimates for carbon fixation as they consider only contributions from the planktonic portion of the community (Supplementary Information).

Fig. 1: Carbon fixation rates and metagenomic potentials across the aquifer.

a, Rates of carbon fixation. Outer error bars depict one standard deviation; inner grey bars delineate standard error of the mean. Rates for well H51 are derived from non-labelled controls (Supplementary Information). Letters denote the results of ANOVA and post hoc Tukey tests (excluding H51). b, Relative importance of the predicted carbon fixation pathways. c, Electron donor sources in each well. Values are averages from the triplicate 0.2 μm filtered fraction metagenome samples. d, Mean dissolved oxygen (DO) concentrations in groundwater collected in summer months (May–September 2010–2018); error bars depict standard deviation. e, Redox-potential measurements from identical time points; error bars depict standard deviation. While groundwater from H52 and H43 exhibited anoxic or hypoxic conditions, the positive redox potentials were due to oligotrophic conditions and mean nitrate concentrations of ~4 mg l–1.

We compared these carbon fixation rates that were measured in groundwater of varying biogeochemical characteristics22 with the only other subsurface 14CO2 assimilation measurements reported: those of a deep (830–1,078 m) groundwater borehole from crystalline bedrock in Sweden23. To do so, we converted the published rates of isotopic incorporation to carbon equivalents, revealing the lower but overlapping range of 0.0095 to 0.0560 μgC l−1 d−1.

To better understand the relevance of the rates measured, we compared them with those of well-documented oligotrophic marine surface waters. Unlike our samples, the carbon fixed in these waters was sourced almost entirely by bacterial photoautotrophs24,25. When compared directly with a comprehensive dataset compiled by ref. 26, our rates overlapped with those of global marine waters at depths to 140 m, equating to roughly 10% of the reported global median for 0–20 m depths (2.65 μg l−1 d−1, interquartile range (IQR) = 1.74, 6.02) and 20% of the median for 20–140 m depths (1.2 μg l−1 d−1, IQR = 0.6, 1.7). Comparisons with the extensively studied Sargasso Sea in the Bermuda Atlantic Timeseries Study (BATS)27,28 and the Hawaiian Oceanographic Timeseries (HOTs)29 datasets yielded similar findings (Fig. 2). Our rate measurements ranged between 3% and 23% of the median reported net primary productivity in the upper euphotic zones (down to ~120 m) and between 20% and 600% of the median of the lower euphotic region (100–120 m).
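The percentage comparison with the compiled marine medians can be retraced with a few lines of arithmetic. The sketch below is illustrative only: it uses the endpoints of the groundwater rate range quoted earlier (0.043 and 0.23 μgC l−1 d−1) and the two medians given in this paragraph, and the function name is a hypothetical helper rather than part of the original analysis. Under these inputs, the upper end of the groundwater range appears to correspond to the "roughly 10%" and "20%" figures.

```python
# Endpoints of the groundwater rate range reported in the text (ugC per litre per day).
GW_MIN, GW_MAX = 0.043, 0.23

# Marine medians quoted from the compiled dataset (ug per litre per day).
MARINE_MEDIANS = {"0-20 m": 2.65, "20-140 m": 1.2}


def percent_of_median(rate, median):
    """Express a rate as a percentage of a reference median (hypothetical helper)."""
    return 100.0 * rate / median


for depth, median in MARINE_MEDIANS.items():
    low = percent_of_median(GW_MIN, median)
    high = percent_of_median(GW_MAX, median)
    print(f"{depth}: {low:.1f}% to {high:.1f}% of the marine median")
```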

Fig. 2: Comparison of carbon fixation rates within groundwater and the marine euphotic zone.

Violin plots depicting the distribution of carbon fixation rates measured in oligotrophic marine surface waters and groundwater. HOTs, Hawaiian Oceanographic Timeseries (1999, cruises 101–110); BATs, data from 1988 to 2016 for the Bermuda Atlantic Timeseries; Liang, a collated dataset compiled by Liang et al.26; GW, the range of groundwater samples shown in Fig. 1.

We also considered contributions to existing particulate organic carbon (POC) stocks and new carbon inputs per microbial cell count. After normalizing for estimated total bacterial cell numbers, groundwater yielded 0.3–10.8 fg fixed carbon per bacterial cell per day (Extended Data Table 2), which matched estimates of 0.25–12.1 fgC per bacterial cell per day across the marine photic zone (5–150 m). However, groundwater received new daily carbon inputs of only 0.47% ± 0.22% (mean ± s.d.; Extended Data Table 2) of its existing POC, much lower than the marine system’s 2.6% ± 2.9% gain in the lower euphotic zone and 22% ± 18% at the surface30,31. This disparity might stem from the larger recalcitrant fraction of POC in groundwater compared with oligotrophic oceans, which is supported by deviations in 14C and 13C signatures of total groundwater POC concentrations compared with lipid signatures of resident microbes32.

An ecosystem dominated by chemolithoautotrophs

To identify dominant microbial primary producers, a dereplicated and quality-controlled set of 1,224 metagenome-assembled genomes (MAGs) was generated from groundwater samples. Of these, 102 putative chemolithoautotrophs exhibiting at least 50% completion scores for carbon fixation pathways were identified (Fig. 3). Almost exclusively bacterial (101), these MAGs represented 17 distinct phyla, 21 classes and 35 families (Fig. 3 and Supplementary Dataset 1). In some samples, up to 12% of metagenomic reads could be recruited to these potentially chemolithoautotrophic MAGs (Supplementary Information and Extended Data Figs. 1 and 2). A single archaeal MAG of the family Nitrosopumilaceae encoded gene products for the 4-hydroxybutyrate/3-hydroxypropionate pathway and was not relatively abundant (<5× maximum normalized coverage).

Fig. 3: Phylogeny, relative abundances and transcriptional activities of putative chemolithoautotrophic MAGs.

Approximately maximum-likelihood phylogenetic tree based on concatenated single-copy protein alignments for all bacterial MAGs considered. Branches are coloured according to the predicted carbon fixation pathway, and the matching leaf is indicated by a point. Bar charts present average normalized metagenomic coverages within each well from triplicate 0.2 μm filtered fractions. Pie charts show the coverage of mRNA transcripts recruited, normalized to gene number and library size. Asterisks denote MAGs discussed in greater detail in Supplementary Information. The tree is rooted using Patescibacteria (CPR) as an outgroup, indicated by the collapsed grey leaf in the upper left. Node numbers: (1) c_Nitrospiria, (2) c_Thermodesulfovibrionia, (3) o_Nitrospirales, (4) f_Nitrospiraceae, (5) g_Nitrospira_D, (6) g_Nitrospira, (7) p_Nitrospinota, (8) c_Gammaproteobacteria, (9) o_Acidiferrobacterales, (10) f_Sulfurifustaceae, (11) g_SM1-46, (12) f_UBA6901, (13) o_Burkholderiales, (14) f_Nitrosomonadaceae, (15) g_Nitrosomonas, (16) f_SG8-41, (17) f_SG8-39, (18) c_Brocadiae, (19) o_Brocadiales, (20) f_Brocadiaceae and (21) f_Scalinduaceae.

Three chemolithoautotrophic pathways were detected (Fig. 1b): the Calvin–Benson–Bassham (CBB), Wood–Ljungdahl (WL) and reverse tricarboxylic acid (rTCA) cycles were present in 37, 50 and 15 MAGs, respectively. The summed and normalized relative coverages of MAGs equipped with these metabolic pathways aligned with the carbon fixation rates measured in wells H52, H32 and H14 while contrasting with rate data from wells H41 and H43 (Fig. 1, Extended Data Fig. 2 and Supplementary Information). The greatest relative abundances of predicted chemolithoautotrophs were detected in oxic well H41 and anoxic well H52. Anoxic groundwater was dominated by putative sulfur-oxidizing (53% of summed and normalized coverages of all chemolithoautotrophic MAGs) and putative anaerobic ammonium-oxidizing (anammox (10%)) autotrophic microbes, while oxic groundwaters harboured greater abundances of potential nitrifiers (76%; Fig. 1c and Supplementary Information).

Uncharacterized microbes influence CO2 fixation potential

The most abundant putative chemolithoautotrophic populations represented by MAGs generated from anoxic groundwater were of poorly studied and/or uncharacterized microbial lineages. Those most abundant in oxic groundwaters, however, were phylogenetically and metabolically similar to well-characterized microbes (Supplementary Information). In both cases, metabolic reconstructions suggested that dominant subpopulations could access a diverse suite of (in)organic electron acceptors and donors. We mapped previously generated transcripts to these MAGs to confirm the active expression of gene products involved in energy acquisition and carbon fixation. As opposed to the broad distributions posited by deoxyribonucleic acid- (DNA-) based abundances, transcript data revealed far more restrictive ranges in which specific gene products were favoured (Fig. 3). Given their metabolic versatility and the results of previous cultivation-based analyses33, these populations are expected to be mixotrophic (capable of supplementing carbon requirements with available organic matter). Overall, carbon fixation in anoxic groundwater was predicted to be fuelled by reduced sulfur, and there were three highly abundant, putative sulfur-oxidizing MAGs identified, each accounting for >2% of the total metagenomic reads in some samples (100–400× normalized coverages). Diverse reduced sulfur species fuelling these metabolisms are released from pyrite weathering of the karst rock34.

The most abundant MAG (H51-bin250-1) encountered in this study belongs to a deep-branching order, 9FT-COMBO-42-15, of class Nitrospiria and is the first representative of this class thought to fix carbon via the WL pathway (Fig. 3, Fig. 4a and Supplementary Information). As there is precedent for autotrophic WL-utilizing bacteria within phylum Nitrospirota, and Candidatus Magnetobacterium was characterized with an equally flexible metabolism, these traits may be more widespread within the phylum than previously thought35. In addition, two MAGs with the potential to couple sulfur oxidation to carbon fixation via the CBB cycle were identified as members of the Sulfurifustaceae family of Proteobacteria (Supplementary Information and Fig. 4b; H32-bin014, H32-bin069). These MAGs recruited tenfold more transcripts than their Nitrospirota counterparts and were among the most transcriptionally active putative chemolithoautotrophic genomes detected (Figs. 3 and 4, Extended Data Fig. 4 and Supplementary Information). With its closest reference genomes Sulfuricaulis limicola and Ca. Muproteobacteria (RIFCSPHIGHO2_12_FULL_60_33), the taxonomic identity of this family is under debate. Per Genome Taxonomy Database (GTDB) classification nomenclature, Muproteobacteria belong to the Sulfurifustaceae family, and members of this family have been posited to oxidize sulfur in both aquatic and terrestrial environments36,37,38.

Fig. 4: Metabolic reconstructions of dominant putatively chemolithoautotrophic MAGs.

Bar charts below each metabolic model summarize the average normalized coverage across each sample, scaled proportionally. Values within the bar chart indicate the sum normalized coverage for each MAG. Balloon plots depict normalized transcript coverages for genes affiliated with each pathway. If multiple copies were present, only the most active copy was plotted. The text information over each panel includes the predicted taxonomy (a, Nitrospiria; b, Sulfurifustaceae; c, Brocadiaceae), MAG identifier and estimated % completion/% redundancy. DNRA, dissimilatory nitrate reduction to ammonia.

Planctomycetota MAGs, predicted to couple anaerobic ammonium oxidation to carbon fixation via the WL pathway, exhibited mean transcriptional activities on par with their Sulfurifustaceae MAG counterparts (Figs. 3 and 4c, Extended Data Fig. 5 and Supplementary Information). The elevated transcriptional activity of gene products within the CBB and WL pathways suggests that these taxa play a disproportionately large role in chemolithoautotrophy relative to their DNA-based abundances. Surprisingly, all putative anammox MAGs detected were transcriptionally active in oxic groundwater (Figs. 3 and 4c; wells H41 and H51). Anammox reactions are typically inhibited in the presence of oxygen39, although microbes will still express critically important genes in low oxygen environments40,41.

N-based rate measurements validate carbon fixation rates

To evaluate the relationship between anammox and carbon fixation in anoxic groundwaters, we compared the rates of each in the well harbouring the greatest relative abundance of putative anaerobic ammonium-oxidizing bacteria (well H52). Here, anammox rates of 1.2 ± 0.5 nmol N2 l−1 d−1 were measured, similar to rates in another freshwater aquifer42. Empirical stoichiometric data demonstrate that 1.02 moles of N2 are produced via anaerobic ammonium oxidation for every 0.066 moles of carbon fixed into biomass (CH2O0.5N0.15)43. Assuming equivalent stoichiometry, the rate of carbon fixation via anammox in groundwater would be 0.93 ± 0.39 ngC l−1 d−1, more than 200 times lower than the 220 ngC l−1 d−1 measured. This result is corroborated by metagenomic data that suggest the high rate of carbon fixation in anoxic groundwater is more likely driven by reduced sulfur than by reduced nitrogen.
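The stoichiometric conversion in this paragraph can be retraced as follows. This is a minimal sketch, assuming a carbon atomic mass of 12.011 g mol−1 and one carbon atom per formula unit of CH2O0.5N0.15; the function name is hypothetical, and the standard deviation is scaled by the same constant factor as the mean.

```python
C_ATOMIC_MASS = 12.011          # g/mol; assumed standard atomic mass of carbon

N2_PER_REACTION = 1.02          # mol N2 produced per reaction (from the text)
BIOMASS_C_PER_REACTION = 0.066  # mol CH2O0.5N0.15 formed, one C per formula unit


def anammox_c_fixation(n2_rate_nmol):
    """Convert an N2 production rate (nmol/l/d) into carbon fixed (ngC/l/d)."""
    c_nmol = n2_rate_nmol * BIOMASS_C_PER_REACTION / N2_PER_REACTION
    return c_nmol * C_ATOMIC_MASS  # nmol * g/mol gives ng


rate = anammox_c_fixation(1.2)  # mean anammox rate measured in well H52
sd = anammox_c_fixation(0.5)    # s.d. scales linearly with the constant factor
fold_lower = 220.0 / rate       # relative to the 220 ngC/l/d quoted in the text

print(f"{rate:.2f} ± {sd:.2f} ngC l-1 d-1, about {fold_lower:.0f}-fold lower")
```

With these inputs the sketch returns values that round to the 0.93 ± 0.39 ngC l−1 d−1 stated above.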

Metagenomic and metatranscriptomic data predicted that nearly all the organic carbon produced under oxic conditions in well H41 would be coupled to nitrification. To test this, we monitored the rate of aerobic ammonium oxidation in this well and recorded a mean production of 125.8 ± 5.9 nmol NO2– + NO3– l−1 d−1. Since the most abundant nitrifiers detected were most closely related to complete ammonium-oxidizing bacteria (Supplementary Information), we based our calculations on the 394 mg protein per mol of ammonia growth yields of Nitrospira inopinata, a comammox organism44. Assuming a cellular composition of C5H7O2N (ref. 45) and 55% protein content, we estimated a rate of 48.5 ± 1.9 ngC l−1 d−1, which was well within the range of error for our measured rate of 43 ± 13 ngC l−1 d−1 and confirms the importance of nitrification for carbon fixation at this site. Furthermore, the stoichiometry determined for oligotrophic marine rTCA nitrifiers46 of 0.0216 mol C/mol N agreed, within error, with our calculated ratio of 0.0276 ± 0.0084, indicating that nitrifiers are responsible for most of the fixed carbon.
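The yield-based estimate can be approximated in the same way. The sketch below is a rough reconstruction under stated assumptions, not the authors' calculation: it takes the growth yield, protein fraction and biomass formula from the text, assumes standard atomic masses, and uses hypothetical function names. It arrives at roughly 48 ngC l−1 d−1, close to the reported 48.5; the exact constants and rounding behind the published value are not given here, so a small difference is to be expected.

```python
# Assumed standard atomic masses (g/mol).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "N": 14.007}
BIOMASS_FORMULA = {"C": 5, "H": 7, "O": 2, "N": 1}  # C5H7O2N, from the text


def carbon_mass_fraction(formula):
    """Mass fraction of carbon in a compound given as element -> atom count."""
    total = sum(ATOMIC_MASS[el] * n for el, n in formula.items())
    return ATOMIC_MASS["C"] * formula["C"] / total


def nitrification_c_fixation(n_rate_nmol, yield_mg_protein_per_mol=394.0,
                             protein_fraction=0.55):
    """Estimate carbon fixed (ngC/l/d) from N oxidized (nmol/l/d)."""
    # nmol * (mg/mol) = 1e-9 mg = 1e-3 ng, hence the factor below.
    protein_ng = n_rate_nmol * yield_mg_protein_per_mol * 1e-3
    biomass_ng = protein_ng / protein_fraction
    return biomass_ng * carbon_mass_fraction(BIOMASS_FORMULA)


estimate = nitrification_c_fixation(125.8)
print(f"Estimated carbon fixation: {estimate:.1f} ngC l-1 d-1")
```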

Global estimates for groundwater primary productivity

There are an estimated 22.6 million km3 of groundwater on Earth4, 2.26 and 12.66 million km3 of which are housed in carbonate and crystalline aquifers, respectively. If we assume that our average rates accurately represent carbonate groundwater systems, then 0.108 ± 0.069 PgC (mean ± s.d.) is fixed every year in this global ecosystem (Extended Data Table 3). If the values reported from crystalline aquifers23 are representative of this environment, then another 0.15 ± 0.11 PgC would be fixed there annually. Collectively, the net primary productivity of ~66% of the planet’s groundwater reservoirs would total 0.26 PgC yr−1, approximately 0.5% that of marine systems and 0.25% of global NPP estimates47. As these projections exclude contributions from groundwaters within siliciclastic and volcanic geologic settings and the activities of attached microorganisms, global contributions to the carbon cycle are expected to be many-fold higher.
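The upscaling from a volumetric rate to an annual global flux is a unit conversion. The sketch below illustrates it under explicit assumptions: a 365-day year, 10^12 litres per km3, and the endpoints of the measured rate ranges quoted in the text as inputs (the mean rates actually used are in Extended Data Table 3 and are not reproduced here). The function name is hypothetical.

```python
LITRES_PER_KM3 = 1e12   # 1 km3 = 1e9 m3 = 1e12 litres
DAYS_PER_YEAR = 365     # assumption; the source does not state the year length
GRAMS_PER_PG = 1e15


def annual_fixation_pgc(rate_ugc_per_l_day, volume_million_km3):
    """Scale a rate (ugC/l/d) to an annual flux (PgC/yr) for a water volume."""
    litres = volume_million_km3 * 1e6 * LITRES_PER_KM3
    grams_per_year = rate_ugc_per_l_day * 1e-6 * litres * DAYS_PER_YEAR
    return grams_per_year / GRAMS_PER_PG


# Carbonate aquifers: endpoints of the rate range measured in this study.
carbonate_low = annual_fixation_pgc(0.043, 2.26)
carbonate_high = annual_fixation_pgc(0.23, 2.26)

# Crystalline aquifers: endpoints of the converted literature range.
crystalline_low = annual_fixation_pgc(0.0095, 12.66)
crystalline_high = annual_fixation_pgc(0.0560, 12.66)

# Share of global groundwater covered by the two aquifer types.
covered_fraction = (2.26 + 12.66) / 22.6

print(f"Carbonate: {carbonate_low:.3f} to {carbonate_high:.3f} PgC/yr")
print(f"Crystalline: {crystalline_low:.3f} to {crystalline_high:.3f} PgC/yr")
print(f"Covered share of groundwater: {covered_fraction:.0%}")
```

The reported means of 0.108 and 0.15 PgC yr−1 fall inside the respective ranges produced by these endpoint rates.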

We showed that conservative estimates of carbon fixation rates in a carbonate aquifer reached 10% of the median rates reported in oligotrophic marine surface waters and were up to six-fold greater than those observed in the lower euphotic zone. Within oxic groundwaters, our carbon fixation method was independently validated by nitrification rate measurements. Normalizing carbon fixation rates by estimated bacterial numbers revealed equivalent carbon input (0.3–12 fgC per cell) for both marine and groundwater systems, despite the fact that daily inputs of new POC were 40 times greater in marine waters. This disparity makes sense since trophic webs are simpler in the subsurface, and the export of organic matter is constrained by long water residence times within the aquifer. Complementary metagenomic analyses revealed that groundwater carbon fixation is not dominated by a single functional guild but rather has contributions from diverse pathways and versatile microorganisms that are setting-specific. As the majority of photosynthetically derived carbon in marine systems is labile (half-life <1 day), the findings of this study solicit new hypotheses regarding carbon cycling in the subsurface, particularly those positing newly synthesized carbon rather than surface-derived organic matter as the primary source of fuel for microbiota. Indeed, subsurface primary producers need to be considered as important to ecosystem processes as marine phototrophs are known to be in the surface ocean. Applying these rates of carbon fixation to ecosystem processes alters the way we think about these environments, challenges the importance of surface-derived organic matter fluxes on shallow subsurface functioning and establishes a framework broadly applicable across groundwater systems.

Methods

Site description

Groundwater samples were collected from the Hainich Critical Zone Exploratory (NW Thuringia, Germany)22,48,49. This aquifer assemblage consists of a multistory fractured system composed of alternating layers of limestone and mudstone that developed along a hillslope of Upper Muschelkalk bedrock22. The primary aquifer, represented in this study by wells H41 and H51, is oxic and lies within the Trochitenkalk Formation (moTK). Primarily suboxic to anoxic, mudstone-dominated overhanging strata lie within the Meissner Formation (moM) and are represented here by wells H14 (moM substory 1), H32 (moM substories 5, 6, 7), H43 (moM substory 8) and H52 (moM substories 3, 4). Geochemically, H32 and H41 coalesce into a single cluster while each of the other wells represents a distinct regime. Consistent with previous microbiological characterizations, however, each well studied represented a distinct community state50.

14C–DIC incorporation assay

This method, similar to a sensitive methane oxidation technique previously described51, is a modification of traditional 14C–CO2 primary productivity approaches52 predicated on the sensitivity offered by accelerator-based mass spectrometry. Groundwater was collected in July 2020 during sampling campaign PNK130, as described by ref. 19. After approximately three well volumes had been discharged and physicochemical parameters stabilized, groundwater was collected directly into nine pre-sterilized 2 l borosilicate bottles, from the bottom up. Bottles were then overfilled with greater than two volumes and sealed with gas-tight rubber stoppers. Triplicate samples from each well were then subjected to three treatments. A labelling treatment consisted of 6.77 × 10−7 mmol C–NaHCO3 that contained 200 Bq of activity (50 μCi; American Radiolabeled Chemicals) diluted to 9.38 Bq μl–1 with sterilized milliQ water, adjusted to pH 10 and verified using a scintillation counter. An advantage of this 14C technique is that the small amount of tracer added (representing 0.000006% of the total DIC) did not change the substrate concentration or influence conditions such as pH that could affect microbial populations. Kill controls were prepared in the same way, except 10 ml 50% ZnCl2 (w/v; final concentration 36.7 mM) was added to inhibit microbial activity. Unamended groundwater was also used as a control. All bottles were incubated in the dark at near in situ temperature for ~24 hours. Entire volumes were acidified to pH 4 with 3 M HCl, bubbled with N2 for one hour to remove DIC and then filtered through pre-baked (550 °C, eight hours) quartz fibres (47 mm, 0.3 μm pore size, Macherey–Nagel QN-10) using pre-baked filter stands (EMD Millipore).

Filters were vacuum dried, sealed in quartz tubes with cupric oxide wire under vacuum and combusted at 900 °C for two hours. Evolved CO2 was purified cryogenically, measured as pressure in a known volume to determine C content and reduced to graphite for measurement by accelerator mass spectrometry at the WM Keck Carbon Cycle Accelerator Mass Spectrometry facility53. From the label incorporation and amount of carbon retained on the filters (Supplementary Data File 2), fixation rates were calculated using equation (1):

The technical variation was at most 3.6% (median = 0.78%) of the biological variation for the 14C measurements and was not considered in standard error of the mean calculations. Standard error of the mean was determined for both the 14C-based measurements (difference between two sets of triplicates, label and control, or label and kill controls) and POC measurements (all nine bottles from each well), separately. These errors were then propagated to yield the final error estimations. Analyses of variance and post hoc Tukey honestly significant difference (HSD) tests were conducted on resulting summary statistics (mean ± s.e.m.) using the following utility54. All 14C enrichment values were calculated using the differences between the 200 Bq-labelled samples and the 200 Bq-labelled kill controls. Rates calculated on the basis of no-label addition controls are presented in Extended Data Table 1. Data from global oligotrophic marine systems were included from Supplementary Data Sheet 1 of ref. 26, the Bermuda Atlantic Timeseries years 1988–2016 via FTP27,55 and the Hawaiian Oceanographic Timeseries via FTP56. POC data from both sites were extracted from Dryad datasets generated by refs. 30,31. Bacterial cell number estimates for Hawaiian Oceanographic Timeseries were obtained from the FTP site57.

15N-isotope incubation experiments

Groundwater from wells H41 and H52 was collected in September 2018 and November 2018 to measure nitrification rates and anammox rates, respectively. Briefly, groundwater was collected into sterile glass bottles, from the bottom up, using a sterile tube. Bottles were then overfilled with three volume exchanges and sealed headspace free with silicone septa. Each sample was collected in triplicate alongside one control bottle per well. Samples were kept at 4 °C until they were processed (no more than 2 hours post-collection).

For nitrification measurements, 10 ml was removed from each sampling bottle (total volume 0.5 l) and replaced with N2 to analyse inorganic nitrogen and pH. Groundwater from control bottles was sterile filtered through a 0.2 µm filter (Supor, Pall Corporation). Sterile filtered 15N ammonium sulfate solution (98%, Cambridge Isotope Laboratories), serving as a substrate for ammonia-oxidizing prokaryotes, was then added to a final concentration of 50 µM. Samples were incubated at 15 °C in the dark sans agitation for five days. Ten-millilitre fractions were removed and replaced with N2 at the outset of the experiment and after 12, 24, 48, 70 and 120 hours via filtration through 0.2 µm filters; these fractions were stored at –20 °C for isotopic ratio mass spectrometry analyses. Additional 10 ml fractions were removed at intervals to monitor pH and inorganic nitrogen during the incubation.

For anammox rate measurements, sampling bottles (total volume 1 l) were flushed with N2 under sterile conditions for 30 minutes to remove all remnants of oxygen. Five-millilitre fractions were removed and replaced with N2 from each sample (and control) bottle to assess background 14NH4+ concentrations. Subsequently, samples were spiked with either (1) 50 µM 15NH4+ + 5 µM 14NO2– or (2) 5 µM 15NO2– as previously described58. Control bottles, serving as abiotic controls, were sterile filtered (0.2 µm filters; Supor, Pall Corporation) before flushing and the addition of nitrogen compounds. To facilitate destructive sampling at eight time points, groundwater (30 ml; in triplicate) was dispensed into sterile serum bottles leaving ~8 ml of headspace. Bottles were immediately sealed with butyl septa and crimp sealed and the headspace was purged with He. All bottles were then incubated in the dark at 15 °C without agitation, and incubations were terminated after 0, 12, 24, 36, 48, 60, 72 and 96 hours by adding 300 µl 50% (w/v) aqueous zinc chloride solution.

Nitrification rates were determined on the basis of 15NO2– + 15NO3– production in incubations with 15NH4+. 15NO2– and 15NO3– were converted to N2 via cadmium reduction followed by a sulfamic acid addition59,60. The N2 produced (14N15N and 15N15N) was analysed on a gas chromatography isotope ratio mass spectrometer as previously described61. Rates were evaluated from the slope of the linear regression of 15N produced with time and corrected for the fraction of the NH4+ pool labelled in the initial substrate pool. The production of 15N-labelled N2 from anammox was analysed on the same isotope ratio mass spectrometer as for nitrification rates and calculated as described62. Note that denitrification was not detected in any of the 15NO2– incubations. t-tests were applied (P < 0.05) to assess whether rates were significantly different from zero (Extended Data Fig. 3).

DNA extraction and sample preparation

Samples used to generate metagenomic libraries were collected in January 2019 during sampling campaign PNK110. For each sample replicate, approximately 50–100 l of groundwater was filtered sequentially through 0.2-µm- and 0.1-µm-pore-sized polytetrafluoroethylene (PTFE) filters (142 mm, Omnipore Membrane, Merck Millipore; Supplementary Data File 3). With the exception of H32 (which did not yield sufficient volumes), each well was sampled in triplicate. H32 was duplicated using a sample previously collected during campaign PNK108 (November 2018). Filters were frozen on dry ice and stored at –80 °C before extraction. DNA was extracted using a phenol-chloroform-based method, as previously described63, and resulting DNA extracts were purified using a Zymo DNA Clean & Concentrator kit. Metagenome libraries were generated with a NEBNext Ultra II FS DNA library preparation kit, in accordance with the manufacturer’s protocols. DNA fragment sizes were estimated using an Agilent Bioanalyzer with DNA 7500 or High Sensitivity kits, depending on DNA concentrations and protocol recommendations (Supplementary Data File 3). Sequencing of the 32 samples was performed at the Core DNA Sequencing Facility of the Fritz Lipmann Institute using an Illumina NextSeq 500 system (2 × 150 bp). Resulting metagenomic library sizes ranged from 16.4 to 22.1 Gbp (mean = 19.6 Gbp; Supplementary Data File 3), and raw data were deposited into the European Nucleotide Archive under project PRJEB36523.

Metagenomic assembly and binning

Adaptors were trimmed and raw sequences subjected to quality control processing using BBduk v.38.51 (ref. 64). Assembly and binning were performed as previously described65. Briefly, all libraries were independently assembled into scaffolds using metaSPAdes v.3.12 (ref. 66), all of which were taxonomically classified per ref. 65. For individual assemblies, open reading frames (ORFs) were identified using Prodigal v.2.6.3 in meta mode67. To generate coverage profiles, all quality-assessed and quality-controlled (QAQC) sequences from each of the 32 metagenomic libraries were mapped back to each of the 32 scaffold databases using Bowtie2 v.2.3.4.3 in the sensitive mode68.

Scaffolds were binned using differential coverages and tetranucleotide frequencies with Maxbin2 (ref. 69). In addition, ESOM and abawaca70 were used for both manual and automatic binning, based on tetranucleotide sequence signatures, using 3 kbp and 5 kbp or 5 kbp and 10 kbp as minimum scaffold sizes, respectively. DAS Tool71 was used with default parameters to reconcile resulting bin sets. Complete sets of bins from each of the samples were dereplicated using dRep v.2.4.0 (ref. 72). All scaffolds, bin assignments, ORF predictions and taxonomic annotations were then imported into Anvi’o v.6.0 (ref. 73). Each of the resulting 1,275 bins was manually curated in Anvi’o v.6, considering both coverage and sequence compositions. In the end, 1,224 bins passed the 30% completeness (median = 61%, IQR = (49%,73%)) and 10% redundancy (median = 0%, IQR = (0%,1.4%)) quality thresholds.
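The final quality gate amounts to two threshold comparisons per bin. The sketch below illustrates that step only; the `Bin` class, the function name and the example bins are hypothetical, and treating both thresholds as inclusive is an assumption, since the text does not say how boundary values were handled.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    name: str
    completeness: float  # percent
    redundancy: float    # percent


def passes_quality(b, min_completeness=30.0, max_redundancy=10.0):
    """Return True if a bin meets both thresholds (boundaries assumed inclusive)."""
    return b.completeness >= min_completeness and b.redundancy <= max_redundancy


# Hypothetical bins, for illustration only.
bins = [
    Bin("example-bin-A", completeness=61.0, redundancy=0.0),
    Bin("example-bin-B", completeness=25.0, redundancy=1.4),
    Bin("example-bin-C", completeness=73.0, redundancy=12.0),
]

retained = [b.name for b in bins if passes_quality(b)]
print(retained)
```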

Characterizations of the MAGs

ORFs originating from all of the resulting MAGs were annotated using kofamscan74 with the ‘detail’ flag, and KO annotations were filtered using a custom script (https://git.io/JtHVw). This utility preserves hits with scores of at least 80% of the kofamscan defined threshold, as well as those exhibiting a score >100 if there is no threshold. We elected to relax the default thresholds since all MAGs representing putatively chemolithoautotrophic microbes were verified manually, and we noticed that the best reciprocal blast hits with known reference sequences routinely scored below the kofamscan thresholds; that is, we favoured false positives over false negatives since we included a secondary verification step.
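The relaxed filtering rule described above can be expressed compactly. The following is a sketch of the stated rule, not the custom script itself: the function name and parameters are hypothetical, and a KO without a defined threshold is represented here as `None`.

```python
from typing import Optional


def keep_hit(score: float, threshold: Optional[float],
             relax: float = 0.8, fallback_score: float = 100.0) -> bool:
    """Decide whether a KO hit is retained under the relaxed rule.

    A hit is kept if its score is at least `relax` times the KO-specific
    threshold, or, when no threshold exists, if its score exceeds
    `fallback_score`.
    """
    if threshold is None:
        return score > fallback_score
    return score >= relax * threshold


# Hypothetical (score, threshold) pairs, for illustration only.
examples = [(85.0, 100.0), (79.9, 100.0), (101.0, None), (100.0, None)]
for score, threshold in examples:
    print(score, threshold, keep_hit(score, threshold))
```

Relaxing the threshold in this way admits more false positives, which is consistent with the manual verification step the text describes for every putative chemolithoautotroph.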

KEGGDecoder75 was used to assess the metabolic potential of five of the primary chemolithoautotrophic pathways: the CBB cycle, the WL pathway, the reverse citric acid cycle, the 4-hydroxybutyrate 3-hydroxypropionate pathway and the 3-hydroxypropionate bi-cycle. MAGs were examined in greater depth if a given pathway was >50% complete. The MAGs representing potential chemolithoautotrophs were re-annotated using the online BlastKoala server76 with essential steps verified through blast77 against the RefSeq database. A collection of HMM models was used to determine which form of Rubisco was detected, along with potential hydrogenases37. Using blastp77, dissimilatory bisulfite reductases (dsrAB) were compared with a database compiled by ref. 78 to predict whether the pathway operated in an oxidative or reductive manner. Blast was used to compare gene hits for narGH/nxrAB (nitrate reductase/nitrite oxidoreductase) with a custom database based on sequences presented within ref. 79.

All QAQC reads were remapped to a database consisting of only contigs of dereplicated MAGs. Normalized coverages for each of the MAGs were determined by scaling the resulting Anvi’o-determined coverages on the basis of the number of RNA polymerase B (rpoB) genes identified in the QAQC filtered reads. RpoB sequences were identified using ROCker with the precomputed model80. Scaling factors were calculated by dividing the maximum number of rpoB identified in the 32 metagenomic libraries by the number of rpoB detected in each sample. Reported values represent averages of the triplicates/replicates, unless stated otherwise. The taxonomy of each MAG was evaluated using the GTDB_TK tool kit81 in concert with the Genome Taxonomy Database (release 89)82,83 and its associated utilities67,84,85,86,87,88. Single-copy marker genes were identified and aligned with GTDB_TK for all bacterial MAGs, and a phylogenetic tree of the concatenated alignment was constructed using FastTree2 v.2.1.10 in accordance with the JTT + CAT evolutionary model. The resulting phylogenetic tree was then imported into iToL89 for visualization, and all MAGs were subjected to growth rate index analysis within each metagenomic library90.
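The rpoB-based scaling can be sketched as follows. This is an illustration of the stated rule (maximum rpoB count across libraries divided by the count in each library), with hypothetical function names, sample labels and counts; it is not the code used in the study.

```python
def rpob_scaling_factors(rpob_counts):
    """Map each library to max(rpoB count) / its own rpoB count."""
    max_count = max(rpob_counts.values())
    return {sample: max_count / count for sample, count in rpob_counts.items()}


def normalize_coverage(coverage, sample, factors):
    """Scale a raw MAG coverage by the library's rpoB scaling factor."""
    return coverage * factors[sample]


# Hypothetical rpoB read counts per metagenomic library.
counts = {"lib_A": 2000, "lib_B": 1000, "lib_C": 500}
factors = rpob_scaling_factors(counts)

print(factors)
print(normalize_coverage(12.5, "lib_C", factors))
```

Scaling to the library with the most rpoB reads leaves that library unchanged (factor of 1) and scales all others upward, so coverages become comparable across libraries of different effective depth.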

Previously generated mRNA-enriched and post-processed metatranscriptomic libraries were procured from project PRJEB28783 (ref. 91). The groundwater source of these metatranscriptomes was collected in August and November 2015. QAQC filtered reads were mapped to MAGs using Bowtie2 v.2.3.5 in sensitive mode68, and the total number of rpoB transcripts from each metatranscriptomic library was determined, as described above for metagenomes. The transcriptomic coverages for each ORF from each MAG were determined using Anvi’o v.6 and normalized via scaling-factor calculations based on the total number of rpoB reads from the original metatranscriptome library (the coverage of each ORF from each MAG was normalized to a community-wide estimate of the transcriptional activity of a housekeeping gene in each sample). Means were determined considering all of the metatranscriptomes generated from a given well, including different sampling time points. While well H32 was sampled only once, mean values from all other wells account for three to four metatranscriptome coverages each. In addition, an average of the resulting normalized coverages for each MAG from each sample (sum of the MAG transcriptional coverage divided by the number of ORFs) was determined to estimate the relative transcriptional activity of the MAGs across the transect. Data were compiled and processed using R v.3.5.2 with Rstudio v.1.1.463 (refs. 92,93) and the tidyverse package94, and colour schemes were generated using the RColorBrewer utility95. All MAGs were deposited in project PRJEB36505’s data repository.
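The per-MAG transcriptional activity estimate (summed ORF coverage divided by the number of ORFs, then averaged across the libraries from a well) can be sketched as below. Function names and all numbers are hypothetical, and the sketch assumes that ORFs without mapped transcripts count as zero coverage rather than being dropped.

```python
def mean_mag_transcript_coverage(orf_coverages, scaling_factor=1.0):
    """Scaled sum of ORF transcript coverages divided by the number of ORFs."""
    if not orf_coverages:
        raise ValueError("MAG has no ORFs")
    return scaling_factor * sum(orf_coverages) / len(orf_coverages)


def well_mean(library_values):
    """Average a MAG's per-library activity over all libraries from one well."""
    return sum(library_values) / len(library_values)


# Hypothetical ORF coverages for one MAG in two libraries from the same well.
lib1 = mean_mag_transcript_coverage([0.0, 4.0, 8.0], scaling_factor=2.0)
lib2 = mean_mag_transcript_coverage([1.0, 1.0, 4.0], scaling_factor=1.0)

print(lib1, lib2, well_mean([lib1, lib2]))
```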