Main

Rivers are considered the most renewable, most accessible and, hence, most sustainable source of freshwater. Accordingly, several studies have sought to quantify the water in our world’s rivers1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20. Yet, surprisingly little is known about the average and temporal variability of global river water storage, as well as the temporal variability of global river flows. Nearly all estimates of global river water storage1,2,3,5,6,7,13,20 trace back to a report published as part of the UNESCO International Hydrological Decade from 1965 to 1974 (ref. 2), and only one study quantifies temporal variability20. Notably, to our knowledge, there are no previously published time series of global river storage. A recent study mentioned such computations, but it focused on their participation in total terrestrial water storage variability20. Meanwhile, estimates of average global river flows are ubiquitous1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19; however, the temporal variability of flows has received much less attention and reported values have considerable spread14,15,18,19. Consequently, our understanding of global river water storage and of temporal variability of global river flows has so far been limited. A more complete characterization of global historical river discharge and water storage is therefore critical to advancing our understanding of the world’s waters.

In situ stream gauges around the world provide key information on the spatial and temporal distribution of river discharge. However, the spatial and temporal coverage of in situ measurements is severely limited. Gauges are sparsely distributed globally, with placement bias towards specific environmental conditions (for example, large rivers21). Moreover, data sharing constraints across political boundaries, in combination with a worldwide decrease in stream gauge reporting, have further constrained the number of river discharge observations available for scientific research22,23,24,25,26. Modelling approaches such as river network routing, which at global scales commonly uses gridded runoff from a land surface model (LSM) as input, can be used to seamlessly estimate river discharge and water storage around the world, including in ungauged basins27. Yet, the quality of river water estimates produced from these models is greatly influenced by the resolution of the underlying hydrography28 and by uncertainties present in the input runoff data29. For global river discharge and water storage estimates to be most useful, the uncertainties in these simulations must be constrained by observations28,29. Substantial progress has been made to correct for uncertainties and biases from runoff inputs that influence river discharge outputs in regional, continental and global river network routing28,30,31,32,33. However, the majority of past correction approaches are either computationally expensive31 (limiting the geographic extent over which they can be applied) or rely on modelled reference data that contain errors28,30,32. Given that surface and subsurface runoff are not directly measurable at the global scale, discharge observations from in situ stream gauge networks provide the closest proxy.
Seeking to combine the strengths of models and in situ observations, one study leveraged runoff fields and observations at river outlets (that is, a ‘hybrid’ approach) to generate global discharge estimates16. However, surprisingly, a global hybrid methodology has yet to be developed that leverages gauges beyond river outlets to produce high-resolution estimates of river discharge that are spatially seamless and match average monthly observations where available. In this Article, we derive such a method and apply it to generate the first globally corrected monthly river flow and storage dataset, which we name Mean Discharge Runoff and Storage (MeanDRS). Using MeanDRS, we quantify total discharge to the ocean, reconciling the wide range in previous estimates of monthly variability in continental flow. Our study also identifies previously underappreciated freshwater sources to the ocean, and produces the first time series of global river water storage, demonstrating residence time as a fundamental driver of river water stores.

Global discharge aided by observations

We used a global database of monthly discharge observations at 998 locations along with monthly runoff outputs from an ensemble of LSMs for 1980–2009 to bias-correct simulated runoff and route it through a recent high-resolution river network containing ~3 million reaches28,30. Our novel bias correction approach, called long-term inverse routing (LTIR; Methods), compares average discharge simulations and average observations to calculate temporally constant multiplicative correction factors for runoff in all river reaches located upstream of available discharge observations. These corrections are then applied at each time step to generate global 30-year (1980–2009) corrected monthly estimates of mean river discharge extrapolated from and consistent with in situ observations. We evaluated the geographic distribution of common discharge metrics (for example, normalized bias, Nash–Sutcliffe efficiency (NSE)) at the monthly time scale (Extended Data Figs. 1–3) and found that the bias-corrected estimates provided a better statistical match to observed discharge at gauges than uncorrected estimates; 99% of the 998 gauges showed no normalized bias (the difference between simulated and observed temporal average, relative to the observed temporal average), as expected from bias correction (see ‘Model validation’ section in Methods for an explanation of the remaining 1%). Our corrections also led to improvements in other metrics: 55% of gauges showed improvements in normalized standard deviation of error (NSTDERR), 75% in normalized root mean square error (NRMSE) and 75% in NSE (a measure of how well the simulated time series matches the observed). The improvements in simulation metrics are further visually confirmed by discharge time series (see Extended Data Fig. 4 for example hydrographs before and after correction). More details are provided in Methods and Supplementary Information, including independent validation results.
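The two metrics that the text defines explicitly can be illustrated with a minimal sketch. It assumes a single gauge and a single temporally constant multiplicative factor, which is a simplification of LTIR (LTIR derives one factor per gauge subbasin); the monthly values are hypothetical and the NSE formula is the standard definition.

```python
import numpy as np

def normalized_bias(sim, obs):
    """Difference between simulated and observed temporal averages,
    relative to the observed temporal average (definition from the text)."""
    return (np.mean(sim) - np.mean(obs)) / np.mean(obs)

def nse(sim, obs):
    """Standard Nash-Sutcliffe efficiency: 1 minus the ratio of squared
    errors to the squared deviations of observations about their mean."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Hypothetical monthly discharge at one gauge (m3 s-1)
obs = np.array([100.0, 150.0, 200.0, 150.0])
sim = np.array([120.0, 260.0, 280.0, 240.0])   # biased high by 50% on average

# Temporally constant multiplicative factor from long-term means
factor = obs.mean() / sim.mean()
corrected = factor * sim

print(normalized_bias(sim, obs), nse(sim, obs))
print(normalized_bias(corrected, obs), nse(corrected, obs))
```

In this toy case, matching the long-term mean removes the normalized bias entirely and also raises NSE, although the corrected series still differs from the observations month by month.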

Evaluating the sign and magnitude of our corrections along with the spatial distribution of river discharge estimates globally enables us to understand and visualize the impact of our gauge-based corrections (Fig. 1). Global estimates of river discharge averaged across 30 years of monthly simulations ranged from 0 to 192,683 m3 s−1 across all river reaches for uncorrected simulations (Extended Data Fig. 5). Using our LTIR approach (Methods), we developed multiplicative runoff correction factors for 29% of river reaches globally. The remaining 71% of river reaches were not located upstream of a gauge used in this analysis. Correction factors can be positive or negative. Positive factors between 0 and 1 lead to a decrease in discharge, and those greater than 1 lead to an increase in discharge. Negative correction factors are indicative of hydrological inconsistencies in the gauge network where upstream discharge is greater than downstream discharge, that is, a distinct fingerprint of anthropogenic water withdrawals. Note that the magnitude of hydrological inconsistencies, and therefore withdrawals, is a direct result of gauge observations and can therefore be trusted. Of the correction factors developed, 95% were positive (51% of which led to a decrease in discharge and the remaining 49% to an increase) and 5% were negative. After applying LTIR and then propagating corrections downstream via routing through the river network, 31% of river reaches were impacted globally, which is 918 times more river reaches than gauges (the 998 gauges used in this analysis cover 0.03% of global river reaches). Our average corrected discharge estimates ranged from −12,064 to 195,849 m3 s−1 (Fig. 1a), with negative discharges due to the aforementioned hydrological inconsistencies. 
These seemingly erroneous negative discharge values are not only justifiable from the mass conservation (that is, water balance) perspective, they also specifically highlight regions of the world characterized by intense water management34,35,36,37, including water withdrawals: the south-western United States, south-eastern Australia and countries in South America (for example, Brazil, Peru and Colombia) and Africa (South Africa, Botswana and Namibia; Fig. 1a). Our methodology can therefore be used to detect severe anthropogenic water withdrawals.

Fig. 1: Global 30-year mean river discharge, corrected using mean observations.

a, Corrected global river discharge estimates. Black arrows point to locations that result in negative river discharges, which are indicative of the human footprint on the water cycle. b, The difference between corrected and uncorrected estimates. Green areas indicate positive differences and, therefore, locations where the corrected simulations resulted in greater discharge estimates than uncorrected. Brown areas indicate negative differences and, therefore, locations where the corrected simulations resulted in smaller discharge estimates than uncorrected. Crosshairs in Antarctica indicate no data. Maps created in QGIS, using graticules from Natural Earth.

Differences in 30-year average river discharge between corrected and uncorrected simulations ranged from −20,432 to 63,694 m3 s−1 (Fig. 1b). Positive differences indicate that the average corrected estimates are higher than the average uncorrected estimates, while negative differences indicate that corrected estimates are lower than uncorrected. Most of the world’s river reaches (69%) display no difference between average corrected and uncorrected simulations, largely due to the lack of available gauge data in those locations. Positive differences (15% of river reaches) are located in northern Europe, north-eastern and north-western United States (including Alaska), Canada, Russia, Peru, Bolivia and portions of western Brazil, Argentina, Chile, India, Australia, Japan and New Zealand. Negative differences (16% of river reaches) are located across the southern and central United States, central Canada, eastern Brazil, Argentina, Paraguay, various countries in Africa (for example, South Africa and Democratic Republic of the Congo) and Europe (for example, Spain and Poland), Iran, Australia and India. Note that using even just a few gauges (for example, Africa and Australia; Extended Data Fig. 1) can lead to large differences in our estimates of basin-level discharge (Fig. 1b). Further visualization of our corrections as a percentage of average river discharge confirms spatial consistency in our correction factors for each gauge subbasin, while also highlighting inland subbasins that underwent large positive and negative corrections (for example, south-western United States and Russia; Supplementary Fig. 1).

We expect that the spatial information gained from our corrections will be important for the land surface modelling community to evaluate and calibrate simulated runoff outputs, an otherwise unobservable quantity. The corrected discharge can also provide an estimate of the state-of-the-art in global river discharge, offering future opportunities for comparison with the Surface Water and Ocean Topography (SWOT) mission that has begun such retrievals from space22. SWOT may also show similar evidence of severe water withdrawals—the spatial coverage of space-based estimates from SWOT and sparser ground observations used with LTIR can together be leveraged to document and quantify the human footprint on the water cycle.

Total discharge to the global ocean

Given that continental river discharge into oceans is a key feature of Earth’s water cycle, we summed all discharge values at coastal river termini globally (with the exception of Antarctica) and plotted the monthly time series for 1980–2009 (Fig. 2). After confirming that the ensemble of LSMs outperforms simulations from any single LSM, we found that the temporal average and monthly variability (that is, standard deviation) of water discharged into the ocean from rivers is 37,808 ± 6,704 km3 yr−1 (average ± variability) for uncorrected and 37,411 ± 7,816 km3 yr−1 for corrected simulations. While the LTIR approach results in large differences in discharge at the river reach scale (Fig. 1b), uncorrected and corrected estimates produce similar results when globally summed at the coast (positive and negative differences cancel each other out on spatial aggregate), hence building confidence in the global sum even if some regions did not benefit from corrections due to lacking observations. Both uncorrected and corrected averages of total ocean discharge in our study are consistent with previous estimates in the literature, which range from 29,485 to 45,900 km3 yr−1 (Extended Data Table 1)1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19. These values are also encompassed within our simulated monthly variability (Fig. 2). The topic of monthly variability in total discharge to the ocean has received much less attention, and reported values range from 4,800 to 16,164 km3 yr−1 (refs. 15,18). Other studies did not report variability but include graphical time series that allow making an inference (1,116 km3 yr−1 (ref. 19) and 3,606 km3 yr−1 (ref. 14)). The previously reported and inferred estimates are equivalent to a monthly variability that is between 3% and 45% of the corresponding reported temporal average (Extended Data Table 1). 
Our own monthly standard deviations for uncorrected (6,704 km3 yr−1) and corrected (7,816 km3 yr−1) simulations are equivalent to 18% and 21% of the respective temporal averages and therefore fall in the middle of the range from the limited number of prior studies. Yet, our comparison of coefficients of variation between our simulations and observations (Extended Data Fig. 3 and Methods) suggests that the magnitude of our discharge variability generally matches observations (where available) and can hence help reconcile the sizable range in temporal variability among prior values.
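The percentages quoted above follow directly from the reported averages and standard deviations, as the short sketch below shows. The unit conversion helper is an illustrative addition that assumes a 365.25-day year.

```python
SECONDS_PER_YEAR = 365.25 * 86400  # assumption: Julian year

def m3s_to_km3yr(q_m3s):
    """Convert a flow in m3 s-1 to km3 yr-1 (1 km3 = 1e9 m3)."""
    return q_m3s * SECONDS_PER_YEAR / 1e9

# (average, standard deviation) of total ocean discharge in km3 yr-1,
# as reported in the text
reported = {
    "uncorrected": (37808.0, 6704.0),
    "corrected": (37411.0, 7816.0),
}

# Variability as a percentage of the temporal average
cv_percent = {name: 100.0 * std / mean for name, (mean, std) in reported.items()}

for name, value in cv_percent.items():
    print(f"{name}: {value:.1f}%")
```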

Fig. 2: Monthly variability in total discharge to the global ocean (except for Antarctica and endorheic basins).

Uncorrected and corrected global monthly discharge accumulated into the ocean for 1980–2009. The estimated average is indicated as horizontal coloured dotted (uncorrected) and solid (corrected) lines. An arrow in proportion to the plot units on the left interior of the plot indicates temporal variability (that is, one standard deviation). Previous long-term estimates in the literature (2009 and after) are indicated with a grey line and associated reference on the right exterior. Additional previous estimates in the literature before 2009 are included in Extended Data Table 1.

We also estimated which hydrologic regions contribute most to global discharge and variability (Fig. 3). Our findings are mostly consistent with previous knowledge (Methods); however, we found that the Maritime Continent (Indonesia, Malaysia and Papua New Guinea) discharges 8% of the global total, that is, the equivalent of 1.6 times the Congo River (5%). Little attention has been given to Maritime Continent basins11,16,19 in previous global discharge studies; however, one study found that the islands of Oceania and Southeast Asia are important contributors of ocean discharge, thus supporting our finding16. We suspect the region might have escaped prior scrutiny in part because its largest rivers and streams (including Mahakam, Kapuas, Sepik and Fly) are poorly observed (for example, Fly River is one of the largest rivers in the world, yet it is ungauged). Accounting for these large aggregate water fluxes from the Maritime Continent could impact ocean circulation models and change our understanding of carbon/sediment/solute delivery to the ocean. Note that, in our analysis, the Maritime Continent did not benefit from corrections, and therefore the simulations were not constrained by observations, which may influence the accuracy of estimates. However, additional analysis further supports our finding (Supplementary Text 5)38,39.

Fig. 3: Global ocean discharge by basin.

a,b, Corrected estimates of global hydrologic contributors to average (a) and variability (b) of discharge accumulated into the ocean for 1980–2009. Crosshairs in Antarctica indicate no data. Maps created in QGIS, using graticules from Natural Earth.

Implications for river water storage

To provide a global assessment of the spatial and temporal variations of river water storage, we produced monthly estimates for each river reach (including those in endorheic basins), which we summed spatially (Fig. 4) and averaged temporally (Fig. 5). Our estimates of storage depend on river flow wave residence time and on the monthly discharge at each river reach (see equation (15) in Methods). Residence time is computed as river length divided by the speed (that is, celerity) of river flow waves. Wave celerity, despite being unobserved, is perhaps the most fundamental parameter of global river routing models, and a range of values tends to be used at such scale40. We used three characteristic values of celerity for each river reach, with resulting residence times (short, medium and long), to calculate a range of possible storage estimates. Note that our analysis assumes spatial consistency in short, medium and long residence times, while the world’s rivers and streams are likely to include a distribution thereof. This conscious assumption allows us to draw lower and upper bounds for potential storage and its temporal variability and, hence, to constrain expected values. We found that the global average and monthly variability (standard deviation) of river water storage is 1,246 ± 225 km3 (short residence time), 2,181 ± 394 km3 (medium) and 3,116 ± 564 km3 (long) for uncorrected simulations (Fig. 4). For corrected simulations (also in Fig. 4), storage average and variability are 1,283 ± 288 km3 (short), 2,246 ± 505 km3 (medium) and 3,208 ± 721 km3 (long). Both uncorrected and corrected estimates of average water storage are of the same order of magnitude as previous estimates in the literature, which range from 1,200 to 2,858 km3 (refs. 1,2,3,5,6,7,13,20). Our storage estimates by basin (Fig. 5) are also commensurate with the limited number of prior studies. For further evaluation of our estimates, see Supplementary Text 7 (refs. 20,41).
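The dependence of storage on residence time and discharge can be sketched as follows. Equation (15) is not reproduced in this section, so the sketch assumes the simplest form consistent with the description: storage in a reach equals its discharge multiplied by its residence time (reach length divided by celerity). The reach values and the three celerities are hypothetical and are not the values used in the study.

```python
import numpy as np

def river_storage_km3(discharge_m3s, length_m, celerity_ms):
    """Assumed form: storage = discharge x residence time,
    with residence time = reach length / flow wave celerity."""
    residence_time_s = np.asarray(length_m, float) / celerity_ms
    return np.asarray(discharge_m3s, float) * residence_time_s / 1e9  # m3 -> km3

# Hypothetical reaches: monthly discharge (m3 s-1) and length (m)
discharge = np.array([50.0, 400.0, 12000.0])
length = np.array([8000.0, 15000.0, 30000.0])

# Hypothetical celerities (m s-1); slower waves give longer residence times
celerities = {"short": 1.0, "medium": 0.5, "long": 0.25}

totals = {
    name: river_storage_km3(discharge, length, c).sum()
    for name, c in celerities.items()
}
print(totals)
```

Under this assumed form, halving the celerity doubles both the mean storage and its temporal variability, which is consistent with the scaling between the short, medium and long estimates reported above.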

Fig. 4: Global river water storage (except in Antarctica).

Uncorrected and corrected global monthly water storage for each flow wave residence time for 1980–2009. The estimated average is indicated as horizontal coloured dotted (uncorrected) and solid (corrected) lines. An arrow in proportion to the plot units on the left interior indicates temporal variability (that is, one standard deviation). Previous long-term estimates in the literature are indicated on each plot with a grey line and associated reference on the right exterior.

Fig. 5: Global river water storage by basin.

a,b, Corrected estimates of global hydrologic contributors to average (a) and variability (b) of river water storage for 1980–2009. Values are displayed for the medium flow wave residence time. Crosshairs in Antarctica indicate no data. Maps created in QGIS, using graticules from Natural Earth.

The impact of wave celerity on discharge computations has long been known40. However, given the considerable range we found in mean water storage across residence times (Fig. 4), we suggest that knowledge of flow wave propagation celerity is as critical as river discharge to accurately estimating global river water storage, which has so far not been appreciated. Longer residence times (that is, slower flow wave celerity) lead to larger estimates of mean global river water storage, but also to greater temporal variability (Fig. 4). One of the future challenges in global river water science is therefore bound to be the accurate estimation of residence times, a globally unobserved quantity that is currently determined from empirical equations42. Ongoing satellite measurements of changes in surface water storage22 can be expected to narrow the likely range of variability, thus helping refine understanding of wave propagation in Earth’s rivers, an unanticipated benefit beyond SWOT mission requirements43. In turn, accurate estimates of wave celerity may help support flood warning systems42.

Methods

Hydrography

Several hydrography datasets can serve as the river network for global routing. Here, we used the vector-based river network called Multi-Error-Removed Improved Terrain (MERIT) Hydro (v0.7) Basins (v1.0) due to the high spatial resolution of its underlying digital elevation model (~90 m) and to its geographic coverage above 60° N (refs. 28,30,44). MERIT Hydro Basins was derived from the MERIT Hydro digital elevation model by using a 25 km2 channelization threshold, which resulted in ~3 million river reaches and catchments globally, as well as 61 hydrologic regions. The MERIT Hydro Basins dataset also contains derived attributes for each individual reach polyline and each associated catchment polygon (for example, reach length, downstream reach and catchment contributing area).

Runoff estimates

The primary dynamic (that is, time variable) input file for our monthly routing was the lateral inflow into each river reach. To partially alleviate uncertainty in the runoff outputs from different LSMs, we used an ensemble of three LSMs from version 2.0 of the Global Land Data Assimilation System (GLDAS; Supplementary Text 1)45. Specifically, we averaged the sum of monthly surface and subsurface runoff outputs from Variable Infiltration Capacity46,47, Catchment Land Surface Model48 and Noah49,50,51, all of which have a spatial resolution of 1°. The gridded runoff was converted into lateral inflow to each river reach using a centroid-based approach—the centroid of each catchment is used to identify the corresponding LSM grid cell52 before multiplying the runoff by the area of the catchment. The three-model ensemble average inflow was calculated on a monthly time step across the 30-year period of interest for this study, which is January 1980 to December 2009.
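The centroid-based conversion described above can be sketched as follows. The grid origin, the function names and the assumption that runoff is expressed as a depth rate in m s−1 are illustrative choices, not details taken from the study.

```python
import numpy as np

def cell_index(lon, lat, lon0=-180.0, lat0=-90.0, res=1.0):
    """Row and column of the grid cell containing a point, assuming a
    regular grid whose first cell starts at (lon0, lat0)."""
    col = int(np.floor((lon - lon0) / res))
    row = int(np.floor((lat - lat0) / res))
    return row, col

def lateral_inflow(runoff_grids, centroids, areas_m2):
    """Ensemble-average runoff (m s-1) sampled at each catchment centroid
    and multiplied by the catchment area (m2) to give inflow in m3 s-1."""
    ensemble = np.mean(runoff_grids, axis=0)      # average across models
    inflow = np.empty(len(centroids))
    for i, ((lon, lat), area) in enumerate(zip(centroids, areas_m2)):
        row, col = cell_index(lon, lat)
        inflow[i] = ensemble[row, col] * area
    return inflow

# Three hypothetical 1-degree runoff grids for one month (m s-1)
grids = np.stack([np.full((180, 360), v) for v in (1e-8, 2e-8, 3e-8)])
grids[:, 130, 74] *= 2.0                          # one wetter cell

centroids = [(-105.3, 40.2), (10.7, -3.5)]        # hypothetical (lon, lat)
areas = np.array([25e6, 40e6])                    # catchment areas (m2)

inflow = lateral_inflow(grids, centroids, areas)
print(inflow)
```

A centroid lookup assigns each catchment to exactly one grid cell, which is simple and fast for ~3 million catchments but ignores catchments that straddle several cells.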

Discharge observations

We compiled an extensive global database of in situ gauges (for bias correction and model evaluation) by collecting daily gauge data from a combination of international and national organizations (Supplementary Table 1)53,54,55,56,57,58,59,60,61,62,63,64,65. We removed gauges located within ~100 m of each other, on the basis that the same exact gauge from the same organization can be included in multiple independent databases, hence filtering for duplicates66. In total, 45,837 daily gauge records were collected. We subset the database to only those gauges that had 95% daily availability for the 1980–2009 study period and to those with an average discharge greater than or equal to 100 m3 s−1 (based on the 1980–2009 average; n = 1,148). To identify the locations of river gauges along the MERIT Hydro river network, we mapped the gauges using a distance buffer, as well as order of magnitude and duplicate gauge checks (Supplementary Text 2). The mapping process resulted in a final dataset of 1,001 gauges, three of which did not benefit our corrections due to limitations in our method (see ‘Discharge corrections’ section). Since we are modelling at a monthly temporal resolution, we calculated monthly average discharge for each gauge across the 1980–2009 time period (360 time steps).

Given that potential temporal trends in discharge observations cannot be explicitly accounted for in our long-term correction method (see ‘Discharge corrections’ section), we performed a trend analysis and identified that 39% of the gauges had a statistically significant trend, although of minimal magnitude (Supplementary Text 3). We note that this is a limitation of the methodology.

Discharge estimates

Lateral inflow can be used with the monthly mass conservation equation to determine monthly average flows throughout a river network with r river reaches, with limited negative impact from neglecting horizontal transfer times29. This is traditionally done through lumped river models as

$$({I}-{N}\;)\times {\mathbf{Q}}={{\mathbf{Q}}}^{{\mathrm{e}}},$$
(1)

where I is the r × r identity matrix, N is the r × r river network matrix (for example, ref. 27), Qe is an r-sized vector of external lateral inflows (e) entering each river reach, and Q is an r-sized vector of river discharge outflows exiting each reach. Given that lumped river models accumulate runoff from upstream to downstream without accounting for horizontal travel time from land to rivers or within the river system, we applied the model on a monthly time step. Lumped routing at this time scale can produce a fair approximation of discharge except for the largest basins of the world; however, it is fair even at the scale of the Colorado and Columbia river basins29. The monthly ensemble average inflow was used as input to the lumped routing model to generate uncorrected ensemble river discharge estimates across the 30-year period.
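Equation (1) can be illustrated on a small hypothetical network. The sketch below assumes that N has a 1 in row i and column j when reach j flows directly into reach i, so that N × Q gives the inflow from upstream reaches; the five-reach topology and inflow values are invented for illustration.

```python
import numpy as np

# Hypothetical five-reach network: 0 -> 2, 1 -> 2, 2 -> 4, 3 -> 4
r = 5
N = np.zeros((r, r))
for upstream, downstream in [(0, 2), (1, 2), (2, 4), (3, 4)]:
    N[downstream, upstream] = 1.0

# External lateral inflow entering each reach for one month (m3 s-1)
Qe = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Equation (1): (I - N) Q = Qe, solved for the discharge Q
Q = np.linalg.solve(np.eye(r) - N, Qe)
print(Q)
```

Each discharge is the sum of all lateral inflows at and upstream of the reach. A dense solve is used here only because the example is tiny; a network with ~3 million reaches would need a sparse representation of N.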

Discharge corrections

To generate corrected estimates of river discharge, we developed a novel inverse routing algorithm called LTIR that is capable of correcting bias in long-term mean lateral inflow and long-term mean discharge together. Our approach allows for corrections of lateral inflows upstream of gauges while matching observed discharge values, with impacts on discharge estimates both upstream and downstream of gauges.

Extended Data Fig. 6 shows a schematic that summarizes much of our notation for an example river network containing five reaches and two gauges and illustrates the mathematical derivation that follows. We use a river network with r river reaches and g gauges (with g < r), S the g × r observation selector matrix (for example, ref. 45), t for time, \({{\mathbf{Q}}}^{{\mathrm{e}}}\left(t\right)\) an r-sized vector of simulated monthly external lateral water inflows (e) entering upstream of each river reach, Q(t) an r-sized vector of simulated monthly water outflows exiting each reach and q(t) a g-sized vector of monthly discharge observations. Vinculum symbols are used to indicate long-term means; for example, \(\overline{{{\mathbf{Q}}}^{{\mathrm{e}}}}\) is the long-term mean of \({{\mathbf{Q}}}^{{\mathrm{e}}}\left(t\right)\). Double-struck symbols are used to indicate corrected quantities; for example, \({\mathbb{Q}}\) is the corrected equivalent to discharge Q.

The long-term continuity equation enforces equality between the downstream outflows \(\overline{{\mathbf{Q}}}\) and the upstream inflows \(\overline{{{\mathbf{Q}}}^{{\mathrm{e}}}}+{{N}}\times \overline{{\mathbf{Q}}}\) (that is, the sum of lateral inflows \(\overline{{{\mathbf{Q}}}^{{\mathrm{e}}}}\) and inflows from upstream reaches \({{N}}\times \overline{{\mathbf{Q}}}\)). This is traditionally done through lumped river models, and can be described in matrix–vector form (for example, ref. 29) for both simulated and corrected states as

$$\left\{\begin{array}{c}{({{I}}-{{N}}\;)}^{-1}\times \overline{{{\mathbf{Q}}}^{{\mathrm{e}}}}=\overline{{\mathbf{Q}}}\\ {({{I}}-{{N}}\;)}^{-1}\times \overline{{{\mathbb{Q}}}^{{\mathrm{e}}}}=\overline{{\mathbb{Q}}}\end{array}.\right.$$
(2)

Enforcing that long-term corrected discharge equals observations at gauges leads to

$${{S}}\times \overline{{\mathbb{Q}}}=\overline{{\mathbf{q}}}.$$
(3)

Equations (2) and (3) together give rise to an ‘inverse routing’ problem for which an r-sized vector of corrected lateral inflow \(\overline{{{\mathbb{Q}}}^{{\mathrm{e}}}}\) is the unknown as

$${{S}}\cdot {({{I}}-{{N}}\;)}^{-1}\times \overline{{{\mathbb{Q}}}^{{\mathrm{e}}}}=\overline{{\mathbf{q}}}.$$
(4)

Equation (4) is a g × r linear system with r unknowns, that is, an underdetermined problem with an infinite number of solutions, and one must therefore narrow the mathematical problem down. Because the number of equations is the same as the number of gauges (g), one might first focus on the individual subbasins associated with each one of the gauges. Let \(\overline{{{\mathbf{q}}}^{{\mathrm{e}}}}\) be a g-sized array with long-term means of total lateral inflows for these individual subbasins. As a preliminary step, let \(\overline{{{\mathbb{Q}}}^{{\mathrm{e}}{{\upalpha }}}}\) be one of the infinite number of solutions to equation (4) that crudely applies the total lateral inflow of each subbasin solely at river reaches that are home to a gauge such that

$$\overline{{{\mathbb{Q}}}^{{\mathrm{e}}{{\upalpha }}}}={{{S}}}^{{\mathrm{t}}}\times \overline{{{\mathbf{q}}}^{{\mathrm{e}}}},$$
(5)

where St is the transpose of S and the Greek letter superscript is used for incremental versions of corrected estimates for \(\overline{{{\mathbb{Q}}}^{{\mathrm{e}}}}\), with α being the first such estimate. Combining equations (4) and (5) leads to

$$\left({{S}}\times {({{I}}-{{N}})}^{-1}\times {{{S}}}^{{\mathrm{t}}}\right)\times \overline{{{\mathbf{q}}}^{{\mathrm{e}}}}=\overline{{\mathbf{q}}}.$$
(6)

Equation (6) is now a g × g linear system with the g unknowns of \(\overline{{{\mathbf{q}}}^{{\mathrm{e}}}}\), and can therefore be solved. Using n, the g × g matrix describing connectivity among gauges, equation (6) can also be seen as a continuity equation that relates the total lateral inflow within each subbasin to the outflow of each subbasin as

$${({{I}}-{{n}})}^{-1}\times \overline{{{\mathbf{q}}}^{{\mathrm{e}}}}=\overline{{\mathbf{q}}}.$$
(7)

Although the shape of \(\overline{{{\mathbb{Q}}}^{{\mathrm{e}}{{\upalpha }}}}\) was based on a crude assumption, it reveals that the inverse routing problem of equation (4) can be reduced to solving for \(\overline{{{\mathbf{q}}}^{{\mathrm{e}}}}\), the total lateral inflows of each subbasin. This in turn offers multiple avenues for constructing valid options for \(\overline{{{\mathbb{Q}}}^{{\mathrm{e}}}}\) from \(\overline{{{\mathbf{q}}}^{{\mathrm{e}}}}\) at the subbasin level. To do so, one must first understand how the various elements of \(\overline{{{\mathbf{Q}}}^{{\mathrm{e}}}}\) accumulate downstream in a river network from which the connections between subbasins were removed, creating the ‘disconnected’ discharge \(\overline{{{\mathbf{Q}}}_{{\mathrm{D}}}}\), valid for both simulated and corrected states as

$$\left\{\begin{array}{c}\overline{{{\mathbf{Q}}}_{{\mathrm{D}}}}={({{I}}-[{{N}}-{{N}}\times {{{S}}}^{{\mathrm{t}}}\times {{S}}])}^{-1}\times \overline{{{\mathbf{Q}}}^{{\mathrm{e}}}}\\ \overline{{{\mathbb{Q}}}_{{\mathrm{D}}}}={({{I}}-[{{N}}-{{N}}\times {{{S}}}^{{\mathrm{t}}}\times {{S}}])}^{-1}\times \overline{{{\mathbb{Q}}}^{{\mathrm{e}}}}\end{array}\right.$$
(8)

\(\overline{{{\mathbb{Q}}}_{{\mathrm{D}}}}\) can be seen as the long-term mean of discharge at every reach of a river network where the connectivity downstream of each gauge was removed and with long-term mean inflow \(\overline{{{\mathbb{Q}}}^{{\mathrm{e}}}}\). Provided adequate corrections are made, \({{S}}\times \overline{{{\mathbb{Q}}}_{{\mathrm{D}}}}\) (that is, the values of \(\overline{{{\mathbb{Q}}}_{{\mathrm{D}}}}\) at river reaches that have a gauge) should be equal to \(\overline{{{\mathbf{q}}}^{{\mathrm{e}}}}\) (that is, the total lateral inflows for these individual subbasins), thus

$${{S}}\times \overline{{{\mathbb{Q}}}_{{\mathrm{D}}}}=\overline{{{\mathbf{q}}}^{{\mathrm{e}}}}.$$
(9)

We can now look for multiplicative scalars, one per subbasin, stored in a g-sized vector λ, that correct \({S}\times \overline{{{\mathbf{Q}}}_{{\mathrm{D}}}}\) into \({{S}}\times {\overline{{{\mathbb{Q}}}_{{\mathrm{D}}}}}\) as

$${\mathbf{\uplambda }}\otimes \left({{S}}\times \overline{{{\mathbf{Q}}}_{{\mathrm{D}}}}\right)={{S}}\times \overline{{{\mathbb{Q}}}_{{\mathrm{D}}}},$$
(10)

where \(\otimes\) is elementwise multiplication. Equations (9) and (10) together allow the computation of λ as

$${\mathbf{\uplambda }}=\overline{{{\mathbf{q}}}^{{\mathrm{e}}}}\oslash \left({{S}}\times \overline{{{\mathbf{Q}}}_{{\mathrm{D}}}}\right),$$
(11)

where \(\oslash\) is elementwise division. The multiplicative scalars stored for each subbasin in λ can then be applied for each river reach of the relevant subbasin and stored in an r-sized vector Λ by applying the following transformation:

$${\mathbf{\Lambda }}\ominus 1=\left[{{S}}\times {({{I}}-[{{N}}-{{N}}\times {{{S}}}^{{\mathrm{t}}}\times {{S}}])}^{-1}\right]^{{{t}}}\times ({\mathbf{\uplambda }}\ominus 1),$$
(12)

where \(\ominus\) is elementwise subtraction, used here to ensure that places with no gauge retain their initial value. We can then build a corrected lateral inflow vector \(\overline{{{\mathbb{Q}}}^{{\mathrm{e}}{\mathbf{\upbeta }}}}\), with β being the second version of the corrected estimates, such that the lateral inflow values of each subbasin are proportional to the initial values of \(\overline{{{\mathbf{Q}}}^{{\mathrm{e}}}}\), thus

$$\overline{{{\mathbb{Q}}}^{{\mathrm{e}}{\mathbf{\upbeta }}}}={\mathbf{\Lambda }}\otimes \overline{{{\mathbf{Q}}}^{{\mathrm{e}}}}.$$
(13)

Equation (13) is a linear transformation that can equally be applied at the monthly time step as

$${{\mathbb{Q}}}^{{\mathrm{e}}{\mathbf{\upbeta }}}(t)={\mathbf{\Lambda }}\otimes {{\mathbf{Q}}}^{{\mathrm{e}}}(t).$$
(14)

Overall, our inverse routing methodology can hence be summarized in six implementation steps (Extended Data Fig. 7): (1) determine \(\overline{{{\mathbf{q}}}^{{\mathrm{e}}}}\) from \(\overline{{\mathbf{q}}}\) using equation (6), (2) determine \({{S}}\times \overline{{{\mathbf{Q}}}_{{\mathrm{D}}}}\) from \(\overline{{{\mathbf{Q}}}^{{\mathrm{e}}}}\) using equation (8), (3) determine λ from \(\overline{{{\mathbf{q}}}^{{\mathrm{e}}}}\) and \(\overline{{{\mathbf{Q}}}_{{\mathrm{D}}}}\) using equation (11), (4) determine Λ from λ using equation (12), (5) compute \({{\mathbb{Q}}}^{{\mathrm{e}}{\mathbf{\upbeta }}}(t)\) from Λ and \({{\mathbf{Q}}}^{{\mathrm{e}}}(t)\) for all monthly time steps using equation (14) and (6) determine \({\mathbb{Q}}(t)\) from \({{\mathbb{Q}}}^{{\mathrm{e}}{\mathbf{\upbeta }}}(t)\) using equation (2).

Note that the design of this methodology is flawed in the cases in which occasional elements of \({{S}}\times \overline{{{\mathbb{Q}}}_{{\mathrm{D}}}}\) are null, that is, when the total lateral inflow within given subbasins is zero, in which case a multiplicative scaling correction is bound to fail. Such a challenge was encountered for three subbasins when using the full 1,001 gauge dataset for correction; three gauges were dropped from the analysis, resulting in a final correction gauge dataset of size 998.
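
As a minimal sketch of steps (2) to (5), equations (8) and (11) to (14) can be written with dense NumPy arrays as below. This is an illustration under stated assumptions, not the implementation used in this study: the convention that N[i, j] = 1 when reach j flows directly into reach i, the function name, the four-reach network and all numerical values are hypothetical, and \(\overline{{{\mathbf{q}}}^{{\mathrm{e}}}}\) is taken as given because it comes from equation (6). A global network would probably require sparse matrices rather than dense ones.

```python
import numpy as np


def inverse_routing_factors(N, S, Q_e_mean, q_e_mean):
    """Return (lam, Lam) following equations (8), (11) and (12).

    N        : (r, r) network matrix; ASSUMED convention N[i, j] = 1 if
               reach j flows directly into reach i.
    S        : (g, r) selection matrix with a single 1 per gauged reach.
    Q_e_mean : (r,) long-term mean simulated lateral inflow of each reach.
    q_e_mean : (g,) long-term mean total lateral inflow of each subbasin.
    """
    r = N.shape[0]
    # Equation (8): remove the connections leaving every gauged reach,
    # then accumulate lateral inflows within each disconnected subbasin.
    N_D = N - N @ S.T @ S
    A = np.eye(r) - N_D
    Q_D = np.linalg.solve(A, Q_e_mean)

    # Equation (11): one multiplicative scalar per subbasin.
    denom = S @ Q_D
    if np.any(denom == 0):
        # A multiplicative correction cannot be built for these subbasins.
        raise ValueError("zero simulated inflow in at least one subbasin")
    lam = q_e_mean / denom

    # Equation (12): spread each scalar to all reaches of its subbasin.
    # [S A^-1]^t (lam - 1) is computed as the solution of A^t x = S^t (lam - 1).
    Lam = 1.0 + np.linalg.solve(A.T, S.T @ (lam - 1.0))
    return lam, Lam


# Hypothetical chain of four reaches 0 -> 1 -> 2 -> 3, gauged at reaches 1 and 3.
N = np.zeros((4, 4))
N[1, 0] = N[2, 1] = N[3, 2] = 1.0
S = np.zeros((2, 4))
S[0, 1] = S[1, 3] = 1.0
Q_e_mean = np.array([1.0, 2.0, 3.0, 4.0])  # illustrative values only
q_e_mean = np.array([6.0, 3.5])            # illustrative values only

lam, Lam = inverse_routing_factors(N, S, Q_e_mean, q_e_mean)
# Here lam is [2.0, 0.5] and Lam is [2.0, 2.0, 0.5, 0.5].

# Equation (14): apply the same factors at every monthly time step.
Q_e_monthly = np.tile(Q_e_mean, (12, 1))   # shape (months, reaches)
Q_e_beta_monthly = Lam * Q_e_monthly       # broadcasts over months
```

In this toy case, the first subbasin (reaches 0 and 1) is scaled by 2 and the second (reaches 2 and 3) by 0.5, so that the disconnected discharge at each gauge matches the prescribed subbasin inflow, as required by equation (9).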

Discharge evaluation metrics

To evaluate the performance of bias correction, we calculated test statistics to compare uncorrected and corrected simulations with observations where in situ observations exist. Before applying the correction to our full gauge dataset, we first ensured independent evaluation by splitting the gauge observation dataset into 70% for calibration (that is, correction; n = 702) and 30% for validation (n = 299; Supplementary Text 4). For each of the calibration, validation and full gauge datasets, we used monthly observations and simulations to calculate normalized absolute bias (NBIAS; absolute value of observed minus simulated; normalized using the mean of the observations), NSTDERR, NRMSE and NSE (a measure of how well the simulated time series matches the observed)67. For the final correction with the full gauge dataset, we also calculated the coefficient of variation (CV) for monthly observations, uncorrected simulations and corrected simulations.
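
The statistics above can be sketched as follows, assuming their common textbook definitions; the exact formulations are those of ref. 67 and may differ in detail. The function names, the use of NaN to mark missing observations and the use of the population standard deviation are assumptions made for this illustration.

```python
import numpy as np


def evaluation_metrics(obs, sim):
    """Compare simulated with observed monthly discharge at one gauge.

    Only time steps with an observation (non-NaN) are used. Every metric
    except NSE is normalized by the mean of the observations.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    has_obs = ~np.isnan(obs)
    o, s = obs[has_obs], sim[has_obs]

    err = s - o
    mean_obs = o.mean()
    return {
        # absolute value of the mean error
        "NBIAS": abs(err.mean()) / mean_obs,
        # standard deviation of the errors (population form, ddof=0)
        "NSTDERR": err.std() / mean_obs,
        # root mean square error
        "NRMSE": np.sqrt(np.mean(err ** 2)) / mean_obs,
        # Nash-Sutcliffe efficiency: 1 is a perfect match
        "NSE": 1.0 - np.sum(err ** 2) / np.sum((o - mean_obs) ** 2),
    }


def coefficient_of_variation(x):
    """CV of a monthly series, ignoring missing values."""
    x = np.asarray(x, dtype=float)
    return np.nanstd(x) / np.nanmean(x)


# Hypothetical gauge with one missing observation.
obs = [2.0, 4.0, 6.0, 8.0, np.nan]
sim = [3.0, 5.0, 7.0, 9.0, 100.0]
metrics = evaluation_metrics(obs, sim)
```

With these definitions, NRMSE² = NBIAS² + NSTDERR², so removing the bias leaves NRMSE equal to NSTDERR.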

Model validation

After validating our correction algorithm with the independent dataset (Supplementary Text 4), we performed the correction with the full gauge dataset (n = 998; see ‘Discharge corrections’ section), and evaluated the model performance at each of the 998 gauges (Extended Data Table 2). The mean/median NBIAS decreased from 0.45/0.28 for uncorrected simulations to 0.00/0.00 for corrected simulations, and 99% of the 998 gauges showed an improvement in NBIAS (Extended Data Fig. 1a). Note that four gauges worsened minimally in NBIAS after correction, which is due to using all simulated monthly discharge values and only available monthly observed discharge to calculate our correction factors but then calculating NBIAS only at time steps in which observations were available. We confirmed that bias went to 0.00 at these gauges when using all time steps to calculate NBIAS rather than only time steps in which observations were available. Reduction in normalized bias led to improvements in the other model test statistics. The mean/median NSTDERR changed from 0.84/0.61 for uncorrected simulations to 0.68/0.63 for corrected simulations (Extended Data Fig. 1b), hence showing very limited impacts of our bias correction on the temporal variability of flow errors. The mean/median NRMSE decreased from 0.99/0.73 for uncorrected simulations to 0.68/0.63 for corrected simulations (Extended Data Fig. 2a), indicating that all of NRMSE was composed of NSTDERR. The mean/median NSE increased from −3.74/0.07 for uncorrected simulations to −0.79/0.38 for corrected simulations (Extended Data Fig. 2b), which indicates that corrected simulations better matched observations than uncorrected simulations; 55% of gauges showed improvements in NSTDERR, 75% in NRMSE and 75% in NSE. 
Improvements in test statistics can be seen across most of the world, except for portions of north-western and north-eastern United States, western and south-eastern Canada, western Brazil, western Argentina, northern Australia and northern Europe. Deteriorations in NSTDERR were also present in central Europe and New Zealand.

For each gauge location, we calculated the CV and found that the mean/median was 0.76/0.73 for observations, 0.86/0.79 for uncorrected simulations and 0.87/0.79 for corrected simulations. Based on mean/median values, simulations showed slightly greater variability than observations, with minimally greater variability in corrected simulations. After fitting a linear regression enforcing a zero intercept between simulated CV and observed CV, we found slopes of 0.98 (uncorrected) and 0.99 (corrected) and \(R^{2}\) of 0.84 (uncorrected) and 0.83 (corrected; Extended Data Fig. 3), showing that observed and simulated CV were close to the line of unity (that is, observed CV = simulated CV) and that ~83% of the variance was explained. However, the regression residuals failed the Shapiro–Wilk normality test (uncorrected P value of \(1.43\times 10^{-12}\); corrected P value \(<2.2\times 10^{-16}\)), hence limiting valid inference.

Future studies might consider correcting for NSTDERR in addition to bias, which can be expected to have a positive impact on NRMSE, and probably also on NSE. The broad improvements in bias, NSE and NRMSE reported here, with limited impacts on NSTDERR and on CV, are therefore sufficient for the stated purpose of our study, which is correcting bias and evaluating global discharge and storage in rivers.

Estimates of discharge into the ocean

The uncorrected and corrected river discharge estimates at coastal river termini were used to calculate the global total discharge into the ocean, along with its variability. Coastal river termini were identified by extracting all river reaches with no downstream river reach and then selecting all river reaches within 200 m of the coastline. Using a buffer from the coastline was necessary to remove river termini located in the middle of continents. Global discharge into the ocean was calculated on a monthly time step by summing the discharge for all coastal river termini (n = 48,200). Variability was calculated as the standard deviation of total discharge into the ocean across the time series.
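
The selection and aggregation described above can be sketched as below. This is an illustration only: the array layout (months by reaches), the use of a negative downstream identifier to mark a reach with no downstream reach, the precomputed distance to the coastline and the population form of the standard deviation are all assumptions, not details taken from this study.

```python
import numpy as np


def ocean_discharge(Q, downstream_id, dist_to_coast_m, buffer_m=200.0):
    """Monthly total discharge into the ocean and its variability.

    Q               : (months, reaches) monthly discharge.
    downstream_id   : (reaches,) index of the downstream reach; a negative
                      value is ASSUMED to mean "no downstream reach".
    dist_to_coast_m : (reaches,) distance from each reach to the coastline.
    """
    is_terminus = np.asarray(downstream_id) < 0
    # The buffer drops termini located in the middle of continents.
    is_coastal = is_terminus & (np.asarray(dist_to_coast_m) <= buffer_m)

    total = np.asarray(Q)[:, is_coastal].sum(axis=1)  # one value per month
    return total, total.std()                         # ddof=0 assumed


# Hypothetical network: reach 0 drains into reach 1, reaches 1 and 2 reach
# the coast, and reach 3 is an inland terminus.
Q = np.array([[1.0, 10.0, 20.0, 5.0],
              [1.0, 12.0, 22.0, 5.0],
              [1.0, 14.0, 24.0, 5.0]])
downstream_id = np.array([1, -1, -1, -1])
dist_to_coast_m = np.array([5000.0, 50.0, 150.0, 900000.0])

total, variability = ocean_discharge(Q, downstream_id, dist_to_coast_m)
# total is [30.0, 34.0, 38.0]: only reaches 1 and 2 are summed.
```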

Across the globe, average discharge into the ocean is highest for the Amazon (18% of global discharge for uncorrected and 18% for corrected simulations), South America north of the Amazon (for example, Orinoco, Catatumbo; 6%, 6%), the Congo (6%, 5%) and Ganges–Brahmaputra (5%, 5%) basins (Fig. 3a). Variability in discharge to the ocean is highest in the Amazon, Nile, La Plata and Congo basins (Fig. 3b).

Water storage estimates

Each of Earth’s river reaches can fundamentally be reduced to an individual control volume. At steady state, assuming that water is incompressible and neglecting friction by viscous forces, this control volume follows Bernoulli’s principle. In turn, the river reach is governed by a linear relationship between water storage and water flow, in which residence time is the proportionality factor. Assuming that residence times are much shorter than 1 month, such a relationship can be applied at the monthly time scale. Under the same steady-state assumption, the Muskingum method68 for river routing also reduces to a linear storage–discharge relationship:

$${{V}}={{k}}\times {{Q}},$$
(15)

where V is the storage volume and k is the Muskingum time parameter, which is known to be related to the celerity (or speed) of flow wave propagation69 (Supplementary Text 6). To calculate Muskingum k, we divided the length of each river reach by a reference celerity for the flow wave of \(1\,{\mathrm{km}}\,{{\mathrm{h}}}^{-1}\), and then multiplied that quantity by a scaling factor specific to the Muskingum k (\({\lambda }_{k}\)), generating a unique value for each reach and for each residence time characterization. Based on our experience with automated parameter estimation for the Muskingum method in studies that all used the same reference value27,29,52,70,71,72 (Extended Data Table 3), we used a low (0.20), medium (0.35) and high (0.50) value of \({\lambda }_{k}\) to calculate three different possible sets of Muskingum k values (for short, medium and long flow wave residence times) associated with each river reach. After confirming that our residence times are indeed much shorter than 1 month (mean/median values are 1.87 h/1.37 h, 3.27 h/2.39 h and 4.67 h/3.41 h for short, medium and long experiments, respectively), we calculated global water storage using each Muskingum k on a monthly time step by summing the storage for all river reaches. Variability was calculated as the standard deviation of river water storage across the monthly time series.
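
A minimal sketch of this storage calculation, built on equation (15), is given below. The units (reach length in km, discharge in m³ s⁻¹, storage in m³), the array layout, the function name, the population form of the standard deviation and the example values are assumptions made for illustration.

```python
import numpy as np

SECONDS_PER_HOUR = 3600.0


def river_storage(Q, length_km, lambda_k, celerity_km_per_h=1.0):
    """Total river storage per month from equation (15), V = k * Q.

    Q         : (months, reaches) monthly mean discharge, ASSUMED in m^3/s.
    length_km : (reaches,) reach lengths, ASSUMED in km.
    lambda_k  : scaling factor applied to the reference travel time.
    """
    # Muskingum k for each reach, in hours: scaled length over celerity.
    k_hours = lambda_k * np.asarray(length_km) / celerity_km_per_h
    # Storage of each reach and month in m^3 (hours converted to seconds).
    V = np.asarray(Q) * (k_hours * SECONDS_PER_HOUR)
    total = V.sum(axis=1)              # sum over all reaches, per month
    return k_hours, total, total.std() # ddof=0 assumed for variability


# Two hypothetical reaches over two months.
Q = np.array([[100.0, 50.0],
              [200.0, 100.0]])
length_km = np.array([10.0, 5.0])

# Short, medium and long flow wave residence times.
results = {name: river_storage(Q, length_km, lam_k)
           for name, lam_k in [("short", 0.20), ("medium", 0.35), ("long", 0.50)]}
```

Because storage is linear in k, the three experiments differ only by the ratio of their scaling factors: total storage in the long experiment is 2.5 times that of the short experiment at every time step.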

On average, the majority of river water is stored in the Amazon (34% of global river water storage for uncorrected and 38% for corrected simulations), Congo (8%, 6%), Nile (5%, 5%) and South America north of the Amazon (for example, Orinoco, Catatumbo; 5%, 5%) basins (Fig. 5a). Water storage variability is the highest in the Amazon, Nile, Ganges–Brahmaputra, La Plata and South America north of the Amazon basins (Fig. 5b).

Methodological limitations

We believe that there are several avenues for future research and refinement. First, in this study, we focus on the impacts of observations and parameters (that is, flow wave propagation celerity) on estimates of global water storage; however, future work could explore the impacts of multi-model runoff uncertainty by using more LSMs and atmospheric forcings. Second, the correction algorithm creates a multiplicative factor based on the average discharge across the 30-year study period and therefore does not incorporate a constraint on river discharge extremes (high and low flows) or on variability. Future work could focus on correcting the amplitude in addition to the average. Third, an assumption of our approach is that errors in river discharge are attributed solely to errors in runoff. Such errors could also be due to lack of representation of other components of the water cycle (for example, lakes and wetlands) in our routing model; future work could further explore potential errors, particularly with the release of SWOT data. Lastly, human influences on the water cycle (for example, water withdrawals, dams and reservoirs) are not directly incorporated into the routing model, as the only aspect of anthropogenic influence we are looking at is observed water balance. While the gauge observations used for correction and the range of celerity that we use to calculate global river water storage (which were optimized from real case studies) indirectly inform the model of some anthropogenic activities, explicit incorporation of such activities may improve river discharge and storage estimates73.