Introduction

Mathematical modeling has become ubiquitous in the natural sciences, particularly in public health and pharmacology, as a tool to both understand existing data and make projections about the future. Specifically, models have been used to retrospectively analyze experimental data, lending insight into the mechanisms underlying the data and suggesting possible strategies that enhance a desired outcome or limit an undesirable one. More recently, models are being developed and calibrated in tandem with data collection, allowing for model-based predictions to inform future experimental design1,2,3,4,5,6,7.

Excitingly, in some arenas, such model-based predictions are empowering scientists to intervene in real time. One example is in the adaptive treatment of metastatic castration-sensitive prostate cancer, where personalized model-informed treatment strategies were adjusted based on a patient’s past and current prostate-specific antigen levels8. Another example emerged during the COVID-19 pandemic, wherein model-suggested strategies were implemented to mitigate disease spread, which in turn necessitated re-calibrating the model as new data emerged5,9.

One of the key benefits of a good model is the ability to extrapolate from it. However, depending on how the data are structured and when they are collected, a variety of models or parameter values may describe the calibration data equally well. For instance, in ref. 10, the authors selected four different tumor growth models and evaluated the goodness of fit of each model to a set of available data. While the authors did show that some models fit the data better than others, for several data sets the differences were marginal. Further, if one were to extrapolate tumor growth from the various models that describe the data well, the projections would be quite different. Consequently, model-based decisions can vary significantly based on the model used to extrapolate beyond the data. More recently, Harshe and colleagues11 noted that even for a simple model, such as the classic logistic model, multiple data points need to be sampled in order to uniquely parametrize the tumor growth curve, with the number of necessary points increasing with the amount of noise in the collected data.

Furthermore, to find parameters that can enable a model to have predictive utility, one sometimes needs a “critical” piece of data that allows all the other model pieces to fit into place. One such example was shown in ref. 12, where the authors used a model that connects drug concentration over time to projected levels of target occupancy (TO) in the tumor microenvironment (TME) and parametrized it using published data for the anti-PD-1 checkpoint inhibitor pembrolizumab13. The authors then used this model to analyze potential criteria for efficacious dose selection for a different compound targeting TIGIT, a co-expressed target. They showed the difference between doses that achieve full TO in the plasma and doses that achieve full TO in the TME, and used the model to re-discover doses that were in fact taken forward in the clinic. A model that does not reasonably capture the actual relationship between plasma concentration of the drug and its level of target engagement in the TME would not have been able to make this assessment.

As these examples show, not all data are created equal. In order to parametrize a model that can enable decision making, it is critical to have both the correct data type12 and a sufficient number of data points11. Unfortunately, not all data can be easily collected due to financial, logistical, or technical reasons, as is the case for invasive procedures, such as biopsies. Therefore, it is particularly important to select appropriate and sufficient data for parametrization if a model is to be used to guide decision making.

Identifiability analysis allows one to rigorously study whether model parameters can be uniquely determined from available experimental data. A model is considered structurally identifiable if its parameters can be uniquely determined given perfect data1, a condition which, of course, can never be satisfied in practice. Structural identifiability is a necessary, though not sufficient, condition for practical identifiability, which asks the same question in the context of real, noisy data. Practical identifiability analysis has been used to improve experimental design by suggesting measurements that need to be collected to resolve parameter non-identifiability issues given some pre-existing data14. For instance, using a model of tumor spheroid growth under treatment with taxol, practical identifiability analysis revealed that an experiment that measures either the maximum rate of drug-induced death or the drug half-saturation constant is sufficient to resolve parameter non-identifiability issues and consequently to increase confidence in model projections1. In a mathematical model of ligand binding and trafficking, it was shown that, in addition to measurements of extracellular and bound ligand concentrations, the absolute concentration of one pathway species and the intracellular ligand concentration are needed to make model parameters practically identifiable15.

When such an analysis is used to determine the most informative targets and time points for the new measurement, this is referred to as optimal experimental design. Typically, this entails the identification of an additional measurement (or set of measurements) that contains maximal information for a parameter of interest16. For instance, using a model describing lactation in cattle, an assessment of practical identifiability was used to discover four sampling times across 100 days that provide high information content for estimating model parameters17. In a mathematical model of a gene regulatory network, the most informative experimental conditions were established using a practical identifiability analysis18. Of note, methods that do not use identifiability analysis have also been employed to select measurement time points in an optimal way2,19.

Herein, we propose a workflow for utilizing practical identifiability analysis as a tool for experimental planning. The model-informed method seeks to determine both the minimal number of experimental measurements needed for a quantity of interest and when those measurements must be collected, in order to “trust” predictions of a data-calibrated mathematical model. The methodology requires developing a model and using it to create simulated data for a variable of interest such that “complete” simulated data results in the parameters of interest being practically identifiable. We then proceed to find the minimal amount of data needed, and when these data must be collected, to ensure that the parameters remain practically identifiable.

The paper is organized as follows. In the Methods section (found after the Discussion, though we recommend reading it first), we review the well-established profile likelihood method for assessing the practical identifiability of parameters given available data. We also introduce a modification of the classic pharmacokinetic/pharmacodynamic (PKPD) model with a TME compartment, where drug distribution into the tumor, as well as target synthesis in the TME, are not static but are a function of tumor volume. This model, which we calibrate and validate using available experimental data on pembrolizumab, will serve as a case study to demonstrate the execution and applicability of the proposed minimal experimental design methodology. In the Methods section, we also introduce in detail the proposed methodology for model-driven discovery of a minimally sufficient experimental design. In the Results section, we use the proposed methodology to recommend a minimal protocol for collecting data on the experimental variable of interest (in this case, percent target occupancy in the TME, which is assumed to drive efficacy). In the Discussion section, we address both the challenges in employing the method and the benefits of this iterative experimentation-modeling approach for the efficient and robust design of experiments.

Results

As summarized in Fig. 1, we developed a modified version of a two-compartment site-of-action model which describes the drug concentration of the immune checkpoint inhibitor pembrolizumab over time in the central (plasma), peripheral (tissue), and TME compartments. Model assumptions, structure, and parameter values are presented in the “Case Study: Site-of-Action Model Parametrized for Pembrolizumab” subsection of the Methods and Table 1 therein. Figure 2 shows the calibration of the model parameters to pembrolizumab PK data reported in ref. 13 and tumor growth inhibition (TGI) data reported in ref. 20, along with the validation of the model on untrained data.

Fig. 1: Schematic diagram of model described by system (3).

Drug distributes from the central compartment to the peripheral compartment and the TME, which is the site of action.

Table 1 Parameter values used in system (3).
Fig. 2: Model calibration and validation.

A, B Calibrated fits of model in System (3) to A PK curves for three doses of 10 mg/kg of pembrolizumab given weekly13, B TGI data for five doses of 10 mg/kg of pembrolizumab given on average every 3.5 days20. C Corresponding projected levels of %TO in plasma and the TME. D, E Validation of model in System (3) on untrained data. D PK curves for three doses of 1 mg/kg of pembrolizumab given weekly13, E TGI data for five doses of 2 mg/kg of pembrolizumab given on average every 3.5 days20. F Percent target occupancy in TME data, digitized from Lindauer et al.13.

Here, we apply the workflow proposed in the “Designing Minimally Sufficient Experimental Protocol” subsection of the Methods, summarized in Fig. 3, that uses the profile likelihood method to design a minimally sufficient experimental protocol. In particular, we determine how much experimental data on percent target occupancy in the TME is needed to confidently connect PK in the plasma to PD in the TME. That is, what is the minimal experimental protocol that would give us confidence in model predictions? To achieve this goal, we follow the steps set forth in the Methods:

  1. Identify experiment that measures the variable of interest. Here, the variable of interest is percent target occupancy in the TME. This is defined in System (3) as \(T{O}_{{TME}}\).

  2. Model development, parameterization, and validation. We use the validated model in System (3), parameterized using values specified in Table 1.

  3. Select parameters of interest. We determined that the parameters to be varied in our analyses are the rate of drug-target complex formation in the TME, \({k}_{{onT}}\), and the rate of target synthesis in the TME, \({k}_{{synt}}\). We arrived at this choice by first removing from consideration parameters that are relatively easy to measure experimentally. This eliminated the PK parameters associated with pembrolizumab, which are readily calibrated using drug concentration data. The intrinsic tumor growth rate \(r\) was removed from consideration, as it can be readily measured from control growth experiments. We also removed the maximum rate of tumor kill by pembrolizumab \(d\) and the percent target occupancy in the TME that results in 50% kill by pembrolizumab \({{TO}}_{50}\), as these values could conceivably be estimated through in vitro cell kill assays. The local sensitivity of the remaining parameters was then assessed (Supplementary Fig. 1). The four most sensitive parameters, in order, for fitting both the simulated %TO in the TME data and the TGI data (pembrolizumab administered on average every 3.5 days at a dose of 10 mg/kg) are: \({k}_{{synt}}\), \({K}_{x}\), \({k}_{1T}\), \({k}_{{onT}}\). We chose \({k}_{{synt}}\) and \({k}_{{onT}}\) for our analyses, as these parameters are unlikely to be estimated using any available experimental techniques but have proven to be critical in prior work that analyzed a structurally similar model12.

  4. Generate simulated data for the measurement of interest. Here, we assume that both parameters of interest are normally distributed with a mean equal to the calibrated value of the parameter in Table 1. To get the desired variability in the simulated data, we choose the standard deviation for \({k}_{{onT}}\) to be one fifth of its mean value (resulting in \({k}_{{onT}}\sim {\mathscr{N}}\left(0.01,\,4\times {10}^{-6}\right)\)) and the standard deviation for \({k}_{{synt}}\) to be one twentieth of its mean value (resulting in \({k}_{{synt}}\sim {\mathscr{N}}\left(14190,\,503390.25\right)\)). We generate \(K_{p}=10\) random samplings of these distributions and extract the model-predicted value of percent target occupancy in the TME at each day over a one-month period. That is, for \(k=1,\ldots ,10\) we compute \({Y}_{k}\left(\bar{{\boldsymbol{t}}}\right)={{TO}}_{{TME}}^{k}\left(\bar{{\boldsymbol{t}}}\right)\), where \(\bar{{\boldsymbol{t}}}=\left(0,1,\ldots ,{t}_{f}\right)\), \({t}_{f}=30\) days. The resulting simulated percent target occupancy in the TME data are shown in Fig. 4A (a code sketch of this step is given after Fig. 4).

  5. Practical identifiability using the data from “complete” simulated experiments. Figure 4B, C shows that both \({k}_{{onT}}\) and \({k}_{{synt}}\) are practically identifiable when %TO in the TME is measured daily, given the parabolic shape of their profile likelihood curves.

Fig. 3: Flowchart of the proposed method of using practical identifiability for model-driven minimal experimental design.

This details how to progress from a variable (experiment) of interest, through model design and analysis, to the final step of making a minimal experimental design recommendation.

Fig. 4: Simulated percent target occupancy data and parameter identifiability given this “complete” simulated data.

A shows 10 simulated data sets of percent target occupancy in the TME, generated using model parameters in Table 1 under the assumptions \({k}_{{onT}}\sim {\mathscr{N}}\left(0.01,\,4\times {10}^{-6}\right)\) and \({k}_{{synt}}\sim {\mathscr{N}}\left(14190,\,503390.25\right)\). Profile likelihood curves for B \({k}_{{onT}}\) and C \({k}_{{synt}}\) when percent target occupancy in the TME is available every day (markers shown in A) indicate that both parameters are practically identifiable in this “complete” experimental scenario.

As we have verified that the parameter set \({\boldsymbol{p}}=\left({k}_{{onT}},{k}_{{synt}}\right)\) is practically identifiable when percent target occupancy in the TME data is collected every day for the duration of the experiment (complete data), we are ready to move to Step 6 of the method to search for the minimal number of data points that ensures practical identifiability of the parameters in \({\boldsymbol{p}}\).
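For concreteness, the beginning of this pipeline (Steps 4 and 5) can be sketched in MATLAB as follows. The helper simulateTOTME, which would integrate System (3) with all remaining parameters fixed at their Table 1 values and return %TO in the TME at the requested days, is hypothetical; this is an illustrative sketch rather than the code used to generate the results.

```matlab
% Step 4 sketch: simulated %TO-in-TME data under sampled parameters.
% simulateTOTME(kOnT, kSynt, tDays) is a hypothetical helper that
% solves System (3) and returns %TO in the TME at the given days.
rng(1);                            % for reproducibility
Kp    = 10;                        % number of simulated data sets
tBar  = 0:30;                      % daily sampling over one month
muOn  = 0.01;  sdOn  = muOn/5;     % k_onT ~ N(0.01, 0.002^2)
muSyn = 14190; sdSyn = muSyn/20;   % k_synt ~ N(14190, 709.5^2)
Y = zeros(Kp, numel(tBar));
for k = 1:Kp
    kOnT  = muOn  + sdOn*randn;    % one parameter draw per data set
    kSynt = muSyn + sdSyn*randn;
    Y(k, :) = simulateTOTME(kOnT, kSynt, tBar);
end
```

The rows of Y correspond to the ten simulated data sets plotted in Fig. 4A; the profile likelihood computation of Step 5 (detailed in the Methods) is then run on these data.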

Insufficient predictability with a 1-day experimental protocol

We first sought to determine if the identifiability of \({\boldsymbol{p}}=\left({k}_{{onT}},{k}_{{synt}}\right)\) can be ensured with only a single measurement of percent target occupancy in the TME. To achieve this goal, we computed the profile likelihood curves for both \({k}_{{onT}}\) and \({k}_{{synt}}\) and examined whether practical identifiability is preserved when only a single time point from the simulated dataset is available. If a single measurement were sufficient, we could also determine whether the timing of the measurement (that is, what day it is taken) matters for preserving identifiability. The results of this analysis are shown in Fig. 5.

Fig. 5: Profile likelihood curves using only a single experimental measurement of percent target occupancy in the TME.

A displays all \({k}_{{onT}}\) profiles and B displays all \({k}_{{synt}}\) profiles. The color indicates which day (from day 1 to 30) the single experimental measurement was “collected”. The solid black curve is the profile likelihood curve for the specified parameter when complete data (that is, daily measurements) are used, and the black dashed line is the corresponding 95% confidence threshold.

In all cases, we find that \({k}_{{onT}}\) is not practically identifiable over the domain of interest. In fact, with one exception, the profiles appear flat (structurally non-identifiable) on a linear scale, though viewing the parameters on a log scale does reveal that the profiles are not completely flat (see Supplementary Fig. 2A). If the experiment were conducted on day 1, \({k}_{{onT}}\) would clearly be structurally identifiable, though still not practically identifiable. In contrast, we find that \({k}_{{synt}}\) is generally structurally identifiable, as the profiles are not flat. However, the parameter does not achieve practical identifiability regardless of when the single measurement of target occupancy in the TME is taken. This is evident in Fig. 5B, as none of the profiles cross the 95% confidence threshold both when the parameter is decreased and when it is increased from its global minimum value.

These profiles lead us to conclude that a single experimental measurement of percent target occupancy in the TME is insufficient to confidently identify the values of either \({k}_{{onT}}\) (rate of complex formation in the TME) or \({k}_{{synt}}\) (rate of target synthesis in the TME). In other words, the practical identifiability of these parameters is lost (compared to the case of complete data) if we only have a single measurement of %TO in the TME, regardless of when that measurement is collected.

It is known that “parameter unidentifiability can lead to erroneous conclusions in the model inferences, predictions, and parameter estimates”1. For this reason, we next explored the consequences of parameter non-identifiability on the proposed model’s predictive abilities. We do this by comparing the projected tumor volume over a parameter’s 95% confidence interval to the actual tumor volume in the experimental data to which the model was calibrated. The corresponding parameter sets are found by considering each profile likelihood curve generated from the experimental collection day of interest and identifying each value of the profiled parameter that falls below its 95% confidence threshold (see Supplementary Fig. 2A, C). For each such parameter value, pairing it with the corresponding best-fit value of the non-profiled parameter (see Supplementary Fig. 2B, D) forms what we call a “plausible parameter set” for the model, given the available data. We note that other methods have been proposed for identifying the prediction confidence interval, including the prediction profile likelihood approach21.
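In code, extracting a plausible parameter set from a single profile reduces to thresholding the profiled cost. A minimal MATLAB sketch, assuming the profiling run has produced column vectors pGrid (grid of profiled parameter values), zeta (optimal cost from Eq. (2) at each grid point), and bestOther (paired best-fit values of the non-profiled parameter), and assuming the standard profile likelihood cutoff of \({\chi }_{0.95}^{2}(1)/2\) above the minimum as the 95% threshold:

```matlab
% Plausible parameter sets from one profile likelihood curve.
% chi2inv requires the Statistics and Machine Learning Toolbox.
thresh    = min(zeta) + chi2inv(0.95, 1)/2;  % 95% confidence cutoff
keep      = zeta <= thresh;                  % grid points below cutoff
plausible = [pGrid(keep), bestOther(keep)];  % (profiled, paired) sets
```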

As shown in Fig. 6A, when percent target occupancy in the TME is collected early (day 1), the range of predicted tumor volumes over its plausible parameter set is very large: the tumor is predicted to be anywhere from eradicated to more than 34 times the initial volume by day 30. Although the experimental data are all contained within these bounds, the model essentially has no predictive abilities if our single percent target occupancy in the TME data point is collected early, as the model cannot even infer whether the tumor volume decreases or increases due to treatment.

Fig. 6: Predicted tumor volume over plausible parameter sets using a single measurement of percent target occupancy.

Shaded blue region indicates the model-predicted range for the tumor volume when percent target occupancy in the TME is only measured at A the beginning of week 1 (day 1), B the beginning of week 3 (day 15), and C the beginning of week 5 (day 29). The true experimental measurements for tumor volume from ref. 20 are indicated with asterisks.

If instead percent target occupancy in the TME were measured at an intermediate time (day 15), the range of predicted tumor volumes over the plausible parameter set narrows significantly, as shown in Fig. 6B. While a narrower predicted range is desirable, most of the experimental data do not lie within the model-predicted range for tumor volume. Thus, collecting percent target occupancy in the TME at an intermediate time point is also problematic. The best-case scenario occurs when the single measurement is collected near the end of the one-month period (Fig. 6C). In this case, all plausible parameter sets predict that the tumor volume increases during and after treatment, and the experimental data are largely contained within the predicted range. That said, the tumor size at the end of the month is predicted to be anywhere from 441 to 1700 mm3. This large variation still highlights the problematic nature of using the model to predict treatment response when only a single measurement of percent target occupancy in the TME is available to calibrate model parameters.

In conclusion, we find that collecting a single measurement of percent target occupancy in the TME is insufficient for calibrating the model described in System (3) that bridges PK and PD components using a TME compartment. If we are in a situation where only a single measurement can be obtained, the experimental design recommendation is to collect that measurement as late in the month as possible. However, this experimental design is far from ideal, as the unidentifiable parameters greatly limit our confidence in the model’s predictions.

Non-robust predictability over a subset of 2-day experimental protocols

We next sought to determine if the identifiability of \({\boldsymbol{p}}=\left({k}_{{onT}},{k}_{{synt}}\right)\) can be ensured using two measurements of percent target occupancy in the TME. Thus, we computed the profile likelihood curves for \({k}_{{onT}}\) and \({k}_{{synt}}\) when experimental data are collected at days \(\left({t}_{1},{t}_{2}\right)\), with \(0 < {t}_{1} < {t}_{2}\le 30\). The resulting \({{30}\choose{2}}=435\) profile likelihood curves for each parameter are shown in Supplementary Fig. 3. We find that the structural non-identifiability issues that \({k}_{{onT}}\) had using only a single measurement have been resolved, as none of the profiles are flat over their domain (Supplementary Fig. 3A). Further, it appears that some profiles for both \({k}_{{onT}}\) and \({k}_{{synt}}\) cross over the 95% confidence threshold, suggesting that some experiments that collect two measurements of percent target occupancy in the TME ensure practical identifiability.

To further investigate this, we classified all possible 2-day protocols for collecting %TO in the TME at days \(\left({t}_{1},{t}_{2}\right)\) by whether they result in both, one of, or none of \({k}_{{onT}}\) and \({k}_{{synt}}\) being practically identifiable (Fig. 7A). We find that only six of the 435 possible 2-day protocols result in both parameters being practically identifiable: \(\left({t}_{1},{t}_{2}\right)=\left(1,3\right),\left(3,4\right),\left(4,6\right),\left(4,7\right),\left(4,10\right),\left(4,13\right)\). The profiles for \({k}_{{onT}}\) and \({k}_{{synt}}\) corresponding to these 2-day protocols are shown in Fig. 7B, C. Those profiles all have global minimum values that are very close to the true best-fit value of the parameter if complete experimental data were used (that is, if the percent target occupancy in the TME were collected daily). Thus, if these data were collected using one of the six identifiable 2-day protocols, we can confidently estimate the value of parameters \({\boldsymbol{p}}=\left({k}_{{onT}},{k}_{{synt}}\right)\).
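The protocol search itself is a simple enumeration. A MATLAB sketch, assuming a hypothetical predicate bothIdentifiable(days) that profiles \({k}_{{onT}}\) and \({k}_{{synt}}\) using only the %TO measurements at the listed days and reports whether both profiles cross their 95% confidence thresholds on both sides of the minimum:

```matlab
% Enumerate and classify all 2-day protocols (t1 < t2).
pairs = nchoosek(1:30, 2);                  % all 435 candidate protocols
ok = false(size(pairs, 1), 1);
for i = 1:size(pairs, 1)
    ok(i) = bothIdentifiable(pairs(i, :));  % hypothetical predicate
end
goodPairs = pairs(ok, :);                   % identifiability-preserving protocols
```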

Fig. 7: Exploration of two-day experimental design protocols.

A Classification of all possible 2-day protocols \(\left({t}_{1},{t}_{2}\right)\) by whether they result in both (blue circles), one (red diamonds), or none (orange squares) of the parameters being practically identifiable. The profile likelihood curves using all 2-day protocols that result in practically identifiable parameters are shown in B for \({k}_{{onT}}\) and C for \({k}_{{synt}}\). D–F Blue region shows the range for the predicted tumor volume over the plausible parameter sets for three of the six 2-day experimental protocols that correspond to practically identifiable parameters: \(\left({t}_{1},{t}_{2}\right)=\left(1,3\right)\) in D, \(\left(3,4\right)\) in E, and \(\left(4,6\right)\) in F. Note that the y-axis scale is the same as in Fig. 6 to clearly highlight the predictive differences between protocols that do not result in parameter identifiability and protocols that do.

We next explore the validity of model TGI predictions using the 2-day experimental protocols that resulted in practically identifiable parameters. As with the 1-day protocols, we do this by simulating the predicted tumor volume over the “plausible parameter sets”. However, this time we restrict ourselves to considering the 2-day protocols that correspond to both parameters being practically identifiable. As shown in Fig. 7D–F, the predicted tumor region is highly constrained, and most of the experimental data lie within this model-predicted range. This demonstrates that the model can predict the experimental tumor volume measurements well, provided one of the identified 2-day protocols was used to collect percent target occupancy in the TME.

Notably, even though our approach has identified six 2-day experimental protocols that achieve our goal of preserving the identifiability of \({k}_{{onT}}\) and \({k}_{{synt}}\) (and thus result in confident model predictions of TGI), we are disinclined to recommend designing an experiment based around them, as the identified protocols are not robust. To detail, if we perturb any of the recommended 2-day protocols by a single day (that is, if we consider \(\left({t}_{1}\pm 1,{t}_{2}\right),\left({t}_{1},{t}_{2}\pm 1\right)\) for any of the recommended protocols), only two of the perturbed protocols ensure the practical identifiability of both parameters. This can be clearly seen in Fig. 7A, which shows that experiments with both parameters classified as practically identifiable (blue circles) generally have neighboring points for which either none (orange squares) or one (red diamonds) of the parameters are identifiable. Thus, even though there do exist protocols that collect percent target occupancy in the TME at two days that can well-inform model parametrization, we recommend against this experimental design given its sensitivity to the precise timing for when the data must be collected.
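The robustness check can be written just as compactly. A sketch using the same hypothetical bothIdentifiable predicate as above:

```matlab
% One-day perturbation test for a recommended 2-day protocol.
proto  = [4 6];                          % one of the six protocols
shifts = [-1 0; 1 0; 0 -1; 0 1];         % perturb t1 or t2 by one day
for s = 1:size(shifts, 1)
    q = proto + shifts(s, :);
    if q(1) >= 1 && q(2) <= 30 && q(1) < q(2)   % skip invalid protocols
        fprintf('(%d,%d) identifiable: %d\n', ...
                q(1), q(2), bothIdentifiable(q));
    end
end
```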

Robust identifiability over a subset of 3-day experimental protocols

Given the lack of robustness of the discovered 2-day protocols, we next sought to determine if the identifiability of \({\boldsymbol{p}}=\left({k}_{{onT}},{k}_{{synt}}\right)\) can be ensured using three measurements of percent target occupancy in the TME. There are \({{30}\choose{3}}=4060\) such 3-day protocols. Generating the profile likelihood for each of these protocols becomes computationally expensive, even in our scenario where only two parameters are being profiled. Thus, rather than generate profile likelihood curves for all 4060 protocols, we instead selected a random third of the protocols for our analysis and assume that this sampling is sufficient to understand the entirety of the 3-day protocol space. The sufficiency of analyzing only a fraction of the possible protocols is verified below (Fig. 8E, F). Removing duplicate random samplings left 1126 unique 3-day protocols for consideration. The profile likelihood curves corresponding to each of these 3-day protocols are found in Supplementary Fig. 4. We observe that a number of these protocols appear practically identifiable.
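One way to implement this subsampling step in MATLAB is sketched below. Drawing indices with randperm avoids duplicate protocols by construction; in the analysis above, duplicates arising from random sampling were instead removed after the fact.

```matlab
% Randomly subsample roughly a third of the 3-day protocol space.
trips = nchoosek(1:30, 3);                   % all 4060 protocols
nKeep = round(size(trips, 1)/3);
sampled = trips(randperm(size(trips, 1), nKeep), :);  % no duplicates
```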

Fig. 8: Exploration of three-day experimental design protocols.

A Classification of a random sampling of 3-day protocols \(\left({t}_{1},{t}_{2},{t}_{3}\right)\) by whether they result in both parameters being practically identifiable (blue circles) or not (red circles). The profile likelihood curves from the randomly sampled 3-day protocols that result in practically identifiable parameters are shown in B for \({k}_{{onT}}\) and C for \({k}_{{synt}}\). D Projection of plot in A with \({t}_{1}=4\) and \({t}_{2}\ge 5\). E Further narrowing the protocol space to the region \({P}\) defined in Eq. (1) results in all sampled protocols contained in \(P\) being practically identifiable. F Testing all protocols in \(P\), not just randomly sampled ones, confirms that any protocol in \(P\) results in both parameters being practically identifiable.

To investigate further, we classified all possible 3-day protocols \(\left({t}_{1},{t}_{2},{t}_{3}\right)\) by whether they result in both \({k}_{{onT}}\) and \({k}_{{synt}}\) being practically identifiable (Fig. 8A). We find that 80 of the 1126 3-day protocols considered result in both model parameters being practically identifiable. The profiles for \({k}_{{onT}}\) and \({k}_{{synt}}\) corresponding to these 3-day protocols are shown in Fig. 8B, C. Those profiles all have global minimum values that are very close to the true best-fit value of the parameter if complete experimental data were used (that is, if the percent target occupancy in the TME were collected daily). Thus, if these data were collected using any of the 80 identifiable 3-day protocols, we can confidently estimate the value of parameters \({\boldsymbol{p}}=\left({k}_{{onT}},{k}_{{synt}}\right)\).

An analysis of the data in Fig. 8A shows that 73.75% of the 3-day protocols that correspond to practically identifiable parameters (59 of 80) collect the first measurement of percent target occupancy in the TME at day \({t}_{1}=4\). Further, the second measurement must be taken after the first, but no later than day \({t}_{2}=16\), if we want to ensure parameter identifiability. Fixing \({t}_{1}=4\) in protocol space allows us to project our results onto a two-dimensional plot in \(\left({t}_{2},{t}_{3}\right)\) space (Fig. 8D, where we also impose \(5\le {t}_{2}\le 16\)). This region contains 65 of our tested 3-day protocols, 90.77% of which are practically identifiable. Thus, an experimental protocol with \({t}_{1}=4\), \(5\le {t}_{2}\le 16\), \({t}_{3} > {t}_{2}\) is very likely to result in identifiability of both parameters, but it is not guaranteed to do so. In Fig. 8E, we show that if we further narrow down the protocol space to

$$P=\left\{\left({t}_{1},{t}_{2},{t}_{3}\right):{t}_{1}=4,5\le {t}_{2}\le 10,16\le {t}_{3}\le 30\right\},$$
(1)

then, every 3-day protocol we had randomly sampled in the region \(P\) results in practically identifiable parameters. We note that there are many practically identifiable protocols outside of \(P\), as this region only contains 40% of all the practically identifiable 3-day protocols we tested. That said, we find using a protocol in \(P\) to be highly desirable, as the experimental design is robust to small changes on the day the measurements are collected. We confirm this robustness in Fig. 8F, where we tested the identifiability associated with every protocol in \(P\), not just the ones that had been randomly sampled during our analysis. As predicted from the random sampling, every protocol in \(P\) corresponds to both model parameters being practically identifiable. This also confirms the sufficiency of using a random sampling of possible protocols in the proposed minimal experimental design methodology.

In conclusion, our identifiability analysis has led to a robust, minimally sufficient experimental design recommendation. We suggest an experimental protocol in which percent target occupancy in the TME is collected three times over a one-month period. The first sample should be collected on day \({t}_{1}=4\). The second sample can be taken anywhere from day 5 to day 10, \(5\le {t}_{2}\le 10\). The third sample should be taken on day 16 or later, \(16\le {t}_{3}\le 30\). Even though ~60% of the 3-day protocols that result in practically identifiable parameters lie outside the recommended region, we have shown that every protocol satisfying these specifications results in practically identifiable parameters. Using any experimental design away from the boundary of this region is thus predicted to robustly provide sufficient data for confidently estimating the values of the parameters \({\boldsymbol{p}}=\left({k}_{{onT}},{k}_{{synt}}\right)\).

Sensitivity of experimental design recommendation to methodological assumptions

A number of decisions must be made to implement the proposed minimal experimental design algorithm. Beyond the standard decisions that go into model development (building the right-sized model for the problem at hand, given the available data), a subset of model parameters must be chosen for further study using a combination of experimental knowledge and sensitivity results (see Supplementary Fig. 1). Once these parameters have been selected, the modeler has another decision to make: from what underlying distribution should the parameter values be sampled to generate the simulated data needed at the fourth step (see Fig. 3) of the workflow?

In the results presented thus far, we assumed that the parameters are normally distributed. The mean of each distribution is the best-fit value of the parameter when the model is fit to TGI data (10 mg/kg of pembrolizumab administered on average every 3.5 days). The standard deviations were selected to strike a balance between being too “optimistic” (i.e., having such low variability that it is unlikely to represent real data) and being too “pessimistic” (i.e., having such high variability that the experimental data are too noisy to be informative). In Table 2, we explore what happens if we change the assumptions placed on the parameter distributions. We particularly ask how the experimental design recommendation changes if: (1) parameters are lognormally distributed instead of normally distributed (keeping the mean and standard deviation fixed), (2) parameters remain normally distributed about the same mean, but the standard deviation is decreased by 25%, and (3) parameters remain normally distributed about the same mean but the standard deviation is increased by 25%.
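For case (1), one way to construct a lognormal distribution with the same mean and standard deviation as the default normal is moment matching. A MATLAB sketch for \({k}_{{onT}}\) (the same transformation applies to \({k}_{{synt}}\)); the formulas are the standard lognormal moment-matching identities and are our assumption about how such a distribution would be built:

```matlab
% Lognormal draw with prescribed mean m and standard deviation s,
% via moment matching of the underlying normal (mu, sigma).
m = 0.01;  s = m/5;                     % mean and sd of default normal
sigma2 = log(1 + (s/m)^2);              % sigma^2 = log(1 + s^2/m^2)
mu     = log(m) - sigma2/2;             % ensures E[k_onT] = m
kOnT   = exp(mu + sqrt(sigma2)*randn);  % one lognormal sample
```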

Table 2 Robustness of experimental design recommendation to parameter distributions.

One conclusion is consistent across all cases considered: measuring %TO in the TME at a single time point is not a sufficient experimental design, as it never results in practically identifiable parameters. Focusing next on changing the shape of the distribution, we find that shifting from a normal to a lognormal distribution (with the same mean and standard deviation) has minimal impact. Table 2 shows that a very similar set of 2-day protocols results in practically identifiable parameters, and a very similar percentage of 3-day protocols results in practically identifiable parameters. Supplementary Fig. 5A shows the classification of all tested 3-day protocols, and Supplementary Fig. 5B demonstrates that all protocols in the experimentally-recommended region \(P\) identified using a normal distribution result in practically identifiable parameters when the simulated data is generated from a lognormal distribution. Thus, the experimental design recommendation made using normally distributed parameters holds if we change the distribution to lognormal.

We next consider the impact of keeping the parameters normally distributed about the same mean but changing the standard deviation. The experimental design recommendation is influenced by the standard deviation in the following way: the smaller the standard deviation, the larger the set of experimental protocols that result in practically identifiable parameters. To detail, 2.99% of 2-day protocols result in practically identifiable parameters when the standard deviation is 25% smaller than the default value. That number drops to 1.38% at the default standard deviation, and 0.02% when the standard deviation is 25% larger than the default value. The decrease in the number of identifiable protocols as a function of standard deviation is to be expected, as noisier data should necessitate the collection of more data11. Table 2 shows that every protocol that resulted in practically identifiable parameters using the default value of \({\rm{\sigma }}\) also results in practically identifiable parameters at 0.75\({\rm{\sigma }}\). Thus, a smaller standard deviation means more experimental designs are allowable, but it also preserves the recommendations made using (somewhat) noisier data.

Looking at the randomly sampled 3-day protocols, we see the same qualitative trend as a function of standard deviation as with the 2-day protocols. In total, 16.16% of protocols result in practically identifiable parameters at 0.75\({\rm{\sigma }}\) (see Supplementary Fig. 5C), compared to 7.10% at \({\rm{\sigma }}\) and 3.46% at 1.25\({\rm{\sigma }}\) (see Supplementary Fig. 5E). All protocols in region \(P\) identified using the default value of \({\rm{\sigma }}\) still result in practically identifiable parameters using \(0.75{\rm{\sigma }}\) (Supplementary Fig. 5D). At \(1.25{\rm{\sigma }}\), only a subset of the protocols in region \(P\) result in practically identifiable parameters. In particular, collecting the second sample on day 5 or 8 is no longer a recommended experimental design. Thus, a larger standard deviation imposes more restrictions on when data should be collected.

Discussion

As mathematical models continue to be used for real-time interventions, it becomes increasingly important to design experiments that collect the right data, at the right time, to maximize the model’s predictive power. In this work, we propose an approach for using identifiability analysis to design a minimally sufficient set of experiments. The approach involves generating simulated data from the model, and then identifying a set of parameters that would be practically identifiable in the ideal scenario of “complete” experimental data (i.e., conducting an experiment daily). Then, we work backwards, seeking the minimal number of data points that must be collected to ensure the practical identifiability of the selected model parameters. We applied the proposed minimal experimental design technique to a site-of-action PKPD model of tumor growth in response to treatment with pembrolizumab that explicitly accounts for drug-target binding in the TME. We were particularly interested in identifying two specific quantities that are difficult to measure experimentally but that significantly affect model dynamics: target synthesis in the TME and apparent drug-target affinity in the TME.

Applying our minimally sufficient experimental design algorithm led to the conclusion that, even if all other parameters are estimated using other methods and experiments, no single measurement of percent TO in the TME can result in parameter practical identifiability. Consequently, the model cannot be confidently parameterized, and the resulting predictions about tumor response to treatment cannot be trusted. We did identify a small number of 2-day experimental protocols for collecting %TO in the TME that ensure parameter practical identifiability. However, these protocols are not robust to small perturbations in when the data is collected. Further, if the experimental data were somewhat noisier (25% more) than assumed, only one of the identified 2-day protocols resulted in practically identifiable parameters. It took considering 3-day protocols to identify a robust set of experimental designs that ensure parameter identifiability: the first measurement of %TO in the TME must be collected on day 4, the second can be collected any time between days 5 and 10 (inclusive), and the third taken any time from day 16 until the end of the 30-day experiment. Any such set of three measurements is sufficiently informative for the selected parameters to be practically identifiable, which lends strong confidence to predictions made by the model. This recommendation is fairly robust to the underlying assumptions about the simulated data: the recommendation holds if the data is lognormally distributed instead of normally distributed and if the data is normally distributed but 25% less noisy than assumed. If the data is normally distributed and 25% noisier than assumed, a subset of the recommended protocols will result in practically identifiable parameters.

While the sampling schedule recommended here is particular to the model and experimental system under consideration, the proposed minimally sufficient experimental design methodology can be applied broadly to other biological systems. The method is best applied, however, to design an experiment that yields important data for model parametrization but is challenging, invasive, or expensive to perform. For instance, there is no need to optimize the experimental design for measuring percent target occupancy in the TME if sufficient dynamics can be inferred from the correlation between plasma PK and tumor burden reduction. Similarly, since collecting volumetric data in mice is straightforward and non-invasive, there is no need to design a minimally sufficient experiment for this task. Conversely, there are times when a reliable understanding of TO dynamics in the TME is critical, as would be the case for bispecific T-cell engagers (BiTEs), whose efficacy is contingent on the drug binding to both targets explicitly in the TME22. Additionally, assessing levels of target engagement can be very important for addressing safety considerations, both for BiTEs23 and for broader classes of drugs, such as immune cell agonists24. For these cases, the additional investment of resources in data collection, even for an invasive procedure, may prove indispensable. The proposed methodology enables collecting the minimal amount of data sufficient to inform an associated mathematical model.

A key to successful implementation of the proposed minimally sufficient experimental design methodology is determining an appropriate model for the data, and then selecting the subset of parameters for analysis. Herein, we chose to validate our model by fitting data for a high dose of drug and comparing model predictions to data at a low dose. Alternatively, one could consider multiple random splits of the data into training and validation sets, and select the model that best describes the average predictive abilities in the validation sets25. Once a model has been selected, a parameter subset must be identified. The set must be small enough to ensure that all parameters are practically identifiable under the “ideal” experimental conditions; for instance, if the data of interest could be collected daily over the entire course of the experiment. Any parameters not in the analysis set must be fixed, thus posing the challenge of how to approximate the value of such fixed parameters. Fortunately, for PKPD models, some common approaches exist for parameter estimation. In the proposed example, for instance, PK parameters can be estimated using standard software, such as Monolix or WinNonLin; \({k}_{{on}}\) and \({k}_{{off}}\) can be assessed using Biacore26,27 as a first step and then refined using in vitro assays to further capture the expected PK-TO relationships. Target clearance rates can be estimated using internalization assays, and normal synthesis rates can be calculated from the expected normal steady-state levels of the target. With such careful preparatory work, this technique enables the estimation of the few remaining elusive parameters that may be essential to a model’s predictive value.

The benefits of the proposed experimental design methodology are best realized when experimentalists and modelers are working in close collaboration. Such a partnership ensures alignment of key measurable aspects of the mechanism of interest, while ensuring that modelers have a clear understanding of what experiments are feasible. Constraints on what experiments can be conducted include experiment duration, the possible number of measurements that can be collected, and any constraints on when the samples can be collected (due to costs, implementation issues, animal welfare, etc.). While optimizing time point collection does not answer all experimental design questions—for instance, it does not answer questions related to selecting sample size—the proposed methodology can be used in combination with other optimal experimental design approaches to maximize the utility of collected data.

More broadly, this work reinforces the idea that there is no “correct” model in absolute terms; instead, there are data for the model and models for the data. Indeed, when we talk about fit-for-purpose models, we typically mean models that are just complex enough to answer the question of interest. This work arguably adds another dimension to fit-for-purpose modeling, where fit-for-purpose data are collected to inform the model and thereby enable it to answer the underlying question. Therefore, the structure of a model, even for the same question, can conceivably be different but equally useful depending on the data available to identifiably parametrize it. Once a research team has identified a motivating question and built and parameterized a fit-for-purpose model for the question of interest, the proposed experimental design framework can be a powerful tool for identifying the minimal amount of experimental data required to maximize the model’s predictive power.

Methods

Profile likelihood for practical identifiability

Let \(p\) be the parameter vector for our differential equation model and let \({\bar{y}}_{{data}}(t)\) and \({\sigma }^{2}(t)\) represent the average and variance, respectively, in the data at time point \(t\). For normally distributed measurement noise, the likelihood function is defined as follows28:

$$L\left({\bar{y}}_{{data}}\left(t\right)|\,p\right)=\mathop{\prod }\limits_{t=1}^{N}\frac{1}{\sqrt{2\pi {\sigma }^{2}\left(t\right)}}\exp \left(-\frac{{\left({y}_{{model}}\left(t\right)-{\bar{y}}_{{data}}\left(t\right)\right)}^{2}}{{2\sigma }^{2}\left(t\right)}\right).$$

To find the parameter set that gives the best-fit to the data, the likelihood function is maximized:

$${p}^{* }={{\rm{argmax}}}_{p}\,L\left({\bar{y}}_{{data}}\left(t\right)\,|\,p\right).$$

For computational simplicity, the negative log-likelihood function is often minimized instead, which can be shown to be equivalent to minimizing the cost function \({\rm{\zeta }}\) that describes the normalized discrepancy between model predictions and data28:

$${\rm{\zeta }}=\frac{1}{2}\mathop{\sum }\limits_{t=1}^{N}\frac{{\left({y}_{{model}}\left(t\right)-{\bar{y}}_{{data}}\left(t\right)\right)}^{2}}{{\sigma }^{2}\left(t\right)}.$$
(2)
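To see this equivalence, take the negative logarithm of the likelihood; the Gaussian normalization terms do not depend on \(p\) and therefore only shift the objective by a constant:

$$-\ln L\left({\bar{y}}_{{data}}\left(t\right)\,|\,p\right)=\mathop{\sum }\limits_{t=1}^{N}\left[\frac{{\left({y}_{{model}}\left(t\right)-{\bar{y}}_{{data}}\left(t\right)\right)}^{2}}{2{\sigma }^{2}\left(t\right)}+\frac{1}{2}\ln \left(2\pi {\sigma }^{2}\left(t\right)\right)\right]={\rm{\zeta }}+{\rm{const}}.$$

Hence, minimizing \({\rm{\zeta }}\) over \(p\) yields the same \({p}^{* }\) as maximizing \(L\).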

The function \(g\left({p}_{i}\right)\) that gives the maximum possible likelihood value for each parameter \({p}_{i}\) is called the profile likelihood function. To assess the practical identifiability of model parameters given available data, we will compute the profile likelihood function of each parameter \({p}_{i}\) as follows14:

  1. Determine a range for the value of \({p}_{i}\) from any available theoretical or physiological considerations.

  2. Fix \({p}_{i}={p}_{i}^{* }\), where \({p}_{i}^{* }\) is a value in the range defined in Step 1.

  3. For the fixed value of \({p}_{i}\) in Step 2, find the parameter set that minimizes the cost function \(\zeta\) defined in Eq. (2).

  4. Save the optimal value of the cost function, \(\zeta \left({p}_{i}^{* }\right)\).

  5. Repeat Steps 2–4 for a discrete set of values in the range of parameter \({p}_{i}\).

  6. Plot \(\zeta \left({p}_{i}\right)\) to arrive at the profile likelihood curve for parameter \({p}_{i}\).

The three possible scenarios for the profile of \({p}_{i}\) are shown in Supplementary Fig. 6. Supplementary Fig. 6A shows a practically identifiable parameter, as the profile likelihood curve is parabolic with a clear minimum at the optimal value of \({p}_{i}\). Further, the range of \({p}_{i}\) values within the 95% confidence interval (dashed red line) is finite. Supplementary Fig. 6B represents a parameter that lacks structural identifiability, as indicated by a flat profile for which an infinite set of parameter values gives equally good fits to the data. Supplementary Fig. 6C is indicative of a parameter that is structurally identifiable, but not practically identifiable. While the profile does have a global minimum, it is insensitive to changing the parameter value in one direction. This can be seen in Supplementary Fig. 6C by observing that the profile does not cross its 95% confidence threshold twice over its domain.

In this work, the profile likelihood method has been implemented in MATLAB® using ode45 as the numerical differential equation solver with a relative error tolerance of \({10}^{-6}\). Parameter fitting is performed using MATLAB’s fmincon function with a first-order optimality termination tolerance of \({10}^{-10}\). This built-in function executes an interior-point method for solving constrained minimization problems29. As we are only fitting a single parameter in all profile likelihood curves generated in this manuscript, the fit parameter is kept on a linear scale. However, we note that when fitting multiple parameters simultaneously, a log scale should be used in the parameter estimation (provided the parameters are non-negative) to avoid potential numerical complications28.
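The following sketch illustrates Steps 1–6 for profiling \({k}_{{onT}}\) while re-fitting \({k}_{{synt}}\). The cost function zetaCost (an implementation of Eq. (2) built on an ode45 solve of System (3)), the variable data (the measurements entering Eq. (2)), the grid bounds, and the initial guess are illustrative assumptions rather than the published code; chi2inv requires the Statistics and Machine Learning Toolbox.

```matlab
% Profile likelihood sketch for k_onT (Steps 1-6 above).
opts = optimoptions('fmincon', 'OptimalityTolerance', 1e-10, ...
                    'Display', 'off');
pGrid = linspace(0.002, 0.05, 50);   % Step 1: assumed range for k_onT
zeta  = zeros(size(pGrid));
kSyn0 = 14190;                       % initial guess for k_synt
for i = 1:numel(pGrid)               % Steps 2-5: sweep the grid
    obj = @(kSynt) zetaCost(pGrid(i), kSynt, data);  % k_onT held fixed
    [kSynBest, zeta(i)] = fmincon(obj, kSyn0, [], [], [], [], ...
                                  0, Inf, [], opts);
    kSyn0 = kSynBest;                % warm-start the next minimization
end
plot(pGrid, zeta);                             % Step 6: profile curve
yline(min(zeta) + chi2inv(0.95, 1)/2, 'k--');  % 95% confidence threshold
```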

Case study: site-of-action model parametrized for pembrolizumab

Experiments aimed at characterizing the dose-dependent relationship between drug concentration and tumor size form the backbone of pre-clinical studies in oncology. Typically, the collected time-course measurements are tumor volume and drug concentration in the plasma, which are phenomenologically captured by a simple indirect response model, such as that of ref. 30. This correlation might be sufficient for assessing the general dose-response relationship but cannot answer the question of whether the underlying mechanism of action of the drug has been fully engaged.

For this question, we often use pharmacobinding (PB) models31, which describe the dynamics of the target (such as PD-1 for pembrolizumab), and the reversible binding kinetics between the drug and its target. This allows calculating levels of projected target occupancy, and it is typically expected that if over 90% of the target has been engaged without an effect, then the target may not be the correct one for the selected indication31. Such PK-PB models can facilitate the development of a mechanistic understanding of the dose-response relationship between the drug and the tumor size.

A step further can be taken with site-of-action models32,33,34 that take into account the drug-target dynamics not only in the plasma but also, as the name suggests, at the site of action, such as the TME. These models can vary in degrees of complexity, from more mechanistic35 to more detailed physiologically based pharmacokinetic models36,37. While such models can be used to calculate projected levels of target occupancy in the TME, it is unclear whether these estimates are truly reliable without actually sampling the TME; this is the question we address here.

For that, we developed a modified version of a two-compartment site-of-action model which describes drug concentration over time in the central (plasma), peripheral (tissue), and TME compartments. We assume that pharmacobinding occurs in the plasma and TME compartments; while it is possible that some drug-target dynamics occur in the peripheral compartment as well, we assume that they are either negligible with regard to the overall dose-response dynamics or cannot be measured. These assumptions can be relaxed if needed.

The model has a standard structure in the plasma compartment, with an assumption of intravenous drug administration that is cleared at a rate \({k}_{10}\); the drug distributes to the peripheral compartment at a rate \(\left({V}_{1}/{V}_{2}\right){k}_{12}\) and back at a rate \(\left({V}_{2}/{V}_{1}\right){k}_{21}\), where \({V}_{1}\) is the volume of distribution in the central compartment, and \({V}_{2}\) is the volume of distribution in the peripheral compartment. We assume that the free target \({T}_{p}\) is synthesized in the plasma at a rate \({k}_{{syn}}\) and, since the model is calibrated to pembrolizumab data whose target PD-1 is membrane-bound, we assume that it is cleared primarily through internalization at a rate \({k}_{\mathrm{int}P}\). We also assume reversible binding kinetics between the drug and its target, with the drug-target complex in the plasma forming at a rate \({k}_{{on}}\), dissociating at a rate \({k}_{{off}},\) and clearing at the rate \({k}_{\mathrm{int}P}\).

The PK-PB dynamics in the TME compartment are largely similar, with several proposed modifications. Firstly, we assume that the rate of drug distribution into the TME is not constant but is a function of the tumor volume, namely, \(\left({V}_{1}/\left(x+\delta \right)\right){k}_{1T}\), where \(x\) is tumor volume and \(\delta\) is introduced to prevent division by zero in the limiting case, where the tumor volume tends to zero. We propose that while \({V}_{1}\) and \({V}_{2}\) are treated as constant volumes of distribution (as is standard), the volume of distribution into the tumor be treated as variable, thereby capturing the higher or lower distribution of the drug into the TME depending on tumor size. As a consequence of this assumption, we further propose that the rate of target synthesis in the tumor is not constant or at equilibrium, as it likely would be in the plasma or a non-disease compartment, but instead is treated as a function of tumor size. In particular, we assume this rate increases according to a saturating function \(\frac{{k}_{{synt}}x}{x+{K}_{x}}\), where \({k}_{{synt}}\) is the maximal rate of target synthesis in the tumor (which is likely higher than in the plasma) and \({K}_{x}\) is the tumor volume at which the synthesis rate of free target \({T}_{{TME}}\) is half-maximal.

Additionally, we hypothesize that the apparent rate of drug-target binding in the TME is not necessarily the same as in the plasma, i.e., that \({k}_{{onT}}\) may be different from \({k}_{{on}}\). That said, we expect that once the drug-target complex has been formed, the dissociation rate \({k}_{{off}}\) will remain the same, as that is more likely to be an intrinsic property38. Finally, we assume that the tumor grows logistically and is killed as a function of the percent target occupancy in the tumor, which is calculated as \(\frac{D{R}_{{TME}}}{D{R}_{{TME}}+{T}_{{TME}}}\), where \(D{R}_{{TME}}\) is the concentration of the drug-target complex in the tumor and \({T}_{{TME}}\) is the free target in the TME.

The resulting system of equations is as follows:

$$\left.\begin{array}{c}\frac{d{D}_{p}}{{dt}}=u\left(t\right)-{k}_{10}{D}_{p}-{k}_{{on}}{D}_{p}{T}_{p}+{k}_{{off}}{D}_{R}-{k}_{12}{D}_{p}+{k}_{21}\frac{{V}_{2}}{{V}_{1}}{D}_{t}\\ -{k}_{1T}{D}_{p}+{k}_{T1}\frac{x+\delta }{{V}_{1}}{D}_{{TME}}\\ \frac{d{T}_{p}}{{dt}}={k}_{{syn}}-{k}_{{intP}}{T}_{p}-{k}_{{on}}{D}_{p}{T}_{p}+{k}_{{off}}{D}_{R}\\ \frac{d{D}_{R}}{{dt}}={k}_{{on}}{D}_{p}{T}_{p}-{k}_{{off}}{D}_{R}-{k}_{{intP}}{D}_{R}\\ \frac{d{D}_{t}}{{dt}}={k}_{12}\frac{{V}_{1}}{{V}_{2}}{D}_{p}-{k}_{21}{D}_{t}\end{array}\right\}{\rm{Non}}-{\rm{tumor}}\;{\rm{compartments}}$$
(3)
$$\left.\begin{array}{c}\frac{d{D}_{{TME}}}{{dt}}={k}_{1T}\frac{{V}_{1}}{x+\delta }{D}_{p}-{k}_{T1}{D}_{{TME}}-{k}_{{onT}}{D}_{{TME}}{T}_{{TME}}+{k}_{{off}}D{R}_{{TME}}\\ \frac{d{T}_{{TME}}}{{dt}}=\frac{{k}_{{synt}}x}{x+{K}_{x}}-{k}_{{int}}{T}_{{TME}}-{k}_{{onT}}{D}_{{TME}}{T}_{{TME}}+{k}_{{off}}D{R}_{{TME}}\\ \frac{{dD}{R}_{{TME}}}{{dt}}={k}_{{onT}}{D}_{{TME}}{T}_{{TME}}-{k}_{{off}}D{R}_{{TME}}-{k}_{{int}}D{R}_{{TME}}\\ T{O}_{{TME}}=100\times \frac{D{R}_{{TME}}}{D{R}_{{TME}}+{T}_{{TME}}}\\ \frac{{dx}}{{dt}}={rx}\left(1-\frac{x}{K}\right)-d\frac{T{O}_{{TME}}}{T{O}_{{TME}}+T{O}_{50}}x\end{array}\right\}{\rm{TME}}\;{\rm{compartment}}$$

The structure of the model is summarized in Fig. 1. Variable definitions, initial conditions, and calibrated parameter values are summarized in Table 1.
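For readers who wish to experiment with the model, a direct MATLAB transcription of System (3) is sketched below as an ode45-compatible right-hand side (saved, e.g., as siteOfActionRHS.m). The state ordering, the parameter struct p, and the handling of dosing (IV boluses applied to \({D}_{p}\) between integration intervals, so that \(u\left(t\right)=0\) inside the solver) are our illustrative choices, not part of the published code.

```matlab
function dydt = siteOfActionRHS(~, y, p)
% States: y = [Dp; Tp; DR; Dt; DTME; TTME; DRTME; x], as in System (3).
% p is a struct of Table 1 parameters; u(t) = 0 because IV doses are
% assumed to be applied as bolus updates to Dp between solver calls.
Dp = y(1); Tp = y(2); DR = y(3); Dt = y(4);
DTME = y(5); TTME = y(6); DRTME = y(7); x = y(8);
TO = 100*DRTME/(DRTME + TTME);                      % %TO in the TME
dydt = [ ...
    -p.k10*Dp - p.kon*Dp*Tp + p.koff*DR - p.k12*Dp ...
        + p.k21*(p.V2/p.V1)*Dt - p.k1T*Dp ...
        + p.kT1*((x + p.delta)/p.V1)*DTME;          % dDp/dt
    p.ksyn - p.kintP*Tp - p.kon*Dp*Tp + p.koff*DR;  % dTp/dt
    p.kon*Dp*Tp - p.koff*DR - p.kintP*DR;           % dDR/dt
    p.k12*(p.V1/p.V2)*Dp - p.k21*Dt;                % dDt/dt
    p.k1T*(p.V1/(x + p.delta))*Dp - p.kT1*DTME ...
        - p.konT*DTME*TTME + p.koff*DRTME;          % dDTME/dt
    p.ksynt*x/(x + p.Kx) - p.kint*TTME ...
        - p.konT*DTME*TTME + p.koff*DRTME;          % dTTME/dt
    p.konT*DTME*TTME - p.koff*DRTME - p.kint*DRTME; % dDRTME/dt
    p.r*x*(1 - x/p.K) - p.d*(TO/(TO + p.TO50))*x];  % dx/dt
end
```

A call such as ode45(@(t,y) siteOfActionRHS(t,y,p), [0 30], y0, odeset('RelTol', 1e-6)) then integrates the system between dosing events.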

The model was calibrated to digitized PK data (Fig. 2A) for pembrolizumab reported in ref. 13 and TGI data reported in ref. 20. This particular PK dataset was chosen because it includes measurements of percent TO in the TME, which are typically not available. The TGI curves in ref. 20 were measured for 2 mg/kg and 10 mg/kg of pembrolizumab, administered on average 3.5 days apart, for C57BL/6 mice implanted with MC38 syngeneic colon adenocarcinoma cells. We further calibrated model parameters to fit the PK-TGI relationship for the dose of 10 mg/kg (Fig. 2B). We also report the projected levels of percent TO in plasma as compared to the TME (Fig. 2C) to emphasize the importance of capturing drug-target dynamics in the TME, as this is where the drug is expected to drive efficacy.

Model parameterization was validated using data withheld from calibration. Figure 2D demonstrates that we successfully recapitulated the PK curves for three doses of 1 mg/kg of pembrolizumab given weekly13, and Fig. 2E shows that we could describe the TGI data for five doses of 2 mg/kg of pembrolizumab given on average every 3.5 days20. We note that, without the %TO in TME data (Fig. 2F), a large number of parameter sets could recapitulate the PK and TGI data equally well, further emphasizing that this piece of data was critical to model parameterization.
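
As an illustration of how such a calibration can be set up, the sketch below jointly minimizes a normalized sum of squared errors against PK and TGI data. This is a minimal sketch, not the procedure used to produce Fig. 2: the data file names, the choice of four free parameters, the starting values, and the single-bolus simulation are all assumptions, and it presumes that pkpd_rhs and placeholder_params from the previous sketch are saved as functions on the MATLAB path.

```matlab
% Hypothetical calibration sketch: joint weighted least squares against
% digitized PK and TGI data. File names and free-parameter choices are
% assumptions for illustration; repeated dosing is omitted for brevity.
pk  = readmatrix('pk_data.csv');     % columns: time (days), plasma conc.
tgi = readmatrix('tgi_data.csv');    % columns: time (days), tumor volume
p   = placeholder_params();          % fixed/literature values (see above)
y0  = [10; 1; 0; 0; 0; 0.1; 0; 50];  % placeholder initial state

theta0 = log([p.k1T, p.ksynt, p.d, p.TO50]);   % fit on log scale: positivity
theta_hat = exp(fminsearch(@(th) pk_tgi_sse(exp(th), p, y0, pk, tgi), theta0));

function val = pk_tgi_sse(theta, p, y0, pk, tgi)
p.k1T = theta(1); p.ksynt = theta(2); p.d = theta(3); p.TO50 = theta(4);
tend = max([pk(:, 1); tgi(:, 1)]);
[t, y] = ode45(@(t, y) pkpd_rhs(t, y, p), [0, tend], y0);
Dp_hat = interp1(t, y(:, 1), pk(:, 1));   % model plasma drug at PK times
x_hat  = interp1(t, y(:, 8), tgi(:, 1));  % model tumor volume at TGI times
% normalize each data type so PK and TGI contribute on comparable scales
val = sum(((Dp_hat - pk(:, 2)) / max(pk(:, 2))).^2) ...
    + sum(((x_hat  - tgi(:, 2)) / max(tgi(:, 2))).^2);
end
```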

Designing a minimally sufficient experimental protocol

We propose the following workflow, summarized in Fig. 3, for using the profile likelihood method to design a minimally sufficient experimental protocol:

1. Identify the experiment that measures the variable of interest. We recommend focusing on an experiment that, while potentially expensive and/or invasive, provides the data needed for your modeling work to inform decision making.

2. Develop, parameterize, and validate the model. As is always the case in modeling, the validity of a model's predictions is constrained by its accuracy. Best practices for designing a fit-for-purpose model have been extensively reviewed elsewhere39,40,41.

3. Select parameters of interest. As fit-for-purpose biological models often contain a double-digit number of parameters, it is rarely feasible or desirable to allow all parameters in a model \({\mathscr{M}}\) to vary. If \({\mathscr{M}}\) has \(r\) parameters, the goal of this step is to identify a subset of \(n < r\) model parameters for further analysis. This can be done via the following considerations:

    a. Work closely with experimental colleagues, and consult the literature, to identify values for as many parameters as possible.

    b. For those parameters that cannot be easily estimated from experimental data, conduct a sensitivity analysis42,43,44. Fix the parameters to which the model is least sensitive.

    c. Let \({\boldsymbol{q}}=\left({q}_{1},\ldots ,{q}_{m}\right)\) be the set of \(m < r\) model parameters that are fixed based on (a) and (b).

    d. Let \({\boldsymbol{p}}=\left({p}_{1},\ldots ,{p}_{n}\right)\) be the set of parameters that were not fixed in the prior steps, where \(n=r-m\). This parameter set is used in the subsequent analysis.

4. Generate simulated data for the measurement of interest. Define an underlying distribution for each parameter in \({\boldsymbol{p}}\). Randomly sample \(K_{p}\) values of each parameter from its corresponding distribution and determine the model-predicted response by solving \({\mathscr{M}}\left({\boldsymbol{q}},{{\boldsymbol{p}}}_{k},t\right)\), \(k=1,\ldots ,K_{p}\). From the model-predicted response, extract the value of the variable of interest \({Y}_{k}\left(\bar{{\boldsymbol{t}}}\right)\) at the time points of interest \(\bar{{\boldsymbol{t}}}\). For instance, it is often feasible to conduct at most one experiment per day, so \(\bar{{\boldsymbol{t}}}\) might represent the vector of all days during the experimental time frame; that is, \(\bar{{\boldsymbol{t}}}=\left(0,1,\ldots ,{t}_{f}\right)\). (A concrete sketch of this step and the next appears after the workflow.)

    a. These distributions could be the output of a fitting procedure, e.g., if nonlinear mixed effects modeling45,46 or Approximate Bayesian Computation47,48 is utilized.

    b. Alternatively, one can assume that each parameter is normally or lognormally (to preclude negative parameter values) distributed about its calibrated value. Where possible, the choice of distribution and its standard deviation should be guided by biological considerations and/or historical data (for instance, how much noise is typically observed in comparable experiments).

5. Assess practical identifiability using the data from “complete” simulated experiments. Use the profile likelihood approach to assess the practical identifiability of all parameters in \({\boldsymbol{p}}\) using the simulated data \({Y}_{k}\left(\bar{{\boldsymbol{t}}}\right)\) from Step 4. We note that the use of noisy simulated data to explore model identifiability has been considered by others1,15,18,49. The goal at this step is for all parameters in \({\boldsymbol{p}}\) to be practically identifiable; if this is not the case, revisit Steps 2 and 3 to right-size the model and reconsider which parameters in \({\boldsymbol{p}}\) could be fixed. Proceed to the next step once all parameters in \({\boldsymbol{p}}\) are practically identifiable given the simulated data \({Y}_{k}\left(\bar{{\boldsymbol{t}}}\right)\).

6. Determine the minimal number of data points needed to ensure practical identifiability. Start by asking whether it is possible to conduct only \(j=1\) experiment to collect data on the variable of interest while preserving the practical identifiability of the parameters in \({\boldsymbol{p}}\). That is, does any \({t}_{i}\in \bar{{\boldsymbol{t}}}\) exist for which the model parameters \({\boldsymbol{p}}\) remain practically identifiable given only the measurement \({Y}_{k}\left({t}_{i}\right)\)? (The second sketch after this workflow illustrates this search.)

    a. If yes, proceed to Step 7.

    b. If no, repeat the analysis in Step 6, asking whether practical identifiability is ensured if instead you could conduct \(j+1\) experiments to collect data for \({Y}_{k}\).

7. Make a minimal experimental design recommendation. The smallest value of \(j\) in Step 6 for which practical identifiability of all parameters in \({\boldsymbol{p}}\) can be ensured determines the number of experiments that must be conducted and when they need to occur. Such a \(j\) is guaranteed to exist because Step 5 ensured that the parameters in \({\boldsymbol{p}}\) are practically identifiable using all the simulated data \({Y}_{k}\left(\bar{{\boldsymbol{t}}}\right)\). For instance, if \(j=1\) ensures practical identifiability of all parameters in \({\boldsymbol{p}}\), only a single experiment needs to be conducted. However, this does not mean that just any single experiment suffices: the results in Step 6 also determine which subsets of experiments ensure practical identifiability. That is, the method determines not only the minimal number of experiments to conduct, but also when those experiments must be conducted.
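
To make Steps 4 and 5 concrete, the sketch below substitutes a two-parameter logistic growth model for System (3) so that the entire procedure fits in a few lines. The parameter values, the width of the lognormal sampling distribution, and the 5% proportional measurement error are all illustrative assumptions; a single parameter set is drawn (in a full analysis, Step 4 repeats this for \(k=1,\ldots ,K_{p}\)), and the profile likelihood of the growth rate \(r\) is traced by re-optimizing the carrying capacity \(K\) at each fixed value of \(r\). Note that chi2inv requires the Statistics and Machine Learning Toolbox.

```matlab
% Steps 4-5 on a logistic stand-in for System (3); all values are
% illustrative assumptions.
rng(1);                                   % reproducibility
x0 = 50;                                  % known initial tumor volume
logistic = @(p, t) p(2)*x0 ./ (x0 + (p(2) - x0).*exp(-p(1)*t));
p_true = [0.2, 2000];                     % "calibrated" [r, K]
tbar = 0:21;                              % candidate measurement days

% Step 4: draw one parameter set lognormally about the calibrated values
% (repeat for k = 1, ..., K_p in a full analysis) and add 5% noise.
p_k = exp(log(p_true) + 0.1*randn(1, 2));
Yk  = logistic(p_k, tbar) .* (1 + 0.05*randn(size(tbar)));

% Step 5: profile likelihood of r given the complete data Y_k(tbar).
% With 5% proportional error, this weighted SSE is -2 log L up to a constant.
wsse = @(p, tt, yy) sum(((logistic(p, tt) - yy) ./ (0.05*yy)).^2);
rgrid = linspace(0.5*p_k(1), 1.5*p_k(1), 41);
prof = zeros(size(rgrid));
for g = 1:numel(rgrid)                    % re-optimize K at each fixed r
    [~, prof(g)] = fminsearch(@(K) wsse([rgrid(g), K], tbar, Yk), p_k(2));
end
% r is practically identifiable if the profile crosses the 95% threshold
% on both sides of its minimum (chi2inv: Statistics and ML Toolbox).
thresh = min(prof) + chi2inv(0.95, 1);
plot(rgrid, prof, '-', rgrid([1 end]), thresh*[1 1], 'k--');
xlabel('fixed value of r'); ylabel('profile weighted SSE');
```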
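The same machinery answers the Step 6 question directly: scan candidate designs with \(j=1,2,\ldots\) measurement days and stop at the first \(j\) for which the profile remains bounded. The sketch below continues the one above (it assumes wsse, tbar, Yk, and p_k are in the workspace, and should be saved as a script so the local function is allowed); the crossing-based identifiability test is one simple implementation choice.

```matlab
% Step 6 on the same logistic illustration (continues the sketch above).
found = false;
for j = 1:3                               % designs with j measurement days
    combos = nchoosek(1:numel(tbar), j);
    for c = 1:size(combos, 1)
        idx = combos(c, :);
        if is_identifiable(wsse, tbar(idx), Yk(idx), p_k)
            fprintf('j = %d suffices: measure on day(s) %s\n', ...
                j, mat2str(tbar(idx)));
            found = true; break
        end
    end
    if found, break, end
end

function tf = is_identifiable(wsse, tt, yy, p0)
% The profile of r must exceed the 95% chi-square threshold on both
% sides of its minimum; a flat or one-sided profile fails the test.
rgrid = linspace(0.5*p0(1), 1.5*p0(1), 41);
prof = zeros(size(rgrid));
for g = 1:numel(rgrid)
    [~, prof(g)] = fminsearch(@(K) wsse([rgrid(g), K], tt, yy), p0(2));
end
[pmin, imin] = min(prof);
thresh = pmin + chi2inv(0.95, 1);
tf = any(prof(1:imin) > thresh) && any(prof(imin:end) > thresh);
end
```

In a full application, this check would be run for every parameter in \({\boldsymbol{p}}\) and every simulated data set \({Y}_{k}\), recording all designs of size \(j\) that pass; that record is exactly what Step 7 uses to recommend not only how many measurements to take, but on which days.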

We note that while it may appear counterintuitive that the experimental design precedes the modeling step in the proposed workflow, often it is the experimental constraints that determine the structure of a useful model. For instance, if the only data available to a modeler are TO in the plasma but not in the TME, modeling TO in other compartments will likely not be instructive; in fact, it is likely to introduce an additional source of uncertainty. It is thus the understanding of experimental constraints that guides the development of a practically useful model, which can then be used to optimize the sampling schedule.

The MATLAB code implementing this minimal experimental design methodology for the model described in System (3) is available at https://github.com/jgevertz/minimal_experimental_design. The computational resources and costs of implementing the various stages of the workflow are summarized in Table S1.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.