## Abstract

Quantitative prediction of resource allocation for living systems has been an intensive area of research in the field of biology. Resource allocation was initially investigated in higher organisms by using empirical mathematical models based on mass distribution. A challenge is now to go a step further by reconciling the cellular scale with the individual scale. In the present paper, we review the foundations of modelling of resource allocation, particularly at the cellular scale: from small macro-molecular models to genome-scale cellular models. We highlight how the combination of omic measurements and computational advances together with systems biology has contributed to dramatic progress in the current understanding and prediction of cellular resource allocation. Accurate genome-wide predictive methods of resource allocation based on the resource balance analysis (RBA) framework have been developed and ensure a good trade-off between the complexity/tractability and the prediction capability of the model. The RBA framework shows promise for a wide range of applications in metabolic engineering and synthetic biology, and for pursuing investigations of the design principles of cellular and multi-cellular organisms.

## Empirical models of mass allocation in life science

System modelling can be very different depending on the considered scale: individual/organs or (infra-) cellular. At the individual scale, the multicellular organism, in interaction with its environment, is composed of organs that exchange biological materials and molecules to achieve the physiological functions of the organism's lifecycle. The behaviour of the organism in a specific environment is usually described by mechanistic models combining biophysics and systemic description (e.g. compartment models) such as eco-physiological models for plants [1], carbon allocation models for trees [2] or models of nutrient partitions between life functions for animals [3,4]. When properly calibrated on a dataset, these models are usually of reasonable size and lead to accurate quantitative predictions of mass distribution between organs or life functions. Thus, a simple principle of mass distribution has a high predictive capability for the long-term behaviour of organisms despite the complexity of living organisms [5,6]. However, the predictive capability of models based on mass distribution decreases for short-term predictions, particularly with respect to multi-stress challenges. In such situations, the organism adopts a complex strategy based on the integration of cell decisions to cope with the effects of the stress. In fact, the trade-off between the simplicity and the predictive capability of the model hides the intrinsic complexity of cell functioning, making it difficult to use recent advances in the understanding of the functioning of the cell to improve the prediction capability of the model. This raises the question of how to use the current knowledge on cell functioning in individual-scale models (see Figure 1). And, in this context, can the framework of mass allocation be useful?

Mass allocation between cellular functions was first investigated at the cellular scale for bacteria such as *Escherichia coli* in the 1960s and 1970s. Maaløe and Kjeldgaard, known as the ‘Copenhagen School’, were the first to experimentally determine the biomass composition of *E. coli* [7]. They found that the abundances of the macro-molecular constituents of the cell (e.g. global protein, DNA and RNA contents, ribosomes, RNA polymerases) exhibit a functional dependence on the growth rate [7]. Interestingly, the growth rate is not a state of the cellular system, but rather seems to be a hidden explanatory variable that in some sense organizes the macro-molecular composition of the cell. At the beginning of the 1970s, Bremer and Dennis summarized and consolidated these works on *E. coli* and provided an extended view of the quantitative evolution of cellular parameters and chemical composition of the cell with respect to the growth rate [8].

At the beginning of the 1990s, the results of the Copenhagen School were revisited from a theoretical perspective by Marr [9] and led to the first explicit bridge between empirical laws of mass distribution between subcellular processes and the regulatory molecular mechanisms underlying them. Marr developed a mechanistic model composed of ribosome synthesis, protein translation and regulation by the alarmone ppGpp while assuming the cellular density to be constant [9]. Using this model, he was able to recover the known change in the abundances of the ribosomal and the non-ribosomal proteins with respect to growth rate as summarized by Bremer and Dennis in [8,10]. In such a model, the growth rate results from a trade-off between the availability of charged tRNAs and protein synthesis. The Marr model already implicitly contained the underlying mechanisms governing resource allocation that were to be revealed 20 years later.

## Emergence of the concept of intracellular resource allocation

### Phenomenological model of the cell

At the end of the 2000s, growth rate management, and particularly the phenomenological laws of the Copenhagen School, were revisited from a resource allocation perspective [11–15]. The main idea was that protein and ribosome synthesis, together with the translational capacity of the ribosomes, impose global constraints on the cellular economics [11,12,15]. In this coarse description of the cellular behaviour, the growth rate results from a global trade-off between constraints governing resource allocation between three sets of macro-molecules [12,15]: (i) the translational apparatus; (ii) the metabolic enzymes and transporters, and (iii) the housekeeping proteins. As for higher organisms, parsimonious resource allocation was identified as a possible cellular design principle even if the biological validation remained difficult in 2011.
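The trade-off between the three sets of macro-molecules can be illustrated with a minimal sketch. It is written in the spirit of the phenomenological models cited above rather than taken from them: the linear capacity constraints, the parameter names (`kappa_t`, `kappa_n`, `phi_q`, `phi_r0`) and the numerical values are assumptions made for illustration only.

```python
def balanced_growth(kappa_t, kappa_n, phi_q, phi_r0=0.0):
    """Maximal growth rate of a hypothetical three-sector proteome model.

    Assumed constraints (proteome mass fractions sum to 1):
      translation : growth <= kappa_t * (phi_r - phi_r0)
      metabolism  : growth <= kappa_n * phi_e
      budget      : phi_r + phi_e + phi_q = 1
    The growth rate is maximal when both capacity constraints are active.
    """
    free = 1.0 - phi_q - phi_r0  # fraction left to share between sectors
    if free <= 0.0:
        raise ValueError("no proteome fraction left to allocate")
    growth = free * kappa_t * kappa_n / (kappa_t + kappa_n)
    phi_r = phi_r0 + growth / kappa_t  # translational apparatus
    phi_e = growth / kappa_n           # metabolic enzymes and transporters
    return growth, phi_r, phi_e


# Illustrative values only: a rich medium (high kappa_n) versus a poor one.
rich = balanced_growth(kappa_t=4.0, kappa_n=2.0, phi_q=0.5)
poor = balanced_growth(kappa_t=4.0, kappa_n=1.0, phi_q=0.5)
```

Under these assumptions, lowering the nutritional efficiency `kappa_n` reduces both the growth rate and the fraction allocated to the translational apparatus, which is the qualitative behaviour described by the growth-rate dependence of the macro-molecular composition.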

However, considering the evolution of molecular entities only with growth rate is usually insufficient to explain the finer cellular configuration. Indeed, if the macro-molecular composition of the cell is growth-rate dependent, the configuration of subcellular entities, and particularly of the metabolic pathways, is also medium-dependent. The metabolic pathways are turned on/off with respect to the nutrient availability by a large diversity of regulatory mechanisms [16]. Two different metabolic configurations can lead to the same growth rate [17], and thus to the same macro-molecular composition. Investigation of the resource allocation in the cell in more detail is then faced with a challenge of refining the cell description from the macro-molecular to the genome-wide scale.

### Towards the advent of genome-wide models of resource allocation

This shift from the macro-molecular to the genome-wide scale raised a series of open questions: how can the principle of resource allocation be applied to a more detailed description of the cell? Which mathematical and computational methods are available to allow for such a description? How can such methods be validated? Refining the cell description implies introducing into the model a larger set of cellular entities, such as mRNAs, tRNAs, enzymes, transporters, ribosomes, chaperones, the metabolic fluxes, etc. The development of a predictive method combining all of these cellular entities with the principle of parsimonious resource allocation has to take into account the large size of the system. The computational tractability of the method is thus critical – any formulation relying on non-convex optimization problems should be avoided.

Since the beginning of the 21st century, it has finally become possible to investigate this series of open questions, mainly thanks to the recent and dramatic progress both in the acquisition of new omics datasets and in the systemic modelling of the bacterial cell. New omics datasets, such as absolute quantitative proteomics, allowed for the quantitative determination of the weight of cellular processes in protein cost (including metabolic pathways) with respect to the total cellular mass. Systemic modelling of the cell led to predictive methods for computing the genome-wide resource allocation with respect to the medium composition. To do so, the notion of cellular constraints was formalized and generalized at the genome scale in [12,14,18] (see below) and falls within the so-called constraint-based modelling framework.

## Resource allocation at the cellular scale

### Genome-wide omics quantification

The advent of genomics opened up opportunities for new lines of investigation in biology, especially in deciphering the adaptation of the whole cell in response to environmental or genetic perturbations. First, genome-wide analyses in bacteria provided relative quantification of molecular entities (condition *A* compared with condition *B*). The combination of relative transcriptomics and proteomics together with fluxomics and metabolomics enabled the investigation of the cellular reshaping between different growth rates in the chemostat [19] or during a nutritional shift [17]. Cells appear to be very robust to perturbations and adapt through an interplay of genetic (e.g. transcriptional, translational, post-translational) and enzymatic regulations [16,17]. The cellular reshaping could indeed be investigated by relative techniques, but tackling the problem of resource allocation for a specific growth condition requires the determination of absolute numbers of molecular entities.

In 2007, the advent of genome-wide quantitative techniques, and particularly quantitative proteomics, dramatically changed the perspective for investigation of resource allocation. Absolute quantification of proteins at the genome scale [20–24] and ribosome profiling [25] revealed the reshaping of cellular processes in terms of resource investment with respect to the growth rate [18,24,26–29] or under stressful conditions [30,31] for different bacteria and for yeast [20,32]. One important consequence of absolute protein quantification was to elucidate the actual effect of relative variations in protein amounts. Some variations in certain cellular processes or metabolic pathways between two growth conditions, which in the relative context would have been considered quite small (less than two-fold repression), were shown in the absolute context to allow the cell to save ∼6% of the total mass of proteins (see Dataset E1 in [18], [24,33]). Even modulations and fine retuning of cellular processes can thus strongly affect the cellular economics.

### Genome-scale constraint-based approaches and systems biology

Bacteria (and, more generally, any cell) have to face a large diversity of constraints to ensure growth, adaptation and survival, such as biophysical, thermodynamic, stoichiometric or osmotic constraints, to cite a few. Considering the cell as a set of biophysical and structural constraints falls within the framework of the so-called constraint-based modelling [34]. Mass conservation in the metabolic pathways in steady state was the first set of constraints to be identified [35]. From an engineering perspective, these constraints are mathematically formalized as a set of linear equalities and inequalities that define a feasibility domain of the cellular behaviour: satisfying these constraints is necessary but not sufficient to accurately predict the cell behaviour. This limitation has usually been bypassed by the addition of a general design principle such as the maximization of biomass formation [36]. The general principle is formalized as a linear objective function that, in addition to the set of linear equalities and inequalities, leads to linear optimization models such as the flux balance analysis (FBA) method. Such linear optimization problems are convex and can easily be solved by algorithms such as interior-point methods [37,38].
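As an illustration of how steady-state mass conservation, flux bounds and a linear objective combine into a linear program, the following sketch solves an FBA-like problem on a toy network. The network, the reaction names and the bounds are hypothetical, and SciPy's `linprog` is used here only as one convenient linear programming solver.

```python
import numpy as np
from scipy.optimize import linprog


def solve_toy_fba(uptake_max=10.0):
    """Maximize the biomass flux of a hypothetical four-reaction network.

    Reactions (columns): v0: -> A (uptake), v1: A -> B,
                         v2: B -> biomass,  v3: A -> by-product.
    Metabolites (rows):  A, B.
    """
    stoichiometry = np.array([
        [1.0, -1.0, 0.0, -1.0],  # mass balance of A
        [0.0, 1.0, -1.0, 0.0],   # mass balance of B
    ])
    # Irreversible reactions; only the uptake has a finite upper bound.
    bounds = [(0.0, uptake_max), (0.0, None), (0.0, None), (0.0, None)]
    # linprog minimizes, so the biomass flux v2 gets a coefficient of -1.
    objective = np.array([0.0, 0.0, -1.0, 0.0])
    result = linprog(objective, A_eq=stoichiometry, b_eq=np.zeros(2),
                     bounds=bounds)
    if not result.success:
        raise RuntimeError(result.message)
    return result.x


fluxes = solve_toy_fba()
biomass_flux = fluxes[2]
```

In this toy case the whole uptake is routed to biomass, so the optimal biomass flux equals the uptake bound. Genome-scale models have the same structure with thousands of reactions.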

Parsimonious resource management has often been postulated as a general design principle for plants and animals, and has recently been applied, at the cellular scale, to bacteria. By considering the bacterial cell as a self-replicating system composed of sub-systems, a new computational constraint-based modelling method, named resource balance analysis (RBA), has been developed [12,14]. The RBA method intrinsically captures the bottlenecks that are due to resource sharing between all biological processes at genome scale. In addition to the mass conservation principle, the main constraints governing the resource allocation that have so far been identified in [11,12,18,39–41] are: (I) the capability of the metabolic network must be sufficient to produce all metabolic precursors necessary for biomass production; (II) the capability of the translation apparatus must be sufficient to produce all proteins; (III) the cytosolic density and the membrane protein occupancy are limited. The biological constraints were mathematically formalized as a set of convex mathematical constraints. Satisfying these constraints then led to a linear (and hence convex) optimization problem. The underlying problem is thus tractable and can be solved within a few seconds even at genome scale [37,38]. Solving this problem for a wide range of growth conditions enabled us in [13] to recover the phenomenological laws obtained by Marr [9] and Scott and colleagues [15] while considering the entire cell. Since the RBA publications in 2009 [12] and 2011 [13], other constraint-based modelling methods integrating some of the constraints (I), (II) and (III) have been developed [41,42], confirming that resource allocation is a relevant cell design principle. Altogether, such constraint-based models correspond more to genome-scale *cellular* models in contrast with genome-scale *metabolic* models that embed only the metabolic pathways and an aggregated complex reaction for biomass composition [43].
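To make constraints (I), (II) and (III) concrete, the sketch below reduces the cell to two unknowns, an aggregated enzyme pool `E` and a ribosome pool `R`, plus a fixed housekeeping pool `q`. It is a toy illustration and not the genome-scale RBA formulation: the aggregation, the parameter names (`k_e`, `k_t`, `q`, `density`) and the solution strategy (a linear feasibility problem at fixed growth rate, combined with bisection on the growth rate) are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import linprog


def is_feasible(mu, k_e, k_t, q, density):
    """Check whether growth rate mu is compatible with the toy constraints.

    Unknowns x = [E, R], all pools expressed in the same mass units.
      (I)   mu * (E + R + q) <= k_e * E   (metabolic capability)
      (II)  mu * (E + R + q) <= k_t * R   (translation capability)
      (III) E + R + q <= density          (limited density)
    For a fixed mu, every constraint is linear in E and R.
    """
    a_ub = np.array([
        [mu - k_e, mu],
        [mu, mu - k_t],
        [1.0, 1.0],
    ])
    b_ub = np.array([-mu * q, -mu * q, density - q])
    result = linprog(np.zeros(2), A_ub=a_ub, b_ub=b_ub,
                     bounds=[(0.0, None), (0.0, None)])
    return result.status == 0  # status 0: a feasible point was found


def rba_max_growth_rate(k_e, k_t, q, density, tol=1e-6):
    """Bisection on mu; feasibility is lost above the maximal growth rate."""
    low, high = 0.0, min(k_e, k_t)  # mu = min(k_e, k_t) is infeasible if q > 0
    while high - low > tol:
        mid = 0.5 * (low + high)
        if is_feasible(mid, k_e, k_t, q, density):
            low = mid
        else:
            high = mid
    return low


mu_max = rba_max_growth_rate(k_e=2.0, k_t=2.0, q=0.5, density=1.0)
```

For this toy model the maximal growth rate can also be derived by hand as `(1 - q / density) * k_e * k_t / (k_e + k_t)`, which gives 0.5 for the values above and provides a check on the numerical result.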

Beyond its prediction capability on the genome scale, the constraint-based modelling framework is highly versatile and ensures a good trade-off between the model complexity/tractability and the prediction capability. Any known cellular process composed of molecular machineries can be integrated using the constraint-based formalism. The challenge is to obtain a convex formulation of the biological constraints. The key aspect is to relate the function of the molecular machinery to its abundance by a convex formulation such as a simple efficiency parameter. For instance, the flow of proteins produced by the ribosomes is related to the abundance of ribosomes by the translation efficiency. Another example concerns the enzymes. The metabolic flux (i.e. the function of the enzyme) is related to the enzyme abundance by the apparent catalytic rate. The non-linear enzyme kinetics with respect to substrates and products is thus simplified and aggregated into a single parameter, the apparent catalytic rate. This simplification was necessary to obtain a convex constraint formulation at genome scale.
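The relation between function and abundance can be sketched as a capacity constraint of the form |flux| ≤ apparent catalytic rate × enzyme abundance, which is linear in the flux and the abundance once the rate is treated as a fixed parameter. The reaction names, units and numerical values below are hypothetical.

```python
def enzyme_demand(fluxes, k_app):
    """Minimal enzyme abundance per reaction satisfying |v| <= k_app * E.

    fluxes : dict reaction -> flux (sign gives the direction)
    k_app  : dict reaction -> apparent catalytic rate (must be positive)
    """
    demand = {}
    for reaction, flux in fluxes.items():
        rate = k_app[reaction]
        if rate <= 0.0:
            raise ValueError(f"non-positive k_app for {reaction}")
        demand[reaction] = abs(flux) / rate
    return demand


# Illustrative values only (arbitrary but consistent units).
fluxes = {"glycolysis": 6.0, "respiration": -2.0}
k_app = {"glycolysis": 3.0, "respiration": 0.5}
demand = enzyme_demand(fluxes, k_app)
total_enzyme = sum(demand.values())
```

Because the demand scales proportionally with the flux, the total enzyme investment stays a linear function of the flux distribution, which is what keeps the genome-scale problem convex under this simplification.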

### Non-convexity of the general resource allocation problem combining metabolic fluxes, protein and metabolite concentrations

In fact, the first attempt in 2009 at the simultaneous computation of metabolic fluxes, protein and metabolite concentrations resulted in a non-convex optimization problem [11], which raised the following question: is the underlying optimization problem intrinsically non-convex? Or is the non-convexity due to the chosen formulation (i.e. the problem is convex but the mathematical formulation was non-convex)? In [44,45], the authors studied a constrained enzyme allocation problem with general enzyme kinetics in metabolic networks and mathematically proved that optimal solutions of the non-linear optimization problem are elementary flux modes. Optimal solutions are then computed by enumerating the elementary flux modes, which is computationally hard [46] and intractable in practice for large metabolic networks at genome scale [47]. The intrinsic non-convexity of the non-linear optimization problem was confirmed recently in [48], where two distinct locally optimal cellular configurations were obtained for the same nutritional conditions: two different configurations of metabolic fluxes, protein and metabolite concentrations each corresponded to a strict local optimum. Altogether, solving the general non-linear optimization problem of resource allocation will be highly challenging and is likely to require the design of new algorithms to restrict the number of elementary flux modes to be computed [49], and/or methods of relaxation of the general non-linear optimization problem.

However, the non-convexity of the general optimization problem can be handled in some cases for a smaller network. For instance, in the 1990s, the structural design of one or a few metabolic pathways was investigated in depth [50–52]. The authors usually considered optimality principles integrating constraints on parsimonious abundance, thermodynamics and kinetics of enzymes. Recently, the simultaneous computation of optimal enzyme and metabolite abundances and of metabolic fluxes for glycolysis has been formalized as a convex optimization problem [53].

## Towards quantitative predictions of resource: how to calibrate and validate such generic models?

Full validation of genome-scale cellular models is obviously quite challenging [41] and requires the use and reconciliation of various sources of quantitative information. Ever more datasets are being generated and can be used to estimate the efficiency parameters of molecular machineries and of enzymes using a dedicated identification procedure. Since most of the parameters are apparent catalytic rates, absolute protein quantification and fluxome measurements are the primary datasets of interest, enabling both calibration and validation of the method. To date, combining absolute quantification of the proteome and the fluxome has resulted in the genome-scale estimation of ∼600 *in vivo* apparent catalytic rates of enzymes for *Bacillus subtilis* in five growth conditions [18] and for *Escherichia coli* [54,55]. These estimates could be used as reference parameters for Gram-positive and Gram-negative bacteria. Most of the apparent catalytic rates were found to increase linearly with growth rate [18,54,55]. An empirical relationship can be fitted to the estimates and used as a predictor of the apparent catalytic rates for a new growth condition [18]. Any advance or refinement in these acquisition techniques or in the characterization of cellular processes would permit a refinement of the parameter set or of the constraint description and would therefore strengthen the predictive capability of the method. RBA has been validated for *B. subtilis* and led to accurate predictions at genome scale of the resource allocation for a wide range of growth conditions [18].
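The calibration idea described above can be sketched for a single enzyme: estimate an apparent catalytic rate in each condition as the ratio of measured flux to measured enzyme abundance, fit a linear relationship with the growth rate, and use it as a predictor for a new condition. The numbers below are made up for illustration, and the ordinary least-squares fit is an assumption; the identification procedures used in the cited studies may differ.

```python
def estimate_k_app(flux, enzyme_abundance):
    """Apparent catalytic rate from one flux and one abundance measurement."""
    if enzyme_abundance <= 0.0:
        raise ValueError("enzyme abundance must be positive")
    return abs(flux) / enzyme_abundance


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Hypothetical measurements for one enzyme in four growth conditions.
growth_rates = [0.2, 0.4, 0.6, 0.8]
fluxes = [1.0, 3.0, 6.0, 10.0]
abundances = [1.0, 1.5, 2.0, 2.5]

k_apps = [estimate_k_app(v, e) for v, e in zip(fluxes, abundances)]
slope, intercept = fit_line(growth_rates, k_apps)

# Predicted apparent catalytic rate for a new condition at growth rate 0.5.
predicted_k_app = slope * 0.5 + intercept
```

In practice the same procedure would be repeated for every enzyme with both a flux and an abundance measurement, and enzymes lacking data would need default or inferred values.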

Although a precise calibration is necessary to obtain an accurate quantitative prediction, qualitative validation can already be highly informative. Qualitative validation provides a rational way to check the consistency of the RBA method and to prioritize specific parts of the network needing further calibration. For instance, the empirical laws of Marr [9] and Scott and colleagues [15] were obtained by using the same value for all apparent catalytic rates of enzymes for *B. subtilis* [17]. The metabolic network configurations predicted by RBA in response to nutrient availability showed good agreement with the ones resulting from the known genetic and metabolic regulatory network [14]. RBA is able to predict the theoretical hierarchy of use of carbon and nitrogen sources, and thus the theoretical catabolite repression. In [56], the authors recently revisited the Warburg effect for tumour cells in the context of parsimonious resource allocation and predicted a switch between fermentation and respiration depending on the protein cost of the two metabolic pathways. Such a switch between metabolic pathways is also predicted at the genome scale by RBA [12,14]. Altogether, RBA can thus be used as a way to rapidly explore how each metabolic pathway turns on or off in response to nutrient availability, at the scale of an entire cell. Furthermore, it can be used to infer rules describing gene activation. To cope with the unavoidable combinatorial explosion, a Boolean formalism was chosen, aiming at the inference of logical rules relating the medium composition to the cellular response [57]. The obtained logical rules can be systematically compared with the known genetic and metabolic regulatory network [16] to provide a direct validation of the method or to highlight parts of the model that need to be refined in case of discrepancies.

## Integration of functional constraints or optimality

Does the apparent resource allocation of organisms maximize a (multi-objective) criterion? If yes, which one could it be? The nature of the criterion that an organism possibly *optimizes* has been an intensive area of research, notably to improve the predictive capability of FBA-like methods [58]. Maximization of biomass or of ATP production were the most common objective functions chosen for fast-growing bacteria, such as *E. coli* or *B. subtilis*. The search for the criterion was usually performed by comparing fluxomics with FBA predictions using the postulated criterion [59]. A major problem of the approach is that the criterion may change according to the extracellular conditions. For instance, some bacteria such as *Mycobacterium tuberculosis* grow very slowly at ∼0.8 doublings per day in rich media [60]. They seem to adopt radically different strategies from the fast-growing bacteria. The maximization of biomass would thus not be a suitable criterion if such slow-growing bacteria were modelled.

The real criterion necessarily results from a trade-off between performance (e.g. growth rate maximization, rapid dynamical adaptation to a nutritional shift) and robustness (e.g. guaranteeing the survival of a part of the population in case of perturbations) with respect to the ecological niche. Consequently, the paradigm changed – the question is no longer ‘which criterion is maximized?’ but rather ‘which situations would bacteria face in their environment?’ and ‘which cellular constraints were imposed on the cell by the need to cope with these situations?’ The next step is to mathematically formalize such survival constraints and use them to complement the constraints related to biomass synthesis. To ensure their long-term survival, slow-growing bacteria may invest a lot of resources in withstanding external aggressions or in escaping the immune system, leading to a lower growth rate. Even in such a case, each individual bacterium remains in competition with the other bacteria within the colony. Achieving the highest possible growth rate for the same set of constraints that ensure biomass production and survival will allow the bacterium to conquer the ecological niche. Following the RBA framework, the improvement of the growth rate for the same set of constraints can only be achieved by saving resources at the level of some cellular process. As a consequence, a solution offered by RBA will guarantee that all of the cellular processes are as efficient as possible with respect to resources. The optimization of the (global) system (the whole cell) strongly constrains the design of each (local) subsystem (the metabolic network, the translation apparatus, the chaperones). In this respect, the RBA framework intrinsically captures the local optimization of cellular processes in realizing their function [14].

## Conclusion and perspectives

RBA is a predictive and versatile framework for computing the genome-wide resource allocation at the cellular scale. Different levels of knowledge can be easily handled by adding new constraints to the optimization problem. For instance, constraints mimicking transcriptional regulations could be integrated in the same vein as [61]. The key element in achieving this was the formulation as a convex optimization problem. Convexity ensures the tractability of the underlying optimization problem at genome scale and opens promising perspectives for extensions such as dynamics as outlined in [62], thermodynamics and kinetics in some cases [53], stochastics to cope with noise in gene expression, but also extensions to multi-cellular organisms. These extensions rely on sparse robust or stochastic convex optimization. Robust and stochastic optimization is currently an intensive area of research in engineering science [63]. We expect that the most up-to-date optimization methods could be directly imported for RBA resolution. The future of the RBA framework looks highly promising for predictive biology, in particular for the rational design of biological systems for synthetic biology, and for further investigating the design principles of multi-cellular organisms in order to reconcile the (infra-) cellular scale with the individual scale.

## Competing Interests

The Authors declare that there are no competing interests associated with the manuscript.

## Acknowledgements

This work was partly supported by the French Lidex-IMSV. We thank Bertrand Dubreucq for providing the *Arabidopsis thaliana* image, and Ana Bulovic, Laurent Tournier and Marc Dinh for critical comments on the manuscript.

**Abbreviations:** FBA, flux balance analysis; RBA, resource balance analysis

© 2017 The Author(s); published by Portland Press Limited on behalf of the Biochemical Society