## Abstract

Metabolic pathway analysis is a key method to study metabolism and the elementary flux modes (EFMs) is one major concept allowing one to analyze the network in terms of minimal pathways. Their practical use has been hampered by the combinatorial explosion of their number in large systems. The EFMs give the possible pathways at steady state, but the real pathways are limited by biological constraints. In this review, we display three different methods that integrate thermodynamic constraints in terms of Gibbs free energy in the EFMs computation.

- elementary flux modes
- Gibbs free energy
- metabolic pathway analysis
- thermodynamics

## Introduction

Understanding the structure of biochemical networks is an important step in characterizing cell functioning. The concept of elementary flux modes (EFMs) provides a mathematical tool to define all metabolic pathways that are feasible in a given metabolic network. It is a central concept in Metabolic Pathway Analysis. An elementary mode is a minimal set (with respect to inclusion) of enzymes that can operate at steady state with all irreversible reactions proceeding in the appropriate direction [19]. The feasible steady-state flux vectors *v* of a given metabolic network (with *r* reactions and *m* internal metabolites) constitute a polyhedral convex cone *P* of dimension at least *r* − *m* with at most *r* facets. The EFMs are the minimal support vectors of *P* (where support of *v*, supp(*v*), is defined as the set of indexes *j* such that *v*_{j} ≠ 0). When the reactions are irreversible (which can be assumed after splitting each reversible reaction into two irreversible ones), EFMs are the extreme rays of *P*, and can be computed by the double description (DD) algorithm [15].

The main problem of this approach is the combinatorial explosion of the number of EFMs, which from the direct proof [11] is bounded above by (actually, the general McMullen's upper bound theorem provides a lower upper bound for the number of vertices of a (*r* − *m*)-polytope with *r* facets given by *O*(*r*^{(r−m)/2}) [13], which is thus better than the previous one as approximately its square root). Deciding if there exists an EFMs containing two or more specific reactions is an NP-complete problem, as is deciding the existence of an EFMs involving at most a given number of reactions. Counting EFMs is #P-complete and the enumeration of EFMs with polynomial delay is still an open problem [1,2].

EFMs constitute a generating set for the potential pathways (satisfying steady state and stoichiometric conditions) in a metabolic network, but the biologically active pathways are constrained by various biological principles: thermodynamics, kinetics and regulations. Several works integrate the transcriptomic regulation constraints as Boolean formulas on propositional variables either in an EFMs or FBA (Flux Balance Analysis) computation framework [5,10] or, more recently, based on the SAT method (for *SATisfiability*) [17] which, coupled with the Linear Real Arithmetic theory in an SMT (for *Satisfiability Modulo Theories*) framework, improved expressiveness and computation performances [14].

One of the main ideas underlying Metabolic Pathway Analysis is to use a restricted set of parameters only. This is because the rate laws of enzymes are often imperfectly known, or even if they are known, the kinetic parameters involved are not available for all enzymes. Clearly, an elementary mode can only be biologically relevant if it is thermodynamically feasible. In traditional EFMs analysis, thermodynamic information is usually included by a binary distinction between reversible and irreversible reactions. In other theoretical frameworks, such as evolution studies [22,24], thermodynamics has been considered for a long time. Importantly, thermodynamic parameters play a different role than kinetic parameters. Equilibrium constants, for example, are relatively easy to measure and many of these have been deposited in databases.

In recent years, more and more approaches have been published in which quantitative thermodynamic information is employed [8,9,12,16]. In this review, we give an overview of the literature dealing with methods for taking into account the thermodynamic constraints in EFMs computation. We will first describe the DD method and the network-embedded thermodynamics (NET) analysis, then we will introduce three different methods for integrating thermodynamics in EFMs computation.

## DD method

Assuming that each reversible reaction is split into two irreversible ones, an EFMs is a vector of minimal support such that *S*.*v* = 0 with *v* ≥ 0, where *S* is the (*m* × *r*) stoichiometry matrix. This linear system can be rewritten as a set of inequalities:1The matrix is called the representation matrix of the cone *P*. The DD method [15] considers both descriptions of a polyhedral cone *P*: the implicit description by a set of linear inequality constraints defining *P* as the intersection of half-spaces, as described in eqn (1), and the explicit description by a set of generating vectors, represented as columns of a generating matrix *R*, defining *P* as non-negative linear combinations of these generators, which are thus the extreme rays: *v* = *R*.*λ* with *λ* ≥ 0. The DD method describes how to derive one description from the other. In our case, we use the transformation of the implicit description given by the representation matrix (eqn (1)) into the explicit description given by a generating matrix whose columns provide the EFMs. The algorithm is incremental, processing one inequality at each step, by building the new intermediate cone from the previous one by intersecting it with the half-space defined by the inequality constraint. The new intermediate extreme rays are computed by intersecting the hyperplane defined by the equality form of the constraint with the two-faces defined by all pairs of adjacent extreme rays, one on each side of this hyperplane.

Figure 1 illustrates one step of the algorithm on a 3D cone represented by its 2D-cut with a plane (thus the cone is represented by a polytope and its extreme rays by the vertices of this polytope). The new constraint is represented by the 2D-cut of a half-space, i.e. a half-plane delimited by a line (say at left of this line). One moves from four extreme rays to five: three are kept (at left of the line), one is eliminated (at right of the line) and two new ones are built (red vertices) as the intersections of the line with the edges joining adjacent extreme rays from both sides of the line.

Performance is improved by taking as initial generating matrix *R* a full rank nullspace matrix of the stoichiometry matrix *S* [23]. In this way, all *m* stoichiometric equalities *S*.*v* = 0 are by construction satisfied at initialization and one can also manage to choose column vectors of *R* so that *r* − *m* irreversibility inequalities on a total of *r* are satisfied too. Thus, the DD algorithm processes iteratively in *m* steps the *m* remaining irreversibility inequalities.

Using a binary approach for coding vectors reduces memory demands [6]. The key point is that, as only positive combinations of rays are performed, then only if the coordinates of the combining vectors are zero or non-zero matters in the minimization process. Thus, at each iteration step, when a new constraint is processed, the components already processed of the rays in construction are coded as bit vectors. Finally, the use of bit pattern trees optimizes searching of subsets during elementary testing [20].

## NET analysis

Assuming constant pressure and a closed system, according to the second law of thermodynamics, a reaction *j* proceeds spontaneously only in the direction of its negative Gibbs free energy Δ_{r}*G*_{j} [3]:2where is the standard Gibbs free energy of reaction *j*, *M*_{i} the concentration of metabolite *i*, *R* the molar gas constant, *T* the absolute temperature and *S*_{ij} the stoichiometric coefficient of metabolite *i* in reaction *j* (positive if *i* is a product and negative if *i* is a substrate).

In [9], a detailed proof is given showing that a flux distribution containing an infeasible EFM is always infeasible.

In NET analysis [12], the Gibbs free energies of reactions are constrained by the mutual thermodynamic interdependencies of the reactions in a network (i.e. the reactions' simultaneous action in the network). The metabolite concentrations have to be feasible in the entire network. NET analysis couples metabolite concentrations to an operating metabolic network via the second law of thermodynamics and the metabolites' Gibbs free energies of formation Δ_{f}*G*_{i}. The following optimization problem determines the feasible range of the Gibbs free energy Δ_{r}*G*_{j} of a particular reaction *j* with its directionality *d*_{j} (+ in forward direction and − in backward direction) for metabolite concentrations *M*_{i} constrained by lower and upper bounds *M*^{−}_{i} and *M*^{+}_{i} (concentrations are assumed to be positive and dimensionless quantities after division by the unit concentration *M*_{0}, in order to be able to consider their logarithm):3The NET analysis algorithm is implemented in the anNET software [25] which checks quantitative data sets for thermodynamic consistency.

NET analysis has been used in [9] to characterize the flux solution space by determining EFMs that are subsequently classified as thermodynamically feasible or infeasible on the basis of experimental metabolome data. This was done by incorporating quantitative metabolite concentrations into the analysis of all stoichiometrically feasible flux distributions, then computing all EFMs and using quantitative metabolite data to test the activities of each EFMs for thermodynamic feasibility.

## Extending DD method with linear programming to deal with thermodynamics

If the concentrations *M*_{i} of all metabolites were known, the thermodynamic constraint Δ_{r}*G*_{j} < 0 would fix the directionality of each reaction *j* and those EFMs satisfying these directionalities would be easily filtered. But in general, only concentrations of external metabolites are known and not those of internal metabolites. Nevertheless, an obvious necessary condition for an EFMs *e* to be thermodynamically feasible is that the system of inequalities Δ_{r}*G*_{j} < 0, for *j* ∈ supp(*e*), is consistent, i.e. has a solution in the unknown variables 's. Note that this system is linear in those unknown variables . So a linear programming tool can easily check this condition for each EFMs *e* (it takes the form of the modified problem (3) where the objective is replaced by *max* 0 and *j* runs through supp(*e*) with *d*_{j} = + ).

The second key remark is that the thermodynamic constraint *C*(*v*) for any flux vector *v* ∈ *P* (not necessarily an EFMs), expressed as this system for all *j* ∈ supp(*v*), is support-monotonic in the sense that ∀*v*, *v*′ ∈ *P*, if *C*(*v*) and supp(*v*′) ⊆ supp(*v*), then *C*(*v*′). The reason is that a subsystem of a consistent system is itself consistent. Thus conversely, if a flux distribution violates this thermodynamic constraint, it will be for any flux distribution with a larger support (analogously, [9] gave a detailed proof showing that a flux distribution containing a thermodynamically infeasible EFMs is always infeasible). Now, in the DD algorithm, any new extreme ray at step *k* + 1 is a positive combination of two adjacent extreme rays at step *k*, and thus its support is the union of their two supports. It results that any extreme ray at a given step can only give birth in the next steps to larger support extreme rays and thus at the end to larger support EFMs. Thus, if it violates the thermodynamic constraint, it has no use for the rest of the algorithm and can be definitely discarded.

This is used by thermodynamic EFMs analysis (tEFMA) [7] to filter thermodynamically infeasible extreme rays during the incremental process of the DD algorithm in order to obtain at the end the thermodynamically feasible EFMs. This is achieved by interfacing efmtool [20] with the LP tool CPLEX, calls to CPLEX on modified system (3) being done at each step for all newly created extreme rays.

Figure 2 illustrates this filtering at the step described by Figure 1, where we saw that two new extreme rays were built. Each one is then checked for thermodynamic feasibility. In the example, the red one is found not feasible and thus discarded and the other is found feasible and thus kept for the next step. At the end of this step, we have thus four thermodynamically feasible extreme rays instead of five and the solution space is made up of three faces of the previous cone.

This filtering can suppress at most 50% of infeasible EFMs (this maximum being reached if all reactions are initially reversible) in the absence of given lower and upper bounds on internal metabolite concentrations [16]. In the presence of such bounds, tEFMA may suppress up to 80% of infeasible EFMs [8].

## Using equilibrium constants of reactions to deal with thermodynamics

It is possible to compute all the possible thermodynamically feasible EFMs without knowing the concentrations of the internal metabolites. Actually, as at equilibrium Δ_{r}*G*_{j} is null, we obtain: , where is the equilibrium constant of reaction *j*. The thermodynamic constraint (2) can thus be rewritten as4and, after moving in the right-hand side the external metabolites concentrations:5where (apparent equilibrium constant of reaction *j*).

This equation is intuitively understandable: if the concentrations of substrates are high, the reaction proceeds in the forward direction, while it would run backward if the concentrations of products were sufficiently high. Let , then (5) rewrites as6Thus, a necessary condition of thermodynamic feasibility of any given pathway is that the linear system (6) (for all *j* in the support of this pathway) has a solution in *y*_{i}. It has been shown in [18] and revisited in [16] by using the Kuhn and Fourier theorem (a consequence of Farkas duality) that this condition is equivalent to the following: a necessary condition of thermodynamic feasibility for any EFMs *e* is7i.e. thermodynamically feasible EFMs need to have a positive scalar product with the fixed vector whose components are the logarithms of the apparent equilibrium constants of the reactions of the network. This simple numeric test is easy to apply at each step of the DD algorithm on each candidate EFMs (extreme ray) without requiring any call to an LP tool and, as seen above, suppresses by its own at most 50% of infeasible EFMs.

## How DD method can deal directly with thermodynamics

The formula (7) is a linear inequality defining a half-space and can thus be directly added as a new constraint in the DD algorithm, which will be processed exactly as the irreversibility constraints. This is simply done by just adding at initialization to the representation matrix representing the irreversibility constraints a new row whose coefficients are the 's. As the inequality in (7) is strict, solutions have then to be checked for equality and those belonging to the boundary hyperplane have to be discarded. Another more direct way is to rewrite the inequality as two constraints, and *v*_{r+1} > 0. This means adding a new column to the representation matrix (as a new (*r* + 1)th irreversible reaction) and the two rows and (0, …, 0, 1) and keeping solutions where reaction (*r* + 1) is present. Thus, multiple calls to LP at each of the *m* steps of the DD algorithm on each newly built extreme ray as described in section 4 are just replaced by processing (at the step we choose, e.g. the first one) a unique (or two) additional inequality constraint.

Figure 3 illustrates the step where this thermodynamic constraint is processed, as the half-space strictly at left of the red hyperplane. One of the five current extreme rays is eliminated and four are kept as thermodynamically feasible, which delimit a union of faces that is the updated solutions space.

But this important simplification is possible only in the absence of information about the concentrations of internal metabolites. Dealing with known lower and upper bounds for those concentrations requires calls to LP as seen in section 4. But both methods may coexist: processing at the beginning of the DD algorithm the constraint (7) as a single additional inequality will already filter up to 50% of infeasible EFMs (whatever the internal metabolite concentrations could be) and calling at each step of the DD algorithm an LP tool on the modified system (3) with given bounds on internal metabolites concentrations will filter the rest.

## Conclusion

Here, we have reviewed several approaches in which more detailed thermodynamic information is included into Metabolic Pathway Analysis. We have compared them to the traditional approach in which a binary distinction is made between reversible and irreversible reactions [19]. Moreover, we discussed some of the underlying theories.

The traditional method can be very useful if done correctly, as witnessed by the many successful case studies published (for review, see [4,21,26]). Thus, the question arises whether the added value in the thermodynamic approaches justifies the additional effort and input information needed. This question is difficult to answer; in our opinion, it requires further studies. Certainly, it depends on the concrete system under consideration and on the amount of available parameter values. While the values of equilibrium constants are relatively easy to obtain, the values of external metabolite concentrations and (if one wants to achieve more efficient filtering) bounds on internal metabolite concentrations are more difficult to obtain. Obviously, data acquisition gets more and more difficult as larger and larger systems are studied.

In the future, it will be interesting to include thermodynamic information also in other tools of Metabolic Pathway Analysis, such as FBA and Flux Variability Analysis. These can usually be applied to larger networks (even at genome scale) than EFMs analysis. On the other hand, the latter provides a more comprehensive picture on the multitude of pathways in the system.

## Competing Interests

The Authors declare that there are no competing interests associated with the manuscript.

**Abbreviations:** DD, double description; EFMs, elementary flux modes; FBA, flux balance analysis; NET, network-embedded thermodynamics; SMT, satisfiability modulo theories; tEFMA, thermodynamic EFM analysis

- © 2018 The Author(s). Published by Portland Press Limited on behalf of the Biochemical Society