## Abstract

Marine diatoms have potential as a biotechnological production platform, especially for lipid-derived products, including biofuels. Here we introduce some features of diatom metabolism, particularly with respect to photosynthesis, photorespiration and lipid synthesis and their differences relative to other photosynthetic eukaryotes. Since structural metabolic modelling of other photosynthetic organisms has been shown to be capable of representing their metabolic capabilities realistically, we briefly review the main approaches to this type of modelling. We then propose that genome-scale modelling of the diatom *Phaeodactylum tricornutum*, in response to varying light intensity, could uncover the novel aspects of the metabolic potential of this organism.

- diatoms
- elementary modes
- flux balance analysis
- genome scale metabolic model
- linear program
- metabolic networks
*Phaeodactylum tricornutum*

## Introduction

Diatoms are photoautotrophic unicellular algae and an important component of marine phytoplankton particularly for biogeochemical cycling of silica and carbon. They contribute 25% of primary production on earth and 40% of the marine primary production [1,2]. Their ability to synthesize lipid as a storage compound at 20%–30% of dry weight [3] makes them of particular interest for the production of biofuels [4–6], along with other high value products such as ω fatty acids [7,8].

*Phaeodactylum tricornutum* is regarded as a model diatom due to the availability of its genome sequence [9], comparatively small genome size [10,11], short generation time and ease of genetic manipulation [12,13].

Diatoms are believed to have arisen as the result of a secondary endosymbiotic event between two eukaryotes, a red alga and an oomycete [14–16]. This appears to have given rise to some unique features in their biochemistry in comparison with other photosynthetic eukaryotes, in terms of the subcellular localization of enzymes and the presence of some enzymes more commonly found in prokaryotes. An overview of central carbon metabolism is presented in Figure 1.

## Carbon metabolism in diatoms

The Calvin cycle enzymes, sedoheptulose-1,7-bisphosphatase (SBPase) and sedoheptulose-1,7-bisphosphate aldolase (SBPaldol), unlike in higher plants, are located in the cytosol rather than the plastid, as is the oxidative limb of the oxidative pentose phosphate pathway (OPPP). The synthesis of storage polysaccharide, in the form of chrysolaminaran (primarily composed of β1 −3 linked glucose residues) rather than starch, is also located in the cytosol, not the plastid. Most diatoms lack cytosolic phosphoglycerate mutase (PGAM) and enolase, effectively bisecting cytosolic glycolysis, although these enzymes are present in the plastid and mitochondrion and these might represent a way to complete glycolysis [17–19]. Alternatively, the pentose phosphate pathway could potentially act a sink for C3 glycolytic intermediates.

Diatoms also possess the enzymic machinery required to catalyse C4 carbon fixation, starting with the carboxylation of phosphoenolpyruvate to oxaloacetate. The carbon thus fixed is ultimately released in the chloroplast where it is re-fixed by ribulose 1,5-bisphosphate carboxylase/oxygenase (rubisco). Although this pathway is well understood in higher plants, where it serves to ameliorate the effects of drought and/or temperature stress [20,21], the role of these reactions in diatoms remains a subject of some debate [22–25].

Previously, Entner–Doudoroff pathway, common in prokaryote [26], has been reported to be present in *P. tricornutum* [27]. Although Entner–Doudoroff pathway has lower energy yield per glucose in comparison with Embden–Meyerhoff–Parnass pathway, it has the advantage of lower protein cost and is thermodynamically less constrained [28]. Similarly, phosphoketolase, reported in bacteria and fungi [29,30] and involved in the degradation of C5 sugars to produce acetyl phosphate, has also been recently reported to be present in *P. tricornutum* [27,31], where again its physiological significance is unclear.

The enzyme rubisco (rubisco, reaction r1 in Figure 1) catalyses the initial fixation of CO2 in the Calvin cycle, by carboxylating one molecule of ribulose 1,5-bisphosphate (RuBP) to produce two molecules of 3-phosphoglycerate (PGA). However, it also catalyses a second reaction resulting in the conversion of RuBP to PGA and the two-carbon compound, glycolate. In higher plants, this is ultimately recycled via glycine and serine to generate PGA in the process of photorespiration.

The fate of glycolate in diatoms is less clear; it is known to be directly excreted under some circumstances [32,33] and ^{14}C investigations reveal the potential for it to be recycled at least as far as serine [34,35]. Though diatoms have reactions to produce serine, both from photorespiratory and non-photorespiratory pathways [36], Zheng et al. [31] reported that a large proportion of serine and glycine was synthesized from glycolate through photorespiratory reactions, at least under mixotrophic conditions. However, there is no conclusive evidence for the photorespiratory pathways of higher plants existing in diatoms [17,18].

Given the limitations to our understanding of how these differences in the structure and compartmentation of metabolism in diatoms affect the functioning of photosynthetic metabolism, we considered it to be worthwhile to apply the techniques of metabolic modelling that have been used to analyse plants and cyanobacteria [37,38].

## Computer modelling of metabolic networks: theory and background

A model of a metabolic network can be defined in terms of either just the reaction stoichiometry and thermodynamics or these plus reaction kinetics. The former is commonly referred to as a structural model and the latter a kinetic model. Kinetic models also require estimates of the parameters associated with each kinetic equation and, in general, the concentrations of at least some of the external and internal metabolites. For this reason they require much more effort in their construction and are prone to much greater uncertainty due to the problems associated with the accurate determination of kinetic parameters and intracellular metabolite concentrations. Furthermore, their subsequent analysis is more complicated from the mathematical perspective and generally requires much greater computational resources than are needed for the structural analysis of the same network. For this reason, kinetic modelling tends to be restricted to relatively small (≈10s of reactions) networks and is not considered further here.

Structural modelling or more precisely the structural analysis of metabolic networks, depends on four very simple assumptions:

(1) The system can be divided into metabolites whose concentration can be changed by the action of the system (internal) metabolites and those that are not (external metabolites).

(2) The rate of change of internal metabolites is equal to the sum of their rates of production and consumption.

(3) The system can attain a state where the rates of change of all internal metabolites are equal to zero (rates of production equal consumption).

(4) Reactions can be categorized as reversible or irreversible, although this is not essential for some analyses.

The stoichiometries of the system can be brought together as a single matrix, the stoichiometry matrix, *N*. The columns represent reactions, the rows the metabolites and the elements the stoichiometric coefficient of a given reaction with respect to a given metabolite such that a positive value denotes production and a negative value consumption of that metabolite [39].

Given this definition of *N*, conditions 2 and 3 above can be expressed as a single equation:

where *v* is a vector of reaction flux values and 0 the null vector denoting steady state conditions. Structural analysis then revolves around identifying instances of *v* satisfying eqn. (1), possibly subject to additional constraints.

Although eqn. (1) appears trivial, its solution is not. This is a result of the fact that it is under-determined, i.e. an infinite number of instances of *v* satisfying the equation exist. Nevertheless, the equation implies relationships between the individual rates in *v* that have to be satisfied if the system is at steady state. Analysis therefore has two possible goals:

(1) Identification of invariant properties of all possible instances of

*v*: null-space analysis.(2) Identification of individual instances of

*v*that also satisfy some additional, user-defined, constraints: linear programming [40], often termed flux balance analysis (FBA) [41].

### Null-space analysis

Although eqn. (1) has no unique solution, it is possible, using standard techniques of linear algebra to derive from *N* a null space or kernel matrix, conventionally denoted *K*. In this matrix, reactions are represented in the rows and the number of columns is equal to the maximum dimension of the space that contains all possible steady states. *K* has the property that any solution *v*, to eqn. (1) can be expressed as:

where *w* is an arbitrary real valued vector whose length is equal to the dimension of *K*. Conversely, any instance of *v* that cannot be expressed by eqn. (2) will not satisfy eqn. (1). Consideration of eqn. (2) leads to the identification of two important invariant properties of the network:

(1) Reactions that cannot carry flux under any steady-state condition (dead reactions), identified by reactions whose row in

*K*contains only zero valued elements.(2) Sets of reactions that must carry flux in fixed ratio at steady state (enzyme or reaction subsets [42]) whose non-zero elements in the corresponding rows of

*K*are a constant ratio.

Furthermore, if the Gauss–Jordan method for solving sets of linear equations is used to determine *K*, then the columns represent minimal independent pathways in the network (but possibly violating reaction irreversibility). This property was extended by Schuster et al. [43] into the concept of elementary modes, the set of all independent minimal pathways in the network, representing all possible functionalities of the metabolism.

Although the determination of *K* is readily achieved for large (genome-scale) metabolic networks, the determination of elementary modes suffers from combinatorial explosion, making it a generally infeasible proposition, requiring large clusters of computers running for weeks or months on a single problem and is therefore not routinely applied to such large networks, although there have been some exceptions to this example [44].

### Linear programming and flux balance analysis

The second form of structural analysis is to determine a specific flux pattern in the network at steady state that also satisfies some additional constraints and achieves a certain objective. This uses the computationally powerful technique of linear programming and, in contrast with elementary modes, can be readily applied to a genome-scale metabolic model (GSM): those that attempt to represent the full metabolic capabilities of an organism using all the enzymes encoded in its genome [45]. In brief, the constraints consist of assigning fixed values, or a range, to some of the rates in *v*, usually derived from experimental data, such as substrate uptake rate.

Next, an objective has to be specified, which can be regarded as a hypothesis about how the metabolism has evolved to behave. It can be any linear function of one or many of the reaction rates in the model. A commonly-used objective in the study of microbial metabolism is maximization of growth rate for a fixed rate of uptake of carbon substrate [41]. (However, we have argued that this formulation is actually a maximization of the biomass yield per unit substrate uptake [46].)

An alternative objective function is the minimization of total flux in the network as a proxy for economy of investment in enzymic machinery [47,48]. This has the advantage that, because of the nature of linear programming algorithms, all non-essential reactions are assigned zero flux, whereas in maximization, they tend to be assigned the maximum flux subject to the constraints, resulting in artefactual cycles in the solution with no thermodynamic driving force. Filtering these out requires further processing and since most maximization problems can be expressed as an equivalent minimization problem, this is a largely unnecessary effort.

A single solution from FBA necessarily represents a single flux distribution in the network. Although useful, this, on its own, gives little insight as to how reaction fluxes may relate to one another and hence to deeper structural properties of the network. One method to overcome this problem is to repeatedly solve the problem with incremental changes to one or more constraints. For example, in a model of a photosynthetic organism, one constraint might be the absorbed photon flux. Repeatedly solving the problem with increasing values of this constraint allows the identification of potential response to variation in light intensity (see [37] for an example of this application).

Finally, how the cell can achieve the predicted flux pattern, through enzyme kinetics, effector interactions, covalent modification and enzyme expression, is not explicitly encoded in these structural models, so any solutions have to be compared with experimental studies of metabolism to determine whether the hypothesized objective is a fair representation of the outcome of the cell's regulatory machinery or whether it is falsified.

## Application of modelling to diatom metabolism

To date a number metabolic models have been developed for algae, including diatoms [44,49,50]. de Oliveira Dal'Molin et al. [50] used FBA techniques to analyse the GSM of *Chlamydomonas reinhardtii* and predict the fate of glycolate. Schauble et al. [49] used elementary modes techniques to predict the physiological role of circadian control in nitrogen metabolism of *Chlamydomonas reinhardtii*. Hunt et al. [44] computed approximately two billion elementary modes in the GSM of *P. tricornutum*, which represents a milestone in elementary modes calculation on large models, though the size of the set of results renders further analysis computationally challenging.

We have developed and applied FBA to this model, to investigate how the metabolism of *P. tricornutum* could to changing light intensity. These results show that glycolate can either be excreted or recycled as a precursor for lipid production, depending on the environmental conditions. The results also suggest that the Entner–Doudoroff and phosphoketolase pathways, uncommon in eukaryotes, have a potential role in diatom central carbon metabolism. These results are consistent with the work by Zheng et al. [31], who also propose a role for these reactions based on ^{13}C-labelling investigations.

## Conclusion

Diatom metabolism has turned out to be a good example of how enzymes present in other organisms can be assembled in different combinations and contexts to generate different modes of metabolic functioning. Holistic modelling of the full reaction network offers a way to uncover this novel metabolic potential, in the first instance as hypotheses to direct more-targeted experimentation to ascertain the extent to which this potential is actually exploited.

## Funding

This work was supported by the European Union's seventh framework programme for research, technological development and demonstration [grant number PITN- GA-2012-316427] to D.S.

## Footnotes

Metabolic Pathways Analysis 2015: Held at Bom Jesus, Braga, Portugal., 8–12 June 2015.

**Abbreviations:** FBA, flux balance analysis; GSM, genome-scale metabolic model; OPPP, oxidative pentose phosphate pathway; PGA, 3-phosphoglycerate; rubisco, ribulose-bisphosphate carboxylase; RuBP, ribulose 1,5-bisphosphate; SBPaldol, sedoheptulose-1,7-bisphosphate aldolase; SBPase, sedoheptulose-1,7- bisphosphatase

- © 2015 Authors; published by Portland Press Limited