Elsevier

Phytochemistry

Volume 64, Issue 6, November 2003, Pages 1097-1112

An in silico assessment of gene function and organization of the phenylpropanoid pathway metabolic networks in Arabidopsis thaliana and limitations thereof

https://doi.org/10.1016/S0031-9422(03)00517-X

Abstract

The sequencing of the Arabidopsis genome in 2000 gave science the first blueprint of a vascular plant. Its successful completion also prompted the US National Science Foundation to launch the Arabidopsis 2010 initiative, the goal of which is to identify the function of each gene by 2010. In this study, an exhaustive analysis of The Institute for Genomic Research (TIGR) and The Arabidopsis Information Resource (TAIR) databases, together with all currently compiled EST sequence data, was carried out in order to determine to what extent the various metabolic networks from phenylalanine ammonia lyase (PAL) to the monolignols were organized and/or could be predicted. In these databases, some 65 genes have been annotated as encoding putative enzymatic steps in monolignol biosynthesis, although many of them have only very low homology to monolignol pathway genes of known function in other plant systems. Our detailed analysis revealed that presently only 13 genes (two PALs, a cinnamate-4-hydroxylase, a p-coumarate-3-hydroxylase, a ferulate-5-hydroxylase, three 4-coumarate-CoA ligases, a cinnamic acid O-methyl transferase, two cinnamoyl-CoA reductases and two cinnamyl alcohol dehydrogenases) can be classified as having a bona fide (definitive) function; the remaining 52 genes currently have undetermined physiological roles. The EST database entries for this particular set of genes also provided little new insight into how the monolignol pathway was organized in the different tissues and organs, perhaps as a consequence of both limitations in how tissue samples were collected and the incomplete nature of the EST collections.
This analysis thus underscores the fact that even with genomic sequencing, presumed to provide the entire suite of putative genes in the monolignol-forming pathway, a very large effort is still needed to establish actual catalytic roles (including enzyme versatility), as well as the physiological function(s) of each member of the (multi)gene families present and the metabolic networks that are operative. Additionally, identifying physiological functions for many of these (and other) unknown genes, and their corresponding metabolic networks, awaits the development of technologies to comprehensively study molecular processes at the single cell level in particular tissues and organs, in order to establish the actual metabolic context.

In this study, an exhaustive analysis of The Institute for Genomic Research (TIGR) and The Arabidopsis Information Resource (TAIR) databases, together with all currently compiled EST sequence data, was carried out in order to determine to what extent the various metabolic networks from phenylalanine ammonia lyase (PAL) to the monolignols were organized and/or could be predicted. Only 13 out of 65 annotated genes in this pathway could be given a precise function.

Introduction

The National Science Foundation (NSF) Arabidopsis 2010 Initiative in the USA has as its mission the identification of the physiological functions of all Arabidopsis thaliana genes by 2010, including their cellular, organismal and evolutionary relationships. This approach will thus enable us to understand further, in a whole-plant context, many of the most fundamental aspects of plant metabolism and the gene functions associated with them. The overall strategy being undertaken at present essentially uses a systems approach, a marked departure from the more conventional practice of, for example, defining a particular scientific goal based on an observed biological phenomenon. Arabidopsis was chosen because this angiosperm, a representative of the mustard family, has a relatively small but fully sequenced 120-megabase genome containing an estimated 25,000+ genes; it can therefore serve as a convenient model for addressing fundamental questions of biological structure and function common to all eukaryotes (Meinke et al., 1998; The Arabidopsis Genome Initiative, 2000).

As part of the NSF Arabidopsis 2010 Initiative, this laboratory has recently undertaken the task of studying some 248 genes found in the A. thaliana databases that display sequence homology to genes primarily involved in selected and/or suspected aspects of phenylpropanoid/phenylpropanoid-acetate pathway metabolism and/or cell wall formation, with particular emphasis on the lignin/lignan biosynthetic pathways. The overall goal is to isolate and characterize all of these homologues from A. thaliana and to determine their functional genomic roles, not only by studying the patterns of gene expression and the effects of modulating them, but also by comprehensively characterizing the proteins and enzymes at the chemical, biochemical and structural levels.

The presumed phenylpropanoid pathway networks of interest can be viewed as complex biological regulatory systems that evolved in vascular plants during their successful transition to land, and which are ultimately essential for their growth, development and survival. However, how these metabolic networks are organized and differentially controlled is not yet understood. Nevertheless, it is these coordinated phenylpropanoid pathway networks from phenylalanine ammonia lyase (PAL) onwards [and preceding transcriptionally coordinated metabolic processes (Anterola and Lewis, 2002)] that eventually differentially afford the phenylpropanoid components of the lignins, lignans, hydroxycinnamic acids, flavonoids, suberins, sporopollenins, cutins and other related constituents in various cell types, tissues and organs (Lewis & Davin, 1994; Lewis & Davin, 1999; Lewis et al., 1999). Furthermore, when considered from both evolutionary and functional perspectives, such networks have afforded vascular plants competitive advantages for successful adaptation to land (Lewis & Davin, 1994; Lewis & Davin, 1999; Lewis et al., 1999). These include: mechanisms for obtaining and transporting water and nutrients (Lewis and Yamamoto, 1990); maintenance of a high water potential, which facilitates active metabolism in desiccating environments (Bernards et al., 1995; Bernards & Lewis, 1998); minimizing the effects of temperature, humidity and (UV) light variations; withstanding and modulating the forces of compression that act upon plant structures during growth and development (Lewis et al., 1999); and formation of specialized structures that permit lengthy maintenance of pollen grain viability (Wiermann and Gubatz, 1992).

Interestingly, the apparent partial overlap of these branchpoint phenylpropanoid pathways (e.g. to flavonoids, lignans, sporopollenins, etc.) has been viewed by some researchers as evidence for both metabolic redundancy and metabolic channeling (Hrazdina and Jensen, 1992). In contrast, our own view is that of a series of distinct, sophisticated biochemical networks which are present in different tissues and cell types, and which enable the various metabolic branch pathways to be differentially controlled during each and every aspect of plant growth and development (Anterola and Lewis, 2002). However, no single plant species has yet been systematically examined to determine how such proposed networks in phenylpropanoid pathway associated metabolism are organized and controlled during each stage of plant growth and development. Similarly, little is understood about the precise nature of the oxidative enzyme networks responsible for generating the free radical intermediates involved (at least in part) in the assembly of lignins, lignans, suberins and other phenol radical-radical linked products. Hence, the sequencing of the Arabidopsis genome now provides the opportunity for the systematic dissection of the proposed complex metabolic networks in order to identify how they are actually differentially organized during growth and development. Moreover, given that there are apparently multigene families for most of the enzymatic steps, such proposed networks would be expected to be differentially controlled. This could occur through distinct gene functions within particular cell types, and/or it could result from differential expression within the same cell type, with distinct metabolic pathways being activated concurrently and/or differentially. It is one of our goals to identify how such overall organization is achieved.

In this regard, it was anticipated that an in silico analysis could provide some useful initial insights in order to bridge the gap between information stored within the Arabidopsis genome and that already known about [quantifiable] spatial and temporal gene expression patterns (Bouchez and Höfte, 1998). This is because the relative abundance of expressed sequence tags (ESTs) in available databases, which are derived from libraries from different organs and plants, might be anticipated to provide important clues as to possible physiological and/or developmental functions, and hence to the tissue-specific organization of the proposed networks involved. Such EST analyses have been applied in studies as diverse as toxicology (Fielden et al., 2002), inner-ear function (Klockars et al., 2003), and plant metabolic pathways (Allona et al., 1998; Sterky et al., 1998; White et al., 2000; Hertzberg et al., 2001).

For example, to learn more about possible gene expression patterns involved in wood formation, Allona et al. (1998) performed a preliminary analysis of xylem formation in loblolly pine (Pinus taeda), albeit based on only 1097 ESTs. Using this approach, genes putatively encoding regulatory proteins were identified, as were others presumed to be involved in cell wall formation, in addition to lignin and carbohydrate biosynthetic enzymes. As expected, other than their grouping into broad enzyme classes (e.g. cytochrome P450s), the actual physiological roles of unknown partially sequenced genes could not be identified at the time of the investigation. For example, it was not until 2002, with functional genomic studies, that a gene of unknown function and partially annotated as a cytochrome P450 was established to be a p-coumarate 3-hydroxylase in loblolly pine (P. taeda) (Anterola et al., 2002), based partly on previous findings by Schoch et al. (2001). In an even more comprehensive manner, Hertzberg et al. (2001) identified a unique tissue-specific transcript profile for a well-defined developmental gradient occurring during xylogenesis in the secondary xylem of poplar trees. These data suggested, as would be predicted, that genes encoding lignin biosynthetic enzymes, transcription factors and other potential xylogenesis regulators were under strict developmental stage-specific transcriptional regulation; however, again no new physiological functions for any of the genes were identified. In other approaches, Klockars et al. (2003) used an in silico analysis of mouse inner-ear transcripts to identify those putatively associated with specific roles in auditory or vestibular functions, whereas White et al. (2000) employed an “electronic or digital northern” technique to estimate gene expression involved in the conversion of photosynthate into oil in developing seeds of Arabidopsis; in the latter case, the ESTs revealed patterns of gene expression associated with specific plant tissues and/or growth conditions.

In spite of its obvious limitations in identifying either precise physiological functions or temporal/spatial relationships of both genes and gene products on a cell-by-cell basis, this “electronic or digital northern” might still provide possible clues into the organization of metabolic routes of interest (White et al., 2000). Accordingly, we considered it instructive to initially employ a computerized “database mining” approach to generate three-dimensional graphical representations of reprocessed EST database collections of specific tentative consensus (TC) cDNAs of interest, i.e. as preliminary evaluations of gene expression patterns via “digital northern” analyses. Initial source information was retrieved from The Institute for Genomic Research (TIGR) and The Arabidopsis Information Resource (TAIR) databases in order to facilitate further understanding of the presumed metabolic networks involved in phenylpropanoid metabolism, including their regulatory systems. As described below, however, the EST database entries provided only very limited insights into both the identification and organization of the networks involved post-phenylalanine (1) to the monolignols 24. These data also underscored the need for both a full biochemical clarification of the roles of each protein/enzyme of interest and the determination of the actual metabolic context of each in vivo.
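
The general idea of a “digital northern” can be illustrated with a minimal sketch in Python. It is not the procedure used in this study, whose details are not given here; it simply assumes that each EST has already been assigned to a tentative consensus (TC) sequence and to a source library, and it normalizes EST counts by library size so that libraries of different depth can be compared. The TC identifiers, library names, counts and the `per` scaling factor are all hypothetical.

```python
from collections import Counter


def digital_northern(est_records, library_sizes, per=10_000):
    """Return {tc_id: {library: ESTs per `per` sequenced clones}}.

    est_records   -- iterable of (tc_id, library) pairs, one per EST
    library_sizes -- total number of ESTs sequenced from each library
    """
    counts = Counter(est_records)
    profile = {}
    for (tc_id, library), n in counts.items():
        # Normalize the raw EST count by the depth of its source library.
        profile.setdefault(tc_id, {})[library] = n * per / library_sizes[library]
    return profile


# Hypothetical toy data: two TCs sampled from two organ-specific libraries.
est_records = [
    ("TC_A", "root"),
    ("TC_A", "root"),
    ("TC_A", "flower"),
    ("TC_B", "flower"),
]
library_sizes = {"root": 4000, "flower": 1000}

profile = digital_northern(est_records, library_sizes)
for tc_id, by_library in sorted(profile.items()):
    print(tc_id, by_library)
```

In this toy case TC_A has more raw ESTs in the root library than in the flower library, yet its normalized abundance is lower in root because that library was sampled more deeply. With counts this small the estimate is very noisy, which is one reason such profiles offer only preliminary clues about expression.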

Section snippets

Results and discussion

The three possible and distinct networks integrally associated with the phenylpropanoid pathway, and of primary interest to us, include temporal and spatial organization in planta of the various: (1) radical–radical coupling/polymerization networks for phenylpropanoid coupling/polymerization leading to formation of lignins, lignans, suberins, sporopollenins, etc. and how such proteins and enzymes mechanistically function; (2) downstream metabolic networks, following phenylpropanoid coupling of

Concluding remarks

Completion of the sequencing of the Arabidopsis genome in 2000 brought to science the entire blueprint of this highly studied organism. This, in turn, now offers the opportunity to systematically dissect, delineate and identify the true physiological functions of its various genes, including their responses to a variety of environmental conditions. However, the annotation of gene sequences in the public databases as having a particular and/or putative function (e.g. in the pathway to the

Database analysis

In this investigation, The Institute for Genomic Research (TIGR) database, which attempts to elucidate and/or predict the function of the 25,000+ genes of A. thaliana, and The Arabidopsis Information Resource (TAIR) database were analyzed. The purpose was to identify all potential tentative consensus sequence homologues of proteins with established and/or suspected (i.e. annotated) roles in the phenylpropanoid pathway, particularly those involved in lignin/lignan biosynthesis.
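
A first pass at collecting annotated candidates of this kind can be sketched as a simple text screen over gene descriptions. The Python example below is only an illustration under stated assumptions: the gene identifiers and description strings are invented, the enzyme-name patterns are taken from the enzyme families named in the abstract, and the actual TIGR/TAIR query procedure is not reproduced. A match indicates only that a gene is annotated with a given name, not that it has that function.

```python
import re

# Enzyme-family names to look for (hyphens and punctuation are normalized away).
FAMILY_PATTERNS = {
    "PAL": "phenylalanine ammonia lyase",
    "C4H": "cinnamate 4 hydroxylase",
    "4CL": "4 coumarate coa ligase",
    "CCR": "cinnamoyl coa reductase",
    "CAD": "cinnamyl alcohol dehydrogenase",
}


def normalize(text):
    """Lowercase the text and collapse every non-alphanumeric run to one space."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def screen_annotations(annotations):
    """Group gene IDs by the enzyme-family name found in their description."""
    hits = {family: [] for family in FAMILY_PATTERNS}
    for gene_id, description in annotations.items():
        norm = normalize(description)
        for family, pattern in FAMILY_PATTERNS.items():
            if pattern in norm:
                hits[family].append(gene_id)
    return hits


# Hypothetical annotation records; the identifiers are placeholders.
annotations = {
    "gene_001": "putative phenylalanine ammonia-lyase",
    "gene_002": "cinnamyl-alcohol dehydrogenase, putative",
    "gene_003": "4-coumarate:CoA ligase-like protein",
    "gene_004": "unknown protein",
}

candidates = screen_annotations(annotations)
for family, gene_ids in candidates.items():
    print(family, gene_ids)
```

Each candidate list would then need sequence comparison and biochemical characterization before any gene could be assigned a definitive role.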

Acknowledgements

This research project is supported by the National Science Foundation Arabidopsis 2010 (MCB-0117260), the National Aeronautics and Space Administration (NAG 2-1513) and the G. Thomas and Anita Hargrove Center for Plant Genomic Research.

References (84)

  • Y Kida et al.

    Membrane topology of NADPH-cytochrome P450 reductase on the endoplasmic reticulum

    Archives of Biochemistry and Biophysics

    (1998)
  • J Koukol et al.

    The metabolism of aromatic compounds in higher plants. IV. Purification and properties of the phenylalanine deaminase of Hordeum vulgare

    The Journal of Biological Chemistry

    (1961)
  • V Lauvergeat et al.

    Two cinnamoyl-CoA reductase (CCR) genes from Arabidopsis thaliana are differentially expressed during development and in response to infection with pathogenic bacteria

    Phytochemistry

    (2001)
  • N.G Lewis et al.

    Lignans: biosynthesis and function

  • N.G Lewis et al.

    The nature and function of lignins

  • X Liang et al.

    Differential regulation of phenylalanine ammonia-lyase genes during plant development and by environmental cues

    The Journal of Biological Chemistry

    (1989)
  • I Muzac et al.

    Functional expression of an Arabidopsis cDNA clone encoding a flavonol 3′-O-methyltransferase and characterization of the gene product

    Archives of Biochemistry and Biophysics

    (2000)
  • A.-E Pakusch et al.

    S-Adenosyl-L-methionine:trans-caffeoyl-coenzyme A 3-O-methyltransferase from elicitor-treated parsley cell suspension cultures

    Archives of Biochemistry and Biophysics

    (1989)
  • D Schmitt et al.

    Molecular cloning, induction, and taxonomic distribution of caffeoyl-CoA 3-O-methyltransferase, an enzyme involved in disease resistance

    The Journal of Biological Chemistry

    (1991)
  • G Schoch et al.

    CYP98A3 from Arabidopsis thaliana is a 3′-hydroxylase of phenolic esters, a missing link in the phenylpropanoid pathway

    The Journal of Biological Chemistry

    (2001)
  • J Schröder

    The chalcone/stilbene synthase-type family of condensing enzymes

  • P Urban et al.

    Cloning, yeast expression, and characterization of the coupling of two distantly related Arabidopsis thaliana NADPH-cytochrome P450 reductases with P450 CYP73A5

    Proceedings of the National Academy of Sciences of the United States of America

    (1997)
  • P.S van Heerden et al.

    Nitrogen metabolism in lignifying Pinus taeda cell cultures

    The Journal of Biological Chemistry

    (1996)
  • R Wiermann et al.

    Pollen wall and sporopollenin

    International Review of Cytology

    (1992)
  • H Zhang et al.

    An Arabidopsis gene encoding a putative 14-3-3-interacting protein, caffeic acid/5-hydroxyferulic acid O-methyltransferase

    Biochimica et Biophysica Acta

    (1997)
  • I Allona et al.

    Analysis of xylem formation in pine by cDNA sequencing

    Proceedings of the National Academy of Sciences of the United States of America

    (1998)
  • R Atanassova et al.

    Altered lignin composition in transgenic tobacco expressing O-methyltransferase sequences in sense and antisense orientation

    The Plant Journal

    (1995)
  • H Bannai et al.

    Extensive feature detection of N-terminal protein sorting signals

    Bioinformatics

    (2002)
  • M Baucher et al.

    Red xylem and higher lignin extractability by down-regulating a cinnamyl alcohol dehydrogenase in poplar

    Plant Physiology

    (1996)
  • D.A Bell-Lelong et al.

    Cinnamate-4-hydroxylase expression in Arabidopsis

    Plant Physiology

    (1997)
  • B Benveniste et al.

    Purification and characterization of the NADPH-cytochrome P-450 (cytochrome c) reductase from higher plant microsomal fractions

    The Biochemical Journal

    (1986)
  • D Bouchez et al.

    Functional genomics in plants

    Plant Physiology

    (1998)
  • R.C Bugos et al.

    cDNA cloning, sequence analysis and seasonal expression of lignin-bispecific caffeic acid/5-hydroxyferulic acid O-methyltransferase of aspen

    Plant Molecular Biology

    (1991)
  • F Chen et al.

    Evidence for a novel biosynthetic pathway that regulates the ratio of syringyl to guaiacyl residues in lignin in the differentiating xylem of Magnolia kobus DC

    Planta

    (1999)
  • H Chiron et al.

    Molecular cloning and functional expression of a stress-induced multifunctional O-methyltransferase with pinosylvin methyltransferase activity from Scots pine (Pinus sylvestris L.)

    Plant Molecular Biology

    (2000)
  • R Croteau et al.

    Natural products (secondary metabolites)

  • D Cukovic et al.

    Structure and evolution of 4-coumarate:coenzyme A ligase (4CL) gene families

    Biological Chemistry

    (2001)
  • J Ehlting et al.

    Three 4-coumarate:coenzyme A ligases in Arabidopsis thaliana represent two evolutionary divergent classes in angiosperms

    The Plant Journal

    (1999)
  • J Ehlting et al.

    Identification of 4-coumarate:coenzyme A ligase (4CL) substrate recognition domains

    The Plant Journal

    (2001)
  • D Fell

    Understanding the Control of Metabolism

    (1997)
  • M.R Fielden et al.

    In silico approaches to mechanistic and predictive toxicology: an introduction to bioinformatics for toxicologists

    Critical Reviews in Toxicology

    (2002)
  • T Goujon et al.

    A new Arabidopsis thaliana mutant deficient in the expression of O-methyltransferase impacts lignins and sinapoyl esters

    Plant Molecular Biology

    (2003)