Elsevier

Gene

Volume 333, 26 May 2004, Pages 143-149

Cytosine methylation and CpG, TpG (CpA) and TpA frequencies

https://doi.org/10.1016/j.gene.2004.02.043

Abstract

An analysis of dinucleotide frequencies was carried out on DNAs from insects and mammals, as well as on large DNA sequences from the genomes of Drosophila melanogaster, Anopheles gambiae, puffer fish (Takifugu rubripes), zebra fish (Danio rerio) and human. These organisms were chosen because Drosophila and Anopheles DNAs have an extremely low level of methylation, human DNA a high level, and fish DNA a level two-fold higher than that of human DNA. The results indicate that: (i) CpG deficiency and the corresponding TpG (CpA) excess show no correlation with the level of DNA methylation; indeed, genomes with strikingly different levels of DNA methylation (such as those of Drosophila and human) exhibit similar TpG (CpA) levels; (ii) the correlations between the GC level of large (50 kb) DNA sequences and the extent of TpA or CpG shortage do not appear to be due to CpG methylation followed by deamination; (iii) CpG dinucleotides are more frequent in fishes than in mammals; interestingly, the monotreme Ornithorhynchus anatinus shows an intermediate CpG frequency. The implications of these results are discussed.

Introduction

It is well known from nearest-neighbour analyses (Josse et al., 1961; Swartz et al., 1962) that the most remarkable deviation of doublet frequencies from random expectation in vertebrate genomes is the shortage of the doublet CpG. Scarano et al. (1967) proposed a mechanism for cell differentiation involving deamination of 5mC to T. Salser (1977) suggested that CpG deficiency is related to DNA methylation, since mCpG is practically the only methylated sequence in animal genomes. Subsequent investigations in E. coli showed that mutational hot spots in the lacI gene of the lac operon are caused by an abnormally high rate of mutation from 5mC to T (Coulondre et al., 1978).

It was proposed (Bird, 1980) that, if mCpG mutated more frequently than the other dinucleotides over evolutionary time, the observed CpG deficiency should be matched by a corresponding accumulation of the complementary dinucleotides TpG and CpA. Since a single 5mC-to-T transition at a CpG site, counted on both strands, leads to the loss of two CpGs and the gain of one TpG and one CpA, this should lead to a negative correlation between CpG deficiency and TpG plus CpA excess. Such a correlation was reported (Bird, 1980).
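The strand bookkeeping in this argument can be checked on a toy duplex. The sketch below is illustrative only: the function names and the six-base sequence are ours, not from the cited work. It counts overlapping dinucleotides on a sequence plus its reverse complement, applies one C-to-T change at a CpG site, and compares the CpG, TpG and CpA totals, modelling the duplex after replication (TpG on the mutated strand, CpA on its partner).

```python
# Toy check of the strand bookkeeping for one 5mC -> T transition at a CpG site.
# Function names and the example sequence are illustrative assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def dinucleotide_counts(seq: str) -> dict:
    """Count overlapping dinucleotides read 5'->3' on one strand."""
    counts = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    return counts


def both_strand_counts(seq: str) -> dict:
    """Sum dinucleotide counts over a duplex (top strand plus its reverse complement)."""
    total = dinucleotide_counts(seq)
    for pair, n in dinucleotide_counts(reverse_complement(seq)).items():
        total[pair] = total.get(pair, 0) + n
    return total


def deaminate_cpg(seq: str, pos: int) -> str:
    """Replace the C of the top-strand CpG at `pos` by T.

    After replication the mutant duplex carries TpG on the top strand and
    CpA on the bottom strand; reverse_complement() yields that bottom strand.
    """
    if seq[pos:pos + 2] != "CG":
        raise ValueError("pos must point at the C of a CpG")
    return seq[:pos] + "T" + seq[pos + 1:]


if __name__ == "__main__":
    before = both_strand_counts("AACGAA")
    after = both_strand_counts(deaminate_cpg("AACGAA", 2))
    for pair in ("CG", "TG", "CA"):
        print(pair, after.get(pair, 0) - before.get(pair, 0))
    # prints: CG -2, TG 1, CA 1 (one per line)
```

The palindromic CpG site is counted once on each strand, which is why a single transition removes two CpGs while adding only one TpG and one CpA.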

Another notably biased dinucleotide is TpA (Russell, 1974; Nussinov, 1983, 1984; Bulmer, 1987). Dinucleotide biases have been suggested to stem from structural constraints at the DNA level, such as DNA stacking (Nussinov, 1984; Mrazek and Karlin, 1997). It was also noticed (Yomo and Ohno, 1989) that an excess of TpG compensates for a deficiency of CpG and TpA.

Subsequent computer simulations (Duret and Galtier, 2000), in which selection was ignored (as in Sved and Bird, 1990), led to the conclusion that “increased mutation rate from CpG to TpG or CpA induces both an apparent TpA deficiency and a correlation between CpG and TpA deficiencies and G+C content”. According to these authors, since CpG methylation increases the rate of mutation from CpG to TpG or CpA (as indicated by low CpG observed/expected values), it would induce “artefactual” correlations between CpG and TpA shortages and GC level. Following the same reasoning, Fryxell and Zuckerkandl (2000) drew the same inference, namely a “statistical” link between CpG deficiency and TpA excess.
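The observed/expected statistic referred to here is conventionally the frequency of a dinucleotide divided by the product of the frequencies of its two bases, so that a value near 1 matches random expectation and values well below 1 indicate a shortage. A minimal sketch, assuming an A/C/G/T-only string and single-strand counting (both choices are ours, not necessarily those of the cited authors):

```python
from collections import Counter


def observed_expected(seq: str, pair: str) -> float:
    """Observed/expected ratio rho(XY) = f(XY) / (f(X) * f(Y)), single strand.

    f(XY): frequency of XY among the len(seq) - 1 overlapping dinucleotides.
    f(X), f(Y): mononucleotide frequencies. Assumes an A/C/G/T-only sequence.
    """
    seq = seq.upper()
    if len(seq) < 2 or set(seq) - set("ACGT"):
        raise ValueError("need at least two bases, all from A, C, G, T")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    f_x = mono[pair[0]] / len(seq)
    f_y = mono[pair[1]] / len(seq)
    if f_x == 0 or f_y == 0:
        return float("nan")  # expectation undefined when a base is absent
    f_xy = di[pair] / (len(seq) - 1)
    return f_xy / (f_x * f_y)


print(observed_expected("CGCG", "CG"))      # about 2.667 (8/3)
print(observed_expected("AAAATTTT", "TA"))  # 0.0
```

Applied to long genomic sequences, the same function gives the CpG and TpA "shortage" values whose correlation with GC level is discussed above.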

In the present work, we revisited the CpG shortage phenomenon in the light of data from the literature and the advances made in genome projects. Our approach took advantage of the following:

  • (i)

    The large difference in average CpG/GpC values of insects (0.93±0.11) and mammals (0.27±0.04) is ideal for analysing the effect of CpG shortage on the other dinucleotides.

  • (ii)

    The availability of large genomic sequences from the very slightly methylated genome of Drosophila melanogaster (Gowher et al., 2000; Lyko et al., 2000) and from the heavily methylated human genome provides a good opportunity to extend this analysis to the intragenome variation of dinucleotide frequencies.
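The per-segment quantities involved in (i) and (ii), namely GC level, CpG frequency and the CpG/GpC ratio, together with a correlation across segments, can be sketched as follows. This is an illustrative Python sketch, not the authors' pipeline: the function names, the single-strand counting, the normalisation of CpG frequency by the number of dinucleotides and the use of Pearson's r are our assumptions. Because CpG and GpC have the same base composition, their ratio is one simple way to control for C and G content.

```python
import math


def segment_stats(seq: str) -> tuple:
    """Return (GC level, CpG frequency, CpG/GpC ratio) for one DNA segment.

    Assumes an A/C/G/T-only string; counts one strand only.
    """
    seq = seq.upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    cpg, gpc = pairs.count("CG"), pairs.count("GC")
    cpg_freq = cpg / len(pairs)
    ratio = cpg / gpc if gpc else float("nan")  # undefined without any GpC
    return gc, cpg_freq, ratio


def pearson(xs, ys) -> float:
    """Pearson correlation between two equal-length sequences (fails if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical toy "segments"; real inputs would be the large genomic sequences.
    segments = ["ATATATCGAT", "GCATATCGAT", "GCGCATCGCG"]
    stats = [segment_stats(s) for s in segments]
    gc_levels = [s[0] for s in stats]
    cpg_freqs = [s[1] for s in stats]
    print(gc_levels, cpg_freqs, pearson(gc_levels, cpg_freqs))
```

In a real analysis one would apply segment_stats to every sequence above the size threshold of a data set and correlate the resulting GC and CpG columns within each genome.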

Section snippets

Materials and methods

All genomic sequences larger than 50 kb were extracted from GenBank (March 2003 release), except for the Anopheles gambiae sequences, which were downloaded from ftp://ftp.ensembl.org/pub/anopheles-7.1a/data/golden_path. This yielded data sets of 2916 sequences for human, 1978 for Drosophila and 4269 for Anopheles. Because sequences larger than 50 kb are still scarce for fishes, we lowered the size threshold for them to 40 kb, which led to

CpG shortage and TpG (CpA) excess

As shown in Table 1, there is a large difference in CpG/GpC values between mammals (0.27±0.04) and insects (0.93±0.11), with a relatively low value for Drosophila (CpG/GpC=0.71). Fig. 1a shows that CpG frequency is positively correlated with the GC level of the large DNA segments (Bernardi, 1985; Jabbari and Bernardi, 1998), CpG frequency being higher in large DNA sequences from Drosophila than in those from human. Fig. 1b shows the correlation between TpG (or CpA) and GC, which indicates that the

Discussion

The three main results of this work are: (1) the lack of correlation between CpG shortage and TpG (CpA) excess; (2) the independence of CpG and TpA deficiencies from 5mCpG deamination; (3) the in silico confirmation of our earlier results (Jabbari et al., 1997) on the higher CpG level of fish relative to mammals, and its extension to the within-genome level.

As far as the first result is concerned, the sequence analysis of animal genomes (Drosophila, Anopheles, puffer fish, zebra fish and human) with

Acknowledgements

We acknowledge helpful discussions with participants in the ISME meetings in Costa Rica (“Chromosomes: structure, function and evolution”, January 2001) and in Ischia (“Natural selection and the neutral theory”, October 2001), where these results were first presented.

References (42)

  • A. Skalka et al.

    The effect of histones on the enzymatic synthesis of ribonucleic acid

    J. Biol. Chem.

    (1966)
  • M.N. Swartz et al.

    Further studies on nearest neighbor base sequences in deoxyribonucleic acid

    J. Biol. Chem.

    (1962)
  • G. Bernardi

    The organization of the vertebrate genome and the problem of the CpG shortage

  • G. Bernardi

    Structural and Evolutionary Genomics. Natural Selection in Genome Evolution

    (2004)
  • A.P. Bird

    DNA methylation and the frequency of CpG in animal DNA

    Nucleic Acids Res.

    (1980)
  • M. Boudraa et al.

    CpG and TpA frequencies in the plant system

    Nucleic Acids Res.

    (1987)
  • M. Bulmer

    A statistical analysis of nucleotide sequences of introns and exons in human genes

    Mol. Biol. Evol.

    (1987)
  • L.R. Cardon et al.

    Pervasive CpG suppression in animal mitochondrial genomes

    Proc. Natl. Acad. Sci. U. S. A.

    (1994)
  • D.N. Cooper et al.

    Mutational processes in pathology and evolution

  • D.N. Cooper et al.

    The distribution of the dinucleotide CpG and cytosine methylation in the vitellogenin gene family

    J. Mol. Evol.

    (1987)
  • C. Coulondre et al.

    Molecular basis of base substitution hotspots in Escherichia coli

    Nature

    (1978)