Cytosine methylation and CpG, TpG (CpA) and TpA frequencies
Introduction
It is well known from nearest neighbour analyses (Josse et al., 1961; Swartz et al., 1962) that the most remarkable deviation of doublet frequency from random expectation in the genome of vertebrates is the shortage of the doublet CpG. Scarano et al. (1967) proposed a mechanism for cell differentiation involving deamination of mC to T. Salser (1977) suggested that CpG deficiency is related to DNA methylation, since mCpG is practically the only methylated sequence in animal genomes. Subsequent investigations on E. coli have shown that mutational hot spots in the I gene of the Lac operon are caused by an abnormally high mutation rate from 5mC to T (Coulondre et al., 1978).
It was proposed (Bird, 1980) that, if mCpG were to mutate relatively frequently compared to the other dinucleotides over evolutionary time, the observed CpG deficiency should be matched by a corresponding accumulation of the complementary dinucleotides TpG and CpA. Since one 5mC change leads to the loss of two CpGs and the gain of one TpG and one CpA, this should lead to a negative correlation between CpG deficiency and TpG plus CpA excess. Such a correlation was reported (Bird, 1980).
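The bookkeeping behind this argument can be illustrated with a minimal Python sketch. It is not taken from the original work: the helper names (`revcomp`, `count_both_strands`, `obs_exp_ratio`) are hypothetical, and the expected frequency is assumed to be the product of the two mononucleotide frequencies, a common convention that the text above does not specify.

```python
# Illustrative sketch only; names and the observed/expected convention are assumptions.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_both_strands(seq, dinuc):
    """Count overlapping occurrences of a dinucleotide on both strands."""
    total = 0
    for strand in (seq, revcomp(seq)):
        total += sum(1 for i in range(len(strand) - 1) if strand[i:i + 2] == dinuc)
    return total


def obs_exp_ratio(seq, dinuc):
    """Observed/expected ratio of a dinucleotide along one strand.

    Assumption: expected frequency = f(first base) * f(second base).
    """
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc) / (n - 1)
    expected = (seq.count(dinuc[0]) / n) * (seq.count(dinuc[1]) / n)
    return observed / expected if expected > 0 else float("nan")


if __name__ == "__main__":
    before = "AACGTT"  # toy duplex with one CpG on each strand
    after = "AATGTT"   # same duplex after one C -> T change and replication
    for dinuc in ("CG", "TG", "CA"):
        print(dinuc, count_both_strands(before, dinuc), count_both_strands(after, dinuc))
```

In this toy duplex the single C to T change removes both CpGs and produces one TpG and one CpA when both strands are counted, which is the accounting the argument relies on.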
Another notably biased dinucleotide is TpA (Russell, 1974; Nussinov, 1983; Nussinov, 1984; Bulmer, 1987). Dinucleotide biases have been suggested to stem from structural constraints at the DNA level, such as DNA stacking (Nussinov, 1984; Mrazek and Karlin, 1997). It was also noticed (Yomo and Ohno, 1989) that an excess of TpG compensates for a deficiency of CpG and TpA.
Subsequent computer simulations (Duret and Galtier, 2000), in which selection was ignored (as in Sved and Bird, 1990), led to the conclusion that “increased mutation rate from CpG to TpG or CpA induces both an apparent TpA deficiency and a correlation between CpG and TpA deficiencies and G+C content”. According to these authors, since CpG methylation increases mutation rate from CpG to TpG or CpA (as indicated by low values of CpG observed/expected), this would induce “artefactual” correlations between CpG and TpA shortages and GC level. Using the same reasoning, the same inference, namely a “statistical” link between CpG deficiency and TpA excess, was made by Fryxell and Zuckerkandl (2000).
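The kind of selection-free simulation referred to above can be sketched in a few lines of Python. This is a toy model written for illustration, not the model of Duret and Galtier (2000): the function name `simulate_cpg_decay`, its parameters (`base_rate`, `cpg_factor`) and the single-strand, in-place mutation scheme are all assumptions.

```python
import random

BASES = "ACGT"
# Transitions assumed for methylated CpG: C -> T yields TpG, G -> A yields CpA.
CPG_TRANSITION = {"C": "T", "G": "A"}


def simulate_cpg_decay(seq, generations, base_rate, cpg_factor, seed=0):
    """Mutate a sequence without selection, with elevated CpG transition rates.

    Hypothetical parameters:
      base_rate  - per-site, per-generation probability of a random substitution
      cpg_factor - multiplier applied to base_rate for C or G inside a CpG
    The input is assumed to contain only A, C, G and T.
    """
    rng = random.Random(seed)
    s = list(seq.upper())
    p_cpg = min(1.0, base_rate * cpg_factor)
    for _ in range(generations):
        for i in range(len(s)):
            b = s[i]
            # Sites are updated in place, so a CpG already lost at position i
            # no longer raises the rate at position i + 1 in the same pass.
            c_of_cpg = b == "C" and i + 1 < len(s) and s[i + 1] == "G"
            g_of_cpg = b == "G" and i > 0 and s[i - 1] == "C"
            if (c_of_cpg or g_of_cpg) and rng.random() < p_cpg:
                s[i] = CPG_TRANSITION[b]
            elif rng.random() < base_rate:
                s[i] = rng.choice([x for x in BASES if x != b])
    return "".join(s)


if __name__ == "__main__":
    start_rng = random.Random(1)
    start = "".join(start_rng.choice(BASES) for _ in range(2000))
    end = simulate_cpg_decay(start, generations=200, base_rate=0.001, cpg_factor=10.0)
    for dinuc in ("CG", "TG", "CA", "TA"):
        print(dinuc, start.count(dinuc), end.count(dinuc))
```

Comparing dinucleotide counts before and after such a run gives a rough feel for how a CpG-specific mutation bias alone can shift the frequencies of other dinucleotides; conclusions about TpA would require the observed/expected normalisation and parameter choices of the original studies.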
In the present work, we revisited the CpG shortage phenomenon in the light of data from the literature and the advances made in genome projects. Our approach took advantage of the following:
- (i) The large difference in average CpG/GpC values of insects (0.93±0.11) and mammals (0.27±0.04) is ideal for analysing the effect of CpG shortage on the other dinucleotides.
- (ii) The availability of large genomic sequences from the very slightly methylated genome of Drosophila melanogaster (Gowher et al., 2000; Lyko et al., 2000) and the heavily methylated genome of human provides a good opportunity for extending this analysis to the intragenome variation of dinucleotide frequencies.
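As a rough illustration of the two points above, the following Python sketch computes a CpG/GpC ratio for a sequence and for consecutive windows along it. It assumes that the ratio is a simple quotient of dinucleotide counts along one strand; the function names (`cpg_gpc_ratio`, `windowed_ratios`) and the window scheme are hypothetical and are not described in the text.

```python
def cpg_gpc_ratio(seq):
    """Ratio of CpG to GpC counts along one strand (NaN if no GpC is present).

    Assumption: the CpG/GpC value is the quotient of raw dinucleotide counts.
    """
    seq = seq.upper()
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gpc = seq.count("GC")
    return cpg / gpc if gpc else float("nan")


def windowed_ratios(seq, window, step=None):
    """CpG/GpC ratio in consecutive windows, to look at intragenome variation.

    Dinucleotides spanning a window boundary are ignored in this sketch.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    step = step or window
    return [
        cpg_gpc_ratio(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    toy = "GCGCTG" + "CGCGGC"
    print(cpg_gpc_ratio(toy))
    print(windowed_ratios(toy, window=6))
```

Using GpC as the denominator normalises for base composition, since both dinucleotides contain one C and one G; a real analysis would use windows of tens of kilobases rather than the toy size shown here.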
Materials and methods
All genomic sequences larger than 50 kb were extracted from GenBank (release March 2003), with the exception of the Anopheles gambiae sequences, which were downloaded from the site ftp://ftp.ensembl.org/pub/anopheles-7.1a/data/golden_path. This led to data sets of 2916 sequences for human, 1978 sequences for Drosophila and 4269 sequences for Anopheles. Because sequences larger than 50 kb are still scarce for fishes, we decreased their size limit to DNA segments larger than 40 kb, which led to
CpG shortage and TpG (CpA) excess
As shown in Table 1, there is a large difference in CpG/GpC values between mammals (0.27±0.04) and insects (0.93±0.11) with a relatively low value for Drosophila (CpG/GpC=0.71). Fig. 1a shows that CpG frequency is positively correlated with GC level of the large DNA segments (Bernardi, 1985; Jabbari and Bernardi, 1998), CpG frequency being higher in large DNA sequences from Drosophila compared to those from human. Fig. 1b shows the correlation between TpG (or CpA) and GC, which indicates that the
Discussion
The three main results of this work are: (1) the lack of correlation between CpG shortage and the TpG (CpA) excess; (2) the independence of CpG and TpA deficiencies from 5mCpG deamination; (3) the in silico confirmation of our results (Jabbari et al., 1997) on the higher CpG level of fish compared to mammals and its extension to the within genome level.
As far as the first result is concerned, the sequence analysis of animal genomes (Drosophila, Anopheles, puffer fish, zebra fish and human) with
Acknowledgements
We acknowledge helpful discussions with participants in the ISME meetings in Costa Rica (“Chromosomes: structure, function and evolution”, January 2001) and in Ischia (“Natural selection and the neutral theory”, October 2001), where these results were first presented.
References (42)
- et al. CpG islands, genes, isochores in the genome of vertebrates. Gene (1991).
- et al. CpG islands: features and distribution in the genome of vertebrates. Gene (1991).
- et al. Methylation patterns in the isochores of vertebrate genomes. Gene (1997).
- et al. Micrococcal nuclease as a DNA structural probe: its recognition sequences, their genomic distribution and correlation with DNA structure determinants. J. Mol. Biol. (1986).
- et al. CpG islands in vertebrate genomes. J. Mol. Biol. (1987).
- Sequence-dependent DNA structure. The role of base stacking interactions. J. Mol. Biol. (1993).
- et al. Evolutionary changes in CpG and methylation levels in the genome of vertebrates. Gene (1997).
- et al. Enzymatic synthesis of deoxyribonucleic acid: VIII. Frequencies of nearest neighbor base sequences in deoxyribonucleic acid. J. Biol. Chem. (1961).
- et al. Extent of CpG methylation is not proportional to the in vivo spontaneous mutation frequency at transgenic loci in Big Blue rodents. Mutat. Res. (2001).
- et al. Sequence-dependent DNA structure: dinucleotide conformational maps. J. Mol. Biol. (2000).
- The effect of histones on the enzymatic synthesis of ribonucleic acid. J. Biol. Chem.
- Further studies on nearest neighbor base sequences in deoxyribonucleic acid. J. Biol. Chem.
- The organization of the vertebrate genome and the problem of the CpG shortage. Structural and Evolutionary Genomics. Natural Selection in Genome Evolution.
- DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res.
- CpG and TpA frequencies in the plant system. Nucleic Acids Res.
- A statistical analysis of nucleotide sequences of introns and exons in human genes. Mol. Biol. Evol.
- Pervasive CpG suppression in animal mitochondrial genomes. Proc. Natl. Acad. Sci. U. S. A.
- Mutational processes in pathology and evolution.
- The distribution of the dinucleotide CpG and cytosine methylation in the vitellogenin gene family. J. Mol. Evol.
- Molecular basis of base substitution hotspots in Escherichia coli. Nature.