Abstract
We compare the annotation of three complete genomes using the ab initio gene-identification methods GeneScan and GLIMMER. The annotation given in GenBank, the standard against which these are compared, was made using GeneMark. We find a number of novel genes that are predicted by both methods used here, as well as a number of genes that are predicted by GeneMark but are not identified by either of the non-consensus methods that we have used. The three organisms studied here are all prokaryotic species with fairly compact genomes. The Fourier measure forms the basis for an efficient non-consensus method for gene prediction, and the algorithm GeneScan exploits this measure. We have benchmarked this program, as well as GLIMMER, on these three complete prokaryotic genomes. An effort has also been made to study the limitations of these techniques for complete genome analysis. GeneScan and GLIMMER are of comparable accuracy insofar as gene identification is concerned, with sensitivities and specificities typically greater than 0.9. The number of false predictions (both positive and negative) is higher for GeneScan than for GLIMMER, but in a significant number of cases the two techniques give similar results. This suggests that there could be some as-yet unidentified additional genes in these three genomes, and also that some of the putative identifications made hitherto might require re-evaluation. All these cases are discussed in detail.
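The Fourier measure mentioned above can be illustrated with a minimal sketch. It assumes one common formulation: the spectral power of the four base-indicator sequences at frequency 1/3, divided by the mean power at the other nonzero frequencies. This is not the GeneScan implementation. The function names, the normalisation, and the synthetic test sequences are all hypothetical choices made for illustration.

```python
import cmath
import random

BASES = "ACGT"

def spectral_power(seq, freq):
    """Power at `freq`, summed over the four base-indicator sequences.

    Dividing by len(seq) is an illustrative normalisation (an assumption).
    """
    n = len(seq)
    total = 0.0
    for base in BASES:
        # Fourier amplitude of the 0/1 indicator sequence for this base.
        amplitude = sum(
            cmath.exp(-2j * cmath.pi * freq * pos)
            for pos, ch in enumerate(seq)
            if ch == base
        )
        total += abs(amplitude) ** 2
    return total / n

def period3_ratio(seq):
    """Power at f = 1/3 relative to the mean power at the other frequencies k/N."""
    n = len(seq)
    peak = spectral_power(seq, 1.0 / 3.0)
    background = [
        spectral_power(seq, k / n)
        for k in range(1, n // 2 + 1)
        if abs(k / n - 1.0 / 3.0) > 1e-9  # leave the peak out of the average
    ]
    return peak / (sum(background) / len(background))

def synthetic_sequence(rng, n_codons, first_position_weights):
    """Toy sequence: only the first codon position has a biased base composition."""
    uniform = [1, 1, 1, 1]
    seq = []
    for _ in range(n_codons):
        seq.append(rng.choices(BASES, weights=first_position_weights)[0])
        seq.append(rng.choices(BASES, weights=uniform)[0])
        seq.append(rng.choices(BASES, weights=uniform)[0])
    return "".join(seq)
```

Calling `period3_ratio` on a sequence from `synthetic_sequence(rng, 200, [1, 1, 7, 1])` should give a much larger value than on one built with uniform weights, because only the former has a composition that repeats with period three. A real genome scan would apply such a measure in sliding windows and compare it with a threshold. Neither of those steps is shown here.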