References
Venter JC, et al (2001) The sequence of the human genome. Science 291:1304–1351
Lander ES, et al (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921
Rehm BH (2001) Bioinformatic tools for DNA/protein sequence analysis, functional assignment of genes and protein classification. Appl Microbiol Biotechnol 57:579–592
Ewing B, Hillier L, Wendl MC, Green P (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 8:175–185
Ewing B, Green P (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8:186–194
Huang X, Madan A (1999) CAP3: A DNA sequence assembly program. Genome Res 9:868–877
Gordon D, Abajian C, Green P (1998) Consed: a graphical tool for sequence finishing. Genome Res 8:195–202
Staden R (1996) The Staden Sequence Analysis Package. Mol Biotechnol 5:233–241
Staden R (1984) Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res 12:505–519
Claverie JM (1997) Computational methods for the identification of genes in vertebrate genomic sequences. Hum Mol Genet 6:1735–1744
Guigo R (1997) Computational gene identification: an open problem. Comput Chem 21:215–222
Krogh A (1998) In: Salzberg SL, Searls D, Kasif S (eds) Computational methods in molecular biology. Elsevier, Amsterdam
Krogh A (1998) In: Bishop MJ (ed) Guide to human genome computing, 2nd edn. Academic, New York, pp. 261–274
Delcher AL, Harmon D, Kasif S, White O, Salzberg SL (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res 27:4636–4641
Guigo R, Agarwal P, Abril JF, Burset M, Fickett JW (2000) An assessment of gene prediction accuracy in large DNA sequences. Genome Res 10:1631–1642
Krogh A (2000) Using database matches with HMMGene for automated gene detection in Drosophila. Genome Res 10:523–528
Shibuya T, Rigoutsos I (2002) Dictionary-driven prokaryotic gene finding. Nucleic Acids Res 30:2710–2725
Pedersen JS, Hein J (2003) Gene finding with a hidden Markov model of genome structure and evolution. Bioinformatics 19:219–227
Guo FB, Ou HY, Zhang CT (2003) ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes. Nucleic Acids Res 31:1780–1789
Larsen TS, Krogh A (2003) EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance. BMC Bioinformat 4:21
Gelfand MS (1995) Prediction of function in DNA sequence analysis. J Comput Biol 2:87–115
Sherriff A, Ott J (2001) Applications of neural networks for gene finding. Adv Genet 42:287–297
Fickett JW (1996) Finding genes by computer: the state of the art. Trends Genet 12:316–320
Zhang CT, Wang J, Zhang R (2002) Using a Euclid distance discriminant method to find protein coding genes in the yeast genome. Comput Chem 26:195–206
Bajic VB, Seah SH (2003) Dragon gene start finder: an advanced system for finding approximate locations of the start of gene transcriptional units. Genome Res 13:1923–1929
Zhang MQ (1998) Statistical features of human exons and their flanking regions. Hum Mol Genet 7:919–932
Searls DB (1992) The linguistics of DNA. Am Sci 80:579–591
Durbin R, Eddy S, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge
Krogh A, Mian IS, Haussler D (1994) A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Res 22:4768–4778
Cole ST, Brosch R, Parkhill J, et al (1998) Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393:537–544
Thomas A, Skolnick M (1994) A probabilistic model for detecting coding regions in DNA sequences. IMA J Math Appl Med Biol 11:149–160
Henderson J, Salzberg S, Fasman K (1997) Finding genes in DNA with a hidden Markov model. J Comput Biol 4:127–141
Lukashin AV, Borodovsky M (1998) GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res 26:1107–1115
Salzberg SL, Pertea M, Delcher AL, Gardner MJ, Tettelin H (1999) Interpolated Markov models for eukaryotic gene finding. Genomics 59:24–31
Badger JH, Olsen GJ (1999) CRITICA: coding region identification tool invoking comparative analysis. Mol Biol Evol 16:512–524
Bocs S, Cruveiller S, Vallenet D, Nuel G, Medigue C (2003) AMIGene: annotation of microbial genes. Nucleic Acids Res 31:3723–3726
Besemer J, Lomsadze A, Borodovsky M (2001) GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res 29:2607–2618
Yeramian E, Jones L (2003) GeneFizz: a web tool to compare genetic (coding/non-coding) and physical (helix/coil) segmentations of DNA sequences. Gene discovery and evolutionary perspectives. Nucleic Acids Res 31:3843–3849
Kotlar D, Lavner Y (2003) Gene prediction by spectral rotation measure: a new method for identifying protein-coding regions. Genome Res 13:1930–1937
Snyder E, Stormo G (1995) Identification of protein coding regions in genomic DNA. J Mol Biol 248:1–18
Reese MG, Eeckman FH, Kulp D, Haussler D (1997) Improved splice site detection in Genie. J Comput Biol 4:311–323
Burge C, Karlin S (1997) Prediction of complete gene structures in human genomic DNA. J Mol Biol 268:78–94
Xu Y, Uberbacher EC (1997) Automated gene identification in large-scale genomic sequences. J Comput Biol 4:325–338
Gelfand MS, Mironov AA, Pevzner PA (1996) Gene recognition via spliced sequence alignment. Proc Natl Acad Sci USA 93:9061–9066
Foissac S, Bardou P, Moisan A, Cros MJ, Schiex T (2003) EUGENE'HOM: a generic similarity-based gene finder using multiple homologous sequences. Nucleic Acids Res 31:3742–3745
Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147:195–197
Yada T, Takagi T, Totoki Y, Sakaki Y, Takaeda Y (2003) DIGIT: a novel gene finding program by combining gene-finders. Pac Symp Biocomput 8:375–387
Quandt K, Frech K, Karas H, Wingender E, Werner T (1995) MatInd and MatInspector – new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res 23:4878–4884
Prestridge DS (1991) SIGNAL SCAN: a computer program that scans DNA sequences for eukaryotic transcriptional elements. CABIOS 7:203–206
Wingender E, Chen X, Hehl R, et al (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 28:316–319
Prestridge DS (1995) Predicting Pol II promoter sequences using transcription factor binding sites. J Mol Biol 249:923–932
Eddy SR (1996) Hidden Markov models. Curr Opin Struct Biol 6:361–365
Eddy SR (1998) Profile hidden Markov models. Bioinformatics 14:755–763
Baldi P, Brunak S (1998) Bioinformatics: the machine learning approach. MIT Press, Boston, MA
Korenberg MJ, David R, Hunter IW, Solomon JE (2000) Automatic classification of protein sequences into structure/function groups via parallel cascade identification: a feasibility study. Ann Biomed Eng 28:803–811
Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) The CLUSTAL X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25:4876–4882
Nicholas KB, Nicholas HB, Jr, Deerfield DW, II (1997) GeneDoc: analysis and visualization of genetic variation. EMBNEW NEWS 4:14
Lake JA (1994) Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proc Natl Acad Sci USA 91:1451–1459
Lockhart PJ, Steel MA, Hendy MD, Penny D (1994) Recovering evolutionary trees under a more realistic model of sequence evolution. Mol Biol Evol 11:605–612
Brocchieri L (2001) Phylogenetic inferences from molecular sequences: review and critique. Theor Popul Biol 59:27–40
Stewart CB (1993) The powers and pitfalls of parsimony. Nature 361:603–607
Attwood TK, Beck ME, Flower DR, Scordis P, Selley JN (1998) The PRINTS protein fingerprint database in its fifth year. Nucleic Acids Res 26:304–308
Page RD (1996) TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci 12:357–358
Gasteiger E, Gattiker A, Hoogland C, Ivanyi I, Appel RD, Bairoch A (2003) ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res 31:3784–3788
Rost B (1996) PHD: predicting one-dimensional protein structure by profile based neural networks. Methods Enzymol 266:525–539
Eyrich VA, Rost B (2003) META-PP: single interface to crucial prediction servers. Nucleic Acids Res 31:3308–3310
Nielsen H, Engelbrecht J, Brunak S, von Heijne G (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng 10:1–6
Hansen JE, Lund O, Tolstrup N, Gooley AA, Williams KL, Brunak S (1998) NetOglyc: Prediction of mucin type O-glycosylation sites based on sequence context and surface accessibility. Glycoconjugate J 15:115–130
Hansen JE, Lund O, Rapacki K, Brunak S (1997) O-glycbase version 2.0 – a revised database of O-glycosylated proteins. Nucleic Acids Res 25:278–282
Hansen JE, Lund O, Rapacki K, et al (1995) Prediction of O-glycosylation of mammalian proteins: specificity patterns of UDP-GalNAc:-polypeptide N-acetyl-galactosaminyltransferase. Biochem J 308:801–813
Blom N, Gammeltoft S, Brunak S (1999) Sequence- and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol 294:1351–1362
Blom N, Hansen J, Blaas D, Brunak S (1996) Cleavage site analysis in picornaviral polyproteins: discovering cellular targets by neural networks. Protein Sci 5:2203–2216
Emanuelsson O, Nielsen H, von Heijne G (1999) ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Protein Sci 8:978–984
Cuff JA, Barton GJ (1999) Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins 34:508–519
Sonnhammer ELL, von Heijne G, Krogh A (1998) A hidden Markov model for predicting transmembrane helices in protein sequences. In: Proceedings of the sixth international conference on intelligent systems for molecular biology (ISMB98), pp 175–182
von Heijne G (1992) Membrane protein structure prediction, hydrophobicity analysis and the positive-inside rule. J Mol Biol 225:487–494
Karplus K, Barrett C, Hughey R (1998) Hidden Markov models for detecting remote protein homologies. Bioinformatics 14:846–856
Cserzo M, Wallin E, Simon I, von Heijne G, Elofsson A (1997) Prediction of transmembrane alpha-helices in prokaryotic membrane proteins: the dense alignment surface method. Protein Eng 10:673–676
Fischer D, Eisenberg DA (1996) Fold recognition using sequence-derived properties. Protein Sci 5:947–955
Elofsson A, Fischer D, Rice DW, LeGrand S, Eisenberg DA (1996) Study of combined structure-sequence profiles. Folding Design 1:451–461
Karplus K, Karchin R, Draper J, et al (2003) Combining local-structure, fold-recognition, and new-fold methods for protein structure prediction. Proteins 53(Suppl 6):491–496
Peitsch MC (1995) Protein modelling by E-mail. BioTechnology 13:658–660
Peitsch MC (1996) ProMod and Swiss-Model: internet-based tools for automated comparative protein modelling. Biochem Soc Trans 24:274–279
Guex N, Peitsch MC (1997) SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modelling. Electrophoresis 18:2714–2723
Lund O, Frimand K, Gorodkin J, et al (1997) Protein distance constraints predicted by neural networks and probability density functions. Protein Eng 10:1241–1248
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410
Altschul SF (1991) Amino acid substitution matrices from an information theoretic perspective. J Mol Biol 219:555–565
Altschul SF, Gish W (1996) Local alignment statistics. Methods Enzymol 266:460–480
Rost B, Schneider R, Sander C (1997) Protein fold recognition by prediction-based threading. J Mol Biol 270:471–480
Dayhoff MO, Barker WC, Hunt LT (1983) Establishing homologies in protein sequences. Methods Enzymol 91:524–545
Henikoff S, Henikoff JG (1992) Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA 89:10915–10919
Pearson WR (1995) Comparison of methods for searching protein sequence databases. Protein Sci 4:1145–1160
Karlin S, Altschul SF (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc Natl Acad Sci USA 87:2264–2268
Wootton JC (1994) Non-globular domains in protein sequences: automated segmentation using complexity measures. Comput Chem 18:269–285
Altschul SF, Madden TL, Schäffer AA, et al (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
Pearson WR, Lipman DJ (1988) Improved tools for biological sequence comparison. Proc Natl Acad Sci USA 85:2444–2448
Martin AC, Orengo CA, Hutchinson EG, et al (1998) Protein folds and functions. Structure 6:875–884
McGuffin LJ, Bryson K, Jones DT (2001) What are the baselines for protein fold recognition? Bioinformatics 17:63–72
Bairoch A (1991) PROSITE: a dictionary of sites and patterns in proteins. Nucleic Acids Res 19:2241–2245
Bairoch A, Bucher P, Hofmann K (1997) The PROSITE database, its status in 1997. Nucleic Acids Res 25:217–221
Bucher P, Karplus K, Moeri N, Hofmann K (1996) A flexible motif search technique based on generalized profiles. Comput Chem 20:3–23
Sonnhammer EL, Kahn D (1994) Modular arrangement of proteins as inferred from analysis of homology. Protein Sci 3:482–492
Corpet F, Gouzy J, Kahn D (1998) The ProDom database of protein domain families. Nucleic Acids Res 26:323–326
Sonnhammer EL, Eddy SR, Durbin R (1997) Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 28:405–420
Bateman A, Birney E, Cerruti L, et al (2002) The Pfam protein families database. Nucleic Acids Res 30:276–280
Apweiler R, Attwood TK, Bairoch A, et al (2001) The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res 29:37–40
Mulder NJ, Apweiler R, Attwood TK, et al (2003) The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res 31:315–318
Rawlings ND, O'Brien E, Barrett AJ (2002) MEROPS: the protease database. Nucleic Acids Res 30:343–346
Storm CE, Sonnhammer EL (2001) NIFAS: visual analysis of domain evolution in proteins. Bioinformatics 17:343–348
Schultz J, Milpetz F, Bork P, Ponting CP (1998) SMART, a simple modular architecture research tool: identification of signaling domains. Proc Natl Acad Sci USA 95:5857–5864
Schultz J, Copley RR, Doerks T, Ponting CP, Bork P (2000) SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res 28:231–234
Letunic I, Goodstadt L, Dickens NJ, et al (2002) Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res 30:242–244
Pietrokovski S, Henikoff JG, Henikoff S (1996) The Blocks database – a system for protein classification. Nucleic Acids Res 24:197–200
Attwood TK, Flower DR, Lewis AP, et al (1999) PRINTS prepares for the new millennium. Nucleic Acids Res 27:220–225
Silverstein KA, Shoop E, Johnson JE, Retzel EF (2001) MetaFam: a unified classification of protein families. I. Overview and statistics. Bioinformatics 17:249–261
Yuan YP, Eulenstein O, Vingron M, Bork P (1998) Towards detection of orthologues in sequence databases. Bioinformatics 14:285–289
Bernstein FC, Koetzle TF, Williams GJ, et al (1977) The Protein Data Bank. A computer-based archival file for macromolecular structures. Eur J Biochem 80:319–324
Berman HM, Westbrook J, Feng Z, et al (2000) The Protein Data Bank. Nucleic Acids Res 28:235–242
Murzin AG, Brenner SE, Hubbard T, Chothia C (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247:536–540
Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM (1997) CATH – a hierarchic classification of protein domain structures. Structure 5:1093–1108
Pearl FMG, Lee D, Bray JE, Sillitoe I, Todd AE, Harrison AP, Thornton JM, Orengo CA (2000) Assigning genomic sequences to CATH. Nucleic Acids Res 28:277–282
Peitsch MC, Jongeneel V (1993) A 3-dimensional model for the CD40 ligand predicts that it is a compact trimer similar to the tumor necrosis factors. Int Immunol 5:233–238
Schwede T, Kopp J, Guex N, Peitsch MC (2003) SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res 31:3381–3385
Guex N, Peitsch MC (1997) SWISS-MODEL and the Swiss-Pdb Viewer: an environment for comparative protein modeling. Electrophoresis 18:2714–2723
Combet C, Jambon M, Deleage G, Geourjon C (2002) Geno3D: automatic comparative molecular modelling of protein. Bioinformatics 18:213–214
Lambert C, Leonard N, De Bolle X, Depiereux E (2002) ESyPred3D: prediction of proteins 3D structures. Bioinformatics 18:1250–1256
Bader GD, Betel D, Hogue CW (2003) BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res 31:248–250
Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D (2000) DIP: The Database of Interacting Proteins. Nucleic Acids Res 28:289–291
Levinthal C, Wodak SJ, Kahn P, Dadivanian AK (1975) Hemoglobin interaction in sickle cell fibers. I. Theoretical approaches to the molecular contacts. Proc Natl Acad Sci USA 72:1330–1334
Wodak SJ, Janin J (1978) Computer analysis of protein-protein interaction. J Mol Biol 124:323–342
Janin J, Henrick K, Moult J, et al (2003) CAPRI: a Critical Assessment of PRedicted Interactions. Proteins 52:2–9
Taylor RD, Jewsbury PJ, Essex JW (2002) A review of protein-small molecule docking methods. J Comput Aided Mol Des 16:151–166
Read TD, Peterson SN, Tourasse N, et al (2003) The genome sequence of Bacillus anthracis Ames and comparison to closely related bacteria. Nature 423:81–86
Ivanova N, Sorokin A, Anderson I, et al (2003) Genome sequence of Bacillus cereus and comparative analysis with Bacillus anthracis. Nature 423:87–91
Smith DR (1996) Microbial pathogen genomes – new strategies for identifying therapeutics and vaccine targets. Trends Biotechnol 14:290–293
Tatusov RL, Koonin EV, Lipman DJ (1997) A genomic perspective on protein families. Science 278:631–637
Tatusov RL, Natale DA, Garkavtsev IV, et al (2001) The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res 29:22–28
Wheeler DL, Church DM, Federhen S, et al (2003) Database resources of the National Center for Biotechnology. Nucleic Acids Res 31:28–33
Edgar R, Domrachev M, Lash AE (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30:207–210
Rehm BHA, Reinecke F (2004) Evaluation of proteomic techniques: applications and potential. Curr Proteomics 1:103–111
© 2008 Humana Press, a part of Springer Science+Business Media, LLC
Cite this protocol
Rehm, B.H.A., Reinecke, F. (2008). Gene/Protein Sequence Analysis. In: Walker, J.M., Rapley, R. (eds) Molecular Biomethods Handbook. Springer Protocols Handbooks. Humana Press. https://doi.org/10.1007/978-1-60327-375-6_22
DOI: https://doi.org/10.1007/978-1-60327-375-6_22
Publisher Name: Humana Press
Print ISBN: 978-1-60327-370-1
Online ISBN: 978-1-60327-375-6
eBook Packages: Springer Protocols