Bioinformatic Tools for Gene and Protein Sequence Analysis

Rehm, Bernd H. A.; Reinecke, Frank

doi:10.1385/1-59259-870-6:387


Part of the book series: Springer Protocols Handbooks (SPH)


Abstract

The rapid development of efficient, automated DNA-sequencing methods has strongly advanced the genome-sequencing era, culminating in the publication of the first draft sequences of the human genome in 2001 (1,2). An enormous amount of DNA sequence data is now available, and the databases are still growing exponentially (see Fig. 1). The analysis of these data, which include hundreds of prokaryotic and eukaryotic genomes, has given rise to the field of bioinformatics. Bioinformatic tools for identifying genes that encode functional proteins or RNA have developed rapidly. This is an important task, considering that even in the best-studied bacterium, Escherichia coli, more than 30% of the identified open reading frames (ORFs) represent hypothetical genes with no known function. Future challenges of genome-sequence analysis will include understanding diseases, gene regulation, and the reconstruction of metabolic pathways. In addition, the set of protein-analysis methods summarized under the term proteomics holds tremendous potential for biomedicine and biotechnology (141). The large number of bioinformatic tools made available to scientists during the last few years raises the problem of which tools to use and how best to obtain scientifically valid answers (3). In this chapter, we provide a guide to the most efficient way to analyze a given sequence, or to collect information on a gene, protein, structure, or interaction of interest, using current publicly available software and databases, most of which are accessed via the World Wide Web. All links to services or download sites are given in the text or listed in Table 1; the order in which the tools are applied is briefly summarized in Fig. 2.
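To make the notion of an open reading frame concrete, the following is a minimal sketch (not a method from this chapter) of a naive ORF scan. It assumes the standard genetic code (stop codons TAA, TAG, TGA), considers only ATG as a start codon, scans only the three forward-strand frames, and uses a hypothetical `min_codons` length cutoff whose default of 100 is an arbitrary illustrative choice. Dedicated gene finders such as GLIMMER (14) or GeneMark.hmm (33) instead score candidate ORFs with statistical models, which this sketch does not attempt.

```python
"""Naive open reading frame (ORF) scan: forward strand, three frames (illustrative only)."""

# Assumption: standard genetic code; only ATG is treated as a start codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each ATG...stop ORF on the forward strand.

    `start` is the 0-based index of the A in ATG; `end` is the exclusive index
    just past the stop codon. In each frame, the first ATG after the previous
    stop opens an ORF (giving the longest ORF per stop). ORFs shorter than
    `min_codons` codons, counting the stop codon, are discarded.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG in this frame, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((frame, start, end))
                start = None  # look for the next ATG after this stop
    return orfs


if __name__ == "__main__":
    # A toy sequence: ATG AAA TTT TAA in frame 2 (a 4-codon ORF).
    print(find_orfs("CCATGAAATTTTAAGG", min_codons=2))
```

A real annotation pipeline would also scan the reverse complement, allow alternative start codons, and treat many short ORFs as noise rather than genes; that is precisely why a large fraction of predicted ORFs remain "hypothetical" until supported by homology or experimental evidence.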


References

1. Venter, J. C., et al. (2001) The sequence of the human genome. Science 291, 1304–1351.
2. Lander, E. S., et al. (2001) Initial sequencing and analysis of the human genome. Nature 409, 860–921.
3. Rehm, B. H. (2001) Bioinformatic tools for DNA/protein sequence analysis, functional assignment of genes and protein classification. Appl. Microbiol. Biotechnol. 57, 579–592.
4. Ewing, B., Hillier, L., Wendl, M. C., and Green, P. (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185.
5. Ewing, B. and Green, P. (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194.
6. Huang, X. and Madan, A. (1999) CAP3: A DNA sequence assembly program. Genome Res. 9, 868–877.
7. Gordon, D., Abajian, C., and Green, P. (1998) Consed: a graphical tool for sequence finishing. Genome Res. 8, 195–202.
8. Staden, R. (1996) The Staden sequence analysis package. Mol. Biotechnol. 5, 233–241.
9. Staden, R. (1984) Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res. 12, 505–519.
10. Claverie, J.-M. (1997) Computational methods for the identification of genes in vertebrate genomic sequences. Hum. Mol. Genet. 6, 1735–1744.
11. Guigo, R. (1997) Computational gene identification: an open problem. Comput. Chem. 21, 215–222.
12. Krogh, A. (1998) In Computational Methods in Molecular Biology (Salzberg, S. L., Searls, D., and Kasif, S., eds.), Elsevier, Amsterdam.
13. Krogh, A. (1998) In Guide to Human Genome Computing (Bishop, M. J., ed.), 2nd ed. Academic, New York, pp. 261–274.
14. Delcher, A. L., Harmon, D., Kasif, S., White, O., and Salzberg, S. L. (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res. 27, 4636–4641.
15. Guigo, R., Agarwal, P., Abril, J. F., Burset, M., and Fickett, J. W. (2000) An assessment of gene prediction accuracy in large DNA sequences. Genome Res. 10, 1631–1642.
16. Krogh, A. (2000) Using database matches with HMMGene for automated gene detection in Drosophila. Genome Res. 10, 523–5
17. Shibuya, T. and Rigoutsos, I. (2002) Dictionary-driven prokaryotic gene finding. Nucleic Acids Res. 30, 2710–2725.
18. Pedersen, J. S. and Hein, J. (2003) Gene finding with a hidden Markov model of genome structure and evolution. Bioinformatics 19, 219–227.
19. Guo, F. B., Ou, H. Y., and Zhang, C. T. (2003) ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes. Nucleic Acids Res. 31, 1780–1789.
20. Larsen, T. S. and Krogh, A. (2003) EasyGene: a prokaryotic gene finder that ranks ORFs by statistical significance. BMC Bioinformatics 4, 21.
21. Gelfand, M. S. (1995) Prediction of function in DNA sequence analysis. J. Comput. Biol. 2, 87–115.
22. Sherriff, A. and Ott, J. (2001) Applications of neural networks for gene finding. Adv. Genet. 42, 287–297.
23. Fickett, J. W. (1996) Finding genes by computer: the state of the art. Trends Genet. 12, 316–320.
24. Zhang, C. T., Wang, J., and Zhang, R. (2002) Using a Euclid distance discriminant method to find protein coding genes in the yeast genome. Comput. Chem. 26, 195–206.
25. Bajic, V. B. and Seah, S. H. (2003) Dragon gene start finder: an advanced system for finding approximate locations of the start of gene transcriptional units. Genome Res. 13, 1923–1929.
26. Zhang, M. Q. (1998) Statistical features of human exons and their flanking regions. Hum. Mol. Genet. 7, 919–932.
27. Searls, D. B. (1992) The linguistics of DNA. Am. Sci. 80, 579–591.
28. Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge.
29. Krogh, A., Mian, I. S., and Haussler, D. (1994) A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Res. 22, 4768–4778.
30. Cole, S. T., Brosch, R., Parkhill, J., et al. (1998) Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393, 537–544.
31. Thomas, A. and Skolnick, M. (1994) A probabilistic model for detecting coding regions in DNA sequences. IMA J. Math. Appl. Med. Biol. 11, 149–160.
32. Henderson, J., Salzberg, S., and Fasman, K. (1997) Finding genes in DNA with a hidden Markov model. J. Comput. Biol. 4, 127–141.
33. Lukashin, A. V. and Borodovsky, M. (1998) GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res. 26, 1107–1115.
34. Salzberg, S. L., Pertea, M., Delcher, A. L., Gardner, M. J., and Tettelin, H. (1999) Interpolated Markov models for eukaryotic gene finding. Genomics 59, 24–31.
35. Badger, J. H. and Olsen, G. J. (1999) CRITICA: coding region identification tool invoking comparative analysis. Mol. Biol. Evol. 16, 512–524.
36. Bocs, S., Cruveiller, S., Vallenet, D., Nuel, G., and Medigue, C. (2003) AMIGene: annotation of microbial genes. Nucleic Acids Res. 31, 3723–3726.
37. Besemer, J., Lomsadze, A., and Borodovsky, M. (2001) GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res. 29, 2607–2618.
38. Yeramian, E. and Jones, L. (2003) GeneFizz: a web tool to compare genetic (coding/non-coding) and physical (helix/coil) segmentations of DNA sequences. Gene discovery and evolutionary perspectives. Nucleic Acids Res. 31, 3843–3849.
39. Kotlar, D. and Lavner, Y. (2003) Gene prediction by spectral rotation measure: a new method for identifying protein-coding regions. Genome Res. 13, 1930–1937.
40. Snyder, E. and Stormo, G. (1995) Identification of protein coding regions in genomic DNA. J. Mol. Biol. 248, 1–18.
41. Reese, M. G., Eeckman, F. H., Kulp, D., and Haussler, D. (1997) Improved splice site detection in Genie. J. Comput. Biol. 4, 311–323.
42. Burge, C. and Karlin, S. (1997) Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94.
43. Xu, Y. and Uberbacher, E. C. (1997) Automated gene identification in large-scale genomic sequences. J. Comput. Biol. 4, 325–338.
44. Gelfand, M. S., Mironov, A. A., and Pevzner, P. A. (1996) Gene recognition via spliced sequence alignment. Proc. Natl. Acad. Sci. USA 93, 9061–9066.
45. Foissac, S., Bardou, P., Moisan, A., Cros, M. J., and Schiex, T. (2003) EUGENE’HOM: a generic similarity-based gene finder using multiple homologous sequences. Nucleic Acids Res. 31, 3742–3745.
46. Smith, T. F. and Waterman, M. S. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197.
47. Yada, T., Takagi, T., Totoki, Y., Sakaki, Y., and Takaeda, Y. (2003) DIGIT: a novel gene finding program by combining gene-finders. Pac. Symp. Biocomput. 2003, 375–387.
48. Quandt, K., Frech, K., Karas, H., Wingender, E., and Werner, T. (1995) MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res. 23, 4878–4884.
49. Prestridge, D. S. (1991) SIGNAL SCAN: a computer program that scans DNA sequences for eukaryotic transcriptional elements. CABIOS 7, 203–206.
50. Wingender, E., Chen, X., Hehl, R., et al. (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res. 28, 316–319.
51. Prestridge, D. S. (1995) Predicting Pol II promoter sequences using transcription factor binding sites. J. Mol. Biol. 249, 923–932.
52. Eddy, S. R. (1996) Hidden Markov models. Curr. Opin. Struct. Biol. 6, 361–365.
53. Eddy, S. R. (1998) Profile hidden Markov models. Bioinformatics 14, 755–763.
54. Baldi, P. and Brunak, S. (1998) Bioinformatics: The Machine Learning Approach. MIT Press, Cambridge, MA.
55. Korenberg, M. J., David, R., Hunter, I. W., and Solomon, J. E. (2000) Automatic classification of protein sequences into structure/function groups via parallel cascade identification: a feasibility study. Ann. Biomed. Eng. 28, 803–811.
56. Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.
57. Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F., and Higgins, D. G. (1997) The CLUSTAL X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882.
58. Nicholas, K. B., Nicholas, H. B., Jr., and Deerfield, D. W., II. (1997) GeneDoc: analysis and visualization of genetic variation. EMBNEW.NEWS 4, 14.
59. Lake, J. A. (1994) Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proc. Natl. Acad. Sci. USA 91, 1451–1459.
60. Lockhart, P. J., Steel, M. A., Hendy, M. D., and Penny, D. (1994) Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol. 11, 605–612.
61. Brocchieri, L. (2001) Phylogenetic inferences from molecular sequences: review and critique. Theor. Popul. Biol. 59, 27–40.
62. Stewart, C.-B. (1993) The powers and pitfalls of parsimony. Nature 361, 603–607.
63. Attwood, T. K., Beck, M. E., Flower, D. R., Scordis, P., and Selley, J. N. (1998) The PRINTS protein fingerprint database in its fifth year. Nucleic Acids Res. 26, 304–308.
64. Page, R. D. (1996) TreeView: an application to display phylogenetic trees on personal computers. Comput. Appl. Biosci. 12, 357–358.
65. Gasteiger, E., Gattiker, A., Hoogland, C., Ivanyi, I., Appel, R. D., and Bairoch, A. (2003) ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res. 31, 3784–3788.
66. Rost, B. (1996) PHD: predicting one-dimensional protein structure by profile based neural networks. Methods Enzymol. 266, 525–539.
67. Eyrich, V. A. and Rost, B. (2003) META-PP: single interface to crucial prediction servers. Nucleic Acids Res. 31, 3308–3310.
68. Nielsen, H., Engelbrecht, J., Brunak, S., and von Heijne, G. (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 10, 1–6.
69. Hansen, J. E., Lund, O., Tolstrup, N., Gooley, A. A., Williams, K. L., and Brunak, S. (1998) NetOglyc: prediction of mucin type O-glycosylation sites based on sequence context and surface accessibility. Glycoconjugate J. 15, 115–130.
70. Hansen, J. E., Lund, O., Rapacki, K., and Brunak, S. (1997) O-glycbase version 2.0: a revised database of O-glycosylated proteins. Nucleic Acids Res. 25, 278–282.
71. Hansen, J. E., Lund, O., Rapacki, K., et al. (1995) Prediction of O-glycosylation of mammalian proteins: specificity patterns of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase. Biochem. J. 308, 801–813.
72. Blom, N., Gammeltoft, S., and Brunak, S. (1999) Sequence- and structure-based prediction of eukaryotic protein phosphorylation sites. J. Mol. Biol. 294, 1351–1362.
73. Blom, N., Hansen, J., Blaas, D., and Brunak, S. (1996) Cleavage site analysis in picornaviral polyproteins: discovering cellular targets by neural networks. Protein Sci. 5, 2203–2216.
74. Emanuelsson, O., Nielsen, H., and von Heijne, G. (1999) ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Protein Sci. 8, 978–984.
75. Cuff, J. A. and Barton, G. J. (1999) Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins 34, 508–519.
76. Sonnhammer, E. L. L., von Heijne, G., and Krogh, A. (1998) A hidden Markov model for predicting transmembrane helices in protein sequences. In Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology (ISMB98), pp. 175–182.
77. von Heijne, G. (1992) Membrane protein structure prediction, hydrophobicity analysis and the positive-inside rule. J. Mol. Biol. 225, 487–494.
78. Karplus, K., Barrett, C., and Hughey, R. (1998) Hidden Markov models for detecting remote protein homologies. Bioinformatics 14, 846–856.
79. Cserzo, M., Wallin, E., Simon, I., von Heijne, G., and Elofsson, A. (1997) Prediction of transmembrane alpha-helices in prokaryotic membrane proteins: the dense alignment surface method. Protein Eng. 10, 673–676.
80. Fischer, D. and Eisenberg, D. A. (1996) Fold recognition using sequence-derived properties. Protein Sci. 5, 947–955.
81. Elofsson, A., Fischer, D., Rice, D. W., LeGrand, S., and Eisenberg, D. (1996) A study of combined structure-sequence profiles. Folding Design 1, 451–461.
82. Karplus, K., Karchin, R., Draper, J., et al. (2003) Combining local-structure, fold-recognition, and new-fold methods for protein structure prediction. Proteins 53(Suppl 6), 491–496.
83. Peitsch, M. C. (1995) Protein modelling by E-mail. Bio/Technology 13, 658–660.
84. Peitsch, M. C. (1996) ProMod and Swiss-Model: internet-based tools for automated comparative protein modelling. Biochem. Soc. Trans. 24, 274–279.
85. Guex, N. and Peitsch, M. C. (1997) SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modelling. Electrophoresis 18, 2714–2723.
86. Lund, O., Frimand, K., Gorodkin, J., et al. (1997) Protein distance constraints predicted by neural networks and probability density functions. Protein Eng. 10, 1241–1248.
87. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
88. Altschul, S. F. (1991) Amino acid substitution matrices from an information theoretic perspective. J. Mol. Biol. 219, 555–565.
89. Altschul, S. F. and Gish, W. (1996) Local alignment statistics. Methods Enzymol. 266, 460–480.
90. Rost, B., Schneider, R., and Sander, C. (1997) Protein fold recognition by prediction-based threading. J. Mol. Biol. 270, 471–480.
91. Dayhoff, M. O., Barker, W. C., and Hunt, L. T. (1983) Establishing homologies in protein sequences. Methods Enzymol. 91, 524–545.
92. Henikoff, S. and Henikoff, J. G. (1992) Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10915–10919.
93. Pearson, W. R. (1995) Comparison of methods for searching protein sequence databases. Protein Sci. 4, 1145–1160.
94. Karlin, S. and Altschul, S. F. (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. USA 87, 2264–2268.
95. Wootton, J. C. (1994) Non-globular domains in protein sequences: automated segmentation using complexity measures. Comput. Chem. 18, 269–285.
96. Altschul, S. F., Madden, T. L., Schäffer, A. A., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
97. Pearson, W. R. and Lipman, D. J. (1988) Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444–2448.
98. Martin, A. C., Orengo, C. A., Hutchinson, E. G., et al. (1998) Protein folds and functions. Structure 6, 875–884.
99. McGuffin, L. J., Bryson, K., and Jones, D. T. (2001) What are the baselines for protein fold recognition? Bioinformatics 17, 63–72.
100. Bairoch, A. (1991) PROSITE: a dictionary of sites and patterns in proteins. Nucleic Acids Res. 19, 2241–2245.
101. Bairoch, A., Bucher, P., and Hofmann, K. (1997) The PROSITE database, its status in 1997. Nucleic Acids Res. 25, 217–221.
102. Bucher, P., Karplus, K., Moeri, N., and Hofmann, K. (1996) A flexible motif search technique based on generalized profiles. Comput. Chem. 20, 3–23.
103. Sonnhammer, E. L. and Kahn, D. (1994) Modular arrangement of proteins as inferred from analysis of homology. Protein Sci. 3, 482–492.
104. Corpet, F., Gouzy, J., and Kahn, D. (1998) The ProDom database of protein domain families. Nucleic Acids Res. 26, 323–326.
105. Sonnhammer, E. L., Eddy, S. R., and Durbin, R. (1997) Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 28, 405–420.
106. Bateman, A., Birney, E., Cerruti, L., et al. (2002) The Pfam protein families database. Nucleic Acids Res. 30, 276–280.
107. Apweiler, R., Attwood, T. K., Bairoch, A., et al. (2001) The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res. 29, 37–40.
108. Mulder, N. J., Apweiler, R., Attwood, T. K., et al. (2003) The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res. 31, 315–318.
109. Rawlings, N. D., O’Brien, E., and Barrett, A. J. (2002) MEROPS: the protease database. Nucleic Acids Res. 30, 343–346.
110. Storm, C. E. and Sonnhammer, E. L. (2001) NIFAS: visual analysis of domain evolution in proteins. Bioinformatics 17, 343–348.
111. Schultz, J., Milpetz, F., Bork, P., and Ponting, C. P. (1998) SMART, a simple modular architecture research tool: identification of signaling domains. Proc. Natl. Acad. Sci. USA 95, 5857–5864.
112. Schultz, J., Copley, R. R., Doerks, T., Ponting, C. P., and Bork, P. (2000) SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res. 28, 231–234.
113. Letunic, I., Goodstadt, L., Dickens, N. J., et al. (2002) Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res. 30, 242–244.
114. Pietrokovski, S., Henikoff, J. G., and Henikoff, S. (1996) The Blocks database: a system for protein classification. Nucleic Acids Res. 24, 197–200.
115. Attwood, T. K., Flower, D. R., Lewis, A. P., et al. (1999) PRINTS prepares for the new millennium. Nucleic Acids Res. 27, 220–225.
116. Silverstein, K. A., Shoop, E., Johnson, J. E., and Retzel, E. F. (2001) MetaFam: a unified classification of protein families. I. Overview and statistics. Bioinformatics 17, 249–261.
117. Yuan, Y. P., Eulenstein, O., Vingron, M., and Bork, P. (1998) Towards detection of orthologues in sequence databases. Bioinformatics 14, 285–289.
118. Bernstein, F. C., Koetzle, T. F., Williams, G. J., et al. (1977) The Protein Data Bank. A computer-based archival file for macromolecular structures. Eur. J. Biochem. 80, 319–324.
119. Berman, H. M., Westbrook, J., Feng, Z., et al. (2000) The Protein Data Bank. Nucleic Acids Res. 28, 235–242.
120. Murzin, A. G., Brenner, S. E., Hubbard, T., and Chothia, C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540.
121. Orengo, C. A., Michie, A. D., Jones, S., Jones, D. T., Swindells, M. B., and Thornton, J. M. (1997) CATH: a hierarchic classification of protein domain structures. Structure 5, 1093–1108.
122. Pearl, F. M. G., Lee, D., Bray, J. E., Sillitoe, I., Todd, A. E., Harrison, A. P., Thornton, J. M., and Orengo, C. A. (2000) Assigning genomic sequences to CATH. Nucleic Acids Res. 28, 277–282.
123. Peitsch, M. C. and Jongeneel, V. (1993) A 3-dimensional model for the CD40 ligand predicts that it is a compact trimer similar to the tumor necrosis factors. Int. Immunol. 5, 233–238.
124. Schwede, T., Kopp, J., Guex, N., and Peitsch, M. C. (2003) SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res. 31, 3381–3385.
125. Guex, N. and Peitsch, M. C. (1997) SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis 18, 2714–2723.
126. Combet, C., Jambon, M., Deleage, G., and Geourjon, C. (2002) Geno3D: automatic comparative molecular modelling of protein. Bioinformatics 18, 213–214.
127. Lambert, C., Leonard, N., De Bolle, X., and Depiereux, E. (2002) ESyPred3D: prediction of proteins 3D structures. Bioinformatics 18, 1250–1256.
128. Bader, G. D., Betel, D., and Hogue, C. W. (2003) BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res. 31, 248–250.
129. Xenarios, I., Rice, D. W., Salwinski, L., Baron, M. K., Marcotte, E. M., and Eisenberg, D. (2000) DIP: The Database of Interacting Proteins. Nucleic Acids Res. 28, 289–291.
130. Levinthal, C., Wodak, S. J., Kahn, P., and Dadivanian, A. K. (1975) Hemoglobin interaction in sickle cell fibers. I. Theoretical approaches to the molecular contacts. Proc. Natl. Acad. Sci. USA 72, 1330–1334.
131. Wodak, S. J. and Janin, J. (1978) Computer analysis of protein-protein interaction. J. Mol. Biol. 124, 323–342.
132. Janin, J., Henrick, K., Moult, J., et al. (2003) CAPRI: a Critical Assessment of PRedicted Interactions. Proteins 52, 2–9.
133. Taylor, R. D., Jewsbury, P. J., and Essex, J. W. (2002) A review of protein-small molecule docking methods. J. Comput. Aided Mol. Des. 16, 151–166.
134. Read, T. D., Peterson, S. N., Tourasse, N., et al. (2003) The genome sequence of Bacillus anthracis Ames and comparison to closely related bacteria. Nature 423, 81–86.
135. Ivanova, N., Sorokin, A., Anderson, I., et al. (2003) Genome sequence of Bacillus cereus and comparative analysis with Bacillus anthracis. Nature 423, 87–91.
136. Smith, D. R. (1996) Microbial pathogen genomes: new strategies for identifying therapeutics and vaccine targets. Trends Biotechnol. 14, 290–293.
137. Tatusov, R. L., Koonin, E. V., and Lipman, D. J. (1997) A genomic perspective on protein families. Science 278, 631–637.
138. Tatusov, R. L., Natale, D. A., Garkavtsev, I. V., et al. (2001) The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 29, 22–28.
139. Wheeler, D. L., Church, D. M., Federhen, S., et al. (2003) Database resources of the National Center for Biotechnology. Nucleic Acids Res. 31, 28–33.
140. Edgar, R., Domrachev, M., and Lash, A. E. (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207–210.
141. Rehm, B. H. A. and Reinecke, F. (2004) Evaluation of proteomic techniques: applications and potential. Curr. Proteomics 1, 103–111.


Author information

Authors and Affiliations

Institute of Molecular BioSciences, Massey University, Palmerston North, New Zealand
Bernd H. A. Rehm
Institut für Molekulaire Mikrobiologie und Biotechnologie, Westfälische Wilhelms-Universität Münster, Münster, Germany
Frank Reinecke


Editor information

Editors and Affiliations

University of Hertfordshire, Hatfield, UK
John M. Walker and Ralph Rapley


About this protocol

Cite this protocol

Rehm, B.H.A., Reinecke, F. (2005). Bioinformatic Tools for Gene and Protein Sequence Analysis. In: Walker, J.M., Rapley, R. (eds) Medical Biomethods Handbook. Springer Protocols Handbooks. Humana Press. https://doi.org/10.1385/1-59259-870-6:387


DOI: https://doi.org/10.1385/1-59259-870-6:387
Publisher Name: Humana Press
Print ISBN: 978-1-58829-288-9
Online ISBN: 978-1-59259-870-0
eBook Packages: Springer Protocols
