Abstract
Alignments of frequency profiles against frequency profiles have a wide scope of applications in current bioinformatics analysis tools, from multiple alignment methods based on the progressive alignment approach to the detection of structural similarities based on remote sequence homology. We present the new log average scoring approach for calculating the score used with alignment algorithms such as dynamic programming, and show that it significantly outperforms the commonly used average scoring and dot product approaches on a fold recognition benchmark. The score is also applicable to the problem of aligning two multiple alignments, since every multiple alignment induces a frequency profile.
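The abstract does not spell out the formulas, so the following Python sketch rests on assumed, commonly used definitions of the three column scores it names. For two profile columns alpha and beta and a matrix of substitution odds ratios, the dot product is taken as the sum of alpha[i] * beta[i], average scoring as the expected log-odds score, and log average scoring as the logarithm of the expected odds ratio. All function names, the two-letter alphabet, and the odds values are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def column_profile(column, alphabet):
    """Relative symbol frequencies of one multiple-alignment column.

    Characters outside `alphabet` (e.g. the gap symbol '-') are ignored.
    """
    counts = Counter(c for c in column if c in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("column contains no symbols from the alphabet")
    return [counts[a] / total for a in alphabet]


def dot_product_score(alpha, beta):
    """Assumed definition: sum_i alpha[i] * beta[i]."""
    return sum(a * b for a, b in zip(alpha, beta))


def average_score(alpha, beta, odds):
    """Assumed definition: sum_ij alpha[i] * beta[j] * log(odds[i][j])."""
    return sum(
        alpha[i] * beta[j] * math.log(odds[i][j])
        for i in range(len(alpha))
        for j in range(len(beta))
    )


def log_average_score(alpha, beta, odds):
    """Assumed definition: log(sum_ij alpha[i] * beta[j] * odds[i][j])."""
    return math.log(
        sum(
            alpha[i] * beta[j] * odds[i][j]
            for i in range(len(alpha))
            for j in range(len(beta))
        )
    )


# Toy two-letter alphabet and made-up odds ratios (illustration only).
ALPHABET = "AB"
ODDS = [[2.0, 0.5],
        [0.5, 2.0]]

# Each column of a multiple alignment induces a frequency vector.
alpha = column_profile("AAB-", ALPHABET)
beta = column_profile("ABBB", ALPHABET)

print(dot_product_score(alpha, beta))
print(average_score(alpha, beta, ODDS))
print(log_average_score(alpha, beta, ODDS))
```

Under these assumed definitions, the log average score is never smaller than the average score, because the logarithm is concave (Jensen's inequality). The two agree when both columns are concentrated on a single symbol each. Any of the three functions could serve as the column-pair score inside a dynamic programming alignment.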
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
von Ohsen, N., Zimmer, R. (2001). Improving Profile-Profile Alignments via Log Average Scoring. In: Gascuel, O., Moret, B.M.E. (eds) Algorithms in Bioinformatics. WABI 2001. Lecture Notes in Computer Science, vol 2149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44696-6_2
DOI: https://doi.org/10.1007/3-540-44696-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42516-8
Online ISBN: 978-3-540-44696-5
eBook Packages: Springer Book Archive