Improving Profile-Profile Alignments via Log Average Scoring

von Ohsen, Niklas; Zimmer, Ralf

doi:10.1007/3-540-44696-6_2

Conference paper
First Online: 01 January 2001

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2149)

Abstract

Alignments of frequency profiles against frequency profiles are used widely in current bioinformatics analysis tools, ranging from multiple alignment methods based on the progressive alignment approach to the detection of structural similarities based on remote sequence homology. We present the new log average scoring approach for calculating the score used by alignment algorithms such as dynamic programming, and show that it significantly outperforms the commonly used average scoring and dot product approaches on a fold recognition benchmark. The score is also applicable to the problem of aligning two multiple alignments, since every multiple alignment induces a frequency profile.
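
As a rough illustration of the three scoring schemes named in the abstract, the Python sketch below scores a single pair of profile columns. The abstract does not state the formulas, so the definitions used here are assumptions: the dot product is taken as the sum of products of matching frequencies, average scoring as the expected substitution score under the two column distributions, and log average scoring as the logarithm of the expected odds ratio, with the substitution matrix assumed to hold natural-log log-odds values. The two-letter alphabet, the toy matrix, and all function names are hypothetical.

```python
import math

def column_frequencies(column, alphabet):
    """Relative frequencies of alphabet symbols in one alignment column.

    Characters outside the alphabet (e.g. gap symbols) are ignored.
    Assumes the column contains at least one alphabet symbol.
    """
    counts = [column.count(symbol) for symbol in alphabet]
    total = sum(counts)
    return [count / total for count in counts]

def dot_product_score(a, b):
    """Assumed dot product score: sum of products of matching frequencies."""
    return sum(x * y for x, y in zip(a, b))

def average_score(a, b, log_odds):
    """Assumed average score: expected log-odds value under a and b."""
    return sum(a[i] * b[j] * log_odds[i][j]
               for i in range(len(a)) for j in range(len(b)))

def log_average_score(a, b, log_odds):
    """Assumed log average score: log of the expected odds ratio.

    exp() turns each natural-log log-odds entry back into an odds ratio
    before averaging; the logarithm is taken once, after the average.
    """
    expected_odds = sum(a[i] * b[j] * math.exp(log_odds[i][j])
                        for i in range(len(a)) for j in range(len(b)))
    return math.log(expected_odds)

# Hypothetical two-letter alphabet and toy log-odds matrix (not real data).
ALPHABET = "AB"
LOG_ODDS = [[1.0, -1.0],
            [-1.0, 1.0]]

# Two columns taken from two hypothetical multiple alignments.
col_a = column_frequencies("AAB-", ALPHABET)
col_b = column_frequencies("ABBB", ALPHABET)

print(dot_product_score(col_a, col_b))
print(average_score(col_a, col_b, LOG_ODDS))
print(log_average_score(col_a, col_b, LOG_ODDS))
```

Under these assumed definitions, average scoring and log average scoring agree when each column holds a single residue, and by Jensen's inequality the log average score is never smaller than the average score. In a full alignment, a column-pair score like this would be plugged into the dynamic programming recurrence in place of a residue-residue substitution score.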

Author information

Authors and Affiliations

GMD—German National Research Center for Information Technology, SCAI—Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, Sankt Augustin, 53754, Germany
Niklas von Ohsen & Ralf Zimmer

Editor information

Editors and Affiliations

LIRMM, 161 rue Ada, 34392, Montpellier, France
Olivier Gascuel
Department of Computer Science, University of New Mexico, Albuquerque, NM, 87131, USA
Bernard M. E. Moret

About this paper

Cite this paper

von Ohsen, N., Zimmer, R. (2001). Improving Profile-Profile Alignments via Log Average Scoring. In: Gascuel, O., Moret, B.M.E. (eds) Algorithms in Bioinformatics. WABI 2001. Lecture Notes in Computer Science, vol 2149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44696-6_2

DOI: https://doi.org/10.1007/3-540-44696-6_2
Published: 17 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42516-8
Online ISBN: 978-3-540-44696-5
eBook Packages: Springer Book Archive
