Improving Profile-Profile Alignments via Log Average Scoring

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2149)

Abstract

Alignments of frequency profiles against frequency profiles have a wide range of applications in current bioinformatics analysis tools, from multiple alignment methods based on the progressive alignment approach to the detection of structural similarities based on remote sequence homology. We present the new log average scoring approach for calculating the score used with alignment algorithms such as dynamic programming, and show that it significantly outperforms the commonly used average scoring and dot product approaches on a fold recognition benchmark. The score is also applicable to the problem of aligning two multiple alignments, since every multiple alignment induces a frequency profile.
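The abstract names three column scores but gives no formulas. The sketch below is therefore an illustration under stated assumptions, not the authors' implementation. It assumes the dot product score is the inner product of the two frequency vectors, the average score is the expected log-odds value under the two profiles, and the log average score is the logarithm of the expected odds ratio. The two-letter alphabet, the matrix `TOY_LOG_ODDS`, and all function names are hypothetical. `column_profile` shows how one column of a multiple alignment induces a frequency vector.

```python
import math
from collections import Counter


def column_profile(column, alphabet):
    """Relative symbol frequencies of one alignment column (gaps are ignored)."""
    counts = Counter(c for c in column if c in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("column has no symbols from the alphabet")
    return [counts[a] / total for a in alphabet]


def dot_product_score(alpha, beta):
    """Assumed definition: sum_i alpha_i * beta_i."""
    return sum(a * b for a, b in zip(alpha, beta))


def average_score(alpha, beta, log_odds):
    """Assumed definition: expected log-odds, sum_ij alpha_i * beta_j * M_ij."""
    return sum(a * b * log_odds[i][j]
               for i, a in enumerate(alpha)
               for j, b in enumerate(beta))


def log_average_score(alpha, beta, log_odds):
    """Assumed definition: log of the expected odds ratio,
    log(sum_ij alpha_i * beta_j * exp(M_ij)), with M in natural-log units."""
    mean_odds = sum(a * b * math.exp(log_odds[i][j])
                    for i, a in enumerate(alpha)
                    for j, b in enumerate(beta))
    return math.log(mean_odds)


# Hypothetical two-letter alphabet and log-odds matrix, for illustration only.
ALPHABET = "AB"
TOY_LOG_ODDS = [[1.0, -1.0],
                [-1.0, 1.0]]

if __name__ == "__main__":
    # One column from each of two small multiple alignments ('-' is a gap).
    alpha = column_profile("AAB-", ALPHABET)
    beta = column_profile("AB", ALPHABET)
    print("dot product:", dot_product_score(alpha, beta))
    print("average:    ", average_score(alpha, beta, TOY_LOG_ODDS))
    print("log average:", log_average_score(alpha, beta, TOY_LOG_ODDS))
```

Under these assumed definitions, the average and log average scores coincide when both columns contain a single residue type, and by Jensen's inequality the log average score is never smaller than the average score. Any of the three functions could serve as the match score inside a standard dynamic programming alignment of two profiles.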



Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

von Ohsen, N., Zimmer, R. (2001). Improving Profile-Profile Alignments via Log Average Scoring. In: Gascuel, O., Moret, B.M.E. (eds) Algorithms in Bioinformatics. WABI 2001. Lecture Notes in Computer Science, vol 2149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44696-6_2

  • DOI: https://doi.org/10.1007/3-540-44696-6_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42516-8

  • Online ISBN: 978-3-540-44696-5

  • eBook Packages: Springer Book Archive
