Skip to main content

Bioinformatics: A Challenge to Constraint Programming

  • Chapter
  • First Online:
Hybrid Optimization

Part of the book series: Springer Optimization and Its Applications ((SOIA,volume 45))

Abstract

Bioinformatics is a rapidly growing field at the intersection of biology and computer science. As such, it poses a wealth of problems, opportunities, and challenges for both areas. This paper overviews some of these issues, with an emphasis on those that seem most amenable to constraint programming (CP) approaches and where CP has made some progress. Since bioinformatics is tightly focused on real-life applications, this paper does not expand on theoretical principles but, rather, tries to give an idea of the practical issues. At this light, the paper briefly presents the selected problems together with the solutions found so far, that illustrate the versatility of CP techniques that have been used in this area and the need to integrate them with other complementary techniques to handle realistic applications.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 169.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147:195–197

    Google Scholar 

  2. Sankoff D (1985) Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM J Appl Math 45:810–825

    MathSciNet  MATH  Google Scholar 

  3. Lipman DJ, Pearson WR (1985) Rapid and sensitive protein similarity searches. Science 227(4693):1435–1441

    Google Scholar 

  4. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410

    Google Scholar 

  5. Roland HC Yap (2001) Parametric sequence alignment with constraints. Constraints 6(2–3):157–172

    MathSciNet  MATH  Google Scholar 

  6. Will S, Busch A, Backofen R (2008) Efficient sequence alignment with side-constraints by cluster tree elimination. Constraints 13(1–2):110–129

    MathSciNet  MATH  Google Scholar 

  7. Carlsson M, Beldiceanu N (2004) Multiplex dispensation order generation for pyrosequencing. In: CP’2004 workshop on CSP techniques with immediate application, Toronto, Canada, 27 September 2004

    Google Scholar 

  8. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL (2004) GenBank: update. Nucleic Acids Res 32(Database issue):D23–D26

    Google Scholar 

  9. Gent IP, Prosser P, Smith BM, Wei W (2003) Supertree construction using constraint programming. In: Proc CP2003. Lecture notes in computer science, vol 2833. Springer, Berlin

    Google Scholar 

  10. Moore NC, Prosser P (2008) The ultrametric constraint and its application to phylogenetics. J Artif Intell Res 32:901–938

    MathSciNet  MATH  Google Scholar 

  11. Brooks DR, Erdem E, Erdogan ST, Minett JW, Ringe D (2007) Inferring phylogenetic trees using answer set programming. J Automat Reason 39(4):471–511

    MathSciNet  MATH  Google Scholar 

  12. Wu G, You JH, Lin G (2007) Quartet-based phylogeny reconstruction with answer set programming. IEE/ACM Trans Comput Biol Bioinform 4(1):139–152

    Google Scholar 

  13. Clark AG (1990) Inference of haplotypes from PCR-amplified samples of diploid populations. Mol Biol Evol 77:111–122

    Google Scholar 

  14. Gusfield D (2003) Haplotype inference by pure parsimony. In: 14th Annual symposium on combinatorial pattern matching (CPM03). Springer, Heidelberg, pp 144–155

    Google Scholar 

  15. Huang Y-T et al (2005) An approximation algorithm for haplotype inference by maximum parsimony. J Comput Biol 12:1261–1274

    Google Scholar 

  16. Lancia G et al (2004) Haplotyping populations by pure parsimony: complexity of exact and approximation algorithms. INFORMS J Comput 16:348–359

    MathSciNet  MATH  Google Scholar 

  17. Cilibrasi R et al (2005) On the complexity of several haplotyping problems. In: 5th Workshop on algorithms in bioinformatics (WABI 2005). Springer, Mallorca, pp 128–139

    Google Scholar 

  18. Sharan R et al (2006) Islands of tractability for parsimony haplotyping. IEEE/ACM Trans Comput Biol Bioinform 3:303–311

    Google Scholar 

  19. Brown D, Harrower I (2006) Integer programming approaches to haplotype inference by pure parsimony. IEEE/ACM Trans Comput Biol Bioinform 3(2):141–154

    Google Scholar 

  20. Wang L, Xu Y (2003) Haplotype inference by maximum parsimony. Bioinformatics 19:1773–1780

    Google Scholar 

  21. Lynce I, Marques-Silva J (2006) Efficient haplotype inference with Boolean satisfiability. In: AAAI conference on artificial intelligence, pages 104109, July 2006

    Google Scholar 

  22. Lynce I, Marques-Silva J, Prestwich S (2008) Boosting haplotype inference with local search. Constraints 13(1):155–179

    MathSciNet  MATH  Google Scholar 

  23. Lynce I, Graa A, Marques-Silva J, Oliveira AL (2008) Haplotype inference with boolean constraint solving: an overview. In: Proceedings of 20th IEEE international conference on tools with artificial intelligence (ICTAI 08), Dayton, OH, 2008

    Google Scholar 

  24. Erdem E, Erdem O, Türe F (2009)In: HAplo-ASP: haplotype inference using answer set programming, LPNMR09. Lecture notes in computer science, vol 5753. Springer, Berlin, pp 573–578

    Google Scholar 

  25. Benedettini S, Roli A, Di Gaspero L (2008) Two-level ACO for haplotype inference under pure parsimony. In: ANTS conference, 2008, pp 179–190

    Google Scholar 

  26. Climer S, Jäger G, Templeton AR, Zhang W (2009) How frugal is mother nature with haplotypes? Bioinformatics 25(1):68–74

    Google Scholar 

  27. Eddy SR (2001) Non-coding RNA genes and the modern RNA world. Nat Rev Genet 2(12):919–929

    Google Scholar 

  28. Tinoco I, Bustamante C (1999) How RNA folds. J Mol Biol 293:271–281

    Google Scholar 

  29. Moore PB (1999) The RNA folding problem. In: The RNA world, 2nd edn. CSHL Press, Cold Spring Harbor, pp 381–401

    Google Scholar 

  30. Waterman MS (1995) RNA secondary structure. In: Introduction to computational biology. Chapman and Hall, London, pp 327–343

    Google Scholar 

  31. Wu M, Tinoco I (1998) RNA folding causes secondary structure rearrangement. Proc Natl Acad Sci USA 95:11555–11560

    Google Scholar 

  32. Capriotti E, Marti-Renom MA (2008) Computational RNA structure prediction. Curr Bioinform 3(1):32–45

    Google Scholar 

  33. Nussinov R, Jacobson AB (1980) Fast algorithm for predicting the secondary structure of single stranded RNA. Proc Natl Acad Sci USA 77:6309–6313

    Google Scholar 

  34. Jaeger JA, Turner DH, Zuker M (1989) Improved predictions of secondary structures for RNA. Proc Natl Acad Sci USA 86:7706–7710

    Google Scholar 

  35. Knudsen B, Hein J (1999) RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics 15(6):446–454

    Google Scholar 

  36. Gaspin C, Westhof E (1995) An interactive framework for RNA secondary structure prediction with a dynamical treatment of constraints. J Mol Biol 254(2):163–174

    Google Scholar 

  37. Gaspin C (2001) RNA secondary structure determination and representation based on constraints satisfaction. Constraints 6(2–3):201–221

    MathSciNet  MATH  Google Scholar 

  38. Thebault P, de Givry S, Schiex T, Gaspin C (2006) Searching RNA motifs and their intermolecular contacts with constraint networks. Bioinformatics 22(17):2074–2080

    Google Scholar 

  39. Zytnicki M, Gaspin C, Schiex T (2008) Darn! a weighted constraint solver for RNA motif localization. Constraints 13(1–2):91–109

    MathSciNet  MATH  Google Scholar 

  40. Billoud B, Kontic M, Viari A (1996) Palingol: declarative programming language to describe nucleic acids secondary structures and to scan sequence databases. Nucleic Acids Res 24(8):1395–1404

    Google Scholar 

  41. Harmanci AO, Sharma G, Mathews DH (2007) Efficient pairwise RNA structure prediction using probabilistic alignment constraints in dynalign. BMC Bioinformatics 8:130

    Google Scholar 

  42. Dowell RD, Eddy SR (2006) Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints. BMC Bioinformatics 7:400

    Google Scholar 

  43. Major F, Turcotte M, Gautheret D, Lapalme G, Fillion E, Cedergren R (1991) The combination of symbolic and numerical computation for three-dimensional modeling of RNA. Science 253:1255–1260

    Google Scholar 

  44. Shapiro BA, Yingling YG, Kasprzak W, Bindewald E (2007) Bridging the gap in RNA structure prediction. Curr Opin Struct Biol 17(2):157–165-2pc]Reference [44] is not cited in text. Please provide appropriate text citation or remove from list.

    Google Scholar 

  45. Gautheret D, Major F, Cedergren R (1993) Modeling the threedimensional structure of RNA using discrete nucleotide conformational sets. J Mol Biol 229:1049–1064

    Google Scholar 

  46. Shapiro BA, Yingling YG, Kasprzak W, Bindewald E (2007) Bridging the gap in RNA structure prediction. Curr Opin Struct Biol 17(2):157–165

    Google Scholar 

  47. Gutell RR, Power A, Hertz GZ, Putz EJ, Stormo GD (1992) Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods. Nucleic Acids Res 20:5785–5795

    Google Scholar 

  48. Leontis NB, Lescoute A, Westhof E (2006) The building blocks and motifs of RNA architecture. Curr Opin Struct Biol 16(3):279–287

    Google Scholar 

  49. Lazaridis T, Karplus M (2000) Effective energy functions for protein structure prediction. Curr Opin Struct Biol 10(2):139–145

    Google Scholar 

  50. Shirts MR, Pande VS (2000) Screen savers of the world, unite! Science 290:1903–1904

    Google Scholar 

  51. Dill KA, Bromberg S, Yue K, Fiebig KM, Yee DP, Thomas PD, Chan HS (1995) Principles of protein folding – a perspective of simple exact models. Protein Sci 4:561–602

    Google Scholar 

  52. Lau KF, Dill KA (1989) A lattice statistical mechanics model of the conformational and sequence spaces of proteins. Macromolecules 22:3986–3997

    Google Scholar 

  53. Crescenzi P, Goldman D, Papadimitriou C, Piccolboni A, Yannakakis M ( 1998) On the complexity of protein folding. J Comput Biol 5(3):423–466

    MATH  Google Scholar 

  54. Berger B, Leighton T (1998) Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete. J Comput Biol 5(3):27–40

    Google Scholar 

  55. Yue K, Dill KA (1996) Folding proteins with a simple energy function and extensive conformational search. Protein Sci 5(2):254–261

    Google Scholar 

  56. Abkevitch VI, Gutin AM, Shakhnovich EI (1995) Impact of local and non-local interactions oin thermodynamics and kinetics of protein folding. J Mol Biol 252:460–471

    Google Scholar 

  57. Unger R, Moult J (1996) Local interactions dominate folding in a simple protein model. J Mol Biol 259:988–994

    Google Scholar 

  58. Hinds DA, Levitt M (1996) From structure to sequence and back again. J Mol Biol 258:201–209

    Google Scholar 

  59. Bornberg-Bauer E (1997) Chain growth algorithms for HP-type lattice proteins. In: Proceedings of RECOMB97. 1st International conference on Research in computational molecular biology, pp 47–55

    Google Scholar 

  60. Backofen R (1998) Constraint techniques for solving the protein structure prediction problem. In: Proceedings of CP98. Lecture notes in computer science, vol 1520, pp 72–86

    Google Scholar 

  61. Backofen R, Will S (2002) Excluding symmetries in constraint-based search. Constraints 7(3):333–349

    MathSciNet  MATH  Google Scholar 

  62. Backofen R, Will S, Bornberg-Bauer E (1999) Application of constraint programming techniques for structure prediction of lattice proteins with extended alphabets. Bioinformatics 15(3):234–242

    Google Scholar 

  63. Bagci Z, Jernigan RL, Bahar I (2002) Residue coordination in proteins conforms to the closest packing of spheres. Polymer 43:451–459

    Google Scholar 

  64. Cipra B (1998) Packing challenge mastered at last. Science 281:1267

    Google Scholar 

  65. Park BH, Levitt M (1995) The complexity and accuracy of discrete state models of protein structure. J Mol Biol 249:493–507

    Google Scholar 

  66. Cooperativity in protein-folding kinetics. Proc Natl Acad Sci USA 90:1942–1946 (1993)

    Google Scholar 

  67. Unger R, Moult J (1993) Genetic algorithms for protein folding simulations. J Mol Biol 231:75–81

    Google Scholar 

  68. Agarwala R, Batzoglou S, Dancik V, Decatur SE, Farach M, Hannenhalli S, Muthukrishnan S, Skiena S (1997) Local rules for protein folding on a triangular lattice and generalized hydrophobicity in the HP-model. J Comput Biol 4(2):275–296

    MATH  Google Scholar 

  69. Backofen R, Will S (2006) A constraint-based approach to fast and exact structure prediction in three-dimensional protein models. Constraints 11(1):5–30

    MathSciNet  MATH  Google Scholar 

  70. Backofen R, Will S (2001) Fast, constraint-based threading of HP-sequences to hydrophobic cores. In: Proceedings of CP01. Lecture notes in computer science, vol 2239, pp 494–508

    MATH  Google Scholar 

  71. Cebrian M, Dotu I, Van Hentenryck P, Clote P (2008) Protein structure prediction on the face centered cubic lattice by local search. In: Proceedings of AAAI08, pp 241–245

    Google Scholar 

  72. Dot I, Cebrin M, Van Hentenryck P, Clote P (2008) Protein structure prediction with large neighborhood constraint programming search. In: Proceedings of CP08. Lecture notes in computer science, vol 5202, pp 82–96

    Google Scholar 

  73. Dal Pal A, Dovier A, Fogolari F (2004) Constraint logic programming approach to protein structure prediction. BMC Bioinformatics 5:186

    Google Scholar 

  74. Dal Pal A, Dovier A, Pontelli E (2007) A constraint solver for discrete lattices, its paralelization and application to protein structure prediction. Software Pract Ex 37(13):1405–1449

    Google Scholar 

  75. Cipriano R, Pal AD, Dovier A (2008) A hybrid approach mixing local search and constraint programming applied to the protein structure prediction problem. In: Proceedings of WCB08, Paris, May 2008

    Google Scholar 

  76. Zhang Y (2008) I-TASSER server for protein 3D structure prediction. BMC Bioinformatics 9:40

    Google Scholar 

  77. Fischer D (2006) Servers for protein structure prediction. Curr Opin Struct Biol 16:178–182

    Google Scholar 

  78. Bonneau R, Tsai J, Ruczinski I, Chivian D, Rohl C, Strauss CE, Baker D (2001) Rosetta in CASP4: progress in ab initio protein structure prediction. Proteins 45(S5)119–126

    Google Scholar 

  79. Moult J (2005) A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Curr Opin Struct Biol 15:285–289

    Google Scholar 

  80. Skolnick J, Kolinski A, Kihara D, Betancourt M, Rotkiewicz P, Boniecki M (2001) Ab initio protein structure prediction via a combination of threading, lattice folding, clustering, and structure refinement. PROTEINS Suppl 5:149–156

    Google Scholar 

  81. Gntert P, Mumenthaler C, Wthrich K (1997) Torsion angle dynamics for NMR structure calculation with the new program DYANA. J Mol Biol 273:283–298

    Google Scholar 

  82. Krippahl L, Barahona P (2002) PSICO: solving protein structures with constraint programming and optimisation. Constraints 7:317–331

    MathSciNet  MATH  Google Scholar 

  83. Krippahl L, Barahona P (2003) Propagating N-ary rigid-body constraints. In: Francesca Rossi (ed) CP’2003: principles and practice of constraint programming, October 2003. Lecture notes in computer science, vol 2833. Springer, pp 452–465

    Google Scholar 

  84. Katchalski-Katzir E, Shariv I, Eisenstein M, Friesem AA, Aflalo C, Vakser IA (1992) Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques. Proc Natl Acad Sci USA 89(6):2195–2199

    Google Scholar 

  85. Krippahl L, Barahona P (2005) Applying constraint programming to rigid body protein docking. In: van Beek P (ed) CP’2005: principles and practice of constraint programming. Lecture notes in computer science, vol 3709. Springer, Berlin, pp 373–387

    MATH  Google Scholar 

  86. Krippahl L, Moura JJ, Palma PN (2003) Modeling protein complexes with bigger. Proteins 52(1):19–23

    Google Scholar 

  87. Dominguez C, Boelens R, Bonvin AMJJ (2003) HADDOCK: a protein–protein docking approach based on biochemical and/or biophysical information. J Am Chem Soc 125:1731–1737

    Google Scholar 

  88. de Vries SJ, van Dijk ADJ, Krzeminski M, van Dijk M, Thureau A, Hsu V, Wassenaar T, Bonvin AMJJ (2007) HADDOCK versus HADDOCK: New features and performance of HADDOCK2.0 on the CAPRI targets. Proteins 69:726–733

    Google Scholar 

  89. Kitano H (ed) (2001) Foundations of system biology. MIT Press, Camdridge

    Google Scholar 

  90. Bower JM, Bolouri H (eds) (2001) Computational modeling of genetic and biochemical networks. MIT Press, Camdridge

    Google Scholar 

  91. Kauffman SA (1993) The origins of order. Oxford University Press, New York

    Google Scholar 

  92. Thieffry D, Thomas R (1998) Qualitative analysis of gene networks. Pac Symp Biocomput 3:77–88

    Google Scholar 

  93. Reddy VN, Mavrovouniotis ML, Liebman ML (1993) Petri net representation in metabolic pathways. Proc Int Conf Intell Syst Mol Biol 1:328–336

    Google Scholar 

  94. Regev A, Silverman W, Shapiro E (2001) Representation and simulation of bio-chemical processes using the pcalculus process algebra. Pac Symp Biocomput 6:459–470

    Google Scholar 

  95. Cardelli L (2005) Abstract machines of systems biology. Trans Comput Syst Biol 3737: 145–168

    MATH  Google Scholar 

  96. Calzonne L, Fages F, Soliman S (2006) BIOCHAM. An environment for modeling biological systems and formalizing experimental knowledge. Bioinformatics 22(14):1805–1807

    Google Scholar 

  97. Cruz J, Barahona P (2003) Constraint satisfaction differential problems. In: Proceedings of CP03. Lecture notes in computer science, vol 2833, pp 259–273

    MATH  Google Scholar 

  98. Cruz J, Barahona P (2005) Constraint reasoning in deep biomedical models. Artif Intell Med 34:77–88

    Google Scholar 

  99. Bockmayr A, Courtois A (2002) Using hybrid concurrent constraint programming to model dynamic biological systems. In: ICLP02. Lecture notes in computer science, vol 2401, pp 85–99

    MATH  Google Scholar 

  100. Dooms G, Deville Y, Dupont P (2005) CP (Graph): introducing a graph computation domain in constraint programming. In: Proceedings of CP05. Lecture notes in computer science, vol 3709, pp 211–225

    MATH  Google Scholar 

  101. Gebser M, Schaub T, Thiele S, Usadel B, Veber P (2008) Detecting inconsistencies in large influence networks with answer set programming. In: International conference on logic programming, 2008

    MATH  Google Scholar 

  102. Dworschak S, Grell S, Nikiforova VJ, Schaub T, Selbig J (2008) Modeling biological networks by action languages via answer set programming. Constraints 13(1–2):21–65

    MathSciNet  MATH  Google Scholar 

  103. Kanehisa M, Goto S (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28:27–30

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Pedro Barahona .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2011 Springer Science+Business Media LLC

About this chapter

Cite this chapter

Barahona, P., Krippahl, L., Perriquet, O. (2011). Bioinformatics: A Challenge to Constraint Programming. In: van Hentenryck, P., Milano, M. (eds) Hybrid Optimization. Springer Optimization and Its Applications, vol 45. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-1644-0_14

Download citation

Publish with us

Policies and ethics