Abstract
Bioinformatics is a rapidly growing field at the intersection of biology and computer science. As such, it poses a wealth of problems, opportunities, and challenges for both areas. This paper overviews some of these issues, with an emphasis on those that seem most amenable to constraint programming (CP) approaches and where CP has made some progress. Since bioinformatics is tightly focused on real-life applications, this paper does not expand on theoretical principles but, rather, tries to give an idea of the practical issues. At this light, the paper briefly presents the selected problems together with the solutions found so far, that illustrate the versatility of CP techniques that have been used in this area and the need to integrate them with other complementary techniques to handle realistic applications.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147:195–197
Sankoff D (1985) Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM J Appl Math 45:810–825
Lipman DJ, Pearson WR (1985) Rapid and sensitive protein similarity searches. Science 227(4693):1435–1441
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410
Roland HC Yap (2001) Parametric sequence alignment with constraints. Constraints 6(2–3):157–172
Will S, Busch A, Backofen R (2008) Efficient sequence alignment with side-constraints by cluster tree elimination. Constraints 13(1–2):110–129
Carlsson M, Beldiceanu N (2004) Multiplex dispensation order generation for pyrosequencing. In: CP’2004 workshop on CSP techniques with immediate application, Toronto, Canada, 27 September 2004
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL (2004) GenBank: update. Nucleic Acids Res 32(Database issue):D23–D26
Gent IP, Prosser P, Smith BM, Wei W (2003) Supertree construction using constraint programming. In: Proc CP2003. Lecture notes in computer science, vol 2833. Springer, Berlin
Moore NC, Prosser P (2008) The ultrametric constraint and its application to phylogenetics. J Artif Intell Res 32:901–938
Brooks DR, Erdem E, Erdogan ST, Minett JW, Ringe D (2007) Inferring phylogenetic trees using answer set programming. J Automat Reason 39(4):471–511
Wu G, You JH, Lin G (2007) Quartet-based phylogeny reconstruction with answer set programming. IEE/ACM Trans Comput Biol Bioinform 4(1):139–152
Clark AG (1990) Inference of haplotypes from PCR-amplified samples of diploid populations. Mol Biol Evol 77:111–122
Gusfield D (2003) Haplotype inference by pure parsimony. In: 14th Annual symposium on combinatorial pattern matching (CPM03). Springer, Heidelberg, pp 144–155
Huang Y-T et al (2005) An approximation algorithm for haplotype inference by maximum parsimony. J Comput Biol 12:1261–1274
Lancia G et al (2004) Haplotyping populations by pure parsimony: complexity of exact and approximation algorithms. INFORMS J Comput 16:348–359
Cilibrasi R et al (2005) On the complexity of several haplotyping problems. In: 5th Workshop on algorithms in bioinformatics (WABI 2005). Springer, Mallorca, pp 128–139
Sharan R et al (2006) Islands of tractability for parsimony haplotyping. IEEE/ACM Trans Comput Biol Bioinform 3:303–311
Brown D, Harrower I (2006) Integer programming approaches to haplotype inference by pure parsimony. IEEE/ACM Trans Comput Biol Bioinform 3(2):141–154
Wang L, Xu Y (2003) Haplotype inference by maximum parsimony. Bioinformatics 19:1773–1780
Lynce I, Marques-Silva J (2006) Efficient haplotype inference with Boolean satisfiability. In: AAAI conference on artificial intelligence, pages 104109, July 2006
Lynce I, Marques-Silva J, Prestwich S (2008) Boosting haplotype inference with local search. Constraints 13(1):155–179
Lynce I, Graa A, Marques-Silva J, Oliveira AL (2008) Haplotype inference with boolean constraint solving: an overview. In: Proceedings of 20th IEEE international conference on tools with artificial intelligence (ICTAI 08), Dayton, OH, 2008
Erdem E, Erdem O, Türe F (2009)In: HAplo-ASP: haplotype inference using answer set programming, LPNMR09. Lecture notes in computer science, vol 5753. Springer, Berlin, pp 573–578
Benedettini S, Roli A, Di Gaspero L (2008) Two-level ACO for haplotype inference under pure parsimony. In: ANTS conference, 2008, pp 179–190
Climer S, Jäger G, Templeton AR, Zhang W (2009) How frugal is mother nature with haplotypes? Bioinformatics 25(1):68–74
Eddy SR (2001) Non-coding RNA genes and the modern RNA world. Nat Rev Genet 2(12):919–929
Tinoco I, Bustamante C (1999) How RNA folds. J Mol Biol 293:271–281
Moore PB (1999) The RNA folding problem. In: The RNA world, 2nd edn. CSHL Press, Cold Spring Harbor, pp 381–401
Waterman MS (1995) RNA secondary structure. In: Introduction to computational biology. Chapman and Hall, London, pp 327–343
Wu M, Tinoco I (1998) RNA folding causes secondary structure rearrangement. Proc Natl Acad Sci USA 95:11555–11560
Capriotti E, Marti-Renom MA (2008) Computational RNA structure prediction. Curr Bioinform 3(1):32–45
Nussinov R, Jacobson AB (1980) Fast algorithm for predicting the secondary structure of single stranded RNA. Proc Natl Acad Sci USA 77:6309–6313
Jaeger JA, Turner DH, Zuker M (1989) Improved predictions of secondary structures for RNA. Proc Natl Acad Sci USA 86:7706–7710
Knudsen B, Hein J (1999) RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics 15(6):446–454
Gaspin C, Westhof E (1995) An interactive framework for RNA secondary structure prediction with a dynamical treatment of constraints. J Mol Biol 254(2):163–174
Gaspin C (2001) RNA secondary structure determination and representation based on constraints satisfaction. Constraints 6(2–3):201–221
Thebault P, de Givry S, Schiex T, Gaspin C (2006) Searching RNA motifs and their intermolecular contacts with constraint networks. Bioinformatics 22(17):2074–2080
Zytnicki M, Gaspin C, Schiex T (2008) Darn! a weighted constraint solver for RNA motif localization. Constraints 13(1–2):91–109
Billoud B, Kontic M, Viari A (1996) Palingol: declarative programming language to describe nucleic acids secondary structures and to scan sequence databases. Nucleic Acids Res 24(8):1395–1404
Harmanci AO, Sharma G, Mathews DH (2007) Efficient pairwise RNA structure prediction using probabilistic alignment constraints in dynalign. BMC Bioinformatics 8:130
Dowell RD, Eddy SR (2006) Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints. BMC Bioinformatics 7:400
Major F, Turcotte M, Gautheret D, Lapalme G, Fillion E, Cedergren R (1991) The combination of symbolic and numerical computation for three-dimensional modeling of RNA. Science 253:1255–1260
Shapiro BA, Yingling YG, Kasprzak W, Bindewald E (2007) Bridging the gap in RNA structure prediction. Curr Opin Struct Biol 17(2):157–165-2pc]Reference [44] is not cited in text. Please provide appropriate text citation or remove from list.
Gautheret D, Major F, Cedergren R (1993) Modeling the threedimensional structure of RNA using discrete nucleotide conformational sets. J Mol Biol 229:1049–1064
Shapiro BA, Yingling YG, Kasprzak W, Bindewald E (2007) Bridging the gap in RNA structure prediction. Curr Opin Struct Biol 17(2):157–165
Gutell RR, Power A, Hertz GZ, Putz EJ, Stormo GD (1992) Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods. Nucleic Acids Res 20:5785–5795
Leontis NB, Lescoute A, Westhof E (2006) The building blocks and motifs of RNA architecture. Curr Opin Struct Biol 16(3):279–287
Lazaridis T, Karplus M (2000) Effective energy functions for protein structure prediction. Curr Opin Struct Biol 10(2):139–145
Shirts MR, Pande VS (2000) Screen savers of the world, unite! Science 290:1903–1904
Dill KA, Bromberg S, Yue K, Fiebig KM, Yee DP, Thomas PD, Chan HS (1995) Principles of protein folding – a perspective of simple exact models. Protein Sci 4:561–602
Lau KF, Dill KA (1989) A lattice statistical mechanics model of the conformational and sequence spaces of proteins. Macromolecules 22:3986–3997
Crescenzi P, Goldman D, Papadimitriou C, Piccolboni A, Yannakakis M ( 1998) On the complexity of protein folding. J Comput Biol 5(3):423–466
Berger B, Leighton T (1998) Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete. J Comput Biol 5(3):27–40
Yue K, Dill KA (1996) Folding proteins with a simple energy function and extensive conformational search. Protein Sci 5(2):254–261
Abkevitch VI, Gutin AM, Shakhnovich EI (1995) Impact of local and non-local interactions oin thermodynamics and kinetics of protein folding. J Mol Biol 252:460–471
Unger R, Moult J (1996) Local interactions dominate folding in a simple protein model. J Mol Biol 259:988–994
Hinds DA, Levitt M (1996) From structure to sequence and back again. J Mol Biol 258:201–209
Bornberg-Bauer E (1997) Chain growth algorithms for HP-type lattice proteins. In: Proceedings of RECOMB97. 1st International conference on Research in computational molecular biology, pp 47–55
Backofen R (1998) Constraint techniques for solving the protein structure prediction problem. In: Proceedings of CP98. Lecture notes in computer science, vol 1520, pp 72–86
Backofen R, Will S (2002) Excluding symmetries in constraint-based search. Constraints 7(3):333–349
Backofen R, Will S, Bornberg-Bauer E (1999) Application of constraint programming techniques for structure prediction of lattice proteins with extended alphabets. Bioinformatics 15(3):234–242
Bagci Z, Jernigan RL, Bahar I (2002) Residue coordination in proteins conforms to the closest packing of spheres. Polymer 43:451–459
Cipra B (1998) Packing challenge mastered at last. Science 281:1267
Park BH, Levitt M (1995) The complexity and accuracy of discrete state models of protein structure. J Mol Biol 249:493–507
Cooperativity in protein-folding kinetics. Proc Natl Acad Sci USA 90:1942–1946 (1993)
Unger R, Moult J (1993) Genetic algorithms for protein folding simulations. J Mol Biol 231:75–81
Agarwala R, Batzoglou S, Dancik V, Decatur SE, Farach M, Hannenhalli S, Muthukrishnan S, Skiena S (1997) Local rules for protein folding on a triangular lattice and generalized hydrophobicity in the HP-model. J Comput Biol 4(2):275–296
Backofen R, Will S (2006) A constraint-based approach to fast and exact structure prediction in three-dimensional protein models. Constraints 11(1):5–30
Backofen R, Will S (2001) Fast, constraint-based threading of HP-sequences to hydrophobic cores. In: Proceedings of CP01. Lecture notes in computer science, vol 2239, pp 494–508
Cebrian M, Dotu I, Van Hentenryck P, Clote P (2008) Protein structure prediction on the face centered cubic lattice by local search. In: Proceedings of AAAI08, pp 241–245
Dot I, Cebrin M, Van Hentenryck P, Clote P (2008) Protein structure prediction with large neighborhood constraint programming search. In: Proceedings of CP08. Lecture notes in computer science, vol 5202, pp 82–96
Dal Pal A, Dovier A, Fogolari F (2004) Constraint logic programming approach to protein structure prediction. BMC Bioinformatics 5:186
Dal Pal A, Dovier A, Pontelli E (2007) A constraint solver for discrete lattices, its paralelization and application to protein structure prediction. Software Pract Ex 37(13):1405–1449
Cipriano R, Pal AD, Dovier A (2008) A hybrid approach mixing local search and constraint programming applied to the protein structure prediction problem. In: Proceedings of WCB08, Paris, May 2008
Zhang Y (2008) I-TASSER server for protein 3D structure prediction. BMC Bioinformatics 9:40
Fischer D (2006) Servers for protein structure prediction. Curr Opin Struct Biol 16:178–182
Bonneau R, Tsai J, Ruczinski I, Chivian D, Rohl C, Strauss CE, Baker D (2001) Rosetta in CASP4: progress in ab initio protein structure prediction. Proteins 45(S5)119–126
Moult J (2005) A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Curr Opin Struct Biol 15:285–289
Skolnick J, Kolinski A, Kihara D, Betancourt M, Rotkiewicz P, Boniecki M (2001) Ab initio protein structure prediction via a combination of threading, lattice folding, clustering, and structure refinement. PROTEINS Suppl 5:149–156
Gntert P, Mumenthaler C, Wthrich K (1997) Torsion angle dynamics for NMR structure calculation with the new program DYANA. J Mol Biol 273:283–298
Krippahl L, Barahona P (2002) PSICO: solving protein structures with constraint programming and optimisation. Constraints 7:317–331
Krippahl L, Barahona P (2003) Propagating N-ary rigid-body constraints. In: Francesca Rossi (ed) CP’2003: principles and practice of constraint programming, October 2003. Lecture notes in computer science, vol 2833. Springer, pp 452–465
Katchalski-Katzir E, Shariv I, Eisenstein M, Friesem AA, Aflalo C, Vakser IA (1992) Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques. Proc Natl Acad Sci USA 89(6):2195–2199
Krippahl L, Barahona P (2005) Applying constraint programming to rigid body protein docking. In: van Beek P (ed) CP’2005: principles and practice of constraint programming. Lecture notes in computer science, vol 3709. Springer, Berlin, pp 373–387
Krippahl L, Moura JJ, Palma PN (2003) Modeling protein complexes with bigger. Proteins 52(1):19–23
Dominguez C, Boelens R, Bonvin AMJJ (2003) HADDOCK: a protein–protein docking approach based on biochemical and/or biophysical information. J Am Chem Soc 125:1731–1737
de Vries SJ, van Dijk ADJ, Krzeminski M, van Dijk M, Thureau A, Hsu V, Wassenaar T, Bonvin AMJJ (2007) HADDOCK versus HADDOCK: New features and performance of HADDOCK2.0 on the CAPRI targets. Proteins 69:726–733
Kitano H (ed) (2001) Foundations of system biology. MIT Press, Camdridge
Bower JM, Bolouri H (eds) (2001) Computational modeling of genetic and biochemical networks. MIT Press, Camdridge
Kauffman SA (1993) The origins of order. Oxford University Press, New York
Thieffry D, Thomas R (1998) Qualitative analysis of gene networks. Pac Symp Biocomput 3:77–88
Reddy VN, Mavrovouniotis ML, Liebman ML (1993) Petri net representation in metabolic pathways. Proc Int Conf Intell Syst Mol Biol 1:328–336
Regev A, Silverman W, Shapiro E (2001) Representation and simulation of bio-chemical processes using the pcalculus process algebra. Pac Symp Biocomput 6:459–470
Cardelli L (2005) Abstract machines of systems biology. Trans Comput Syst Biol 3737: 145–168
Calzonne L, Fages F, Soliman S (2006) BIOCHAM. An environment for modeling biological systems and formalizing experimental knowledge. Bioinformatics 22(14):1805–1807
Cruz J, Barahona P (2003) Constraint satisfaction differential problems. In: Proceedings of CP03. Lecture notes in computer science, vol 2833, pp 259–273
Cruz J, Barahona P (2005) Constraint reasoning in deep biomedical models. Artif Intell Med 34:77–88
Bockmayr A, Courtois A (2002) Using hybrid concurrent constraint programming to model dynamic biological systems. In: ICLP02. Lecture notes in computer science, vol 2401, pp 85–99
Dooms G, Deville Y, Dupont P (2005) CP (Graph): introducing a graph computation domain in constraint programming. In: Proceedings of CP05. Lecture notes in computer science, vol 3709, pp 211–225
Gebser M, Schaub T, Thiele S, Usadel B, Veber P (2008) Detecting inconsistencies in large influence networks with answer set programming. In: International conference on logic programming, 2008
Dworschak S, Grell S, Nikiforova VJ, Schaub T, Selbig J (2008) Modeling biological networks by action languages via answer set programming. Constraints 13(1–2):21–65
Kanehisa M, Goto S (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28:27–30
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer Science+Business Media LLC
About this chapter
Cite this chapter
Barahona, P., Krippahl, L., Perriquet, O. (2011). Bioinformatics: A Challenge to Constraint Programming. In: van Hentenryck, P., Milano, M. (eds) Hybrid Optimization. Springer Optimization and Its Applications, vol 45. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-1644-0_14
Download citation
DOI: https://doi.org/10.1007/978-1-4419-1644-0_14
Published:
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-1643-3
Online ISBN: 978-1-4419-1644-0
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)