Abstract
Genome-wide variant detection within a species is the essential first step towards linking genotypic variation to phenotypes. Converting these genetic variants (the most prevalent being single-nucleotide polymorphisms, or SNPs) into genetic markers is particularly important in agronomically valuable crop species, where such markers enable cost-effective marker-assisted selection, whole-genome fingerprinting, association studies, map-based gene cloning and population-based analyses. Towards these goals, a growing number of large-scale variant discovery initiatives are being carried out on next-generation sequencing platforms, which make variant discovery drastically faster and cheaper and provide a far more comprehensive view of the genome or transcriptome. This review summarizes the current status of these initiatives and discusses the expanding role of next-generation sequencing technologies in facilitating crop improvement.
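At its simplest, SNP detection from next-generation sequencing reads means comparing the bases of reads aligned at each reference position against the reference base and flagging positions where an alternative base is well supported. The sketch below illustrates that idea with plain allele counting. It is a simplified illustration, not the method of any particular study or tool: the pileup representation (a dict from position to observed bases) and the thresholds `min_depth` and `min_alt_fraction` are hypothetical choices, and practical variant callers additionally weigh base-call and mapping quality and account for ploidy.

```python
from collections import Counter


def call_candidate_snps(reference, pileup, min_depth=8, min_alt_fraction=0.2):
    """Flag reference positions where a non-reference base is well supported.

    reference: reference sequence as a string.
    pileup: hypothetical input format, a dict mapping 0-based position to a
        string of read bases aligned at that position.
    min_depth, min_alt_fraction: illustrative thresholds (assumptions),
        not values taken from the review.
    Returns a list of (position, ref_base, alt_base, alt_count, depth).
    """
    snps = []
    for pos, bases in sorted(pileup.items()):
        depth = len(bases)
        if depth < min_depth:
            # Too few reads to separate a true variant from sequencing error.
            continue
        counts = Counter(b for b in bases.upper() if b in "ACGT")
        ref_base = reference[pos].upper()
        alt_counts = {b: c for b, c in counts.items() if b != ref_base}
        if not alt_counts:
            continue  # every read agrees with the reference
        # Keep the most frequent alternative allele at this position.
        alt_base, alt_count = max(alt_counts.items(), key=lambda kv: kv[1])
        if alt_count / depth >= min_alt_fraction:
            snps.append((pos, ref_base, alt_base, alt_count, depth))
    return snps


reference = "ACGTACGTAC"
pileup = {2: "GGGGGGGGGG", 5: "CCCCTTTTTT", 7: "TTT"}
print(call_candidate_snps(reference, pileup))
```

Here position 2 matches the reference in every read, position 7 is skipped for low depth, and only position 5 (C in the reference, T in 6 of 10 reads) is reported as a candidate SNP.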
Cite this article
Deschamps, S., Campbell, M.A. Utilization of next-generation sequencing platforms in plant genomics and genetic variant discovery. Mol Breeding 25, 553–570 (2010). https://doi.org/10.1007/s11032-009-9357-9
DOI: https://doi.org/10.1007/s11032-009-9357-9