TBC: A clustering algorithm based on prokaryotic taxonomy

Lee, Jae-Hak; Yi, Hana; Jeon, Yoon-Seong; Won, Sungho; Chun, Jongsik

doi:10.1007/s12275-012-1214-6

Published: 27 April 2012

The Journal of Microbiology, volume 50, pages 181–185 (2012)

Jae-Hak Lee¹, Hana Yi², Yoon-Seong Jeon¹,⁵, Sungho Won³ & Jongsik Chun¹,²,⁴,⁵

Abstract

High-throughput DNA sequencing technologies have revolutionized the study of microbial ecology. Massive sequencing of PCR amplicons of the 16S rRNA gene has been widely used to understand the microbial community structure of a variety of environmental samples. The resulting sequencing reads are clustered into operational taxonomic units that are then used to calculate various statistical indices that represent the degree of species diversity in a given sample. Several algorithms have been developed to perform this task, but they tend to produce different outcomes. Herein, we propose a novel sequence clustering algorithm, namely Taxonomy-Based Clustering (TBC). This algorithm incorporates the basic concept of prokaryotic taxonomy in which only comparisons to the type strain are made and used to form species while omitting full-scale multiple sequence alignment. The clustering quality of the proposed method was compared with those of MOTHUR, BLASTClust, ESPRIT-Tree, CD-HIT, and UCLUST. A comprehensive comparison using three different experimental datasets produced by pyrosequencing demonstrated that the clustering obtained using TBC is comparable to those obtained using MOTHUR and ESPRIT-Tree and is computationally efficient. The program was written in JAVA and is available from http://sw.ezbiocloud.net/tbc.
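
The idea of comparing each read only to a cluster representative, and then deriving a diversity index from the resulting operational taxonomic units, can be illustrated with a minimal sketch. This is a generic greedy representative-based clustering, not the published TBC implementation: the function names, the `difflib`-based similarity measure, the 0.9 threshold, and the toy reads are all assumptions made for illustration, and the bias-corrected Chao1 richness estimator is chosen here only as one example of such an index.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in pairwise similarity in [0, 1] (assumption; not TBC's measure)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_by_representative(reads, threshold=0.97):
    """Greedy clustering: each read is compared only to cluster representatives.

    The first read that matches no existing representative founds a new
    cluster and becomes its representative (playing the role of a type strain).
    No multiple sequence alignment is computed.
    """
    clusters = []  # list of (representative, members) pairs
    for read in reads:
        for representative, members in clusters:
            if similarity(read, representative) >= threshold:
                members.append(read)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters.append((read, [read]))
    return clusters


def chao1(cluster_sizes):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    counts = Counter(cluster_sizes)
    singletons, doubletons = counts[1], counts[2]
    return len(cluster_sizes) + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical toy reads, for illustration only.
reads = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGA",  # differs from the first read at the last base
    "TTTTGGGGCCCCAAAATTTT",
]
otus = cluster_by_representative(reads, threshold=0.9)
sizes = [len(members) for _, members in otus]
richness = chao1(sizes)
```

With these toy reads, the first two sequences fall into one cluster and the third founds its own. Because every read is compared only to representatives, the number of comparisons grows with the number of clusters rather than with the square of the number of reads, which is the usual motivation for this family of methods.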

References

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Bacon, D.J. and Anderson, W.F. 1986. Multiple sequence alignment. J. Mol. Biol. 191, 153–161.
Cai, Y. and Sun, Y. 2011. ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time. Nucleic Acids Res. doi:10.1093/nar/gkr349.
Cameron, M., Bernstein, Y., and Williams, H.E. 2007. Clustered sequence representation for fast homology search. J. Comput. Biol. 14, 594–614.
Chao, A. 1984. Non-parametric estimation of the number of classes in a population. Scand. J. Stat. 11, 265–270.
Chao, A.L. and Lee, S.M. 1992. Estimating the number of classes via sample coverage. J. Am. Stat. Assoc. 87, 210–217.
Chao, A.M., Ma, M.C., and Yang, M.C.K. 1993. Stopping rules and estimation for recapture debugging with unequal failure rates. Biometrika 80, 193–201.
Chun, J., Kim, K.Y., Lee, J.H., and Choi, Y. 2010. The analysis of oral microbial communities of wild-type and toll-like receptor 2-deficient mice using a 454 GS FLX Titanium pyrosequencer. BMC Microbiol. 10, 101.
Edgar, R.C. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797.
Edgar, R.C. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461.
Hamady, M. and Knight, R. 2009. Microbial community profiling for human microbiome projects: Tools, techniques, and challenges. Genome Res. 19, 1141–1152.
Hurlbert, S.H. 1971. The non-concept of species diversity: a critique and alternative parameters. Ecology 52, 577–586.
Kuenne, C.T., Ghai, R., Chakraborty, T., and Hain, T. 2007. GECO – linear visualization for comparative genomics. Bioinformatics 23, 125–126.
Li, W. and Godzik, A. 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659.
Li, W., Jaroszewski, L., and Godzik, A. 2001. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 17, 282–283.
Li, W., Jaroszewski, L., and Godzik, A. 2002. Sequence clustering strategies improve remote homology recognitions while reducing search times. Protein Eng. 15, 643–649.
Li, W., Wooley, J.C., and Godzik, A. 2008. Probing metagenomics by rapid cluster analysis of very large datasets. PLoS One 3, e3375.
Ling, Z., Kong, J., Liu, F., Zhu, H., Chen, X., Wang, Y., Li, L., Nelson, K.E., Xia, Y., and Xiang, C. 2010. Molecular analysis of the diversity of vaginal microbiota associated with bacterial vaginosis. BMC Genomics 11, 488.
Metzker, M.L. 2010. Sequencing technologies – the next generation. Nat. Rev. Genet. 11, 31–46.
Myers, E.W. and Miller, W. 1988. Optimal alignments in linear space. Comput. Appl. Biosci. 4, 11–17.
Petrosino, J.F., Highlander, S., Luna, R.A., Gibbs, R.A., and Versalovic, J. 2009. Metagenomic pyrosequencing and microbial identification. Clin. Chem. 55, 856–866.
Retief, J.D. 2000. Phylogenetic analysis using PHYLIP. Methods Mol. Biol. 132, 243–258.
Schloss, P.D., Westcott, S.L., Ryabin, T., Hall, J.R., Hartmann, M., Hollister, E.B., Lesniewski, R.A., Oakley, B.B., Parks, D.H., Robinson, C.J., et al. 2009. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–7541.
Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.
Wayne, L.G., Brenner, D.J., Colwell, R.R., Grimont, P.A.D., Kandler, O., Krichevsky, M.I., Moore, L.H., Moore, W.E.C., Murray, R.G.E., Stackebrandt, E., et al. 1987. Report of the ad hoc committee on reconciliation of approaches to bacterial systematics. Int. J. Syst. Bacteriol. 37, 463–464.
Yang, F., Zhu, Q., Tang, D., and Zhao, M. 2009. Using affinity propagation combined post-processing to cluster protein sequences. Protein Pept. Lett. 17, 681–689.

Author information

Authors and Affiliations

Interdisciplinary Graduate Program in Bioinformatics, Seoul National University, Seoul, 151-742, Republic of Korea
Jae-Hak Lee, Yoon-Seong Jeon & Jongsik Chun
Inst. of Molecular Biology and Genetics, Seoul National University, Seoul, 151-742, Republic of Korea
Hana Yi & Jongsik Chun
Department of Statistics, Chung-Ang University, Seoul, 156-756, Republic of Korea
Sungho Won
School of Biological Sciences and Advanced Inst. of Convergence Tech., Seoul National University, Seoul, 151-742, Republic of Korea
Jongsik Chun
Chunlab, Inc., Seoul National University, Bldg 138 Rm 318, Seoul, 151-742, Republic of Korea
Yoon-Seong Jeon & Jongsik Chun

Corresponding author

Correspondence to Jongsik Chun.

Additional information

Supplemental material for this article may be found at http://www.springer.com/content/120956

Electronic supplementary material

Supplementary material, approximately 68.0 KB.

About this article

Cite this article

Lee, JH., Yi, H., Jeon, YS. et al. TBC: A clustering algorithm based on prokaryotic taxonomy. J Microbiol. 50, 181–185 (2012). https://doi.org/10.1007/s12275-012-1214-6

Received: 25 April 2011
Accepted: 07 November 2011
Published: 27 April 2012
Issue Date: April 2012
DOI: https://doi.org/10.1007/s12275-012-1214-6
