DOI: 10.1145/1143844.1143917
Article

Pachinko allocation: DAG-structured mixture models of topic correlations

Published: 25 June 2006

ABSTRACT

Latent Dirichlet allocation (LDA) and other related topic models are increasingly popular tools for summarization and manifold discovery in discrete data. However, LDA does not capture correlations between topics. In this paper, we introduce the pachinko allocation model (PAM), which captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). The leaves of the DAG represent individual words in the vocabulary, while each interior node represents a correlation among its children, which may be words or other interior nodes (topics). PAM provides a flexible alternative to recent work by Blei and Lafferty (2006), which captures correlations only between pairs of topics. Using text data from newsgroups, historic NIPS proceedings and other research paper corpora, we show improved performance of PAM in document classification, likelihood of held-out data, the ability to support finer-grained topics, and topical keyword coherence.
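
The DAG structure described above can be illustrated with a small sampling sketch. The abstract does not spell out the generative process, so the following assumes one plausible reading: each interior node holds a per-document multinomial over its children, drawn from a symmetric Dirichlet, and each word is produced by walking from the root until a leaf is reached. The toy graph `TOY_DAG`, the function names, and the parameter `alpha` are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy DAG (not from the paper). Keys are interior nodes (topics);
# any name that is not a key is a leaf, i.e. a vocabulary word.
TOY_DAG = {
    "root": ["super_1", "super_2"],
    "super_1": ["sub_a", "sub_b"],
    "super_2": ["sub_b", "sub_c"],  # sub_b has two parents: a DAG, not a tree
    "sub_a": ["goal", "team", "season"],
    "sub_b": ["data", "model", "team"],
    "sub_c": ["neuron", "model", "spike"],
}


def sample_dirichlet(alphas, rng):
    """Draw one sample from Dirichlet(alphas) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_document(dag, root, n_words, rng, alpha=1.0):
    """Sample n_words leaves by walking the DAG from root once per word."""
    # Assumption: one multinomial per interior node, drawn per document
    # from a symmetric Dirichlet with parameter alpha.
    theta = {
        node: sample_dirichlet([alpha] * len(children), rng)
        for node, children in dag.items()
    }
    words = []
    for _ in range(n_words):
        node = root
        while node in dag:  # stop once a leaf (word) is reached
            node = rng.choices(dag[node], weights=theta[node], k=1)[0]
        words.append(node)
    return words


if __name__ == "__main__":
    print(sample_document(TOY_DAG, "root", 10, random.Random(0)))
```

Because `sub_b` is reachable from both `super_1` and `super_2`, the two higher-level nodes share part of their vocabulary through a common child, which is the kind of nested correlation a tree could not express.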

References

  1. Blei, D., Griffiths, T., Jordan, M., & Tenenbaum, J. (2004). Hierarchical topic models and the nested Chinese restaurant process. In Advances in neural information processing systems 16.
  2. Blei, D., & Lafferty, J. (2006). Correlated topic models. In Advances in neural information processing systems 18.
  3. Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.
  4. Chib, S. (1995). Marginal likelihood from the Gibbs output. Journal of the American Statistical Association.
  5. Diggle, P., & Gratton, R. (1984). Monte Carlo methods of inference for implicit statistical models. Journal of the Royal Statistical Society.
  6. Griffiths, T., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences (pp. 5228-5235).
  7. Lawrie, D., Croft, W., & Rosenberg, A. (2001). Finding topic words for hierarchical summarization. Proceedings of SIGIR'01 (pp. 349-357).
  8. Newton, M., & Raftery, A. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society.
  9. Teh, Y., Jordan, M., Beal, M., & Blei, D. (2005). Hierarchical Dirichlet processes. Journal of the American Statistical Association.

Published in

ICML '06: Proceedings of the 23rd international conference on Machine learning
June 2006
1154 pages
ISBN: 1595933832
DOI: 10.1145/1143844

Copyright © 2006 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ICML '06 paper acceptance rate: 140 of 548 submissions (26%)
Overall acceptance rate: 140 of 548 submissions (26%)
