ABSTRACT
Latent Dirichlet allocation (LDA) and other related topic models are increasingly popular tools for summarization and manifold discovery in discrete data. However, LDA does not capture correlations between topics. In this paper, we introduce the pachinko allocation model (PAM), which captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). The leaves of the DAG represent individual words in the vocabulary, while each interior node represents a correlation among its children, which may be words or other interior nodes (topics). PAM provides a flexible alternative to recent work by Blei and Lafferty (2006), which captures correlations only between pairs of topics. Using text data from newsgroups, historic NIPS proceedings and other research paper corpora, we show improved performance of PAM in document classification, likelihood of held-out data, the ability to support finer-grained topics, and topical keyword coherence.
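The DAG structure described above can be illustrated with a minimal generative sketch. This is not the authors' implementation: it assumes the simple four-level PAM special case (root, super-topics, sub-topics, words), invents a toy vocabulary and the names `theta_root`, `theta_super`, and `phi`, and uses symmetric Dirichlet priors purely for illustration. Each interior node holds a document-specific multinomial over its children, and a word is generated by walking from the root to a leaf.

```python
import numpy as np

# Toy four-level PAM sketch (hypothetical names and sizes, not the paper's code).
rng = np.random.default_rng(0)

vocab = ["model", "data", "topic", "word", "graph", "node"]
n_super, n_sub = 2, 3

# Symmetric Dirichlet hyperparameters for each interior node's child distribution.
alpha_root = np.ones(n_super)            # root -> super-topics
alpha_super = np.ones((n_super, n_sub))  # each super-topic -> sub-topics
beta = np.ones(len(vocab))               # sub-topic -> words

# Sub-topic word distributions (the leaves' parents), shared across documents.
phi = rng.dirichlet(beta, size=n_sub)

def generate_document(n_words=8):
    # Per-document multinomials at each interior node; correlations between
    # sub-topics arise because they share draws at the super-topic level.
    theta_root = rng.dirichlet(alpha_root)
    theta_super = np.array([rng.dirichlet(a) for a in alpha_super])
    words = []
    for _ in range(n_words):
        s = rng.choice(n_super, p=theta_root)    # step 1: pick a super-topic
        t = rng.choice(n_sub, p=theta_super[s])  # step 2: pick a sub-topic under it
        w = rng.choice(len(vocab), p=phi[t])     # step 3: emit a word at a leaf
        words.append(vocab[w])
    return words

print(generate_document())
```

Because every word's root-to-leaf path passes through a super-topic, co-occurring sub-topics under the same super-topic become correlated in the document's posterior, which is the effect the DAG is designed to capture.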
- Blei, D., Griffiths, T., Jordan, M., & Tenenbaum, J. (2004). Hierarchical topic models and the nested Chinese restaurant process. In Advances in Neural Information Processing Systems 16.
- Blei, D., & Lafferty, J. (2006). Correlated topic models. In Advances in Neural Information Processing Systems 18.
- Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993--1022.
- Chib, S. (1995). Marginal likelihood from the Gibbs output. Journal of the American Statistical Association.
- Diggle, P., & Gratton, R. (1984). Monte Carlo methods of inference for implicit statistical models. Journal of the Royal Statistical Society.
- Griffiths, T., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences (pp. 5228--5235).
- Lawrie, D., Croft, W., & Rosenberg, A. (2001). Finding topic words for hierarchical summarization. In Proceedings of SIGIR'01 (pp. 349--357).
- Newton, M., & Raftery, A. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society.
- Teh, Y., Jordan, M., Beal, M., & Blei, D. (2005). Hierarchical Dirichlet processes. Journal of the American Statistical Association.