The Pareto Principle Is Everywhere: Finding Informative Sentences for Opinion Summarization Through Leader Detection

Zhu, Linhong; Gao, Sheng; Pan, Sinno Jialin; Li, Haizhou; Deng, Dingxiong; Shahabi, Cyrus

doi:10.1007/978-3-319-14379-8_9

Part of the book series: Lecture Notes in Social Networks ((LNSN))


Abstract

Most previous work on opinion summarization focuses on summarizing the sentiment polarity distribution over different aspects of an entity (e.g., the battery life and screen of a mobile phone). However, users' needs may go beyond this kind of summarization. Besides such coarse-grained, aspect-level summaries, a reader may prefer detailed but concise text drawn from the opinion data. In this paper, we propose a new framework for opinion summarization. Our goal is to help users obtain useful opinions from reviews by reading only a short summary made up of a few informative sentences, where the quality of the summary is evaluated in terms of both aspect coverage and viewpoint preservation. More specifically, we formulate informative sentence selection as a community leader detection problem, where a community is a cluster of sentences about the same aspect of an entity and the leaders are the most informative sentences for that aspect. We develop two effective algorithms to identify communities and leaders. Reviews of six products from Amazon.com are used to verify the effectiveness of our method for opinion summarization.
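To make the formulation concrete, the following is a minimal illustrative sketch, not the chapter's two algorithms, which are not described on this page. It assumes the communities (aspect clusters) are already given, uses Jaccard word overlap as a stand-in similarity, and picks as leader the sentence with the highest weighted degree centrality inside its community. The names `jaccard` and `pick_leaders` and the toy sentences are hypothetical.

```python
from collections import defaultdict


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two sentences (an illustrative choice)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def pick_leaders(sentences, communities):
    """Return {community label: index of its leader sentence}.

    communities[i] is the (assumed given) community label of sentences[i].
    The leader is the member with the largest summed similarity to the
    other members, i.e. weighted degree centrality inside the community.
    """
    members = defaultdict(list)
    for idx, label in enumerate(communities):
        members[label].append(idx)

    leaders = {}
    for label, idxs in members.items():
        def centrality(i):
            return sum(jaccard(sentences[i], sentences[j]) for j in idxs if j != i)

        # max() keeps the first index on ties, so the result is deterministic.
        leaders[label] = max(idxs, key=centrality)
    return leaders


# Toy data (hypothetical): two aspects of a phone.
sentences = [
    "the battery life is great",
    "battery life is great and lasts long",
    "great battery",
    "the screen is too dim",
    "screen is dim outdoors",
]
communities = ["battery", "battery", "battery", "screen", "screen"]
leaders = pick_leaders(sentences, communities)
summary = [sentences[i] for i in leaders.values()]
```

Selecting one leader per community covers every aspect with a single sentence each; any other centrality measure could be substituted for the degree-based score used here.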

Notes

1. http://www.tripadvisor.com/.
2. Note that \(|\mathcal {N}_{k}(s)|\) can be larger than \(k\) because ties can occur (i.e., several neighbors may have the same similarity to \(s\)).
3. Available at http://sites.google.com/site/linhongi2r/data-and-code.
4. A longer summary is more likely to provide better information but is less concise.
5. ROUGE-N is a popular metric from the ROUGE toolkit that measures the quality of a summary by comparing it to reference summaries using \(n\)-gram co-occurrence.
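Notes 2 and 5 can be illustrated with a short sketch. The first function is one plausible reading of the tie-aware neighborhood \(\mathcal {N}_{k}(s)\): every neighbor whose similarity to \(s\) is at least the \(k\)-th largest value is kept. The second is a simplified ROUGE-N recall (whitespace tokens, lowercasing, no stemming or stopword removal), so its scores are not expected to match the official toolkit. The function names and inputs are hypothetical.

```python
from collections import Counter


def knn_with_ties(sims: dict, k: int) -> set:
    """Return the neighbors whose similarity is >= the k-th largest value.

    sims maps neighbor id -> similarity to the sentence s. Neighbors tied
    at the k-th value are all kept, so the result can hold more than k items.
    """
    if k <= 0 or not sims:
        return set()
    ranked = sorted(sims.values(), reverse=True)
    threshold = ranked[min(k, len(ranked)) - 1]
    return {n for n, v in sims.items() if v >= threshold}


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, references: list, n: int = 1) -> float:
    """Clipped n-gram recall of a candidate summary against reference summaries."""
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        total += sum(ref_counts.values())
        # A reference n-gram is matched at most as often as the candidate has it.
        matched += sum(min(c, cand[g]) for g, c in ref_counts.items())
    return matched / total if total else 0.0


neighbors = knn_with_ties({"a": 0.9, "b": 0.5, "c": 0.5, "d": 0.1}, k=2)
r1 = rouge_n_recall("the battery is great", ["the battery life is great"], n=1)
r2 = rouge_n_recall("the battery is great", ["the battery life is great"], n=2)
```

With `k=2`, neighbors `b` and `c` tie at the second-largest similarity, so three neighbors are returned, which is the situation Note 2 describes.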


Acknowledgments

This work is partially supported by DARPA under grant Number W911NF-12-1-0034.

Author information

Authors and Affiliations

Information Sciences Institute, Los Angeles, USA
Linhong Zhu
Institute for Infocomm Research, Singapore, Singapore
Sheng Gao, Sinno Jialin Pan & Haizhou Li
University of Southern California, Los Angeles, USA
Dingxiong Deng & Cyrus Shahabi

Corresponding author

Correspondence to Linhong Zhu.

Editor information

Editors and Affiliations

Department of Computer Engineering, Bilkent University, Ankara, Turkey
Özgür Ulusoy
Statistics and Computer Information Systems, Baruch College, CUNY, New York, New York, USA
Abdullah Uz Tansel
Department of Computer Engineering, Bilkent University, Ankara, Turkey
Erol Arkun

About this chapter

Cite this chapter

Zhu, L., Gao, S., Pan, S.J., Li, H., Deng, D., Shahabi, C. (2015). The Pareto Principle Is Everywhere: Finding Informative Sentences for Opinion Summarization Through Leader Detection. In: Ulusoy, Ö., Tansel, A., Arkun, E. (eds) Recommendation and Search in Social Networks. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-14379-8_9

DOI: https://doi.org/10.1007/978-3-319-14379-8_9
Published: 13 February 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14378-1
Online ISBN: 978-3-319-14379-8
eBook Packages: Computer Science, Computer Science (R0)
