The Pareto Principle Is Everywhere: Finding Informative Sentences for Opinion Summarization Through Leader Detection

Recommendation and Search in Social Networks

Part of the book series: Lecture Notes in Social Networks (LNSN)

Abstract

Most previous work on opinion summarization focuses on summarizing the sentiment polarity distribution toward different aspects of an entity (e.g., the battery life and screen of a mobile phone). However, users often need more than this kind of summary. Beyond such coarse-grained, aspect-level summarization, one may prefer to read detailed but concise text drawn from the opinion data. In this paper, we propose a new framework for opinion summarization. Our goal is to help users obtain helpful opinions from reviews by reading only a short summary made up of a few informative sentences, where the quality of the summary is evaluated in terms of both aspect coverage and viewpoint preservation. More specifically, we formulate the informative sentence selection problem in opinion summarization as a community leader detection problem, where a community consists of a cluster of sentences about the same aspect of an entity, and the leaders can be considered the most informative sentences for the corresponding aspect. We develop two effective algorithms to identify communities and leaders. Reviews of six products from Amazon.com are used to verify the effectiveness of our method for opinion summarization.

Notes

  1. http://www.tripadvisor.com/.

  2. Note that \(|\mathcal{N}_{k}(s)|\) can be larger than \(k\) because ties can occur (i.e., several neighbors may have the same similarity to \(s\)).

  3. Available at http://sites.google.com/site/linhongi2r/data-and-code.

  4. A longer summary is more likely to provide better information but is less concise.

  5. ROUGE-N is a widely used metric that measures the quality of a summary by comparing it to reference summaries using \(n\)-gram co-occurrence.
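
Notes 2 and 5 can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the function names (`knn_with_ties`, `ngrams`, `rouge_n_recall`), the whitespace tokenization, and the toy inputs are assumptions made for illustration. The first function shows why a tie-inclusive neighbor set can hold more than \(k\) items; the second computes a simplified recall-style ROUGE-N score from clipped \(n\)-gram matches.

```python
from collections import Counter


def knn_with_ties(similarities, k):
    """Return every neighbor whose similarity is at least the k-th largest.

    `similarities` (hypothetical input) maps a neighbor id to its similarity
    to the query sentence s. Ties at the k-th position are all kept, so the
    result can contain more than k items.
    """
    if k <= 0 or not similarities:
        return set()
    ranked = sorted(similarities.values(), reverse=True)
    threshold = ranked[min(k, len(ranked)) - 1]
    return {n for n, sim in similarities.items() if sim >= threshold}


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """Clipped n-gram matches divided by the total n-grams in the references.

    Tokenization is a plain lowercase whitespace split (an assumption);
    the real ROUGE toolkit offers more preprocessing options.
    """
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        total += sum(ref_counts.values())
        matched += sum(min(c, cand[g]) for g, c in ref_counts.items())
    return matched / total if total else 0.0
```

For example, with similarities `{"a": 0.9, "b": 0.5, "c": 0.5, "d": 0.1}` and `k = 2`, the neighbors `b` and `c` tie for second place, so the returned set has three members rather than two.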

Acknowledgments

This work is partially supported by DARPA under grant Number W911NF-12-1-0034.

Author information

Corresponding author

Correspondence to Linhong Zhu.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Zhu, L., Gao, S., Pan, S.J., Li, H., Deng, D., Shahabi, C. (2015). The Pareto Principle Is Everywhere: Finding Informative Sentences for Opinion Summarization Through Leader Detection. In: Ulusoy, Ö., Tansel, A., Arkun, E. (eds) Recommendation and Search in Social Networks. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-14379-8_9

  • DOI: https://doi.org/10.1007/978-3-319-14379-8_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14378-1

  • Online ISBN: 978-3-319-14379-8

  • eBook Packages: Computer Science, Computer Science (R0)
