A Hybrid and Adaptive Approach for Classification of Indian Stock Market-Related Tweets

  • Conference paper
Data Management, Analytics and Innovation

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 1016))

Abstract

Twitter generates an enormous amount of data daily. Various studies over the years have concluded that tweets play a significant role in predicting and understanding stock price movements. Designing a system that stores relevant tweets and extracts information for specific stocks and industries is a relevant and unattempted problem for the Indian stock market, which is the eighth largest in terms of market capitalization. Because people with diverse backgrounds tweet about many topics simultaneously, it is nontrivial to identify the tweets that are relevant to the stock market. Such a system therefore needs one module for extracting and storing tweets and another for text classification. In the current study, we propose a hybrid approach to text classification that combines lexicon-based and machine learning-based techniques. The proposed scheme handles class imbalance effectively and is adaptive: it automatically grows the lexicon both through WordNet and through a machine learning technique. On a corpus of 10,000 tweets, the system achieves an F1-score of over 98% for the relevant class, compared with 60% for the baseline method. The coverage of tweets by the lexicon also improves by 8%.
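The abstract does not give implementation details, so the following is only a minimal sketch of how a hybrid lexicon-plus-classifier scheme of this kind might be wired together. Everything in it is an assumption made for illustration: the seed lexicon, the toy tweets, the choice of naive Bayes as the learner, the `threshold` and `min_count` parameters, and all function names. WordNet-based expansion is left out to keep the sketch dependency-free; only the machine learning route for growing the lexicon is shown.

```python
import math
from collections import Counter

# Hypothetical seed lexicon; the paper's actual lexicon is not given here.
SEED_LEXICON = {"sensex", "nifty", "nse", "bse", "shares"}
PUNCT = ".,!?:;#@()\"'"

def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (raw.strip(PUNCT).lower() for raw in text.split())
    return [t for t in tokens if t]

class NaiveBayes:
    """Two-class multinomial naive Bayes with add-one smoothing (assumed learner)."""

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def log_odds(self, token):
        """log P(token | relevant) - log P(token | irrelevant)."""
        v = len(self.vocab)
        p1 = (self.counts[1][token] + 1) / (sum(self.counts[1].values()) + v)
        p0 = (self.counts[0][token] + 1) / (sum(self.counts[0].values()) + v)
        return math.log(p1 / p0)

    def predict(self, doc):
        score = math.log(self.doc_counts[1] / self.doc_counts[0])
        score += sum(self.log_odds(t) for t in tokenize(doc) if t in self.vocab)
        return 1 if score > 0 else 0

def hybrid_predict(doc, lexicon, model):
    """Lexicon match decides first; otherwise fall back to the learned model."""
    if lexicon & set(tokenize(doc)):
        return 1
    return model.predict(doc)

def grow_lexicon(lexicon, model, threshold=1.0, min_count=2):
    """Add terms the model associates strongly with the relevant class."""
    learned = {
        t for t in model.vocab
        if model.counts[1][t] >= min_count and model.log_odds(t) > threshold
    }
    return lexicon | learned

def coverage(docs, lexicon):
    """Fraction of documents containing at least one lexicon term."""
    return sum(bool(lexicon & set(tokenize(d))) for d in docs) / len(docs)

def f1_score(y_true, y_pred):
    """F1 for the relevant class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy, made-up training tweets (1 = stock-market related, 0 = not).
TRAIN = [
    ("Nifty ends higher as bank stocks rally", 1),
    ("Infosys stock jumps after strong quarterly earnings", 1),
    ("Reliance stock hits record high on earnings", 1),
    ("Watching the cricket match with friends tonight", 0),
    ("New movie trailer looks amazing", 0),
    ("Heavy rain expected in Mumbai tonight", 0),
    ("Great dinner with friends at the new cafe", 0),
]
model = NaiveBayes().fit([d for d, _ in TRAIN], [y for _, y in TRAIN])
grown_lexicon = grow_lexicon(SEED_LEXICON, model)

TEST_DOCS = [
    "Sensex falls 200 points",
    "TCS stock rises on earnings",
    "Dinner with friends tonight",
]
TEST_LABELS = [1, 1, 0]
predictions = [hybrid_predict(d, SEED_LEXICON, model) for d in TEST_DOCS]
```

In this toy setup the first test tweet is caught by the seed lexicon, while the second is caught only by the classifier. After `grow_lexicon` runs, terms such as "stock" join the lexicon, so `coverage(TEST_DOCS, grown_lexicon)` exceeds `coverage(TEST_DOCS, SEED_LEXICON)`. This mirrors, in spirit only, the coverage improvement the abstract reports.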

Notes

  1. Natalie Hockham makes this point in her talk "Machine learning with imbalanced data sets", which focuses on imbalance in the context of credit card fraud detection.

Author information

Corresponding author

Correspondence to Sourav Malakar.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Malakar, S., Goswami, S., Chakrabarti, A., Chakraborty, B. (2020). A Hybrid and Adaptive Approach for Classification of Indian Stock Market-Related Tweets. In: Sharma, N., Chakrabarti, A., Balas, V. (eds) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol 1016. Springer, Singapore. https://doi.org/10.1007/978-981-13-9364-8_24

  • DOI: https://doi.org/10.1007/978-981-13-9364-8_24

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-9363-1

  • Online ISBN: 978-981-13-9364-8

  • eBook Packages: Engineering, Engineering (R0)
