A Hybrid and Adaptive Approach for Classification of Indian Stock Market-Related Tweets

Malakar, Sourav; Goswami, Saptarsi; Chakrabarti, Amlan; Chakraborty, Basabi

doi:10.1007/978-981-13-9364-8_24

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1016)

Abstract

Twitter generates an enormous amount of data daily. Various studies over the years have concluded that tweets have a significant impact on predicting and understanding stock price movements. Designing a system that stores relevant tweets and extracts information for specific stocks and industries is a relevant and unattempted problem for the Indian stock market, which is the eighth largest in terms of market capitalization. Because people with diverse backgrounds tweet about many topics simultaneously, it is nontrivial to identify the tweets that are relevant to the stock market. Such a system therefore needs one module for extracting and storing tweets and another for text classification. In the current study, we propose a hybrid approach to text classification that combines lexicon-based and machine learning-based techniques. The proposed scheme handles class imbalance effectively and is adaptive: it automatically grows the lexicon both through WordNet and by using a machine learning technique. The system achieves an F1-score of over 98% for the relevant class, compared with 60% for the baseline method, on a corpus of 10,000 tweets. The coverage of tweets by the lexicon also improves by 8%.
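The hybrid, adaptive idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the seed lexicon, the `toy_model` stand-in for a trained classifier, the `min_count` threshold, and all function names are assumptions made for illustration, and the WordNet-based expansion mentioned in the abstract is omitted to keep the example dependency-free.

```python
import re
from collections import Counter

# Hypothetical seed lexicon; the chapter's actual lexicon is not reproduced here.
SEED_LEXICON = {"nifty", "sensex", "stock", "shares", "bse", "nse"}

def tokenize(text):
    """Lowercase a tweet and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def classify(tweet, lexicon, fallback):
    """Label a tweet relevant on a lexicon hit; otherwise defer to the fallback model."""
    tokens = tokenize(tweet)
    if any(tok in lexicon for tok in tokens):
        return True, "lexicon"
    return fallback(tokens), "model"

def grow_lexicon(tweets, lexicon, fallback, min_count=2):
    """Add terms that recur in tweets only the fallback model marked as relevant."""
    counts = Counter()
    for tweet in tweets:
        relevant, source = classify(tweet, lexicon, fallback)
        if relevant and source == "model":
            counts.update(set(tokenize(tweet)))
    # Assumed heuristic: keep recurring terms longer than three letters.
    new_terms = {t for t, c in counts.items() if c >= min_count and len(t) > 3}
    return lexicon | new_terms

def coverage(tweets, lexicon):
    """Fraction of tweets containing at least one lexicon term."""
    hits = sum(any(t in lexicon for t in tokenize(tw)) for tw in tweets)
    return hits / len(tweets)

def toy_model(tokens):
    """Stand-in for a trained classifier: flags tweets that mention 'dividend'."""
    return "dividend" in tokens

tweets = [
    "Sensex closes higher today",
    "Company announces dividend payout",
    "Board approves dividend payout for FY",
    "Lovely weather in Kolkata",
]
grown = grow_lexicon(tweets, SEED_LEXICON, toy_model)
print(sorted(grown - SEED_LEXICON))
print(coverage(tweets, SEED_LEXICON), coverage(tweets, grown))
```

In this sketch the lexicon acts as a cheap first pass, and the model handles only tweets the lexicon misses. Terms learned from the model's positive predictions are fed back into the lexicon, which is one plausible way lexicon coverage could rise over time.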

Notes

1.
Natalie Hockham makes this point in her talk "Machine learning with imbalanced data sets", which focuses on imbalance in the context of credit card fraud detection.

Author information

Authors and Affiliations

A.K. Choudhury School of Information Technology, University of Calcutta, Kolkata, India
Sourav Malakar, Saptarsi Goswami & Amlan Chakrabarti
Faculty of Software and Information Science, Iwate Prefectural University, Takizawa, Japan
Basabi Chakraborty

Corresponding author

Correspondence to Sourav Malakar.

Editor information

Editors and Affiliations

Society for Data Science, Pune, Maharashtra, India
Neha Sharma
A.K. Choudhury School of Information Technology, University of Calcutta, Kolkata, West Bengal, India
Amlan Chakrabarti
Department of Automatics and Applied Software, Aurel Vlaicu University of Arad, Arad, Romania
Valentina Emilia Balas

About this paper

Cite this paper

Malakar, S., Goswami, S., Chakrabarti, A., Chakraborty, B. (2020). A Hybrid and Adaptive Approach for Classification of Indian Stock Market-Related Tweets. In: Sharma, N., Chakrabarti, A., Balas, V. (eds) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol 1016. Springer, Singapore. https://doi.org/10.1007/978-981-13-9364-8_24

DOI: https://doi.org/10.1007/978-981-13-9364-8_24
Published: 25 September 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9363-1
Online ISBN: 978-981-13-9364-8
eBook Packages: Engineering; Engineering (R0)