Abstract
This study proposes a model for classifying websites by their content and discusses its applications to Internet advertising (ad) strategies. Internet ad agencies manage a vast number of ad spaces embedded in websites and must choose which advertisements are suitable to place in each. Ad agencies therefore need to know the properties and topics of each website to optimize their advertising submission strategy. However, because website content is written in natural language, these qualitative sentences must be converted into quantitative data before websites can be classified with statistical models. To address this issue, this study applies statistical analysis to website information written in natural language. We apply a dictionary of neologisms to decompose website sentences into words and create a data set of indicator matrices for classifying the websites. From this data set, we estimate the topics of each website using latent Dirichlet allocation, which is a fast and robust method for sparse matrices. Finally, we discuss how to apply the results to optimize ad strategies.
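The pipeline described in the abstract (split text into words, build an indicator matrix, estimate topics with latent Dirichlet allocation) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the toy English documents, the whitespace tokenizer (standing in for dictionary-based morphological analysis), the function names `build_indicator_matrix` and `lda_gibbs`, the hyperparameters, and the choice of collapsed Gibbs sampling as the estimation method.

```python
import random

def build_indicator_matrix(docs):
    """Return (vocab, matrix); matrix[d][v] is 1 if word v occurs in document d."""
    # Assumption: whitespace splitting stands in for real morphological analysis.
    vocab = sorted({w for doc in docs for w in doc.split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for d, doc in enumerate(docs):
        for w in doc.split():
            matrix[d][index[w]] = 1
    return vocab, matrix

def lda_gibbs(matrix, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Estimate document-topic proportions by collapsed Gibbs sampling (a sketch)."""
    rng = random.Random(seed)
    n_docs, n_words = len(matrix), len(matrix[0])
    # Each nonzero cell of the indicator matrix is treated as one word token.
    tokens = [(d, v) for d in range(n_docs) for v in range(n_words) if matrix[d][v]]
    doc_topic = [[0] * n_topics for _ in range(n_docs)]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    for d, v in tokens:  # random initial topic assignment
        k = rng.randrange(n_topics)
        assign.append(k)
        doc_topic[d][k] += 1
        topic_word[k][v] += 1
        topic_total[k] += 1
    for _ in range(n_iter):
        for i, (d, v) in enumerate(tokens):
            k = assign[i]  # remove the token's current assignment from the counts
            doc_topic[d][k] -= 1
            topic_word[k][v] -= 1
            topic_total[k] -= 1
            # Conditional probability of each topic given all other assignments.
            weights = [
                (doc_topic[d][t] + alpha)
                * (topic_word[t][v] + beta)
                / (topic_total[t] + n_words * beta)
                for t in range(n_topics)
            ]
            k = rng.choices(range(n_topics), weights=weights)[0]
            assign[i] = k
            doc_topic[d][k] += 1
            topic_word[k][v] += 1
            topic_total[k] += 1
    # Smoothed topic proportions for each document; each row sums to 1.
    return [
        [(c + alpha) / (sum(row) + n_topics * alpha) for c in row]
        for row in doc_topic
    ]

# Hypothetical toy "websites"; real input would be tokenized page text.
docs = [
    "game console game review",
    "console controller review game",
    "recipe pasta sauce cooking",
    "cooking recipe dessert sauce",
]
vocab, matrix = build_indicator_matrix(docs)
theta = lda_gibbs(matrix, n_topics=2)
for doc, row in zip(docs, theta):
    print(doc, [round(p, 2) for p in row])
```

Each row of `theta` gives the estimated topic proportions for one document, which is the kind of per-website summary an ad-placement rule could consume. On a corpus of realistic size, an established library implementation would be a more practical choice than this pure-Python loop.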
Acknowledgements
The authors would like to thank Kazuki Oomori, members of the F@N Communications Information and Science Technology Department, and the anonymous reviewers for helpful comments and suggestions. This work was supported by JSPS KAKENHI Grant Number 17H02573.
Cite this article
Katsumata, S., Motohashi, E., Nishimoto, A. et al. The Contents-Based Website Classification for the Internet Advertising Planning: An Empirical Application of the Natural Language Analysis. Rev Socionetwork Strat 11, 129–142 (2017). https://doi.org/10.1007/s12626-017-0007-0
DOI: https://doi.org/10.1007/s12626-017-0007-0