Trend Extraction Method using Co-occurrence Patterns from Tweets

Keywords: Twitter Analysis, Natural Language Processing, Topic Extraction

Abstract

Twitter, one of the most popular micro-blogging services, lets users freely post information such as personal events. However, the amount of information that people can collect manually is limited, so methods for collecting trends automatically are important. Existing web services extract trends based on the number of tweets, which introduces a time lag before a trend is detected. In this paper, we propose a system that extracts trends from Twitter in real time by focusing on co-occurrence patterns. In addition to using previously extracted trend biterms, our system learns new key patterns at the same time. We evaluate the proposed method through comparative experiments and demonstrate that it extracts trends accurately and broadly, without the time lag of an existing service (Realtime Yahoo Search).
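
To make the notion of a co-occurrence pattern concrete, the following minimal Python sketch counts biterms (unordered pairs of words that appear in the same tweet) in two consecutive time windows and flags the pairs whose count rises sharply. It is an illustration only, not the proposed system: the window comparison rule, the thresholds `min_count` and `ratio`, the function names, and the sample tweets are all assumptions made for this example, and the tweets are assumed to be tokenized already.

```python
from collections import Counter
from itertools import combinations


def biterms(tokens):
    """Return the set of unordered word pairs that co-occur in one tweet."""
    unique = sorted(set(tokens))
    return set(combinations(unique, 2))


def count_biterms(tweets):
    """Count, for each biterm, how many tweets in the window contain it."""
    counts = Counter()
    for tokens in tweets:
        counts.update(biterms(tokens))
    return counts


def rising_biterms(previous, current, min_count=2, ratio=2.0):
    """Flag biterms whose count grew by at least `ratio` between windows.

    The rule and both thresholds are hypothetical choices for this sketch.
    """
    flagged = []
    for pair, now in current.items():
        before = previous.get(pair, 0)
        # Add-one smoothing so that unseen biterms do not divide by zero.
        if now >= min_count and now / (before + 1) >= ratio:
            flagged.append(pair)
    return sorted(flagged)


# Hypothetical, already tokenized tweets for two consecutive time windows.
previous_window = [["train", "delay"], ["lunch", "ramen"]]
current_window = [
    ["earthquake", "tokyo"],
    ["tokyo", "earthquake", "shake"],
    ["earthquake", "tokyo", "train"],
    ["lunch", "ramen"],
]

trends = rising_biterms(count_biterms(previous_window), count_biterms(current_window))
print(trends)
```

Counting pairs rather than single words means a pattern can stand out even when each word on its own is common, which is one plausible reason to look at co-occurrence instead of raw tweet volume.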

Published: 2016-12-31
Section: Technical Papers (Advanced Applied Informatics)