Data Mining Framework for Treating both Numerical and Text Data

  • Wataru Sunayama The University of Shiga Prefecture
  • Tomoya Matsumoto The University of Shiga Prefecture
  • Yuji Hatanaka The University of Shiga Prefecture
  • Kazunori Ogohara The University of Shiga Prefecture
Keywords: text mining, data mining, data analysis support, TETDM

Abstract

In recent years, data mining and text mining techniques have frequently been used for data analysis. Electronic data is collected everywhere, and many products and services are widely used in our daily lives. Data mining techniques such as association analysis and cluster analysis are used for marketing analysis because they can discover relationships and rules hidden in enormous amounts of numerical data. Text mining techniques such as keyword extraction and opinion extraction, on the other hand, are used to analyze questionnaire responses and review texts because they help us investigate consumers' opinions expressed in text. However, data mining tools and text mining tools cannot be used in a single environment. Consequently, data containing both numerical and text parts are not analyzed well, because the two parts cannot be connected during interpretation. The goal of data analysis is knowledge emergence, in which we find or create new knowledge for decision making.
In this paper, we propose a mining framework that can treat both numerical and text data. Users of the proposed system can iterate data shrinking and data analysis with both numerical and text analysis tools in a single framework. The experimental results show that the proposed system was used effectively to analyze review texts of humidifiers and fan heaters. We verified that a balanced use of numerical and text analysis leads to good ideas, and that users should consciously use both types of tools and both types of data shrinking.
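
To make the idea of alternating numerical and text data shrinking with analysis on one dataset more concrete, the following is a minimal Python sketch. It is not the TETDM interface or the authors' implementation: the record fields (`rating`, `price`, `text`), the helper names, the stopword list, and the toy humidifier-style reviews are all hypothetical, and simple word frequency stands in for a real keyword extraction method.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class Review:
    # Hypothetical record with a numerical part (rating, price) and a text part.
    rating: int
    price: int
    text: str


# Hypothetical stopword list used by the toy keyword counter.
STOPWORDS = {"the", "a", "is", "it", "and", "at", "but", "very", "of", "to", "i"}


def tokenize(text):
    """Split on whitespace, strip simple punctuation, and lowercase."""
    return [w.strip(".,!?").lower() for w in text.split()]


def shrink_by_number(records, predicate):
    """Numerical data shrinking: keep records whose numeric fields satisfy predicate."""
    return [r for r in records if predicate(r)]


def shrink_by_word(records, word):
    """Text data shrinking: keep records whose text contains the given word."""
    return [r for r in records if word in tokenize(r.text)]


def top_keywords(records, k=3):
    """Text analysis: most frequent non-stopword terms (a stand-in for keyword extraction)."""
    counts = Counter(
        w for r in records for w in tokenize(r.text) if w and w not in STOPWORDS
    )
    return [w for w, _ in counts.most_common(k)]


def numeric_summary(records):
    """Numerical analysis: size, average rating, and average price of the subset."""
    if not records:
        return {"n": 0}
    return {
        "n": len(records),
        "avg_rating": mean(r.rating for r in records),
        "avg_price": mean(r.price for r in records),
    }


# Toy data for illustration only; not taken from the paper's experiments.
reviews = [
    Review(1, 30, "Noisy fan and the tank leaks"),
    Review(2, 25, "Tank leaks and noisy at night"),
    Review(5, 60, "Quiet and the mist is strong"),
    Review(4, 55, "Quiet but the tank is small"),
    Review(2, 20, "Noisy and weak mist"),
]

# Step 1: shrink by a numerical condition, then read the text of that subset.
low_rated = shrink_by_number(reviews, lambda r: r.rating <= 2)
low_keywords = top_keywords(low_rated)

# Step 2: shrink by a word, then read the numbers of that subset.
quiet = shrink_by_word(reviews, "quiet")
quiet_stats = numeric_summary(quiet)
```

Each step feeds the subset produced by one kind of shrinking into the other kind of analysis, which is one way to picture connecting the numerical part and the text part during interpretation.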

Published: 2018-06-30
Section: Technical Papers