Semi-Automatic Category Estimation and Data Augmentation for Opinion Extraction of Product Components

  • Shogo Anda, Nagoya Institute of Technology
  • Masato Kikuchi, Nagoya Institute of Technology
  • Tadachika Ozono, Nagoya Institute of Technology
Keywords: information extraction, data augmentation, aspect based sentiment analysis


When customers shop online, they read reviews to gather information that helps them decide whether to buy a product. Aspect-based sentiment analysis (ABSA) is a task that analyzes review content from several perspectives, including the product itself, its components, and its retail outlets. We focus on the situation in which a customer, at the time of purchase, compares the characteristics of each component of a product with those of other products. For this purpose, we define a task called component-based sentiment analysis (CBSA), which analyzes review content only from the perspective of each component of the product. The CBSA task consists of opinion target extraction and polarity analysis, and we approach it with a classifier. We describe a semi-automatic category determination method for creating the classification labels for CBSA and a data augmentation method for improving classification performance. In experiments, we show that our category determination method generates categories that cover 95% of the existing categories on e-commerce sites and that our data augmentation method improves the macro-F1-measure for uncommon opinions by 10%.
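The abstract reports results in terms of the macro-F1-measure, which averages per-class F1 scores without weighting by class frequency, so rarely occurring classes count as much as frequent ones. The following is a minimal sketch of that metric in plain Python, not the evaluation code used in the paper; the function name `macro_f1` and the component labels "battery" and "hinge" are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all classes seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true class but was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, imbalanced toy data: "hinge" is the uncommon class.
gold = ["battery"] * 8 + ["hinge"] * 2
pred = ["battery"] * 10  # a classifier that ignores the uncommon class

accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(f"accuracy = {accuracy:.3f}, macro-F1 = {macro_f1(gold, pred):.3f}")
```

In this toy case the classifier that never predicts the uncommon class still reaches 80% accuracy, while its macro-F1 is much lower because the missed class contributes an F1 of zero to the average. This is why macro-F1 is a common choice when performance on uncommon classes matters.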


