Comparison of Different Sentiment Analysis Techniques on Different Domain Datasets
Author: Himani Varshney
Faculty Supervisor: Shahrukh Humayoun
Department: Computer Science
The purpose of this research is to compare the accuracy and other performance metrics of sentiment analysis techniques across datasets from different domains, and to devise a visual analytics system that automatically detects the domain of an uploaded text and applies the sentiment analysis technique best suited to that domain. The techniques studied range from lexicon-based approaches to machine learning and deep learning approaches. The lexicon-based approaches are dictionary-based; the machine learning approaches include Naïve Bayes and SVM; and the deep learning approaches include CNN and BERT. We collected datasets from Twitter and from other open-source datasets, covering domains from healthcare to tourism to entertainment. The visual analytics system analyzes the user's uploaded social media data for sentiment. It determines the domain of the uploaded dataset by matching its most frequent words against the most frequent words of the training datasets for each domain. Based on the detected domain, the system applies the appropriate sentiment analysis technique and provides a set of interactive visualizations for exploring and understanding the sentiments and other behavioral aspects of the uploaded data. The system also shows data facts about the uploaded data, such as frequent keywords and the number of tweets per sentiment.
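The domain-detection step described above can be illustrated with a minimal Python sketch. The abstract does not say which similarity measure is used, so Jaccard similarity over sets of top frequent words is an assumption here. The function names (`top_words`, `jaccard`, `detect_domain`), the tiny stop-word list, and the cutoff `k` are likewise hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption, not from the study).
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it",
              "was", "for", "on", "my", "i", "at", "with", "this"}


def top_words(texts, k=20):
    """Return the set of the k most frequent non-stop-words across the texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return {word for word, _ in counts.most_common(k)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def detect_domain(uploaded_texts, domain_vocab, k=20):
    """Pick the domain whose frequent-word set best overlaps the upload's.

    domain_vocab maps a domain label to the set of frequent words taken from
    that domain's training data. Returns (best_domain, scores_per_domain).
    """
    uploaded = top_words(uploaded_texts, k)
    scores = {domain: jaccard(uploaded, vocab)
              for domain, vocab in domain_vocab.items()}
    return max(scores, key=scores.get), scores
```

In such a setup, the per-domain word sets would be computed once from the training datasets with `top_words`, and the label returned by `detect_domain` would then select which sentiment analysis technique to run on the uploaded data.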