Supervised and Unsupervised Aspect Category Detection For Sentiment Analysis With Co-Occurrence Data


Using online consumer reviews as electronic word of mouth to assist purchase decision making has become increasingly popular. The Web provides an extensive source of consumer reviews, but one can hardly read all reviews to obtain a fair evaluation of a product or service. A text processing framework that can summarize reviews would therefore be desirable. One subtask of such a framework is to find the general aspect categories addressed in review sentences, for which this paper presents two methods. In contrast to most existing approaches, the first method is unsupervised: it applies association rule mining to co-occurrence frequency data obtained from a corpus to find these aspect categories. While not on par with state-of-the-art supervised methods, the proposed unsupervised method achieves an F1-score of 67% and performs better than several simple baselines, a similar but supervised method, and a supervised baseline. The second method is a supervised variant that outperforms existing methods with an F1-score of 84%.
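To make the idea of mining association rules from co-occurrence frequencies more concrete, the following is a minimal sketch and not the method as implemented in the paper. It assumes that each aspect category is represented by a single seed word, that co-occurrence is counted at sentence level, and that a rule "word implies category" is kept when its confidence (co-occurrence count divided by word count) reaches a threshold. The names `cooccurrence_counts`, `mine_rules`, `predict`, `seeds`, and `min_confidence` are hypothetical and chosen for illustration only.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count in how many sentences each word, and each word pair, occurs."""
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        unique = sorted(set(tokens))  # sorted so pair keys are canonical
        word_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return word_counts, pair_counts


def mine_rules(sentences, seeds, min_confidence=0.5):
    """Derive rules word -> category from sentence-level co-occurrence.

    seeds maps a category name to one seed word (an assumption of this
    sketch). The confidence of a rule is count(word, seed) / count(word).
    """
    word_counts, pair_counts = cooccurrence_counts(sentences)
    rules = {}
    for word, count in word_counts.items():
        for category, seed in seeds.items():
            if word == seed:
                confidence = 1.0
            else:
                pair = tuple(sorted((word, seed)))
                confidence = pair_counts[pair] / count
            if confidence >= min_confidence:
                rules.setdefault(word, {})[category] = confidence
    return rules


def predict(tokens, rules):
    """Return every category triggered by at least one word in the sentence."""
    return {category for token in set(tokens) for category in rules.get(token, {})}
```

In this sketch, a word that frequently appears in the same sentences as a category's seed word becomes an indicator for that category, so a new sentence can be assigned categories without any labeled training data. The threshold trades precision against recall: a higher `min_confidence` keeps fewer but more reliable rules.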

Existing System:

The information that can be obtained from product and service reviews is not only beneficial to consumers, but also to companies. Knowing what has been posted on the Web can help companies improve their products or services. However, to effectively handle the large amount of information available in these reviews, a framework for the automated summarization of reviews is desirable. An important task for such a framework would be to recognize the topics (i.e., characteristics of the product or service) people write about.


When the aspect categories are known beforehand and enough training data is available, a supervised machine learning approach to aspect category detection is feasible and can yield high performance. Many approaches to finding aspect categories are therefore supervised. However, sometimes the flexibility inherent to an unsupervised method is desirable.

Proposed System:

An approach has been suggested that simultaneously and iteratively clusters product aspects and opinion words. Aspects/opinion words with high similarity are clustered together, while aspects/opinion words from different clusters are dissimilar. The similarity between two aspects/opinion words is measured by fusing two sources: the homogeneous similarity between the aspects/opinion words themselves (content information), calculated by a traditional approach, and the similarity derived from their respective heterogeneous relationships with the opinion words/aspects (link information). Based on the resulting product aspect categories and opinion word groups, a sentiment association set between the two groups is then constructed by identifying the strongest n sentiment links.
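The fusion of content and link information, and the selection of the strongest n sentiment links, can be illustrated with a short sketch. The text does not specify how the two similarities are fused, so this example assumes a simple weighted sum with a hypothetical weight `alpha`, and it assumes link information is represented as a vector of co-occurrence weights compared by cosine similarity. The function names `cosine`, `fused_similarity`, and `strongest_links` are illustrative and not taken from the source.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fused_similarity(content_sim, links_a, links_b, alpha=0.5):
    """Combine content and link similarity (weighted sum is an assumption).

    content_sim: precomputed homogeneous similarity between two items.
    links_a, links_b: each item's link weights to the other word type,
    for example an aspect's co-occurrence weights with opinion words.
    """
    link_sim = cosine(links_a, links_b)
    return alpha * content_sim + (1.0 - alpha) * link_sim


def strongest_links(link_weights, n):
    """Return the n (aspect group, opinion group) pairs with highest weight."""
    ranked = sorted(link_weights.items(), key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in ranked[:n]]
```

With `alpha` near 1 the clustering relies mostly on content information, and with `alpha` near 0 it relies mostly on link information. Any clustering procedure that accepts a pairwise similarity could then use `fused_similarity` in each iteration.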