Text Mining Based on Tax Comments



Taxes play an important role in the economy and development of a country. The taxation service system is continuously improved in order to increase the State Budget. One way to assess the performance of taxation, particularly in Indonesia, is to examine the opinion of the public as the recipients of the service. Text mining can be used to discover public opinion about the tax system. The rapid growth of data in social media motivates this research to use it as a source for big data analysis. The dataset of tax comments is drawn from Facebook and Twitter. The resulting opinions, expressed as public sentiment about service, the website system, and news, can be used as input for improving the quality of tax services. In this research, text mining proceeds through the phases of text processing, feature selection, and classification with a Support Vector Machine (SVM). To reduce the number of attributes in the dataset when classifying text, feature selection uses Information Gain to select the terms relevant to the tax topic. Testing measures the performance of SVM with feature selection on the two data sources. Performance is measured by precision, recall, and F-measure.
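As an illustration of the evaluation measures named above, the following minimal sketch computes precision, recall, and F-measure for one sentiment class. The function name, the label strings, and the toy predictions are assumptions made for this example and are not taken from the study's data.

```python
def precision_recall_f_measure(y_true, y_pred, positive="positive"):
    """Return (precision, recall, F-measure) for the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# Hypothetical gold labels and classifier output for five comments.
y_true = ["positive", "positive", "positive", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "positive", "negative"]

precision, recall, f_measure = precision_recall_f_measure(y_true, y_pred)
```

In this toy case there are two true positives, one false positive, and one false negative, so all three measures equal 2/3.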


Text processing requires classification methods such as Naïve Bayes (NB), Artificial Neural Network (ANN), and Support Vector Machine (SVM). NB handles document classification through a simple model, so it is easy to compute and works well on large datasets. In addition, a hierarchical Naïve Bayes model is considered to improve the efficiency of multi-level text classification. ANN applied to classification based on extracted rules usually has a low error rate. SVM achieves better classification accuracy [6], and combining AdaBoost with SVM can provide better generalization performance on datasets with unbalanced classes. SVM is a well-established machine learning method, but one problem in text classification is the number of attributes in a dataset. A large number of attributes lowers accuracy, and dynamic data requires a better technique for handling the dataset. To obtain better accuracy, the attributes must be selected with an appropriate algorithm. Feature selection is an important part of text processing, especially for optimizing the performance of the classifier. Feature selection chooses a subset of features by discarding those that are not relevant to the classification [10]. In this research, feature selection uses Information Gain, which is computed from entropy values in the dataset. Information Gain is one of the most widely used feature selection criteria for classification applications.
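The entropy-based computation behind Information Gain can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this research: each document is represented as a set of terms, a term is scored only by its presence or absence, and the helper names (`entropy`, `information_gain`, `select_terms`) and the toy comments are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """IG(term) = H(class) - H(class | term present or absent)."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def select_terms(docs, labels, k):
    """Return the k terms with the highest information gain (ties broken alphabetically)."""
    vocabulary = set().union(*docs)
    ranked = sorted(vocabulary, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Hypothetical toy comments, already tokenized into sets of terms.
docs = [
    {"tax", "service", "slow"},
    {"tax", "website", "error"},
    {"tax", "service", "fast"},
    {"tax", "website", "easy"},
]
labels = ["negative", "negative", "positive", "positive"]

ig_tax = information_gain(docs, labels, "tax")    # appears in every document
ig_slow = information_gain(docs, labels, "slow")  # appears in one negative document
```

A term such as "tax" that occurs in every comment carries no information about the class and receives a gain of zero, while a term that occurs only in one class receives a positive gain. Keeping only the highest-scoring terms is what reduces the number of attributes passed to the classifier.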



Document categorization, an important issue in text mining, refers to the automatic classification of documents into classes based on category or topic. One study focused on a machine learning approach to automatic text categorization for use on typical web structures.

Text mining is a form of text analysis in which the data are usually obtained from documents, with the aim of finding words that represent the content of a document so that relationships between documents and between document classes can be analyzed. It is used to identify patterns in the issues and problems arising in the community in real time, so that they can be taken into consideration in preparing more appropriate policy.


This research proposes text mining with the SVM method, with classification optimized through feature selection. Feature selection is used to select the relevant features of the dataset in order to improve the performance of SVM as a classifier. The text mining aims to classify sentiment about taxation problems, using public comments on Facebook and Twitter as data sources. In this study, positive and negative sentiments are reported by time period and by type of tax data, namely service, website system, and tax news. In further research, the information generated by this text mining can be used as a consideration for taxation and support services in future policies.