
Special Issue Published in International Journal of Trend in Research and Development (IJTRD),

ISSN: 2394-9333, www.ijtrd.com


A Comparative Study on Linear Classifiers for Opinion Mining
N. Saranya¹ and Dr. R. Gunavathi²
¹M.Sc (SS), MCA, M.Phil., Assistant Professor, Department of PG CS, Sree Saraswathi Thyagaraja College, Pollachi
²MCA, M.Phil., PhD, Head, Department of MCA, Sree Saraswathi Thyagaraja College, Pollachi, India

Abstract---Sentiment analysis, or opinion mining, is an active research area. It is the task of identifying the sentiment, such as positive or negative, expressed in text data. Nowadays, people often communicate, discuss and share information through the Internet, which has become an essential part of human life. The information it contains covers a wide range of areas, such as feedback or opinions about products, academic information and comments on social issues. This information helps people think and make decisions, since most people listen to others' opinions before taking a final decision. It is therefore necessary to understand the attitudes, opinions and emotions of users. In this paper, we compare two linear classifiers, Support Vector Machine (SVM) and Extreme Learning Machine (ELM). The two classifiers are compared in terms of performance, resources used (support vectors or kernels), and computational complexity or speed. From this analysis, the better algorithm is chosen for determining the emotional tone of the user.

Keywords---Machine Learning, Sentiment analysis, Support Vector Machine, Extreme Learning Machine.

I. INTRODUCTION

Opinion analysis has drawn great interest in recent years because of the surge in blog posts, movie and restaurant reviews, etc. created and shared by Internet users, and because of the new applications enabled by understanding the sentiments embedded in that content. For example, extracting the sentiment of a review can provide succinct summaries to readers and can be very useful for automatically generating recommendations for users. Sentiment mainly refers to feelings, emotions, opinions or attitudes. Business owners and advertising companies often employ sentiment analysis to start new business strategies and advertising campaigns. Sentiment analysis, also known as opinion mining, involves building a system to gather and examine opinions about products expressed in blog posts, comments, reviews or tweets.

The data are processed by the algorithm and passed to the classifier. Sentiment analysis can be used in online commerce, where websites allow users to record their experience of shopping and of product quality. These records yield detailed summaries and scores or ratings, making it easy for customers to select recommended products based on their interests. Machine learning algorithms are very helpful for predicting whether a particular document has positive or negative sentiment. Machine learning algorithms fall into two types: supervised and unsupervised. A supervised learning algorithm uses a labelled dataset in which each document of the training set is labelled with the appropriate sentiment, whereas unsupervised learning uses an unlabelled dataset in which the text is not labelled with sentiments.

In this paper, the comparison covers two widely used supervised learning techniques on labelled datasets, i.e., support vector machine and extreme learning machine, in order to evaluate and compare their speed. SVM is known to be accurate, but at the cost of high computational complexity, particularly in the learning phase, which makes it less convenient for hardware-oriented applications. Given training samples each labelled as belonging to one of two classes, the SVM training algorithm builds a model that assigns new samples to one class or the other, making it a non-probabilistic binary linear classifier. SVM and ELM are compared in terms of accuracy, number of hidden neurons / support vectors, and speed.

The remaining sections are organized as follows: Section 2 presents related work in the domain of sentiment analysis; Section 3 gives a comparative review of approaches to sentiment analysis; Section 4 gives an overview of our proposed methodology; and Section 5 presents the conclusion and future work.

II. LITERATURE REVIEW

O'Connor et al. (2010) connect measures of opinion from polls with sentiment measured from text. The authors analyze several surveys on consumer confidence and political opinion over the 2008 to 2009 period, and find that they correlate with sentiment word frequencies in contemporaneous Twitter messages. Huang et al. (2015) present the extreme learning machine as an efficient, unified solution for generalized feed-forward neural networks. Unlike ANNs, however, ELM cannot be easily parallelized, due to the presence of a pseudo-inverse calculation, as noted by Xin et al. (2014). Therefore, a reliable method has been proposed to realize a parallel implementation of ELM that can be applied to the large datasets typical of big data problems. He et al. (2013) give an example of a parallel ELM implementation for regression based on the MapReduce framework, while Huang et al. (2016) provide a parallel ensemble method for an online sequential ELM variant.

Li and Hovy (2014) propose a semi-supervised bootstrapping algorithm for analyzing China's foreign relations from the People's Daily. The approach addresses sentiment target clustering, subjective lexicon extraction and sentiment prediction in a unified framework. Unlike existing algorithms in the literature, it takes time information into account through a hierarchical Bayesian model that guides the bootstrapping.

Dang et al. (2010) built a set of sentiment words from a sentiment lexicon using a method called lexicon-enhanced, and used these words as a new feature. The experiments used three feature sets: sentiment words, content-specific features and content-free features. The evaluation was performed using 10-fold cross validation on a dataset of reviews of DVDs, books, digital cameras, electronics and kitchen appliances. The highest overall accuracy, 84.15%, was obtained for kitchen appliances. The experiments show that combining the three feature sets (F1, F2, F3) gives higher accuracy than any individual feature set.
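Dang et al. evaluate their features with 10-fold cross validation. The following is a minimal sketch of how such folds can be formed, assuming samples are identified by their index and shuffled once with a fixed seed; the function name `k_fold_splits` is hypothetical and is not taken from the cited work.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # one reproducible shuffle before splitting
    # Deal the shuffled indices round-robin into k folds (sizes differ by at most one).
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        test = sorted(folds[f])
        # Every sample outside fold f is used for training in this round.
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, test
```

Each of the k rounds trains a classifier on k-1 folds and tests it on the held-out fold, so every sample is tested exactly once; the reported accuracy is then the mean over the k rounds.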
National Conference on “Digital Transformation – Challenges and Outcomes” (ASAT in CS'17) organized by Department of
Computer Science, St.Anne's First Grade College For Women, Bangalore on 3rd Mar 2017 63 | P a g e
Celikyilmaz et al. (2010) presented a method for normalizing noisy tweets and classifying them according to polarity. They collected 2 million tweets related to mobile operation from September 2009 to June 2010 using the Twitter search API. To generate sentiment words, they employed a mixture-model approach and calculated the F-score of each word; words with an F-score greater than 10% were selected as raw words. As future work, they suggested a framework to learn the lexicon from the collected tweets, so that words such as luv, lovwww and love can be represented as one entity, "love".

Martin et al. (2016) describe SemEval-2013 Task 2: Sentiment Analysis in Twitter, which includes two subtasks: A, an expression-level subtask, and B, a message-level subtask. The authors used crowdsourcing on Amazon Mechanical Turk to label a large Twitter training dataset, along with additional test sets of Twitter and SMS messages, for both subtasks. All datasets used in the evaluation are freely available to the research community.

Xia et al. (2015) propose a model called dual sentiment analysis (DSA) to address the polarity shift problem in sentiment classification. They first propose a novel data expansion technique that creates a sentiment-reversed review for every training and test review. On this basis, they propose a dual training algorithm that uses original and reversed training reviews in pairs to learn a sentiment classifier, and a dual prediction algorithm that classifies test reviews by considering two sides of one review. They also extend the DSA framework from polarity (positive-negative) classification to 3-class (positive-negative-neutral) classification by taking neutral reviews into consideration.

III. APPROACHES TO OPINION MINING
A. Support Vector Machines

Support Vector Machines are highly effective in many research and application domains, including text categorization, where they have been shown to outperform Naive Bayes and maximum entropy classifiers. The work of Zainuddin and Selamat (2014) is considered here. The basic idea behind SVMs is to find a separating hyper-plane with the largest margin in a given higher-dimensional feature space; the search for this hyper-plane corresponds to a constrained optimization problem. SVM is a discriminative classifier that is widely used for text classification, and the approach is based on the principle of structural risk minimization. Training separates the data points into two classes by means of a decision criterion, or decision surface, learned from the training data. The pseudo code for SVM is shown in Figure 2.

1: Initialization:
   α = 0, g = Gα − 1, J = g, d = 1, where α, g, J, d ∈ ℝ^N.
2: while min_{i=1,...,N} J_i < −ε:
      Update J, with J_i = g_i d_i.
      Obtain the minimum of J_i, c = arg min_{i=1,...,N} J_i, and update the corresponding Lagrange variable α_c.
      Update g, d.
   end while

Figure 2: Algorithm for SVM

The decision is based on the support vectors selected from the training set. Among the different variants of SVM, multiclass SVM is used for sentiment analysis. The centroid classification algorithm first calculates the centroid vector of every training class. Then the similarities between a document and all the centroids are calculated, and the document is assigned to the class whose centroid is most similar. The aim of this experiment is to improve SVM on the benchmark datasets of Pang and Lee (2004) and Taboada et al. (2006). The framework consists of preprocessing, feature extraction, feature selection and classification stages.

B. Extreme Learning Machine

Oneto et al. (2016) proposed an efficient implementation of ELMs on Spark, to exploit the benefits of the Spark framework in the context of big social data analysis. In particular, they proposed and evaluated an approach to support emotion recognition and polarity detection in natural language text. An ELM's output layer can be considered a linear system whose output weights can be computed through a simple generalized inverse operation.

The simple, efficient procedure to train an ELM involves three steps: randomly generate the hidden node parameters, compute the activation matrix, and compute the output weights. The approach was tested on two affective analogical reasoning benchmarks, each composed of 21,743 common-sense concepts; each concept is represented according to the AffectiveSpace model (Cambria, Gastaldo et al., 2015) and the AffectiveSpace 2 model (Cambria et al., 2015). Figure 3 shows the ELM algorithm: initially, the set of training data is obtained; then the kernel matrix is formed.

candidateSV = {closest pair from opposite classes}
while there are violating points do
    Find a violator c
    candidateSV = candidateSV ∪ {c}
    if any α_p < 0 due to addition of c to S then
        candidateSV = candidateSV \ {p}
        repeat till all such points are pruned
    end if
end while

Figure 3: Algorithm for ELM
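The three ELM training steps described in the ELM subsection (randomly generate the hidden node parameters, compute the activation matrix, compute the output weights through a generalized inverse) can be sketched with NumPy as follows. This is a minimal illustration, not the Spark implementation of Oneto et al. and not the kernel-matrix variant of Figure 3: the sigmoid activation, Gaussian initialisation, ±1 label encoding, toy data and the helper names `sigmoid`, `elm_train` and `elm_predict` are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation used for the hidden layer (an assumed choice)."""
    return 1.0 / (1.0 + np.exp(-z))


def elm_train(X, T, n_hidden, seed=0):
    """Fit a single-hidden-layer ELM and return (W, b, beta)."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly generate the hidden-node parameters; they are never tuned.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    # Step 2: compute the hidden-layer activation matrix H (one row per sample).
    H = sigmoid(X @ W + b)
    # Step 3: output weights from the Moore-Penrose generalized inverse, beta = pinv(H) T.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Linear output layer applied to the random hidden features."""
    return sigmoid(X @ W + b) @ beta


# Toy usage: 20 random 5-dimensional "documents", labels +1 (positive) / -1 (negative).
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
T = np.where(X[:, 0] > 0, 1.0, -1.0)
W, b, beta = elm_train(X, T, n_hidden=50)
pred = np.sign(elm_predict(X, W, b, beta))
```

Only step 3 involves any fitting, and it is a single linear solve, which is why ELM training is usually described as fast. With more hidden nodes than training samples, H almost surely has full row rank, so the generalized inverse reproduces the training labels exactly; in practice fewer hidden nodes, or a regularised inverse, would be used to limit overfitting.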

Table 1: Comparison Results of Opinion Mining

Approach | Dataset | Accuracy on negative class (%) | Accuracy on positive class (%) | Training time (minutes) | Testing time (minutes) | Error (%)
SVM | Product Review dataset | 67.35 | 84.2 | 17.02 | 16.15 | 2.36
ELM | Analogical reasoning datasets | 75.21 | 96.3 | 14.65 | 15.03 | 3.53
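Table 1 reports accuracy separately for the negative and the positive class. Assuming that per-class accuracy means the percentage of a class's test samples that are classified correctly (that is, the recall of that class), it can be computed as in the sketch below; `per_class_accuracy` is a hypothetical helper, and the labels are toy data rather than the datasets of Table 1.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Return {class: percentage of that class's samples predicted correctly}."""
    totals = Counter(y_true)  # number of test samples per true class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # hits per class
    return {c: 100.0 * correct[c] / totals[c] for c in totals}


# Toy usage: two positive and three negative test reviews.
y_true = ["pos", "pos", "neg", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "neg", "pos"]
scores = per_class_accuracy(y_true, y_pred)  # pos: 50.0, neg: about 66.67
```

Reporting each class separately shows when a classifier favours one polarity, which a single overall accuracy figure can hide on unbalanced review data.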

The experiments are performed based on the reviewed approaches, and the accuracy achieved in the sentiment classification task is shown in Table 1. The table reports the accuracy of unigram features for the positive class as well as for the negative class. To improve accuracy, a preprocessing unit is added before the data are split for training and testing. The samples in the dataset should be preprocessed before any other operation is performed; the main purpose of this step is to reduce the feature set and improve classification performance. The maximum accuracy (96.3%) is obtained for the positive class with ELM, which achieves the best opinion mining performance on informal text. To make opinion mining effective, it is necessary to concentrate on the factors that affect its performance.

CONCLUSION AND FUTURE WORK

A comparative study between two types of learning methods was carried out. Several advantages of SVM were analyzed: for example, no randomness is involved in the training algorithm, so the result does not change when the algorithm is run multiple times. SVM also supports active learning, an optimization method for controlling model growth and reducing model build time. One drawback of SVM is its longer testing time. In ELM, the training time is higher, but the testing time remains lower, and ELM also achieves high accuracy. Since both SVM and ELM belong to the class of kernel networks, further research will consider a similar comparison using other types of kernels.

References

[1]. Oneto, L., Bisio, F., Cambria, E., & Anguita, D. (2016). Statistical learning theory and ELM for big social data analysis. IEEE Computational Intelligence Magazine, 11(3), 45-55.
[2]. Poria, S., Cambria, E., Winterstein, G., & Huang, G. B. (2014). Sentic patterns: Dependency-based rules for concept-level sentiment analysis. Knowledge-Based Systems, 69, 45-63.
[3]. Zainuddin, N., & Selamat, A. (2014, September). Sentiment analysis using support vector machine. In Computer, Communications, and Control Technology (I4CT), 2014 International Conference on (pp. 333-337). IEEE.
[4]. Poria, S., Cambria, E., Gelbukh, A., Bisio, F., & Hussain, A. (2015). Sentiment data flow analysis by means of dynamic linguistic patterns. IEEE Computational Intelligence Magazine, 10(4), 26-36.
[5]. O'Connor, B., Balasubramanyan, R., Routledge, B. R., & Smith, N. A. (2010). From tweets to polls: Linking text sentiment to public opinion time series. ICWSM, 11(122-129), 1-2.
[6]. Huang, G., Huang, G. B., Song, S., & You, K. (2015). Trends in extreme learning machines: a review. Neural Networks, 61, 32-48.
[7]. Xin, J., Wang, Z., Chen, C., Ding, L., Wang, G., & Zhao, Y. (2014). ELM∗: distributed extreme learning machine with MapReduce. World Wide Web, 17(5), 1189-1204.
[8]. Li, J., & Hovy, E. H. (2014, October). Sentiment Analysis on the People's Daily. In EMNLP (pp. 467-476).
[9]. He, Q., Shang, T., Zhuang, F., & Shi, Z. (2013). Parallel extreme learning machine for regression based on MapReduce. Neurocomputing, 102, 52-58.
[10]. Huang, S., Wang, B., Qiu, J., Yao, J., Wang, G., & Yu, G. (2016). Parallel ensemble of online sequential extreme learning machine based on MapReduce. Neurocomputing, 174, 352-367.
[11]. Dang, Y., Zhang, Y., & Chen, H. (2010). A lexicon-enhanced method for sentiment classification: An experiment on online product reviews. IEEE Intelligent Systems, 25(4), 46-53.
[12]. Celikyilmaz, A., Hakkani-Tür, D., & Feng, J. (2010, December). Probabilistic model-based sentiment analysis of twitter messages. In Spoken Language Technology Workshop (SLT), 2010 IEEE (pp. 79-84). IEEE.
[13]. Martin, V. M. A., David, K., & Bhuvaneswari, R. (2016). A Survey on Various Techniques for Sentiment Analysis and Opinion Mining. Data Mining and Knowledge Engineering, 8(3), 78-82.
[14]. Pang, B., & Lee, L. (2004, July). A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd annual meeting on Association for Computational Linguistics (p. 271). Association for Computational Linguistics.
[15]. Taboada, M., Anthony, C., & Voll, K. (2006, May). Methods for creating semantic orientation dictionaries. In Proceedings of the 5th Conference on Language Resources and Evaluation (LREC’06) (pp. 427-432).
[16]. Cambria, E., Gastaldo, P., Bisio, F., & Zunino, R. (2015). An ELM-based model for affective analogical reasoning. Neurocomputing, 149, 443-455.
[17]. Cambria, E., Fu, J., Bisio, F., & Poria, S. (2015, January). AffectiveSpace 2: Enabling Affective Intuition for Concept-Level Sentiment Analysis. In AAAI (pp. 508-514).
[18]. Xia, R., Xu, F., Zong, C., Li, Q., Qi, Y., & Li, T. (2015). Dual sentiment analysis: Considering two sides of one review. IEEE Transactions on Knowledge and Data Engineering, 27(8), 2120-2133.
