Sunteți pe pagina 1din 9

International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976

6480(Print), ISSN 0976 6499(Online) Volume 5, Issue 6, June (2014), pp. 53-61 IAEME
53











ASPECT BASED SENTIMENT ANALYSIS OF MOVIE REVIEWS


Mitisha Vaidya
1
, Priyank Thakkar
2


Nirma University, Ahmedabad, 382481, Gujarat, India



ABSTRACT

Aspect based Sentiment Analysis identifies users sentiment towards particular aspect of an
entity. In aspect based sentiment analysis, aspect and sentiment word extraction and sentiment
polarity identification are two important tasks. In this paper, Seeded Aspect and Sentiment (SAS)
topic model is extended using part of speech (POS) tagging for aspect and sentiment word extraction.
Two approaches of SentiWordNet for sentiment polarity identification are also studied in the paper.

Keywords: Aspect, Aspect Extraction, Sentiment Analysis, Sentiwordnet, Topic Modeling.

I. INTRODUCTION

Aspect based sentiment analysis investigates what precisely individuals likes or dislikes.
Document level and sentence level sentiment analysis would not be able to identify users opinion
towards particular aspect of an entity. Document level analysis represents general opinion of users
towards an entity. Sentence level analysis represents users opinion sentence by sentence. So, for
reviewing any entity accurately, aspect based sentiment analysis is more preferable.
In aspect based sentiment analysis, aspect and sentiment word extraction separates aspects
that have been assessed [5]. For instance, in the sentence, The voice quality of this phone is
amazing, the aspect is voice quality of the entity this phone. Here, this phone does not show
the aspect GENERAL, in light of the fact that the assessment is not about the phone in general, but
just about its voice quality. On the other hand, the sentence I love this phone. assesses the phone
all in all, i.e., the GENERAL aspect of the entity this phone.
Sentiment polarity identification figures out if the opinions on different aspects are positive,
negative, or neutral [5]. In the first illustration over, the opinion on the voice quality aspect is
positive. In the second, the opinion on the aspect GENERAL is also positive.



INTERNATIONAL JOURNAL OF ADVANCED RESEARCH
IN ENGINEERING AND TECHNOLOGY (IJARET)


ISSN 0976 - 6480 (Print)
ISSN 0976 - 6499 (Online)
Volume 5, Issue 6, June (2014), pp. 53-61
IAEME: http://www.iaeme.com/IJARET.asp
Journal Impact Factor (2014): 7.8273 (Calculated by GISI)
www.jifactor.com

IJARET
I A E M E
International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976
6480(Print), ISSN 0976 6499(Online) Volume 5, Issue 6, June (2014), pp. 53-61 IAEME
54

II. RELATED WORK

In aspect and sentiment word extraction, mainly three techniques are used.First technique is
aspect extraction based on frequent nouns and noun phrases [12]. Second technique is aspect
extraction by exploiting opinion and target relations [4] and the third technique is aspect extraction
using topic modelling [7]. Aspect extraction using topic modelling as discussed in [7] combined
features of the first two techniques. In topic modelling, the synonymous aspects must be grouped
into the same class. To address this issue, a different setting was presented in [7], where the user
gave some seed words for a few aspect class and the model extracted and grouped aspect terms into
class at the same time. This setting was paramount on the grounds that arranging aspects was a
subjective task. For different application proposed, different arrangements may be required. Some
form of user direction is sought. The principle task focused in [7] was to extract the aspects and
group them. Notwithstanding, the models could additionally extract aspect specific sentiment word.
In sentiment polarity identification, two primary approaches are used. First technique is
Lexicon based approach [4],[9] and second technique is supervised learning approach [3]. In this
paper, the Lexicon based approach is used as described in [9]. In [9], SentiWordNet is used to
determine aspects sentiment polarity. This was done for all the sentences in a review and
subsequently for all reviews of a movie. The scores for a particular aspect from all the reviews of a
movie were aggregated to obtain an opinionated analysis of that aspect. The sentiment analysis
around aspects thus first located an opinionated content about an aspect in a review and then used the
SentiWordNet based approach to compute its sentiment polarity. This paper examines two methods
of SentiWordNet. First method is Adjective + Adverb Combine denoted as SWN(AAC)[9] and the
second method is Adjective + Adverb combine with Adverb +Verb combine denoted as
SWN(AAAVC)[9].

III. SAS MODEL [7] WITH POS TAGS


Figure 1: SAS model [7] with POS tags
International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976
6480(Print), ISSN 0976 6499(Online) Volume 5, Issue 6, June (2014), pp. 53-61 IAEME
55

ME-SAS [7] used a maximum entropy method to generate priors for aspect's and sentiment's
part-of-speech tag. In this paper Stanford-POS-Tagger[10] is used for the same purpose. As this
tagger tags the words in the sentences using Maximum Entropy, proposed model does not require to
calculate maximum entropy for part-of-speech tags separately. As shown in Figure 1,
,
is
computed at the same level as in ME-SAS. But passing parameters are hyper parameter and words
generated by part-of-speech tagging denoted as pos in this study. The equations used to set priors
are same as in SAS-model.
The entries in the vocabulary is denoted by

, where V is the number of unique non-seed


terms.

is used to signify seed sets, where each seed set

is a group of semantically related


terms. T aspects and T aspect specific sentiment models are denoted by

respectively.
Aspect specific distribution of seeds in the seed set Q

is represented by
,
. In this study, it is
assumed that a review sentence usually talks about one aspect. A review document d
..
comprises of
S

sentences and each sentence s in S

has N
,
words. The sentence s of document d is represented
by

. To distinguish between aspect and sentiment terms, an indicator (switch) variable

,,
, for the

term of

,
,,
is used. Further, let
,
mean the distribution of
aspects and sentiments in

. Different priors are calculated from the Equations (1), (2) and (3).
This equations are same as used in SAS model [7].

,

,
,
,
,
,
,
,

,,


,
,


,,
,


,,
,

,,,
,

, ,

, ,

,,

,
,
,
,
,
,
,
,
,
,
,,

,
,,

,
,,

,
,,

,
,,


,
,,

,,

,
,
,
,
,
,
,
,
,
,
,,

,,
,,
,

,,
,,
,
|

,
,,

,
,,


,
,,

,
,,
,

,
,

,
,,

,
,,


,
,,

; ,




where

is the multinomial Beta function. Number of times term v assigned to


aspect t as an opinion/sentiment word is denoted as
,

.Number of times non-seed term v in


Vassigned to aspect t as an aspect is signified by
,
,
. Number of times seed term v in

assigned to
aspect t as an aspect is represented as
,,
,
.
,

is the number of sentences in document d that were


assigned to aspect t. designate The number of terms in

that were assigned to aspects and


International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976
6480(Print), ISSN 0976 6499(Online) Volume 5, Issue 6, June (2014), pp. 53-61 IAEME
56

opinions are designated as
,

and
,

respectively. Number of times any term of seed set


assigned to aspect t is represented as
,

. Omission of a latter index denoted by [] in the above


notation represents the corresponding row vector spanning over the latter index. For example,

,
,

,
,

,
,
and denotes the marginalized sum over the latter index. Counts excluding
assignments of all terms in

is denoted by the subscript , . Counts excluding


,,
is
represented by , , . Hierarchical sampling is performed in this paper. For each sentence
,
,
first, an aspect is sampled using Equation (1). Once the aspect is sampled,
,,
is computed. In
,,
,
the probability of
,,
being an opinion or sentiment term,
,,
is given by Equation (2).
However, for
,,
, there are two cases: (i) the observed term
,,

or
(ii) does not belong to any seed set, ,

i.e., w is an non-seed term. These cases are dealt in


Equation (3).

IV. SentiWordNet

After extracting aspect and sentiment words for each sentence in a document, for sentiment
polarity identification two approaches are implemented. In SWN(AAC), Adjective or Adjective +
Adverb combine words are extracted from the sentences, which contain aspects. Polarities to these
words are assigned by SentiWordNet using following algorithm [9]. Here, scaling factor (sf) for
adverb is taken 0.35 as suggested in [9]. Adjective is represented by adj and adverb is represented by
adv.

Algorithm 1: SWN(AAC) [9]
For each sentence, extract adv+adj combines.
For each extracted adv+adj combine do:
If adj score=0, ignore it.
If adv is affirmative, then
o If score(adj)>0

(adv,adj)=
min(1,score(adj)+sf*score(adv))
o If score(adj)<0

(adv,adj)=
min(1,score(adj)-sf*score(adv))
If adv is negative, then
o If score(adj)>0

(adv,adj)=
max(-1,score(adj)+sf*score(adv))
o If score(adj)<0

(adv,adj)=
max(-1,score(adj)-sf*score(adv))


In SWN(AAAVC), Adverb + verb patterns are combined with Adjective + Adverb. Here
Adverb + Verb are multiplied with different weight factors from 0.1 to 1 as suggested in [9]. In
this implementation, best result is obtained when weight factor is set to 1.




International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976
6480(Print), ISSN 0976 6499(Online) Volume 5, Issue 6, June (2014), pp. 53-61 IAEME
57


Algorithm 2: SWN(AAAVC) [9]
For each sentence, extract adv+adj and adv+verb combines.
1. For each extracted adv+adj combine do:
If adj score=0, ignore it.
If adv is affirmative, then
o If score(adj)>0

(adv,adj)=
min(1,score(adj)+sf*score(adv))
o If score(adj)<0

(adv,adj)=
min(1,score(adj)-sf*score(adv))
If adv is negative, then
o If score(adj)>0

(adv,adj)=
max(-1,score(adj)+sf*score(adv))
o If score(adj)<0

(adv,adj)=
max(-1,score(adj)-sf*score(adv))
2. For each extracted adv+verb combine do:
If verb score=0, ignore it.
If adv is affirmative, then
o If score(verb)>0

(adv,verb)=
min(1,score(verb)+sf*score(adv))
o If score(verb)<0

(adv, verb)=
min(1,score(verb)-sf*score(adv))

If adv is negative, then
o If score(verb)>0

(adv, verb)=
max(-1,score(verb)+sf*score(adv))
o If score(verb)<0

(adv, verb)=
max(-1,score(verb)-sf*score(adv))
3.

(sentence)=
f(adv,adj)+1*f(adv,verb)


IV. EXPERIMENTAL EVALUATION

DataSet
In all the experiments carried out, benchmark dataset AC1IMDB [6] is used. For aspect and
sentiment word extraction seeds are manually created using different film awards, movie review sites
and film magazines. This dataset contains 50,000 movie reviews from www.imdb.com. From that,
25,000 movie reviews are negative and 25,000 movie reviews are positive.

International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976
6480(Print), ISSN 0976 6499(Online) Volume 5, Issue 6, June (2014), pp. 53-61 IAEME
58

Evaluation Measures
Accuracy and f-measure are used to evaluate the performance. Accuracy is defined as the
ratio of the correctly identified polarities of reviews divided by total reviews. In this paper, user
liking a movie is considered as positive review while user disliking a movie is considered as negative
review. In this sense, true positive (TP), false negative (FN), false positive (FP) and true negative
(TN) are defined as under [13].

TP: the number of correctly identified positive reviews
FN: the number of incorrectly identified of the negative reviews
FP: the number of incorrectly identified of the positive reviews
TN: the number of correctly identified of the negative reviews
Based on the above interpretations precision () and recall () are defined in equations (4) and (5)
respectively.


5
F-measure (F) is used to compare classifier on a single measure and it is represented by the
equation (6)

2

6

Experimental Methodology, Results and Discussions
First, pre-processing of the dataset was done using stop-words excluding negative words i.e.
not, isnt, doesnt. Words that appeared less than five times in corpus are removed. The seeds for
aspects were manually made from various film awards sites, film magazines and film review sites.
After pre-processing the dataset, SAS model with pos tags is applied on dataset to extract
aspect and aspect specific sentiment words. SWN(AAC) and SWN(AAAVC) schemes are used to
assign sentiment scores for sentiment words extracted by SAS model. After identifying scores of the
sentiment words assigned to the aspects appearing in the review, final score of the review is
computed by aggregating the scores of these sentiment words. If score > 0, review is considered
positive else negative. Computed polarity is then matched with actual polarity to compute accuracy
and f-measure.

Table 1: Comparison of SentiWordNet schemes with computed sentiment polarity
Scheme Actual
Computed
(In Comparison to Actual)
SWN(AAC)
Positive 25000 21736
Negative 25000 17774
SWN(AAAVC)
Positive 25000 23002
Negative 25000 19422


Table 1 represents the total number of correctly identified reviews by two SentiWordNet
schemes with actual number of reviews. From this result, it can be seen that SWN(AAAVC)
provides better result than SWN(AAC). Table 2 shows correctly classified polarities for both the
schemes in terms of percentage.



International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976
6480(Print), ISSN 0976 6499(Online) Volume 5, Issue 6, June (2014), pp. 53-61 IAEME
59

Table 2: Percentage of correctly classified polarity by two schemes
Scheme Correctly Classified Polarity (%)
SWN(AAC)
Positive 86.94%
Negative 71.10%
SWN(AAAVC)
Positive 92%
Negative 77.69%

Table 3: Accuracy and f-measure
Scheme Performance Measure Value
SWN(AAC)
Accuracy 70.02%
F-measure 78.89%
SWN(AAAVC)
Accuracy 84.85%
F-measure 84.77%

As shown in Table 3, accuracy of 84.85% is achieved for the task of sentiment polarity
identification by SWN (AAAVC) schemeof SentiWordNet..Figure 2 depicts the impact of different
amount of fraction of verb score (weight factor) on the accuracy for the SWN(AAAVC) scheme. It
can be seen that best accuracy is achieved when the weight factor is set to 1.


Figure 2: Impact of weight factors on accuracy

Using aspect level sentiment analysis, detailed review profile of a movie can be represented.
Figure 3 shows review profile of a movie with majority positive reviews while Figure 4 depicts the
same for a movie with majority negative reviews.

International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976
6480(Print), ISSN 0976 6499(Online) Volume 5, Issue 6, June (2014), pp. 53-61 IAEME
60


Figure 3: Review Profile of a movie with majority positive reviews



Figure 4: Review Profile of a movie with majority negative reviews
International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976
6480(Print), ISSN 0976 6499(Online) Volume 5, Issue 6, June (2014), pp. 53-61 IAEME
61

V. CONCLUSIONS & FUTURE WORK

This paper focuses on identifying polarity/sentiment of reviews about the product/items. To
identify the sentiment, first, aspects and sentiment words are extracted using SAS model with POS
tagging. Using two schemes of SentiWordNet, sentiment scores of the sentiment words related to the
aspects appearing in the review are found. After identifying scores of the sentiment words assigned
to the aspects appearing in the review, final score of the review is computed by aggregating the
scores of these sentiment words. It is evident from the result that SWN(AAAVC) scheme gives
better result than SWN(AAC) scheme. One potential direction for the future work can be the
experimentation on other data sets of the same domain as well as different domain than the movie
reviews.

REFERENCES

[1] http://www.tripadvisor.com.
[2] SentiWordNet, available at http://www.sentiwordnet.isti.cnr.it.
[3] Murthy Ganapathibhotla, South Morgan Street, Bing Liu, and South Morgan Street. Mining
opinions in comparative sentences. In International Conference on Compu-tational Linguistics
(Coling-2008), 2008.
[4] Minqing Hu, Bing Liu, and South Morgan Street. Mining and summarizing customer reviews. In
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
(KDD-2004), 2004.
[5] Bing Liu. Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers,May 2012.
[6] Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher
Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting
of the Association for Computational Linguistics:Human Language Technologies, pages
142{150, Portland, Oregon, USA, June 2011.Association for Computational Linguistics.
[7] Arjun Mukherjee and Bing Liu. Aspect extraction through semi-supervised modeling. In ACL,
2012.
[8] Bo Pang, Lillian Lee, Harry Rd, and San Jose. Sentiment classi_cation using machine learning
techniques. In Conference on Empirical Methods in Natural LanguageProcessing (EMNLP-
2002), pages 79-86, July 2002.
[9] V K Singh, R Piryani, and A Uddin. Sentiment analysis of movie reviews. In IEEE explore, 2013.
[10] Kristina Toutanova and Christopher D. Manning. 2000. Enriching the knowledge sources used in
a maximum entropy part of-speech tagger. In Joint SIGDAT Conference on Empirical Methods,
2000.
[11] Bruce Wiebe and O'Hara. Development and use of a gold-standard data set for subjectivity
classification. In Association for Computational Linguistics, 1999.
[12] L. Zhang and B. Liu. Identifying noun product features that imply opinions. In ACL (short
paper), 2011.
[13] J. P. Jiawei Han, MichelineKamber, Data Mining Concepts and Techniques, Morgan
Kaufmann, 3 Edition, July 2011.
[14] Ronak Patel, Priyank Thakkar and K Kotecha, Enhancing Movie Recommender System,
International Journal of Advanced Research in Engineering & Technology (IJARET), Volume 5,
Issue 1, 2014, pp. 73 - 82, ISSN Print: 0976-6480, ISSN Online: 0976-6499.
[15] R. Manickam, D. Boominath and V. Bhuvaneswari, An Analysis of Data Mining: Past,
Present and Future, International Journal of Computer Engineering & Technology (IJCET),
Volume 3, Issue 1, 2012, pp. 1 - 9, ISSN Print: 0976 6367, ISSN Online: 0976 6375.
[16] Dr. Jamshed Siddiqui, An Overview of Opinion Mining Techniques, International Journal of
Advanced Research in Engineering & Technology (IJARET), Volume 4, Issue 7, 2013,
pp. 176 - 182, ISSN Print: 0976-6480, ISSN Online: 0976-6499.

S-ar putea să vă placă și