
2010 Ninth International Conference on Machine Learning and Applications

Semi-Automatic WordNet Based Emotion Dictionary Construction

David B. Bracewell
General Electric Global Research
One Research Circle
Niskayuna, NY 12309
david.bracewell@ge.com

This research was performed while the author was a student at The University of Tokushima.

Abstract—This paper describes an algorithm for semi-automatically creating an emotion dictionary using WordNet. The algorithm takes as input a set of seed words that have had emotion information assigned as well as WordNet sense information. From this list an initial dictionary is automatically created using the various relations found within WordNet. Then, various correction stages are performed in which parts of the dictionary are shown to the user for verification and additional information in the form of emotion polarity and probability is assigned. Using the proposed algorithm with a set of 549 seed words, an emotion dictionary containing over 13,000 WordNet senses was created in just under 7 person-hours. To evaluate the created dictionary, its usefulness in improving the performance of affect and sentiment classification was examined. Classification was performed using support vector machines and a baseline non-machine-learning dictionary-based algorithm. The results showed that the error rate is reduced when using the dictionary compared to when not using it.

I. INTRODUCTION

Emotions play an important role in guiding the decisions we make and determining the language we use. They also act as the basis of human relationships [5]. Besides its importance in human-computer interaction, emotion also plays a role in language understanding. Following an idea similar to that of Cowie et al. [3], language is made up of an explicit and an implicit message. Understanding the explicit message is the goal of natural language processing and semantic analysis. The implicit message gives the intent of the explicit message and is often given through emotion. In order to truly comprehend language, both messages must be understood.

Researchers have made great strides in understanding the explicit message, i.e. semantic analysis of language. However, until recently there has been little interest in understanding the implicit meaning or emotion of language in text. In order to achieve the next step of comprehension in language, recognizing and understanding emotion, as well as how it affects the meaning of language, is crucial. To accomplish this goal, natural language processing and affective computing, two seemingly disconnected areas of artificial intelligence, must be brought together. This union has slowly been taking place over the past few years as researchers make the first steps in trying to understand the affect (emotion) and sentiment (opinion) of text.

Language processing requires a vast amount of resources, such as dictionaries, corpora, ontologies, etc. The natural language processing community has over the years created these resources for understanding the explicit message, such as WordNet [7] and the Penn Treebank [10]. However, due to the field's infancy, only scarce resources are available for dealing with emotion or opinion in text. To compound the scarcity problem, there is no agreed upon theory of emotion or even of which emotions are basic. This limits the usefulness of large-scale manual creation because of its limited applicability, and it causes much of the manually created resources to be researcher/research specific.

In order to ignite more research in this area, the amount of resources must grow and semi-automatic ways of creating resources must be discovered. This paper attacks the problem of resource scarcity by proposing an algorithm, which has some improvements over our previous work [1], for semi-automatically creating emotion dictionaries using WordNet (version 2.1 was used). The algorithm takes a list of seed words and then automatically creates an initial dictionary. After the initial creation, a series of corrective steps are done that require minimal human interaction. The algorithm is not tied to any one theory of emotion, which allows researchers to create emotion dictionaries that suit their needs.

This paper continues as follows. First, in section II background information on affect analysis and semi-automatic and automatic resource construction is given. Then, in section III how to define the seed words is discussed. In section IV how emotional inheritance in seed words is carried out is explained. Next, in section V the structure of the dictionary is discussed. Then, in section VI the proposed algorithm is described. After that, section VII gives the details of an emotion dictionary constructed with the proposed algorithm. Then, section VIII gives an evaluation of the constructed dictionary. Finally, section IX gives future work and makes concluding remarks.

II. BACKGROUND

Affect and sentiment analysis are subfields of affective computing dealing with the classification of emotion and opinion respectively. Both have been explored for many different emotion-expressing mediums, such as facial expressions [12] and speech [14]. Textual affect and sentiment analysis, which this research aims to help, combines affective computing with natural language processing to determine the emotion and opinion of text, such as [15].

As with other text-based research, textual affect and sentiment analysis requires specialized resources like dictionaries, corpora and ontologies. However, because of its infancy, only sparse resources are available. Much of the problem comes from the fact that there is no standardized emotion theory or standardized set of basic emotions. Which emotions are the most basic and why has been debated amongst philosophers, psychologists, and cognitive scientists since the time of the ancient Greeks.

This scarcity of resources has led researchers to manually create the needed resources, such as [2]. In fact, much of the research done in this field uses manually annotated resources designed for the researcher's current project or painstakingly modified to fit into the researcher's theory of emotion. These resources are often time consuming to create and difficult, if not impossible, to extend or change.

One common approach is to use humans to annotate or determine the emotion of words. Cowie et al. created BEEV (Basic English Emotion Vocabulary) using the responses of human participants [4]. Their emotion vocabulary is created using a human-centric approach and as such should describe emotion well. However, there is a large creation cost and an equally large update cost. If researchers needed to add an extra emotion, the creation method used by Cowie et al. may be prohibitive. Sugimoto et al. had 15 human subjects annotate the emotional categories of verbs and adjectives found in a standard dictionary [16].

Some researchers have seen the need to create more automatic methods of creation. Kanayama and Nasukawa created an unsupervised approach using context coherency for extracting domain-dependent polar terms from unannotated corpora that achieved 94% precision [9]. Others have annotated well-known knowledge bases, such as Esuli and Sebastiani, who assigned positive, negative, and objective scores to all of the senses in WordNet [6].

III. DEFINING SEED WORDS

The user-given seed words include the word, its part-of-speech, WordNet sense id and emotion category. When creating the initial dictionary from the seed words, a number of the WordNet relations are used. However, if an "*" is put in front of the word in the seed word list, the algorithm will not use any of the relations and will only add that word to the dictionary. This is to allow users to add individual WordNet senses to the dictionary.
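For illustration, one such seed entry might be parsed as in the following sketch; the whitespace-delimited layout, the SeedWord container and the example line are assumptions, since the paper does not fix a concrete file format.

```python
from dataclasses import dataclass

@dataclass
class SeedWord:
    word: str            # surface form, e.g. "abhorrence"
    pos: str             # part-of-speech, e.g. "n"
    sense_id: str        # WordNet sense identifier, e.g. "abhorrence.n.01"
    emotion: str         # user-defined emotion category, e.g. "disgust"
    expand: bool = True  # False when the entry was prefixed with "*"

def parse_seed_line(line: str) -> SeedWord:
    """Parse one seed entry; a leading '*' disables relation expansion."""
    word, pos, sense_id, emotion = line.split()
    return SeedWord(word.lstrip("*"), pos, sense_id, emotion,
                    expand=not word.startswith("*"))

# Hypothetical seed line: this sense is added as-is, without expansion.
seed = parse_seed_line("*abhorrence n abhorrence.n.01 disgust")
assert seed.expand is False and seed.word == "abhorrence"
```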

IV. EMOTION INHERITANCE

Inheritance of emotion in the creation process comes about through WordNet's synset, hypernym and derivationally related form relations. This means that, for each seed word, the words in its synset, its full hypernym tree and any related forms will inherit (be assigned) the emotion of the seed word. In this fashion, child senses inherit the emotion of their parents and members of a synset can inherit from one another.

Normally this is desirable; however, in certain situations the WordNet hierarchy may cause non-desirable inheritance. An example of this is shown in figure 1. In this example, the parent is "pleasantness, sweetness" and would generally have a positive emotion. Based on the idea of inheritance, all of its children would be assigned its emotion, which results in all the siblings having the same emotion assigned. This means that "disagreeableness" and "agreeableness" would be assigned the same emotion. However, these two senses are more accurately described as antonyms and should not be assigned the same emotion.

Fig. 1. Example of Problematic Inheritance

To counteract this problem, inheritance is done using the more specific seed word. What this means is that the user can define the emotion of a parent and its child. The emotion assigned to the child will override any inheritance from the parent or siblings. Looking back at figure 1 as an example, the user has pleasantness as a seed word assigned with emotion e1. In order to fix the problems in WordNet, the user also defines disagreeableness as a seed word with emotion e2. All the children of pleasantness inherit the e1 emotion except for disagreeableness, because it was defined as a seed word. The children of disagreeableness will inherit the e2 emotion and not the e1 emotion. In this way the correct emotions can be assigned to the word senses.
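A minimal sketch of this override rule (not the author's implementation) is shown below: a seed's emotion is propagated down its hyponym tree, and a deeper seed later overwrites whatever its own subtree inherited. It assumes the NLTK WordNet interface (WordNet 3.0 rather than the 2.1 release used in the paper) and, for brevity, ignores the synset and derivationally related form relations.

```python
from nltk.corpus import wordnet as wn

def propagate(seed_emotions):
    """seed_emotions: {synset name: emotion}, e.g.
    {'pleasantness.n.01': 'e1', 'disagreeableness.n.01': 'e2'}."""
    assigned = {}
    # Expand shallower (more general) seeds first so that a more specific
    # seed, processed later, overrides anything inherited from an ancestor.
    for name in sorted(seed_emotions, key=lambda n: wn.synset(n).min_depth()):
        root = wn.synset(name)
        for syn in [root] + list(root.closure(lambda s: s.hyponyms())):
            assigned[syn.name()] = seed_emotions[name]
    return assigned

emotions = propagate({'pleasantness.n.01': 'e1', 'disagreeableness.n.01': 'e2'})
# 'disagreeableness.n.01' and its hyponyms now carry 'e2', not an inherited 'e1'.
```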
V. EMOTION DICTIONARY STRUCTURE

The created dictionary is in XML format and is made up of information that is useful for both affect and sentiment analysis. The information within can be used for both shallow surface-level approaches and more involved natural language processing based approaches. Each entry in the dictionary contains the word, part-of-speech, WordNet sense, polarity, emotion category, probability of the word being an emotion, probability of the word being the given emotion, and WordNet gloss.
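The paper does not reproduce the XML schema, so the sketch below only illustrates what an entry carrying the fields listed above could look like; the element and attribute names and the numeric values are placeholders, apart from the two probability labels, which are named in section VI-E.

```python
import xml.etree.ElementTree as ET

# A hypothetical dictionary entry; the probability values are placeholders.
entry = ET.Element("entry", {
    "word": "apprehension",
    "pos": "noun",
    "sense": "apprehension.n.01",
    "polarity": "-",
    "emotion": "fear",
    "emotionProbability": "0.75",       # P(word carries emotion)
    "givenEmotionProbability": "0.33",  # P(word carries this emotion)
})
ET.SubElement(entry, "gloss").text = "fearful expectation or anticipation"
print(ET.tostring(entry, encoding="unicode"))
```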

VI. EMOTION DICTIONARY CREATION ALGORITHM

The proposed algorithm is designed to semi-automatically create emotion dictionaries from WordNet. The algorithm automates most of the process and only requires human intervention for creating a set of initial seed words and for manual verification. The algorithm is broken down into six steps: Initial Creation, Multiple Category Verification, Category Outlier Correction, Polarity Assignment, Polarity Outlier Correction and Probability Calculation.

The algorithm starts with an automated initial creation process that takes a set of user-defined seed words. Using the various relations found within WordNet, an initial emotion dictionary is constructed. After this, a series of correction/verification steps are performed. These steps start by verifying the words that have multiple emotions assigned. Then, words with possibly miscategorized emotions are determined and given to the user to verify or correct. The next two steps assign polarity to each of the words and then determine possible outliers. Finally, the probability that each word is an emotion and the probability that each word is its assigned emotion are determined and the final dictionary is created. The following sections explain the steps of the algorithm in more detail.

A. Initial Creation

The first step of the proposed algorithm creates the initial dictionary from the given seed words. To create the dictionary, WordNet's synset, hyponym, and derivationally related forms relations are used. This step is fully automatic and constitutes the largest part in creating the emotion dictionary.

First, the user-defined seed words are read in and parsed. Then, each of the seed words has its synset retrieved. The seed word and the words in its synset are then added to an extended seed word list. Additionally, the seed and the words in its synset are also added to a protected word list. The protected word list is used to keep the second part of the initial creation step from assigning extra emotions to the seed word. This is done to ensure correct emotion inheritance as described earlier.

The second part takes the new extended seed word list and uses WordNet's relations to further expand the dictionary. For each of the seed words, its synset, hyponyms and derivationally related forms are extracted. Each of the acquired words is first checked for in the protected list. The words that are not protected have the seed's emotion assigned to them and are added to the initial dictionary if they are not already in it. Each of the words in the synset and the hyponym list is also added to the extended seed word list. This allows the full hyponym tree to be extracted as well as additional related forms.

This step is fully automated and creates a decent, but possibly noisy, dictionary. Of course, this step, like the algorithm, is not limited to just emotions. It is possible to use this step for extracting hierarchies from WordNet that could be useful in information retrieval and extraction.
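A condensed sketch of this two-part expansion is given below. It assumes the seeds are supplied directly as WordNet synset names and uses NLTK's bundled WordNet 3.0 rather than the 2.1 data used in the paper; the protected list here simply holds the seed senses themselves.

```python
from nltk.corpus import wordnet as wn

def initial_dictionary(seeds):
    """seeds: {synset name: emotion}. Returns {synset name: emotion}."""
    dictionary = dict(seeds)
    protected = set(seeds)                    # seed senses keep their emotion
    agenda = [(wn.synset(n), e) for n, e in seeds.items()]
    while agenda:
        syn, emotion = agenda.pop()
        related = list(syn.hyponyms())        # walks the full hyponym tree
        related += [l.synset() for lem in syn.lemmas()
                    for l in lem.derivationally_related_forms()]
        for rel in related:
            name = rel.name()
            if name in protected or name in dictionary:
                continue                      # never re-label a protected or known sense
            dictionary[name] = emotion
            agenda.append((rel, emotion))     # expand the newly added sense in turn
    return dictionary

d = initial_dictionary({'joy.n.01': 'joy', 'fear.n.01': 'fear'})
```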
B. Multiple Category Verification

Typically, word senses should only carry and be assigned one type of emotion. However, depending on the set of emotion categories and the theory of emotion being used, certain senses may actually carry multiple emotions. For example, WordNet defines the first sense of the noun form of "apprehension" as "fearful expectation or anticipation"¹. If there were a "fear" and an "anticipation" category, then this sense should have both emotion categories assigned to it. Multiple category verification allows the user to examine the word senses that were assigned multiple emotions and verify that they should have multiple emotions assigned, or correct them. Errors most typically will happen due to the WordNet hierarchy not being the same as the user's emotion theory. Errors can also happen if the user does not define seed words to take advantage of or correct errors in emotion inheritance.

¹ http://wordnet.princeton.edu/perl/webwn?s=apprehension

Verification is done by finding the word senses in the dictionary with multiple emotion categories assigned and displaying them with extra information to the user for verification/correction. The extra information shown includes part-of-speech, WordNet sense id and WordNet gloss. Additionally, the set of words that, through one of the relations, could have caused the word to be added to the dictionary is displayed. The set of word senses that have multiple emotions assigned is displayed in hypernym/hyponym hierarchical order to further help the user.

The user then selects the word senses and emotions that should be deleted. When the algorithm deletes the word sense/emotion pair from the dictionary, it can also delete the emotion from the synset and hyponym tree if the user desires. This allows for a more minimal effort on the user's part.
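A sketch of how such candidates might be gathered and shown to the user, assuming the working dictionary maps WordNet synset names to the set of emotions assigned so far; ordering by depth approximates the hypernym/hyponym ordering described above.

```python
from nltk.corpus import wordnet as wn

def verification_candidates(dictionary):
    """List senses with more than one emotion, shallower senses first, with gloss."""
    multi = [name for name, emotions in dictionary.items() if len(emotions) > 1]
    multi.sort(key=lambda n: wn.synset(n).min_depth())  # hypernyms before hyponyms
    for name in multi:
        syn = wn.synset(name)
        print(f"{name} [{syn.pos()}] {sorted(dictionary[name])}: {syn.definition()}")
    return multi
```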
C. Category Outlier Correction

It is a strong possibility that the differences in motivations behind the creation of WordNet and the user's emotion theory will cause the WordNet hierarchy to introduce errors when assigning emotion categories automatically. Because of this, the category outlier correction step is performed. It tries to determine possibly miscategorized word senses and allows the user to correct them.

Determining possibly miscategorized word senses is done using a clustering approach. Before the simple clustering can take place, the word senses in the dictionary must be converted into feature vectors. The vectors contain a feature for each emotion category defined by the user. Additionally, the feature vectors are labeled with the emotion of the word senses that created them.

The values of the features are calculated using the emotions assigned to the word sense and its hypernym, hyponyms and derivationally related forms. This is done by first initializing the values of the features that correspond to the word sense's emotions to 1 and all other values to 0. Then, the word senses extracted from the hypernym, hyponyms and derivationally related forms relations are examined one by one. Each sense that is in the emotion dictionary will cause the feature values corresponding to the emotions assigned to the sense to be incremented by 1. Finally, the feature values are normalized using equation 1.

\mathrm{Normalized}(f_i) = \frac{f_i}{\sum_{j=1}^{N} f_j} \qquad (1)

After the feature vectors are created, they are clustered by their labeled emotion category. Then for each emotion category (cluster) a mean vector is created to represent the centroid. The calculation of the mean vector is shown in equation 2.

\bar{C}_i = \left\langle \frac{\sum_{j=1}^{|c_i|} c_{ij}[f_1]}{|c_i|}, \frac{\sum_{j=1}^{|c_i|} c_{ij}[f_2]}{|c_i|}, \cdots, \frac{\sum_{j=1}^{|c_i|} c_{ij}[f_N]}{|c_i|} \right\rangle \qquad (2)

After the mean vectors are calculated for each of the emotion categories, outliers can be detected. This is done by determining which of the emotion categories' mean vectors is closest to each of the senses in the dictionary. The simple Euclidean distance, equation 3, is used to calculate the distance between the mean vectors and the senses' feature vectors. The emotion category whose mean vector has the shortest distance is presumed to be the true emotion category of the sense. If the sense's assigned emotion categories do not include this emotion, then it will be given to the user so that they can determine if the emotion category should be changed or not.

\mathrm{Distance}(v, c) = \sqrt{\sum_{i=1}^{N} (v_i - c_i)^2} \qquad (3)
D. Polarity Assignment and Outlier Correction

The next step in the dictionary creation process is to assign polarity information. This is done using a user-supplied emotion-polarity mapping. In addition to the mapping, a set of precedence rules can be defined to determine which polarity will be assigned to senses with multiple emotion categories.

After the initial polarities are assigned, possible outliers can be determined. This process is done in the same way as category outlier correction except that polarity categories are used instead of emotion categories for features. This means that the number of features is equal to the number of polarity categories and that each category has a mean vector generated. The distance between the vector of each sense and the mean vectors is computed using the Euclidean distance, and the polarity category with the shortest distance is chosen. The senses whose initially assigned polarity does not match the closest polarity category are displayed to the user for possible correction.

E. Probability Calculation

The final step in creating a dictionary is to assign probability information. Currently, two probabilities are assigned to each word. The first probability is the probability that the word carries emotion and is labeled as "emotionProbability" in the dictionary. The second probability is the probability that the word carries the assigned emotion and is labeled as "givenEmotionProbability" in the dictionary.

Both probabilities are calculated using the emotion dictionary and information about the word and its senses in WordNet. Equation 4 shows how to calculate the probability that a word carries emotion and equation 5 shows how to calculate the probability that a word carries the given emotion. In the equations, E stands for the set of emotion categories and W = \{s_1, s_2, \cdots, s_n\} is the set of senses for a word.

P(E \mid W) = \frac{\mathrm{Count}(s_i \in W \text{ that carry emotion})}{|W|} \qquad (4)

P(e_{given} \mid W) = \frac{\mathrm{Count}(s_i \in W \text{ that carry } e_{given})}{\mathrm{Count}(s_i \in W \text{ that carry emotion})} \qquad (5)

Each of the word's senses is taken to be equally probable, which may or may not be the case in the real use of the word. However, the probabilities should aid in using the dictionary in more shallow approaches, where part-of-speech and sense information may not be used. After the two probabilities are calculated for each of the words in the dictionary, the final dictionary can be created in XML format.

VII. EXAMPLE DICTIONARY CONSTRUCTION

This section details the process of creating an emotion dictionary using the proposed algorithm. First, the basic emotions that were chosen are discussed. Then, statistical information on the seed words is given. Finally, information from the results of each step of the proposed algorithm is discussed.

A. Basic Emotions

The first step in building an emotion dictionary is deciding on the basic emotion categories. As has been stated before, there is no one agreed-upon theory of what the basic emotions should be. This is the reason for needing semi-automatic or automatic dictionary creation methods, since the categories often vary between research projects to suit the needs of the application.

Anger (-)      Disgust (-)       Sadness (-)
Shame (-)      Fear (-)          Apathy (-)
Surprise (?)   Anticipation (?)  Agnostic (?)
Calmness (+)   Joy (+)           Love (+)
Desire (+)     Courage (+)       Praise (+)

Fig. 2. The 15 Basic Emotion Categories with Their Polarities

The proposed algorithm is capable of creating emotion dictionaries that are in accordance with any theory of emotion. In this paper a set of basic emotion categories that draw heavily from the ideas of Parrot [13] has been chosen. These categories represent one theory of emotion, which we believe to be beneficial to computer-related tasks. Figure 2 shows the 15 basic categories and some of their subcategories. The first 6 basic emotions have a polarity of "-" and the last 6 have a "+" polarity. Surprise, Anticipation and Agnostic have a specially defined polarity value of "?" denoting that the category itself carries no polarity. The words in these categories have their polarity decided on an individual basis, and it can possibly change in different contexts.

B. Seed Words

A total of 549 seed words (1,088 senses) were taken from Parrot's hierarchy and by looking in the WordNet hierarchy. In total, 502 of the senses were nouns, 274 were verbs, 287 were adjectives and 25 were adverbs.

C. Using the Algorithm

The initial creation step resulted in a dictionary containing a total of 13,509 senses made up of 9,726 unique words. A total of 364 word senses were assigned multiple emotions and presented for verification/correction during the multiple category verification step. Of those, 289 sense/emotion pairs were deleted from the dictionary.

The next step, category outlier correction, discovered 260 possible outliers. Of those, 92 were determined to be actual outliers and were corrected. Using the polarities assigned to the basic emotions, as shown previously, and a simple precedence of − > + > ?, polarities were automatically assigned to the dictionary². Then, polarity outlier correction was performed and discovered 115 possible outliers. After manual examination, 43 were determined to be real outliers and were corrected.

The final step assigned probability information and created the final XML-formatted dictionary. The final dictionary contained a total of 13,166 senses made up of 9,709 words³. The total time to create the dictionary was close to 7 person-hours. In comparison, the manually tagged word-level emotion corpus used in the evaluation section took close to 1,200 person-hours to complete.

² "− > +" denotes that negative polarity has higher precedence than positive.
³ The dictionary falls under the GPL and is available upon request.
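As a sketch of the polarity assignment described in section VI-D, the mapping of figure 2 and the − > + > ? precedence used here could be applied as follows; the function and table names are illustrative rather than the author's code.

```python
EMOTION_POLARITY = {
    "anger": "-", "disgust": "-", "sadness": "-", "shame": "-", "fear": "-",
    "apathy": "-", "surprise": "?", "anticipation": "?", "agnostic": "?",
    "calmness": "+", "joy": "+", "love": "+", "desire": "+", "courage": "+",
    "praise": "+",
}
PRECEDENCE = ["-", "+", "?"]   # negative outranks positive, positive outranks "?"

def assign_polarity(emotions):
    """Choose one polarity for a sense that may carry several emotion categories."""
    polarities = {EMOTION_POLARITY[e] for e in emotions}
    for p in PRECEDENCE:
        if p in polarities:
            return p
    return "?"

assert assign_polarity({"joy", "fear"}) == "-"   # "-" wins over "+"
```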

VIII. EVALUATION

To evaluate the dictionary, we ran an experiment to see if we could improve the performance of sentence-level affect and sentiment classification. Typically, one would expect this to be the case, but poor-quality dictionaries can actually hurt performance [11]. As such, if the dictionary is able to improve performance, it is at least a small indication that the dictionary might be beneficial. With this said, however, state-of-the-art classification is not the goal; the evaluation is only to examine the possible usefulness of the emotion dictionary.

Classification was done using support vector machines and a non-machine-learning dictionary-based algorithm. A set of 1,000 English sentences was manually annotated with emotion and polarity information by ourselves, and all tests were done using 10-fold cross-validation. The emotion and polarity of a sentence was most often annotated as the emotion or polarity of the subject of the sentence.

SVM^{multiclass} [8] was used as the support vector machine framework. Four different support vector machines were trained, which were called SVM Standard, SVM Standard (Pruned), SVM Dictionary (Polarity) and SVM Dictionary (Emotion). SVM Standard and SVM Standard (Pruned) did not use the emotion dictionary for classification. Instead, they used the words found in the training data as binary features, which is a standard text classification approach. SVM Standard used all of the words in the training data and SVM Standard (Pruned) used the maximum relevance criterion to prune the words down to only the most important ones.

SVM Dictionary (Polarity) and SVM Dictionary (Emotion) used the emotion dictionary with the user-defined polarity and emotion categories as features respectively. The values for the features were calculated using the probability information found in the emotion dictionary.

This was done by looking for emotion words and phrases in the sentence. Finding the emotion words and phrases was done by examining each word in the sentence, starting at the first. To detect phrases, n-grams were extracted with the first word of the n-gram being the word currently being examined. A 5-gram, 4-gram, trigram, bigram and the word by itself went through WordNet's morphological analysis algorithm and were then looked up in the dictionary. The longest n-gram found in the dictionary was taken as the emotion word/phrase and the process continued starting at the next word after the n-gram. The feature values were then calculated using equation 6 for SVM Dictionary (Polarity) and equation 7 for SVM Dictionary (Emotion). In the equations, p_f represents the feature's polarity category, E represents the set of emotion categories, e_f the feature's emotion category, and w_i the i-th word/phrase that carries the feature's polarity or emotion.

\mathrm{FValue}(p_f) = \sum_{w_i \in p_f} P(E \mid w_i) \qquad (6)

\mathrm{FValue}(e_f) = \sum_{w_i \in e_f} P(E \mid w_i) \cdot P(e_f \mid w_i) \qquad (7)
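The longest-match scan and the feature values of equations 6 and 7 might be implemented roughly as in the sketch below. The dictionary is assumed to be a flat word/phrase table holding a polarity, an emotion and the two probabilities (the keys p_emotion and p_given are invented names), and WordNet's morphological analysis is approximated with NLTK's wn.morphy.

```python
from nltk.corpus import wordnet as wn

def emotion_phrases(tokens, dictionary, max_n=5):
    """Greedy longest-match scan: try the 5-gram down to the single word."""
    i, found = 0, []
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            key = wn.morphy(phrase) or phrase    # fall back to the raw phrase
            if key in dictionary:
                found.append(key)
                i += n                           # continue after the matched n-gram
                break
        else:
            i += 1                               # no match at this position
    return found

def feature_values(phrases, dictionary, categories):
    """Eq. 6 (polarity features) and Eq. 7 (emotion features) in one pass."""
    polarity = {p: 0.0 for p in ("+", "-", "?")}
    emotion = {c: 0.0 for c in categories}
    for w in phrases:
        entry = dictionary[w]
        polarity[entry["polarity"]] += entry["p_emotion"]
        emotion[entry["emotion"]] += entry["p_emotion"] * entry["p_given"]
    return polarity, emotion
```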

The non-machine-learning dictionary-based algorithm determines the polarity or emotion simply based on which one has the largest value. The values for each polarity or emotion were calculated in the same way as the feature values used in SVM Dictionary (Polarity) and SVM Dictionary (Emotion) respectively. For sentiment classification, only "+" and "-" polarities were used for determining the sentence's polarity. If the values of "+" and "-" were equal, then the chosen polarity was "?".

TABLE I
SENTIMENT CLASSIFICATION RESULTS

Method                      Average Error   Std. Deviation
SVM Standard                32.1%           1.9%
SVM Standard (Pruned)       31.1%           3.1%
Baseline Dictionary         25.7%           5.2%
SVM Dictionary (Polarity)   18.8%           2.6%
SVM Dictionary (Emotion)    19.0%           2.1%

Table I shows the average error rate and the standard deviation for sentiment classification. What can be seen is that all of the dictionary-based approaches greatly reduced the average error over the non-dictionary-based approaches. The lowest error rate was obtained by SVM Dictionary (Polarity), closely followed by SVM Dictionary (Emotion). The non-machine-learning dictionary-based algorithm (Baseline Dictionary) was even able to reduce the average error by almost 6% from SVM Standard (Pruned).

TABLE II
AFFECT CLASSIFICATION RESULTS

Method                      Average Error   Std. Deviation
SVM Standard                62.1%           3.7%
SVM Standard (Pruned)       60.4%           2.1%
Baseline Dictionary         43.6%           0.6%
SVM Dictionary (Polarity)   89.8%           7.4%
SVM Dictionary (Emotion)    44.5%           3.0%

Table II shows the average error rate and the standard deviation for affect classification. As one would expect, because the number of classes increased, the average error rate for affect classification is much higher. The best results were obtained from the non-machine-learning dictionary-based algorithm (Baseline Dictionary) and SVM Dictionary (Emotion). Surprisingly, the baseline dictionary approach, which only picked the emotion with the highest value, achieved the lowest average error and the lowest standard deviation. The results show that the dictionary can reduce the average error for both sentiment and affect classification for this corpus, and we believe that similar results would be found in other corpora. The reduction of average error indicates that the dictionary could be useful in classification and other research.

In both affect and sentiment classification the dictionary was able to improve performance by decreasing the average error over the standard text-classification-based support vector machines. Further reductions in error are possible with a more intelligent system that can understand modifiers, such as "really" and "wholeheartedly," and negatives like "not."

IX. CONCLUSION AND FUTURE WORK

This paper examined an algorithm for the semi-automatic creation of emotion dictionaries using WordNet. The dictionaries contain emotion, polarity, probability, part-of-speech, and sense information. They are useful for both affect and sentiment analysis and can be used in both shallow surface-level and deeper NLP-based algorithms.

Using the proposed algorithm, a sample emotion dictionary was created. The emotion theory behind the dictionary encompassed 15 different basic emotion categories. The final dictionary contained 13,166 word senses made up of 9,709 unique words.

The dictionary was evaluated by examining its usefulness in affect and sentiment classification. Classification using the dictionary, even with the non-machine-learning based approach, greatly outperformed the standard method. This leads us to believe that the dictionary might indeed be useful in future research.

In the future, we hope to expand the dictionary by adding more seed words to the system. We are also thinking about augmenting the system with some type of machine translation and manual checking in order to create a bilingual or multilingual emotion dictionary. Finally, the method was designed for emotion dictionary creation, but the ideas used in the algorithm are not limited to this. As such, we also hope to look at modifying the method to semi-automatically create domain-dependent dictionaries.

REFERENCES

[1] D. Bracewell, "Semi-automatic creation of an emotion dictionary using WordNet and its evaluation," in 2008 IEEE Conference on Cybernetics and Intelligent Systems, Sep. 2008, pp. 1385-1389.
[2] Y. H. Cho and K. J. Lee, "Automatic affect recognition using natural language processing techniques and manually built affect lexicon," IEICE Transactions on Information and Systems, vol. E89-D, no. 12, pp. 2964-2971, 2006.
[3] R. Cowie, E. Douglas-Cowie, N. Tsapatsoulis, G. Votsis, S. Kollias, W. Fellenz, and J. Taylor, "Emotion recognition in human-computer interaction," IEEE Signal Processing Magazine, pp. 32-80, 2001.
[4] R. Cowie, E. Douglas-Cowie, B. Apolloni, J. Taylor, A. Romano, and W. Fellenz, What a neural net needs to know about emotion words. World Scientific & Engineering Society Press, 1999.
[5] R. J. Dolan, "Emotion, cognition, and behavior," Science, vol. 298, no. 8, pp. 1191-1194, November 2002.
[6] A. Esuli and F. Sebastiani, "SentiWordNet: A publicly available lexical resource for opinion mining," in Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006, pp. 22-28.
[7] C. Fellbaum, Ed., WordNet: An Electronic Lexical Database. The MIT Press, 1998.
[8] T. Joachims, Making Large-Scale Support Vector Machine Learning Practical. Cambridge, MA, USA: MIT Press, 1999.
[9] H. Kanayama and T. Nasukawa, "Fully automatic lexicon expansion for domain-oriented sentiment analysis," in Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006), 2006, pp. 355-363.
[10] M. P. Marcus, M. A. Marcinkiewicz, and B. Santorini, "Building a large annotated corpus of English: the Penn Treebank," Computational Linguistics, vol. 19, no. 2, pp. 313-330, 1993.
[11] P. McNamee and J. Mayfield, "Comparing cross-language query expansion techniques by degrading translation resources," in SIGIR '02: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY, USA: ACM Press, 2002, pp. 159-166.
[12] C. Padgett and G. Cottrell, "Identifying emotion in static face images," in Proceedings of the 2nd Joint Symposium on Neural Computation, vol. 5, 1995, pp. 91-101.
[13] W. Parrot, Emotions in Social Psychology. Psychology Press, 2001.
[14] D. Roy and A. Pentland, "Automatic spoken affect classification and analysis," in FG '96: Proceedings of the 2nd International Conference on Automatic Face and Gesture Recognition. Washington, DC, USA: IEEE Computer Society, 1996, p. 363.
[15] P. Subasic and A. Huettner, "Affect analysis of text using fuzzy semantic typing," IEEE Transactions on Fuzzy Systems, vol. 9, no. 4, pp. 483-496, August 2001.
[16] F. Sugimoto, K. Yazu, M. Murakami, and M. Yoneyama, "A method to classify emotional expressions of text and synthesize speech," in First International Symposium on Control, Communications and Signal Processing, 2004, pp. 611-614.

