
Opinion Mining in MapReduce Framework

Kyung Soo Cho, Ji Yeon Lim, Jae Yeol Yoon, Young Hee Kim,
Seung Kwan Kim, and Ung Mo Kim
School of Information and Communication Engineering SungKyunKwan University,
2nd Engineering Building 27039 CheonCheon-Dong, JangAn-Gu,
Suwon 440-746, Republic of Korea
kisschks@hotmail.com, {01039374479,vntlffl}@naver.com,
younghees@gmail.com, libertas@korea.kr, umkim@ece.skku.ac.kr

Abstract. Many research fields are now being combined with one another, and some
problems in computer science cannot be solved by computer science techniques alone.
Opinion mining sometimes needs solutions from other fields, too. For example, we
use a method from psychology to gain information about users from text. Likewise,
we previously suggested a method of opinion mining that uses MapReduce together
with a WordMap, a dictionary-like structure that holds only the category and value
of each word. Applying this method could make opinion mining on the web more
powerful than before. Therefore, for stronger opinion mining, we suggest a
framework for opinion mining in MapReduce.
Keywords: Framework, Opinion mining, WordMap, POS tagging, MapReduce.

1 Introduction
Opinion mining and Semantic Web techniques are fascinating domains for search
engines. Of the two, opinion mining is a mining technique that extracts evaluations
from the internet, analyzes them, and outputs the results. These results are useful
in many areas, such as marketing and product reviews. Nonetheless, current methods
are inefficient and take too much time on huge data because they run on a single
node. Cloud computing, which is the center of attention as the next computing
environment, is appropriate for solving this problem. MapReduce, one of the cloud
computing methods, is already in use at Google. Therefore, this paper suggests an
Opinion Mining in MapReduce framework as a novel attempt at a design for a cloud
computing environment, and we expect the framework to show reasonable performance.
A developer who has a particular goal and performance expectations can use this
framework when building opinion mining tools in MapReduce.
This paper is organized as follows. Section 2 explains opinion mining techniques
and the existing representative research related to the framework. Section 3
presents the framework for opinion mining in MapReduce. Section 4 concludes the
paper and describes our future work.
C. Lee et al. (Eds.): STA 2011 Workshops, CCIS 187, pp. 50-55, 2011.
© Springer-Verlag Berlin Heidelberg 2011

2 Related Work
2.1 Opinion Mining Methods
The study of opinion mining has been growing gradually since the late 1990s. Also
known as sentiment classification, opinion mining focuses not on the topic itself
but on a user's attitude toward that topic. In recent years, opinion mining has
been applied to product reviews and other commercial content. [1] W.Y. Kim et al.
suggest a method for opinion mining of product reviews using association rules. [2]
The opinion mining field also includes feature-based opinion mining, summarization,
comparative sentence and relation mining, opinion search, opinion spam, and the
definition and construction of linguistic resources. [3] [4]
In sentiment classification, reading and analyzing text produces a result of the
form <word | value>, which is similar to the data structure of MapReduce [5].
Sentiment classification is therefore likely to match MapReduce well. In addition,
analysis rules such as the POS tagging technique [6] [7], as well as dictionary
information, are usable, too.

Fig. 1. Example of sentiment classification

Figure 1 shows sentiment classification, which is a simple concept. It classifies
the sentiment of a portion of a document set as Positive or Negative. If a blog
user writes "Hyundai is good", it calculates a positive result for Hyundai.
Topic-associated words are considered important in topic-based classification;
however, they are insignificant in sentiment classification. Recent research on
sentiment classification is mainly performed at the document level, which can find
detailed attributes. Sentence-level studies are also being done. B. Pang and L. Lee
introduce ways of discovering sentiment in mining. [8] S.M. Kim and E. Hovy suggest
a way of recognizing opinions and the sentiment of each opinion on a given topic
using sentence-level sentiment classification. [9] They describe how opinions are
categorized by the technique of POS tagging (part-of-speech tagging). Some
researchers have focused on comments with emoticons. For instance, M. Potthast and
S. Becker give a method of opinion summarization of web comments [10], and J. Read
introduces a relationship between emoticons and sentiment classification and
presents emoticon-trained classifiers. [11] An opinion holder is a person or a
group who expresses an opinion in the analyzed resource. Consideration of the
opinion holder is central in opinion mining. Thus, S.M. Kim and E. Hovy also
present a technique for mining opinions generated by an opinion holder on topics in
online news media texts. [12] Along with sentiment classification research, methods
of weighting sentiment information have also been studied. We previously suggested
leader weight and a method of using LIWC. [13]
2.2 MapReduce
Google suggested MapReduce for analyzing large data, and they use it with their
Bigtable [14]. MapReduce is simple and powerful for huge data on the scale of
terabytes or petabytes, and it can be customized efficiently for each system. For
these reasons, MapReduce has received attention from many researchers.

Fig. 2. MapReduce

Figure 2 shows how MapReduce works. A master node controls all of the worker nodes,
which are called Map nodes or Reduce nodes. Map nodes produce intermediate data
with the structure <key, value>, and Reduce nodes collect the intermediate data and
transform <key, value> into <key, list of values>. This is only the general scheme,
and the details change for each system.
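The data flow described above can be illustrated with a minimal single-process
sketch. It is not a distributed implementation and not the code of any real
MapReduce system; the function names (map_phase, shuffle, reduce_phase,
word_mapper, sum_reducer) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Run the mapper on every record and collect intermediate <key, value> pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group intermediate pairs by key: <key, value> becomes <key, list of values>."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped, reducer):
    """Run the reducer once per key on the list of values collected for that key."""
    return {key: reducer(key, values) for key, values in grouped.items()}


def word_mapper(line):
    """Hypothetical mapper: emit <word, 1> for every word in a line of text."""
    return [(word.lower(), 1) for word in line.split()]


def sum_reducer(key, values):
    """Hypothetical reducer: add up all values collected for one key."""
    return sum(values)


lines = ["Hyundai is good", "Hyundai is fast"]
grouped = shuffle(map_phase(lines, word_mapper))
counts = reduce_phase(grouped, sum_reducer)
print(grouped["hyundai"])  # the list of values collected for one key
print(counts)
```

In a real cluster the map and reduce calls would run on separate worker nodes
under the control of the master node; here they run sequentially so that only
the <key, value> to <key, list of values> transformation is visible.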
Some researchers in the mining field have paid attention to this function. They
consider that it will make mining methods stronger. Kelvin Cardona, Jimmy Secretan,
Michael Georgiopoulos, and Georgios Anagnostopoulos suggest a grid-based system
for data mining using MapReduce. [15] In addition, M.A. Bayir, I.H. Toroslu,
A. Cosar, and G. Fidan suggest Smart Miner, a new framework for mining large-scale
web usage data. [16] They suggest novel methods using data mining and MapReduce.
T. Xia suggests SMS mining with MapReduce in "Large-scale sms messages mining
based on map-reduce", presented at the International Symposium on Computational
Intelligence and Design. [17] In these works, the performance of mining methods
improves because of MapReduce. This means that MapReduce is suitable for mining
techniques, and it is also applicable to opinion mining. We previously suggested a
method of using WordMap and score-based weight in opinion mining with
MapReduce. [18]

3 Opinion Mining in MapReduce Framework


3.1 WordMap
Our earlier paper, "Using WordMap and Score-based Weight in Opinion mining with
MapReduce" [18], gives the structure of WordMap. It is multidimensionally indexed
dictionary data that can be used flexibly in any system. A developer can use this
concept in his or her own system and change its elements, such as the meaning or
value of words. It can also use an additional weight policy. For example, LIWC or
leader weight can be used in connection with WordMap. There are two ways to do
this: the first is for the WordMap to include the supplementary weight information,
and the second is to link the weight information externally. Including the weight
information is faster than linking it externally; however, adding weight
information to a completed WordMap requires a lot of time and data space.
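The two ways of attaching weight information can be sketched as follows. This
is only an illustration under stated assumptions: the paper says that a WordMap
entry holds the category and value of a word, but the names WordEntry, WORD_MAP,
EXTERNAL_WEIGHTS, and weighted_value, the sample words and numbers, and the flat
dictionary layout (instead of a multidimensional index) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WordEntry:
    """One WordMap entry: the category and value of a word (hypothetical layout)."""
    category: str
    value: float
    weight: Optional[float] = None  # first way: weight stored inside the WordMap


# Hypothetical sample data; a real WordMap would be built from a dictionary resource.
WORD_MAP = {
    "good": WordEntry("positive", 1.0, weight=1.5),  # embedded weight
    "bad": WordEntry("negative", -1.0),              # no embedded weight
}

# Second way: weight information kept outside the WordMap and linked by word.
EXTERNAL_WEIGHTS = {"bad": 2.0}


def weighted_value(word, word_map=WORD_MAP, external=EXTERNAL_WEIGHTS):
    """Return value * weight, preferring an embedded weight over an external one."""
    key = word.lower()
    entry = word_map.get(key)
    if entry is None:
        return 0.0  # words that are not in the WordMap carry no sentiment
    if entry.weight is not None:
        weight = entry.weight
    else:
        weight = external.get(key, 1.0)  # default weight when nothing is linked
    return entry.value * weight


print(weighted_value("good"), weighted_value("Bad"), weighted_value("car"))
```

The embedded weight needs one lookup, while the external weight needs a second
lookup but can be changed without rebuilding the WordMap, which mirrors the
trade-off described above.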
3.2 RuleBox
RuleBox is the part of the Opinion Mining in MapReduce framework that analyzes text
and classifies sentiment. It can be connected to additional components such as the
POS tagging technique. It evaluates a sentence or document using the WordMap part.
A system developer can always choose and customize reasonable rules in RuleBox to
suit his or her system; however, RuleBox must have one or more rules, such as a
natural language processing method. The POS tagging that we mentioned is a
representative natural language processing method.
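One possible shape for such a component is sketched below. The paper does not
specify concrete rules, so everything here is an assumption made for
illustration: the RuleBox class interface, the two rules (adjective_rule and
negation_rule), and the toy lookup tables TOY_TAGS and TOY_VALUES, which stand
in for a real POS tagger and a real WordMap.

```python
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (word, POS tag)
Rule = Callable[[List[Token], List[float]], List[float]]

# Toy stand-ins for a POS tagger and a WordMap (hypothetical data).
TOY_TAGS = {"good": "ADJ", "bad": "ADJ", "not": "NEG", "is": "VERB", "hyundai": "NOUN"}
TOY_VALUES = {"good": 1.0, "bad": -1.0}


def tag(sentence: str) -> List[Token]:
    """Tag each word by table lookup; unknown words get the tag OTHER."""
    return [(w, TOY_TAGS.get(w, "OTHER")) for w in sentence.lower().split()]


def adjective_rule(tokens: List[Token], scores: List[float]) -> List[float]:
    """Keep scores only for words tagged as adjectives."""
    return [s if t[1] == "ADJ" else 0.0 for t, s in zip(tokens, scores)]


def negation_rule(tokens: List[Token], scores: List[float]) -> List[float]:
    """Flip the score of a word that directly follows a negation."""
    out = list(scores)
    for i in range(1, len(tokens)):
        if tokens[i - 1][1] == "NEG":
            out[i] = -out[i]
    return out


class RuleBox:
    """Applies a customizable, non-empty list of rules in order."""

    def __init__(self, rules: List[Rule]):
        if not rules:
            raise ValueError("RuleBox needs at least one rule")
        self.rules = list(rules)

    def classify(self, sentence: str) -> str:
        tokens = tag(sentence)
        scores = [TOY_VALUES.get(w, 0.0) for w, _ in tokens]
        for rule in self.rules:
            scores = rule(tokens, scores)
        total = sum(scores)
        if total > 0:
            return "positive"
        if total < 0:
            return "negative"
        return "neutral"


box = RuleBox([adjective_rule, negation_rule])
print(box.classify("Hyundai is good"))
print(box.classify("Hyundai is not good"))
```

Keeping each rule as a plain function makes it easy for a developer to add,
remove, or reorder rules, while the constructor check reflects the requirement
that at least one rule is present.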
3.3 Framework

Fig. 3. Opinion mining in MapReduce framework

Figure 3 shows the Opinion Mining in MapReduce framework. This framework has
three parts: MapReduce, WordMap, and RuleBox. The MapReduce part is the membrane
of the framework, the RuleBox part is the brain, and the WordMap part is the
resource for the framework.
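How the three parts might fit together can be sketched in a few lines. This is
a single-process illustration under stated assumptions, not the implementation
behind Figure 3: the input format of (topic, text) pairs, the function names
(rule_box, map_document, reduce_topic, run), and the sample WordMap values are
all hypothetical.

```python
from collections import defaultdict

# WordMap part: the resource (hypothetical sample values).
WORD_MAP = {"good": 1.0, "great": 1.0, "bad": -1.0, "poor": -1.0}


def rule_box(text):
    """RuleBox part: here a single rule that sums the WordMap values of the words."""
    return sum(WORD_MAP.get(w.strip(".,!?").lower(), 0.0) for w in text.split())


def map_document(doc):
    """Map step: emit <topic, score> for one (topic, text) document."""
    topic, text = doc
    return [(topic, rule_box(text))]


def reduce_topic(topic, scores):
    """Reduce step: combine all scores for one topic into a label and a total."""
    total = sum(scores)
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, total


def run(documents):
    """MapReduce part: the outer layer that drives the map, group, and reduce steps."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_document(doc):
            grouped[key].append(value)
    return {topic: reduce_topic(topic, scores) for topic, scores in grouped.items()}


reviews = [
    ("Hyundai", "Hyundai is good"),
    ("Hyundai", "Great car, poor radio"),
    ("Hyundai", "Good value"),
]
result = run(reviews)
print(result)
```

In this sketch the MapReduce layer never inspects words or rules itself, so the
WordMap and RuleBox can be replaced without touching the outer layer.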
The WordMap and RuleBox influence the accuracy of opinion mining, and
MapReduce improves the time performance of the framework. [16][18] The Opinion
Mining in MapReduce framework can be used for search engines, where it will make
search results richer. It can also serve as a strong marketing analysis tool for
companies collecting reviews of their products, and governments can utilize this
framework for information gathering and analysis. For example, in the United
States, Google and the CIA have jointly invested in a company called Recorded
Future. This company uses mining methods together with techniques for huge data
processing, as has been reported in several newspapers.

4 Conclusion and Future Work


We suggest an Opinion Mining in MapReduce framework. It is a novel method that
combines opinion mining techniques with MapReduce. This framework is useful to
anyone who wants to develop opinion mining in MapReduce; however, it is
unsuitable for small data because constructing the WordMap part costs a lot of
time, and it is inefficient for several nodes to analyze small data.
Nonetheless, it is powerful for large-scale data, and flexibility is one of its
strong points. Today, many companies want to know the opinions about their
products on the internet; therefore, opinion mining that analyzes huge resources
is an interesting research topic. Our next task is to improve the performance and
accuracy of the opinion mining technique in MapReduce.
Acknowledgments. This work was supported by the Korea Science and Engineering
Foundation (KOSEF) grant funded by the Korea government (MEST) (No. 20090075771).

References
1. Conrad, J.G., Schilder, F.: Opinion mining in legal blogs. In: Proceedings of the 11th
International Conference on Artificial Intelligence and Law, pp. 231-236. ACM, New
York (2007)
2. Kim, W.Y., Ryu, J.S., Kim, K.I., Kim, U.M.: A Method for Opinion Mining of Product
Reviews using Association Rules. In: Proceedings of the 2nd International Conference on
Interaction Sciences: Information Technology, Culture and Human (ICIS 2009), Seoul,
Korea, November 24-26, pp. 270-274 (2009)
3. Esuli, A., Sebastiani, F.: SENTIWORDNET: A Publicly Available Lexical Resource for
Opinion Mining. In: Proceedings of the 5th Conference on Language Resources and
Evaluation (LREC 2006), Citeseer (2006)
4. Esuli, A., Sebastiani, F.: PageRanking WordNet synsets: An application to opinion
mining. In: Proceedings of the 45th Annual Meeting of the Association for Computational
Linguistics (ACL 2007), Citeseer (2007)
5. Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters.
Communications of the ACM 51(1), 107-113 (2008)
6. Stanford Tagger Version 1.6 (2008), http://www.nlp.staford.edu/software/tagger.shtml
7. Stanford Parser Version 1.6 (2008), http://nlp.stanford.edu/software/lex-parser.shtml
8. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Foundations and Trends in
Information Retrieval 2(1-2), 1-135 (2008)
9. Kim, S.M., Hovy, E.: Determining the sentiment of opinions. In: Proceedings of the 20th
International Conference on Computational Linguistics (2004)

10. Potthast, M., Becker, S.: Opinion Summarization of Web Comments. Advances in
Information Retrieval, 668-669 (2010)
11. Read, J.: Using emoticons to reduce dependency in machine learning techniques for
sentiment classification. In: Proceedings of the ACL Student Research Workshop,
pp. 43-48. Association for Computational Linguistics (2005)
12. Kim, S.M., Hovy, E.: Extracting opinions, opinion holders, and topics expressed in online
news media text. In: Proceedings of ACL/COLING Workshop on Sentiment and
Subjectivity in Text, Sydney, Australia (2006)
13. Cho, K.S., Ryu, J.S., Jeong, J.H., Kim, Y.H., Kim, U.M.: Credibility Evaluation and
Results with Leader Weight in Opinion Mining. In: The 2nd International Conference on
Cyber-Enabled Distributed Computing and Knowledge Discovery, Huangshan, China,
October 10-12 (2010)
14. Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra,
T., Fikes, A., Gruber, R.E.: Bigtable: A distributed storage system for structured data.
ACM Transactions on Computer Systems (TOCS) 26(2), 4 (2008)
15. Cardona, K., Secretan, J., Georgiopoulos, M., Anagnostopoulos, G.: A grid based system
for data mining using MapReduce. Technical Report TR-2007-02, AMALTHEA (2007)
16. Bayir, M.A., Toroslu, I.H., Cosar, A., Fidan, G.: Smart miner: a new framework for
mining large scale web usage data. In: Proceedings of the 18th International Conference
on World Wide Web, pp. 161-170. ACM, New York (2009)
17. Xia, T.: Large-scale sms messages mining based on map-reduce. In: International
Symposium on Computational Intelligence and Design, ISCID2008, pp. 7-12. IEEE, Los
Alamitos (2008)
18. Cho, K.S., Jung, N.R., Kim, U.M.: Using WordMap and Score-based Weight in Opinion
mining with MapReduce. In: IEEE International Conference on Service-Oriented
Computing and Applications (2010)
