Abstract
Data mining techniques provide researchers and practitioners the tools needed to analyze large,
complex, and frequently changing social media data. This chapter introduces the basics of data mining,
reviews social media, discusses how to mine social media data, and highlights some illustrative
examples with an emphasis on social networking sites and blogs.
Keywords: Data mining, Social media, Data representation, Social computing, Social networks, Social
networking sites, Blogs, Blogosphere, Event maps
Introduction
Data mining, as a young field, has been spearheading research and development of methods
and algorithms handling huge amounts of data in solving real-world problems. Much like
traditional miners extract precious metals from earth and ore, data miners seek to extract
meaningful information from a data set that is not readily apparent and not always easily
obtainable. With the ubiquitous use of social media via the internet, an unprecedented amount
of data is available and of interest to many fields of study including sociology, business,
psychology, entertainment, politics, news, and other cultural aspects of societies. Applying data
mining to social media can yield interesting perspectives on human behaviour and human
interaction. Data mining can be used in conjunction with social media to better understand the
opinions people have about a subject, identify groups of people amongst the masses of a
population, study group changes over time, find influential people, or even recommend a
product or activity to an individual. The elections
during 2008 marked an unprecedented use of social media in a United States presidential
campaign. Social media sites including YouTube and Facebook played a significant role in
raising funds and getting candidates’ messages to voters. Researchers at the Massachusetts
Institute of Technology, Center for Collective Intelligence, mined blog data to show
correlations between the amount of social media used by candidates and the winner of the 2008
presidential campaign. This powerful example underscores the potential for data mining social
media data to predict outcomes at a national level. Data mining social media can also yield
personal and corporate benefits. In another example, researchers developed a Group
Recommendation System (GRS) for Facebook users using hierarchical clustering and decision
tree data mining methods. The GRS matches users, based on their Facebook profiles, with
Facebook groups the users are likely to join by applying data mining methods to Facebook
groups and their members. Applying data mining techniques to social media data has gained
increasing attention with the significant rise of online social media in recent years. Social media
data have three characteristics that pose challenges for researchers: the data are large, noisy,
and dynamic. In order to overcome these challenges, data mining techniques are used by
researchers to reveal insights into social media data that would not be possible otherwise. This
chapter introduces the basics of data mining, reviews social media, discusses how to mine
social media data, and highlights some illustrative examples, paving the way for addressing
research issues and exploring novel data mining applications.
Social Media
We begin with a definition of social media taken from a social media source, Wikipedia, which
defines social media as follows: “media designed to be disseminated
through social interaction, created using highly accessible and scalable publishing techniques.
Social media uses Internet and web-based technologies to transform broadcast media
monologues (one to many) into social media dialogues (many to many). It supports the
democratization of knowledge and information, transforming people from content consumers
into content producers.” Kaplan and Haenlein define social media as “a group of
Internet-based applications that build on the ideological and technological foundations of
Web 2.0, and that allow the creation and exchange of User Generated Content.” Mining social
media is one type of social computing. Social computing is “any type of computing application
in which software serves as an intermediary or a focus for a social relation”. Social computing
includes applications used for interpersonal communication as well as applications and research
activities related to “computational social studies” or “social behaviour”.
Data Representation
As with other social network data, it is common to use a graph representation to study social
media data sets. A graph consists of a set of vertices (nodes) and a set of edges (links).
Individuals are typically represented as the nodes in the graph. Relationships or associations
between individuals (nodes) are represented as the links in the graph. The graph representation
is natural for data extracted from social networking sites where individuals create a social
network of friends, classmates, or business associates. Less apparent is how the graph structure
is applied to blogs, wikis, opinion mining, and similar types of online social media.
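The graph representation described above can be sketched in a few lines of Python. This is only a minimal illustration: the user names and the helper `build_graph` are hypothetical, and friendships are assumed to be symmetric (an undirected graph).

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected graph as a dict mapping each node to a set of neighbours."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)  # link a -> b
        graph[b].add(a)  # link b -> a (friendship is assumed symmetric)
    return dict(graph)

# Hypothetical friendships between four users (illustrative data only).
friendships = [("ann", "bob"), ("bob", "cat"), ("cat", "ann"), ("cat", "dan")]
graph = build_graph(friendships)

num_nodes = len(graph)
# Each undirected link is stored twice (once per endpoint), hence the division by 2.
num_edges = sum(len(neighbours) for neighbours in graph.values()) // 2
```

An adjacency list like this uses memory proportional to the number of links, which matters for the large, sparse graphs typical of social media.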
The graph representation enables the application of classic mathematical graph theory,
traditional social network analysis methods, and work on mining graph data. However, the
potentially large size of a graph used to represent social media can present challenges for
automated processing, as limits on computer memory and processing speed are reached and
often exceeded when trying to deal with large social media data sets. Other challenges to
applying automated processes to enable data mining in social media include identifying and
dealing with spam, the variety of formats used in the same social media subcategory, and
constantly changing content and structure.
sites is based on analyzing the structure of the network and finding individuals that associate
more with each other than with other users.
Group detection can also yield interesting perspectives about the social networking site itself,
such as how many different groups are using the social networking site.
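As a rough illustration of structure-based group detection, the sketch below treats each connected component of a friendship graph as one group. This is a deliberately crude stand-in chosen for brevity, not the method of any particular study: practical community detection algorithms also split densely connected subgroups inside a single component. The function `find_groups` and the sample network are hypothetical.

```python
from collections import deque

def find_groups(graph):
    """Return the connected components of an undirected graph as sorted lists."""
    seen = set()
    groups = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from `start`.
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(members))
    return groups

# Hypothetical network: one triangle of friends and one separate pair.
network = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"eve"},
    "eve": {"dan"},
}
groups = find_groups(network)
```

Counting the returned groups gives a first answer to how many distinct groups are using a site, under the strong assumption that groups never share a link.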
Recommendation systems - A recommendation system analyzes social networking data and
recommends new friends or new groups to a user. The ability to recommend group membership
to an individual is advantageous for a group that would like to have additional members and
can be helpful to an individual who is looking to find other individuals or a group of people
with similar interests or goals. Again, large numbers of individuals and groups make this an
almost impossible task without an automated system. Additionally, group characteristics
change over time. For those reasons, data mining algorithms drive the
recommendations made to users. From the moment a user profile is entered into a social
networking site, the site provides suggestions to expand the user’s social network. Much of the
appeal of social networking sites is a direct result of the automated recommendations, which
allow a user to rapidly create and expand an online social network with relatively little effort on
the user’s part.
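One simple way to generate friend suggestions is to rank people a user is not yet connected to by the number of friends they have in common. The sketch below illustrates that heuristic only; it is not the hierarchical clustering and decision tree approach of the GRS mentioned in the Introduction, and the function `recommend_friends` and the sample network are hypothetical.

```python
from collections import Counter

def recommend_friends(graph, user, top_n=3):
    """Rank non-friends of `user` by the number of friends they share with `user`."""
    friends = graph[user]
    shared = Counter()
    for friend in friends:
        for candidate in graph[friend]:
            # Skip the user and anyone who is already a friend.
            if candidate != user and candidate not in friends:
                shared[candidate] += 1
    # Sort by shared-friend count (descending), then by name for a stable order.
    ranked = sorted(shared.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]

# Hypothetical friendship network (illustrative data only).
network = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}
suggestions = recommend_friends(network, "ann")
```

Here "dan" shares two friends with "ann" and "eve" shares one, so "dan" is suggested first.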
Social networking sites have been widely adopted and contain a variety of interesting data on
a scale that is unprecedented compared with previous social network data sets. Some users are using social
networking sites for regular interpersonal communication while abandoning more traditional
communication mechanisms such as e-mail. The mass migration to, and continued use of,
social networking sites is creating an almost innumerable amount of data that can only be
analyzed practically using data mining techniques.
The Blogosphere: Illustrative Examples
Web logs, or “blogs”, are user-published journals available on the web. Blog entries, known
as posts, cover a variety of subjects from personal logs to professional journalism reporting on
current events. In some cases blogs are considered more accurate than traditional one-to-many
news media sources. Blogs are typically open to the public and provide a mechanism for readers
to comment on a specific post. The set of all blogs and blog posts is referred to as the
blogosphere. Earlier, in Data Representation, we described two common graph structures
used to represent blogs: blog networks and post networks.
Lakshmanan and Oberhofer highlight clustering, matrix factorization, and ranking as the three
most commonly used techniques for data mining in the blogosphere. Applying data mining
technologies to analyze blogs and blog posts is pursued for a variety of purposes.
Blog classification - A straightforward use of data mining related to the blogosphere is the
automated classification of blogs themselves. The ability to automatically organize blogs by
topic aids blog search applications, and the results can also help focus other blog-related social
network analysis on one area of the blogosphere. With thousands upon thousands of
blogs available to choose from, it is not practical to try to categorize blog sites manually.
Identifying influential nodes - “A blogger is influential if he/she has the capacity to affect
the behaviour of fellow bloggers.” Understanding how information is disseminated through the
blogosphere can provide interesting insights for businesses or any other entity seeking to spread
information about a product, service, or topic as fast as possible. The benefit of being able to
identify influential bloggers, blogs, or blog posts is that marketing efforts for goods and
services could be focussed on points of influence that are most likely to gain support for a topic,
product, or message.
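A very simple proxy for influence in a blog network is the number of distinct blogs that link to a given blog. The sketch below ranks blogs by that in-link count. It is an assumption made for illustration, not the influence measure used in the work quoted above, and the function `rank_by_inlinks` and the link data are hypothetical.

```python
def rank_by_inlinks(links):
    """Rank blogs by the number of distinct blogs linking to them (most first)."""
    inbound = {}
    for source, target in links:
        inbound.setdefault(source, set())            # make sure every blog appears
        inbound.setdefault(target, set()).add(source)
    # Sort by in-link count (descending), then by name for a stable order.
    return sorted(inbound, key=lambda blog: (-len(inbound[blog]), blog))

# Hypothetical directed links: (linking blog, linked-to blog).
blog_links = [
    ("blog_a", "blog_c"),
    ("blog_b", "blog_c"),
    ("blog_d", "blog_c"),
    ("blog_c", "blog_a"),
    ("blog_d", "blog_a"),
]
ranking = rank_by_inlinks(blog_links)
```

In-link counts ignore who is doing the linking; ranking methods that weight a link by the importance of its source address that limitation at the cost of more computation.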
Topic detection and change - Like other online social media data, blog content changes
over time. New posts are added, new topics are discussed, opinions change, and new
communities develop and mature. Understanding what topics are popular in the blogosphere
can provide insights into product sales, political views, and future social attention areas.
However, new topics are not easily detected amongst the vast number of blog posts.
Additionally, blog sites are updated daily with new information, and topics that were popular a
few days ago may be superseded by a new topic. Applying data mining techniques to blogs can
help detect topic trends and changes.
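One basic way to surface a topic change is to compare how often each term appears in posts from two time windows and report the terms that grew the most. The sketch below is a minimal illustration under simplifying assumptions: it counts raw words, does not remove common words, and uses hypothetical names (`term_counts`, `rising_terms`) and made-up posts.

```python
import re
from collections import Counter

def term_counts(posts):
    """Count lowercase word occurrences across a list of post texts."""
    counts = Counter()
    for text in posts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

def rising_terms(earlier_posts, later_posts, min_increase=2):
    """Return (term, growth) pairs whose count grew by at least `min_increase`."""
    before = term_counts(earlier_posts)
    after = term_counts(later_posts)
    # A Counter returns 0 for unseen terms, so brand-new terms count fully.
    growth = {term: after[term] - before[term] for term in after}
    rising = [(term, gain) for term, gain in growth.items() if gain >= min_increase]
    return sorted(rising, key=lambda item: (-item[1], item[0]))

# Hypothetical posts from two consecutive weeks (illustrative data only).
last_week = ["the election debate was long", "debate recap and election polls"]
this_week = ["new phone launch today", "phone launch review", "the phone is fast"]
trending = rising_terms(last_week, this_week)
```

With this data the terms "phone" and "launch" are reported as rising, while "election" and "debate" have dropped out of the later window.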
Sentiment analysis - How people feel about a topic (i.e., their sentiment) can be just as
important as identifying the topic itself. Blogs can be classified into categories, influential
blogs can be highlighted, and new topics can be detected. It is also possible to ascertain
opinions, or sentiment, from the blogosphere using data mining techniques. This is not an easy
task, as language is filled with ambiguities and there are many different opinions. However,
some interesting work has been done and progress is being made in this area.
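A common starting point for sentiment analysis is a word lexicon: count the positive and negative words in a post and compare. The sketch below assumes a tiny, hypothetical lexicon and hypothetical function names; it is meant to show the idea and one of its weaknesses, not to represent a production approach.

```python
import re

# Tiny hypothetical lexicon; real lexicons contain thousands of scored words.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def sentiment_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def label(text):
    """Map a score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical blog post snippets (illustrative data only).
posts = [
    "I love this phone, the camera is excellent",
    "Terrible battery and poor support",
    "It arrived on Tuesday",
]
labels = [label(p) for p in posts]
```

The ambiguity of language shows up immediately: this scorer rates "not good" as positive because it ignores negation, which is one reason sentiment analysis remains a hard problem.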
Conclusions
There exist ample opportunities for collaboration between computer scientists, social
scientists, and other interested disciplines to use data mining technologies and techniques to
reveal patterns in online social media data that would not otherwise be visible. However, there
are some challenges that need to be addressed. As scientists seek to conduct new research to
advance data mining in social media, open sources of social media data for researchers would
enable researchers to validate published work. Tools and policies need to be developed to
ensure that privacy integrity is maintained regardless of how the data are aggregated and
analyzed. Protecting privacy is likely to remain a challenge for data mining in social media.
Researchers are also working to address the challenges associated with large social media data,
changing network structure, and so on. Despite these challenges, the power of online social
media, evidenced by its impact on national elections, business and marketing, and society
itself, will continue to be a significant motivating factor for mining online social media data as
a valuable information source for gaining a deeper understanding of people.