DATA MINING IN SOCIAL MEDIA

Neelam Kumari B, 2nd semester MCA, Garden City University.


neelam181286@gcu.ac.in
Dr. Ashok Kumar T A, Garden City University.

Abstract
Data mining techniques provide researchers and practitioners the tools needed to analyze large,
complex, and frequently changing social media data. This chapter introduces the basics of data mining,
reviews social media, discusses how to mine social media data, and highlights some illustrative
examples with an emphasis on social networking sites and blogs.
Keywords: Data mining, Social media, Data representation, Social computing, Social networks, Social
networking sites, Blogs, Blogosphere, Event maps

Introduction
Data mining, as a young field, has been spearheading research and development of methods
and algorithms handling huge amounts of data in solving real-world problems. Much like
traditional miners extract precious metals from earth and ore, data miners seek to extract
meaningful information from a data set that is not readily apparent and not always easily
obtainable. With the ubiquitous use of social media via the internet, an unprecedented amount
of data is available and of interest to many fields of study including sociology, business,
psychology, entertainment, politics, news, and other cultural aspects of societies. Applying data
mining to social media can yield interesting perspectives on human behaviour and human
interaction. Data mining can be used in conjunction with social media to better understand the
opinions people have about a subject, identify groups of people amongst the masses of a
population, study group changes over time, find influential people, or even recommend a
product or activity to an individual. The elections
during 2008 marked an unprecedented use of social media in a United States presidential
campaign. Social media sites including YouTube and Facebook played a significant role in
raising funds and getting candidates’ messages to voters. Researchers at the Massachusetts
Institute of Technology, Center for Collective Intelligence, mined blog data to show
correlations between the amount of social media used by candidates and the winner of the 2008
presidential campaign. This powerful example underscores the potential for data mining social
media data to predict outcomes at a national level. Data mining social media can also yield
personal and corporate benefits. In another example, researchers developed a Group
Recommendation System (GRS) for Facebook users using hierarchical clustering and decision
tree data mining methods. The GRS matches users, based on their Facebook profiles, with
Facebook groups the users are likely to join by applying data mining methods to Facebook
groups and their members. Applying data mining techniques to social media data has gained
increasing attention with the significant rise of online social media in recent years. Social media
data have three characteristics that pose challenges for researchers: the data are large, noisy,
and dynamic. In order to overcome these challenges, data mining techniques are used by
researchers to reveal insights into social media data that would not be possible otherwise. This
chapter introduces the basics of data mining, reviews social media, discusses how to mine
social media data, and highlights some illustrative examples, paving the way for addressing
research issues and exploring novel data mining applications.

Data Mining in a Nutshell


One definition of data mining is identifying novel and actionable patterns in data. Data mining
is also known as Knowledge Discovery from Data (KDD) or Knowledge Discovery in
Databases, also abbreviated as KDD. Data mining is related to machine learning, information
retrieval, statistics, databases, and even data visualization. One formal definition for data
mining is found in Princeton University’s WordNet, where data mining is defined as: “data
processing using sophisticated data search capabilities and statistical algorithms to discover
patterns and correlations in large pre-existing databases; a way to discover new meaning in
data”.

Social Media
We begin with a definition of social media produced by a social media source, Wikipedia,
which defines social media as follows: “media designed to be disseminated
through social interaction, created using highly accessible and scalable publishing techniques.
Social media uses Internet and web-based technologies to transform broadcast media
monologues (one to many) into social media dialogues (many to many). It supports the
democratization of knowledge and information, transforming people from content consumers
into content producers.” Kaplan and Haenlein define social media as: “a group of
Internet-based applications that build on the ideological and technological foundations of Web 2.0, and
that allow the creation and exchange of User Generated Content.” Mining social media is one
type of social computing. Social computing is “any type of computing application in which
software serves as an intermediary or a focus for a social relation”. Social computing includes
applications used for interpersonal communication as well as applications and research
activities related to “computational social studies” or “social behaviour”.

Motivations for Data Mining in Social Media


The data available via social media can give us insights into social networks and societies that
were not previously possible in both scale and extent. This digital media can transcend the
physical world boundaries to study human relationships and help measure popular social and
political sentiment attributed to regional populations without explicit surveys. Social media
effectively records viral marketing trends and is the ideal source to study to better understand
and leverage influence mechanisms. However, it is extremely difficult to gain useful
information from social media data without applying data mining technologies due to unique
challenges.
Data mining can help researchers and practitioners overcome these challenges. Applying data
mining techniques to large social media data sets has the potential to continue to improve search
results for everyday search engines, realize specialized target marketing for businesses, help
psychologists study behavior, provide new insights into social structure for sociologists,
personalize web services for consumers, and even help detect and prevent spam for all of us.
Additionally, the open access to data provides researchers with unprecedented amounts of
information to improve performance and optimize data mining techniques. The advancement
of the data mining field itself relies on large data sets, and social media is an ideal data source
in the frontier of data mining for developing and testing new data mining techniques for
academic and corporate data mining researchers.

Data Mining Methods for Social Media


Applying data mining methods to social media is relatively new compared to other areas of
study related to social network analytics when you consider the work in social network analysis
that dates back to the 1930s. However, applications that apply data mining techniques
developed by industry and academia are already being used commercially. For example,
Samepoint, a “Social Media Analytics” company, provides services to mine and monitor
social media to provide clients information about how goods and services are perceived and
discussed through social media. Researchers in other organizations have applied text mining
algorithms and disease propagation models to blogs to develop approaches for better
understanding how information moves through the blogosphere.

Data Representation
Similar to other social network data, it is common to use a graph representation to study social
media data sets. A graph consists of a set of vertices (nodes) and a set of edges (links).
Individuals are typically represented as the nodes in the graph. Relationships or associations
between individuals (nodes) are represented as the links in the graph. The graph representation
is natural for data extracted from social networking sites where individuals create a social
network of friends, classmates, or business associates. Less apparent is how the graph structure
is applied to blogs, wikis, opinion mining, and similar types of online social media.
The graph representation enables the application of classic mathematical graph theory,
traditional social network analysis methods, and work on mining graph data. However, the
potentially large size of a graph used to represent social media can present challenges for
automated processing as limits on computer memory and processing speeds are maximized and
often surpassed when trying to deal with large social media data sets. Other challenges to
applying automated processes to enable data mining in social media include identifying and
dealing with spam, the variety of formats used in the same social media subcategory, and
constantly changing content and structure.
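
As a minimal sketch of the graph representation described above, the following Python fragment stores individuals as nodes and relationships as undirected links, then computes degree centrality, one of the simplest measures from traditional social network analysis. The names `build_graph`, `degree_centrality`, and the friendship data are illustrative assumptions, not part of any particular site or library.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph as a dict mapping each node to a set of neighbours."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


# Hypothetical friendship links (illustrative data only).
friendships = [("ana", "bo"), ("ana", "cy"), ("ana", "di"), ("bo", "cy")]
graph = build_graph(friendships)
centrality = degree_centrality(graph)
```

An adjacency-set layout like this keeps memory proportional to the number of links rather than the square of the number of nodes, which matters for the large graphs discussed above.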

Data Mining: A Process


No matter what type of social media is under study, there are a few basic items that are important
to consider to ensure that the most meaningful results are possible. Each type of social media
and each data mining purpose applied to social media may require unique approaches and
algorithms to produce a data mining benefit. Different data sets and data questions require
different types of tools. If it is known how the data should be organized, a classification tool
might be appropriate. If you understand what the data is about but cannot ascertain trends and
patterns in the data, a clustering tool may be best. The problem itself may determine the best
approach. First, there is no substitute for understanding the data as much as possible before
applying data mining techniques; second, there is no substitute for understanding the different
data mining tools that are
available. For the former, subject matter experts might be needed to help better understand the
data set. To better understand the different data mining tools available, there are a host of data
mining and machine learning texts and resources that are available to provide very detailed
information about a variety of specific data mining algorithms and techniques.
Some social media sites such as Technorati, Facebook, and Twitter provide Application
Programming Interfaces (APIs) which allow crawler applications to directly interface with the
data sources. However, these sites usually limit the number of API transactions per day
depending on the affiliation the API user has with the site. For some sites, it is possible to
collect data (crawl) without using APIs. Given the vast size of the social media data available,
it may be necessary to limit the amount of data the crawler collects. Once the crawler has
collected the data, some postprocessing might be needed to validate and clean up the data.
Traditional social network analysis techniques can be applied such as centrality measures and
group structure studies. In many cases, additional data will also be associated with a node or a
link, opening opportunities for more sophisticated methods to consider the deeper semantics
that can be brought to light with text and data mining techniques. We now focus on two specific
types of social media data in order to further illustrate how data mining techniques are applied
to social media. The two areas are Social Networking Sites and Blogs. Both these areas are
characterized by dynamic and rich data sources. Both areas offer potential value to the broader
scientific community as well as businesses.
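
The crawling and postprocessing steps described above can be sketched as follows. This is only an illustration under stated assumptions: `fetch_neighbours` is a hypothetical callable standing in for one API request, `max_calls` models a per-day transaction limit, `max_nodes` caps how many users are queued for crawling, and the toy site is made-up data rather than any real service.

```python
from collections import deque


def crawl(seed, fetch_neighbours, max_calls=100, max_nodes=1000):
    """Breadth-first crawl that stops when the call budget is spent.

    fetch_neighbours is a hypothetical callable standing in for one API
    request; it takes a user id and returns an iterable of linked user ids.
    max_nodes caps how many distinct users are ever queued for crawling.
    """
    seen = {seed}
    queue = deque([seed])
    edges = []
    calls = 0
    while queue and calls < max_calls:
        user = queue.popleft()
        calls += 1  # one API transaction per fetched user
        for other in fetch_neighbours(user):
            edges.append((user, other))
            if other not in seen and len(seen) < max_nodes:
                seen.add(other)
                queue.append(other)
    return edges, calls


def clean(edges):
    """Postprocessing: drop self-links and duplicate undirected links."""
    unique = set()
    for a, b in edges:
        if a != b:
            unique.add(tuple(sorted((a, b))))
    return sorted(unique)


# Toy in-memory "site" used in place of a real API (illustrative data only).
toy_site = {
    "u1": ["u2", "u3", "u1"],
    "u2": ["u1", "u3"],
    "u3": ["u1", "u2", "u4"],
    "u4": ["u3"],
}
raw_edges, calls_used = crawl("u1", lambda u: toy_site.get(u, []), max_calls=3)
links = clean(raw_edges)
```

With a budget of three calls, the crawl visits three users and stops before fetching the fourth, which is the kind of trade-off a transaction limit forces on a real crawler.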
Social Networking Sites: Illustrative Examples
A social networking site like Facebook or
LinkedIn consists of connected users with unique profiles. Users can link to friends and
colleagues and can share news, photos, videos, favourite links etc. Users customize their
profiles depending on individual preferences but some common information might include
relationship status, birthday, an e-mail address, and hometown. Users have options to decide
how much information they include in their profile and who has access to it. The amount of
information available via a social networking site has raised privacy concerns and is a related
societal issue.
The driving factor for data mining social networking sites is the “unique opportunity to
understand the impact of a person’s position in the network on everything from their tastes to
their moods to their health.” The most common data mining applications related to social
networking sites include:
Group detection - One of the most popular applications of data mining to social networking
sites is finding and identifying a group. In general, group detection applied to social networking
sites is based on analyzing the structure of the network and finding individuals that associate
more with each other than with other users.
Group detection can also yield interesting perspectives about the social networking site itself,
such as how many different groups are using the social networking site.
Recommendation systems - A recommendation system analyzes social networking data and
recommends new friends or new groups to a user. The ability to recommend group membership
to an individual is advantageous for a group that would like to have additional members and
can be helpful to an individual who is looking to find other individuals or a group of people
with similar interests or goals. Again, large numbers of individuals and groups make this an
almost impossible task without an automated system. Additionally, group characteristics
change over time. For those reasons, data mining algorithms drive the inherent
recommendations made to users. From the moment a user profile is entered into a social
networking site, the site provides suggestions to expand the user’s social network. Much of the
appeal of social networking sites is a direct result of the automated recommendations which
allow a user to rapidly create and expand an online social network with relatively little effort on
the user’s part.
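
Both applications above can be illustrated with a small sketch on a toy friendship graph. The group detection shown here is a deliberately simple heuristic chosen for illustration (keep only links whose two endpoints share at least one neighbour, then take connected components), and the recommendation ranks non-friends by the number of shared friends. Neither is the method used by any particular site or by the systems cited in this paper; all function names and data are assumptions.

```python
from collections import Counter


def build_graph(edges):
    """Undirected graph as a dict mapping each user to a set of friends."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def detect_groups(graph):
    """Keep links whose endpoints share a neighbour, then return connected components."""
    strong = {node: set() for node in graph}
    for a in graph:
        for b in graph[a]:
            if graph[a] & graph[b]:  # a and b have at least one common neighbour
                strong[a].add(b)
    groups, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(strong[node] - component)
        seen |= component
        groups.append(component)
    return groups


def recommend_friends(graph, user, top_n=3):
    """Rank non-friends by the number of friends they share with user."""
    scores = Counter()
    for friend in graph[user]:
        for candidate in graph[friend]:
            if candidate != user and candidate not in graph[user]:
                scores[candidate] += 1
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


# Two tight triangles joined by a single bridge link c-d (illustrative data only).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
graph = build_graph(edges)
groups = detect_groups(graph)
suggestions = recommend_friends(graph, "a")
```

The bridge link between the two triangles has no shared neighbour, so it is ignored when forming groups; this mirrors the idea of finding individuals who associate more with each other than with other users.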
Social networking sites have been widely adopted and contain a variety of interesting data on
a scale that is unprecedented in previous social network data sets. Some users are using social
networking sites for regular interpersonal communication while abandoning more traditional
communication mechanisms such as e-mail. The mass migration to, and continued use of,
social networking sites is creating an almost innumerable amount of data that can only be
analyzed practically using data mining techniques.
The Blogosphere: Illustrative Examples
Web logs or “blogs” are user-published journals available on the web. Blog entries, known
as posts, cover a variety of subjects from personal logs to professional journalism reporting on
current events. In some cases blogs are considered more accurate than traditional one-to-many
news media sources. Blogs are typically open to the public and provide a mechanism for readers
to comment on the specific post. The set of all blogs and blog posts is referred to as the
blogosphere. Two common graph structures are used to represent the blogosphere: blog
networks and post networks.
Lakshmanan and Oberhofer highlight clustering, matrix factorization, and ranking as the three
most commonly used techniques for data mining in the blogosphere. Applying data mining
technologies to analyze blogs and blog posts is pursued for a variety of purposes.
Blog classification - A straightforward use of data mining related to the blogosphere is the
automated classification of blogs themselves. The ability to automatically organize blogs by
topic aids blog search applications, and the results can also help focus other blog-related social
network analytic purposes in one area of the blogosphere. With thousands upon thousands of
blogs available to choose from, it is not practical to try to categorize blog sites manually.
Identifying influential nodes - “A blogger is influential if he/she has the capacity to affect
the behaviour of fellow bloggers.” Understanding how information is disseminated through the
blogosphere can provide interesting insights for businesses or any other entity seeking to spread
information about a product, service, or topic as fast as possible. The benefit of being able to
identify influential bloggers, blogs, or blog posts is that marketing efforts for goods and
services could be focussed on points of influence that are most likely to gain support for a topic,
product, or message.
Topic detection and change - Like other online social media data, blog content changes
over time. New posts are added, new topics are discussed, opinions change, and new
communities develop and mature. Understanding what topics are popular in the blogosphere
can provide insights into product sales, political views, and future social attention areas.
However, new topics are not easily detected amongst the vast amount of blog posts.
Additionally, blog sites are updated daily with new information, and topics that were popular a
few days ago may be superseded by a new topic. Applying data mining techniques to blogs can
help detect topic trends and changes.
Sentiment analysis - How people feel about a topic (e.g., their sentiment) can be just as
important as identifying the topic itself. Blogs can be classified into categories, influential
blogs can be highlighted, and new topics can be detected. It is also possible to ascertain
opinions, or sentiment, from the blogosphere using data mining techniques. This is not an easy
task as language is filled with ambiguities and there are many different opinions. However,
some interesting work has been done and progress is being made in this area.
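
To make topic detection and sentiment analysis more concrete, here is a rough sketch, not a production approach. It assumes a tiny hand-made word list as the sentiment lexicon and treats a "rising topic" as a term whose count grows between two time windows of posts. The word lists, function names, and posts are all made up for illustration, and the sketch ignores the language ambiguities noted above (negation, sarcasm, and so on).

```python
import re
from collections import Counter

# Tiny hand-made word lists; a real lexicon would be far larger (assumption).
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}
STOPWORDS = {"the", "a", "is", "i", "this", "and", "it", "of", "to", "my"}


def tokens(text):
    """Lower-case a post and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment(text):
    """Positive-word count minus negative-word count; above zero leans positive."""
    words = tokens(text)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def rising_terms(older_posts, newer_posts, top_n=3):
    """Terms whose count grew the most from the older window to the newer one."""
    def counts(posts):
        return Counter(w for p in posts for w in tokens(p) if w not in STOPWORDS)

    old, new = counts(older_posts), counts(newer_posts)
    growth = {term: new[term] - old[term] for term in new if new[term] > old[term]}
    ranked = sorted(growth.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


# Made-up posts for illustration only.
last_week = ["I love my old phone", "The camera is good"]
this_week = [
    "The election debate is great",
    "Election polls and election news",
    "I hate this debate",
]

scores = [sentiment(p) for p in this_week]
trending = rising_terms(last_week, this_week, top_n=2)
```

Comparing two windows rather than counting terms once reflects the point that topics popular a few days ago may already have been superseded.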

Ethnography and Netnography


Princeton’s WordNet defines ethnography as “anthropology that provides scientific description
of individual human societies.” Kozinets coined the word “netnography” to describe
conducting online market research based on ethnography. Kansas State University offers a
graduate level Digital Ethnography class based on the work of Professor Michael Wesch. Dr.
Wesch and his students are studying how digital culture is evolving on YouTube. Specifically,
digital ethnography can provide a better understanding of the context of social media data and
the implications for present and future data mining applications. For example, in 2008, the
group at Kansas State University reported that over 50% of YouTube users are between 18 and
24. This type of information can further inform data mining results of detecting a new
community, topic detection via a specific social medium, or even a better understanding of
what data might be available from a particular social media site in the future as culture, use,
and demographics change over time. Ethnography can also help inform thinking about
important data mining issues such as authenticity. Whether or not users are honest with what
they include or contribute to online social media is an issue that data miners should consider
when analyzing large data sets derived from online social media sources. The netnography
methodology is applied to study cultures and communities based on computer-mediated
communications. Researchers have applied netnography to study several areas including
cosmetic surgery and coffee consumers [10, 38]. The approach to netnography is founded in
direct observations but it is conceivable that data mining techniques could be applied to
netnography studies. Both digital ethnography and netnography can help provide useful
insights to data miners exploring social media. Ultimately, applying data mining to social media
is about understanding data about people online, which is at the heart of digital ethnography
and netnography research.

Conclusions
There exist ample opportunities for collaboration between computer scientists, social
scientists, and other interested disciplines to use data mining technologies and techniques to
reveal patterns in online social media data that would not otherwise be visible. However, there
are some challenges that need to be addressed. As scientists seek to conduct new research to
advance data mining in social media, open sources of social media data for researchers would
enable researchers to validate published work. Tools and policies need to be developed to
ensure privacy integrity will be maintained regardless of how the data is aggregated and
analyzed. Protecting privacy is likely to remain a challenge for data mining in social media.
Researchers are also working to address the challenges associated with large social media data,
changing network structure, etc. Despite these challenges, the power of online social media,
evidenced by its impact on national elections, business and marketing, and society itself, will
continue to be a significant motivating factor for mining online social media data as a valuable
information source for gaining a deeper understanding about people.

References
1. https://www.researchgate.net/publication/304401202_Data_Mining_Techniques_in_Social_Media_A_Survey
2. https://www.researchgate.net/publication/226859517_Social_Network_Data_Mining_Research_Questions_Techniques_and_Applications
3. https://www.ijera.com/papers/Vol7_issue4/Part-2/I0704024652.pdf
4. https://ijarcce.com/wp-content/uploads/2016/11/IJARCCE-ICRITCSA-10.pdf
5. https://pdfs.semanticscholar.org/8a60/b082aa758c317e9677beed7e7776acde5e4c.pdf
6. https://www.ijser.org/researchpaper/Data-Mining-in-Social-Media.pdf
7. https://www.ijcsmc.com/docs/papers/April2017/V6I4201734.pdf
8. https://arxiv.org/vc/arxiv/papers/1312/1312.4617v1.pdf
9. https://globaljournals.org/GJCST_Volume16/3-Social-Media-Analytics.pdf
10. https://www.tandfonline.com/doi/abs/10.2753/JEC1086-4415150301?journalCode=mjec20
11. https://ieeexplore.ieee.org/abstract/document/7244976
12. https://www.expertsystem.com/social-media-data-mining/
13. https://www.cs.purdue.edu/homes/neville/papers/jensen-neville-nas2002.pdf
14. https://journals.sagepub.com/doi/full/10.1177/1747016117738559
15. https://archive.siam.org/meetings/sdm08/TS1.pdf
