Real Time Detection of Odd Behavior and Irrelevant Promotion in Video Sharing Systems

International Journal of Computer Trends and Technology (IJCTT) volume 6 number 5 Dec 2013
ISSN: 2231-2803 http://www.ijcttjournal.org Page248

Panga Eswaraiah, (M.Tech)
CSE Dept, GPREC, Kurnool
Dr. D. Kavitha, Ph.D.
Professor, CSE Dept, Kurnool

Abstract: Metacafe is one of the most popular online video
sharing systems. Such systems provide features that allow
users to post a video as a response to the topic being
discussed. These features give users the opportunity to
introduce polluted content, or simply pollution, into the
system. For example, spammers may post an unrelated video
as a response to a popular one, aiming to increase the
likelihood of the response being viewed by a larger number
of users. Likewise, content promoters may try to gain
visibility for a specific video by posting a large number of
responses to boost the rank of the responded video, so that
it appears in the top lists maintained by the system.
Content pollution, that is, the posting of irrelevant
content, may jeopardize users' trust in the video sharing
system and thus compromise its success in promoting social
interactions. Despite this, the available literature
provides only a limited understanding of the problem. In
this paper, we address the issue of detecting video spammers
and promoters in the Metacafe video sharing system.
Index Terms: Promoter, social media, social networks,
spammer, video promotion, video response, video spam.
I. INTRODUCTION
To this end, we first manually build a test collection of real
Metacafe users, classifying them as spammers, promoters,
and legitimate users. Using our test collection, we provide a
characterization of content, individual, and social attributes
that help distinguish each user class. Next, we
investigate the feasibility of using supervised classification
algorithms to automatically detect spammers and promoters,
and assess their effectiveness on our test
collection. While our classification approach
succeeds at separating spammers and promoters from
legitimate users, the high cost of manually labeling vast
amounts of examples compromises its full potential in
realistic scenarios. For this reason, we further propose
an active learning approach that
automatically chooses a set of examples for labeling that
is likely to provide the highest amount of information,
reducing the amount of required training data while
maintaining comparable classification effectiveness.
Video content is becoming a predominant part of users'
daily lives on the Web. By allowing users to generate and
distribute their own content to large audiences, the Web has
been transformed into a major channel for the delivery of
multimedia, leading society to a new multimedia age. Video
pervades the Internet and supports new types of interaction
among users, including video forums, video chats, video
mail, and video blogs. Additionally, a number of services in
the current Web 2.0 offer video-based functions as
alternatives to text-based ones, such as video reviews of
products, video ads, and video responses.
This huge growth of multimedia content in the Web is
mostly due to the evolution of the user from content
consumer to content creator. As a consequence, several
multimedia issues need to be revisited.
In fact, a recent discussion on the needs and challenges of
multimedia research in the context of Web 2.0 pointed out
that understanding how users typically behave is of great
relevance. As an example, the design of effective video
content classification mechanisms is crucial for the automatic
identification of videos with malicious content, such as
copyright infringement, pornography, and spam. However,
content classification based solely on the raw content can be
a challenging research problem due to the typically low
quality of user-generated content and the multitude of
strategies one can use to publicize malicious content. In contrast,
understanding how users interact with each other in an
online video social network service may highlight aspects
inherent to how malicious users behave, which in turn, may
be used in a much more effective way for detecting and
possibly removing malicious or unwanted content.
In this article, we perform a large and representative
characterization of users interacting with each other
essentially via video objects. Particularly, we characterize
the use of Metacafe's video response feature, which allows
one user to respond by video to another user's video
contribution, creating asynchronous multimedia
conversations. The video response feature became a new
trend in online video social network systems as a means to
exchange knowledge and express ideas through video
interactions. Video responses allow users to provide reviews
for products or places, and to exchange their opinions about
certain themes using a much richer media than simple text.
The characterization of video interactions through video
responses is of interest for two reasons. The first is
technical, stemming from the necessity to understand video
communication in order to evaluate new design choices for
video services. The second is sociological, relating to social
networking issues that influence the behavior of users
interacting primarily via streaming objects, instead of
textual content traditionally available on the Web. The video
media type opens new doors for originality and spontaneity,
but also for content pollution (e.g., spam, promotion, etc.),
in the interactions among users of an online social network.
Our characterization of video interactions relies on a large
sample collected from Metacafe, currently the most popular
online video social network. We analyze the characteristics
of video responses as well as of the interactions triggered by
the use of this feature. We further investigate the presence
of opportunistic activities. In summary, the main
contributions of this work are as follows:
With Internet video sharing sites gaining popularity at
a dazzling speed, the Web is being transformed into a major
channel for the delivery of multimedia content. Online video
social networks (SNs), of which Metacafe is the most
popular, are distributing videos at a massive scale. It has
been reported that the amount of content uploaded to
Metacafe in 60 days is equivalent to the content that would
have been broadcast for 60 years, without interruption, by
NBC, CBS, and ABC altogether. Moreover, Metacafe has
reportedly served over 100 million users in January
2009 alone, with a video upload rate equivalent to 10 hours of
video per minute. By allowing users to publicize and share their
independently generated content, online video SNs become
susceptible to different types of non-cooperative user
actions. Particularly, these systems usually offer three basic
mechanisms for video retrieval: 1) a search system; 2)
ranked lists of top videos; and 3) social links connecting
users and/or videos. Although appealing as mechanisms to
facilitate content location and enrich online interaction,
these mechanisms open opportunities for users to introduce
polluted content into the system. As an example, video
search systems can be fooled by malicious attacks in which
users post their videos with several popular tags.
Opportunistic behavior on the other two mechanisms for
video retrieval can be exemplified by observing a Metacafe
feature that allows users to post a video as a response to a
video topic. Some users, whom we call spammers, post
unrelated videos as responses to popular video topics, aiming
to increase the likelihood of the responses being viewed
by a larger number of users. Other users, to whom we refer
as promoters, may try to gain visibility for a specific
video by posting a large number of (potentially unrelated)
responses to boost the rank of the video topic among the
most responded videos, making it appear in the top lists
maintained by Metacafe. Promoters and spammers are
driven by several goals, such as spreading advertisements to
generate sales, disseminating pornography, or simply
compromising system reputation.
Polluted content may compromise user patience and
satisfaction with the system since users cannot easily
identify the pollution before watching at least a segment of
it, which also consumes system resources, particularly
bandwidth. Additionally, promoters may further negatively
impact system mechanisms related to content distribution,
since promoted videos that quickly reach high rankings are
strong candidates to be kept in caches or in content
distribution networks.
II. SYSTEM DEVELOPMENT
a) Construction of Cloud Data Storage
Metacafe consists of videos along with their tags.
The module tries to download the text content
(posts/responses) of users who viewed the video.
The module downloads the video link, web snippet, posts,
and click statistics.
Using the collected data, a similar website is built.
b) User Management
The module stores the crawled data in the database.
A user can create an account by registering with the server.
A user can log in to the system to obtain access and can then
log out when access is no longer needed.
c) User Video Upload
A user can upload a video to the video sharing system.
Video upload is possible only when the user is signed
in to the system.
The module prepares the word snippet for the text tags of
the uploaded video.
d) User Profile Management
In this module, the users' profiles are maintained.
A dataset of user profiles is collected to analyse the
spammers and content promoters.
The complete set consists of normal users, spammers,
and content promoters.
e) User Behavior Analysis
The user profiles are analysed for anomalous
activity.
Users who behave anomalously are identified.
The results of LAC and Active LAC are compared using ROC
curves.
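The ROC comparison mentioned in the last module can be illustrated with a minimal sketch. It assumes each classifier outputs a numeric suspicion score per user and that labels are binary (1 for a polluter, 0 for a legitimate user); the function names and the toy values are hypothetical and are not taken from the paper's experiments.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    # Rank users from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Users with equal scores share one threshold, so process them together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy scores only, for illustration (not experimental results).
toy_scores = [0.9, 0.8, 0.7, 0.6]
toy_labels = [1, 0, 1, 0]
print(auc(roc_points(toy_scores, toy_labels)))  # 0.75
```

Running the same two functions on the scores of each classifier over the same labeled users gives two curves (and two areas) that can be compared directly.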
We address the issue of detecting video spammers and
promoters by adopting a five-step approach. First, we crawled a
large user data set from the Metacafe site, containing more than
260 thousand users. Second, we sampled our user data set to
create a labeled test collection consisting of 829 users,
who were manually classified as legitimate users, spammers,
or promoters. Our sampling was performed so as to
capture different profiles of users in each category. Third,
we analyzed a variety of video, individual, and social
attributes that reflect the behavior of our sampled users,
aiming to draw some insights into their relative
discriminatory power in distinguishing legitimate users,
promoters, and spammers. Fourth, using the same set of
attributes, which are based on the user's profile, the user's
social behavior in the system, and the videos posted by the
user as well as her target (responded) videos, we
investigated the feasibility of applying supervised learning
methods for identifying the two envisioned types of
polluters.
We consider two state-of-the-art supervised classification
algorithms, namely, support vector machines (SVMs) and
lazy associative classification (LAC). We evaluated both
algorithms over our test collection, finding that both
techniques can effectively identify the majority of the
promoters and spammers.
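The lazy, rule-based idea behind associative classification can be illustrated with a heavily simplified sketch. It is not the implementation evaluated in this paper: it assumes attributes have already been discretized into attribute=value strings, limits rules to at most two attributes, and scores each class by summing rule confidences. The attribute names in the example are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def lazy_associative_classify(train, instance, max_rule_size=2, min_confidence=0.0):
    """Predict a class for `instance` using rules generated on demand.

    train: list of (features, label) pairs, where features is a set of
           discretized "attribute=value" strings.
    instance: set of "attribute=value" strings for the user to classify.
    """
    scores = defaultdict(float)
    for size in range(1, max_rule_size + 1):
        # Only attribute combinations present in the instance are considered,
        # which is what makes the rule generation "lazy".
        for antecedent in combinations(sorted(instance), size):
            antecedent = set(antecedent)
            matching = [label for features, label in train if antecedent <= features]
            if not matching:
                continue
            for label in set(matching):
                # Confidence of the rule antecedent -> label.
                confidence = matching.count(label) / len(matching)
                if confidence >= min_confidence:
                    scores[label] += confidence
    if not scores:
        return None  # no training example shares any attribute value
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical, hand-made training examples for illustration only.
train = [
    ({"responses=high", "views=low"}, "promoter"),
    ({"responses=high", "views=low"}, "promoter"),
    ({"responses=low", "views=low"}, "spammer"),
    ({"responses=low", "views=high"}, "legitimate"),
    ({"responses=low", "views=high"}, "legitimate"),
]
print(lazy_associative_classify(train, {"responses=high", "views=low"}))  # promoter
```

Because rules are built only from the attribute values of the user being classified, no global rule set has to be mined in advance.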
However, despite their effectiveness, supervised solutions usually
rely on manually labeled training data to learn patterns
capable of identifying specific behaviors. Manually labeling
large amounts of training data, specifically in the case of
video sharing systems, is very costly. For instance, to
manually create our user test collection with 829 users,
volunteers had to watch around 20,000 videos. Thus, the
high cost of manually labeling vast amounts of examples
compromises the full potential of supervised solutions in
practical, realistic scenarios.
For this reason, we propose an active learning approach
which automatically chooses a set of examples to be labeled
that is likely to provide the highest amount of information.
This approach allows us to drastically reduce the labeling
effort while maintaining a similar classification
effectiveness, making our algorithm very suitable for
practical scenarios.
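One common way to choose informative examples is uncertainty sampling, shown here as a minimal sketch. This is a swapped-in illustration of the general idea, not necessarily the selection criterion used by the approach in this paper; it assumes a classifier that returns class probabilities for each unlabeled user, and the function name and `budget` parameter are hypothetical.

```python
def least_confident(probabilities, budget):
    """Return indices of the `budget` unlabeled users the classifier is least sure about.

    probabilities: one list of class probabilities per unlabeled user.
    budget: how many users can be sent for manual labeling.
    """
    # A low top probability means the classifier cannot clearly pick a class.
    ranked = sorted(range(len(probabilities)), key=lambda i: max(probabilities[i]))
    return ranked[:budget]


# Toy probabilities for (legitimate, spammer, promoter), for illustration only.
toy_probabilities = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
    [0.34, 0.33, 0.33],
]
print(least_confident(toy_probabilities, budget=2))  # [3, 1]
```

In an active learning loop, the selected users would be labeled manually, added to the training set, and the classifier retrained before the next selection round.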
III. RELATED WORK
Over the last few years, there have been a number of studies
that explored the various aspects of social networking sites.
Researchers explored the overall scope, structure, and friend
relationship patterns of popular online social networks such
as Flickr, Metacafe, LiveJournal, Facebook, and Orkut.
In particular, one interesting study of Metacafe analyzed the
popularity distribution,
popularity evolution, and content characteristics of Metacafe
and of a Korean video sharing service. The authors also analyzed
system issues that could be used to improve video
distribution mechanisms, such as caching and peer-to-peer
distribution schemes. Another study presents a characterization
of the Metacafe traffic collected from a university campus
network, comparing its properties with those previously
reported for Web and media streaming workloads. Its authors
found that HTTP GET requests, used for fetching content
from the server, correspond to over 99% of the total requests
sent to the server, and that requests sent from the campus to
the server follow typical daily and weekly patterns. They
also analyzed file sizes, video durations, video bit rates,
video ages, video ratings, and video categories, comparing
these properties with those of objects in other media types
retrieved from Metacafe as well as of traditional Web and
media streaming workloads. Another characterization of the
Metacafe traffic collected from a university campus has also
been presented. Based on their measurements, the authors
designed trace-driven simulations to show that client-based
local caching, P2P-based distribution, and proxy caching
can significantly reduce network traffic and allow faster
access to videos. These studies show evidence of significant
differences in user and video access patterns compared with
traditional Web servers.
IV. CRAWLING A SOCIAL NETWORK
In order to analyze social network aspects on video
interactions, we need to obtain information about users and
their interactions (via video responses) from Metacafe. To
do so, we can sequentially visit pages on the Metacafe site
(that is, crawl) and gather information about Metacafe video
responses and their contributors. Every Metacafe video post
has a single contributor, who is a registered Metacafe user.
We say a Metacafe video is a responded video or video topic
if it has at least one video response. A video topic has a
sequence of video responses listed chronologically in terms
of when they are uploaded to the system. We say a
Metacafe user is a responded user if at least one of her
contributed videos is a responded video. Finally, we say that
a Metacafe user is a responsive user if she has posted at
least one video response. A natural user graph emerges from
video response interactions. At a given instant of time t, let
X be the union of all responded users and responsive users.
The set X is, of course, a subset of all Metacafe users. We
denote the video response user graph as the directed graph
(X, Y), where (x1, x2) is a directed arc in Y if user x1 ∈ X has
responded to at least one video contributed by user x2 ∈ X.
Fig. 1 shows two video response sequences and the graph
established by these interactions. We note that the video
response user graph may have multiple weakly connected
components.

Fig. 1
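The video response user graph defined above can be sketched in a few lines. The sketch assumes the crawl yields one (responder, topic owner) pair per video response; the function names and user names are hypothetical, and a plain depth-first search is used to find weakly connected components.

```python
from collections import defaultdict


def build_response_graph(responses):
    """Build the directed graph (X, Y) from (responder, topic_owner) pairs."""
    nodes = set()  # X: union of responsive and responded users
    arcs = set()   # Y: (x1, x2) if x1 responded to at least one video of x2
    for responder, owner in responses:
        nodes.update((responder, owner))
        arcs.add((responder, owner))
    return nodes, arcs


def weakly_connected_components(nodes, arcs):
    """Group users into components, ignoring the direction of each arc."""
    neighbors = defaultdict(set)
    for x1, x2 in arcs:
        neighbors[x1].add(x2)
        neighbors[x2].add(x1)
    seen = set()
    components = []
    for start in sorted(nodes):
        if start in seen:
            continue
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        components.append(component)
    return components


# Hypothetical users; alice responded twice to bob, which yields a single arc.
responses = [("alice", "bob"), ("carol", "bob"), ("alice", "bob"), ("dave", "erin")]
nodes, arcs = build_response_graph(responses)
print(len(arcs))  # 3
print(len(weakly_connected_components(nodes, arcs)))  # 2
```

Storing arcs in a set reflects the definition in the text: an arc exists once a user has responded to at least one video of another user, regardless of how many responses were posted.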

V. CONCLUSION
Promoters and spammers can pollute video retrieval features
of online video SNs, compromising not only user
satisfaction with the system, but also the usage of system
resources and the effectiveness of content delivery
mechanisms such as caching and content delivery networks.
Here, we proposed an effective solution that can help system
administrators detect spammers and promoters in online
video SNs. Relying on a sample of pre-classified users and
on a set of user behavior attributes, our supervised
classification approaches are able to correctly detect the vast
majority of the promoters and many spammers,
misclassifying only a very small number of legitimate users.
Thus, our proposed approach poses a promising alternative
to simply considering all users as legitimate or to randomly
selecting users for manual inspection. Moreover, given that
the cost of the labeling process may be too high for practical
purposes, we also proposed an active learning approach,
which was able to produce results very close to those of the
completely supervised solutions, but with a greatly reduced
amount of labeled data.
We envision some directions toward which our work can
evolve. We intend to explore other refinements to the
proposed approach such as to use different classification
methods, perhaps combining multiple strategies. We believe
that better classification effectiveness may require exploring
other features which include temporal aspects of user
behavior and also features obtained from other SNs
established among Metacafe users. Additionally, we intend
to explore a better combination of features to improve
classification results. Finally, we also plan to extend our
general approach to detect malicious and opportunistic users
in other online SN sites and contexts.
First Author: Panga Eswaraiah received his M.Sc. from the
Computer Science and Engineering department of
T.N.C.K.R. P.G College in 2008. He is currently an
M.Tech student in Computer Science Engineering at
G.Pullareddy Engineering College (Autonomous), Kurnool.
His interests are in the fields of Data Mining and Cloud
Computing.

Second Author: Dr. D. Kavitha works as a Professor in the
Department of CSE, G.Pullareddy Engineering College
(Autonomous), Kurnool, Andhra Pradesh.
