Real Time Detection of Odd Behavior and Irrelevant Promotion in Video Sharing Systems

Panga Eswaraiah (M.Tech), CSE Dept, GPREC, Kurnool
Dr. D. Kavitha, Ph.D., Professor, CSE Dept, Kurnool

International Journal of Computer Trends and Technology (IJCTT), Volume 6, Number 5, Dec 2013, ISSN: 2231-2803, http://www.ijcttjournal.org

Abstract: Metacafe is one of the most popular online video sharing systems. Video sharing systems provide features that allow users to post a video as a response to the topic being discussed. These features, however, give users opportunities to introduce polluted content, or simply pollution, into the system. For example, spammers may post an unrelated video as a response to a popular one, aiming to increase the likelihood of the response being viewed by a larger number of users. Similarly, content promoters may try to gain visibility for a specific video by posting a large number of responses to it, boosting the rank of the responded video so that it appears in the top lists maintained by the system. Content pollution, that is, the posting of irrelevant content, may jeopardize users' trust in the video sharing system and, in turn, compromise its success in promoting social interactions. In spite of that, the available literature offers only a very limited understanding of this problem. In this paper, we address the issue of detecting video spammers and promoters in the YouTube video sharing system.

Index Terms: Promoter, social media, social networks, spammer, video promotion, video response, video spam.

I. INTRODUCTION

To this end, we first manually build a test collection of real Metacafe users, classifying them as spammers, promoters, or legitimate users. Using this test collection, we characterize the content, individual, and social attributes that help distinguish each user class. Next, we investigate the feasibility of using supervised classification algorithms to automatically detect spammers and promoters, and we assess their effectiveness on our test collection. While our classification approach succeeds at separating spammers and promoters from legitimate users, the high cost of manually labeling vast amounts of examples compromises its full potential in realistic scenarios. For this reason, we further propose an active learning approach that automatically chooses the set of examples to be labeled that is likely to provide the most information, thereby reducing the amount of required training data while maintaining comparable classification effectiveness.

Video content is becoming a predominant part of users' daily lives on the Web. By allowing users to generate and distribute their own content to large audiences, the Web has been transformed into a major channel for the delivery of multimedia, leading society to a new multimedia age. Video pervades the Internet and supports new types of interaction among users, including video forums, video chats, video mail, and video blogs. Additionally, a number of services in the current Web 2.0 offer video-based functions as alternatives to text-based ones, such as video product reviews (some sites allow users to post video reviews of products), video ads, and video responses. This huge growth of multimedia content on the Web is mostly due to the evolution of the user from content consumer to content creator. As a consequence, several multimedia issues need to be revisited. In fact, a recent discussion on the needs and challenges of multimedia research in the context of Web 2.0 pointed out
that understanding how users typically behave is of great relevance. For example, the design of effective video content classification mechanisms is crucial for the automatic identification of videos with malicious content, such as copyright-infringing material, pornography, and spam. However, content classification based solely on the raw content can be a challenging research problem, due to the typically low quality of user-generated content and the multitude of strategies one can use to publicize malicious content. In contrast, understanding how users interact with each other in an online video social network may highlight aspects inherent to how malicious users behave, which in turn may be used much more effectively for detecting and possibly removing malicious or unwanted content.

In this article, we perform a large and representative characterization of users interacting with each other essentially via video objects. In particular, we characterize the use of Metacafe's video response feature, which allows one user to respond with a video to another user's video contribution, creating asynchronous multimedia conversations. The video response feature has become a new trend in online video social network systems as a means to exchange knowledge and express ideas through video interactions. Video responses allow users to review products or places and to exchange their opinions about certain themes using a much richer medium than simple text.

The characterization of video interactions through video responses is of interest for two reasons. The first is technical, stemming from the need to understand video communication in order to evaluate new design choices for video services. The second is sociological, relating to social networking issues that influence the behavior of users interacting primarily via streaming objects instead of the textual content traditionally available on the Web.
The video media type opens new doors for originality and spontaneity, but also for content pollution (e.g., spam and promotion) in the interactions among users of an online social network. Our characterization of video interactions relies on a large sample collected from Metacafe, currently the most popular online video social network. We analyze the characteristics of video responses as well as the interactions triggered by the use of this feature. We further investigate the presence of opportunistic activities. In summary, the main contributions of this work are as follows:

With Internet video sharing sites gaining popularity at a dazzling speed, the Web is being transformed into a major channel for the delivery of multimedia content. Online video social networks (SNs), of which Metacafe is the most popular, are distributing videos at a massive scale. It has been reported that the amount of content uploaded to Metacafe in 60 days is equivalent to the content that would have been broadcast for 60 years, without interruption, by NBC, CBS, and ABC altogether. Moreover, Metacafe has reportedly served over 100 million users in January 2009 alone, with a video upload rate equivalent to 10 hours of video per minute.

By allowing users to publicize and share their independently generated content, online video SNs become susceptible to different types of noncooperative user actions. In particular, these systems usually offer three basic mechanisms for video retrieval: 1) a search system; 2) ranked lists of top videos; and 3) social links connecting users and/or videos. Although appealing as mechanisms to facilitate content location and enrich online interaction, these mechanisms open opportunities for users to introduce polluted content into the system. For example, video search systems can be fooled by malicious attacks in which users post their videos with several popular tags.
Opportunistic behavior in the other two mechanisms for video retrieval can be exemplified by a Metacafe feature that allows users to post a video as a response to a video topic. Some users, whom we call spammers, post unrelated videos as responses to popular video topics, aiming to increase the likelihood of the responses being viewed by a larger number of users. Other users, to whom we refer
as promoters, may try to gain visibility for a specific video by posting a large number of (potentially unrelated) responses to boost the rank of the video topic among the most responded videos, making it appear in the top lists maintained by Metacafe. Promoters and spammers are driven by several goals, such as spreading advertisements to generate sales, disseminating pornography, or simply compromising the system's reputation. Polluted content may compromise user patience and satisfaction with the system, since users cannot easily identify the pollution before watching at least a segment of it, which also consumes system resources, particularly bandwidth. Additionally, promoters may further negatively impact system mechanisms related to content distribution, since promoted videos that quickly reach high rankings are strong candidates to be kept in caches or in content distribution networks.

II. SYSTEM DEVELOPMENT

a) Construction of Cloud Data Storage

Metacafe consists of videos along with their tags. This module downloads the text content (posts and responses) of the users who viewed each video, together with the video link, web snippet, posts, and click statistics. The collected data are then used to build a similar website.

b) User Management

This module stores the crawled data in the database. Users can create an account by registering with the server. A user logs in to obtain access and can log out when access is no longer needed.

c) User Video Upload

Users can upload videos to the video sharing system; uploading is possible only when the user is signed in. The module prepares a word snippet from the text tags of the uploaded video.

d) User Profile Management

This module maintains the users' profiles. A dataset of user profiles is collected to analyze spammers and content promoters; the complete set consists of normal users, spammers, and content promoters.
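Subsection d) does not list the attributes kept in each profile. As an illustration only, the following Python sketch shows one way a labeled profile record and a few behavior attributes could be represented. The class names (`Label`, `Video`, `UserProfile`), the `behavior_attributes` helper, its `views_of` input, and the chosen attributes are all hypothetical; they are not the attribute set used in this work.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Label(Enum):
    LEGITIMATE = "legitimate"
    SPAMMER = "spammer"
    PROMOTER = "promoter"


@dataclass
class Video:
    video_id: str
    responds_to: Optional[str] = None  # id of the responded video, if this video is a response


@dataclass
class UserProfile:
    user_id: str
    videos: List[Video] = field(default_factory=list)
    label: Optional[Label] = None  # stays None until a volunteer labels the user


def behavior_attributes(user: UserProfile, views_of: Dict[str, int]) -> Dict[str, float]:
    """Compute a few hypothetical per-user attributes.

    views_of maps a video id to its view count (assumed to come from the crawl).
    """
    targets = [v.responds_to for v in user.videos if v.responds_to is not None]
    n_videos = len(user.videos)
    return {
        "num_videos": float(n_videos),
        "fraction_responses": len(targets) / n_videos if n_videos else 0.0,
        # Many responses aimed at the same topic: a possible promotion signal.
        "max_responses_to_one_target": float(
            max((targets.count(t) for t in set(targets)), default=0)
        ),
        # Responses aimed at very popular topics: a possible spam signal.
        "avg_views_of_targets": (
            sum(views_of.get(t, 0) for t in targets) / len(targets) if targets else 0.0
        ),
    }


# Usage: a user whose two videos both respond to one popular topic.
views = {"popular": 1_000_000}
user = UserProfile("u1", [Video("a", "popular"), Video("b", "popular")], Label.SPAMMER)
print(behavior_attributes(user, views))
```

Under these assumptions, a user whose videos are all responses to very popular topics scores high on `fraction_responses` and `avg_views_of_targets`, the pattern one would expect from a spammer, while a promoter would tend to show a large `max_responses_to_one_target`.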
e) User Behavior Analysis

The user profiles are analyzed for anomalous activity in order to identify the users who behave anomalously. The results of LAC and active LAC are compared using ROC curves.

We address the issue of detecting video spammers and promoters by adopting a five-step approach. First, we crawled a large user data set from the Metacafe site, containing more than 260 thousand users. Second, we sampled our user data set to create a labeled test collection of 829 users, who were manually classified as legitimate users, spammers, or promoters. Our sampling was designed to capture different user profiles in each category. Third, we analyzed a variety of video, individual, and social attributes that reflect the behavior of our sampled users, aiming to gain insight into their relative discriminatory power in distinguishing legitimate users, promoters, and spammers. Fourth, using the same set of attributes, which are based on the user's profile, the user's social behavior in the system, and the videos posted by the user as well as her target (responded) videos, we investigated the feasibility of applying supervised learning methods to identify the two envisioned types of polluters. We consider two state-of-the-art supervised classification algorithms, namely, support vector machines (SVMs) and lazy associative classification (LAC). We evaluated both algorithms on our test collection, finding that both
techniques can effectively identify the majority of promoters and spammers. However, despite their effectiveness, supervised solutions usually rely on manually labeled training data to learn patterns capable of identifying specific behaviors. Manually labeling large amounts of training data is very costly, especially in video sharing systems. For instance, to manually create our test collection of 829 users, volunteers had to watch around 20,000 videos. Thus, the high cost of manually labeling vast amounts of examples compromises the full potential of supervised classification in practical, realistic scenarios. For this reason, we propose an active learning approach that automatically chooses the set of examples to be labeled that is likely to provide the most information. This approach allows us to drastically reduce the labeling effort while maintaining similar classification effectiveness, making our algorithm well suited for practical scenarios.

III. RELATED WORK

Over the last few years, a number of studies have explored various aspects of social networking sites. Researchers have explored the overall scope, structure, and friend relationship patterns of popular online social networks such as Flickr, Metacafe, LiveJournal, Facebook, and Orkut. In particular, an interesting study of Metacafe has been presented. Its authors analyzed the popularity distribution, popularity evolution, and content characteristics of Metacafe and of a Korean video sharing service. They also analyzed system issues that could be used to improve video distribution mechanisms, such as caching and peer-to-peer distribution schemes. Another study presents a characterization of Metacafe traffic collected from a university campus network, comparing its properties with those previously reported for Web and media streaming workloads.
They found that HTTP GET requests, used for fetching content from the server, account for over 99% of the total requests sent to the server, and that requests sent from the campus to the server follow typical daily and weekly patterns. They also analyzed file sizes, video durations, video bit rates, video ages, video ratings, and video categories, comparing these properties with those of objects of other media types retrieved from Metacafe as well as with traditional Web and media streaming workloads. Another characterization of Metacafe traffic collected from a university campus has also been presented. Based on their measurements, its authors designed trace-driven simulations showing that client-based local caching, P2P-based distribution, and proxy caching can significantly reduce network traffic and allow faster access to videos. These studies show evidence of significant differences in user and video access patterns compared with traditional Web servers.

IV. CRAWLING A SOCIAL NETWORK

In order to analyze social network aspects of video interactions, we need to obtain information about users and their interactions (via video responses) from Metacafe. To do so, we can sequentially visit pages on the Metacafe site (that is, crawl it) and gather information about Metacafe video responses and their contributors. Every Metacafe video post has a single contributor, who is a registered Metacafe user. We say a Metacafe video is a responded video, or video topic, if it has at least one video response. A video topic has a sequence of video responses listed chronologically by upload time. We say a Metacafe user is a responded user if at least one of her contributed videos is a responded video. Finally, we say that a Metacafe user is a responsive user if she has posted at least one video response. A natural user graph emerges from video response interactions. At a given instant of time $t$, let $X$ be the union of all responded users and responsive users.
The set $X$ is, of course, a subset of all Metacafe users. We denote the video response user graph as the directed graph $(X, Y)$, where $(x_1, x_2)$ is a directed arc in $Y$ if user $x_1 \in X$ has responded to at least one video contributed by user $x_2 \in X$.
We note that the video response user graph may have multiple weakly connected components.

Fig. 1: Two video response sequences and the graph established by these interactions.
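The graph definition above maps directly onto code. The following Python sketch is a minimal illustration, assuming the crawler yields (response video, responded video) pairs plus a map from each video to its contributor; these input formats and the function names are hypothetical. It builds (X, Y) with one arc per ordered pair of users, however many responses connect them, and finds the weakly connected components with a union-find structure that ignores arc direction.

```python
from typing import Dict, Iterable, List, Set, Tuple

Arc = Tuple[str, str]


def response_user_graph(
    responses: Iterable[Tuple[str, str]], contributor: Dict[str, str]
) -> Tuple[Set[str], Set[Arc]]:
    """Build (X, Y) from (response_video, responded_video) pairs.

    contributor maps each video id to the user who posted it (hypothetical crawl output).
    """
    users: Set[str] = set()
    arcs: Set[Arc] = set()
    for response_video, topic_video in responses:
        x1 = contributor[response_video]  # responsive user
        x2 = contributor[topic_video]  # responded user
        users.update((x1, x2))
        arcs.add((x1, x2))  # a set keeps one arc however many responses x1 posted
    return users, arcs


def weakly_connected_components(users: Set[str], arcs: Set[Arc]) -> List[Set[str]]:
    """Union-find over the arcs, ignoring their direction."""
    parent = {u: u for u in users}

    def find(u: str) -> str:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps trees shallow
            u = parent[u]
        return u

    for a, b in arcs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    components: Dict[str, Set[str]] = {}
    for u in users:
        components.setdefault(find(u), set()).add(u)
    return list(components.values())


# Usage: bob and carol respond to alice's video; erin responds to dave's.
contributor = {"v1": "alice", "r1": "bob", "r2": "carol", "v2": "dave", "r3": "erin"}
X, Y = response_user_graph([("r1", "v1"), ("r2", "v1"), ("r3", "v2")], contributor)
print(weakly_connected_components(X, Y))
```

In this toy input the two response sequences stay separate, so the graph has two weakly connected components, {alice, bob, carol} and {dave, erin}, mirroring the situation described for Fig. 1.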
V. CONCLUSION

Promoters and spammers can pollute the video retrieval features of online video SNs, compromising not only user satisfaction with the system but also the usage of system resources and the effectiveness of content delivery mechanisms such as caching and content delivery networks. Here we proposed an effective solution that can help system administrators detect spammers and promoters in online video SNs. Relying on a sample of pre-classified users and on a set of user behavior attributes, our supervised classification approaches correctly detect the vast majority of promoters and many spammers, while misclassifying only a very small number of legitimate users. Thus, our proposed approach offers a promising alternative to simply considering all users as legitimate or to randomly selecting users for manual inspection. Moreover, given that the cost of the labeling process may be too high for practical purposes, we also proposed an active learning approach, which produced results very close to those of the fully supervised solutions but with a greatly reduced amount of labeled data.

We envision several directions in which our work can evolve. We intend to explore other refinements to the proposed approach, such as using different classification methods, perhaps combining multiple strategies. We believe that better classification effectiveness may require exploring other features, including temporal aspects of user behavior and features obtained from other SNs established among Metacafe users. Additionally, we intend to explore better combinations of features to improve classification results. Finally, we also plan to extend our general approach to detect malicious and opportunistic users in other online SN sites and contexts.
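As a hedged illustration of an active learner that picks the examples likely to be most informative, the following Python sketch uses pool-based uncertainty sampling with a nearest-centroid classifier. This is a generic strategy swapped in for illustration and is not the active LAC method evaluated in this work; the functions `centroids`, `margin`, `active_learning`, and `predict`, and the `oracle` callback (standing in for a volunteer who watches a user's videos and assigns a label), are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def centroids(labeled: Dict[int, str], X: Sequence[Vector]) -> Dict[str, Vector]:
    """Mean attribute vector of each class among the labeled examples."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for i, label in labeled.items():
        acc = sums.setdefault(label, [0.0] * len(X[i]))
        for k, value in enumerate(X[i]):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: tuple(s / counts[c] for s in acc) for c, acc in sums.items()}


def margin(x: Vector, cents: Dict[str, Vector]) -> float:
    """Gap between the two closest centroids; a small gap means an uncertain prediction."""
    d = sorted(math.dist(x, c) for c in cents.values())
    return d[1] - d[0] if len(d) > 1 else math.inf


def active_learning(
    X: Sequence[Vector],
    seed: Dict[int, str],
    oracle: Callable[[int], str],
    budget: int,
) -> Dict[int, str]:
    """Uncertainty sampling: repeatedly ask the oracle to label the least certain example."""
    labeled = dict(seed)  # copy so the caller's seed set is not modified
    for _ in range(budget):
        pool = [i for i in range(len(X)) if i not in labeled]
        if not pool:
            break
        cents = centroids(labeled, X)
        query = min(pool, key=lambda i: margin(X[i], cents))
        labeled[query] = oracle(query)  # e.g. a volunteer inspects this user's videos
    return labeled


def predict(x: Vector, labeled: Dict[int, str], X: Sequence[Vector]) -> str:
    """Assign x to the class with the nearest centroid."""
    cents = centroids(labeled, X)
    return min(cents, key=lambda c: math.dist(x, cents[c]))


# Usage with toy two-attribute vectors and a simulated volunteer.
X = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (2.4, 2.5), (10.0, 0.0), (10.1, 0.1)]
truth = ["legitimate", "legitimate", "spammer", "spammer", "spammer", "promoter", "promoter"]
seed = {0: "legitimate", 2: "spammer", 5: "promoter"}
labeled = active_learning(X, seed, oracle=lambda i: truth[i], budget=2)
print(sorted(labeled), predict((9.0, 0.5), labeled, X))
```

Each round requests a label only for the user whose two closest class centroids are nearly equidistant, so labeling effort goes where the current model is least sure rather than to users it already classifies confidently.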
REFERENCES
[1] F. Benevenuto, T. Rodrigues, A. Veloso, J. Almeida, M. Gonçalves, and V. Almeida, "Practical Detection of Spammers and Content Promoters in Online Video Sharing Systems."
[2] J. Kubiatowicz, D. Bindel, Y. Chen, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao, "OceanStore: An Architecture for Global-Scale Persistent Storage," Proc. Ninth Int'l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp. 190-201, 2000.
[3] P. Druschel and A. Rowstron, "PAST: A Large-Scale, Persistent Peer-to-Peer Storage Utility," Proc. Eighth Workshop Hot Topics in Operating Systems (HotOS VIII), pp. 75-80, 2001.
[4] A. Adya, W.J. Bolosky, M. Castro, G. Cermak, R. Chaiken, J.R. Douceur, J. Howell, J.R. Lorch, M. Theimer, and R. Wattenhofer, "Farsite: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment," Proc. Fifth Symp. Operating System Design and Implementation (OSDI), pp. 1-14, 2002.
[5] A. Haeberlen, A. Mislove, and P. Druschel, "Glacier: Highly Durable, Decentralized Storage Despite Massive Correlated Failures," Proc. Second Symp. Networked Systems Design and Implementation (NSDI), pp. 143-158, 2005.
[6] Z. Wilcox-O'Hearn and B. Warner, "Tahoe: The Least-Authority Filesystem," Proc. Fourth ACM Int'l Workshop Storage Security and Survivability (StorageSS), pp. 21-26, 2008.
[7] H.-Y. Lin and W.-G. Tzeng, "A Secure Decentralized Erasure Code for Distributed Network Storage," IEEE Trans. Parallel and Distributed Systems, vol. 21, no. 11, pp. 1586-1594, Nov. 2010.
[8] D.R. Brownbridge, L.F. Marshall, and B. Randell, "The Newcastle Connection or Unixes of the World Unite!," Software Practice and Experience, vol. 12, no. 12, pp. 1147-1162, 1982.
[9] R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, and B. Lyon, "Design and Implementation of the Sun Network Filesystem," Proc. USENIX Assoc. Conf., 1985.
[10] M. Kallahalla, E. Riedel, R. Swaminathan, Q. Wang, and K. Fu, "Plutus: Scalable Secure File Sharing on Untrusted Storage," Proc. Second USENIX Conf. File and Storage Technologies (FAST), pp. 29-42, 2003.
[11] S.C. Rhea, P.R. Eaton, D. Geels, H. Weatherspoon, B.Y. Zhao, and J. Kubiatowicz, "Pond: The OceanStore Prototype," Proc. Second USENIX Conf. File and Storage Technologies (FAST), pp. 1-14, 2003.
[12] R. Bhagwan, K. Tati, Y.-C. Cheng, S. Savage, and G.M. Voelker, "Total Recall: System Support for Automated Availability Management," Proc. First Symp. Networked Systems Design and Implementation (NSDI), pp. 337-350, 2004.
[13] A.G. Dimakis, V. Prabhakaran, and K. Ramchandran, "Ubiquitous Access to Distributed Data in Large-Scale Sensor Networks through Decentralized Erasure Codes," Proc. Fourth Int'l Symp. Information Processing in Sensor Networks (IPSN), pp. 111-117, 2005.
[14] A.G. Dimakis, V. Prabhakaran, and K. Ramchandran, "Decentralized Erasure Codes for Distributed Networked Storage," IEEE Trans. Information Theory, vol. 52, no. 6, pp. 2809-2816, June 2006.
[15] M. Mambo and E. Okamoto, "Proxy Cryptosystems: Delegation of the Power to Decrypt Ciphertexts," IEICE Trans. Fundamentals of Electronics, Comm. and Computer Sciences, vol. E80-A, no. 1, pp. 54-63, 1997.
First Author: Panga Eswaraiah received his M.Sc. from the Department of Computer Science and Engineering, T.N.C.K.R. P.G. College, in 2008. He is currently an M.Tech. student in Computer Science and Engineering at G. Pullareddy Engineering College (Autonomous), Kurnool. His research interests include data mining and cloud computing.
Second Author: Dr. D. Kavitha is a Professor in the Department of CSE, G. Pullareddy Engineering College (Autonomous), Kurnool, Andhra Pradesh.