
Rubila Dwi Adawiyah (14/360054/PA/15759)

Natasha Christabelle (14/360017/PA/15753)


M. Nurlazuardi Ajipawenang (14/360029/PA/15754)

MOVIE RECOMMENDATION USING MAHOUT'S RECOMMENDATION ENGINE

Background

Predicting what a user wants is a common use of big data. For example, Google shows you relevant
ads, Amazon recommends relevant products, and Netflix recommends movies that you might like.
Recommendation involves predicting which new items a user would like or dislike, based on the
user's preferences for, or associations with, previous items.

For example, suppose a user, Kevin Benedict, likes the following books (items), which are mostly
classics:
A Tale of Two Cities
The Great Gatsby
For Whom the Bell Tolls

A recommender will predict which new books (items) Kevin Benedict will like:
Jane Eyre
The Adventures of Tom Sawyer

In this project, we will use Mahout. Mahout is a machine learning application programming
interface (API) built on Hadoop.

Goals

The goal of this project is to show the movie recommendations for each user.

Tools

Java (to run Hadoop)
Hadoop (used by Mahout)
Mahout
Python (used to display the results)

Data set

The dataset we use is the MovieLens data set from GroupLens, which provides users' ratings of
movies. This data set contains 943 users, 1,682 movies and 100,000 ratings.
The archive contains:

u.data: contains tuples (user_id, movie_id, rating, timestamp)
u.user: contains tuples (user_id, age, gender, occupation, zip_code)
u.item: contains tuples (movie_id, title, release_date, video_release_date,
imdb_url, cat_unknown, cat_action, cat_adventure, cat_animation, cat_children,
cat_comedy, cat_crime, cat_documentary, cat_drama, cat_fantasy, cat_film_noir,
cat_horror, cat_musical, cat_mystery, cat_romance, cat_sci_fi, cat_thriller, cat_war,
cat_western)
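
As an illustration of the tuple layouts above, the following is a minimal Python sketch for reading one line of u.data and one line of u.item. It assumes that u.data is tab-separated (as described later in this report) and that u.item is separated by the "|" character; the second point is an assumption about the archive, not something stated above. The names Rating, parse_rating and parse_title are hypothetical helpers introduced only for this example.

```python
from typing import NamedTuple, Tuple


class Rating(NamedTuple):
    """One line of u.data: (user_id, movie_id, rating, timestamp)."""
    user_id: int
    movie_id: int
    rating: float
    timestamp: int


def parse_rating(line: str) -> Rating:
    # Assumption: fields are separated by tabs.
    user_id, movie_id, rating, timestamp = line.rstrip("\n").split("\t")
    return Rating(int(user_id), int(movie_id), float(rating), int(timestamp))


def parse_title(line: str) -> Tuple[int, str]:
    # Assumption: u.item fields are separated by "|"; only the first two
    # fields (movie_id, title) are used here.
    fields = line.rstrip("\n").split("|")
    return int(fields[0]), fields[1]


if __name__ == "__main__":
    # Made-up sample lines in the assumed layouts.
    print(parse_rating("196\t242\t3\t881250949\n"))
    print(parse_title("1|Toy Story (1995)|01-Jan-1995||url|0|0|0|1|1|1"))
```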

Methods

The way Mahout's recommendation engine works can be described simply as a matrix multiplication:

S × U = R

S is the similarity matrix between items
U is the user's preferences for items
R is the predicted recommendations

To keep the steps simple, we use a dataset whose format already supports this matrix
multiplication. The MovieLens dataset does: it consists of a set of lines, each containing a
userId, an itemId and a preference value separated by tabs. The userId and itemId are integers
and the preference value is a double.

Then, once Hadoop and Mahout are successfully installed, we simply run Mahout's
recommender command:
hadoop jar <MAHOUT DIRECTORY>/mahout-core-0.7-job.jar
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -s
SIMILARITY_COOCCURRENCE --input u.data --output output

The argument -s SIMILARITY_COOCCURRENCE tells the recommender which item similarity formula to
use. Under this measure, two items (movies) are considered very similar if they often appear
together in users' ratings. So, to find the movies to recommend to a user, we need to find the
10 movies that are most similar to the movies the user has rated. For example, if user A gives
a good rating to movie X and other users give good ratings to both movie X and movie Y, then we
can recommend movie Y to user A.
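
The co-occurrence idea can be sketched in a few lines of Python. This is a simplified, in-memory illustration and not Mahout's actual implementation: it counts how many users rated each pair of movies (the matrix S), multiplies those counts by one user's ratings (a column of U), and ranks the unrated movies by the resulting score (a column of R). The function names and the toy ratings are hypothetical, and Mahout's real scoring may differ in detail.

```python
from collections import defaultdict
from itertools import permutations


def cooccurrence(prefs):
    """Build S, where S[i][j] is the number of users who rated both i and j.

    prefs maps user -> {item: rating}.
    """
    S = defaultdict(lambda: defaultdict(int))
    for items in prefs.values():
        for i, j in permutations(items, 2):
            S[i][j] += 1
    return S


def recommend(prefs, user, top_n=10):
    """Rank items the user has not rated by sum_i S[i][j] * rating(user, i)."""
    S = cooccurrence(prefs)
    rated = prefs[user]
    scores = defaultdict(float)
    for i, rating in rated.items():
        for j, count in S[i].items():
            if j not in rated:
                scores[j] += count * rating
    # Highest score first; ties are broken by item name for determinism.
    return sorted(scores, key=lambda j: (-scores[j], j))[:top_n]


if __name__ == "__main__":
    # Toy data mirroring the example above: A rated only X,
    # while other users rated X together with Y (and once with Z).
    prefs = {
        "A": {"X": 5.0},
        "B": {"X": 4.0, "Y": 5.0},
        "C": {"X": 5.0, "Y": 4.0, "Z": 3.0},
    }
    print(recommend(prefs, "A"))  # ['Y', 'Z']
```

Movie Y is ranked first for user A because it co-occurs with X in two users' ratings, whereas Z co-occurs with X only once.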

Mahout computes the recommendations by running several Hadoop MapReduce jobs.

After 30-50 minutes, the jobs finish and each user has a list of the 10 movies that he/she is
most likely to enjoy, based on the co-occurrence of movies in users' ratings.

The recommendation result is not easy to read, so we use a small Python program to show, for a
given user, the movies he/she has rated and the movies we recommend to him/her. The Python
program uses the file u.data for the list of rated movies, the file u.item to get the movie titles
and output.txt to get the list of recommended movies for the user.
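
The display step might look roughly like the sketch below. It is not the original program. It assumes that each line of output.txt has the form userId, a tab, then a bracketed list of itemId:score pairs separated by commas; that layout is an assumption and should be checked against the actual job output. The helper names parse_recommendations and describe, and the sample titles, are hypothetical.

```python
def parse_recommendations(line):
    """Parse one assumed output line, e.g. '3\\t[50:5.0,181:4.5]'."""
    user_id, items = line.strip().split("\t")
    recs = []
    for pair in items.strip("[]").split(","):
        if pair:  # skip the empty string produced by an empty list "[]"
            movie_id, score = pair.split(":")
            recs.append((int(movie_id), float(score)))
    return int(user_id), recs


def describe(user_id, recs, titles):
    """Format recommendations using a {movie_id: title} lookup."""
    lines = [f"Recommendations for user {user_id}:"]
    for movie_id, score in recs:
        title = titles.get(movie_id, "unknown title")
        lines.append(f"  {title} (score {score:.2f})")
    return "\n".join(lines)


if __name__ == "__main__":
    # In the real program, titles would be loaded from u.item.
    titles = {50: "Star Wars (1977)", 181: "Return of the Jedi (1983)"}
    user_id, recs = parse_recommendations("3\t[50:5.0,181:4.5]")
    print(describe(user_id, recs, titles))
```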
