
Distributed Matrix Factorization with MapReduce
Agenda
Recommender System (RS)
Recommendation System Techniques
MapReduce
Matrix Factorization (MF)
Effect of MF on Time Complexity
Conclusion
References
Literature Survey
1. Distributed Matrix Factorization with MapReduce Using a Series of Broadcast-Joins (Sebastian Schelter, Christoph Boden, Martin Schenck, Alexander Alexandrov and Volker Markl)
Findings: The authors propose an efficient, data-parallel low-rank matrix factorization with Alternating Least Squares (ALS) that uses a series of broadcast-joins which can be executed efficiently with MapReduce.
2. Comparing the Effect of Matrix Factorization Techniques in Reducing the Time Complexity for Traversing the Big Data of Recommendation Systems (Animesh Pandey and Siddharth Shrotriya)
Findings: The authors observe that in this fast-trending technological era data is growing rapidly all around the globe, and that Big Data analytics is challenging and dependent on time complexity. They compare the time complexity of different matrix factorization models.
3. Matrix Factorization Techniques for Recommender Systems (Yehuda Koren, Robert Bell and Chris Volinsky)
Findings: The authors discuss learning algorithms such as SGD and ALS. Adding biases allows the incorporation of additional information such as implicit feedback, temporal effects, and confidence levels for faster and more scalable processing of large datasets with matrix factorization.
Literature Survey (Cont.)
4. Recommender Systems Handbook (Yehuda Koren and Robert Bell)
Findings: In Chapter 5, the authors discuss matrix factorization techniques, which combine implementation convenience with relatively high accuracy. This has made them the preferred technique for addressing the largest publicly available dataset, the Netflix data.
5. Large-Scale Matrix Factorization Using MapReduce (Zhengguo Sun, Tao Li and Naphtali Rishe)
Findings: The authors discuss the feasibility of factorizing a million-by-million matrix with billions of nonzero elements on a MapReduce cluster. They present three different matrix multiplication implementations and scale up three types of nonnegative matrix factorizations on MapReduce.
6. MapReduce: Simplified Data Processing on Large Clusters (Jeffrey Dean and Sanjay Ghemawat)
Findings: The authors present MapReduce as a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks.
Recommender System
The first Recommender System was developed by Goldberg,
Nichols, Oki & Terry in 1992
Definition:
A Recommender System (RS) comprises software tools and information-filtering techniques that provide suggestions for items likely to be of interest to a certain group of users
Suggestions related to the decision-making process:
Items to buy (by Amazon.com)
Books to read (by Amazon.com)
Movies to watch (by MovieLens)
Online news to read (at VERSIFI Technologies)
Example
Recommender System (Cont.)
For a typical Recommendation System (RS):
Given:
User model (e.g. ratings, preferences)
Items (with or without description of item characteristics)
Find:
Relevance score used for ranking
Finally:
Recommend items that are assumed to be relevant
But:
Remember that relevance might be context-dependent
RS Techniques
Recommendation System (RS)
  Collaborative Filtering (CF)
    Memory-based CF
      User-based Approach
      Item-based Approach
    Model-based CF
      Bayesian Network Model
      Cluster Model
    Hybrid CF (Memory-based + Model-based)
  Content-Based Filtering (CB)
  Hybrid Recommendation
MapReduce
A functional programming paradigm for parallel processing of large data sets on shared-nothing clusters over a distributed file system (DFS)
Hadoop is a popular and widely deployed open-source implementation of MapReduce
Hadoop jobs are executed as a map-shuffle-reduce pipeline (sketched after the diagram below)
MapReduce provides:
Automatic parallelization and distribution
Fault-tolerance
I/O scheduling
Status monitoring
Figure: MapReduce data flow. The input data is split across parallel Map() tasks, each emitting [k1, v1] pairs; the shuffle phase sorts by k1 and merges values into [k1, [v1, v2, v3]] groups; Reduce() tasks then aggregate each group to produce the output data.
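As a minimal local sketch of how this map-shuffle-reduce flow fits together (plain Python, not Hadoop itself; the word-count job and all function names below are illustrative assumptions, not taken from the slides):

```python
from collections import defaultdict

# Illustrative word-count job: map emits (word, 1), reduce sums the counts.
def map_fn(record):
    for word in record.split():
        yield word, 1

def reduce_fn(key, values):
    yield key, sum(values)

def run_mapreduce(records, map_fn, reduce_fn):
    # Map phase: apply map_fn to every input split/record.
    intermediate = [pair for r in records for pair in map_fn(r)]
    # Shuffle phase: sort by k1 and merge values into [k1, [v1, v2, v3]].
    groups = defaultdict(list)
    for k, v in intermediate:
        groups[k].append(v)
    # Reduce phase: aggregate each group into the output data.
    return [out for k, vs in sorted(groups.items()) for out in reduce_fn(k, vs)]

print(run_mapreduce(["to be or not to be"], map_fn, reduce_fn))
# -> [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

In Hadoop the same map/reduce roles are played by Mapper and Reducer tasks, and the shuffle is handled by the framework across the cluster.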
Latent Factor Models
Find features that describe the characteristics of rated objects
Item characteristics and user preferences are described with
numerical factor values
Challenge:
How to compute a mapping of items and users to factor vectors?
Approaches:
Singular Value Decomposition (SVD)
Matrix Factorization
Matrix Factorization
Each item and each user is associated with a factor vector
The dot product captures the user's estimated interest in the item (rating prediction):
$\hat{r}_{ui} = q_i^T p_u$
where $p_u$ is the user preference factor vector and $q_i$ is the item preference factor vector
Example: a 5x6 rating matrix r is approximated by the product of a 5x3 factor matrix q and a 3x6 factor matrix p (r = q X p); a single predicted entry such as 32 is the dot product of one row and one column: (a, b, c) . (x, y, z) = a*x + b*y + c*z
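A minimal sketch of this prediction, assuming NumPy and randomly initialized factor matrices; the dimensions mirror the 5x6 example above and all values are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# r (5x6) is approximated by q (5x3) times p (3x6), as in the example above.
q = rng.random((5, 3))        # one factor vector per row
p = rng.random((3, 6))        # one factor vector per column

r_hat = q @ p                 # all predicted ratings at once
one_rating = q[1] @ p[:, 4]   # a single prediction: (a, b, c) . (x, y, z)
print(r_hat.shape, round(one_rating, 3))
```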
Matrix Factorization (Cont.)
The factor vectors are learned by minimizing the regularized squared error over the known ratings:
$\min_{q,p} \sum_{(u,i) \in \kappa} (r_{ui} - q_i^T p_u)^2 + \lambda (\lVert q_i \rVert^2 + \lVert p_u \rVert^2)$
Where,
$\lambda$: constant to control the extent of regularization
$\kappa$: the set of (u, i) pairs for which $r_{ui}$ is known
Need:
Processing very large datasets with MapReduce
Design a fast, scalable algorithm and make it run efficiently in parallel
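A minimal sketch of evaluating this regularized objective over the known ratings, assuming NumPy; the (u, i, r_ui) triple format, the variable names and the value of lam are illustrative assumptions:

```python
import numpy as np

def regularized_loss(known_ratings, P, Q, lam=0.05):
    """Squared error over known (u, i, r_ui) pairs plus the lambda penalty."""
    loss = 0.0
    for u, i, r_ui in known_ratings:
        err = r_ui - Q[i] @ P[u]                              # r_ui - q_i^T p_u
        loss += err ** 2 + lam * (Q[i] @ Q[i] + P[u] @ P[u])
    return loss

# Tiny illustrative setup: 2 users, 3 items, 2 latent factors.
rng = np.random.default_rng(0)
P = rng.random((2, 2))                  # user factor vectors p_u (rows)
Q = rng.random((3, 2))                  # item factor vectors q_i (rows)
known = [(0, 1, 4.0), (1, 2, 2.5)]      # (user, item, rating) triples
print(round(regularized_loss(known, P, Q), 3))
```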
Matrix Factorization (Cont.)
Learning Algorithms:
  Stochastic Gradient Descent (SGD)
  Alternating Least Squares (ALS)
Matrix Factorization (Cont.)
Learning Algorithms (Cont.)
1. Stochastic Gradient Descent (SGD)
  The algorithm loops through all the ratings in the training set
  It calculates the prediction error: error = actual rating - predicted rating, i.e. $e_{ui} = r_{ui} - q_i^T p_u$
  It then modifies the parameters proportionally to the gradient, in the opposite direction
2. Alternating Least Squares (ALS)
  ALS rotates between fixing the $q_i$'s and the $p_u$'s, so each step becomes an ordinary least-squares problem
  Allows massive parallelization
  Better for densely filled matrices
(a minimal sketch of both update rules follows below)
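A minimal sketch of the two update rules, assuming NumPy; ratings are given as (u, i, r_ui) triples for SGD and as a dense matrix R with zeros marking unknown entries for the ALS step, and the learning rate, regularization constant and names are illustrative assumptions:

```python
import numpy as np

def sgd_epoch(ratings, P, Q, lr=0.01, lam=0.05):
    """One SGD pass: for each known rating, step against the gradient."""
    for u, i, r_ui in ratings:
        p_u, q_i = P[u].copy(), Q[i].copy()
        err = r_ui - q_i @ p_u                    # prediction error e_ui
        P[u] += lr * (err * q_i - lam * p_u)      # move p_u opposite its gradient
        Q[i] += lr * (err * p_u - lam * q_i)      # move q_i opposite its gradient

def als_user_step(R, Q, lam=0.05):
    """One half of an ALS sweep: with the q_i's fixed, each p_u is a small
    regularized least-squares solve (the item step is symmetric)."""
    k = Q.shape[1]
    P = np.zeros((R.shape[0], k))
    for u in range(R.shape[0]):
        rated = R[u] > 0                          # items this user has rated
        A = Q[rated].T @ Q[rated] + lam * np.eye(k)
        b = Q[rated].T @ R[u, rated]
        P[u] = np.linalg.solve(A, b)
    return P
```

Because each p_u (and, symmetrically, each q_i) can be solved independently, the ALS step parallelizes naturally, which is what makes it a good fit for MapReduce.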
Matrix Factorization (Cont.)
Adding Biases:
Item- or user-specific rating variations are called biases
Example:
  Alice rates no movie with more than 2 (out of 5)
  Movie X is hyped and is rated only with 5
Matrix factorization allows modeling of biases
Including bias parameters in the prediction: $\hat{r}_{ui} = \mu + b_i + b_u + q_i^T p_u$
Temporal Dynamics:
Ratings may be affected by temporal effects
The popularity of an item, a user's identity and a user's preferences may change over time
Temporal effects can improve accuracy significantly
Rating prediction as a function of time: $\hat{r}_{ui}(t) = \mu + b_i(t) + b_u(t) + q_i^T p_u(t)$
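A minimal sketch of the biased prediction, assuming NumPy; the global mean and the bias values are made-up numbers chosen to mirror the Alice / Movie X example:

```python
import numpy as np

def predict_biased(mu, b_u, b_i, p_u, q_i):
    """Biased prediction: global mean + user bias + item bias + factor interaction."""
    return mu + b_u + b_i + q_i @ p_u

mu = 3.5                                   # global average rating (illustrative)
alice_bias = -1.8                          # Alice rates everything low
movie_x_bias = 1.2                         # Movie X is hyped
print(predict_biased(mu, alice_bias, movie_x_bias,
                     p_u=np.array([0.2, 0.1]), q_i=np.array([0.3, 0.4])))
```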
MF Effect on Time Complexity
Factorization models and their matrix descriptions:
Singular Value Decomposition (SVD): m-by-n (rectangular); A = U D V^H, where D is a nonnegative diagonal matrix, U and V are unitary matrices, and V^H denotes the conjugate transpose of V
LU Decomposition: square; A = LU, where L is lower triangular and U is upper triangular
QR Decomposition: m-by-n (rectangular); A = QR, where Q is an orthogonal matrix of size m-by-m and R is an upper triangular matrix of size m-by-n
Matrix Factorization Models
Fig. 1: 1024 x 600 rectangular matrix
Fig. 2: 1024 x 1024 square matrix with zero padding
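A minimal sketch of producing the three factorizations and a rank-reduced SVD reconstruction, assuming NumPy/SciPy and a random stand-in for the test image; the 1024x600 size mirrors Fig. 1, the rank 21 mirrors the SVD row of the table that follows, and the printed correlation is illustrative rather than the reported 0.9896:

```python
import numpy as np
from scipy.linalg import lu, qr

rng = np.random.default_rng(0)
A = rng.random((1024, 600))                 # stand-in for the 1024x600 test image

# SVD: A = U diag(s) Vh; keep the top-k singular values for a low-rank version.
U, s, Vh = np.linalg.svd(A, full_matrices=False)
k = 21
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k]

# LU needs a square matrix (hence the zero padding of Fig. 2); QR works on the rectangle.
A_square = np.zeros((1024, 1024))
A_square[:, :600] = A
perm, L, Upper = lu(A_square)
Qm, Rm = qr(A)

# Correlation between the original and the rank-k SVD reconstruction.
corr = np.corrcoef(A.ravel(), A_k.ravel())[0, 1]
print(round(corr, 4))
```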
MF Effect on Time Complexity (Cont.)
Singular Value Decomposition: 1024 x 600 (rectangular); rank before decomposition 602, rank after reconstruction 21, correlation with original image 0.9896, rank reduction 96.5%
LU Decomposition: 1024 x 1024 (square); rank before decomposition 602, rank after reconstruction 438, correlation with original image 0.9854, rank reduction 27.24%
QR Decomposition: 1024 x 600 (rectangular); rank before decomposition 602, rank after reconstruction 150, correlation with original image 0.9890, rank reduction 75%
MF Effect on Time Complexity (Cont.)
Figures: correlation vs. rank for each decomposition
1. SVD: 0.97 < correlation < 0.996
2. LU Decomposition: 0.915 < correlation < 0.995
3. QR Decomposition: 0.962 < correlation < 0.99
Conclusion
MapReduce simplifies large-scale computations at Google
Matrix Factorization (MF) with MapReduce using ALS achieves:
Fully Distributed processing
Massive Parallelization
Faster Processing
Good Scalability
The variation of correlation with rank is smoother in the case of a rectangular matrix
Thus, MapReduce has proven to be useful for MF
MF is a promising approach for Collaborative Filtering (CF)
Adding bias parameters further improves accuracy
References
1. Sebastian Schelter, Christoph Boden, Martin Schenck, Alexander Alexandrov and Volker Markl, "Distributed Matrix Factorization with MapReduce Using a Series of Broadcast-Joins", ACM RecSys '13, Hong Kong, China, October 12-16, 2013, ACM 978-1-4503-2409-0/13/10.
2. Animesh Pandey and Siddharth Shrotriya, "Comparing the Effect of Matrix Factorization Techniques in Reducing the Time Complexity for Traversing the Big Data of Recommendation Systems", International Journal of Computer and Communication Engineering, Vol. 2, No. 2, March 2013.
3. Yehuda Koren, Robert Bell and Chris Volinsky, "Matrix Factorization Techniques for Recommender Systems", IEEE Computer Society, IEEE 0018-9162/09, August 2009.
4. Francesco Ricci, Lior Rokach, Bracha Shapira and Paul B. Kantor, "Recommender Systems Handbook", Springer Science + Business Media LLC, ISBN 978-0-387-85819-7, 2011.
5. Zhengguo Sun, Tao Li and Naphtali Rishe, "Large-Scale Matrix Factorization Using MapReduce", IEEE International Conference on Data Mining Workshops, IEEE Computer Society, IEEE 978-0-7695-4257-7/10, 2010.
6. Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", Communications of the ACM, Vol. 51, No. 1, pp. 107-113, January 2008.
7. Apache Hadoop, http://hadoop.apache.org.
8. Apache Mahout, http://mahout.apache.org.
Thank you!
Questions?
