Documente Academic
Documente Profesional
Documente Cultură
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
Ryan Womack
More Hadoop
Ecosystem Tools
Other Providers
Big Data in
Practice
Termination
Introduction
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Fun, hopefully
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Introduction
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
More Hadoop
Ecosystem Tools
Other Providers
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Introduction
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
May help you decide how far and in which direction you
want to go with big data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Introduction
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Introduction
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Introduction
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Introduction
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Introduction
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
More Hadoop
Ecosystem Tools
Other Providers
Introduction
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
More Hadoop
Ecosystem Tools
Other Providers
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Setup
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
About Me
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
A buzzword...
A concept...
A set of practices...
An ecosystem...
Image of Big Data
Reality of Big Data
Big Data Landscape
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
1. Velocity
Hadoop +
MapReduce
2. Variety
AWS (Amazon
Web Services)
3. Volume
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hadoop
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Hadoop Architecture
More Hadoop
Ecosystem Tools
Other Providers
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
MapReduce
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
MapReduce Examples
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
MapReduce Illustration
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
More Hadoop
Ecosystem Tools
Other Providers
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Ryan Womack
Downside:
I
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
AWS setup
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Click Create.
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Amazon S3
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Other Providers
More Hadoop
Ecosystem Tools
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Pig
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hue
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hive
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Spark
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
Spark in CDH
AWS (Amazon
Web Services)
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Administration
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Cloudera
Hands-On Big
Data
Ryan Womack
Introduction
Cloudera (http://www.cloudera.com)
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
HighDimensional
and Sparse Data
Big Data in
Practice
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
Termination
Hortonworks
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hortonworks (http://www.hortonworks.com)
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Ambari revisited
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Ryan Womack
Introduction
RHadoop
Tessera
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
RHadoop
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
R in the Cloud
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Tessera
Tessera is developed by Purdue, Pacific Northwest National
Laboratory, and Mozilla. Launched in November 2014, this
project holds a lot of promise.
I Running in the R environment, Tessera provides its own
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
More Hadoop
Ecosystem Tools
Other Providers
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
High-dimensional data
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Ryan Womack
= arg min
n
X
i=1
(yi 0
p
X
j=1
xij j ) +
p
X
|j |
j=1
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
Statistical Science.
I We (artificially) penalize having large numbers of variables, so the
variants popular
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Sparse data
In numerical analysis, a sparse matrix is a matrix in which most of
the elements are zero. By contrast, if most of the elements are
nonzero, then the matrix is considered dense. The fraction of zero
elements over the total number of elements in a matrix is called the
sparsity (density) [Wikipedia]
I
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Conclusion
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
Termination
Hands-On Big
Data
Ryan Womack
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination
References
Hands-On Big
Data
Ryan Womack
1.
Peter B
uhlmann and Sara van de Geer. Statistics for
High-Dimensional Data: Methods, Theory and Applications.
Springer, 2011.
2.
3.
4.
Richard Hill, Laurie Hirsch, Peter Lake, and Siavash Moshiri. Guide
to Cloud Computing: Principles and Practice. Computer
Communications and Networks. Springer, 2013.
5.
6.
7.
Introduction
Big Data
Hadoop +
MapReduce
AWS (Amazon
Web Services)
Pig and Hive
More Hadoop
Ecosystem Tools
Other Providers
R and Big Data
HighDimensional
and Sparse Data
Big Data in
Practice
Termination