Apache Mahout: The Elephant Driver

Presenters:
Antonio Loureiro Severien
Emmanouil Dimogerontakis
Muhammad Anis uddin Nasir
What is Apache Mahout?
Machine learning and data mining framework for
classification, clustering and recommendation

The goal of the free Apache Mahout machine learning library
is to build scalable machine learning tools for analysing
big data in a distributed manner.
Machine Learning
"Machine Learning is programming computers to optimize a
performance criterion using example data or past
experience" - Alpaydin, 2004

Machine learning is concerned with the design and
development of algorithms that allow machines to make
decisions, or even evolve behaviors, based on collections of
empirical data.
Data Mining
Data mining, also called knowledge discovery in
databases (KDD), is the process of discovering interesting
and useful patterns and relationships in large volumes of
data.
It combines tools from statistics and artificial intelligence
(such as neural networks and machine learning) with
database management to analyze large data sets.
- Britannica Online Encyclopedia
Why Machine Learning and Data Mining?

Data, Data, DATA!!!

Tasks too Hard to Program

Customizing software
Available Machine Learning Tools

WEKA
R
KEEL
Others...

Not enough?
Apache Mahout vs others?
Many open source Machine Learning libraries either:
Lack Community
Lack Documentation and Examples
Lack the Apache License (business opportunity)
Are research-oriented (not fit for production yet)
Lack Scalability
Mahout = Elephant Driver?
Why do we need scalability?
Big Data
Applications
Recommendation features
Clustering of information
Classification

Examples: movie recommendations, stock analysis,
fraud detection, ad-sense recommendation, etc.

How do we do this?
Supported Algorithms
Classification
Clustering
Recommender / Collaborative Filtering
Evolutionary Algorithms
Pattern Mining
Regression
Dimension reduction
Similarity Vectors
Classification
(learn to assign categories to documents)

Fully functional
Logistic Regression (SGD)
Bayesian

Integrated into Mahout development

Random Forests (integrated)
Online Passive Aggressive (integrated)
Boosting (awaiting patch commit)

Open to be worked on...

Hidden Markov Models (HMM) - Training is done in Map-Reduce
Support Vector Machines (SVM) (open)
Perceptron and Winnow (open)
Neural Network (open)
Clustering
(group items that are topically related)

Fully functional
Expectation Maximization (EM)
Hierarchical Clustering

Integrated into Mahout development

Canopy Clustering
K-Means Clustering
Fuzzy K-Means
Mean Shift Clustering
Dirichlet Process Clustering
Latent Dirichlet Allocation
Spectral Clustering
Minhash Clustering
Top Down Clustering
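
To make the clustering idea concrete, here is a minimal single-machine sketch of K-Means (Lloyd's algorithm) in plain Python. It is not Mahout's implementation or API: the function name `kmeans`, its parameters, and the toy points are assumptions made for illustration, and Mahout's own version runs as distributed jobs rather than in a single loop like this.

```python
import random


def kmeans(points, k, iterations=10, seed=0):
    """Illustrative Lloyd's-algorithm k-means on a list of equal-length tuples.

    Hypothetical helper for this sketch, not part of Mahout.
    """
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, clusters


if __name__ == "__main__":
    # Toy data: two visibly separated groups of 2-D points.
    toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, clusters = kmeans(toy_points, k=2)
    for centroid, members in zip(centroids, clusters):
        print(centroid, members)
```

The assignment and update steps are each a pass over the data, which is roughly why the algorithm lends itself to a map (assign) and reduce (average) formulation on large data sets.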
Recommenders / Collaborative Filtering
(find items a user might like / find items that appear together)

Integrated into Mahout development

Non-distributed recommenders ("Taste") (integrated)
Distributed Item-Based Collaborative Filtering (integrated)
Collaborative Filtering using a parallel matrix factorization (integrated)
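
As a rough illustration of item-based collaborative filtering, the sketch below scores unseen items for a user by weighting the user's own ratings with item-to-item cosine similarity. It is a simplified, non-distributed stand-in, not the Taste API or Mahout's distributed job: the names `cosine` and `recommend`, the rating scale, and the sample users are all assumptions.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two items, each a dict of user -> rating."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated, using a similarity-weighted average.

    ratings: dict of user -> {item: rating}. Hypothetical helper, not Mahout.
    """
    # Invert the data to item -> {user: rating}.
    by_item = {}
    for u, items in ratings.items():
        for item, score in items.items():
            by_item.setdefault(item, {})[u] = score

    seen = ratings[user]
    scores = {}
    for candidate, raters in by_item.items():
        if candidate in seen:
            continue
        weighted = total_sim = 0.0
        for item, score in seen.items():
            sim = cosine(raters, by_item[item])
            weighted += sim * score
            total_sim += sim
        if total_sim > 0:
            scores[candidate] = weighted / total_sim
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    # Made-up ratings for illustration only.
    sample = {
        "alice": {"A": 5, "B": 4},
        "bob": {"A": 5, "B": 5, "C": 4},
        "carol": {"B": 1, "D": 5},
    }
    print(recommend(sample, "alice"))
```

Item similarities depend only on the rating matrix, so they can be precomputed in bulk, which is one reason the item-based variant is commonly chosen for distributed settings.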
Who is using it?
Opportunities
Developers
Researchers
Small Business
Large Business
Consultancy...
on Mahout
on specific data analysis
Open data
etc...
Apache Mahout
Business?

Ideas?

Suggestions?

Questions?
Where to start?
Wikipedia Bayes Example
https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html

What does it do?

It classifies a Wikipedia data dump by country.
Objective: predict which country an unseen article
should be categorized into.
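
The idea behind such a classifier can be sketched with a tiny multinomial naive Bayes model in plain Python. This is only an illustration of the technique under stated assumptions, not the code or commands from the Mahout example: the functions `train` and `classify`, the whitespace tokenization, the add-one smoothing, and the four toy "articles" are all invented for this sketch.

```python
from collections import Counter, defaultdict
from math import log


def train(docs):
    """Count words per label. docs is a list of (label, text) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in docs:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        # Log prior for the label.
        score = log(count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed word likelihood.
            score += log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Toy training "articles", made up for illustration.
    training = [
        ("France", "paris seine eiffel tower"),
        ("France", "paris louvre museum"),
        ("Japan", "tokyo sushi shinkansen"),
        ("Japan", "kyoto temple tokyo"),
    ]
    model = train(training)
    print(classify(model, "museum in paris"))
```

Training here is just counting, so the counts can be accumulated independently per chunk of articles and then summed, which is what makes this kind of model a natural fit for large dumps.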
References
General
http://www.slideshare.net/sdec2011/sdec2011-mahout-the-what-the-how-and-the-why
http://www.slideshare.net/gsingers/intro-to-mahout-dc-hadoop
http://www.slideshare.net/aneeshabakharia/lca2011-mahout
Hands-on
http://www.slideshare.net/OReillyOSCON/hands-on-mahout
Who is using it?
https://cwiki.apache.org/MAHOUT/powered-by-mahout.html
Apache Mahout
http://mahout.apache.org/
Quickstart
https://cwiki.apache.org/MAHOUT/quickstart.html
