Available Machine Learning Tools
WEKA
R
KEEL
Others...
Not enough?
Apache Mahout vs others?
Many open source Machine Learning libraries either:
Lack Community
Lack Documentation and Examples
Lack the Apache License (business opportunity)
Are research-oriented (not fit for production yet)
Lack Scalability
Mahout = Elephant Driver?
Why do we need scalability?
Big Data
Applications
Recommendation features
Clustering of information
Classification
How do we do this?
Supported Algorithms
Classification
Clustering
Recommender / Collaborative Filtering
Evolutionary Algorithms
Pattern Mining
Regression
Dimension reduction
Similarity Vectors
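To make the "Recommender / Collaborative Filtering" and "Similarity Vectors" entries concrete, here is a minimal sketch of user-based collaborative filtering with cosine similarity. It is plain Python, not Mahout code; the ratings table and the function names `cosine_similarity` and `recommend` are hypothetical and chosen only for illustration.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
RATINGS = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob":   {"A": 5.0, "B": 3.0, "C": 4.0, "D": 5.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0, "E": 5.0},
}


def cosine_similarity(u, v):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by the similarity-weighted
    average of the ratings given by the other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0.0:
            continue
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only suggest items the user has not rated yet
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Calling `recommend("alice", RATINGS)` returns predicted ratings for the items alice has not rated. The nested loop over all users is exactly the part that stops scaling on large data sets, which is where a distributed implementation becomes interesting.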
Classification (learn to assign categories to documents)
Fully functional
Logistic Regression (SGD)
Bayesian
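As an illustration of "Logistic Regression (SGD)", the following is a minimal sketch of a binary logistic regression classifier trained with stochastic gradient descent. It is a toy in plain Python and does not reflect Mahout's implementation; the feature layout (two word counts per document), the labels, and all function names are assumptions made for the example.

```python
import math
import random

# Hypothetical features: [count of "goal", count of "election"] per document.
# Label 1 = sports, 0 = politics.
SAMPLES = [[3, 0], [2, 1], [4, 0], [0, 3], [1, 2], [0, 4]]
LABELS = [1, 1, 1, 0, 0, 0]


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_sgd(samples, labels, learning_rate=0.1, epochs=200, seed=0):
    """Fit weights and bias by updating after every single example."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a new random order
        for i in order:
            x, y = samples[i], labels[i]
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the log loss w.r.t. the linear score
            weights = [w - learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def predict(weights, bias, x):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if sigmoid(score) >= 0.5 else 0
```

A typical use is `weights, bias = train_sgd(SAMPLES, LABELS)` followed by `predict(weights, bias, [5, 0])`. Because each update touches one example at a time, the same scheme works on data that does not fit in memory.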
Clustering
Fully functional
Expectation Maximization (EM)
Hierarchical Clustering
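For "Hierarchical Clustering", a minimal sketch of the bottom-up (agglomerative) variant with single linkage may help. It is plain Python (3.8+ for `math.dist`), not Mahout's implementation; the sample points and the function names are hypothetical.

```python
from math import dist

# Hypothetical 2-D points forming two visibly separate groups.
POINTS = [(1.0, 1.0), (1.5, 1.2), (0.8, 0.9),
          (8.0, 8.0), (8.3, 7.9), (7.7, 8.4)]


def single_linkage(a, b):
    """Distance between two clusters = distance of their closest pair."""
    return min(dist(p, q) for p in a for q in b)


def agglomerative(points, k):
    """Start with one cluster per point and repeatedly merge the two
    closest clusters until only k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None  # (distance, index_i, index_j) of the closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return clusters
```

`agglomerative(POINTS, 2)` groups the points near (1, 1) and the points near (8, 8) into two clusters. The pairwise search makes this sketch expensive on large inputs, so it is meant only to show the merging idea.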
Ideas?
Suggestions?
Questions?
Where to start?
Wikipedia Bayes Example
https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html
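Before working through the linked example, it can help to see the idea behind a Bayes text classifier on a very small scale. The sketch below is a toy multinomial naive Bayes with add-one smoothing in plain Python. It is not taken from the Mahout example; the training sentences, labels, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training documents: (label, text)
DOCS = [
    ("sports", "the team scored a late goal"),
    ("sports", "the striker scored twice"),
    ("politics", "the senate passed the bill"),
    ("politics", "voters elected a new senate"),
]


def train(docs):
    """Count words per label, documents per label, and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for label, text in docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the label with the highest log posterior for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        # log prior: share of training documents carrying this label
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen during training
            # add-one (Laplace) smoothing avoids zero probabilities
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With `model = train(DOCS)`, a call such as `classify(model, "the team scored")` picks the label whose word statistics best match the text. Training only needs word counts per label, which is one reason this family of classifiers is easy to distribute.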