Learning Apache Mahout Classification
Ebook, 227 pages, 1 hour

About this ebook

About This Book
  • Explore the different types of classification algorithms available in Apache Mahout
  • Create and evaluate your own ready-to-use classification models using real world datasets
  • A practical guide to problems faced in classification with concepts explained in an easy-to-understand manner
Who This Book Is For

If you are a data scientist who has some experience with the Hadoop ecosystem and machine learning methods and want to try out classification on large datasets using Mahout, this book is ideal for you. Knowledge of Java is essential.

Language: English
Release date: Feb 26, 2015
ISBN: 9781783554966

    Book preview

    Learning Apache Mahout Classification - Gupta Ashish

    Table of Contents

    Learning Apache Mahout Classification

    Credits

    About the Author

    About the Reviewers

    www.PacktPub.com

    Support files, eBooks, discount offers, and more

    Why subscribe?

    Free access for Packt account holders

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Conventions

    Reader feedback

    Customer support

    Downloading the example code

    Downloading the color images of this book

    Errata

    Piracy

    Questions

    1. Classification in Data Analysis

    Introducing the classification

    Application of the classification system

    Working of the classification system

    Classification algorithms

    Model evaluation techniques

    The confusion matrix

    The Receiver Operating Characteristics (ROC) graph

    Area under the ROC curve

    The entropy matrix

    Summary

    2. Apache Mahout

    Introducing Apache Mahout

    Algorithms supported in Mahout

    Reasons for Mahout being a good choice for classification

    Installing Mahout

    Building Mahout from source using Maven

    Installing Maven

    Building Mahout code

    Setting up a development environment using Eclipse

    Setting up Mahout for a Windows user

    Summary

    3. Learning Logistic Regression / SGD Using Mahout

    Introducing regression

    Understanding linear regression

    Cost function

    Gradient descent

    Logistic regression

    Stochastic Gradient Descent

    Using Mahout for logistic regression

    Summary

    4. Learning the Naïve Bayes Classification Using Mahout

    Introducing conditional probability and the Bayes rule

    Understanding the Naïve Bayes algorithm

    Understanding the terms used in text classification

    Using the Naïve Bayes algorithm in Apache Mahout

    Summary

    5. Learning the Hidden Markov Model Using Mahout

    Deterministic and nondeterministic patterns

    The Markov process

    Introducing the Hidden Markov Model

    Using Mahout for the Hidden Markov Model

    Summary

    6. Learning Random Forest Using Mahout

    Decision tree

    Random forest

    Using Mahout for Random forest

    Steps to use the Random forest algorithm in Mahout

    Summary

    7. Learning Multilayer Perceptron Using Mahout

    Neural network and neurons

    Multilayer Perceptron

    MLP implementation in Mahout

    Using Mahout for MLP

    Steps to use the MLP algorithm in Mahout

    Summary

    8. Mahout Changes in the Upcoming Release

    Mahout new changes

    Mahout Scala and Spark bindings

    Apache Spark

    Using Mahout's Spark shell

    H2O platform integration

    Summary

    9. Building an E-mail Classification System Using Apache Mahout

    Spam e-mail dataset

    Creating the model using the Assassin dataset

    Program to use a classifier model

    Testing the program

    Second use case as an exercise

    The ASF e-mail dataset

    Classifiers tuning

    Summary

    Index

    Learning Apache Mahout Classification

    Copyright © 2015 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: February 2015

    Production reference: 1210215

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham B3 2PB, UK.

    ISBN 978-1-78355-495-9

    www.packtpub.com

    Credits

    Author

    Ashish Gupta

    Reviewers

    Siva Prakash

    Tharindu Rusira

    Vishnu Viswanath

    Commissioning Editor

    Akram Hussain

    Acquisition Editor

    Reshma Raman

    Content Development Editor

    Merwyn D'souza

    Technical Editors

    Monica John

    Novina Kewalramani

    Shruti Rawool

    Copy Editors

    Sarang Chari

    Gladson Monteiro

    Aarti Saldanha

    Rashmi Sawant

    Project Coordinator

    Neha Bhatnagar

    Proofreaders

    Simran Bhogal

    Steve Maguire

    Indexer

    Monica Ajmera Mehta

    Graphics

    Sheetal Aute

    Abhinash Sahu

    Production Coordinator

    Conidon Miranda

    Cover Work

    Conidon Miranda

    About the Author

    Ashish Gupta has been working in the field of software development for the last 8 years. He has worked as a software developer at companies such as SAP Labs and Caterpillar. He developed an interest in machine learning while working for a start-up, where he was responsible for predicting potential customers for new fashion apparel using social media. Since then, he has applied big data technologies and machine learning in several industries, including retail, finance, and insurance. He has a passion for learning new technologies and sharing the knowledge thus gained with others. He has organized many boot camps on Apache Mahout and the Hadoop ecosystem.

    First of all, I would like to thank open source communities for their continuous efforts in developing great software for all. I would like to thank Merwyn D'Souza and Reshma Raman, my editors for this project. Special thanks to the reviewers of this book.

    Nothing can be accomplished without the support of family, friends, and loved ones. I would like to thank my friends, family, and especially my wife and my son for their continuous support throughout the writing of this book.

    About the Reviewers

    Siva Prakash is working as a tech lead in Bangalore. He has extensive development experience in the analysis, design, development, implementation, and maintenance of various desktop, mobile, and web-based applications. He loves trekking, traveling, music, reading books, and blogging.

    You can find him on LinkedIn at https://www.linkedin.com/in/techsivam.

    Tharindu Rusira is currently a computer science and engineering undergraduate at the University of Moratuwa, Sri Lanka. As a student researcher, he has strong interests in machine learning, compilers, and high-performance computing.

    Tharindu has also worked as a research and development software engineering intern at Zaizi Asia (Pvt) Ltd., where he first started using Apache Mahout during the implementation of an enterprise-level content management and information retrieval system.

    He sees the potential of Apache Mahout as a scalable machine learning library for industry-level implementations and has even contributed to the Mahout 0.9 release, the latest stable release of Mahout.

    He is available on LinkedIn at https://www.linkedin.com/in/trusira.

    Vishnu Viswanath is a senior big data developer with many years of industry experience in machine learning. He is a tech enthusiast who is passionate about big data and has expertise in most big-data-related technologies.

    You can find him on LinkedIn at http://in.linkedin.com/in/vishnuviswanath25.

    www.PacktPub.com

    Support files, eBooks, discount offers, and more

    For support files and downloads related to your book, please visit www.PacktPub.com.

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

    https://www2.packtpub.com/books/subscription/packtlib

    Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

    Why subscribe?

    Fully searchable across every book published by Packt

    Copy and paste, print, and bookmark content

    On demand and accessible via a web browser

    Free access for Packt account holders

    If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

    Preface

    Thanks to the progress made in the hardware industries, our storage capacity has increased, and because of this, there are many organizations
