R Data Science Essentials
Ebook, 265 pages, 1 hour

Rating: 2 out of 5 stars

About this ebook

Learn the essence of data science and visualization using R in no time at all

About This Book

- Become a pro at making stunning visualizations and dashboards quickly and without hassle
- Apply the R programming language, with the help of useful statistical techniques, for better decision making in business
- From seasoned authors comes a book that offers you a plethora of fast-paced techniques to detect and analyze data patterns

Who This Book Is For

If you are an aspiring data scientist or analyst who has a basic understanding of data science and has basic hands-on experience in R or any other analytics tool, then R Data Science Essentials is the book for you.

What You Will Learn

- Perform data preprocessing and basic operations on data
- Implement visual and non-visual data exploration techniques
- Mine patterns from data using affinity and sequential analysis
- Use different clustering algorithms and visualize them
- Implement logistic and linear regression and find out how to evaluate and improve the performance of an algorithm
- Extract patterns through visualization and build a forecasting algorithm
- Build a recommendation engine using different collaborative filtering algorithms
- Create stunning visualizations and dashboards using ggplot and R Shiny

In Detail

With organizations increasingly embedding data science across their enterprise and with management becoming more data-driven, analysts and managers urgently need to understand the key concepts of data science. The data science concepts discussed in this book will help you make key decisions and solve the complex problems you will inevitably face in this new world.
R Data Science Essentials will introduce you to various important concepts in the field of data science using R. We start by reading data from multiple sources, then move on to processing the data, extracting hidden patterns, building predictive and forecasting models, building a recommendation engine, and communicating to the user through stunning visualizations and dashboards.
By the end of this book, you will have an understanding of some very important techniques in data science, be able to implement them using R, understand and interpret the outcomes, and know how they help businesses make decisions.

Style and approach

This easy-to-follow guide contains hands-on examples of the concepts of data science using R.
Language: English
Release date: Jan 13, 2016
ISBN: 9781785286360

Reviews for R Data Science Essentials

Rating: 2 out of 5 stars
2/5

1 rating, 1 review

  • Rating: 2 out of 5 stars
    2/5
    If you want to learn R, this book is not a good choice.

Book preview

R Data Science Essentials - Koushik Raja B.

Table of Contents

R Data Science Essentials
Credits
About the Authors
About the Reviewers
www.PacktPub.com
  Support files, eBooks, discount offers, and more
  Why subscribe?
  Free access for Packt account holders
Preface
  What this book covers
  What you need for this book
  Who this book is for
  Conventions
  Reader feedback
  Customer support
  Downloading the example code
  Errata
  Piracy
  Questions

1. Getting Started with R
  Reading data from different sources
  Reading data from a database
  Data types in R
  Variable data types
  Data preprocessing techniques
  Performing data operations
  Arithmetic operations on the data
  String operations on the data
  Aggregation operations on the data
  Mean
  Median
  Sum
  Maximum and minimum
  Standard deviation
  Control structures in R
  Control structures – if and else
  Control structures – for
  Control structures – while
  Control structures – repeat and break
  Control structures – next and return
  Bringing data to a usable format
  Summary

2. Exploratory Data Analysis
  The Titanic dataset
  Descriptive statistics
  Box plot
  Exercise
  Inferential statistics
  Univariate analysis
  Bivariate analysis
  Multivariate analysis
  Cross-tabulation analysis
  Graphical analysis
  Summary

3. Pattern Discovery
  Transactional datasets
  Using the built-in dataset
  Building the dataset
  Apriori analysis
  Support, confidence, and lift
  Support
  Confidence
  Lift
  Generating filtering rules
  Plotting
  Dataset
  Rules
  Sequential dataset
  Apriori sequence analysis
  Understanding the results
  Reference
  Business cases
  Summary

4. Segmentation Using Clustering
  Datasets
  Reading and formatting the dataset in R
  Centroid-based clustering and an ideal number of clusters
  Implementation using K-means
  Visualizing the clusters
  Connectivity-based clustering
  Visualizing the connectivity
  Business use cases
  Summary

5. Developing Regression Models
  Datasets
  Sampling the dataset
  Logistic regression
  Evaluating logistic regression
  Linear regression
  Evaluating linear regression
  Methods to improve the accuracy
  Ensemble models
  Replacing NA with mean or median
  Removing the highly correlated values
  Removing outliers
  Summary

6. Time Series Forecasting
  Datasets
  Extracting patterns
  Forecasting using ARIMA
  Forecasting using Holt-Winters
  Methods to improve accuracy
  Summary

7. Recommendation Engine
  Dataset and transformation
  Recommendations using user-based CF
  Recommendations using item-based CF
  Challenges and enhancements
  Summary

8. Communicating Data Analysis
  Dataset
  Plotting using the googleVis package
  Creating an interactive dashboard using Shiny
  Summary

Index

R Data Science Essentials


Copyright © 2016 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: January 2016

Production reference: 1040116

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78528-654-4

www.packtpub.com

Credits

Authors

Raja B. Koushik

Sharan Kumar Ravindran

Reviewers

Jeremy Gray

Navin K Manaswi

Commissioning Editor

Dipika Gaonkar

Acquisition Editor

Manish Nainan

Content Development Editor

Mehvash Fatima

Technical Editor

Suwarna Patil

Copy Editor

Tasneem Fatehi

Project Coordinator

Shipra Chawhan

Proofreader

Safis Editing

Indexer

Mariammal Chettiyar

Graphics

Disha Haria

Production Coordinator

Arvindkumar Gupta

Cover Work

Arvindkumar Gupta

About the Authors

Raja B. Koushik is a business intelligence professional with over 7 years of experience and is currently working in one of the leading international IT services companies. His primary interest lies in business intelligence technologies, such as ETL, reporting, and dashboarding, along with analytics based on statistics. He has worked with one of the world's largest companies for both their U.S. and UK businesses in the healthcare and leasing domains. He holds an engineering degree with specialization in information technology from Anna University.

I would like to thank my friends, for I don't know how far I would have come without you guys. I would like to thank Sharan for giving me this opportunity, and also the Packt team for their constant support. I would like to dedicate this book to Saranya, my wife, for always believing in me and for being so encouraging and supportive of my endeavours; to Shravani, my little bundle of joy, for all the joy and happiness that she has given me; and last but not least, to my parents, Mr Boopalan and Mrs Geetha, without whom I am nothing.

Sharan Kumar Ravindran is a data scientist with over 5 years of experience and is currently working with a leading e-commerce company in India. His primary interest lies in statistics and machine learning, and he has worked with multiple customers across Europe and the U.S. in the e-commerce and IoT domains. He holds an MBA degree with specialization in marketing and business analysis. He conducts workshops, partnering with Anna University, to train their staff, research scholars, and volunteers in analytics. In addition to co-authoring R Data Science Essentials, Sharan has also co-authored Mastering Social Media Mining with R, both published by Packt Publishing. He maintains www.rsharankumar.com, a website with links to his social profiles and data blog.

I would like to thank all my friends, colleagues, and family members, without whom I wouldn't have learned as much as I did. I would also like to thank the readers of my first book, Mastering Social Media Mining, whose feedback helped me a lot. I would like to specially thank my mother, dad, wife, and sister for all the support they provided. I would like to dedicate this book to my grandparents, son, and niece.

About the Reviewers

Jeremy Gray is a data scientist with over 8 years of experience and is based in Toronto.

He completed his PhD in biology at the University of Auckland (the birthplace of R) and worked as a post-doctoral fellow and course instructor at the University of Toronto. His research interests are primarily in using R as an integrated machine learning environment, financial modeling, and consumer analytics, as well as pedagogical methods in scientific computing.

I would like to thank my wonderful fiancée, Mandy Cheema, for her support during the reviewing of this book.

Navin K Manaswi is a data science professional who loves to delve into messy, complex data to bring meaningful insights out of it. Although he has been recognized as one of the top 10 data scientists in India, he still loves to learn every day, as a curious child does. Having completed both his bachelor's and master's degrees at IIT Kanpur, he has been contributing to the world of data analytics, machine learning, big data technologies, and business intelligence. So far, he has worked at the intersection of technologies and business domains of supply chain management, sales and marketing, finance, and healthcare.

I would like to thank my mother, Smt. Geeta, for invaluable guidance.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Preface

According to an article in Harvard Business Review, a data scientist's job is the best job of the 21st century. With the massive explosion in the amount of data generated, and with organizations becoming increasingly data-driven, the requirement for data science professionals is ever increasing.

R Data Science
