R Data Science Essentials
About this ebook
About This Book
- Become a pro at making stunning visualizations and dashboards quickly and without hassle
- Apply the R programming language and useful statistical techniques to make better business decisions
- From seasoned authors comes a book that offers you a plethora of fast-paced techniques to detect and analyze data patterns
Who This Book Is For
If you are an aspiring data scientist or analyst who has a basic understanding of data science and has basic hands-on experience in R or any other analytics tool, then R Data Science Essentials is the book for you.
What You Will Learn
- Perform data preprocessing and basic operations on data
- Implement visual and non-visual data exploration techniques
- Mine patterns from data using affinity and sequential analysis
- Use different clustering algorithms and visualize them
- Implement logistic and linear regression and find out how to evaluate and improve the performance of an algorithm
- Extract patterns through visualization and build a forecasting algorithm
- Build a recommendation engine using different collaborative filtering algorithms
- Create stunning visualizations and dashboards using ggplot and R Shiny
In Detail
With organizations increasingly embedding data science across their enterprise and with management becoming more data-driven, analysts and managers urgently need to understand the key concepts of data science. The data science concepts discussed in this book will help you make key decisions and solve the complex problems you will inevitably face in this new world.
R Data Science Essentials will introduce you to various important concepts in the field of data science using R. We start by reading data from multiple sources, then move on to processing the data, extracting hidden patterns, building predictive and forecasting models, building a recommendation engine, and communicating to the user through stunning visualizations and dashboards.
By the end of this book, you will have an understanding of some very important techniques in data science, be able to implement them using R, understand and interpret the outcomes, and know how they help businesses make decisions.
Style and approach
This easy-to-follow guide contains hands-on examples of the concepts of data science using R.
Reviews for R Data Science Essentials
1 rating, 1 review
- Rating: 2 out of 5 stars. If you want to learn R, this book is not a good choice.
Book preview
R Data Science Essentials - Koushik Raja B.
Table of Contents
R Data Science Essentials
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Getting Started with R
Reading data from different sources
Reading data from a database
Data types in R
Variable data types
Data preprocessing techniques
Performing data operations
Arithmetic operations on the data
String operations on the data
Aggregation operations on the data
Mean
Median
Sum
Maximum and minimum
Standard deviation
Control structures in R
Control structures – if and else
Control structures – for
Control structures – while
Control structures – repeat and break
Control structures – next and return
Bringing data to a usable format
Summary
2. Exploratory Data Analysis
The Titanic dataset
Descriptive statistics
Box plot
Exercise
Inferential statistics
Univariate analysis
Bivariate analysis
Multivariate analysis
Cross-tabulation analysis
Graphical analysis
Summary
3. Pattern Discovery
Transactional datasets
Using the built-in dataset
Building the dataset
Apriori analysis
Support, confidence, and lift
Support
Confidence
Lift
Generating filtering rules
Plotting
Dataset
Rules
Sequential dataset
Apriori sequence analysis
Understanding the results
Reference
Business cases
Summary
4. Segmentation Using Clustering
Datasets
Reading and formatting the dataset in R
Centroid-based clustering and an ideal number of clusters
Implementation using K-means
Visualizing the clusters
Connectivity-based clustering
Visualizing the connectivity
Business use cases
Summary
5. Developing Regression Models
Datasets
Sampling the dataset
Logistic regression
Evaluating logistic regression
Linear regression
Evaluating linear regression
Methods to improve the accuracy
Ensemble models
Replacing NA with mean or median
Removing the highly correlated values
Removing outliers
Summary
6. Time Series Forecasting
Datasets
Extracting patterns
Forecasting using ARIMA
Forecasting using Holt-Winters
Methods to improve accuracy
Summary
7. Recommendation Engine
Dataset and transformation
Recommendations using user-based CF
Recommendations using item-based CF
Challenges and enhancements
Summary
8. Communicating Data Analysis
Dataset
Plotting using the googleVis package
Creating an interactive dashboard using Shiny
Summary
Index
R Data Science Essentials
Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: January 2016
Production reference: 1040116
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78528-654-4
www.packtpub.com
Credits
Authors
Raja B. Koushik
Sharan Kumar Ravindran
Reviewers
Jeremy Gray
Navin K Manaswi
Commissioning Editor
Dipika Gaonkar
Acquisition Editor
Manish Nainan
Content Development Editor
Mehvash Fatima
Technical Editor
Suwarna Patil
Copy Editor
Tasneem Fatehi
Project Coordinator
Shipra Chawhan
Proofreader
Safis Editing
Indexer
Mariammal Chettiyar
Graphics
Disha Haria
Production Coordinator
Arvindkumar Gupta
Cover Work
Arvindkumar Gupta
About the Authors
Raja B. Koushik is a business intelligence professional with over 7 years of experience and is currently working in one of the leading international IT services companies. His primary interest lies in business intelligence technologies, such as ETL, reporting, and dashboarding, along with analytics based on statistics. He has worked with one of the world's largest companies for both their U.S. and UK businesses in the healthcare and leasing domains. He holds an engineering degree with specialization in information technology from Anna University.
I would like to thank my friends, for I don't know how far I would have come without you guys. I would like to thank Sharan for giving me this opportunity, and the Packt team for their constant support. I would like to dedicate this book to Saranya, my wife, for always believing in me and for being so encouraging and supportive of my endeavours; to Shravani, my little bundle of joy, for all the joy and happiness that she has given me; and last but not least, to my parents, Mr Boopalan and Mrs Geetha, without you both I am nothing.
Sharan Kumar Ravindran is a data scientist with over 5 years of experience and is currently working with a leading e-commerce company in India. His primary interest lies in statistics and machine learning, and he has worked with multiple customers across Europe and the U.S. in the e-commerce and IoT domains. He holds an MBA degree with specialization in marketing and business analysis. He conducts workshops, partnering with Anna University, to train their staff, research scholars, and volunteers in analytics. In addition to co-authoring R Data Science Essentials, Sharan has also co-authored Mastering Social Media Mining with R, both published by Packt Publishing. He maintains www.rsharankumar.com, a website with links to his social profiles and data blog.
I would like to thank all my friends, colleagues, and family members, without whom I wouldn't have learned as much as I did. I would also like to thank the readers of my first book, Mastering Social Media Mining, whose feedback helped me a lot. I would like to specially thank my mother, dad, wife, and sister for all the support they provided. I would like to dedicate this book to my grandparents, son, and niece.
About the Reviewers
Jeremy Gray is a data scientist with over 8 years of experience and is based in Toronto.
He completed his PhD in biology at the University of Auckland (the birthplace of R) and worked as a post-doctoral fellow and course instructor at the University of Toronto. His research interests are primarily in using R as an integrated machine learning environment, financial modeling, and consumer analytics, as well as pedagogical methods in scientific computing.
I would like to thank my wonderful fiancée, Mandy Cheema, for her support during the reviewing of this book.
Navin K Manaswi is a data science professional who loves to delve into messy, complex data to bring meaningful insights out of it. Although he has been recognized as one of the top 10 data scientists in India, he still loves to learn every day, as a curious child does. Having completed both his bachelor's and master's degrees at IIT Kanpur, he has been contributing to the world of data analytics, machine learning, big data technologies, and business intelligence. So far, he has worked at the intersection of technology and the business domains of supply chain management, sales and marketing, finance, and healthcare.
I would like to thank my mother, Smt. Geeta, for invaluable guidance.
www.PacktPub.com
Support files, eBooks, discount offers, and more
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Free access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
Preface
According to an article in Harvard Business Review, a data scientist's job is the best job of the 21st century. With the massive explosion in the amount of data generated, and with organizations becoming increasingly data-driven, the requirement for data science professionals is ever increasing.
R Data Science