End-to-End Data Science with SAS: A Hands-On Programming Guide

Ebook621 pages5 hours

End-to-End Data Science with SAS: A Hands-On Programming Guide

Name: End-to-End Data Science with SAS: A Hands-On Programming Guide
Author: James Gearheart
ISBN: 9781642958065

By James Gearheart

Rating: 0 out of 5 stars

()

Read preview

About this ebook

Learn data science concepts with real-world examples in SAS!

End-to-End Data Science with SAS: A Hands-On Programming Guide provides clear and practical explanations of the data science environment, machine learning techniques, and the SAS programming knowledge necessary to develop machine learning models in any industry. The book covers concepts including understanding the business need, creating a modeling data set, linear regression, parametric classification models, and non-parametric classification models. Real-world business examples and example code are used to demonstrate each process step-by-step.

Although a significant amount of background information and supporting mathematics are presented, the book is not structured as a textbook, but rather it is a user’s guide for the application of data science and machine learning in a business environment. Readers will learn how to think like a data scientist, wrangle messy data, choose a model, and evaluate the model’s effectiveness. New data scientists or professionals who want more experience with SAS will find this book to be an invaluable reference. Take your data science career to the next level by mastering SAS programming for machine learning models.

Skip carousel

LanguageEnglish

PublisherSAS Institute

Release dateJun 26, 2020

ISBN9781642958065

Author

James Gearheart

James Gearheart is an experienced Senior Data Scientist and Machine Learning Engineer who has developed transformative machine learning and artificial intelligence models. He has over 20 years of experience in developing and analyzing statistical models and machine learning products that optimize business performance across several industries including digital marketing, financial services, public health care, environmental services, social security and worker’s compensation. James has been both a data science manager and an individual contributor on projects ranging from logistics, anti-money laundering, CRM, cash demand forecasting, customer segmentation, digital marketing, A/B testing, attribution and attrition. His research interests include applied data science, machine learning, and artificial intelligence.

Related authors

Skip carousel

Related to End-to-End Data Science with SAS

Related ebooks

Skip carousel

Predictive Modeling with SAS Enterprise Miner: Practical Solutions for Business Applications, Third Edition
Ebook
Predictive Modeling with SAS Enterprise Miner: Practical Solutions for Business Applications, Third Edition
byKattamuri S. Sarma
Rating: 0 out of 5 stars
0 ratings
SAS Viya: The R Perspective
Ebook
SAS Viya: The R Perspective
byYue Qi
Rating: 0 out of 5 stars
0 ratings
PROC REPORT by Example: Techniques for Building Professional Reports Using SAS: Techniques for Building Professional Reports Using SAS
Ebook
PROC REPORT by Example: Techniques for Building Professional Reports Using SAS: Techniques for Building Professional Reports Using SAS
byLisa Fine
Rating: 0 out of 5 stars
0 ratings
PROC SQL: Beyond the Basics Using SAS, Third Edition
Ebook
PROC SQL: Beyond the Basics Using SAS, Third Edition
byKirk Paul Lafler
Rating: 0 out of 5 stars
0 ratings
Exercises and Projects for The Little SAS Book, Sixth Edition
Ebook
Exercises and Projects for The Little SAS Book, Sixth Edition
byRebecca A. Ottesen
Rating: 0 out of 5 stars
0 ratings
Mastering the SAS DS2 Procedure: Advanced Data-Wrangling Techniques, Second Edition
Ebook
Mastering the SAS DS2 Procedure: Advanced Data-Wrangling Techniques, Second Edition
byMark Jordan
Rating: 0 out of 5 stars
0 ratings
SAS Macro Programming Made Easy, Third Edition
Ebook
SAS Macro Programming Made Easy, Third Edition
byMichele M. Burlew
Rating: 3 out of 5 stars
3/5
The Little SAS Book: A Primer, Sixth Edition
Ebook
The Little SAS Book: A Primer, Sixth Edition
byLora D. Delwiche
Rating: 5 out of 5 stars
5/5
Biostatistics by Example Using SAS Studio
Ebook
Biostatistics by Example Using SAS Studio
byRon Cody
Rating: 0 out of 5 stars
0 ratings
The SAS Programmer's PROC REPORT Handbook: Basic to Advanced Reporting Techniques
Ebook
The SAS Programmer's PROC REPORT Handbook: Basic to Advanced Reporting Techniques
byJane Eslinger
Rating: 0 out of 5 stars
0 ratings
Machine Learning with SAS Viya
Ebook
Machine Learning with SAS Viya
bySAS Institute Inc.
Rating: 0 out of 5 stars
0 ratings
Practical and Efficient SAS Programming: The Insider's Guide
Ebook
Practical and Efficient SAS Programming: The Insider's Guide
byMartha Messineo
Rating: 0 out of 5 stars
0 ratings
The SAS Programmer's PROC REPORT Handbook: ODS Companion
Ebook
The SAS Programmer's PROC REPORT Handbook: ODS Companion
byJane Eslinger
Rating: 0 out of 5 stars
0 ratings
SAS Certified Professional Prep Guide: Advanced Programming Using SAS 9.4
Ebook
SAS Certified Professional Prep Guide: Advanced Programming Using SAS 9.4
bySAS Institute
Rating: 1 out of 5 stars
1/5
Getting Started with SAS Programming: Using SAS Studio in the Cloud
Ebook
Getting Started with SAS Programming: Using SAS Studio in the Cloud
byRon Cody
Rating: 0 out of 5 stars
0 ratings
Implementing CDISC Using SAS: An End-to-End Guide, Revised Second Edition
Ebook
Implementing CDISC Using SAS: An End-to-End Guide, Revised Second Edition
byChris Holland
Rating: 0 out of 5 stars
0 ratings
Learning SAS by Example: A Programmer's Guide, Second Edition
Ebook
Learning SAS by Example: A Programmer's Guide, Second Edition
byRon Cody
Rating: 3 out of 5 stars
3/5
Fundamentals of Programming in SAS: A Case Studies Approach
Ebook
Fundamentals of Programming in SAS: A Case Studies Approach
byJames Blum
Rating: 0 out of 5 stars
0 ratings
Advanced SQL with SAS
Ebook
Advanced SQL with SAS
byChristian FG Schendera
Rating: 0 out of 5 stars
0 ratings
Categorical Data Analysis Using SAS, Third Edition
Ebook
Categorical Data Analysis Using SAS, Third Edition
byMaura E. Stokes
Rating: 0 out of 5 stars
0 ratings
SAS Visual Analytics for SAS Viya
Ebook
SAS Visual Analytics for SAS Viya
bySAS Institute Inc.
Rating: 0 out of 5 stars
0 ratings
SAS Statistics by Example
Ebook
SAS Statistics by Example
byRon Cody
Rating: 5 out of 5 stars
5/5
An Introduction to SAS Visual Analytics: How to Explore Numbers, Design Reports, and Gain Insight into Your Data
Ebook
An Introduction to SAS Visual Analytics: How to Explore Numbers, Design Reports, and Gain Insight into Your Data
byTricia Aanderud
Rating: 5 out of 5 stars
5/5
Applied Data Mining for Forecasting Using SAS
Ebook
Applied Data Mining for Forecasting Using SAS
byTim Rey
Rating: 0 out of 5 stars
0 ratings
Base SAS Interview Questions You'll Most Likely Be Asked
Ebook
Base SAS Interview Questions You'll Most Likely Be Asked
byVibrant Publishers
Rating: 0 out of 5 stars
0 ratings
SAS Certified Specialist Prep Guide: Base Programming Using SAS 9.4
Ebook
SAS Certified Specialist Prep Guide: Base Programming Using SAS 9.4
bySAS Institute
Rating: 4 out of 5 stars
4/5
PROC DOCUMENT by Example Using SAS
Ebook
PROC DOCUMENT by Example Using SAS
byMichael Tuchman
Rating: 0 out of 5 stars
0 ratings
Applied Econometrics with SAS: Modeling Demand, Supply, and Risk
Ebook
Applied Econometrics with SAS: Modeling Demand, Supply, and Risk
byBarry K. Goodwin
Rating: 5 out of 5 stars
5/5
SAS Viya: The Python Perspective
Ebook
SAS Viya: The Python Perspective
byKevin D. Smith
Rating: 0 out of 5 stars
0 ratings
Text Mining and Analysis: Practical Methods, Examples, and Case Studies Using SAS
Ebook
Text Mining and Analysis: Practical Methods, Examples, and Case Studies Using SAS
byDr. Goutam Chakraborty
Rating: 0 out of 5 stars
0 ratings

Data Modeling & Design For You

Skip carousel

DAX Patterns: Second Edition
Ebook
DAX Patterns: Second Edition
byMarco Russo
Rating: 5 out of 5 stars
5/5
A Concise Guide to Object Orientated Programming
Ebook
A Concise Guide to Object Orientated Programming
byalasdair gilchrist
Rating: 0 out of 5 stars
0 ratings
Microsoft Access: Database Creation and Management through Microsoft Access
Ebook
Microsoft Access: Database Creation and Management through Microsoft Access
bySteven Bright
Rating: 0 out of 5 stars
0 ratings
Microsoft 365 Excel: The Only App That Matters: Calculations, Analytics, Modeling, Data Analysis and Dashboard Reporting for the New Era of Dynamic Data Driven Decision Making & Insight
Ebook
Microsoft 365 Excel: The Only App That Matters: Calculations, Analytics, Modeling, Data Analysis and Dashboard Reporting for the New Era of Dynamic Data Driven Decision Making & Insight
byMike Girvin
Rating: 3 out of 5 stars
3/5
Data Analytics for Beginners: Introduction to Data Analytics
Ebook
Data Analytics for Beginners: Introduction to Data Analytics
byAnthony S. Williams
Rating: 4 out of 5 stars
4/5
Hacks To Crush Plc Program Fast & Efficiently Everytime... : Coding, Simulating & Testing Programmable Logic Controller With Examples
Ebook
Hacks To Crush Plc Program Fast & Efficiently Everytime... : Coding, Simulating & Testing Programmable Logic Controller With Examples
byMichael Blake
Rating: 5 out of 5 stars
5/5
Ultimate Enterprise Data Analysis and Forecasting using Python
Ebook
Ultimate Enterprise Data Analysis and Forecasting using Python
byShanthababu Pandian
Rating: 0 out of 5 stars
0 ratings
Supercharge Power BI: Power BI is Better When You Learn To Write DAX
Ebook
Supercharge Power BI: Power BI is Better When You Learn To Write DAX
byMatt Allington
Rating: 5 out of 5 stars
5/5
Power Pivot and Power BI: The Excel User's Guide to DAX, Power Query, Power BI & Power Pivot in Excel 2010-2016
Ebook
Power Pivot and Power BI: The Excel User's Guide to DAX, Power Query, Power BI & Power Pivot in Excel 2010-2016
byRob Collie
Rating: 4 out of 5 stars
4/5
Bayesian Analysis with Python
Ebook
Bayesian Analysis with Python
byOsvaldo Martin
Rating: 5 out of 5 stars
5/5
Mastering Agile User Stories
Ebook
Mastering Agile User Stories
byDeEtta Balthazar
Rating: 4 out of 5 stars
4/5
Raspberry Pi :Raspberry Pi Guide On Python & Projects Programming In Easy Steps
Ebook
Raspberry Pi :Raspberry Pi Guide On Python & Projects Programming In Easy Steps
byJason Scotts
Rating: 3 out of 5 stars
3/5
Brainstorming and Beyond: A User-Centered Design Method
Ebook
Brainstorming and Beyond: A User-Centered Design Method
byChauncey Wilson
Rating: 0 out of 5 stars
0 ratings
R in Action, Third Edition: Data analysis and graphics with R and Tidyverse
Ebook
R in Action, Third Edition: Data analysis and graphics with R and Tidyverse
byRobert I. Kabacoff
Rating: 0 out of 5 stars
0 ratings
Graph Databases in Action: Examples in Gremlin
Ebook
Graph Databases in Action: Examples in Gremlin
byJosh Perryman
Rating: 0 out of 5 stars
0 ratings
Machine Learning: A Comprehensive, Step-by-Step Guide to Learning and Understanding Machine Learning Concepts, Technology and Principles for Beginners: 1
Ebook
Machine Learning: A Comprehensive, Step-by-Step Guide to Learning and Understanding Machine Learning Concepts, Technology and Principles for Beginners: 1
byPeter Bradley
Rating: 0 out of 5 stars
0 ratings
Data Visualization: a successful design process
Ebook
Data Visualization: a successful design process
byAndy Kirk
Rating: 4 out of 5 stars
4/5
WordPress For Beginners - How To Set Up A Self Hosted WordPress Blog
Ebook
WordPress For Beginners - How To Set Up A Self Hosted WordPress Blog
byCyrus Jackson
Rating: 0 out of 5 stars
0 ratings
Python Data Analysis
Ebook
Python Data Analysis
byIvan Idris
Rating: 4 out of 5 stars
4/5
The Secrets of ChatGPT Prompt Engineering for Non-Developers
Ebook
The Secrets of ChatGPT Prompt Engineering for Non-Developers
byCea West
Rating: 5 out of 5 stars
5/5
Deep Learning: An Essential Guide to Deep Learning for Beginners Who Want to Understand How Deep Neural Networks Work and Relate to Machine Learning and Artificial Intelligence
Ebook
Deep Learning: An Essential Guide to Deep Learning for Beginners Who Want to Understand How Deep Neural Networks Work and Relate to Machine Learning and Artificial Intelligence
byHerbert Jones
Rating: 5 out of 5 stars
5/5
Programmable Logic Controllers
Ebook
Programmable Logic Controllers
byWilliam Bolton
Rating: 4 out of 5 stars
4/5
Reinforcement Learning Algorithms with Python: Learn, understand, and develop smart algorithms for addressing AI challenges
Ebook
Reinforcement Learning Algorithms with Python: Learn, understand, and develop smart algorithms for addressing AI challenges
byAndrea Lonza
Rating: 0 out of 5 stars
0 ratings
Spreadsheets To Cubes (Advanced Data Analytics for Small Medium Business): Data Science
Ebook
Spreadsheets To Cubes (Advanced Data Analytics for Small Medium Business): Data Science
byalasdair gilchrist
Rating: 0 out of 5 stars
0 ratings
The Systems Thinker - Mental Models: The Systems Thinker Series, #3
Ebook
The Systems Thinker - Mental Models: The Systems Thinker Series, #3
byAlbert Rutherford
Rating: 0 out of 5 stars
0 ratings
Data Analytics with Python: Data Analytics in Python Using Pandas
Ebook
Data Analytics with Python: Data Analytics in Python Using Pandas
byFrank Millstein
Rating: 3 out of 5 stars
3/5
Oracle Quick Guides: Part 2 - Oracle Database Design
Ebook
Oracle Quick Guides: Part 2 - Oracle Database Design
byMalcolm Coxall
Rating: 0 out of 5 stars
0 ratings
Tableau Desktop Certified Associate: Exam Guide: Develop your Tableau skills and prepare for Tableau certification with tips from industry experts
Ebook
Tableau Desktop Certified Associate: Exam Guide: Develop your Tableau skills and prepare for Tableau certification with tips from industry experts
byDmitry Anoshin
Rating: 0 out of 5 stars
0 ratings
Logic Design: A Review Of Theory And Practice
Ebook
Logic Design: A Review Of Theory And Practice
byGlen G. Jr. Langdon
Rating: 0 out of 5 stars
0 ratings
Thinking in Algorithms: Strategic Thinking Skills, #2
Ebook
Thinking in Algorithms: Strategic Thinking Skills, #2
byAlbert Rutherford
Rating: 5 out of 5 stars
5/5

Related podcast episodes

Skip carousel

78: Mindset of a Rockstar Data Analyst w/ Trevor Tapscott: Our focus for this inspiring episode of AOF is mindset, especially if you want to be a standout data analyst! I have brought one of my first ever followers and day ones! Trevor Tapscott is a VP and Analytics Consultant at Wells Fargo and has been in...
Podcast episode
78: Mindset of a Rockstar Data Analyst w/ Trevor Tapscott: Our focus for this inspiring episode of AOF is mindset, especially if you want to be a standout data analyst! I have brought one of my first ever followers and day ones! Trevor Tapscott is a VP and Analytics Consultant at Wells Fargo and has been in...
byAnalytics on Fire
0 ratings
0% found this document useful
Building An Internal Database As A Service Platform At Cloudflare: Data persistence is one of the most challenging aspects of computer systems. In the era of the cloud most developers rely on hosted services to manage their databases, but what if you are a cloud service? In this episode Vignesh Ravichandran explains how his team at Cloudflare provides PostgreSQL as a service to their developers for low latency and high uptime services at global scale. This is an interesting and insightful look at pragmatic engineering for reliability and scale.
Podcast episode
Building An Internal Database As A Service Platform At Cloudflare: Data persistence is one of the most challenging aspects of computer systems. In the era of the cloud most developers rely on hosted services to manage their databases, but what if you are a cloud service? In this episode Vignesh Ravichandran explains how his team at Cloudflare provides PostgreSQL as a service to their developers for low latency and high uptime services at global scale. This is an interesting and insightful look at pragmatic engineering for reliability and scale.
byData Engineering Podcast
0 ratings
0% found this document useful
Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary: Working with data is a complicated process, with numerous chances for something to go wrong. Identifying and accounting for those errors is a critical piece of building trust in the organization that your data is accurate and up to date. While there are numerous products available to provide that visibility, they all have different technologies and workflows that they focus on. To bring observability to dbt projects the team at Elementary embedded themselves into the workflow. In this episode Maayan Salom explores the approach that she has taken to bring observability, enhanced testing capabilities, and anomaly detection into every step of the dbt developer experience.
Podcast episode
Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary: Working with data is a complicated process, with numerous chances for something to go wrong. Identifying and accounting for those errors is a critical piece of building trust in the organization that your data is accurate and up to date. While there are numerous products available to provide that visibility, they all have different technologies and workflows that they focus on. To bring observability to dbt projects the team at Elementary embedded themselves into the workflow. In this episode Maayan Salom explores the approach that she has taken to bring observability, enhanced testing capabilities, and anomaly detection into every step of the dbt developer experience.
byData Engineering Podcast
0 ratings
0% found this document useful
Adding An Easy Mode For The Modern Data Stack With 5X: The "modern data stack" promised a scalable, composable data platform that gave everyone the flexibility to use the best tools for every job. The reality was that it left data teams in the position of spending all of their engineering effort on integrating systems that weren't designed with compatible user experiences. The team at 5X understand the pain involved and the barriers to productivity and set out to solve it by pre-integrating the best tools from each layer of the stack. In this episode founder Tarush Aggarwal explains how the realities of the modern data stack are impacting data teams and the work that they are doing to accelerate time to value.
Podcast episode
Adding An Easy Mode For The Modern Data Stack With 5X: The "modern data stack" promised a scalable, composable data platform that gave everyone the flexibility to use the best tools for every job. The reality was that it left data teams in the position of spending all of their engineering effort on integrating systems that weren't designed with compatible user experiences. The team at 5X understand the pain involved and the barriers to productivity and set out to solve it by pre-integrating the best tools from each layer of the stack. In this episode founder Tarush Aggarwal explains how the realities of the modern data stack are impacting data teams and the work that they are doing to accelerate time to value.
byData Engineering Podcast
0 ratings
0% found this document useful
Building Applications With Data As Code On The DataOS: The modern data stack has made it more economical to use enterprise grade technologies to power analytics at organizations of every scale. Unfortunately it has also introduced new overhead to manage the full experience as a single workflow. At the Modern Data Company they created the DataOS platform as a means of driving your full analytics lifecycle through code, while providing automatic knowledge graphs and data discovery. In this episode Srujan Akula explains how the system is implemented and how you can start using it today with your existing data systems.
Podcast episode
Building Applications With Data As Code On The DataOS: The modern data stack has made it more economical to use enterprise grade technologies to power analytics at organizations of every scale. Unfortunately it has also introduced new overhead to manage the full experience as a single workflow. At the Modern Data Company they created the DataOS platform as a means of driving your full analytics lifecycle through code, while providing automatic knowledge graphs and data discovery. In this episode Srujan Akula explains how the system is implemented and how you can start using it today with your existing data systems.
byData Engineering Podcast
0 ratings
0% found this document useful
Shining Some Light In The Black Box Of PostgreSQL Performance: Databases are the core of most applications, but they are often treated as inscrutable black boxes. When an application is slow, there is a good probability that the database needs some attention. In this episode Lukas Fittl shares some hard-won wisdom about the causes and solution of many performance bottlenecks and the work that he is doing to shine some light on PostgreSQL to make it easier to understand how to keep it running smoothly.
Podcast episode
Shining Some Light In The Black Box Of PostgreSQL Performance: Databases are the core of most applications, but they are often treated as inscrutable black boxes. When an application is slow, there is a good probability that the database needs some attention. In this episode Lukas Fittl shares some hard-won wisdom about the causes and solution of many performance bottlenecks and the work that he is doing to shine some light on PostgreSQL to make it easier to understand how to keep it running smoothly.
byData Engineering Podcast
0 ratings
0% found this document useful
Automate Your Pipeline Creation For Streaming Data Transformations With SQLake: Managing end-to-end data flows becomes complex and unwieldy as the scale of data and its variety of applications in an organization grows. Part of this complexity is due to the transformation and orchestration of data living in disparate systems. The team at Upsolver is taking aim at this problem with the latest iteration of their platform in the form of SQLake. In this episode Ori Rafael explains how they are automating the creation and scheduling of orchestration flows and their related transforations in a unified SQL interface.
Podcast episode
Automate Your Pipeline Creation For Streaming Data Transformations With SQLake: Managing end-to-end data flows becomes complex and unwieldy as the scale of data and its variety of applications in an organization grows. Part of this complexity is due to the transformation and orchestration of data living in disparate systems. The team at Upsolver is taking aim at this problem with the latest iteration of their platform in the form of SQLake. In this episode Ori Rafael explains how they are automating the creation and scheduling of orchestration flows and their related transforations in a unified SQL interface.
byData Engineering Podcast
0 ratings
0% found this document useful
Eliminate The Overhead In Your Data Integration With The Open Source dlt Library: Cloud data warehouses and the introduction of the ELT paradigm has led to the creation of multiple options for flexible data integration, with a roughly equal distribution of commercial and open source options. The challenge is that most of those options are complex to operate and exist in their own silo. The dlt project was created to eliminate overhead and bring data integration into your full control as a library component of your overall data system. In this episode Adrian Brudaru explains how it works, the benefits that it provides over other data integration solutions, and how you can start building pipelines today.
Podcast episode
Eliminate The Overhead In Your Data Integration With The Open Source dlt Library: Cloud data warehouses and the introduction of the ELT paradigm has led to the creation of multiple options for flexible data integration, with a roughly equal distribution of commercial and open source options. The challenge is that most of those options are complex to operate and exist in their own silo. The dlt project was created to eliminate overhead and bring data integration into your full control as a library component of your overall data system. In this episode Adrian Brudaru explains how it works, the benefits that it provides over other data integration solutions, and how you can start building pipelines today.
byData Engineering Podcast
0 ratings
0% found this document useful
An Overview Of The Sate Of Data Orchestration In An Increasingly Complex Data Ecosystem: Data systems are inherently complex and often require integration of multiple technologies. Orchestrators are centralized utilities that control the execution and sequencing of interdependent operations. This offers a single location for managing visibility and error handling so that data platform engineers can manage complexity. In this episode Nick Schrock, creator of Dagster, shares his perspective on the state of data orchestration technology and its application to help inform its implementation in your environment.
Podcast episode
An Overview Of The Sate Of Data Orchestration In An Increasingly Complex Data Ecosystem: Data systems are inherently complex and often require integration of multiple technologies. Orchestrators are centralized utilities that control the execution and sequencing of interdependent operations. This offers a single location for managing visibility and error handling so that data platform engineers can manage complexity. In this episode Nick Schrock, creator of Dagster, shares his perspective on the state of data orchestration technology and its application to help inform its implementation in your environment.
byData Engineering Podcast
0 ratings
0% found this document useful
Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine: Software development involves an interesting balance of creativity and repetition of patterns. Generative AI has accelerated the ability of developer tools to provide useful suggestions that speed up the work of engineers. Tabnine is one of the main platforms offering an AI powered assistant for software engineers. In this episode Eran Yahav shares the journey that he has taken in building this product and the ways that it enhances the ability of humans to get their work done, and when the humans have to adapt to the tool.
Podcast episode
Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine: Software development involves an interesting balance of creativity and repetition of patterns. Generative AI has accelerated the ability of developer tools to provide useful suggestions that speed up the work of engineers. Tabnine is one of the main platforms offering an AI powered assistant for software engineers. In this episode Eran Yahav shares the journey that he has taken in building this product and the ways that it enhances the ability of humans to get their work done, and when the humans have to adapt to the tool.
byData Engineering Podcast
0 ratings
0% found this document useful
Using Data To Illuminate The Intentionally Opaque Insurance Industry: The insurance industry is notoriously opaque and hard to navigate. Max Cho found that fact frustrating enough that he decided to build a business of making policy selection more navigable. In this episode he shares his journey of data collection and analysis and the challenges of automating an intentionally manual industry.
Podcast episode
Using Data To Illuminate The Intentionally Opaque Insurance Industry: The insurance industry is notoriously opaque and hard to navigate. Max Cho found that fact frustrating enough that he decided to build a business of making policy selection more navigable. In this episode he shares his journey of data collection and analysis and the challenges of automating an intentionally manual industry.
byData Engineering Podcast
0 ratings
0% found this document useful
Defining A Strategy For Your Data Products: The primary application of data has moved beyond analytics. With the broader audience comes the need to present data in a more approachable format. This has led to the broad adoption of data products being the delivery mechanism for information. In this episode Ranjith Raghunath shares his thoughts on how to build a strategy for the development, delivery, and evolution of data products.
Podcast episode
Defining A Strategy For Your Data Products: The primary application of data has moved beyond analytics. With the broader audience comes the need to present data in a more approachable format. This has led to the broad adoption of data products being the delivery mechanism for information. In this episode Ranjith Raghunath shares his thoughts on how to build a strategy for the development, delivery, and evolution of data products.
byData Engineering Podcast
0 ratings
0% found this document useful
Addressing The Challenges Of Component Integration In Data Platform Architectures: Building a data platform that is enjoyable and accessible for all of its end users is a substantial challenge. One of the core complexities that needs to be addressed is the fractal set of integrations that need to be managed across the individual components. In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform to enable a larger audience to start accessing the data being managed by his team.
$Addressing The Challenges Of Component Integration In Data Platform Architectures: Building a data platform that is enjoyable and accessible for all of its end users is a substantial challenge. One of the core complexities that needs to be addressed is the fractal set of integrations that need to be managed across the individual components. In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform to enable a larger audience to start accessing the data being managed by his team.$
$Addressing The Challenges Of Component Integration In Data Platform Architectures: Building a data platform that is enjoyable and accessible for all of its end users is a substantial challenge. One of the core complexities that needs to be addressed is the fractal set of integrations that need to be managed across the individual components. In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform to enable a larger audience to start accessing the data being managed by his team.$
Podcast episode
Addressing The Challenges Of Component Integration In Data Platform Architectures: Building a data platform that is enjoyable and accessible for all of its end users is a substantial challenge. One of the core complexities that needs to be addressed is the fractal set of integrations that need to be managed across the individual components. In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform to enable a larger audience to start accessing the data being managed by his team.
byData Engineering Podcast
0 ratings
0% found this document useful
Simple And Scalable Encryption Of Data In Use For Analytics And Machine Learning With Opaque Systems: Encryption and security are critical elements in data analytics and machine learning applications. We have well developed protocols and practices around data that is at rest and in motion, but security around data in use is still severely lacking. Recognizing this shortcoming and the capabilities that could be unlocked by a robust solution Rishabh Poddar helped to create Opaque Systems as an outgrowth of his PhD studies. In this episode he shares the work that he and his team have done to simplify integration of secure enclaves and trusted computing environments into analytical workflows and how you can start using it without re-engineering your existing systems.
Podcast episode
Simple And Scalable Encryption Of Data In Use For Analytics And Machine Learning With Opaque Systems: Encryption and security are critical elements in data analytics and machine learning applications. We have well developed protocols and practices around data that is at rest and in motion, but security around data in use is still severely lacking. Recognizing this shortcoming and the capabilities that could be unlocked by a robust solution Rishabh Poddar helped to create Opaque Systems as an outgrowth of his PhD studies. In this episode he shares the work that he and his team have done to simplify integration of secure enclaves and trusted computing environments into analytical workflows and how you can start using it without re-engineering your existing systems.
byData Engineering Podcast
0 ratings
0% found this document useful
Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable: Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems for engineering teams. In this episode Eric Sammer discusses why more companies are including real-time capabilities in their products and the ways that Decodable makes it faster and easier.
Podcast episode
Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable: Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems for engineering teams. In this episode Eric Sammer discusses why more companies are including real-time capabilities in their products and the ways that Decodable makes it faster and easier.
byData Engineering Podcast
0 ratings
0% found this document useful
Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer: Maintaining a single source of truth for your data is the biggest challenge in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization. In order to enable this use case, while maintaining a single point of access, the semantic layer has evolved as a technological solution to the problem. In this episode Artyom Keydunov, creator of Cube, discusses the evolution and applications of the semantic layer as a component of your data platform, and how Cube provides speed and cost optimization for your data consumers.
Podcast episode
Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer: Maintaining a single source of truth for your data is the biggest challenge in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization. In order to enable this use case, while maintaining a single point of access, the semantic layer has evolved as a technological solution to the problem. In this episode Artyom Keydunov, creator of Cube, discusses the evolution and applications of the semantic layer as a component of your data platform, and how Cube provides speed and cost optimization for your data consumers.
byData Engineering Podcast
0 ratings
0% found this document useful
Data Sharing Across Business And Platform Boundaries: Sharing data is a simple concept, but complicated to implement well. There are numerous business rules and regulatory concerns that need to be applied. There are also numerous technical considerations to be made, particularly if the producer and consumer of the data aren't using the same platforms. In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process.
Podcast episode
Data Sharing Across Business And Platform Boundaries: Sharing data is a simple concept, but complicated to implement well. There are numerous business rules and regulatory concerns that need to be applied. There are also numerous technical considerations to be made, particularly if the producer and consumer of the data aren't using the same platforms. In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process.
byData Engineering Podcast
0 ratings
0% found this document useful
Surveying The Market Of Database Products: Databases are the core of most applications, whether transactional or analytical. In recent years the selection of database products has exploded, making the critical decision of which engine(s) to use even more difficult. In this episode Tanya Bragin shares her experiences as a product manager for two major vendors and the lessons that she has learned about how teams should approach the process of tool selection.
Podcast episode
Surveying The Market Of Database Products: Databases are the core of most applications, whether transactional or analytical. In recent years the selection of database products has exploded, making the critical decision of which engine(s) to use even more difficult. In this episode Tanya Bragin shares her experiences as a product manager for two major vendors and the lessons that she has learned about how teams should approach the process of tool selection.
byData Engineering Podcast
0 ratings
0% found this document useful
Troubleshooting Kafka In Production: Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Operating it at scale, however, is notoriously challenging. Elad Eldor has experienced these challenges first-hand, leading to his work writing the book "Kafka: Troubleshooting in Production". In this episode he highlights the sources of complexity that contribute to Kafka's operational difficulties, and some of the main ways to identify and mitigate potential sources of trouble.
Podcast episode
Troubleshooting Kafka In Production: Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Operating it at scale, however, is notoriously challenging. Elad Eldor has experienced these challenges first-hand, leading to his work writing the book "Kafka: Troubleshooting in Production". In this episode he highlights the sources of complexity that contribute to Kafka's operational difficulties, and some of the main ways to identify and mitigate potential sources of trouble.
byData Engineering Podcast
0 ratings
0% found this document useful
Using Product Driven Development To Improve The Productivity And Effectiveness Of Your Data Teams: With all of the messaging about treating data as a product it is becoming difficult to know what that even means. Vishal Singh is the head of products at Starburst which means that he has to spend all of his time thinking and talking about the details of product thinking and its application to data. In this episode he shares his thoughts on the strategic and tactical elements of moving your work as a data professional from being task-oriented to being product-oriented and the long term improvements in your productivity that it provides.
Podcast episode
Using Product Driven Development To Improve The Productivity And Effectiveness Of Your Data Teams: With all of the messaging about treating data as a product it is becoming difficult to know what that even means. Vishal Singh is the head of products at Starburst which means that he has to spend all of his time thinking and talking about the details of product thinking and its application to data. In this episode he shares his thoughts on the strategic and tactical elements of moving your work as a data professional from being task-oriented to being product-oriented and the long term improvements in your productivity that it provides.
byData Engineering Podcast
0 ratings
0% found this document useful
Designing A Non-Relational Database Engine: Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. In this episode Oren Eini, CEO and creator of RavenDB, explores the nuances of relational vs. non-relational engines, and the strategies for designing a non-relational database.
Podcast episode
Designing A Non-Relational Database Engine: Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. In this episode Oren Eini, CEO and creator of RavenDB, explores the nuances of relational vs. non-relational engines, and the strategies for designing a non-relational database.
byData Engineering Podcast
0 ratings
0% found this document useful
Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack: If your business metrics looked weird tomorrow, would you know about it first? Anomaly detection is focused on identifying those outliers for you, so that you are the first to know when a business critical dashboard isn't right. Unfortunately, it can often be complex or expensive to incorporate anomaly detection into your data platform. Andrew Maguire got tired of solving that problem for each of the different roles he has ended up in, so he created the open source Anomstack project. In this episode he shares what it is, how it works, and how you can start using it today to get notified when the critical metrics in your business aren't quite right.
Podcast episode
Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack: If your business metrics looked weird tomorrow, would you know about it first? Anomaly detection is focused on identifying those outliers for you, so that you are the first to know when a business critical dashboard isn't right. Unfortunately, it can often be complex or expensive to incorporate anomaly detection into your data platform. Andrew Maguire got tired of solving that problem for each of the different roles he has ended up in, so he created the open source Anomstack project. In this episode he shares what it is, how it works, and how you can start using it today to get notified when the critical metrics in your business aren't quite right.
byData Engineering Podcast
0 ratings
0% found this document useful
BEST-OF-BRAD: Using Top Tier Solutions to Build Hybrid Cloud Ecosystems with Brad Feakes: Today on What the Duck?!, we’re ducking around with Brad Feakes, an expert in operations, supply chain management, and information technology. Brad sits down with Host, Sarah Scudder, to discuss the use of best-of-breed solutions to build hybrid Cloud ecosystems that support ERP customer needs. Brad shares his personal and professional journey, including his education, career choices, and his experience working with ERP systems in manufacturing companies. They also touch upon Brad's role as a business analyst and his involvement in implementing Epicor as company-wide ERP system and his current role at EstesGroup.
Podcast episode
BEST-OF-BRAD: Using Top Tier Solutions to Build Hybrid Cloud Ecosystems with Brad Feakes: Today on What the Duck?!, we’re ducking around with Brad Feakes, an expert in operations, supply chain management, and information technology. Brad sits down with Host, Sarah Scudder, to discuss the use of best-of-breed solutions to build hybrid Cloud ecosystems that support ERP customer needs. Brad shares his personal and professional journey, including his education, career choices, and his experience working with ERP systems in manufacturing companies. They also touch upon Brad's role as a business analyst and his involvement in implementing Epicor as company-wide ERP system and his current role at EstesGroup.
byWhat the Duck - Another Supply Chain Podcast
0 ratings
0% found this document useful
Designing Data Platforms For Fintech Companies: Working with financial data requires a high degree of rigor due to the numerous regulations and the risks involved in security breaches. In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector.
Podcast episode
Designing Data Platforms For Fintech Companies: Working with financial data requires a high degree of rigor due to the numerous regulations and the risks involved in security breaches. In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector.
byData Engineering Podcast
0 ratings
0% found this document useful
#08 - Tech stack: Metabase, Superset, Redash, Grafana
Podcast episode
#08 - Tech stack: Metabase, Superset, Redash, Grafana
byTOPP - The Open Podcast Podcast
0 ratings
0% found this document useful
Find Out About The Technology Behind The Latest PFAD In Analytical Database Development: Building a database engine requires a substantial amount of engineering effort and time investment. Over the decades of research and development into building these software systems there are a number of common components that are shared across implementations. When Paul Dix decided to re-write the InfluxDB engine he found the Apache Arrow ecosystem ready and waiting with useful building blocks to accelerate the process. In this episode he explains how he used the combination of Apache Arrow, Flight, Datafusion, and Parquet to lay the foundation of the newest version of his time-series database.
Podcast episode
Find Out About The Technology Behind The Latest PFAD In Analytical Database Development: Building a database engine requires a substantial amount of engineering effort and time investment. Over the decades of research and development into building these software systems there are a number of common components that are shared across implementations. When Paul Dix decided to re-write the InfluxDB engine he found the Apache Arrow ecosystem ready and waiting with useful building blocks to accelerate the process. In this episode he explains how he used the combination of Apache Arrow, Flight, Datafusion, and Parquet to lay the foundation of the newest version of his time-series database.
byData Engineering Podcast
0 ratings
0% found this document useful
Building Linked Data Products With JSON-LD: A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Linked data technologies provide a means of tightly coupling metadata with raw information. In this episode Brian Platz explains how JSON-LD can be used as a shared representation of linked data for building semantic data products.
Podcast episode
Building Linked Data Products With JSON-LD: A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Linked data technologies provide a means of tightly coupling metadata with raw information. In this episode Brian Platz explains how JSON-LD can be used as a shared representation of linked data for building semantic data products.
byData Engineering Podcast
0 ratings
0% found this document useful
Reconciling The Data In Your Databases With Datafold: A significant portion of data workflows involve storing and processing information in database engines. Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data.
Podcast episode
Reconciling The Data In Your Databases With Datafold: A significant portion of data workflows involve storing and processing information in database engines. Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data.
byData Engineering Podcast
0 ratings
0% found this document useful
Safely Test Your Applications And Analytics With Production Quality Data Using Tonic AI: The most interesting and challenging bugs always happen in production, but recreating them is a constant challenge due to differences in the data that you are working with. Building your own scripts to replicate data from production is time consuming and error-prone. Tonic is a platform designed to solve the problem of having reliable, production-like data available for developing and testing your software, analytics, and machine learning projects. In this episode Adam Kamor explores the factors that make this such a complex problem to solve, the approach that he and his team have taken to turn it into a reliable product, and how you can start using it to replace your own collection of scripts.
Podcast episode
Safely Test Your Applications And Analytics With Production Quality Data Using Tonic AI: The most interesting and challenging bugs always happen in production, but recreating them is a constant challenge due to differences in the data that you are working with. Building your own scripts to replicate data from production is time consuming and error-prone. Tonic is a platform designed to solve the problem of having reliable, production-like data available for developing and testing your software, analytics, and machine learning projects. In this episode Adam Kamor explores the factors that make this such a complex problem to solve, the approach that he and his team have taken to turn it into a reliable product, and how you can start using it to replace your own collection of scripts.
byData Engineering Podcast
0 ratings
0% found this document useful
Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+: A core differentiator of Dagster in the ecosystem of data orchestration is their focus on software defined assets as a means of building declarative workflows. With their launch of Dagster+ as the redesigned commercial companion to the open source project they are investing in that capability with a suite of new features. In this episode Pete Hunt, CEO of Dagster labs, outlines these new capabilities, how they reduce the burden on data teams, and the increased collaboration that they enable across teams and business units.
Podcast episode
Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+: A core differentiator of Dagster in the ecosystem of data orchestration is their focus on software defined assets as a means of building declarative workflows. With their launch of Dagster+ as the redesigned commercial companion to the open source project they are investing in that capability with a suite of new features. In this episode Pete Hunt, CEO of Dagster labs, outlines these new capabilities, how they reduce the burden on data teams, and the increased collaboration that they enable across teams and business units.
byData Engineering Podcast
0 ratings
0% found this document useful

Skip carousel

Want A Job In Data Science? You Might Have To Take A Standardized Test When Applying
Chicago Tribune
Article
Want A Job In Data Science? You Might Have To Take A Standardized Test When Applying
Jul 10, 2018
3 min read
Business NAS appliances 2022
PC Pro Magazine
Article
Business NAS appliances 2022
Apr 10, 2022
4 min read
The Network NAS appliances 2024
PC Pro Magazine
Article
The Network NAS appliances 2024
Apr 4, 2024
4 min read
Rack Servers For SMEs
PC Pro Magazine
Article
Rack Servers For SMEs
Feb 11, 2021
4 min read
STEVE CASSIDY “As My Rule Goes, Always Follow The Sound Made When Things Are Swept Under The Carpet”
PC Pro Magazine
Article
STEVE CASSIDY “As My Rule Goes, Always Follow The Sound Made When Things Are Swept Under The Carpet”
Sep 7, 2023
8 min read
Parallels Remote Application Server 18
PC Pro Magazine
Article
Parallels Remote Application Server 18
Aug 12, 2021
2 min read
Best Password Managers For Your Android Device
Android Advisor
Article
Best Password Managers For Your Android Device
Jul 5, 2023
7 min read
Business NAS appliances 2021
PC Pro Magazine
Article
Business NAS appliances 2021
May 13, 2021
4 min read
Salesforce Adding Einstein Analytics Al To Tableau Platform
Techfastly
Article
Salesforce Adding Einstein Analytics Al To Tableau Platform
Feb 4, 2021
3 min read
Benchmark your SSD
APC
Article
Benchmark your SSD
Nov 2, 2020
4 min read
PC Matic For Mac: Don’t Bother
MacWorld
Article
PC Matic For Mac: Don’t Bother
Feb 13, 2024
3 min read
The Problem Solvers
APC
Article
The Problem Solvers
Oct 31, 2022
6 min read
Doctor
Maximum PC
Article
Doctor
Oct 11, 2022
6 min read
AI As A Service
PC Pro Magazine
Article
AI As A Service
Jul 9, 2020
2 min read
Automotive Grade Linux
Linux Format
Article
Automotive Grade Linux
Nov 16, 2021
9 min read
Enterprise-grade Monitoring Made Easy
Linux Format
Article
Enterprise-grade Monitoring Made Easy
Mar 10, 2020
9 min read
Bitwarden vs LastPass
Maximum PC
Article
Bitwarden vs LastPass
Mar 2, 2021
4 min read
SATA Drives
PC Pro Magazine
Article
SATA Drives
Nov 10, 2022
2 min read
Manipulate Data Like A Pro With Pandas
Linux Format
Article
Manipulate Data Like A Pro With Pandas
Jul 27, 2021
7 min read
Time To Embrace Software-as-a-service
MoneyWeek
Article
Time To Embrace Software-as-a-service
Nov 17, 2023
Has your business embraced the software-as-a-service (SaaS) revolution yet? Research suggests that 53% of businesses in the UK now rely on SaaS solutions, with 80% expected to move to this approach by 2025. For small businesses, the benefits could be
1 min read
What Type Of SSD Should You Buy?
Tech Advisor
Article
What Type Of SSD Should You Buy?
Mar 31, 2021
5 min read
Drivedx: Mac Utility Provides Hints And Warnings When Your Drive Is About To Fail
MacWorld
Article
Drivedx: Mac Utility Provides Hints And Warnings When Your Drive Is About To Fail
Jun 18, 2019
4 min read
Rediscover Speed With The Redis Revolution
Linux Format
Article
Rediscover Speed With The Redis Revolution
Jul 25, 2023
Credit: https://redis.io Redis is an open-source, in-memory data structure store that has gained popularity R as a highly efficient caching and messaging system. It prioritises speed, efficiency and versatility, making it a top choice for various ap
8 min read
The Ultimate PC Build Guide
APC
Article
The Ultimate PC Build Guide
Apr 1, 2024
14 min read
Western Digital MyCloud Home Duo 8TB
Maximum PC
Article
Western Digital MyCloud Home Duo 8TB
Oct 15, 2019
3 min read
MARIADB Optimise And Control Your Databases
Linux Format
Article
MARIADB Optimise And Control Your Databases
Jul 30, 2019
9 min read
Speed Up Windows
PC Powerplay
Article
Speed Up Windows
Sep 2, 2019
9 min read
Business NAS appliances 2023
PC Pro Magazine
Article
Business NAS appliances 2023
Apr 6, 2023
4 min read
Computer-aided Design
Linux Format
Article
Computer-aided Design
Jul 25, 2023
13 min read
PC Matic for Mac
Macworld UK
Article
PC Matic for Mac
Jan 12, 2024
3 min read

Related categories

Skip carousel

Reviews for End-to-End Data Science with SAS

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

End-to-End Data Science with SAS - James Gearheart

73014_for_eBook.jpg

The correct bibliographic citation for this manual is as follows: Gearhart, James. 2020. End-to-End Data Science with SAS®: A Hands-On Programming Guide. Cary, NC: SAS Institute Inc.

End-to-End Data Science with SAS®: A Hands-On Programming Guide

ISBN 978-1-64295-808-9 (Hard cover)

ISBN 978-1-64295-804-1 (Paperback)

ISBN 978-1-64295-805-8 (Web PDF)

ISBN 978-1-64295-806-5 (Epub)

ISBN 978-1-64295-807-2 (Kindle)

For a hard copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.

The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others’ rights is appreciated.

U.S. Government License Rights; Restricted Rights: The Software and its documentation is commercial computer software developed at private expense and is provided with RESTRICTED RIGHTS to the United States Government. Use, duplication, or disclosure of the Software by the United States Government is subject to the license terms of this Agreement pursuant to, as applicable, FAR 12.212, DFAR 227.7202-1(a), DFAR 227.7202-3(a), and DFAR 227.7202-4, and, to the extent required under U.S. federal law, the minimum restricted rights as set out in FAR 52.227-19 (DEC 2007). If FAR 52.227-19 is applicable, this provision serves as notice under clause (c) thereof and no other notice is required to be affixed to the Software or documentation. The Government’s rights in Software and documentation shall be only those set forth in this Agreement.

SAS Institute Inc., SAS Campus Drive, Cary, NC 27513-2414

June 2020

SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.

SAS software may be provided with certain third-party software, including but not limited to open-source software, which is licensed under its applicable third-party software license agreement. For license information about third-party software distributed with SAS software, refer to http://support.sas.com/thirdpartylicenses.

Contents

About This Book

About The Author

Chapter 1: Data Science Overview

Introduction to This Book

The Current Data Science Landscape

Introduction to Data Science Concepts

Chapter Review

Chapter 2: Example Step-by-Step Data Science Project

Overview

Business Opportunity

Initial Questions

Get the Data

Select a Performance Measure

Train / Test Split

Target Variable Analysis

Predictor Variable Analysis

Adjusting the TEST Data Set

Building a Predictive Model

Decision Time

Implementation

Chapter Review

Chapter 3: SAS Coding

Overview

Get Data

Explore Data

Manipulate Data

Export Data

Chapter 4: Advanced SAS Coding

Overview

DO Loop

ARRAY Statements

SCAN Function

FIND Function

PUT Function

FIRST. and LAST. Statements

Macros Overview

Macro Variables

Macros

Defining and Calling Macros

Chapter Review

Chapter 5: Create a Modeling Data Set

Overview

ETL

Extract

Data Set

Transform

Load

Chapter Review

Chapter 6: Linear Regression Models

Overview

Regression Structure

Gradient Descent

Linear Regression Assumptions

Linear Regression

Simple Linear Regression

Multiple Linear Regression

Regularization Models

Chapter Review

Chapter 7: Parametric Classification Models

Overview

Classification Overview

Logistic Regression

Visualization

Logistic Regression Model

Linear Discriminant Analysis

Chapter Review

Chapter 8: Non-Parametric Models

Overview

Modeling Data Set

K-Nearest Neighbor Model

Tree-Based Models

Random Forest

Gradient Boosting

Support Vector Machine (SVM)

Neural Networks

Chapter Review

Chapter 9: Model Evaluation Metrics

Overview

General Information

Model Output

Accuracy Statistics

Black-Box Evaluation Tools

Chapter Review

About This Book

What Does This Book Cover?

Hello, my name is James, and I’m an addict. I’m addicted to data science books, web courses, instructional videos, blogs, data science podcasts, predictive modeling competitions, and coding. This addiction takes up the majority of my mental energy. From the time that I wake up until I fall asleep (and all through my dreams), I’m generally thinking about data science concepts and coding. I’m going to bet that many of you are in a similar situation. If so, I’m sure that you have been as frustrated as I have been about the massive hole in the instructional data science market.

The market is overrun with data science books for Python, R, and Hadoop. These books provide an overview of data science and in-depth instructions on the various machine learning models, and they provide the associated development code for those particular programming languages. Although these books are great resources for data scientists, they do not offer direct programming instruction to the most popular programming language in the business community. SAS is used by 95% of Fortune 100 companies, and these companies are the leading employers of data scientists. There is an incredible opportunity to fill the need of professional data scientists for hands-on machine learning training with real-world examples.

The unfortunate reality for many SAS programmers is that we often do not have access to the latest and greatest SAS products. SAS Enterprise Miner, SAS Visual Analytics, SAS Forecast Server, and SAS Viya are all incredible products, but they are not universally available to all SAS programmers. It is essential that a data scientist who is working in a SAS environment be able to develop and implement machine learning models in any SAS environment. Even if data scientists have access to SAS Viya, it is incredibly beneficial for them to have a solid understanding of the programming code that drives the models that they develop in SAS Viya.

This book, End-to-End Data Science in SAS®, provides all SAS programmers insight into the models, methodology, and SAS coding required to develop machine learning models in any industry. It also serves as a reference for programmers of any language who either want to expand their knowledge base or who have just been hired into a data scientist position where SAS is the preferred language.

The goal of this book is to provide clear and practical explanations of the data science environment, machine learning techniques, and the SAS code necessary for the proper development and evaluation of these highly desired techniques. These explanations are demonstrated with real-world business applications across a variety of industries. All code and data sets are publicly available in a dedicated GitHub repository.

Is This Book for You?

If you are interested in this book, then you (or most likely the organization that you work for) have SAS installed on your computer. However, not all SAS installations are created equal. Some programmers work in Base SAS (also called PC SAS). Others have a variety of SAS software available to them:

● SAS Enterprise Guide

● SAS Enterprise Miner

● Visual Analytics

● SAS Studio

● SAS Viya

This list is just a sample of the many SAS products available. In addition to these products, there are several software components that SAS offers:

● SAS/ACCESS software

● SAS/ETS software

● SAS/IML software

● SAS ODS Graphics Editor

● SAS/OR software

● SAS/STAT software

Your company’s IT department generally dictates the SAS products that you have and the software components that are available to you. If you desperately want SAS Viya or SAS/ETS, you will often have to fight the power to get it. I sincerely hope that you can access one or many of these SAS products because they are awesome, and they will make your life as a data scientist much easier and much more productive. However, if you are like me and you have to develop predictive models without the benefit of all the toys that SAS has to offer, then this book is what you have been waiting for.

SAS Software Requirements

The minimum requirement for the majority of procedures detailed in this book is SAS 9.2 with SAS/STAT installed. This requirement should cover most SAS users. With this minimum requirement, we will be able to develop:

● Linear regressions

● Logistic regressions

● Clustering

● Decision trees

Some of the more advanced procedures will require SAS Enterprise Miner to be installed. These procedures will include:

● PROC HPFOREST

● PROC TREEBOOST

● PROC SVM

Don’t panic! Just because you are limited to Base SAS with SAS/STAT installed does not mean that we are limited to the predefined procedures that come with SAS/STAT. We are limited only by our ingenuity. I will include methods of how to perform some of these more advanced procedures by developing them from scratch, using only the procedures available in SAS/STAT. The code to create these procedures from scratch can get pretty complicated, but I’ll provide step-by-step explanations and all code will be available in the repository.

Programming Knowledge Assumed

If you only have experience working with Microsoft Excel spreadsheets, then don’t worry; we can get you up and running in SAS. However, if you’ve never worked with data in any capacity, including Excel spreadsheets, then maybe this book isn’t for you.

I will assume that you have some experience writing basic formulas. For example, in Excel:

=AVERAGE(B12: B32)

=SUM(A1: A14)

I will also assume that you can perform basic computer commands such as create a new file, save a file, open a file, and so on.

As long as you have met these minimum requirements and have a desire to learn awesome new skills that are guaranteed to increase your value to any organization, then we are good to go.

Icons Used in This Book

Example Code and Data

All data used for examples in this book have been accessed through public-use data repositories. These repositories provide a wide range of data sets that span across a variety of disciplines. The code that is demonstrated in the book is primarily created by the author. Any code that has been shown that was not created by the author has been cited with the appropriate credit to the developer. There is a fantastic community of SAS developers who openly share their code with the world. I strongly encourage you to search the web and connect with these communities on the SAS website, GitHub, Stack Overflow, and several other communities.

You can access the example code and data for this book by visiting the GitHub repository at https://github.com/Gearhj/End-to-End-Data-Science.

SAS University Edition

This book is compatible with SAS University Edition. If you are using SAS University Edition, then begin here: https://support.sas.com/ue-data.

We Want to Hear from You

Do you have questions about a SAS Press book that you are reading? Contact us at saspress@sas.com.

SAS Press books are written by SAS Users for SAS Users. Please visit sas.com/books to sign up to request information on how to become a SAS Press author.

We welcome your participation in the development of new books and your feedback on SAS Press books that you are using. Please visit sas.com/books to sign up to review a book.

Learn about new books and exclusive discounts. Sign up for our new books mailing list today at https://support.sas.com/en/books/subscribe-books.html.

Learn more about this author by visiting his author page at http://support.sas.com/gearheart. There you can download free book excerpts, access example code and data, read the latest reviews, get updates, and more.

Author Acknowledgments

I would like to thank SAS for allowing me the opportunity to publish this book through SAS Press. It is an honor to be part of the SAS author community. I would also like to give special thanks to my editor Suzanne Morgen for her patience and her incredible eye for detail.

Additional thanks are given to all of the reviewers who painstakingly reviewed raw chapters and provided needed critical feedback:

● Christopher Battiston

● David Ghan

● Funda Gunes

● Sunil Gupta

● Mark Jordan

● Robin Langford

● Premkumar Varma

I would still be a poor musician in Warren, Ohio if it were not for Lorin Ranbom, Mitali Ghatak and the Health Services Research Section at the Ohio Department of Jobs and Family Services who took a chance on hiring me right out of college. Thanks to Dan Hecht, Donna Bush, Lora Summers, Kendy Markmen, Dave Dorsky, Hope McGonigle, Tracy Cloud and Eric Edwards for introducing me to SAS and providing me all of the support that I needed.

A continuous thank you to the best SAS programmer that I ever met, Matt Bates. I cannot count how many times that I bothered Matt about how to perform tasks from the mundane to the insanely complex. He has always come through for me and I am eternally grateful.

A huge thank you to Mihai Gavril, Vijay Narapsetty, Dan Gedrich, and Jay Das for providing a work environment that is collaborative, innovative and focused on excellence. I couldn’t ask for a better team.

Thank you to my parents Richard and Lou Ann Gearheart for all of their love and support and my brothers Scott, Glen, Kenny, George, and Tod. You have always been there for me.

My beautiful son, Jacob Gearheart, is the pride of my life and I am amazed every day with your wit, charm, intelligence, and kindness. I pray that I will be more like you every day.

And of course, the love of my life, Tanya Gearheart. I must have written you a thousand love letters and none of them have come close to how much that I love you and need you in my life. You are my entire world and the only reason that I wake up every day is to spend more time with you. Always, madly.

About The Author

James has been both a data science manager and an individual contributor on projects ranging from logistics, anti-money laundering, CRM, cash demand forecasting, customer segmentation, digital marketing, A/B testing, attribution and attrition. His research interests include applied data science, machine learning, and artificial intelligence.

Learn more about this author by visiting his author page at http://support.sas.com/Gearhart. There you can download free book excerpts, access example code and data, read the latest reviews, get updates, and more.

Chapter 1: Data Science Overview

Introduction to This Book

Minimum Effective Dose

The Current Data Science Landscape

Types of Analytics

Data Science Skills

Introduction to Data Science Concepts

Supervised Versus Unsupervised

Parametric Versus Non-parametric

Regression Versus Classification

Overfitting Versus Underfitting

Batch Versus Online Learning

Bias-Variance Tradeoff

Step-by-Step Example of Finding Optimal Model Complexity

Curse of Dimensionality

Transparent Versus Black Box Models

No Free Lunch

Chapter Review

Introduction to This Book

Over the past decade, there has been an explosion of interest in the application of statistics to the vast amounts of data being generated every second by nearly everyone on the planet. This interest has been driven primarily by tech giants that learned how to monetize this information by providing free services to individuals in exchange for the data that these individuals generate. Every click, tweet, picture upload, page view, like, share, follow, purchase, comment, retweet, email, and logon is stored in vast data warehouses maintained by tech giants and made available to their army of data scientists to turn your data into actionable insights.

You may feel flattered that you have a secret admirer that obsesses about your every click, your interests, your hopes and dreams, your wants, your insecurities, and your wildest fantasies. You may envision tech billionaires like Mark Zuckerberg or Jeff Bezos lying awake at night thinking about you and how to give you exactly what you want. Maybe that image is a bit of a stretch, but these CEOs, and all of their competitors, do have the customer at the center of their business models. This phenomenon is not limited to tech companies. Traditional brick and mortar stores (Walmart, Target, Best Buy), telecommunication companies (AT&T, Verizon, Comcast), medical institutions, entertainment conglomerates, utility companies, and government offices all have teams of data scientists whose sole function is to know what you want before you even know it yourself.

Amazon knows that because of your interest in data science books that you might be interested in this one. Netflix knows that because you binged the entire first three seasons of Black Mirror over the past weekend, that you will also be interested in the reboot of The Outer Limits. These insights are the result of teams of data scientists who access the information that you have freely provided and develop models that categorize individuals into similar groups based on a variety of attributes. They also develop predictive models that try to determine what you will do or want in the future.

Don’t believe me? Here is a quick test. Based on the only information that I have about you (your interest in data science books focusing on SAS), I will assume that you are a male aged between 25 and 40 years old with graduate-level education and live in an English-speaking country. You probably work in a large corporation. Maybe your desk is a bit messy, but you have a method to the madness.

How’d I do? If these guesses were a complete miss, then you must be a unique individual to not fall into any of the traditional demographic and interest categories of data scientists. But I would guess that at least half of these descriptions pertain to you. These guesses might appear to be a bit broad or general. But remember, they are based on only a single general piece of information: your interest in data science books focused on SAS.

Now imagine that I have a database of your entire web history, including all your past purchases, your social connections, your financial statements, your search terms, your emails, and your viewing habits. Imagine that I took all of this information and input it into a complex predictive model that was built with the sole purpose of calculating the probability that you would be interested in a specific product. Imagine that I gathered this same database of information for ten million individuals (a small sample size in the world of digital information where the data sets can contain several trillion data points) and built a deep learning model that calculated the probability that you would be interested in a travel rewards credit card offer with a 2% cashback program.

That’s a pretty specifically defined target variable! Given the quantity and quality of information that I have about the population and a well-defined target variable, I should be able to build a predictive model that can calculate, with a reasonable degree of confidence, the probability of an individual responding to this credit card offer. This information could be used to allocate my marketing budget efficiently. I’d probably achieve a higher response rate with this approach than if I spent the budget on a radio commercial broadcast to a broad population. If the response rate of the travel website campaign is 1.5% while the response rate of the radio ad is 0.03%, then our predictive model provided a 50X lift in response rate! That is one way to optimize your marketing budget.

In this example, all of this data has been used to deliver a highly targeted ad for the right product to the right individual at the right time with the right message. This targeting mechanism is just one example of how data science is used every day by a wide variety of businesses. Major corporations have used data science techniques to target, classify, segment, cluster, recommend, adjust offers, personalize, upsell, cross-sell, influence and predict every aspect of customer behavior.

Minimum Effective Dose

This book will follow the Minimum Effective Dose (MED) method of teaching data science. The goal of this method is to get you up and running with sufficient background information on the various data science techniques and provide you with specific examples and access to real data and also provide SAS code that is thoroughly explained. I do not want you to get bogged down with a bunch of mathematical theories on the inner workings of the procedures. We will touch on the formulas, but I will not drone on about it. If you are interested in a thorough explanation of the mathematics of data science, I suggest that you check out The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman. It is the bible of statistical mathematics for the data science community.

So, what does MED mean?

● No math! (well...very little math)

● An understandable and practical explanation of data science methods:

◦ Which method to use for a given situation

◦ What to watch out for

◦ How data science works with real business data (you will not find the standard academic data sets such as the iris data set or the MNIST data set in this book)

● Step-by-step instructions on how to do specific procedures

● How to evaluate the performance of your model:

◦ How to apply your model to new data

◦ How to understand the SAS code provided

The Current Data Science Landscape

Over the past few years, the job titles for business analysts have changed. These changes in job titles reflect the additional demands of the business analyst. If you were in the business analytics field before 2012, maybe you’ve had one of these job titles:

● Analyst

● Statistician

● Quant

● Researcher

● Analytical Consultant

● Data Architect

● Database Administrator

● Data Guy

Since 2012, many of these job titles have changed to:

● Data Scientist

● Machine Learning Engineer

● Artificial Intelligence Engineer

One of the first questions that you may have is, what is the difference between a data scientist and an analyst? The initial answer is that 80% of their work is the same:

● Access data

● Clean data

● Combine data sets

● Summarize data

● Answer business questions

● Create reports

However, data scientists have some exciting tools in their toolbox, including:

● Accessing and using unstructured data: Web scraping, Twitter tweets, Facebook comments, and so on

● Making sense of data that they do not know using unsupervised machine learning methods: Clustering and PCA

● Predicting future outcomes: Forecasting and predictive modeling

These additional functions of the data scientist take their role to a new level by being able to move past the traditional descriptive statistics of reporting and move into the exciting world of predictive analytics, prescriptive analytics, and optimization.

Types of Analytics

Let’s look at a few different types of analytics that data scientists should be familiar with.

● Descriptive Analytics – This is a retrospective look at what has happened in the past and what is currently happening. Firms generate a lot of descriptive analytics to understand business trends, customer demand, production thresholds, and countless other business metrics. Descriptive reporting is not flashy, but it is the life’s blood of all business. Since these are events that have already been realized, this reporting focuses on reporting actual figures along with their associated statistics (averages, standard deviations, and so on).

● Predictive Analytics – Historical data is used to make predictions about the future. These are statistical estimates of future possibilities based on historical data. This type of reporting is accompanied by confidence intervals, odds ratios, and probabilities.

● Prescriptive Analytics – Also called optimization, prescriptive analytics identifies impactful factors and recommends the optimal values of these factors to achieve a predefined result.

SAS created Figure 1.1. This figure breaks down the previous list even further into eight levels of analytics. By increasing the level of intelligence in these analytical products, we are expanding the business value of analytics to the organization. The first four stages are based on reporting functions of analytics, while the last four stages are based on probabilistic functions of analytics. The business value of these functions is incredibly beneficial to businesses.

Figure 1.1: Eight Levels of Analytics

For example, if you are at the horse track and getting ready to place your bet, would you rather have a report on the historical performance of the horses, or would you rather have some probabilistic information on what might happen in the upcoming race? My money would be on the probabilistic report.

Data Science Skills

I’m sure that most of you are as tired as I am of looking at the graphic in Figure 1.2 below. We have seen it on every data science blog, presentation, and instructional medium. Regardless of the ubiquity of the graphic, there is still a clear message that is conveyed. It shows the vital interplay and intersection of three diverse fields that had traditionally been performed separately.

Figure 1.2: Data Science Venn Diagram

Adapted from: Conway, Drew. (September 2010). The Data Science Venn Diagram. Retrieved from http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram.

The main sections in the diagram in Figure 1.2 are as follows:

● Computer skills – represented in the Hacking Skills section of the diagram. This skill set is essential for accessing and manipulating structured and unstructured data. Data scientists need to be able to gather data, clean it, merge it with other data sources, prepare it for modeling, and implement machine learning algorithms on this data to answer a specific question.

● Math and Statistics Knowledge – A data scientist needs to have the prerequisite statistical background to understand the goal, process, and function of a predictive algorithm.

● Business knowledge – represented in the Substantive Expertise section of the diagram. In this context, the term business can represent any field of study that requires knowledge of that specific environment. Machine learning models are developed to answer business questions. A data scientist needs to have knowledge of that environment to develop the initial business questions that can be modeled. This type of knowledge is critical for knowing what type of data will be required to answer the question, what direction the data should be moving, and what answers are counterintuitive.

The subsections in the diagram in Figure 1.2 are:

● Machine Learning – The intersection of Hacking Skills and Math and Statistics Knowledge excludes the Substantive Expertise skill set. This subsection is a reference to computer scientists who can develop sophisticated predictive models based solely on finding correlations in given data sets. Regardless of how sophisticated or accurate these models may be, these models lack the background business knowledge and are therefore prone to spurious correlations, weak inferences and ethical concerns.

● Traditional Research – The intersection of Math and Statistics Knowledge and Substantive Expertise is excluding the Hacking Skills section of the graph. Before the application of computers to solve all our daily problems, researchers had to apply mathematical knowledge to their field of expertise to develop theoretical models. If you received your advanced degree before 1995, then I’m talking about you.

● Danger Zone – Certainly my favorite section of the graphic. The intersection of Hacking Skills and Substantive Expertise that excludes the Math and Statistics Knowledge highlights the dangerous area of misaligned models, misinterpreted results, and unfounded conclusions.

Introduction to Data Science Concepts

The language of data science is based on a series of concepts that represent the issues that data scientists have to consider when they are developing a project. These terms and definitions are a kind of shorthand for deep theoretical issues that data scientists must consider at each step of their project. For example, if I complain to another colleague that my go-to regression model is not working well on this new project, he might respond, No free lunch. That doesn’t mean that he assumed I was going to steal his lunchbox!

In this section, I will outline some of the most common concepts in data science so that we have a common understanding of these concepts.

Supervised Versus Unsupervised

There are two main categories that machine learning models can fall into. These are supervised and unsupervised models. The presence of a target variable differentiates them. The target variable represents the item that you are interested in. It is also called the response variable. If there is a value that you are trying to predict, you have a supervised method. Supervised models have a target variable, while unsupervised models do not have a target variable.

Many data science terms are used interchangeably. Predictor variables are also called descriptive variables, independent variables, or input variables. The target variable is also called the response variable or the dependent variable.

Supervised Models

Most of the models that you have been exposed to have been supervised models. These are the main predictive models that filter out your spam, identify your handwriting, recommend your next song, decide which ad to show you, and make a thousand other selections for you every day.

These models include, but are not limited to:

● Regression Models

● Decision Trees

● Random Forests

● Gradient Boosting

● Neural Networks

● Support Vector Machines (SVM)

● K-Nearest Neighbors

● Naïve Bayes Models

Each one of these models is built on a data set that has a target variable along with one to several thousand predictor variables. If a data set contains a target variable, then supervised machine learning algorithms are most commonly used with this type of data. These machine learning algorithms identify the associations between the predictor variables and the target variable. The result is a formula that can be used on new observations where the target value is not known.

For example, if you want to predict the quality ranking of a bottle of wine, then you would generally have a data set that contains several observations that have the ranking of wine for each observation. The quality ranking would be the target variable. The data set would also include several descriptive variables along with the target variable. These descriptive variables describe several different factors that influence the target variable. Table 1.1 shows nine predictive variables that are associated with the target variable quality.

In mathematical terms, the predictor variable would traditionally be labeled as X, and multiple predictor variables would be listed as X1, X2, X3, and so on. The target variable would be labeled as y, so that for each observation of the predictor measurement there is an associated response measurement where i represents the ith observation.

Table 1.1: Wine Quality Data

In the case of our wine quality prediction example, you could build a regression model that results in a formula like this:

Quality = 0.796 + FixedAcidity (0.0349) + VolatileAcidity (-0.8287)

+ Cholorides (-0.0464) + pH (-0.203) + Sulphates (0.1386)

+ Alcohol (0.4663)

This formula states that the quality score of wine is a function of six main features of the wine. The model determines which features are associated with the wine quality score by assessing the correlation between each feature and the quality score. The model also produces model weights (also called coefficients) that are used to predict the quality score of new bottles of wine that have not been scored yet.

If we were given an information sheet on a brand new bottle of wine that has not been scored for quality, then we would be able to input the information from each of these predictors into our formula and predict the quality of the wine, as shown in Table 1.2.

Table 1.2: Wine Data Set Predictive Weights

Each value of the new wine information is multiplied by the predictive weight (coefficient) produced by the model. These calculated values are combined with the model intercept to generate a predicted score.

According to our predictive model, this new bottle of wine should have a quality score of 5. Notice that not all the predictive variables were used in the final predictive model. The model retained only the predictors that had a statistically significant relationship with the target variable. Don’t worry; these concepts will be explained in detail in the regression modeling chapter of the book.

Unsupervised Models

Unsupervised models are built on data sets that do not have a target variable. This type of analysis is slightly more difficult because of the lack of direction that is usually provided by the target variable. The goal of unsupervised models is inherently different from supervised models. Supervised models are generally constructed to predict the response of future observations or to understand better the relationship between the predictors and the target variable. On the other hand, unsupervised models do not predict anything. They are built to understand the relationship between the given variables. The goal of unsupervised models is to identify naturally occurring groups or classifications of the observations.

The most common types of unsupervised learning models include:

● k-Means Clustering

● Hierarchical Cluster Analysis (HCA)

● Principal Component Analysis (PCA)

● t-distributed Stochastic Neighbor Embedding (t-SNE)

For example, suppose that you have a database of customer attributes and you want to know if you can detect groups of similar customers. These groups could be beneficial in marketing campaigns or for targeting purposes. It is a common practice to run a k-means clustering algorithm that will assign each observation

Enjoying the preview?

Page 1 of 1

End-to-End Data Science with SAS: A Hands-On Programming Guide

About this ebook

James Gearheart

Related authors

Related to End-to-End Data Science with SAS

Related ebooks

Data Modeling & Design For You

Related podcast episodes

Related articles

Related categories

Reviews for End-to-End Data Science with SAS

What did you think?

Book preview

End-to-End Data Science with SAS - James Gearheart

Contents

About This Book

What Does This Book Cover?

Is This Book for You?

SAS Software Requirements

Programming Knowledge Assumed

Icons Used in This Book

Example Code and Data

SAS University Edition

We Want to Hear from You

Author Acknowledgments

About The Author

Chapter 1: Data Science Overview

Introduction to This Book

Minimum Effective Dose

The Current Data Science Landscape

Types of Analytics

Data Science Skills

Introduction to Data Science Concepts

Supervised Versus Unsupervised