
Data Science with R

Lesson 2 Case Study

Analytics Methodology

• The Analytics Methodology is a suggested framework that we can apply to solve business problems with data

• Let’s apply this framework to the Heart Disease case study

• There are four stages in the methodology: Problem Definition, Solution Design, Solution Implementation and Solution Monitoring
Problem Definition

How to decide on the objective

• Most of the time, problem objectives are clearly stated by the client
• Sometimes, the client needs an end-to-end exploratory data analysis
• If strong patterns are noticed during that analysis, you can formulate your objective and build a model
Possible Objectives
Most intuitive problem statement

Which patients are likely to develop heart disease?
Solution Design

How to implement a solution

• Generate a response score for each patient, and rank patients in descending order of that score
• Then, choose a certain number of patients to analyse, and figure out which factors give that group a higher chance of getting heart disease
• Ideally, the output would be an Excel sheet or a simple application that calculates a response score or probability from input attribute values
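The scoring and ranking step can be illustrated with a minimal Python sketch (standard library only; the lesson itself works in R). It assumes the response scores already exist, for example as predicted probabilities from a classification model; the patient IDs and score values are hypothetical.

```python
# Minimal sketch: rank patients by an already-computed response score.
# Patient IDs and scores are hypothetical, for illustration only.

def rank_patients(scores):
    """Return (patient_id, score) pairs sorted by descending response score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def top_n(scores, n):
    """Return the IDs of the n highest-scoring patients."""
    return [patient_id for patient_id, _ in rank_patients(scores)[:n]]

# Hypothetical response scores (e.g. predicted probability of heart disease)
scores = {"P01": 0.82, "P02": 0.15, "P03": 0.64, "P04": 0.91, "P05": 0.33}

for patient_id, score in rank_patients(scores):
    print(f"{patient_id}: {score:.2f}")

print(top_n(scores, 2))  # ['P04', 'P01']
```

How many patients to pick (`n`) is a business decision, such as how many cases can realistically be reviewed; the sketch does not prescribe a value.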
Solution Design: Most Appropriate Algorithm

• Build a classification model to find out which patients are most likely to suffer from heart disease
• Patients are to be put into “Will have a heart disease” and “Will not have a heart disease” categories
• Appropriate classification algorithms:
  • Binary Logistic Regression
  • Decision Trees
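As a hedged illustration of binary logistic regression, the sketch below fits the weights by plain batch gradient descent on the log-loss and then assigns each patient to one of the two categories using a 0.5 probability cutoff. The toy data, learning rate, number of epochs and cutoff are all assumptions for illustration; in R this kind of model is usually fitted with `glm(..., family = binomial)`. Decision trees are not shown here.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of heart disease for one patient's attribute vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights w and intercept b by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log-loss with respect to the linear score
            err = predict_proba(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def classify(w, b, x, cutoff=0.5):
    """Assign a patient to one of the two categories used in the case study."""
    if predict_proba(w, b, x) >= cutoff:
        return "Will have a heart disease"
    return "Will not have a heart disease"

# Hypothetical toy data: one standardized attribute per patient, label 1 = heart disease
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
print(classify(w, b, [1.8]))   # Will have a heart disease
print(classify(w, b, [-1.8]))  # Will not have a heart disease
```

The toy labels overlap around zero on purpose, so the maximum-likelihood weights stay finite; on perfectly separable data, plain gradient descent would keep growing the weights.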
Solution Implementation

• This is the longest phase in most analytics projects, and includes all processes from data extraction through to model insight generation

• In this case study, it will include:

  • Data Exploration:
    • Data assessment for reliability and completeness
    • Data understanding using descriptive statistics and visualizations

  • Data Preparation:
    • Fix any issues found during data exploration (e.g., missing data)
    • Prepare data for the modelling process (e.g., convert qualitative data to numeric)

  • Model Building:
    • Implement (multiple) predictive modelling algorithms (predict whether a patient will have heart disease or not)
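The exploration and preparation steps can be sketched on a handful of hypothetical records (field names and values are invented for illustration, and `None` stands for a missing value): a completeness check, simple descriptive statistics, median imputation for missing numeric values, and one-hot encoding to turn a qualitative attribute into numeric columns. Median imputation is only one possible fix for missing data and is chosen here as an assumption.

```python
import statistics

# Hypothetical patient records (illustrative only); None marks a missing value
records = [
    {"age": 50, "cholesterol": 210, "chest_pain": "typical"},
    {"age": 45, "cholesterol": None, "chest_pain": "atypical"},
    {"age": 60, "cholesterol": 250, "chest_pain": "none"},
    {"age": None, "cholesterol": 190, "chest_pain": "typical"},
]

def missing_counts(rows):
    """Completeness check: number of missing (None) values in each field."""
    return {f: sum(r[f] is None for r in rows) for f in rows[0]}

def describe(rows, field):
    """Descriptive statistics for a numeric field, skipping missing values."""
    values = [r[field] for r in rows if r[field] is not None]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

def impute_median(rows, field):
    """Return new rows where missing values of a numeric field get the median."""
    median = describe(rows, field)["median"]
    return [{**r, field: median if r[field] is None else r[field]} for r in rows]

def one_hot(rows, field):
    """Replace a qualitative field with one 0/1 indicator column per level."""
    levels = sorted({r[field] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != field}
        for level in levels:
            new[f"{field}={level}"] = int(r[field] == level)
        encoded.append(new)
    return encoded

print(missing_counts(records))  # {'age': 1, 'cholesterol': 1, 'chest_pain': 0}

prepared = one_hot(impute_median(impute_median(records, "age"), "cholesterol"), "chest_pain")
print(prepared[3])
# {'age': 50, 'cholesterol': 190, 'chest_pain=atypical': 0, 'chest_pain=none': 0, 'chest_pain=typical': 1}
```

Each helper returns new rows instead of modifying `records`, so the raw data stays available for comparison after preparation.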
Solution Implementation

• Model Validation and Finalization:
  • Validate the model to assess fit and predictive power
  • Finalize the model based on performance and validation results

• Model Insights:
  • Generate business insights from the finalized model (e.g., which attributes are important in determining the response)
  • Generate recommendations based on the model outcome

• Model Implementation:
  • Implement (multiple) predictive modelling algorithms
  • Run multiple iterations to identify the best-performing models
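Validation can be sketched as holding out part of the data, scoring the model on it, and summarising the results in a confusion matrix. The split fraction, random seed and the hypothetical label vectors below are assumptions. Sensitivity and specificity are shown alongside accuracy because, in a screening problem like this one, missing a true heart-disease case may matter more than a false alarm.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows, then split into (train, hold-out) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def confusion_matrix(actual, predicted):
    """Count outcomes, treating 1 as 'heart disease' and 0 as 'no heart disease'."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": pairs.count((1, 1)),
        "tn": pairs.count((0, 0)),
        "fp": pairs.count((0, 1)),
        "fn": pairs.count((1, 0)),
    }

def safe_ratio(num, den):
    """Divide, returning NaN instead of raising when the denominator is zero."""
    return num / den if den else float("nan")

def validation_metrics(actual, predicted):
    """Accuracy, sensitivity (recall on positives) and specificity on hold-out data."""
    cm = confusion_matrix(actual, predicted)
    return {
        "accuracy": safe_ratio(cm["tp"] + cm["tn"], len(actual)),
        "sensitivity": safe_ratio(cm["tp"], cm["tp"] + cm["fn"]),
        "specificity": safe_ratio(cm["tn"], cm["tn"] + cm["fp"]),
    }

train, holdout = train_test_split(list(range(10)))
print(len(train), len(holdout))  # 7 3

# Hypothetical hold-out labels and model predictions
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
print(validation_metrics(actual, predicted))
# {'accuracy': 0.75, 'sensitivity': 0.75, 'specificity': 0.75}
```

Running the same metrics for each candidate model on the same hold-out set gives a like-for-like basis for the "run multiple iterations" and finalization steps.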
Solution Monitoring

• Assess model performance periodically
• Update the model with more recent data
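Periodic monitoring can be reduced to a simple check: compare accuracy on a recent batch of labelled cases against the accuracy recorded at validation time, and flag the model for an update when it drops by more than a chosen tolerance. The 0.05 tolerance, the baseline and the label vectors below are hypothetical.

```python
def accuracy(actual, predicted):
    """Share of cases where the predicted label matches the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def needs_update(baseline_accuracy, recent_actual, recent_predicted, tolerance=0.05):
    """Return (flag, recent_accuracy); flag is True when recent accuracy has
    dropped more than `tolerance` below the validation-time baseline."""
    recent = accuracy(recent_actual, recent_predicted)
    return recent < baseline_accuracy - tolerance, recent

# Hypothetical baseline from validation and a hypothetical recent batch
baseline = 0.75
flag, recent = needs_update(baseline, [1, 0, 1, 1, 0], [0, 0, 0, 1, 1])
print(flag, recent)  # True 0.4
```

When the flag is raised, the model would be refitted on data that includes the more recent cases, as the slide suggests.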
END OF LESSON 2 CASE STUDY
