Predictive Analytics Learning Objectives

Predictive Analytics
Certificate Program
Learning Objectives
Module 1: What is Predictive Analytics & R Basics?

• Identify the problem and assess whether it should be addressed with predictive
modeling. Understand the differences and similarities between predictive modeling and
traditional analysis techniques.
• Learn predictive modeling tools - layout and basic commands of R
• Learn predictive modeling tools - practice writing basic R scripts and complete the
additional suggested practice, if necessary
Module 2: Effective Problem Definition and Project Management

• Translate a vague question into one that can be analyzed with data, statistics and
machine learning to solve a business problem.
• Design use cases and evaluate/prioritize them based on available data and technology,
significance of business impact and/or implementation considerations
• Select and implement appropriate technology to make efficient use of statistical and
machine learning techniques, taking into account problem objectives and
implementation constraints
• List and understand the importance of key principles in creating and managing a
predictive modeling team.
Module 3: Data Design, Transformation & Visualization

• Identify common data types: structured, unstructured and semi-structured
• Learn variable types and applicable terminology
• Identify and evaluate the quality (including common data problems) of appropriate data
sources for a problem
• Identify the types of regulatory, professional standard, and ethical issues surrounding
predictive modeling and data collection/use and where they apply to situations
• Introduce the lapse, mortality and health datasets used for exercises
• Implement effective data design: time frame, sampling, granularity
• Use common data blending techniques, e.g. fuzzy matching
• Learn how, why and when to transform the data, using scaling, normalization,
standardization, binarization, encoding and imputation.
• Apply each technique using an example model
• Create and interpret histograms, bar charts and frequency plots
• Visualize data using one-way, two-way and box plots to identify potential errors,
outliers and trends in the data
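
The transformation objectives above (imputation, scaling, standardization, encoding) can be illustrated with a minimal sketch. Python is used here purely for illustration; the program itself works in R. The helper names and the toy values are hypothetical, and the code assumes each column has at least two distinct values so that no division by zero occurs.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1] (assumes max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Centre to mean 0 and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def one_hot(labels):
    """Encode categorical labels as 0/1 indicator columns, one per category."""
    categories = sorted(set(labels))
    rows = [[1 if lab == c else 0 for c in categories] for lab in labels]
    return categories, rows

# Hypothetical toy data, not taken from the course datasets.
ages = [30, None, 50, 40]
filled = impute_mean(ages)        # the missing age becomes the observed mean
scaled = min_max_scale(filled)    # values now lie in [0, 1]
zscores = standardize(filled)     # values now have mean 0
cats, encoded = one_hot(["smoker", "non-smoker", "smoker"])
```

Imputation runs first because the scaling helpers cannot handle missing entries; in practice the order and choice of transformations depend on the model being fitted.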
Module 4: Data Exploration

• Identify data issues by exploring one variable to check that its distribution is as
expected and to detect any outliers
• Determine the significant relationships between two variables using scatter plots,
calculating correlations and investigating conditional means.
• Determine relationships between many variables and select material ones using
principal component analysis
• Determine relationships between many variables and select material ones using
independent component analysis
• Determine relationships between many variables and select material ones using singular
value decomposition
• Take appropriate action when results of data exploration deviate from what is expected
and apply judgment to resolve those differences
Module 5: Feature Generation & Selection

• Define the term "feature" and understand how it differs from "variable"
• Use subject matter expertise and prior knowledge about the data to create features that
lead to more effective models.
• List the principles, advantages, disadvantages and limitations of using filter-based
selection techniques for tuning a data set to be used in modelling.
• Select appropriate features for a model using Pearson, Kendall and Spearman
correlation as selection criteria.
• Select appropriate features for a model using mutual information as selection criteria.
• Select appropriate features for a model using chi-squared as selection criteria.
• List the principles, advantages, disadvantages and limitations of using permutation-based
selection techniques for tuning a data set to be used in modelling.
• Apply concepts such as accuracy, precision and recall to select features to be used in
classification modelling problems.
• Apply concepts such as MSE, RSE and coefficient of determination to select features to
be used in regression modelling problems.
• List the principles, advantages, disadvantages and limitations of using algorithm-based
selection techniques for tuning a data set to be used in modelling.
• Use Ridge, Lasso, Elastic Net and tree-based methods to select appropriate features to
be used for modelling (detailed lessons in section 5).
• Text mining: Apply various text mining methods in order to generate appropriate
features for use when modelling text data.
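
The filter-based selection objectives above can be illustrated with a minimal sketch that scores each candidate feature by its correlation with the target and keeps those above a cutoff. Python is used here purely for illustration; the program itself works in R. The function names, the 0.5 threshold and the toy data are all hypothetical choices, not part of the program material.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def filter_select(features, target, corr=pearson, threshold=0.5):
    """Keep features whose absolute correlation with the target meets the threshold."""
    scores = {name: corr(col, target) for name, col in features.items()}
    kept = [name for name, s in scores.items() if abs(s) >= threshold]
    return kept, scores

# Hypothetical toy data.
target = [1, 2, 3, 4, 5]
features = {
    "age": [10, 20, 30, 40, 50],    # perfectly linear in the target
    "squared": [1, 4, 9, 16, 25],   # monotone but non-linear
    "noise": [3, 1, 4, 1, 5],       # weakly related
}
selected, scores = filter_select(features, target)
```

Passing `corr=spearman` scores monotone but non-linear relationships, such as the `squared` column, more favourably than Pearson does. A filter of this kind looks at one feature at a time, so it cannot detect features that are only useful in combination.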
Module 6: Model Development & Validation

• Understand how different business problems affect the decisions made about model
development and validation.
• Understand the difference between supervised, unsupervised and reinforcement
learning and identify examples of problems each would be applied to.
• Understand the difference between classification and regression problems and explain
the features of models that make them suitable/unsuitable for each type.
• Understand and explain the concepts of bias, variance and model complexity, the
bias-variance tradeoff, and the implications this holds for building robust models
• Understand the importance of using train, test & holdout data samples during modeling
and be able to apply this method appropriately to fit and validate a model. List
advantages, disadvantages and common pitfalls when using this method.
• Understand and apply the method of using cross validation during modeling. List
advantages, disadvantages and common pitfalls when using this method.
• Supervised learning - For each of the following techniques, understand when it is
appropriate (including advantages, disadvantages, and limitations), describe data
needed, apply the method to data, and interpret and describe the results.
o Decision Trees
o Generalized linear models (identity, Poisson, gamma, Tweedie, binomial and
shrinkage methods)
o Ensemble methods (bagging, boosting and blending, specifically gradient boosting
machines)

• Unsupervised learning - For each of the following techniques, understand when it is
needed, apply the method to data, and interpret and describe the results.
o K-means clustering
o Hierarchical clustering
• Advanced topics - For each of the following techniques, understand when it is
needed
o Instance based learning
o Support Vector Machines
o Bayesian Learning
o Additive models
o Topic modeling
o Neural networks
o Gaussian mixture models
o Genetic equation search
o Grid search
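
The cross-validation objective earlier in this module can be illustrated with a minimal sketch of k-fold validation: the data are split into k folds, and each fold is held out once while a model is fitted on the rest. Python is used here purely for illustration; the program itself works in R. The simple least-squares line, the function names and the toy data are hypothetical stand-ins for whatever model and data are actually being validated.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_validate(xs, ys, k=5, seed=0):
    """Return one held-out MSE per fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a + b * xs[i] for i in held_out]
        scores.append(mse([ys[i] for i in held_out], preds))
    return scores

# Hypothetical toy data: a noiseless line, so held-out error should be near zero.
xs = list(range(20))
ys = [2.0 * x + 1.0 for x in xs]
folds = k_fold_indices(len(xs), 5)
scores = cross_validate(xs, ys, k=5)
```

Every observation is used for validation exactly once, and the spread of the per-fold scores gives a rough sense of how stable the error estimate is. A common pitfall is performing transformation or feature selection on the full data before splitting, which leaks information from the held-out fold into the fit.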
