Predictive Analytics

Certificate Program
Learning Objectives

Module 1: What Is Predictive Analytics? & R Basics


• Identify the problem and assess whether it should be addressed with predictive
modeling. Understand the differences and similarities between predictive modeling and
traditional analysis techniques.
• Learn predictive modeling tools - layout and basic commands of R
• Learn predictive modeling tools - Practice writing basic R scripts and complete additional
suggested practice, if necessary

Module 2: Effective Problem Definition and Project Management


• Translate a vague question into one that can be analyzed with data, statistics and
machine learning to solve a business problem.
• Design use cases and evaluate/prioritize them based on available data and technology,
significance of business impact and/or implementation considerations
• Select and implement appropriate technology in order to efficiently utilize statistical and
machine learning techniques, taking into account problem objectives and
implementation constraints
• List and understand the importance of key principles in creating and managing a
predictive modeling team.

Module 3: Data Design, Transformation & Visualization


• Identify common data types: structured, unstructured and semi-structured
• Learn variable types and applicable terminology
• Identify and evaluate the quality (including common data problems) of appropriate data
sources for a problem
• Identify the types of regulatory, professional standard, and ethical issues surrounding
predictive modeling and data collection/use and where they apply to situations
• Introduce lapse, mortality and health datasets used for exercises
• Implement effective data design: time frame, sampling, granularity
• Use common data blending techniques, e.g. fuzzy matching
• Learn how, why and when to transform the data, using scaling, normalization,
standardization, binarization, encoding and imputation.
• Apply each technique using an example model
• Create and interpret histograms, bar charts and frequency plots
• Visualize data using one-way, two-way and box plots to identify potential errors, outliers
and trends in the data
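
The transformations named in the Module 3 objectives can be illustrated with a minimal sketch. The program itself works in R; Python with only the standard library is used here purely for illustration, and every function name and data value below is a hypothetical choice, not part of the course material.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range (assumes max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Centre to mean 0 and scale to unit population standard deviation.

    Assumes the values are not all identical (standard deviation > 0).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def binarize(values, threshold):
    """Map each value to 1 if it exceeds the threshold, otherwise 0."""
    return [1 if v > threshold else 0 for v in values]


def one_hot(labels):
    """Encode categorical labels as 0/1 indicator rows, one column per category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


# Hypothetical example data: one numeric column with a gap, one categorical column.
ages = [30, None, 50, 40]
filled = impute_mean(ages)
scaled = min_max_scale(filled)
z_scores = standardize(filled)
flags = binarize(filled, 35)
encoded = one_hot(["M", "F", "M"])
```

Which transformation is appropriate depends on the model: for example, distance-based methods are usually sensitive to scale, while tree-based methods generally are not.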

Module 4: Data Exploration


• Identify data issues by exploring one variable at a time to confirm that its distribution
is as expected and to detect any outliers
• Determine the significant relationships between two variables using scatter plots,
calculating correlations and investigating conditional means.
• Determine relationships between many variables and select material ones using
principal component analysis
• Determine relationships between many variables and select material ones using
independent component analysis
• Determine relationships between many variables and select material ones using singular
value decomposition
• Take appropriate action when results of data exploration deviate from what is expected
and apply judgment to resolve those differences
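
As a rough illustration of the two-variable exploration listed in Module 4, the sketch below computes a Pearson correlation and a set of conditional means. The program itself works in R; Python is used here only for illustration, and the function names and sample values are assumptions introduced for this example.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def conditional_means(groups, values):
    """Mean of `values` within each level of the grouping variable."""
    buckets = defaultdict(list)
    for group, value in zip(groups, values):
        buckets[group].append(value)
    return {group: sum(vs) / len(vs) for group, vs in buckets.items()}


# Hypothetical data: y is an exact multiple of x, so the correlation is close to 1.
r = pearson([1, 2, 3, 4], [2, 4, 6, 8])
means = conditional_means(["A", "B", "A", "B"], [10, 20, 30, 40])
```

A correlation near zero does not rule out a relationship; conditional means and scatter plots can reveal non-linear patterns that a single coefficient hides.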

Module 5: Feature Generation & Selection


• Define the term "feature" and understand how it differs from "variable"
• Use subject matter expertise and prior knowledge about the data to create features that
lead to more effective models.
• List the principles, advantages, disadvantages and limitations of using filter-based
selection techniques for tuning a data set to be used in modelling.
• Select appropriate features for a model using Pearson, Kendall and Spearman
correlation as selection criteria.
• Select appropriate features for a model using mutual information as a selection
criterion.
• Select appropriate features for a model using the chi-squared statistic as a selection
criterion.
• List the principles, advantages, disadvantages and limitations of using permutation-based
selection techniques for tuning a data set to be used in modelling.

• Apply concepts such as accuracy, precision and recall to select features to be used in
classification modelling problems.
• Apply concepts such as MSE, RSE and coefficient of determination to select features to
be used in regression modelling problems.
• List the principles, advantages, disadvantages and limitations of using algorithm-based
selection techniques for tuning a data set to be used in modelling.
• Use Ridge, Lasso, Elastic Net and tree-based methods to select appropriate features to
be used for modelling (detailed lessons in section 5).
• Text mining: Apply various text mining methods in order to generate appropriate
features for use when modelling text data.
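
The classification measures named in the Module 5 objectives (accuracy, precision and recall) can be computed from the four confusion-matrix counts. The sketch below is one minimal way to do so; the program itself works in R, Python is used here only for illustration, and the function name, the `positive` parameter and the sample labels are hypothetical.

```python
def classification_metrics(actual, predicted, positive=1):
    """Return accuracy, precision and recall for a binary classification."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Precision: share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: share of true positives that the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels: 2 true positives, 1 false positive,
# 1 false negative and 2 true negatives.
actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
metrics = classification_metrics(actual, predicted)
```

In a permutation-based selection scheme, one would typically recompute such a metric after shuffling a single feature and treat the drop in the metric as that feature's importance.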

Module 6: Model Development & Validation


• Understand how different business problems affect the decisions made about model
development and validation.
• Understand the difference between supervised, unsupervised and reinforcement
learning and identify examples of problems each would be applied to.
• Understand the difference between classification and regression problems and explain
the features of models that make them suitable/unsuitable for each type.
• Understand and explain the concepts of bias, variance and model complexity, the
bias-variance tradeoff, and the implications these hold for building robust models
• Understand the importance of using train, test & holdout data samples during modeling
and be able to apply this method appropriately to fit and validate a model. List
advantages, disadvantages and common pitfalls when using this method.
• Understand and apply the method of using cross validation during modeling. List
advantages, disadvantages and common pitfalls when using this method.
• Supervised learning - For each of the following techniques, understand when it is
appropriate (including advantages, disadvantages, and limitations), describe data
needed, apply the method to data, and interpret and describe the results.
o Decision Trees
o Generalized linear models (identity, Poisson, gamma, Tweedie, binomial and
shrinkage methods)
o Ensemble methods (bagging, boosting and blending, specifically gradient boosting
machines)

• Unsupervised learning - For each of the following techniques, understand when it is
appropriate (including advantages, disadvantages, and limitations), describe data
needed, apply the method to data, and interpret and describe the results.
o K-means clustering
o Hierarchical clustering
• Advanced topics - For each of the following techniques, understand when it is
appropriate (including advantages, disadvantages, and limitations), describe data
needed
o Instance based learning
o Support Vector Machines
o Bayesian Learning
o Additive models
o Topic modeling
o Neural networks
o Gaussian mixture models
o Genetic equation search
o Grid search
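
The cross-validation objective in Module 6 can be made concrete with a small sketch of how k folds partition a data set. The program itself works in R; Python is used here only for illustration, and the function name and the choice of 10 samples and 3 folds are assumptions. The sketch splits indices in their original order; in practice the rows are usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds.

    Every sample lands in exactly one validation fold; fold sizes differ
    by at most one when n_samples is not divisible by k.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Hypothetical example: 10 samples split into 3 folds of sizes 4, 3 and 3.
folds = list(k_fold_indices(10, 3))
```

A model would be fitted on each `train` set and scored on the matching `validation` set, with the k scores averaged to estimate out-of-sample performance.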
