Data Oriented Thinking and Decision Making For Future Leaders

Overview

... knowledge, for them to take powerful decisions across the board, a recently developing, data-oriented way of thinking will help. While they may not have programming expertise, they should be able to create, set up and interact with the Data Science team. INSOFE, a pioneer in Data Science training for both engineers and managers, has developed a 2-day program that helps high performers and future leaders become data savvy and take the organization to the next level in data.

Day 1: Developing a Data-Driven Decision Maker's Mindset

Duration: 2 hours
Topics discussed: The types of problems where data-oriented approaches can provide disruptive results
Activities: The participants define some of the problems they face on a daily basis in the analytics context. Based on their intuition, they pick one problem for the next two sessions.

Duration: 2 hours
Topics discussed: Framework to prioritize problems using analytics and setting up the right teams
Activities: They assess the pain and gain of solving each problem and arrive at a priority.

Duration: 2 hours
Topics discussed: The steps involved in solving an analytics problem. The data and feature engineering
Activities: For the problem chosen, they define the single table view, engineer the features and come up with a broad solution architecture.

Duration: 2 hours
Topics discussed: A checklist of sensitive points that need to be managed carefully
Activities: They learn about estimating the data, understanding the errors, defining error metrics and building validation strategies.

Desired Outcomes: The participants will learn how to assimilate complex data quickly and handle a situation by breaking it down and systematically examining its implications.

Day 2: Framework for transforming data to insights (8 hours)

Duration: 2 hours
Topics discussed: Visualization for actionable insights. Standard techniques for creating extremely powerful and simple visualizations.
Activities: They design a suitable visualization for the problem they are working on.
Desired Outcomes: They also become hands-on with one of the most important aspects of analytics: visualization.

Duration: 1.5 hours each
Topics discussed: Recommendation Engine; Optimization; Fraud Detection; Time Series Models
Activities: Participants will learn important variants of data-oriented problems: recommendation engines, fraud detection and optimization.
Desired Outcomes: These sessions will enable the participants to understand and manage other non-standard data-oriented problems.

Topic

Planning and Thinking Skills for Architecting Data Science Solutions


10V's of data, Understanding Classification, Segmentation, Regression and Optimization (The general
Understanding Statistical (Discriminative and Generative), Non-Parametric (Instance Based and Iterati
Inductive Learning: Bias Versus Variance, Learning Curves
Goodness Metrics: MSE, MAPE, RMSE, Precision, Recall, Accuracy, F1 Statistic, ROC Curves
The Limitations of the Classical Models (Curse of Dimensionality, Non-Linearity, Velocity and Size)
The Latest Trends: Sub-Space, Spectral, Kernel and Neural Networks
Essential Engineering Skills in Big Data Analytics: Data preprocessing
Introduction to R, Binning, Standardization, Normalization
Type Conversion, Merging
Normal Curves, Central Tendency and Outlier Detection
Dimensionality Reduction: PCA, SVD approaches
Handling Missing Values (K-NN, MI, Clustering etc.)
Essential Engineering Skills in Big Data Analytics: Data Visualization
Data Exploration - Histograms, Bar Chart, Box Plot, Line Graph, Scatter Plot

Data Story Telling - The Science, ggplot, Bubble Charts with Multiple Dimensions, Gauge Charts, Treem

Fundamentals of Probability and Statistical Methods

Probabilities, joint and conditional probabilities, simulations and estimations. Introduction to Gaussian m

Data types, basic probabilities, Probability distributions (Discrete and Continuous) - Bernoulli, Binomia
Describing the relationship between attributes: Covariance; Correlation; Chi-Square
Special emphasis on Normal distribution; Central Limit Theorem
Inferential statistics: t, F and chi-square testing

Inferential statistics: How to learn about the population from a sample and vice versa; Sampling distrib
Hypothesis Testing
Statistics and Probability in Decision Modeling: Linear Regression

Approach: Model Estimation, MLE & Error Function, Optimization through Gradient Descent for finding
Constructing a Linear Regression, Diagnostics
Interpretation and Applications

Statistics and Probability in Decision Modeling: Logistic Regression


Why Linear Regression Fails and Logit Function

Approach: Model Estimation, MLE & Error Function, Optimization through Gradient Descent for finding
Constructing Logistic Regression, Diagnostics
Interpretation and Applications
Statistics and Probability in Decision Modeling: Time Series
Regression on Time
Modeling Seasonality as Deviation
Statistician's Approach: Components of a Time Series and Estimation Methods
Smoothing: Moving Average, Weighted Moving Average and Exponential Moving Average
Holt-Winters Method
Box-Jenkins and ARIMA
Methods and Algorithms in Machine Learning Supervised: Decision Trees
Rule Based Knowledge: Logic of Rules, Evaluating Rules, Rule Induction and Association Rules

Construction of Decision Trees through Simplified Examples; Choosing the "Best" attribute at each Non
Gain, Gini Index, Chi Square, Regression Trees

Generalizing Decision Trees; Information Content and Gain Ratio; Dealing with Numerical Variables; ot
Pruning a Decision Tree; Cost as a consideration; Unwrapping Trees as Rules
Oblique Decision Trees
Methods and Algorithms in Machine Learning Supervised: Instance based learning
K-NN method, Wilson editing and triangulation
K-NN in collaborative filtering, digit recognition
Methods and Algorithms in Machine Learning Supervised: Ensembles
Methods of Ensembling (Stacking, Mixture of Experts)
Bagging and Random Forest (Logic, Practical Applications)
AdaBoost
Gradient Boosting Machines
Methods and Algorithms in Machine Learning Supervised: Neural Networks
Motivation for Neural Networks and Its Applications
Perceptron and Single Layer Neural Network, and Hand Calculations
Learning in a Multi-Layered Neural Net: Back Propagation and Conjugate Gradient Techniques
Application of Neural Nets in Face and Digit Recognition
Deep Learning Basics: Restricted Boltzmann Machines
Self Organizing Maps

Methods and Algorithms in Machine Learning Unsupervised: Clustering


Concept of Distance and related math background
K-Means Clustering
Expectation Maximization
Hierarchical Clustering
Spectral and Kernel Clustering

Methods and Algorithms in Machine Learning Supervised: Support Vector Machines and H
Mathematical Intuition (Kernel Methods Revisited, Quadratic Optimization and Soft Constraints)
Applications and Interpretation
Optimization and Decision Analysis
Linear Programming (Graphical Explanation)
Dual form and Sensitivity
Problem Setting and Applications
Goal Programming
Quadratic Programming: Gradient, Hessian, Lagrangian
Application of Quadratic Programming: Portfolio Allocation
Evolutionary Search (Genetic Algorithms)
Practical Applications
Text Mining, Social Network Analysis and Natural Language Processing

Taming big text, Unstructured vs. Semi-structured Data; Fundamentals of information retrieval, Proper
Creating Term-Document (TxD) Matrices; Similarity measures, Low-level processes (Sentence Splitting
Stemming; Chunking)
Handling big graphs
The purpose of it all: Finding patterns in data
Finding patterns in text: Mahout, text mining, text as a graph
Engineering Big Data with Hadoop Ecosystem
Introduction to Big Data
Data center as a computer
Storing big bytes
Rapidly ingesting & organizing unstructured data
Your key tool: Split and Merge
Querying big data
Processing big data
Statistics and Probability in Decision Modeling: Naive Bayes
Fundamentals of Probability; Random Variables, Distributions, Conditional and Marginal Probability
Bayes' Theorem and Its Applications
Bayesian Belief Nets, MAP, Naive Rule and Naive Bayes
Applications of Naive Bayes in Text Mining, Spam Engines and Classifications
Total: Lecture Time 144 hours; Hands-on Lab Time 144 hours; 288 hours overall
