Under Supervision Of
Prof. Dr. P. K. Das
IIFT (Kolkata Campus)
Presenter
Bidyut Kumar Mondal
Roll 5
MBA (IB) 2012 - 15
Agenda
Background
Objectives
Perspective on Big Data
Repository of Analytical Tools
Repository of Big Data Techniques
An Application to Credit Risk Modeling
Binary Logistic Regression
Research Methodology
Conclusion
Background
Recent trends towards a data-driven industry
Organizations utilizing the power of big data are ahead of the competition.
Objectives
Area
Statistical Method
Data Requirement
Market Segmentation
Cluster Analysis
Purchase Intention
Factor Analysis
Churn Analysis
Group belongingness
Discriminant Analysis
Regression Analysis
ANOVA
Multidimensional scaling
Demand Forecasting
Conjoint Analysis
Quality Control
Hypotheses Testing
Regression Analysis
Data Pattern
Business Area
Analysis Tool
Customer activity-based data, such as website tracking history, purchase data, call centre data, and mobile data
Predictive Analysis
Segmentation
Cluster Analysis
Predictive Analysis
Digital Marketing
Factor Analysis
Predictive Analysis
Purchase Intention
Predictive Analysis
Churn Analysis
Predictive Analysis
Credit Default Modelling
Predictive Analysis
Agriculture
Discriminant Analysis
Predictive Analysis
Logistics
Discriminant Analysis
Predictive Analysis
Retail
Classification Techniques
Predictive Analysis
Retail
Regression Analysis
Predictive Analysis
Healthcare
Predictive Analysis
CRM
Multiple Regression
Predictive Analysis
Marketing (Cross Sell)
Multiple Regression
Retail (Inventory Requirement)
User online profile data and online purchase history and patterns
Customer footprints on the network: clicks, browsing, comments, reviews, etc.
Customer product/service usage pattern data and customer demographic data
Bank and financial institution data on loans and their current status, along with customer demographics
Historical data from GPS trackers on consignments in shipment, covering location and condition
Data on customer buying and clicking patterns during different cultural festivals, from online retail websites
Customer purchase data, where customers are provided with facilities such as a bonus card
Patient health data and disease track records
Historical data on marketing expenses and the demand in the same period, over several years
Customers' spending, usage, and other behaviour exhibited in a retail shop
Predictive Analysis
Predictive Analysis
Finance
Regression Analysis.
Predictive Analysis
Economics
Documents and their keywords, captured when the documents are uploaded to a website
Predictive Analysis
Web Publishing
Discriminant Analysis
Logistics
Online Retail
Retail
Healthcare
Lowering NPA
Increase Customer Base
Odds Ratio
Research Methodology
Data Collection
Data Cleaning
Data Coding
Binary Logistic Regression
ROC Analysis & Model Selection
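As a rough illustration of the last two steps, a binary logistic regression maps a linear score to a default probability, and a cutoff turns that probability into a predicted class. The sketch below is not the fitted model from this study: the coefficient values, the applicant record, and the helper names `default_probability` and `predict_default` are all hypothetical, chosen only to show the mechanics.

```python
import math

def default_probability(intercept, coefs, features):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    score = intercept + sum(coefs[name] * features[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-score))

def predict_default(probability, cutoff=0.5):
    """Return 1 (defaulter) when the probability reaches the cutoff, else 0."""
    return 1 if probability >= cutoff else 0

# Hypothetical coefficients and applicant, for illustration only.
coefs = {"int_rate": 0.2, "dti": 0.05}
applicant = {"int_rate": 12.0, "dti": 18.0}

p = default_probability(-4.0, coefs, applicant)
print(round(p, 3), predict_default(p), predict_default(p, cutoff=0.3))
```

Lowering the cutoff flags more applicants as defaulters, which is why the later steps evaluate the model at several cutoffs rather than at 0.5 alone.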
Data
Raw Data (3,91,000)
Debt Consolidation
Credit Card
Home Loan
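Model 1
Classification table, Step 11 (the cut value is .500)

Selected Cases
Observed is_defaulter | Predicted 0 | Predicted 1 | Percentage Correct
0 | 46085 | 825 | 98.2
1 | 2311 | 3053 | 56.9
Overall Percentage | | | 94.0

Unselected Cases
Observed is_defaulter | Predicted 0 | Predicted 1 | Percentage Correct
0 | 42247 | 796 | 98.2
1 | 2057 | 2626 | 56.1
Overall Percentage | | | 94.0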
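The percentages in the Model 1 classification table follow directly from its four cell counts. The sketch below recomputes them for the selected cases. The function name `percent_correct` and the nested-dictionary layout are illustrative choices, not part of the original analysis. If is_defaulter = 1 is taken as the positive class, the per-class figures are the specificity (class 0) and sensitivity (class 1) at the .500 cut value.

```python
def percent_correct(table):
    """table[observed][predicted] holds case counts for classes 0 and 1."""
    per_class = {
        observed: 100.0 * row[observed] / sum(row.values())
        for observed, row in table.items()
    }
    total = sum(sum(row.values()) for row in table.values())
    hits = sum(row[observed] for observed, row in table.items())
    return per_class, 100.0 * hits / total

# Selected cases for Model 1, as reported at the .500 cut value.
selected = {0: {0: 46085, 1: 825}, 1: {0: 2311, 1: 3053}}

per_class, overall = percent_correct(selected)
print({k: round(v, 1) for k, v in per_class.items()}, round(overall, 1))
# prints: {0: 98.2, 1: 56.9} 94.0
```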
Model 2
Classification table, Step 11 (the cut value is .500)

Selected Cases
Observed is_defaulter | Predicted 0 | Predicted 1 | Percentage Correct
0 | 46012 | 898 | 98.1
1 | 2658 | 2706 | 50.4
Overall Percentage | | | 93.2

Unselected Cases
Observed is_defaulter | Predicted 0 | Predicted 1 | Percentage Correct
0 | 42246 | 797 | 98.1
1 | 2342 | 2341 | 50.0
Overall Percentage | | | 93.4
Model 1
Cutoff | TP | TN | FP | FN | Sensitivity | Specificity | 1 - Specificity
0.1 | 40473 | 4822 | 6437 | 542 | 0.99 | 0.43 | 0.57
0.2 | 43423 | 4370 | 3487 | 994 | 0.98 | 0.56 | 0.44
0.3 | 44843 | 3953 | 2067 | 1411 | 0.97 | 0.66 | 0.34
0.4 | 45624 | 3490 | 1286 | 1874 | 0.96 | 0.73 | 0.27
0.5 | 46085 | 3053 | 825 | 2311 | 0.95 | 0.79 | 0.21
0.6 | 46402 | 2631 | 508 | 2733 | 0.94 | 0.84 | 0.16
0.7 | 46609 | 2141 | 301 | 3223 | 0.94 | 0.88 | 0.12
0.8 | 46772 | 1651 | 138 | 3713 | 0.93 | 0.92 | 0.08
0.9 | 46864 | 1074 | 46 | 4290 | 0.92 | 0.96 | 0.04
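The rate columns for Model 1 can be reproduced from the four count columns using Sensitivity = TP / (TP + FN) and Specificity = TN / (TN + FP). The sketch below does this with the counts exactly as they are labelled on the slide. The slide does not state which class is treated as positive, so how these labels map onto the cells of the classification table is an assumption left unverified here. The names `MODEL_1` and `rates` are illustrative.

```python
# (cutoff, TP, TN, FP, FN), using the labels as given on the slide for Model 1.
MODEL_1 = [
    (0.1, 40473, 4822, 6437, 542),
    (0.2, 43423, 4370, 3487, 994),
    (0.3, 44843, 3953, 2067, 1411),
    (0.4, 45624, 3490, 1286, 1874),
    (0.5, 46085, 3053, 825, 2311),
    (0.6, 46402, 2631, 508, 2733),
    (0.7, 46609, 2141, 301, 3223),
    (0.8, 46772, 1651, 138, 3713),
    (0.9, 46864, 1074, 46, 4290),
]

def rates(tp, tn, fp, fn):
    """Return (sensitivity, specificity) for one row of counts."""
    return tp / (tp + fn), tn / (tn + fp)

for cutoff, tp, tn, fp, fn in MODEL_1:
    sens, spec = rates(tp, tn, fp, fn)
    # Each line gives one point (1 - specificity, sensitivity) of the ROC curve.
    print(f"{cutoff:.1f}  {sens:.2f}  {spec:.2f}  {1 - spec:.2f}")
```

Every row sums to the same 52,274 cases, which is a quick consistency check on the extracted counts.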
Model 2
Cutoff | TP | TN | FP | FN | Sensitivity | Specificity | 1 - Specificity
0.1 | 39834 | 4771 | 7076 | 593 | 0.99 | 0.40 | 0.60
0.2 | 43216 | 4236 | 3694 | 1128 | 0.97 | 0.53 | 0.47
0.3 | 44667 | 3658 | 2243 | 1706 | 0.96 | 0.62 | 0.38
0.4 | 45466 | 3171 | 1444 | 2193 | 0.95 | 0.69 | 0.31
0.5 | 46012 | 2706 | 898 | 2658 | 0.95 | 0.75 | 0.25
0.6 | 46362 | 2246 | 548 | 3118 | 0.94 | 0.80 | 0.20
0.7 | 46586 | 1815 | 324 | 3549 | 0.93 | 0.85 | 0.15
0.8 | 46746 | 1374 | 164 | 3990 | 0.92 | 0.89 | 0.11
0.9 | 46869 | 881 | 41 | 4483 | 0.91 | 0.96 | 0.04
Model 1 Wins

Variable | B | S.E. | Wald | Sig. | Exp(B) | Lower | Upper
annual_inc | | | 3184.977 | 0.0 | | |
delinq_2yrs | 0.708 | 0.089 | 62.6 | 0.0 | 2.03 | 1.704 | 2.419
dti | 0.278 | 0.005 | 3643.476 | 0.0 | 1.32 | 1.15 | 1.564
emp_length_year | -0.027 | 0.007 | 14.967 | 0.0 | 0.973 | 0.96 | 0.987
funded_amnt | 0.004 | | 6074.822 | 0.0 | 1.004 | 1.004 | 1.004
funded_amnt_inv | -0.004 | | 6165.636 | 0.0 | 0.996 | 0.996 | 0.996
inq_last_6mths | -1.077 | 0.032 | 1128.505 | 0.0 | 0.341 | 0.32 | 0.363
int_rate | 1.597 | 0.719 | 901.239 | 0.0 | 4.9382 | 4.8201 | 5.101
mths_since_last_delinq | 0.031 | 0.001 | 492.118 | 0.0 | 1.031 | 1.029 | 1.034
term_months | -0.11 | 0.005 | 532.142 | 0.0 | 0.896 | 0.888 | 0.905
total_rec_late_fee | 5.256 | 0.104 | 2538.213 | 0.0 | 191.686 | 156.24 | 235.175
Constant | 10.511 | 0.238 | 1951.251 | 0.0 | 36735.161 | |
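The Exp(B) column is the odds ratio, obtained by exponentiating the coefficient B. The sketch below recomputes it for three of the reported variables. It also computes an approximate 95% Wald interval, exp(B +/- 1.96 x S.E.), on the assumption that this is how the Lower and Upper columns were produced (the slide does not say). The helper name `odds_ratio` is illustrative.

```python
import math

def odds_ratio(b, se, z=1.96):
    """Return Exp(B) with an approximate Wald interval exp(B +/- z * S.E.)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# (B, S.E.) pairs as reported for three of the variables.
coefficients = {
    "delinq_2yrs": (0.708, 0.089),
    "inq_last_6mths": (-1.077, 0.032),
    "total_rec_late_fee": (5.256, 0.104),
}

for name, (b, se) in coefficients.items():
    exp_b, lower, upper = odds_ratio(b, se)
    print(f"{name}: Exp(B)={exp_b:.3f}, interval=({lower:.3f}, {upper:.3f})")
```

Read this way, and assuming the model predicts is_defaulter = 1, an Exp(B) of about 2.03 for delinq_2yrs means each additional unit roughly doubles the modelled odds of default with the other variables held fixed, while a value below 1, such as 0.341 for inq_last_6mths, lowers them.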
Conclusion
Organizations that can interpret big data and convert it into actionable information will outperform their competitors.
Google, Amazon, Microsoft, IBM, DHL, and P&G are some of the organizations currently leading in big data analytics.
How big data analytics and its strengths are used in an organization depends on the organization's culture.