
Decision Making using Big Data Analytics in International Business

Under Supervision Of
Prof. Dr. P. K. Das
IIFT (Kolkata Campus)

Presenter
Bidyut Kumar Mondal
Roll 5
MBA (IB) 2012 - 15

Agenda
1. Background
2. Objectives
3. Perspective on Big Data
4. Repository of Analytical Tools
5. Repository of Big Data Techniques
6. An Application to Credit Risk Modelling
7. Binary Logistic Regression
8. Research Methodology
9. Results & Interpretation
10. Conclusion

Background
Recent trends towards a data-driven industry
A huge volume of data is generated every day.
So big data analytics came into existence.
The issue is how to store and analyze the data to extract information.
Organizations utilizing the power of big data are ahead of the competition.
Big data will change the way people live.

Objectives
To develop a repository of analytical tools appropriate for real-life problem solving in different sectors.
To study the use of big data analytics in different domains for making international business decisions.
To apply appropriate classification techniques to build a model that classifies loan defaulters using secondary big data.

Big Data: A Perspective

Repository of Analytical Techniques

Item | Area | Statistical Method | Data Requirement
1 | Market Segmentation | Cluster Analysis | Buy and sell data over a long period
2 | Purchase Intention | Factor Analysis | Survey ratings of each product attribute
3 | Churn Analysis | Binary Logistic Regression | Instances of customers who left and who stayed with the service/organization
4 | Credit Default Probability | Binary Logistic Regression | Data containing instances of both default and non-default
5 | Group Belongingness | Discriminant Analysis | Data containing instances of persons who belong to a group and who do not
6 | Probability of Disease of a Group | Binary Logistic Regression | Data containing instances of persons who have the disease and who do not
7 | Calculate Price Elasticity | Regression Analysis | Price of a product at different times and sales of the product at those times
8 | Calculate Productivity of Employees | ANOVA | Employee output data under different work conditions
9 | Find out Brand/Product Positioning | Multidimensional Scaling | Customer similarity ratings for each pair of products or brands on a 7-point Likert scale
10 | Lost Sales Analysis | Binary Logistic Regression | Data containing instances of bids that resulted in sales and bids that were not converted into sales
11 | Demand Forecasting | Time Series Forecasting | Historical demand data covering more than 10 years
12 | New Product Design | Conjoint Analysis | Preference rank data for each product attribute, collected from respondents
13 | Quality Control | Hypothesis Testing | A random sample drawn from the production floor, on which a Z test or t test is applied
14 | Customer Loyalty Analysis | Regression Analysis | Data collected from sample respondents on how satisfied they are with the product and how long they have been buying it

Repository of Big Data Techniques

Item | Data Pattern | Big Data Analysis | Business Area | Analysis Tool
1 | Customer activity-based data such as website tracking history, purchase data, call centre data and mobile data | Predictive Analysis | Segmentation | Cluster Analysis
2 | User online profile data and their online purchase history and pattern | Predictive Analysis | Digital Marketing | Factor Analysis
3 | Customers' footprints in the network: clicks, browsing, comments, reviews etc. | Predictive Analysis | Purchase Intention | Binary Logistic Regression
4 | Customer product/service usage pattern data and customer demographic data | Predictive Analysis | Churn Analysis | Binary Logistic Regression
5 | Bank and financial institution data about loans and their current status, along with customer demographics | Predictive Analysis | Credit Default Modelling | Binary Logistic Regression
6 | Historical health parameter data of animals on a dairy farm | Predictive Analysis | Agriculture | Discriminant Analysis
7 | Historical data received from GPS trackers on consignments in shipment, covering location and condition | Predictive Analysis | Logistics | Discriminant Analysis
8 | Data on customer buying and clicking patterns during different cultural festivals, from an online retail website | Predictive Analysis | Retail | Classification Techniques
9 | Customer purchase data where customers are provided with facilities such as a bonus card | Predictive Analysis | Retail | Regression Analysis
10 | Patient health data and their disease track record | Predictive Analysis | Healthcare | Binary Logistic Regression
11 | Historical data on marketing expenses and the demand in that period, over several years | Predictive Analysis | CRM | Multiple Regression
12 | Customers' spending, usage and other behaviour exhibited in a retail shop | Predictive Analysis | Marketing (Cross Sell) | Multiple Regression
13 | Historical demand data at store level and inventory level | Predictive Analysis | Retail (Inventory Requirement) | Time Series Forecasting
14 | Historical data on the risk and return of a portfolio | Predictive Analysis | Finance | Regression Analysis
15 | Historical unemployment data of a country | Predictive Analysis | Economics | Time Series Forecasting
16 | Documents and their keywords, captured when the documents are uploaded to an online website | Predictive Analysis | Web Publishing | Discriminant Analysis

Big Data Case Studies

Sector | Organization | Case
Agriculture | Texan Dairy | Cattle Health
Logistics | DHL | Predictive Analysis
Online Retail | Amazon | Predictive Shipment
Retail | Walmart | Customer Loyalty
Healthcare | CCHHS | Disease Prediction

An Application to Credit Risk Modelling
Is it possible to predict, before sanctioning a loan, whether a customer is likely to default?
Lower NPA
Increase customer base

Binary Logistic Regression Variables

Dependent variable: dichotomous
Independent variables: categorical or numerical
Categorical independent variables need coding
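
The coding of a categorical predictor can be illustrated with a minimal sketch. It assumes plain dummy (indicator) coding against a reference category; the deck does not say which coding scheme was used. The variable `home_ownership`, its levels and the helper `dummy_code` are hypothetical and chosen only for illustration.

```python
def dummy_code(values, reference):
    """Turn one categorical column into 0/1 indicator columns.

    The reference level gets no column of its own: a row that belongs
    to it is coded as 0 in every indicator.
    """
    levels = sorted(set(values) - {reference})
    return [{f"is_{level}": int(value == level) for level in levels}
            for value in values]


# Hypothetical categorical predictor with three levels.
home_ownership = ["RENT", "OWN", "MORTGAGE", "RENT"]
coded = dummy_code(home_ownership, reference="RENT")

for original, row in zip(home_ownership, coded):
    print(original, row)
```

A predictor with k levels yields k - 1 indicator columns, so each coefficient is read relative to the reference level.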

Binary Logistic Regression Assumptions

Logistic regression does not rely on distributional assumptions in the same sense that discriminant analysis does. However, your solution may be more stable if your predictors have a multivariate normal distribution.
Additionally, as with other forms of regression, multicollinearity among the predictors can lead to inflated standard errors.
The procedure is most effective when group membership is a truly categorical variable.
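
One common screen for multicollinearity is the variance inflation factor (VIF). The sketch below is a simplified illustration for the two-predictor case, where the VIF reduces to 1 / (1 - r^2) with r the correlation between the two predictors. The column names are borrowed from the loan data used later in the deck, but every number here is hypothetical, and the deck does not state that a VIF check was actually performed.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(xs, ys):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical values: two nearly identical loan-amount columns
# and an unrelated income column.
funded_amnt = [5000, 10000, 12000, 20000, 25000, 30000]
funded_amnt_inv = [4900, 9800, 12000, 19500, 24800, 29900]
annual_inc = [80000, 30000, 55000, 42000, 95000, 38000]

vif_collinear = vif_two_predictors(funded_amnt, funded_amnt_inv)
vif_unrelated = vif_two_predictors(funded_amnt, annual_inc)
print(vif_collinear, vif_unrelated)
```

A frequently quoted rule of thumb treats a VIF above roughly 5 to 10 as a warning sign; with more than two predictors the VIF has to come from regressing each predictor on all the others.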

Binary Logistic Regression - Odds

Odds Ratio

Log of odds Ratio
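
The formulas on this slide did not survive extraction. As a reminder, the standard definitions are given here, written with assumed symbols: p is the probability of default, x_1, ..., x_k are the predictors and the betas are the regression coefficients. This is the textbook formulation, not necessarily the exact notation of the original slide.

```latex
\[
\text{odds} = \frac{p}{1-p},
\qquad
\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,
\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
\]
\[
\text{odds ratio for a one-unit increase in } x_j
= \frac{\text{odds}(x_j + 1)}{\text{odds}(x_j)} = e^{\beta_j}
\]
```

The second line is why statistical output reports Exp(B): it is the multiplicative change in the odds for a one-unit increase in the predictor, holding the others fixed.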

Research Methodology
1. Data Collection
2. Data Cleaning
3. Data Coding
4. Binary Logistic Regression
5. ROC Analysis & Model Selection
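
To make the binary logistic regression step concrete, here is a minimal sketch that fits a one-predictor model by batch gradient descent in plain Python. The deck does not state the estimation procedure or software used, so this is only an illustration: the toy debt-to-income values, the labels, the learning rate and the number of epochs are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, learning_rate=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the log-loss w.r.t. the log-odds
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict_proba(x, weights, bias):
    """Predicted probability of the positive class (defaulter = 1)."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical toy data: one scaled debt-to-income ratio per applicant.
dti_scaled = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
is_defaulter = [0, 0, 0, 1, 0, 1, 1, 1]

weights, bias = fit_logistic(dti_scaled, is_defaulter)
low_risk = predict_proba([0.1], weights, bias)
high_risk = predict_proba([0.9], weights, bias)
print(weights, bias, low_risk, high_risk)
```

Applying a cutoff such as 0.5 to the predicted probabilities gives the 0/1 classification that the ROC analysis step then evaluates across several cutoffs.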

Data
Raw Data (3,91,000)
Segments: Debt Consolidation, Credit Card, Home Loan

Binary Logistic Regression Variables

Results & Interpretation: Credit Card Segment

Model 1 (Step 11)
Observed is_defaulter | Selected cases: predicted 0 | Selected cases: predicted 1 | Selected cases: percentage correct | Unselected cases: predicted 0 | Unselected cases: predicted 1 | Unselected cases: percentage correct
0 | 46085 | 825 | 98.2 | 42247 | 796 | 98.2
1 | 2311 | 3053 | 56.9 | 2057 | 2626 | 56.1
Overall percentage | | | 94.00 | | | 94.00
a. The cut value is .500

Model 2 (Step 11)
Observed is_defaulter | Selected cases: predicted 0 | Selected cases: predicted 1 | Selected cases: percentage correct | Unselected cases: predicted 0 | Unselected cases: predicted 1 | Unselected cases: percentage correct
0 | 46012 | 898 | 98.1 | 42246 | 797 | 98.1
1 | 2658 | 2706 | 50.4 | 2342 | 2341 | 50
Overall percentage | | | 93.2 | | | 93.4
a. The cut value is .500
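
The percentage-correct figures in the classification tables follow directly from the four cell counts. A minimal sketch using the selected-case counts of both models; the function and argument names are hypothetical, and cells are named by observed and predicted class to avoid any ambiguity about which class counts as "positive".

```python
def classification_summary(obs0_pred0, obs0_pred1, obs1_pred0, obs1_pred1):
    """Return (% correct for class 0, % correct for class 1, overall %)."""
    pct_class0 = 100.0 * obs0_pred0 / (obs0_pred0 + obs0_pred1)
    pct_class1 = 100.0 * obs1_pred1 / (obs1_pred0 + obs1_pred1)
    total = obs0_pred0 + obs0_pred1 + obs1_pred0 + obs1_pred1
    pct_overall = 100.0 * (obs0_pred0 + obs1_pred1) / total
    return round(pct_class0, 1), round(pct_class1, 1), round(pct_overall, 1)


# Selected cases at the 0.5 cut value, as reported for the two models.
model1 = classification_summary(46085, 825, 2311, 3053)
model2 = classification_summary(46012, 898, 2658, 2706)

print(model1)  # (98.2, 56.9, 94.0)
print(model2)  # (98.1, 50.4, 93.2)
```

Both models classify non-defaulters almost perfectly; the difference lies in the defaulter row, where Model 1 identifies a larger share of actual defaulters.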

ROC Curve - Model 1

Cutoff | TP | TN | FP | FN | Sensitivity | Specificity | 1 - Specificity
0.1 | 40473 | 4822 | 6437 | 542 | 0.99 | 0.43 | 0.57
0.2 | 43423 | 4370 | 3487 | 994 | 0.98 | 0.56 | 0.44
0.3 | 44843 | 3953 | 2067 | 1411 | 0.97 | 0.66 | 0.34
0.4 | 45624 | 3490 | 1286 | 1874 | 0.96 | 0.73 | 0.27
0.5 | 46085 | 3053 | 825 | 2311 | 0.95 | 0.79 | 0.21
0.6 | 46402 | 2631 | 508 | 2733 | 0.94 | 0.84 | 0.16
0.7 | 46609 | 2141 | 301 | 3223 | 0.94 | 0.88 | 0.12
0.8 | 46772 | 1651 | 138 | 3713 | 0.93 | 0.92 | 0.08
0.9 | 46864 | 1074 | 46 | 4290 | 0.92 | 0.96 | 0.04

ROC Curve - Model 2

Cutoff | TP | TN | FP | FN | Sensitivity | Specificity | 1 - Specificity
0.1 | 39834 | 4771 | 7076 | 593 | 0.99 | 0.40 | 0.60
0.2 | 43216 | 4236 | 3694 | 1128 | 0.97 | 0.53 | 0.47
0.3 | 44667 | 3658 | 2243 | 1706 | 0.96 | 0.62 | 0.38
0.4 | 45466 | 3171 | 1444 | 2193 | 0.95 | 0.69 | 0.31
0.5 | 46012 | 2706 | 898 | 2658 | 0.95 | 0.75 | 0.25
0.6 | 46362 | 2246 | 548 | 3118 | 0.94 | 0.80 | 0.20
0.7 | 46586 | 1815 | 324 | 3549 | 0.93 | 0.85 | 0.15
0.8 | 46746 | 1374 | 164 | 3990 | 0.92 | 0.89 | 0.11
0.9 | 46869 | 881 | 41 | 4483 | 0.91 | 0.96 | 0.04

ROC Curve Credit Card Segment

Model1 Wins
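
The comparison can be cross-checked from the cutoff counts. The sketch below is an illustration under stated assumptions, not the procedure used in the deck: it treats the defaulter class (is_defaulter = 1) as positive, computes the conventional true positive rate and false positive rate at each cutoff, and approximates the area under each curve with the trapezoid rule after adding the end points (0, 0) and (1, 1). Cells are named by observed and predicted class; because of this convention the rates computed here are not the same quantities as the ratio columns tabulated in the deck.

```python
# One tuple per cutoff 0.1 ... 0.9, in the column order of the cutoff
# tables: (obs0_pred0, obs1_pred1, obs0_pred1, obs1_pred0).
MODEL1 = [
    (40473, 4822, 6437, 542), (43423, 4370, 3487, 994),
    (44843, 3953, 2067, 1411), (45624, 3490, 1286, 1874),
    (46085, 3053, 825, 2311), (46402, 2631, 508, 2733),
    (46609, 2141, 301, 3223), (46772, 1651, 138, 3713),
    (46864, 1074, 46, 4290),
]
MODEL2 = [
    (39834, 4771, 7076, 593), (43216, 4236, 3694, 1128),
    (44667, 3658, 2243, 1706), (45466, 3171, 1444, 2193),
    (46012, 2706, 898, 2658), (46362, 2246, 548, 3118),
    (46586, 1815, 324, 3549), (46746, 1374, 164, 3990),
    (46869, 881, 41, 4483),
]


def roc_points(rows):
    """Return (false positive rate, true positive rate) pairs, sorted."""
    points = [(0.0, 0.0), (1.0, 1.0)]  # assumed curve end points
    for obs0_pred0, obs1_pred1, obs0_pred1, obs1_pred0 in rows:
        fpr = obs0_pred1 / (obs0_pred1 + obs0_pred0)
        tpr = obs1_pred1 / (obs1_pred1 + obs1_pred0)
        points.append((fpr, tpr))
    return sorted(points)


def trapezoid_auc(points):
    """Approximate the area under the curve with the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


auc_model1 = trapezoid_auc(roc_points(MODEL1))
auc_model2 = trapezoid_auc(roc_points(MODEL2))
print(auc_model1, auc_model2)
```

With only nine cutoffs the areas are coarse approximations, but Model 1 comes out with the larger area, which agrees with the conclusion on the slide.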

Model 1: Credit Card Segment

Variable | B | S.E. | Wald | Sig. | Exp(B) | 95% C.I. for Exp(B): Lower | 95% C.I. for Exp(B): Upper
annual_inc | | | 3184.977 | 0.0 | | |
delinq_2yrs | 0.708 | 0.089 | 62.6 | 0.0 | 2.03 | 1.704 | 2.419
dti | 0.278 | 0.005 | 3643.476 | 0.0 | 1.32 | 1.15 | 1.564
emp_length_year | -0.027 | 0.007 | 14.967 | 0.0 | 0.973 | 0.96 | 0.987
funded_amnt | 0.004 | | 6074.822 | 0.0 | 1.004 | 1.004 | 1.004
funded_amnt_inv | -0.004 | | 6165.636 | 0.0 | 0.996 | 0.996 | 0.996
inq_last_6mths | -1.077 | 0.032 | 1128.505 | 0.0 | 0.341 | 0.32 | 0.363
int_rate | 1.597 | 0.719 | 901.239 | 0.0 | 4.9382 | 4.8201 | 5.101
mths_since_last_delinq | 0.031 | 0.001 | 492.118 | 0.0 | 1.031 | 1.029 | 1.034
term_months | -0.11 | 0.005 | 532.142 | 0.0 | 0.896 | 0.888 | 0.905
total_rec_late_fee | 5.256 | 0.104 | 2538.213 | 0.0 | 191.686 | 156.24 | 235.175
Constant | 10.511 | 0.238 | 1951.251 | 0.0 | 36735.161 | |
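
The Exp(B) column and its confidence interval can be recovered from a coefficient and its standard error. A minimal sketch, assuming the usual Wald interval exp(B ± 1.96 × S.E.); the helper name is hypothetical, and the inputs are the delinq_2yrs and total_rec_late_fee values reported for Model 1. Because the reported B and S.E. are rounded to three decimals, the results agree with the reported figures only approximately.

```python
import math


def odds_ratio_with_ci(b, se, z=1.96):
    """Return (Exp(B), lower, upper) for a Wald confidence interval."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# Coefficients and standard errors as reported for Model 1.
delinq = odds_ratio_with_ci(b=0.708, se=0.089)
late_fee = odds_ratio_with_ci(b=5.256, se=0.104)

print("delinq_2yrs:", [round(v, 3) for v in delinq])
print("total_rec_late_fee:", [round(v, 3) for v in late_fee])
```

Read this way, each additional delinquency in the past two years roughly doubles the odds of the modelled outcome, other predictors held fixed.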

Conclusion
Organizations that interpret big data and convert it into actionable information will outperform their competitors.

Google, Amazon, Microsoft, IBM, DHL and P&G are some of the organizations leading big data analytics in the current market.

How an organization uses big data analytics and its strengths depends on its organizational culture.

Challenges: data collection, technical issues, expertise

Threats: individual privacy; need for government regulation and monitoring
