Prediction For Subscription in Bank and Best Segment For Launch of New Product

The following data is from a direct marketing campaign of a bank. The goal of the campaign was to get the customer to subscribe to a term deposit. The marketing campaign consisted of making multiple phone calls to the customer.
Below is a description of the explanatory variables available in the data.
Bank client data

1 Age
2 Job: "admin.", "unknown", "unemployed", "management", "housemaid", "entrepreneur", "student", "blue-collar", "self-employed", "retired", "technician", "services"
3 Marital: "married", "divorced", "single"; note: "divorced" means divorced or widowed
4 Education: "unknown", "secondary", "primary", "tertiary"
5 Default: has credit in default (yes, no)
6 Balance: average yearly balance, in euros (numeric)
7 Housing: has housing loan (yes, no)
8 Loan: has personal loan (yes, no)
9 Contact: contact communication type (categorical: "unknown", "telephone", "cellular")
10 Day: last contact day of the month (numeric)
11 Month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
12 Duration: last contact duration, in seconds (numeric)
13 Campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
14 Pdays: number of days that passed after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)
15 Previous: number of contacts performed before this campaign and for this client (numeric)
16 Poutcome: outcome of the previous marketing campaign (categorical: "unknown", "other", "failure", "success")
17 Dependent variable (Y): has the client subscribed to a term deposit (yes, no)
1. Use Logistic regression to build a model.
The SAS System
The LOGISTIC Procedure
Model Information
Data Set WORK.BANK
Response Variable y
Number of Response Levels 2
Model binary logit
Optimization Technique Fisher's scoring
Number of Observations Read 4521

Number of Observations Used 4521
Response Profile
Ordered Value   y   Total Frequency
1 yes 521
2 no 4000
Probability modeled is y='yes'.
Class Level Information

Class Value Design Variables
job admin. 1 0 0 0 0 0 0 0 0 0 0
blue-collar 0 1 0 0 0 0 0 0 0 0 0
entrepreneur 0 0 1 0 0 0 0 0 0 0 0
housemaid 0 0 0 1 0 0 0 0 0 0 0
management 0 0 0 0 1 0 0 0 0 0 0
retired 0 0 0 0 0 1 0 0 0 0 0
self-employed 0 0 0 0 0 0 1 0 0 0 0
services 0 0 0 0 0 0 0 1 0 0 0
student 0 0 0 0 0 0 0 0 1 0 0
technician 0 0 0 0 0 0 0 0 0 1 0
unemployed 0 0 0 0 0 0 0 0 0 0 1
unknown -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
marital divorced 1 0
married 0 1
single -1 -1
education primary 1 0 0
secondary 0 1 0
tertiary 0 0 1
unknown -1 -1 -1
default no 1
yes -1
housing no 1
yes -1
loan no 1
yes -1
month apr 1 0 0 0 0 0 0 0 0 0 0
aug 0 1 0 0 0 0 0 0 0 0 0
dec 0 0 1 0 0 0 0 0 0 0 0
feb 0 0 0 1 0 0 0 0 0 0 0
jan 0 0 0 0 1 0 0 0 0 0 0
jul 0 0 0 0 0 1 0 0 0 0 0
jun 0 0 0 0 0 0 1 0 0 0 0
mar 0 0 0 0 0 0 0 1 0 0 0
may 0 0 0 0 0 0 0 0 1 0 0
nov 0 0 0 0 0 0 0 0 0 1 0
oct 0 0 0 0 0 0 0 0 0 0 1
sep -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
poutcome failure 1 0 0
other 0 1 0
success 0 0 1
unknown -1 -1 -1
contact cellular 1 0
telephone 0 1
unknown -1 -1
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics

Criterion   Intercept Only   Intercept and Covariates
AIC 3233.000 2259.651
SC 3239.417 2535.560
-2 Log L 3231.000 2173.651
Testing Global Null Hypothesis: BETA=0

Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 1057.3489 42 <.0001
Score 1311.3773 42 <.0001
Wald 659.5289 42 <.0001
Type 3 Analysis of Effects

Effect   DF   Wald Chi-Square   Pr > ChiSq
age 1 0.3528 0.5525
job 11 19.9450 0.0461
marital 2 7.5790 0.0226
education 3 5.7916 0.1222
default 1 1.5935 0.2068
housing 1 3.5463 0.0597
loan 1 9.9137 0.0016
month 11 104.5815 <.0001
poutcome 3 102.0796 <.0001
balance 1 0.0500 0.8230
contact 2 38.7227 <.0001
day 1 4.0427 0.0444
duration 1 437.3055 <.0001
campaign 1 6.2314 0.0126
pdays 1 0.0097 0.9217
previous 1 0.0208 0.8853
Analysis of Maximum Likelihood Estimates

Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq   Exp(Est)
Intercept 1 -2.6748 0.4910 29.6779 <.0001 0.069
age 1 -0.00423 0.00713 0.3528 0.5525 0.996
job admin. 1 0.0580 0.1895 0.0938 0.7594 1.060
job blue-collar 1 -0.3344 0.1747 3.6627 0.0556 0.716
job entrepreneur 1 -0.1917 0.3147 0.3712 0.5424 0.826
job housemaid 1 -0.2949 0.3469 0.7230 0.3952 0.745
job management 1 -0.0150 0.1583 0.0089 0.9247 0.985
job retired 1 0.6896 0.2434 8.0236 0.0046 1.993
job self-employed 1 -0.1231 0.2852 0.1863 0.6660 0.884
job services 1 -0.0876 0.2165 0.1638 0.6857 0.916
job student 1 0.4365 0.3189 1.8734 0.1711 1.547
job technician 1 -0.1346 0.1597 0.7100 0.3995 0.874
job unemployed 1 -0.5815 0.3554 2.6773 0.1018 0.559
marital divorced 1 0.2582 0.1164 4.9195 0.0266 1.295
marital married 1 -0.2113 0.0836 6.3911 0.0115 0.810
education primary 1 0.00504 0.1628 0.0010 0.9753 1.005
education secondary 1 0.0851 0.1183 0.5178 0.4718 1.089
education tertiary 1 0.3258 0.1391 5.4859 0.0192 1.385
default no 1 -0.2723 0.2157 1.5935 0.2068 0.762
housing no 1 0.1300 0.0690 3.5463 0.0597 1.139
loan no 1 0.3148 0.1000 9.9137 0.0016 1.370
month apr 1 -0.0726 0.1908 0.1449 0.7034 0.930
month aug 1 -0.3808 0.1690 5.0757 0.0243 0.683
month dec 1 0.0418 0.5804 0.0052 0.9426 1.043
month feb 1 0.1295 0.2216 0.3417 0.5588 1.138
month jan 1 -1.1959 0.3240 13.6227 0.0002 0.302
month jul 1 -0.8241 0.1826 20.3663 <.0001 0.439
month jun 1 0.4815 0.2241 4.6172 0.0317 1.619
month mar 1 1.4259 0.3214 19.6793 <.0001 4.161
month may 1 -0.5627 0.1671 11.3422 0.0008 0.570
month nov 1 -0.9156 0.2088 19.2357 <.0001 0.400
month oct 1 1.2884 0.2638 23.8489 <.0001 3.627
poutcome failure 1 -0.7036 0.1656 18.0635 <.0001 0.495
poutcome other 1 -0.2124 0.1922 1.2221 0.2689 0.809
poutcome success 1 1.7413 0.1797 93.9424 <.0001 5.705
balance 1 -3.91E-6 0.000017 0.0500 0.8230 1.000
contact cellular 1 0.4954 0.1129 19.2564 <.0001 1.641
contact telephone 1 0.4252 0.1670 6.4841 0.0109 1.530
day 1 0.0164 0.00816 4.0427 0.0444 1.017
duration 1 0.00423 0.000202 437.3055 <.0001 1.004
campaign 1 -0.0704 0.0282 6.2314 0.0126 0.932
pdays 1 -0.00010 0.000996 0.0097 0.9217 1.000
previous 1 -0.00551 0.0382 0.0208 0.8853 0.995
Odds Ratio Estimates

Effect   Point Estimate   95% Wald Confidence Limits
age 0.996 0.982 1.010
job admin. vs unknown 0.594 0.189 1.871
job blue-collar vs unknown 0.401 0.128 1.253
job entrepreneur vs unknown 0.463 0.130 1.642
job housemaid vs unknown 0.417 0.114 1.533
job management vs unknown 0.552 0.180 1.690
job retired vs unknown 1.117 0.346 3.611
job self-employed vs unknown 0.496 0.144 1.702
job services vs unknown 0.514 0.159 1.660
job student vs unknown 0.867 0.244 3.079
job technician vs unknown 0.490 0.159 1.510
job unemployed vs unknown 0.313 0.084 1.168
marital divorced vs single 1.357 0.910 2.023
marital married vs single 0.848 0.635 1.133
education primary vs unknown 1.524 0.756 3.068
education secondary vs unknown 1.651 0.874 3.118
education tertiary vs unknown 2.100 1.087 4.057
default no vs yes 0.580 0.249 1.351
housing no vs yes 1.297 0.989 1.700
loan no vs yes 1.877 1.268 2.778
month apr vs sep 0.518 0.231 1.161
month aug vs sep 0.381 0.174 0.833
month dec vs sep 0.581 0.141 2.391
month feb vs sep 0.634 0.277 1.455
month jan vs sep 0.169 0.063 0.449
month jul vs sep 0.244 0.109 0.547
month jun vs sep 0.902 0.392 2.076
month mar vs sep 2.319 0.878 6.130
month may vs sep 0.318 0.146 0.693
month nov vs sep 0.223 0.097 0.511
month oct vs sep 2.021 0.827 4.940
poutcome failure vs unknown 1.129 0.603 2.114
poutcome other vs unknown 1.846 0.930 3.661
poutcome success vs unknown 13.020 7.025 24.132
balance 1.000 1.000 1.000
contact cellular vs unknown 4.121 2.637 6.439
contact telephone vs unknown 3.842 2.085 7.079
day 1.017 1.000 1.033
duration 1.004 1.004 1.005
campaign 0.932 0.882 0.985
pdays 1.000 0.998 1.002
previous 0.995 0.923 1.072
Association of Predicted Probabilities and Observed Responses
Percent Concordant 90.1 Somers' D 0.805
Percent Discordant 9.6 Gamma 0.807
Percent Tied 0.2 Tau-a 0.164
Pairs 2084000 c 0.903
2. What percent of customers in data subscribed to a term deposit?
Answer:
Response Profile
Ordered Value   y   Total Frequency
1 yes 521
2 no 4000
521 of the 4,521 customers in the data (11.52%) subscribed to a term deposit.
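The percentage can be checked directly from the two counts in the Response Profile. The short Python sketch below only redoes that arithmetic; the variable names are illustrative and not part of the SAS output.

```python
# Counts copied from the Response Profile table.
subscribed = 521
not_subscribed = 4000

total = subscribed + not_subscribed
share_yes = 100 * subscribed / total  # percentage of "yes" responses

print(f"{subscribed} of {total} customers subscribed: {share_yes:.2f}%")
```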
3. Interpret the meaning of each (significant) coefficient clearly.
Answer: Significant coefficients are those with a p-value less than .05. (The Pr > ChiSq column gives the probability of obtaining a Wald chi-square statistic at least as large as the one observed if the true coefficient were zero; if this value is < 0.05, the estimate is considered significant.)
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq   Exp(Est)
Intercept 1 -2.6748 0.4910 29.6779 <.0001 0.069
age 1 -0.00423 0.00713 0.3528 0.5525 0.996
job admin. 1 0.0580 0.1895 0.0938 0.7594 1.060
job blue-collar 1 -0.3344 0.1747 3.6627 0.0556 0.716
job entrepreneur 1 -0.1917 0.3147 0.3712 0.5424 0.826
job housemaid 1 -0.2949 0.3469 0.7230 0.3952 0.745
job management 1 -0.0150 0.1583 0.0089 0.9247 0.985
job retired 1 0.6896 0.2434 8.0236 0.0046 1.993
job self-employed 1 -0.1231 0.2852 0.1863 0.6660 0.884
job services 1 -0.0876 0.2165 0.1638 0.6857 0.916
job student 1 0.4365 0.3189 1.8734 0.1711 1.547
job technician 1 -0.1346 0.1597 0.7100 0.3995 0.874
job unemployed 1 -0.5815 0.3554 2.6773 0.1018 0.559
marital divorced 1 0.2582 0.1164 4.9195 0.0266 1.295
marital married 1 -0.2113 0.0836 6.3911 0.0115 0.810
education primary 1 0.00504 0.1628 0.0010 0.9753 1.005
education secondary 1 0.0851 0.1183 0.5178 0.4718 1.089
education tertiary 1 0.3258 0.1391 5.4859 0.0192 1.385
default no 1 -0.2723 0.2157 1.5935 0.2068 0.762
housing no 1 0.1300 0.0690 3.5463 0.0597 1.139
loan no 1 0.3148 0.1000 9.9137 0.0016 1.370
month apr 1 -0.0726 0.1908 0.1449 0.7034 0.930
month aug 1 -0.3808 0.1690 5.0757 0.0243 0.683
month dec 1 0.0418 0.5804 0.0052 0.9426 1.043
month feb 1 0.1295 0.2216 0.3417 0.5588 1.138
month jan 1 -1.1959 0.3240 13.6227 0.0002 0.302
month jul 1 -0.8241 0.1826 20.3663 <.0001 0.439
month jun 1 0.4815 0.2241 4.6172 0.0317 1.619
month mar 1 1.4259 0.3214 19.6793 <.0001 4.161
month may 1 -0.5627 0.1671 11.3422 0.0008 0.570
month nov 1 -0.9156 0.2088 19.2357 <.0001 0.400
month oct 1 1.2884 0.2638 23.8489 <.0001 3.627
poutcome failure 1 -0.7036 0.1656 18.0635 <.0001 0.495
poutcome other 1 -0.2124 0.1922 1.2221 0.2689 0.809
poutcome success 1 1.7413 0.1797 93.9424 <.0001 5.705
balance 1 -3.91E-6 0.000017 0.0500 0.8230 1.000
contact cellular 1 0.4954 0.1129 19.2564 <.0001 1.641
contact telephone 1 0.4252 0.1670 6.4841 0.0109 1.530
day 1 0.0164 0.00816 4.0427 0.0444 1.017
duration 1 0.00423 0.000202 437.3055 <.0001 1.004
campaign 1 -0.0704 0.0282 6.2314 0.0126 0.932
pdays 1 -0.00010 0.000996 0.0097 0.9217 1.000
previous 1 -0.00551 0.0382 0.0208 0.8853 0.995
Significant variables (parameter, estimate):
The Class Level Information table shows that SAS used effect coding (the last level of each class variable is coded -1). Each estimate for a class level is therefore the difference in log odds between that level and the average over all levels of the variable, holding the other variables constant. Comparisons against the reference level are given in the Odds Ratio Estimates table.
job retired .6896
Holding the other variables constant, the log odds of subscribing for a retired client are 0.6896 higher than the average over all job categories (odds multiplied by 1.993).
marital divorced .2582
Holding the other variables constant, the log odds of subscribing for a divorced client are 0.2582 higher than the average over the three marital statuses (odds multiplied by 1.295). Compared with a single client, the odds ratio is 1.357.
marital married -.2113
Holding the other variables constant, the log odds of subscribing for a married client are 0.2113 lower than the average over the three marital statuses (odds multiplied by 0.810). Compared with a single client, the odds ratio is 0.848.
education tertiary .3258
Holding the other variables constant, the log odds of subscribing for a client with tertiary education are 0.3258 higher than the average over all education levels (odds multiplied by 1.385). Compared with education unknown, the odds ratio is 2.100.
loan no .3148
Holding the other variables constant, the log odds of subscribing for a client without a personal loan are 0.3148 higher than the average of the two loan levels (odds multiplied by 1.370). Compared with a client who has a personal loan, the odds ratio is 1.877.
Month estimates are effect coded, so each one compares a month with the average over all twelve months, holding the other variables constant. Comparisons with September, the reference month, are in the Odds Ratio Estimates table.
month aug -.3808
When the last contact was in August, the log odds of subscribing are 0.3808 lower than the monthly average (odds multiplied by 0.683).
month jan -1.1959
When the last contact was in January, the log odds of subscribing are 1.1959 lower than the monthly average (odds multiplied by 0.302).
month jul -.8241
When the last contact was in July, the log odds of subscribing are 0.8241 lower than the monthly average (odds multiplied by 0.439).
month jun .4815
When the last contact was in June, the log odds of subscribing are 0.4815 higher than the monthly average (odds multiplied by 1.619).
month mar 1.4259
When the last contact was in March, the log odds of subscribing are 1.4259 higher than the monthly average (odds multiplied by 4.161).
month may -.5627
When the last contact was in May, the log odds of subscribing are 0.5627 lower than the monthly average (odds multiplied by 0.570).
month nov -.9156
When the last contact was in November, the log odds of subscribing are 0.9156 lower than the monthly average (odds multiplied by 0.400).
month oct 1.2884
When the last contact was in October, the log odds of subscribing are 1.2884 higher than the monthly average (odds multiplied by 3.627).
The poutcome and contact estimates are effect coded, so each one compares a level with the average over all levels of that variable, holding the other variables constant. Comparisons with the reference level "unknown" are in the Odds Ratio Estimates table.
poutcome failure -.7036
When the previous marketing campaign ended in failure, the log odds of subscribing are 0.7036 lower than the average over all previous-outcome categories (odds multiplied by 0.495). Compared with outcome unknown, the odds ratio is 1.129.
poutcome success 1.7413
When the previous marketing campaign ended in success, the log odds of subscribing are 1.7413 higher than the average over all previous-outcome categories (odds multiplied by 5.705). Compared with outcome unknown, the odds ratio is 13.020.
contact cellular .4954
For contact communication type cellular, the log odds of subscribing are 0.4954 higher than the average over the three contact types (odds multiplied by 1.641). Compared with contact type unknown, the odds ratio is 4.121.
contact telephone .4252
For contact communication type telephone, the log odds of subscribing are 0.4252 higher than the average over the three contact types (odds multiplied by 1.530). Compared with contact type unknown, the odds ratio is 3.842.
day .0164
A one-day increase in the day of the month of the last contact raises the log odds of subscribing to the term deposit by 0.0164 (odds multiplied by 1.017), holding the other variables constant.
duration .00423
Each additional second of last contact duration raises the log odds of subscribing by 0.00423 (odds multiplied by 1.004), holding the other variables constant.
campaign -.0704
Campaign is the number of contacts performed during this campaign for a particular client. Each additional contact lowers the log odds of subscribing by 0.0704 (odds multiplied by 0.932), holding the other variables constant.
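The link between the parameter estimates and the Odds Ratio Estimates table can be sketched in a few lines of Python. The sketch assumes effect coding, as the -1 rows in the Class Level Information table indicate, so the reference level's effect is the negative sum of the printed estimates. The function name odds_ratio and the dictionaries are illustrative and not part of the SAS output.

```python
import math

# Effect-coded estimates copied from the
# Analysis of Maximum Likelihood Estimates table.
marital = {"divorced": 0.2582, "married": -0.2113}
loan = {"no": 0.3148}

# Under effect coding the level effects sum to zero, so the reference
# level carries the negative sum of the printed estimates.
marital["single"] = -sum(marital.values())
loan["yes"] = -sum(loan.values())

def odds_ratio(effects, level, reference):
    """Odds ratio of `level` versus `reference` from effect-coded estimates."""
    return math.exp(effects[level] - effects[reference])

or_divorced = odds_ratio(marital, "divorced", "single")
or_married = odds_ratio(marital, "married", "single")
or_loan = odds_ratio(loan, "no", "yes")

print(f"divorced vs single: {or_divorced:.3f}")
print(f"married vs single:  {or_married:.3f}")
print(f"loan no vs yes:     {or_loan:.3f}")
```

The three values agree with the corresponding rows of the Odds Ratio Estimates table (1.357, 0.848 and 1.877), which is a useful check that the estimates are read against the right baseline.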
4. Interpret the fit of the model based on -2logL, AIC, SC.
Answer:
Model Fit Statistics
Criterion   Intercept Only   Intercept and Covariates
AIC 3233.000 2259.651
SC 3239.417 2535.560
-2 Log L 3231.000 2173.651
AIC, SC and -2 Log L play a role similar to adjusted R-square in linear regression in that they are used to compare models fitted to the same data, and lower values indicate a better model. AIC and SC additionally penalize the number of parameters. The larger the drop from the intercept-only value to the value with intercept and covariates, the better the model fit; here all three criteria fall substantially once the covariates are added.
Also, the difference between -2 Log L (intercept only) and -2 Log L (intercept and covariates), divided by -2 Log L (intercept only), gives McFadden's R-square: (3231.000 - 2173.651) / 3231.000 = 32.725%, which is considered a good model. McFadden's R-square is the proportional reduction in -2 Log L achieved by our explanatory variables; it is a pseudo R-square rather than a literal share of explained variance.
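The fit statistics can be reproduced from the two -2 Log L values. The Python sketch below assumes 43 estimated parameters (the 42 degrees of freedom of the global test plus the intercept) and uses the usual definitions AIC = -2 Log L + 2k and SC = -2 Log L + k ln(n); the variable names are illustrative.

```python
import math

neg2logl_null = 3231.000  # -2 Log L, intercept only
neg2logl_full = 2173.651  # -2 Log L, intercept and covariates
n_obs = 4521              # observations used
n_params = 43             # assumed: 42 covariate parameters + intercept

lr_chi2 = neg2logl_null - neg2logl_full         # likelihood ratio statistic
mcfadden_r2 = 1 - neg2logl_full / neg2logl_null  # McFadden's pseudo R-square
aic = neg2logl_full + 2 * n_params
sc = neg2logl_full + n_params * math.log(n_obs)

print(f"LR chi-square: {lr_chi2:.3f}")
print(f"McFadden R2:   {100 * mcfadden_r2:.3f}%")
print(f"AIC: {aic:.3f}  SC: {sc:.3f}")
```

The recomputed AIC and SC match the Model Fit Statistics table, and the likelihood ratio statistic matches the first row of the global null hypothesis test.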
5. What is the percent concordant? What does it mean?
Answer:
Association of Predicted Probabilities and Observed Responses
Percent Concordant 90.1 Somers' D 0.805
Percent Discordant 9.6 Gamma 0.807
Percent Tied 0.2 Tau-a 0.164
Pairs 2084000 c 0.903
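By definition, a pair of observations with different observed responses is said to be concordant if the observation with the lower ordered response value has a lower predicted mean score than the observation with the higher ordered response value.
Here, 90.1% of the 2,084,000 yes/no pairs (521 x 4000) are concordant: in those pairs the customer who subscribed was assigned a higher predicted probability of subscribing than the customer who did not. Percent concordant therefore shows how often the model ranks a subscriber above a non-subscriber; it is not the share of individual predictions that are correct.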
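The pair-counting behind these statistics can be sketched in Python. The five observations below are hypothetical toy values, not the bank data, and the sketch compares raw probabilities, so SAS's exact handling of near-ties may differ. The function name concordance is illustrative.

```python
def concordance(y_true, p_hat):
    """Count concordant, discordant and tied pairs over all (yes, no) pairs."""
    events = [p for y, p in zip(y_true, p_hat) if y == 1]
    nonevents = [p for y, p in zip(y_true, p_hat) if y == 0]
    concordant = discordant = tied = 0
    for p_event in events:
        for p_nonevent in nonevents:
            if p_event > p_nonevent:
                concordant += 1   # subscriber ranked above non-subscriber
            elif p_event < p_nonevent:
                discordant += 1
            else:
                tied += 1
    pairs = len(events) * len(nonevents)
    return {
        "pairs": pairs,
        "pct_concordant": 100 * concordant / pairs,
        "pct_discordant": 100 * discordant / pairs,
        "pct_tied": 100 * tied / pairs,
        "somers_d": (concordant - discordant) / pairs,
        "c": (concordant + 0.5 * tied) / pairs,
    }

# Hypothetical toy data: 1 = subscribed, 0 = did not subscribe.
y = [1, 1, 0, 0, 0]
p = [0.8, 0.3, 0.3, 0.1, 0.6]
result = concordance(y, p)
print(result)

# Pair count reported by SAS: every "yes" paired with every "no".
report_pairs = 521 * 4000
```

With the report's figures, Somers' D is the concordant share minus the discordant share (0.901 - 0.096 = 0.805), consistent with the table.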
6. Use the parameters to predict whether a customer will subscribe to a term deposit
for the new data bank-full.csv (hint: use proc score).
Answer: From the predicted data after scoring in SAS, the model predicts that 6.04% of the customers in the new bank-full data will subscribe to a term deposit, whereas 93.96% will not.
The SAS System
The FREQ Procedure

Into: y
I_y    Frequency    Percent    Cumulative Frequency    Cumulative Percent
no     42479        93.96      42479                   93.96
yes    2732         6.04       45211                   100.00

The SAS System
The FREQ Procedure

Frequency Table of I_y by y
(each cell lists Frequency, Percent, Row Pct, Col Pct)

I_y (Into: y)    y = no    y = yes    Total
no               38965     3514       42479
  Percent        86.18     7.77       93.96
  Row Pct        91.73     8.27
  Col Pct        97.60     66.44
yes              957       1775       2732
  Percent        2.12      3.93       6.04
  Row Pct        35.03     64.97
  Col Pct        2.40      33.56
Total            39922     5289       45211
  Percent        88.30     11.70      100.00

I_y: predicted score; y: expected score

Accuracy of the model is ((True Positive + True Negative) / (True Positive + True Negative + False Positive + False Negative)) * 100 = ((1775 + 38965) / 45211) * 100 = 90.11.
So, the accuracy of our model is 90.11%, and it predicts that 6.04% of customers (2732 rows) will subscribe.
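The accuracy figure, along with a few related rates, can be recomputed from the four cells of the cross-tabulation. In the Python sketch below, "yes" is treated as the positive class; the names tp, fp, fn and tn are illustrative and not part of the SAS output.

```python
# Cell counts copied from the table of I_y (predicted) by y (actual).
tn = 38965  # predicted no,  actual no
fn = 3514   # predicted no,  actual yes
fp = 957    # predicted yes, actual no
tp = 1775   # predicted yes, actual yes

total = tp + tn + fp + fn
accuracy = 100 * (tp + tn) / total
sensitivity = 100 * tp / (tp + fn)   # share of actual subscribers found
specificity = 100 * tn / (tn + fp)   # share of non-subscribers found
precision = 100 * tp / (tp + fp)     # share of predicted "yes" that are right
baseline = 100 * (tn + fp) / total   # accuracy of always predicting "no"

print(f"accuracy    {accuracy:.2f}%")
print(f"sensitivity {sensitivity:.2f}%")
print(f"specificity {specificity:.2f}%")
print(f"precision   {precision:.2f}%")
print(f"baseline    {baseline:.2f}%")
```

Sensitivity, specificity and precision correspond to the Col Pct and Row Pct entries of the table. Because 88.30% of customers did not subscribe, comparing accuracy with the always-"no" baseline gives a fairer picture of what the model adds.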
InnovPDA is planning to launch a new product in the market.
It is an innovative product capable of transmitting and receiving both data and voice.
This new product can send and receive emails and access the Internet through wireless links.
InnovPDA's objective is to segment the market and identify the demographic variables that define the segments in order to launch an advertising campaign to target the appropriate segments.
Data Description:
The survey was targeted at respondents across a broad range of occupations.
Respondents were pre-screened to include those likely to purchase a PDA in the next 6-12 months.
The survey was designed to collect both behavioral and demographic information for cluster and discriminant analysis purposes.
Segmentation Variables:
X1 Are you among the first to adopt new technologies? (1- strongly disagree, 7- strongly agree)
X2 Do you use a pager or an instant messaging service often? (1- never, 7- very often)
X3 Do you use a cell phone often? (1- never, 7- very often)
X4 Do you use contact management tools often? (1- never, 7- very often)
X5 How often do colleagues send you time sensitive information when you are away from office? (1- never, 7- very often)
X6 How often do you send time sensitive information when away from office? (1- never, 7- very often)
X7 How often do you require remote informational access? (1- never, 7- very often)
X8 How important is it for you to share information remotely with colleagues? (1- not at all important, 7- very important)
X9 How important is it to view information on a large size high-resolution display? (1- not at all important, 7- very important)
X10 How important is continuous email access to you? (1- not at all important, 7- very important)
X11 How important is it for you to have permanent web access? (1- not at all important, 7- very important)
X12 How important is it for you to use multimedia features? (1- not at all important, 7- very important)
X13 How important is it for you to have a light communication device? (1- not at all important, 7- very important)
X14 How much extra per month are you willing to pay to get features like instant communication, cellular phone etc. in a PDA? (in dollars)
X15 How much are you willing to pay for a PDA with all the above mentioned features? (in dollars)
Demographic Variables: ( 0 = No, 1 = Yes) Data in PDA_2.csv
Y1 Age
Y2 Education (1 high school, 2 some college, 3 college graduate, 4 graduate degree)
Y3 Income
Which industry do you belong to?
Y4 Construction
Y5 Emergency
Y6 Sales
Y7 Maintenance and Service
Y8 Professional
Y9 Computer
Y10 Do you own a PDA?

Y11 Do you own a cell phone?
Y12 Do you own or have access to a laptop?
Y13 How often do you work from a remote location?
Media consumption
Y14 Business Week
Y15 PC Magazine
Y16 Field and Stream
Y17 Modern Gourmet?
Questions:
7. Perform hierarchical cluster analysis on the InnovPDA data (use PDA_1.csv).
Answer:
8. How many segments will you keep? How will you decide that?
Answer:
To choose the number of clusters, we used the pseudo t-squared statistic: starting at the top of the printed output, we look for the first relatively large value (here 39.5) and move back up one cluster. Thus, we select 3 clusters.
1) Keep 4 clusters.
Answer:
9. Describe the different clusters based on their mean values of responses to X1-X15
and Y1-Y17.
Answer:
Cluster 1: They are moderately open to innovation. These people are in their early 40s. Their income is around 52. They use a cell phone quite often. They use contact management tools very often. They require remote informational access very often. It is moderately important for these people to share information remotely. It is moderately important for them to view information on a large size high-resolution display. Multimedia features are important. Permanent web access is important. It is not very important for them to have a light communication device. On average they have some college education. They are willing to pay $490.6. They have access to a laptop and use a cell phone. They sometimes work from a remote location.
Cluster 2:
They too are moderately open to innovation. These people are in their early 40s. Their income is around 73. They use a cell phone quite often. They do not use contact management tools very often. They require remote informational access often, but less than the other clusters. It is moderately important for these people to share information remotely. It is moderately important for them to view information on a large size high-resolution display. Multimedia features are moderately important. Permanent web access is quite important. Continuous email access is important. It is important for them to have a light communication device. Email is very important to them. They are willing to pay $335.8. On average they have some college education. They also work very often from a remote location.
Cluster 3:
They are moderately open to innovation, judging by the innovator variable. These people are in their mid-30s. Their income is around 58.9. They use a cell phone quite often. They use contact management tools very often. It is moderately important for these people to share information remotely. It is important for them to view information on a large size high-resolution display. Multimedia features are important. Permanent web access is very important. Continuous email access is very important. It is important for them to have a light communication device. They are willing to pay $269.7. On average they have some college education. They work often from a remote location. They have access to a laptop.
Cluster 4:
Given their low value on the innovator variable, we can say that they are not open to innovation. These people are in their early 40s. Their income is around 51. They use a cell phone quite often. They do not use contact management tools very often. They do not receive time sensitive information very often. They sometimes require remote informational access. It is moderately important for these people to share information remotely. It is important for them to view information on a large size high-resolution display. Multimedia features are not important. Permanent web access is moderately important. Continuous email access is moderately important. It is important for them to have a light communication device. They are willing to pay $405. Their education level is also around some college. They work often from a remote location. Also, they have access to a laptop.
10. Based on the above analysis, give a label to each cluster.
Answer: Label for each cluster:

Cluster 1: Low Income High Price
Cluster 2: High Income Low Price
Cluster 3: High Remote Access Low Price
Cluster 4: High Remote Access High Price
11. Which cluster would you target first and which would you target next? Explain
why?
Answer: Based on the above clusters, cluster 3 would be the first cluster our firm should target. These people often work from a remote location and would need a PDA. They use a cell phone quite often. Multimedia features and web access are important to them. They are also willing to pay a decent amount, about $269.7. It is also important to them to have a light communication device.
After cluster 3 we may target cluster 2, since the price for them is relatively low and economical, and they also often work from a remote location. Cluster 2 seems the best target after cluster 3 in comparison to the others.
12. Repeat cluster analysis using K means clustering (use average distance and keep 4
segments). Repeat 4, 5, and 6 with this method.
Answer: We performed a nonhierarchical (k-means) cluster analysis on the behavioral data and set the number of clusters to 4. We also standardized the data before performing the clustering. After clustering, we merged the two data sets.
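The two steps described here, standardizing and then running k-means, can be sketched in plain Python. This is a minimal illustration and not the SAS procedure used in the report: the six rows are hypothetical toy values, the starting centroids are picked by hand so the run is reproducible, and the toy example uses k = 2 rather than the 4 clusters kept in the analysis. The function names standardize and kmeans are illustrative.

```python
import math

def standardize(rows):
    """Z-score each column so that no variable dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for constant columns.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def kmeans(rows, k, init_idx, n_iter=100):
    """Basic k-means with Euclidean distance and fixed starting rows."""
    centroids = [list(rows[i]) for i in init_idx]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(r, centroids[j]))
                      for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

# Hypothetical respondents: (rating on a 1-7 scale, price in dollars).
toy = [[1, 200], [2, 220], [1, 210], [6, 480], [7, 500], [6, 490]]
z = standardize(toy)
labels, centroids = kmeans(z, k=2, init_idx=[0, 3])
print(labels)
```

Standardizing first matters because the dollar-valued variables (X14, X15) are on a much larger scale than the 1-7 ratings and would otherwise dominate the distances.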
Clusters:
Cluster 1 has a low price and an early-40s age group, with an income that implies high purchasing power. The label can be low price and high purchasing power; this cluster also prefers to work remotely.
Cluster 2 also has a low price, with a mid-30s age group. This cluster is ideal to target as its income is also good. The label can be low price and middle income group.
Cluster 3 has a high price and an early-40s age group. This group seems to have good purchasing power with an average income. The label can be high price and average income group.
Cluster 4 has a high price and a late-30s age group. This cluster has really high purchasing power, which implies it will also be a good segment to target. The label can be high income and high price.
Based on the above clusters, cluster 2 would be the cluster that our firm could target. These people often work from a remote location (5.3) and would need a PDA. They are an average income group, but they have the purchasing power to buy the product. They use a cell phone quite often. Multimedia features and web access are important to them. Their income is around 58.3 and their price is approximately 268.22. It is also important to them to have a light communication device.
