Model Information
Data Set WORK.BANK
Response Variable y
Number of Response Levels 2
Model binary logit
Optimization Technique Fisher's scoring
Response Profile
Ordered Total
Value y Frequency
1 yes 521
2 no 4000
Class Level Information
Class Value Design Variables
married 0 1
single -1 -1
education primary 1 0 0
secondary 0 1 0
tertiary 0 0 1
unknown -1 -1 -1
default no 1
yes -1
housing no 1
yes -1
loan no 1
yes -1
month apr 1 0 0 0 0 0 0 0 0 0 0
aug 0 1 0 0 0 0 0 0 0 0 0
dec 0 0 1 0 0 0 0 0 0 0 0
feb 0 0 0 1 0 0 0 0 0 0 0
jan 0 0 0 0 1 0 0 0 0 0 0
jul 0 0 0 0 0 1 0 0 0 0 0
jun 0 0 0 0 0 0 1 0 0 0 0
mar 0 0 0 0 0 0 0 1 0 0 0
may 0 0 0 0 0 0 0 0 1 0 0
nov 0 0 0 0 0 0 0 0 0 1 0
oct 0 0 0 0 0 0 0 0 0 0 1
sep -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
poutcome failure 1 0 0
other 0 1 0
success 0 0 1
unknown -1 -1 -1
contact cellular 1 0
telephone 0 1
unknown -1 -1
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Type 3 Analysis of Effects
Wald
Effect DF Chi-Square Pr > ChiSq
campaign 1 6.2314 0.0126
pdays 1 0.0097 0.9217
previous 1 0.0208 0.8853
Analysis of Maximum Likelihood Estimates
Standard Wald
Parameter DF Estimate Error Chi-Square Pr > ChiSq Exp(Est)
month feb 1 0.1295 0.2216 0.3417 0.5588 1.138
month jan 1 -1.1959 0.3240 13.6227 0.0002 0.302
month jul 1 -0.8241 0.1826 20.3663 <.0001 0.439
month jun 1 0.4815 0.2241 4.6172 0.0317 1.619
month mar 1 1.4259 0.3214 19.6793 <.0001 4.161
month may 1 -0.5627 0.1671 11.3422 0.0008 0.570
month nov 1 -0.9156 0.2088 19.2357 <.0001 0.400
month oct 1 1.2884 0.2638 23.8489 <.0001 3.627
poutcome failure 1 -0.7036 0.1656 18.0635 <.0001 0.495
poutcome other 1 -0.2124 0.1922 1.2221 0.2689 0.809
poutcome success 1 1.7413 0.1797 93.9424 <.0001 5.705
balance 1 -3.91E-6 0.000017 0.0500 0.8230 1.000
contact cellular 1 0.4954 0.1129 19.2564 <.0001 1.641
contact telephone 1 0.4252 0.1670 6.4841 0.0109 1.530
day 1 0.0164 0.00816 4.0427 0.0444 1.017
duration 1 0.00423 0.000202 437.3055 <.0001 1.004
campaign 1 -0.0704 0.0282 6.2314 0.0126 0.932
pdays 1 -0.00010 0.000996 0.0097 0.9217 1.000
previous 1 -0.00551 0.0382 0.0208 0.8853 0.995
Odds Ratio Estimates
95% Wald
Effect Point Estimate Confidence Limits
job services vs unknown 0.514 0.159 1.660
job student vs unknown 0.867 0.244 3.079
job technician vs unknown 0.490 0.159 1.510
job unemployed vs unknown 0.313 0.084 1.168
marital divorced vs single 1.357 0.910 2.023
marital married vs single 0.848 0.635 1.133
education primary vs unknown 1.524 0.756 3.068
education secondary vs unknown 1.651 0.874 3.118
education tertiary vs unknown 2.100 1.087 4.057
default no vs yes 0.580 0.249 1.351
housing no vs yes 1.297 0.989 1.700
loan no vs yes 1.877 1.268 2.778
month apr vs sep 0.518 0.231 1.161
month aug vs sep 0.381 0.174 0.833
month dec vs sep 0.581 0.141 2.391
month feb vs sep 0.634 0.277 1.455
month jan vs sep 0.169 0.063 0.449
month jul vs sep 0.244 0.109 0.547
month jun vs sep 0.902 0.392 2.076
month mar vs sep 2.319 0.878 6.130
month may vs sep 0.318 0.146 0.693
month nov vs sep 0.223 0.097 0.511
month oct vs sep 2.021 0.827 4.940
poutcome failure vs unknown 1.129 0.603 2.114
poutcome other vs unknown 1.846 0.930 3.661
poutcome success vs unknown 13.020 7.025 24.132
balance 1.000 1.000 1.000
contact cellular vs unknown 4.121 2.637 6.439
contact telephone vs unknown 3.842 2.085 7.079
day 1.017 1.000 1.033
duration 1.004 1.004 1.005
campaign 0.932 0.882 0.985
pdays 1.000 0.998 1.002
previous 0.995 0.923 1.072
Answer:
Response Profile
Ordered y Total
Value Frequency
1 yes 521
2 no 4000
11.52% of the customers in the data (521 of 4,521) subscribed to a term deposit.
Answer: Significant coefficients are those with a p-value less than 0.05. (The column Pr > ChiSq
gives the probability of observing a Wald chi-square at least as large as the one reported if the
true coefficient were zero. If this value is < 0.05, the estimate is considered significant.)
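The Wald chi-square and its p-value can be reproduced from the Estimate and Standard Error columns. The following is a minimal Python sketch (the report itself used SAS); the function name wald_test is hypothetical, and the inputs are the rounded values printed for campaign, so the result agrees with the table only up to rounding.

```python
import math

def wald_test(estimate, std_error):
    """Return (chi_square, p_value) for H0: coefficient = 0, with 1 degree of freedom."""
    z = estimate / std_error
    chi_square = z * z
    # For 1 df, P(chi2 > x) equals the two-sided normal tail: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value

# Estimate and standard error for "campaign" as printed in the table (rounded).
chi_square, p_value = wald_test(-0.0704, 0.0282)
print(round(chi_square, 2), round(p_value, 4))
```

The printed values should be close to the table entries for campaign (6.2314 and 0.0126).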
Analysis of Maximum Likelihood Estimates
Standard Wald
Parameter DF Estimate Error Chi-Square Pr > ChiSq Exp(Est)
Intercept 1 -2.6748 0.4910 29.6779 <.0001 0.069
age 1 -0.00423 0.00713 0.3528 0.5525 0.996
job admin. 1 0.0580 0.1895 0.0938 0.7594 1.060
job blue-collar 1 -0.3344 0.1747 3.6627 0.0556 0.716
job entrepreneur 1 -0.1917 0.3147 0.3712 0.5424 0.826
job housemaid 1 -0.2949 0.3469 0.7230 0.3952 0.745
job management 1 -0.0150 0.1583 0.0089 0.9247 0.985
job retired 1 0.6896 0.2434 8.0236 0.0046 1.993
job self-employed 1 -0.1231 0.2852 0.1863 0.6660 0.884
job services 1 -0.0876 0.2165 0.1638 0.6857 0.916
job student 1 0.4365 0.3189 1.8734 0.1711 1.547
job technician 1 -0.1346 0.1597 0.7100 0.3995 0.874
job unemployed 1 -0.5815 0.3554 2.6773 0.1018 0.559
marital divorced 1 0.2582 0.1164 4.9195 0.0266 1.295
marital married 1 -0.2113 0.0836 6.3911 0.0115 0.810
education primary 1 0.00504 0.1628 0.0010 0.9753 1.005
education secondary 1 0.0851 0.1183 0.5178 0.4718 1.089
education tertiary 1 0.3258 0.1391 5.4859 0.0192 1.385
default no 1 -0.2723 0.2157 1.5935 0.2068 0.762
housing no 1 0.1300 0.0690 3.5463 0.0597 1.139
loan no 1 0.3148 0.1000 9.9137 0.0016 1.370
month apr 1 -0.0726 0.1908 0.1449 0.7034 0.930
month aug 1 -0.3808 0.1690 5.0757 0.0243 0.683
month dec 1 0.0418 0.5804 0.0052 0.9426 1.043
month feb 1 0.1295 0.2216 0.3417 0.5588 1.138
month jan 1 -1.1959 0.3240 13.6227 0.0002 0.302
month jul 1 -0.8241 0.1826 20.3663 <.0001 0.439
month jun 1 0.4815 0.2241 4.6172 0.0317 1.619
month mar 1 1.4259 0.3214 19.6793 <.0001 4.161
month may 1 -0.5627 0.1671 11.3422 0.0008 0.570
month nov 1 -0.9156 0.2088 19.2357 <.0001 0.400
month oct 1 1.2884 0.2638 23.8489 <.0001 3.627
poutcome failure 1 -0.7036 0.1656 18.0635 <.0001 0.495
poutcome other 1 -0.2124 0.1922 1.2221 0.2689 0.809
poutcome success 1 1.7413 0.1797 93.9424 <.0001 5.705
balance 1 -3.91E-6 0.000017 0.0500 0.8230 1.000
contact cellular 1 0.4954 0.1129 19.2564 <.0001 1.641
contact telephone 1 0.4252 0.1670 6.4841 0.0109 1.530
day 1 0.0164 0.00816 4.0427 0.0444 1.017
duration 1 0.00423 0.000202 437.3055 <.0001 1.004
campaign 1 -0.0704 0.0282 6.2314 0.0126 0.932
pdays 1 -0.00010 0.000996 0.0097 0.9217 1.000
previous 1 -0.00551 0.0382 0.0208 0.8853 0.995
Significant variables:
Parameter Estimate
job retired 0.6896
Holding the other variables constant, the log odds of subscribing for a retired customer are 0.6896
units higher than the average over all job categories. The class variables are effect coded (see the
-1 rows under Class Level Information), so each class coefficient is a deviation from that average and
not a direct contrast with the reference level (unknown).
marital divorced 0.2582
From the maximum likelihood table, holding the other variables in the model constant, the log odds for
divorced customers are 0.2582 units higher than the average over the marital categories. The direct
comparison with single customers is the odds ratio of 1.357 in the Odds Ratio Estimates table.
loan no 0.3148
From the estimate, the log odds for a customer without a personal loan are 0.3148 units higher than the
average over the two loan levels. Compared directly with a customer who has a loan, the odds ratio is
1.877.
poutcome success 1.7413
When the outcome of the previous marketing campaign is success, the log odds are 1.7413 units higher
than the average over the previous-outcome categories. Compared directly with unknown, the odds ratio
is 13.020.
duration 0.00423
As the last contact duration increases by one unit, keeping all the other variables constant, the log
odds are expected to be 0.00423 units higher.
campaign -0.0704
Campaign is the number of contacts performed during this campaign for a particular client. As it
increases by one unit, keeping all the other variables constant, the log odds are expected to be
0.0704 units lower.
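The Class Level Information table shows effect coding (the last level of each class variable is coded -1), so the coefficient of the reference level is minus the sum of the printed coefficients. The Python sketch below uses that fact to link the Estimate column to the Odds Ratio Estimates table. The helper name odds_ratios_vs_reference is hypothetical, and the inputs are the rounded estimates from the output.

```python
import math

def odds_ratios_vs_reference(effect_coefs):
    """Convert effect-coded coefficients to odds ratios against the reference level.

    Under effect coding the reference level's coefficient is minus the sum
    of the coefficients of the other levels.
    """
    reference_coef = -sum(effect_coefs.values())
    return {level: math.exp(b - reference_coef) for level, b in effect_coefs.items()}

# Rounded estimates copied from the maximum likelihood table.
poutcome = {"failure": -0.7036, "other": -0.2124, "success": 1.7413}  # reference: unknown
loan = {"no": 0.3148}                                                  # reference: yes
marital = {"divorced": 0.2582, "married": -0.2113}                     # reference: single

for name, coefs in [("poutcome", poutcome), ("loan", loan), ("marital", marital)]:
    for level, ratio in odds_ratios_vs_reference(coefs).items():
        print(f"{name} {level} vs reference: {ratio:.3f}")
```

Up to rounding, the results match the reported odds ratios, for example 13.020 for poutcome success vs unknown and 1.877 for loan no vs yes.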
Answer:
Model Fit Statistics
Intercept and
Criterion Intercept Only Covariates
AIC 3233.000 2259.651
SC 3239.417 2535.560
-2 Log L 3231.000 2173.651
AIC, SC, and -2 Log L play a role similar to adjusted R-square in linear regression: they are used to
compare models, and the smaller the value, the better the model. AIC and SC also penalize the number
of parameters. The larger the drop between the Intercept Only and the Intercept and Covariates
values, the better the model fit.
Also, the difference between -2 Log L (intercept only) and -2 Log L (intercept and covariates),
divided by -2 Log L (intercept only), gives the McFadden R-square:
(3231.000 - 2173.651) / 3231.000 = 0.32725, or 32.725%, which is considered to indicate a good
model. McFadden R-square is a pseudo R-square: it measures the relative improvement in
log-likelihood over the intercept-only model, not a literal share of the variation in the dependent
variable explained by the explanatory variables.
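The arithmetic behind these fit statistics can be checked with a few lines of Python. This is a sketch using the values from the Model Fit Statistics table; the variable and function names are illustrative. It also recovers the number of estimated parameters from the relation AIC = -2 Log L + 2k.

```python
def mcfadden_r2(neg2logl_null, neg2logl_full):
    """McFadden pseudo R-square from the two -2 Log L values."""
    return 1.0 - neg2logl_full / neg2logl_null

neg2logl_null = 3231.000   # -2 Log L, intercept only
neg2logl_full = 2173.651   # -2 Log L, intercept and covariates
aic_full = 2259.651        # AIC, intercept and covariates

r2 = mcfadden_r2(neg2logl_null, neg2logl_full)
lr_chi_square = neg2logl_null - neg2logl_full          # likelihood ratio statistic
n_parameters = round((aic_full - neg2logl_full) / 2)   # k in AIC = -2 Log L + 2k

print(f"McFadden R-square: {r2:.5f}")
print(f"Likelihood ratio chi-square: {lr_chi_square:.3f}")
print(f"Estimated parameters (including intercept): {n_parameters}")
```

The parameter count of 43 agrees with the design: one intercept, eight continuous predictors, and 34 design variables for the class effects.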
Answer:
Association of Predicted Probabilities and Observed
Responses
Percent Concordant 90.1 Somers' D 0.805
Percent Discordant 9.6 Gamma 0.807
Percent Tied 0.2 Tau-a 0.164
Pairs 2084000 c 0.903
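The association measures in this table follow from the pair counts. As a rough check, the Python sketch below recomputes them from the rounded percentages, which sum to 99.9%, so agreement is only to about the third decimal. The formulas are the standard definitions of Somers' D, Gamma, Tau-a, and c for concordant and discordant pairs.

```python
n_yes, n_no = 521, 4000                  # response profile frequencies
pairs = n_yes * n_no                     # every (yes, no) pair of observations
conc, disc, tied = 0.901, 0.096, 0.002   # rounded shares from the table

n_total = n_yes + n_no
somers_d = conc - disc
gamma = (conc - disc) / (conc + disc)
tau_a = (conc - disc) * pairs / (n_total * (n_total - 1) / 2)
c = conc + 0.5 * tied                    # area under the ROC curve

print(pairs)
print(round(somers_d, 3), round(gamma, 3), round(tau_a, 3), round(c, 3))
```

A c value near 0.9 means that in about 90% of the yes/no pairs the customer who subscribed received the higher predicted probability.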
6. Use the parameters to predict whether a customer will subscribe to a term deposit
for the new data bank-full.csv (hint: use proc score).
Answer: From the predicted data after applying the score step in SAS, we can say that 6.04% of the
customers will subscribe to a term deposit in the new bank-full data, whereas 93.96% will not.
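Scoring applies the fitted coefficients to new rows, converts the linear predictor to a probability with the logistic function, and classifies each row. The Python sketch below illustrates only the mechanics; it is not the SAS scoring step used in the report. It includes just the intercept and two of the numeric coefficients, the two rows are hypothetical, and the 0.5 cutoff is an assumption, so the output says nothing about bank-full.csv.

```python
import math

def predicted_probability(intercept, coefs, row):
    """Logistic probability of 'yes' for one row of predictor values."""
    linear = intercept + sum(b * row.get(name, 0.0) for name, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-linear))

def classify(probability, cutoff=0.5):
    """Label a row 'yes' when its probability reaches the cutoff (assumed 0.5)."""
    return "yes" if probability >= cutoff else "no"

# Partial model for illustration only: intercept plus two numeric estimates.
intercept = -2.6748
coefs = {"duration": 0.00423, "campaign": -0.0704}

# Hypothetical customers, not taken from the data.
new_rows = [
    {"duration": 100, "campaign": 2},
    {"duration": 900, "campaign": 1},
]

labels = [classify(predicted_probability(intercept, coefs, r)) for r in new_rows]
share_yes = 100.0 * labels.count("yes") / len(labels)
print(labels, f"{share_yes:.1f}% predicted yes")
```

A full scoring run would include every design variable from the Class Level Information table and every estimate from the maximum likelihood table.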
Segmentation Variables:
X1 Are you among the first to adopt new technologies? (1- Strongly disagree, 7-
strongly agree)
X2 Do you use a pager or an instant messaging service often? (1- never, 7- very
often)
X3 Do you use a cell phone often? (1- never, 7-very often)
X4 Do you use contact management tools often? (1- never, 7-very often)
X5 How often do colleagues send you time sensitive information when you are
away from office? (1- never, 7-very often)
X6 How often do you send time sensitive information when away from office?
(1- never, 7-very often)
X7 How often do you require remote informational access? (1- never, 7-very
often)
X8 How important is it for you to share information remotely with colleagues?
(1- not at all important, 7-very important)
X9 How important is it to view information on a large size high-resolution
display? (1- not at all important, 7-very important)
X10 How important is continuous email access to you? (1- not at all
important, 7-very important)
X11 How important is it for you to have permanent web access? (1- not at all
important, 7-very important)
X12 How important is it for you to use multimedia features? (1- not at all
important, 7-very important)
X13 How important is it for you to have a light communication device? (1- not at
all important, 7-very important)
X14 How much extra per month are you willing to pay to get features like instant
communication, cellular phone etc. in a PDA? (in dollars)
X15 How much are you willing to pay for a PDA with all the above mentioned
features? (in dollars)
Y1 Age
Y2 Education (1 high school, 2 some college, 3 college graduate, 4
graduate degree)
Y3 Income
Which industry do you belong to?
Y4 Construction
Y5 Emergency
Y6 Sales
Y7 Maintenance and Service
Y8 Professional
Y9 Computer
Questions:
Answer:
8. How many segments will you keep? How will you decide that?
Answer:
To find the number of clusters, we used the pseudo t-squared statistic: starting at the top of the
printed output, we look for the first relatively large value (here 39.5) and move back up one
cluster. Thus, we select 3 clusters.
1) Keep 4 clusters.
Answer:
9. Describe the different clusters based on their mean values of responses to X1-X15
and Y1-Y17.
Answer:
Cluster 1: They are moderately open to innovation. These people are in their early 40s. Their
income is around 52. They use a cell phone quite often. They use contact management tools very
often. They require remote informational access very often. It is moderately important for these
people to share information remotely. It is moderately important for them to view information
on a large high-resolution display. Multimedia features are important. Permanent web access
is important. It is not very important for them to have a light communication device. Their
education level is around some college. They are willing to pay $490.6. They have access to a
laptop and use a cell phone. They sometimes work from a remote location.
Cluster 2:
They too are moderately open to innovation. These people are in their early 40s. Their income is
around 73. They use a cell phone quite often. They do not use contact management tools very
often. They require remote informational access often, but less than the other clusters.
It is moderately important for these people to share information remotely. It is moderately
important for them to view information on a large high-resolution display. Multimedia
features are moderately important. Permanent web access is quite important. Continuous email
access is very important. It is important for them to have a light communication device.
They are willing to pay $335.8. Their education level is around some college. They
also work very often from a remote location.
Cluster 3:
They are moderately open to innovation, as the Innovator factor shows. These
people are in their mid-30s. Their income is around 58.9. They use a cell phone quite often. They
use contact management tools very often. It is moderately important for these people to share
information remotely. It is important for them to view information on a large high-resolution
display. Multimedia features are important. Permanent web access is very important. Continuous
email access is very important. It is important for them to have a light communication device.
They are willing to pay $269.7. Their education level is around some college. They often work
from a remote location. They have access to a laptop.
Cluster 4:
The low Innovator value suggests that they are not open to innovation. These people are in
their early 40s. Their income is around 51. They use a cell phone quite often. They do not use
contact management tools very often. They do not receive time-sensitive information very often.
They sometimes require remote informational access. It is moderately important for these people
to share information remotely. It is important for them to view information on a large
high-resolution display. Multimedia features are not important. Permanent web access is moderately
important. Continuous email access is moderately important. It is important for them to have a
light communication device. They are willing to pay $405. Their education level is also around
some college. They often work from a remote location. They also have access to a laptop.
11. Which cluster would you target first and which would you target next? Explain
why?
Answer: Based on the above clusters, cluster 3 is the cluster our firm should target first. These people
often work from a remote location and would require a PDA. They use a cell phone quite often. Multimedia
features and web access are important to them. They are also willing to pay a decent amount ($269.7). It
is also important to them to have a light communication device.
After cluster 3 we may target cluster 2, because the price for this cluster is relatively low and
economical and its members often work from a remote location. Cluster 2 seems the best choice after
cluster 3 in comparison to the others.
12. Repeat the cluster analysis using k-means clustering (use average distance and keep 4
segments). Repeat 4, 5, and 6 with this method.
Answer: We performed a nonhierarchical cluster analysis on the behavioral data and set the
number of clusters to 4. We also standardized the data before performing the clustering. After
clustering, we merged both data sets.
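The two steps described here, standardizing and then running a nonhierarchical clustering with 4 clusters, can be sketched in plain Python. This is an illustrative implementation of standard k-means (Lloyd's algorithm), not the SAS procedure used in the report. The data below is randomly generated and purely synthetic, and all function names are hypothetical.

```python
import random
import statistics

def standardize(rows):
    """Scale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iterations=50, seed=0):
    """Lloyd's algorithm: assign rows to the nearest center, then update the centers."""
    rng = random.Random(seed)
    centers = rng.sample(rows, k)  # start from k randomly chosen rows
    labels = [0] * len(rows)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: squared_distance(row, centers[j]))
                  for row in rows]
        new_centers = []
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append([statistics.fmean(c) for c in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:  # assignments are stable, so stop
            break
        centers = new_centers
    return labels, centers

# Synthetic stand-in for survey responses: two 1-7 ratings and a dollar amount.
rng = random.Random(1)
data = [[rng.uniform(1, 7), rng.uniform(1, 7), rng.uniform(100, 600)] for _ in range(60)]
scaled = standardize(data)
labels, centers = kmeans(scaled, k=4)
print(sorted(set(labels)))
```

Standardizing first keeps the dollar-valued variables from dominating the distance calculation over the 1-7 ratings.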
Clusters:
Cluster 1 has a low price and an early-40s age group, with an income that gives the cluster high
purchasing power. The label can be low price and high purchasing power; its members also prefer
to work remotely.
Cluster 2 also has a low price, with a mid-30s age group. This cluster is ideal to target, as its
income is also good. The label can be low price and middle-income group.
Cluster 3 has a high price and an early-40s age group. This group seems to have good
purchasing power with an average income. The label can be high price and average-income
group.
Cluster 4 has a high price and a late-30s age group. This cluster has really high purchasing power,
which implies it will also be a good segment to target. The label can be high income and high price.
Based on the above clusters, cluster 2 is the cluster that our firm should target. These
people often work from a remote location (5.3) and would require a PDA. Their income is
average, but they have the purchasing power to buy the product. They use a cell phone
quite often. Multimedia features and web access are important to them. Their income is
around 58.3, with a price of approximately 268.22. It is also important to them to have a light
communication device.