October 2015
Name of Instructor:
Seema CHOKSHI
INSTRUCTIONS TO CANDIDATES
Marks Awarded
Section I
Section II
TOTAL
Section-I (60 marks) [Estimated time: 80 Minutes]
Question 1) 6 Marks
What are the 3 classes of Analytics based on Method and Purpose of the task?
1. Descriptive
2. Predictive
3. Prescriptive
Question 2) 4 Marks
The list given below shows some of the analytics tasks that an organization performs on a daily
basis. Your task is to categorize them into one of the 3 types of Analytics classes based on
classification of Analytics on Method and Purpose as identified by the question above.
a) The operations team creates a daily dashboard of the number of calls handled by the
representatives.
b) The marketing manager clusters the customers into groups based on the demographic
profile of the customers in order to better understand the customer portfolio.
c) The sales team creates a test plan to send out multiple offers to the customers in order to
test different price points and find out which one works best for which customer.
d) The credit risk team from the loan division develops a model to predict the future risk of
non-payment by the customers.
[Provide your answers on the next page]
a) descriptive
b) descriptive
c) prescriptive
d) predictive
Question 3)
Answer a few questions related to the dataset on details of the voters in the US presidential
elections. You can see a few records from the dataset to understand the variables included in the
dataset and answer the questions accordingly.
Question 3A) 6 Marks
Identify the data types of the variables below as one of the 4:
Numerical Continuous, Numerical Discrete, Ordinal, Nominal
1. Age
2. Gender
3. State
4. Children
5. Salary
6. Opinion
1. Numerical Discrete
2. Nominal
3. Nominal
4. Numerical Discrete
5. Numerical Continuous
6. Ordinal
Question 3B) 4 Marks
What is meant by Binning or discretising a numerical variable? Give an example of binning
choosing the appropriate variable from the dataset given above.
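Binning (or discretizing) means converting a numerical variable into a categorical one by
grouping its values into a set of discrete ranges, called bins.
Example: Age can be binned into groups such as 18-29, 30-44, 45-59 and 60 or above.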
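A minimal sketch of binning the Age variable is given below. The bin edges, the labels and the sample ages are assumptions chosen only for illustration; they do not come from the dataset in the question.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels (assumptions, not taken from the exam dataset).
# An age lands in the bin whose lower edge is the largest edge not exceeding it.
AGE_EDGES = [30, 45, 60]
AGE_LABELS = ["18-29", "30-44", "45-59", "60+"]

def bin_age(age):
    """Return the label of the bin that a numeric age falls into."""
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]

# Hypothetical sample ages.
ages = [22, 30, 47, 61]
binned = [bin_age(a) for a in ages]
print(binned)
```

After binning, Age is no longer a numerical variable: the labels form an ordered set of categories, so the binned variable is ordinal.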
Question 4) 10 Marks
Mr. & Mrs. Chang lead a middle-class family life with 5 kids. As the family's income is limited,
they plan ahead every year. Mr. Chang has asked his smartest son, Sam, to suggest
what amount (on average) should be set aside every month as pocket money for each of the
kids.
Sam is doing a 2nd Major in analytics, so he collects data on last month’s expenses by all
the kids in the house, which includes Kim's expenses for the yearly field trip. Zina is
reluctant to disclose her expenditure to Sam (it’s a usual case of sibling rivalry).
From the table below and the options given, pick the value that should be
suggested to Mr. Chang. Explain how you got it.
1. $286
2. $144
3. $358
4. $107.5
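1. Remove the outlier (Kim's expenses, which include the one-off yearly field trip) and the
missing value (Zina's undisclosed expenditure) from the data, then take the average of the
remaining values.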
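The reasoning can be sketched in a few lines. The expense table is not reproduced in this copy of the paper, so every amount below is a hypothetical placeholder, as are the names "Kid 4" and "Kid 5"; the sketch shows the method only and does not indicate which option is correct.

```python
from statistics import mean

# Hypothetical expenses for last month (assumed values, not the exam's table).
# None marks a value that was not disclosed.
expenses = {
    "Sam": 100.0,
    "Kid 4": 120.0,   # hypothetical name
    "Kid 5": 110.0,   # hypothetical name
    "Kim": 900.0,     # inflated by the one-off yearly field trip
    "Zina": None,     # not disclosed
}

# Kim's value is treated as an outlier because the question says it includes
# the yearly field trip, which is not a typical monthly expense.
known_outliers = {"Kim"}

def typical_monthly_amount(expenses, known_outliers):
    """Average the expenses after dropping missing values and known outliers."""
    usable = [
        amount
        for kid, amount in expenses.items()
        if amount is not None and kid not in known_outliers
    ]
    return mean(usable)

suggested = typical_monthly_amount(expenses, known_outliers)
print(suggested)
```

Leaving Kim's value in would pull the average well above what a typical month costs, and treating Zina's missing value as zero would pull it down; dropping both avoids either distortion.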
Question 5) 15 Marks
Below are some details about retail outlets in different locations in Singapore. Using the K-Means
algorithm with K = 2, split the stores into 2 groups. You only need to do 2 iterations and assign
stores to the nearest cluster. All the variables are equally important. Take Orchard and Boon
Keng as the initial seeds for cluster centers C1 and C2. Report the centroid values after the 2nd
iteration.
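The procedure asked for here (assign each store to the nearest center, recompute the centers, repeat) can be sketched as follows. The store table is not reproduced in this copy, so the two variables and all values below are hypothetical, as are the names "Store A" and "Store B"; only Orchard and Boon Keng as seeds come from the question. The sketch also assumes the variables are already on comparable scales, which is one way to read "equally important".

```python
import math

def distance(a, b):
    """Euclidean distance between two points with the same number of variables."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, centroids):
    """Map each store name to the index of its nearest centroid."""
    return {
        name: min(range(len(centroids)), key=lambda k: distance(p, centroids[k]))
        for name, p in points.items()
    }

def update(points, labels, centroids):
    """Recompute each centroid as the mean of its members (kept as-is if empty)."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [points[n] for n, label in labels.items() if label == k]
        if members:
            new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids

def kmeans(points, seeds, iterations=2):
    """Run a fixed number of assign/update iterations starting from named seeds."""
    centroids = [points[s] for s in seeds]
    labels = {}
    for _ in range(iterations):
        labels = assign(points, centroids)
        centroids = update(points, labels, centroids)
    return labels, centroids

# Hypothetical store data with two made-up variables per store.
stores = {
    "Orchard": (10.0, 10.0),
    "Boon Keng": (2.0, 2.0),
    "Store A": (8.0, 9.0),
    "Store B": (1.0, 3.0),
}

labels, centroids = kmeans(stores, seeds=["Orchard", "Boon Keng"], iterations=2)
print(labels)
print(centroids)
```

With real data the variables would normally be standardized first, otherwise the variable with the largest numeric range dominates the distance calculation.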
Question 6)
97 countries were clustered into 3 groups on the basis of 3 variables namely birth rate, death
rate and gross national product (GNP). Study the results shared below and answer the
questions that follow.
2. cluster-2
Question 6B) 5 Marks
The clusters created are differentiated based on the variables used in the cluster analysis. Identify
the variable for which the largest proportion of its variance can be explained by the cluster
formations. Which measure indicates this value?
1. stnd_gnp
2. R-Square or RSQ/(1-RSQ)
Further use the information below and comment on the profiles of 3 clusters. You can either use
z-scores or general comparison of population and cluster means to come up with possible profiles
for these 3 clusters.
Cluster-1 (developing countries) profile:
Variable   Cluster mean   Pop. mean   Pop. std   z-score
BR         26.538         29.229      13.546     -0.19866
DR         8.508          10.836      4.647      -0.50097
GNP        2514.71        5741.25     8093.68    -0.39865
BR: nearly equal to the population mean
DR: less than the population mean
GNP: less than the population mean
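The z-scores in the cluster-1 profile can be reproduced from the cluster mean, population mean and population standard deviation reported there, using z = (cluster mean - population mean) / population standard deviation. The sketch below uses only those reported figures; the function and variable names are illustrative.

```python
# (cluster mean, population mean, population std) as reported for cluster-1.
cluster_1 = {
    "BR": (26.538, 29.229, 13.546),
    "DR": (8.508, 10.836, 4.647),
    "GNP": (2514.71, 5741.25, 8093.68),
}

def z_score(cluster_mean, pop_mean, pop_std):
    """Distance of the cluster mean from the population mean, in std units."""
    return (cluster_mean - pop_mean) / pop_std

z_scores = {name: z_score(*values) for name, values in cluster_1.items()}

for name, z in z_scores.items():
    print(f"{name}: {z:.5f}")
```

A z-score close to zero means the cluster is near the population average on that variable; a clearly negative or positive value marks the variable as a distinguishing feature of the cluster.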
- End of Section I -
Section-II (30 marks) [Estimated time: 40 Minutes]
Question 7) 5 Marks
1. detection of mistakes
2. checking of assumptions
3. preliminary selection of appropriate models
4. determining relationships among the explanatory variables, and
5. assessing the direction and rough size of relationships between explanatory and outcome
variables.
Question 8) 5 Marks
The goal is that points in the same cluster have a small distance from one another, while points in
different clusters are at a large distance from one another.
Question 9) 5 Marks
How do you decide which variables in the data are suitable for clustering?
1. Variables used in clustering should be relevant to the business reason for clustering in the
case at hand.
Example: to cluster retail stores by type of sales, cluster on sales per product
category.
2. The variables chosen should not have missing values for many observations.
Question 10)
What are the steps in data preparation for k means clustering? For each step mentioned,
supplement your answer with the reason for performing it.
Question 11) 10 Marks
What are the 2 methods of profiling clusters? Give a brief explanation of each method.
1. Z-Score method
2. Centroid method
- End of Section II -
- End of Paper -