
AY2015-16 Term 2 Examinations

October 2015

ANLY104 Analytics Foundations

Name of Instructor:
Seema CHOKSHI

INSTRUCTIONS TO CANDIDATES

1 The time allowed for this examination paper is 2 hours.


2 This examination paper contains a total of two (2) sections and 11 questions, and comprises
fourteen (14) pages including this instruction sheet and one page for rough work.
3 This is a CLOSED BOOK examination.
4 No external dataset is required to answer any questions.
5 You are required to answer ALL questions.

Name of Student: __________

Student Identification No: _______

Marks Awarded
Section I
Section II
TOTAL

Section-I (60 marks) [Estimated time: 80 Minutes]

Question 1) 6 Marks
What are the 3 classes of Analytics based on Method and Purpose of the task?

1. Descriptive
2. Predictive
3. Prescriptive

Question 2) 4 Marks
The list below shows some of the analytics tasks that an organization performs on a daily
basis. Your task is to categorize each of them into one of the 3 classes of Analytics (based on
Method and Purpose) identified in the question above.

a) Operations team creates a daily dashboard of the numbers of calls handled by the
representatives.

b) The marketing manager clusters the customers into groups based on the demographic
profile of the customers in order to better understand the customer portfolio.

c) The sales team creates a test plan to send out multiple offers to the customers in order to
test different price points and find out which one works best for which customer.

d) The Credit risk team from the loan division develops a model to predict the future risk of
non-payment by the customers.

[Provide your answers on the next page]

a) descriptive
b) descriptive
c) prescriptive
d) predictive

Question 3)
Answer a few questions related to the dataset on details of the voters in the US presidential
elections. You can see a few records from the dataset to understand the variables included in the
dataset and answer the questions accordingly.

Question 3A) 6 Marks
Identify the data types of the variables below as one of the 4:
Numerical Continuous, Numerical Discrete, Ordinal, Nominal

1. Age
2. Gender
3. State
4. Children
5. Salary
6. Opinion

1. Numerical Discrete
2. Nominal
3. Nominal
4. Numerical Discrete
5. Numerical Continuous
6. Ordinal

Question 3B) 4 Marks
What is meant by Binning or discretising a numerical variable? Give an example of binning
choosing the appropriate variable from the dataset given above.

Categorizing a numerical variable by putting the data into discrete categories (called bins)
is called binning or discretizing.

[Refer to lecture notes for details…]
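
As an illustration of binning, the sketch below discretizes the Age variable into age groups. The bin edges, the labels, and the sample ages are assumptions made up for this example; the exam does not prescribe them.

```python
from bisect import bisect_right

# Hypothetical bin boundaries and labels (not taken from the exam).
# A value equal to an edge falls into the bin that starts at that edge.
EDGES = [30, 45, 60]
LABELS = ["under 30", "30-44", "45-59", "60 and over"]

def bin_age(age):
    """Return the label of the bin that contains `age`."""
    return LABELS[bisect_right(EDGES, age)]

ages = [22, 30, 44, 45, 67]          # made-up ages for illustration
binned = [bin_age(a) for a in ages]
print(binned)
```

After binning, Age is treated as an ordinal variable with four categories instead of a numerical one.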

Question 3C) 5 Marks


What appropriate analysis technique (graphical or non-graphical) will you use to analyse the
distribution of
1. Income
2. Gender

1. Income: a graphical technique such as a histogram

2. Gender: a non-graphical technique such as a summary (frequency) table,
OR a graphical technique such as a bar graph

Question 4) 10 Marks

Mr. & Mrs. Chang lead a middle-class family life with 5 kids. As the family's income is limited,
they plan ahead every year. Mr. Chang has asked his smartest son, Sam, to suggest what
amount (on average) should be set aside every month as pocket money for each of the
kids.

Sam is doing a 2nd Major in analytics, so he collects data on last month's expenses for all
the kids in the house. The data includes Kim's expenses towards the yearly field trip. Zina is
reluctant to disclose her expenditure to Sam (it's a usual case of sibling rivalry).

The data is in the table below. From the options given, pick the value that should be
suggested to Mr. Chang. Explain how you got it.

Name    Expense (in $)    Gender
Ron     100               M
Eddy    180               M
Kim     1000              F
Sam     150               M
Zina    (missing)         F

1. $286
2. $144
3. $358
4. $107.5

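1. Remove the outlier (Kim's $1000, which includes the yearly field trip) and the missing
value (Zina) from the data.
2. Average the remaining expenses.

Average estimate per kid = (100 + 180 + 150) / 3
                         = 430 / 3
                         ≈ $143.33

The closest option is $144 (option 2).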
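
The arithmetic can be checked with a short sketch. The variable names are illustrative, and treating Kim's value as the outlier follows the question's hint about the yearly field trip rather than any formal outlier test.

```python
from statistics import mean

# Expenses from the table; None marks Zina's undisclosed value.
expenses = {"Ron": 100, "Eddy": 180, "Kim": 1000, "Sam": 150, "Zina": None}

# Kim's figure includes a one-off yearly field trip, so it is treated as an outlier.
outliers = {"Kim"}
usable = [v for name, v in expenses.items()
          if v is not None and name not in outliers]

average = mean(usable)
print(round(average, 2))   # 143.33

# How the other options arise (each mishandles the outlier or the missing value):
known = [v for v in expenses.values() if v is not None]
print(mean(known))               # 357.5: outlier kept, missing value dropped
print(sum(known) / 5)            # 286.0: missing value counted as 0
print(sum(usable) / 4)           # 107.5: outlier removed but still divided by 4
```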

Question 5) 15 Marks

Below are some details about retail outlets in different locations in Singapore. Using the K-Means
algorithm with K = 2, split the stores into 2 groups. You only need to do 2 iterations, assigning
each store to the nearest cluster. All the variables are equally important. Take Orchard and Boon
Keng as the initial seeds for cluster centers C1 and C2 respectively. Report the centroid values
after the 2nd iteration.

Outlet        Sales    Size    ProfitMargin
Brashbasah    5.1      3.5     0.2
Jurong        6.3      2.5     1.5
BoonKeng      5.4      3.9     0.4
DhobyGhaut    5.3      3.7     0.2
Orchard       6.7      3       1.7

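Cluster 1 (seed: Orchard): Jurong and Orchard

Cluster 2 (seed: Boon Keng): Brashbasah, BoonKeng and DhobyGhaut

Final cluster centers (unchanged between the 1st and 2nd iteration):

Center    Sales    Size    ProfitMargin
C1        6.5      2.75    1.6
C2        5.27     3.7     0.27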
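
The two iterations can be reproduced with the following minimal sketch. It assumes plain Euclidean distance on the raw (unstandardized) values, which is one reading of "all the variables are equally important". The helper names `assign` and `update` are illustrative. Here C1 is the cluster seeded with Orchard and C2 the cluster seeded with Boon Keng.

```python
from math import dist

stores = {
    "Brashbasah": (5.1, 3.5, 0.2),
    "Jurong":     (6.3, 2.5, 1.5),
    "BoonKeng":   (5.4, 3.9, 0.4),
    "DhobyGhaut": (5.3, 3.7, 0.2),
    "Orchard":    (6.7, 3.0, 1.7),
}

def assign(centers):
    """Map each store to the index of its nearest center (Euclidean distance)."""
    return {name: min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            for name, p in stores.items()}

def update(labels, k):
    """Recompute each center as the mean of the stores assigned to it."""
    centers = []
    for i in range(k):
        members = [stores[n] for n, c in labels.items() if c == i]
        centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return centers

centers = [stores["Orchard"], stores["BoonKeng"]]   # initial seeds C1, C2
for _ in range(2):                                  # two iterations, as asked
    labels = assign(centers)
    centers = update(labels, 2)

for i, c in enumerate(centers, start=1):
    print(f"C{i}", [round(x, 2) for x in c])
```

The assignments do not change between the first and second iteration, so the centroids after iteration 2 equal those after iteration 1.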

Question 6)

97 countries were clustered into 3 groups on the basis of 3 variables namely birth rate, death
rate and gross national product (GNP). Study the results shared below and answer the
questions that follow.

Question 6A) 5 Marks


Which measure helps you decide on the homogeneity of the clusters? Which is the least
homogeneous cluster out of the 3 clusters?

1. RMS standard deviation

2. cluster-2

Question 6B) 5 Marks
The clusters created are differentiated based on the variables used in the cluster analysis. Identify
the variable for which the largest proportion of its variance can be explained by the cluster
formations. Which measure indicates this value?

1. stnd_gnp
2. R-Square or RSQ/(1-RSQ)

[Refer to lecture notes for details…]

Question 6C) 10 Marks

Further use the information below and comment on the profiles of 3 clusters. You can either use
z-scores or general comparison of population and cluster means to come up with possible profiles
for these 3 clusters.

Cluster 1: developing countries

Variable    Cluster mean    Pop. mean    Pop. std    z-score
BR          26.538          29.229       13.546      -0.19866
DR          8.508           10.836       4.647       -0.50097
GNP         2514.71         5741.25      8093.68     -0.39865

Profile:
BR nearly equal to the population mean
DR lower than the population mean
Gross National Product lower than the population mean

Cluster 2: underdeveloped countries

Variable    Cluster mean    Pop. mean    Pop. std    z-score
BR          44.596          29.229       13.546      1.134431
DR          16.667          10.836       4.647       1.254788
GNP         601.46          5741.25      8093.68     -0.63504

Profile:
BR very high compared to the population mean
DR higher than the population mean
Gross National Product very low compared to the population mean

Cluster 3: developed nations

Variable    Cluster mean    Pop. mean    Pop. std    z-score
BR          14.31           29.229       13.546      -1.10136
DR          8.375           10.836       4.647       -0.52959
GNP         19682.7         5741.25      8093.68     1.722511

Profile:
BR lower than the population mean
DR lower than the population mean
Gross National Product very high compared to the population mean
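
The z-scores in these tables follow from z = (cluster mean - population mean) / population standard deviation. A minimal sketch that recomputes them from the reported means and standard deviations (the dictionary layout and function name are illustrative):

```python
# Population mean and standard deviation per variable, as reported above.
population = {
    "BR":  (29.229, 13.546),
    "DR":  (10.836, 4.647),
    "GNP": (5741.25, 8093.68),
}

# Cluster means per variable, as reported above.
cluster_means = {
    1: {"BR": 26.538, "DR": 8.508,  "GNP": 2514.71},
    2: {"BR": 44.596, "DR": 16.667, "GNP": 601.46},
    3: {"BR": 14.31,  "DR": 8.375,  "GNP": 19682.7},
}

def z_score(cluster_mean, pop_mean, pop_std):
    """Distance of a cluster mean from the population mean, in std units."""
    return (cluster_mean - pop_mean) / pop_std

z = {c: {var: z_score(m, *population[var]) for var, m in means.items()}
     for c, means in cluster_means.items()}

for c, row in z.items():
    print(c, {var: round(score, 3) for var, score in row.items()})
```

A z-score near 0 means the cluster is close to the population average on that variable; a z-score beyond roughly ±1 marks a variable that strongly distinguishes the cluster.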

- End of Section I -

Section-II (30 marks) [Estimated time: 40 Minutes]

Question 7) 5 Marks

What are the main reasons for performing Exploratory Data Analysis?

1. Detection of mistakes
2. Checking of assumptions
3. Preliminary selection of appropriate models
4. Determining relationships among the explanatory variables
5. Assessing the direction and rough size of relationships between explanatory and outcome
variables

Question 8) 5 Marks

What are the 2 main objectives of cluster analysis?

1. Points in the same cluster should be at a small distance from one another.
2. Points in different clusters should be at a large distance from one another.

Question 9) 5 Marks

How do you decide which variables in the data are suitable for clustering?

1. Variables used in clustering should be relevant to the business reason for clustering.
Example: to cluster retail stores by type of sales, cluster on sales per product
category.
2. The variables chosen should not be missing for many observations.

Question 10) 5 Marks

What are the steps in data preparation for k means clustering? For each step mentioned,
supplement your answer with the reason for performing it.

1. Decide the clustering variables.
2. Variable transformations: check for outliers, and for very skewed data use transformations
such as log or square root to reduce the skewness of the variables.
3. Standardize the selected variables if their ranges differ across the clustering
variables.
4. Decide the weights and multiply each standardized variable by its weight. The final clusters will
be more differentiated on the variables with higher weights.
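
Steps 2 to 4 can be sketched as follows. The data values and weights are hypothetical, invented purely for illustration, and the use of the population standard deviation (`pstdev`) for standardizing is an assumption; the sample standard deviation would work equally well.

```python
from math import log
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical clustering variables: sales is strongly right-skewed, size is not.
sales = [120.0, 150.0, 135.0, 4000.0]
size = [3.5, 2.5, 3.9, 3.0]

# Hypothetical weights: sales should matter twice as much as size.
weights = {"sales": 2.0, "size": 1.0}

# Step 2: a log transform pulls in the long right tail of sales.
log_sales = [log(v) for v in sales]

# Steps 3 and 4: standardize so ranges are comparable, then apply the weights.
prepared = {
    "sales": [weights["sales"] * z for z in standardize(log_sales)],
    "size":  [weights["size"] * z for z in standardize(size)],
}

for name, column in prepared.items():
    print(name, [round(v, 2) for v in column])
```

After weighting, the sales column has a standard deviation of 2 and the size column a standard deviation of 1, so differences in sales contribute more to the distances that k-means minimizes.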

Question 11) 10 Marks

What are the 2 methods of profiling clusters? Give a brief explanation of each method.

1. Z-Score method
2. Centroid method

[Refer to lecture notes for details…]

- End of Section II -

- Space for Rough Work -

- End of Paper -
