Mid-Term Paper - Solution

AY2015-16 Term 2 Examinations
October 2015
ANLY104 Analytics Foundations
Name of Instructor:
Seema CHOKSHI
INSTRUCTIONS TO CANDIDATES
1 The time allowed for this examination paper is 2 hours.

2 This examination paper contains two (2) sections and 11 questions, and comprises
fourteen (14) pages including this instruction sheet and one page for rough work.
3 This is a CLOSED BOOK examination.
4 No external dataset is required to answer any questions.
5 You are required to answer ALL questions.
Name of Student: __________
Student Identification No: _______
Marks Awarded
Section I
Section II
TOTAL
Section-I (60 marks) [Estimated time: 80 Minutes]
Question 1) 6 Marks
What are the 3 classes of Analytics based on Method and Purpose of the task?
1. Descriptive
2. Predictive
3. Prescriptive
Question 2) 4 Marks
The list given below shows some of the analytics tasks that an organization performs on a daily
basis. Your task is to categorize them into one of the 3 types of Analytics classes based on
classification of Analytics on Method and Purpose as identified by the question above.
a) The operations team creates a daily dashboard of the number of calls handled by the
representatives.
b) The marketing manager clusters the customers into groups based on the demographic
profile of the customers in order to better understand the customer portfolio.
c) The sales team creates a test plan to send out multiple offers to the customers in order to
test different price points and find out which one works best for which customer.
d) The credit risk team from the loan division develops a model to predict the future risk of
non-payment by the customers.
[Provide your answers on the next page]
a) Descriptive
b) Descriptive
c) Prescriptive
d) Predictive
Question 3)
Answer a few questions related to the dataset on details of the voters in the US presidential
elections. You can see a few records from the dataset to understand the variables included in the
dataset and answer the questions accordingly.
Question 3A) 6 Marks
Identify the data types of the variables below as one of the 4:
Numerical Continuous, Numerical Discrete, Ordinal, Nominal
1. Age
2. Gender
3. State
4. Children
5. Salary
6. Opinion
1. Numerical Discrete
2. Nominal
3. Nominal
4. Numerical Discrete
5. Numerical Continuous
6. Ordinal
Question 3B) 4 Marks
What is meant by Binning or discretising a numerical variable? Give an example of binning
choosing the appropriate variable from the dataset given above.
Categorizing a numerical variable by putting the data into discrete categories (called bins)
is called binning or discretizing.
[Refer to lecture notes for details…]
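As an illustration of binning, here is a minimal sketch that groups the Age variable into age bands. The ages, bin edges and labels are hypothetical (the dataset sample is not reproduced here), and `bin_age` is a helper name introduced only for this example.

```python
# Hypothetical ages; the exam's dataset sample is not reproduced here.
ages = [19, 23, 35, 42, 58, 67, 71]

# Assumed bin edges and labels, chosen only for illustration.
# Each bin covers lower <= age < upper.
bins = [(18, 30, "18-29"), (30, 45, "30-44"), (45, 65, "45-64"), (65, 200, "65+")]

def bin_age(age):
    """Return the label of the bin that contains `age`, or None if no bin matches."""
    for lower, upper, label in bins:
        if lower <= age < upper:
            return label
    return None

# The numerical variable Age becomes an ordinal variable with four categories.
age_groups = [bin_age(a) for a in ages]
print(age_groups)
```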
Question 3C) 5 Marks

What appropriate analysis technique (graphical or non-graphical) will you use to analyse the
distribution of
1. Income
2. Gender
1. Graphical, such as a histogram
2. Non-graphical, such as a summary table, OR graphical, such as a bar graph
Question 4) 10 Marks
Mr. & Mrs. Chang lead a middle-class family life with 5 kids. As the family’s income is limited,
they plan ahead every year. Mr. Chang has asked his smartest son, Sam, to suggest
what amount (on average) should be set aside every month as pocket money for each of the
kids.
Sam is doing a 2nd Major in analytics, so he collects data on last month’s expenses by all
the kids in the house, which includes Kim’s expenses towards the yearly field trip. Zina is
reluctant to disclose her expenditure to Sam (it’s a usual case of sibling rivalry).
The table is given below. From the options given, pick the value that should be
suggested to Mr. Chang. Explain how you got it.
Name   Expense (in $)   Gender
Ron    100              M
Eddy   180              M
Kim    1000             F
Sam    150              M
Zina   (missing)        F
1. $286
2. $144
3. $358
4. $107.5
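Option 2 ($144). Remove the outlier (Kim's $1000, which includes the yearly field trip) and the
missing value (Zina) from the data.
Average estimate per kid = (100 + 180 + 150) / 3
= 430 / 3 ≈ $143.33, which is closest to option 2 ($144)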
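The arithmetic can be checked with a short sketch. Treating Kim's figure as an outlier is an assumption taken from the question text (the yearly field trip), and the variable names are introduced only for this example.

```python
# Expenses from the table; None marks Zina's undisclosed value.
expenses = {"Ron": 100, "Eddy": 180, "Kim": 1000, "Sam": 150, "Zina": None}

# Assumption taken from the question text: Kim's figure includes a yearly
# field trip, so it is treated as a one-off outlier.
outliers = {"Kim"}

reported = [v for v in expenses.values() if v is not None]
usable = [v for name, v in expenses.items() if v is not None and name not in outliers]

average = sum(usable) / len(usable)            # outlier and missing value removed
with_outlier = sum(reported) / len(reported)   # missing value removed only
naive_zero = sum(reported) / len(expenses)     # missing value counted as 0

print(round(average, 2), with_outlier, naive_zero)
```

The two alternative averages show where the distractors come from: keeping the outlier gives 357.5 (option 3), and additionally counting the missing value as zero gives 286 (option 1).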
Question 5)
Below are some details about retail outlets in different locations in Singapore. Using the K-Means
algorithm with K = 2, split the stores into 2 groups. You only need to do 2 iterations and assign
stores to the nearest cluster. All the variables are equally important. Take Orchard and Boon
Keng as the initial seeds for cluster centers C1 and C2 respectively. Report the centroid values
after the 2nd iteration.
Outlet       Sales   Size   ProfitMargin
Brashbasah   5.1     3.5    0.2
Jurong       6.3     2.5    1.5
BoonKeng     5.4     3.9    0.4
DhobyGhaut   5.3     3.7    0.2
Orchard      6.7     3      1.7
Cluster 1: Jurong and Orchard
Cluster 2: Brashbasah, Boon Keng and Dhoby Ghaut
Final cluster centers:
      Sales   Size   ProfitMargin
C1    6.5     2.75   1.6
C2    5.27    3.7    0.27
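The two iterations can be reproduced with a minimal sketch in plain Python. It assumes Euclidean distance on the raw (unstandardized) values with equal weights, as the question implies. The helpers `nearest` and `mean_point` are names introduced for this example, and the sketch does not handle empty clusters.

```python
import math

# Outlet data from the table: (Sales, Size, ProfitMargin).
outlets = {
    "Brashbasah": (5.1, 3.5, 0.2),
    "Jurong": (6.3, 2.5, 1.5),
    "BoonKeng": (5.4, 3.9, 0.4),
    "DhobyGhaut": (5.3, 3.7, 0.2),
    "Orchard": (6.7, 3.0, 1.7),
}

def nearest(point, centroids):
    """Index of the centroid closest to `point` by Euclidean distance."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def mean_point(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))

# Initial seeds: index 0 is C1 (Orchard), index 1 is C2 (Boon Keng).
centroids = [outlets["Orchard"], outlets["BoonKeng"]]

for _ in range(2):
    # Assignment step: each outlet goes to its nearest centroid.
    assignment = {name: nearest(p, centroids) for name, p in outlets.items()}
    # Update step: each centroid becomes the mean of its members.
    centroids = [
        mean_point([outlets[n] for n, c in assignment.items() if c == k])
        for k in range(len(centroids))
    ]

print(assignment)
print([tuple(round(x, 2) for x in c) for c in centroids])
```

The assignments do not change between the first and second iteration, so the centroids after the 2nd iteration equal those after the 1st.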
Question 6)
97 countries were clustered into 3 groups on the basis of 3 variables namely birth rate, death
rate and gross national product (GNP). Study the results shared below and answer the
questions that follow.
Question 6A) 5 Marks

Which measure helps you decide on the homogeneity of the clusters? Which is the least
homogeneous cluster out of the 3 clusters?
1. RMS standard deviation
2. Cluster 2
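A minimal sketch of how an RMS standard deviation can be computed for one cluster follows. It assumes the definition "square root of the average per-variable sample variance within the cluster"; the exact formula used by the software that produced the exam output is not shown in the paper. The two small clusters are hypothetical data and `rms_std` is a name introduced for this example.

```python
import math
import statistics

def rms_std(cluster):
    """Square root of the mean per-variable sample variance within one cluster."""
    variances = [statistics.variance(col) for col in zip(*cluster)]
    return math.sqrt(sum(variances) / len(variances))

# Hypothetical clusters with two variables each.
tight = [(1.0, 2.0), (1.2, 2.1), (0.8, 1.9)]     # members close together
loose = [(1.0, 2.0), (3.0, 5.0), (-1.0, -1.0)]   # members spread out

# A smaller value indicates a more homogeneous cluster.
print(rms_std(tight), rms_std(loose))
```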
Question 6B) 5 Marks
The clusters created are differentiated based on the variables used in the cluster analysis. Identify
the variable for which the largest proportion of its variance can be explained by the cluster
formations. Which measure indicates this value?
1. stnd_gnp
2. R-Square, or RSQ/(1-RSQ)
[Refer to lecture notes for details…]
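As a sketch of what R-Square measures for a single variable, the function below computes 1 minus the ratio of within-cluster to total sum of squares. RSQ/(1-RSQ) is then the ratio of between-cluster to within-cluster variation. The values and labels are hypothetical, and `r_squared` is a name introduced for this example.

```python
def r_squared(values, labels):
    """Share of a variable's total variation explained by the cluster labels."""
    overall_mean = sum(values) / len(values)
    total_ss = sum((v - overall_mean) ** 2 for v in values)
    within_ss = 0.0
    for label in set(labels):
        members = [v for v, l in zip(values, labels) if l == label]
        cluster_mean = sum(members) / len(members)
        within_ss += sum((v - cluster_mean) ** 2 for v in members)
    return 1 - within_ss / total_ss

labels = [0, 0, 1, 1]
separated = [1, 2, 9, 10]   # hypothetical variable that differs strongly by cluster
mixed = [1, 9, 2, 10]       # hypothetical variable that barely differs by cluster

rsq = r_squared(separated, labels)
print(rsq, rsq / (1 - rsq))
print(r_squared(mixed, labels))
```

The variable with the highest R-Square is the one whose variance is best explained by the cluster formation.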
Question 6C) 10 Marks
Further use the information below and comment on the profiles of 3 clusters. You can either use
z-scores or general comparison of population and cluster means to come up with possible profiles
for these 3 clusters.
Cluster 1: developing countries
      Cluster mean   Pop mean   Pop std    z-score     Profile
BR    26.538         29.229     13.546     -0.19866    BR nearly equal to population mean
DR    8.508          10.836     4.647      -0.50097    DR lower than population mean
GNP   2514.71        5741.25    8093.68    -0.39865    GNP lower than population mean

Cluster 2: underdeveloped countries
      Cluster mean   Pop mean   Pop std    z-score     Profile
BR    44.596         29.229     13.546     1.134431    BR very high compared to population mean
DR    16.667         10.836     4.647      1.254788    DR higher than population mean
GNP   601.46         5741.25    8093.68    -0.63504    GNP very low compared to population mean

Cluster 3: developed nations
      Cluster mean   Pop mean   Pop std    z-score     Profile
BR    14.31          29.229     13.546     -1.10136    BR lower than population mean
DR    8.375          10.836     4.647      -0.52959    DR lower than population mean
GNP   19682.7        5741.25    8093.68    1.722511    GNP very high compared to population mean
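The z-scores in the answer follow from z = (cluster mean - population mean) / population standard deviation. A minimal sketch that recomputes them from the figures above (the dictionary layout is chosen only for this example):

```python
# Population mean and standard deviation per variable, from the answer tables.
population = {
    "BR": (29.229, 13.546),
    "DR": (10.836, 4.647),
    "GNP": (5741.25, 8093.68),
}

# Cluster means per variable, from the answer tables.
cluster_means = {
    "Cluster 1": {"BR": 26.538, "DR": 8.508, "GNP": 2514.71},
    "Cluster 2": {"BR": 44.596, "DR": 16.667, "GNP": 601.46},
    "Cluster 3": {"BR": 14.31, "DR": 8.375, "GNP": 19682.7},
}

# z = (cluster mean - population mean) / population standard deviation
z_scores = {
    cluster: {
        var: (mean - population[var][0]) / population[var][1]
        for var, mean in means.items()
    }
    for cluster, means in cluster_means.items()
}

for cluster, scores in z_scores.items():
    print(cluster, {var: round(z, 5) for var, z in scores.items()})
```

A z-score near 0 means the cluster is close to the population average on that variable. A large positive or negative value marks the variable that characterizes the cluster.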
- End of Section I -
Section-II (30 marks) [Estimated time: 40 Minutes]
Question 7) 5 Marks
What are the main reasons for performing Exploratory Data Analysis?
1. Detection of mistakes
2. Checking of assumptions
3. Preliminary selection of appropriate models
4. Determining relationships among the explanatory variables
5. Assessing the direction and rough size of relationships between explanatory and outcome
variables
Question 8) 5 Marks
What are the 2 main objectives of cluster analysis?
The goal is that points in the same cluster have a small distance from one another, while points in
different clusters are at a large distance from one another.
Question 9) 5 Marks
How do you decide which variables in the data are suitable for clustering?
1. Variables used in clustering should be relevant to the reason for clustering in the business case at
hand.
Example: to cluster retail stores by type of sales, cluster on sales per product
category.
2. The chosen variables should not be missing for many observations.
Question 10) 5 Marks
What are the steps in data preparation for k-means clustering? For each step mentioned,
supplement your answer with the reason for performing it.
1. Decide the clustering variables.
2. Variable transformations: check for outliers, and for very skewed data use transformations
such as log or square root to reduce the skewness of the variables.
3. Standardize the selected variables if their ranges differ across the clustering
variables.
4. Decide the weights and multiply each standardized variable by its weight. The final clusters will
be more differentiated on the variables with higher weights.
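Steps 2 to 4 can be sketched for a single variable as follows. The GNP-like values and the weight of 2.0 are hypothetical, and `standardize` is a helper introduced for this example. It uses the sample standard deviation, which is one of several common conventions.

```python
import math
import statistics

def standardize(values):
    """Convert values to z-scores using the sample mean and standard deviation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical right-skewed variable (for example, GNP per country).
gnp = [601.46, 1200.0, 2514.71, 8500.0, 19682.7]

# Step 2: a log transformation reduces the right skew.
log_gnp = [math.log(v) for v in gnp]

# Step 3: standardization puts the variable on a common scale, so that a
# large numeric range does not dominate the distance calculation.
z_gnp = standardize(log_gnp)

# Step 4: an assumed weight of 2.0 makes this variable count more in distances.
weight = 2.0
weighted_gnp = [weight * z for z in z_gnp]

print([round(z, 3) for z in weighted_gnp])
```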
Question 11)
What are the 2 methods of profiling clusters? Give a brief explanation of each method.
1. Z-Score method
2. Centroid method
[Refer to lecture notes for details…]
- End of Section II -
- Space for Rough Work -
- End of Paper -
