Documente Academic
Documente Profesional
Documente Cultură
This exam paper must not be removed from the venue Venue ____________________
School of Business
Exam Conditions:
Reading Time: Read only. Students are not permitted to write on the
examination paper
English dictionary
English thesaurus
Instructions To Students:
Total ________
Page 1 of 7
USC Melb/Syd, Semester 3, 2018 ICT706 Data Analytics
Given the Python list abc=[2, 3, 5, 7, 11, 13, 1], write the answer that will result from
evaluating each of the following expressions:
f) abc[1] 3
g) abc[-1] 1
h) abc[2:4] [5, 7]
i) len(abc) 7
j) sum(abc) 42
_______________________________________________________________________
For example, evaluating the expression data["Task2"].count() will tell us how many
students submitted Task 2.
Page 2 of 7
USC Melb/Syd, Semester 3, 2018 ICT706 Data Analytics
_______________________________________________________________________
a) df.columns
Returns a list of all the column names.
b) df.shape
Returns a pair of numbers, (r,c) where r is the number
of rows and c is the number of columns.
c) df.describe()
Displays statistics (such as count, min, max, mean,
25% and 75% quartiles) for all numeric columns in the
table.
d) df.head()
Gets the first few rows of the table (5 by default).
e) df.count()
Gets the counts of ALL columns (that is, the number of
non-NaN values).
_______________________________________________________________________
a) Explain the typical process you would follow to use machine learning to predict
the level of monthly sales for all the stores. Give a number and a title for each of
the steps that you would take, and briefly explain each step. [10 marks]
Typical Steps:
1. import sklearn functions. E.g. DecisionTreeClassifier(_)
2. load the dataset
3. split the dataset 80/20 into training and testing data
4. create a classifier object (choose your parameters, e.g. max_depth)
5. use the classifier to 'fit' the training data (= learn the model)
6. use the classifier to 'predict' outcomes for new data
7. measure the accuracy of the classifier (on the test data)
b) Sketch out some example Python code that you would use to implement the
above steps using a Decision Tree model. Use Python comments to identify each
step. [10 marks]
Sample Answer: (from Week 7 lecture slides).
# 1. import libraries.
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
Page 3 of 7
USC Melb/Syd, Semester 3, 2018 ICT706 Data Analytics
# 2. load a dataset
iris = load_iris()
# 4 create classifier
dt = DecisionTreeClassifier(max_depth=3)
# 6 predict outcomes
y_predicted = dt.predict(X_test)
# 7 measure accuracy
accuracy_score(y_test, y_predicted)
a) clustering algorithms
Unsupervised algorithm that groups instances into similar
groups. E.g. K-Means algorithm.
b) regression algorithms
Supervised algorithm that predicts numeric answers. E.g.
Linear Regression algorithm.
c) classification algorithms
Supervised algorithm that predicts a discrete 'class' for each
instance (such as Yes/No result). E.g. Decision Tree algorithm.
Page 4 of 7
USC Melb/Syd, Semester 3, 2018 ICT706 Data Analytics
_______________________________________________________________________
Supervised is when the expected outcome is known (for the training data)
Unsupervised is when we have input data only – no known answers.
_______________________________________________________________________
The following confusion matrix shows the results of applying a Decision Tree machine
learning algorithm to 1000 historical examples of customer churn from the previous
year. The columns show whether the customer did really leave ('Churn') or stay ('Not
Churn'). The rows show the prediction output from the learned Decision Tree model.
Page 5 of 7
USC Melb/Syd, Semester 3, 2018 ICT706 Data Analytics
Calculate the values of the following evaluation metrics for this model (since you do
not have a calculator, you can write them as a fraction) [2 marks each]:
ReadItNow is considering using this model as the basis a new marketing campaign to
better retain their existing customers. They will send special offers to all the people
that the model predicts Yes (that is, the customers that are in danger of 'churning'
away from ReadItNow). The cost of these discounts will average $10 per customer,
but it is expected that it will halve the churn rate, which will save on average half of
the annual subscription of each customer who is persuaded not to churn. The
following cost-benefit matrix summarises the annual costs and expected benefits of
this campaign for each group of customers.
Churn Not Churn
f) Calculate the expected income after this marketing campaign? Show your
working. [4 marks]
Answer [4 marks]:
Expected income = $80 * 400 + $110 * 200 + $60 * 100 + $120 * 300
= 32000 + 22000 + 6000 + 36000
= $96,000
The cost-benefit matrix for the next year without the marketing campaign is:
Answer [4 marks]:
Page 6 of 7
USC Melb/Syd, Semester 3, 2018 ICT706 Data Analytics
h) Would you recommend that ReadItNow goes ahead with the marketing
campaign? Explain your reason. [2 marks]
Recommendation [2 marks]:
YES. I would recommend that they go ahead with the marketing campaign
because it will increase income.
END OF EXAMINATION
Page 7 of 7