INSTRUCTIONS:
1. Candidates should answer all questions in the order in which they appear in the question paper.
2. Any activity that compromises the integrity of the examination will not be permitted.
3. Students should complete the examination within the allotted time.
4. Candidates are expected to check and ensure that the correct answer file (in .ipynb format) is uploaded
to the LMS.
Dataset Information:
The dataset contains TB prevalence, all forms (per 100,000 population per year), for different countries.
Group the countries by how similar their year-by-year situation has been, to understand the global situation
regarding tuberculosis. The cluster information is provided for reference only; remove it
before building the models.
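A minimal preprocessing sketch for the step above, in Python since answers are submitted as `.ipynb` notebooks. It assumes pandas and scikit-learn; the synthetic frame, the column names `country` and `cluster`, the year range and the 90% explained-variance threshold are all hypothetical stand-ins for the real file, which would be loaded with `pd.read_csv` instead.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the TB dataset: one row per country, one column per
# year (prevalence per 100,000), plus the reference "cluster" column.
# Column names and year range are assumptions, not taken from the real file.
rng = np.random.default_rng(0)
years = [str(y) for y in range(1990, 2008)]
df = pd.DataFrame(rng.gamma(shape=2.0, scale=100.0, size=(60, len(years))), columns=years)
df.insert(0, "country", [f"Country_{i}" for i in range(60)])
df["cluster"] = rng.integers(0, 3, size=60)

# Remove the reference cluster labels (and the identifier) before modelling.
X = df.drop(columns=["country", "cluster"])

# Standardise each year so that no single year dominates the distances.
X_scaled = StandardScaler().fit_transform(X)

# Keep enough principal components to explain at least 90% of the variance
# (the threshold is an illustrative assumption).
pca = PCA(n_components=0.90, svd_solver="full")
X_pca = pca.fit_transform(X_scaled)
print(X_pca.shape, pca.explained_variance_ratio_.sum().round(3))
```

The PCA output `X_pca`, not the raw yearly columns, is what the clustering in question 3 would then use.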
Note: State all assumptions made. If any sub-question cannot be completed, state the reason
why.
3. Clustering: Use the PCA components to cluster the data. Apply K-means and agglomerative clustering.
(30 marks)
Some pointers to help you (do not be limited by these):
a. Find the optimal value of K. (5 marks)
b. Apply clustering and check, using appropriate visualizations, whether the data points have been
clustered correctly. (20 marks)
UNSUPERVISED LEARNING
c. Evaluate the clusters using appropriate metrics to justify the model built, and compare
the two models. (5 marks)
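One way to approach pointers a to c, sketched under stated assumptions: synthetic `make_blobs` data replaces the real TB matrix, two PCA components are kept so the result can be plotted, K is searched over 2 to 8 and picked by silhouette score (inertia is kept for an elbow plot), and `y_ref` stands in for the reference cluster column that was set aside. The adjusted Rand index against `y_ref` is one way to check whether points were "clustered correctly"; it is valid only because those labels were not used for fitting. Plotting is left out for brevity; a scatter of the two components coloured by `km_labels` or `agg_labels` (for example with matplotlib's `plt.scatter`) is the usual visual check for b.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import (adjusted_rand_score, calinski_harabasz_score,
                             davies_bouldin_score, silhouette_score)
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the year-by-year TB matrix (150 countries x 18 years);
# y_ref plays the role of the reference cluster column removed before modelling.
X_raw, y_ref = make_blobs(n_samples=150, n_features=18, centers=3, random_state=42)
X_pca = PCA(n_components=2, random_state=42).fit_transform(
    StandardScaler().fit_transform(X_raw))

# a. Choose K: inertia for an elbow plot, silhouette score as a numeric criterion.
k_range = list(range(2, 9))
inertias, silhouettes = [], []
for k in k_range:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_pca)
    inertias.append(km.inertia_)
    silhouettes.append(silhouette_score(X_pca, km.labels_))
best_k = k_range[int(np.argmax(silhouettes))]

# b. Fit both models with the chosen K.
km_labels = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit_predict(X_pca)
agg_labels = AgglomerativeClustering(n_clusters=best_k, linkage="ward").fit_predict(X_pca)

# c. Internal validation metrics (no ground truth needed).
def scores(labels):
    return {
        "silhouette": silhouette_score(X_pca, labels),                # higher is better
        "calinski_harabasz": calinski_harabasz_score(X_pca, labels),  # higher is better
        "davies_bouldin": davies_bouldin_score(X_pca, labels),        # lower is better
    }

results = {"kmeans": scores(km_labels), "agglomerative": scores(agg_labels)}

# External check against the held-back reference labels, and agreement between models.
ari_km = adjusted_rand_score(y_ref, km_labels)
ari_agg = adjusted_rand_score(y_ref, agg_labels)
ari_between = adjusted_rand_score(km_labels, agg_labels)

print("best K:", best_k)
for name, metrics in results.items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})
print("ARI vs reference:", round(ari_km, 3), round(ari_agg, 3), "| between models:", round(ari_between, 3))
```

When the two models disagree on these metrics, the one that wins on most of them and also matches the visual check is a reasonable pick for the labels used in question 4.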
4. Use the cluster labels from the best method above to convert the problem into a supervised
classification problem. (15 marks)
a. Split the dataset into train and test sets (70:30). (2 marks)
b. Are both the train and test sets representative of the overall data? How would you verify this
statistically? (3 marks)
c. In a supervised machine learning problem, how will you decide when to apply
PCA, and how will you improve the accuracy of the model? Clearly state the changes
you will make before re-fitting the model, then fit the final model. You may use as many
iterations as you need to reach the final answer. Marks are awarded based on the quality of the final model
you achieve. (10 marks)
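For 4a and 4b, a sketch assuming pandas, SciPy and scikit-learn: a stratified 70:30 split, then a chi-square test on the cluster counts and a two-sample Kolmogorov-Smirnov test per feature between train and test. The data and the column names `year_0` … `year_17` and `cluster` are synthetic placeholders. A high p-value means no evidence of a difference, not proof of equality, and with 18 features about one KS test is expected to fall below 0.05 by chance, so a single flag is not alarming on its own.

```python
import pandas as pd
from scipy.stats import chi2_contingency, ks_2samp
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 18 yearly features plus the cluster label chosen in question 3.
X_arr, y_arr = make_blobs(n_samples=300, n_features=18, centers=3, random_state=7)
X = pd.DataFrame(X_arr, columns=[f"year_{i}" for i in range(18)])
y = pd.Series(y_arr, name="cluster")

# a. 70:30 split; stratify so every cluster keeps the same share in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

# b1. Label balance: chi-square test of independence on cluster counts per split.
counts = pd.DataFrame({"train": y_train.value_counts(), "test": y_test.value_counts()})
chi2, p_labels, _, _ = chi2_contingency(counts.values)

# b2. Feature distributions: two-sample Kolmogorov-Smirnov test for each column.
ks_p = {col: ks_2samp(X_train[col], X_test[col]).pvalue for col in X.columns}
n_flagged = sum(p < 0.05 for p in ks_p.values())

print(f"label test p-value: {p_labels:.3f}")
print(f"features with KS p < 0.05: {n_flagged} of {len(ks_p)}")
```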
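For 4c, one possible approach is to treat PCA as a hyperparameter: keep it only if cross-validated accuracy holds up with it, which is more likely when the yearly features are strongly correlated. The sketch below, assuming scikit-learn, puts scaling, optional PCA and the classifier in one `Pipeline`, so PCA is fitted on the training folds only and does not leak information from the validation folds. It searches a logistic regression with and without PCA, plus a random forest. `make_classification` with redundant features is a synthetic stand-in for the labelled TB data; the grids and the 95% variance threshold are illustrative, not prescribed by the question.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 18 features, many of them redundant (correlated), 3 classes.
X, y = make_classification(n_samples=300, n_features=18, n_informative=6,
                           n_redundant=8, n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# "passthrough" skips the PCA step, so with/without PCA is compared under the same CV.
param_grid = [
    {"pca": ["passthrough"], "clf": [LogisticRegression(max_iter=1000)],
     "clf__C": [0.1, 1, 10]},
    {"pca": [PCA(n_components=0.95, svd_solver="full")],
     "clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1, 10]},
    {"pca": ["passthrough"], "clf": [RandomForestClassifier(random_state=0)],
     "clf__n_estimators": [100, 200], "clf__max_depth": [None, 5]},
]
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# best_estimator_ is refit on the whole training split; the test set is used once.
final_model = search.best_estimator_
test_acc = accuracy_score(y_test, final_model.predict(X_test))
print("best params:", search.best_params_)
print(f"CV accuracy: {search.best_score_:.3f}  test accuracy: {test_acc:.3f}")
```

Each further iteration (new features, other classifiers, wider grids) can be added as another dict in `param_grid`, keeping the comparison on the same folds.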