
BSCS Honors Program

CS-402

GIFT University Gujranwala

Course: Data Mining


Resource Person: Nadeem Qaisar Mehmood
Total Points: 50
Submission Due: Saturday 8th December, 2012

(Fall 2012) ASSIGNMENT 2 (Proximity & Classification)


03-December-2012

Instructions: Please Read Carefully!


- This is a group assignment. A group may have at most 3 members.
- Each member must individually pass the viva for this assignment to receive any marks for it. The viva will be conducted after the submission of this assignment.
- Please do not copy the assignment. All copies will be awarded a straightforward ZERO. You are, however, allowed to share ideas and help each other in discussion.
- Submit this assignment as a single .zip file containing all the source files of your implementation. This zip file must be named CS402-AS02-(ROLLNUMBER1)(ROLLNUMBER2).zip and nothing else!
- The assignment is to be submitted electronically via email to nadeemqaisar@gift.edu.pk by the deadline:
  a. The subject of the email should be: CS402-AS02-(ROLLNUMBER1) (ROLLNUMBER2).
  b. Attach the zip file to the email.
  c. Keep the body of the email empty.
  d. Send a copy of your email to your other group members.
- Send this email to the above address on or before the due date and time. There will be a 25% penalty for late submissions. No assignment will be accepted after Sunday 9th December 2012. These deadlines are strict.

NOTE: You must pass the subsequent viva of this assignment to actually have any marks for this assignment.


1. For the following vectors x and y, write a program that accepts the x and y vectors and calculates the indicated similarity or distance measures. [25]
   1) x=(1,1,1,1), y=(2,2,2,2): cosine, correlation, Hamming, Euclidean
   2) x=(0,1,0,1), y=(1,0,1,0): cosine, correlation, Hamming, Euclidean, Jaccard, SMC
   3) x=(1,1,0,1,0,1), y=(1,1,1,0,0,1): cosine, correlation, Jaccard, SMC
   Note: Program only one function for each requested measure; the same function must work for every vector pair in the three sub-parts above.
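As one possible starting point for Question 1, the sketch below puts each measure in its own static method so that the same function serves all three vector pairs. The class name `ProximityMeasures` and the method names are assumptions made for illustration. Jaccard and SMC assume binary (0/1) vectors, Hamming is taken as the number of positions in which the vectors differ, and correlation is computed as the cosine of the mean-centered vectors, which yields NaN when a vector is constant (as in sub-part 1).

```java
import java.util.Arrays;

// Illustrative sketch; class and method names are assumptions, not prescribed by the assignment.
class ProximityMeasures {

    static void requireSameLength(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same length");
        }
    }

    static double dot(double[] x, double[] y) {
        requireSameLength(x, y);
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    // cos(x, y) = (x . y) / (||x|| * ||y||); NaN when either vector has zero norm.
    static double cosine(double[] x, double[] y) {
        return dot(x, y) / (Math.sqrt(dot(x, x)) * Math.sqrt(dot(y, y)));
    }

    // Pearson correlation, computed as the cosine of the mean-centered vectors.
    static double correlation(double[] x, double[] y) {
        requireSameLength(x, y);
        double meanX = Arrays.stream(x).average().orElse(Double.NaN);
        double meanY = Arrays.stream(y).average().orElse(Double.NaN);
        double[] cx = new double[x.length];
        double[] cy = new double[y.length];
        for (int i = 0; i < x.length; i++) {
            cx[i] = x[i] - meanX;
            cy[i] = y[i] - meanY;
        }
        return cosine(cx, cy);
    }

    static double euclidean(double[] x, double[] y) {
        requireSameLength(x, y);
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Number of positions in which the two vectors differ.
    static int hamming(double[] x, double[] y) {
        requireSameLength(x, y);
        int differing = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] != y[i]) differing++;
        }
        return differing;
    }

    // Jaccard = f11 / (f01 + f10 + f11) for binary vectors; NaN when both are all zeros.
    static double jaccard(double[] x, double[] y) {
        requireSameLength(x, y);
        int both = 0;
        int either = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] != 0 && y[i] != 0) both++;
            if (x[i] != 0 || y[i] != 0) either++;
        }
        return (double) both / either;
    }

    // SMC = (f11 + f00) / n: the fraction of positions where the vectors agree.
    static double smc(double[] x, double[] y) {
        return (double) (x.length - hamming(x, y)) / x.length;
    }

    public static void main(String[] args) {
        double[] x = {1, 1, 0, 1, 0, 1};
        double[] y = {1, 1, 1, 0, 0, 1};
        System.out.println("cosine      = " + cosine(x, y));
        System.out.println("correlation = " + correlation(x, y));
        System.out.println("jaccard     = " + jaccard(x, y));
        System.out.println("smc         = " + smc(x, y));
    }
}
```

A full solution would still need to read the vectors from user input, which this sketch leaves out.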

2. Find the proximity between the following document vectors [25]

In a document vector, each attribute is a component of the vector, and the value of each component is the number of times the corresponding term occurs in the document. Term vectors of this kind are given below for three documents. Write a program that computes the complete cosine-based proximity matrix for these document vectors.

Term      Document 1   Document 2   Document 3
timeout   3            0            0
season    0            7            1
coach     5            0            0
game      0            2            0
score     2            1            1
team      6            0            2
ball      0            0            2
lost      2            3            0
play      0            0            3
win       2            0            0

Note: For details about cosine-based proximity measurement, please refer to Chapter 2 of Tan's book, page 75. You can reuse the cosine function programmed for Question 1.
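The proximity matrix for Question 2 can be sketched as follows. The class name `CosineProximityMatrix` and its members are assumptions for illustration. Each row of `DOCS` holds the ten term counts of one document, read down that document's column in the data above. Because cosine similarity is symmetric, only the upper triangle is computed and then mirrored.

```java
// Illustrative sketch; class, field, and method names are assumptions.
class CosineProximityMatrix {

    // One row per document; the ten counts follow the term order of the assignment data.
    static final double[][] DOCS = {
        {3, 0, 5, 0, 2, 6, 0, 2, 0, 2},   // Document 1
        {0, 7, 0, 2, 1, 0, 0, 3, 0, 0},   // Document 2
        {0, 1, 0, 0, 1, 2, 2, 0, 3, 0}    // Document 3
    };

    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // cos(a, b) = (a . b) / (||a|| * ||b||)
    static double cosine(double[] a, double[] b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    // Entry [i][j] is the cosine similarity between document i and document j.
    static double[][] proximityMatrix(double[][] docs) {
        int n = docs.length;
        double[][] matrix = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double similarity = cosine(docs[i], docs[j]);
                matrix[i][j] = similarity;
                matrix[j][i] = similarity;   // mirror into the lower triangle
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        for (double[] row : proximityMatrix(DOCS)) {
            for (double value : row) {
                System.out.printf("%.4f ", value);
            }
            System.out.println();
        }
    }
}
```

The diagonal entries come out as 1 (up to rounding), since every document is maximally similar to itself.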

3. Program for Impurity Measurement Calculation (50 Marks)

The following data contains information about eye patients who were prescribed contact lenses based on their age category, spectacle prescription, and astigmatism reports. The age category is one of young, pre-presbyopic, or presbyopic. The spectacle prescription is either myope or hypermetrope. A patient either has astigmatism (the Astigma attribute) or does not. Use this data as a binary classification problem to find the following measures:

No.  AGE             ASTIGMA  SPECTACLE     CONTACT LENSES
1    Young           No       Myope         Soft
2    Young           Yes      Myope         Hard
3    Young           No       Hypermetrope  Soft
4    Young           Yes      Hypermetrope  Hard
5    Pre-presbyopic  No       Myope         Soft
6    Pre-presbyopic  Yes      Myope         Hard
7    Pre-presbyopic  No       Hypermetrope  Soft
8    Presbyopic      Yes      Myope         Hard
9    Presbyopic      No       Hypermetrope  Soft

I. Compute the GINI and Entropy for the overall collection of the training examples.
II. Compute the GINI index for the Age attribute.
III. Compute the GINI index for the Astigma attribute.
IV. Compute the GINI index for the Spectacle attribute.
V. Which attribute is the better one: Age, Astigma, or Spectacle?

Note: You may extend it for entropy measurement. (Optional)
Note: You may do this assignment in either C++ or Java only.
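The impurity measures of Question 3 could be sketched along the following lines. The class name `ImpurityMeasures`, its fields, and its method names are assumptions for illustration. The sketch uses the standard definitions Gini = 1 - sum of p_i^2 and Entropy = -sum of p_i log2 p_i over the class proportions p_i, and takes the Gini index of an attribute to be the size-weighted average of the Gini values of the partitions produced by a multiway split on that attribute.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch; class, field, and method names are assumptions.
class ImpurityMeasures {

    // The nine training records from the assignment table, one array per column.
    static final String[] AGE = {"Young", "Young", "Young", "Young",
        "Pre-presbyopic", "Pre-presbyopic", "Pre-presbyopic", "Presbyopic", "Presbyopic"};
    static final String[] ASTIGMA = {"No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No"};
    static final String[] SPECTACLE = {"Myope", "Myope", "Hypermetrope", "Hypermetrope",
        "Myope", "Myope", "Hypermetrope", "Myope", "Hypermetrope"};
    static final String[] LENSES = {"Soft", "Hard", "Soft", "Hard", "Soft", "Hard", "Soft", "Hard", "Soft"};

    // Gini = 1 - sum(p_i^2), where p_i is the fraction of records in class i.
    static double gini(int[] classCounts) {
        int total = Arrays.stream(classCounts).sum();
        if (total == 0) return 0.0;
        double sumOfSquares = 0.0;
        for (int count : classCounts) {
            double p = (double) count / total;
            sumOfSquares += p * p;
        }
        return 1.0 - sumOfSquares;
    }

    // Entropy = -sum(p_i * log2(p_i)); empty classes contribute nothing.
    static double entropy(int[] classCounts) {
        int total = Arrays.stream(classCounts).sum();
        double result = 0.0;
        for (int count : classCounts) {
            if (count > 0) {
                double p = (double) count / total;
                result -= p * Math.log(p) / Math.log(2);
            }
        }
        return result;
    }

    // Counts how often each distinct label occurs.
    static int[] countLabels(String[] labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts.values().stream().mapToInt(Integer::intValue).toArray();
    }

    // Weighted Gini of a multiway split: sum over attribute values of (n_v / n) * Gini(partition v).
    static double giniSplit(String[] attribute, String[] labels) {
        Map<String, Map<String, Integer>> partitions = new HashMap<>();
        for (int i = 0; i < attribute.length; i++) {
            partitions.computeIfAbsent(attribute[i], k -> new HashMap<>())
                      .merge(labels[i], 1, Integer::sum);
        }
        double weighted = 0.0;
        for (Map<String, Integer> classCounts : partitions.values()) {
            int[] counts = classCounts.values().stream().mapToInt(Integer::intValue).toArray();
            int size = Arrays.stream(counts).sum();
            weighted += (double) size / labels.length * gini(counts);
        }
        return weighted;
    }

    public static void main(String[] args) {
        int[] overall = countLabels(LENSES);
        System.out.println("Overall Gini    = " + gini(overall));
        System.out.println("Overall entropy = " + entropy(overall));
        System.out.println("Gini(Age)       = " + giniSplit(AGE, LENSES));
        System.out.println("Gini(Astigma)   = " + giniSplit(ASTIGMA, LENSES));
        System.out.println("Gini(Spectacle) = " + giniSplit(SPECTACLE, LENSES));
    }
}
```

When comparing attributes, a lower weighted Gini indicates purer partitions, so the attribute with the smallest value is the preferred split.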

END OF ASSIGNMENT
