This document describes a student project that aims to evaluate student performance using data mining techniques. It was submitted by three students - Rahul Raghavan, Manas Saxena, and Sagar Wahal - to fulfill requirements for a Bachelor's degree in Information Technology. The project was guided by an assistant professor. It includes an introduction describing the importance and motivation for analyzing educational data to predict academic performance. A literature review covers previous works on using data mining for student evaluation. The proposed work will involve defining the problem, methodology, data flow, and analysis to evaluate strengths/weaknesses and predict future performance.
Novel Approach to Evaluate Student Performance Using Data Mining
Synopsis Report on
Novel Approach to Evaluate Student Performance Using Data Mining
Submitted in partial fulfillment for the award of the degree of
BACHELOR OF ENGINEERING
in
INFORMATION TECHNOLOGY
by
Rahul Raghavan
Manas Saxena
Sagar Wahal
Under the guidance of
Mr. Anil Vasoya
Assistant Professor (IT)
Academic Year 2013-2014
ACKNOWLEDGEMENT
We are foremost thankful to the Principal of our college, Dr. B.K. Mishra, who has taken strenuous efforts in providing us with excellent lab facilities. We are greatly indebted to our internal project guide, Prof. Anil Vasoya, for his guidance and enlightened comments, which have helped us better understand our project work. We would like to thank him for his helpful suggestions and for the numerous discussions through which he guided us. We are also thankful to our Head of Department, Dr. Kamal Shah, and our Project Coordinator, Dr. Vinayak Bharadi, who always gave us constant motivation, guidance and encouragement for the project. We are also grateful to our classmates and friends, who have given us feedback and encouragement. Finally, we would like to thank our college, Thakur College of Engineering and Technology, for providing us with a platform and the necessary facilities to complete this project.
Name of Students:
Rahul Raghavan
Manas Saxena
Sagar Wahal
ABSTRACT
Data mining is the process of extracting hidden information from huge volumes of data. The main data mining techniques are classification, clustering and association mining. These techniques can be applied to educational data to predict a student's academic performance and to determine the areas in which he is currently lacking. The student can then evaluate his performance and find out which areas to improve in order to increase his percentage. While calculating a student's performance, we take into consideration his marks in previous semesters, term test marks, attendance, viva marks and other factors. Here we use the One R algorithm and frequency tables to determine how important each parameter is and to predict the student's score. The accuracy of this approach can be measured by comparing the predicted score with the actual score. Teachers can forward the report to the students. They can also determine which students are currently lacking, based on their marks and other factors, and use this data to motivate a student to improve in a particular area. Students can also view the report themselves and make improvements in the areas in which they are lacking.
CERTIFICATE
This is to certify that Rahul Raghavan, Manas Saxena and Sagar Wahal are bona fide students of Thakur College of Engineering and Technology, Mumbai. They have satisfactorily completed the requirements of PROJECT-I as prescribed by the University of Mumbai while working on "Novel Approach to Evaluate Student Performance Using Data Mining".
Thakur College of Engineering and Technology Kandivali (E), Mumbai 400101
Place:
Date:
(College Round Seal)
(Signature)
Name:
(Head of Department)
(Signature)
Name:
(Principal)

CONTENTS
Chapter 1 Overview
1.1 Importance of the Project
1.2 Literature Survey
1.3 Motivation
1.4 Scope of the Project
Chapter 2 Proposed Work
2.1 Problem Definition
2.2 Methodology
2.3 Data Flow Diagram
2.4 As per Guide's Instructions
Chapter 3 Analysis & Planning
3.1 Feasibility Study
3.2 Project Planning
3.3 Gantt Chart
Chapter 4 Results & Discussion
Chapter 5 Conclusion
References
Chapter 1: Overview
1.1 Importance of the Project
Evaluation is a systematic process of collecting, analyzing and interpreting evidence of students' progress and achievement in both cognitive and non-cognitive areas of learning, for the purpose of taking a variety of decisions. Evaluation thus involves gathering and processing information and making decisions.
The present system of evaluation at school stage suffers from a number of imperfections. The first and foremost shortcoming of the evaluation system is that it focuses only on cognitive learning outcomes and completely ignores the non-cognitive aspects which are a vital component of human personality.
Another shortcoming of the present examination system is that the results are declared in terms of raw marks which also depend on the subjectivity of the examiner.
In our project, we try to extract useful knowledge from data on graduate students collected from Thakur College of Engineering & Technology, Mumbai. We use various data mining algorithms to evaluate students' performance and to extract knowledge that describes their performance at the end of the semester examination. This also helps in identifying dropouts and students who need special attention early, allowing professors to provide appropriate advising and counseling.
This project attempts to correct the shortcomings of the current system of student evaluation by extracting knowledge from the raw data that is already available. This information can give the college management insight into the strengths and weaknesses of each student, so that it can help students work on their personal weaknesses. The project also contains a tool that can predict the performance of students in future examinations. In this way, the project provides an innovative approach to evaluating student performance that extends the reach of the current system and helps students grow as individuals.
1.2 Literature Survey
1.2.1 Use of Data Mining Techniques for the Evaluation of Student Performance: A Case Study
ABSTRACT: In this paper the author introduces the concept of extracting information from the large database of Sri Sai University, Palampur. The author uses the marks obtained by students in their postgraduate examinations, along with other factors, and introduces various techniques to improve postgraduate students' performance and to identify students with low grades. The data cover a period of one and a half years. Clustering, decision trees and neural networks are used to evaluate students' performance. The approach also helps in identifying dropouts and students who need special counseling. The drawbacks of this system are:
1. The system only takes into consideration the marks of the students and completely ignores non-cognitive factors. We believe that those factors have a lasting impact on the performance of the student.
2. The system does not provide any suggestions about future options for the student.
3. The system does not evaluate the strengths and weaknesses of the student. [1]
Author: Er. Rimmy Chuchra
1.2.2 Predicting Students' Performance Using ID3 and C4.5 Classification Algorithms
ABSTRACT: This paper introduces the concept of predicting a student's marks based on previous performance. The authors take into consideration a number of factors, such as scores in the board examinations of classes X and XII. The system uses data mining algorithms such as ID3 and C4.5 to predict the marks accurately. However, the drawbacks of this system are:
1. The system does not take into consideration a student's family background, socio-economic factors and friend circle.
2. The system does not suggest ways in which a student can improve his marks.
3. The system does not give proper results in the case of missing data. [2]
Author(s): Kalpesh Adhatrao, Aditya Gaykar, Amiraj Dhawan, Rohit Jha and Vipul Honrao
1.3 Motivation
One class of existing student evaluation systems analyzes newly generated data from a separate examination conducted solely for that system. In this project, we perform data mining on data that is already available, which allows our system to be integrated easily with the current system. In addition, we include non-cognitive data such as family background and friend circle. Another class of systems we studied evaluates students on the basis of the existing system, but does not analyze the strengths and weaknesses of a student. Another important aspect missing from the systems we studied is the inclusion of future options. Broadly speaking, therefore, our motivation is to provide a student evaluation system that can be easily integrated with the existing system and that, in addition, provides mathematically calculated recommendations on ways to improve performance in the coming semester.
1.4 Scope of the Project
The system will take into account a number of factors by gathering data on students' semester marks, term test marks, attendance, background and various other factors from the IT Department of Thakur College of Engineering and Technology (Mumbai), and will predict students' marks. All these factors will be taken into consideration while designing the final project, which a student could use in the decision-making process. A student can use this information to monitor his progress and to determine his academic strengths. The system will analyze the performance of the student and highlight the parameters on which the student needs to work in order to improve his performance in the near future.
CHAPTER 2: PROPOSED WORK
2.1 Problem Definition
The primary purpose of our project is to provide a novel approach to evaluating student performance using data mining. Data mining is the process of extracting hidden patterns from databases. Its major advantages are fast retrieval of information, knowledge discovery from databases, detection of hidden patterns, reduced complexity and time savings. The main objective of educational institutions is to provide high-quality education to their students and to improve the quality of managerial decisions. One way to achieve high-quality education is to discover knowledge from educational databases and use it to create an environment that helps students grow.
The application has the following objectives:
1. To predict student performance on the basis of both cognitive and non-cognitive parameters. The parameters are: aggregate marks, term work, term test marks, viva marks, practical marks, 10th marks, 12th marks, attendance, family background, hostelite or day scholar, friend circle, educational background of father and mother, MH-CET marks, whether the student has any live KTs, educational background of siblings, current posting of siblings, current posting of father, family income and mother's job profile.
2. After predicting the performance of the student, to analyze the different parameters used to make the prediction.
3. To identify the strengths and weaknesses of the student on the basis of this prediction.
4. To communicate to the student the parameters on which they need to work in order to improve their overall performance.
5. A flexible design strategy that allows future updates and improvements.
6. Round-the-clock availability.
7. An easy and understandable graphical user interface.
8. Strong security measures to ward off attacks on the student database.
We expect this application to be used by college professors and administrators for evaluating student performance and taking important managerial decisions. The primary objective of this application is to provide professors with a detailed evaluation of each student, so that they can adapt their teaching style to suit the needs of the student. The information provided by the application can also be used by visiting companies to shortlist the candidates who suit their requirements.
2.2 Methodology
1.2.1 One R Algorithm
Description: One R, short for "One Rule", is a simple yet accurate classification algorithm that generates one rule for each predictor in the data and then selects the rule with the smallest total error as its "one rule". The basic idea behind this algorithm is to test every single attribute and create a branch for every value of that attribute. In our case, we are predicting how a particular student can improve his performance using attributes such as term test marks, viva marks and other factors.
Algorithm: We use the One R algorithm to calculate the weightage to be given to each parameter:
1. Classify each parameter into three ranges: High, Medium and Low.
2. Classify the target attribute into High, Medium and Low.
3. Calculate the success percentage of each parameter.
4. Calculate the total error for each frequency table and find the frequency table with the minimum total error. A lower total error means that the parameter contributes more to the accuracy of the model.
One R algorithm on a chosen data set: To illustrate, we have collected sample data on students' performance. We already know the marks the students obtained in semester 4. Here we use this information to calculate the impact each input parameter has on the final result.
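The rule-selection step of One R can be sketched in Java, the language the project is built in. This is a minimal sketch, not the project's code: the class `OneRSketch` and its methods are hypothetical, and it assumes each predictor is supplied as a 3x3 frequency table (rows: the predictor's High/Medium/Low level; columns: the High/Medium/Low level of the Semester 4 class attribute), in the form of Tables 1.2.1.1 to 1.2.1.4 in this section. Following the standard One R procedure, each predictor value predicts its majority class, and every sample outside that majority counts as an error.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical One R sketch working from per-predictor frequency tables. */
class OneRSketch {

    /** Samples misclassified when each predictor level predicts its majority class. */
    static int ruleErrors(int[][] table) {
        int errors = 0;
        for (int[] row : table) {
            int rowTotal = 0;
            int best = 0;
            for (int count : row) {
                rowTotal += count;
                best = Math.max(best, count); // majority class count for this level
            }
            errors += rowTotal - best; // everything outside the majority is an error
        }
        return errors;
    }

    /** Predictor whose one rule has the smallest total error; the first listed wins ties. */
    static String selectRule(Map<String, int[][]> tables) {
        String bestName = null;
        int bestErrors = Integer.MAX_VALUE;
        for (Map.Entry<String, int[][]> e : tables.entrySet()) {
            int errors = ruleErrors(e.getValue());
            if (errors < bestErrors) {
                bestErrors = errors;
                bestName = e.getKey();
            }
        }
        return bestName;
    }

    public static void main(String[] args) {
        // Rows: High, Medium, Low of the predictor; columns: High, Medium, Low of Semester 4.
        Map<String, int[][]> tables = new LinkedHashMap<>();
        tables.put("Semester 3", new int[][] {{2, 1, 0}, {1, 1, 0}, {0, 0, 2}});
        tables.put("Attendance", new int[][] {{3, 1, 0}, {0, 1, 0}, {0, 0, 2}});
        tables.put("Term Work", new int[][] {{2, 1, 0}, {1, 1, 0}, {0, 0, 2}});
        tables.put("Viva", new int[][] {{3, 0, 0}, {0, 1, 0}, {0, 1, 2}});
        for (Map.Entry<String, int[][]> e : tables.entrySet()) {
            System.out.println(e.getKey() + ": " + ruleErrors(e.getValue()) + " errors out of 7");
        }
        System.out.println("One rule: " + selectRule(tables));
    }
}
```

Run on the counts from Tables 1.2.1.1 to 1.2.1.4, Semester 3 and Term Work each give 2 errors out of 7, while Attendance and Viva each give 1. The sketch breaks the tie by keeping the first predictor listed, so it selects Attendance.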
Figure 1.2.1.1: Sample data of students (H = High, M = Medium, L = Low)
We have classified our input parameters into the following categories:
Aggregate marks in previous semester: above 70 = High; between 60 and 70 inclusive = Medium; below 60 = Low.
Attendance in this semester: above 80 = High; between 60 and 80 inclusive = Medium; below 60 = Low.
Term work in this semester: above 85 = High; between 75 and 85 inclusive = Medium; below 75 = Low.
Viva marks in this semester: above 85 = High; between 80 and 85 inclusive = Medium; below 80 = Low.
Our class attribute here is the semester 4 marks. We take each parameter and attempt to match its level with the level of the class attribute. For example, if the input parameter Attendance is High and the class attribute Semester 4 marks is also High, we have a match. Similarly, we count all such matches in our sample data and obtain the following results.
Frequency table for the input parameter 3rd semester marks:
| Semester 3 level | Semester 4 High | Semester 4 Medium | Semester 4 Low |
| High | 2 (match) | 1 | 0 |
| Medium | 1 | 1 (match) | 0 |
| Low | 0 | 0 | 2 (match) |
Table 1.2.1.1: Frequency table generated for the parameter 3rd semester marks
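The classification and matching steps that produce a table such as Table 1.2.1.1 can be sketched as follows. This is a minimal Java sketch under stated assumptions: `LevelSketch`, `classify` and `tally` are hypothetical names, values exactly on a boundary (for example an aggregate of 70) are treated as Medium, as the category definitions state, and the class attribute is assumed to have been classified into the same three levels beforehand.

```java
/**
 * Hypothetical helper (not the project's code): maps raw values to the
 * High/Medium/Low levels defined above and tallies a frequency table.
 */
class LevelSketch {
    enum Level { HIGH, MEDIUM, LOW } // ordinal order gives table rows/columns H, M, L

    /** Above highAbove -> HIGH; mediumFrom up to highAbove inclusive -> MEDIUM; below mediumFrom -> LOW. */
    static Level classify(double value, double highAbove, double mediumFrom) {
        if (value > highAbove) return Level.HIGH;
        if (value >= mediumFrom) return Level.MEDIUM;
        return Level.LOW;
    }

    // Cut-offs copied from the report's category definitions.
    static Level aggregate(double v)  { return classify(v, 70, 60); }
    static Level attendance(double v) { return classify(v, 80, 60); }
    static Level termWork(double v)   { return classify(v, 85, 75); }
    static Level viva(double v)       { return classify(v, 85, 80); }

    /** Counts each (parameter level, class level) pair; the diagonal cells are the matches. */
    static int[][] tally(Level[] parameter, Level[] target) {
        if (parameter.length != target.length) {
            throw new IllegalArgumentException("one class level is needed per sample");
        }
        int[][] counts = new int[3][3];
        for (int i = 0; i < parameter.length; i++) {
            counts[parameter[i].ordinal()][target[i].ordinal()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Shivam Thakur's Semester 5 inputs (Table 1.2.1.7): 78, 80, 85, 78.
        System.out.println(aggregate(78) + " " + attendance(80) + " " + termWork(85) + " " + viva(78));
    }
}
```

Applied to Shivam Thakur's Semester 5 inputs, the program prints `HIGH MEDIUM MEDIUM LOW`: 80% attendance and 85 in term work sit on the inclusive upper edge of their Medium ranges, and a viva score of 78 falls below the Medium cut-off of 80.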
Now we calculate the success rate from the total count of matches using the following formula:
Success Rate = (Number of successful matches / Total number of samples) * 100
For example, the success rate for the input parameter 3rd semester marks = ((2 + 1 + 2) / 7) * 100 = 71.42% (success rates in this section are truncated to two decimal places).
For input parameter Attendance:
| Attendance level | Semester 4 High | Semester 4 Medium | Semester 4 Low |
| High | 3 | 1 | 0 |
| Medium | 0 | 1 | 0 |
| Low | 0 | 0 | 2 |
Table 1.2.1.2: Frequency table generated for the parameter Attendance
Success rate = ((3 + 1 + 2) / 7) * 100 = 85.71%
For input parameter Term Work:
| Term Work level | Semester 4 High | Semester 4 Medium | Semester 4 Low |
| High | 2 | 1 | 0 |
| Medium | 1 | 1 | 0 |
| Low | 0 | 0 | 2 |
Table 1.2.1.3: Frequency table generated for the parameter Term Work
Success rate = ((2 + 1 + 2) / 7) * 100 = 71.42%
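The success-rate formula applied to each frequency table can be checked in code. The sketch below (the class `SuccessRateSketch` and its method are hypothetical) counts the diagonal cells of a table, where the parameter level equals the class level, and divides by the total number of samples.

```java
/** Hypothetical helper implementing the report's success-rate formula. */
class SuccessRateSketch {
    /** (number of diagonal matches / total number of samples) * 100 */
    static double successRate(int[][] table) {
        int matches = 0;
        int total = 0;
        for (int i = 0; i < table.length; i++) {
            for (int j = 0; j < table[i].length; j++) {
                total += table[i][j];
                if (i == j) {
                    matches += table[i][j]; // parameter level equals class level
                }
            }
        }
        return 100.0 * matches / total;
    }

    public static void main(String[] args) {
        int[][] semester3 = {{2, 1, 0}, {1, 1, 0}, {0, 0, 2}};  // Table 1.2.1.1
        int[][] attendance = {{3, 1, 0}, {0, 1, 0}, {0, 0, 2}}; // Table 1.2.1.2
        System.out.printf("Semester 3: %.2f%%%n", successRate(semester3));
        System.out.printf("Attendance: %.2f%%%n", successRate(attendance));
    }
}
```

It prints 71.43% and 85.71%. The report's 71.42% is the same value, 5/7 * 100 = 71.428...%, truncated rather than rounded. For the tables in this section the diagonal count equals the number of samples the One R majority rule classifies correctly, but the two can differ in general, for example when most students with a High parameter level fall in the Medium class.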
For Viva:
| Viva level | Semester 4 High | Semester 4 Medium | Semester 4 Low |
| High | 3 | 0 | 0 |
| Medium | 0 | 1 | 0 |
| Low | 0 | 1 | 2 |
Table 1.2.1.4: Frequency table generated for the parameter Viva
Success rate = ((3 + 1 + 2) / 7) * 100 = 85.71%
We have now calculated the success rate of each parameter individually. Next, we calculate the impact each parameter has on the final aggregate score of the student by dividing each parameter's success rate by the sum of the success rates of all parameters:
$$\mathrm{PIR}_i = \frac{\mathrm{SR}_i}{\sum_{j=1}^{n} \mathrm{SR}_j}$$
where SR_i is the success rate of input parameter i and n is the number of input parameters (here n = 4).
For example, the Parameter Impact Rate for the input parameter Attendance is calculated as follows:
Parameter Impact Rate for Attendance = 85.71 / (71.42 + 85.71 + 71.42 + 85.71) = 0.27
| Input parameter | Success rate (%) | Parameter Impact Rate (PIR) |
| Semester 3 | 71.42 | 0.22 |
| Attendance | 85.71 | 0.27 |
| Term Work | 71.42 | 0.22 |
| Viva | 85.71 | 0.27 |
Table 1.2.1.5: Table for Parameter Impact Rate (PIR values truncated to two decimal places)
Once we have obtained the PIR for all the parameters, we use this information to predict the marks of the student in the current semester. We have taken the example of the student Shivam Thakur and predicted his marks for the 5th semester on the basis of the PIR we obtained. To predict his marks for the current semester, we first calculate his Parameter Impact Score (PIS) for the previous semester. We then calculate the PIS for the current semester and use the unitary method (direct proportion) to predict the marks for the current semester. PIS is obtained with the formula given below:
Parameter Impact Score (PIS) = Parameter Impact Rate * Parameter Value
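The PIR and PIS formulas above can be sketched in a few lines of Java. The class `ImpactSketch` and its methods are hypothetical names; the inputs are the success rates from Table 1.2.1.5 and the 3rd semester marks of 73 used in the worked example.

```java
/** Hypothetical helper for the Parameter Impact Rate and Parameter Impact Score. */
class ImpactSketch {
    /** PIR_i = SR_i / (sum of all SR_j); the returned rates sum to 1. */
    static double[] impactRates(double[] successRates) {
        double sum = 0;
        for (double sr : successRates) {
            sum += sr;
        }
        double[] pir = new double[successRates.length];
        for (int i = 0; i < successRates.length; i++) {
            pir[i] = successRates[i] / sum;
        }
        return pir;
    }

    /** PIS = PIR * parameter value. */
    static double impactScore(double pir, double value) {
        return pir * value;
    }

    public static void main(String[] args) {
        String[] names = {"Semester 3", "Attendance", "Term Work", "Viva"};
        double[] pir = impactRates(new double[] {71.42, 85.71, 71.42, 85.71}); // Table 1.2.1.5
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%s: PIR = %.4f%n", names[i], pir[i]);
        }
        // Worked example from the text, using the report's truncated PIR of 0.22.
        System.out.printf("PIS for 3rd semester marks of 73: %.2f%n", impactScore(0.22, 73));
    }
}
```

It prints PIRs of 0.2273 for Semester 3 and Term Work and 0.2727 for Attendance and Viva. Rounded to two decimals these would be 0.23 and 0.27; Table 1.2.1.5 truncates them to 0.22 and 0.27, and the PIS tables that follow use the truncated values, which is why the last line reproduces 16.06.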
For example, for the input parameter 3rd semester marks (73), the PIS is:
PIS = 0.22 * 73 = 16.06
We now calculate the PIS of Shivam Thakur for each input parameter behind his Semester 4 score:
| Name | PIS for 3rd semester marks | PIS for Attendance (4th semester) | PIS for Term Work (4th semester) | PIS for Viva (4th semester) |
| Shivam Thakur | 16.06 | 24.84 | 19.8 | 24.84 |
Table 1.2.1.6: Table for Parameter Impact Score
Mean PIS for Semester 4 = (16.06 + 24.84 + 19.8 + 24.84) / 4 = 21.39
We now know that when Shivam Thakur obtained a mean PIS of 21.39, his aggregate score was 78%. Similarly, we calculate the PIS of Shivam Thakur's individual input parameters for Semester 5:
| Name | 4th semester marks | Attendance (5th semester) | Term Work (5th semester) | Viva (5th semester) |
| Shivam Thakur | 78 | 80 | 85 | 78 |
| PIS | 17.16 | 21.6 | 18.7 | 21.06 |
Table 1.2.1.7: Table for Parameter Impact Score
Mean PIS for Semester 5 = (17.16 + 21.6 + 18.7 + 21.06) / 4 = 19.63
So, when Shivam Thakur's mean PIS was 21.39, he obtained 78% marks. Therefore, when his mean PIS is 19.63, the marks he is expected to obtain are:
Predicted percentage for Semester 5 = (78 / 21.39) * 19.63 = 71.58%
His actual 5th semester marks were 73.2%. Hence we were able to approximate the marks he would obtain in semester 5 to within about 1.6 percentage points.
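The whole prediction, from the truncated PIRs to the unitary-method estimate, can be reproduced as a short Java sketch. `PredictionSketch`, `mean` and `predict` are hypothetical names, and the inputs are exactly the values in Tables 1.2.1.5 to 1.2.1.7.

```java
/** Hypothetical sketch of the mean-PIS prediction by the unitary method. */
class PredictionSketch {
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Marks scale in direct proportion to mean PIS: previous% / previous mean * current mean. */
    static double predict(double previousPercent, double previousMeanPis, double currentMeanPis) {
        return previousPercent / previousMeanPis * currentMeanPis;
    }

    public static void main(String[] args) {
        double[] pir = {0.22, 0.27, 0.22, 0.27};        // truncated PIRs, Table 1.2.1.5
        double[] sem4Pis = {16.06, 24.84, 19.8, 24.84}; // Table 1.2.1.6
        double[] sem5Values = {78, 80, 85, 78};         // Table 1.2.1.7

        double[] sem5Pis = new double[sem5Values.length];
        for (int i = 0; i < sem5Values.length; i++) {
            sem5Pis[i] = pir[i] * sem5Values[i];        // PIS = PIR * value
        }
        double meanSem4 = mean(sem4Pis);
        double meanSem5 = mean(sem5Pis);
        System.out.printf("Mean PIS: Semester 4 = %.3f, Semester 5 = %.2f%n", meanSem4, meanSem5);
        System.out.printf("Predicted Semester 5 percentage: %.2f%n", predict(78, meanSem4, meanSem5));
    }
}
```

The program prints mean PIS values of 21.385 and 19.63 and a predicted percentage of about 71.60. The report's 71.58 comes from rounding the Semester 4 mean to 21.39 before dividing; with that rounded value, `predict(78, 21.39, 19.63)` returns about 71.58, against an actual Semester 5 score of 73.2%.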
2.3 DATA FLOW DIAGRAM
LEVEL 0 DFD
Fig 2.3.1: Data flow diagram Level 0 (elements: Login, Faculty, Administrator, Database)
LEVEL 1 DFD
Fig 2.3.2: Data flow diagram Level 1 (elements: Administrator, Faculty, Student, User, Login, Register, Verify Student, Verify Faculty, Add/delete records, Update Data, Faculty Database, Student Database, User Database)
LEVEL 2 DFD Students
Fig 2.3.3: Data flow diagram detailing Students' flow (elements: User, Student, Faculty, Login, Register, Edit Personal Details, Search, Analysis, Generate Reports, Report errors, User Database, Faculty Database)
LEVEL 2 DFD Faculty
Fig 2.3.4: Data flow diagram detailing Faculty's flow (elements: Faculty, Login, View Student Data, Generate Reports, Mail Report, Placement, Feedback, Student Database)
2.4 As per Guide's Instructions
The following modifications were suggested by our guide during the design phase of the project:
Prediction: Our guide suggested that we increase the number of parameters used to predict the performance of the students. After further research and evaluation, we came up with the following additional parameters: hostelite or day scholar, educational background of father and mother, MH-CET marks, whether the student has any live KTs, educational background of siblings, current posting of siblings, current posting of father, family income and mother's job profile.
CHAPTER 3: Analysis and Planning
3.1 Feasibility Study
For our project, Novel Approach to Evaluate Student Performance, the total requirement for setting up the project is given below.
Time Feasibility: Our project requires only software, so the time required for the project is 6 to 8 months. We are designing our software according to the Software Development Life Cycle (SDLC).
| Sr. No. | Criteria | Time Period |
| 1 | Feasibility Study | 0.5 months |
| 2 | Analysis and Data Gathering | 1.5 months |
| 3 | Design of Project | 1.5 months |
| 4 | Implementation (Coding) | 2 months |
| 5 | Testing and Finalization | 1 month |
| 6 | Maintenance | 1.5 months |
| | Total | 8 months |
Table 3.1.1: Software development lifecycle
SOFTWARE REQUIRED: The whole project is designed using Java technology.
Project Schedule:
VII Semester Timeline:
| Name | Duration | Start | Finish |
| 1. Requirement Analysis | 25 days | 2/08/2013 | 27/08/2013 |
| 1.1 Software Specification | 4 days | 2/08/2013 | 6/08/2013 |
| 1.2 Presentation | 7 days | 6/08/2013 | 13/08/2013 |
| 1.3 In-house Requirement Specification | 2 days | 13/08/2013 | 15/08/2013 |
| 1.4 SRS | 8 days | 15/08/2013 | 23/08/2013 |
| 1.5 Requirement Gathering | 4 days | 23/08/2013 | 27/08/2013 |
| 2. Analysis | 12 days | 27/08/2013 | 8/09/2013 |
| 2.1 User Requirements | 3 days | 27/08/2013 | 30/08/2013 |
| 2.2 Functional Requirements | 5 days | 30/08/2013 | 4/09/2013 |
| 2.3 Non-functional Requirements | 4 days | 4/09/2013 | 8/09/2013 |
| 3. Design | 21 days | 8/09/2013 | 29/09/2013 |
| 3.1 Architecture Design | 6 days | 8/09/2013 | 14/09/2013 |
| 3.2 Database Schema | 7 days | 14/09/2013 | 21/09/2013 |
| 3.3 Graphical User Interface | 8 days | 18/09/2013 | 29/09/2013 |
Table 3.1.2: VII Semester Timeline
VIII Semester Timeline:
| Name | Duration | Start | Finish |
| 4. Coding/Implementation | 60 days | 25/01/2014 | 25/03/2014 |
| 4.1 Database Creation | 3 days | 25/01/2014 | 28/01/2014 |
| 4.2 Software Development | 12 days | 28/01/2014 | 10/02/2014 |
| 4.3 Database Integration | 4 days | 10/02/2014 | 14/02/2014 |
| 4.4 Coding and Implementation | 20 days | 14/02/2014 | 6/03/2014 |
| 4.5 Integration | 6 days | 6/03/2014 | 12/03/2014 |
| 4.6 Implementation of Application | 20 days | 12/03/2014 | 25/03/2014 |
| 5. Verification and Testing | 30 days | 25/03/2014 | 25/04/2014 |
| 5.1 Unit Testing | 5 days | 25/03/2014 | 30/03/2014 |
| 5.2 Stress Testing | 5 days | 30/03/2014 | 5/04/2014 |
| 5.3 Alpha/Beta Testing | 6 days | 5/04/2014 | 11/04/2014 |
| 5.4 Acceptance Testing | 5 days | 11/04/2014 | 16/04/2014 |
| 5.5 Performance Testing | 5 days | 16/04/2014 | 21/04/2014 |
| 5.6 Modification | 4 days | 21/04/2014 | 25/04/2014 |
Table 3.1.3: VIII Semester Timeline
3.2 Project Planning
The goal of our system is to evaluate, predict and improve students' performance. A student can also monitor his progress. The system will also help students as well as companies improve the placement process.
The key stakeholders of this system are:
1. Students
2. Faculty
3. Administration
4. Development Team
Student: The reports can be mailed to the student so that he/she can analyze and improve his/her performance.
Faculty: Faculty will use the system the most. Faculty will be able to view a student's strengths and weaknesses and help the student improve accordingly.
Administrator: The administrator will be responsible for maintaining the system and for adding, deleting and editing student information. The administrator will update students' records and will be responsible for reporting errors made by the faculty and forwarding them to the college.
Development Team: The development team will be responsible for checking for bugs in the system, reporting system-critical errors and providing patches for the system. The development team will also be responsible for adding new functionality to the system.
Project Deliverables:
1. A platform for interaction between students and faculty
2. Customized report generation on students for faculty
3. Addition, deletion and editing of student information by the administrator
3.3 Scheduling (Timeline Chart)
Fig 3.3.1: Gantt Scheduling chart created using Microsoft Visio
CHAPTER 4: RESULTS & DISCUSSION
Our application uses the One R algorithm to evaluate a student's performance. The system uses a number of parameters to calculate a Parameter Impact Score for the student and evaluates the student on the basis of this score. The student will be able to see which topics or areas he is currently lacking in. Faculty will mail the analysis report to the student, who can then work on those areas to improve his overall score. For example, if a student notices that his attendance parameter is not up to the mark, he can work on improving his attendance. This helps him understand his weak areas and focus more on them.
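The report does not spell out how weak areas are picked for the mailed report. One simple possibility, shown here as a hypothetical Java sketch rather than the project's implementation, is to flag every parameter whose value falls in the Low range defined in Section 2.2, that is, below its Medium cut-off. The class `WeakAreaSketch` and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: flag parameters whose value falls in the Low range. */
class WeakAreaSketch {
    /** Returns the parameters whose value is below that parameter's Medium cut-off. */
    static List<String> weakAreas(Map<String, Double> values, Map<String, Double> mediumFrom) {
        List<String> weak = new ArrayList<>();
        for (Map.Entry<String, Double> e : values.entrySet()) {
            Double cutoff = mediumFrom.get(e.getKey());
            if (cutoff != null && e.getValue() < cutoff) {
                weak.add(e.getKey());
            }
        }
        return weak;
    }

    /** Shivam Thakur's Semester 5 inputs checked against the Section 2.2 cut-offs. */
    static List<String> demo() {
        Map<String, Double> mediumFrom = new LinkedHashMap<>();
        mediumFrom.put("Aggregate", 60.0);
        mediumFrom.put("Attendance", 60.0);
        mediumFrom.put("Term Work", 75.0);
        mediumFrom.put("Viva", 80.0);

        Map<String, Double> values = new LinkedHashMap<>();
        values.put("Aggregate", 78.0);
        values.put("Attendance", 80.0);
        values.put("Term Work", 85.0);
        values.put("Viva", 78.0);
        return weakAreas(values, mediumFrom);
    }

    public static void main(String[] args) {
        System.out.println("Areas to work on: " + demo());
    }
}
```

For Shivam Thakur's Semester 5 inputs it prints `Areas to work on: [Viva]`, since a viva score of 78 is below the Medium cut-off of 80.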
Chapter 5: Conclusion
The educational system is the backbone of the progress and development of any society. The greater the ability of the education system to improve the performance of its students, the better the chance of the society producing successful citizens. Keeping this in mind, it is necessary to work constantly towards a more sophisticated education system. Data mining is a powerful technique for extracting hidden information from voluminous and exhaustive databases, and it can provide many solutions towards building a stronger education system. Our project is a stepping stone towards the integration of technology and the education system.
References:
[1] Er. Rimmy Chuchra, "Use of Data Mining Techniques for the Evaluation of Student Performance: A Case Study," IJCSMR.
[2] Kalpesh Adhatrao, Aditya Gaykar, Amiraj Dhawan, Rohit Jha and Vipul Honrao, "Predicting Students' Performance Using ID3 and C4.5 Classification Algorithms," IJDKP.
Books:
Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques.
Websites:
Java: www.oracle.com
Wikipedia: www.wikipedia.com
One Rule algorithm: www.soc.napier.ac.uk/~peter/vldb/dm/node8.html and http://www.saedsayad.com/oner.htm