
Synopsis Report on

Intelligent Heart Disease Prediction System Using Data Mining Techniques


Submitted in partial fulfillment for the award of the degree Of

BACHELORS OF ENGINEERING
In

COMPUTER ENGINEERING
By Ujesh Shetty, Pallav Parikh and Vivek Shukla

Under the Guidance of Mrs. Megharani Patil, Asst. Professor


CERTIFICATE
This is to certify that Ujesh Shetty, Pallav Parikh and Vivek Shukla are bonafide students of Thakur College of Engineering and Technology, Mumbai. They have satisfactorily completed the requirements of PROJECT-I as prescribed by the University of Mumbai while working on the project "Intelligent Heart Disease Prediction System".

Ms. Megharani Patil (Guide)

Dr. R. R. Sedamkar (HOD CMPN)

Dr. B. K. Mishra (Principal)

Internal Examiner (Name and Signature with Date)

External Examiner (Name and Signature with Date)

Thakur College of Engineering and Technology, Kandivali (E), Mumbai - 400101.
PLACE: Mumbai
DATE:

CONTENTS

List of figures
List of tables
Abbreviations and symbols
Definitions

Chapter 1: Introduction
  1.1 Importance of the Project and its background
  1.2 Literature survey
  1.3 Motivation
  1.4 Scope of the project
  1.5 Organization of the Project report

Chapter 2: Proposed Work
  2.1 Problem definition
  2.2 Data flow diagram / flow chart of design
  2.3 Static and dynamic analysis diagrams

Chapter 3: Analysis and Planning
  3.1 Feasibility study
  3.2 Project planning
  3.3 Scheduling (timeline chart)

Chapter 4: Design, Implementation and Installation
  4.1 Technology / software
  4.2 Stage-wise model development, flow chart
  4.3 Implementation stages
  4.4 Installation stages

Chapter 5: Progress (optional)
  5.1 Deviations from design schedule
  5.2 Remedial measures taken

Chapter N-1: Results and Discussion
  Results and discussion for algorithm 1
  Results and discussion for algorithm 2
  SWOT analysis

Chapter N: Conclusion and scope for future work

Appendix
References

Chapter 1: Overview

1.1 Importance of the Project and its background
1.2 Literature Survey
1.3 Motivation
1.4 Scope of the project
1.5 Organization of the Project report

Chapter 1: Overview
1.1

Importance of the Project and its background


Heart disease ranks first among the leading causes of death.

Heart failure and stroke are among the most frequent diagnostic categories and are leading causes of death.

Heart diseases are the most frequently first-listed diagnoses for hospital discharges.

Approximately 60 million people around the world are afflicted with some form of cardiovascular disease, which includes both heart disease and stroke.

The main objective of this research is to develop a prototype Intelligent Heart Disease Prediction System (IHDPS) using two data mining modelling techniques, namely Decision Trees and Naive Bayes. IHDPS can discover and extract hidden knowledge, patterns and relationships associated with heart disease from a historical heart disease database. It can answer complex queries for diagnosing heart disease and thus assist healthcare practitioners in making intelligent clinical decisions which traditional decision support systems cannot. By helping to identify effective treatments, it also helps to reduce treatment costs. To enhance visualization and ease of interpretation, it displays the results in both tabular and graphical forms. IHDPS can serve as a training tool for nurses and medical students learning to diagnose patients with heart disease. It can also provide decision support to assist doctors in making better clinical decisions, or at least provide a second opinion. The current version of IHDPS is based on the 15 attributes listed in Figure 1.1. This list may need to be expanded to provide a more comprehensive diagnosis system.

List of Attributes

Predictable attribute
1. Diagnosis: value 0 = < 50% diameter narrowing (no heart disease); value 1 = > 50% diameter narrowing (has heart disease)

Key attribute
2. Patient_id: patient's identification number

Input attributes
3. Sex: value 1 = male; value 0 = female
4. Chest Pain Type: value 1 = typical type 1 angina; value 2 = typical type angina; value 3 = non-anginal pain; value 4 = asymptomatic
5. Fasting Blood Sugar: value 1 = > 120 mg/dl; value 0 = < 120 mg/dl
6. Restecg (resting electrocardiographic results): value 0 = normal; value 1 = having ST-T wave abnormality; value 2 = showing probable or definite left ventricular hypertrophy
7. Exang (exercise-induced angina): value 1 = yes; value 0 = no
8. Slope (the slope of the peak exercise ST segment): value 1 = upsloping; value 2 = flat; value 3 = downsloping
9. CA: number of major vessels coloured by fluoroscopy (value 0-3)
10. Thal: value 3 = normal; value 6 = fixed defect; value 7 = reversible defect
11. Trest Blood Pressure: resting blood pressure (mm Hg on admission to the hospital)
12. Serum Cholesterol (mg/dl)
13. Thalach: maximum heart rate achieved
14. Oldpeak: ST depression induced by exercise relative to rest
15. Age in years

Figure 1.1. Description of attributes
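To illustrate how a single patient record structured according to Figure 1.1 might be represented in code, a minimal sketch in Python follows. The key names are lower-cased variants of the attribute names above chosen for readability, and the values are invented for illustration; this is not the report's implementation.

# Hypothetical example of one patient record using the attributes of Figure 1.1.
patient_record = {
    "patient_id": 101,
    "age": 54,
    "sex": 1,                     # 1 = male, 0 = female
    "chest_pain_type": 4,         # 4 = asymptomatic
    "trest_blood_pressure": 140,  # mm Hg on admission
    "serum_cholesterol": 260,     # mg/dl
    "fasting_blood_sugar": 0,     # 1 if > 120 mg/dl, else 0
    "restecg": 2,                 # resting electrocardiographic result
    "thalach": 150,               # maximum heart rate achieved
    "exang": 0,                   # exercise-induced angina
    "oldpeak": 1.5,               # ST depression relative to rest
    "slope": 2,
    "ca": 0,                      # vessels coloured by fluoroscopy
    "thal": 3,
    "diagnosis": None,            # predictable attribute (0 or 1)
}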

1.2

Literature Survey

The healthcare industry today generates large amounts of complex data about patients, hospital resources, disease diagnoses, electronic patient records, medical devices, etc. These large amounts of data are a key resource to be processed and analysed for knowledge extraction that enables support for cost savings and decision making. The majority of conventional clinical decision support systems (CDSS) for disease diagnosis are generally based on the symptoms of the patient or data from simple medical questionnaires. A CDSS is maintained by the healthcare provider over time and includes all of the key administrative and clinical data relevant to a person's care under a particular provider, including demographics, progress notes, problems, medications, vital signs, past medical history, immunizations, laboratory data, medical images and radiology reports. To our knowledge, a CDSS for cardiovascular disease diagnosis using an ensemble of multiple classifiers for comprehensive diagnosis and data mining does not currently exist. Specifically, the goal is to improve cardiovascular health and quality of life through the prevention, detection and treatment of risk factors; early identification and treatment of heart attacks and strokes; and prevention of recurrent cardiovascular events.

1.3

Motivation
A major challenge facing healthcare organizations like hospitals and various medical centers across the country is the provision of quality services at affordable costs. Quality service implies diagnosing patients correctly and administering treatments that are effective. Poor clinical decisions can lead to disastrous consequences which are therefore unacceptable. Hospitals must also minimize the cost of clinical tests. They can achieve these results by employing appropriate computer-based information and/or decision support systems. Most hospitals today employ some sort of hospital information systems to manage their healthcare or patient data. These systems typically generate huge amounts of data which take the form of numbers, text, charts and images. Unfortunately, these data are rarely used to support clinical decision making. There is a wealth of hidden information in these data that is largely untapped. This raises

an important question: How can we turn data into useful information that can enable healthcare practitioners to make intelligent clinical decisions? This is the main motivation for this research.

1.4

Scope of the project


The use of decision trees is one of the most popular methods for CDSS due to their simplicity and their capacity for producing humanly understandable inductive rules. Many researchers have employed decision trees to resolve various biological problems, including diagnostic error analysis, potential biomarker finding, and proteomic mass spectra classification. Bayesian networks are a probability-based inference model, increasingly used in the medical domain as a method of knowledge representation for reasoning under uncertainty for a wide range of applications, including disease diagnosis, genetic counselling, expert system development, gene network modelling, and emergency medical decision support system design. The scope of this project is to develop an IHDPS utilizing all the attributes of the medical profile with a powerful data mining model which can help:

1. Physicians identify effective treatments and best practices.
2. Patients receive better and more affordable healthcare services.
3. Healthcare management services, where it could be particularly useful when there is no dispositive evidence favouring a particular treatment option.

Based on a patient's profile, history, physical examination, diagnosis and previous treatment patterns, new treatment plans can be effectively suggested. Thus we intend to provide a prototype which will aid better clinical decision making and act as a companion to the concerned doctors for a second opinion.

1.5

Organization of the Project report

In the upcoming chapters, we highlight different aspects of the designed prototype. In Chapter 2, we provide an explicit definition of the problem along with the class diagram and block diagram of the entire system. In the problem definition, we discuss the need for the project and its impact on the clinical decisions being made. The class diagram shows the various components and entities involved in the system. The block diagram describes the system's functionality in a detailed manner; here we depict a low-level representation of the prototype system. We also design the static and dynamic analysis diagrams, activity diagram, interaction diagram, collaboration diagram and the deployment diagram. In Chapter 3, we discuss the feasibility of our product with respect to various factors such as the technical, economic and operational domains. In the next part we describe the various stages in the planning of the project over the entire course duration; here the scheduling of the entire project is represented using a Gantt chart. In Chapter 4, we discuss the technologies and software used in the development of the prototype. Along with it we design the model deployment and the block diagram of the prototype. The stages in both the implementation and installation phases are diagrammatically represented.

Chapter 2: Proposed Work


2.1 Problem definition
2.2 Data Flow Diagram
2.3 Static and Dynamic analysis diagrams

Chapter 2: Proposed Work

2.1

Problem definition
Many hospital information systems are designed to support patient billing,

inventory management and generation of simple statistics. Some hospitals use decision support systems, but they are largely limited. They can answer simple queries like "What is the average age of patients who have heart disease?", "How many surgeries have resulted in hospital stays longer than 10 days?", or "Identify the female patients who are single, above 30 years old, and who have been treated for cancer." However, they cannot answer complex queries like "Identify the important preoperative predictors that increase the length of hospital stay", "Given patient records on cancer, should treatment include chemotherapy alone, radiation alone, or both chemotherapy and radiation?", or "Given patient records, predict the probability of patients getting heart disease." Clinical decisions are often made based on doctors' intuition and experience rather than on the knowledge-rich data hidden in the database. This practice leads to unwanted biases, errors and excessive medical costs which affect the quality of service provided to patients. Our project proposes that integration of clinical decision support with computer-based patient records could reduce medical errors, enhance patient safety, decrease unwanted practice variation, and improve patient outcomes. This suggestion is promising, as data modelling and analysis tools, e.g. data mining, have the potential to generate a knowledge-rich environment which can help to significantly improve the quality of clinical decisions.

2.2

Data Flow Diagram

A Data Flow Diagram (DFD) is a diagrammatic representation of the information flows within a system, showing:
- How information enters and leaves the system
- What changes the information
- Where information is stored

Level 0:

Level 1:

2.3

Static and Dynamic analysis diagram

2.3.1 Use-Case Specification

Actor's Goal: Login
Participating Actors: User, Database
Pre-condition: The actor should have authorized access.
Post-condition: The user has an account and can use the application.
Success Scenario:
1) Open the login page.
2) Enter user name and password.
3) Enter the attribute values.
4) Get the prediction and time chart.

2.3.2 Activity Diagrams: Manage Account:

Steps:
1. The administrator enters the home page.
2. Fills in the user name and password.
3. If the username/password is correct, he enters the service page and can manage and maintain it.

Login:

Steps:
1. The user/administrator is on the home page.
2. The user provides the username/password.
3. If the combination is correct, the service page is displayed.
4. If the combination is incorrect, the user is allowed to re-enter the username/password.

2.3.3 Interaction Diagram:

Steps:
1. The administrator enters the home page.
2. Fills in the user name and password.
3. If the username/password is correct, he enters the service page and can manage and maintain it.
4. Enters the values for the various attributes in the prediction model.

2.3.4 Collaboration Diagram:


1: Prompt user to enter details 2: Invoke application 3: Validate attributes

2.3.5 Deployment Diagram:


The deployment diagram shows how a system will be physically deployed in the hardware environment. Its purpose is to show where the different components of the system will physically run and how they will communicate with each other. Since the diagram models the physical runtime, a system's production staff will make considerable use of this diagram.

Chapter 3: Analysis and Planning

3.1 Feasibility Study
3.2 Project planning
3.3 Scheduling

Chapter 3: Analysis and Planning


3.1 Feasibility Study
If a project is seen to be feasible from the results of the study, the next logical step is to proceed with it. The research and information uncovered in the feasibility study will support the detailed planning and reduce the research time.

3.1.1 Technical Feasibility


Technical feasibility refers to the ability of the process to take advantage of the current state of the technology in pursuing further improvement. The project is developed using Microsoft SQL Server 2005 and Microsoft Visual Basic 2010 technology, which makes it possible to run on any stand-alone machine. The system requirements for the project are not large, but better configurations will deliver better performance. The software required can be licensed at a reasonable cost. As the project can be implemented on a stand-alone machine, large-scale adoption in large hospitals and healthcare sectors can be expected.

3.1.2 Economic Feasibility


Economic analysis is the most frequently used method for evaluating the effectiveness of a new system. More commonly known as cost/benefit analysis, the procedure is to determine the benefits and savings that are expected from a candidate system and compare them with the costs. If the benefits outweigh the costs, then the decision is made to design and implement the system. As our project is based on Microsoft SQL Server 2005 and Microsoft Visual Basic 2010, which are freely available software, there is no significant licensing cost; revenue could even be generated by selling the system to an organization. Hence the project appears economically feasible.

3.1.3 Operational Feasibility


The simplicity of the system makes it feasible for use by any operator. An individual only needs to enter the values of the attributes and will get the necessary results in processed form. As the system is designed to predict the probability of heart disease, it can be used to diagnose the disorder effectively in very little time.

3.2 Project planning


Task breakdown (task, subtask, activity, duration and dates):

1. Problem definition
   Subtask: Formulation of the process statement.
   Activity: Brainstorming session amongst the group members; attempt to find similar implemented solutions to the problem.
   Duration: 7-8 hrs (10 Aug 2011 to 31 Aug 2011)

2. Problem evaluation
   Subtask: Searching for multiple alternative solutions to the main objective.
   Activity: Discussion and searching on the internet.
   Duration: 15 hrs (24 Aug 2011 to 7 Sept 2011)

3. Define the input and desired output
   Subtask: Defining the attributes and data.
   Activity: Describe the input data required and the output according to the software.
   Duration: 9 hrs (7 Sept 2011 to 5 Oct 2011)

4. Define functions and behaviour
   Subtask: Describe modes of interaction and interface.
   Activity: Analyzing the concept in terms of functions.
   Duration: 12 hrs (14 Sept 2011 to 12 Oct 2011)

5. Development of system architecture
   Subtask: Planning the logical execution of the system being developed and analyzing software and hardware requirements.
   Activity: Formulated the general idea of the working of the process; visualized a standard solution which satisfies our goals and objectives.
   Duration: 21 hrs (12 Oct 2011 to 26 Oct 2011)

6. Design and coding
   Subtask: Using the system diagrams to write the actual code for the software.
   Activity: Use static and dynamic diagrams to visualize the working of the system and write the code.
   Duration: 36 hrs (31 Oct 2011 to 8 Mar 2012)

7. Debugging
   Subtask: Testing of code at various levels; documenting errors as they are encountered.
   Activity: Execution of the project to find faults in its working.
   Duration: 16 hrs (8 Mar 2012 to 28 Mar 2012)

8. Creation of documentation
   Subtask: Completely analyzing the work done till date.
   Activity: Formatting the documentation to a desirable need.
   Duration: 13-14 hrs (28 Mar 2012 to 20 Apr 2012)

3.3 Scheduling

[Timeline (Gantt) chart, August 2011 to May 2012]
Period 1 (Aug 2011 to Feb 2012): Requirement gathering and specification; Feasibility study; Web survey; Market analysis; Data design; Process design and optimization; Static and dynamic analysis.
Period 2 (Feb 2012 to May 2012): Implementation; Testing; Building prototype; Final report.

Chapter 4 : Design and Implementation

4.1 Technology and Software
4.2 Stage-wise Model Development and Flowchart
4.3 Implementation Stages
4.4 Installation Stages

Chapter 4 : Design and Implementation


4.1 Technology and Software
Front End : Visual Basic 2008
Microsoft Visual Studio Express is a set of freeware integrated development environments (IDE) developed by Microsoft that are lightweight versions of the Microsoft Visual Studio product line. Express Editions were conceived beginning with Visual Studio 2005. Visual Studio is extensible by nature, ultimately consisting of a core "shell" that implements all commands, windows, editors, project types, languages, and other features through dynamically loadable modules called "Packages". Microsoft encourages and fosters third-party partners to create modules for Visual Studio via the free VSIP program. The idea of Express editions is to provide streamlined, easy-to-use and easy-to-learn IDEs for users.

Visual Basic Express

Visual Basic 2005/2008 (but not Visual Basic 2010) Express Edition contains the Visual Basic 6.0 converter that makes it possible to upgrade Visual Basic 6.0 projects to Visual Basic .NET. The Express Editions (2005 and 2008) mostly share the following limitations:

- No IDE support for databases other than SQL Server Express and Microsoft Access
- No support for Web Applications with ASP.NET (this can instead be done with Visual Web Developer Express, though the non-Express version of Visual Studio allows both web and windows applications from the same IDE)
- No support for developing for mobile devices (no templates or emulator)
- No Crystal Reports
- Fewer project templates (e.g. Windows services template, Excel Workbook template)
- Limited options for debugging and breakpoints
- No support for creating Windows Services (can be gained through download of a project template)
- No support for OpenMP
- Limited deployment options for finished programs
- Lacks some advanced features of the standard versions; for example, there is no Outlining feature ("Hide selection") to collapse/expand selected text

Despite the fact that it is a stripped-down version of Visual Studio, some improvements were made in Visual Basic 2008 over Visual Basic 2005. Visual Basic 2008 Express includes the following improvements over Visual Basic 2005 Express:

- Includes the visual Windows Presentation Foundation designer codenamed "Cider"
- Debugs at runtime
- Better IntelliSense support:
  - Fixes common spelling errors
  - Corrects most forms of invalid syntax
  - Provides suggestions for class names when specified classes are not found

Microsoft SQL Server 2008 It is a relational database server, developed by Microsoft: it is a software product whose primary function is to store and retrieve data as requested by other software applications, be it those on the same computer or those running on another computer across a network (including the Internet). There are at least a dozen different editions of Microsoft SQL Server aimed at different audiences and for different workloads (ranging from small applications that store and retrieve data on the same computer, to millions of users and computers that access huge amounts of data from the Internet at the same time).

SQL Server Express

SQL Server Express is a freeware, light-weight, and redistributable edition of Microsoft SQL Server. It provides a no-cost database for developers writing basic Windows applications and web sites. SQL Server Express replaces MSDE 2000 and significantly expands on its feature set. SQL Server Management Studio Express, which provides a graphical user interface for administering SQL Server Express, can also be downloaded. By contrast, SQL Server Enterprise Edition includes both the core database engine and add-on services, with a range of tools for creating and managing a SQL Server cluster; it can manage databases as large as 524 petabytes, address 2 terabytes of memory and supports 8 physical processors. The Express edition has the following limitations:

- Limited to one physical CPU
- Lack of enterprise features support
- 1 GB memory limit for the buffer pool
- Databases have a 4 GB size limit (10 GB beginning with SQL Server Express 2008 R2)
- No data mirroring and/or clustering
- No profiler
- No workload throttling
- No GUI to import or export data from/to spreadsheets
- No Server Agent background process

SQL Server includes better compression features, which also helps in improving scalability. It enhanced the indexing algorithms and introduced the notion of filtered indexes. It also includes Resource Governor that allows reserving resources for certain users or workflows. It also includes capabilities for transparent encryption of data (TDE) as well as compression of backups. SQL Server 2008 supports the ADO.NET Entity Framework and the reporting tools, replication, and data definition will be built around the Entity Data Model.SQL Server Reporting Services will gain charting capabilities from the integration of the data visualization products from Dundas Data Visualization, Inc., which was acquired by Microsoft. On the management side, SQL Server 2008 includes the Declarative Management Framework which allows configuring policies and constraints, on the entire database or certain tables, declaratively. The version of SQL Server Management Studio included with SQL Server 2008 supports IntelliSense for SQL queries against a SQL Server 2008 Database Engine.SQL Server 2008 also makes the databases available via Windows PowerShell providers and management functionality available as Cmdlets, so that the server and all the running instances can be managed from Windows PowerShell. The main unit of data storage is a database, which is a collection of tables with typed columns. SQL Server supports different data types, including primary types such as Integer, Float, Decimal, Char (including character strings), Varchar (variable length character strings), binary (for unstructured blobs of data), Text (for textual data) among others. The rounding of floats to integers uses either Symmetric Arithmetic Rounding or Symmetric Round Down (Fix) depending on arguments: SELECT Round(2.5, 0) gives 3.

Logging and Transaction SQL Server ensures that any change to the data is ACID-compliant, i.e. it uses transactions to ensure that the database will always revert to a known consistent state on failure. Each transaction may consist of multiple SQL statements all of which will only make a permanent change to the database if the last statement in the transaction (a COMMIT statement) completes successfully. If the COMMIT successfully completes the transaction is safely on disk. SQL Server implements transactions using a writeahead log. Any changes made to any page will update the in-memory cache of the page, simultaneously all the operations performed will be written to a log, along with the transaction ID which the operation was a part of. Each log entry is identified by an increasing Log Sequence Number (LSN) which is used to ensure that all changes are written to the data files. Also during a log restore it is used to check that no logs are duplicated or skipped. SQL Server requires that the log is written onto the disc before the data page is written back. It must also ensure that all operations in a transaction are written to the log before any COMMIT operation is reported as completed. At a later point the server will checkpoint the database and ensure that all pages in the data files have the state of their contents synchronised to a point at or after the LSN that the checkpoint started. When completed the checkpoint marks that portion of the log file as complete and may free it. This enables SQL Server to ensure integrity of the data, even if the system fails. On failure the database log has to be replayed to ensure the data files are in a consistent state. All pages stored in the roll forward part of the log (not marked as completed) are rewritten to the database, when the end of the log is reached all open transactions are rolled back using the roll back portion of the log file. The database engine usually checkpoints quite frequently. However, in a heavily loaded database this can have a significant performance impact. It is possible to reduce the frequency of checkpoints or disable them completely but the rollforward during a recovery will take much longer Data retrieval The main mode of retrieving data from an SQL Server database is querying for it. The query is expressed using a variant of SQL called T-SQL, a dialect Microsoft SQL Server shares with Sybase SQL Server due to its legacy. The query declaratively specifies what is to be retrieved. It is processed by the query processor, which figures out the sequence of steps that will be necessary to retrieve the requested data. The sequence of actions necessary to execute a query is called a query plan. There might be multiple ways to process the same query. For example, for a query that contains a join statement and a select statement, executing join on both the tables and then executing select on the results would give the same result as selecting from each table and then executing the join, but result in different execution plans. In such case, SQL Server chooses the plan that is expected to yield the results in the shortest possible time. This is called query optimization and is performed by the query processor itself. SQL Server includes a cost-based query optimizer which tries to optimize on the cost, in terms of the resources it will take to execute the query. Given a query, then

the query optimizer looks at the database schema, the database statistics and the system load at that time. It then decides which sequence to access the tables referred in the query, which sequence to execute the operations and what access method to be used to access the tables. For example, if the table has an associated index, whether the index should be used or not - if the index is on a column which is not unique for most of the columns (low "selectivity"), it might not be worthwhile to use the index to access the data. Finally, it decides whether to execute the query concurrently or not. While a concurrent execution is more costly in terms of total processor time, because the execution is actually split to different processors might mean it will execute faster. Once a query plan is generated for a query, it is temporarily cached. For further invocations of the same query, the cached plan is used. Unused plans are discarded after some time. SQL Server also allows stored procedures to be defined. Stored procedures are parameterized T-SQL queries, that are stored in the server itself (and not issued by the client application as is the case with general queries). Stored procedures can accept values sent by the client as input parameters, and send back results as output parameters. They can call defined functions, and other stored procedures, including the same stored procedure (up to a set number of times). They can be selectively provided access to. Unlike other queries, stored procedures have an associated name, which is used at runtime to resolve into the actual queries. Also because the code need not be sent from the client every time (as it can be accessed by name), it reduces network traffic and somewhat improves performance. Execution plans for stored procedures are also cached as necessary. SQL CLR Microsoft SQL Server 2005 includes a component named SQL CLR ("Common Language Runtime") via which it integrates with .NET Framework. Unlike most other applications that use .NET Framework, SQL Server itself hosts the .NET Framework runtime, i.e. memory, threading and resource management requirements of .NET Framework are satisfied by SQLOS itself, rather than the underlying Windows operating system. SQLOS provides deadlock detection and resolution services for .NET code as well. With SQL CLR, stored procedures and triggers can be written in any managed .NET language, including C# and VB.NET. Managed code can also be used to define UDT's (user defined types), which can persist in the database. Managed code is compiled to .NET assemblies and after being verified for type safety, registered at the database. After that, they can be invoked like any other procedure. However, only a subset of the Base Class Library is available, when running code under SQL CLR. Most APIs relating to user interface functionality are not available. When writing code for SQL CLR, data stored in SQL Server databases can be accessed using the ADO.NET APIs like any other managed application that accesses SQL Server data. However, doing that creates a new database session, different from the one in which the code is executing. To avoid this, SQL Server provides some enhancements to the ADO.NET provider that allows the connection to be redirected to the same session which already hosts the running code. Such connections are called context connections and are set by setting context connection parameter to true in the connection string. 
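To make the transaction and query-processing behaviour described above concrete from a client application's point of view, the following is a minimal sketch using Python with the pyodbc driver. It is purely illustrative and is not part of the report's implementation (the prototype itself uses Visual Basic and ADO.NET); the connection string, table and column names are assumptions.

import pyodbc

# The connection string below is illustrative; adjust server/database names as needed.
conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=localhost;DATABASE=HeartDB;Trusted_Connection=yes",
    autocommit=False,  # changes become durable only after an explicit COMMIT
)
cursor = conn.cursor()

try:
    # Parameterized statements are handed to the query processor, which
    # builds (and may cache) an execution plan for them.
    cursor.execute(
        "UPDATE PatientData SET SerumCholesterol = ? WHERE Patient_id = ?",
        250, 101,
    )
    cursor.execute(
        "INSERT INTO AuditLog (Patient_id, Note) VALUES (?, ?)",
        101, "cholesterol value corrected",
    )
    conn.commit()    # both statements are made permanent together
except pyodbc.Error:
    conn.rollback()  # on failure, logged changes are undone and the database stays consistent
finally:
    conn.close()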
SQL Server also provides several other enhancements to the ADO.NET API, including classes to work with tabular data or a single row of data as well as classes to work with internal metadata about the data stored in the database. It

also provides access to the XML features in SQL Server, including XQuery support. These enhancements are also available in T-SQL Procedures in consequence of the introduction of the new XML Datatype (query, value, nodes functions). Services SQL Server also includes an assortment of add-on services. While these are not essential for the operation of the database system, they provide value added services on top of the core database management system. These services either run as a part of some SQL Server component or out-of-process as Windows Service and presents their own API to control and interact with them. Analysis Services SQL Server Analysis Services adds OLAP and data mining capabilities for SQL Server databases. The OLAP engine supports MOLAP, ROLAP and HOLAP storage modes for data. Analysis Services supports the XML for Analysis standard as the underlying communication protocol. The cube data can be accessed using MDX and LINQ queries. Data mining specific functionality is exposed via the DMX query language. Analysis Services includes various algorithms - Decision trees, clustering algorithm, Naive Bayes algorithm, time series analysis, sequence clustering algorithm, linear and logistic regression analysis, and neural networks - for use in data mining. Reporting Services SQL Server Reporting Services is a report generation environment for data gathered from SQL Server databases. It is administered via a web interface. Reporting services features a web services interface to support the development of custom reporting applications. Reports are created as RDL files. Reports can be designed using recent versions of Microsoft Visual Studio (Visual Studio.NET 2003, 2005, and 2008) with Business Intelligence Development Studio, installed or with the included Report Builder. Once created, RDL files can be rendered in a variety of formats including Excel, PDF, CSV, XML, TIFF (and other image formats), and HTML Web Archive. Notification Services Originally introduced as a post-release add-on for SQL Server 2000, Notification Services was bundled as part of the Microsoft SQL Server platform for the first and only time with SQL Server 2005. SQL Server Notification Services is a mechanism for generating data-driven notifications, which are sent to Notification Services subscribers. A subscriber registers for a specific event or transaction (which is registered on the database server as a trigger) : when the event occurs, Notification Services can use one of three methods to send a message to the subscriber informing about the occurrence of the event. These methods include SMTP, SOAP, or by writing to a file in the filesystem. Notification Services was discontinued by Microsoft with the release of SQL Server 2008 in August 2008, and is no longer an officially supported component of the SQL Server database platform.

Integration Services SQL Server Integration Services is used to integrate data from different data sources. It is used for the ETL capabilities for SQL Server for data warehousing needs. Integration Services includes GUI tools to build data extraction workflows integration various functionality such as extracting data from various sources, querying data, transforming data including aggregating, duplication and merging data, and then loading the transformed data onto other sources, or sending e-mails detailing the status of the operation as defined by the user. Full Text Search Service SQL Server Full Text Search service is a specialized indexing and querying service for unstructured text stored in SQL Server databases. The full text search index can be created on any column with character based text data. It allows for words to be searched for in the text columns. While it can be performed with the SQL LIKE operator, using SQL Server Full Text Search service can be more efficient.
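As an illustration of the full-text querying mentioned above, a hypothetical query against a full-text indexed column might look like the sketch below (again using Python/pyodbc only for illustration; the table and column names are invented, and a full-text index on the column is assumed to exist).

import pyodbc

conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=localhost;DATABASE=HeartDB;Trusted_Connection=yes"
)
cursor = conn.cursor()

# CONTAINS uses the full-text index; a plain LIKE '%heart%disease%' predicate
# would instead scan the column without using the inverted index.
cursor.execute(
    "SELECT DocId, Title FROM ClinicalNotes "
    "WHERE CONTAINS(Body, 'heart NEAR disease')"
)
for row in cursor.fetchall():
    print(row.DocId, row.Title)

conn.close()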

SQL Server Full Text Search service architecture


Full Text Search allows for inexact matching of the source string, indicated by a Rank value which can range from 0 to 1000 - a higher rank means a more accurate match. It also allows linguistic matching ("inflectional search"), i.e. linguistic variants of a word (such as a verb in a different tense) will also be a match for a given word (but with a lower rank than an exact match). Proximity searches are also supported, i.e., if the words searched for do not occur in the sequence they are specified in the query but are near each other, they are also considered a match. T-SQL exposes special operators that can be used to access the FTS capabilities. The Full Text Search engine is divided into two processes - the Filter Daemon process (msftefd.exe) and the Search process (msftesql.exe). These processes interact

with the SQL Server. The Search process includes the indexer (that creates the full text indexes) and the full text query processor. The indexer scans through text columns in the database. It can also index through binary columns, and use iFilters to extract meaningful text from the binary blob (for example, when a Microsoft Word document is stored as an unstructured binary file in a database). The iFilters are hosted by the Filter Daemon process. Once the text is extracted, the Filter Daemon process breaks it up into a sequence of words and hands it over to the indexer. With the remaining words, an inverted index is created, associating each word with the columns they were found in. SQL Server itself includes a Gatherer component that monitors changes to tables and invokes the indexer in case of updates. When a full text query is received by the SQL Server query processor, it is handed over to the FTS query processor in the Search process. The FTS query processor breaks up the query into the constituent words, filters out the noise words, and uses an inbuilt thesaurus to find out the linguistic variants for each word. The words are then queried against the inverted index and a rank of their accurateness is computed. The results are returned to the client via the SQL Server process. SQL Server Management Studio SQL Server Management Studio is a GUI tool included with SQL Server 2005 and later for configuring, managing, and administering all components within Microsoft SQL Server. The tool includes both script editors and graphical tools that work with objects and features of the server.SQL Server Management Studio replaces Enterprise Manager as the primary management interface for Microsoft SQL Server since SQL Server 2005. A version of SQL Server Management Studio is also available for SQL Server Express Edition, for which it is known as SQL Server Management Studio Express (SSMSE). SQL Server Management Studio is a software application first launched with the Microsoft SQL Server 2005 that is used for configuring, managing, and administering all components within Microsoft SQL Server. The tool includes both script editors and graphical tools which work with objects and features of the server. A central feature of SQL Server Management Studio is the Object Explorer, which allows the user to browse, select, and act upon any of the objects within the server. It can be used to visually observe and analyze query plans and optimize the database performance, among others.SQL Server Management Studio can also be used to create a new database, alter any existing database schema by adding or modifying tables and indexes, or analyze performance. It includes the query windows which provide a GUI based interface to write and execute queries. Business Intelligence Development Studio Business Intelligence Development Studio (BIDS) is the IDE from Microsoft used for developing data analysis and Business Intelligence solutions utilizing the Microsoft SQL Server Analysis Services, Reporting Services and Integration Services. It is based on the Microsoft Visual Studio development environment but is customized with the SQL Server services-specific extensions and project types, including tools, controls and projects for reports (using Reporting Services), ETL dataflows, OLAP cubes and data mining structures (using Analysis Services).

4.2

Stage-wise Model Development and Flowchart

DATABASE DESIGN

1. PATIENT DATA

Field Name              Data Type    Allow Null
Patient_id              Int(10)      No
Age                     Int(10)      No
Sex                     Int(10)      No
Chest Pain Type         Int(10)      No
Fasting Blood Sugar     Int(10)      No
Restecg                 Int(10)      No
Exang                   Int(10)      No
Slope                   Int(10)      No
CA                      Int(10)      No
Thal                    Int(10)      No
Trest Blood Pressure    Int(10)      No
Serum Cholesterol       Int(10)      No
Thalach                 Int(10)      No
Oldpeak                 Int(10)      No

2. USER DATA

Field Name    Data Type      Allow Null
User Name     Varchar(25)    No
Password      Varchar(25)    No
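A minimal sketch of how the PATIENT DATA table above could be created programmatically is shown below. It uses Python with pyodbc purely for illustration (the report's prototype builds the database with SQL Server tooling and Visual Basic); the connection string is an assumption, INT stands in for Int(10), and Patient_id is treated as the primary key in line with the "key attribute" of Figure 1.1.

import pyodbc

conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=localhost;DATABASE=HeartDB;Trusted_Connection=yes"
)
cursor = conn.cursor()

# Columns follow the PATIENT DATA design above; all fields disallow NULLs.
cursor.execute("""
    CREATE TABLE PatientData (
        Patient_id          INT NOT NULL PRIMARY KEY,
        Age                 INT NOT NULL,
        Sex                 INT NOT NULL,
        ChestPainType       INT NOT NULL,
        FastingBloodSugar   INT NOT NULL,
        Restecg             INT NOT NULL,
        Exang               INT NOT NULL,
        Slope               INT NOT NULL,
        CA                  INT NOT NULL,
        Thal                INT NOT NULL,
        TrestBloodPressure  INT NOT NULL,
        SerumCholesterol    INT NOT NULL,
        Thalach             INT NOT NULL,
        Oldpeak             INT NOT NULL
    )
""")
conn.commit()
conn.close()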

DATASET :

4.3

Installation Stages

1. Run the Intelligent Heart Disease Prediction Setup as shown below & click on Next button.

2. Select the Installation Folder, where you want to install the setup & click on Next

button.

3. Check for the Disk requirements and choose the appropriate drive to install the software. Click on OK.

4. Click on Next button on Confirm Installation window to confirm the installation.

5. After clicking on the Next button it will run the Setup project as follows:

6. Select the Close button to finish the setup & setup will be installed successfully.

4.4

Implementation Stages

1. Splash Screen

2. Login Form

3. Prediction Form

4. Result

Chapter 5 : Results and Discussion


5.1 Naive Bayesian Algorithm

The Naive Bayes algorithm is based on conditional probabilities. It uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data. Bayes' Theorem finds the probability of an event occurring given the probability of another event that has already occurred. If B represents the dependent event and A represents the prior event, Bayes' theorem can be stated as follows:

P(B | A) = P(A | B) * P(B) / P(A)

Consider a supervised learning problem in which we wish to approximate an unknown target function f : X -> Y, or equivalently P(Y | X). Naive Bayes makes the assumption that each predictor is conditionally independent of the others: for a given target value, the distribution of each predictor is independent of the other predictors. In practice, this assumption of independence, even when violated, does not degrade the model's predictive accuracy significantly, and it makes the difference between a fast, computationally feasible algorithm and an intractable one. Sometimes the distribution of a given predictor is clearly not representative of the larger population; for example, there might be only a few customers under 21 in the training data, but in fact there are many customers in this age group in the wider customer base. The Naive Bayes algorithm affords fast, highly scalable model building and scoring. It scales linearly with the number of predictors and rows, and the build process is parallelized. Naive Bayes can be used for both binary and multiclass classification problems.

The mean squared error (MSE) of an estimator T with respect to the estimated parameter θ is defined as

MSE(T) = E[(T - θ)^2]

The MSE is equal to the sum of the variance and the squared bias of the estimator:

MSE(T) = Var(T) + (Bias(T, θ))^2

The MSE thus assesses the quality of an estimator in terms of its variation and unbiasedness. Since MSE is an expectation, it is not a random variable. It may be a function of the unknown parameter θ, but it does not depend on any random quantities. However, when MSE is computed for a particular estimate of the true value of θ, which is not known, it will be subject to estimation error. In a Bayesian sense, this means that there are cases in which it may be treated as a random variable.

Consider the following set of Training Data :


Heart (training data)

ID  Age  Sex  Chest  Fasting  Heart_rate  Exang  Restecg  Blood_pressure  Predict  Serum  Slope  CA  Thal  Old
2   67   1    4      160      286         0      2        108             1        1      2      3   3     3
4   41   0    4      130      204         0      2        172             1        1      1      1   3     0
5   56   0    4      140      294         0      2        153             1        1      0      0   3     0
6   65   0    2      150      225         0      2        114             0        1      2      1   7     4
7   50   1    2      150      243         0      2        128             0        2      0      0   7     4
8   43   1    4      150      247         0      0        171             0        1      2      2   3     0
9   34   1    1      118      182         0      2        174             0        0      3      2   3     0
10  60   1    4      145      282         0      2        142             1        2      2      1   7     2
11  54   0    3      135      304         1      0        170             0        0      0      0   3     0

Probability (yes / M) = 0.514
Probability (yes / F) = 0.3414
Probability (yes / chest pain = 4) = 0.818
Probability (yes / fasting sugar > 120) = 0.234
Probability (yes / heart rate > 200) = 0.3414
Probability (yes / exang = 1) = 0.1014
Probability (yes / old peak = 1) = 0.3414
Probability (yes / age > 60) = 0.454
Probability (yes / restecg = 2) = 0.514

Suppose a new tuple appears with attribute values t = (64.0, 0.0, 2.0, 140.0, 294.0, 0.0, 2.0, 153.0, 0.0, 1.3, 2.0, 0.0, 3.0, 0).
P(t / yes) = 0.454 * 0.3414 * 0.1014 * 0.5414 * 0.234 * 0.515 = 0.4529
Since the probability is less than 50%, and the minimum confidence is 50%, the tuple is classified as No.
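The calculation above can be automated by estimating each conditional probability from frequency counts in the training table and multiplying them under the independence assumption. The following is a small illustrative sketch of that idea in Python; it is not the report's implementation (the prototype uses the Naive Bayes algorithm built into SQL Server Analysis Services), and the tiny usage records at the end are invented.

from collections import Counter, defaultdict

def train_naive_bayes(rows, target):
    """rows: list of dicts mapping attribute -> value; target: name of the class attribute."""
    class_counts = Counter(r[target] for r in rows)
    cond_counts = defaultdict(Counter)  # (attribute, class) -> Counter of observed values
    for r in rows:
        c = r[target]
        for attr, val in r.items():
            if attr != target:
                cond_counts[(attr, c)][val] += 1
    return class_counts, cond_counts

def predict(class_counts, cond_counts, sample):
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total                       # prior P(class)
        for attr, val in sample.items():
            counts = cond_counts[(attr, c)]
            # Laplace smoothing avoids zero probabilities for unseen values.
            score *= (counts[val] + 1) / (n_c + len(counts) + 1)
        scores[c] = score
    return max(scores, key=scores.get)

# Tiny illustrative usage with invented records (1 = has heart disease).
rows = [
    {"sex": 1, "chest": 4, "exang": 0, "predict": 1},
    {"sex": 0, "chest": 4, "exang": 0, "predict": 1},
    {"sex": 0, "chest": 2, "exang": 0, "predict": 0},
    {"sex": 1, "chest": 1, "exang": 1, "predict": 0},
]
model = train_naive_bayes(rows, "predict")
print(predict(*model, {"sex": 1, "chest": 4, "exang": 0}))  # prints 1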

Attribute Discrimination
Attributes chest old Thal Slope Slope chest Values 4 0 3 2 1 3 Favors 0 79.101 44.115 31.582 29.208 29.000 Favors All other states 100.000

Thal old old Serum Serum chest Thal chest

7 3 2 <1 >= 1 2 6 1

22.666 19.116 15.465 13.450 13.450 12.995 4.149 2.062

Attribute Profile
Size: all = 176; predict = 1: 65; predict = 0: 111; missing = 0

chest:  state 4: all = 85; predict = 1: 0.815; predict = 0: 0.288
        state 3: all = 46; predict = 1: 0.092; predict = 0: 0.360
        state 2: all = 30; predict = 1: 0.062; predict = 0: 0.234
        state 1: all = 15; predict = 1: 0.031; predict = 0: 0.117
old:    state 0: all = 88; predict = 1: 0.200; predict = 0: 0.676
        state 1: all = 34; predict = 1: 0.246; predict = 0: 0.162
        state 2: all = 22; predict = 1: 0.231; predict = 0: 0.063
        state 3: all = 21; predict = 1: 0.231; predict = 0: 0.054
        state 4: all = 11; predict = 1: 0.092; predict = 0: 0.045
Serum:  state < 1: all = 95; predict = 1: 0.385; predict = 0: 0.631
        state >= 1: all = 81; predict = 1: 0.615; predict = 0: 0.369
Slope:  state 1: all = 84; predict = 1: 0.277; predict = 0: 0.595
        state 2: all = 80; predict = 1: 0.662; predict = 0: 0.333
        state 3: all = 12; predict = 1: 0.062; predict = 0: 0.072
Thal:   state 3: all = 93; predict = 1: 0.292; predict = 0: 0.667
        state 7: all = 71; predict = 1: 0.585; predict = 0: 0.297
        state 6: all = 12; predict = 1: 0.123; predict = 0: 0.036
(Missing states for all attributes have population 0 and probability 0.000.)

PERFORMANCE ANALYSIS :

Heart_Bayes: 10-fold cross-validation results (partition size 2 for each partition)

Classification - Pass:   partitions 1-10: 1, 2, 1, 1, 1, 1, 1, 1, 1, 1; average 1.1; standard deviation 0.3
Classification - Fail:   partitions 1-10: 1, 0, 1, 1, 1, 1, 1, 1, 1, 1; average 0.9; standard deviation 0.3
Likelihood - Log Score:  -1.0986 for every partition; average -1.0986; standard deviation 2.107e-008
Likelihood - Lift:       -0.4055 for all partitions except partition 2 (-1.0986); average -0.4748; standard deviation 0.2079
Likelihood - Root Mean Square Error: 0.6667 for every partition; average 0.6667; standard deviation NaN

5.2 Decision Tree Algorithm


A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify the strategy most likely to reach a goal. If in practice decisions have to be taken online with no recall under incomplete knowledge, a decision tree should be paralleled by a probability model as a best choice model or online selection model algorithm. Another use of decision trees is as a descriptive means for calculating conditional probabilities.

ID3 (Iterative Dichotomiser 3)

The main ideas behind the ID3 algorithm are: each non-leaf node of a decision tree corresponds to an input attribute, and each arc to a possible value of that attribute. A leaf node corresponds to the expected value of the output attribute when the input attributes are described by the path from the root node to that leaf node. In a good decision tree, each non-leaf node should correspond to the input attribute which is the most informative about the output attribute amongst all the input attributes not yet considered in the path from the root node to that node. This is because we would like to predict the output attribute using the smallest possible number of questions on average.

Entropy

Entropy is used to determine how informative a particular input attribute is about the output attribute for a subset of the training data. Entropy is a measure of uncertainty in communication systems introduced by Shannon (1948) and is fundamental in modern information theory. For a set S whose members fall into classes with proportions p1, ..., pk, the entropy is Entropy(S) = - sum_i (pi * log2(pi)).

For the same dataset as in the example above, let us first try to split on the age attribute (age >= 60).

Entropy of the full set (4 F, 5 M):
Entropy(4F, 5M) = - (4/9) log2(4/9) - (5/9) log2(5/9) = 0.9911

Entropy of the 5-record subset produced by the split (3 F, 2 M):
Entropy(3F, 2M) = - (3/5) log2(3/5) - (2/5) log2(2/5) = 0.9710

The entropy of the other, 4-record subset works out to 0.8113.

The information gain of splitting on an attribute A is the entropy of the current set minus the weighted sum of the entropies of the subsets produced by the split:

Gain(A) = Entropy(current set) - sum over subsets Sv of (|Sv| / |S|) * Entropy(Sv)

Gain(age >= 60) = 0.9911 - (4/9 * 0.8113 + 5/9 * 0.9710) = 0.0911

Similarly, we have Gain(chest pain = 4) = 0.515, Gain(blood pressure > 150) = 0.423, Gain(fasting sugar > 150) = 0.0895, Gain(heart rate > 230) = 0.0034, Gain(exang = 0) = 0.234 and Gain(old peak = 0) = 0.0023. Since chest pain has the highest information gain, the decision tree will start with chest pain.
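The entropy and information-gain computation used above can be expressed compactly in code. The following is a small illustrative sketch in Python (not the report's implementation, which relies on the decision-tree algorithm in SQL Server Analysis Services); the example numbers reproduce the age split worked out above.

import math

def entropy(class_counts):
    """Shannon entropy of a class distribution given as a list of counts."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)

def information_gain(parent_counts, subset_counts):
    """Gain of a split: parent entropy minus the weighted subset entropies."""
    total = sum(parent_counts)
    weighted = sum(sum(s) / total * entropy(s) for s in subset_counts)
    return entropy(parent_counts) - weighted

# Full set: 4 records of one class and 5 of the other (entropy ~ 0.9911).
parent = [4, 5]
# Split on age: one subset of 4 records (a 1:3 split, entropy ~ 0.8113)
# and one of 5 records (a 3:2 split, entropy ~ 0.9710).
subsets = [[1, 3], [3, 2]]

print(round(entropy(parent), 4))                    # 0.9911
print(round(information_gain(parent, subsets), 4))  # 0.0911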

PERFORMANCE ANALYSIS :

Heart_trees: 10-fold cross-validation results (partition size 2 for each partition)

Classification - Pass:   partitions 1-10: 1, 1, 1, 1, 1, 1, 1, 1, 1, 1; average 1.1; standard deviation 0.3
Classification - Fail:   partitions 1-10: 1, 0, 1, 1, 1, 1, 1, 1, 1, 1; average 0.9; standard deviation 0.3
Likelihood - Log Score:  partition 2: -0.6931, all other partitions: -0.6949; average -0.6947; standard deviation 0.0005
Likelihood - Lift:       partition 2: -0.6931, all other partitions: -0.0017; average -0.0709; standard deviation 0.2074
Likelihood - Root Mean Square Error: partition 2: 0.5, all other partitions: 0.4706; average 0.4735; standard deviation 0.0088

5.3

Neural Network

An artificial neural network (ANN), usually called neural network (NN), is a mathematical model or computational model that is inspired by the structure and/or functional aspects of biological neural networks. A neural network consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase. Modern neural networks are non-linear statistical data modeling tools. They are usually used to model complex relationships between inputs and outputs or to find patterns in data. Artificial neural networks are algorithms that can be used to perform nonlinear statistical modeling and provide a new alternative to logistic regression, the most commonly used method for developing predictive models for dichotomous outcomes in medicine. Neural networks offer a number of advantages, including requiring less formal statistical training, ability to implicitly detect complex nonlinear relationships between dependent and independent variables, ability to detect all possible interactions between predictor variables, and the availability of multiple training algorithms. Disadvantages include its "black box" nature, greater computational burden, proneness to overfitting, and the empirical nature of model development. An overview of the features of neural networks and logistic regression is presented, and the advantages and disadvantages of using this modeling technique are discussed. Perhaps the greatest advantage of ANNs is their ability to be used as an arbitrary function approximation mechanism that 'learns' from observed data. However, using them is not so straightforward and a relatively good understanding of the underlying theory is essential.
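To make the idea of a trainable non-linear model concrete, the following is a minimal sketch of a tiny feed-forward network trained with gradient descent (backpropagation) in Python using numpy. It is purely illustrative and is not the report's implementation (the prototype uses the neural network algorithm provided by SQL Server Analysis Services); the layer sizes, learning rate and data below are invented.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Invented toy data: 4 samples, 3 binary input attributes, 1 binary target.
X = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]], dtype=float)
y = np.array([[1], [1], [0], [0]], dtype=float)

# One hidden layer with 4 units and a single output unit.
W1 = rng.normal(scale=0.5, size=(3, 4))
W2 = rng.normal(scale=0.5, size=(4, 1))
lr = 0.5

for epoch in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1)      # hidden-layer activations
    out = sigmoid(h @ W2)    # network output

    # Backward pass: gradient of the squared error, propagated layer by layer.
    err = out - y
    d_out = err * out * (1 - out)          # delta at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)     # delta propagated back to the hidden layer

    # Gradient-descent weight updates.
    W2 -= lr * h.T @ d_out / len(X)
    W1 -= lr * X.T @ d_h / len(X)

print(np.round(sigmoid(sigmoid(X @ W1) @ W2), 2))  # predictions, close to y after training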

- Choice of model: This will depend on the data representation and the application. Overly complex models tend to lead to problems with learning.
- Learning algorithm: There are numerous trade-offs between learning algorithms. Almost any algorithm will work well with the correct hyperparameters for training on a particular fixed data set. However, selecting and tuning an algorithm for training on unseen data requires a significant amount of experimentation.
- Robustness: If the model, cost function and learning algorithm are selected appropriately, the resulting ANN can be extremely robust.

With the correct implementation, ANNs can be used naturally in online learning and large data set applications. Their simple implementation and the existence of mostly local dependencies exhibited in the structure allows for fast, parallel implementations in hardware. Theoretical and computational neuroscience is the field concerned with the theoretical analysis and computational modeling of biological neural systems. Since neural systems are intimately related to cognitive processes and behavior, the field is closely related to cognitive and behavioral modeling.The aim of the field is to create models of biological neural systems in order to understand how biological systems work. To gain this understanding, neuroscientists strive to make a link between observed biological processes (data), biologically plausible mechanisms for neural processing and learning (biological neural network models) and theory (statistical learning theory and information theory). Backpropagation is a common method of training artificial neural networks so as to minimize the objective function. Backpropagation is an iterative process that can often take a great deal of time to complete. When multicore computers are used multithreaded techniques can greatly decrease the amount of time that backpropagation takes to converge. If batching is being used, it is relatively simple to adapt the backpropagation algorithm to operate in a multithreaded manner. ATTRIBUTE PROFILE : Attribute Thal Thal Thal Slope Slope Slope sex sex age age age age age age old old old old Heart_rate Heart_rate Value 7 6 3 2 1 3 0 1 77 40 51 70 62 38 1 3 2 0 204 - 230 >= 286 Favors 0 Favors 1 25.49 16 15.98 23.97 20.96 12.71 4.05 1.7 100 76.46 65.24 43.99 41.48 39.31 11.16 10.71 7.43 4.86 12.04 8.57

Heart_rate Heart_rate Heart_rate Fasting Fasting Fasting Fasting Fasting Exang Exang chest chest chest chest CA CA CA CA Blood_pressure Blood_pressure Blood_pressure Blood_pressure Blood_pressure Blood_pressure Blood_pressure

254 - 286 230 - 254 < 204 132 - 144 >= 144 120 - 125 < 120 125 - 132 1 0 2 4 3 1 2 3 1 0 111 120 95 128 137 145 136

7.7 5.56 0.52 23.46 14.21 10.91 2.27 1.5 1.04 0.04 37.91 34.02 27.15 15.85 9.87 9.84 8.01 1.12 98.68 96.93 84.58 79.69 78.59 75.72 75.54

PERFORMANCE ANALYSIS :

Heart_Neural: 10-fold cross-validation results (partition size 2 for each partition)

Classification - Pass:   partitions 1-10: 1, 2, 1, 1, 1, 1, 1, 1, 1, 1; average 1.1; standard deviation 0.3
Classification - Fail:   partitions 1-10: 1, 0, 1, 1, 1, 1, 1, 1, 1, 1; average 0.9; standard deviation 0.3
Likelihood - Log Score:  -1.0986 for every partition; average -1.0986; standard deviation 2.107e-008
Likelihood - Lift:       -0.4055 for all partitions except partition 2 (-1.0986); average -0.4748; standard deviation 0.2079
Likelihood - Root Mean Square Error: 0.6667 for every partition; average 0.6667; standard deviation NaN
