Introduction

Software cost estimation is the forecasting of the effort required to build a quality
software system within an appropriate time frame.
Accurate cost estimates are essential for software companies, so selecting the correct
software cost estimation technique is an important part of forecasting software cost.
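As an illustration of what an effort forecast looks like, the sketch below uses Basic COCOMO. This synopsis does not name a specific estimation technique, so COCOMO is chosen here only as a well-known example, and the 32 KLOC project size is hypothetical.

```python
# Basic COCOMO: effort = a * KLOC**b (person-months),
# schedule = c * effort**d (months).
# The coefficients are the standard Basic COCOMO constants; the project
# size used below is a hypothetical value for illustration only.
COEFFICIENTS = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(kloc, mode="organic"):
    """Return (effort in person-months, schedule in months)."""
    if kloc <= 0:
        raise ValueError("kloc must be positive")
    a, b, c, d = COEFFICIENTS[mode]
    effort = a * kloc ** b
    schedule = c * effort ** d
    return effort, schedule


if __name__ == "__main__":
    for mode in COEFFICIENTS:
        effort, months = basic_cocomo(32.0, mode)
        print(f"{mode:14s} effort={effort:7.1f} PM  schedule={months:5.1f} months")
```

The same project size gives a larger effort estimate as the project mode moves from organic to embedded, which is why choosing the technique and its parameters matters for the forecast.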
Software fault prediction is the process of finding defective modules in the software.
It is used to improve software quality. Defective modules reduce customer satisfaction
and push the software product over budget.
Defective software poses considerable risk because it increases development and
maintenance costs and causes customer dissatisfaction.
Tracking faults as early as possible in the software development process not only lowers
cost but also helps to achieve customer satisfaction and reliability of the software
developed. Predicting faults is beneficial because it helps in estimating test effort,
reducing cost and developing high-quality, reliable software.
Data mining is the process of exploring and analyzing large amounts of data so that
meaningful patterns and rules can be discovered. Its objective is to work efficiently with
large data sets: data is analyzed from different perspectives and the results are
summarized as useful information.
Data collected from multiple sources is first integrated into a single data store called
the target data.
Data mining is the crucial step in which intelligent algorithms or techniques are applied
to extract meaningful patterns or rules. Finally, those patterns and rules are interpreted
as new or useful knowledge.
Motivation
Software fault prediction is one of the most active research fields in software
engineering. Optimizing cost and time is a major concern for any software organization,
and both increase when faults are detected in later phases. A mechanism that classifies
software as faulty or non-faulty in an early phase therefore helps software developers
reduce budget overruns and time overruns.
The software fault prediction techniques available in the literature have mostly been
applied to find bugs at later stages of the development cycle, where the cost of fixing
them is far higher than at the start of development. If software can be predicted as
faulty or non-faulty before the testing and maintenance phases, the organization can
minimize those costs and improve quality.
Objectives
• To reduce budget overruns of the software.
• To use software fault prediction, an essential activity in software development, to make
software more efficient and economical and to produce quality software.
• To improve quality assurance and monitoring of software.
• To identify faulty software modules within a stipulated time frame.
Review of the previous work
1. Early Software Defect Prediction: A Systematic Map and Review
(Rana Özakıncı, Ayça Tarhan) [1]
Abstract: Software defect prediction is a trending research topic, and most published
papers focus on the coding phase or later. Only a limited number of papers include the
prior (early) phases of the software development lifecycle (SDLC).
Objective: The goal of the study is to obtain a general view of the characteristics and
usefulness of Early Software Defect Prediction (ESDP) models reported in the scientific
literature.
Method: The authors conducted a systematic mapping and systematic literature review
covering studies reported between 2000 and 2016. They reviewed 52 studies and analyzed
the trend and demographics, maturity of state-of-research, in-depth characteristics,
success and benefits of ESDP models.
Results: The authors found that categorical models relying on requirement- and
design-phase metrics, and a few continuous models that include requirements-phase metrics,
are very successful. They also found that most studies reported qualitative benefits of
using ESDP models.
Conclusion: The authors highlighted the most preferred prediction methods, metrics,
datasets and performance evaluation methods, as well as the SDLC phases addressed. They
expect the results to be useful for software teams, by guiding them to use early
predictors effectively in practice, and for researchers in directing their future efforts.
2. Software Fault Prediction: A Systematic Mapping Study
(Juan Murillo-Morera, Christian Quesada-López, Marcelo Jenkins) [2]
Objective: Data mining techniques and machine learning studies in the software fault
prediction context are mapped and characterized. The authors investigated the metrics and
techniques used and their performance according to the performance metrics studied, and
conducted an analysis and synthesis of these studies.
Method: A systematic mapping study was conducted to identify and aggregate evidence about
software fault prediction.
Results: About 70 studies published from January 2002 to December 2014 were identified.
The top 40 studies were selected for analysis, based on the quality criteria results. The
main metrics used were: Halstead, McCabe and LOC (67.14%), Halstead, McCabe and LOC +
Object-Oriented (15.71%), others (17.14%). The main models were: Machine Learning (ML)
(47.14%), ML + Statistical Analysis (31.42%), others (21.41%). The data sets used were:
private access (35%) and public access (65%). The most frequent combination of metrics,
models and techniques was Halstead, McCabe and LOC + Random Forest, Naïve Bayes, Logistic
Regression and Decision Tree, representing 60% of the analyzed studies.
Conclusions: The article identified and classified the performance of the metrics,
techniques and their combinations. This will help researchers to select datasets, metrics
and models based on experimental results, with the objective of generating learning
schemes that allow better prediction of software failures.
3. Heterogeneous Defect Prediction
(Jaechang Nam, Wei Fu, Sunghun Kim, Tim Menzies, and Lin Tan) [3]
Objective: The aim of cross-project defect prediction (CPDP) is to predict defects for new
projects lacking defect data by using prediction models built from other projects.
However, most studies share the same limitation: they require homogeneous data, i.e.,
different projects must describe themselves using the same metrics. This paper presents
methods for heterogeneous defect prediction (HDP) that match up different metrics in
different projects. Metric matching for HDP requires a “large enough” sample of
distributions in the source and target projects, which raises the question of how large is
“large enough” for effective heterogeneous defect prediction. The paper shows, empirically
and theoretically, that “large enough” may be very small indeed.
Related Work on Defect Prediction:
a) CPDP Using Same/Common Metric Sets
b) CPDP Using Heterogeneous Metric Sets
Conclusions: The proposed HDP models are feasible and yield promising results. In addition,
the authors investigated the lower bounds on the size of source and target datasets for
effective transfer learning in defect prediction. Based on their empirical and
mathematical studies, they showed categories of data sets where as few as 50 instances are
enough to build a defect predictor and apply HDP. HDP is very promising as it potentially
permits all heterogeneous datasets of software projects to be used for defect prediction
on new projects or projects lacking defect data. In addition, it may not be limited to
defect prediction: the technique is potentially applicable to all prediction- and
recommendation-based approaches for software engineering problems.
Problem identification
• To reduce excess development time.
• To mitigate budget overruns of software and its releases.
• To identify defects at earlier stages.
• To select and contain defects at the earliest opportunity.
Methodology
1. Acquire open-source data sets related to software defects.
2. Apply machine learning and other data mining techniques, such as classification, to
reduce faults.
3. Apply cost estimation techniques to predict a reasonable and appropriate cost.
Fig 1: A Proposed Methodology of Cost Estimation and Fault Prediction using Data
Mining Techniques
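The classification step of the methodology can be sketched as follows. This is a minimal, from-scratch Gaussian Naïve Bayes classifier, chosen only because Naïve Bayes appears among the techniques in the reviewed literature; it is not the method this proposal commits to. The two metrics (lines of code and cyclomatic complexity) and all data values are hypothetical and stand in for an open-source defect data set.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate per-class prior, feature means and feature variances."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # A small constant keeps the variance non-zero on tiny samples.
        variances = [
            sum((x - m) ** 2 for x in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict(model, row):
    """Return the label with the highest log-posterior for one module."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical module metrics: (lines of code, cyclomatic complexity).
TRAIN_X = [(120, 4), (80, 3), (150, 5), (95, 2),
           (900, 35), (700, 28), (1100, 40), (650, 22)]
TRAIN_Y = ["clean"] * 4 + ["faulty"] * 4

model = fit_gaussian_nb(TRAIN_X, TRAIN_Y)
print(predict(model, (100, 3)))    # small, simple module
print(predict(model, (800, 30)))   # large, complex module
```

On this toy data the small module is classified as clean and the large one as faulty. A real study would replace the toy data with a public defect data set and evaluate the classifier on held-out modules.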
PERT chart
Start: 1/09/2018
1. Review of previous work (15 days)
2. Problem identification (25 days)
3. Methodology and planning (30 days)
4. Implementation (60 days)
5. Result and comparison (60 days)
6. Analysis (30 days)
End
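The schedule implied by the chart can be worked out with a short script. This is a sketch under two assumptions that the chart does not state: the start date is read as 1 September 2018 (day/month order), and the phases run strictly one after another in calendar days.

```python
from datetime import date, timedelta

# Phase names and durations come from the chart. The start date reading
# (1 September 2018) and the strictly sequential ordering are assumptions.
START = date(2018, 9, 1)
PHASES = [
    ("Review of previous work", 15),
    ("Problem identification", 25),
    ("Methodology and planning", 30),
    ("Implementation", 60),
    ("Result and comparison", 60),
    ("Analysis", 30),
]


def schedule(start, phases):
    """Return (name, start_date, end_date) per phase; end date is exclusive."""
    rows = []
    current = start
    for name, days in phases:
        finish = current + timedelta(days=days)
        rows.append((name, current, finish))
        current = finish
    return rows


plan = schedule(START, PHASES)
for name, begin, finish in plan:
    print(f"{name:26s} {begin} -> {finish}")
print("Total days:", sum(days for _, days in PHASES))
```

Under these assumptions the six phases add up to 220 days, so the work would finish on 9 April 2019.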
References
1. Rana Özakıncı, Ayça Tarhan, Early Software Defect Prediction: A Systematic Map and
Review, The Journal of Systems & Software (2018)
2. Juan Murillo-Morera, Christian Quesada-López, Marcelo Jenkins, Software Fault
Prediction: A Systematic Mapping Study
3. Jaechang Nam, Wei Fu, Sunghun Kim, Tim Menzies, Lin Tan, Heterogeneous Defect
Prediction
4. A.A. Shahrjooi Haghighi, M. Abbasi Dezfuli, S.M. Fakhrahmad, Applying Mining Schemes to
Software Fault Prediction: A Proposed Approach Aimed at Test Cost Reduction
5. Karpagavadivu K., Maragatham T., Karthik S., A Survey of Different Software Fault
Prediction Using Data Mining Techniques Methods
Sign. of Student                Sign. of Supervisor

Ashwni Kumar                    Dr. D.L. Gupta
Roll no. 1710204                CSED
