Proceeding of the 3rd International Conference on Informatics and Technology, 2009

AN AUTOMATED TEST DATA GENERATION USING GENETIC ALGORITHM

Mawarny binti Md. Rejab 1, Rohaida Romli 2, Nooraini Yusof 3

College of Art and Science, Universiti Utara Malaysia, Kedah, Malaysia

e-mail: 1 mawarny@uum.edu.my, 2 aida@uum.edu.my, 3 nooraini@uum.edu.my

ABSTRACT

Software testing is an activity that aims to evaluate the quality attributes of a program or to determine its capability to produce the required results. Software testing is costly and demands a large share of the budget in software development. This is because testing involves several processes, including error debugging, program restructuring and validation of output with different sets of test data. Much effort and time are spent on selecting test data, which sometimes has to be done manually. Nevertheless, the cost can be reduced if these processes are automated. Thus, automated test data generation has been introduced to automate the process of creating program input that fulfills the testing criteria. Through the years, a number of different approaches have been proposed for generating test data. The efficiency and effectiveness of data generation can be improved by using a genetic algorithm rather than conventional search algorithms. Owing to the ability of genetic algorithms to search for an optimal solution under given constraints and requirements, genetic algorithm-based automatic data generation has gained interest from many researchers. This paper therefore presents an automated test data generation approach that uses a genetic algorithm to obtain an optimum set of test data automatically. The approach focuses on providing a set of test data for evaluating the correctness of Java programming assignments. The initial set of data is obtained by a random method, and an equivalence partitioning technique is used to generate a set of possible test data from the randomly obtained set. Then, a combinatorial approach is used to create the possible combinations of test data. Afterward, the Genetic Algorithm is applied as an optimization approach to select the optimum set of test data.

Keywords: Test Data, Test Data Generation, Genetic Algorithm.

1.0 INTRODUCTION

Software testing is an activity that aims to evaluate the quality attributes of a program or to determine its capability to produce the required results. It is conducted to detect the presence of defects or errors and provides more assurance of software quality. Software testing is very labor-intensive and expensive; it accounts for approximately 50% of the cost of software development [14]. In order to reduce this high cost, various automated testing tools have been proposed by researchers and practitioners. IBM Rational Robot, Mercury WinRunner, Ranorex, QARun and TestPartner are some examples of the available automated tools that are becoming accepted in the software industry. Software testing is a broad term encompassing a wide spectrum of different activities, from the testing of a small piece of code by the developer (unit testing), to the customer validation of a large information system (acceptance testing), to the monitoring at run-time of a network-centric service-oriented application [2]. This study focuses on testing a small piece of code to ensure the correctness of a program by executing it with different test data. Test data is a subset of elements chosen for use in the software testing process and consists of a representative set of inputs [4]. It contains a sample of every category of valid and possibly invalid data conditions. The correctness of the behavior of the program is evaluated by using several test data.

However, one of the most difficult tasks in software testing is test data generation. Much effort and time are spent on selecting test data, which sometimes has to be done manually. Test data generation is defined as the process of preparing test data for testing the validity and quality of software output, and the data must also satisfy the pre-defined testing criteria [16]. It is a crucial and costly part of software testing [4]. Test data generation requires a relevant and optimum set of data for clearing the tested software of possible errors and any unexpected mistakes, while taking into account the time and the number of data required. Generating test data manually is extremely time-consuming because the process is tedious, especially for complex programs. Besides, how well the test data is generated determines the reliability of the testing process. Therefore, numerous attempts have been made to automate test data generation. The rest of this paper is organized as follows. Section 2.0 highlights automated test data generation. Section 3.0 focuses on automated test data generation using genetic algorithm. Section 4.0 gives an overview of the prototype, and Section 5.0 describes the development of the prototype using genetic algorithm.

©Informatics '09, UM 2009

2.0 AUTOMATED TEST DATA GENERATION

Through the years, a number of different approaches have been proposed for generating test data. Test data generation can be categorized into three methods, namely random test data generation, path-oriented test data generation and goal-oriented test data generation [5]. Random test data generation is the simplest method and the easiest to apply. Input values are generated randomly until the selected statement is reached. However, this method does not achieve high coverage. Because it relies on probability, it has a low chance of finding semantically small faults. Thus, it covers only a small percentage of the program input. Consequently, it is considered to have the lowest acceptance rate.
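The random method described above can be illustrated with a minimal sketch. The program under test (`classify_sides`), the function name `random_search` and all parameter values are hypothetical and chosen only for illustration; they are not taken from any of the cited tools.

```python
import random

def classify_sides(a, b, c):
    """Hypothetical program under test: labels three side lengths."""
    if a == b == c:
        return "equilateral"   # hard to reach with random inputs
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"

def random_search(target, low, high, max_tries, seed=0):
    """Draw random inputs until the branch returning `target` is executed.

    Returns (inputs, attempts) on success, or (None, max_tries) if the
    budget runs out before the target branch is reached.
    """
    rng = random.Random(seed)
    for attempt in range(1, max_tries + 1):
        inputs = tuple(rng.randint(low, high) for _ in range(3))
        if classify_sides(*inputs) == target:
            return inputs, attempt
    return None, max_tries
```

With inputs drawn from 1 to 100, only 100 of the 1,000,000 possible input triples reach the "equilateral" branch, so each random try hits it with probability 1/10,000. This is the low-coverage weakness of the random method noted above.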

The path-oriented approach is a process of selecting a set of paths that covers all the statements satisfying a given criterion and then generating input data for each defined path [11][19]. However, this approach does not guarantee that every path in the program will be exercised, even if the selected paths satisfy the criterion. Thus, errors in the program's flow of control may remain undetected, and a stronger criterion is required in order to cover all branches [19]. The path-oriented approach is probably best suited for programs with a small number of paths to the selected node [5]. Two methods have been proposed to find input data that executes the selected path, namely symbolic execution and execution-oriented test data generation.

In contrast, the goal-oriented approach does not require a specific path to be selected. Its main objective is to focus on the parts of the program that affect the execution of the selected statements and to ignore any part that does not influence their execution [11]. Thus, this approach identifies input values for which selected branches in a program are executed. Two methods have been identified as extensions of the goal-oriented approach, namely the chaining approach and the assertion-oriented approach. These approaches have been automated by integrating them with other methods or techniques. For example, the goal-oriented technique can be automated by using a genetic algorithm to search for test data that satisfies the test requirements [17].

Several tools and system prototypes have been proposed by researchers and practitioners. CASEGEN is a test data generation system that was implemented as part of the Fortran Automated Code Evaluation System [19]. It was designed and built to generate test data automatically for testing Fortran programs by using a symbolic execution technique. TESTGEN was introduced as a test data generation system for Pascal programs [10]. This system was developed using the chaining approach as an extension of execution-oriented methods of test data generation. The SELECT system was developed to assist the formal, systematic debugging of programs [3]. SELECT systematically handles the paths of programs written in a LISP subset by using symbolic execution, and it proved to be a successful tool for automatically finding useful test data. Its features include the construction of input data constraints to cover selected program paths and the automatic determination of actual input data to drive the test program through the selected paths. This system is similar to a system called EFFIGY, which was developed by King and his colleagues at IBM Research. EFFIGY is an interactive symbolic execution system for testing and debugging programs written in a simple PL/I-style programming language [9]. It uses a symbolic execution approach to generate test data, which is determined by the dependency of the program's control flow on its input. A newer automated combinatorial software testing tool called Jtst has been developed by researchers from Universiti Sains Malaysia (USM). Jtst is a tool for customizing test data generation based on a combinatorial approach. The tool focuses on performing automated black-box testing without considering the behavioral specification [7].

In program testing, preparing data that satisfies the testing conditions is not an easy task. It is a time-consuming process, especially when dealing with complex programs [12]. Nevertheless, with the available methods and tools, automated data generation makes the task easier. However, the best techniques for obtaining optimum test data are still of research interest. There is a need for a technique that is able to produce a set of data that fulfills the testing criteria more accurately. Thus, a genetic algorithm is used in automating test data generation because of its ability to obtain an optimum set of test data.

3.0 AUTOMATED TEST DATA GENERATION USING GENETIC ALGORITHM

The efficiency and effectiveness of data generation can be improved by using a genetic algorithm rather than conventional search algorithms [18][12][17]. Owing to the ability of the Genetic Algorithm to search for an optimal solution under given constraints and requirements, Genetic Algorithm-based automatic data generation has gained interest from many researchers. One of the early works on the application of the Genetic Algorithm to data generation integrated the algorithm into the optimization and used the real program as an evaluation function in the search process [18]. This method produced input data that led to optimum output from the program under test. The authors concluded that the Genetic Algorithm-based technique is able to speed up the search process by eliminating some weak branches that would still be traced by a sequential traversal method.

In addition, a Genetic Algorithm-based approach was applied to a fuzzy logic-based temperature control program that comprised 210 lines of C code with 35 conditional rules or constraints [12]. The study successfully generated test cases that fulfilled those condition decisions. The findings also indicated that Genetic Algorithm-based data generation outperformed the random method: 33 percent of the data generated by the random method failed to satisfy the program requirements. A program known as GenerateData was developed to generate test data automatically for some identified programming tasks using a genetic algorithm [17]. The experimental results support the findings of others that Genetic Algorithm-based algorithms outperform random methods. Furthermore, the study found that GenerateData was able to produce relevant test data even when the program under test was modified.

As an improvement on the use of the Genetic Algorithm alone in search optimization, research then shifted to combining the Genetic Algorithm with other techniques [13]. The findings revealed that the combined Genetic Algorithm method achieved the highest performance in general by producing test data with wider coverage of the program requirements. Rather than focusing only on the use of the Genetic Algorithm in data generation, and in parallel with the growth of machine learning and software engineering procedures, research in the related fields then concentrated on improving the Genetic Algorithm and on combining it with other machine learning techniques.

For instance, a novel work combined the Genetic Algorithm with formal concept analysis for automatic test data generation [8]. The authors developed a generator known as genet that produces test data for branch coverage. It takes a simpler approach than previous Genetic Algorithm-based automatic test generators but exhibits similar behavior in general, and it is also programming language independent. In genet, the Genetic Algorithm is used to search for tests, and formal concept analysis is used to organize the relationships between tests and their execution traces. genet learns relationships between branches that provide useful insights for test selection and maintenance, for finding a minimal test set, for the analysis of test failures and for understanding a program's dynamic control flow.

Meanwhile, combining the parallel search ability of the adaptive Genetic Algorithm (aGA) with the controllable jumping property of Simulated Annealing (SA) produced a hybrid meta-heuristic algorithm, SAaGA, for automatic software test data generation [6]. Experimental results on some benchmark programs showed that SAaGA is quite flexible, gives satisfactory results and requires less running time than aGA and SA. Considering path coverage as the test adequacy criterion, a Genetic Algorithm can also be used to automate the generation of test data for white-box testing [1]. The main aim of that work is to overcome an inefficiency problem encountered in covering multiple target paths. The authors designed a Genetic Algorithm-based test data generator that is able, in one run, to synthesize multiple test data to cover multiple target paths. Experiments with a set of variations of the generator showed that the developed test data generator is more efficient and more effective than others.

4.0 PROTOTYPE OVERVIEW

JTestGen was developed to provide a formal mechanism for automating the process of test data generation for testing the numeric data of Java programs. The prototype can serve as a useful tool that assists a lecturer in marking students' programming assignments. It requires less user involvement and less understanding of the program written by the student. In addition, it provides the lecturer with an optimum set of test data that fulfills the defined testing criterion, without the lecturer needing to be an expert in, or to fully understand, the technique of designing test cases. JTestGen was also developed as a tool to improve the process of evaluating the correctness of Java programming assignments that was proposed in earlier work [20]. Figure 1 depicts the main interface of JTestGen. JTestGen enables the lecturer, who is its main user, to set the input specifications by indicating the number of input variables, the data type and the category of data. After the “Generate Data” button is pressed, the sequence of newly reproduced generations of the population is displayed in the text area of the interface. Based on the input specifications, JTestGen generates a set of individual test data from a randomly obtained list by using equivalence partitioning. That is, the initial set of data is obtained by a random method, and an equivalence partitioning technique is used to generate a set of possible test data from the randomly obtained set. A combinatorial approach is then used to create the possible combinations of test data. Finally, the Genetic Algorithm is applied as an optimization approach to select the optimum set of test data.
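The combinatorial step can be sketched as a Cartesian product over the candidate values of each input variable. This is only an illustration under stated assumptions: the function name `combine`, the variable names and the candidate values are hypothetical, and the paper does not state how JTestGen implements this step.

```python
from itertools import product

def combine(candidates_per_variable):
    """Return every combination that takes one candidate value per input variable."""
    return list(product(*candidates_per_variable))

# Hypothetical candidate test data for two input variables
age_candidates = [25, -3, "abc"]
score_candidates = [80, 150]
combinations = combine([age_candidates, score_candidates])
```

The number of combinations is the product of the candidate counts (here 3 x 2 = 6), which is why a selection step is needed once the number of input variables grows.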

Fig. 1: JTestGen Interface

5.0 DEVELOPMENT OF PROTOTYPE USING GENETIC ALGORITHM

The initial set of test data is designed by using one of the black-box testing techniques, known as equivalence partitioning. In equivalence partitioning, the input domain is divided into classes of data that represent sets of valid and invalid states. The set of test data that will be generated is based on the input specifications. Generally, the test data is divided into the following categories:

a) Valid test data

b) Invalid test data

c) Illegal test data
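The three categories above can be illustrated with a small sketch for a single integer input. The paper does not define the categories precisely, so the interpretation used here is an assumption: valid data lies inside the specified range, invalid data has the right type but lies outside the range, and illegal data has the wrong type. The function name `partition_inputs` and the sample illegal values are hypothetical.

```python
import random

def partition_inputs(low, high, per_class=2, seed=0):
    """Sample test data for one integer input specified to lie in [low, high].

    Assumed meaning of the classes (not defined in the paper):
    valid = inside the range, invalid = integer outside the range,
    illegal = non-numeric value.
    """
    rng = random.Random(seed)
    span = high - low + 1
    valid = [rng.randint(low, high) for _ in range(per_class)]
    invalid = []
    for _ in range(per_class):
        offset = rng.randint(1, span)
        # Pick a value just below or just above the specified range.
        invalid.append(low - offset if rng.random() < 0.5 else high + offset)
    illegal = [rng.choice(["abc", "", "12x"]) for _ in range(per_class)]
    return {"valid": valid, "invalid": invalid, "illegal": illegal}
```

For example, `partition_inputs(0, 100)` returns a dictionary with two sampled values per class for an input specified as an integer between 0 and 100.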

The overall process of test data generation is depicted in Figure 2. The Genetic Algorithm is used in this study to control the permutation by selecting the optimum data set [15]. The Genetic Algorithm, commonly known as GA, is one of the Artificial Intelligence techniques grouped under evolutionary computation, which simulates the process of natural evolution (e.g. biological chromosomes, X and Y), including selection, mutation and reproduction. As a branch of evolutionary computing, the Genetic Algorithm solves a problem by optimizing a combination of variables given a set of constraints. It uses natural selection and genetics-inspired techniques known as crossover and mutation. Because the Genetic Algorithm is used to select an optimum set of test data, each test data item in a cluster is assigned an appropriate fitness value. The fitness value is chosen according to the cluster of the test data, and each cluster represents the critical level of its data in the testing process. Two types of testing have been conducted, namely positive and negative testing. Positive testing emphasizes tests that fulfill the program requirements, whilst negative testing focuses on tests that might produce unexpected results or errors. To ensure that the testing process is more effective, the design of test cases should reflect both positive and negative testing so as to cover all possible circumstances.

[Figure 2: flowchart. Boxes: Program Specification; Design scheme of test case; Random generation of individual test data; Initial set of test data; Generation of the combination of test data (combinatorial approach); Random selection of test data (initial population); Selection of data using GA; Optimum set of test data.]

Fig. 2: The Overall Process of Test Data Generation

Individual test data are generated by using a random selection technique. The number of test data to be generated depends on the number of input variables obtained from the program specification. The second level of test data is then generated as a set of combinations of test data by using a combinatorial approach. The number of combinations depends on the individual test data generated. This approach is applied repeatedly according to the number of input variables. After the combinations of test data have been generated, the Genetic Algorithm is applied to control the variation of data by selecting the optimum set of test data that fulfills the defined testing criterion or program specification. Before the Genetic Algorithm can be applied, a random technique is used once more to select some of the test data from the collection of combinations as the initial population, which comprises chromosomes of size N. Then, two chromosomes from the population (the parents) are selected for crossover. A mutation process is randomly applied to some of the chromosomes in the population. Mutation, which represents a change in a gene, occurs rarely in nature [14]. The mutation process flips a randomly selected gene in a chromosome. Mutation can occur at any gene in a chromosome with some probability, typically in the range of 0.001 to 0.01. The Genetic Algorithm continues to reproduce new generations of the population until the termination criterion is met. Typically, the termination criterion (that is, the end of the optimization process) can be assumed to be met when all chromosomes in the population produce the same total fitness value.
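The selection loop described above can be sketched as follows. This is a minimal illustration under several assumptions that are not stated in the paper: the cluster fitness values in `CLUSTER_FITNESS` are hypothetical, parents are chosen by fitness-proportional selection, crossover is single-point, "flipping" a gene is modeled as replacing it with a random test data item from the pool, and the fittest chromosomes survive each generation. All function names are illustrative.

```python
import random

# Hypothetical fitness per cluster; the paper does not give the actual values.
CLUSTER_FITNESS = {"valid": 1, "invalid": 2, "illegal": 3}

def fitness(chromosome):
    """Total fitness of a chromosome, a list of (test_data, cluster) genes."""
    return sum(CLUSTER_FITNESS[cluster] for _, cluster in chromosome)

def crossover(parent_a, parent_b, rng):
    """Single-point crossover producing two children."""
    point = rng.randint(1, len(parent_a) - 1)
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(chromosome, pool, rate, rng):
    """Replace each gene by a random pool item with probability `rate`."""
    return [rng.choice(pool) if rng.random() < rate else gene
            for gene in chromosome]

def evolve(pool, size_n=4, pop_size=6, rate=0.01, max_generations=200, seed=0):
    """Evolve chromosomes of size N drawn from the pool of combinations.

    Stops when all chromosomes have the same total fitness, or after
    `max_generations` as a safety bound (an added assumption).
    """
    rng = random.Random(seed)
    population = [[rng.choice(pool) for _ in range(size_n)]
                  for _ in range(pop_size)]
    generations = 0
    while generations < max_generations:
        scores = [fitness(c) for c in population]
        if len(set(scores)) == 1:  # termination criterion from the text
            break
        parent_a, parent_b = rng.choices(population, weights=scores, k=2)
        children = [mutate(child, pool, rate, rng)
                    for child in crossover(parent_a, parent_b, rng)]
        # Keep the fittest pop_size chromosomes among parents and children.
        population = sorted(population + children,
                            key=fitness, reverse=True)[:pop_size]
        generations += 1
    return max(population, key=fitness), generations
```

A call such as `evolve([((25, 80), "valid"), ((-3, 80), "invalid"), (("abc", 80), "illegal")])` returns the fittest chromosome found and the number of generations produced. The mutation rate of 0.01 lies within the range quoted above.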

6.0 CONCLUSION

The prototype was developed to provide a formal mechanism for automating the process of test data generation for testing the correctness of basic Java programming assignments. The integration of the Genetic Algorithm as a control mechanism for selecting an optimum set of relevant data not only reduces the time required but also enhances the efficiency of selecting data that fulfills the defined testing criterion. Therefore, this may improve on existing test data generation techniques that sometimes do not meet all testing requirements.

REFERENCES

[1] Ahmed, M.A. & Hermadi, I., “GA-based multiple paths test data generator”, Computers & Operations Research, pp. 3107-3124, 2008.

[2] Bertolino, A., “Software Testing Research: Achievements, Challenges and Dreams”, Future of Software Engineering, IEEE, pp. 85-103, 2007.

[3] Boyer, R.S., Elspas, B. & Levitt, K.N., “SELECT - A Formal System for Testing and Debugging Programs by Symbolic Execution”, ACM SIGPLAN Notices, 1975.

[4] Chu, H.D., Dodson, J.E. & Liu, I.C., “FAST - A Framework for Automating Statistic-based Testing”. Available at: http://citeseer.nj.nec.com/73306.html, 1997.

[5] Ferguson, R. & Korel, B., “The Chaining Approach for Software Test Data Generation”, ACM Transactions on Software Engineering and Methodology, 5(1), pp. 63-86, 1996.

[6] Gao, H., Feng, B. & Zhu, L., “A kind of SAaGA Hybrid Meta-heuristic Algorithm for the Automatic Test Data Generation”, IEEE, Vol. 1, pp. 111-114, 2005.

[7] Kamal, Z.Z., Norashidi, M.I., Mahamed Fadel, J.K. & Siti Norbaya, A., “A Tool for Automated Test Data Generation (and Execution) Based on Combinatorial Approach”, 1(1), pp. 19-36, 2007.

[8] Khor, S. & Grogono, P., “Using a Genetic Algorithm and Formal Concept Analysis to Generate Branch Coverage Test Data Automatically”, Proceedings of the 19th International Conference on Automated Software Engineering (ASE’04), IEEE, 2004.

[9] King, J.C., “A new approach to program testing”, ACM SIGPLAN Notices, 10(6), pp. 228-233, 1975.

[10] Korel, B., “Automated Test Data Generation for Programs with Procedures”, ACM SIGSOFT Software Engineering, 21, pp. 209-215, 1996.

[11] Korel, B. & Ali, M.A., “Assertion-Oriented Automated Test Data Generation”, Proceedings of the 18th International Conference on Software Engineering, pp. 71-80, 1996.

[12] Michael, C.C., McGraw, G.E., Schatz, M.A. & Walton, C.C., “Genetic Algorithms for Dynamic Test Data Generation”, IEEE, pp. 307-308, 1997.

[13] McGraw, G., Michael, C. & Schatz, M., “Generating Software Test Data by Evolution”, Technical Report RSTR-018-97-01, RST Corporation, Sterling, 1998.

[14] Myers, G.J., The Art of Software Testing. New York: John Wiley and Sons, 1979.

[15] Negnevitsky, M., Artificial Intelligence: A Guide to Intelligent Systems. Addison-Wesley, Pearson Education Limited, Essex, England, 2002.

[16] Offutt, A.J., Clark, J., Zhang, T. & Tewary, “Experiments with Data Flow and Mutation Testing”. Retrieved April 12, 2007, from http://citeseer.nj.nec.com/offutt94experiments.html, 1997.

[17] Pargas, R.P., Harrold, M.J. & Peck, R.R., “Test Data Generation using Genetic Algorithms”, Journal of Software Testing, Verification and Reliability, 9, pp. 263-282, 1999.

[18] Pei, M., Goodman, E.D., Gao, Z. & Zhong, K., “Automated Software Test Data Generation Using A Genetic Algorithm”, Michigan State University, 1994.

[19] Ramamoorthy, C.V., “The Automated Generation of Program Test Data”, IEEE Transactions on Software Engineering, 2(4), pp. 293-300, 1976.

[20] Rohaida, R., Cik Fazilah, H. & Mazni, O., “Correctness Assessment Of Java Programming Assignment”, Laporan Akhir Geran Penyelidikan Fakulti, Universiti Utara Malaysia, 2004.

BIOGRAPHY

Mawarny binti Md. Rejab obtained her Master of Computer Science from Universiti Teknologi Malaysia in 2003. Currently, she is a lecturer at the College of Arts & Sciences (Information Technology), Universiti Utara Malaysia. Her research areas include program analysis, software metrics, and software testing. She has published a number of papers related to these areas.

Rohaida binti Romli is a lecturer at the College of Arts & Sciences (Information Technology), Universiti Utara Malaysia. Her research areas include software testing, program analysis, software metrics and software quality. Currently, she is pursuing her PhD at Universiti Sains Malaysia, focusing on test data generation.

Nooraini binti Yusof is a lecturer at the College of Arts & Sciences (Information Technology), Universiti Utara Malaysia. Her research focuses on Artificial Intelligence, including neural networks, agents and genetic algorithms. She has published a number of papers related to these areas.
