
Performance evaluation of Web Information Retrieval Systems and its application to e-Business

Fidel Cacheda, Ángel Viña. Department of Information and Communications Technologies, Facultad de Informática, University of A Coruña. Campus de Elviña s/n, 15071 A CORUÑA, SPAIN. Telephone: +34-981-167000. E-mail: {fidel, avc}@udc.es
Abstract. The evaluation of traditional IR systems is performed in an ideal situation, without any workload on the IR system; Web IR systems, however, operate under different workload levels over time. Therefore, their performance evaluation must be done considering several workload environments. For this purpose, we have designed and developed USim, a tool for the performance evaluation of Web IR systems based on the simulation of the users' behaviour. This work is based on several previous theoretical analyses of the users' behaviour while using Web search systems. This simulation tool helps in the performance evaluation of Web IR systems in two different ways: comparing different search algorithms or engines (by measuring the response times of each search engine) and estimating the saturation threshold of the system. The latter point is especially important for guaranteeing an appropriate quality of service to all the users, and therefore its applications to the e-Business side of an IR system are significant during the whole life of the system. NOTE: This work has been partially supported by the CICYT (TIC2001-0547).

1 Introduction

Once an Information Retrieval (IR) system has been designed and developed, its performance must be evaluated. The type of evaluation considered depends on the objectives of the retrieval system, but the most common measures are time and space: the faster the response time and the smaller the storage space used, the better the IR system. Performance evaluation is the process of obtaining these time and space measures. Although the core of the present article is the performance evaluation of IR systems in the World Wide Web, it is important to mention other measures that constitute the retrieval performance evaluation. These measures study the quality of the documents retrieved by the system, and the most significant are recall and precision [7], although other new measures have appeared in order to contribute new perspectives. Also, IR systems on the Web have led to the development of some quality measures specific to these systems. On the other hand, the performance evaluation of an IR system consists of estimating the time of the search process. Kobayashi points out that there is a relationship among speed (or response time), recall and precision [10]. Zobel et al. [11] describe guidelines for evaluating the performance of, and comparing, several indexing and retrieval techniques. The main criteria for comparison are the following: the extensibility of the type of queries performed, the scalability of the system (measured using the volume of disk space), the response time of the search process (which is perhaps the single crucial test for an indexing scheme [11]), the disk space used by the index data structures, the CPU time, the disk traffic and the memory requirements.

Nevertheless, the response time is not an easily estimable parameter because it depends on many other factors: CPU speed, disk speed, system workload, etc. The same problem arises in Web IR systems, and it becomes more complicated because these systems must operate under different workload situations, especially with a high number of concurrent users. In fact, in the WEB-TREC one of the measures obtained is the response time of the different requests sent to the systems [6]. However, the response times are computed without any workload on the system, which is an ideal situation and, as we will show, can lead to erroneous conclusions. Consequently, it is necessary to evaluate the performance of a Web IR system considering different workload situations, and not only the ideal case of a null workload. At this point there are two possible ways of generating a realistic environment with different workloads. The first one would be to use real users to perform the queries, who would have to generate different workloads over long periods of time; however, this option seems quite arduous. The second one consists of using simulation to reproduce the behaviour of the users of a Web IR system. In this paper we explore how simulation can be used to evaluate the performance of Web IR systems. The paper is structured as follows. It starts with the description of the goals, followed by an analysis of the behaviour of the users of Internet search engines. Section four describes the design, implementation and operation of USim, the simulation tool proposed, and the next one details its usage in the performance evaluation of Web IR systems. Finally, the main conclusions are presented.

2 Objectives

The main objective of this work is the design and development of a simulation tool for the performance evaluation of Web IR systems. This tool attempts to reproduce the behaviour of the users of an Internet search tool, so it is essential to determine how Web users use search engines. For this purpose, some previously published works on this subject must be considered. Once the behaviour is characterised, the simulation tool can be implemented. The main aspects considered in the development of the performance evaluation tool are the following:
- The implementation should support any type of Web IR system: spiders, Web directories or meta-searchers. All these search tools share a common search process, although Web directories can also retrieve information through browsing. These differences must be taken into account because of their repercussion on the system load.
- The tool must determine the response times for the different retrieval processes analysed. The response times will usually be estimated in local environments, though an estimation through the Internet must also be possible.
- From the result pages obtained for the retrieval processes, the tool must extract the parameters of the answer: number of results retrieved, number of results shown, etc.
- The final objective of this tool is to compare different search algorithms. Therefore, the two previous points are critical, because the evaluation will be accomplished using the response time and also determining the influence of other parameters on the response time.

3 Analysis of the users' behaviour of Web search engines

Recently there have been several studies examining the behaviour of Web users while they are using an Internet search engine or a directory. The main intention of these studies is to demonstrate that Web users differ significantly from the users of traditional Information Retrieval systems. The first study was performed by Kirsch who, at the beginning of 1998, presented some search statistics of Infoseek usage [3]. A bit later, Jansen et al. presented a study of the queries, sessions and search terms obtained from the query logs of Excite [1]. Silverstein et al. examined a very large number of queries taken from the AltaVista logs [2], studying not only the queries but also the correlations among them. However, none of these studies analysed whether this new and different behaviour could fit any mathematical distribution. For this purpose, our research group carried out a more specific and detailed study of Web IR users, which is extensively covered in [4] and [5]. In these works it is shown that the queries performed, the categories browsed (in the case of a Web directory) and the documents visited fit an Exponential distribution (with rate parameters λsearch, λcats and λdocs). Moreover, it is also shown that there is a linear relationship among these three variables, which makes it possible to estimate the number of categories browsed and documents visited from the number of searches performed by the users.
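To make this model concrete, the exponential assumption and the standard inversion step used later for sampling inter-arrival times can be summarised as follows (a sketch, with λsearch as the example rate; the same expressions hold for λcats and λdocs):

```latex
% Inter-arrival time T between two consecutive searches, with rate \lambda_{search}
% expressed in requests per minute:
F(t) = P(T \le t) = 1 - e^{-\lambda_{search}\, t}, \qquad t \ge 0.
% Inversion method: if U \sim \mathrm{Uniform}(0,1), then
T = F^{-1}(U) = -\frac{\ln(1 - U)}{\lambda_{search}}
% follows the desired Exponential distribution, with mean 1/\lambda_{search} minutes.
```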

4 USim: a performance evaluation tool

Using our previous studies and the research carried out by Jansen, Silverstein and Kirsch, we have designed and developed a tool that simulates the behaviour of the users of a Web IR system. This simulation tool is named USim (Users Simulator), and it has been designed to replace real users during the development of IR systems for the Internet. USim plays the role of the users, performing searches, browsing categories and visiting documents.

4.1 Design and implementation

This tool is composed of three main parts associated with the three types of requests: searches, categories and documents. The conclusions obtained in the previous section establish the moment when each request should be sent. As mentioned before, the three types of requests fit an Exponential distribution, which characterizes the time between two consecutive events. Therefore, using the distribution function of the Exponential distribution and the inversion method, the simulation of the arrival times is trivial. This simulation process decides when a request (search, category or document) will be sent to the retrieval system, but it is also necessary to decide what will be requested. This problem is solved in different ways depending on the type of request. In the case of a search, it is shown in [2] that the search strings used by the users do not fit Zipf's law [12]. Consequently, a mathematical model cannot be used and we therefore decided to use an empirical distribution. This distribution consists of 26,654 different search strings with their respective frequencies, obtained during the analysis described in the previous section. There are also other parameters that characterize a search, such as the interval retrieved (1-10, 11-20, ...), the number of results retrieved (10, 20, 30, ...), the information retrieved (only title, title and description, ...) and so on. In [4] and [5] it is established that the great majority of the users do not change the default value of any parameter, except the interval retrieved. Therefore, in the simulation tool, an empirical distribution of the intervals retrieved is used for the simulation process.
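A minimal sketch of how these two decisions (when to send a search and which search string to send) could be implemented is shown below; the class and method names are illustrative only and are not taken from the actual USim implementation.

```java
import java.util.List;
import java.util.Random;

// Illustrative sketch (not the actual USim code): generating the next search
// request from an Exponential inter-arrival time and an empirical distribution
// of search strings loaded from the search file.
public class SearchGenerator {

    /** One entry of the empirical distribution: a search string and its observed frequency. */
    public record WeightedQuery(String query, long frequency) {}

    private final Random random = new Random();
    private final List<WeightedQuery> queries;   // e.g. the 26,654 strings with frequencies
    private final double lambdaSearch;           // searches per minute (rate of the Exponential)
    private final long totalFrequency;

    public SearchGenerator(List<WeightedQuery> queries, double lambdaSearch) {
        this.queries = queries;
        this.lambdaSearch = lambdaSearch;
        this.totalFrequency = queries.stream().mapToLong(WeightedQuery::frequency).sum();
    }

    /** Inversion method: time (in minutes) until the next search request. */
    public double nextInterArrivalMinutes() {
        double u = random.nextDouble();                 // U ~ Uniform(0,1)
        return -Math.log(1.0 - u) / lambdaSearch;       // T ~ Exponential(lambdaSearch)
    }

    /** Frequency-weighted sampling of the search string to be sent. */
    public String nextQuery() {
        long target = (long) (random.nextDouble() * totalFrequency);
        long cumulative = 0;
        for (WeightedQuery wq : queries) {
            cumulative += wq.frequency();
            if (target < cumulative) {
                return wq.query();
            }
        }
        return queries.get(queries.size() - 1).query(); // numerical safety net
    }
}
```

The same pair of decisions applies to browsed categories and visited documents, except that the "what to request" part is taken from the caches described next rather than from an empirical distribution.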

If the request sent to the system is a browsed category, the best way to obtain the categories to be retrieved is through the simulation process itself, because there is no logical or mathematical distribution that fits the categories retrieved by the users. Thus, the simulation tool initially uses the root category, and then all the category identifiers obtained either by searching or by browsing are registered and stored in a cache. Each category has a finite life in the cache (typically, the duration of a user session). When a category identifier is needed, one is selected randomly from this cache. Similarly, for the documents requested, the document identifiers are retrieved during the simulation process and stored in a cache of documents. In every search and browsed category, the identifiers of the result documents are obtained, ranked (the first documents retrieved are considered more relevant and are therefore given a higher probability) and stored. When a document identifier is needed, one is selected randomly from this cache. A module was defined for each type of request (searches, categories and documents). These three modules operate concurrently, sending requests to the retrieval system and storing the results obtained. The final objective of this tool is to measure the response times of a Web IR system. Therefore, for each request sent to the server, the following information is stored in a file for subsequent analysis:
- The date and time when the answer is received from the retrieval system.
- The identification of the request, which depends on the type of request: the search string, or the category or document identifier.
- The time elapsed from the moment the request was sent until the response was received (the HTML document completely downloaded), which constitutes the response time.
- The number of images included in the answer.
- The additional time needed to download the images.
Moreover, some types of requests require the result Web page to be analysed in order to obtain some extra information from the retrieval system. If the request processed is a search or a category, the following information is also stored:
- The number of categories included in the result page.
- The total number of documents retrieved in the search or in the category.
- The number of documents shown in the result page.
- The position of the first document shown in the result page.
All this information is stored in three different text files, one for each type of request, for later analysis with statistical applications.
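The cache of identifiers with a finite lifetime described above could be sketched as follows; the class is illustrative (the rank-weighted selection used for documents is omitted for brevity) and is not the actual USim code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Illustrative sketch: cache of category (or document) identifiers gathered
// during the simulation. Entries expire after a fixed lifetime, roughly the
// duration of a typical user session.
public class IdentifierCache {

    private record Entry(String id, long expiresAtMillis) {}

    private final List<Entry> entries = new ArrayList<>();
    private final long lifetimeMillis;
    private final Random random = new Random();

    public IdentifierCache(long lifetimeMillis) {
        this.lifetimeMillis = lifetimeMillis;
    }

    /** Register an identifier obtained from a search or a browsed category. */
    public synchronized void add(String id) {
        entries.add(new Entry(id, System.currentTimeMillis() + lifetimeMillis));
    }

    /** Randomly select a live identifier, or null if the cache is empty. */
    public synchronized String pickRandom() {
        evictExpired();
        if (entries.isEmpty()) {
            return null;   // the caller then falls back to the root category, for instance
        }
        return entries.get(random.nextInt(entries.size())).id();
    }

    private void evictExpired() {
        long now = System.currentTimeMillis();
        for (Iterator<Entry> it = entries.iterator(); it.hasNext(); ) {
            if (it.next().expiresAtMillis() < now) {
                it.remove();
            }
        }
    }
}
```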
4.2 Operation

USim was designed using an object-oriented methodology and was completely developed in the object-oriented language Java, in order to build a multiplatform application and to facilitate its operation in any environment. In this section we briefly describe the main characteristics of the functionality of this simulation tool, with the objective of easing the understanding of the performance evaluation situations described in the next section. First of all, it is important to mention that USim can operate with a user interface or in batch mode, using proprietary configuration files. Figure 1 shows the graphical interface used to configure the general parameters related to the whole simulation process. Here the user can determine the length of the simulation and the lifetime in the cache of the category and document identifiers gathered by USim during the simulation process. The whole configuration can be stored in order to use the simulation tool in batch mode, without the user interface.

Figure 1: General configuration for USim

Figure 2: Searches configuration for USim

When the simulation starts, the application checks which types of requests will be sent to the IR system, and each module is started independently. In this way, USim can be used with the main types of IR systems: Web directories and search engines. If the system analysed is a Web directory, USim must send searches and accesses to categories, whereas if the system analysed is a search engine only searches must be sent. The module of accesses to documents is included because some retrieval systems insert an intermediate page between the search results and the final document, which also increases the load on the retrieval system. The rest of the user interface is used to configure the parameters of each type of request. These parameters are very similar, so we only describe the search configuration (see Figure 2). The main search parameters are the number of searches per minute (which is equivalent to the λsearch of the associated Exponential distribution) and the URL of the search system, together with the names of the parameters needed to perform a search: typically the search string, the number of results to obtain and the position of the first result. The simulation tool generates the values associated with these parameters and invokes the search engine passing them, using either a GET or a POST method. The search file is a text file used to provide the empirical distribution of the search strings that will be used by USim, and the output file will store all the information obtained for all the requests of this type. It is important to mention that the value of λsearch is not static: it can change dynamically during the simulation using the parameters "Increase in" and "every minute". This helps in the evaluation of Web IR systems under different workloads using only one simulation run. The configuration of categories and documents is quite similar. In this case, the values of λcats and λdocs are automatically calculated from their linear relationship with λsearch, and the user can directly modify the rest of the parameters.
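As an illustration of how a single search request might be issued and timed, the following sketch uses Java's standard HTTP client; the base URL and the parameter names ("query", "results", "first") are placeholders, since in USim they are taken from the user-supplied configuration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: issuing one search via GET and measuring its response time.
public class SearchProbe {

    private final HttpClient client = HttpClient.newHttpClient();

    /** Sends one search and returns the response time in milliseconds. */
    public long timeSearch(String baseUrl, String queryString,
                           int numResults, int firstResult) throws Exception {
        String url = baseUrl
                + "?query=" + URLEncoder.encode(queryString, StandardCharsets.UTF_8)
                + "&results=" + numResults
                + "&first=" + firstResult;

        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();

        long start = System.currentTimeMillis();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        long elapsed = System.currentTimeMillis() - start;   // response time of the HTML page

        // The HTML in response.body() would then be parsed to extract the number of
        // results, categories, etc., and the record written to the output file.
        return elapsed;
    }
}
```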

5 Performance evaluation

USim can be used for the performance evaluation of any Web IR system. The performance of a retrieval system can be measured in mainly two different ways using this simulation tool: estimating the maximum number of users the system is able to support, and estimating the response times of the system in order to compare different search algorithms.

5.1 Saturation threshold

When a retrieval system is put on the Web, one of its critical measures is the maximum number of requests per minute that it will be able to support.

Figure 3: Response time (ms) vs. searches per minute

Figure 4: System errors vs. searches per minute

It is evident that, starting from a certain threshold, the performance of the system decreases suddenly and the response times increase. This point is named the saturation threshold. Establishing the saturation threshold is fundamental, because once it is known some preventive actions can be taken (such as application management techniques). In this context, USim can easily simulate, in a controlled environment, the effect of many simultaneous users on the retrieval system. For this purpose, an experiment was designed in which the saturation threshold was measured for a prototype Web IR system installed on an Ultra Enterprise 250, with one 300 MHz CPU and 768 MB of main memory. For the estimation of the saturation threshold, a single simulation was prepared, starting with 5 searches per minute (and 4.1 browsed categories per minute and 7.5 viewed documents per minute). The initial value of λsearch is increased by 1 search per minute every 10 minutes (and the equivalent increase is applied to the λcats and λdocs values). The results are shown in Figure 3 and Figure 4. The first graph (see Figure 3) presents the response times of the searches sent to the system throughout the whole simulation. The picture is quite clear: approximately above 21 searches per minute the response times start increasing rapidly. From that point on, every new query requested to the retrieval system increases the load of the system and worsens the situation. This condition only stops when the number of requests per minute decreases. Figure 4 illustrates the number of error pages returned by the system: the retrieval system operates perfectly until the number of searches exceeds 21 requests per minute, confirming that the saturation threshold is 21 searches per minute.
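As a hypothetical post-processing step (not part of USim), the saturation threshold could also be located automatically from the measured mean response times; the factor-of-three criterion and the numeric values below are purely illustrative.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: locating the saturation threshold from measurements already
// extracted from the USim output files, grouped as (searches per minute -> mean
// response time in ms). In the paper the threshold is simply read off Figures 3 and 4.
public class SaturationFinder {

    /** Returns the lowest load whose mean response time exceeds factor * baseline, or -1. */
    public static double saturationThreshold(TreeMap<Double, Double> meanResponseByLoad,
                                             double factor) {
        double baseline = meanResponseByLoad.firstEntry().getValue();
        for (Map.Entry<Double, Double> e : meanResponseByLoad.entrySet()) {
            if (e.getValue() > factor * baseline) {
                return e.getKey();
            }
        }
        return -1; // no saturation observed in the measured range
    }

    public static void main(String[] args) {
        TreeMap<Double, Double> means = new TreeMap<>();
        means.put(5.0, 900.0);     // hypothetical values, for illustration only
        means.put(12.0, 1000.0);
        means.put(19.0, 1400.0);
        means.put(21.0, 3200.0);
        means.put(23.0, 90000.0);
        System.out.println("Estimated saturation threshold: "
                + saturationThreshold(means, 3.0) + " searches/minute");
    }
}
```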

Figure 5: Response time (ms) vs. number of results for models A, B and C (low workload)

Figure 6: Response time (ms) vs. number of results for models A, B and C (high workload)

5.2 Comparison of retrieval systems and response time estimation

One of the main goals of the performance evaluation of any IR system is the measurement of the response times of the search engine, especially for the comparison of different search systems in order to distinguish the real improvements obtained. Web IR systems are subject to different workload levels over time, ranging from periods with a low workload to situations of high workload or even saturation. Obviously, the performance of the system depends on the workload at each moment. Therefore, the performance evaluation must be done considering different workload situations, using USim, in order to obtain a more complete and representative study. Next, we describe a real situation where the response times of three different search algorithms (named models A, B and C) are measured and compared. Each of these search algorithms has been developed and installed on an Ultra Enterprise 250, with one 300 MHz CPU and 768 MB of main memory. The performance evaluation is performed over five different workload situations: null (0 searches/minute), low (5 searches/minute), medium (12 searches/minute), high (19 searches/minute) and saturation (23 searches/minute), together with the corresponding values for browsed categories and visited documents. The performance evaluation consists of simulating a real environment using USim, considering the workloads described. Initially, a stabilization period is established and then the response times of the retrieval system are measured. Therefore, the evaluation is composed of two processes. On one side, USim generates a workload on the retrieval system. On the other side, several queries are sent to the system and their response times are measured (in our case, an application is used to send the requests and receive the responses). Figures 5 and 6 show the results obtained for the comparison of the three algorithms. All the experiments were analysed using an ANOVA test in order to determine whether the factors number of results and type of search engine were relevant (obviously, the number of results is a relevant factor, but it must be included in the analysis as well). Figure 5 represents a low workload situation, and it is clear that models B and C perform much better than model A. In fact, if the query obtains more than 500 results, the response times of models B and C are reduced by 50% with respect to model A. The null and medium workload situations are equivalent to this one. But the situation changes in Figure 6, where a high workload is generated in the retrieval system. In this case, the most relevant aspect is that the performance of model B deteriorates until its behaviour is similar to that of model A. Consequently, in a high workload environment, only model C offers an improvement in the response times (of approximately 50%) over models A and B. The saturation situation is not described here because its ANOVA test shows that the selected factors explain only a small part of the variation in the response times (approximately 50%) and therefore its results are not reliable. This experiment demonstrates the importance of considering the workload in any IR system, and specifically in Web IR systems. Initially, models B and C behaved in a similar way, performing better than model A. But, in the end, only model C is able to keep the improvement in performance under all circumstances, whereas model B loses its advantage in high workload situations.
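The measurement side of this evaluation (the second process, separate from the workload generated by USim) could look roughly like the following sketch; the endpoints and benchmark queries are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

// Illustrative sketch of the measurement application: while USim keeps the system
// at a fixed workload level, a set of benchmark queries is sent to each search
// algorithm under comparison and the mean response time is reported.
public class ResponseTimeBenchmark {

    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    public static void main(String[] args) throws Exception {
        // One base URL per search algorithm under comparison (hypothetical endpoints).
        List<String> models = List.of(
                "http://localhost:8080/modelA/search?query=",
                "http://localhost:8080/modelB/search?query=",
                "http://localhost:8080/modelC/search?query=");
        List<String> benchmarkQueries = List.of("java", "information+retrieval", "web+search");

        for (String model : models) {
            long total = 0;
            for (String query : benchmarkQueries) {
                total += timeRequest(model + query);
            }
            System.out.printf("%s -> mean response time %d ms%n",
                    model, total / benchmarkQueries.size());
        }
    }

    private static long timeRequest(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        long start = System.currentTimeMillis();
        CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        return System.currentTimeMillis() - start;
    }
}
```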

6 Industrial benefits and conclusions

This paper describes USim, a tool for the performance evaluation of Web IR systems based on the simulation of the users' behaviour. This work is based on several previous theoretical analyses of the behaviour of users of search engines, especially the research performed by Jansen et al. [1], Silverstein et al. [2], Kirsch [3] and ourselves [4], [5]. Traditionally, the evaluation of indexing and retrieval techniques is performed in an ideal situation, without workload on the IR system, which can lead to erroneous conclusions, as shown in the previous section. However, Web IR systems operate under different workload levels over time, and the performance of the system depends on the workload at each moment. Therefore, the performance evaluation must be done considering several workload environments. For this purpose, we have designed and developed USim. This simulation tool helps in the performance evaluation of Web IR systems in two different ways: estimating the saturation threshold of the system, and comparing different search algorithms or engines through the measurement of their response times. The latter point is quite interesting from the research point of view, in order to determine the improvements obtained with each search algorithm in all workload situations. The former point is more interesting from the industrial point of view, for the administrators of a Web IR system. The industrial significance of the saturation threshold stems from the fact that it estimates the maximum number of users supported by the system before its performance degrades. The industrial benefits of the saturation threshold are related to the quality of service provided by the IR system, because the response time must be stable and independent of the number of users connected. Moreover, it is essential to determine how close the current operating point is to the saturation threshold, because actions can then be taken (such as application management techniques or a hardware upgrade) in order to avoid a drop in the performance of the system and a loss of the quality of service offered to the users.

7 References

[1] B. Jansen, A. Spink, J. Bateman, T. Saracevic. Real Life Information Retrieval: A Study of User Queries on the Web. SIGIR Forum, Spring 1998.
[2] C. Silverstein, M. Henzinger, H. Marais, M. Moricz. Analysis of a Very Large Web Search Engine Query Log. SIGIR Forum, Fall 1999.
[3] S. Kirsch. Infoseek's experiences searching the Internet. SIGIR Forum, Fall 1998.
[4] F. Cacheda, A. Viña. Understanding how people use search engines: a statistical analysis for e-Business. e-2001: e-Business and e-Work Conference 2001, IOS Press, ISBN 1-58603-205-4, pp. 319-325.
[5] F. Cacheda, A. Viña. Experiences retrieving information in the World Wide Web. 6th IEEE Symposium on Computers and Communications, ISBN 0-7695-1177-5, pp. 72-79, 2001.
[6] D. Hawking, N. Craswell, P. Thistlewaite, D. Harman. Results and challenges in Web search evaluation. The 8th World Wide Web Conference, pp. 243-252, May 1999.
[7] R. Baeza-Yates, B. Ribeiro-Neto. Retrieval Evaluation. In R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval, chapter 3, pp. 73-97. Addison-Wesley, ISBN 0-201-39829-X, 1999.
[8] M. Agosti, M. Melucci. Information Retrieval on the Web. In M. Agosti, F. Crestani, G. Pasi, editors, Lectures on Information Retrieval: Third European Summer-School, ESSIR 2000, Revised Lectures, Springer-Verlag, Berlin/Heidelberg, 2001, pp. 242-285.
[9] S. Lowley. The evaluation of WWW search engines. Journal of Documentation, vol. 56, no. 2, pp. 190-211, 2000.
[10] M. Kobayashi, K. Takeda. Information Retrieval on the Web. ACM Computing Surveys, vol. 32, no. 2, pp. 144-173, June 2000.
[11] J. Zobel, A. Moffat, K. Ramamohanarao. Guidelines for Presentation and Comparison of Indexing Techniques. ACM SIGMOD Record, vol. 25, no. 3, pp. 10-15, September 1996.
[12] G. Zipf. Human behaviour and the principle of least effort. Addison-Wesley, 1949.
