Neveen ElGamal
Faculty of Computers and Information,
Cairo University,
Giza, Egypt
+201002585680
n.elgamal@fci-cu.edu.eg
Supervised by:
Ali ElBastawissy
Galal Galal-Edeen
alibasta@fci-cu.edu.eg
Galal@acm.org
Thesis State: Middle
ABSTRACT
During the development of the data warehouse (DW), large volumes of data are transformed, integrated, structured, cleansed, and grouped into a single structure, the DW. These various kinds of changes can corrupt or unintentionally alter the data. Therefore, DW testing is a critical stage of the DW development process.
A number of attempts have been made to describe how the testing process should take place in the DW environment. In this paper, I briefly describe these testing approaches and then use a proposed matrix to evaluate and compare them. Afterwards, I highlight the weaknesses of the available DW testing approaches. Finally, I describe how my PhD will fill the gap in DW testing by developing a DW testing framework, and briefly present its architecture. I then state the scope of work I plan to address and the limitations I expect to face in this area. I close by summarizing my work and stating possible future work in the field of DW testing.
1. INTRODUCTION

Figure 1. DW System Architecture
2. RELATED WORK

2.1 DW Testing Approaches
1. Unit Testing,
2. Integration Testing,
3. System Testing,
4.
5. Security Testing,
6. Regression Testing,
7. Performance Testing.
2.2 DW Testing Matrices

The DW testing matrices classify the test routines along two dimensions: what is tested (Schema, Data, Operation) and where in the DW architecture the test applies, covering the Backend stages (DS→ODS, ODS→DW, DW→DM) and the Frontend stage (DM→UI).
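To make the two dimensions concrete, the following sketch models them as Python enumerations; the type names and value spellings are my own illustrative choices, not part of the matrices themselves.

    # A minimal sketch (not from the paper) of the two matrix dimensions.
    from enum import Enum

    class What(Enum):          # what is being tested
        SCHEMA = "schema"
        DATA = "data"
        OPERATION = "operation"

    class Where(Enum):         # where in the DW architecture the test applies
        DS_ODS = "DS->ODS"     # backend: data sources to ODS
        ODS_DW = "ODS->DW"     # backend: ODS to DW
        DW_DM = "DW->DM"       # backend: DW to data marts
        DM_UI = "DM->UI"       # frontend: data marts to user interface

    BACKEND = {Where.DS_ODS, Where.ODS_DW, Where.DW_DM}
    FRONTEND = {Where.DM_UI}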
2.3 Comparison and Evaluation of Surveyed Approaches
After studying how each proposed approach addresses DW testing, and according to the DW testing matrices defined in the previous section, a comparison matrix is presented in table 3 showing the test routines that each approach covers. The DW testing approaches are represented on the columns, while the what and where dimensions classify the test routines on the rows. The intersection of a row and a column indicates whether the approach fully or partially covers that test routine. Finally, the when dimension, which indicates whether a test takes place before or after system delivery, is represented by color: tests that take place after system delivery are highlighted, while tests that take place during system development, or when the system is subject to change, are left unhighlighted.
We were able to compare only 10 approaches, as not enough
data was available for the rest of the approaches.
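As a hedged illustration of how such a comparison matrix could be represented programmatically, the sketch below records a coverage level per (routine, approach) cell together with the when dimension; the approach ids, routine names, and cell values are placeholders, not the paper's actual results.

    # Illustrative representation of the section 2.3 comparison matrix.
    from enum import Enum

    class Coverage(Enum):
        NONE = 0
        PARTIAL = 1
        FULL = 2

    # (routine, runs_after_delivery) -> coverage per surveyed approach
    matrix = {
        ("record counts", False): {"[1]": Coverage.FULL, "[8]": Coverage.PARTIAL},
        ("regression testing", True): {"[1]": Coverage.NONE, "[8]": Coverage.FULL},
    }

    def covers(approach: str, routine: str, after_delivery: bool) -> Coverage:
        return matrix.get((routine, after_delivery), {}).get(approach, Coverage.NONE)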
As table 3 shows, none of the proposed approaches addresses the entire set of DW testing matrices. This is simply because each approach addresses the DW testing process from its own point of view, without leaning on any standard or general framework. Some attempts consider only parts of the DW framework shown in figure 1; others use their own DW architecture tailored to the case they address. For example, [8] uses a DW architecture that includes neither an ODS nor a DW layer: data is loaded directly from the data sources to the data marts, so the data mart layer acts as both the DW and the data marts interchangeably. Other approaches, like [1, 14, 15], do not include the ODS layer.
From another perspective, some test routines are not addressed by any approach, such as data quality factors like accuracy, precision, and continuity. Likewise, some major components of the DW are not tested by any of the proposed approaches, namely the DM schema and the additivity of measures in the DMs.
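To illustrate the kind of check an additivity test routine might perform, here is a minimal sketch assuming pandas data frames; the function, the column conventions, and the idea of comparing a detail-level roll-up against a stored aggregate are my own illustrative assumptions.

    # Sketch of an additivity guard: summing a measure at the detail level
    # should reproduce the precomputed aggregate table.
    import pandas as pd

    def check_additivity(fact: pd.DataFrame, aggregate: pd.DataFrame,
                         keys: list, measure: str) -> bool:
        """True if summing `measure` in `fact` over `keys` matches `aggregate`."""
        rolled = fact.groupby(keys, as_index=False)[measure].sum()
        merged = rolled.merge(aggregate, on=keys, suffixes=("_fact", "_agg"))
        return bool((merged[f"{measure}_fact"] == merged[f"{measure}_agg"]).all())

A measure that fails such a check (for example, a stock level summed over time) is non-additive and needs a guard in the DM rather than a plain SUM.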
3. REQUIREMENTS FOR A DW TESTING FRAMEWORK
Based on a careful study of the available DW testing approaches, using the DW testing matrices presented previously to analyze, compare, and evaluate them, it is evident that the DW environment lacks the following:
1. A generic, well-defined DW testing approach that can be used in any project. Each existing approach presents its testing techniques based on its own DW architecture, which limits the reusability of the approach in DW projects with different DW architectures.
2. Coverage of all the test routines needed to guarantee a high-quality DW after delivery; none of the existing approaches includes them all.
Table 3: Coverage of the test routines by the surveyed approaches [1], [2], [3], [4], [8], [13], [14], [15], [18], [23], with the schema and data test routines grouped by stage:

DS→ODS (Backend): requirement testing; user requirements coverage; ODS logical model; field mapping; data type constraints; aspects of transformation rules; correct data selection; integrity constraints; parent-child relationship; record counts; duplicate detection; threshold test; data boundaries; data profiling; random record comparison; surrogate keys.

ODS→DW and DW→DM (Backend): DW conceptual schema; DW logical model; integrity constraints; threshold test; data type constraints; hierarchy level integrity; granularity; derived attributes checking; record counts; no constants loaded; null records; field-to-field test; data relationships; data transformation; duplicate detection; value totals; data boundaries; quality factors; comparison of transformed data with the expected transformation; data aggregation; reversibility of data from the DW to the DS; confirmation that all fields are loaded; simulated data loading.

DM→UI (Frontend): DM schema design; calculated members; irregular hierarchies; aggregations; correct data filters; additivity guards.
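To illustrate how two of the backend data test routines above, record counts and duplicate detection, might be expressed, here is a hedged sketch using SQL strings; the schema, table, and column names are assumptions, not from any surveyed approach.

    # Two backend verification checks for one ETL stage, as SQL strings.
    RECORD_COUNT_CHECK = """
    SELECT (SELECT COUNT(*) FROM ods.customers) -
           (SELECT COUNT(*) FROM dw.dim_customer) AS row_count_diff;  -- expect 0
    """

    DUPLICATE_DETECTION_CHECK = """
    SELECT customer_bk, COUNT(*) AS n        -- business key should be unique
    FROM dw.dim_customer
    GROUP BY customer_bk
    HAVING COUNT(*) > 1;                     -- expect an empty result set
    """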
4. A PROPOSED DW TESTING FRAMEWORK
In my PhD, I plan to develop a DW testing framework that is generic enough to be used across DW projects with different DW architectures. The framework's primary goal is to guide testers through the testing process by recommending the group of test routines required for the project's customized DW architecture. The proposed framework will include definitions of the test routines. The main target of our research is to benefit from the existing DW testing approaches by adopting their test routine definitions, and to define the test routines that no previous approach addressed or defined comprehensively.
In our study, we will prioritize the test routines according to their importance and impact on the output product, so that under schedule or budget limitations the tester can select the tests that most affect the quality of the delivered system.
Part of each test routine definition should state how the routine can be automated, or given automated support where full automation is not possible.
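A minimal sketch of the prioritization idea, assuming each routine carries an impact score and an effort estimate (both hypothetical), could look as follows.

    # Pick the highest-impact routines that fit an assumed effort budget.
    def select_tests(routines, budget):
        """routines: list of (name, impact, effort); returns names to run."""
        chosen, spent = [], 0
        for name, impact, effort in sorted(routines, key=lambda r: -r[1]):
            if spent + effort <= budget:
                chosen.append(name)
                spent += effort
        return chosen

    plan = select_tests([("record counts", 9, 1), ("data profiling", 6, 3),
                         ("threshold test", 4, 2)], budget=4)
    # -> ['record counts', 'data profiling']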
5. THE ARCHITECTURE OF THE PROPOSED FRAMEWORK
This section presents the architecture of the proposed framework and the workflow it follows when put into operation. As shown in figure 2, the key player in the DW testing process is the Test Manager, who feeds the system with the DW architecture under test and the current state of the DW, i.e., which components of the DW have been developed so far. This step is needed because the proposed framework supports testing throughout system development.
The DW Architecture Analyzer component then studies the received data and, with the assistance of the Test Dependency Manager component, compares it against the dependencies between test routines recorded in the Test Dependency Graph. The result is passed to the Test Recommender, which generates an Abstract Test Plan.
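The following sketch illustrates, under assumed component and routine names, how the recommendation step could filter test routines by the components developed so far and by their mutual dependencies; it is an interpretation of the workflow, not the framework's actual implementation.

    # Recommend routines whose required components exist and whose
    # prerequisite routines are themselves recommendable.
    def recommend(developed, requires, depends_on):
        runnable = {t for t, comps in requires.items() if comps <= developed}
        return {t for t in runnable if depends_on.get(t, set()) <= runnable}

    plan = recommend(
        developed={"ODS"},
        requires={"field mapping": {"ODS"}, "record counts": {"ODS"},
                  "value totals": {"DW"}},
        depends_on={"record counts": {"field mapping"}},
    )
    # -> {'field mapping', 'record counts'}; 'value totals' waits for the DW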
Preparation of the Detailed Test Plan then splits into two directions according to the type of test routine: validation or verification. For validation test routines, which are the more complicated ones, the Validation Manager involves the business expert(s) and system user(s), and accesses the relevant data from the system Repository, to prepare the part of the test plan that concerns the validation test routines. For the verification test routines, the Verification Manager, along with the Test Case Generator and Test Data Generator modules, helps prepare the corresponding part of the Detailed Test Plan.
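A minimal sketch of this dispatch, with stand-in functions for the Validation Manager and Verification Manager, is shown below; all names and the shape of the plan records are assumptions made for illustration.

    # Dispatch abstract-plan entries to the validation or verification path.
    def build_detailed_plan(abstract_plan):
        detailed = []
        for routine in abstract_plan:
            if routine["kind"] == "validation":
                # involves business experts / system users plus the repository
                detailed.append(plan_validation(routine))
            else:
                # verification: generate test cases and test data automatically
                detailed.append(plan_verification(routine))
        return detailed

    def plan_validation(routine):
        return {"routine": routine["name"],
                "reviewers": ["business expert", "system user"]}

    def plan_verification(routine):
        return {"routine": routine["name"],
                "cases": [f"{routine['name']} case {i}" for i in range(1, 3)]}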
6. CONCLUSIONS AND FUTURE WORK

7. REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]