Agenda
Challenges of DWH testing
Planning for DWH tests
Tester skills for DWH testing
Basic ETL verifications
Defects you can expect to find
Testing tools identified
DWH -- Definition
A data warehouse or enterprise data warehouse (DW, DWH, or EDW) is a database used for reporting and data analysis. It is a central repository of data which is created by integrating data from one or more disparate sources. Data warehouses store current as well as historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons.
Source: Wikipedia.org, 2013
Wayne Yaddow, 2013
Data completeness
Data transformations
Data quality
Performance and scalability
Integration testing
User-acceptance testing
Regression testing
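To make the "data transformations" category concrete, the sketch below recomputes each target value from its source row using a documented mapping rule and reports any mismatch, plus any source key that never reached the target (a completeness defect). The mapping rule, the column names (`customer_id`, `full_name`, `country_code`) and the helper names are hypothetical, chosen only for illustration; in practice the rule would come from the source-to-target mapping document.

```python
# Hypothetical mapping rule (an assumption for illustration only):
#   target.full_name    = TRIM(source.first_name) || ' ' || TRIM(source.last_name)
#   target.country_code = UPPER(source.country)

def expected_target_row(src: dict) -> dict:
    """Apply the hypothetical transformation rule to one source row."""
    return {
        "customer_id": src["customer_id"],
        "full_name": f"{src['first_name'].strip()} {src['last_name'].strip()}",
        "country_code": src["country"].upper(),
    }

def verify_transformations(source_rows, target_rows):
    """Return (customer_id, column, expected, actual) tuples for every defect.

    Rows are matched on the business key customer_id; a source key that is
    absent from the target is reported as a missing row (completeness defect).
    """
    target_by_key = {row["customer_id"]: row for row in target_rows}
    defects = []
    for src in source_rows:
        expected = expected_target_row(src)
        actual = target_by_key.get(src["customer_id"])
        if actual is None:
            defects.append((src["customer_id"], "<row>", "present", "missing"))
            continue
        for column, value in expected.items():
            if actual.get(column) != value:
                defects.append((src["customer_id"], column, value, actual.get(column)))
    return defects

source = [
    {"customer_id": 1, "first_name": " Ada ", "last_name": "Lovelace", "country": "gb"},
    {"customer_id": 2, "first_name": "Alan", "last_name": "Turing", "country": "gb"},
    {"customer_id": 3, "first_name": "Grace", "last_name": "Hopper", "country": "us"},
]
target = [
    {"customer_id": 1, "full_name": "Ada Lovelace", "country_code": "GB"},
    {"customer_id": 2, "full_name": "Alan Turing", "country_code": "gb"},  # UPPER not applied
]

for defect in verify_transformations(source, target):
    print(defect)
```

Here customer 2 is flagged because the upper-casing rule was not applied, and customer 3 is flagged as a row lost during the load.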
Valuable Books
Data Profiling
Column / attribute / field profiling provides statistical measurements associated with:
frequency distribution of data values
number of records
number of null or blank values
data types (e.g., integers, characters)
field length
unique values
patterns in the data
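The profiling measurements listed above can be computed for a single column with a few lines of standard-library Python. This is a minimal sketch, not the behavior of any particular profiling tool; it assumes that `None` and all-whitespace strings both count as null/blank, and it summarizes "patterns" by mapping digits to `9` and letters to `A`, one common convention among several. The function names and the sample ZIP-code column are hypothetical.

```python
import re
from collections import Counter

def value_pattern(value: str) -> str:
    """Reduce a value to its shape: digits -> 9, letters -> A, other characters kept."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"[0-9]", "9", value))

def profile_column(values):
    """Compute basic column-profile statistics for one column of raw values.

    Assumption: None or an all-whitespace string counts as null/blank.
    """
    non_null = [v for v in values if v is not None and str(v).strip() != ""]
    as_text = [str(v) for v in non_null]
    lengths = [len(s) for s in as_text]
    return {
        "record_count": len(values),
        "null_count": len(values) - len(non_null),
        "data_types": Counter(type(v).__name__ for v in non_null),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "unique_values": len(set(as_text)),
        "frequency": Counter(as_text).most_common(5),
        "patterns": Counter(value_pattern(s) for s in as_text).most_common(5),
    }

# A hypothetical ZIP-code column with a blank, a null, a letter typo and an int.
zip_codes = ["02134", "02134", "10001", None, " ", "1000A", 94105]
for key, stat in profile_column(zip_codes).items():
    print(f"{key}: {stat}")
```

The pattern summary is what makes the malformed `1000A` stand out (`9999A` versus `99999`), and the mixed `str`/`int` type count hints at an inconsistent source feed.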
Valuable Book
Testing Automation
Informatica's Data Validation Option (DVO)
RTTS QuerySurge
Analytics Tools
J: statistics, visualization, data manipulation
Perl: data manipulation, scripting
R: statistics
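Source-to-target comparison tools such as those listed above automate, among other things, running one query against the source and one against the target and comparing the result sets. The following is a minimal hand-rolled sketch of that general idea using Python's built-in `sqlite3` module; it is not the API or method of DVO or QuerySurge. The table names `src_orders` and `dw_fact_orders` are hypothetical, and the row-level diff uses set semantics, which assumes keyed rows without legitimate duplicates.

```python
import sqlite3

def reconcile(conn, source_sql: str, target_sql: str) -> dict:
    """Compare a source query's result set with a target query's result set.

    Row counts include duplicates; the row-level diff uses set semantics.
    """
    source_rows = conn.execute(source_sql).fetchall()
    target_rows = conn.execute(target_sql).fetchall()
    src, tgt = set(source_rows), set(target_rows)
    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        "missing_in_target": sorted(src - tgt),
        "unexpected_in_target": sorted(tgt - src),
    }

# Hypothetical source and warehouse tables in one in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, amount REAL);
    CREATE TABLE dw_fact_orders (order_id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 10.0), (2, 25.5), (3, 7.25);
    INSERT INTO dw_fact_orders VALUES (1, 10.0), (2, 25.0);
""")
result = reconcile(
    conn,
    "SELECT order_id, amount FROM src_orders",
    "SELECT order_id, amount FROM dw_fact_orders",
)
print(result)
```

Order 3 is missing from the target and order 2 arrived with a changed amount, so it appears on both sides of the diff. In a real warehouse the two queries would run against different connections; the comparison logic stays the same.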
31
32
34
1. Need analysis of (a) source data quality and (b) data field profiles before input to Informatica and other data-build services.
2. QA should participate in all data model and data mapping reviews.
3. Need a complete review of ETL error logs, and resolution of errors by ETL teams, before DB turnover to QA.
4. Early use of QC during ETL and stored procedure testing to target vulnerable process areas.
5. Substantially improved documentation of PL/SQL stored procedures.
6. QA needs a dev or separate environment for early data testing. QA should be able to modify data in order to perform negative tests. (QA currently does only positive tests because the application and database tests run in parallel in the same environment.)
7. Need substantially enhanced verification of target tables after each ETL load, before data turnover to QA.
8. Need mandatory maintenance of data models and source-to-target mapping / transformation rules documents from elaboration until transition.
9. Investments in more Informatica and off-the-shelf data quality analysis tools for pre- and post-ETL analysis.
10. Investments in automated DB regression test tools and training to support frequent data loads.
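Item 7 (verification of target tables after each ETL load) can start with a handful of generic checks: expected row count, duplicate business keys, and nulls in mandatory columns. The sketch below runs them with Python's `sqlite3` module against a hypothetical `dim_customer` table; the function name, table, columns and expected count are all assumptions for illustration. Table and column names are interpolated into the SQL, so they must come from trusted test configuration, never from user input.

```python
import sqlite3

def post_load_checks(conn, table: str, key: str, mandatory: list, expected_rows: int) -> list:
    """Run basic checks on a target table after an ETL load; return failure messages.

    `table`, `key` and `mandatory` are trusted identifiers from test configuration
    (they are interpolated into SQL text).
    """
    failures = []
    (row_count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    if row_count != expected_rows:
        failures.append(f"row count {row_count} != expected {expected_rows}")
    dup_keys = conn.execute(
        f"SELECT {key}, COUNT(*) FROM {table} GROUP BY {key} HAVING COUNT(*) > 1"
    ).fetchall()
    for value, count in dup_keys:
        failures.append(f"duplicate {key}={value} ({count} rows)")
    for column in mandatory:
        (nulls,) = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
        ).fetchone()
        if nulls:
            failures.append(f"{nulls} null value(s) in mandatory column {column}")
    return failures

# Hypothetical target table after a faulty load: one duplicate key, one null country.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER, name TEXT, country TEXT);
    INSERT INTO dim_customer VALUES (1, 'Ada', 'GB'), (2, 'Alan', NULL), (2, 'Alan', 'GB');
""")
for failure in post_load_checks(conn, "dim_customer", "customer_id", ["name", "country"], expected_rows=2):
    print(failure)
```

Because the checks are plain functions driven by configuration, the same set can be rerun unchanged after every load, which is the kind of repeatable regression check item 10 asks for.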