COMPANY NAME - PROJECT NAME

DATA STAGING CHECKLIST C16-1

Data Staging Checklist


Preliminary Work
- Set up a header format and comment fields for your code
- Hold structured design reviews early enough to allow changes
- Write clean, well-commented code
- Enforce naming standards
- Use the code library and management system
- Test everything: both unit testing and system testing
- Document everything, ideally in the information catalog
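Naming standards are easiest to enforce when they are checked automatically. The following is a minimal sketch that assumes a hypothetical convention: a `dim_`, `fact_`, `agg_`, or `stg_` role prefix followed by lowercase letters, digits, and underscores. Substitute the project's own standard.

```python
import re

# Hypothetical naming standard (an assumption, not part of the checklist):
# a role prefix, then lowercase letters, digits, and underscores.
TABLE_NAME_PATTERN = re.compile(r"(dim|fact|agg|stg)_[a-z][a-z0-9_]*")


def naming_violations(table_names):
    """Return the table names that do not match the naming standard."""
    return [name for name in table_names if not TABLE_NAME_PATTERN.fullmatch(name)]


print(naming_violations(["dim_customer", "fact_sales", "CustomerDim", "tmp1"]))
# ['CustomerDim', 'tmp1']
```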

Step 1. High-Level Plan


- Create a very high-level, one-page schematic of the source-to-target flow
  - Identify starting and ending points
  - Label known data sources
  - Include placeholders for sources yet to be determined
  - Label targets
  - Include notes about known gotchas

Step 2. Data Staging Tools


- Test, choose, and implement a data staging tool

Step 3. Detailed Plan


- Drill down by target table, graphically sketching any complex data restructuring or transformations
- Graphically illustrate the surrogate-key generation process
- Develop a preliminary job sequencing
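A preliminary job sequence is a topological ordering of the job dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The job names and dependencies are hypothetical and loosely mirror the typical job schedule later in this checklist.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each maps to the set of jobs that must finish first.
JOB_DEPENDENCIES = {
    "extract_dimensions": set(),
    "extract_facts": set(),
    "process_dimensions": {"extract_dimensions"},
    # Fact processing needs finished dimensions for its surrogate key lookups.
    "process_facts": {"extract_facts", "process_dimensions"},
    "process_aggregates": {"process_facts"},
    "load_dimensions": {"process_dimensions"},
    "load_facts": {"process_facts", "load_dimensions"},
    "load_aggregates": {"process_aggregates", "load_facts"},
}


def job_sequence(dependencies):
    """Return one valid run order; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


print(job_sequence(JOB_DEPENDENCIES))
```

Any order that `static_order()` returns respects every dependency. Jobs that become ready at the same time are candidates for parallel execution, and the module's `prepare()`/`get_ready()`/`done()` interface exposes them as they become ready.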

Step 4. Populate a Simple Dimension Table


- Static dimension extract
  - Creating and moving the result set
  - Data compression
  - Data encryption
- Static dimension transformation
  - Simple data transformations
  - Surrogate key assignment
  - Combining from separate sources
  - Validating one-to-one and one-to-many relationships
- Load
  - Bulk loader
    - Turn off logging
    - Pre-sort the file
    - Transform with caution
    - Aggregations
    - Use the bulk loader to perform within-database inserts
    - Truncate target table before full refresh
  - Index management
    - Drop and re-index
    - Keep indexes in place
- Maintaining dimension tables
  - Warehouse-based maintenance
  - Source-system-based maintenance

Last saved on 3/26/99

The Data Warehouse Lifecycle Toolkit
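Surrogate key assignment for a simple dimension comes down to two things: a persistent map from the source system's natural key to an integer key, and the next unused integer for each member not seen before. The following is a minimal sketch. It assumes rows arrive as dicts and that `key_map` is reloaded from the warehouse between runs. The column names are hypothetical. The sketch also rejects an extract that repeats a natural key, which is one simple form of the one-to-one validation listed above.

```python
from collections import Counter
from itertools import count


def assign_surrogate_keys(rows, natural_key, key_map, next_key, surrogate_key="surrogate_key"):
    """Attach a surrogate key to each dimension row.

    key_map  -- natural key -> surrogate key, persisted between runs (assumed)
    next_key -- iterator of unused integer keys, e.g. itertools.count(1)
    """
    # A static dimension extract should carry each natural key once; a repeat
    # usually means a one-to-many join slipped into the extract query.
    repeats = [k for k, n in Counter(r[natural_key] for r in rows).items() if n > 1]
    if repeats:
        raise ValueError(f"natural keys repeated in extract: {repeats}")

    keyed = []
    for row in rows:
        nk = row[natural_key]
        if nk not in key_map:
            key_map[nk] = next(next_key)  # first time seen: issue the next key
        keyed.append({surrogate_key: key_map[nk], **row})
    return keyed


key_map, keys = {}, count(1)
assign_surrogate_keys([{"cust_id": "C7"}, {"cust_id": "C9"}], "cust_id", key_map, keys)
later = assign_surrogate_keys([{"cust_id": "C9"}, {"cust_id": "C3"}], "cust_id", key_map, keys)
print([r["surrogate_key"] for r in later])  # [2, 3]
```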

Step 5. Implement Dimension Change Logic


- Use surrogate keys
- Dimension table extracts
  - Copy entire current master file
  - Pull only changed rows (source system change flag)
- Processing slowly changing dimensions
  - Type 1: Overwrite
  - Type 2: Create a new dimension record
  - Type 3: Push down the changed value into an old attribute field
- Dimension table transformation and load
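The three slowly changing dimension types differ only in what happens to the existing row. The following is a sketch over an in-memory list of dimension rows. The column names (`customer_id`, `city`, `effective_date`, `expiry_date`, `is_current`, `prior_<attr>`) are hypothetical. A production version would issue the equivalent SQL updates and inserts.

```python
from datetime import date
from itertools import count


def current_row(dim, natural_key, nk):
    """Find the current row for a natural key; raises StopIteration if none exists."""
    return next(r for r in dim if r[natural_key] == nk and r.get("is_current", True))


def scd_type1(dim, natural_key, nk, attr, new_value):
    """Type 1: overwrite in place; the old value is lost."""
    current_row(dim, natural_key, nk)[attr] = new_value


def scd_type2(dim, natural_key, nk, attr, new_value, change_date, next_key):
    """Type 2: expire the current row and append a new row under a new surrogate key."""
    old = current_row(dim, natural_key, nk)
    old["is_current"] = False
    old["expiry_date"] = change_date
    new = {**old, attr: new_value, "surrogate_key": next(next_key),
           "effective_date": change_date, "expiry_date": None, "is_current": True}
    dim.append(new)
    return new["surrogate_key"]


def scd_type3(dim, natural_key, nk, attr, new_value):
    """Type 3: push the old value down into a prior_<attr> field, then overwrite."""
    row = current_row(dim, natural_key, nk)
    row["prior_" + attr] = row[attr]
    row[attr] = new_value


customer_dim = [{"surrogate_key": 1, "customer_id": "C7", "city": "Boston",
                 "effective_date": date(1998, 1, 1), "expiry_date": None, "is_current": True}]
scd_type2(customer_dim, "customer_id", "C7", "city", "Denver", date(1999, 3, 1), count(2))
print([(r["surrogate_key"], r["city"], r["is_current"]) for r in customer_dim])
# [(1, 'Boston', False), (2, 'Denver', True)]
```

In this sketch, an expired Type 2 row's `expiry_date` equals the new row's `effective_date`, forming a half-open interval. Some teams store the day before instead. Whichever convention is chosen, the fact load has to use the same one when it looks up keys by transaction date.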

Step 6. Populate Remaining Dimensions


- Repeat Steps 4 and 5 for each remaining dimension

Step 7. Historical Load of Atomic-Level Facts


- Historic fact table extracts
  - Capture audit statistics
- Fact table processing
  - Fact table surrogate key lookup
  - Ensure proper handling of nulls
- Improving fact table content
- Data restructuring
- Data mining transformations
  - Flag normal, abnormal, out-of-bounds, or impossible facts
  - Recognize random or noise values from context and mask them out
  - Apply a uniform treatment to null values
  - Flag fact records with changed status
  - Classify an individual record by one of its aggregates
  - Divide data into training, test, and evaluation sets
  - Add computed fields as inputs or targets
  - Map continuous values into ranges
  - Normalize values between 0 and 1
  - Convert from textual to numeric or numeral category
  - Emphasize the unusual case abnormally to drive recognition
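Several of the data mining transformations are short functions. The following sketch covers three of them: normalizing to [0, 1], mapping continuous values into ranges, and dividing records into training, test, and evaluation sets. The cut points, labels, split fractions, and seed are illustrative assumptions.

```python
import bisect
import random


def normalize_0_1(values):
    """Min-max scale into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def map_to_ranges(values, cut_points, labels):
    """Label each value by range; cut_points are sorted and len(labels) == len(cut_points) + 1.
    A value equal to a cut point falls into the higher range."""
    return [labels[bisect.bisect_right(cut_points, v)] for v in values]


def split_train_test_eval(records, fractions=(0.6, 0.2), seed=42):
    """Shuffle reproducibly, then cut into training, test, and evaluation sets;
    the evaluation set gets whatever the first two fractions leave."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * fractions[0])
    n_test = round(len(shuffled) * fractions[1])
    return shuffled[:n_train], shuffled[n_train:n_train + n_test], shuffled[n_train + n_test:]


print(normalize_0_1([10, 20, 30]))  # [0.0, 0.5, 1.0]
print(map_to_ranges([50, 100, 2500], [100, 1000], ["low", "medium", "high"]))
# ['low', 'medium', 'high']
```

Fixing the seed makes the split reproducible from one load to the next. Without a fixed seed, records would move between the training and evaluation sets on every run.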

Step 8. Incremental Fact Table Staging


- Incremental fact table extracts
  - New transactions
  - Updated transactions
  - Database logs
  - Replication
- Incremental fact table load
- Speeding up the load cycle
  - More frequent loading
  - Partitioned files and indexes
  - Parallel processing
    - Multiple load steps
    - Parallel execution
    - Parallel databases
    - Parallel tables
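The checklist names database logs and replication as ways to capture new and updated transactions. Where the source instead carries reliable timestamps, a high-water-mark extract is a common alternative (it is not one the checklist names). The following sketch assumes hypothetical `created_at` and `updated_at` columns, with `updated_at >= created_at` on every row.

```python
from datetime import datetime


def incremental_extract(source_rows, watermark):
    """Split rows changed since `watermark` into new and updated transactions.

    Returns (new_rows, updated_rows, next_watermark). Assumes hypothetical
    created_at/updated_at columns with updated_at >= created_at.
    """
    new_rows, updated_rows = [], []
    for row in source_rows:
        if row["created_at"] > watermark:
            new_rows.append(row)          # created since the last extract
        elif row["updated_at"] > watermark:
            updated_rows.append(row)      # existed before, changed since
    changed = new_rows + updated_rows
    # Advance the mark to the latest change seen; keep it if nothing changed.
    next_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return new_rows, updated_rows, next_watermark


rows = [{"id": 1, "created_at": datetime(1999, 2, 1), "updated_at": datetime(1999, 2, 1)},
        {"id": 2, "created_at": datetime(1999, 2, 5), "updated_at": datetime(1999, 3, 2)},
        {"id": 3, "created_at": datetime(1999, 3, 3), "updated_at": datetime(1999, 3, 3)}]
new, updated, mark = incremental_extract(rows, datetime(1999, 3, 1))
print([r["id"] for r in new], [r["id"] for r in updated], mark)
# [3] [2] 1999-03-03 00:00:00
```

A row stamped with an earlier time but committed after the extract runs will be missed. For this reason, timestamp extracts are usually run with some overlap or paired with a reconciliation check.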

Step 9. Aggregate Table and MOLAP Loads


- Build aggregates
- Maintain aggregates
- Prepare MOLAP loads
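For additive measures, an aggregate can be built once and then maintained by folding in each incremental batch, rather than being rebuilt. The following is a sketch with hypothetical column names.

```python
def maintain_aggregate(agg, new_facts, group_by, measure):
    """Fold a batch of new facts into an additive aggregate keyed by the group_by columns."""
    for fact in new_facts:
        key = tuple(fact[col] for col in group_by)
        agg[key] = agg.get(key, 0) + fact[measure]
    return agg


def build_aggregate(facts, group_by, measure):
    """Build an aggregate from scratch: the same fold, starting from empty."""
    return maintain_aggregate({}, facts, group_by, measure)


monthly = build_aggregate(
    [{"month": "1999-01", "store": 1, "sales": 10},
     {"month": "1999-01", "store": 2, "sales": 7}],
    group_by=("month",), measure="sales")
maintain_aggregate(monthly, [{"month": "1999-02", "store": 1, "sales": 3}], ("month",), "sales")
print(monthly)  # {('1999-01',): 17, ('1999-02',): 3}
```

This shortcut is valid only for insert-only batches of additive measures. Updated or deleted facts, and measures such as averages or distinct counts, require recomputing the affected aggregate rows.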

Step 10. Warehouse Operation and Automation


- Typical operational functions
  - Job definition: flow and dependency
  - Job scheduling: time- and event-based
  - Monitoring
  - Logging
  - Exception handling
  - Error handling
  - Notification
- Determine job control approach
- Record extract metadata
- Record operations metadata
- Ensure data quality
- Set up archiving in the data staging area
- Develop disk space management procedures
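Logging, exception handling, notification, and operations metadata can all be handled by one wrapper around every staging step. The following is a minimal sketch. Three pieces are assumptions: the `step` contract (a callable that returns a row count), the `notify` callback, and the list that stands in for an operations-metadata table.

```python
import logging
import time
from datetime import datetime, timezone

logger = logging.getLogger("staging")


def run_job(name, step, notify, metadata_log):
    """Run one staging step, record operations metadata, and notify on failure.

    step         -- zero-argument callable returning rows processed (assumed contract)
    notify       -- callable(str) for operator notification (assumed)
    metadata_log -- list standing in for an operations-metadata table
    """
    record = {"job": name, "started": datetime.now(timezone.utc), "status": "running"}
    t0 = time.monotonic()
    try:
        record["rows"] = step()
        record["status"] = "succeeded"
    except Exception as exc:
        record["status"] = "failed"
        record["error"] = repr(exc)
        logger.exception("job %s failed", name)   # full traceback to the job log
        notify(f"Staging job {name} failed: {exc}")
    finally:
        record["elapsed_s"] = time.monotonic() - t0
        metadata_log.append(record)
    return record["status"] == "succeeded"


ops_metadata, alerts = [], []
run_job("load_facts", lambda: 1200, alerts.append, ops_metadata)
print(ops_metadata[-1]["status"], ops_metadata[-1]["rows"])  # succeeded 1200
```

The wrapper returns a status instead of re-raising. This leaves the scheduler to decide whether dependent jobs should still run.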

Typical Job Schedule


- Extract dimensions; write out metadata
- Extract facts; write out metadata
- Process dimensions
  - Surrogate key/slowly changing processing/key lookup, etc.
  - Data quality checks; write out metadata
- Process facts
  - Surrogate key lookup; referential integrity (RI) check; write out failed records
  - Data transformations
- Process aggregates
- Load dimensions into base-level warehouse (dimensions first if RI is enforced)
- Load facts
- Load aggregates
- Review load process; validate load against metadata
- Change pointers or switch instance for high uptime (24 x 7), or parallel load warehouses
- Extract and load (or notify) downstream data marts (and other systems)
- Change metadata as needed (e.g., Period table attributes: current month)
- Write job metadata
- Review job logs; verify successful load cycle
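The fact-processing step pairs the surrogate key lookup with a referential integrity check, and writes out records that fail it. The following sketch assumes each dimension's current natural-to-surrogate key map is already in memory. The column names are hypothetical.

```python
def lookup_fact_keys(fact_rows, key_maps):
    """Swap natural keys for surrogate keys; route rows that fail the RI check aside.

    key_maps: {natural_key_column: (surrogate_key_column, {natural_key: surrogate_key})}
    Returns (rows_to_load, failed_rows).
    """
    to_load, failed = [], []
    for row in fact_rows:
        out, missing = dict(row), []
        for nk_col, (sk_col, mapping) in key_maps.items():
            if row[nk_col] in mapping:
                out[sk_col] = mapping[row[nk_col]]
                del out[nk_col]
            else:
                missing.append(nk_col)  # no matching dimension row: RI failure
        if missing:
            failed.append({**row, "failed_keys": missing})
        else:
            to_load.append(out)
    return to_load, failed


key_maps = {"cust_id": ("customer_key", {"C7": 1, "C9": 2}),
            "prod_code": ("product_key", {"P1": 10})}
facts = [{"cust_id": "C7", "prod_code": "P1", "amount": 5},
         {"cust_id": "C4", "prod_code": "P1", "amount": 3}]
ok, rejected = lookup_fact_keys(facts, key_maps)
print(ok)        # [{'amount': 5, 'customer_key': 1, 'product_key': 10}]
print(rejected)  # [{'cust_id': 'C4', 'prod_code': 'P1', 'amount': 3, 'failed_keys': ['cust_id']}]
```

This sketch looks up only the current key. For a historical load against a Type 2 dimension, the lookup needs the key that was in effect on each transaction's date.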
