Documente Academic
Documente Profesional
Documente Cultură
Agenda
4
4
Frequent Requirements
&
Data AnalysisParsing AddressMatching Monitoring
De-duplication &
& Discovery and Validation
Standardization
Reporting
6
6
Data
Steward
8
8
Brand
Description
90017
iPod
Product_ID
Brand
Size
Color
Description
90017
IPOD
4GB
Red
OrgName
CountryCode
US Dollar
LegalName
ShortName
CountryCode
Currency
ATT
US
USD
FirstName
Judy
Currency
ContactName
Phone
415.555.1212
MiddleName
LastName
Title
Phone
Dent
Bobs Assistant
+1 (415)555-1212
9
9
Address Validation
Validate or correct addresses for
over 240 countries
Have reference data from
international postal agencies
Validate worldwide data in one
environment
Be continuously maintained with
worldwide post offices and
databases
10
10
Address Validation
Address1
Address2
Address3
Address4
Address5
SUITE 333
HUSTEN
TX
99999
Street
City
County
StateCode
StateName
ZIP
ZIP4
Latitude
Longitude
Houston
Harris
TX
Texas
77024
2005
29.283427
-95.46802
12
12
Description
Size
Price
AP-2199
12 in
27.99
AP2199
Nautical Lamp
12 inch
27.99
PA-2119
Sailoring Lamp
1 Foot
34.99
Name
DOB
Address
City
State Zip
W. S. Harrison II PhD
1/33/1967
E. Hartford
NY
16987
1/3/1967
Easthartford
CT
06987
9/9/99
Hartford East
CT
06987
1/13/1967
HartfordCT
2a Jackson Rd #174978
Hartford
6984
CT
06987-4573
Monitoring Quality
Stakeholders need to be aware
Current quality metrics
Alerts if quality thresholds are not being
met
14
14
Line of business
manager
Zero learning curve for business users to review and track data quality metrics,
enabling data quality for the masses.
15
15
Business Empowerment
Simple-to-use browser-based tools
Designed for the tasks and skills
of business data stewards and
analysts
Purpose-built, web-based UI
for fast ramp-up
Scorecarding & trending
View business, not technical,
representations
Interact with data directly
through profiling, rule
validation, and scorecarding
Business
Manager
Analyst
& Steward
IDQ V9.1
17
17
Release 9 DQ Architecture
Informatica
Analyst
Analyst Service
Model
Repository
Service
Informatica
Developer
Repository
Staging
Informatica
Administrator
Profile
Service
ISP
Mapping
Service
SQL
Service
Runtime
MRS
CMS
Service
Profile
Warehouse
Workflow
Manager
Integration Service
Repository
Service
Mapping
Designer
Repository
Metadata
Manager
MM Warehouse
18
18
Node configuration
MRS can run on one node but is not a highly available service
Multiple MRS Services can run on the same node
If MRS fails, it automatically restarts on the same node
19
19
20
20
21
21
22
22
23
23
Informatica Analyst
24
24
Profiling
Column Profiling
Rule Profiling
Rule creation/editing
Collaboration via
Metadata Bookmarks (URLs)
Profile Comments
Bi-directional Sharing of Rules with
9.1 Developer
Sharing of RTM dictionaries
Mapping Specification
Define business logic that populates a
target table
Configure the sources, target, rules,
filters, and joins to transform the data
25
25
Profiling
View
Profile
Statistics
Value / Pattern
Frequency
Analysis
Drilldown
Analysis
26
26
Apply existing
rule or create
new one.
View Profile
results for Rule
output
27
27
Scorecards
Add Profile
Columns to a
Scorecard
Scorecard
Name, Location
Description.
Set Thresholds
28
28
Scorecards
Run the
Scorecard
View Trend
Chart
29
29
30
30
Included in IDQ V9
OOTB Mappings,
Mapplets, Rules
Address Validation
Database
Subscriptions
Data Quality 9
OOTB RTM
Dictionaries
Fee-Based
Content
Geocoding Database
Subscriptions
Country Packages
31
31
Step 1 - Transliteration
Transliterate
Parse
Analyze
Verify
Correct
Format
Enrich
with
Geo
Codes
63
105 52
GREECE
ATHINAS 63
105 52 ATHINA
GREECE
Transliteration
32
32
Step 2 Parsing
Transliterate
Parse
Analyze
Verify
Correct
Format
Enrich
with
Geo
Codes
Parsing
33
33
Step 3 Correction
Transliterate
Parse
Analyze
Verify
Correct
Format
Enrich
with
Geo
Codes
Correction
34
34
Step 4 - Formatting
Transliterate
Parse
Analyze
Verify
Correct
Format
Enrich
with
Geo
Codes
Micros-Fidelio House
6-8 The Grove
Slough
SL1 1QP
Great Britain
Europadamm 2-6
41460 Neuss
Germany
63
105 52
GREECE
Step 5 Enrichment
Transliterate
Parse
Analyze
Verify
Correct
Format
Enrich
with
Geo
Codes
Input address
Geo Coordinates
Latitude: 39.173427
Longitude: -76.801280
Enrichment
36
36
37
37
Bigram Distance
Longer multi token text strings
Hamming Distance
Numeric, date & code
Edit Distance
Strings of arbitrary length
38
38
Sample Rules
Category Name
Rule Type
Examples
Noise Word
Word is Deleted
Word is Deleted
Word is Deleted
Nickname Replace
Word is Replaced
Secondary Lookup
39
39
Material
Shelf
Weight
Quantity
Packaging
DB
DB Biscs 9M 650g US
9M
0.676KG
20
Box
DB
DB Biscuits 9M 650g US
8M
0.667KG
20
Boxes
EDIT
BIGRAM
HAMMING
0.878
0.5
HAMMING HAMMING
0.6
EDIT
0.6
Weights
0.763
Define the threshold that must be met before records will
be output as a possible match
40
40
Classic Matching
Identity Matching
41
41
Cluster ID
Cluster Size
Sequence ID
42
42
PowerCenter Integration
43
43
Scalability
PowerCenter Grid
Connectivity
Batch access
Web Services
DQ as part of ETL process
44
44
45
45
DQ v9.x in PC Upgrade
When upgrading only IDQ to 9.1:
9.1 PC Integration installers need to be run on PC side to allow DQ 9.1
mappings in PWC 9.0.1 / 8.6.1
46
46
47
47
48
48