Information Management
Agenda
What is Netezza?
Value Proposition
iClass Advanced Analytics
Netezza Product Family
True Appliances
Dedicated device
Optimized for purpose
Complete solution
Fast installation
Very easy operation
Standard interfaces
Low cost
A purpose-built analytics engine
Integrated database, server and data storage
Standard interface
Low cost of ownership
Speed: 10-100x faster than traditional systems
Simplicity: minimal administration and tuning
Scalability: petascale user data capacity
Smart: high-performance advanced analytics
ETL Developers
No aggregate tables needed, so simpler ETL logic
Faster load and transformation times
Business Analysts
Train-of-thought analysis, 10 to 100x faster
True ad hoc queries: no tuning, no indexes
Ask complex queries against large datasets
Lower latency: load and query simultaneously
OnStream processing across hundreds of nodes
[Diagram: information management capabilities. Integrate: Master Data. Analyze: Big Data. Manage: Data Warehouses. Data sources: www, Structured Data, Data Streams. Govern: Quality, Lifecycle Management.]
Financial Services
Government
Telecom
Other
Speed
15,000 users running 800,000+ queries per day
50X faster than before
"When something took 24 hours, I could only do so much with it, but when something takes 10 seconds, I may be able to completely rethink the business process."
- SVP Application Development, Nielsen
Scalability
1 PB on Netezza
7 years of historical data
100-200% annual data growth
NYSE has replaced an Oracle IO relational database with a data warehousing appliance from Netezza, allowing it to conduct rapid searches of 650 terabytes of data.
ComputerWeekly.com
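The growth range above compounds quickly. As a rough illustration, assuming the rate stays constant from year to year (the slide gives only the 100-200% range), the hypothetical helper `projected_pb` projects a 1 PB warehouse forward:

```python
def projected_pb(start_pb: float, annual_growth: float, years: int) -> float:
    """Compound a starting data volume by a fixed annual growth rate.

    annual_growth is a fraction: 1.0 means 100% growth (doubling each year),
    2.0 means 200% growth (tripling each year). Constant growth is an assumption.
    """
    return start_pb * (1 + annual_growth) ** years


if __name__ == "__main__":
    # Low and high ends of the quoted 100-200% annual growth, starting from 1 PB.
    for rate in (1.0, 2.0):
        print(f"{rate:.0%} growth:", [projected_pb(1.0, rate, y) for y in range(4)])
```

At the low end (100%) the volume doubles each year, reaching 8 PB after three years; at the high end (200%) it triples, reaching 27 PB.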
Simplicity
Smart
2011 IBM Corporation
Real $$ Saved
Larger budget allocation for application and asset development
Budget shift to strategic, value-added activities
More visibility within the organization
Increased application services with better rates
Reduced low-end, IT-oriented services
Why Netezza?
Performance matters
On-site POCs matter (with 1 DBA vs. 15 DBAs)
TCO and ROI matter
Results matter
The partner asset matters most
[Diagram: Netezza appliance architecture. Applications (BI, ETL, loaders) connect to the hosts (IBM xSeries, Red Hat Linux), which connect through the network fabric to the S-Blades (each with FPGA, CPU and memory) and the disk enclosures.]
[Diagram: S-Blade data path. Data passes through the FPGA core (Uncompress, Project, Restrict, Visibility) and then the CPU core.]
NOTE: There are 96 Disk Drives, 96 FPGA Cores and 96 CPU Cores per rack
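One way to picture what the FPGA core does is as a chain of streaming filters that shrink the data before the CPU core sees it. The sketch below is a software analogy only, not Netezza's firmware or API: the block format (zlib-compressed JSON lines), the `_xid_create`/`_xid_delete` visibility columns and every function name are assumptions made for illustration. It applies visibility and restriction before projection so the predicate can still read every column; the slide does not specify the hardware's stage order.

```python
import json
import zlib
from typing import Callable, Iterable, Iterator, Optional

Row = dict  # hypothetical row format: column name -> value


def make_block(rows: Iterable[Row]) -> bytes:
    """Build a compressed block of JSON lines (a stand-in for an on-disk page)."""
    text = "\n".join(json.dumps(r) for r in rows)
    return zlib.compress(text.encode("utf-8"))


def uncompress(blocks: Iterable[bytes]) -> Iterator[Row]:
    """Uncompress stage: decompress each block and stream its rows one at a time."""
    for block in blocks:
        for line in zlib.decompress(block).decode("utf-8").splitlines():
            yield json.loads(line)


def visible(rows: Iterable[Row], snapshot_xid: int) -> Iterator[Row]:
    """Visibility stage: keep rows created by the snapshot and not yet deleted by it."""
    for row in rows:
        deleted: Optional[int] = row["_xid_delete"]
        if row["_xid_create"] <= snapshot_xid and (deleted is None or deleted > snapshot_xid):
            yield row


def restrict(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Restrict stage: drop rows that fail the query's WHERE predicate."""
    return (row for row in rows if predicate(row))


def project(rows: Iterable[Row], columns: list[str]) -> Iterator[Row]:
    """Project stage: keep only the columns the query asked for."""
    return ({c: row[c] for c in columns} for row in rows)


def scan(blocks, columns, predicate, snapshot_xid):
    """Chain the stages lazily; only surviving, narrowed rows reach the caller."""
    rows = uncompress(blocks)
    rows = visible(rows, snapshot_xid)
    rows = restrict(rows, predicate)
    return list(project(rows, columns))


if __name__ == "__main__":
    block = make_block([
        {"id": 1, "amt": 50, "_xid_create": 10, "_xid_delete": None},   # fails predicate
        {"id": 2, "amt": 500, "_xid_create": 10, "_xid_delete": None},  # survives
        {"id": 3, "amt": 900, "_xid_create": 30, "_xid_delete": None},  # created after snapshot
        {"id": 4, "amt": 700, "_xid_create": 5, "_xid_delete": 15},     # deleted before snapshot
    ])
    print(scan([block], ["id"], lambda r: r["amt"] > 100, snapshot_xid=20))  # [{'id': 2}]
```

Because each stage is a generator, rows flow through one at a time and nothing is materialized until `scan` collects the survivors, which mirrors the idea of filtering data in-stream rather than after it lands in memory.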
Disk Enclosures: slice of user data; swap and mirror partitions; high-speed data streaming
SMP Hosts: SQL compiler; query plan; optimize; admin
S-Blades: processor and streaming DB logic; high-performance database engine (streaming joins, aggregations, sorts, etc.)
[Diagram: standard interfaces. Data in/out, analytics, ODBC, SQL.]
Agenda
What is Netezza?
Value Proposition
iClass Advanced Analytics
Netezza Product Family
No dbspace/tablespace sizing and configuration
No redo/physical log sizing and configuration
No journaling/logical log sizing and configuration
No page/block sizing and configuration for tables
No extent sizing and configuration for tables
No temp space allocation and monitoring
No RAID level decisions for dbspaces
No logical volume creation for files
No integration of OS kernel recommendations
No maintenance of OS-recommended patch levels
No JAD sessions to configure host/network/storage
No software to install
One simple partitioning strategy: HASH
"Instead of spending time and effort on tedious DBA tasks, I can use the time for higher BUSINESS VALUE tasks. For example, I can bring on new applications and groups, quickly build out new data marts, and provide more functionality to end users."
- DBA
Aggregation Tables
Tuning Considerations
Netezza DDL
Create Table - Logical Model
CREATE TABLE frt_bill_evnt (
    Frt_Bill_Id INTEGER NOT NULL,
    Load_Id INTEGER,
    Evnt_Typ_Cd CHAR(3) NOT NULL,
    Proc_Dt DATE,
    Adt_Dt DATE,
    Rcv_Dt DATE,
    Inv_Dt DATE,
    Clt_Inv_Dt DATE,
    Ship_Dt DATE,
    Rjct_Rsn_Cd CHAR(2),
    ..
    Dlvy_Tm INTEGER,
    Actl_Cmdt_Ds CHAR(30),
    Carr_Rfrc_Nbr VARCHAR(25)
) distribute on hash ( Frt_Bill_Id );
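The `distribute on hash ( Frt_Bill_Id )` clause is the only physical-design decision in the Netezza DDL: each row is assigned to a data slice by hashing its distribution column. The minimal sketch below assumes 96 slices (one per disk drive, matching the per-rack count noted earlier) and uses CRC-32 as a stand-in hash; Netezza's actual hash function and the slice count on a given system are not shown in this deck, and `slice_for` and `distribute` are hypothetical names.

```python
import zlib
from typing import Any

NUM_SLICES = 96  # assumption: one data slice per disk drive, 96 drives per rack


def slice_for(key: Any, num_slices: int = NUM_SLICES) -> int:
    """Map a distribution-key value to a data slice.

    CRC-32 of the value's text is a stand-in; it is not Netezza's own hash function.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def distribute(rows: list[dict], key_column: str, num_slices: int = NUM_SLICES) -> list[list[dict]]:
    """Place each row on the slice chosen by hashing its distribution column."""
    slices: list[list[dict]] = [[] for _ in range(num_slices)]
    for row in rows:
        slices[slice_for(row[key_column], num_slices)].append(row)
    return slices


if __name__ == "__main__":
    # Hypothetical freight-bill events: three events per Frt_Bill_Id.
    events = [{"Frt_Bill_Id": i // 3, "Evnt_Typ_Cd": "INV"} for i in range(30_000)]
    sizes = [len(s) for s in distribute(events, "Frt_Bill_Id")]
    print("rows per slice: min", min(sizes), "max", max(sizes))
```

Every event for a given Frt_Bill_Id lands on the same slice, so a second table distributed on the same key places matching rows together and a join on that key needs no data movement between slices. Choosing a high-cardinality key such as Frt_Bill_Id also keeps slice sizes close to even.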
[Legacy DDL for the same data, overlaid in the original slide and not recoverable line by line: Teradata-style CREATE SET TABLE statements for TISTBL.FRT_BILL_EVNT, AGG_TISTBL.FRT_BILL_EVNT_WEEK and AGG_TISTBL.FRT_BILL_EVNT_MONTH, with FALLBACK and BEFORE/AFTER JOURNAL options, CHARACTER SET LATIN NOT CASESPECIFIC and DATE FORMAT 'YYYY-MM-DD' TITLE column attributes, PRIMARY INDEX XPKFRT_BILL_EVNT ( Frt_Bill_Id ), value-ordered indexes XSI_PROC_VALUE, XSI_SHIP_VALUE and XSI_CLT_INV_VALUE, and many single- and multi-column secondary indexes.]
Legacy DDL
Reduced
TOTAL: 1,010
Look at all the weeks' and months' worth of effort, DBA design and maintenance that we don't have with Netezza. The appliance claims are true.
*: Oracle data does not account for ADDITIONAL effort required in configuring and engineering the file system design to accommodate this index management scheme.
Migrate data (500 GB to 1 TB per hour):
Export legacy data to a delimited ASCII file or pipe
Load legacy data into the NPS system
Provide a secondary view of the data
Install the Netezza ODBC/JDBC driver on users' desktops
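The export step is usually a plain delimited-text dump. A minimal sketch, assuming a pipe delimiter, CSV-style quoting for fields that contain it, and empty fields for NULLs (match these to whatever options your loader expects). `io.StringIO` stands in for the output file or named pipe, and the hypothetical `load_window_hours` simply divides by the slide's 500 GB to 1 TB per hour range:

```python
import csv
import io
from typing import Iterable, Sequence, TextIO


def export_delimited(rows: Iterable[Sequence], out: TextIO, delimiter: str = "|") -> int:
    """Write rows as delimited text and return the number of rows written.

    Fields containing the delimiter are quoted; None becomes an empty field.
    For a real file or named pipe, open it with newline="" as the csv docs advise.
    """
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def load_window_hours(data_tb: float, slow_tb_per_hr: float = 0.5, fast_tb_per_hr: float = 1.0):
    """Best- and worst-case load time at the quoted 500 GB to 1 TB per hour."""
    return data_tb / fast_tb_per_hr, data_tb / slow_tb_per_hr


if __name__ == "__main__":
    buf = io.StringIO()
    export_delimited([(1, "INV", "2011-01-03"), (2, "a|b", None)], buf)
    print(buf.getvalue(), end="")
    print(load_window_hours(10))  # (10.0, 20.0)
```

For example, 10 TB of legacy data would take roughly 10 to 20 hours to load at those rates.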
Agenda
What is Netezza?
Value Proposition
iClass Advanced Analytics
Netezza Product Family
What is the best choice?
What will happen?
What will the impact be?
What happened?
When and where?
How much?
Inefficient process
Inability to experiment
Time consuming
[Diagram: analytics outside the warehouse. Data moves through repeated ETL and SQL extracts into separate SAS, analytics grid and R/S+ environments.]
[Diagram: SAS and R, S+ working through SQL.]
TwinFin(i) Appliance
Advanced Analytic Extensions
Partner / Customer Analytics
Open Source Analytics (CRAN, GNU)
Parallelized Analytics
nzMatrix
C/C++, Java, Python, R, Fortran
MapReduce/Hadoop
Open Framework API
AMPP Architecture + Netezza Performance Software
Agenda
What is Netezza?
Value Proposition
iClass Advanced Analytics
Netezza Product Family
[Chart: TwinFin impact (150X performance) on a timeline of 2003, 2006, 2009, 2010 and 2011.]
Appliance Family (banners: "Just Arrived!", "Coming Soon!")
Netezza: Development and Test System, 1 TB to 10 TB
Netezza: Data Warehouse, High Performance Analytics, 1 TB to 1.25 PB
High Capacity: Analytics on Massive Data, Queryable Archiving, 500 TB to 10+ PB
Ultra Performance: performance is 10X today's Netezza appliance, capacity TBD
Netezza Models (1 to 10 racks)
Model   Snippet Processors   Capacity (TB)   Effective Capacity (TB)*
TF3     24                   8               32
TF6     48                   16              64
TF12    96                   32              128
...     ...                  ...             ...
Capacity = user data space; Effective Capacity = user data space with compression
*: 4X compression assumed
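The effective-capacity column is just capacity multiplied by the assumed compression ratio. The sketch below encodes the three listed models and picks the smallest one whose effective capacity covers a given amount of raw data; the 4X ratio is the slide's assumption, real ratios depend on the data, and `smallest_model_for` is a hypothetical helper.

```python
from fractions import Fraction
from typing import Optional

COMPRESSION_RATIO = 4  # the slide's assumed 4X compression

# Model -> (snippet processors, user-data capacity in TB), from the figures above.
MODELS = {"TF3": (24, 8), "TF6": (48, 16), "TF12": (96, 32)}


def effective_capacity_tb(capacity_tb: int, ratio: int = COMPRESSION_RATIO) -> int:
    """User data space after compression."""
    return capacity_tb * ratio


def tb_per_snippet_processor(model: str) -> Fraction:
    """Uncompressed user-data capacity per snippet processor, as an exact fraction."""
    processors, capacity = MODELS[model]
    return Fraction(capacity, processors)


def smallest_model_for(raw_tb: float) -> Optional[str]:
    """Smallest listed model whose effective capacity holds raw_tb of uncompressed data."""
    for name, (_, capacity) in sorted(MODELS.items(), key=lambda kv: kv[1][1]):
        if effective_capacity_tb(capacity) >= raw_tb:
            return name
    return None  # larger than any model listed here


if __name__ == "__main__":
    for raw in (30, 50, 100, 200):
        print(raw, "TB raw ->", smallest_model_for(raw))
```

Each listed model works out to one third of a TB of user data per snippet processor, so capacity grows linearly with processor count.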
Distributed global data warehouse
Automated local high availability and DR
Concurrent query execution
Workload balancing among nodes
Currently in Phase 1: Async Replication for DR and Concurrent Access
Multiple appliances, each specialized for its task
Globally scalable grid
Appears as a cloud to users
Flexible and elastic
Fast data loading and analysis
[Diagram: department/region sites performing data capture and data centers running the NZ appliance family, connected by data replication under data governance.]
Questions?