Information Management Technology Ecosystems Spring 2011

IBM Netezza Overview

Information Management

2011 IBM Corporation

Agenda

What is Netezza?
Value Proposition
iClass Advanced Analytics
Netezza Product Family

True Appliances

Dedicated device
Optimized for purpose
Complete solution
Fast installation
Very easy operation
Standard interfaces
Low cost

What is a Netezza Appliance?

A purpose-built analytics engine
Integrated database, server, data storage
Standard interface
Low cost of ownership

Speed: 10-100x faster than traditional systems
Simplicity: Minimal administration and tuning
Scalability: Peta-scale user data capacity
Smart: High-performance advanced analytics

Performance, Value, Simplicity


Operations
There's nothing to do: it's an appliance

BI Developers & DBAs: faster delivery times
No configuration
No physical modeling
No indexes
No tuning: out-of-the-box performance
Data model agnostic

ETL Developers
No aggregate tables needed: simpler ETL logic
Faster load and transformation times

Business Analysts
Train-of-thought analysis: 10 to 100x faster
True ad hoc queries: no tuning, no indexes
Ask complex queries against large datasets
Lower latency: load & query simultaneously
OnStream processing by 100s of nodes

Netezza as part of IBM Information Management Portfolio

Transactional & Collaborative Applications

Business Analytic Applications

Integrate
Master Data

Analyze
Big Data

Manage
Data Warehouses

www

Structured Data

Data

Data Warehouse Appliances

Streams

External Information Sources

Content Streaming Information

Govern

Quality

Lifecycle Management

Security & Privacy

Sample Netezza client base


Digital Media

Financial Services

Government

Health & Life Sciences

Retail / Consumer Products

Telecom

Other

IBM Netezza TwinFin Clients Derive Value Across Industries

Speed
15,000 users running 800,000+ queries per day
50X faster than before
"When something took 24 hours I could only do so much with it, but when something takes 10 seconds, I may be able to completely rethink the business process."
- SVP Application Development, Nielsen

Scalability
1 PB on Netezza
7 years of historical data
100-200% annual data growth

"NYSE has replaced an Oracle IO relational database with a data warehousing appliance from Netezza, allowing it to conduct rapid searches of 650 terabytes of data."
- ComputerWeekly.com

Simplicity

Smart

Traditional Data Warehouse Complexity

Data Warehousing Simplified

Why Netezza over Conventional DW?


Typical Budget Outlay for BI Project
Application Admin Infrastructure

Budget Allocation with Netezza architecture


Application Admin Infrastructure

Real $$ Saved

Larger budget allocation for application & asset development
Budget shift to strategic, value-added activities
More visibility within the organization
Increased application services with better rates
Reduced low-end IT-oriented services

Why Netezza?
Performance matters
On Site POCs matter (with 1 DBA vs. 15 DBAs)
TCO and ROI matter
Results matter
The Partner asset matters most

Netezza Architecture in a Nutshell

[Architecture diagram; component labels only]
Applications: Advanced Analytics, BI, ETL, Loaders
Hosts: Lite Host (IBM xSeries, Red Hat Linux)
Network Fabric
S-Blades: FPGA, CPU, Memory
Disk Enclosures
Netezza Appliance

Data Stream Processing

FPGA Core: Uncompress, Project, Restrict, Visibility
CPU Core: Complex Joins, Aggs, etc.

NOTE: There are 96 Disk Drives, 96 FPGA Cores and 96 CPU Cores per rack
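
The stage order named on this slide (uncompress, project and restrict in the FPGA core, then joins and aggregations in the CPU core) can be illustrated with a toy pipeline. This is a minimal Python sketch, not Netezza code: the zlib/JSON block format, the function names fpga_stage and cpu_stage, and the sample columns are all assumptions made for illustration, and the Visibility stage is left out.

```python
import json
import zlib

def fpga_stage(compressed_block, columns, predicate):
    """Toy stand-in for the FPGA core: uncompress, project, restrict."""
    rows = json.loads(zlib.decompress(compressed_block))   # Uncompress
    for row in rows:
        projected = {c: row[c] for c in columns}            # Project: keep only needed columns
        if predicate(projected):                            # Restrict: apply the filter
            yield projected

def cpu_stage(rows, group_key, value_key):
    """Toy stand-in for the CPU core: a simple grouped sum."""
    totals = {}
    for row in rows:
        totals[row[group_key]] = totals.get(row[group_key], 0) + row[value_key]
    return totals

# Hypothetical block of rows as it might sit compressed on one disk.
block = zlib.compress(json.dumps([
    {"carrier": "A", "weight": 10, "note": "x"},
    {"carrier": "B", "weight": 5, "note": "y"},
    {"carrier": "A", "weight": 7, "note": "z"},
    {"carrier": "B", "weight": 0, "note": "w"},
]).encode())

filtered = fpga_stage(block, ["carrier", "weight"], lambda r: r["weight"] > 0)
result = cpu_stage(filtered, "carrier", "weight")
print(result)
```

The point of the sketch is only that rows are filtered and narrowed as they stream off disk, so the later stage sees less data; with 96 disk/FPGA/CPU triples per rack, the slide implies the same pipeline running many times in parallel.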

Inside the Netezza TwinFin Appliance

Disk Enclosures
Slice of User Data
Swap and Mirror partitions
High speed data streaming

SMP Hosts
SQL Compiler
Query Plan
Optimize
Admin

S-Blades (with FPGA-based Database Accelerator)
Processor & streaming DB logic
High-performance database engine streaming joins, aggregations, sorts, etc.

Netezza Integration with 3rd party Tools (ETL, BI)


Standard Interfaces: SQL, ODBC, JDBC, OLE-DB
Extract / Load: Data In / Out
Analytics: Data Out

Simple:: Performance Tuning Dramatically Simplified

No dbspace/tablespace sizing and configuration
No redo/physical log sizing and configuration
No journaling/logical log sizing and configuration
No page/block sizing and configuration for tables
No extent sizing and configuration for tables
No Temp space allocation and monitoring
No RAID level decisions for dbspaces
No logical volume creations of files
No integration of OS kernel recommendations
No maintenance of OS recommended patch levels
No JAD sessions to configure host/network/storage
No software to install
One simple partitioning strategy: HASH

"Instead of spending time and effort on tedious DBA tasks, I can use the time for higher BUSINESS VALUE tasks. For example, I can:
Bring on new applications and groups
Quickly build out new data marts
Provide more functionality to end users"
- DBA

Netezza Simplicity Example


Indices
More Tuning Modifications

Aggregation Tables
Tuning Considerations

Netezza DDL
Create Table - Logical Model

CREATE TABLE frt_bill_evnt (
    Frt_Bill_Id    INTEGER NOT NULL,
    Load_Id        INTEGER,
    Evnt_Typ_Cd    CHAR(3) NOT NULL,
    Proc_Dt        DATE,
    Adt_Dt         DATE,
    Rcv_Dt         DATE,
    Inv_Dt         DATE,
    Clt_Inv_Dt     DATE,
    Ship_Dt        DATE,
    Rjct_Rsn_Cd    CHAR(2),
    ..
    Dlvy_Tm        INTEGER,
    Actl_Cmdt_Ds   CHAR(30),
    Carr_Rfrc_Nbr  VARCHAR(25)
) distribute on hash ( Frt_Bill_Id );

[Legacy DDL listings, overlapping on the original slide and not recoverable line by line: CREATE SET TABLE TISTBL.FRT_BILL_EVNT with PRIMARY INDEX XPKFRT_BILL_EVNT ( Frt_Bill_Id ) and a long list of secondary INDEX clauses, plus the aggregate tables AGG_TISTBL.FRT_BILL_EVNT_WEEK and AGG_TISTBL.FRT_BILL_EVNT_MONTH, each with FALLBACK and JOURNAL settings and per-column CHARACTER SET, FORMAT and TITLE options.]

Legacy DDL

Model Considerations Create Table - Logical Model

Reduced

No indices and no tuning!
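
The `distribute on hash ( Frt_Bill_Id )` clause in the Netezza DDL is the only physical design decision on this slide. A rough Python sketch of the idea follows. The hash function (CRC32), the slice count of 96 (borrowed from the per-rack disk count quoted earlier in the deck) and the helper name data_slice_for are assumptions for illustration, not the appliance's actual algorithm.

```python
import zlib
from collections import Counter

NUM_SLICES = 96  # assumption: one data slice per disk in a single rack

def data_slice_for(key, num_slices=NUM_SLICES):
    """Map a distribution-key value to a data slice (illustrative hash only)."""
    return zlib.crc32(str(key).encode()) % num_slices

# Distribute 96,000 hypothetical Frt_Bill_Id values and look at the spread.
counts = Counter(data_slice_for(bill_id) for bill_id in range(1, 96001))
print("slices used:", len(counts))
print("rows per slice: min", min(counts.values()), "max", max(counts.values()))
```

Because the slice is a pure function of the key, every row with the same Frt_Bill_Id lands on the same slice, and a high-cardinality key spreads rows roughly evenly, which is what lets each disk/FPGA/CPU unit work on its own share of the table.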

Simple:: Ramifications on TCO


Telecom Call Detail Record FACT (6 billion rows)

Object                          Oracle Object Count *   Netezza Object Count
Tables                                              1                      1
Indexes                                            12
Table Partitions                                   47
Index Partitions                                  564
Table Partitions tablespaces                       47
Index Partitions tablespaces                       47
Table Data Files                                  170
Index Data Files                                  122
TOTAL                                           1,010

Look at all the weeks/months worth of effort, DBA design and maintenance that we don't have with Netezza. The appliance claims are true.
*: Oracle data does not account for ADDITIONAL effort required in configuring and engineering the file system design to accommodate this index management scheme.
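
As a quick check of the arithmetic behind the 1,010 total, the following Python sketch sums the Oracle object counts. The pairing of each number with an object type follows the order in which the labels and figures appear on the slide, which is an assumption about the original table layout.

```python
# Oracle object counts for the 6-billion-row CDR fact table, in slide order.
# The label-to-number pairing is assumed from that order.
oracle_objects = {
    "Tables": 1,
    "Indexes": 12,
    "Table Partitions": 47,
    "Index Partitions": 564,
    "Table Partitions tablespaces": 47,
    "Index Partitions tablespaces": 47,
    "Table Data Files": 170,
    "Index Data Files": 122,
}

total = sum(oracle_objects.values())
print(total)  # 1010, matching the TOTAL on the slide

# Under this pairing, index partitions equal indexes times table partitions.
print(oracle_objects["Indexes"] * oracle_objects["Table Partitions"])  # 564
```

That the figures sum to the stated total, and that 12 x 47 = 564, supports this reading of the table.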

Time of deployment reduced
Approach: Load 'n' Go for a unique sale

Migrate data model (Netezza is model agnostic)
  Export legacy DDL to text file
  Keep current logical model intact and simple
    Keep table names and column definitions
    Remove all physical database design (extents, blocks, pages, materialized views, locks, indices)
  Users will keep consistent view and navigation of data

Migrate data (500 GB to 1 TB/hour)
  Export legacy data to delimited ASCII file or pipe
  Load legacy data into the NPS system

Provide secondary view of data
  Install Netezza ODBC/JDBC driver on user's desktop
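
The quoted load rate of 500 GB to 1 TB per hour translates directly into a migration window. The Python sketch below shows the arithmetic. The 10 TB data volume and the function names are hypothetical, and decimal units (1 TB = 1000 GB) are assumed.

```python
def load_hours(data_tb, rate_tb_per_hour):
    """Hours needed to load data_tb terabytes at a steady rate."""
    return data_tb / rate_tb_per_hour

def load_window(data_tb, slow_rate=0.5, fast_rate=1.0):
    """Best-case and worst-case hours for the slide's 0.5 to 1 TB/hour range."""
    return load_hours(data_tb, fast_rate), load_hours(data_tb, slow_rate)

best, worst = load_window(10)  # hypothetical 10 TB legacy warehouse
print(f"{best:.0f} to {worst:.0f} hours")  # 10 to 20 hours
```

This ignores export time on the legacy side, which the slide does not quantify.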

Speed and Simplicity = New Analytics

The Analytic Enterprise


Optimization: What is the best choice?
Predictive Analytics: What will happen? What will the impact be?
BI Reporting and Ad-Hoc Analysis: What happened? When and where? How much?

Advanced Analytics Challenges


Expensive

Inefficient process

Processing on limited and stale data

Limited analytic complexity

Inability to experiment

Time consuming

Inability to react to market conditions

Big Data Meets Big Math

Analytics Without Constraints


Advanced Analytics the Traditional Way

SAS

Data Warehouse Data

Analytics Grid

ETL
SQL

Demand Forecasting

ETL
SQL R, S+

ETL
SQL

Fraud Detection

C/C++, Java, Python, Fortran,

Advanced Analytics with TwinFin

SAS

SQL

Demand Forecasting

R, S+ SQL

Fraud Detection

Advanced Analytics Framework


Applications
Customer Analytics Application
Partner Analytics Application
Partner Visualization
Partner Data Integration
Partner Business Intelligence

TwinFin(i) Appliance
Advanced Analytic Extensions
Partner / Customer Analytics
Open Source Analytics (CRAN, GNU)
Parallelized Analytics

Development & Modeling Tools
Partner ADE or IDE
R CLI/GUI
Eclipse

Advanced Analytics Platform
User Defined Extensions
Analytic Executables
nzMatrix
C/C++
Java
Python
R
Fortran
MapReduce/Hadoop
Open Framework API
Massively Parallel Analytic Engines
Open Language API

Data Warehouse Appliance Platform
AMPP Architecture + Netezza Performance Software

Netezza's Market-Leading Evolution

TwinFin with iClass Advanced Analytics (300X Performance)
World's First Analytic Data Warehouse Appliance

TwinFin (150X Performance)
World's First Petabyte Data Warehouse Appliance

NPS 10000 Series (50X Performance)
World's First 100 TB Data Warehouse Appliance

NPS 8000 Series
World's First Data Warehouse Appliance

Impact

2003

2006

2009

2010

2011

The Evolving IBM Netezza Appliance Family

Appliance Family

Netezza
Development and Test System
1 TB to 10 TB

Netezza
Data Warehouse
High Performance Analytics
1 TB to 1.25 PB

High Capacity (Just Arrived!)
Analytics on Massive Data
Queryable Archiving
500 TB to 10+ PB

Ultra Performance (Coming Soon!)
Performance is 10X today's Netezza appliance
TBD

High Capacity Appliance NEW!


Lowest price/TB in the market
Scalability to more than 10 Petabytes
Appliance simplicity
SQL and analytics on massive data
Fast data loading (5.5 TB/hour)
Low environmental footprint

Unparalleled Scalability and Value!

Netezza Models

1 ......... 10

Model                      TF3   TF6   TF12   TF24   ...   TF48   ...   TF120
Snippet Processors          24    48     96    192          384           960
Capacity (TB)                8    16     32     64          128           320
Effective Capacity (TB)*    32    64    128    256          512          1280

Predictable, Linear Scalability throughout entire family

Capacity = User Data Space
Effective Capacity = User Data Space with compression
*: 4X compression assumed
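
The linear scaling claimed on this slide can be checked against the model figures. In the Python sketch below, the per-unit ratios (8 snippet processors per model-number unit, 3 processors per TB of user data) are inferred from the published figures rather than stated in the deck, and the function name twinfin_model is hypothetical. The 4X compression factor is the slide's own assumption.

```python
COMPRESSION = 4  # the slide's stated 4X compression assumption

def twinfin_model(n):
    """Return (snippet processors, capacity TB, effective capacity TB) for model TF<n>.

    The ratios are inferred from the slide's figures, not documented values.
    """
    processors = 8 * n
    capacity_tb = processors // 3
    return processors, capacity_tb, capacity_tb * COMPRESSION

for n in (3, 6, 12, 24, 48, 120):
    processors, capacity, effective = twinfin_model(n)
    print(f"TF{n}: {processors} processors, {capacity} TB, {effective} TB effective")
```

Every model listed on the slide fits this rule, so doubling the model number doubles processors and capacity together.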

Replication: Vision and Direction

Distributed global data warehouse
Automated local high availability & DR
Concurrent query execution
Workload balancing among nodes
Currently in Phase 1: Async Replication for DR and Concurrent Access

The Big Picture:: The Vision of a Seamless Platform


Cloud Deployment

Multiple appliances, each specialized to its task
Globally scalable grid
Appears as a cloud to users
Flexible and elastic
Fast data loading and analysis
Data Replication

Department/ Region
Data Capture

Department/ Region
Data Capture

Data Governance

Multisystem Data Integration / Federation

Data Center
NZ Appliance Family

Data Center
NZ Appliance Family

Bottom Line: Why Netezza?

Provides What Users Want
  Opens Access to Detail and Historical Data
  Enables More/New Sophisticated ad hoc and complex queries
  Faster Response Times
Best Performance
  10-100 times faster on the largest, most complex queries
Lowest Price
  Significantly less expensive
Reduced TCO
  Reduce maintenance, DBA and system admin expense
  Installs in hours, not days/weeks/months
Next Generation Analytic Applications Deployed
  Redirect Analytic resources to Applications from Infrastructure

Questions?
