
Efficient Data Integration for High-Performance Analytics
Karl Krycha,
EMEA SAS Teradata CoE, Teradata

Copyright © 2012, SAS Institute Inc. All rights reserved. #analytics2012


Agenda
Case Studies Big Analytics and HPA
In Database, In Memory, Social Media
Various Dimensions of “Big Data”
Large scale (data volume) analytics, Emerging new data types, New
(non-SQL) analytics
Big Data and High Performance Analytics
Motivation, Traditional v Big Analytics, Potential Use Cases
Big Data and High Performance Architecture
Integration Architecture Options, Hadoop and Aster, SAS
High-Performance Analytics
Summary



Case Study
Case Study Big Analytics In Database
BoA cuts AML transaction processing time by ten hours
• Following successive mergers, Bank of America was gathering enormous amounts of data but was not integrating it effectively in its existing Teradata warehouse (40 data models).
• A centralised 100-terabyte SAS server serviced all of the business, using virtually every tool that SAS has available.
• The existing process ran for about 14 hours and was expected to take 28 hours with the anticipated increase in transaction volume. SAS-Teradata's in-database processing system was introduced to eliminate these problems.
• The processing time for AML was estimated to be reduced to between 5 and 8 hours. The process in fact now runs in only 4 hours.
• The Informatica system that had been used to extract and load data from Teradata into SAS was no longer needed, and external data stores were no longer required.
Improved processing…
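
The run times quoted in this case study can be checked with a few lines of arithmetic. This is a minimal sketch in Python that uses only the figures on the slide; the constant and function names are illustrative and not part of any SAS or Teradata tooling.

```python
# Run times quoted on the slide, in hours.
EXISTING_RUN = 14      # AML process before in-database processing
IN_DATABASE_RUN = 4    # reported run time after the change


def hours_saved(before, after):
    """Absolute reduction in elapsed time."""
    return before - after


def speedup(before, after):
    """How many times faster the new run is."""
    return before / after


print(hours_saved(EXISTING_RUN, IN_DATABASE_RUN))  # 10, matching the headline
print(speedup(EXISTING_RUN, IN_DATABASE_RUN))      # 3.5
```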



Case Study
Customer Case Study In Memory
High-Performance Analytics Process … accelerated modeling!
[Figure: analytic process stages: data exploration, model development, model deployment]
• Modeling run time: 167 hours before, 84 seconds with SAS In-Memory Analytics for Teradata
• Bottom-line impact: tens of millions of dollars
SAS In-Memory Analytics for Teradata delivered game changing results!
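
To put the two figures on this slide (167 hours and 84 seconds) on the same scale, the sketch below converts both to seconds and takes the ratio. It assumes the two numbers describe the same modeling job, which the slide implies but does not spell out; the names are illustrative.

```python
# Figures quoted on the slide; the names are illustrative.
HOURS_BEFORE = 167
SECONDS_AFTER = 84


def speedup_factor(hours_before, seconds_after):
    """Ratio of the two elapsed times, with both converted to seconds."""
    return (hours_before * 3600) / seconds_after


factor = speedup_factor(HOURS_BEFORE, SECONDS_AFTER)
print(f"about {factor:,.0f}x faster")  # about 7,157x faster
```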



Complementary Offerings
Options to manage the entire analytical process

In-Database: “Bring SAS analytics to the data”
• Minimizes data replication
• Minimizes data movement
• Leverages TD DB and MPP capability for improved performance
• Data preparation and data exploration in-database
• ‘In-database’ scoring for SAS models
• Supports select SAS procedures

In-Memory: “Accelerate SAS analytics with MPP technology”
• Moves analytic dataset to dedicated high-performance analytic sandbox
• Gives analyst full platform control
• Leverages ‘in-memory’ MPP execution speed
• Supports select SAS procedures
• Supports complex advanced analytics
• Speeds up model development phase
• Complex models on large datasets
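
The "bring analytics to the data" idea can be illustrated by contrasting two ways of scoring the same table. This is a sketch under stated assumptions: Python's built-in sqlite3 stands in for the database, the linear model and its coefficients are hypothetical, and nothing here reflects how the SAS or Teradata products are implemented.

```python
import sqlite3

# Hypothetical linear model: score = intercept + w_balance*balance + w_txn*txn_count
INTERCEPT, W_BALANCE, W_TXN = 0.5, 0.001, 0.2

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, balance REAL, txn_count INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 1000.0, 3), (2, 250.0, 10), (3, 0.0, 0)],
)


def score_after_extract(conn):
    """Extract pattern: pull every row to the client, then score there."""
    rows = conn.execute("SELECT id, balance, txn_count FROM customers").fetchall()
    return {cid: INTERCEPT + W_BALANCE * bal + W_TXN * txn for cid, bal, txn in rows}


def score_in_database(conn):
    """In-database pattern: the scoring expression runs inside the SQL engine,
    so only (id, score) pairs leave the database."""
    sql = "SELECT id, ? + ? * balance + ? * txn_count FROM customers"
    return dict(conn.execute(sql, (INTERCEPT, W_BALANCE, W_TXN)).fetchall())


print(score_in_database(conn))
```

Both functions return the same scores; the difference is where the computation happens and how much data has to move.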



SAS Analytics for Teradata Process Evolution
Reduced Time-to-Intelligence

Nicolas Adamek, EMEA SAS Teradata COE Lead



What is Big Data?

Growing Data Volumes
It’s growing. Quickly.
And it’s everywhere.


What is Big Data?
New kinds of data
[Figure: structured data vs. unstructured data growth]


What is Big Data?
• Big Data = Large scale (data volume) analytics
  - MPP SQL databases have delivered large scale analytics for over a decade. Teradata has been the leader in large scale SQL analytics with over 16 customers with a Petabyte or more of data.
• Big Data = Emerging new data types
  - New multi-structured data types with unknown relationships that require processing of data regardless of size to discover insights. Examples include web logs, sensor networks, social networks, text.
• Big Data = New (non-SQL) analytics
  - New analytic frameworks that provide parallel processing on semi-structured data, leveraging the power of MapReduce: Teradata SQL MapReduce, SAS MapReduce (SAS/ACCESS to HADOOP)
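
The MapReduce pattern mentioned above can be sketched in a few lines. This is a single-process illustration in plain Python, not the Teradata SQL MapReduce or Hadoop API; the log format (method, path, status) and all function names are assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(log_line):
    """Map: emit (key, 1) pairs; here the key is the requested URL path."""
    parts = log_line.split()
    if len(parts) >= 2:          # skip malformed, semi-structured lines
        yield (parts[1], 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def map_reduce(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


logs = ["GET /home 200", "GET /cart 200", "POST /cart 302", "garbage", "GET /home 500"]
print(map_reduce(logs))  # {'/home': 2, '/cart': 2}
```

In a real framework the map and reduce calls run in parallel across many nodes; the sketch only shows the data flow.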



The Many Dimensions of Big Data
Analyse
• Workload Agility: data & analysis latency (real-time, near-real-time, intra-day, daily)
• Workload Complexity: query mix; concurrent data acquisition & analysis
• Analytic Complexity: analytic capabilities used (predictive, advanced statistics, data/text mining, pre-built functions)
Data
• Validity: quality level
• Volatility: generation rate, update rate, accumulation rate
• Volume
• Variety: structured, multi-structured

Source: BI Research 2012



Key Points Big Data
• Large scale SQL analytics (Volume)
  - Teradata has over 25 customers in Petabyte club
• Emerging new data types (Variety, Velocity, Complexity)
  - New multi-structured data types with unknown relationships that require processing of data regardless of size to discover insights
• Big Analytics - New Non SQL analytics
  - Leveraging the power of MapReduce for new methods for efficiently analyzing data



Big Data Analytics
Do we really need Big Data?

• For consumers
  - Better understanding of own behavior
  - Integration of activities
  - Gamification – turn behavior into enjoyment
  - Influence – involvement and recognition

• For companies
  - Real behavior – what do people do, and what do they value?
  - Faster interaction
  - Better targeted offers
  - Customer understanding



Big Data Analytics
Potential Use Cases for Big Data Analytics

Source: IDC



Big Data Analytics
New Capabilities
New data + new analysis = new capabilities
1. New Data: Relational plus new non-relational data sources
- Machine Data: Click Stream Files, System Log Files
- Customer Interaction Graphs: Social Network Connections
- Micro-transactions: Financial Services Electronic, Mobile Transactions
- Sensor Data: Telecommunications Network Data Records, Electric Grid Data

2. New Analysis: Requiring more than SQL (e.g. MapReduce)


- High Performance Analytics
- On-the-fly Pattern matching and path analysis
- Graph analysis
- Text analysis

3. New Capabilities: Merge the BI and Data Scientist Worlds


- Iterative analysis of data (data exploration and investigative analytics)
- Data Scientist/ Data Ninja/ Analytics Developers /Quants
- Embrace new analytics techniques e.g. MapReduce



Big Data Analytics
Unstructured Data is Not Analysed

• Data is prepared and structure applied before analytics take place
  - Fingerprints & Polygons
  - Sentiment Analysis & Word Scoring
• “Big Data” is about more than ability to store data
  - The ability to quickly structure and analyse data is required to gain value.

“…the fact is that virtually no analytics directly analyze unstructured data. Unstructured data may be an input to an analytic process, but when it comes time to do any actual analysis, the unstructured data itself isn’t utilized.”
- Bill Franks, International Institute for Analytics
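
The point that structure is applied before analysis, for example through word scoring for sentiment, can be sketched as follows. The lexicon, the tokenizer, and the output row layout are all hypothetical and chosen only to keep the example small; production sentiment tools are considerably more elaborate.

```python
import re

# Hypothetical word-score lexicon; a real one would be far larger.
LEXICON = {"great": 1, "love": 1, "good": 1, "bad": -1, "slow": -1, "hate": -1}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Sum the lexicon scores of the tokens; unknown words count as 0."""
    return sum(LEXICON.get(token, 0) for token in tokenize(text))


def to_row(post_id, text):
    """Turn free text into a structured row that ordinary analytics can use."""
    score = sentiment_score(text)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {"post_id": post_id, "score": score, "label": label}


print(to_row(1, "Love the new app, great support!"))
```

Once each post is reduced to a row like this, the analysis runs on the structured columns rather than on the raw text.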



Key Points Big Data Analytics
• What is Big Data
  - Big Data is challenging our current pattern of thought
  - Cost effective computing and storage
    - Everything can be stored
    - Cheap large scale computing power readily available
  - Data explosion: Data everywhere, structured, semi-structured, unstructured, geo-location data, machine generated data, …
• Big Data Analytics
  - “Big data is the next wave of new data sources that will drive analytic innovation in business, government, and academia. The analysis that big data enables will lead to decisions that are more informed and, in some cases, different from what they are today.”
    Bill Franks, Taming The Big Data Tidal Wave



Integrated Data Warehouse
Our Preferred, Advocated Solution

Integrated Data Lab enables rapid experimentation with new data

Teradata Data Lab – integral part of Analytic Advantage Program

Viewpoint Portlets that enable Data Labs (sandboxing)



However, this integrated DWH approach needs constantly to be extended and improved.



New Business opportunities
Social Media integration

• Brand understanding: “Do people like me?”
• Market understanding: “What are the hot topics?”
• Influencer analysis: “Who is important?”
  - Social network analysis
• Add context to customer information: “What drives actions?”
  - Data mining
  - Customer segmentation
• Service led social media strategy “Help me”
• Marketing social media strategy “
• Creating an interaction framework



Growing marketing capabilities
Social media integration
• Marketers need solid metrics that are meaningful
• Data needs to be analyzed, not just reported
  - Not just # of fans, followers & likes
  - Typical reporting tools provide basics w/no context

• Easy: Direct sales from Facebook
• Hard: Knowing sentiment of posts and responding quickly
• Harder: Conversion rate from people in various social channels
• Hardest: Knowing when top social influencers come to site; show customized messaging to encourage evangelism



Social media integration
Big Data Architecture
Complementary Technologies – One Vision
[Figure: architecture diagram. Labels: Enterprise Transactional Data Sources; Integrated Data Warehouse; Marketing Channels; Direct Mail; Retailer; Partner/Dist; Email; Web; Responses]



SAS-Aster Data Integration

SAS/ACCESS for Aster nCluster
• Transparent connection between SAS and Aster Data Analytic Platform
• Makes MapReduce processing easily accessible to SAS developers
[Figure: SAS sends SQL queries to Aster Data and receives results (e.g. fraud detection)]
Base SAS on Big Data
• Enable SAS system to access big data sets
• High performance bulk Aster Load Utility support
• Seamlessly integrate SAS programs against Aster

SAS Scoring Accelerator for Aster
• Push down and process SAS Enterprise Miner models inside Aster Data
• Currently in Limited Availability
[Figure: SAS models pushed into Aster Data nCluster]
Fast Scoring for SAS Enterprise Miner
• Native SAS parallelization for fast scaling and high performance
• Faster data mining process
• Lower IT and development costs



Teradata SAS Analytic Process Flow
In database / High Performance Analytics
[Figure: process flow in five stages: 1 Data Understanding, 2 Data Preparation, 3 Model Development (variants 3a and 3b; 3b is labelled High Performance Analytics), 4 Model Deployment, 5 Model Execution. Component labels: Data Set Builder for SAS, Analytics Accelerator, Enterprise Miner, Model Manager, Scoring Accelerator, Teradata SQL, Scoring, Modeling ADS. The figure also shows a sample order/customer data model.]



Teradata Appliance for SAS HPA
Teradata Ecosystem

Appendix Slides

• Teradata appliance running SAS in-memory – Not a data warehouse


• Focused on model development – Not data prep or scoring
• Built for unique SAS analytic modeling – High volumes and performance
• Integrated in the Teradata Analytical Ecosystem – Key differentiator



Teradata Appliance for SAS High-Performance Analytics - Model 700
• Purpose-built appliance optimized specifically for SAS In-Memory Analytics
• Executes SAS HPA on a Teradata appliance and is co-resident with the database
  - Utilizes Teradata for data storage and management and to supply data to the HPA routines
  - Leverages SAS In-Memory Procedures for data analysis, model development, and model scoring
• Intended to be a stand-alone, dedicated system, not the EDW or a mixed-workload data mart
• Orders-of-magnitude performance gains by leveraging MPP architecture and executing in-memory analytics from SAS in parallel



Teradata Appliance for SAS High-Performance Analytics - Model 700
[Figure: Model 700 appliance. Software labels: SAS/STAT® Software, SAS/ETS® Software, SAS® Enterprise Miner™, Teradata Appliance for SAS High-Performance Analytics Software]


Summary
• Business Approach
Identify business processes that you could do more efficiently with the help of big data and high performance analytics 1)
• Deliver Value As You Go
It will take a lot of effort to figure out how to apply a source of big data to your business. An organization’s analytic professionals and their business sponsors must be sure to look for ways to deliver small, quick wins as they go 2)
• Analytical Ecosystem
Acquire or grow the needed technology and analytical skills 1)

1) Gartner
2) Bill Franks, Taming The Big Data Tidal Wave



THANK YOU !

Karl Krycha
Managing Consultant
Teradata EMEA Advanced Analytics PS COE
Storchengasse 1
1150 Wien
Austria
karl.krycha@teradata.com
