
Capturing Big Value in Big Data: How Use Case Segmentation Drives Solution Design and Technology Selection

Jürgen Urbanski
Vice President Big Data & Cloud Architectures & Technologies, T-Systems
Cloud Leadership Team, Deutsche Telekom
Board Member, BITKOM Big Data & Analytics Working Group
juergen.urbanski@t-systems.com

Session focus
What is Hadoop? What is the disruptive innovation in Hadoop?
What are target use cases, horizontally and telco-specific?
How do we start realizing value from Hadoop today?
How does Hadoop complement existing investments in business intelligence?

Audience participation:
1. Who has heard of Hadoop?
2. Who is using Hadoop somewhere in their organization? (proof of concept or production both count)

Hadoop! Coming soon to a data center near you...


Hadoop is like a data warehouse, but it can store more data and more kinds of data, and it supports more flexible analyses. Hadoop is open source and runs on industry-standard hardware, so it is 1-2 orders of magnitude more economical than conventional data warehouse solutions. Hadoop provides more cost-effective storage, processing, and analysis: some existing workloads run faster, cheaper, and better. Hadoop can also deliver a foundation for profitable growth: gain value from all your data by asking bigger questions.
3

Reference architecture view of Hadoop
(Figure: layered architecture; components span Hadoop core, Hadoop projects, and adjacent categories.)

Presentation: Data Visualization and Reporting, Clients
Application: Analytics Apps, Transactional Apps, Analytics Middleware
Data Processing: Data Connectors, Batch Processing, Real Time/Stream Processing, Search and Indexing
Data Management: Metadata Services, Distributed Storage (HDFS), Distributed Processing (MapReduce), Non-relational DB, Structured In-Memory
Infrastructure: Virtualization, Compute / Storage / Network

Cross-cutting concerns:
Data Integration: Real Time Ingestion, Batch Ingestion
Operations: Workflow and Scheduling, Management and Monitoring
Security: Access Management, Data Isolation, Data Encryption
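To make the layers concrete, here is a minimal sketch of batch ingestion into the Distributed Storage layer followed by a job in the Distributed Processing layer, driven through the standard Hadoop command-line tools. The file names, HDFS paths, mapper/reducer scripts, and the streaming jar location are illustrative assumptions (the jar path varies by distribution).

```python
import subprocess

def run(cmd):
    """Run a shell command, echo it, and fail loudly on errors."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Batch ingestion: land a raw file in HDFS (Data Management layer).
# Local file name and HDFS paths are illustrative assumptions.
run(["hdfs", "dfs", "-mkdir", "-p", "/data/raw/cdr"])
run(["hdfs", "dfs", "-put", "-f", "cdr-2013-10-01.csv", "/data/raw/cdr/"])

# Batch processing: a MapReduce job via Hadoop Streaming
# (Data Processing layer). The jar path varies by distribution.
run([
    "hadoop", "jar", "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar",
    "-input", "/data/raw/cdr",
    "-output", "/data/refined/cdr",
    "-mapper", "parse_cdr.py",    # hypothetical parse/cleanse script
    "-reducer", "aggregate.py",   # hypothetical aggregation script
    "-file", "parse_cdr.py",
    "-file", "aggregate.py",
])
```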

Disruptive innovation #1: Store first, ask questions later


Much cheaper storage, but not just storage. Illustrative acquisition cost:

SAN Storage: 3-5/GB (based on HDS SAN storage)
NAS Filers: 1-3/GB (based on NetApp FAS-Series)
Enterprise-Class Hadoop Storage: ???/GB (based on NetApp E-Series (NOSH))
White Box DAS 1): 0.50-1.00/GB (hardware can be self-assembled)
Data Cloud 1): 0.10-0.30/GB (based on large-scale object storage interfaces)

1) Hadoop offers storage + compute (incl. search). Data Cloud offers Amazon S3 and native storage functions.
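As a back-of-the-envelope check on these ranges, the arithmetic below compares total acquisition cost across tiers for a single illustrative volume. The 500 TB figure and the per-GB midpoints are assumptions taken from the ranges above, not numbers from the slide.

```python
# Midpoints of the per-GB acquisition cost ranges above (assumed).
COST_PER_GB = {
    "SAN storage": 4.00,
    "NAS filers": 2.00,
    "White-box DAS": 0.75,
    "Data cloud": 0.20,
}

volume_gb = 500 * 1024  # assumed volume: 500 TB expressed in GB

for tier, per_gb in COST_PER_GB.items():
    print(f"{tier:14s}: {per_gb * volume_gb:>12,.0f} total for {volume_gb:,} GB")

# The spread between the top and bottom tiers is what supports the
# "1-2 orders of magnitude" claim earlier in the deck.
ratio = COST_PER_GB["SAN storage"] / COST_PER_GB["Data cloud"]
print(f"SAN vs. data cloud: {ratio:.0f}x")
```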

Disruptive innovation #2: Parallel processing


SCALE (storage and processing): traditional databases, EDWs, and MPP analytics on one side; NoSQL and Hadoop distributions on the other.

Schema: required on write (traditional) vs. required on read (NoSQL/Hadoop)
Speed: reads are fast vs. writes are fast
Governance: standards and structured vs. loosely structured
Processing: limited, no data processing vs. processing coupled with data, parallel processing / scale-out
Data types: structured vs. multi-structured and unstructured
Best-fit use: interactive OLAP analytics, complex ACID transactions, operational data store vs. data discovery, processing unstructured data, massive storage/processing

Big Data: it's about scale and structure.
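The schema row is the crux of the comparison: a conventional database enforces structure when data is written, while Hadoop stores raw bytes and applies structure only when a job reads them. A minimal schema-on-read sketch follows; the pipe-delimited call-record format is a made-up illustration, not a real CDR layout.

```python
# Raw lines landed "as is": nothing was validated at write time.
raw_lines = [
    "2013-10-01T09:12:44|+49151000001|+49151000002|182",
    "2013-10-01T09:13:02|+49151000003|MALFORMED",        # kept, not rejected
    "2013-10-01T09:13:59|+49151000004|+49151000005|27",
]

def read_with_schema(lines):
    """Schema-on-read: impose structure at query time, skipping records
    that do not parse instead of rejecting them at load time."""
    for line in lines:
        fields = line.split("|")
        if len(fields) != 4:
            continue  # the raw record stays on disk; this query ignores it
        ts, caller, callee, seconds = fields
        yield {"ts": ts, "caller": caller,
               "callee": callee, "seconds": int(seconds)}

total = sum(record["seconds"] for record in read_with_schema(raw_lines))
print("Total call seconds:", total)  # 209
```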


6

Target use cases for Hadoop


(Figure: use cases plotted by potential value, lower to higher, against time to value, shorter to longer, grouped by audience: IT Infrastructure & Operations; Business Intelligence & Business Analysts / Data Warehousing; Line of Business & CXO.)

Use cases include: Lower-Cost Storage; Enterprise Data Lake; Capacity Planning & Utilization; ETL Offload; Enterprise Data Warehouse Offload; Enterprise Data Warehouse Archive; Customer Profiling & Revenue Analytics; Targeted Advertising; Service Renewal Analytics; CDR-based Analytics; Fraud Management; and, potentially, New Business Models.

Cost-effective storage, processing, and analysis; a foundation for profitable growth.


7

Enterprise data warehouse offload use case


The Challenge
EDW at capacity; cannot support growing data volumes and variety
Expensive to scale; can only keep one year of data
Performance issues in business-critical apps; little room for discovery and analytics
Older data is stored but dark; you cannot swim around in it and explore

The Solution
Offload low-value-per-byte data from the EDW, and rescue data from storage grid or tape
Use Hadoop for data storage and processing (parse, cleanse, apply structure, transform); see the mapper sketch below
Free existing EDW capacity for query workloads
Retain all data for analysis!

(Figure: before offload, the data warehouse spends its capacity on operational workloads (44%), ELT processing (42%), and analytics (11%). After offload, the warehouse is split between operational (50%) and analytics (50%) workloads, while Hadoop takes on analytics, processing, and storage.)
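The "parse, cleanse, apply structure, transform" step typically runs as a MapReduce job over the raw files. Below is a minimal Hadoop Streaming mapper sketch, reusing the made-up pipe-delimited call-record format from the schema-on-read example; the field layout and cleansing rules are illustrative assumptions.

```python
#!/usr/bin/env python
# Hypothetical Hadoop Streaming mapper: parse and cleanse raw call
# records before downstream aggregation or loading into the warehouse.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("|")
    if len(fields) != 4:
        continue  # cleanse: drop malformed records from this pipeline
    ts, caller, callee, seconds = fields
    if not seconds.isdigit():
        continue  # cleanse: call duration must be numeric
    # Apply structure and transform: emit tab-separated key/value
    # pairs, the convention Hadoop Streaming uses between stages.
    print(f"{caller}\t{seconds}")
```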


8

Enterprise data warehouse offload benefits


Illustrative economics from a large telco:

100x overall project capex improvement: $200,000 per TB for Teradata versus $2,000 per TB for Hadoop. Of the $2,000, 25% is hardware, 25% software, and 50% services.

Most of the services effort breaks down as follows:
The telco's DW has hundreds of thousands of lines of SQL code, and all of its ETL is in SQL that needs to be converted into MapReduce jobs.
The telco developed its set of SQL queries over 6 years and moved to MapReduce-based queries over 6 months.
As a result, Teradata spend has decreased from $65M to $35M over a 5-year horizon; most of the remaining $35M is maintenance for the legacy Teradata deployment.
Moreover, Teradata performance is up 4x, because the platform now focuses on high-volume (many apps and users), low-latency (interactive response time) workloads while the rest of the work is offloaded to the Hadoop cluster.
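The headline figures can be checked with simple arithmetic. The numbers below come straight from this slide; only the calculation is added.

```python
# Capex per TB, from the slide.
teradata_per_tb = 200_000
hadoop_per_tb = 2_000
print("Capex improvement:", teradata_per_tb // hadoop_per_tb, "x")  # 100 x

# Breakdown of the Hadoop $2,000/TB, from the slide.
hw = 0.25 * hadoop_per_tb
sw = 0.25 * hadoop_per_tb
services = 0.50 * hadoop_per_tb
print(f"HW ${hw:,.0f}, SW ${sw:,.0f}, services ${services:,.0f} per TB")

# Five-year Teradata spend, before vs. after the offload.
before, after = 65_000_000, 35_000_000
saved = before - after
print(f"5-year spend reduced by ${saved:,} ({saved / before:.0%})")
```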

Common objections to Hadoop

We don't have big data problems.

We don't have petabytes of data.

We can't justify the budget for a new project.

We don't have the skills.

We're not sure Hadoop is mature / secure / enterprise-ready.

We already have a scale-out strategy for our EDW/ETL.

10

Every organization has data problems! Hadoop can help


MYTH: Big Data means petabytes
It's not just volume; remember variety and velocity.
Plenty of issues arise at smaller scales: data processing, unstructured data.
Often warehouse volumes are small because the technology is expensive, not because there is no relevant data.
Scalability is about growing with the business, affordably and predictably.

MYTH: Big Data means data science
Hadoop solves existing problems faster, better, and cheaper than conventional technology, e.g.:
A landing zone for capturing and refining multi-structured data types with unknown future value
A cost-effective platform for retaining lots of data for long periods of time
Walk before you run: Big Data is a state of mind.

11

From data puddles and ponds to lakes and oceans


AVOID: systems separated by workload type due to contention, i.e. separate Big Data silos per business unit (BU1, BU2, BU3) for batch, interactive, and online workloads.

GOAL: a platform that natively supports mixed workloads as a shared service: refine (batch), explore (interactive), and enrich (online) over transactions, interactions, and observations.

12

Waves of adoption crossing the chasm


Wave 1: Batch Orientation
Adoption today*: mainstream, 70% of organizations
Example use cases: refine (archival and transformation)
Response time: hour(s)
Data characteristic: volume
Architectural characteristic: EDW / RDBMS talk to Hadoop
Example technologies: MapReduce, Pig, Hive

Wave 2: Interactive Orientation
Adoption today*: early adopters, 20% of organizations
Example use cases: explore (query and visualization)
Response time: minutes
Architectural characteristic: analytic apps talk directly to Hadoop
Example technologies: ODBC/JDBC, Hive

Wave 3: Real-Time Orientation
Adoption today*: bleeding edge, 10% of organizations
Example use cases: enrich (real-time decisions)
Response time: seconds
Data characteristic: velocity
Architectural characteristic: derived data also stored in Hadoop
Example technologies: HBase, NoSQL, SQL

* Among organizations using Hadoop
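Wave 2's "analytic apps talk directly to Hadoop" usually means an ODBC/JDBC or Thrift connection to Hive. A minimal sketch using the third-party PyHive client is below; the host, port, username, and table name are illustrative assumptions, and the table is presumed to already exist over files in HDFS.

```python
# Requires the third-party PyHive package: pip install "pyhive[hive]"
from pyhive import hive

# Connection details are illustrative assumptions.
conn = hive.Connection(host="hadoop-edge.example.com", port=10000,
                       username="analyst")
cursor = conn.cursor()

# Interactive exploration (wave 2): ad hoc SQL over data in Hadoop.
cursor.execute("""
    SELECT caller, SUM(seconds) AS total_seconds
    FROM refined_cdr              -- hypothetical table over HDFS files
    GROUP BY caller
    ORDER BY total_seconds DESC
    LIMIT 10
""")
for caller, total_seconds in cursor.fetchall():
    print(caller, total_seconds)
```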

13

Where to start inserting Hadoop in your company? A call to action


For IT Infrastructure and IT Applications, accelerating implementation:
Solution design driven by target use cases
Reference architecture
Technology selection and POC
Implementation lessons learnt

For LOB and CXO, understanding Big Data:
Definition
Benefits over adjacent and legacy technologies
Current mode vs. future mode for analytics

Assessing the economic potential:
Target use cases by function and industry
Best approach to adoption

14

Key takeaways
The Hadoop open source ecosystem delivers powerful innovation in storage, databases, and business intelligence, promising unprecedented price/performance compared to existing technologies.
Hadoop is becoming an enterprise-wide landing zone for large amounts of data. Increasingly, it is also used to transform data.
Large enterprises have realized substantial cost reductions by offloading some enterprise data warehouse, ETL, and archiving workloads to a Hadoop cluster.

Questions? juergen.urbanski@t-systems.com

15
