… at Deutsche Telekom
Jürgen Urbanski
Vice President Big Data & Cloud Architectures & Technologies, T-Systems
Cloud Leadership Team, Deutsche Telekom
Board Member, BITKOM Big Data & Analytics Working Group
juergen.urbanski@t-systems.com
Session focus
What is Hadoop?
What is the disruptive innovation in Hadoop?
What are target use cases, horizontally and telco-specific?
How do we start realizing value from Hadoop today?
How does Hadoop complement existing investments in business intelligence?
Audience participation:
1. Who has heard of Hadoop?
2. Who is using Hadoop somewhere in their organization? (proof of concept or production both count)
Presentation
Hadoop platform reference architecture, by layer:
- Clients: Workflow and Scheduling, Data Isolation, Data Visualization and Reporting
- Application: Analytics Apps, Transactional Apps, Analytics Middleware
- Data Processing: Data Connectors, Batch Ingestion, Batch Processing, Real-Time/Stream Processing, Search and Indexing
- Data Management: Metadata Services, Distributed Storage (HDFS), Distributed Processing (MapReduce), Non-relational DB, Structured In-Memory
- Infrastructure: Virtualization, Compute / Storage / Network
- Cross-cutting: Access Management, Operations, Security, Data Encryption
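The Distributed Processing (MapReduce) layer in the stack above is easiest to grasp through the classic word-count example. This is a pure-Python sketch of the map → shuffle → reduce flow, not actual Hadoop API code; in production the same shape would be expressed as a Hadoop Java job or a Streaming script.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word, as a streaming mapper would
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle: sort by key so equal keys become adjacent, then reduce per key
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield (key, sum(v for _, v in group))

lines = ["big data big cluster", "data cluster data"]
counts = dict(reduce_phase(map_phase(lines)))
print(counts)  # {'big': 2, 'cluster': 2, 'data': 3}
```

The point of the pattern is that the map and reduce steps are independently parallelizable, which is what lets Hadoop scale processing out across the cluster alongside the data.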
Storage cost comparison (per GB):
- SAN Storage: 3-5 /GB
- NAS Filers: 1-3 /GB
- Data Cloud 1): 0.10-0.30 /GB

1) Hadoop offers storage + compute (incl. search). Data Cloud offers Amazon S3 and native storage functions.
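To make the gap concrete, here is the arithmetic at petabyte scale using the per-GB list prices quoted above. The source does not name the currency, so the figures below are unit-less; this is an illustration, not a quote.

```python
# Cost of storing 1 PB at the per-GB price ranges quoted above (illustrative)
PB_IN_GB = 1_000_000
tiers = {
    "SAN Storage": (3.00, 5.00),
    "NAS Filers": (1.00, 3.00),
    "Data Cloud": (0.10, 0.30),
}
per_pb = {name: (lo * PB_IN_GB, hi * PB_IN_GB) for name, (lo, hi) in tiers.items()}
for name, (lo, hi) in per_pb.items():
    print(f"{name}: {lo:,.0f} to {hi:,.0f} per PB")
# Even comparing the cheapest SAN price to the dearest Data Cloud price,
# the Data Cloud tier is roughly 10x cheaper per byte
```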
Schema on write (RDBMS / EDW):
- Schema required on write; reads are fast
- Standards-based and structured
- Limited, no data processing
- Structured data only
- Best fit: interactive OLAP analytics, complex ACID transactions, operational data store

Schema on read (Hadoop):
- Schema required on read; writes are fast
- Loosely structured
- Processing coupled with data; parallel processing / scale-out
- Multi-structured and unstructured data
- Best fit: data discovery, processing unstructured data, massive storage/processing
- Lower cost
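The schema-on-write vs schema-on-read distinction can be shown in a few lines. This is an illustrative Python sketch with made-up event data: the "write" path validates and structures rows at load time (rejecting what doesn't fit), while the "read" path stores everything raw and applies structure only when a query runs.

```python
import json

raw_events = ['{"user": "a", "ms": 120}', '{"user": "b"}', "not json at all"]

def write_validated(rows):
    # Schema on write (RDBMS style): structure is enforced at load time;
    # rows that don't match the schema are rejected and lost
    table = []
    for r in rows:
        try:
            rec = json.loads(r)
            table.append({"user": rec["user"], "ms": int(rec["ms"])})
        except (ValueError, KeyError):
            pass  # rejected on write
    return table

def read_with_schema(rows):
    # Schema on read (Hadoop style): everything is stored raw; this "schema"
    # is applied per query, and a different query could apply a different one
    for r in rows:
        try:
            rec = json.loads(r)
            yield {"user": rec.get("user"), "ms": rec.get("ms")}
        except ValueError:
            continue  # skipped only for this query; the raw data survives

print(write_validated(raw_events))        # only the fully valid row loads
print(list(read_with_schema(raw_events))) # partial rows still queryable
```

This is why the Hadoop column above pairs "writes are fast" with "loosely structured": nothing is parsed or rejected on ingest, and the cost of interpretation moves to read time.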
The Solution
- Offload low-value-per-byte data from the EDW, and rescue data from storage grid or tape
- Use Hadoop for data storage and processing (parse, cleanse, apply structure and transform)
- Free existing capacity for query workloads
- Retain all data for analysis!
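The "parse, cleanse, apply structure and transform" step can be sketched as a tiny ETL pipeline. The record format here (semicolon-separated, hypothetical call-detail-style rows) and field names are invented for illustration; in a real offload this logic would run as a MapReduce or Streaming job over HDFS files.

```python
import io

# Hypothetical raw input: one record per line, "id;date;seconds", with noise
raw = "49170111;2013-01-05;  320 \nBADROW\n49170112;2013-01-06;95\n"

def etl(lines):
    for line in lines:
        parts = line.strip().split(";")
        if len(parts) != 3:
            continue  # cleanse: drop malformed rows
        rec_id, day, secs = parts
        # apply structure and transform: typed fields, seconds -> minutes
        yield {"id": rec_id, "day": day, "minutes": round(int(secs) / 60, 2)}

rows = list(etl(io.StringIO(raw)))
print(rows)
```

Because each line is handled independently, the same function works unchanged whether it processes a 3-line string or a petabyte of files split across a cluster.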
[Chart: data warehouse workload mix. Before offload: Operational (44%). After offload: Operational (50%), Analytics (50%)]
The specifics of the case are as follows: the telco's DW has hundreds of thousands of lines of SQL code, and all the ETL is in SQL and needed to be converted into MapReduce jobs. The telco had built up its SQL queries over six years, yet moved them to MapReduce-based queries in six months. As a result, spend on TDC has decreased from $65M to $35M over a five-year horizon, and most of the remaining $35M is maintenance for the legacy TDC deployment. Moreover, TDC performance is up 4x, because TDC now focuses on high-volume (lots of apps/users), low-latency (interactive response time) work while the rest is offloaded to the Hadoop cluster.
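The SQL-to-MapReduce conversion described above typically maps each SQL aggregate onto a map/reduce pair. As a minimal illustration (table, columns and figures invented), the query `SELECT region, SUM(revenue) FROM sales GROUP BY region` becomes:

```python
from collections import defaultdict

sales = [("north", 10), ("south", 5), ("north", 7)]  # illustrative rows

def mapper(row):
    # Map: emit (GROUP BY key, value to aggregate)
    region, revenue = row
    yield (region, revenue)

def reducer(pairs):
    # Reduce: SUM per key, i.e. the GROUP BY aggregate
    totals = defaultdict(int)
    for region, revenue in pairs:
        totals[region] += revenue
    return dict(totals)

result = reducer(kv for row in sales for kv in mapper(row))
print(result)  # {'north': 17, 'south': 5}
```

Joins and multi-step SQL pipelines decompose the same way, into chains of such jobs, which is why a six-year SQL codebase is a multi-month (not multi-day) conversion effort.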
GOAL: a platform that natively supports mixed workloads as a shared service
[Diagram: "Interactive / Explore" and "Online / Enrich" workloads running on one shared Big Data platform, fed by transactions, interactions and observations]
Enrich (bleeding edge, ~10% of organizations): real-time decisions; velocity measured in seconds; derived data also stored in Hadoop; served via HBase, NoSQL or SQL.
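The Enrich pattern can be sketched in a few lines: a batch job derives per-key values in Hadoop, stores them in a low-latency key-value store (HBase in practice; a plain dict stands in here), and the online path reads them at request time. Names and scores below are invented for illustration.

```python
# Batch-derived values, precomputed in Hadoop and loaded into a key-value
# store for online access (a dict stands in for HBase in this sketch)
derived_scores = {"user-1": 0.87, "user-2": 0.12}

def decide(user_id, threshold=0.5):
    # Online, seconds-scale decision using only a key lookup:
    # the expensive modelling already happened in the batch layer
    return derived_scores.get(user_id, 0.0) >= threshold

print(decide("user-1"))  # True
print(decide("user-9"))  # False: unknown users fall back to 0.0
```

The design choice is that the online path never runs heavy computation; it only looks up what the batch layer derived, which is what makes seconds-scale (or faster) decisions feasible.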
Key takeaways
- The Hadoop open-source ecosystem delivers powerful innovation in storage, databases and business intelligence, promising unprecedented price/performance compared to existing technologies.
- Hadoop is becoming an enterprise-wide landing zone for large amounts of data. Increasingly it is also used to transform data.
- Large enterprises have realized substantial cost reductions by offloading some enterprise data warehouse, ETL and archiving workloads to a Hadoop cluster.

Questions? juergen.urbanski@t-systems.com