… at Deutsche Telekom
Jürgen Urbanski
Vice President Big Data & Cloud Architectures & Technologies, T-Systems
Cloud Leadership Team, Deutsche Telekom
Board Member, BITKOM Big Data & Analytics Working Group
juergen.urbanski@t-systems.com
Session focus
What is Hadoop?
What is the disruptive innovation in Hadoop?
What are target use cases, horizontally and telco-specific?
How do we start realizing value from Hadoop today?
How does Hadoop complement existing investments in business intelligence?
Audience participation:
1. Who has heard of Hadoop?
2. Who is using Hadoop somewhere in their organization? (proof of concept or production both count)
Presentation
Hadoop platform reference architecture, by layer:
- Clients: Workflow and Scheduling, Data Isolation, Data Visualization and Reporting
- Application: Analytics Apps, Transactional Apps, Analytics Middleware
- Data Processing: Data Connectors, Batch Ingestion, Batch Processing, Real-Time/Stream Processing, Search and Indexing
- Data Management: Metadata Services, Distributed Storage (HDFS), Distributed Processing (MapReduce), Non-relational DB, Structured In-Memory
- Infrastructure: Virtualization, Compute / Storage / Network
- Cross-cutting: Access Management, Operations, Security, Data Encryption
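The Distributed Processing (MapReduce) layer in the stack above is easiest to grasp through the classic word-count example. This is a pure-Python sketch of the map → shuffle → reduce flow, not actual Hadoop API code; in production the same shape would be expressed as a Hadoop Java job or a Streaming script.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word, as a streaming mapper would
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle: sort by key so equal keys become adjacent, then reduce per key
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield (key, sum(v for _, v in group))

lines = ["big data big cluster", "data cluster data"]
counts = dict(reduce_phase(map_phase(lines)))
print(counts)  # {'big': 2, 'cluster': 2, 'data': 3}
```

The point of the pattern is that the map and reduce steps are independently parallelizable, which is what lets Hadoop scale processing out across the cluster alongside the data.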
Storage cost comparison (per GB):
- SAN Storage: 3-5 /GB
- NAS Filers: 1-3 /GB
- Data Cloud 1): 0.10-0.30 /GB

1) Hadoop offers storage + compute (incl. search). Data Cloud offers Amazon S3 and native storage functions.
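To make the gap concrete, here is the arithmetic at petabyte scale using the per-GB list prices quoted above. The source does not name the currency, so the figures below are unit-less; this is an illustration, not a quote.

```python
# Cost of storing 1 PB at the per-GB price ranges quoted above (illustrative)
PB_IN_GB = 1_000_000
tiers = {
    "SAN Storage": (3.00, 5.00),
    "NAS Filers": (1.00, 3.00),
    "Data Cloud": (0.10, 0.30),
}
per_pb = {name: (lo * PB_IN_GB, hi * PB_IN_GB) for name, (lo, hi) in tiers.items()}
for name, (lo, hi) in per_pb.items():
    print(f"{name}: {lo:,.0f} to {hi:,.0f} per PB")
# Even comparing the cheapest SAN price to the dearest Data Cloud price,
# the Data Cloud tier is roughly 10x cheaper per byte
```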
Schema on write (RDBMS / EDW):
- Schema required on write; reads are fast
- Standards-based and structured
- Limited, no data processing
- Structured data only
- Best fit: interactive OLAP analytics, complex ACID transactions, operational data store

Schema on read (Hadoop):
- Schema required on read; writes are fast
- Loosely structured
- Processing coupled with data; parallel processing / scale-out
- Multi-structured and unstructured data
- Best fit: data discovery, processing unstructured data, massive storage/processing
- Lower cost
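The schema-on-write vs schema-on-read distinction can be shown in a few lines. This is an illustrative Python sketch with made-up event data: the "write" path validates and structures rows at load time (rejecting what doesn't fit), while the "read" path stores everything raw and applies structure only when a query runs.

```python
import json

raw_events = ['{"user": "a", "ms": 120}', '{"user": "b"}', "not json at all"]

def write_validated(rows):
    # Schema on write (RDBMS style): structure is enforced at load time;
    # rows that don't match the schema are rejected and lost
    table = []
    for r in rows:
        try:
            rec = json.loads(r)
            table.append({"user": rec["user"], "ms": int(rec["ms"])})
        except (ValueError, KeyError):
            pass  # rejected on write
    return table

def read_with_schema(rows):
    # Schema on read (Hadoop style): everything is stored raw; this "schema"
    # is applied per query, and a different query could apply a different one
    for r in rows:
        try:
            rec = json.loads(r)
            yield {"user": rec.get("user"), "ms": rec.get("ms")}
        except ValueError:
            continue  # skipped only for this query; the raw data survives

print(write_validated(raw_events))        # only the fully valid row loads
print(list(read_with_schema(raw_events))) # partial rows still queryable
```

This is why the Hadoop column above pairs "writes are fast" with "loosely structured": nothing is parsed or rejected on ingest, and the cost of interpretation moves to read time.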
The Solution
- Offload low-value-per-byte data from the EDW, and rescue data from storage grid or tape
- Use Hadoop for data storage and processing (parse, cleanse, apply structure and transform)
- Free existing capacity for query workloads
- Retain all data for analysis!
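The "parse, cleanse, apply structure and transform" step can be sketched as a tiny ETL pipeline. The record format here (semicolon-separated, hypothetical call-detail-style rows) and field names are invented for illustration; in a real offload this logic would run as a MapReduce or Streaming job over HDFS files.

```python
import io

# Hypothetical raw input: one record per line, "id;date;seconds", with noise
raw = "49170111;2013-01-05;  320 \nBADROW\n49170112;2013-01-06;95\n"

def etl(lines):
    for line in lines:
        parts = line.strip().split(";")
        if len(parts) != 3:
            continue  # cleanse: drop malformed rows
        rec_id, day, secs = parts
        # apply structure and transform: typed fields, seconds -> minutes
        yield {"id": rec_id, "day": day, "minutes": round(int(secs) / 60, 2)}

rows = list(etl(io.StringIO(raw)))
print(rows)
```

Because each line is handled independently, the same function works unchanged whether it processes a 3-line string or a petabyte of files split across a cluster.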
[Chart: data warehouse workload mix. Before offload: Operational (44%). After offload: Operational (50%), Analytics (50%)]
The specifics of the case are as follows: the telco's DW has hundreds of thousands of lines of SQL code, and all the ETL is in SQL and needed to be converted into MapReduce jobs. The telco had built up its SQL queries over six years, yet moved them to MapReduce-based queries in six months. As a result, spend on TDC has decreased from $65M to $35M over a five-year horizon, and most of the remaining $35M is maintenance for the legacy TDC deployment. Moreover, TDC performance is up 4x, because TDC now focuses on high-volume (lots of apps/users), low-latency (interactive response time) work while the rest is offloaded to the Hadoop cluster.
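The SQL-to-MapReduce conversion described above typically maps each SQL aggregate onto a map/reduce pair. As a minimal illustration (table, columns and figures invented), the query `SELECT region, SUM(revenue) FROM sales GROUP BY region` becomes:

```python
from collections import defaultdict

sales = [("north", 10), ("south", 5), ("north", 7)]  # illustrative rows

def mapper(row):
    # Map: emit (GROUP BY key, value to aggregate)
    region, revenue = row
    yield (region, revenue)

def reducer(pairs):
    # Reduce: SUM per key, i.e. the GROUP BY aggregate
    totals = defaultdict(int)
    for region, revenue in pairs:
        totals[region] += revenue
    return dict(totals)

result = reducer(kv for row in sales for kv in mapper(row))
print(result)  # {'north': 17, 'south': 5}
```

Joins and multi-step SQL pipelines decompose the same way, into chains of such jobs, which is why a six-year SQL codebase is a multi-month (not multi-day) conversion effort.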
GOAL: a platform that natively supports mixed workloads as a shared service
[Diagram: "Interactive / Explore" and "Online / Enrich" workloads running on one shared Big Data platform, fed by transactions, interactions and observations]
Enrich (bleeding edge, ~10% of organizations): real-time decisions; velocity measured in seconds; derived data also stored in Hadoop; served via HBase, NoSQL or SQL.
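The Enrich pattern can be sketched in a few lines: a batch job derives per-key values in Hadoop, stores them in a low-latency key-value store (HBase in practice; a plain dict stands in here), and the online path reads them at request time. Names and scores below are invented for illustration.

```python
# Batch-derived values, precomputed in Hadoop and loaded into a key-value
# store for online access (a dict stands in for HBase in this sketch)
derived_scores = {"user-1": 0.87, "user-2": 0.12}

def decide(user_id, threshold=0.5):
    # Online, seconds-scale decision using only a key lookup:
    # the expensive modelling already happened in the batch layer
    return derived_scores.get(user_id, 0.0) >= threshold

print(decide("user-1"))  # True
print(decide("user-9"))  # False: unknown users fall back to 0.0
```

The design choice is that the online path never runs heavy computation; it only looks up what the batch layer derived, which is what makes seconds-scale (or faster) decisions feasible.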
Key takeaways
- The Hadoop open-source ecosystem delivers powerful innovation in storage, databases and business intelligence, promising unprecedented price/performance compared to existing technologies.
- Hadoop is becoming an enterprise-wide landing zone for large amounts of data. Increasingly it is also used to transform data.
- Large enterprises have realized substantial cost reductions by offloading some enterprise data warehouse, ETL and archiving workloads to a Hadoop cluster.

Questions? juergen.urbanski@t-systems.com