Oracle Big Data, In-memory, and Exadata -

One Database Engine to Rule Them All


Dr.-Ing. Holger Friedrich

Agenda

Introduction
Old Times
Exadata
Big Data
Oracle In-Memory
Headquarters
Conclusions

2015 sumIT AG

03/2012

sumIT AG
Consulting and implementation services in Switzerland
Experts for
Data Warehousing,
Business Intelligence,
and Big Data solutions

Focussed on Oracle technology


BI Foundation specialized partner
Data Warehousing specialized partner
Our motto: Get Value From Data
Visit our web site: www.sumit.ch
(in German)


Holger Friedrich
Computer Science diploma of
Karlsruhe Institute of Technology (KIT)
Ph.D. in Robotics and Machine Learning
More than 16 years of experience with Oracle technology
Expert for
Data Integration,
Data Warehousing,
Data Mining, and
Business Intelligence

Technical Director of sumIT AG

First Oracle ACE for DWH/BI in Switzerland


DB Architecture - Old Times


Old times = 1977 - 2008
SGA - System Global Area
- Shared Pool (Library Cache etc.)
- Redo Log Buffer
- Buffer Cache
Persistent Storage
- Disk & Tape
- serves database blocks
PGA - Program Global Area
- Query-specific processing and storage

Query processing is done in the PGA by query-specific server processes



Query Processing - Old Times


[Diagram: query path between Server Process, Block Buffer, and Disk]


2008 - Times Are a Changing


Exadata - Architecture
Databases and applications deployed and configured without any adaptations
Fast network via InfiniBand
Regular compute servers
Dedicated storage servers
- organised in cells
- disks & flash attached
- run Exadata Storage Software


Exadata - The Secret Sauce


Three reasons for outstanding Exadata performance
Hardware engineering
Local query processing functionality in storage layer
Database engine aware of intelligent storage layer
- extended optimizer costing model and transformations
- extended SW to use Exadata Storage APIs

Divide and conquer for query processing


not just with slave processes (PARALLEL)
not just between compute nodes (RAC)
but between compute and storage nodes

Exadata - Storage Software Evolution


Smart Scanning
- execute sub-query in storage cells
- project results in storage already

Keep hot data in Flash Cache


Storage Indexes
- collect min/max column values
- reduce disc access

Smart scanning directly on HCC (Hybrid Columnar Compression) data - no decompression required
Offload mining tasks like scoring
Additional data caching in columnar format in Flash Cache
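
The idea behind smart scanning, filtering and projecting inside the storage layer so that less data travels to the database servers, can be illustrated with a minimal Python sketch. The StorageCell class, the sample rows, and the values_shipped helper are hypothetical and only model the principle; they are not the Exadata Storage Software API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Row = dict  # one stored row: column name -> value


@dataclass
class StorageCell:
    """Hypothetical stand-in for one storage cell holding a slice of a table."""
    rows: list

    def full_read(self) -> list:
        # Old-times behaviour: ship every row with every column to the database.
        return [dict(r) for r in self.rows]

    def smart_scan(self, predicate: Callable[[Row], bool], columns: Iterable[str]) -> list:
        # Offloaded behaviour: filter and project inside the cell, so only
        # matching rows and requested columns cross the network.
        cols = list(columns)
        return [{c: r[c] for c in cols} for r in self.rows if predicate(r)]


def values_shipped(rows: list) -> int:
    # Crude proxy for network traffic: number of column values returned.
    return sum(len(r) for r in rows)


cells = [
    StorageCell([{"id": 1, "region": "CH", "amount": 10, "note": "a"},
                 {"id": 2, "region": "DE", "amount": 20, "note": "b"}]),
    StorageCell([{"id": 3, "region": "CH", "amount": 30, "note": "c"},
                 {"id": 4, "region": "FR", "amount": 40, "note": "d"}]),
]

# Rough equivalent of: SELECT amount FROM sales WHERE region = 'CH'
offloaded = [r for c in cells
             for r in c.smart_scan(lambda r: r["region"] == "CH", ["amount"])]
baseline = [r for c in cells for r in c.full_read()]

# The database layer only has to finish the aggregation.
total = sum(r["amount"] for r in offloaded)
print(total, values_shipped(offloaded), values_shipped(baseline))
```

In this toy data set the offloaded scan returns 2 column values instead of 16, while the final aggregate is unchanged.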

Information Mgmt Reference Architecture

Big Data


The HADOOP Zoo


Information Management Data Flow


Big Data - Challenges


Dynamic ecosphere
- Pre-packaged distributions
- Oracle Big Data Appliance

Analytics
- Tools of Hadoop ecosphere
- Oracle Big Data Analytics

Data Integration
- Ever changing Hadoop tool set
- Oracle Data Integrator
- Big Data SQL


Big Data Appliance - The Secret Sauce


Three reasons for outstanding BDA performance
Hardware engineering
Local query processing functionality in storage layer
- Big Data SQL = Exadata Storage Software on HADOOP
- Added as process engine to the HADOOP process layer
- BDS agents run independently on HADOOP nodes

Database engine aware of intelligent big data layer


- extended and enhanced External Table API
- extended optimizer costing model and transformations

Exadata success and performance on Big Data


Big Data transparently available for DB queries

Big Data SQL - Smart Scan


1. Read data from HDFS data node
- Direct-path reads
- C-based readers when possible
- native HADOOP classes otherwise

2. Translate bytes to Oracle format

3. Smart scan on Oracle format
- apply storage indexes (BDS 2.0)
- filtering
- column projection
- parsing JSON/XML
- scoring models

High compression benefits (except cols with distinct values)

Big Data SQL 2.0 - Storage Indexes


New feature of Big Data SQL 2.0
Avoid unnecessary disk access on HADOOP nodes
Index built during first full scan
Granularity in HDFS blocks (256MB)

Index application
- receive filter predicate
- check storage index for blocks where the predicate value lies between min and max
- only smart scan matching blocks
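
The index application steps above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: Block, StorageIndex, and scan_equals are hypothetical names, the blocks hold three values each instead of 256MB of data, and only equality predicates on a single column are handled.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical storage block: the values of one column."""
    values: list


class StorageIndex:
    """Per-block min/max summary, filled as blocks are scanned for the first time."""

    def __init__(self):
        self.min_max = {}  # block number -> (min, max)

    def may_contain(self, block_no: int, target) -> bool:
        if block_no not in self.min_max:
            return True  # no summary yet: the block has to be read
        lo, hi = self.min_max[block_no]
        return lo <= target <= hi

    def record(self, block_no: int, values: list) -> None:
        if values:
            self.min_max[block_no] = (min(values), max(values))


def scan_equals(blocks, index, target):
    """Return (matching values, number of blocks actually read)."""
    hits, blocks_read = [], 0
    for no, block in enumerate(blocks):
        if not index.may_contain(no, target):
            continue  # pruned: target is outside this block's min/max range
        blocks_read += 1
        index.record(no, block.values)
        hits.extend(v for v in block.values if v == target)
    return hits, blocks_read


blocks = [Block([1, 5, 9]), Block([20, 25, 30]), Block([40, 42, 55])]
index = StorageIndex()

first = scan_equals(blocks, index, 42)   # first scan reads every block and builds the index
second = scan_equals(blocks, index, 42)  # later scan reads only the block whose range covers 42
print(first, second)
```

The first scan touches all three blocks; the repeated scan touches one, because the other two are excluded by their recorded min/max ranges.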

Big Data SQL - Query Execution


Extended External Tables - HIVE


CREATE TABLE order (cust_num    VARCHAR2(10),
                    order_num   VARCHAR2(20),
                    order_date  DATE,
                    item_cnt    NUMBER,
                    description VARCHAR2(100),
                    order_total NUMBER(8,2))
  ORGANIZATION EXTERNAL
  (TYPE oracle_hive
   ACCESS PARAMETERS (
     com.oracle.bigdata.tablename: order_db.order_summary
     com.oracle.bigdata.colmap:    {"col":"ITEM_CNT", \
                                    "field":"order_line_item_count"}
     com.oracle.bigdata.overflow:  {"action":"TRUNCATE", \
                                    "col":"DESCRIPTION"}
     com.oracle.bigdata.erroropt:  [{"action":"replace", \
                                     "value":"INVALID_NUM", \
                                     "col":["CUST_NUM","ORDER_NUM"]}, \
                                    {"action":"reject", \
                                     "col":"ORDER_TOTAL"}]
   )
  ) PARALLEL 4;

New type: ORACLE_HIVE
Optional settings: colmap, overflow, erroropt

Extended External Tables - HDFS


CREATE TABLE order (cust_num    VARCHAR2(10),
                    order_num   VARCHAR2(20),
                    order_date  DATE,
                    item_cnt    NUMBER,
                    description VARCHAR2(100),
                    order_total NUMBER(8,2))
  ORGANIZATION EXTERNAL
  (TYPE oracle_hdfs
   ACCESS PARAMETERS (
     com.oracle.bigdata.rowformat: \
       SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
     com.oracle.bigdata.fileformat: \
       INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'\
       OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
     com.oracle.bigdata.colmap:   {"col":"item_cnt", \
                                   "field":"order_line_item_count"}
     com.oracle.bigdata.overflow: {"action":"TRUNCATE", \
                                   "col":"DESCRIPTION"}
   )
   LOCATION ('hdfs:/usr/cust/summary/*'));

New type: ORACLE_HDFS
Optional settings: colmap, overflow
LOCATION: location on HDFS


Columnar Stores - Oracle's Flavour


transparent column store managed next to the row store
not either/or
persistent storage row-based as before
column store DML-synched in real-time
the entire Oracle DB-ecosphere remains unchanged
- security
- backup
- disaster recovery
- RAC
- ...
NO application changes required!

Advantages
Best for queries that
- scan large quantities of data
- on a rather small set of columns
- compute aggregates on the results

High compression benefits on most columns (except ones containing mostly distinct values)
Well suited for OLAP/BI
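
Both advantages can be made concrete with a small Python sketch: an aggregate reads a single column instead of whole rows, and a column with few distinct values compresses well while a column of unique values does not. Dictionary encoding is used here only as an illustrative stand-in; the slides do not say which compression schemes Oracle applies, and the helper names and sample data are hypothetical.

```python
def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    return {name: [r[name] for r in rows] for name in rows[0]}


def dictionary_encode(values):
    """Replace each value by a small integer code; return (dictionary, codes)."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


# Six sample orders; country repeats, order_id is unique per row.
rows = [{"order_id": i, "country": ["CH", "DE", "FR"][i % 3], "amount": 10 * i}
        for i in range(1, 7)]
columns = to_columns(rows)

# The aggregate touches one column only, not every field of every row.
total_amount = sum(columns["amount"])

# Few distinct values: a 3-entry dictionary plus small codes replaces 6 strings.
country_dict, country_codes = dictionary_encode(columns["country"])

# All values distinct: the dictionary is as long as the column, so nothing is gained.
id_dict, id_codes = dictionary_encode(columns["order_id"])

print(total_amount, country_dict, country_codes, len(id_dict))
```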


Technology Gems
1. In-memory storage index
2. Filtering on binary compressed data
3. Columnar storage of selected columns
4. Transparent querying across storage hierarchy
5. Real-time background update of columnar store
6. Parallel query execution on the columnar store
7. SIMD vector processing
8. In-memory fault tolerance on RAC
9. In-memory aggregation


Example - In-Memory Aggregation

New optimizer transformation: Vector Group By
Resembles the well-known star transformation
Two-phase, six-step process
Phase 1 - preparation
1. Scan dimensions
2. Build key vectors
3. Prepare accumulator
4. Build tmp-tables for selected dimension attributes
Phase 2 - computation
5. Scan facts w.r.t. key vectors
6. Join filtered facts with tmp-tables
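
The six steps can be traced on a toy star schema in Python. This is a conceptual sketch, not Oracle's implementation: the tables, the dict-based key vector, and the list-based accumulator are assumptions chosen for readability, and the facts are aggregated into the accumulator during the scan in step 5.

```python
# Hypothetical sample data: one dimension and one fact table.
products = [  # (product_id, category)
    (1, "bikes"), (2, "bikes"), (3, "helmets"), (4, "locks"),
]
sales = [  # (product_id, amount)
    (1, 100), (2, 50), (3, 20), (4, 5), (1, 25), (3, 10),
]

# Rough equivalent of:
#   SELECT p.category, SUM(s.amount)
#   FROM sales s JOIN products p USING (product_id)
#   WHERE p.category IN ('bikes', 'helmets')
#   GROUP BY p.category

# Phase 1, steps 1-2: scan the dimension and build the key vector, which maps
# each join key to a dense group number (filtered-out keys are simply absent).
wanted = {"bikes", "helmets"}
group_no = {}    # category -> dense group number
key_vector = {}  # product_id -> dense group number
for product_id, category in products:
    if category in wanted:
        key_vector[product_id] = group_no.setdefault(category, len(group_no))

# Step 3: prepare the accumulator, one slot per group.
accumulator = [0] * len(group_no)

# Step 4: temporary table holding the selected dimension attributes per group.
tmp_table = {no: category for category, no in group_no.items()}

# Phase 2, step 5: scan the facts once; the key vector both filters the rows
# and tells which accumulator slot each remaining row belongs to.
for product_id, amount in sales:
    slot = key_vector.get(product_id)
    if slot is not None:
        accumulator[slot] += amount

# Step 6: join the accumulated facts with the temporary table.
result = sorted((tmp_table[no], total) for no, total in enumerate(accumulator))
print(result)
```

The fact table is scanned exactly once, and no hash join against the dimension rows is needed during that scan.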

In-Memory - The Secret Sauce


Many reasons for outstanding In-Memory performance
Conceptual advantage of columnar format
Speed of processing in DRAM
Sum of technology gems (see earlier)
Database engine aware of the columnar store's capabilities
- extended optimizer costing model and transformations
- extended SW to use the columnar store's APIs

Unprecedented performance for analytics


Transparently available for DB queries


Headquarters
Wikipedia: "Headquarters (HQ) denotes the location where most, if not all, of
the important functions of an organization are coordinated."
[Diagram: the query process in the DB is the HQ; around it are Big Data Storage, Exadata Storage, the Columnar Store, and the Block Buffer with Disks]


The Database Kernel Rules Them All


Query Franchising in action
optimizer generates execution plan
partial queries are sent out to other engines
- Big Data (SQL)
- Columnar in-memory store
- Exadata storage

partial results are received & further processed


security policies are applied
final results are delivered
Divide and conquer between data management technologies
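
The franchising flow can be mimicked with a short Python sketch. Everything in it is hypothetical: the three engines are plain in-memory lists, the "plan" is simply one partial query per engine, and the security policy is a department filter. It only illustrates that filtering is pushed out while merging, policy enforcement, and final aggregation stay in one central place.

```python
from typing import Callable


def make_engine(rows):
    """Build a hypothetical engine that evaluates a predicate on its own data."""
    def run(predicate: Callable[[dict], bool]) -> list:
        return [r for r in rows if predicate(r)]
    return run


engines = {
    "big_data_sql": make_engine([{"cust": "a", "amount": 5, "dept": "x"},
                                 {"cust": "b", "amount": 1, "dept": "y"}]),
    "in_memory": make_engine([{"cust": "b", "amount": 7, "dept": "y"}]),
    "exadata": make_engine([{"cust": "a", "amount": 3, "dept": "y"},
                            {"cust": "c", "amount": 0, "dept": "y"}]),
}


def run_query(predicate, allowed_depts):
    # 1. Send a partial query to every engine (stand-in for the execution plan).
    partials = [engine(predicate) for engine in engines.values()]
    # 2. Receive the partial results and merge them.
    merged = [row for part in partials for row in part]
    # 3. Apply the security policy centrally, after the merge.
    visible = [r for r in merged if r["dept"] in allowed_depts]
    # 4. Finish processing and deliver the final result.
    totals = {}
    for r in visible:
        totals[r["cust"]] = totals.get(r["cust"], 0) + r["amount"]
    return totals


restricted = run_query(lambda r: r["amount"] > 0, {"y"})
unrestricted = run_query(lambda r: True, {"x", "y"})
print(restricted, unrestricted)
```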

The Key Lies in The Kernel


Database optimizer and execution engine make it happen
Transformer:
- new transformations

Estimator:
- new cost estimation models

Execution engine:
- extended calls and APIs

Only possible because Oracle owns all implementations and APIs involved

Crucial Part - The Dictionary


The optimizer's estimates rely on
- the data dictionary
- statistics

Data Dictionary knows all objects


- Exadata: regular db objects
- In-memory: regular db objects
- Big Data: defined through
External Table declaration

Estimating statistics about Big Data objects is challenging

Conclusions
Exadata - boosts execution for traditional applications and analytics
Big Data - provides affordable data management for large volumes of data, including unstructured data
In-Memory - serves mighty fast scans, joins, and aggregations for analytics

With other vendors these technologies are either
- not available in the desired quality
- or not tightly integrated, if at all
Data silos & isolated solutions are being built again
But: Oracle provides top solutions for each
In fact: Oracle provides the only portfolio with
- all three technologies tightly integrated
- and central data management through the Oracle Database