
A short introduction to Vertica

Tommi Siivola, Software Engineer


RedHat Software Developer Meetup 10.09.2014

AGENDA

Quick orientation
Columns
Projections
Clustering
Hybrid storage
Special features

Quick orientation to Vertica


Big data database product from HP
For handling terabytes/petabytes of data
Column-oriented

Quick orientation to Vertica


What does that mean in practice?
Vertica is a relational database

Supports a subset of the ANSI SQL-99 standard

JDBC/ODBC drivers

A command line client (vsql)

Quick orientation to Vertica


Runs on major Linux distros (RHEL, SUSE, Debian, Ubuntu)
Amazon AMI available for running Vertica in the cloud
Up to 1 TB of data and a cluster of 3 nodes without a license
(the so-called Community Edition mode)
Larger setups require a license from HP

Concepts: column-oriented
Vertica stores data column by column, instead of storing each row as a unit
Allows for efficient data compression

Can skip unwanted columns when querying

More efficient aggregate value calculations
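The difference between the two layouts can be sketched in a few lines of Python. This is only a conceptual illustration, not how Vertica is implemented; the `id` values and the function names are assumptions made up for the example.

```python
from datetime import date

# Hypothetical sample records (date, value, id); the ids are invented for illustration.
rows = [
    (date(2014, 3, 15), 23.43, 1),
    (date(2014, 3, 15), 23.97, 2),
    (date(2014, 3, 16), 26.13, 3),
    (date(2014, 3, 16), 26.67, 4),
]

# Row-oriented layout: each record is stored as one unit.
row_store = list(rows)

# Column-oriented layout: one contiguous list per column.
column_store = {
    "date": [r[0] for r in rows],
    "value": [r[1] for r in rows],
    "id": [r[2] for r in rows],
}


def total_value_row_store(store):
    # Every full record is touched, even though only one field is needed.
    return sum(record[1] for record in store)


def total_value_column_store(store):
    # Only the "value" column is read; "date" and "id" are never touched.
    return sum(store["value"])
```

Both functions return the same total, but the column version reads a single list, which is the reason aggregates and narrow SELECTs tend to be cheaper in a column store.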

Concepts: column-oriented
ROWS VS. COLUMNS
Figure: the same ten records shown in two layouts, stored row by row and stored column by column.

date        value
2014-03-15  23.43
2014-03-15  23.97
2014-03-15  24.51
2014-03-15  25.05
2014-03-15  25.59
2014-03-16  26.13
2014-03-16  26.67
2014-03-16  27.21
2014-03-16  27.75
2014-03-16  28.29

Concepts: column-oriented
RUN LENGTH ENCODING
Figure: the date column collapses into two runs, while the value column has no repeats and stays as it is.

date                  value
2014-03-15 (5 times)  23.43, 23.97, 24.51, 25.05, 25.59
2014-03-16 (5 times)  26.13, 26.67, 27.21, 27.75, 28.29
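As a rough sketch of the idea, run-length encoding of a column can be written like this in Python. The function names are hypothetical, and the code says nothing about Vertica's actual on-disk format.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# The two columns from the slide.
date_column = ["2014-03-15"] * 5 + ["2014-03-16"] * 5
value_column = [23.43, 23.97, 24.51, 25.05, 25.59,
                26.13, 26.67, 27.21, 27.75, 28.29]

encoded_dates = rle_encode(date_column)    # two runs of five
encoded_values = rle_encode(value_column)  # ten runs of one: no gain
```

The encoding only pays off when equal values sit next to each other, which is why the sort order of the stored data matters.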

Concepts: column-oriented
SKIP UNWANTED COLUMNS

SELECT value, id FROM table

Figure: a table with the columns date, value and id. The query reads only the value and id columns; the date column is skipped.

Concepts: projections
Data physically stored in projections
Projections similar to materialized views
Data is optimized for querying at insert time
Table has one or more projections
Projection contains one or more columns
Data can be duplicated in projections for query efficiency
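A minimal Python sketch of why duplicating data in differently sorted projections helps: each copy supports a fast lookup on its own sort column. The ids, the table contents and the function names are assumptions for illustration; real projections are defined in SQL and chosen by the query optimizer.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table rows: (date, value, id). The ids are invented for illustration.
table = [
    ("2014-03-15", 23.43, 2),
    ("2014-03-15", 23.97, 4),
    ("2014-03-16", 26.13, 9),
    ("2014-03-16", 27.21, 1),
]

# Two "projections" of the same data, each sorted on a different column.
projection_by_date = sorted(table, key=lambda r: r[0])
projection_by_id = sorted(table, key=lambda r: r[2])


def rows_for_date(day):
    # Binary search works because this projection is sorted by date.
    keys = [r[0] for r in projection_by_date]
    return projection_by_date[bisect_left(keys, day):bisect_right(keys, day)]


def row_for_id(row_id):
    # The same lookup on the id-sorted copy; returns None if the id is absent.
    keys = [r[2] for r in projection_by_id]
    pos = bisect_left(keys, row_id)
    if pos < len(keys) and keys[pos] == row_id:
        return projection_by_id[pos]
    return None
```

The cost of this design is extra storage and extra work at insert time, since every projection has to be kept up to date.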

Concepts: projections
ONE DATA, MANY PROJECTIONS

Sorted by date        Sorted by id
2014-03-15  23.43     2014-03-16  27.21
2014-03-15  23.97     2014-03-15  23.43
2014-03-15  24.51     2014-03-16  27.75
2014-03-15  25.05     2014-03-15  23.97
2014-03-15  25.59     2014-03-16  26.67
2014-03-16  26.13     2014-03-15  25.05
2014-03-16  26.67     2014-03-15  24.51
2014-03-16  27.21     2014-03-15  25.59
2014-03-16  27.75     2014-03-16  26.13
2014-03-16  28.29     2014-03-16  28.29

Concepts: clustering
Parallel processing
Data segments distributed across cluster nodes

Performance can be increased by adding hardware

Reliability (K-safety)
Tolerates nodes going offline
All nodes can respond to queries, so queries can be load
balanced between nodes

Concepts: clustering
SEGMENTATION

Node 1: SEGMENT1
Node 2: SEGMENT2
Node 3: SEGMENT3
Node 4: SEGMENT4
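Segmentation can be pictured as hashing a key to pick a node, after which every node works on its own segment. The sketch below is a toy model under that assumption; the node names, the CRC32 hash and the function names are illustrative choices, not Vertica's actual mechanism.

```python
import zlib

NODES = ["node1", "node2", "node3", "node4"]


def node_for(key, nodes=NODES):
    """Pick a node from a stable hash of the segmentation key."""
    digest = zlib.crc32(str(key).encode("utf-8"))
    return nodes[digest % len(nodes)]


def segment(rows, key_index, nodes=NODES):
    """Split rows into one segment per node."""
    segments = {node: [] for node in nodes}
    for row in rows:
        segments[node_for(row[key_index], nodes)].append(row)
    return segments


def parallel_sum(segments, value_index):
    """Each node sums its own segment; the partial sums are then combined."""
    partials = {
        node: sum(row[value_index] for row in rows)
        for node, rows in segments.items()
    }
    return sum(partials.values())
```

Because each partial sum only needs the local segment, adding nodes shrinks the amount of data each one has to scan.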

Concepts: clustering
K-SAFETY

Figure: four nodes (Node 1 to Node 4). Each of SEGMENT1 to SEGMENT4 appears twice across the nodes, so a copy of every segment survives if one node goes offline.
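One way to picture K-safety is a ring placement in which every segment is also copied to the next K nodes. The ring layout and all names below are assumptions made for the sketch; the slide does not state how Vertica places the copies.

```python
def place_segments(node_count, k=1):
    """Return {node: set of segment ids}.

    Assumed layout: segment i lives on node i and on the next k nodes in a ring.
    """
    placement = {node: set() for node in range(node_count)}
    for seg in range(node_count):
        for offset in range(k + 1):
            placement[(seg + offset) % node_count].add(seg)
    return placement


def all_segments_available(placement, failed_nodes):
    """True if every segment still has at least one copy on a live node."""
    all_segments = set().union(*placement.values())
    live = set()
    for node, segs in placement.items():
        if node not in failed_nodes:
            live |= segs
    return live == all_segments
```

With four nodes and `k=1`, any single node can fail without losing data. Losing two adjacent nodes in this ring loses a segment, which is what a K value of 1 allows for.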

Concepts: Hybrid storage


Read-optimized storage (ROS)
On disk

Heavily encoded & compressed

Write-optimized storage (WOS)


In memory

No encoding or compression

Concepts: Hybrid storage


Inserted data is first collected in WOS
Inserting into WOS is faster because it avoids compression
and disk write overhead
Background job moves data in batches from WOS to ROS
Writing to ROS is more efficient in batches

Querying is more efficient from ROS
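The flow described above can be modelled with a small toy class: an unsorted buffer that takes cheap inserts, and a batch step that sorts and encodes the buffered data. The class and method names are hypothetical, and the ROS here is just an in-memory list standing in for disk storage.

```python
from itertools import groupby


class HybridStore:
    """Toy model: an unsorted buffer (WOS) plus sorted, run-length encoded batches (ROS)."""

    def __init__(self):
        self.wos = []  # raw values, appended as they arrive
        self.ros = []  # list of batches; each batch is a list of (value, run_length)

    def insert(self, value):
        # Cheap: no sorting, no encoding, no disk write.
        self.wos.append(value)

    def moveout(self):
        # Batch step: sort and run-length encode everything buffered so far.
        if not self.wos:
            return
        batch = [(v, len(list(run))) for v, run in groupby(sorted(self.wos))]
        self.ros.append(batch)
        self.wos = []

    def count(self, value):
        # A query has to see both the encoded batches and the unmoved buffer.
        in_ros = sum(n for batch in self.ros for v, n in batch if v == value)
        return in_ros + self.wos.count(value)
```

Sorting and encoding once per batch, rather than once per insert, is what makes the batched move cheaper overall.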

Vertica feature: Pattern matching


Example: Finding sequences in
web site log data
Find all sequences where a user
enters the site, browses and
finally makes a purchase
Difficult to express in standard SQL
Vertica has an SQL extension for
finding patterns

PATTERNS IN DATA
Figure: sample log table with the columns user and action. The action column reads: enter, browse, browse, purchase, enter, browse, enter, browse, purchase.

Vertica feature: Pattern matching


Example: find sequences where a user enters a site, browses
and makes a purchase
SELECT uid, sid, ts, refurl, pageurl, action,
       event_name(), pattern_id(), match_id()
FROM clickstream_log
MATCH
  (PARTITION BY uid, sid ORDER BY ts
   DEFINE
     Entry    AS refurl NOT ILIKE '%site.com%' AND pageurl ILIKE '%site.com%',
     Onsite   AS pageurl ILIKE '%site.com%' AND action = 'V',
     Purchase AS pageurl ILIKE '%site.com%' AND action = 'P'
   PATTERN
     P AS (Entry Onsite* Purchase)
   ROWS MATCH FIRST EVENT);
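For comparison, the same "Entry, any number of Onsite events, then Purchase" idea can be sketched outside the database with a regular expression over per-user event codes. This is a simplified analogue, not what the MATCH clause does internally; the user names, the single-letter codes and the function name are assumptions for the example.

```python
import re

# One letter per event type, so a user's history becomes a string.
CODES = {"enter": "E", "browse": "B", "purchase": "P"}

# Entry, then zero or more browse events, then a purchase.
PATTERN = re.compile(r"EB*P")


def find_purchase_paths(log):
    """log: list of (user, action) in time order.

    Returns {user: [(start, end), ...]} with index ranges into that user's
    own event list (end exclusive) for every matched sequence.
    """
    per_user = {}
    for user, action in log:
        per_user.setdefault(user, []).append(CODES[action])

    result = {}
    for user, codes in per_user.items():
        spans = [m.span() for m in PATTERN.finditer("".join(codes))]
        if spans:
            result[user] = spans
    return result


# Hypothetical log: the action sequence from the earlier slide, with invented users.
sample_log = [
    ("u1", "enter"), ("u1", "browse"), ("u1", "browse"), ("u1", "purchase"),
    ("u2", "enter"), ("u2", "browse"),
    ("u3", "enter"), ("u3", "browse"), ("u3", "purchase"),
]
```

Grouping by user plays the role of PARTITION BY, and the order of the list plays the role of ORDER BY ts.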

Extending Vertica

Custom SQL functions can be created with R, Java or C++


R: scalar and transform functions
Java: all of the above, plus load functions
C++: all of the above, plus aggregate and analytic functions

Find out more


Vertica free downloads (registration required): my.vertica.com

Vertica documentation (no registration): www.vertica.com/documentation

C-Store research project (Vertica predecessor): db.csail.mit.edu/projects/cstore/

THANKS!

Tommi Siivola, Software Engineer


tommi.siivola@eficode.com
+358 (0)50 371 9308
eficode.fi
