Ernesto H. Legorreta, CTO
Abstract
VectorSTAR is a high-performance columnar RDBMS targeted at VLDBs in the OLAP, data warehousing, operational BI, financial engineering, bioinformatics, and scientific computation markets. Complex multi-table associations are computed very efficiently by using a vector-based (instead of set-based) column model. Memory-mapped (instead of buffered) file I/O is used to achieve multiple-order-of-magnitude improvements in data loading and query execution times while increasing reliability and security. Data definition, manipulation, and querying are done with a vectorial SQL dialect (based on function composition) that provides an interactive querying style supporting exploratory analysis by end users, plus a LINQ-style API that simplifies database interaction for application programmers. VectorSTAR uses a hybrid open-source model, runs on both Linux and Windows 64-bit operating systems, and supports industry-standard interfaces such as HTTP, ODBC, Microsoft Excel, .NET, and Sun Java APIs.
Contents
1 VectorSTAR
  1.1 64-bit Architecture
  1.2 Memory-mapped File I/O
  1.3 Data Storage Architecture
      Columnar
      Schema-oriented
        Information Schema
      Vector-based
  1.4 Insertions, Deletions, and Updates
      Mapping mode
  1.5 Failover
2 VectorSQL
  2.1 Exploratory Analysis
  2.2 ANSI SQL
  2.3 Stored Procedures
  2.4 Vectorized Operations
3 xSTAR
      xSTAR and ETL
      Just-in-Time CSV Loader
4 Interfaces
      User Interface
      API
      Bridges
        R statistical programming system
        Excel Spreadsheet
1 VectorSTAR

VectorSTAR is a high-performance analytic DBMS for enterprise-level applications on the Linux and Microsoft Windows™ 64-bit operating systems. Designed by Vectornova SAdeCV during 2001-2003, it has been continuously developed[1] and field-tested on large-scale applications since its first deployment in October 2004. Version 2.0, available since February 2008, is the current state of the art in high-speed databases worldwide.
VectorSTAR enables users to define, store, manipulate, share, query, and analyze extremely large amounts of data[2] in a safe, secure, consistent, compatible, and efficient manner. Unlike operational DBMS, which usually target OLTP applications that require support for large numbers of concurrent users executing simple, write-biased, short-lived transactions on relatively small tables (typically smaller than ten to a hundred million rows), VectorSTAR was designed for maximum performance in the OLAP, data warehousing, financial engineering, and scientific computation markets, where any number of concurrent users execute complex, read-biased, potentially long-lived queries, calculations, and reports on VLDB applications (frequently consisting of billions of rows). VectorSTAR is the only DBMS specifically architected to fulfill the most demanding requirements of Operational BI and Bioinformatics, two of the fastest-growing emerging markets in IT today.
VectorSTAR is relational. Informally, this means that your data is stored in rows within tables composed of columns having a specific data type, and is manipulated and queried using a well-known set of relational operators. More formally, it means that VectorSTAR adheres to the relational model (as introduced by E.F. Codd in [Codd70]) as much as any of the major RDBMS in the market today. Despite a popular misconception, the relational model is not the same thing as SQL, and compatibility with any specific version of SQL is not a requirement for being an RDBMS[3]. VectorSTAR includes an innovative dialect of SQL which takes advantage of its vectorial architecture while providing an effective superset of the functionality defined in the SQL2 and SQL3 standards, including the XML-related functionality defined in SQL2006.
VectorSTAR is a columnar DBMS[4]. Roughly, this means that table data is stored in column-major order rather than row-major order[5]. The idea for columnar RDBMS seems to have arisen simultaneously during the early 1990s, both in industry (SybaseIQ) and in academia (MIT's C-Store). However, column-based data storage had long been pioneered, since the early 1970s, by the mainframe-based APL systems. Surprisingly, this simple 90-degree shift in storage order has been shown to have major performance, functionality, and ease-of-use implications over the past several years. In consequence, many of the high-performance DBMS being designed today are columnar. VectorSTAR, however, differs from most of them in that:
it provides a high-speed bulk data loader which can load multiple columns in parallel and across a grid/cluster
points to a large growth potential from the current level of performance using well-known, straightforward techniques[11], in contrast to other columnar DBMS which are, arguably, already functioning at their peak architectural capacity today[12]. In summary, for its target application area, VectorSTAR is already much faster than most competing DBMS (of any kind) and at least as fast as any other current top performer[13], while still being poised for a significant increase in performance in its next release. Besides, its high-performance benefits come together with unparalleled simplicity and flexibility.
VectorSTAR is a native 64-bit application that has been designed from the ground up to take full advantage of the x86-64 architecture[14] (developed by AMD in the early 2000s and then cloned by Intel[15]), which has now become the de facto standard 64-bit architecture in the industry, clearly surpassing the Sun SPARC[16], IBM POWER, and HP/Intel Itanium[17] architectures in momentum, pace of innovation, adoption rate, and market share.
Among the most powerful new capabilities offered by the x86-64 architecture is its vastly larger address space[18], which increases the maximum amount of addressable memory available to the OS (kernel mode) and applications (user
[11] Huffman compression of variable-length values (e.g., text and images), secondary association indices, constant-time hash-based search indexes, and massively parallel GPU pipeline support, for example.
[12] Vertica [Vert1], which heavily emphasizes compression as one of the foundations for its performance model (along with columnar architecture and multiple copies of data columns sorted in different orders [SAB05, SBC07]), may perhaps be an example.
[13] Which certainly includes KDB [KX1] and perhaps Vertica [Vert1].
[14] The x86-64 architecture is now called AMD64 and Intel64 by AMD and Intel, respectively. Intel64 should not be confused with Intel's own IA-64 architecture, implemented in the Itanium family of CPUs, which is not compatible with the enormous installed base of x86 applications and has been strongly criticized by some of the most respected names in the industry, including none other than Donald Knuth.
[15] In direct competition with its own previous 64-bit strategy, embodied in the Itanium CPU.
[16] Sun's Opteron-based x86 servers are now outselling its own SPARC servers by a wide margin: according to some metrics, up to three quarters of the servers they sell are x86, and of those, perhaps up to three quarters have Linux, instead of Solaris, specified as their OS.
[17] "The history of the Itanium is mixed at best. Hailed as the x86 killer when launched in 2001, the Itanium never gained a strong following and was being written off in many circles by mid-decade..." (Cole, Arthur, "Waiting for Itanium", IT Business, January 11, 2008).
HP withdrew its Itanium-2 based workstations from the market as early as 2004, citing that "...in working with and listening to our high-performance workstation partners and customers, we have become aware that the focus in this arena is being driven toward 64-bit extension technology."
SPARC servers out-shipped all Itanium servers by 4.5 times during CY07, according to IDC's Worldwide Quarterly Server Tracker, February 2008.
The Itanium is sometimes jokingly referred to as the "Itanic" among industry analysts.
[18] Although previous solutions enlarged the physically addressable memory space up to 64GB by extending the address space to 36 bits, they were architectural patches that did not make all that memory transparently available to end-user applications in a compatible and straightforward manner.
mode), from about 2GB each up to 128TB each[19]. As long as the actual memory chips and motherboards needed to put significantly more than 4GB of physical memory on a system were lacking, this tremendous expansion in memory addressing was of no direct practical significance. Today, however, most commodity motherboards have the capability to hold 16GB of RAM, and those with the capability to hold 32-64GB of RAM are available at affordable prices (i.e., within the USD $500 to $1,000 range).
Traditional DBMS do not effectively take direct advantage of this order-of-magnitude increase in system memory. Essentially, all of the techniques they employ to efficiently move data between the disk subsystem (secondary memory) and the system RAM (primary memory) were developed and perfected at a time when the 32-bit addressing space was not only a practical but also a theoretical constraint, and 4GB of physical RAM was considered prodigious. As a consequence of this outdated strategy, even when provided with significantly more than 4GB of RAM, those DBMS will essentially end up using it solely as increased buffering space for the system file I/O operations that continuously move relatively small chunks of data to and from disk, still working under the assumption that system memory is a scarce resource. The end result is that performance gains due to increased physical memory are, at best, evolutionary rather than revolutionary, and usually stop far below the order-of-magnitude improvement that is the minimum required to achieve a significant advance in dealing with the ever-growing amounts of data available today to competitive organizations worldwide[20].
This failure of traditional RDBMS to take full advantage of the new generation of hardware that has now become mainstream has fostered the appearance of the in-memory DBMS[21]. The rapid spread of this technology[22] reflects a pressing industry need. In general terms, these in-memory DBMS concentrate on improving traditional indexing, query optimization, and storage management techniques by essentially limiting data access to that much which fits in physical memory. This effectively precludes their utilization in VLDB OLAP and data warehousing applications (such as those that are common in the retail, financial, scientific, and telecom markets) where a 100 million[23] ceiling on the number of records would often be unacceptable.
Actually, the practically unlimited memory addressing space offered by 64-bit CPUs does enable a revolutionary approach to data management that is
[19] 16TB each on Windows OSs, due to a reduced 44-bit addressing space instead of the 48-bit addressing limit of the current generation of x86-64 CPUs.
[20] In some markets, such as retail, network service monitoring, mobile telecom, and world financial trading, a performance improvement of at least 3 orders of magnitude (1,000 times) seems to be required to allow a practical implementation of the new theoretical analysis techniques discovered within the past two decades.
[21] Probably best exemplified by TimesTen, a 1996 HP Labs spin-off acquired by Oracle in 2005 and now marketed as "...the foundation product for real-time data management. [Providing] application-tier database and transaction management built on a memory-optimized architecture accessed through industry-standard interfaces."
[22] Oracle claims at least 1,500 TimesTen installations worldwide.
[23] Assuming 100-byte sized records and 64GB of RAM available.
truly scalable: one where application programmers and end users deal with large arrays of data as if they were in memory, though they are stored as plain sequence-structured binary files[24] (reflecting the in-memory structure of the corresponding arrays), letting the OS memory-virtualization mechanism transparently page them in and out of physical memory as needed. This is the approach followed by VectorSTAR.
This next-generation approach to data management not only takes advantage of the large 64-bit memory address space, but actually requires it: if VectorSTAR were a 32-bit application, for example, the 2GB-per-process memory address limit would restrict a mapped database to hold a maximum of 2GB of data, clearly too small for enterprise-level databases[25]. This is why no memory-mapped DBMS such as VectorSTAR could have been practical before the mainstream availability of 64-bit CPUs.
[24] Where the file consists of a simple sequence of binary values representing the basic elemental data types, such as integers, floats, and characters.
[25] A typical sale-ticket item record in a retail data warehouse, for example, will likely take 10-20 columns of, say, 4 bytes each, totalling 50-100 bytes. Just 10 million of those could fill up the entire 2GB addressing space. Common retail data warehouses hold at least hundreds of millions of sale-ticket item records.
Furthermore, some important scientific visualization techniques require access to a very small, but unpredictable, portion of a very large data set. In this case, the unpredictability of the data access pattern prevents a buffer-based system (as employed by non-memory-mapped DBMS) from fetching the right portion of the data set into memory a priori, whereas a memory-mapped system will behave optimally in this situation. In any case, intelligent memory mapping can increase I/O throughput by orders of magnitude, and this is why modern OS themselves use memory mapping to implement shared libraries and to load and run executable program files (.DLL and .EXE on Windows OS, for example).
A memory-mapped file is added to a process's virtual memory space (known as the VAS) without actually reading the file into physical memory. The virtual memory system will transparently read only those portions of the data set actually referenced by subsequent code; i.e., physical memory acts as a cache for data on disk, but a cache that is loaded on a reference basis, not according to a predetermined strategy. Also, application code which accesses memory-mapped files is identical to code that accesses private in-memory structures (although performance may differ).
In short, a memory-mapped database trades buffer-based file I/O for the OS virtual-memory mechanism. The latter provides a private virtual address space for every process, using memory-mapped file I/O on the system page file, the executable and library files, and the mapped data files associated with the process. In contrast to buffer-based file I/O, with virtual memory-mapped file I/O there is no need to manage buffers or to use any of the traditional filesystem I/O calls (fopen, fclose, fread, fgets, ...) to access file data[26]: the OS does this hard work, and does it efficiently, transparently, and reliably (it is one of its main jobs, anyway). Multiple processes can share memory by mapping their virtual address spaces to the same file or to the page file[27].
An important consequence of the conceptual simplicity of the memory-mapped file I/O strategy is that the performance profile of the application is more linear, with fewer hierarchic buffering and cache layers, and with significantly fewer degrees of freedom when compared to the multi-hierarchical buffered file I/O alternative.
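The mechanism described above can be sketched in a few lines of Python using the standard-library mmap module. The file name and layout are hypothetical stand-ins for one of VectorSTAR's sequence-structured column files; this is an illustration of the technique, not VectorSTAR's actual code:

```python
import mmap
import os
import struct

# Create a small binary "column" file of 32-bit integers
# (stands in for a sequence-structured column file).
path = "salary.col"  # hypothetical file name
values = [30000, 45000, 52000, 61000, 38000]
with open(path, "wb") as f:
    f.write(struct.pack("<5i", *values))

# Map the file into the process's virtual address space.
# No data is read yet: pages are faulted in only when referenced.
with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)
    # Access element 2 exactly as if it were an in-memory array:
    third = struct.unpack_from("<i", mm, 2 * 4)[0]
    print(third)  # 52000
    # Writing through the mapping modifies the file transparently;
    # the OS flushes dirty pages according to its own policy.
    struct.pack_into("<i", mm, 2 * 4, 55000)
    mm.flush()
    mm.close()

os.remove(path)
```

Note that no fread/fwrite-style buffered calls appear between creating the mapping and touching the data; the virtual-memory system performs all the paging.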
Columnar
[26] They are still used to open and close the memory-mapped files themselves, of course.
[27] Interprocess memory sharing is a common reason for mapping to the paging file.
[28] Such as SybaseIQ, MonetDB, C-Store/Vertica, and KX/KDB.
data must or should be stored by rows rather than by columns. The fact that this has been so in the vast majority of the RDBMS systems built to date is a historical accident, related to the common way of thinking about records in application programming languages[29] and fostered by the particular constraints imposed by the highly concurrent, write-biased OLTP environment.
Current usage in academic, industry, and press writings (discarding outlier cases likely attributable to misconception) indicates that column-major-order storage is a necessary and sufficient condition for being categorized as a columnar DBMS. However, existing columnar DBMS differ significantly in several other important architectural traits. VectorSTAR, for example, stores every column as a separate file consisting of a vector of values[30], a trait that is shared by very few other columnar DBMS. The vast diversity and large number of differentiating traits among current columnar DBMS make it very hard to come up with sound, useful generalizations which are broadly applicable to the columnar DBMS market at this moment.
Nevertheless, as is the case with most other true architectural contrasts, the contrast between column-major and row-major storage has performance implications which can be effectively harnessed for competitive advantage.
It is true that any advantage so gained is often likely to also be the ultimate source of a disadvantage in the contrasting context. Read optimization, for example, is often in contrast with write optimization, as is encoding space vs. decoding speed, and any architecture that intrinsically favors one will frequently (but not always nor necessarily) be at a disadvantage when dealing with the opposite. However, in the column-major vs. row-major architecture dichotomy, column-major storage unilaterally benefits from a significant magnitude asymmetry in the underlying domain: column cardinality is many orders of magnitude larger than row cardinality for the vast majority of tables. If the row and column cardinalities were of similar magnitudes, then it is certainly true that the column-major storage architecture would have the edge in certain contexts while row-major storage would have it in others. But the cardinalities are nowhere near comparable magnitudes, and operations that take advantage of the vastly larger cardinality of columns easily result in significant performance gains when compared with their application under the loop-based, one-by-one case required by the row-oriented approach (which necessarily ends up physically separating logically-contiguous column values by the values of the intermediary columns that constitute the same row). Thus, in the end (probably after a degree of evolutionary technical development similar to the one that traditional DBMS have undergone over the past 15 years[31]), columnar DBMS should outperform
[29] Files have traditionally been thought of as sequences of records consisting of n data values encapsulated into an atomic n-tuple.
[30] Which are then memory-mapped into the DBMS process using a single, very efficient, mmap system call. Compressed columns are stored in different formats, according to the compression scheme used. RLE-compressed columns are stored in a format that keeps track of the count and the value of each run, for example.
[31] Which likely will not take the same amount of time, as the pace of technological development grows exponentially with time.
row-oriented ones on most counts.
An important pair of contrasting contexts is that of OLTP vs. OLAP applications. As OLTP emphasizes the concurrent writing of large amounts of small records, traditional DBMS (which grew up in a world where OLTP was the main driver for DBMS development) were designed on a row-oriented architecture that favored the atomic retrieval, insertion, and updating of whole records at a time. As a result, in practically all current row-oriented RDBMS, whole sets of records are often pre-fetched from disk into memory buffers so that they can be made available quickly to the DBMS process. Ironically, it is precisely this behavior that gives rise to one of the first speedup opportunities for columnar DBMS.
Consider a table of employees consisting of employee name, birthdate, salary, department, fingerprint, and photo. On a row-oriented DBMS, a query that selects those employees in department xxx with salaries greater than yyy and displays their names and birthdates will cause the unnecessary loading into memory of the data in the fingerprint and photo columns (which are usually relatively heavy columns), even though they are never referenced in either the where-criteria or the display-criteria of the query. On a columnar DBMS, in contrast, the fingerprint and photo columns will not be touched as a result of executing this query.
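The I/O difference can be illustrated with a minimal sketch: one file per column, a toy stand-in for a columnar store. The file names, data, and JSON encoding are all illustrative assumptions, not VectorSTAR's actual format:

```python
import json
import os

# One file per column -- a toy stand-in for columnar storage.
columns = {
    "name":       ["ana", "bo", "cy", "dee"],
    "department": ["sales", "it", "sales", "it"],
    "salary":     [40000, 55000, 62000, 48000],
    "photo":      ["<8MB blob>", "<9MB blob>", "<7MB blob>", "<8MB blob>"],
}
for col, vals in columns.items():
    with open(f"{col}.col", "w") as f:
        json.dump(vals, f)

def read_column(col):
    with open(f"{col}.col") as f:
        return json.load(f)

# The query touches only department, salary, and name.
# The heavy photo column file is never even opened.
dept = read_column("department")
sal = read_column("salary")
selected = [i for i in range(len(dept)) if dept[i] == "sales" and sal[i] > 50000]
names = read_column("name")
result = [names[i] for i in selected]
print(result)  # ['cy']

for col in columns:
    os.remove(f"{col}.col")
```

In a row-oriented layout, every fetched page would have carried the photo blobs along with the three small columns actually referenced by the query.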
VectorSTAR can go further. A query such as the one described above could be split so that only the where-criteria is executed on a main (fast, expensive) server, producing what is called a result index set (or NDX), which is then passed to a (slower, cheaper) secondary server that executes the display-criteria, finally producing what is called the result set that is returned to the user. This means that the photo and fingerprint columns, which in this particular case would likely never be used within a where-criteria[32], do not even need to be stored on the fast server, but solely on one or more secondary servers, which would not be required to execute processor-intensive or memory-consuming searches but only to return the column values (as specified in the display-criteria) for the indices specified by the NDX. This is only possible due to the vectorial column representation used by VectorSTAR.
Note that a consequence of column independence is that total aggregate table size is no longer an adequate indicator of the global "database size" as used when performing IT infrastructure planning. On truly columnar DBMS, it is effectively replaced by the maximum column cardinality.
Schema-oriented
In VectorSTAR, the individual data file representing the contents of a single table column is the basic building block, at the lowest level, of a data storage strategy designed to reflect the logical schema of the database on the disk filesystem. VectorSTAR uses directories to represent the schema objects (databases, tables, and columns) and files to hold the column values and various metadata. For
[32] Although not in a case where VectorSTAR biometric support was used to query based on face or fingerprint, for example.
example, a VectorSTAR installation with two databases (Db1 and Db2) and multiple tables each would look like this on disk (names ending with / represent directories):
In summary, VectorSTAR's disk-based architecture is completely open and very straightforward. A specific example is illustrated below for School, one of the VectorSTAR tutorial databases:
VSTAR/
|__School/
| |__Student/
| | |__COUNT: 5
| | |__DESCRIPTION: A registered student at the university
| | |__Name/
| | | |__DATA: anna kuhn|louis herbert blake|...[35]
| | | |__TYPE: string
| | | |__DESCRIPTION: first name, optional middle name, last name
| | | |+ ...other metadata for Name column
| | |__Gpa/
| | | |__DATA: 3.9 2.4 ...[36]
| | | |__TYPE: numeric
| | | |__DESCRIPTION: Most recently calculated GPA
| | | |+ ...other metadata for Gpa column
| | |+ ...other columns in Student table
| |__Teacher/
| | |__DESCRIPTION: A professor at the university
| |+ ...other tables in School database
|+_Telco/
|+_Retail/
|+ ...other databases in this VectorSTAR node
Vector-based
VectorSTAR columns are vectors of values[37] rather than sets of values, both conceptually and physically[38]. All columns in a table must be of the same cardinality[39]. The i-th element of column Col1 corresponds to the i-th element of column Col2. This avoids the need for indexing when associating the (otherwise independent) columns within a table. Furthermore, it also leads to a straightforward bit-index implementation: VectorSTAR's result index set (NDX) is simply a bit array of the same cardinality as the associated table. The NDX represents the current set of selected rows on the table (see Section 2.1).
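A result index set of this kind can be modeled as a plain bit array of the same length as the table's columns. The following is a simplified sketch with hypothetical data, not VectorSTAR's actual on-disk NDX format:

```python
# Toy NDX: one bit per row, same cardinality as the table.
salary = [40000, 55000, 62000, 48000]
dept =   ["sales", "it", "sales", "it"]

# Evaluate a where-criteria into a bit array...
ndx = [int(d == "sales" and s > 50000) for d, s in zip(dept, salary)]
print(ndx)  # [0, 0, 1, 0]

# ...and combine selections with cheap bitwise operations:
high_paid = [int(s > 45000) for s in salary]
both = [a & b for a, b in zip(ndx, high_paid)]
print(both)  # [0, 0, 1, 0]
```

Because every column has the same cardinality, a single bit array indexes all of them at once, and set operations on selections reduce to bitwise AND/OR over the arrays.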
Array Data Types The datatype for a column is not restricted to scalars: it can also be a multidimensional array. Thus, you can have a column of type int(1000), for example. This is not, as some may think, a violation of the relational model's principle of normality. Rather, it is the natural extension of the SQL string type definition, e.g. char(30), to data types other than characters. This conceptually simple feature nonetheless has important implications for performance and code simplicity under a wide variety of information modeling problems. It is frequently useful when modeling 1:N relationships where N is fixed and invariant, as, for example, the set of temperature measures returned by a fixed number of thermometers in a given location at a given time, or the low, high, and closing price of a stock on a given day.
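A fixed-N array column removes the need for a separate detail table. A minimal sketch with hypothetical stock data, where each column element is a fixed-length (low, high, close) array:

```python
# One row per stock per day; the prices column holds a fixed-length
# array (low, high, close) instead of three rows in a detail table.
ticker = ["ACME", "ACME", "GLOBEX"]
prices = [
    [9.5, 10.2, 10.0],   # low, high, close
    [9.9, 10.8, 10.5],
    [45.0, 46.1, 45.3],
]

# Vector-style access: the closing price is component 2 of every element.
closes = [p[2] for p in prices]
print(closes)  # [10.0, 10.5, 45.3]

# What would be a 1:N join becomes direct positional access:
acme_high = max(p[1] for t, p in zip(ticker, prices) if t == "ACME")
print(acme_high)  # 10.8
```

The normalized alternative would store one (ticker, day, kind, value) row per price component and reassemble each triple with a join or group-by.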
Cubes The XJ/J engine underneath VectorSTAR can create and manipulate multidimensional cubes, both persistent and transient, of practically unlimited size. VectorSTAR supports not only the typical slice-and-dice operations, but also provides a large number of vectorized operations that manipulate the data in the cubes without requiring loops as in traditional programming languages. Furthermore, all user-defined operations are automatically applicable across any one dimension and over any dimensional partition of the cube.
For example, a cube with sales information for 20 countries, 50 regions, 1000 salesreps, 75 products, and 366 days in a year would be constructed as follows (assuming the source data is in a file called salesdata):
[] Sales =: cube 'country 20, region 50, salesrep 1000, product 75, day 366'
[37] The first element (at index 0) of every column serves both as its null reference (i.e., 0 is the value used by FKs into this column to signal a null reference) and as the holder for its null value (i.e., the actual bit value of the null value for a column depends on the column's data type).
[38] VectorSQL conforms to the relational model and does not provide commands that depend on a particular ordering of the rows within a table, but XJ functions see the data in columns as (possibly multidimensional) arrays. This allows for certain optimizations that are not possible otherwise, as shown in the mixed master+detail table described in the Advanced Retail Tutorial provided with the VectorSTAR distribution.
[39] Columns can be grown at different moments and thus have different apparent cardinalities. However, the table itself has a CARDINALITY attribute that represents the maximum cardinality common to all its columns, and this is the value that is used to compute all queries on them.
[] 'Sales' READ_CUBE 'salesdata'
=> Loaded: 100'000 cells.
The READ_CUBE operation reports 100'000 cells read: these are the non-sparse elements of a much bigger cube, whose total number of elements is obtained as the product of all the dimension cardinalities: 20 × 50 × 1000 × 75 × 366 = 27,450,000,000.
Although the cube consists of more than 27 billion cells, none of the following calculations takes more than a second on an entry-level x86-64 CPU machine. First, calculate the total revenues (i.e., the sum of all cells):
Total revenues by country[40]:
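The arithmetic behind these cube reductions can be sketched with a sparse dictionary-of-cells model. The sample cells are hypothetical, and the real cube lives inside the XJ engine; this only illustrates the dense-vs-sparse size gap and the per-dimension reduction:

```python
from collections import defaultdict

# Sparse cube: only non-empty cells are stored, keyed by coordinates
# (country, region, salesrep, product, day).
dims = {"country": 20, "region": 50, "salesrep": 1000, "product": 75, "day": 366}
cells = {
    (0, 3, 17, 9, 120): 250.0,
    (0, 4, 18, 9, 121): 100.0,
    (5, 12, 900, 40, 200): 75.5,
}

# Total number of (mostly empty) cells in the dense cube:
total_cells = 1
for n in dims.values():
    total_cells *= n
print(total_cells)  # 27450000000

# Total revenues: sum over all stored cells.
total = sum(cells.values())
print(total)  # 425.5

# Revenues by country: reduce along every dimension except the first.
by_country = defaultdict(float)
for coords, v in cells.items():
    by_country[coords[0]] += v
print(dict(by_country))  # {0: 350.0, 5: 75.5}
```

Reductions only ever visit the stored cells, which is why aggregating a nominally 27-billion-cell cube with 100,000 populated cells can complete in well under a second.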
two-table model: keeping table version 1 online, using table version 2 to do the bulk insert, typically of hundreds of thousands to millions of rows, and then having VectorSTAR switch table version 2 for version 1, using the VERSION command (this is the reason for the trailing-digits name restriction: they are used to indicate multiple versions of the same table). Once this is done, the contents of version 2 are copied in the background (perhaps using the OS tools) to version 1 (either using a delta scheme or simply overwriting, depending on the context). The next bulk insert will be done on version 1, keeping version 2 online. And so on.
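The alternating-version scheme can be sketched as follows. Table names and mechanics here are illustrative stand-ins, not VectorSTAR's actual VERSION command:

```python
# Toy stand-in for versioned tables: two copies of a table, one online.
tables = {"Sales1": [10, 20], "Sales2": [10, 20]}
online = "Sales1"

def bulk_insert_and_switch(rows):
    global online
    offline = "Sales2" if online == "Sales1" else "Sales1"
    tables[offline].extend(rows)      # bulk insert on the offline version
    online = offline                  # the switch: readers are never blocked
    # Background catch-up copy so the other version is ready next time.
    other = "Sales2" if online == "Sales1" else "Sales1"
    tables[other] = list(tables[online])

bulk_insert_and_switch([30, 40])
print(online, tables[online])  # Sales2 [10, 20, 30, 40]
```

The key property is that readers always see a complete, consistent table version; the bulk load happens entirely on the copy that is offline.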
Row inserts/updates/deletes do not affect read performance at all. They can only clash among themselves.
Insertion is at the column level, meaning that actual insertion into the columns of a table need not be strictly simultaneous for all its columns. If the values for a column are not available at the moment of the insert, they can be temporarily instantiated as NULL if the table allows them (this is the default behavior in VectorSTAR). Later, a BULK UPDATE or ROW UPDATE can be done for the missing value(s) in that column (bulk updating requires bringing the table offline, but versioned tables avoid this problem).
Whenever you write a value to an address that is mapped read-write to a file, the value is committed and flushed to disk transparently, outside of user control. This usually happens immediately for most practical purposes (however, this ultimately falls under the OS memory-map flush policy).
One should never write directly to a disk file that has been mapped into memory[42]. Data in a memory-mapped file should only be modified by writing to the corresponding memory addresses instead.
[42] Windows Server OS will in fact prohibit the operation. Linux will let it proceed and crash the associated application process, however.
Read-only mode, Private or Shared
Read-only This is the default mode for tables. Inserts are done only in bulk mode ("bulk inserts"), by bringing the table offline, doing the bulk insert, then bringing it online again (but see below for versioned tables).
Inserts When inserts are allowed into a table, its columns must be pre-sized to the maximum space that they will eventually require (somewhat analogous to what Oracle does in order to maximize performance in certain situations). Initially, all the "unused" slots are filled with the NULL value (in the current version, tables that allow insertion must allow NULL in all columns, except the primary key column, of course, which is automatically initialized to a sequence of INT values). The table itself holds its effective cardinality in an internal variable. Private-insert mode suffers no performance penalty at all (compared to Read-only mode). Shared-insert mode requires a "guardian" process to coordinate, through a semaphore, writing to the same column by more than one process.
This coordination is only needed to obtain a valid "row index" (rix) to be used for the insert. Once it holds a valid rix, a process simply writes the new values into the columns at that position. The new values are not made available to others until the process does a commit on that rix. At that point, the table updates its internal SYSTEM VIEW NDX (see below, under Deletes, for a description) to include that rix (this operation requires locking the NDX file for writing; readers will not be locked) and increases its effective cardinality accordingly. At that moment the new value becomes available to any new reader. If the process fails to call commit, that rix will remain unused, as a blank space in the column (which can later be removed by compacting the table; see below). If the process calls abort, that rix is open for reuse. A table that supports row inserts has, by default, a column called INSERTID which holds the user identity of the process doing the insert.
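The rix protocol just described can be sketched in Python. This is a toy model, not VectorSTAR's implementation: the Table class, its slot layout, and the method names are inventions for illustration, and the NDX bit array is modeled as a plain boolean list.

```python
import threading

class Table:
    """Toy model of shared-insert coordination on a pre-sized column."""
    def __init__(self, capacity):
        self.column = [None] * capacity      # pre-sized slots, filled with NULL
        self.committed = [False] * capacity  # stands in for the NDX bit array
        self.cardinality = 0                 # effective cardinality
        self._next_rix = 0
        self._guard = threading.Semaphore(1)  # the "guardian" semaphore

    def acquire_rix(self):
        # Coordination is only needed to hand out a valid row index.
        with self._guard:
            rix = self._next_rix
            self._next_rix += 1
            return rix

    def write(self, rix, value):
        # No lock needed: each writer owns its rix exclusively until commit.
        self.column[rix] = value

    def commit(self, rix):
        # Publish the row: update the NDX and the effective cardinality.
        with self._guard:
            self.committed[rix] = True
            self.cardinality += 1

    def abort(self, rix):
        # The rix stays a blank slot, reclaimable by a later COMPACT.
        self.column[rix] = None

t = Table(capacity=8)
rix = t.acquire_rix()
t.write(rix, 42.0)
t.commit(rix)   # the value now becomes visible to new readers
```

Note how, as in the text, the semaphore guards only rix allocation and the commit that publishes the row; the data write itself is lock-free.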
To discard older, non-current versions of rows, the table must be compacted. The COMPACT command gives the option of transferring the non-current versions to a backup table. A table that supports row versioning has a column called VERSION which holds a full timestamp value. Row-level versioning is a powerful feature in certain contexts (you could, for example, ask for the versions of a record as they were between May 1st and May 3rd). A table that supports updates has, by default, a column called UPDATEID which holds the user identity of the process doing the update.
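The kind of time-range question that row versioning enables can be illustrated with a short Python sketch. The VERSION column follows the text; the record key, payload, and storage layout are hypothetical.

```python
from datetime import datetime

# Each entry models one stored version of a record: (key, VERSION, payload).
versions = [
    ("cust-7", datetime(2008, 4, 28), {"limit": 1000}),
    ("cust-7", datetime(2008, 5, 2),  {"limit": 2500}),
    ("cust-7", datetime(2008, 5, 9),  {"limit": 500}),
]

def versions_between(rows, key, start, end):
    """Return the versions of a record as they were in [start, end]."""
    return [r for r in rows if r[0] == key and start <= r[1] <= end]

# "As they were between May 1st and May 3rd":
hits = versions_between(versions, "cust-7",
                        datetime(2008, 5, 1), datetime(2008, 5, 3))
```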
Private vs Shared The private versions of all these usage modes avoid any lock overhead and thus perform at the same level as the highest read-only performance.
1.5 Failover
2 VectorSQL
VectorSQL is a vectorial dialect of SQL implemented as a library extension on top of the XJ vector programming language. In contrast to traditional SQL dialects, VectorSQL is interactive, function-based, vector-valued, and extensible: new functions that you create are first-class citizens in VectorSQL and perform just as fast as the native ones provided in the core distribution.
45 That is, when scripts are longer than about a hundred lines or so.
46 This is a rough statistic based on VectorSQL translations of about a hundred real-world
SQL scripts done over the past four years, and on a translation of the examples in a popular
SQL cookbook, a current work-in-progress at Vectornova.
very fast query execution, so that most ad-hoc queries complete in less than 5 seconds
a way to keep track of previous results and to use them in subsequent queries
Selection Index The row selection index or NDX represents the current set of selected rows on the table, based on the immediately previous execution of WHERE and related VectorSQL commands. The NDX is simply a bit array of the same cardinality as the associated table. VectorSTAR provides facilities to save these bit indexes to an ordinary file, which is typically quite small: the NDX for a dense selection out of a 10-million-record table is barely 1.2MB uncompressed, and usually less than 200KB when compressed. Compression is done automatically by the SAVE_SELECTION command using a fast RLE algorithm. For a sparse selection, the NDX is instead stored as an index set (rather than a bitset), and its size shrinks to the order of the number of selected rows. A unique and very powerful capability of VectorSTAR is that this saved NDX can then be:
sent over a grid to different node(s), where the output will be produced and displayed
A user can save the current selection set to a file (note that this is not the same as saving the selected rows; it only saves their references) using the SAVE_SELECTION command. This file can then be sent to other users, who can LOAD_SELECTION from that file and obtain a selection set in their session that is the same as if they had performed the same queries as the original user who created the selection-set file.
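The two NDX representations, and the effect of RLE on a mostly empty bit array, can be modeled in a few lines of Python. The byte sizes follow from the representations themselves; the actual on-disk format used by SAVE_SELECTION is internal to VectorSTAR, so everything below is an illustrative assumption.

```python
def ndx_bytes_dense(cardinality):
    # Dense selection: one bit per row of the table.
    return cardinality // 8

def ndx_as_index_set(selected_rixs):
    # Sparse selection: store only the selected row indexes.
    return sorted(selected_rixs)

def rle_encode(bits):
    """Toy run-length encoding of a bit array: (bit, run_length) pairs."""
    runs, i = [], 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1
        runs.append((bits[i], j - i))
        i = j
    return runs

# A 10-million-row table needs 10_000_000 / 8 = 1_250_000 bytes (~1.2 MB).
dense_size = ndx_bytes_dense(10_000_000)

# A sparse or clustered selection collapses to a handful of runs.
bits = [0] * 20 + [1] * 5 + [0] * 75
runs = rle_encode(bits)
```

This also shows why the saved file travels well: it carries only row references, never row data, so loading it elsewhere merely re-marks the selection.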
48 The word command is chosen as a way to avoid having to specify whether the implementation is through a statement or a function.
49 Actually, there is really no such thing as a standard SQL outside of the definitional papers presented to ANSI and ISO, as most SQL vendors implement widely varying functionalities using even more diverse syntax.
50 Transactional inserts and updates, plus other related commands, are currently only available in the Taipan alpha.
Note both the nested structure and the fact that the order of LOC execution is
not the same as their order of appearance. Note also the need for two temporary
tables. Here is the equivalent in VectorSQL:
VectorSQL
FROM 'Cdr'
WHERE 'TxTime in_day ', RunDate
GROUP_BY 'Region'
SELECT INTO 'Tmp' 'group AS Region, round@sum TotalAmnt AS Amount'
FROM 'Ahr'
WHERE 'TxTime in_day ', RunDate
AND 'AdjType in 17, 20'
GROUP_BY 'RegionId'
Adj =. 'sum abs AdjAmnt Ded1Amnt Ded2Amnt'
SELECT UNION 'Tmp' 'group AS Region, round@sum ', Adj, ' AS Amount'
FROM 'Tmp'
GROUP_BY 'Region'
SELECT INTO 'ReportOutput' _
Tech, Datum, Service
group, sum Amount AS Amount
)
Things to note:
Notice how the 2-level nesting of the original query is flattened by VectorSQL, as the cdr and ahr intermediate results do not need to be calculated from inside a subquery.
Notice the use of the INTO modifier to SELECT, which is a shortcut for: INSERT INTO x ;; SELECT y. The use of _ in the last SELECT means "use the following lines up to the single ) as argument to SELECT".
Experience with a rather varied set of SQL programmer backgrounds and capabilities has shown that many SQL users have no trouble understanding the VectorSQL implementation of a SQL stored procedure⁵¹. Frequently, the commonality of the command set is promptly perceived, leaving only the different ordering of clauses as a notable distinction⁵².
There is a reason for the apparently "reverse" order of execution. Since VectorSQL is based on function composition, it has no complex "statements", and thus the ordering of the component functions (think "substatements", as in the FROM, WHERE, SORTED BY, etc., pieces of a full SELECT statement) has to match the required execution flow, whereas in SQL, where SELECT is a complex statement, the fixed syntactic ordering of its subclauses is really only the result of its original designer's choice. Note also that the ordering is not strictly "reverse" in many cases, as with the SORT_BY command, which frequently goes at the end in VectorSQL queries too.
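The true-to-execution-flow ordering can be mimicked in any language that composes functions. The following Python sketch (with invented helper names, not VectorSQL's API) applies the steps in exactly the order they are written, FROM first:

```python
# Each step is a function from a row set to a row set; composing them in
# written order mirrors VectorSQL's execution-flow ordering.
def from_table(rows):
    return lambda _: rows

def where(pred):
    return lambda rows: [r for r in rows if pred(r)]

def select(cols):
    return lambda rows: [{c: r[c] for c in cols} for r in rows]

def run(*steps):
    out = None
    for step in steps:  # steps execute top to bottom, as written
        out = step(out)
    return out

data = [{"Region": "N", "Amount": 10}, {"Region": "S", "Amount": 7}]
result = run(
    from_table(data),           # FROM
    where(lambda r: r["Amount"] > 8),  # WHERE
    select(["Region"]),         # SELECT comes last, as in VectorSQL
)
```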
In summary, existing SQL scripts can be reused in VectorSQL in the sense that their underlying logic won't have to be changed⁵³, but they will still require some (mostly mechanical) syntax transformation⁵⁴ to accommodate the true-to-execution-flow ordering required by VectorSQL.
Note that there are no references to the parameter names anywhere in the code: this style of function definition is called function-level programming⁵⁶.
53 Although it may well be the case that some potential performance gains are missed by adhering strictly to the original SQL's detailed logic in some cases, such as those that call for the creation of a large number of temporary files in situations where a natural VectorSQL rendering of the general logic would not require them.
54 A SQL-to-VectorSQL translator is certainly feasible.
55 Such as VectorSQL, XJ, and J (used in VectorSTAR); as well as APL, A+, K, Q, Nial, and some versions of Fortran.
56 Function-level programming is not the same as functional programming. The former produces programs by assembling functions using higher-level functor operators (i.e., operators that work on functions to produce other functions). The latter consists of defining (and later calling) functions which produce no side-effects and always produce the same result when called in the same context. Examples of the former are rare, the most complete being FP/FL by John Backus (creator of FORTRAN), J by K. Iverson and Roger Hui, plus some quite uncommon LISP programming styles. Examples of the latter, though, abound: standard LISP, Scheme, O'Caml, and Haskell being among the best known.
This avg will work for any number and kind of arguments (as long as they are specializations of numeric).
A more familiar implementation follows, using functional programming⁵⁷:
avg values is
   sum: 0
   for v in values do
      sum: sum + v
   end
   n: num values
   return sum div n
)
These examples clearly illustrate the significant reduction in code size (mostly by removing spurious complexity) that can be achieved with the vectorized operations available in VectorSQL and XJ. Vectorized operations are also often optimized in ways that a function operating on successive scalars inside a loop cannot be, resulting in significant performance gains.
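The same contrast can be shown in Python (used here only for illustration; the document's own languages are VectorSQL, XJ, and J). The loop version mirrors the one above, while the "vectorized" form operates on the whole collection at once:

```python
def avg_loop(values):
    # Scalar loop, element by element, as in the version above.
    total = 0
    for v in values:
        total += v
    return total / len(values)

def avg_vectorized(values):
    # Whole-collection operations: one expression, no explicit loop.
    return sum(values) / len(values)

loop_result = avg_loop([1, 2, 3, 4])
vec_result = avg_vectorized([1, 2, 3, 4])
```

Beyond the line count, the whole-collection form gives the runtime (or a vector engine) the freedom to traverse the data in whatever order or batch size is fastest, which is exactly the optimization opportunity the loop forecloses.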
As a complete example, here is a very concise library of commonly used statistical functions written in J (NB. introduces a comment⁵⁸):
3 xSTAR
xSTAR is VectorSTAR's grid-parallel bulk data loader, specifically designed to take advantage of the unique opportunities presented by VectorSTAR's memory-mapped columnar architecture. xSTAR can be affordably configured to provide the highest data-loading performance available today using only industry-standard disk subsystems.
xSTAR and ETL
Immediate availability
Once xSTAR has converted a file from CSV⁵⁹ to binary format, the file's data is immediately available to any VectorSTAR DBMS with access to it (either through DAS, or across a SAN).
xSTAR can read directly from a shared pipe connection established by the ETL application, so that no intermediate text data file needs to be created on disk.
59 Character-separated text files, typically using pipes, commas, colons, tabs, or blank spaces as field separators within rows terminated by newline characters (CRLF on Windows, LF on Linux/Unix, CR on Macs).
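A minimal model of what a columnar bulk loader does - read character-separated rows (from a file or a pipe) and write each column out as packed binary - might look as follows. The function name, the pipe separator, and the 8-byte float layout are illustrative assumptions, not xSTAR's actual binary format:

```python
import io
import struct

def load_column(csv_stream, col, sep="|"):
    """Read one column from a character-separated stream into packed binary."""
    header = csv_stream.readline().rstrip("\n").split(sep)
    idx = header.index(col)
    out = io.BytesIO()
    n = 0
    for line in csv_stream:
        fields = line.rstrip("\n").split(sep)
        # 8-byte little-endian double per row: fixed-width, mmap-friendly.
        out.write(struct.pack("<d", float(fields[idx])))
        n += 1
    return n, out.getvalue()

# The stream could just as well be a pipe from an ETL tool; no intermediate
# text file has to touch the disk.
csv = io.StringIO("Id|Price\n1|9.5\n2|3.25\n")
n, blob = load_column(csv, "Price")
```

Because the output is a flat, fixed-width binary column, it is in exactly the shape a memory-mapped columnar engine can use the moment it is written.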
Binary File Structure
4 Interfaces
User Interface
console: monolithic, often run directly from the console, where the user interface (UI) runs in the same process as the database
Console Mode DBA The console mode is the most flexible of the three, and thus often the most adequate for doing development on VectorSTAR. However, it offers little in terms of security, and it exposes the whole underlying programming language's facilities to the user, so it should not be used for deployment of end-user applications.
In this mode, both the UI (used to enter VectorSQL commands) and the actual database code run in the same OS process, which will show in your CPU task list as a j.exe process on a Windows OS, or its equivalent on a Linux OS. It is a very lightweight process that fires up practically immediately and typically does not consume more than 500K-750K of RAM initially. This process is not multithreaded: you run multiple copies of a similar one (without the UI overhead) to support multiple users in client-server and web configurations. The sheer simplicity of this arrangement means that you can effectively use the underlying OS process-administration facilities to manage your VectorSTAR database. This is in stark contrast to other DBMS systems that fire up so many obscure processes that it becomes very hard to know what each one does.
After installing VectorSTAR in console mode, you run one of the available console⁶⁰ programs that provide a UI to the underlying VectorSTAR process.
API
Bridges
Socket-based R interface
require'R'
'Store' OPEN 'Sale,Product'
R_import 'sd($1)'⁶¹
sd GET 'Sale.Price'
=> 289.49517039958
JOIN 'What'
GROUP 'Product.Department'
SELECT 'group, sd Qty'
=>
ProductDepartment SdQty
FURNITURE 5.77
APPLIANCES 4.20
CLOTHING 3.41
ELECTRONICS 120.3
61 Equivalent to:
'sd' is R with 'sd($1)'
62 (3) and (4) require VectorSTAR running on Windows.
5 Future Release Schedule
5.1 Taipan
Taipan is the code name for the next major release of VectorSTAR, due in 1Q09.
It will provide support for:
triggers
constant-time hash indexes
spatial indexing
GML support
5.2 Berzerk
Nomenclature
.NET CLR programming framework, a Microsoft technology
BI Business Intelligence
C C Programming Language
I/O Input/Output
IT Information Technology
NetCDF Network Common Data Form
OS Operating System
UI User Interface
CPU footprint - the load that a program imposes on a CPU when idle
loop - a series of instructions executed repetitively