Version 4.1
Operations on “transient” database objects eliminate the complex and costly tasks
of cache and file management, which has several beneficial side effects:
• The code path between the application and the data is significantly
shortened. Fewer CPU instructions translate directly into increased
performance. With modern hardware, eXtremeDB read transactions
require nanoseconds while write transactions require up to a few
microseconds.
• Eliminating the complex logic associated with cache and file management
not only reduces the code size (footprint), but also eliminates multiple redundant
copies of any given piece of data (i.e. a copy in the application, a copy in
the database cache, a copy in the file system cache, and a copy in the file
system itself).
• An all-in-memory database is optimized with different strategies than a
disk-based database. The latter is concerned with minimizing file I/O
operations, and will trade off memory and CPU instructions to avoid file
I/O. An all-in-memory database doesn’t need to worry about disk I/O, so it is
optimized to reduce CPU instructions and to maximize the amount of data
that can be stored in a given amount of space. Consequently, eXtremeDB
requires a fraction of the space of disk-based databases to store a given
amount of data.
• eXtremeDB actually provides two libraries for all-in-memory databases:
the optimized “Direct Pointer Arithmetic” library which accesses the
memory locations of database records by simple pointer arithmetic
(offering a 5% -15% performance advantage) or, for shared memory
implementations, the “Offset” library which calculates record locations by
first obtaining their offsets from the beginning address of the database
then converting these to proper pointers (incurring a slight additional
performance cost). Both libraries implement these memory calculations
internally; the only interface applications need to manage is a simple
connection handle.
• Some embedded systems, such as flight safety related systems, are not
permitted to do dynamic memory allocations because they can lead to
memory leaks that could ultimately cause system failure. eXtremeDB
applications can utilize static memory so that no dynamic memory is
allocated.
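For example, the memory underlying an eXtremeDB database can be reserved
statically at link time instead of being allocated from the heap (a minimal
sketch; DB_SIZE is an illustrative value, and the region is later handed to the
runtime as a database device, as described under “Device Management” below):

/* Statically reserved database region: the application performs no
   dynamic allocation, so no memory can leak. */
#define DB_SIZE (1024 * 1024)   /* illustrative size */
static char db_memory[DB_SIZE]; /* reserved at link time, not at runtime */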
The “persistent” modifier in the class declaration identifies database objects that
will be stored on disk. The uniquely configurable eXtremeDB on-disk features
include the following:
For multi-threaded applications, eXtremeDB manages the database data in any
dedicated region of memory and coordinates access by multiple tasks
(threads). For multi-process architectures (for example, Sun Solaris, Linux, QNX
Neutrino, etc.), eXtremeDB can manage a database in shared memory and
coordinate access by multiple processes, each potentially with multiple threads.
Beyond the core eXtremeDB In-Memory Database System package, there are four
additional editions that extend the basic core runtime to meet specific user
requirements as follows:
Our Objectives
The primary objective for eXtremeDB is to provide extremely high performance
for the kinds of applications to which it is targeted.
Product evolution
eXtremeDB has played a significant role in the success of thousands of real-time
applications on a wide range of embedded systems platforms. Driven by requests
from developers and enthusiastic customers, additional features have been added
to extend the basic eXtremeDB core technology to address evolving user
requirements, as can be seen from the list of additional versions above. However,
great care has been taken in each stage of the product evolution to make no
compromises with our basic extreme performance goals. Each additional
interface, be it Transaction Logging, High Availability, SQL or JNI, is provided
in the form of separate libraries that can be linked into an application as desired to
address specific application needs.
While developers who need the absolute best performance for mission critical
applications are reassured to know that the underlying core runtime remains the
fastest, most robust in the industry, they also appreciate knowing that less
demanding applications can interface with the same eXtremeDB databases
through SQL, for example, to generate reports or allow a flexible query interface.
Or a High Availability application may provide the utmost reliability for sensitive
data transactions, while a highly optimized, performance critical application may
perform different transactions simultaneously on the same database.
Depending on your license, you might have received one or both of the eXtremeDB
addendums:
• The “eXtremeDB High Availability Addendum” explains how to implement High
Availability redundancy in the form of Master and Replica applications;
• The “eXtremeDB Transaction Logging Addendum” explains how to implement
Transaction Logging to allow for efficient recovery of in-memory portions of
databases in the event of system failure.
Finally, if you have licensed SQL extensions, you will have received the “eXtremeSQL
User’s Guide” that explains the use of the SQL Interface to eXtremeDB and how to
optimize its features.
A good source of information regarding implementation details is the source code
provided in the sample programs in the eXtremeDB package.
The readme.txt file in the “samples” directory outlines the specific features implemented
in each sample. Building these applications and stepping through them in the debugger is
an excellent way to gain familiarity with the eXtremeDB API.
There are also a number of technical articles and white papers about eXtremeDB
available on the McObject website, http://www.mcobject.com, that you are welcome to
download.
Syntax Conventions
As you will see in the next sections, the eXtremeDB API is divided into two broad
categories: the “Static API” consisting of application independent interface functions, and
the “Generated API” consisting of application-specific type-safe interface functions
generated by the eXtremeDB DDL compiler. When referring to eXtremeDB-generated
functions, we use the following predicates:
classname
structname
fieldname
indexname
eventname
to refer to elements of your database design that are used in the function naming
convention of eXtremeDB. For example:
classname_fieldname_put
describes the functions that are generated for every field of every class to populate the
database objects’ fields.
In our description of the Database Definition Language (DDL) syntax, we use square
brackets (“[” and “]”) to indicate optional elements, ellipses (“…”) to indicate repeating
items, a bar (“|”) to indicate a choice, and italics to indicate an item that requires further
definition by you. For instance:
Field-statement:
A field-statement declares a field of a class or structure. You must declare the type of the
field (either an atomic data type, a typedef, or a previously defined structure). However,
field-name requires no further definition.
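For instance, given a previously defined structure Address (a hypothetical
name), each of the following is a valid field-statement:

uint4 sensor_id; // atomic data type
Address home;    // previously defined structure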
Our perspective on database systems is from the standpoint of embedded and real-
time systems. Thus, these sections may reveal a perspective on database theory
that is different from yours. Our objective is to provide a highly effective storage
solution for real-time and embedded applications, and a highly productive
database development tool for embedded application developers, which makes
use of modern programming techniques. Externally, eXtremeDB exposes a rich
object-oriented database interface to applications, making it extremely easy for
developers to describe, store and manipulate application-specific data. Internally,
eXtremeDB uses storage layout and access methods that are specifically
optimized for the supported data representation.
Definitions
A database is a collection of related data organized for efficient storage and
retrieval. Beyond this, any attempt to more specifically describe a database
inevitably involves individual features of one or more specific database
implementations. The following definitions describe the eXtremeDB
implementation.
The term class is used in most object-oriented languages, such as C++ or Java.
The class defines the properties of the object and the methods used to control the
object’s behavior. This definition is correct for persistent eXtremeDB classes as
well—a database class defines object fields and access methods.
Elements are called fields in eXtremeDB. Other common terms are “attribute” and
“column”. Fields have a type property. Type determines whether the element
holds character, integer, real, or binary data. See Appendix A for a list of
eXtremeDB data types. eXtremeDB also supports arbitrarily large fields through
blob and vector types, and complex fields through the structure type.
In addition to grouping fields into a class, eXtremeDB gives you the tools to
create a sub-grouping we call a structure. A structure declaration names a type
and specifies elements of the structure that can have different types. Structures
and simple types are building blocks to construct object definitions. Structures can
be used as elements of other structures. Like other element types, you can have a
vector of structures.
Generally speaking, fields are either simple or complex. Simple fields are atomic
types such as char, integer, string, and so on. Complex fields can be vectors of
simple types, structures (which may in turn contain structures and vectors),
vectors of structures, and blobs.
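For illustration, the following class declaration (all names hypothetical)
combines both kinds of fields:

class reading
{
uint4 sensor_id;       // simple: atomic type
string location;       // simple: string
vector<uint4> samples; // complex: vector of a simple type
blob raw_frame;        // complex: blob
list;
};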
In the relational or hierarchical data models, records are constructed from basic
data type fields. The collection of built-in data types and built-in operations were
motivated by the needs of business data processing applications. However, in
many engineering or scientific applications this collection of types is not
adequate. For example, in a scientific application, a requirement could be to
describe a time series and store and access it with appropriate operations. Another
common example is tree-like structures that are widely used in engineering
applications such as routing tables or “electronic program guide” implementations
for set-top boxes. Historically, complex data types and operations on them have
been simulated using basic data types and operations provided by the DBMS with
substantial inefficiency and added complexity. Complex objects are represented
by multiple basic tables or records, with relationships defined between them. Such
objects cannot be represented with one record, and developers are forced to store
parts of an object in different tables and define relationships between the object’s
parts. However, objects are entities, all parts of which work as a whole.
Consequently, developers usually introduce their own APIs to store and retrieve
objects. These APIs shield inner relations within objects from the application, but
at the same time, introduce extra layers of application code that must be written,
debugged and executed.
• R-Tree: These indexes are commonly used to speed spatial searches; for
example to find the rectangle that bounds a given point, or all rectangles
that overlap a specified rectangle.
At first, it may seem strange that oids must have an identical composition for
every class in the database. In actuality, this models the real world in many
embedded application environments. For example, a system that receives data
from some automated source will receive objects that have an identifier already
supplied by the source. An example could be a network of sensors for which the
oid of every sensor class is sensor-type + sensor-id + measurement-timestamp.
Note that not every class is required to have an oid.
oids and refs are better alternatives to indexes for establishing inter-object
relationships. An object can have a vector of references, as one means to
implement a one-to-many relationship across classes. Indexes can and should be
used to implement fast random access to objects by one or more key fields, for
sorted access, and for range retrieval. When possible, oid and ref types should be
used to implement relationships.
eXtremeDB also offers the autoid type. Autoid is similar to oid, except that the
structure and value of autoid fields are determined by the eXtremeDB system. An
application uses the autoid_t typedef to declare program variables of type autoid,
and the autoid_t DDL data type to create a reference in one object to the autoid
value of another object. Autoid and autoid_t can be used as an alternative to oid
and ref whenever a natural oid does not exist, is deemed too cumbersome, or an
automatically incrementing identifier is desired.
To express the content and organization of a database, the database designer uses
some or all of these components in a database definition language (DDL) to
create a database schema. The schema is a textual description of the data model.
The steps involved in creating an application that uses eXtremeDB are illustrated
in the following diagram.
[Diagram: the database schema, created with a text editor, is processed by the
schema compiler (mcocomp) to produce the database interface files (.h and .c);
these are compiled together with the application source code by the C/C++
compiler, and the linker produces the application.]
Database Design
No object-oriented methodologies have been devised specifically for object-oriented
database design. However, object models, being data oriented rather than process
oriented, suit the needs of database applications quite well.
There are a plethora of books and articles written on the topic of data modeling
and database design. We won’t attempt to give the topic an exhaustive treatment
here, but will attempt to hit the highlights and permit you to seek out additional
resources if a particular area piques your interest.
For eXtremeDB database design, we’ll establish several practical steps that, in our
view, designers will benefit from.
When you are done with the initial schema, develop benchmarks. The best kind of
benchmark is one that models an application well—it is also the most time
consuming and expensive. Unfortunately, the popular benchmarks (for example,
TPC) do not cover many different uses of object-oriented databases.
There are no real quality criteria for an object database schema, such as the
measurement of the degree of redundancy provided by the framework of
normalization in the relational case. There are no “transformation rules” for
better data representation. Designers rely on the benchmarks to establish a
sufficient degree of schema redundancy. During the design process there is no
clear distinction between the design of the application and the database semantics.
Your database will become an integral part of your application, allowing data
access to be highly optimized.
Database Models
In the history of database management systems, four database models have
emerged. In chronological order they are: hierarchical, network, relational, and
object. This section will briefly describe the characteristics of each, solely for the
purpose of positioning eXtremeDB in the database landscape.
The various database models are different physical means of implementing the
schema. To a degree, the choice of database models will impact the schema
design. For instance, a relational database will require that all data be organized
into rows and columns with no repeating data elements, whereas eXtremeDB and
some other databases will allow vectors (also called arrays).
Hierarchical Model
The hierarchical database model, of which IBM’s IMS database is the most
recognized example, organizes data into strict parent/child relationships, hence
the term “hierarchical”. The database is always navigated starting from the root
node, and at each node navigated down the left or right branch until the desired
data is located.
Network Model
The network model database is a superset of the hierarchical model. It expands
the strict parent/child metaphor of the hierarchical model such that a “parent”
record can own one, two or more child record types, and child record types can
have more than one (in fact, any number of) parent records. Records can own
themselves in recursive relationships (like manager-employee, where manager is
an employee). Records can have relationships to records more than one level
removed in the model. Finally, network model databases permit navigation to
start at any point in the model, not just from the root. In fact, there may not appear
to be a root when the database is diagrammed. Network model databases take
their name from the fact that a network of relationships is possible.
Relational Model
E. F. Codd first presented the relational data-model. The model offers a
conceptually different approach to data storage. In the relational database, all data
is represented as simple tables where columns represent data attributes (as values
of specific data types) and rows represent instances or records. Relationships are
implemented by data where a given table’s foreign key columns have identical
values to the corresponding primary key columns in a related table. Relational
databases may be accessed using a high-level non-procedural language. This
language is used to gain access to the relations and the desired set of data and the
programmer does not have to write algorithms for navigation. By using this
approach the physical implementation of the database is hidden, thus the
programmer does not have to know the physical implementation to be able to
access the data.
SQL and relational DBMSs have become widely used due to the separation of the
physical and logical representation (and marketing of course). It is much easier to
understand rows and columns than records and pointers to records. Relational
databases solve the two problems of network and hierarchical databases. Inter-
record (or table, in relational parlance) relationships are implemented via indexes,
not pointers, so the relationship-maintaining information is separate from the data.
This makes it relatively easy to add/drop relationships (by simply
adding/dropping indexes) and to add/drop/modify columns (no pointers to fix up).
And, with no pointers, there is no need to keep track of where you are when
navigating a relational database. The disadvantage of the relational model is that indexes
take more time to navigate and consume more space than pointers do, so
relational databases tend to have lower performance and require more disk space.
Object Model
There is no official standard for object databases. Object databases employ a data
model that has object-oriented aspects like classes with attributes and methods
and integrity constraints; provide object identifiers (oids) for any persistent
instance of a class; support encapsulation (data and methods); and support
abstract data types. Object databases combine the elements of object orientation
and object-oriented programming languages with database capabilities—they
extend the functionality of object programming languages (for example, C++,
Java) to provide full-featured database programming capability. The result is a
high level of congruence between the data model for the application and the data
model of the database, more natural data structures, and better maintainability and
reusability of code. Under the covers, object databases are often implementations
of the network model, but because of a higher level of abstraction, they don’t
expose the complexity of programming.
The host application language is the language for both the application and the
database. It provides a very direct relationship between the application object and
the stored object. In general the object DBMS is tightly integrated with the host
language such as C++, C, Smalltalk, or Java. In contrast with relational DBMS,
where the query language is the means to create, access, and update objects, in
object DBMS the primary interface for creating and modifying objects is directly
via the host language using the native language syntax. Moreover, every object in
the system can automatically be given an identifier (oid) that is unique and
immutable during the object’s life. One object can contain an oid that logically
references, or points to, another object. These references prove valuable when
associating objects with real-world entities; they also form the basis of features
such as bi-directional relationships.
Object-Relational Model
Object-relational database management products try to unify aspects of both the
relational and object databases. Note, however, that there is also no official
definition of what an object relational database management system is. Object-
relational DBMS employ a data model that attempts to add “OO-ness” to tables.
All persistent information is still in tables, but some of the entries can have a richer
data structure, called an “abstract data type”, which is a data type constructed
by combining basic alphanumeric data types. For the query language, object-
relational DBMS support an extended form of SQL, sometimes referred to as
ObjectSQL. The object-relational RDBMS is still relational because the data is
stored in tables of rows and columns, and SQL, with the extensions mentioned, is
the language for data definition, manipulation, and query.
Summary
Because eXtremeDB provides a variety of convenient and efficient types of
indexes, you are free to design and implement a relational database with
eXtremeDB. Conceptually, however, eXtremeDB employs the object-oriented
paradigm for both internal storage layout and the database application
development process. eXtremeDB internals are not built upon any other data
model—it manipulates objects and their properties, not records of any kind.
Externally, eXtremeDB exposes complex data types and object access methods
that are determined by class definitions, not a standard database API. Applications
written in C can instantiate persistent objects; the data manipulation language is
the application’s host language (C, C++ or Java), so the database access is tightly
integrated with the application; applications can take advantage of object
identifiers, oids and autoids, to gain very high-performance access to stored data;
complex data types like vectors are seamlessly supported (again via class
interfaces) from the host programming language.
Example
Let’s consider a hypothetical application, a satellite radio receiver. The receiver
will process two types of broadcast data objects: “program data” that consist of
individual program information such as a title, narrator, content description and
program description, and “program schedule” that consist of schedule time data
such as the schedule start time and duration, a number of time slots within the
schedule, each with their start time and duration, and a reference to the “program
data” that describes the program for that time slot. Program information
transmission is organized in a data stream that consists of objects of both types.
Program schedule and program data information can arrive in any order. It is
possible that a schedule for a program would be placed ahead of program data in
the stream.
The receiver application should keep all the programming information from the
current time forward up to a number of hours, and be able to link each program to
the time slot the program airs. It should replace schedules and program data in the
database as the old ones become obsolete and the new ones arrive. There are
certain performance considerations—data must be written into the database
quickly enough to keep up with the transmission. Each data object received by our
receiver is assigned a unique identifier that is also broadcast by the satellite.
The two types of objects arriving in the broadcast stream can be sketched in
pseudocode as follows:
ProgramData {
Object ID;
TitleSize;
Title;
NarratorSize;
Narrator;
For(I = 0;I < MaxDescriptions;I++)
{
ContentDescriptionText | ProgramDescriptionText
}
};
ProgramSchedule {
StartTime;
Duration;
NumberOfTimeSlots;
For(I = 0; I < NumberOfTimeSlots; I++)
{
SlotStartTime;
SlotDuration;
Program_data_ID;
}
};
Following our design methodology, we can clearly introduce two classes that
correspond to real-life objects—“Program Data” class and “Program Schedule”
class. “Program Data” always has “Title” and “Narrator” fields, while program
description text or content description text may or may not be present for a
particular program. In eXtremeDB terms, we would declare them as optional and,
in order to do that, both descriptions must be placed in structures. There are no
data structures shared by both classes. The Program Data class will be declared
with an oid so we can reference a program from the appropriate time slot. Each
Program Schedule has an array of time slots that is nicely implemented as a
vector. We also would like to sort schedules chronologically, so we declare a tree
index based on the schedule’s start time. There is no need to keep the oid of the
Program Schedule, even though it is available to us. The only association between
the two classes is via a reference to a program put into the time slot.
This real-world data stream could be formalized with the following eXtremeDB
schema:
struct ProgID
{
uint4 id; // this will hold the transponder-provided object ID
};
struct content_description
{
string text;
};
struct program_description
{
string text;
};
class prog_data
{
string title;
string narrator;
optional content_description ctext;
optional program_description ptext;
list;
oid;
};
struct time_slot
{
uint4 start_time;
uint2 duration;
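// (The schema excerpt is truncated here. Based on the design discussion
// above, the remainder might look like the following reconstruction;
// field and index names are assumptions.)
ref program; // oid reference to the prog_data airing in this slot
};

class prog_schedule
{
uint4 start_time;
uint2 duration;
vector<time_slot> slots;      // the schedule’s array of time slots
tree<start_time> schedule_ts; // chronological ordering of schedules
};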
Access Methods
The process of gaining access to objects stored in eXtremeDB is called
navigation. There are several methods of navigation available: by oid or autoid,
by hash or tree index, and sequential. These methods are encapsulated in the
programming interface that is generated when the schema compiler processes the
database schema. Rather than employ a pre-defined navigational API to access the
database, you will employ the methods generated for the classes according to your
database design.
OID
Whether an oid is provided by an external source or retrieved with an object as a
reference to another object in the database, an oid can be used to quickly retrieve
the object it identifies. Oids must be unique across all classes in the database.
Uniqueness is enforced during object creation by the eXtremeDB runtime.
oid, autoid, hash, and tree indexes can be used to establish relationships between
classes in the database. For example, to establish a relationship between a sensor
and measurements using oids, you could design the following:
struct sens
{
uint2 sens_type;
uint2 sens_id;
uint4 timestamp;
};
declare oid sens[1000];
class Sensor
{
. . .
vector <ref> measurements;
. . .
};
class Measurement
{
uint4 meas;
oid;
. . .
};
In this example, the class Sensor contains a variable length array (a vector) of
references to oids of the class Measurement. Each element of the vector is the oid
of an instance of the class Measurement and can be used to quickly reference
(locate) the associated Measurement object.
AUTOID
Autoid is similar to oid, except that it is a value that is determined by the
eXtremeDB runtime, and the number-of-expected-entries qualifier applies to the
class, not the entire database. Autoids are of the type autoid_t, a typedef for an
8-byte signed integer defined in mco.h (you cannot change this). Each autoid value is
unique in the database. (This is an implementation detail provided for your
information only; you should not rely on this detail being immutable in future
versions of eXtremeDB.) Relationships between classes can be created with
autoids by defining an autoid_t field to contain the autoid of another object.
Adapting the previous example to use autoid instead of oid, we would have:
class Sensor
{
uint2 sens_type;
uint2 sens_id;
. . .
vector <autoid_t> measurements;
. . .
};
class Measurement
{
uint4 meas;
autoid[10000];
. . .
};
In the new example, the class Sensor contains a variable length array (a vector) of
autoid_t that act as references to the autoids of the class Measurement. Each
element of the vector is the autoid of an instance of the class Measurement and
can be used to quickly reference (locate) the associated Measurement object.
Autoid is useful when an object has no natural unique identifier, or the natural
unique identifiers are cumbersome and would impose an unacceptable
performance and space consumption penalty to index.
INDEX
eXtremeDB supports hash indexes and tree indexes of the following types: tree (b-
tree), trie (Patricia Trie), rtree (R-tree spatial), kdtree (kd-tree multi-dimensional)
and user-defined (“custom”). Hash and tree indexes can also be used to uniquely
identify objects in the database. Unlike oids, however, hash and tree index values
are only required to be unique within a class.
To accomplish the previous example without an oid, you could use indexes, as in
the following example:
class Sensor
{
uint2 sens_type;
uint2 sens_id;
. . .
};
class Measurement
{
uint2 sens_type;
uint2 sens_id;
uint4 timestamp;
uint4 meas;
tree <sens_type, sens_id> sens;
. . .
};
In the first and second examples, the application programmer would navigate to a
Sensor object through whatever mechanism is logical in the context, and then
iterate over the vector of references named “measurements” to visit each
Measurement for the Sensor.
LIST
A third, sequential, navigation method is available for unordered lists of objects.
To iterate over the objects of a class without regard to any particular order, add
the list directive to the class definition, as in:
class Sensor
{
uint2 sens_type;
uint2 sens_id;
. . .
list;
};
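As an illustration, the following sketch iterates over all Sensor objects via the
generated list methods. It assumes the generated naming convention
classname_list_cursor, classname_from_cursor and classname_fieldname_get, an
already opened database connection, and schema.h as the generated header; consult
the generated interface files for the exact signatures.

#include "mco.h"
#include "schema.h" /* generated, type-safe interface (file name assumed) */

void scan_sensors(mco_db_h db)
{
    mco_trans_h t;
    mco_cursor_t csr;
    Sensor s;  /* generated handle type for the Sensor class (assumed) */
    uint2 id;
    MCO_RET rc;

    /* bracket the scan in a read-only transaction */
    if (mco_trans_start(db, MCO_READ_ONLY, MCO_TRANS_FOREGROUND, &t) != MCO_S_OK)
        return;

    /* obtain a cursor over the unordered list of Sensor objects */
    if (Sensor_list_cursor(t, &csr) == MCO_S_OK)
    {
        for (rc = mco_cursor_first(t, &csr); rc == MCO_S_OK;
             rc = mco_cursor_next(t, &csr))
        {
            Sensor_from_cursor(t, &csr, &s); /* bind handle to current object */
            Sensor_sens_id_get(&s, &id);     /* read a field through a getter */
            /* ... process the sensor ... */
        }
    }
    mco_trans_commit(t); /* read-only: commit simply releases the transaction */
}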
char<n>
Fixed-length byte array less than 64K in size. char fields can store C-type strings
or binary data. Trailing null characters for C strings are not required to be stored,
since eXtremeDB adds them when the string is read out of the database, provided
that the size of the supplied buffer is large enough to hold it.
Example: char<8> name;

nchar<n>
Fixed-length byte array less than 64K in size. nchar fields store 2-byte characters
that are sorted by their numerical value. This is suitable for many Asian languages.
In your schema: nchar<20> uname;
In your C/C++ program: nchar_t uname[21];

wchar<n>
Fixed-length byte array of less than 64K in size. wchar fields store Unicode
characters that are sorted according to the machine’s locale setting.
In your schema: wchar<20> uname;
In your C/C++ program: wchar_t uname[21];

nstring
Variable-length byte array less than 64K in size. See nchar.
In your schema: nstring uname;
In your C/C++ program: nchar_t *uname;

wstring
Variable-length byte array less than 64K in size. See wchar.
In your schema: wstring uname;
In your C/C++ program: wchar_t *uname;

blob
Binary data object; a byte array of any size, which can be greater than 64K.
Example: blob jpeg;
Any element except vectors, blobs and optional structs can be a fixed size array. For
example, time start_tm[3]; defines an array of three time values. Fixed size arrays
cannot be used in indexes; for this, use a vector.
Preprocessor
The eXtremeDB™ DDL compiler allows limited use of the C preprocessor.
Preprocessor directives are typically used to make source programs easy to
change. Directives in the source file tell the compiler to perform specific actions,
such as replacing tokens in the text. The eXtremeDB DDL compiler recognizes
the following directives:
#include, #define, #ifdef, #else, and #endif.
The number sign (#) must be the first nonwhite-space character in the line
containing the directive; white-space characters can appear between the number
sign and the first letter of the directive. Some directives include arguments or
values. Preprocessor directives can appear anywhere in a source file, but they
apply only to the remainder of the source file.
Usage examples:
#include "inc1.h"
#define SYMBOL_LEN 4
#define SYMBOL char<SYMBOL_LEN>
#ifdef X_DEFINED
#include "inc2.h"
#else
#include "inc3.h"
#define SOME_VALUE 34
#endif
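Given the definitions above, a subsequent declaration such as SYMBOL ticker; (a
hypothetical field) would be expanded by the preprocessor to char<4> ticker;
before the schema is compiled.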
Declare Statement
Syntax:
declare database dbname;
declare oid struct-name[expected-number-of-entries];
The Declare statement is currently used for two different purposes. The first is to
specify the name of the database. The DDL processor populates implementation
file names based on the dbname passed to the declare statement.
The Declare statement is also used to identify a unique object identifier and the
expected number of objects that will be stored with an oid. Expected-number-of-
entries are used for optimization of eXtremeDB’s runtime operations. It is not
required to be exact. eXtremeDB allows declaration of classes with a unique
identifier called an oid (see class statement). The runtime maintains an internal
index referencing all objects of such classes. Objects can reference each other by
oid using the ref data type. oid must be a user-defined structure, even if the oid
has a single field. Each oid value must be unique within the database.
Only one database and one oid declaration is allowed within a database schema.
Example
struct Id {
uint4 id;
uint4 time_in;
};
declare database market;
declare oid Id[20000];
Alternatively, an oid may instead be based on a string structure:
struct StrId {
char<32> str;
int2 num;
};
declare oid StrId[20000];
Struct Declaration
Syntax:
struct-declaration-list:
Note: Because the purpose of the direct attribute is to allow the application to
read or write the structure in a single operation, the schema compiler (mcocomp)
needs to know the size of the structure at compile time. For this reason the
structure can contain no dynamic (vector or blob) fields. Likewise it is not
possible to use direct on a vector of structures.
Example
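The DDL declaration that produces this output might read as follows (a
reconstruction; the placement of the direct qualifier is assumed, and the struct
name matches the generated typedef):

direct struct Fixed_d
{
uint8 v8;
uint4 v4;
uint2 v2;
uint1 v1;
};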
The direct keyword causes the schema compiler to generate the following code in
<dbname>.h:
typedef struct
{
uint8 v8;
uint4 v4;
uint2 v2;
uint1 v1;
}Fixed_d;
#pragma pack(1)
typedef struct
{
uint8 v8;
uint4 v4;
uint2 v2;
uint1 v1;
}Fixed_d_aligned_to_1;
#pragma pack()
Note: The Fixed_d structure is the one you use in your application.
Fixed_d_aligned_to_1 is only used internally by eXtremeDB.
Enum Declaration
Syntax:
For definition of enumerated type:
enum [declarator] {enum-list} ;
For declaration of variable of type enum within a class:
declarator element-name;
Description
Example
enum FLOWCONTROL {
XON, CTS
};
class example_using_enum {
FLOWCONTROL fc;
};
Class Declaration
Syntax:
class-elements-list:
element-statement |
access-statement |
event-statement
[; element-statement | access-statement | event-statement …];
element-statement:
type-identifier | struct-name | enum element-name [ = value
[, element-name [= value]] …];
or
vector {type-identifier | struct-name} vector-name;
or
[optional] struct-name element-name;
access-statement:
[voluntary] [unique][userdef] tree < class-element |
struct-name.element-name | vector-element [asc|desc]
[,class-element | struct-name.element-name |
vector-element [asc|desc]…]> indexname;
or
[userdef] hash < class-element | struct-name.element-name
| vector-element [,class-element |
struct-name.element-name | vector-element …]>
index-name[expected-number-of-entries];
or
trie < class-element[,class-element] > indexname;
or
[unique] rtree < class-element > indexname;
or
kdtree < class-element[,class-element] > indexname;
or
oid;
or
autoid[number-of-expected-entries];
or
list;
event-statement:
event < class-element update > event_name |
event < new > event_name |
event < delete > event_name [; event-statement ] ;
class-element:
element-name | struct-name
vector-element:
vector-name.struct-element | vector-name;
Description
The compact class qualifier limits the total size of the class’ elements to 64K.
That includes not just application data, but also the overhead required by
eXtremeDB.
However, the total size excludes the size of blob data (if any blob data elements
are declared for the class), except 2 bytes for the blob reference.
Element Statements
Element statements declare field names with their types. Fields of type integer,
float, date, and enum can have default values expressed in the schema. Default
values will be assigned to such fields when a new record of the type is created in
the database and no explicit value is ‘put’ in the field. Fields that are struct fields
can be declared as optional. An optional declaration means that the field may or
may not be actually stored in the database. If the field is not stored, the runtime
does not reserve (allocate) space for it within the data layout and the associated
“get” methods will return a null pointer.
struct Id
{
uint4 seq;
};
struct Item
{
uint4 id;
string name;
};
enum FC_ {
XON,
CTS
} flow_control;
declare database simple;
declare OID Id[20000];
class Everything
{
date e_date[7];
time e_time[12];
flow_control fc = XON;
uint2 u2 = 99;
uint4 u4, h;
blob blo;
string c;
vector<uint2> vint;
vector<string> vs;
vector<Item> is;
optional Item alternate;
};
Access Statements
Access statements define the access methods that will be generated for the class.
Access methods will be generated for oid, autoid, indexes, and lists.
The unique qualifier for an index specifies that every object of the class must
contain a unique combination of field values for that index. The runtime will
recognize an attempt to create a duplicate and refuse to do so. Tree and Kd-Tree
indexes can optionally specify ascending or descending order for each element of
the index. The default is ascending.
The voluntary qualifier for an index means that the index can be initiated or
dropped at runtime. Voluntary indexes are not built until an explicit call to do so
is issued by the application. In the same fashion, the application may request to
remove a voluntary index.
The userdef qualifier for a tree index means that the application will provide the
compare functions, and thus control the collating sequence for the index.
The oid definition specifies that a class is stored with an oid of the type defined in
the declare oid statement. Only one oid statement is allowed per class.
eXtremeDB maintains a special index for oids stored in the database to facilitate
locating an object in the database by its oid value. Oids must be assigned a value
that is unique in the database (not just in the class, as is the case for hash and
unique tree indexes). The assignment is explicit—the DDL processor generates
object creation methods that enforce oid assignment by the application. The
runtime verifies that the oid is unique, and refuses to create an object if a
duplicate is found. The DDL processor also generates access methods based on
oid. The oid type is defined via the declare oid statement.
Classes that are defined without the oid qualifier do not have the requirement of
having one, but then lack oid-based access methods.
The list declaration generates access methods to perform a sequential scan of all
objects of a given class. The order in which such scanning is done is determined
by the runtime. Every class must have at least one oid, autoid, hash, or tree index,
or the list definition. The DDL processor will emit a warning otherwise. (Without
one of these, there would be no access method generated for the class.)
Event Statements
Event statements declare the events that the application is interested in. The
eXtremeDB database definition language provides the grammar that the database
designer uses to specify that applications should receive notification of certain
events occurring in the database. These events are adding a new object, deleting
an object, and updating an object or specified base-type fields of an object (n.b.
events are not supported for array and vector elements). Events are specific to
classes. In other words, an add event for class Alpha doesn’t activate the
notification mechanism when an Omega object is added to the database.
The schema grammar documents what events the application will be notified of.
How the application handles the events is determined at run-time by the event
interfaces. Please see the “Event Interfaces” section for the discussion of how
event handlers are registered and invoked.
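For illustration, the following class (hypothetical names, following the
event-statement grammar above) requests notification on object creation, object
deletion, and updates to one base-type field:

class alarm
{
uint4 severity;
string source;
event <new> alarm_added;
event <delete> alarm_removed;
event <severity update> severity_changed;
list;
};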
DDL Processor
The eXtremeDB Data Definition Processor, mcocomp, is executed as follows:
mcocomp [OPTIONS] ddlspec
ddlspec is the name of the text file containing the DDL specification (schema). It
can follow any naming convention of your choice.
OPTIONS DESCRIPTION
-o, -O Instructs the processor to generate the optimized version of the eXtremeDB
implementation files; otherwise the default (development) version is
generated. The optimized version generates inline functions, and replaces
some functions with macros that are put into the implementation header file
instead of the implementation “C” file.
-p, -P <path> Specifies the output directory. If the directory is not specified, the files are
written to the ddlspec file directory.
-i, -I <path> Specifies the include directory. If this path is not specified the compiler will
look only in the ddlspec file directory.
-hpp, -c++ Generates a C++ implementation file (.hpp).
-si Specifies verbose structure initialization. By default, the compiler generates
code of the form:
struct A { int i; int j; };
A a = {3,4};
Some C compilers will not accept this form of structure initialization so the
si switch will generate code of the form:
struct A { int i; int j; };
A a;
a.i = 3; a.j = 4;
-x, -X Generates XML methods: classname_xml_get, classname_xml_put,
classname_xml_create, classname_xml_schema.
-s, -S Suppresses copyright notice and timestamp console output.
-sql Generate additional metadata in the dictionary required to use the
eXtremeSQL programming interface.
-c, -compact Specifies the “compact” option for all classes in the database. 2-byte offsets
will be used for structures, variable length and optional fields in each class,
and all objects are limited in size to 64K (excluding BLOBs).
-persistent Makes all unspecified classes ‘Persistent’.
-transient Makes all unspecified classes ‘Transient’ (default).
-ws1 When one or more fields of type wchar or wstring are present in the schema,
generate 1-byte wchar strings (default).
-ws2 When one or more fields of type wchar or wstring are present in the schema,
generate 2-byte wchar strings.
-ws4 When one or more fields of type wchar or wstring are present in the schema,
generate 4-byte wchar strings.
-x32 Generate 32-bit pointers (default).
-x64 Generate 64-bit pointers.
-help Prints out usage information for mcocomp.
Note on wide-character strings: If one or more fields within a schema are of type
wchar or wstring, the character width for all wide-character strings should be
specified on the mcocomp command line. For example, the schema
class A
{
wchar<64> name;
wstring description;
};
requires one of the -ws options on the command line.
Once the application’s problems have been found and the application can consistently
pass verification tests, it would be a waste of clock cycles to continue checking
function parameters and supporting the debug traps. At this stage, developers can
utilize the optimized version of the eXtremeDB runtime.
Application Interface
eXtremeDB provides support for accessing persistent data inside transactions via
application-specific access methods. Currently, programming interfaces are
generated for the C/C++ language.
The interface consists of two parts. The first part is the group of functions that are
“static”, in other words common to all applications: the application-independent
“static” interface. The second part is the functions that are generated by the schema
compiler to provide type-safe data access methods for a particular schema: the
application-specific “generated” interface. The eXtremeDB runtime (i.e. all of the
referenced “static” and “generated” functions) is linked together with the
application code.
Also note that an application can simultaneously use multiple databases, each
with a different schema.
DDL Example
The following sample ddl code, “schema.mco”, illustrates the concepts described
in this chapter.
struct SampleStruct {
uint2 s1;
char<20> s2;
};
struct BigStruct {
string str;
uint2 u2;
uint4 u4;
vector <SampleStruct> vss;
};
declare database simple;
/* estimated number of class instances is in square brackets */
declare OID SampleStruct[20000];
/*
* “compact” keyword: Total object size, including overhead is less than 64K.
* Size calculation does NOT count size of blob(s) fields
* embedded in the class
*/
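/* (The opening of the class declaration is missing from this excerpt;
   a reconstruction consistent with the comments above and the members
   below might be the following, with the class name assumed:) */
compact class SampleClass
{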
/* oid reference */
ref d;
autoid;
oid;
list;
};
When using eXtremeDB’s disk-based persistence, i.e. when the database contains
some or all persistent class definitions, the application will link with the “disk”
library and thus the API functions called will internally call the appropriate
caching interface. Since “hybrid” databases or disk-only databases require more
detailed considerations such as disk caching behavior, file system dependencies,
data encryption and data integrity checking, the disk-based API functions are
described at the end of some sections under the tag “Persistent Databases”.
Runtime Environment
The eXtremeDB run-time environment is initialized by calling the function
mco_runtime_start(). This function initializes one or more semaphores that
coordinate access to the database dictionary between multiple processes, or
between multiple threads of a single process. Each process must call
mco_runtime_start() once, and only once.
MCO_RET mco_runtime_start(void);
MCO_RET mco_runtime_stop(void);
MCO_RET mco_runtime_info(/*OUT*/ mco_runtime_info_t *info);
The next function is provided to set various per-process global database runtime
options. This function must be called before the database runtime is initialized
via mco_runtime_start().
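As a minimal sketch of the lifecycle calls above (using only the functions shown;
error checking omitted):
int main(void)
{
mco_runtime_start();   /* once per process, before any other database call */
/* ... open databases, connect and perform database work here ... */
mco_runtime_stop();    /* release runtime resources at shutdown */
return 0;
}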
Device Management
eXtremeDB supports the notion of logical database devices. Logical database
devices are abstractions of physical storage locations that can be conventional
memory (static or heap allocated memory in the application address space),
shared memory (“named” memory shared by multiple processes), or persistent
file system memory such as a simple file, a multi-file or RAID file, or even a raw
disk partition.
The application determines the size and location of the memory device, the type
of memory and how the eXtremeDB runtime will use the device. Applications
specify storage devices at runtime via the devs structure argument passed to the
mco_db_open_dev() API. Typically, an array of device structures is stack-
allocated and initialized prior to calling mco_db_open_dev(). Each memory
device is defined by a mco_device_t structure that specifies:
• what purpose the memory region will serve (database, cache, disk file or
log file),
Persistent Databases
Database Control
This group of functions deals with the database control. The normal flow of
control is that the database (identified by its name) is opened or “created” (its
runtime “meta-data” is created from its dictionary) and is then connected to by an
application. Databases are “extendable”, ie. it is possible to increase the memory
size that is used for storage at runtime. Databases can also be streamed to storage
devices (“saved”) and initialized (“restored”) from data streams.
Creating Databases
All databases, whether all-in-memory, persistent or a hybrid database with both
transient and persistent classes, occupy main memory in the application’s static or
heap memory space. Additional main memory may be allocated for devices that
contain the database data for all-in-memory databases or for the cache used by
persistent and hybrid databases. Memory device specifications (as described
above in section “Device Management”) determine the type of memory
(conventional, shared or disk-based) used for the database data and for the log file
if Transaction Logging is used. And memory can be extended later if necessary.
When the database is created memory devices are initialized and meta-data, in the
form of the database dictionary that informs the eXtremeDB runtime about the
database structure, is loaded into memory. The runtime is initialized with several
user-specified parameters that determine runtime behavior while the application is
running or for the life of the database in the case of persistent databases. While
the mco_db_params_t structure and complete implementation details are
explained in the “Reference Guide”, the usage of key elements of this structure
are described below.
The db_log_type element specifies the logging strategy the runtime will use:
NO_LOG, UNDO_LOG or REDO_LOG. Transaction Logging is an alternative
to persistent disk storage that can provide database recovery (in case of system
failure) and data porting functionality for all-in-memory as well as persistent
databases. The db_log_type value is set to NO_LOG by default. If Transaction
Logging is used, the db_log_type value will be UNDO_LOG or REDO_LOG and
how transactions will be committed to disk is specified in the log_params
parameter. The choice of logging strategy can have a significant impact on
performance (see section “Choosing the Transaction Logging Strategy” in chapter
6 for a more detailed discussion).
Note: All databases that define persistent classes must use mco_db_open_dev()
and it is strongly recommended also for all-in-memory databases, even though for
compatibility purposes the old-style mco_db_open() API can still be used to
create all-in-memory databases.
The following code snippet demonstrates the use of device management and the
mco_db_open_dev() API for an application managing a single conventional
memory device:
mco_device_t dev;
mco_db_params_t db_params;
MCO_RET rc;
/* one conventional memory device; DATABASE_SIZE is an illustrative constant */
dev.type = MCO_MEMORY_CONV;
dev.assignment = MCO_MEMORY_ASSIGN_DATABASE;
dev.size = DATABASE_SIZE;
dev.dev.conv.ptr = malloc( DATABASE_SIZE );
mco_db_params_init( &db_params );
rc = mco_db_open_dev( dbname, simple_get_dictionary(), &dev, 1, &db_params );
if ( MCO_S_OK != rc ) {
/* unable to open the database; check the return code for additional information */
}
return 0;
Persistent Databases
eXtremeDB uses two separate security mechanisms to provide data security for
persistent databases: page-level CRC32 checking and database-level encryption
through a private key. These mechanisms can be used separately or in
combination. Both features are enabled at the time the database is created and
apply only to persistent databases. Once CRC checking or encryption is enabled
and the persistent database is allocated, security cannot be disabled in either the
current or any future session; both remain in effect for the lifetime of the database.
Also note that both mechanisms are page-level, hence all data and indexes are
protected. (For a more detailed description of CRC32 and Encryption see section
“Database Security” in chapter 6 below.)
The following code snippet demonstrates the usage of the device management and
the mco_db_open_dev() API for an application managing four storage devices: a
conventional memory device that is used for the RAM portion of the database,
another conventional memory device for the disk manager cache, and two
persistent devices, one for the database file and another for the transaction log.
The sample further uses encryption and CRC protection and defines the
transaction logging type as UNDO_LOG. (See section “Disk IO” in Chapter 6
for a detailed discussion of transaction logging and caching policies, and the
samples/core/02-open directory for other code samples.)
mco_runtime_start();
mco_runtime_stop();
return ( MCO_S_OK == rc ? 0 : 1 );
}
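Because the body of that example is abbreviated above, the following sketch
illustrates how the four devices might be declared. The constant names follow the
device API described earlier; the sizes and file names are illustrative, and the
encryption key and CRC settings are omitted because their mco_db_params_t
fields are version-specific:
mco_device_t dev[4];
mco_db_params_t db_params;
dev[0].type = MCO_MEMORY_CONV;               /* RAM part of the database */
dev[0].assignment = MCO_MEMORY_ASSIGN_DATABASE;
dev[0].size = DB_SIZE;
dev[0].dev.conv.ptr = malloc( DB_SIZE );
dev[1].type = MCO_MEMORY_CONV;               /* disk manager cache */
dev[1].assignment = MCO_MEMORY_ASSIGN_CACHE;
dev[1].size = CACHE_SIZE;
dev[1].dev.conv.ptr = malloc( CACHE_SIZE );
dev[2].type = MCO_MEMORY_FILE;               /* database file */
dev[2].assignment = MCO_MEMORY_ASSIGN_PERSISTENT;
strcpy( dev[2].dev.file.name, "mydb.dbs" );
dev[3].type = MCO_MEMORY_FILE;               /* transaction log file */
dev[3].assignment = MCO_MEMORY_ASSIGN_LOG;
strcpy( dev[3].dev.file.name, "mydb.log" );
mco_db_params_init( &db_params );
db_params.db_log_type = UNDO_LOG;            /* per the logging discussion above */
rc = mco_db_open_dev( dbname, mydb_get_dictionary(), dev, 4, &db_params );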
Database Connections
An application connects to the database (creates a database handle) by passing a
database name to the mco_db_connect() function. The database must have been
previously created with mco_db_open_dev(). By default, up to 64 simultaneous
connections are allowed to each single database (source code licensees can make
this number smaller or larger). If successful, this function returns a handle to the
database, which is used in subsequent database runtime calls such as transaction
control functions.
Note that applications must create a separate connection for each thread/task that
accesses the database; database connection handles cannot be shared between
different tasks.
Applications should disconnect from the database once the database connection is
no longer needed. Disconnecting the database allows the database runtime to “de-
allocate” internal memory used to maintain the connection. The database can’t be
destroyed (closed) until all active connections are closed.
Closing Databases
An opened database is closed (and, for an all-in-memory database, destroyed)
with the mco_db_close() function. It takes the database name as a parameter.
This function closes (destroys) the database created by the previous
mco_db_open_dev() (or legacy mco_db_open()) call. All connections must have
been previously closed in order for the function to succeed. Once closed, all of
the database’s transient data is lost.
Example Application
To put together what we’ve seen so far, the sequence of steps to start eXtremeDB,
open an all-in-memory database, connect to it, disconnect, close and clean up is
illustrated in the following code sample:
void StartDB(){
MCO_RET rc;
mco_device_t dev;
mco_db_params_t db_params;
mco_runtime_info_t info;
mco_db_h db;
/* ... start the runtime, set up the device and call mco_db_open_dev()
and mco_db_connect(), as shown above ... */
rc = mco_db_disconnect( db );
rc = mco_db_close( dbname );
if ( !info.mco_shm_supported )
free(dev.dev.conv.ptr);
mco_runtime_stop();
}
void DbAttach(){
MCO_RET rc;
mco_db_h db;
/* connect to db by dbname */
rc = mco_db_connect( dbname, &db );
if ( rc ) {
printf("\n Could not attach to instance: %d\n", rc);
exit( 1 );
}
rc = mco_db_disconnect( db );
rc = mco_db_close( dbname );
mco_runtime_stop();
}
Separate threads (tasks) within the same process only need to connect to the
database. The run-time is only started one time and the error handler is only set
one time per process:
void DbAttach(){
MCO_RET rc;
mco_db_h db;
/* connect to db by dbname */
rc = mco_db_connect( dbname, &db );
if ( rc ) {
printf("\n Could not attach to instance: %d\n", rc);
exit( 1 );
}
rc = mco_db_disconnect( db );
}
MCO_RET rc;
char * dbname = "a database";
int size = 1*1024*1024;
mco_device_t extdev;
MCO_RET rc;
char * dbname = "a database";
int size = 1*1024*1024;
void * memory = malloc( size );
mco_db_save / mco_db_load
mco_inmem_save / mco_inmem_load
mco_disk_save / mco_disk_load and mco_disk_load_file
These functions provide compatibility checking between saved and loaded
versions and optional CRC32 checking to assure the integrity of the data.
The eXtremeDB runtime’s internal buffering uses the buffer size defined by the
constant MCO_STREAM_BUF_SIZE in file mcocfg.h (the default buffer size of
16 Kbytes can be altered to meet the application’s needs). When the buffer size is
reached, the runtime calls the user-defined callback function to read or write. The
callback function simply returns a value to the eXtremeDB runtime indicating the
number of bytes processed or an error.
Binary Evolution
Note: An auto_oid must be declared in the database to use the BSE option. If
the database has no auto_oid and option MCO_RT_OPTION_DB_SAVE_BSE is set,
then functions mco_db_save() and mco_db_load() will return error code
MCO_E_UNSUPPORTED.
Persistent Databases
While mco_db_save() copies all database objects (both persistent and transient),
additional functions mco_inmem_save() and mco_disk_save() are provided to
allow only transient or only persistent classes to be written to a stream. For all-in-
memory databases mco_inmem_save() is equivalent to mco_db_save() (except
that option MCO_RT_OPTION_DB_SAVE_BSE applies only to mco_db_save()),
but for “hybrid” databases this allows the application to stream a snapshot of only
the transient objects in the database. It may be useful to note that
mco_inmem_save() can be used for hybrid databases to make a “light snapshot”
as the application shuts down; this way the persistent part of database is already
stored in main database files and transient part is streamed in a separate file or to
another storage device.
Applications can write the content of only the persistent classes to a stream using
the function:
The calling application provides a stream pointer (a pointer to a file, socket, pipe,
etc.) and the address of the user-defined function that will perform the actual
writes of the stream of bytes representing the database image. eXtremeDB will
call this user-defined function, passing it the stream pointer, a buffer and the size
of the buffer to be written to the destination.
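A sketch of such a callback, assuming a plain FILE stream and that the
mco_stream_write return value is the number of bytes written (error checking
omitted; the file name is illustrative):
mco_size_sig_t file_writer( void *stream_handle, const void *from, mco_size_t nbytes )
{
/* write the next chunk of the database image to the file */
return (mco_size_sig_t) fwrite( from, 1, nbytes, (FILE*)stream_handle );
}
...
FILE *f = fopen( "dbimage.bin", "wb" );
rc = mco_db_save( (void*)f, file_writer, dbname );
fclose( f );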
Persistent Databases
While mco_db_load() loads all database objects (both persistent and transient),
additional functions mco_inmem_load() and mco_disk_load() are provided to
allow only transient or only persistent classes to be loaded (read from the stream).
The function mco_db_open_dev() is called internally to open/create an empty
database then database objects are created from the data stream.
Applications can load only persistent classes from a stream using the function:
MCO_RET mco_disk_load( /*IN*/ void* stream_handle,
/*IN*/ mco_stream_read input_stream_reader,
/*IN*/ const char *dbname,
/*IN*/ mco_dictionary_h dict,
/*IN*/ mco_device_t *devices,
/*IN*/ uint2 n_devices,
/*IN*/ mco_db_params_t * db_params );
It is the application’s responsibility to open the stream, in the proper mode to read
binary data, and to ensure that there is adequate memory to hold the database.
The mco_disk_load_cache() function loads the database disk cache from the
specified file.
Both functions take a connection handle as a parameter. The first function returns
the total number of free pages while the second returns the total number of
available (originally allocated) pages.
The next function returns the number of indexes in the database. It must be called
in the context of a read-only transaction and is often used in conjunction with the
mco_index_stat_get() API to obtain index statistics at runtime:
Runtime statistics are reported for the given index in the mco_index_stat_t
structure:
For a more detailed explanation of the index statistics see the Reference Guide.
Persistent databases
The following function returns information about the current state of the database
and log file: the size of the log file in bytes, the size of the database file in bytes
and the amount of space that is actually used in the database file.
Database Calculator
Concurrency Control
Concurrency is defined as the ability for multiple tasks to access shared data
simultaneously. The greater the number of concurrent tasks that can execute
without interfering with each other, the greater the concurrency of the database
system. Database concurrency control mechanisms are implemented through
database transactions. eXtremeDB Transaction Managers ensure that database
transactions are performed concurrently without violating data integrity and that
transactions adhere to ACID principles (see http://en.wikipedia.org/wiki/ACID).
Concurrency Management
There are traditionally two models for database concurrency: optimistic and
pessimistic. Pessimistic concurrency control works on the assumption that data
modification operations are likely to affect any read operation made by a different
task; the database system pessimistically assumes that a conflict will occur.
eXtremeDB behavior when using pessimistic concurrency control is to use locks
and block access to the database when any data is modified.
• EXCLusive: one task at a time may access the database for reading or writing.
MURSIW
The MURSIW (MUltiple Readers, SIngle Writer) transaction manager makes the
most of the environment in which it executes. Fully supporting the ACID
principles, it takes advantage of the limited multi-tasking nature of embedded
applications. Usually there are few simultaneous tasks executing, rarely do
simultaneous tasks require write access to the object store, and transactions are
small. So it is practical to minimize the footprint and simplify the implementation
by eliminating complex transaction synchronization and enforcing serialization
of write transactions.
MVCC
The MVCC (Multi-Version Concurrency Control) transaction manager allows
multiple transactions to proceed concurrently on the same set of objects or
indexes, thus providing transaction isolation for each transaction. The MVCC
manager allows applications to choose how transactions are isolated from each
other by setting the transaction isolation level at runtime.
Locking optimization
eXtremeDB uses two kinds of synchronization primitives: latches and locks. The
first kind (the latch) is a lightweight lock implemented with atomic instructions;
it is used, for example, in b-tree indexes to lock branches. The second kind (the
lock) is a full-size synchronization primitive implemented with kernel locks (and
lightweight atomics for performance, where possible). One lock is used for the
eXtremeDB registry and database header; all other locks applied during
transaction processing depend on the choice of Transaction Manager:
The developer has complete control over the choice of transaction manager and
lock implementation by linking one or another of the transaction manager and
synchronization libraries. Most likely the choice will be
between MURSIW and MVCC based on the characteristics of the application. If
it’s mostly read-only with occasional updates, then MURSIW could be the best
choice. If there are a relatively high number of concurrent processes/threads
attempting to modify the database at the same time, then MVCC could be the
better choice. Or if the application is single-threaded, concurrency is not an issue
and clearly it will perform best with the EXCLusive transaction manager. One
can experiment between the transaction managers by just linking the appropriate
library. No application code changes are needed (except to handle conflict errors,
if MVCC is ultimately the choice).
Database systems usually offer a number of transaction isolation levels that
define the degree to which one transaction must be isolated from data
modifications made by other transactions. In fact, the ANSI/ISO SQL standard
defines four levels of transaction isolation: Read Uncommitted, Read Committed,
Repeatable Read and Serializable. The eXtremeDB MVCC transaction manager
supports three of these.
Read Committed. When this level is used a transaction always reads committed
data. The transaction will never read data that another transaction has changed
and not yet committed, but it does not ensure that the data will not be changed
before the end of the transaction.
a=1 b=2
t1: a = a + 1; c = a + b
t2: b = b + 2; d = a + b
If t1 and t2 are serialized, and t1 is executed before t2 the result is c=4 d=6;
If t1 and t2 are serialized, and t1 is executed after t2 the result is c=6, d=5;
In the case of Repeatable Read (snapshot), if t2 starts and ends in between the
start and end of the t1, the result of the concurrent execution is c=4 and d=5 and it
does not correspond to any of the serialized results.
Transaction Priority
eXtremeDB supports transaction priorities; it is possible to assign a priority value
to each transaction at runtime. At the time a transaction is registered with the
runtime, the transaction scheduler checks the priority value and shifts the
transaction forward or backwards in the transaction queue. With the MURSIW
transaction manager, applications normally execute as foreground transactions,
Transaction API
Transactions are started by calling one of two eXtremeDB functions:
mco_trans_start() or mco_trans_start_ex(). The second differs only in
that it allows setting the isolation level for the transaction.
After navigating the database to find a desired object, often an application will
need to update the found object. This requires initiating a ReadOnly transaction
to search for the desired object, and then, once found, upgrading the transaction
from ReadOnly to ReadWrite in order to update the record. This is accomplished
by calling the following function:
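The function itself is not reproduced here; assuming it follows the common
naming mco_trans_upgrade() (treat the name and exact signature as an
assumption), the pattern would be:
mco_trans_start( db, MCO_READ_ONLY, MCO_TRANS_FOREGROUND, &t );
/* ... locate the desired object ... */
rc = mco_trans_upgrade( t );        /* promote ReadOnly to ReadWrite */
if ( MCO_S_OK == rc ) {
/* ... update the found object ... */
}
mco_trans_commit( t );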
If an error occurs during a transaction, the transaction enters an error state and
subsequent operations within that transaction will return MCO_E_TRANSACT.
In this case, to obtain the error code of the operation that initially caused the error
condition, use the following function:
MCO_RET mco_get_last_error(/*IN*/ mco_trans_h t);
Isolation Levels
MCO_DEFAULT_ISOLATION_LEVEL = 0x0,
MCO_READ_COMMITTED = 0x1,
MCO_REPEATABLE_READ = 0x2,
MCO_SERIALIZABLE = 0x4
It is possible to redefine the default transaction isolation level for the database
session. This is done via the following function:
MCO_TRANS_ISOLATION_LEVEL mco_trans_set_default_isolation_level(
mco_db_h db,
MCO_TRANS_ISOLATION_LEVEL level );
The application can inspect what transaction isolation levels are supported by the
currently running transaction manager via the following API function:
int mco_trans_get_supported_isolation_levels();
Conflicts
mco_trans_h t;
MCO_RET rc;
do {
mco_trans_start( db, MCO_READ_WRITE, MCO_TRANS_FOREGROUND, &t );
...<update database>...
rc = mco_trans_commit(t);
} while ( rc == MCO_E_CONFLICT );
Note: When the MVCC transaction manager is used, the application must be
able to tolerate transaction rollbacks due to conflicts as described above.
Persistent databases
When the MVCC transaction manager is used, in the case of a system crash, a
persistent database can contain undeleted old versions and working copies. Their
presence will not break the consistency of the database and does not prevent the
normal operation of the application, but it does unnecessarily consume space.
Detecting these stale object versions requires a complete scan of the database; for
this reason the automatic recovery process does not perform this cleanup
automatically.
eXtremeDB provides two methods for the removal of these unused versions. The
application can enable the repair process by setting the mode mask in the
mco_db_params_t structure to the value MCO_DB_MODE_MVCC_AUTO_VACUUM
when calling mco_db_open_dev(). Or the repair can be performed explicitly by
calling the following API function:
Two-phase Commit
Some applications require a more elaborate control of the transaction commit
processing; specifically, committing the transaction in two steps (phases). The
first phase writes the data into the database, inserts new data into indexes and
checks index restrictions (uniqueness) (altogether, the “pre-commit”) and returns
control to the application. The second phase finalizes the commit.
Please note that in order to use the two-phase commit the eXtremeDB run-time
must be two-phase commit enabled when the run-time is compiled. The two-
phase commit is enabled by the following #define in the mcocfg.h file:
#define MCO_CFG_2PHASE_COMMIT
In order to perform the two-phase commit, the application calls the commit
phases sequentially instead of calling mco_trans_commit(). After the first
commit phase returns, the application cannot perform any activities against the
database (except initiating the second commit phase or rolling back the
transaction). This process is illustrated in the following code segment:
…
if ( (mco_trans_commit_phase1(t1) == MCO_S_OK) &&
(global_transaction() == SUCCESS) )
{
mco_trans_commit_phase2(t1);
}
else
{
mco_trans_rollback(t1);
}
Note: At the time of writing, the two-phase commit API is not available for the
MVCC transaction manager.
...
rc = mco_trans_commit_phase1(trans);
if (rc == MCO_S_OK) {
/* commit to external database */
rc = mco_trans_iterate(trans, &my_iterator_callback,
my_iterator_context);
if (rc == MCO_S_OK) {
/* external commit succeeded */
mco_trans_commit_phase2(trans);
} else {
mco_trans_rollback(trans);
}
}
Pseudo-nested Transactions
Nested transactions might be necessary when two different application functions
may be called separately or call each other. To facilitate transaction nesting
eXtremeDB allows an application to call mco_trans_start() or
mco_trans_start_ex() before the current transaction is committed or aborted.
The runtime maintains an internal counter that is incremented each time a
transaction is started and decremented on each commit or rollback; the actual
commit occurs only when the counter returns to zero.
/* now commit the transaction to complete the insert of the first object */
return mco_trans_commit(t);
}
rc = Transaction_new(t, &trans);
if ( MCO_S_OK != rc )
{
mco_trans_rollback(t);
return 0;
}
Transaction_from_put(&trans, from);
Transaction_to_put(&trans, to);
return mco_trans_commit(t);
}
…
/* perform a simple nested transaction... */
uint4 from1 = 11, to1 = 16, from2 = 7, to2 = 17;
rc = insert_two(db, from1, to1, from2, to2);
sample_rc_check("\t Insert two objects", rc );
mco_db_disconnect(db);
}
sample_close_database(db_name, &dbmem);
}
mco_runtime_stop();
sample_pause_end("\n\n Press any key to continue . . . ");
return ( MCO_S_OK == rc ? 0 : 1 );
}
Cursor Control
eXtremeDB supports the traditional definition of cursors. A cursor is an entity that
represents a “result set”, a sequence of objects, associated with an index. The
cursor is created by a _search or _find operation (see chapter 5 “Generated API”)
and its position within the result set is changed using cursor navigation functions.
Applications use cursors and the cursor API to navigate through the database and
read or update database objects. A special data type, mco_cursor_h, is used to
reference a cursor.
Note: If the indexed field(s) with which the cursor is associated are updated,
changing the object’s position within the indexed objects, the cursor will also
change position when the transaction is committed or the object(s) are
checkpointed.
A cursor is obtained by one of the two following generated functions (see chapter
5 “Generated API”):
The next two functions are used to position a cursor. These functions must be
called in the context of a transaction (the first parameter) and accept a cursor as
the second parameter. In the event of a cursor being invalid, the error code
MCO_E_CURSOR_INVALID is returned.
The next two functions are used to navigate a cursor. These functions must be
called in the context of a transaction (the first parameter) and accept a cursor as
the second parameter. In the event of a cursor being invalid, the error code
MCO_E_CURSOR_INVALID is returned. If the end of the result set has been
reached, the status code MCO_S_CURSOR_END is returned.
The following function determines whether a cursor is valid and, if so, whether it
was created from a sequential search (list), hash- or tree-based search.
The following function determines the type of object pointed to by the current
cursor, returning the numeric value assigned to a class. (This value is defined in
the interface header file created by the schema compiler.)
MCO_RET mco_cursor_get_class_code(
/*IN*/ mco_cursor_h c,
/*OUT*/ uint2 *classcode);
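Putting these pieces together, a typical read-only iteration looks like the
following sketch (classname stands for a class with a list index; the generated
cursor and from-cursor function names follow the patterns described above):
mco_trans_h t;
mco_cursor_h c;
classname obj;
MCO_RET rc;
mco_trans_start( db, MCO_READ_ONLY, MCO_TRANS_FOREGROUND, &t );
classname_list_cursor( t, &c );               /* obtain a cursor */
for ( rc = mco_cursor_first( t, &c );
      MCO_S_OK == rc;
      rc = mco_cursor_next( t, &c ) )
{
classname_from_cursor( t, &c, &obj );         /* object at cursor position */
/* ... read fields via the generated _get methods ... */
}
mco_trans_commit( t );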
Before the application can call the memory management functions, the memory
object (heap) must be created. When the memory object is no longer needed it
should be destroyed to prevent memory leaks. The memory management API
provides standard C-style heap management.
int mco_heap_head_size(void);
The following code fragment illustrates the steps to prepare a heap with
mco_heap_head_size() and mco_heap_init().
void *start_address;
mco_heap_h memory_handle;
mco_runtime_start();
Basic Interfaces
oid-based Interfaces
If an oid is declared in the database DDL, the compiler generates a C structure
that represents the oid, and five interfaces. For example, if the schema contains
the oid declaration:
struct structname{
uint4 num_in;
};
declare oid structname[10000];
MCO_RET databasename_delete_object(
/*IN*/ mco_trans_h t,
/*IN*/ const databasename_oid * oid );
MCO_RET databasename_get_class_code(
/*IN*/ mco_trans_h t,
/*IN*/ const databasename_oid * oid,
/*OUT*/ uint2 *classcode );
The first method allows deletion of an object based on its oid, while the second
returns an integer that identifies the class of the object referenced by oid.
For classes declared with oid, two additional methods are generated: one that
locates an object based on its oid and another that obtains the oid of a known
object:
MCO_RET classname_oid_find(
/*IN*/ mco_trans_h t,
/*IN*/ const databasename_oid *id,
/*OUT*/ classname *handle );
autoid Interfaces
If a class is declared to have an autoid, the schema compiler will generate two
methods for the class:
/* database schema */
class referenced {
…
autoid[4000];
…
};
class referencing {
…
autoid_t refd_object;
…
};
autoid_t id;
mco_trans_start( db,
MCO_READ_WRITE,
MCO_TRANS_FOREGROUND,
&t );
The preceding code fragment shows the application code sequence of creating a
new object in the database that has the autoid attribute, retrieving the system-
assigned autoid value, and storing that value in a field of an object of another
class. Later, the autoid value is extracted from the referencing object and used to
locate the referenced object through its _autoid_find function.
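A condensed sketch of that sequence, using the two classes above and the
generated-function naming patterns (which are assumptions here), is:
referenced ref_obj;
referencing refg_obj;
autoid_t id;
mco_trans_h t;
mco_trans_start( db, MCO_READ_WRITE, MCO_TRANS_FOREGROUND, &t );
referenced_new( t, &ref_obj );                  /* runtime assigns the autoid */
referenced_autoid_get( &ref_obj, &id );         /* retrieve the assigned value */
referencing_new( t, &refg_obj );
referencing_refd_object_put( &refg_obj, id );   /* store the reference */
mco_trans_commit( t );
/* ... later, resolve the reference ... */
referencing_refd_object_get( &refg_obj, &id );
referenced_autoid_find( t, id, &ref_obj );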
event Interfaces
Embedded applications can be designed to respond to data events such as
creating, updating or deleting database objects. The eXtremeDB data definition
language provides grammar that the database designer uses to specify that
applications receive notification of the following database events: adding a new
object; deleting an object or all objects of a class; checkpoint events; and
updating an object or a specified field of an object. Events are specific to classes.
In other words, an add event for class Alpha doesn’t activate the notification
mechanism when an Omega object is added to the database.
The schema grammar describes what events will trigger application notifications.
How the application handles the events is determined at run-time by the event
interfaces. Events can be handled in two ways: synchronously and/or
asynchronously.
There is a small window of possibility for another instance of the event to occur
before the event handler has completed its task and can again wait on the event
(events are not queued). This window can be minimized if the handler delegates
the processing of the event to yet another thread, allowing the handler thread to
immediately wait on the event again. If no risk of an unhandled event can be
tolerated, either a synchronous event handler can be used, or the application can
maintain a separate table of unhandled events. Asynchronous events are activated
after the transaction commits. If, within the scope of a single transaction, several
objects are added, or deleted, or several fields are updated which have event
handlers waiting, all the handlers will be activated simultaneously.
Synchronous event handlers are called within the context of the same thread that
caused the event. Care should be taken not to cause extraordinary delays because
the handler has control of a transaction that, by definition, is a write transaction.
Specifically, the handler should not block on an indeterminate external event such
as user input.
Update events can be defined for a class (i.e. all fields of that class) or for a
specific field of a class by specifying the field name in the event declaration. As
with checkpoint events, the application must specify through the handler
registration interface whether the handler will be invoked before or after a field is
updated. Update handlers are activated by any interface method that will cause a
field’s contents to change, for example, classname_fieldname_put(),
classname_fieldname_erase(). If the event handler is called before the update
and the handler invokes classname_fieldname_get() on the field, it will
retrieve the current value in the database. Conversely, if the event is called after
the update, the handler will retrieve the value the application just put in the
database. The user-defined parameter can be used to provide additional
information to the handler such as the incoming value for a before-event handler,
the old value for an after-event handler, or a vector offset for an erase operation.
Note: Both synchronous and asynchronous handlers can be applied to any given
event. When using eXtremeDB in shared memory, synchronous event handlers
must belong to the same process that caused the event, or the results will be
unpredictable. In particular, do not register a synchronous event handler for class
Alpha in process A if it is possible that process B will insert/update/delete Alpha
objects. Use an asynchronous event handler instead.
Note: For update events, a class-wide event cannot be combined with field update
events for the same class.
The following code fragments illustrate the use of event handling. A sample
schema definition for a class with event notifications follows:
class dropped_call
{
uint4 trunk_id;
…
autoid;
/* event declarations reconstructed from the generated names below */
event <new> add_trunk;
event <update trunk_id> upd_trunk;
event <checkpoint> checkpoint_trunk;
event <delete> del_trunk;
};
The above class will cause the following definitions to be generated in the
interface header file:
#define upd_trunk 15
// 15 is only illustrative; the actual value is not important
#define add_trunk 16
#define checkpoint_trunk 17
#define del_trunk 18
MCO_RET mco_register_upd_trunk_handler(
/*IN*/ mco_trans_h t,
/*IN*/ mco_upd_trunk_handler,
/*IN*/ void *param,
/*IN*/ MCO_HANDLING_ORDER when);
MCO_RET mco_register_add_trunk_handler(
/*IN*/ mco_trans_h t,
/*IN*/ mco_add_trunk_handler,
/*IN*/ void *param);
MCO_RET mco_register_checkpoint_trunk_handler(
/*IN*/ mco_trans_h t,
/*IN*/ mco_checkpoint_trunk_handler,
/*IN*/ void *param,
/*IN*/ MCO_HANDLING_ORDER when);
MCO_RET mco_register_del_trunk_handler(
/*IN*/ mco_trans_h t,
/*IN*/ mco_del_trunk_handler,
/*IN*/ void *param);
MCO_RET mco_unregister_upd_trunk_handler(
/*IN*/ mco_trans_h t,
/*IN*/ mco_upd_trunk_handler);
MCO_RET mco_unregister_add_trunk_handler(
/*IN*/ mco_trans_h t,
/*IN*/ mco_add_trunk_handler);
MCO_RET mco_unregister_checkpoint_trunk_handler(
/*IN*/ mco_trans_h t,
/*IN*/ mco_checkpoint_trunk_handler);
MCO_RET mco_unregister_del_trunk_handler(
/*IN*/ mco_trans_h t,
/*IN*/ mco_del_trunk_handler);
To employ an asynchronous handler for one of the events above, the application
would create a thread and, within the thread function, call:
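(presumably, with the function name mco_event_wait treated as an assumption:)
rc = mco_event_wait( dbh, upd_trunk );   /* blocks until the event fires */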
Where ‘dbh’ is the database handle from the mco_db_connect() method and
‘upd_trunk’ is the value defined in the generated interface file to reference the
event of interest.
For the preceding class definition and its generated interfaces, the following code
fragments illustrate the use of synchronous event handling.
First, the application must register its event handler functions by calling a
function like the following:
int register_events(mco_db_h db)
{
MCO_RET rc;
mco_trans_h t;
mco_trans_start(db,
MCO_READ_WRITE,
MCO_TRANS_FOREGROUND,
&t);
mco_register_add_trunk_handler(t, &new_handler,
(void*) 0);
mco_register_checkpoint_trunk_handler(t, &checkpoint_handler,
(void*) 0,
MCO_BEFORE_UPDATE );
mco_register_del_trunk_handler(t, &delete_handler,
(void *) 0);
mco_register_upd_trunk_handler( t, &update_handler1,
(void *) 0,
MCO_BEFORE_UPDATE );
rc = mco_trans_commit(t);
return rc;
}
The bodies of the handler functions would look like the following:
/* Handler for the "<new>" event. Reads the autoid and prints it out
*/
MCO_RET new_handler( /*IN*/ mco_trans_h t,
/*IN*/ dropped_call * obj,
/*IN*/ MCO_EVENT_TYPE et,
/*INOUT*/ void *param)
{
int8 u8;
/* generated accessor; naming pattern assumed */
dropped_call_autoid_get( obj, &u8 );
printf( "\n\tNew dropped_call: autoid=%lld", (long long)u8 );
param = (int *)1;
return MCO_S_OK;
}
/* Handler for the "<delete>" event. Note that the handler
* is called before the current transaction is committed.
* Therefore, the object is still valid; the object handle
* is passed to the handler and is used to obtain the
* autoid of the object. The event's handler return value
* is passed into the "delete" function and is later
* examined by the mco_trans_commit(). If the value is
* anything but MCO_S_OK, the transaction is rolled back.
* In this sample every other delete transaction is
* committed.
*/
MCO_RET delete_handler( /*IN*/ mco_trans_h t,
/*IN*/ dropped_call * obj,
/*IN*/ MCO_EVENT_TYPE et,
/*INOUT*/ void *user_param)
{
int8 u8;
static int count = 0;
dropped_call_autoid_get( obj, &u8 );  /* naming pattern assumed */
/* any value other than MCO_S_OK rolls the transaction back */
return ( ++count % 2 ) ? MCO_S_OK : MCO_E_CONFLICT;
}
When the application is finished handling events, the events are unregistered by
calling a function like the following:
/* ... mco_unregister_*_handler() calls, mirroring register_events() ... */
rc = mco_trans_commit(t);
return rc;
}
The _delete functions permanently remove the object whose handle is passed
while the _delete_all functions remove all objects of the class from the database.
Memory pages occupied by the object(s) are returned back to the memory
manager, and can be re-used.
Action Interfaces
Individual database objects are accessed by passing the object handle returned by
a _new, _find or _from_cursor function to specific action functions. For each
field of an object and for each element of a structure declared in the schema file,
both _put and _get methods are generated. The semantic rules that the compiler
follows to generate _put and _get function names are simple: the class or structure
name is followed by the field name and then the action word; all separated by
underscores. Action words can be any of the following: put, get, at, put_range,
get_range, alloc, erase, pack and size.
The _put functions are called to update specific field values. Depending on the
type of field, the generated _put function will be one of two forms: for scalar type
fields it will be of the form:
while for char and string fields a pointer and length argument are required:
The _get functions are called to bind a field of an object to an application variable
and the form of the function will vary depending on the type of field. For scalar
type fields it will be of the form:
And if the field is a string then the function takes two extra parameters: the size of
the buffer to receive the string, and an OUT parameter to receive the actual
number of bytes returned. So the generated function will have the form:
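Following the naming rules above, the generated prototypes presumably take
forms like the following (a sketch; <type> stands for the field’s C type):
MCO_RET classname_fieldname_put( classname *handle, <type> value );                       /* scalar put */
MCO_RET classname_fieldname_put( classname *handle, const char *src, uint2 len );         /* char/string put */
MCO_RET classname_fieldname_get( classname *handle, <type> *result );                     /* scalar get */
MCO_RET classname_fieldname_get( classname *handle, char *buf, uint2 bufsz, uint2 *len ); /* string get */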
If a class has one or more indexes, then the field(s) on which the index is defined
will have an index component (hash table entry or tree node) in addition to the
actual field value. The index component is not inserted when the field’s _put
function is called, but rather when the write transaction containing this update is
committed. Or, alternatively, a _checkpoint function can be called to explicitly
create the index components for this object. The _checkpoint function completes
the object’s update before the transaction is committed, however if the application
decides to rollback the current transaction, all the updates for the object including
index components are discarded. (Committing a transaction implicitly
checkpoints all the objects modified (created/updated/deleted) in the transaction.)
The sequence, then, to create objects, is to _new space for the object, _put one or
more field values into the object, and optionally _checkpoint the object to create
index components. If a unique index constraint is violated, the checkpoint
method will return an appropriate code.
For fields of type string, an additional _size function is generated to return the
actual size of the string value:
MCO_RET classname_fieldname_size( classname *handle, uint2 *retlen );
The functions that operate on vector and array fields require an index argument
but are otherwise functionally equivalent to their scalar counterparts. The _put
functions for fields declared as vector or fixed-size array have the form:
For fixed length arrays and vectors additional _put_range methods are generated
to assign an array of values to a vector or array. The size of the IN array should be
less than or equal to the size of the vector as specified in the prior _alloc function
call, or the size of the array as defined in the database definition.
MCO_RET classname_fieldname_put_range(
/*IN*/ classname *handle,
/*IN*/ uint2 start_index,
/*IN*/ uint2 num,
/*IN*/ const <type> *src );
Please note that _put_range methods are only generated for vectors that consist of
simple scalar elements. For vectors of structures this method is not generated. The
reason is that for simple type vector elements the schema compiler can generate
optimized methods to assign values to them. This optimization is only possible if
the size of the vector element is known at compile time. Also note that it is never
necessary to use a _put_range method to set the vector; the _put function can
always be iterated to assign individual vector element values for the desired
range.
To access a specific element of a vector, the _at functions are provided. The form
of the _at function will vary depending on the type of elements stored in the
vector. For vectors of fixed-length fields it will have the form:
When allocating memory for vectors (see _alloc function below) of variable
length elements, it may be necessary to first determine the actual size of the
vector elements. The _at_len functions are generated for vectors of strings for
this purpose:
MCO_RET classname_fieldname_at_len(
/*IN*/ classname *handle,
/*IN*/ uint2 pos,
/*OUT*/ uint2 *retlen);
MCO_RET classname_fieldname_get_range(
/*IN*/ classname *handle,
/*IN*/ uint2 startIndex,
/*IN*/ uint2 num,
/*OUT*/ <type> *dest);
The _alloc functions reserve space for a vector field. The application must call the
_alloc function and supply the size of the vector in order to allocate a vector field
within a data layout. Invoking the _alloc function for a vector field of an existing
object will resize the vector. If the new size is less than the current size the vector
is truncated to the new size.
The _erase functions remove an element of a vector from the layout (and all
indexes the element is included in). Note that the vector size remains unchanged.
If an attempt is made to get the erased element, the runtime returns a null pointer
as a result. Also note that the erase method is only generated for vectors of
structures, not for vectors of basic types or strings. For vectors of basic types and
strings, the application should _put a recognizable value in the vector element that
it can interpret as null. (Note that the _erase functions are also generated for
optional struct fields.)
As may have been noticed, use of the _erase functions can leave unused elements
(“holes”) in vector fields. For this reason, the _pack functions are generated for
vector fields to remove “holes” so that the space occupied by the deleted element
is returned to the free database memory pool.
Often database classes contain many fields, with the consequence that fetching
and storing these objects requires a long sequence of _get and _put function calls
for each individual field. To simplify the coding work, the schema compiler
generates a C-language structure for all scalar fields and fixed-length arrays, and
additional _fixed_get and _fixed_put functions that can significantly reduce the
number of function calls required. But, as the name indicates, these functions can
only be generated for the fixed-size fields of a given class. If a class contains
fields of variable length (i.e. string or blob fields) then these fields must be
accessed with their individual _get and _put functions.
Using these functions, objects of the Record class can be written with two
function calls: Record_fixed_put() for the fixed size portion and Record_s_put()
for the variable length field of type string “s”. Similarly, the objects of this class
can be read with two function calls: Record_fixed_get() and Record_s_get().
Sample
To illustrate the generated functions described thus far, let’s consider the
following sub-schema:
#define SYMBOL_LEN 4
#define SYMBOL char<SYMBOL_LEN>
#define uint4 unsigned<4>
#define uint2 unsigned<2>
#define uint1 unsigned<1>
struct Id
{
uint4 num_in;
uint4 time_in;
};
class TestOne
{
vector< SYMBOL > vchars;
vector<Id> vids;
vector< signed<4> > vlong;
uint2 v2;
oid;
};
For the TestOne class, the DDL compiler will yield the following access methods:
/* oid-based methods */
MCO_RET TestOne_oid_find( /*IN*/ mco_trans_h t,
/*IN*/ const market_id *id,
/*OUT*/ TestOne *handle );
/*-------------------------------------------------------*/
/* struct Id methods */
Block Allocation
For persistent classes, eXtremeDB provides a “block allocator” to facilitate
objects’ locality of references, i.e. to assure that a group of objects will be stored
in a contiguous block on disk. This capability is provided in the form of a
generated API function classname_set_allocation_block_size(). The application
calls this function within a READ_WRITE transaction, with a block size sufficient to
store a group of objects of the given class (see the sketch following the list
below). This provides two benefits:
• It keeps the entire object within the same block, greatly improving the
read performance for complex objects
• It keeps groups of objects that had been added to the database at the same
time stored together within the same block, improving the sequential
access performance for objects of the same class
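A sketch of the call sequence (the parameter list is assumed from the description
above; the block size is illustrative):
mco_trans_start( db, MCO_READ_WRITE, MCO_TRANS_FOREGROUND, &t );
classname_set_allocation_block_size( t, 16*1024 );  /* block sized for a group of objects */
/* ... create the objects of this class ... */
mco_trans_commit( t );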
Note: Database file fragmentation can also be greatly reduced by use of the
file_extension_quantum element of the mco_db_params_t structure used when
opening or extending the database (see function mco_db_open_dev()). A non-
zero value for file_extension_quantum will cause the database runtime to allocate
space in the database file by the specified number of bytes as opposed to by the
page size (typically 4K, 8K, or 16K).
Note: When objects are deleted, the space is returned back to the database pool
and can be reused for indexes and other objects. But this space will not be reused
for objects of the same class because these new objects are allocated by blocks.
Thus, deleting objects does not reduce the locality of references.
Note: The block allocator works well when objects are not updated because
when a dynamic object is updated it is possible that a part (the variable length
part) of the object will be allocated outside the block. To lessen this effect, the
runtime always attempts to allocate the entire object on the same page (Disk
Manager page) as the object header.
The eXtremeDB DDL language supports a collation declaration for tree and hash
indexes on string-type fields as follows:
[unique] tree<string_field_name_1 [collate C1]
[, string_field_name_2 [collate C2]], …> index_name;
The application registers user-defined collations via the following API function:
mco_db_register_collations(dbname, mydb_get_collations());
Examples
Example 1:
File “schema.mco”:
class A {
string name;
tree <name collate Cname> tname;
};
The keyword collate is used in the schema definition file “schema.mco” to
indicate that a tree index tname is to be generated on string field name, using
collation Cname. This DDL instructs the database runtime to use a custom rule
named ‘Cname’ to compare the string field ‘name’. Note that the same collation
(rule) can be used multiple times in the same index, in different indexes within the
same class, or in different classes.
Example 2
File “schema.mco”:
declare database mydb;
/* the index declarations below are illustrative reconstructions */
class A {
string s;
char<20> c;
tree <s collate C1> ts;
hash <c collate C1> hc[1000];
};
class B {
string s;
nchar<20> nc;
tree <nc collate C2> tnc;
};
Note that in class A the same collation (“C1”) can be used in a tree and hash
index, and in class B a new collation (“C2”) must be defined because its base field
nc is of type nchar. To use the collation C1 in the tree indexes, the application
must implement a compare function with the following signature:
typedef int2 (*mco_compare_collation_f) ( mco_collate_h c1, uint2 len1,
mco_collate_h c2, uint2 len2);
The parameters are collation descriptors (as strings) c1 and c2 and their lengths
(number of symbols) len1 and len2. The compare function must return an integer
value indicating how the strings are compared: negative if c1 < c2, zero if c1 ==
c2 and positive if c1 > c2. This function is called by the runtime to compare field
values in two objects as well as to compare the field value with an external key
value.
The parameters are a descriptor c (as a string) and its length (number of symbols)
len. The function must return an integer hash code for the string. (Note that if the
compare function returns zero for two strings X and Y, i.e. X is equal to Y, the
hash function must generate the same hash code for X and Y.)
For this sample schema, the DDL compiler generates these compare function
stubs in mydb_coll.c:
The DDL compiler also generates the function applications will use to register the
specified collations with the eXtremeDB database runtime in mydb.h and mydb.c:
mco_collation_funcs_h mydb_get_collations(void);
Example 3
class Record
{
string name;
unsigned<4> value;
tree <name collate C1> tname; /* index declaration illustrative */
};
char * fruits[] = {
"banana", "PEAR", "plum", "Peach", "apricot", "Kiwi", "QUINCE",
"pineapple", "Lemon", "orange", "apple",
"pawpaw", "Fig", "mango", "MANDARIN", "Persimmon", "Grapefruit", 0
};
if ( MCO_S_OK == rc ) {
/* connect to database */
rc = mco_db_connect(db_name, &db);
if ( MCO_S_OK == rc ) {
{
Record_from_cursor(t, &c, &rec);
Record_name_get(&rec, buf, 11, &len);
printf("\n\t%-15s", buf);
}
rc = mco_trans_commit(t);
}
}
…
}
Note that the only additional step the main application needs to perform in order
to implement a specialized string collation is to register the collation prior to
connecting to the database. The sorting logic is handled by the collation compare
function. In this case the compare logic is simply to return the value returned by
the case-insensitive C runtime function stricmp().
Blob Support
eXtremeDB provides support for BLOB fields through BLOB interface functions.
BLOB elements are useful when it is necessary to keep streaming data, with no
known size limits. The semantics of the BLOB interfaces are very similar to the
standard _put/_get semantics.
Use the _get method to copy BLOB data to an application’s buffer; it allows
specification of a starting offset within the BLOB.
For the _get function, the ‘bufsz’ parameter is the size of the buffer passed by the
application in the ‘buf’ parameter. The ‘len’ output parameter is the actual
number of bytes copied to the buffer by the _get function (which will be <=
bufsz).
The _size function returns the size of a BLOB data element. This value can be
used to allocate sufficient memory to hold the BLOB, prior to calling the _get
function.
The _put function populates a BLOB field, possibly overwriting prior contents—
it allocates space and copies data from the application’s buffer; the size of the
BLOB must be specified.
The _append function is used to append data to an existing BLOB. This method is
provided so an application does not have to allocate a single buffer large enough
to hold the entire BLOB, but rather can conserve memory by writing the BLOB in
manageable pieces.
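Following the generated-function patterns above, the BLOB interfaces presumably
take forms like these (a sketch; exact integer types may differ by version):
MCO_RET classname_fieldname_get( classname *handle, uint4 offset, void *buf, uint4 bufsz, uint4 *len );
MCO_RET classname_fieldname_size( classname *handle, uint4 *size );
MCO_RET classname_fieldname_put( classname *handle, const void *buf, uint4 size );
MCO_RET classname_fieldname_append( classname *handle, const void *buf, uint4 size );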
Search Methods
Search interfaces locate desired objects or groups of objects by unique identifier
or by index. While exact match lookups by unique identifier (oid and autoid)
using the _find functions are extremely efficient for locating individual objects,
eXtremeDB also supports the following types of index searches that employ a
cursor to navigate a group of objects as an ordered or unordered result set: hash-
based (unique and non-unique), tree-based (including Patricia, rtree and kdtree)
and list-based.
Find Functions
Find methods (as indicated above in the oid-based and autoid-based find
methods) search the database based on an exact match of index values. By
definition, an exact match lookup on a unique index returns exactly zero (not
found) or one result. The _find functions, therefore, do not require and do not use
a cursor.
The _find functions are generated for classes that have one or more unique hash
indexes or one or more unique tree indexes declared. For each unique index, the
following interface is generated:
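The generated _find prototype presumably has the general form shown below (a
sketch; the key parameters mirror the index’s field list, with a length argument
for string keys):
MCO_RET classname_indexname_find( /*IN*/ mco_trans_h t,
                                  /*IN*/ <type> key [, /*IN*/ uint2 len ],
                                  /*OUT*/ classname *handle );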
• Establish a starting position in a sorted list with a known starting value and
optionally retrieve subsequent results in ascending or descending sorted order.
• Establish a starting position in a sorted list when only part of the starting value
is known, find the closest match, and optionally retrieve subsequent results in
ascending or descending sorted order.
• Establish a starting position as above, iterate over the sorted list until an upper
bound is reached, using the _compare() method to determine when the range
limit is reached.
The following two functions are generated to obtain a cursor for an index:
The _search functions generated for all non-unique hash and tree indexes are of
the following form:
After positioning a cursor with a search function or one of the general cursor
positioning functions (_first, _last, _next, _prev), the _from_cursor function is
used to obtain a reference (pointer) to the object:
The _locate function is used to position a tree index cursor based on an object
reference. The cursor must have been previously initiated using the
_index_cursor method. After positioning the cursor, the cursor positioning
functions, _next and _prev, can be used to iterate over the objects. The _locate
function applies only to tree-index-based cursors (i.e. not list or hash cursors):
MCO_RET classname_indexname_locate(
/*IN*/ mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
/*IN*/ classname * handle);
The compare function is used to compare the value of an object referenced by the
cursor with an application supplied value. The method returns a zero if the values
compared are equal, less than zero if the values stored in the object referenced by
the cursor are less than the values passed in, or greater than zero if the values
stored in the object referenced by the cursor are greater than the values passed in.
MCO_RET classname_indexname_compare(
/*IN*/ mco_trans_h trans,
/*IN*/ mco_cursor_h cursor,
/*IN*/ [const] <type> [*]param1,
[[/*IN*/ uint2 len1,]
[/*IN*/ [const] <type> [*]param2,
[/*IN*/ uint2 len2,] …],
/*OUT*/ int *result);
Pattern Search
In addition to the search capabilities described in the previous section,
eXtremeDB supports wildcard pattern matching ability. This is the capability to
search tree index entries matching patterns specified with wildcard characters for
single character and multiple character matches. By default, the question mark “?”
will match any single character in the specified position within the pattern, and
the asterisk “*” will match any combination of characters (including no
characters) in that position. If a match on the characters “?” or “*” is desired, the
wildcard characters themselves can be modified by specifying different characters
in the pattern search policy (see below).
For example, “G*e*” would return “Graves” and “Gorine”, while “Gr?ve*”
would match “Graves”, “Grove”, “Grover” and so on... In this example, ‘*’
matches zero, one, or more characters, while ‘?’ matches exactly one character.
Further, the pattern “G*E*” would match all uppercase entries like “GRAVES”,
“GORINE”. However, because the standard compare functions used to match
index values with search keys use case-sensitive compare functions (for example,
strcmp) the case specified in the search pattern will affect the search results.
To illustrate the use of these functions, suppose we have the following class
definition:
class PatternTest
{
string key1;
char<20> key2;
int4 key3;
tree <key1,key2,key3> i1;
tree <key2,key3> i2;
};
MCO_RET PatternTest_i1_pattern_size(
const char *key1,
uint2 sizeof_key1,
const char *key2,
uint2 sizeof_key2,
int4 key3,
/*OUT*/ uint4 *pattern_size);
MCO_RET PatternTest_i1_pattern_search(
mco_trans_h t,
void *allocated_pattern,
uint4 memsize,
PatternTest *obj,
const char *key1,
uint2 sizeof_key1,
const char *key2,
uint2 sizeof_key2,
int4 key3 );
MCO_RET PatternTest_i1_pattern_next(
mco_trans_h t,
void *allocated_pattern,
PatternTest *obj);
To use the ‘i1’ index to perform a pattern search, we take the following steps:
First, we allocate a buffer that the eXtremeDB run-time uses as a state machine
during the pattern search. The size of the buffer required is determined by the
classname_pattern_size() interface, for example:
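A sketch, with illustrative key values:
uint4 pattern_size = 0;
void *buf;
/* key lengths passed per the generated prototype above */
PatternTest_i1_pattern_size( "G*", 2, "*", 1, 0, &pattern_size );
buf = malloc( pattern_size );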
Now we can code a loop to retrieve the index entries that match the pattern we
have indicated:
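A sketch of such a loop (same illustrative keys as above; t and obj are declared
as in the surrounding examples):
rc = PatternTest_i1_pattern_search( t, buf, pattern_size, &obj,
                                    "G*", 2, "*", 1, 0 );
while ( MCO_S_OK == rc )
{
/* ... process the matching object ... */
rc = PatternTest_i1_pattern_next( t, buf, &obj );
}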
free(buf);
The following functions use the pattern policy structure to get and set the pattern
matching policy:
The following code snippet demonstrates how the pattern matching policy might
be changed:
mco_pattern_policy_t p;
/*
* NOTE: The change policy operation must be executed within a Read_Write
* transaction.
*/
mco_get_pattern_policy(trn, &p);
p.ignore_other_fields = MCO_YES;
mco_set_pattern_policy(trn, &p);
Note: The tree-index algorithm sorts the keys based on their relative weight,
determined from the compare function results, rather than on the key’s value.
Tree and hash indexes can be declared as userdef, in which case the application
must provide custom compare functions that are used by the database runtime
when building the index and during lookup. For example, consider the following
sample DDL:
class Obj {
unsigned<4> first_part1;
unsigned<2> first_part2;
unsigned<4> second_part1;
signed<2> second_part2;
string data;
For the tree index “first” it is necessary to implement two functions: the object-to-
object and object-to-key compare functions. These functions return a negative,
zero or a positive value depending on whether the first parameter is less than,
equal to, or greater than the second parameter. For hash indexes, the compare
function returns zero if the first and the second parameters are equal and non-zero
otherwise. The function prototypes are generated by the schema compiler and are
placed into a file named <dbname>_udf.c (where dbname is the database name in
the ‘declare database’ statement).
For example, for the above sample schema the following mydb_udf.c file would
be generated:
#include "mydb.h"
#include "mcowrap.h"
/*
* API for the user-defined index "first"
*/
/* object-to-object user-defined compare function */
int2 Obj_first_compare_obj ( Obj * handle1, Obj * handle2 ){
/* TODO: add your implementation here */
return 0;
}
/*
* API for the user-defined index "second"
*/
/* object-to-object user-defined compare function */
int2 Obj_second_compare_obj ( Obj * handle1, Obj * handle2 ){
/* TODO: add your implementation here */
return 0;
}
If a file with the name <dbname>_udf.c already exists (i.e. the schema was previously
compiled, then modified and compiled again), the DDL compiler will generate a
file <dbname>_udf.c.new and display a warning. In this case, it is the
responsibility of the programmer to decide which file should be included in the
project. If the user-defined indexes have been changed, then the .new file should
be renamed to the .c file.
Accessor macros for the parts of the external key are also defined (the macro
bodies are elided in this excerpt):
#define Obj_first_extkey_first_part1(k)
#define Obj_first_extkey_first_part2(k)
#define Obj_second_extkey_second_part1(ek)
#define Obj_second_extkey_second_part2(ek)
A sample implementation for the custom compare and hash functions could be as
follows (the function headers are elided in this excerpt; the first fragment ends an
object-to-object compare for "first", the second an object-to-external-key compare):
    if (o1_first_part2 != o2_first_part2)
        return 1;
    return 0;
}
    if ( o_first_part1 != Obj_first_extkey_first_part1(key) )
        return 1;
    if ( o_first_part2 != Obj_first_extkey_first_part2(key) )
        return 1;
    return 0;
}
/*
 * API for the user-defined index "second"
 */
/* the "second" compare functions follow the same pattern */
    return 0;
}
    return 0;
}
Before the custom index API functions can be used, the application must register
the custom compare and hash functions with the database runtime. This is done
via the mco_db_register_udf() API:
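A plausible prototype, inferred from the usage below (the exact declaration, including the type of the function-table pointer, is in mco.h):

MCO_RET mco_db_register_udf( const char *db_name, void *udfs );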
Where the db_name is the database name used for the mco_db_open() and udfs is
a pointer to the custom functions table. This pointer is obtained via the generated
<dbname>_get_udfs() API.
For example:
mco_runtime_start();
mco_db_open("MyDB", mydb_get_dictionary(), start_mem, DBSIZE, PAGESIZE);
mco_db_register_udf("MyDB", mydb_get_udfs());
mco_db_connect("MyDB", &db);
...<continue processing>...
If user-defined indexes are declared for the database, but custom functions are not
registered via the mco_db_register_udf(), the connection API will return the
MCO_E_NOUSERDEF_FUNCS return code.
Search Example
To demonstrate how key searches produce their result sets, assume we have a
simple key containing one uint4 field populated with the dataset [1,2,2,3,4,4,5],
and the following compare function:
int compare_uint4( uint4 a, uint4 b ) {
    if ( a == b ) return 0;
    if ( a > b ) return 1;
    return -1;
}
Following is a table representing the keys’ values and their respective weights
after the _search function is called:
Now consider a slightly less trivial example where the key is represented as a
structure containing two fields, uint4 and char[16]:
typedef struct
{
    uint4 f1;
    char f2[16];
} the_key;
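Given this key structure, the compare function might look like the following sketch (the function name and the use of memcmp are illustrative assumptions, not generated code):

#include <string.h>

int compare_the_key( const the_key *a, const the_key *b ) {
    if ( a->f1 != b->f1 )                 /* integers compared as above */
        return a->f1 > b->f1 ? 1 : -1;
    return memcmp( a->f2, b->f2, 16 );    /* char buffers compared as ASCII */
}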
The compare function compares integers as in the above sample and the char
buffers as ASCII characters. Assume that the dataset is as follows:
Further, if the tree index is declared in reverse (descending) order, the tree
algorithm uses the compare function results in the "reverse" order and the weight
tables will be as follows:
Search Algorithm
The index search algorithm operates with key weights, not with key values,
regardless of whether the index direction is ascending or descending. As
explained above, an index lookup for a specified key value is performed by
calling the _search() function which takes a cursor, a search operation and the
key value as parameters. The search operations and their rules require some
explanation.
Abbreviating the "cursor's current element" as CCE, the following table describes
how they are correlated by the eXtremeDB runtime (using the dataset
[1,2,2,3,4,4,5] and a search key value of 2):
MCO_LT
  1 = CCE  2  2  3  4  4  5      CCE points to key value 1
  mco_cursor_prev will return MCO_S_CURSOR_END
  mco_cursor_next will return MCO_S_OK and will move CCE to key value 2
MCO_LE
  1  2  2 = CCE  3  4  4  5      CCE points to key value 2
  mco_cursor_prev will return MCO_S_OK and will move CCE to key value 2 (the leftmost)
  mco_cursor_next will return MCO_S_OK and will move CCE to key value 3
MCO_EQ
  1  2 = CCE  2  3  4  4  5      CCE points to key value 2
  mco_cursor_prev will return MCO_S_OK and will move CCE to key value 1
  mco_cursor_next will return MCO_S_OK and will move CCE to key value 2 (the rightmost)
MCO_GE
  1  2 = CCE  2  3  4  4  5      CCE points to key value 2
  mco_cursor_prev will return MCO_S_OK and will move CCE to key value 1
  mco_cursor_next will return MCO_S_OK and will move CCE to key value 2 (the rightmost)
MCO_GT
  1  2  2  3 = CCE  4  4  5      CCE points to key value 3
  mco_cursor_prev will return MCO_S_OK and will move CCE to key value 2 (the rightmost)
  mco_cursor_next will return MCO_S_OK and will move CCE to key value 4 (the leftmost)

For a descending index the dataset is ordered [5,4,4,3,2,2,1] and the correlations are:

MCO_LT
  5  4  4  3 = CCE  2  2  1      CCE points to key value 3
  mco_cursor_prev will return MCO_S_OK and will move CCE to key value 4 (the rightmost)
  mco_cursor_next will return MCO_S_OK and will move CCE to key value 2 (the leftmost)
MCO_LE
  5  4  4  3  2  2 = CCE  1      CCE points to key value 2
  mco_cursor_prev will return MCO_S_OK and will move CCE to key value 2 (the leftmost)
  mco_cursor_next will return MCO_S_OK and will move CCE to key value 1
MCO_EQ
  5  4  4  3  2 = CCE  2  1      CCE points to key value 2
  mco_cursor_prev will return MCO_S_OK and will move CCE to key value 3
  mco_cursor_next will return MCO_S_OK and will move CCE to key value 2 (the rightmost)
MCO_GE
  5  4  4  3  2 = CCE  2  1      CCE points to key value 2
  mco_cursor_prev will return MCO_S_OK and will move CCE to key value 3
  mco_cursor_next will return MCO_S_OK and will move CCE to key value 2 (the rightmost)
MCO_GT
  5  4  4  3  2  2  1 = CCE      CCE points to key value 1
  mco_cursor_prev will return MCO_S_OK and will move CCE to key value 2 (the rightmost)
  mco_cursor_next will return MCO_S_CURSOR_END
The following pseudocode illustrates the canonical pattern for deleting objects
while traversing a cursor (the cursor is advanced before the current object is
deleted):
rc = open_cursor(cursor);
while (rc == MCO_S_OK && (rc = from_cursor(cursor, obj)) == MCO_S_OK) {
    rc = move_cursor(cursor);
    delete_obj(obj);
}
eXtremeDB allows a Patricia index to be declared over scalar and boolean data
types as well as arrays and vectors of those types. In fact the boolean data type,
new to version 4.0, was introduced to facilitate Patricia index implementation
where bit arrays are used to store IP addresses.
The boolean data type can be used to define a single bit field, a fixed size array of
bits, or a variable length array of bits.
class xyz{
    boolean b1;          // a bit field
    boolean b2[32];      // fixed-size array of 32 bits
    vector<boolean> b3;  // variable-length bit array
};
For a single bit field, the following interfaces are generated:
MCO_RET classname_fieldname_get( classname *handle,
                                 /*OUT*/ uint1 *result );
MCO_RET classname_fieldname_put( classname *handle,
                                 uint1 value );
For a fixed-size bit array the following interfaces are also generated:
For variable-size arrays (vectors), the following interfaces are also generated:
The only index type possible for the boolean data type is the Patricia trie index. A
single-bit field cannot be indexed, nor is it advisable to index a short array of bits.
(It would be faster to perform a table scan than to create an index for a 2- or 3-bit
field, and doing so avoids the memory consumption and CPU cycles needed to
maintain the index. Exactly where the tipping point lies is left as an exercise for
the reader.)
The Patricia index can be declared unique; in the absence of the unique keyword
it defaults to allowing duplicates. Unlike other eXtremeDB indexes, the Patricia
index cannot be compound; it is always declared for a single field.
For example:
class xyz{
    boolean b1[32];
    vector<boolean> b2;
    uint4 b3;
    char<10> b4[10];
    vector<string> b5;
    // Patricia index declarations (elided in this excerpt)
};
In addition to the standard tree index generated functions, the following functions
are generated for each Patricia index:
As explained above, Patricia indexes can be created over any scalar or boolean
type field. The generated functions applicable only to Patricia indexes are
_longest_match, _exact_match, _prefix_match and _next_match. These have
slightly different forms depending on the type of the field being indexed, as
explained below.
A Patricia index created over a scalar field will cause the following functions to
be generated:
classname_indexname_next_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
type mask,
int number_of_bits);
classname_indexname_prefix_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
type mask,
int number_of_bits);
classname_indexname_longest_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
type mask,
int number_of_bits);
classname_indexname_exact_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
type mask,
int number_of_bits);
Where type is the scalar type (for example, uint4) and mask is the key value to
match. If the indexed field is an array or vector of scalars, these functions have
the form:
classname_indexname_next_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
type * mask,
int number_of_bits);
classname_indexname_prefix_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
type * mask,
int number_of_bits);
classname_indexname_longest_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
type * mask,
int number_of_bits);
classname_indexname_exact_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
type * mask,
int number_of_bits);
Here type is the type of each element of the array/vector (for example, uint4) and
mask is the key value to match. If the indexed field is a boolean array, these
functions have the form:
classname_indexname_next_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
char* mask,
int number_of_bits);
classname_indexname_prefix_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
char* mask,
int number_of_bits);
classname_indexname_longest_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
char* mask,
int number_of_bits);
classname_indexname_exact_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
char* mask,
int number_of_bits);
Here the mask represents a key value that is a packed bit array (each byte contains
8 bits).
_longest_match
The _longest_match() functions locate the record whose index value has the
longest match with the key, i.e. the greatest number of characters or bits, starting
from the beginning of the value, that match the key value.
For example assuming the following table with Patricia index on field “prefix”:
Table:
prefix operator
01 ATT
020 BCC
025 TNT
03 ANC
0355 NCC
0355 UDC
045 WTC
...
The _longest_match function called with a key value of 02456 would position the
cursor at record <025, TNT>; with a key value of 035567787, at record
<0355, UDC>; and with a key value of 03, at record <0355, UDC> as well. Notice
that the cursor is positioned at the last record matching the key value. In order to
walk through the result set and visit all records matching the key value, the
application would use the _next_match function.
_exact_match
The _exact_match functions locate the first record whose index value exactly
matches the key value supplied. If no exact matches are found
MCO_S_NOTFOUND is returned. For example, using the above table: the key
value of 02 would find record <020, BCC>, but the key value of 024 would cause
MCO_S_NOTFOUND to be returned.
_prefix_match
The _prefix_match functions are similar to _longest_match except that they find
the first object whose index matches the key, whereas _longest_match returns the
object with the longest (deepest) match. So, using the above table: the key value
of 02456 finds record <025, TNT>; the key value of 035567787 finds record
<0355, UDC>; and the key value of 03 finds record <03, ANC>.
_next_match
The _next_match functions are used, after the cursor is positioned within the
result set, to walk through the result set to visit all records matching the key value.
To traverse the database objects in order, mco_cursor_next() or
mco_cursor_prev() may be used, but they are not constrained by the key value
used to perform the search; i.e. iteration could continue beyond the range
specified in the key.
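A sketch of such a bounded traversal using the generated functions described above (classname, indexname, the mask value and the bit count are placeholders):

rc = classname_indexname_prefix_match(t, &c, mask, number_of_bits);
while ( rc == MCO_S_OK ) {
    /* obtain and process the current object via classname_from_cursor() */
    rc = classname_indexname_next_match(t, &c, mask, number_of_bits);
}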
Unlike the tree index compare functions that return <0, 0 or >0, the Patricia-
based compare function returns the number of the first different bit between
the key and the object pointed to by the cursor. This allows interrupting the
cursor traversal in a manner similar to the "standard" compare API. In addition,
the application is able to refine the cursor traversal. For example consider a
routing table containing the following values:
128.1.1.0
128.1.1.10
128.1.1.20
128.1.2.0
128.1.2.10
128.1.3.0
where the ipaddr key field is declared either as
boolean<32> ipaddr;
or
uint4 ipaddr;
Suppose our application is looking for the entire subnet 128.1.1. The search value
would be passed in as the bit pattern corresponding to 128.1.1.0.
The nodes_ipaddr_prefix_match() function called with this key value and length
24 (i.e. 24 bits) positions the cursor at the object with ipaddr equivalent to
128.1.1.0. Now the mco_cursor_next() function is called to advance to record
128.1.1.10, and the Patricia _compare() function returns the value 28, which
means that 28 bits of the record's ipaddr match the key value; because this is not
less than the 24-bit key length, the record is still within the subnet of interest.
Iteration continues in this manner until the cursor reaches record 128.1.2.0, where
_compare() returns 22.
At this point the application would conclude that it has left the region of interest,
since the key size is 24 bits and the mismatch was detected in the 22nd bit. In
other words, the key is no longer a prefix for the value in the object now pointed at
by the cursor.
R-Tree Index
R-Tree indexes (declared as rtree in the schema) are commonly used to speed
spatial searches, for example, find the rectangle that contains this point, or find all
rectangles that overlap this rectangle.
All manner of shapes can be stored and searched with the rtree index. For
example, a point is represented as a rectangle with width and height = 1 and a line
that has starting and ending coordinates of 15, 844 and 0, 3647 is stored as
rectangle with its upper left corner at 15, 844 and its lower right corner at 0, 3647.
[Figure: two line segments with endpoints (75, 15)-(20, 70) and (35, 25)-(20, 30), shown with their bounding rectangles]
A search to discover all lines that intersect with (75, 15) (20, 70) would return the
rectangle bounding (35, 25) (20, 30) because the rectangles overlap. The
application would extract additional information for the object, for example, that
it is a line and what its starting and ending coordinates are, and would conclude
that this line does not intersect the key line; and would continue to the next
overlapping rectangle returned by the index search.
Note that, any shape with coordinates {(X1, Y1), (X2, Y2), ... (Xn, Yn)} can be
stored and searched in this manner. For example, consider a polygon:
[Figure: a polygon with vertices (85, 50), (70, 33), (65, 63), (55, 45), (55, 30) and (35, 35), plotted on X and Y axes ranging from 0 to 100]
Here we have X coordinates of 35, 55, 65, 70, 85 and Y coordinates of 30, 33, 35,
45, 50, 63. The bounding rectangle is the rectangle with left top vertex
(Xmax,Ymin), and right bottom vertex (Xmin, Ymax) where Xmin = min(Xi),
Ymin=min(Yi), Xmax = max(Xi), Ymax=max(Yi). In this case, Xmax = 85,
Ymin = 30, Xmin = 35, Ymax = 63 and our rectangle top left and bottom right is
(85, 30) and (35, 63).
rtree searches can return rectangles that: 1) exactly match the given coordinates,
2) overlap the given coordinates, 3) wholly contain the given coordinates, or 4)
are within a given distance from a point.
The rectangles are processed as arrays of max and min coordinates. For example,
a two-dimensional rectangle is represented as the array:
xMin, yMin, xMax, yMax
and a three-dimensional box as the array:
xMin, yMin, zMin, xMax, yMax, zMax
To illustrate the use of the rtree index, suppose we create the following class
definition:
class rtree_class
{
    rect<int2> square[2];
    rtree <square> ridx;
};
The generated interfaces include:
#define rtree_class_square_length 4
MCO_RET rtree_class_square_get ( rtree_class *handle,
/*OUT*/ int2 *dest );
MCO_RET rtree_class_square_put ( rtree_class *handle,
const int2 *src );
MCO_RET rtree_class_from_cursor ( mco_trans_h t,
mco_cursor_h c,
/*OUT*/ rtree_class *handle );
To utilize the rtree index, we need some supporting data types and constants:
MCO_OVERLAP
MCO_CONTAIN
MCO_NEIGHBORHOOD
To conduct any search, as with other index types, we need to instantiate a cursor:
rect.l.x = - MCO_BOUND;
rect.l.y = - MCO_BOUND;
rect.r.x = - (MCO_BOUND + 2000) / 2;
rect.r.y = - (MCO_BOUND + 2000) / 2;
if ((rc = rtree_class_ridx_search(t, MCO_OVERLAP, &c, (int2*) &rect)) !=
MCO_S_OK)
{
printf("\n Couldn't find any overlapping rect, code = %d, line = %d\n",
rc, __LINE__ );
}
if ( MCO_S_OK == rc )
{
printf("\n Iterate cursor, condition = MCO_OVERLAP");
for ( i = 0, rc = mco_cursor_first(t, &c);
MCO_S_OK == rc;
rc = mco_cursor_next(t, &c), i++ )
; // do nothing, just count
printf("\n Found %d overlapping rects", i);
}
Similarly, to search for rectangles that are wholly contained by another rectangle,
the same pattern is used with the MCO_CONTAIN condition (the search call below
is reconstructed by analogy with the MCO_OVERLAP example above):
if ((rc = rtree_class_ridx_search(t, MCO_CONTAIN, &c, (int2*) &rect)) != MCO_S_OK)
{
    printf("\n Couldn't find any contained rect, code = %d\n", rc);
}
if ( MCO_S_OK == rc )
{
    for ( i = 0, rc = mco_cursor_first(t, &c);
          MCO_S_OK == rc;
          rc = mco_cursor_next(t, &c), i++ )
        ; // do nothing, just count
    printf("\n Found %d contained rects", i);
}
Note: An rtree index cursor has different semantics than a conventional tree
index cursor. Whereas the _search function of a conventional tree index positions
the cursor at the first match (or just before the nearest match in the case of a
partial key search), the rtree index cursor operates on the result set of the search.
In other words, for an rtree cursor, mco_cursor_first(), mco_cursor_next(),
mco_cursor_prev() and mco_cursor_last() operate within the set of objects
that match the given search conditions.
Kd-Tree Index
eXtremeDB 4.0 adds support for the k-dimensional tree index (declared kdtree in
the schema). A kdtree is a data structure for organizing points in a k-dimensional
space. kdtrees are a useful data structure for several applications, such as lookups
that involve a multidimensional search key. The kdtree is a binary tree in which
every node is a k-dimensional point. Every non-leaf node generates a splitting
hyperplane that divides the space into two subspaces. Points left of the hyperplane
represent the left sub-tree of that node and the points right of the hyperplane
represent the right sub-tree. The hyperplane direction is chosen in the following
way: every node split to sub-trees is associated with one of the k-dimensions, such
that the hyperplane is perpendicular to that dimension vector.
The kdtree index is defined in the schema using the kdtree keyword:
class Car
{
    string vendor;
    string model;
    string color;
    uint4 year;
    uint4 mileage;
    boolean automatic;
    boolean ac;
    uint4 price;
    char<3> state;
    string description;
    blob images;

    kdtree <year, mileage, color, model,
            vendor, automatic, ac, price> index;
};
Insert and delete operations for indexes are hidden from applications and are
performed automatically by the eXtremeDB runtime. The new API is only
required for search operations. The kdtree uses a Query-By-Example approach to
locate objects that match a given search condition. The application creates pattern
object(s) in the normal way and assigns values to the fields that are included in
the search criteria. The kdtree supports simple exact matches as well as range
lookups. In the latter case, two pattern objects should be specified: one for the
lower and one for the upper boundaries of the search condition. If a field value is
defined only for one boundary, it is considered an open interval that corresponds
to a greater-than-or-equal-to or less-than-or-equal-to search condition.
The following example demonstrates locating all “Ford Mustangs” (using the
sample schema above):
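A minimal sketch of such a lookup, assuming an active transaction t and assuming the STR() helper macro (used in the range-query snippet below) expands a literal to the value/length pair expected by the generated string _put functions:

Car pattern;
mco_cursor_t cursor;
MCO_RET rc;

Car_new(&pattern);                     /* create the pattern object */
Car_vendor_put(&pattern, STR(Ford));   /* fields included in the search criteria */
Car_model_put(&pattern, STR(Mustang));
Car_index_index_cursor(t, &cursor);
/* passing the same pattern object twice requests an exact match */
rc = Car_index_search(t, &cursor, &pattern, &pattern);
while ( rc == MCO_S_OK ) {
    Car car;
    Car_from_cursor(t, &cursor, &car);
    /* process the matching Car object here */
    rc = mco_cursor_next(t, &cursor);
}
Car_delete(&pattern);                  /* delete the pattern object */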
The code snippet creates a pattern object, specifying values for the "vendor"
and "model" fields, and then calls the Car_index_search() method with the same
pattern object passed twice, which indicates to the eXtremeDB runtime that the
lower and upper boundaries of the search are equal. Once the search returns,
the application can traverse the result set using the standard eXtremeDB cursor
mechanism. Note that the order of objects in the selection is unpredictable, but
only objects that match the specified search criteria are returned.
Note: It is necessary to keep the pattern objects as long as the cursor is being
used.
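The range-query snippet below begins after the pattern objects from and till have been created; a plausible setup (assumed, following the pattern-object creation convention shown later in this section) would be:

Car from, till;
mco_cursor_t cursor;
MCO_RET rc;

Car_new(&from);   /* lower-boundary pattern */
Car_new(&till);   /* upper-boundary pattern */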
Car_vendor_put(&till, STR(ford));
Car_price_put(&till, 30000);
Car_year_put(&from, 2000); Car_year_put(&till, 2006);
Car_mileage_put(&till, 100000);
printf("Range query results:\n");
Car_index_index_cursor(t, &cursor);
rc = Car_index_search(t, &cursor, &from, &till);
while (rc == MCO_S_OK) {
Car choice;
Car_from_cursor(t, &cursor, &choice);
print_car(&choice);
rc = mco_cursor_next(t, &cursor);
}
Car_delete(&from); /* delete pattern */
Car_delete(&till); /* delete pattern */
mco_trans_commit(t);
}
The code snippet above demonstrates selecting Ford models with a price no
greater than 30000, a model year between 2000 and 2006, and mileage no greater
than 100000. Note that it is possible to pass NULL instead of one or both
boundary values. Specifically, the function call Car_index_search(t, &cursor, 0,
0) would return all objects.
As a further example, consider a kdtree index used for spatial searches:
class SpatialObject {
    int4 left;
    int4 top;
    int4 right;
    int4 bottom;
    int type;
    …
    kdtree <left, top, right, bottom, type> index;
};
The following snippet locates objects that lie within the boundaries LEFT, TOP,
RIGHT, BOTTOM (with the type field bounded from above by RED):
SpatialObject low;
SpatialObject high;
SpatialObject_new(&low);
SpatialObject_new(&high);
SpatialObject_left_put(&low, LEFT);
SpatialObject_right_put(&high, RIGHT);
SpatialObject_top_put(&low, TOP);
SpatialObject_bottom_put(&high, BOTTOM);
SpatialObject_type_put(&high, RED);
SpatialObject_index_index(trans, &cursor);
int rc = SpatialObject_index_search(trans, &cursor, &low, &high);
while (rc == MCO_S_OK) {
SpatialObject obj;
SpatialObject_from_cursor(trans, &cursor, &obj);
...
rc = mco_cursor_next(trans, &cursor);
}
A search for objects that overlap the given rectangle inverts the boundary
assignments:
SpatialObject_left_put(&high, RIGHT);
SpatialObject_right_put(&low, LEFT);
SpatialObject_top_put(&high, BOTTOM);
SpatialObject_bottom_put(&low, TOP);
SpatialObject_index_index(trans, &cursor);
int rc = SpatialObject_index_search(trans, &cursor, &low, &high);
while (rc == MCO_S_OK) {
    SpatialObject obj;
    SpatialObject_from_cursor(trans, &cursor, &obj);
    ...
    rc = mco_cursor_next(trans, &cursor);
}
kdtree indexes are inherently unbalanced. While they are supported for persistent
classes in eXtremeDB, because they are unbalanced their on-disk performance may
be sub-optimal; kdtree indexes are therefore most useful for transient (in-memory)
classes.
Chapter 6: Programming
Considerations
Database application development introduces issues that go beyond the usual
embedded systems development concerns. This chapter describes some of these
and the eXtremeDB "best practices" for managing them.
eXtremeDB return codes fall into three categories:
• Status codes that indicate a state of the database runtime (MCO_S_*). If not
handled, these could lead to an error condition.
• Error codes that indicate that the runtime failed to perform the requested
operation (MCO_E_*). The application can handle the error and continue
running. If not handled, these could lead to an exception.
• Fatal error codes (exceptions) that indicate that any further use of the database
runtime is not possible. A fatal exception is usually a sign of a bug in either
the application code or the database runtime code.
The actual values of status codes and error codes are enumerated in mco.h. Status
codes are return codes that are less than or equal to 50 and have #define names
that are prefixed with MCO_S_. Error codes are return codes that are greater than
50 and have #define names that are prefixed with MCO_E_.
Status codes don’t indicate an error in a method, but merely the state of an
operation. For example, every eXtremeDB method, if successful, returns
MCO_S_OK, or if a search function finds no objects corresponding to the specified
key value, the status code MCO_S_NOTFOUND is returned.
Error codes, in contrast, indicate the runtime’s failure to complete a request. For
example, if an invalid handle has been passed to a method,
MCO_E_INVALID_HANDLE is returned. A status code returned by a function does
not affect the state of the transaction context within which the function was
executed, while a function returning an error code causes the enclosing
transaction to enter an error state.
The error state of the transaction is remembered by the eXtremeDB runtime and
any subsequent call to an eXtremeDB function within that transaction will return
with the MCO_E_TRANSACT code. In this case, the eXtremeDB runtime does not
attempt to execute the method, and no changes will be applied to the database.
Being aware of this can greatly simplify your application code, while keeping the
code size to a minimum. For example, it is not uncommon (and many vendors
recommend it) to check the return code after every call to a library function. This
leads to source code that looks like the following:
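A sketch of this verbose style (func1/func2/func3 are placeholder functions returning MCO_S_*-style codes):

uint2 foo() {
    uint2 rc, i;
    for ( i = 0; i < 10; i++ )
    {
        rc = func1();
        if ( MCO_S_OK != rc ) return rc;  /* check after every call */
        rc = func2();
        if ( MCO_S_OK != rc ) return rc;
        rc = func3();
        if ( MCO_S_OK != rc ) return rc;
    }
    return rc;
}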
In contrast, when programming with eXtremeDB you may simply check the return
code on each iteration of the loop:
uint2 foo() {
uint2 rc, i;
for( i = rc = 0; i < 10 && MCO_S_OK == rc ; i++ )
{
rc = func1();
rc |= func2();
rc |= func3();
}
return rc;
}
The third category of errors, fatal errors, are unrecoverable and cause the
eXtremeDB runtime to call the function mco_stop(). This function performs the
role of an assertion internal to eXtremeDB. If an error handler has been registered
(see mco_error_set_handler()), mco_stop() will, in turn, call this custom error
handler. Otherwise mco_stop() will enter an infinite loop. It is common practice
in embedded systems to employ a “watchdog” process. If the watchdog does not
receive a periodic message from the application process, it forces a reboot. So
entering an infinite loop causes the application to stop responding, which triggers
the watchdog to reboot the system.
The mco_stop() function is only called when the eXtremeDB runtime detects an
unrecoverable error, such as a corrupted stack. In such a case, a reboot is the only
viable course of action. In addition, mco_stop() is called if an assertion within
any runtime function fails. This usually means that the application did something
illegal from the eXtremeDB runtime's point of view, such as passing an invalid
transaction or object handle to a runtime function, or corrupting the runtime
internals in some way.
Further, any runtime function might perform a number of validations that can
result in a failed assertion. These validations vary depending on the
CHECK_LEVEL set in the runtime when eXtremeDB is compiled. The object code
distribution includes two runtimes: the debug runtime, which has the highest
CHECK_LEVEL and the release runtime, which has the lowest (minimal validations
are performed). Although the release runtime does some validations, these have
no negative impact on the overall performance. Developers are strongly advised
to use the debug runtime during the development cycle. Then, only when no fatal
errors are reported by eXtremeDB, switch to the release runtime. The only reason
one would use the release version during the development phase is to measure
application performance.
To debug a fatal error, the following procedure is suggested:
• Set a breakpoint inside the error handler, and run the application in the
debugger to examine the application's call stack when the error occurs.
• Note the last runtime function called and any other relevant information in
the stack trace.
• Check the appropriate application entity right before the fatal runtime call
was issued and make sure that the entity - transaction handle, object
handle, heap memory, etc. - is in fact corrupted.
• Go back through the stack and try to find the application code where
the entity was corrupted.
The following example demonstrates this procedure (note that this code is taken
from the 06_errorhandling_fatalerr sample):
mco_error_set_handler( &errhandler );
...
rc = mco_trans_start(db, MCO_READ_ONLY, MCO_TRANS_FOREGROUND, &t);
if ( MCO_S_OK == rc ) {
printf("\n\n\tThe following attempt to create a new record\n"
"\tshould cause the Error handler to be called with Fatal\n"
"\tError 340049 because it requires a READ_WRITE transaction.\n"
"\tThe type of transaction started was MCO_READ_ONLY...\n"
"\tNote: you will get error code instead of fatal error if\n"
"\tthe program was linked not against _check runtime\n");
/* anObject_new() should fail with error code 340049 =
MCO_ERR_TRN+49 */
rc = anObject_new(t, &rec);
if ( MCO_S_OK == rc ) {
    rc = anObject_data_put(&rec, data);
    ...
}
}
When the above code is executed it causes the error handler to be called with the
error code 340049 which generates the following output:
Checking Appendix B, the error code value of 340000 corresponds to the constant
MCO_ERR_TRN. This indicates an error in the transaction being performed. (The
added value of 49 indicates the line within the runtime function where the
assertion failed, causing mco_stop() to be called. This is useful if it is necessary
to contact McObject support – or if the developer has a source code license. For a
more detailed explanation of error codes see Appendix B.)
Following is the call stack (as displayed by the Visual Studio 2008 debugger):
06-errorhandling-fatalerr.exe!mco_w_new_obj_noid(mco_trans_t_ * t=0x004e0378,
unsigned int init_size=4, unsigned short class_code=1, mco_objhandle_t_ *
ret=0x0012facc) Line 494 + 0x14 bytes
06-errorhandling-fatalerr.exe!anObject_new(mco_trans_t_ * t=0x004e0378,
anObject_ * handle=0x0012facc) Line 120 + 0x2f bytes
If the developer has a source code license, the debugging technique is slightly
different. In this case it would be prudent to set a breakpoint in the mco_stop()
function itself. This results in the following call stack:
06-errorhandling-fatalerr.exe!mco_w_new_obj_noid(mco_trans_t_ * t=0x004e0378,
unsigned int init_size=4, unsigned short class_code=1, mco_objhandle_t_ *
ret=0x0012facc) Line 494 + 0x14 bytes
06-errorhandling-fatalerr.exe!anObject_new(mco_trans_t_ * t=0x004e0378,
anObject_ * handle=0x0012facc) Line 120 + 0x2f bytes
Here again the preceding line in the call stack indicates that function
mco_w_new_obj_noid() failed and the same chain of logic makes it clear that the
solution is to correct the transaction type.
For example, a process can register its process id as the connection context:
int pid;
#ifdef _WIN32
pid = GetCurrentProcessId();
#else
pid = getpid();
#endif
mco_db_connect_ctx(dbName, &pid, &db);
Note: It is also necessary to specify the size of this connection context in the
database parameters passed to mco_db_open_dev(). For example:
db_params.connection_context_size = sizeof(int);
A monitoring process can then periodically invoke the sniffer to inspect the active
connections:
while (1) {
    mco_db_sniffer(db, sniffer_callback,
                   MCO_SNIFFER_INSPECT_ACTIVE_CONNECTIONS);
    sleep(SNIFFER_INTERVAL);
}
mco_db_disconnect(db);
THREAD_RETURN(0);
}
Recovery actually consists of two stages. In the first stage we "grab" the dead
connection. Each connection has private (process-specific) pointers which must
be adjusted for use in the context of the process performing recovery. In the
second stage, internal functions are called to roll back any transactions that might
have been in progress and to release the dead connection's data structures.
To re-open an existing database image (and trigger recovery if needed), the
MCO_DB_OPEN_EXISTING mode mask is set:
mco_db_params_t db_params;
...
mco_db_params_init(&db_params);
...
if (...) {
db_params.mode_mask |= MCO_DB_OPEN_EXISTING;
}
...
rc = mco_db_open_dev(db_name... , &db_params);
The database runtime performs the necessary steps to ensure the consistency of
the database metadata and the database content. If mco_db_open_dev() returns
MCO_S_OK to the application, the application is able to connect to the database
normally by calling mco_db_connect().
Note that database recovery can fail under certain conditions (such as application
errors that corrupt the database runtime metadata). If recovery fails,
mco_db_open_dev() returns an error code. Please refer to the “Recovery from
failed processes” section above and the mco_db_sniffer() section in the Reference
Guide for further discussion about eXtremeDB recovery procedures. Also refer to
the NVRAM sample located in the /samples/core/02-open/nvram directory.
When the MVCC transaction manager is used, a crash can leave a persistent
database containing undeleted old versions and working copies. Their presence
does not break the consistency of the database and does not prevent the
application from working normally, but it does unnecessarily consume space.
Detecting these stale object versions requires a complete scan of the database; for
this reason the recovery process does not perform this function automatically.
Instead, the removal of the unused versions can be performed explicitly by calling
the mco_disk_database_vacuum() function:
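A minimal sketch of the call (assuming it takes the connected database handle; see the Reference Guide for the exact signature):

rc = mco_disk_database_vacuum(db);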
Alternatively, the application can enable this repair automatically by setting the
MCO_DB_MODE_MVCC_AUTO_VACUUM mode mask in the mco_db_params_t
structure when calling mco_db_open_dev().
Database Security
eXtremeDB (version 4.1 and later) provides two separate security mechanisms for
persistent databases: a page-level CRC32 check and database encryption through a
private key. These mechanisms can be used separately, or in combination. Both
mechanisms are enabled at the time the database is created. Once the CRC or
encryption is enabled, and the persistent database is allocated, it is not possible to
disable security in the current, or future, database sessions. Note that both security
mechanisms are page-level based, hence all data and indexes are protected.
CRC-32
The page-level CRC stores a 32-bit CRC32 for each page in the database. The CRC
is verified every time the page is loaded from persistent storage to memory. If the
database content is changed outside the database runtime, the CRC will not match,
unless it is also modified along with the page content.
The CRC check is enabled through the database parameters:
mco_db_params_init ( &db_params );
...
db_params.mode_mask |= MCO_DB_USE_CRC_CHECK;
...
rc = mco_db_open_dev(db_name, ..., &db_params );
By default the CRC is not calculated. If a page CRC does not match, the database
runtime returns MCO_E_DISK_CRC_MISMATCH (126) error code every time an
attempt is made to read the database (including an index lookup). In the debug
version of the runtime, a mismatched CRC leads to a fatal assertion (mco_stop() ).
Encryption
eXtremeDB encryption allows an application to read and write encrypted
databases using a page-level standard RC4 encryption algorithm (for an
explanation of RC4 see http://en.wikipedia.org/wiki/RC4). Both the content of the
database and the log files are encrypted.
db_params.cipher_key = "welcome";
It is not entirely impossible to break the encryption (there have been a number of
advances in this area). However combined with the CRC, we can make a rather
strong claim for security.
Cache Management
The cache management facilities described below could, for example, be used to
fine-tune the application's caching policies (see Prioritized Cache, below).
Connection cache
In addition to the disk manager cache (also often referred to as a “page pool”),
eXtremeDB (version 4.1 and later) provides a per-connection cache. The database
runtime “pins” a predefined number of pages from the page pool for each
connection. This is referred to as a “connection cache”. When a transaction loads
pages into the page pool, and the total number of pages loaded from the media is
less than the size of the connection cache, the database runtime makes sure that
these pages stay in the cache until the transaction is committed, or the database
connection is broken. By default the size of the connection cache is set to four
pages. It is not possible to modify the connection cache size (nor does it make
sense to).
The connection cache is enabled by default. The runtime provides two functions
that allow application control over the connection cache:
The first function enables or disables the connection cache: passing MCO_YES or
MCO_NO as the 'enable' parameter value enables or disables the cache, and the
function returns the current state of the connection cache. The second function
commits (resets) the connection cache to the database.
These two functions address a scenario with many connections and long-lasting
transactions. In this scenario, the connection cache could cause the page pool to
run out of free pages (each new transaction allocates its own connection cache,
but long transactions prevent those pages from being released back to the shared
page pool). To address this, the connection cache can be turned off or reset
frequently. Under normal circumstances, the application does not need to control
the connection cache.
Prioritized cache
eXtremeDB (version 4.1 and later) improves on basic Least Recently Used (LRU)
cache policies by allowing applications to influence how long certain pages
remain in the disk manager cache. The crux of the improvement is the addition of
a cache priority property to each page. When the LRU algorithm locates a
"victim", instead of immediately releasing the page (removing the page from the
tail of the L2 link list), the algorithm inspects its caching_priority field. If the
value is not zero, the caching_priority is decremented and the page is re-linked to
the beginning of the L2 list. A caching priority of zero means the default
behavior; a caching priority of 1 indicates that the page will be moved from the
head to the tail of the LRU list twice; a caching priority of 2 means three loops
through the LRU list, and so on. The higher the priority, the longer the page
remains linked to the LRU list (i.e. stays in cache).
At the time the database is created, the application can assign priorities to indexes,
memory allocator bitmap pages and object pages (excluding BLOBs). The
priorities are assigned through the mco_db_params_t_ structure
index_caching_priority, allocation_bitmap_caching_priority and
objects_caching_priority fields. By default all pages have the same priority (zero).
It is possible to change the caching priority for a class at runtime through the
generated classname_set_caching_priority API. Using the preset object priority as
a baseline, the relative priorities of some classes can be adjusted. For example,
large and rarely accessed objects can be assigned lower priority, while small
frequently accessed classes can be assigned a higher priority. The caching priority
assigned at runtime is stored in the database and is used until it is explicitly
overwritten.
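A sketch of adjusting a class's priority at runtime through the generated API (the parameters shown are assumptions; consult the generated header for the exact signature):

/* assumed signature: raise the caching priority of the Car class */
rc = Car_set_caching_priority(db, 2);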
Multi-file databases
eXtremeDB supports three types of multi-file devices for persistent databases:
In all three cases there are two ways of defining the file segments:
Disk IO
For persistent databases, disk I/O (reading from and writing to the disk) are the
most “expensive” operations in performance terms. To minimize the effect of disk
I/O, eXtremeDB implements a Disk Manager Cache that interacts with the
Operating System’s File system cache as shown in the diagram below:
[Diagram: database transactions pass through the Disk Manager (DM) Cache, maintained by the database runtime, which in turn interacts with the file system cache and the storage media]
To make intelligent decisions that will optimize the eXtremeDB Disk Manager
Cache performance for a specific application’s needs, the following section
explains possible impacts of Cache Size, Transaction Logging and Commit
Strategies.
Cache Size
Similar to the memory pool for an in-memory database, the cache for an on-disk
database is created by the application, and the address and size of the memory are
passed as parameters to the mco_db_open_dev() function (see the Cache Size
discussion later in this chapter).
[Diagram: database transactions write through the write_file() API into the file system buffer, which is flushed to the media by the _sync() API]
The impact on performance from how transactions are recorded in the log file is
determined by the selection of the Logging Policy. Because non-buffered I/O is
slow, file I/O is usually buffered: the database write() operation does not write
data directly to the persistent media but rather to the file system buffer, and the
file system then "flushes" the buffered data to disk during a file system _commit()
or _sync().
The eXtremeDB Logging Policy controls when the changes are committed to the
persistent storage.
In the event of a hardware or software failure, the runtime can recover the
database using this log.
Transaction Logging does not alter the all-in-memory architecture of eXtremeDB,
which retains its performance advantage over disk-based databases. Read
performance is unaffected by transaction logging, and write performance will far
exceed the write performance of traditional disk-based databases. The reason is
simple: eXtremeDB transaction logging requires exactly one write to the file
system for one database transaction. A disk-based database, however, will
perform many writes per transaction (data pages, index pages, transaction log,
etc.), and the larger the transaction and the more indexes that are modified, the
more writes are necessary.
If the no-logging option is selected, transaction processing is turned off and a log
file is not created. This will significantly increase update performance, but the application
will not be able to recover the database in the event of a crash, and transaction
rollback is also not available. This mode can be useful when the application needs
to quickly populate the database file. We recommend not using this option under
any other circumstances. As an alternative, consider setting the “relaxed
durability” mode via the mco_disk_transaction_policy() API. (See section
“Tuning the Logging Strategy to Your Application Needs” below.)
Note: Once the database file is created, the database can be re-opened in either of
the transactional modes described below.
When the Redo (write-ahead) logging policy is used, pages updated by a
transaction are kept in the page pool (cache) and guaranteed to never get swapped
out during the transaction (the "no steal" policy). Upon transaction commit, all
updated pages are first written into the log and then committed (flushed) to the
permanent storage. Only then are the updated pages written to the database (but
not flushed). If during the commit the log size becomes larger than the threshold
specified by mco_disk_set_log_params(), a checkpoint is created: all updated
pages are written to disk, updates are flushed to the permanent storage and the log
is truncated.
The obvious benefit of the Redo policy is a significantly reduced number of disk
writes, since only the log file needs to be flushed to disk at the time of transaction
commit. Furthermore, the log file is written sequentially, and so the cost of
syncing the log is much less than the cost of flushing the data pages. The
disadvantage of using WAL is that the algorithm can run out of memory when
there are many uncommitted updates. The transaction size is limited to the size of
the page pool (cache). Every time a page is made “dirty” (anything is changed on
the page), it must remain in cache. Our implementation does not allow any
swapping.
[Diagram: REDO logging — _put() operations update the database cache; at mco_commit() the log file is written (_write) and synced (_sync), and the modified data pages are then written (_write) to the database file without being flushed]
WAL’s central concept is that changes to the data must be written only after those
changes have been logged; that is, when log records have been committed to the
permanent storage.
When the Undo logging strategy is used, the log file contains entries that allow
the current transaction’s updates to be un-done. Briefly, the eXtremeDB
implementation of this approach is as follows: During the update, the runtime
marks the containing page as “dirty” and flags it in the bitmap of modified
pages, and the original page is written to the log file. Regardless of the number of
times the individual page is changed during the transaction, the original image of
the page is written to the log file only once. When the transaction is committed,
all modified pages are written and flushed to the database file and then the log file
is truncated. The recovery and the rollback procedures read all saved pages from
the log file, restoring the original images of the pages from the log file and
clearing the “dirty” bit for the page.
The advantages of using Undo Logging are that the algorithm never runs out of
memory and provides easy and efficient recovery. The disadvantages are that all
updates must be flushed to the database file at commit time. Writes to the
database file are usually random and are slower than writes to the log file, which
are sequential.
[Diagram: UNDO logging — updates set bits in the dirty-page bitmap and modify the database cache; the original pages are written (_write) and synced (_sync) to the log file, and at commit the modified pages are written (_write) and synced (_sync) to the database file]
Transaction Control
A transaction is a unit of work with the database (a single logical operation on the
data). eXtremeDB supports transactions that enforce the ACID properties. The
ACID (an acronym for Atomicity, Consistency, Isolation, Durability) model is one
of the oldest and most important concepts of database theory. It establishes four
goals that a database management system must strive to achieve: atomicity,
consistency, isolation, and durability. No database that fails to meet any of these
four goals can be considered reliable.
Atomicity states that database modifications must follow an “all or nothing” rule.
Each transaction is said to be “atomic.” If one part of the transaction fails, the
entire transaction fails. It is critical that the database management system
maintain the atomic nature of transactions in spite of any DBMS, operating
system or hardware failure.
Consistency states that only valid data will be written to the database. If, for
some reason, a transaction is executed that violates the database’s consistency
rules, the entire transaction will be rolled back and the database will be restored to
a state consistent with those rules. On the other hand, if a transaction successfully
executes, it will take the database from one state that is consistent with the rules to
another state that is also consistent with the rules.
Isolation requires that multiple transactions occurring at the same time not impact
each other's execution. For example, if Joe issues a transaction against a database
at the same time that Mary issues a different transaction, both transactions should
operate on the database in an isolated manner. This prevents Joe's transaction
from interfering with Mary's, and vice versa.
Durability ensures that any transaction committed to the database will not be lost.
Durability is ensured through the use of database backups and transaction logs
that facilitate the restoration of committed transactions in spite of any subsequent
software or hardware failures.
eXtremeDB enforces the ACID principles by requiring that all database access is
done within the context of a transaction. A transaction is said to be durable if,
upon return of control to the application after a transaction commit, the
transaction data can be recovered in the event of a failure of the application or the
system (assuming that the media on which the database and/or transaction log
resides is not compromised or corrupted). The previous section discussed how to
determine the logging policy for database recovery; below we explain how to
control the manner in which database changes are committed to disk.
In addition to choosing the Transaction Logging Policy, you must also choose the
Transaction Commit Policy. In order to guarantee the durability of transactions,
database systems must force all updates to be written through the database cache,
and the file system cache, onto the physical media (be it solid state or spinning
media). This flushing of the file system buffers is an expensive operation (in
terms of performance), but is the only way to guarantee the transaction Durability.
MCO_COMMIT_SYNC_FLUSH
This policy indicates that a database commit flushes the cache, synchronizes the
file system buffers for both database and log files and truncates the log file
(UNDO_LOG only). This policy provides durable transactions. The database can
be corrupted only if the physical media where the database and log files are
located is damaged.
MCO_COMMIT_BUFFERED
This policy indicates that the database cache does not get flushed to disk upon
transaction commit. Pages that were marked dirty by the current transaction are
left in the database cache. That applies to both the database and the log file pages.
This policy significantly reduces the number of I/O operations; the runtime only
writes dirty pages to disk during normal swapping. In case of application failure,
the database cache is destroyed and all changes made by all transactions
committed after the policy was set could be lost.
MCO_COMMIT_NO_SYNC
This policy indicates that the database runtime does not explicitly synchronize the
file system buffers with the file system media. Upon transaction commit, all
changes made by the transaction are transferred from the application space to the
operating system space and the log file is truncated (UNDO_LOG only). It is up to
the file system to determine when the data is actually written to the media. This
mode provides some performance advantages over the full synchronization mode,
but also risks losing transaction data in the event of a system crash (while
committed transactions are still in the file system cache).
Note: Failure will not cause database corruption, provided that the hardware and the
operating system are working properly. In this mode, the database is restored to a
consistent state from the log file when the application is restarted. It is assured that the
state of the database will be at least the same as it had been prior to setting the
MCO_COMMIT_NO_SYNC mode.
MCO_COMMIT_DELAYED
The size of the log file is checked only in mco_trans_commit(). If the log file size
is less than the specified threshold, mco_trans_commit() does not commit the
transaction to disk (as it is in the case of the MCO_COMMIT_NO_SYNC policy). If
the size exceeds the threshold, then the entire log is committed to disk (as it is in
the case of the MCO_COMMIT_SYNC_FLUSH policy).
This commit mode is only available if the logging policy is set to REDO log.
Since the log file size is checked only in mco_trans_commit(), it is still possible
to run out of page pool space if the total size of all pages modified by the
transaction exceeds half of the page pool. The transaction log is not truncated
after the commit; truncation is still controlled by the redo_log_limit parameter.
For the REDO_LOG transaction logging policy, the maximum size of the log file
can be established by calling the mco_disk_set_log_params() function. This
function must be called after mco_db_open_dev() (which establishes the
transaction logging policy).
In REDO_LOG mode, the application must establish the maximum size of the log
file. Once this size is reached, the runtime will commit changes to the database
file and truncate the log.
Cache Size
Like the memory pool for an in-memory database, the cache for an on-disk
database is created by the application and the address and size of the memory are
passed as parameters to the mco_db_open_dev() function. The memory can be
either shared memory or conventional memory. It must be shared memory if two
or more processes are to share the database.
When the MCO_UNDO transaction logging policy is used, eXtremeDB uses a “dirty
pages bitmap” to keep track of what pages can be purged from the cache during a
READ_WRITE transaction. The bitmap is allocated from the cache memory and its
size can be roughly calculated as max_database_size / page_size / 8. If the
database size is MCO_INFINITE_DATABASE_SIZE, the size of the bitmap is set to
1/16 of the size of the cache. The bitmap size remains unchanged until the disk
manager is destroyed.
In this case, the bitmap is allocated on the eXtremeDB heap. To avoid heap overflow the
application should set the disk_max_database_size parameter of the
mco_db_params_t structure appropriately. By default the eXtremeDB runtime
reserves 256K (512K in 64-bit configurations) for the extendable bitmap on the
eXtremeDB external heap (which will map a 128M database with a 128 byte
database page size).
Note: The dirty page bitmap is used only in case of the UNDO logging policy and is not
used for the REDO logging.
Objectives
This section explains some database design considerations with respect to
eXtremeDB. It is not an exhaustive treatment of the topic of database design.
That is a very large subject and well beyond the scope of this document. Rather,
our objective is to shed light on the workings of eXtremeDB in order that
developers can make informed database design decisions choosing from the many
available options.
Logical design considerations involve how you will conceptually organize the
data: what objects you will define, how they are interrelated, what means you will
employ to effect the relationships, what access methods are required, and the
performance requirements of your application. In this process, you will decide
what indexes are needed, which will be hash indexes and which will be tree
indexes, which classes will have the list property for sequential access, whether an
Object Identifier (oid) will be needed and what its structure should be, which
classes will have oids, whether to implement interclass relationships via oid and
ref, autoid and autoid_t, via indexes, or to de-normalize and use a vector instead.
The physical design considerations are page size, initial database size, incremental
extensions, whether certain fields can be optional, and whether classes can be
compact.
Page Size
As a rule of thumb, page size should be between 60 and 512 bytes; a 100 byte
page size works fine in most situations. Page size should be a multiple of 4, and
if it is not the runtime will adjust it internally. Almost all memory used by the
database instance is used to store objects or index (tree or hash) data. The
overhead imposed by the index memory managers is not really affected by the
page size when the page size is larger than 60 bytes. This is because the fixed
part of the index control data is typically between 4 and 12 bytes per page.
Therefore, the page size mostly affects the overhead imposed by object layout
managers.
Objects that have dynamic data fields, such as strings or vectors, always occupy
whole pages. Multiple fixed size objects can share a page. This means, for
example, that if the page size is 100 bytes and some object with dynamic fields
took 440 bytes including all control data, then 60 bytes (= 5*100 – 440) would be
wasted. It is not really possible to determine in advance the exact optimal page
size. It depends on what will be the object size distribution in the real world (at
runtime, what will be the actual sizes of the dynamic data), what will be the
dynamic sequence of operations, and what type of objects will be stored most
frequently, etc. To determine runtime memory requirements the calculator
functionality described in section “Database Status and Statistics Interfaces” of
Chapter 4 can be very helpful. The statistics generated by the calculator make it
easy to adjust the page size parameter in order to reduce the average memory
overhead for specific tests or the actual application itself.
The estimated number of objects with OIDs is used by the runtime to build the
OID hash table. Hash conflict
resolution algorithms are optimized for a certain number of entries, so the best
hash table size really depends on the number of hash entries (the number of
objects assigned an OID). There is no real harm in being somewhat off with the
estimate. But if the estimate is far from the real number of objects, the
performance penalty will be significant (for instance if the estimate is 100 and the
number of objects is tens of thousands). If the estimated number is too small, the
hash table will be smaller and conflicts will happen more frequently. Modest
underestimates will result in some insignificant performance penalty. If the
number is too large, conflicts will happen less frequently but the hash table will
take more memory. This parameter presents a familiar tradeoff between speed
and size. Providing a larger estimate for this number will improve hash index
performance at the cost of extra space allocated for hash entries.
Classes should use an OID if the application data model has one. In other words,
if objects described in the schema have some native identifying information and
that information is common between objects of different types, then an OID is a
natural way to represent this model. If the application’s objects are identified
differently depending on their type, then an OID should not be used. The OID
can have several fields, but they must be of fixed size.
Use of Structures
Note: eXtremeDB allows indexing by structure field(s) even when the structure
is used as a vector field.
The compact class qualifier limits the size of the class’ elements to 64K. This is
because 2-byte offsets are used instead of 4-byte offsets to address within each
object’s layout. Obviously, there is an overhead imposed by eXtremeDB to
support certain data layouts. A large portion of this overhead is due to the fact
that we support dynamic data types such as vectors, strings and optional fields.
For instance, each string field is implemented as an offset to the actual data. For a
compact class this offset is 2 bytes, otherwise it is 4 bytes. Another example is an
optional field. It is common in applications for some data to not be known at the
time of creation for a particular object. Instead of reserving space for such data
within each object, it can be declared as optional. eXtremeDB will place an offset
to the actual data within the data layout. Then if data is not present (or has been
erased) this offset is null. The space for the structure is only allocated when
necessary to store the data. All these offsets are 2-bytes in the compact model.
Note: The total 64K limit of a compact object size does not include BLOBs
defined for the class. It is still possible to have a large BLOB (> 64K in size) for
compact classes. Addressing within BLOBs is not affected by the compact
declaration.
You can use the -c or -compact mcocomp schema compiler options to make all
classes of a database compact.
For example, consider a class that contains two string fields and one optional
structure. For 1000 objects of this class, the compact declaration would save (at
least) 3*2*1000 = 6000 bytes of overhead (3 fields, 2 bytes less overhead each,
times 1000 objects equals 6,000 bytes).
The only limitation with compact classes is the total size of an object, 64K. If it is
known that objects of a class will always require less than 64K it is beneficial to
use the compact qualifier.
The “char<n>” declaration defines a fixed length byte array of ‘n’ bytes (where n
<= 64K). The “string” declaration defines a variable length byte array <= 64K.
In the case of char<n>, ‘n’ bytes will be consumed for this field by every object
(instance of the class). It is best to use char<n> when exactly ‘n’ bytes are used in
every instance, as for example in a social security number field that is a required
entry. In the case of a string element, eXtremeDB imposes 2 or 4 bytes of
overhead (depending on the compact qualifier, see above) for each instance.
Blob
An object having K allocated blobs has (4 + 8*K) bytes allocated within the
object layout. A 32-byte header is written for each blob when it is stored, within
the first blob page, plus 8 bytes for the 2nd through N-th pages.
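For example, an object with two allocated blobs (K = 2) carries 4 + 8*2 = 20
bytes of blob references within its layout; storing one of those blobs across
three pages then writes a 32-byte header on the first blob page plus 8 bytes on
each of the two subsequent pages, for 48 bytes of blob storage overhead.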
Vector
Like string and blob, a minimum of 2-bytes or 4-bytes of overhead is imposed for
each vector of each object. If the vector is of structures or strings, then the
overhead is 2 * (N+1) (compact) or 4 * (N+1) (normal) where N is the number of
elements in the vector. If the vector is a simple type, the overhead is only 2
(compact) or 4 (normal) bytes.
Vectors, unlike blobs, have structure. The elements of a vector can be efficiently
located by offset within the vector. In contrast, blob access methods are like
sequential file access—the blob is a sequence of bytes, exactly as a file is.
Because of this, it is always better to use a vector when the data is regular in
nature and needs to be accessed by the element number.
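For example, a vector of 10 structures in a normal (non-compact) class imposes
4 * (10 + 1) = 44 bytes of overhead (22 bytes in a compact class), whereas a
vector of 10 elements of a simple type imposes only the 4 (or 2) bytes for the
vector itself.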
This discussion applies to char/string (and Unicode variants) and fixed-size arrays
versus vectors.
eXtremeDB stores all fixed-size elements of a class on a single page and variable
length elements on separate pages (to allow them to grow). The page with the
fixed-size elements contains a 2- or 4-byte offset for each variable length field.
As a consequence, using variable length fields may actually use database space
less efficiently than defining a fixed length element even knowing that a portion
of the fixed length element may go unused.
For example, suppose we have a page size of 100 bytes and a character field that
might hold between 8 and 50 characters, with the average length being 18. A field
definition of char<50> will leave 32 bytes, on average, unused. But a string field
will use 2 (or 4) extra bytes and leave at least 50 bytes unused on the page that
eXtremeDB uses for the string field. In this circumstance, a fixed length character
field would be better. Conversely, a character field that must allow for up to 256
bytes would be better defined as a string field.
The same basic principle applies to the choice of fixed length arrays or variable
length vectors.
Voluntary Indexes
Voluntary indexes can be created, used for a series of transactions and then
destroyed. Because index creation is a relatively “heavy” operation, it does not
make sense to always create an index if all that is needed is to perform a few
searches at some particular time during execution. In this case the indexes can be
declared voluntary and built as needed prior to the search operation.
Voluntary indexes use the same algorithms and consume the same space as
regular indexes; they differ only by their ability to be created and destroyed
dynamically, and by the fact that voluntary indexes are not created automatically
when the database instance is created.
Also, it’s important to note that only tree type indexes can be voluntary.
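A hedged DDL sketch follows (the class, field and index names are illustrative;
the tree index syntax matches the tkey/hkey declarations used in the UDA
examples later in this chapter):
class Account
{
uint4 id;
uint4 balance;
voluntary tree <balance> by_balance;
};
The application then builds the index immediately before performing its
searches and destroys it afterward.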
eXtremeDB provides hash index algorithms and tree index algorithms of a rich
variety of types, modified for efficient operations in memory. The b-tree index
algorithm is the most general; it can be used for all kinds of searches and for
ordered fetches. A b-tree index can be unique or not unique and is searchable by
ranges and partial key values. In addition to the b-tree, eXtremeDB provides
specialized tree indexes including “Patricia Trie”, “R-Tree” and “Kd-Tree” that
are described in detail in section “Search Methods” in Chapter 5. A hash index is
suitable for search by equality only and can also be unique or non-unique. Hash
indexes can exhibit better average performance, for both insert and lookup
operations, compared to a tree index, but this also depends on the initial hash table
size and on key distribution. A hash index does not guarantee search time;
theoretically all different key values can produce the same hash value, and the
search will be very slow. So eXtremeDB implements a “dynamic hash table” to
optimize performance.
Because a sequential search of the linked list of key values for a given hash can
result in inefficient lookup times, if the size of the hash table is too small with
respect to the total number of objects in the class being indexed, the hash table is
rebuilt when necessary by extending the table size. The initial hash table is
allocated using the estimated number of objects specified for this class in the
database schema. The hash_load_factor parameter (a percentage value) passed
to mco_db_open_dev() in the mco_db_params_t structure is used to determine
when to extend (reallocate) the hash table. For example, if the initial hash table
size is 1000 and the hash_load_factor parameter is 50 (i.e. 50%), then the hash
table will be extended when the 501st object is inserted; if hash_load_factor is
150, then the hash table will be extended when the 1501st object is inserted.
Memory consumption is comparable for tree and hash indexes. A rough estimate
for a tree index is 10 bytes per entry (exact size depends on the order of
insertions/deletions); and H + 8 bytes per entry for a hash index, where the
constant H is fixed size space taken by the hash table and can be calculated as E /
5 * 4 where E is the estimated number of hash entries provided by you in the
database schema and 5 is a constant hash factor used by eXtremeDB. If
reallocation of the hash table is necessary, then the size will be H * 2.
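To put numbers to this: with an estimated E = 10,000 hash entries, the table
itself takes H = 10,000 / 5 * 4 = 8,000 bytes; indexing 10,000 objects then
consumes roughly 8,000 + 8 * 10,000 = 88,000 bytes, and if the table must be
reallocated its fixed portion grows to H * 2 = 16,000 bytes.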
List Attribute
Each list declaration will create an additional dynamic structure, which will
consume resources similar to those taken by a tree index. The list declaration is
useful when:
An object is characterized, in part, by the fact that when it is deleted all of its
dependent parts are deleted. To accomplish this, these dependent parts of an
object are stored using an object layout manager. In order to express one-to-many
relationships between parts of the object it may be very efficient to use a vector.
For example, a vector of strings will take only 2 or 4 bytes for the vector itself
plus 2 or 4 bytes overhead per string, whereas making a separate object from each
string will require at least one page for each string, so the overhead may be more
significant. Vectors are useful when the object model in an application already
contains dynamically structured items. Say, for example, an application needs to
collect radar measurements for various “Targets”, and that each measurement is a
set of structures. Its database could be defined as follows:
struct Target
{
uint2 x;
uint2 y;
uint2 dx;
uint2 dy;
uint2 type;
};
class Measurement
{
uint4 timestamp;
uint4 radar_number;
vector< Target > targets;
};
An alternative, normalized design stores each target as a separate object, with
a reference back to its measurement:
class Target2
{
uint4 m_unique_id; // ref to measurement
uint2 x;
uint2 y;
uint2 dx;
uint2 dy;
uint2 type;
};
class Measurement2
{
uint4 m_unique_id;
uint4 timestamp;
uint4 radar_number;
};
The first, vector-based design (Measurement with an embedded targets vector) is
faster and will take far less space because fewer objects have to be maintained
and fewer operations have to be performed.
Note: The direct attribute is not allowed for a vector of structures. (See the
explanation in section “Data Definition Language: struct declaration” above.)
Write
Using the class handle, assign application values to object fields via different
flavors of “put” methods.
Read
Using the transaction handle, call one of the search methods: list, hash, tree or
oid-based. If the search was successful, obtain a class handle.
The example below illustrates these scenarios in code using the following
extremely simple schema:
Simple Schema
oid;
list;
};
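Only the tail of this schema is shown above. A plausible full version, inferred
from the generated interface below, might look like the following (the field
names a, b, c and h, the index field lists, and the hash table size are
assumptions):
declare database simple;
struct simple_oid
{
uint4 seq;
};
declare oid simple_oid[1000];
class SimpleClass
{
uint2 a;
uint2 b;
string c;
uint2 h;
unique tree <a, b, c> UniqueIndex;
tree <a, c> NonUniqueIndex;
unique hash <h> hkey[1000];
oid;
list;
};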
Compiling this schema with the schema compiler mcocomp yields the following
interface header file:
Simple Interface
#ifndef __MCO__simple__H__
#define __MCO__simple__H__
#include "mco.h"
#ifndef MCO_DEBUG_MODE
#error DEBUG mode runtime must be used
#endif
/*-------------------------------------------------------*/
/* Handles and Class Codes */
/*-------------------------------------------------------*/
/* Dictionary */
mco_dictionary_h simple_getDictionary();
/*-------------------------------------------------------*/
/* Object Id definitions */
/*-------------------------------------------------------*/
/* class SimpleClass methods */
MCO_RET SimpleClass_UniqueIndex_cursor(
/*IN*/ mco_trans_h t,
/*OUT*/ mco_cursor_h c);
MCO_RET SimpleClass_UniqueIndex_search(
/*IN*/ mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
/*IN*/ OPCODE oper,
/*IN*/ uint2 v1,
/*IN*/ uint2 v2,
/*IN*/ const char *s3,
/*IN*/ uint2 lg3 );
MCO_RET SimpleClass_UniqueIndex_compare(
/*IN*/ mco_trans_h t,
/*IN*/ mco_cursor_h c,
/*IN*/ uint2 v1,
/*IN*/ uint2 v2,
/*IN*/ const char *s3,
/*IN*/ uint2 lg3,
/*OUT*/ int *result );
MCO_RET SimpleClass_UniqueIndex_locate(
/*IN*/ mco_trans_h t,
/*OUT*/ mco_cursor_h c,
/*IN*/ SimpleClass * handle);
MCO_RET SimpleClass_NonUniqueIndex_cursor(
/*IN*/ mco_trans_h t,
/*OUT*/ mco_cursor_h c );
MCO_RET SimpleClass_NonUniqueIndex_search(
/*IN*/ mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
/*IN*/ OPCODE oper,
/*IN*/ uint2 v1,
/*IN*/ const char *s2,
/*IN*/ uint2 lg2 );
MCO_RET SimpleClass_NonUniqueIndex_compare(
/*IN*/ mco_trans_h t,
/*IN*/ mco_cursor_h c,
/*IN*/ uint2 v1,
/*IN*/ const char *s2,
/*IN*/ uint2 lg2,
/*OUT*/ int *result );
MCO_RET SimpleClass_NonUniqueIndex_locate(
/*IN*/ mco_trans_h t,
/*OUT*/ mco_cursor_h c,
/*IN*/ SimpleClass * handle);
/*------------------------------------------------------*/
/* struct Id methods */
#endif
The first code fragment creates a database named “SimpleDb” and allocates space
for the data repository starting at some user-defined memory address. After that,
the application connects to the database and obtains a database handle that is used
later for opening transactions.
#include <simple.h>
#define DATABASE_SEGMENT_SIZE 300 * 1024
#define MEMORY_PAGE_SIZE 128
const char * db_name = "SimpleDb";
int main(void)
{
MCO_RET rc;
mco_device_t dev;
mco_db_params_t db_params;
return 0;
}
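The body of main() is elided above. A hedged sketch of what it might contain
follows; mco_db_params_init() and the MCO_MEMORY_CONV /
MCO_MEMORY_ASSIGN_DATABASE constants follow the conventions of the eXtremeDB
samples and should be treated as assumptions:
mco_runtime_start();
/* describe one conventional-memory device for the database */
dev.type = MCO_MEMORY_CONV;
dev.assignment = MCO_MEMORY_ASSIGN_DATABASE;
dev.size = DATABASE_SEGMENT_SIZE;
dev.dev.conv.ptr = malloc( DATABASE_SEGMENT_SIZE );
/* initialize the database parameters */
mco_db_params_init( &db_params );
db_params.mem_page_size = MEMORY_PAGE_SIZE;
db_params.disk_page_size = 0; /* all-in-memory database */
/* create the database and connect to it */
rc = mco_db_open_dev( db_name, simple_getDictionary(), &dev, 1, &db_params );
if ( MCO_S_OK == rc ) {
rc = mco_db_connect( db_name, &db ); /* db: an mco_db_h declared above */
}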
The next code fragment demonstrates how an application can extend the amount
of memory used for data storage. On some CPU architectures the entire memory
arena is split into several non-contiguous memory regions. An application may
need to use multiple segments in order to store the necessary data. It is also
possible that the maximum amount of memory needed for the database is not
known in advance. eXtremeDB addresses these scenarios with the
mco_db_extend() function. The example also demonstrates the usage of the
reporting functions: mco_db_free_pages() and mco_db_total_pages().
#include <simple.h>
rc = mco_db_disconnect(db);
rc = mco_db_close(db_name);
while (n_segments) {
free(dev[--n_segments].dev.conv.ptr);
}
} else {
/* Connection failed: free memory allocated for main database device
and close the database without resetting rc */
free(dev[0].dev.conv.ptr);
mco_db_close(db_name);
}
}
/* stop eXtremeDB runtime */
mco_runtime_stop();
return ( MCO_S_OK == rc ? 0 : 1 );
}
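The mco_db_extend() call itself is elided from the fragment above. A hedged
sketch of that portion follows; the mco_db_extend(), mco_db_total_pages() and
mco_db_free_pages() signatures are assumptions, and EXTEND_SEGMENT_SIZE is a
hypothetical constant:
/* allocate a further conventional-memory segment and give it to the runtime */
dev[n_segments].dev.conv.ptr = malloc( EXTEND_SEGMENT_SIZE );
rc = mco_db_extend( db_name, dev[n_segments].dev.conv.ptr,
EXTEND_SEGMENT_SIZE );
if ( MCO_S_OK == rc ) {
uint4 total, free_pg;
n_segments++;
/* report current page usage */
mco_db_total_pages( db, &total );
mco_db_free_pages( db, &free_pg );
printf( "pages: %u total, %u free\n", (unsigned)total, (unsigned)free_pg );
}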
Populating a Database
The next code fragment illustrates writing to the database. The schema compiler-
generated “new” and “put” interfaces are used to create references to the
persistent data and write data into their permanent locations. “new” methods must
be called within the context of a write transaction. The code fragment also
demonstrates how to obtain a transaction handle, and use it with the “new”
method later on.
Note: Once an object is allocated, all the base type fields except optional fields
are by default initialized with zeros. Strings are made empty strings. In the
example below, it is important to note that a (unique) hash index is declared for
the field “h”. Therefore h must be assigned a unique value each time the function
is called, otherwise mco_trans_commit() will return an error code indicating an
attempt to create a duplicate.
SimpleClass hClass;
simple_oid id;
int donetrn = 0;
char src[] = "abcdefghigklmnop";
mco_trans_h t;
/* open a read-write transaction; the handle t is used by the
* "new" and "put" methods below
*/
rc = mco_trans_start( db, MCO_READ_WRITE, MCO_TRANS_FOREGROUND, &t );
if( rc ) return 0;
/*
* Allocate a new class and return a class handle. Two
* input parameters must be passed to the new interface:
* the transaction handle t and class id. If successful the
* output parameter hClass is a pointer to the newly
* allocated class.
*/
rc = SimpleClass_new (t, &id, &hClass);
if( rc ) goto Err;
/* Important!
* We must assign a unique value for this field since
* hash index is declared for it.
*/
SimpleClass_h_put(&hClass, vh );
/*
* commit the transaction unless there is a problem and
* return 1, otherwise rollback and return 0
*/
rc = mco_trans_commit (t); donetrn = 1;
/*
* Important! After the transaction is committed, the
* class handle is no longer valid. Any attempt to
* reference the created object would result in an error
* condition.
*/
if( rc ) goto Err;
return 1;
Err:
printf("\n %d error inserting object: %d", rc);
if( ! donetrn )
mco_trans_rollback (t);
return 0;
}
Search by OID
Searching for an object based on its oid is demonstrated in the next code
fragment. A search operation is performed within the context of a read
transaction.
#include "simple.h"
if( buff ) {
/* read the string */
rc = SimpleClass_c_get(&hClass,
buff,
(uint2)(sz+1),
&actual_sz);
if ( rc == MCO_S_OK )
printf( "\n\t object with oid=%d\n\t a=%d,b=%d,c=%s(%d)\n\n",
id.seq, a, b, buff, actual_sz );
else
printf( "\n\t error:%d", rc );
free(buff);
}
}
mco_trans_commit (t);
return rc == MCO_S_OK;
}
Cursor Operations
The next example demonstrates how an application can use cursors to navigate
the database. When using the MURSIW transaction manager, a cursor created
within a read-only, or read-write transaction, can still be valid after the transaction
is committed. This behavior is different from the behavior of object handles
which are only valid within a transaction. In other words, an application can
create a cursor in one transaction and use it in another.
Note: When using the MVCC transaction manager, a cursor in one transaction
may contain objects that are modified in another transaction. So, applications
should avoid using the same cursor in different transactions.
}
/* commit the transaction. This is only done to
* illustrate the fact that cursors can be used across
* the transaction boundaries
*/
rc = mco_trans_commit(trn);
/* re-open a transaction */
mco_trans_start ( db, MCO_READ_ONLY, MCO_TRANS_FOREGROUND, &trn );
/* commit */
mco_trans_commit(trn);
return;
Overview
In order to share the data between multiple processes, the eXtremeDB runtime
creates the database in shared memory. Multiple threads within a process share
the memory of that process. The shared memory that is used by the eXtremeDB
runtime is architecture and operating system dependent. In some environments,
the eXtremeDB runtime uses a System V shared memory mechanism (for
example, Sun Solaris and Linux) while for others it uses POSIX style shared
memory (for example, QNX Neutrino). On Microsoft Windows platforms there
is yet another shared memory mechanism. When a shared memory database is
created, the eXtremeDB runtime allocates two shared memory segments: one for
the eXtremeDB “registry” that keeps information about all database instances
created on the machine, and another segment for the data itself. The eXtremeDB
runtime shared memory implementation details are hidden from applications and
all the interactions with the database are done via eXtremeDB standard interfaces.
Implementation
Start Up
MCO_RET mco_runtime_start (void);
MCO_RET open_shared_db(
const char * db_name, /* name of the database */
mco_dictionary_h dict, /* pointer to schema */
mco_size_t db_sz, /* size of memory segment for in-mem part
* of the db */
uint2 mem_pg_sz, /* size of memory page */
uint2 max_conn_no /* max. number of connections */
)
{
mco_runtime_info_t info;
mco_db_params_t db_params;
mco_device_t dev;
db_params.mem_page_size = mem_pg_sz;
db_params.disk_page_size = 0;
db_params.db_max_connections = max_conn_no;
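A hedged sketch of how the body of open_shared_db() might continue follows;
the MCO_MEMORY_NAMED / MCO_MEMORY_ASSIGN_DATABASE constants follow the
mco_device_t layout shown under “Device Types” below, and the shared segment
name is hypothetical:
/* describe a named (shared) memory device for the database */
dev.type = MCO_MEMORY_NAMED;
dev.assignment = MCO_MEMORY_ASSIGN_DATABASE;
dev.size = db_sz;
sprintf( dev.dev.named.name, "%s-mem", db_name );
dev.dev.named.flags = 0;
dev.dev.named.hint = 0; /* let the runtime choose the mapping address */
return mco_db_open_dev( db_name, dict, &dev, 1, &db_params );
}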
Note: When using the eXtremeDB Direct Pointer Arithmetic library (DP) it is
necessary to map the shared memory segment to the same virtual memory
address in every process because in the DP implementation eXtremeDB uses
actual memory addresses (i.e. it performs pointer arithmetic to calculate the
locations of objects in an eXtremeDB database). The pointers must be the same
in every running instance of an eXtremeDB-based application, or pointer
arithmetic just doesn’t work. Setting the dev.named.hint parameter to zero causes
eXtremeDB to determine the actual shared memory segment address. But this
could fail when called from a second process attempting to open the shared
database. In this case it is the application’s responsibility to provide a valid “hint”
address.
There are several ways to determine where the runtime should map the shared
memory database. You could use the utility provided by your operating system to
gather memory usage information (the process memory map). These utilities will
usually tell you the code, data and stack memory usage of each process running
on your system and the libraries it is using. Examine the output and pick an
address outside the address ranges already in use. Or you could simply use the
address
MAP_ADDRESS that is currently defined as 0x20000000 in the eXtremeDB
SDK samples.
The above potential issues with respect to the MAP_ADDRESS can be avoided
by using the “Offset” library instead of the Direct Pointer Arithmetic library. The
“Offset” approach calculates an offset from the beginning address of the in-
memory database, to locate objects. Therefore, it does not depend on the in-
memory database starting at a common (and known) location for all processes.
However, the DP pointer arithmetic is about 5%–15% faster than calculating
offsets.
Shut Down
MCO_RET mco_runtime_stop(void);
The function performs clean-up for the process. Every mco_runtime_start() must
be paired with mco_runtime_stop().
Examples
void StartDB()
{
MCO_RET rc;
mco_db_h db;
void* start_mem = 0;
mco_runtime_start();
/* ... open the shared database here, e.g. via open_shared_db() above ... */
rc = mco_db_connect( dbname, &db );
rc = mco_db_disconnect( db );
rc = mco_db_close( dbname );
mco_runtime_stop();
}
Subsequent processes (once the shared memory for the database has been set up
by open_shared_db() in the initial process) should follow these steps:
void DbAttach()
{
MCO_RET rc;
mco_db_h db;
mco_runtime_start();
rc = mco_db_connect( dbname, &db );
/* ... use the database ... */
rc = mco_db_disconnect( db );
rc = mco_db_close( dbname );
mco_runtime_stop();
}
Overview
With MCORPC, the developer builds a framework that implements an
application-specific remote access API. The framework will implement C-
language remote procedure calls (RPC) callable from any application capable of
calling a C function. The specific details and the actions performed by these
remote procedures are irrelevant for the framework. An RPC function could be as
simple as the following for adding or retrieving an integer value:
add_record(int value);
update_record(int value);
But more often remote APIs require passing/retrieving compound data to/from the
eXtremeDB database. For this purpose a remote procedure call Interface
Definition Language (“IDL”) compiler mcorcomp is provided. The McObject
IDL contains definitions of the data structures to be passed and the function
prototypes of the interface functions expressed in C-Language (as opposed to a
CORBA-like IDL).
The remote access API, defined in the form of a C-language header file, is
processed by mcorcomp to produce an RPC dictionary, remote (client-side)
interfaces and proxy (server-side) stub functions to be completed by the developer
with implementation-specific application code.
The MCORPC library marshals and de-marshals the application’s data (passed as
arguments by the remote to the proxy functions). Because the framework is
communication protocol independent, an example implementation over TCP/IP is
provided to demonstrate how the network layer functions can be implemented, but
this “plumbing” can be any network medium.
RPC Framework
The client-server interface generated by mcorcomp consists of:
Note that it is not, strictly speaking, required that the server process carry out
eXtremeDB-related tasks. The MCORPC mechanism can be used to distribute any
application processing.
The client application invokes the envelope routine for a remote method in the
same manner it would invoke a local function.
mco_rpc_context_t <interface_name>_ctx;
int <interface_name>_write_stream
( void * buf_, unsigned int buf_sz,
void * param, unsigned int network_order );
int <interface_name>_read_stream
( void * buf_, unsigned int buf_sz,
void * param, unsigned int network_order,
unsigned int * read_sz );
where
These read/write functions return zero if the read / write was successful,
otherwise a non-zero value.
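As a hedged sketch of the transport layer (not the shipped TCP/IP example), a
write-stream function over a socket might look like this; the function name is
illustrative and the assumption that param carries a socket descriptor is ours:
#include <sys/socket.h> /* for send() */
int my_interface_write_stream( void * buf_, unsigned int buf_sz,
void * param, unsigned int network_order )
{
int sock = *(int *)param; /* assumption: param holds the socket fd */
unsigned int sent = 0; /* network_order is unused in this sketch */
while ( sent < buf_sz ) {
int n = send( sock, (char *)buf_ + sent, buf_sz - sent, 0 );
if ( n <= 0 )
return 1; /* non-zero: failure */
sent += (unsigned int)n;
}
return 0; /* zero: success */
}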
int <interface_name>_is_data_available
( mco_rpc_context_t * ctx, unsigned int * result );
This function returns 1 in the result output parameter if the transport has any data
and 0 if there is no data pending in the transport. The function takes the context
as a parameter.
And lastly, the error handler API is provided to trap any fatal runtime condition:
On the client-side, nothing else needs to be done to fully implement the transport.
To build the client, simply link together the transport function implementations,
the dictionary file (<interface_name>_dict.c) and the generated envelopes
(<interface_name>_client.c).
[Diagram: server-side development flow. The eXtremeDB DDL schema is processed
by the DDL compiler (SDK) into the generated implementation and header files;
the MCORPC compiler produces the serialize/deserialize C source code skeleton
files; these are built with a C/C++ compiler and linked with the eXtremeDB and
framework runtime libraries.]
In the diagram above, the boxes shaded grey are steps that require some
programming. The first step is to define the eXtremeDB database schema, which
is processed by the eXtremeDB schema compiler, MCOCOMP.EXE, and
produces the database .C and .H files (this is the normal eXtremeDB process).
The second step is to define the function prototypes for the functions to be called
from remote processes, and the data structures to be passed to them. For example,
consider an eXtremeDB class called “channel” and an RPC function:
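The prototype itself is not reproduced here; a hypothetical version (the
parameter list is an assumption, only the function name
extremedb_channel_update appears below) might be:
int extremedb_channel_update( int channel_id, float value );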
Please note that all business logic functions that return a value (either through a
parameter or a return code) are called only synchronously, while void functions
could be called asynchronously.
The IDL header file is processed by mcorcomp to produce the RPC dictionary,
envelope functions and server-side proxy functions.
The last step is to implement the server-side functions (in this example, the
extremedb_channel_update() function) that will be called by the client
envelope. This is the step indicated by the grey box labeled “Service code with
RPC implementations”.
Client-side Implementation
The following diagram illustrates the client side development and deployment
steps:
[Diagram: client-side development flow. The C header file specification of
remote procedures is processed by the MCORPC compiler into the
serialize/deserialize MCORPC envelopes; the resulting C/C++ source files are
built with a C/C++ compiler and linked with the framework runtime libraries
and the TCP communication layer.]
The mcorcomp compiler generates the proxy and envelope routines. The compiler
recognizes a number of keywords in the form of comments that are used to
declare string, union and array data types used as a part of the interface
declaration (see the example below).
The following C declarations are supported by the compiler and could be a part of
a remote interface:
• Pre-processor directives
• Pre-processor macros
Keywords:
union {
int a;
char * b /* string */;
char c;
} abc_union;
struct {
int active_union_member;
abc_union u /* active(active_union_member) */;
};
The actual type of the union field could be an integer, a zero-terminated string or
a single byte. The active keyword indicates to MCORPC how the application
treats the union: if the application treats the union data type u as “int a”, the
active_union_member should be assigned a zero; if the application treats the
union field as a zero-terminated string, the active_union_member should be set to
1, etc.
Example:
} test_struct_variable_t, * test_struct_variable_p;
Example:
#ifndef __TEST_INTERFACE_H
#define __TEST_INTERFACE_H
/* DATA DEFINITIONS */
/* a structure */
/* union definition */
/* union usage */
/* INTERFACE FUNCTIONS */
/* strings */
typedef char * zstring;
typedef int int5[5];
/* fixed-size array */
int test_int_int5( int5 ints );
/* structure */
int test_int_pstruct( test_struct_p pstruct );
/* variable-size parameters */
int test_int_variable_len( test_struct_variable_p p );
/* union as a parameter. */
int test_int_union( test_struct_union_p p );
#endif
Compiling the interface definition above results in the generation of the following
C implementation files:
Test_intf_server.h server
Test_intf_server.c server
Test_intf_dict.c server & client
Test_intf_client.c client
To complete the build, these files are compiled and linked with the network
interface layer implementation and the MCORPC library, mcorpc.lib (or
mcorpc_debug.lib).
How it Works
The database dictionary generated by the mcocomp schema compiler for all
native API calls is coded into the .c output file. In the output the dictionary is
followed by the generated schema specific TypeSafe API functions. These
comprise the application specific API (Native API) used by the application
developer to access specific fields, indexes and array elements.
Note that these individual functions all call low level “mco_wrapper” functions
like mco_w_new_obj_oid(), mco_w_obj_delete(), etc. These wrapper
functions provide a “generic” interface to access individual database objects.
However their implementation requires an intimate knowledge of the database
dictionary in order to correctly specify the integer values for function parameters.
This is the work of the mcocomp compiler.
The UDA API is designed to provide a similar “generic” interface for applications
that does not require intimate knowledge of the database dictionary. To this end,
the UDA “registry” (Dictionary and Meta-Dictionary) functions provide the
means to enumerate fields, indexes and array elements so that the application
developer can access them by name. The integer values returned by the registry
functions are then passed to UDA access functions like mco_uda_new(),
mco_uda_delete(), etc. Notice that the type-safety provided by the C compiler
when using native API calls is sacrificed for the flexibility of these generic UDA
access functions.
With these “helpers” in place, the following code snippet demonstrates calls to the
UDA access functions to create and update a database object:
mco_uda_object_handle_t rec;
mco_uda_value_t value;
uint4 key = 1999;
unsigned short Record_struct_no,
key_field_no,
tkey_index_no,
hkey_index_no;
Record_struct_no = get_struct_no("Record");
key_field_no = get_field_no(Record_struct_no, "key");
tkey_index_no = get_index_no("Record", "tkey");
hkey_index_no = get_index_no("Record", "hkey");
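A hedged sketch of how this snippet might continue (the transaction handle t is
assumed to have been opened already; the call pattern follows the
mco_uda_new()/mco_uda_put() example later in this chapter):
/* create a Record object and set its key field */
rc = mco_uda_new( t, Record_struct_no, 0 /* no oid */, 0, 0, &rec );
value.type = MCO_DD_UINT4;
value.v.u4 = key;
rc = mco_uda_put( &rec, key_field_no, 0, &value );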
Registry Functions
Before using any of the registry functions, the application must allocate space for
the meta-dictionary structure and initialize the allocated buffer with the
mco_metadict_init() API. Because an application may use more than one
database, the meta-dictionary contains a header and an entry for each
database dictionary. The size of the buffer should be obtained via the
mco_metadict_size() function, which returns the size of the meta-dictionary in
bytes (including the header).
The application needs to allocate the memory buffer and pass the pointer to the
buffer along with its size to the mco_metadict_init() function. The database
runtime will determine the maximum number of databases that can be registered
within the metadictionary. The application can access this number through the
metadict->n_maxentries field.
Note: The mcouda library does not allocate any dynamic memory. Therefore,
any memory buffers used by the UDA API are allocated by the application. The
buffer either can be declared statically or allocated on the heap. Descriptors are
often allocated on the application’s stack.
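A hedged sketch of this allocation sequence follows; the
mco_metadict_size()/mco_metadict_init() signatures and the
mco_metadict_header_t type name follow the eXtremeDB UDA samples and should be
treated as assumptions:
unsigned int metadict_size;
mco_metadict_header_t *metadict;
/* size the meta-dictionary for one database entry, allocate and initialize */
mco_metadict_size( 1, &metadict_size );
metadict = (mco_metadict_header_t *) malloc( metadict_size );
mco_metadict_init( metadict, metadict_size, 0 /* flags */ );
printf( "max databases: %u\n", metadict->n_maxentries );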
The flags parameter defines what happens during the initialization of the
dictionary. Currently the only supported value is:
MCO_METADICT_DONT_LOAD_EXISTING_DBS
This flag indicates that the automatic registration of opened databases is not done.
Once the meta-dictionary is registered, the following API functions can be
called to get the count of databases registered:
And to get a pointer to the dictionary based on its number, name, or connection
handle:
The following API functions will obtain a pointer to the field descriptor based on
its number or name:
#define MCO_DICT_II_UNIQUE 1
#define MCO_DICT_II_VOLUNTARY 2
#define MCO_DICT_II_LIST 4
#define MCO_DICT_II_AUTOID 8
#define MCO_DICT_II_TREE 0x10
#define MCO_DICT_II_HASH 0x20
#define MCO_DICT_II_USERDEF 0x40
The following API functions will obtain a pointer to the index descriptor based on
its number or name:
When an index is composed of multiple fields, each part of the index is defined by
the following descriptor:
Where the values for flags can have the default value of ‘0’ or :
The following API functions will obtain a pointer to the index field descriptor
within an index composed of multiple fields based on its number or name:
UDA Functions
As explained above, the UDA is a generic API. So the objects are defined by
descriptors that can contain any type of object, and the values stored in them are
defined by descriptors that can handle any type of data.
• For simple data types (int, float, double), assign the value to either v.u1 or
v.u2, etc.
• For strings, arrays and blobs set the pointer to the appropriate pointer type
(v.p.p.c, v.p.p.n or v.p.p.v), and specify the size in bytes in v.p.len.
• For structured fields, mco_uda_put() initializes the descriptor v.o, which, in
turn, is used to set field values for the structure.
• For simple types (integer, float, double) the value is returned in the
appropriate field (v.u1, v.u2, etc.,).
• For arrays and blobs it is necessary to assign the appropriate type pointer
(v.p.p.c, v.p.p.n or v.p.p.v) to the buffer that receives the data first, and specify
the size of the data in bytes in v.p.size. mco_uda_get() copies the value into
the buffer (or truncates the output if the buffer is not large enough) and also
returns the actual number of bytes received in the v.p.len.
• For structure fields, mco_uda_get() first initializes the v.o descriptor that will
be used to read the structure field values.
typedef struct
{
uint2 class_code; /* class code */
uint1 persistence; /* persistent or transient*/
} mco_uda_dict_class_storage_t;
Where the persistence field can have one of the following values:
#define MCO_UDA_CLASS_DEFAULT 0
#define MCO_UDA_CLASS_TRANSIENT 1
#define MCO_UDA_CLASS_PERSISTENT 2
To remove an object from the database call the following API function:
Put/Get Functions
To assign the field value for an object or structure use:
In order to assign the value, the application sets the field type in value->type. The
type must correspond to the DDL type:
Example
mco_uda_object_handle_t obj;
MCO_RET rc;
mco_uda_value_t v;
...
rc = mco_uda_new( t, Rec_class_no, 0 /* no oid */,
0 /* no init.*/, 0, &obj);
v.type = MCO_DD_UINT4;
v.v.u4 = 100;
rc = mco_uda_put( &obj, uint4_field_no, 0, &v );
v.type = MCO_DD_STRING;
v.v.p.len = 5;
v.v.p.p.c = "Hello";
rc = mco_uda_put( &obj, string_field_no, 0, &v );
v.type = MCO_DD_BLOB;
v.v.p.len = blob_size;
v.v.p.p.v = blob_value;
rc = mco_uda_put( &obj, blob_field_no, 0, &v );
Note: For simple types (integers, float/double) the field value is returned in the
corresponding mco_uda_value_t structure field.
For strings, byte arrays, and blobs, the application needs to allocate a buffer and
pass it into the API:
In addition, val->v.p.size has to hold the size of the buffer (in bytes).
The function copies the field value into the buffer and also returns the actual
number of symbols (bytes for blobs) copied in val->v.p.len.
It is possible to use the mco_uda_get() function to receive the size of the buffer
in advance. If the pointer (val->v.p.p.c, val->v.p.p.n or val->v.p.p.v) is set to zero,
the API just fills out val->v.p.size, and does not copy the actual value into the
buffer.
For structure-based fields (MCO_DD_STRUCT), the API fills out val->v.o, that
can be used to further pass it into the mco_uda_get() and gain access to the
structure fields.
Example 1
mco_uda_value_t val;
val.type = MCO_DD_STRING;
val.v.p.p.c = 0; /* first determine the actual size we need to allocate */
mco_uda_get(&obj, my_field_no, 0, &val);
val.v.p.p.c = malloc(val.v.p.size);
mco_uda_get(&obj, my_field_no, 0, &val); /* get the value */
....<whatever processing is necessary>
free(val.v.p.p.c); /* free up memory */
Example 2 (strings)
mco_uda_value_t val;
val.type = MCO_DD_STRING;
val.v.p.p.c = 0;
mco_uda_get(&obj, my_field_no, 0, &val);
val.v.p.size += sizeof(char); /* leave room for the terminator */
val.v.p.p.c = malloc(val.v.p.size); /* allocate for the field value size */
mco_uda_get(&obj, my_field_no, 0, &val); /* get the value */
....<processing results>
free(val.v.p.p.c); /* free up memory */
And similarly for Unicode (nchar) strings:
mco_uda_value_t val;
val.type = MCO_DD_NCHAR_STRING;
val.v.p.p.c = 0;
mco_uda_get(&obj, my_field_no, 0, &val);
val.v.p.size += sizeof(nchar);
val.v.p.p.c = malloc(val.v.p.size );
mco_uda_get(&obj, my_field_no, 0, &val);
....<processing results>
free(val.v.p.p.c);
Vector Functions
To get the size (length) of a vector or array:
Cursor Functions
As explained in the section “Search Methods” of Chapter 5, a cursor is used to navigate
through a group of records satisfying the search criteria on a specified index. The
function mco_uda_lookup() positions the cursor at the first object that satisfies
the search criteria.
To position the cursor at the first object that satisfies the search criteria:
To obtain information about the index associated with the cursor use:
To compare the value(s) referenced by the current position of the index cursor
with value(s) supplied by the application use:
User-defined Indexes
User-defined indexes for the eXtremeDB native API are explained in section
“User-defined Index Functions” in Chapter 5. As with native User-defined
Functions (udf) the UDA API requires that the application supply two compare
functions for tree indexes and two additional hash functions for hash indexes. For
tree indexes, provide one custom function that compares two objects and one that
compares an object to an external key value. For hash indexes, provide two pairs
of functions: two returning a hash code and two compare functions (if a user-
defined tree index is also defined then these compare functions are used for the
hash index as well).
These functions must then be registered with the runtime before cursor functions
can be called on these indexes by passing a parameter of the following type:
The application implements these compare functions with the following function
signatures:
/* Object - Object */
typedef int2(*mco_uda_compare_userdef_f)( mco_uda_object_handle_p obj1,
unsigned short index1,
mco_uda_object_handle_p obj2,
unsigned short index2,
void *user_context);
These compare functions must return <0, 0, or >0 depending on whether the
first value is less than, equal to, or greater than the second (the other
object, or the external key value).
In addition, for hash indexes, two custom functions need be implemented with the
following function signatures:
/* Hash - Object */
typedef uint4 (*mco_uda_hash_userdef_f)( mco_uda_object_handle_p obj,
unsigned short index,
void *user_context);
Note that hash index compare functions return 0 if and only if two objects (or
object and external key) are equal from the index point of view. This is necessary
for hash index operations because hash codes may be equal yet the objects (keys)
are not. When mco_uda_lookup() is called with a hash index, it will call the
user-defined compare function to assure that any matching hash code actually
exactly matches the indexed database field value.
Notice also that the compare functions receive application specific data passed
from the caller via the user_context parameter.
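A hedged sketch of an object-to-object compare function follows; the uint4 key
field, the key_field_no variable and the fact that the index-number parameters
are ignored are assumptions of this sketch:
static int2 my_compare( mco_uda_object_handle_p obj1, unsigned short index1,
mco_uda_object_handle_p obj2, unsigned short index2,
void *user_context )
{
mco_uda_value_t v1, v2;
v1.type = v2.type = MCO_DD_UINT4;
/* read the key field from both objects and compare the values */
mco_uda_get( obj1, key_field_no, 0, &v1 );
mco_uda_get( obj2, key_field_no, 0, &v2 );
return ( v1.v.u4 < v2.v.u4 ) ? -1 : ( v1.v.u4 > v2.v.u4 ) ? 1 : 0;
}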
In addition to the compare functions, the UDA API requires a udf map for the
internal implementation of index navigation. This udf map must be allocated and
passed to the runtime when the udfs are registered.
The following function queries the database dictionary to determine the amount of
memory to be allocated for the udf map:
The following code snippet demonstrates how to register udf compare functions:
/* allocate udfmap */
mco_uda_get_udfmap_size(metadict, 0, &udf_map_size);
udf_map = (mco_userdef_funcs_h) malloc(udf_map_size);
Note: The user-context parameter param is used by the external key compare
functions to call mco_uda_get() to retrieve the database field value to be
compared to the external key value.
The registration API must be called for all user-defined indexes, before the
application makes a call to mco_db_connect().
To register them with UDA however, it is first necessary to extract the desired
collations from the meta-dictionary. To facilitate this, the UDA Collation API
provides dictionary functions to count and extract collation definitions by name
and number, as well as to determine the collation map size.
Then, as with the core collation API, the helper functions, mco_uda_collate_get()
and mco_uda_collate_get_range(), are provided to facilitate the implementation
of the user-defined collation compare functions called by the UDA cursor
functions.
Example 1
Sample schema:
class Record
{
string name;
uint4 value;
/* compare values */
return STR_CMP(buf1, buf2);
}
uint4 coll_hash(mco_collate_h c, uint2 len)
{
mco_uda_value_t val;
char buf[20];
/* hash value */
return strlen(buf);
}
int main(void)
{
MCO_RET rc;
…
mco_dict_struct_info_t struct_info;
mco_dict_collation_info_t coll_info;
mco_uda_value_t value;
mco_uda_object_handle_t obj;
char buf[16];
…
if ( MCO_S_OK == rc ) {
/* connect to database */
rc = mco_db_connect(db_name, &db);
if ( MCO_S_OK == rc ) {
/* fill database with records setting field s to fruit names */
rc = mco_trans_start(db, MCO_READ_ONLY,
MCO_TRANS_FOREGROUND, &t);
if (rc == MCO_S_OK) {
/* using custom collate tree index iterate through the cursor */
rc = mco_uda_cursor(t, Record_no, tcoll_no, &c);
if (rc == MCO_S_OK) {
for (rc = mco_cursor_first(t, &c);
MCO_S_OK == rc;
rc = mco_cursor_next(t, &c))
{
UDA Programming
As with all eXtremeDB applications, the runtime must be started and initialized,
memory devices defined and an error handler mapped. Then the Meta-dictionary
is initialized and the database opened with mco_uda_db_open(). The following
example demonstrates a typical sequence for opening a database for UDA access:
int main(void)
{
MCO_RET rc;
mco_runtime_info_t info;
mco_device_t dev[4];
unsigned int n_dev, metadict_size;
mco_db_params_t db_params;
mco_runtime_start();
mco_error_set_handler(&sample_errhandler);
mco_get_runtime_info(&info);
n_dev += 3;
}
/* register dictionary */
rc = mco_metadict_register( metadict, dbName,
udaopen_get_dictionary(), 0);
printf("Register dictionary : %s\n", mco_ret_string(rc, 0));
For a better understanding of the UDA API, build and run the samples in the
directory “samples/16-uda” in the debugger.
The XML interfaces can also be used to facilitate simple schema evolution by
exporting the database to XML, adding/dropping fields, indexes, and classes, and
importing the saved XML into the new database.
Implementation
Standards
eXtremeDB XML is developed in accordance with the W3C SOAP encoding
recommendations. These recommendations can be found on the W3C web site:
http://www.w3.org/TR/soap12-part0/
http://www.w3.org/TR/soap12-part1/
http://www.w3.org/TR/soap12-part2/
http://www.w3.org/TR/xmlschema-0
http://www.w3.org/TR/xmlschema-1
XML Policy
The XML policy structure describes various behavior options of the XML
interface, such as string/blob encoding, XML indentation, etc. It is available for
the user at compile time and also at runtime via the policy APIs. These APIs and
available options are described in the file include/mcoxml.h that can be found in
your eXtremeDB installation.
The encode_spec field from the mco_xml_policy_t structure is ignored. The value
for the encode_spec field is set to MCO_YES regardless of the application’s
settings, meaning that all special characters except the LF are encoded. This
runtime behavior conforms to the XML encoding specifications.
Note: If oid or the autoid fields are specified for a class, the runtime processes
them as follows:
• when the XML is exported from the eXtremeDB database via the
classname_xml_get() method, the oid and autoid fields are always
written into the XML;
• for classes declared with oid, an XML document used to create a new
object must contain the oid values;
• whether specified or not, the oid value is ignored if the XML document is
used to update an object;
• whether specified or not, the autoid field is ignored in the incoming XML.
Returns the current policy. Note that it requires a transaction context. This
function always returns MCO_S_OK.
Example:
void ChangeXMLOutput(void)
{
mco_xml_policy_t policy;
mco_trans_h t;
mco_trans_start(db,MCO_READ_WRITE,MCO_TRANS_FOREGROUND,&t);
mco_xml_get_policy(t, &policy);
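/* ... adjust policy fields here (for example indentation, text or blob
coding) before applying them with mco_xml_set_policy() ... */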
mco_xml_set_policy(t, &policy);
mco_trans_commit(t);
}
struct Country
{
char<3> c_code;
string name;
};
struct Date
{
uint1 day;
char<3> month;
uint2 year;
};
struct Passport
{
char<8> series;
uint8 number;
};
struct Address
{
Country country;
string city;
string street;
};
struct Phone
{
int2 country;
char<5> area;
char<7> number;
};
struct Residence
{
Address where;
Date since;
optional Phone phone;
};
struct Office
{
Address where;
string organization;
string position;
vector<Phone> phone;
};
class Person
{
string name;
Residence residence[3];
optional Office office;
optional Phone mobile;
blob description;
oid;
autoid[100];
list;
};
For each class declared in the schema, the DDL compiler generates the following
interfaces:
for(;;)
{
// get Person object handle from the cursor
rc = Person_from_cursor ( t, &c, &p_obj);
if(rc) return rc;
// write Person object as XML through the 'doprint' function
rc = Person_xml_get(&p_obj, f, &do_print);
if(rc) return rc;
// advance the cursor, break loop if end of list
if ( mco_cursor_next(t, &c) != MCO_S_OK )
break;
}
rc = mco_trans_commit(t);
return rc;
}
int do_print( /*IN*/ void *stream_handle, /*IN*/ const void * from,
/*IN*/ unsigned nbytes)
{
// this simple example just writes the bytes to a FILE
// another example could write the bytes to a
// pipe to another process, a socket, etc.
FILE * f = (FILE*)stream_handle;
return fwrite( from, sizeof(char), nbytes, f );
}
In this example, the stream is a file handle. It could have been a pipe to another
process or any other type of stream. Person_xml_get() encodes the person
object referenced by the handle, and calls the helper function do_print(),
passing do_print() the stream (file, in this case) handle, a pointer to the XML
string, and the length of the XML string.
Note that the classname_xml_get() does not create any XML header. If an
application is going to create a monolithic document with multiple XML objects,
possibly of different XML tags, the application must create the appropriate XML
header and footer entries to make it a legal XML document.
This interface updates an existing object with the XML description. The entire
object is updated; it is not possible to selectively update fields. The class handle
will have been established from, for example, a cursor, and carries the transaction
context with it (hence, it is not necessary to pass a transaction handle to this
function).
Creates an object from the XML description. The first parameter is the
transaction context, the second parameter is the xml and the output parameter is
the new object handle. For example:
if ( c == EOF )
break; /* end-of-file, thus finished */
ptr = 1;
xml[0] = '<';
if ( c == EOF )
break; /* finished */
xml[ptr] = 0;
if ( strcmp(xml, "<Person>") != 0 )
exit(1);
/* read xml-object */
for (;;)
{
c = getc(file);
if ( c == EOF )
{
xml[ptr] = 0;
printf("\n Error - unexpected end of file: %s\n",
&xml[(ptr>50)?ptr-50:0]);
exit(4);
}
xml[ptr++] = c;
if ( c == '>' )
{
xml[ptr] = 0;
/* closing tag, the object is complete */
if ( strcmp("</Person>", &xml[ptr-9]) == 0 )
break;
}
}
/* write database.. */
rc = Person_xml_create(t, xml, &p_obj);
if ( rc != MCO_S_OK )
exit(0);
rc = mco_trans_commit(t);
return rc;
}
In the above example, an XML string is parsed to find the class tag. This example
only deals with Person objects, so if class tag is for any other type of object, the
procedure terminates. Otherwise, the XML is read up to the closing Person tag
(“</Person>”). Then Person_xml_create() is called, passing a transaction
handle, the XML string, and the handle of a Person object that will reference the
newly created object.
Note that if an XML document was created by eXtremeDB and contains just a
single XML object, there is no need to parse the opening and closing tags; just
read the entire XML into a buffer and pass it to classname_xml_create().
If the Person object represented in the XML string already existed in the database
(i.e. an attempt was made to violate a unique key constraint),
Person_xml_create() would fail with code MCO_S_DUPLICATE. If this is a
possibility, the application code should be written to find the key values within
the XML string and attempt to locate the object and call classname_xml_put()
or classname_xml_create() accordingly. See the following pseudo-code:
/* read xml-object */
/* write database.. */
if((rc = classname_fieldname_search(. . .)) == MCO_S_OK)
rc = classname_xml_put(. . .);
else
rc = classname_xml_create(. . .);
if ( rc != MCO_S_OK )
exit(0);
rc = mco_trans_commit(t);
/* ... and start over */
} /* loop on xml-objects */
• uint1, uint2, uint4, int1, int2, int4 are written as the appropriate integers
(unsigned or signed). The base is defined in the policy field int_base, which
could be 8, 10 (default) or 16. Octal numbers are coded with an initial “0”
(for example, 04567), and hexadecimal numbers with an initial “0x” (for
example, 0x1526A)
• uint8, int8 are written similarly to the other integers, but decimal format is not
allowed. The policy field quad_base sets up either the octal or hexadecimal
(default) form
• autoid is formatted as uint8
• date, time are formatted as uint4
• output for float and double depend on the policy’s float_format field value.
MCO_FLOAT_FIXED means floats are formatted as fixed-point numbers
(0.0025); MCO_FLOAT_EXPONENT means floats are represented in
exponent form—integer part, fractional part and exponent (for example, 2.5e-3)
• oid, ref are coded into hexadecimal format
• Blob depends on the blob_coding value: MCO_TEXT_ASCII means ASCII
as defined by the XML specifications, MCO_TEXT_BINHEX means
BINHEX (2 hexadecimal digits / byte) [according to RFC 1741],
MCO_TEXT_BASE64 means Base64 (default) [according to RFC 1521,
section 5.2—“Base64 Content-Transfer-Encoding”]
• char, string formatted in accordance to the policy’s text_coding field:
MCO_TEXT_ASCII means ASCII as defined by the XML specifications
(default), MCO_TEXT_BINHEX means BINHEX, and
MCO_TEXT_BASE64 means Base64
Exports the XML schema for the class classname. It must be called in the context
of an MCO_READ_ONLY transaction. The schema format is compliant with the
W3C specifications, which can be found in the following documents:
http://www.w3.org/TR/xmlschema-0
http://www.w3.org/TR/xmlschema-1
The current implementation of this function only supports the default XML
policy.
Example
The XML schema can be used in conjunction with tools, such as XMLSpy, to
validate the content of XML documents, which you can do prior to attempting to
import the XML document into eXtremeDB.
The XML document can also be used with XSLT, which is a language for
transforming XML documents into other XML documents. This might be a
necessary step to exchange data between eXtremeDB and an external system if
they don’t have identical representations for the data being exchanged. For
further information on XSLT, please refer to http://www.w3.org/TR/xslt20/.
blob: Binary data object; a byte array of any size, which can be greater than
64K in size. Example: blob jpeg;
The declaration
time start_tm[3];
defines an array of three time values. Any element except vectors, blobs and
optional structs can be a fixed size array. Fixed size arrays cannot be used in
indexes; for this, use a vector.
Database Types
Device Types
typedef struct mco_device_t_
{
unsigned int type; /* none, conv, named, file, raid, etc */
unsigned int assignment; /* none, db-segment, cache-segment, db-file,
log-file */
mco_size_t size;
union {
struct {
void * ptr;
} conv;
struct {
char name[MCO_MAX_MEMORY_NAME];
unsigned int flags;
void * hint;
} named;
struct {
int flags;
char name[MCO_MAX_FILE_NAME];
} file;
struct {
int flags;
char name[MCO_MAX_MULTIFILE_NAME];
mco_offs_t segment_size;
} multifile;
struct {
int flags;
char name[MCO_MAX_MULTIFILE_NAME];
int level;
} raid;
struct {
unsigned long handle;
} idesc;
} dev;
} mco_device_t, *mco_device_h;
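As a hedged example of filling in this structure (the MCO_MEMORY_* constant
names follow the eXtremeDB samples and the file name is hypothetical):
mco_device_t dev[2];
/* in-memory segment for the database */
dev[0].type = MCO_MEMORY_CONV;
dev[0].assignment = MCO_MEMORY_ASSIGN_DATABASE;
dev[0].size = 300 * 1024;
dev[0].dev.conv.ptr = malloc( dev[0].size );
/* file device for the persistent part */
dev[1].type = MCO_MEMORY_FILE;
dev[1].assignment = MCO_MEMORY_ASSIGN_PERSISTENT;
dev[1].size = 0;
strcpy( dev[1].dev.file.name, "simple.dbs" );
dev[1].dev.file.flags = 0;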
Transaction Priorities
typedef enum MCO_TRANS_PRIORITY_E_
{
MCO_TRANS_IDLE = 1,
MCO_TRANS_BACKGROUND = 2,
MCO_TRANS_FOREGROUND = 3,
MCO_TRANS_HIGH = 4,
MCO_TRANS_ISR = 77
}
MCO_TRANS_PRIORITY;
Transaction Types
typedef enum MCO_TRANS_TYPE_E_
{
MCO_READ_ONLY = 0,
MCO_READ_WRITE = 1
}
MCO_TRANS_TYPE;
Cursor Types
typedef enum MCO_CURSOR_TYPE_E_
{
MCO_LIST_CURSOR = 0,
MCO_TREE_CURSOR = 1,
MCO_HASH_CURSOR = 2
}
MCO_CURSOR_TYPE;
typedef struct mco_cursor_t_
{
char c[mco_cursor_size];
}
mco_cursor_t, /* cursor (structure) */
* mco_cursor_h; /* cursor handle (pointer) */
Class Statistics
typedef struct mco_class_stat_t_
{
uint4 objects_num;
uint4 total_pages; /* index pages are not counted */
uint4 core_space; /* in bytes, not counting blobs */
}
mco_class_stat_t,
* mco_class_stat_h;
Event Types
typedef enum MCO_EVENT_TYPE_E_
{
MCO_EVENT_NEW,
MCO_EVENT_UPDATE,
MCO_EVENT_DELETE,
MCO_EVENT_DELETE_ALL,
MCO_EVENT_CHECKPOINT,
MCO_EVENT_CLASS_UPDATE
}
MCO_EVENT_TYPE;
• Status Codes (S): indicate runtime states that can and will occur during
normal database operations;
• Non-Fatal Error Codes (E): indicate runtime error conditions that the
application can manage by responding appropriately; and
• Fatal Error Codes (ERR): indicate bugs in the application code that render
the eXtremeDB runtime unable to safely continue execution.
The following code snippet from the eXtremeDB runtime file mcocsr.c illustrates
the use of each of the three return code types:
#ifdef MCO_CFG_CHECKLEVEL_1
if (!CHECK_TRANSACTION((mco_db_connection_h)t))
{
mco_stop(MCO_ERR_TRN + 5);
}
#endif
return MCO_S_OK;
}
Note that the S and E type return codes are returned directly from the runtime function
to be managed by the application code. But the ERR type return codes are passed
to the runtime function mco_stop() to terminate execution. It calls the internal
function mco_stop__() with the actual file name and line number in source code
where the error condition was detected. The filename and line number can be seen
by examining the call stack to further aid in locating the source of the bug.
In the sections that follow, different categories of return codes are defined. But be
aware that eXtremeDB is a product in continual evolution and, once printed, this
list of error codes might become obsolete. The final source of return code values
for your release of eXtremeDB can always be found in the mco.h header file—
please consult this file if you encounter an error code not defined herein.
Status Codes
The following table lists status codes that might be returned by the eXtremeDB
runtime. These return codes do not indicate error conditions but rather runtime
states that can and will occur during normal database operations.
The following error codes in the range of 50–99 (and 999) indicate non-fatal error
conditions that the eXtremeDB runtime might return, that don’t fall into a specific
category:
The following error codes in the range of 100–199 indicate non-fatal error
conditions that might be returned by the eXtremeDB disk manager:
The following error codes in the range of 200–299 indicate non-fatal error
conditions that might be returned by the eXtremeDB runtime while processing
XML I/O:
The following error codes in the range of 300–399 indicate non-fatal error
conditions that might be returned by the eXtremeDB runtime while processing
Network requests:
HA Error Codes
The following error codes in the range of 400–499 indicate non-fatal error
conditions that might be returned by the eXtremeDB High Availability runtime
while processing API requests:
The following error codes in the range of 500–599 indicate non-fatal error
conditions that might be returned by the eXtremeDB runtime while processing
UDA API requests:
The following snippet of code from the eXtremeDB runtime file mcobtree.c
demonstrates the use of error code base values plus incremental number:
if (height > 0)
{
if ( node->header.kind != MCO_PAGE_TREE_NODE )
{
mco_stop(MCO_ERR_BTREE + 1);
}
}
else
{
if ( node->header.kind != MCO_PAGE_TREE_LEAF )
{
mco_stop(MCO_ERR_BTREE + 2);
}
}
The POSIX API is implemented in the mcofuni.c file and the extended
API is implemented in mcofu98.c. By default, the build system is
configured to use the Unix-98 standard for all flavors of Unix: Linux, Sun OS,
HP-UX, AIX.