Version 4.1
Operations on “transient” database objects eliminate the complex and costly tasks
of cache and file management, which has several beneficial side effects:
• The code path between the application and the data is significantly
shortened. Fewer CPU instructions translate directly into increased
performance. With modern hardware, eXtremeDB read transactions
require nanoseconds while write transactions require up to a few
microseconds.
• Eliminating the complex logic associated with cache and file management
not only reduces the code size (footprint), but also eliminates multiple redundant
copies of any given piece of data (i.e. a copy in the application, a copy in
the database cache, a copy in the file system cache, and a copy in the file
system itself).
• An all-in-memory database is optimized with different strategies than a
disk-based database. The latter is concerned with minimizing file I/O
operations, and will trade off memory and CPU instructions to avoid file
I/O. An all-in-memory database doesn’t need to worry about disk I/O, so it is
optimized to reduce CPU instructions and to maximize the amount of data
that can be stored in a given amount of space. Consequently, eXtremeDB
requires a fraction of the space of disk-based databases to store a given
amount of data.
• eXtremeDB actually provides two libraries for all-in-memory databases:
the optimized “Direct Pointer Arithmetic” library which accesses the
memory locations of database records by simple pointer arithmetic
(offering a 5% -15% performance advantage) or, for shared memory
implementations, the “Offset” library which calculates record locations by
first obtaining their offsets from the beginning address of the database
then converting these to proper pointers (incurring a slight additional
performance cost). Both libraries implement these memory calculations
internally; the only interface applications need to manage is a simple
connection handle.
• Some embedded systems, such as flight safety related systems, are not
permitted to do dynamic memory allocations because they can lead to
memory leaks that could ultimately cause system failure. eXtremeDB
applications can utilize static memory so that no dynamic memory is
allocated.
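For example, the memory underlying an eXtremeDB database can be reserved
statically at link time instead of being allocated from the heap (a minimal
sketch; DB_SIZE is an illustrative value, and the region is later handed to the
runtime as a database device, as described under “Device Management” below):

/* Statically reserved database region: the application performs no
   dynamic allocation, so no memory can leak. */
#define DB_SIZE (1024 * 1024)   /* illustrative size */
static char db_memory[DB_SIZE]; /* reserved at link time, not at runtime */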
The “persistent” modifier in the class declaration identifies database objects that
will be stored on disk. The uniquely configurable eXtremeDB on-disk features
include the following:
For multi-threaded applications, eXtremeDB manages the database data in any
dedicated region of memory and coordinates access by multiple tasks
(threads). For multi-process architectures (for example, Sun Solaris, Linux, QNX
Neutrino, etc.), eXtremeDB can manage a database in shared memory and
coordinate access by multiple processes, each potentially with multiple threads.
Beyond the core eXtremeDB In-Memory Database System package, there are four
additional editions that extend the basic core runtime to meet specific user
requirements as follows:
Our Objectives
The primary objective for eXtremeDB is to provide extremely high performance
for the kinds of applications to which it is targeted.
Product evolution
eXtremeDB has played a significant role in the success of thousands of real-time
applications on a wide range of embedded systems platforms. Driven by requests
from developers and enthusiastic customers, additional features have been added
to extend the basic eXtremeDB core technology to address evolving user
requirements, as can be seen from the list of additional versions above. However,
great care has been taken in each stage of the product evolution to make no
compromises with our basic extreme performance goals. Each additional
interface, be it Transaction Logging, High Availability, SQL or JNI, is provided
in the form of separate libraries that can be linked into an application as desired to
address specific application needs.
While developers who need the absolute best performance for mission critical
applications are reassured to know that the underlying core runtime remains the
fastest, most robust in the industry, they also appreciate knowing that less
demanding applications can interface with the same eXtremeDB databases
through SQL, for example, to generate reports or allow a flexible query interface.
Or a High Availability application may provide the utmost reliability for sensitive
data transactions, while a highly optimized, performance critical application may
perform different transactions simultaneously on the same database.
Depending on your license, you might have received one or both of the eXtremeDB
addendums:
• The “eXtremeDB High Availability Addendum” explains how to implement High
Availability redundancy in the form of Master and Replica applications;
• The “eXtremeDB Transaction Logging Addendum” explains how to implement
Transaction Logging to allow for efficient recovery of in-memory portions of
databases in the event of system failure.
Finally, if you have licensed SQL extensions, you will have received the “eXtremeSQL
User’s Guide” that explains the use of the SQL Interface to eXtremeDB and how to
optimize its features.
A good source of information regarding implementation details is the source code
provided in the sample programs in the eXtremeDB package.
The readme.txt file in the “samples” directory outlines the specific features implemented
in each sample. Building these applications and stepping through them in the debugger is
an excellent way to gain familiarity with the eXtremeDB API.
There are also a number of technical articles and white papers about eXtremeDB
available on the McObject website, http://www.mcobject.com, that you are welcome to
download.
Syntax Conventions
As you will see in the next sections, the eXtremeDB API is divided into two broad
categories: the “Static API” consisting of application independent interface functions, and
the “Generated API” consisting of application-specific type-safe interface functions
generated by the eXtremeDB DDL compiler. When referring to eXtremeDB-generated
functions, we use the following predicates:
classname
structname
fieldname
indexname
eventname
to refer to elements of your database design that are used in the function naming
convention of eXtremeDB. For example:
classname_fieldname_put
describes the functions that are generated for every field of every class to populate the
database objects’ fields.
In our description of the Database Definition Language (DDL) syntax, we use square
brackets (“[” and “]”) to indicate optional elements, ellipses (“…”) to indicate repeating
items, a bar (“|”) to indicate a choice, and italics to indicate an item that requires further
definition by you. For instance:
Field-statement:
A field-statement declares a field of a class or structure. You must declare the type of the
field (either an atomic data type, a typedef, or a previously defined structure). However,
field-name requires no further definition.
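For instance, given a previously defined structure Address (a hypothetical
name), each of the following is a valid field-statement:

uint4 sensor_id; // atomic data type
Address home;    // previously defined structure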
Our perspective on database systems is from the standpoint of embedded and real-
time systems. Thus, these sections may reveal a perspective on database theory
that is different from yours. Our objective is to provide a highly effective storage
solution for real-time and embedded applications, and a highly productive
database development tool for embedded application developers, which makes
use of modern programming techniques. Externally, eXtremeDB exposes a rich
object-oriented database interface to applications, making it extremely easy for
developers to describe, store and manipulate application-specific data. Internally,
eXtremeDB uses storage layout and access methods that are specifically
optimized for the supported data representation.
Definitions
A database is a collection of related data organized for efficient storage and
retrieval. Beyond this, any attempt to more specifically describe a database
inevitably involves individual features of one or more specific database
implementations. The following definitions describe the eXtremeDB
implementation.
The term class is used in most object-oriented languages, such as C++ or Java.
The class defines the properties of the object and the methods used to control the
object’s behavior. This definition is correct for persistent eXtremeDB classes as
well—a database class defines object fields and access methods.
Elements are called fields in eXtremeDB. Other common terms are “attribute” and
“column”. Fields have a type property. Type determines whether the element
holds character, integer, real, or binary data. See Appendix A for a list of
eXtremeDB data types. eXtremeDB also supports arbitrarily large fields through
blob and vector types, and complex fields through the structure type.
In addition to grouping fields into a class, eXtremeDB gives you the tools to
create a sub-grouping we call a structure. A structure declaration names a type
and specifies elements of the structure that can have different types. Structures
and simple types are building blocks to construct object definitions. Structures can
be used as elements of other structures. Like other element types, you can have a
vector of structures.
Generally speaking, fields are either simple or complex. Simple fields are atomic
types such as char, integer, string, and so on. Complex fields can be vectors of
simple types, structures (which may in turn contain structures and vectors),
vectors of structures, and blobs.
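For illustration, the following class declaration (all names hypothetical)
combines both kinds of fields:

class reading
{
uint4 sensor_id;       // simple: atomic type
string location;       // simple: string
vector<uint4> samples; // complex: vector of a simple type
blob raw_frame;        // complex: blob
list;
};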
In the relational or hierarchical data models, records are constructed from basic
data type fields. The collection of built-in data types and built-in operations were
motivated by the needs of business data processing applications. However, in
many engineering or scientific applications this collection of types is not
adequate. For example, in a scientific application, a requirement could be to
describe a time series and store and access it with appropriate operations. Another
common example is tree-like structures that are widely used in engineering
applications such as routing tables or “electronic program guide” implementations
for set-top boxes. Historically, complex data types and operations on them have
been simulated using basic data types and operations provided by the DBMS with
substantial inefficiency and added complexity. Complex objects are represented
by multiple basic tables or records, with relationships defined between them. Such
objects cannot be represented with one record, and developers are forced to store
parts of an object in different tables and define relationships between the object’s
parts. However, objects are entities, all parts of which work as a whole.
Consequently, developers usually introduce their own APIs to store and retrieve
objects. These APIs shield inner relations within objects from the application, but
at the same time, introduce extra layers of application code that must be written,
debugged and executed.
• R-Tree: These indexes are commonly used to speed spatial searches; for
example to find the rectangle that bounds a given point, or all rectangles
that overlap a specified rectangle.
At first, it may seem strange that oids must have an identical composition for
every class in the database. In actuality, this models the real world in many
embedded application environments. For example, a system that receives data
from some automated source will receive objects that have an identifier already
supplied by the source. An example could be a network of sensors for which the
oid of every sensor class is sensor-type + sensor-id + measurement-timestamp.
Note that not every class is required to have an oid.
oids and refs are better alternatives to indexes for establishing inter-object
relationships. An object can have a vector of references, as one means to
implement a one-to-many relationship across classes. Indexes can and should be
used to implement fast random access to objects by one or more key fields, for
sorted access, and for range retrieval. When possible, oid and ref types should be
used to implement relationships.
eXtremeDB also offers the autoid type. Autoid is similar to oid, except that the
structure and value of autoid fields are determined by the eXtremeDB system. An
application uses the autoid_t typedef to declare program variables of type autoid,
and the autoid_t DDL data type to create a reference in one object to the autoid
value of another object. Autoid and autoid_t can be used as an alternative to oid
and ref whenever a natural oid does not exist, is deemed too cumbersome, or an
automatically incrementing identifier is desired.
To express the content and organization of a database, the database designer uses
some or all of these components in a database definition language (DDL) to
create a database schema. The schema is a textual description of the data model.
The steps involved in creating an application that uses eXtremeDB are illustrated
in the following diagram.
[Diagram: the database schema, created with a text editor, is processed by the
schema compiler (mcocomp) to produce the database interface files (.h and .c);
these are compiled together with the application source code by the C/C++
compiler, and the linker produces the application.]
Database Design
No object-oriented methodologies have been devised specifically for object-oriented
database design. However, object models, being data oriented rather than process
oriented, suit the needs of database applications quite well.
There are a plethora of books and articles written on the topic of data modeling
and database design. We won’t attempt to give the topic an exhaustive treatment
here, but will attempt to hit the highlights and permit you to seek out additional
resources if a particular area piques your interest.
For eXtremeDB database design, we’ll establish several practical steps that, in our
view, designers will benefit from.
When you are done with the initial schema, develop benchmarks. The best kind of
benchmark is one that models an application well—it is also the most time
consuming and expensive. Unfortunately, the popular benchmarks (for example,
TPC) do not cover many different uses of object-oriented databases.
There are no real quality criteria for an object database schema, such as the
measurement of the degree of redundancy provided by the framework of
normalization in the relational case. There are no “transformation rules” for
better data representation. Designers rely on the benchmarks to establish a
sufficient degree of schema redundancy. During the design process there is no
clear distinction between the design of the application and the database semantics.
Your database will become an integral part of your application, allowing data
access to be highly optimized.
Database Models
In the history of database management systems, four database models have
emerged. In chronological order they are: hierarchical, network, relational, and
object. This section will briefly describe the characteristics of each, solely for the
purpose of positioning eXtremeDB in the database landscape.
The various database models are different physical means of implementing the
schema. To a degree, the choice of database models will impact the schema
design. For instance, a relational database will require that all data be organized
into rows and columns with no repeating data elements, whereas eXtremeDB and
some other databases will allow vectors (also called arrays).
Hierarchical Model
The hierarchical database model, of which IBM’s IMS database is the most
recognized example, organizes data into strict parent/child relationships, hence
the term “hierarchical”. The database is always navigated starting from the root
node, and at each node navigated down the left or right branch until the desired
data is located.
Network Model
The network model database is a superset of the hierarchical model. It expands
the strict parent/child metaphor of the hierarchical model such that a “parent”
record can own one, two or more child record types, and child record types can
have more than one (in fact, any number of) parent records. Records can own
themselves in recursive relationships (like manager-employee, where manager is
an employee). Records can have relationships to records more than one level
removed in the model. Finally, network model databases permit navigation to
start at any point in the model, not just from the root. In fact, there may not appear
to be a root when the database is diagrammed. Network model databases take
their name from the fact that a network of relationships is possible.
Relational Model
E. F. Codd first presented the relational data-model. The model offers a
conceptually different approach to data storage. In the relational database, all data
is represented as simple tables where columns represent data attributes (as values
of specific data types) and rows represent instances or records. Relationships are
implemented by data where a given table’s foreign key columns have identical
values to the corresponding primary key columns in a related table. Relational
databases may be accessed using a high-level non-procedural language. This
language is used to gain access to the relations and the desired set of data and the
programmer does not have to write algorithms for navigation. By using this
approach the physical implementation of the database is hidden, thus the
programmer does not have to know the physical implementation to be able to
access the data.
SQL and relational DBMSs have become widely used due to the separation of the
physical and logical representation (and marketing of course). It is much easier to
understand rows and columns than records and pointers to records. Relational
databases solve the two problems of network and hierarchical databases. Inter-
record (or table, in relational parlance) relationships are implemented via indexes,
not pointers, so the relationship-maintaining information is separate from the data.
This makes it relatively easy to add/drop relationships (by simply
adding/dropping indexes) and to add/drop/modify columns (no pointers to fix up).
And, with no pointers, there is no need to keep track of where you are when
navigating a relational database. The disadvantage of the relational model is that indexes
take more time to navigate and consume more space than pointers do, so
relational databases tend to have lower performance and require more disk space.
Object Model
There is no official standard for object databases. Object databases employ a data
model that has object-oriented aspects like classes with attributes and methods
and integrity constraints; provide object identifiers (oids) for any persistent
instance of a class; support encapsulation (data and methods); and support
abstract data types. Object databases combine the elements of object orientation
and object-oriented programming languages with database capabilities—they
extend the functionality of object programming languages (for example, C++,
Java) to provide full-featured database programming capability. The result is a
high level of congruence between the data model for the application and the data
model of the database, more natural data structures, and better maintainability and
reusability of code. Under the covers, object databases are often implementations
of the network model, but because of a higher level of abstraction, they don’t
expose the complexity of programming.
The host application language is the language for both the application and the
database. It provides a very direct relationship between the application object and
the stored object. In general the object DBMS is tightly integrated with the host
language such as C++, C, Smalltalk, or Java. In contrast with relational DBMS,
where the query language is the means to create, access, and update objects, in
object DBMS the primary interface for creating and modifying objects is directly
via the host language using the native language syntax. Moreover, every object in
the system can automatically be given an identifier (oid) that is unique and
immutable during the object’s life. One object can contain an oid that logically
references, or points to, another object. These references prove valuable when
associating objects with real-world entities; they also form the basis of features
such as bi-directional relationships.
Object-Relational Model
Object-relational database management products try to unify aspects of both the
relational and object databases. Note, however, that there is also no official
definition of what an object relational database management system is. Object-
relational DBMS employ a data model that attempts to add “OO-ness” to tables.
All persistent information is still in tables, but some of the entries can have a richer
data structure, called an “abstract data type”, which is a data type constructed
by combining basic alphanumeric data types. For the query language, object-
relational DBMS support an extended form of SQL, sometimes referred to as
ObjectSQL. The object-relational RDBMS is still relational because the data is
stored in tables of rows and columns, and SQL, with the extensions mentioned, is
the language for data definition, manipulation, and query.
Summary
Because eXtremeDB provides a variety of convenient and efficient types of
indexes, you are free to design and implement a relational database with
eXtremeDB. Conceptually, however, eXtremeDB employs the object-oriented
paradigm for both internal storage layout and the database application
development process. eXtremeDB internals are not built upon any other data
model—it manipulates objects and their properties, not records of any kind.
Externally, eXtremeDB exposes complex data types and object access methods
that are determined by class definitions, not a standard database API. Applications
written in C can instantiate persistent objects; the data manipulation language is
the application’s host language (C, C++ or Java), so the database access is tightly
integrated with the application; applications can take advantage of object
identifiers, oids and autoids, to gain very high-performance access to stored data;
complex data types like vectors are seamlessly supported (again via class
interfaces) from the host programming language.
Example
Let’s consider a hypothetical application, a satellite radio receiver. The receiver
will process two types of broadcast data objects: “program data” that consist of
individual program information such as a title, narrator, content description and
program description, and “program schedule” that consist of schedule time data
such as the schedule start time and duration, a number of time slots within the
schedule, each with their start time and duration, and a reference to the “program
data” that describes the program for that time slot. Program information
transmission is organized in a data stream that consists of objects of both types.
Program schedule and program data information can arrive in any order. It is
possible that a schedule for a program would be placed ahead of program data in
the stream.
The receiver application should keep all the programming information from the
current time forward up to a number of hours, and be able to link each program to
the time slot the program airs. It should replace schedules and program data in the
database as the old ones become obsolete and the new ones arrive. There are
certain performance considerations—data must be written into the database
quickly enough to keep up with the transmission. Each data object received by our
receiver is assigned a unique identifier that is also broadcast by the satellite.
The two types of objects arriving in the broadcast stream can be sketched in
pseudocode as follows:
ProgramData {
Object ID;
TitleSize;
Title;
NarratorSize;
Narrator;
For(I = 0;I < MaxDescriptions;I++)
{
ContentDescriptionText | ProgramDescriptionText
}
};
ProgramSchedule {
StartTime;
Duration;
NumberOfTimeSlots;
For(I = 0; I < NumberOfTimeSlots; I++)
{
SlotStartTime;
SlotDuration;
Program_data_ID;
}
};
Following our design methodology, we can clearly introduce two classes that
correspond to real-life objects—“Program Data” class and “Program Schedule”
class. “Program Data” always has “Title” and “Narrator” fields, while program
description text or content description text may or may not be present for a
particular program. In eXtremeDB terms, we would declare them as optional and,
in order to do that, both descriptions must be placed in structures. There are no
data structures shared by both classes. The Program Data class will be declared
with an oid so we can reference a program from the appropriate time slot. Each
Program Schedule has an array of time slots that is nicely implemented as a
vector. We also would like to sort schedules chronologically, so we declare a tree
index based on the schedule’s start time. There is no need to keep the oid of the
Program Schedule, even though it is available to us. The only association between
the two classes is via a reference to a program put into the time slot.
This real-world data stream could be formalized with the following eXtremeDB
schema:
struct ProgID
{
uint4 id; // this will hold the transponder-provided object ID
};
struct content_description
{
string text;
};
struct program_description
{
string text;
};
class prog_data
{
string title;
string narrator;
optional content_description ctext;
optional program_description ptext;
list;
oid;
};
struct time_slot
{
uint4 start_time;
uint2 duration;
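// (The schema excerpt is truncated here. Based on the design discussion
// above, the remainder might look like the following reconstruction;
// field and index names are assumptions.)
ref program; // oid reference to the prog_data airing in this slot
};

class prog_schedule
{
uint4 start_time;
uint2 duration;
vector<time_slot> slots;      // the schedule’s array of time slots
tree<start_time> schedule_ts; // chronological ordering of schedules
};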
Access Methods
The process of gaining access to objects stored in eXtremeDB is called
navigation. There are several methods of navigation available: by oid or autoid,
by hash or tree index, and sequential. These methods are encapsulated in the
programming interface that is generated when the schema compiler processes the
database schema. Rather than employ a pre-defined navigational API to access the
database, you will employ the methods generated for the classes according to your
database design.
OID
Whether an oid is provided by an external source or retrieved with an object as a
reference to another object in the database, an oid can be used to quickly retrieve
the object it identifies. Oids must be unique across all classes in the database.
Uniqueness is enforced during object creation by the eXtremeDB runtime.
oid, autoid, hash, and tree indexes can be used to establish relationships between
classes in the database. For example, to establish a relationship between a sensor
and measurements using oids, you could design the following:
struct sens
{
uint2 sens_type;
uint2 sens_id;
uint4 timestamp;
};
declare oid sens[1000];
class Sensor
{
. . .
vector <ref> measurements;
. . .
};
class Measurement
{
uint4 meas;
oid;
. . .
};
In this example, the class Sensor contains a variable length array (a vector) of
references to oids of the class Measurement. Each element of the vector is the oid
of an instance of the class Measurement and can be used to quickly reference
(locate) the associated Measurement object.
AUTOID
Autoid is similar to oid, except that it is a value that is determined by the
eXtremeDB runtime, and the number-of-expected-entries qualifier applies to the
class, not the entire database. Autoids are of the type autoid_t, a typedef for an
8-byte signed integer defined in mco.h (you cannot change this). Each autoid value is
unique in the database. (This is an implementation detail provided for your
information only; you should not rely on this detail being immutable in future
versions of eXtremeDB.) Relationships between classes can be created with
autoids by defining an autoid_t field to contain the autoid of another object.
Adapting the previous example to use autoid instead of oid, we would have:
class Sensor
{
uint2 sens_type;
uint2 sens_id;
. . .
vector <autoid_t> measurements;
. . .
};
class Measurement
{
uint4 meas;
autoid[10000];
. . .
};
In the new example, the class Sensor contains a variable length array (a vector) of
autoid_t that act as references to the autoids of the class Measurement. Each
element of the vector is the autoid of an instance of the class Measurement and
can be used to quickly reference (locate) the associated Measurement object.
Autoid is useful when an object has no natural unique identifier, or the natural
unique identifiers are cumbersome and would impose an unacceptable
performance and space consumption penalty to index.
INDEX
eXtremeDB supports hash indexes and tree indexes of the following types: tree (b-
tree), trie (Patricia Trie), rtree (R-tree spatial), kdtree (kd-tree multi-dimensional)
and user-defined (“custom”). Hash and tree indexes can also be used to uniquely
identify objects in the database. Unlike oids, however, hash and tree index values
are only required to be unique within a class.
To accomplish the previous example without an oid, you could use indexes, as in
the following example:
class Sensor
{
uint2 sens_type;
uint2 sens_id;
. . .
};
class Measurement
{
uint2 sens_type;
uint2 sens_id;
uint4 timestamp;
uint4 meas;
tree <sens_type, sens_id> sens;
. . .
};
In the first and second examples, the application programmer would navigate to a
Sensor object through whatever mechanism is logical in the context, and then
iterate over the vector of references named “measurements” to visit each
Measurement for the Sensor.
LIST
A third, sequential, navigation method is available for unordered lists of objects.
To iterate over the objects of a class without regard to any particular order, add
the list directive to the class definition, as in:
class Sensor
{
uint2 sens_type;
uint2 sens_id;
. . .
list;
};
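As an illustration, the following sketch iterates over all Sensor objects via the
generated list methods. It assumes the generated naming convention
classname_list_cursor, classname_from_cursor and classname_fieldname_get, an
already opened database connection, and schema.h as the generated header; consult
the generated interface files for the exact signatures.

#include "mco.h"
#include "schema.h" /* generated, type-safe interface (file name assumed) */

void scan_sensors(mco_db_h db)
{
    mco_trans_h t;
    mco_cursor_t csr;
    Sensor s;  /* generated handle type for the Sensor class (assumed) */
    uint2 id;
    MCO_RET rc;

    /* bracket the scan in a read-only transaction */
    if (mco_trans_start(db, MCO_READ_ONLY, MCO_TRANS_FOREGROUND, &t) != MCO_S_OK)
        return;

    /* obtain a cursor over the unordered list of Sensor objects */
    if (Sensor_list_cursor(t, &csr) == MCO_S_OK)
    {
        for (rc = mco_cursor_first(t, &csr); rc == MCO_S_OK;
             rc = mco_cursor_next(t, &csr))
        {
            Sensor_from_cursor(t, &csr, &s); /* bind handle to current object */
            Sensor_sens_id_get(&s, &id);     /* read a field through a getter */
            /* ... process the sensor ... */
        }
    }
    mco_trans_commit(t); /* read-only: commit simply releases the transaction */
}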
char<n>
Fixed-length byte array less than 64K in size. char fields can store C-type strings
or binary data. Trailing null characters for C strings are not required to be stored,
since eXtremeDB adds them when the string is read out of the database, provided
that the size of the supplied buffer is large enough to hold it.
Example: char<8> name;

nchar<n>
Fixed-length byte array less than 64K in size. nchar fields store 2-byte characters
that are sorted by their numerical value. This is suitable for many Asian languages.
In your schema: nchar<20> uname;
In your C/C++ program: nchar_t uname[21];

wchar<n>
Fixed-length byte array of less than 64K in size. wchar fields store Unicode
characters that are sorted according to the machine’s locale setting.
In your schema: wchar<20> uname;
In your C/C++ program: wchar_t uname[21];

nstring
Variable-length byte array less than 64K in size. See nchar.
In your schema: nstring uname;
In your C/C++ program: nchar_t *uname;

wstring
Variable-length byte array less than 64K in size. See wchar.
In your schema: wstring uname;
In your C/C++ program: wchar_t *uname;

blob
Binary data object; a byte array of any size, which can be greater than 64K.
Example: blob jpeg;
Any element except vectors, blobs and optional structs can be a fixed size array. For
example, time start_tm[3]; defines an array of three time values. Fixed size arrays
cannot be used in indexes; for this, use a vector.
Preprocessor
The eXtremeDB™ DDL compiler allows limited use of the C preprocessor.
Preprocessor directives are typically used to make source programs easy to
change. Directives in the source file tell the compiler to perform specific actions,
such as replacing tokens in the text. The eXtremeDB DDL compiler recognizes
the following directives:
#include, #define, #ifdef, #else, and #endif.
The number sign (#) must be the first nonwhite-space character in the line
containing the directive; white-space characters can appear between the number
sign and the first letter of the directive. Some directives include arguments or
values. Preprocessor directives can appear anywhere in a source file, but they
apply only to the remainder of the source file.
Usage examples:
#include "inc1.h"
#define SYMBOL_LEN 4
#define SYMBOL char<SYMBOL_LEN>
#ifdef X_DEFINED
#include "inc2.h"
#else
#include "inc3.h"
#define SOME_VALUE 34
#endif
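Given the definitions above, a subsequent declaration such as SYMBOL ticker; (a
hypothetical field) would be expanded by the preprocessor to char<4> ticker;
before the schema is compiled.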
Declare Statement
Syntax:
declare database dbname;
declare oid struct-name[expected-number-of-entries];
The Declare statement is currently used for two different purposes. The first is to
specify the name of the database. The DDL processor populates implementation
file names based on the dbname passed to the declare statement.
The Declare statement is also used to identify a unique object identifier and the
expected number of objects that will be stored with an oid. Expected-number-of-
entries are used for optimization of eXtremeDB’s runtime operations. It is not
required to be exact. eXtremeDB allows declaration of classes with a unique
identifier called an oid (see class statement). The runtime maintains an internal
index referencing all objects of such classes. Objects can reference each other by
oid using the ref data type. oid must be a user-defined structure, even if the oid
has a single field. Each oid value must be unique within the database.
Only one database and one oid declaration is allowed within a database schema.
Example
struct Id {
uint4 id;
uint4 time_in;
};
declare database market;
declare oid Id[20000];
Alternatively, an oid may instead be based on a string structure:
struct StrId {
char<32> str;
int2 num;
};
declare oid StrId[20000];
Struct Declaration
Syntax:
struct-declaration-list:
Note: Because the purpose of the direct attribute is to allow the application to
read or write the structure in a single operation, the schema compiler (mcocomp)
needs to know the size of the structure at compile time. For this reason the
structure can contain no dynamic (vector or blob) fields. Likewise it is not
possible to use direct on a vector of structures.
Example
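The DDL declaration that produces this output might read as follows (a
reconstruction; the placement of the direct qualifier is assumed, and the struct
name matches the generated typedef):

direct struct Fixed_d
{
uint8 v8;
uint4 v4;
uint2 v2;
uint1 v1;
};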
The direct keyword causes the schema compiler to generate the following code in
<dbname>.h:
typedef struct
{
uint8 v8;
uint4 v4;
uint2 v2;
uint1 v1;
}Fixed_d;
#pragma pack(1)
typedef struct
{
uint8 v8;
uint4 v4;
uint2 v2;
uint1 v1;
}Fixed_d_aligned_to_1;
#pragma pack()
Note: The Fixed_d structure is the one you use in your application.
Fixed_d_aligned_to_1 is only used internally by eXtremeDB.
Enum Declaration
Syntax:
For definition of enumerated type:
enum [declarator] {enum-list} ;
For declaration of variable of type enum within a class:
declarator element-name;
Description
Example
enum FLOWCONTROL {
XON, CTS
};
class example_using_enum {
FLOWCONTROL fc;
};
Class Declaration
Syntax:
class-elements-list:
element-statement |
access-statement |
event-statement
[; element-statement | access-statement | event-statement …];
element-statement:
type-identifier | struct-name | enum element-name [ = value
[, element-name [= value]] …];
or
vector {type-identifier | struct-name} vector-name;
or
[optional] struct-name element-name;
access-statement:
[voluntary] [unique][userdef] tree < class-element |
struct-name.element-name | vector-element [asc|desc]
[,class-element | struct-name.element-name |
vector-element [asc|desc]…]> indexname;
or
[userdef] hash < class-element | struct-name.element-name
| vector-element [,class-element |
struct-name.element-name | vector-element …]>
index-name[expected-number-of-entries];
or
trie < class-element[,class-element] > indexname;
or
[unique] rtree < class-element > indexname;
or
kdtree < class-element[,class-element] > indexname;
or
oid;
or
autoid[number-of-expected-entries];
or
list;
event-statement:
event < class-element update > event_name |
event < new > event_name |
event < delete > event_name [; event-statement ] ;
class-element:
element-name | struct-name
vector-element:
vector-name.struct-element | vector-name;
Description
The compact class qualifier limits the total size of the class’ elements to 64K.
That includes not just application data, but also the overhead required by
eXtremeDB.
However, the total size excludes the size of blob data (if any blob data elements
are declared for the class), except 2 bytes for the blob reference.
Element Statements
Element statements declare field names with their types. Fields of type integer,
float, date, and enum can have default values expressed in the schema. Default
values will be assigned to such fields when a new record of the type is created in
the database and no explicit value is ‘put’ in the field. Fields that are struct fields
can be declared as optional. An optional declaration means that the field may or
may not be actually stored in the database. If the field is not stored, the runtime
does not reserve (allocate) space for it within the data layout and the associated
“get” methods will return a null pointer.
struct Id
{
uint4 seq;
};
struct Item
{
uint4 id;
string name;
};
enum FC_ {
XON,
CTS
} flow_control;
declare database simple;
declare OID Id[20000];
class Everything
{
date e_date[7];
time e_time[12];
flow_control fc = XON;
uint2 u2 = 99;
uint4 u4, h;
blob blo;
string c;
vector<uint2> vint;
vector<string> vs;
vector<Item> is;
optional Item alternate;
};
Access Statements
Access statements define the access methods that will be generated for the class.
Access methods will be generated for oid, autoid, indexes, and lists.
The unique qualifier for an index specifies that every object of the class must
contain a unique combination of field values for that index. The runtime will
recognize an attempt to create a duplicate and refuse to do so. Tree and Kd-Tree
indexes can optionally specify ascending or descending order for each element of
the index. The default is ascending.
The voluntary qualifier for an index means that the index can be initiated or
dropped at runtime. Voluntary indexes are not built until an explicit call to do so
is issued by the application. In the same fashion, the application may request to
remove a voluntary index.
The userdef qualifier for a tree index means that the application will provide the
compare functions, and thus control the collating sequence for the index.
The oid definition specifies that a class is stored with an oid of the type defined in
the declare oid statement. Only one oid statement is allowed per class.
eXtremeDB maintains a special index for oids stored in the database to facilitate
locating an object in the database by its oid value. Oids must be assigned a value
that is unique in the database (not just in the class, as is the case for hash and
unique tree indexes). The assignment is explicit—the DDL processor generates
object creation methods that enforce oid assignment by the application. The
runtime verifies that the oid is unique, and refuses to create an object if a
duplicate is found. The DDL processor also generates access methods based on
oid. The oid type is defined via the declare oid statement.
Classes that are defined without the oid qualifier do not have the requirement of
having one, but then lack oid-based access methods.
The list declaration generates access methods to perform a sequential scan of all
objects of a given class. The order in which such scanning is done is determined
by the runtime. Every class must have at least one oid, autoid, hash, or tree index,
or the list definition. The DDL processor will emit a warning otherwise. (Without
one of these, there would be no access method generated for the class.)
Event Statements
Event statements declare the events that the application is interested in. The
eXtremeDB database definition language provides the grammar that the database
designer uses to specify that applications should receive notification of certain
events occurring in the database. These events are adding a new object, deleting
an object, and updating an object or specified base-type fields of an object (n.b.
events are not supported for array and vector elements). Events are specific to
classes. In other words, an add event for class Alpha doesn’t activate the
notification mechanism when an Omega object is added to the database.
The schema grammar documents what events the application will be notified of.
How the application handles the events is determined at run-time by the event
interfaces. Please see the “Event Interfaces” section for the discussion of how
event handlers are registered and invoked.
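For illustration, the following class (hypothetical names, following the
event-statement grammar above) requests notification on object creation, object
deletion, and updates to one base-type field:

class alarm
{
uint4 severity;
string source;
event <new> alarm_added;
event <delete> alarm_removed;
event <severity update> severity_changed;
list;
};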
DDL Processor
The eXtremeDB Data Definition Processor, mcocomp, is executed as follows:
mcocomp [OPTIONS] ddlspec
ddlspec is the name of the text file containing the DDL specification (schema). It
can follow any naming convention of your choice.
OPTIONS DESCRIPTION
-o, -O Instructs the processor to generate the optimized version of the eXtremeDB
implementation files; otherwise the default (development) version is
generated. The optimized version generates inline functions, and replaces
some functions with macros that are put into the implementation header file
instead of the implementation “C” file.
-p, -P <path> Specifies the output directory. If the directory is not specified, the files are
written to the ddlspec file directory.
-i, -I <path> Specifies the include directory. If this path is not specified the compiler will
look only in the ddlspec file directory.
-hpp, -c++ Generates a C++ implementation file (.hpp).
-si Specifies verbose structure initialization. By default, the compiler generates
code of the form:
struct A { int i; int j; };
A a = {3,4};
Some C compilers will not accept this form of structure initialization so the
si switch will generate code of the form:
struct A { int i; int j; };
A a;
a.i = 3; a.j = 4;
-x, -X Generates XML methods: classname_xml_get, classname_xml_put,
classname_xml_create, classname_xml_schema.
-s, -S Suppresses copyright notice and timestamp console output.
-sql Generate additional metadata in the dictionary required to use the
eXtremeSQL programming interface.
-c, -compact Specifies the “compact” option for all classes in the database. 2-byte offsets
will be used for structures, variable length and optional fields in each class,
and all objects are limited in size to 64K (excluding BLOBs).
-persistent Makes all unspecified classes ‘Persistent’.
-transient Makes all unspecified classes ‘Transient’ (default).
-ws1 When one or more fields of type wchar or wstring are present in the schema,
generate 1-byte wchar strings (default).
-ws2 When one or more fields of type wchar or wstring are present in the schema,
generate 2-byte wchar strings.
-ws4 When one or more fields of type wchar or wstring are present in the schema,
generate 4-byte wchar strings.
-x32 Generate 32-bit pointers (default).
-x64 Generate 64-bit pointers.
-help Prints out usage information for mcocomp.
Note on wide-character strings: If one or more fields within a schema are of type
wchar or wstring, the character width for all wide-character strings should be
specified on the mcocomp command line. For example, the schema
class A
{
wchar<64> name;
wstring description;
};
requires one of the -ws options on the command line.
Once the application’s problems have been found and the application can consistently
pass verification tests, it would be a waste of clock cycles to continue checking
function parameters and supporting the debug traps. At this stage, developers can
utilize the optimized version of the eXtremeDB runtime.
Application Interface
eXtremeDB provides support for accessing persistent data inside transactions via
application-specific access methods. Currently, programming interfaces are
generated for the C/C++ language.
The interface consists of two parts. The first part is the group of functions that are
“static”, in other words common to all applications: the application-independent
“static” interface. The second part is the functions that are generated by the schema
compiler to provide type-safe data access methods for a particular schema: the
application-specific “generated” interface. The eXtremeDB runtime (i.e. all of the
referenced “static” and “generated” functions) is linked together with the
application code.
Also note that an application can simultaneously use multiple databases, each
with a different schema.
DDL Example
The following sample ddl code, “schema.mco”, illustrates the concepts described
in this chapter.
struct SampleStruct {
uint2 s1;
char<20> s2;
};
struct BigStruct {
string str;
uint2 u2;
uint4 u4;
vector <SampleStruct> vss;
};
declare database simple;
/* estimated number of class instances is in square brackets */
declare OID SampleStruct[20000];
/*
* “compact” keyword: Total object size, including overhead is less than 64K.
* Size calculation does NOT count size of blob(s) fields
* embedded in the class
*/
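/* (The opening of the class declaration is missing from this excerpt;
   a reconstruction consistent with the comments above and the members
   below might be the following, with the class name assumed:) */
compact class SampleClass
{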
/* oid reference */
ref d;
autoid;
oid;
list;
};
When using eXtremeDB’s disk-based persistence, i.e. when the database contains
some or all persistent class definitions, the application will link with the “disk”
library and thus the API functions called will internally call the appropriate
caching interface. Since “hybrid” databases or disk-only databases require more
detailed considerations such as disk caching behavior, file system dependencies,
data encryption and data integrity checking, the disk-based API functions are
described at the end of some sections under the tag “Persistent Databases”.
Runtime Environment
The eXtremeDB run-time environment is initialized by calling the function
mco_runtime_start(). This function initializes one or more semaphores that
coordinate access to the database dictionary between multiple processes, or
between multiple threads of a single process. Each process must call
mco_runtime_start() once, and only once.
MCO_RET mco_runtime_start(void);
MCO_RET mco_runtime_stop(void);
MCO_RET mco_runtime_info(/*OUT*/ mco_runtime_info_t *info);
The next function is provided to set various per-process global database runtime
options. This function must be called before the database runtime is initialized
via mco_runtime_start().
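As a minimal sketch of the lifecycle calls above (using only the functions shown;
error checking omitted):
int main(void)
{
mco_runtime_start();   /* once per process, before any other database call */
/* ... open databases, connect and perform database work here ... */
mco_runtime_stop();    /* release runtime resources at shutdown */
return 0;
}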
Device Management
eXtremeDB supports the notion of logical database devices. Logical database
devices are abstractions of physical storage locations that can be conventional
memory (static or heap allocated memory in the application address space),
shared memory (“named” memory shared by multiple processes), or persistent
file system memory such as a simple file, a multi-file or RAID file, or even a raw
disk partition.
The application determines the size and location of the memory device, the type
of memory and how the eXtremeDB runtime will use the device. Applications
specify storage devices at runtime via the devs structure argument passed to the
mco_db_open_dev() API. Typically, an array of device structures is stack-
allocated and initialized prior to calling mco_db_open_dev(). Each memory
device is defined by a mco_device_t structure that specifies:
• what purpose the memory region will serve (database, cache, disk file or
log file),
Persistent Databases
Database Control
This group of functions deals with the database control. The normal flow of
control is that the database (identified by its name) is opened or “created” (its
runtime “meta-data” is created from its dictionary) and is then connected to by an
application. Databases are “extendable”, ie. it is possible to increase the memory
size that is used for storage at runtime. Databases can also be streamed to storage
devices (“saved”) and initialized (“restored”) from data streams.
Creating Databases
All databases, whether all-in-memory, persistent or a hybrid database with both
transient and persistent classes, occupy main memory in the application’s static or
heap memory space. Additional main memory may be allocated for devices that
contain the database data for all-in-memory databases or for the cache used by
persistent and hybrid databases. Memory device specifications (as described
above in section “Device Management”) determine the type of memory
(conventional, shared or disk-based) used for the database data and for the log file
if Transaction Logging is used. And memory can be extended later if necessary.
When the database is created memory devices are initialized and meta-data, in the
form of the database dictionary that informs the eXtremeDB runtime about the
database structure, is loaded into memory. The runtime is initialized with several
user-specified parameters that determine runtime behavior while the application is
running or for the life of the database in the case of persistent databases. While
the mco_db_params_t structure and complete implementation details are
explained in the “Reference Guide”, the usage of key elements of this structure
are described below.
The db_log_type element specifies the logging strategy the runtime will use:
NO_LOG, UNDO_LOG or REDO_LOG. Transaction Logging is an alternative
to persistent disk storage that can provide database recovery (in case of system
failure) and data porting functionality for all-in-memory as well as persistent
databases. The db_log_type value is set to NO_LOG by default. If Transaction
Logging is used, the db_log_type value will be UNDO_LOG or REDO_LOG and
how transactions will be committed to disk is specified in the log_params
parameter. The choice of logging strategy can have a significant impact on
performance (see section “Choosing the Transaction Logging Strategy” in chapter
6 for a more detailed discussion).
Note: All databases that define persistent classes must use mco_db_open_dev()
and it is strongly recommended also for all-in-memory databases, even though for
compatibility purposes the old-style mco_db_open() API can still be used to
create all-in-memory databases.
The following code snippet demonstrates the use of device management and the
mco_db_open_dev() API for an application managing a single conventional
memory device:
mco_device_t dev;
mco_db_params_t db_params;
MCO_RET rc;
/* one conventional memory device; DATABASE_SIZE is an illustrative constant */
dev.type = MCO_MEMORY_CONV;
dev.assignment = MCO_MEMORY_ASSIGN_DATABASE;
dev.size = DATABASE_SIZE;
dev.dev.conv.ptr = malloc( DATABASE_SIZE );
mco_db_params_init( &db_params );
rc = mco_db_open_dev( dbname, simple_get_dictionary(), &dev, 1, &db_params );
if ( MCO_S_OK != rc ) {
/* unable to open the database; check the return code for additional information */
}
return 0;
Persistent Databases
eXtremeDB uses two separate security mechanisms to provide data security for
persistent databases: page-level CRC32 checking and database-level encryption
through a private key. These mechanisms can be used separately or in
combination. Both features are enabled at the time the database is created and
apply only to persistent databases. Once CRC checking or encryption is enabled
and the persistent database is allocated, security cannot be disabled in either the
current or any future session; both remain in effect for the lifetime of the database.
Also note that both mechanisms are page-level, hence all data and indexes are
protected. (For a more detailed description of CRC32 and Encryption see section
“Database Security” in chapter 6 below.)
The following code snippet demonstrates the usage of the device management and
the mco_db_open_dev() API for an application managing four storage devices: a
conventional memory device that is used for the RAM portion of the database,
another conventional memory device for the disk manager cache, and two
persistent devices, one for the database file and another for the transaction log.
The sample further uses encryption and CRC protection and defines the
transaction logging type as UNDO_LOG. (See section “Disk IO” in Chapter 6
for a detailed discussion of transaction logging and caching policies, and the
samples/core/02-open directory for other code samples.)
mco_runtime_start();
mco_runtime_stop();
return ( MCO_S_OK == rc ? 0 : 1 );
}
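Because the body of that example is abbreviated above, the following sketch
illustrates how the four devices might be declared. The constant names follow the
device API described earlier; the sizes and file names are illustrative, and the
encryption key and CRC settings are omitted because their mco_db_params_t
fields are version-specific:
mco_device_t dev[4];
mco_db_params_t db_params;
dev[0].type = MCO_MEMORY_CONV;               /* RAM part of the database */
dev[0].assignment = MCO_MEMORY_ASSIGN_DATABASE;
dev[0].size = DB_SIZE;
dev[0].dev.conv.ptr = malloc( DB_SIZE );
dev[1].type = MCO_MEMORY_CONV;               /* disk manager cache */
dev[1].assignment = MCO_MEMORY_ASSIGN_CACHE;
dev[1].size = CACHE_SIZE;
dev[1].dev.conv.ptr = malloc( CACHE_SIZE );
dev[2].type = MCO_MEMORY_FILE;               /* database file */
dev[2].assignment = MCO_MEMORY_ASSIGN_PERSISTENT;
strcpy( dev[2].dev.file.name, "mydb.dbs" );
dev[3].type = MCO_MEMORY_FILE;               /* transaction log file */
dev[3].assignment = MCO_MEMORY_ASSIGN_LOG;
strcpy( dev[3].dev.file.name, "mydb.log" );
mco_db_params_init( &db_params );
db_params.db_log_type = UNDO_LOG;            /* per the logging discussion above */
rc = mco_db_open_dev( dbname, mydb_get_dictionary(), dev, 4, &db_params );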
Database Connections
An application connects to the database (creates a database handle) by passing a
database name to the mco_db_connect() function. The database must have been
previously created with mco_db_open_dev(). By default, up to 64 simultaneous
connections are allowed to each single database (source code licensees can make
this number smaller or larger). If successful, this function returns a handle to the
database, which is used in subsequent database runtime calls such as transaction
control functions.
Note that applications must create a separate connection for each thread/task that
accesses the database; database connection handles cannot be shared between
different tasks.
Applications should disconnect from the database once the database connection is
no longer needed. Disconnecting the database allows the database runtime to “de-
allocate” internal memory used to maintain the connection. The database can’t be
destroyed (closed) until all active connections are closed.
Closing Databases
An opened database is closed (and, for an all-in-memory database, destroyed)
with the mco_db_close() function. It takes the database name as a parameter.
This function closes (destroys) the database created by the previous
mco_db_open_dev() (or legacy mco_db_open()) call. All connections must have
been previously closed in order for the function to succeed. Once closed, all of
the database’s transient data is lost.
Example Application
To put together what we’ve seen so far, the sequence of steps to start eXtremeDB,
open an all-in-memory database, connect to it, disconnect, close and clean up is
illustrated in the following code sample:
void StartDB(){
MCO_RET rc;
mco_device_t dev;
mco_db_params_t db_params;
mco_runtime_info_t info;
mco_db_h db;
/* ... start the runtime, set up the device and call mco_db_open_dev()
and mco_db_connect(), as shown above ... */
rc = mco_db_disconnect( db );
rc = mco_db_close( dbname );
if ( !info.mco_shm_supported )
free(dev.dev.conv.ptr);
mco_runtime_stop();
}
void DbAttach(){
MCO_RET rc;
mco_db_h db;
/* connect to db by dbname */
rc = mco_db_connect( dbname, &db );
if ( rc ) {
printf("\n Could not attach to instance: %d\n", rc);
exit( 1 );
}
rc = mco_db_disconnect( db );
rc = mco_db_close( dbname );
mco_runtime_stop();
}
Separate threads (tasks) within the same process only need to connect to the
database. The run-time is only started one time and the error handler is only set
one time per process:
void DbAttach(){
MCO_RET rc;
mco_db_h db;
/* connect to db by dbname */
rc = mco_db_connect( dbname, &db );
if ( rc ) {
printf("\n Could not attach to instance: %d\n", rc);
exit( 1 );
}
rc = mco_db_disconnect( db );
}
MCO_RET rc;
char * dbname = "a database";
int size = 1*1024*1024;
mco_device_t extdev;
MCO_RET rc;
char * dbname = "a database";
int size = 1*1024*1024;
void * memory = malloc( size );
mco_db_save / mco_db_load
mco_inmem_save / mco_inmem_load
mco_disk_save / mco_disk_load and mco_disk_load_file
These functions provide compatibility checking between saved and loaded
versions and optional CRC32 checking to assure the integrity of the data.
The eXtremeDB runtime’s internal buffering uses the buffer size defined by the
constant MCO_STREAM_BUF_SIZE in file mcocfg.h (the default buffer size of
16 Kbytes can be altered to meet the application’s needs). When the buffer size is
reached, the runtime calls the user-defined callback function to read or write. The
callback function simply returns a value to the eXtremeDB runtime indicating the
number of bytes processed or an error.
Binary Evolution
Note: An auto_oid must be declared in the database to use the BSE option. If
the database has no auto_oid and option MCO_RT_OPTION_DB_SAVE_BSE is set,
then functions mco_db_save() and mco_db_load() will return error code
MCO_E_UNSUPPORTED.
Persistent Databases
While mco_db_save() copies all database objects (both persistent and transient),
additional functions mco_inmem_save() and mco_disk_save() are provided to
allow only transient or only persistent classes to be written to a stream. For all-in-
memory databases mco_inmem_save() is equivalent to mco_db_save() (except
that option MCO_RT_OPTION_DB_SAVE_BSE applies only to mco_db_save()),
but for “hybrid” databases this allows the application to stream a snapshot of only
the transient objects in the database. It may be useful to note that
mco_inmem_save() can be used for hybrid databases to make a “light snapshot”
as the application shuts down; this way the persistent part of database is already
stored in main database files and transient part is streamed in a separate file or to
another storage device.
Applications can write the content of only the persistent classes to a stream using
the function:
The calling application provides a stream pointer (a pointer to a file, socket, pipe,
etc.) and the address of the user-defined function that will perform the actual
writes of the stream of bytes representing the database image. eXtremeDB will
call this user-defined function, passing it the stream pointer, a buffer and the size
of the buffer to be written to the destination.
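A sketch of such a callback, assuming a plain FILE stream and that the
mco_stream_write return value is the number of bytes written (error checking
omitted; the file name is illustrative):
mco_size_sig_t file_writer( void *stream_handle, const void *from, mco_size_t nbytes )
{
/* write the next chunk of the database image to the file */
return (mco_size_sig_t) fwrite( from, 1, nbytes, (FILE*)stream_handle );
}
...
FILE *f = fopen( "dbimage.bin", "wb" );
rc = mco_db_save( (void*)f, file_writer, dbname );
fclose( f );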
Persistent Databases
While mco_db_load() loads all database objects (both persistent and transient),
additional functions mco_inmem_load() and mco_disk_load() are provided to
allow only transient or only persistent classes to be loaded (read from the stream).
The function mco_db_open_dev() is called internally to open/create an empty
database then database objects are created from the data stream.
Applications can load only persistent classes from a stream using the function:
MCO_RET mco_disk_load( /*IN*/ void* stream_handle,
/*IN*/ mco_stream_read input_stream_reader,
/*IN*/ const char *dbname,
/*IN*/ mco_dictionary_h dict,
/*IN*/ mco_device_t *devices,
/*IN*/ uint2 n_devices,
/*IN*/ mco_db_params_t * db_params );
It is the application’s responsibility to open the stream, in the proper mode to read
binary data, and to ensure that there is adequate memory to hold the database.
The mco_disk_load_cache() function loads the database disk cache from the
specified file.
Both functions take a connection handle as a parameter. The first function returns
the total number of free pages while the second returns the total number of
available (originally allocated) pages.
The next function returns the number of indexes in the database. It must be called
in the context of a read-only transaction and is often used in conjunction with the
mco_index_stat_get() API to obtain index statistics at runtime:
Runtime statistics are reported for the given index in the mco_index_stat_t
structure:
For a more detailed explanation of the index statistics see the Reference Guide.
Persistent databases
The following function returns information about the current state of the database
and log file: the size of the log file in bytes, the size of the database file in bytes
and the amount of space that is actually used in the database file.
Database Calculator
Concurrency Control
Concurrency is defined as the ability for multiple tasks to access shared data
simultaneously. The greater the number of concurrent tasks that can execute
without interfering with each other, the greater the concurrency of the database
system. Database concurrency control mechanisms are implemented through
database transactions. eXtremeDB Transaction Managers ensure that database
transactions are performed concurrently without violating data integrity and that
transactions adhere to ACID principles (see http://en.wikipedia.org/wiki/ACID).
Concurrency Management
There are traditionally two models for database concurrency: optimistic and
pessimistic. Pessimistic concurrency control works on the assumption that data
modification operations are likely to affect any read operation made by a different
task; the database system pessimistically assumes that a conflict will occur.
eXtremeDB behavior when using pessimistic concurrency control is to use locks
and block access to the database when any data is modified.
• EXCLusive: one task at a time may access the database for reading or writing.
MURSIW
The MURSIW (MUltiple Readers, SIngle Writer) transaction manager makes the
most of the environment in which it executes. Fully supporting the ACID
principles, it takes advantage of the limited multi-tasking nature of embedded
applications. Usually there are few simultaneous tasks executing, rarely do
simultaneous tasks require write access to the object store, and transactions are
small. So it is practical to minimize the footprint and simplify the implementation
by eliminating complex transaction synchronization and enforcing serialization
of write transactions.
MVCC
The MVCC (Multi-Version Concurrency Control) transaction manager allows
multiple transactions to proceed concurrently on the same set of objects or
indexes, thus providing transaction isolation for each transaction. The MVCC
manager allows applications to choose how transactions are isolated from each
other by setting the transaction isolation level at runtime.
Locking optimization
eXtremeDB uses two kinds of synchronization primitives: latches and locks. The
first kind (the latch) is a lightweight lock implemented with atomic instructions;
it is used, for example, in b-tree indexes to lock branches. The second kind (the
lock) is a full-size synchronization primitive implemented with kernel locks (and
lightweight atomics for performance, where possible). One lock is used for the
eXtremeDB registry and database header; all other locks applied during
transaction processing depend on the choice of Transaction Manager:
The developer has complete control over the choice of transaction manager and
lock implementation by linking one or another of the transaction manager and
synchronization libraries. Most likely the choice will be
between MURSIW and MVCC based on the characteristics of the application. If
it’s mostly read-only with occasional updates, then MURSIW could be the best
choice. If there are a relatively high number of concurrent processes/threads
attempting to modify the database at the same time, then MVCC could be the
better choice. Or if the application is single-threaded, concurrency is not an issue
and clearly it will perform best with the EXCLusive transaction manager. One
can experiment between the transaction managers by just linking the appropriate
library. No application code changes are needed (except to handle conflict errors,
if MVCC is ultimately the choice).
Database systems usually offer a number of transaction isolation levels that
define the degree to which one transaction must be isolated from data
modifications made by other transactions. In fact, the ANSI/ISO SQL standard
defines four levels of transaction isolation: Read Uncommitted, Read Committed,
Repeatable Read and Serializable. The eXtremeDB MVCC transaction manager
supports three of these.
Read Committed. When this level is used a transaction always reads committed
data. The transaction will never read data that another transaction has changed
and not yet committed, but it does not ensure that the data will not be changed
before the end of the transaction.
a=1 b=2
t1: a = a + 1; c = a + b
t2: b = b + 2; d = a + b
If t1 and t2 are serialized, and t1 is executed before t2 the result is c=4 d=6;
If t1 and t2 are serialized, and t1 is executed after t2 the result is c=6, d=5;
In the case of Repeatable Read (snapshot), if t2 starts and ends in between the
start and end of the t1, the result of the concurrent execution is c=4 and d=5 and it
does not correspond to any of the serialized results.
Transaction Priority
eXtremeDB supports transaction priorities; it is possible to assign a priority value
to each transaction at runtime. At the time a transaction is registered with the
runtime, the transaction scheduler checks the priority value and shifts the
transaction forward or backwards in the transaction queue. With the MURSIW
transaction manager, applications normally execute as foreground transactions,
Transaction API
Transactions are started by calling one of two eXtremeDB functions:
mco_trans_start() or mco_trans_start_ex(). The second differs only in
that it allows setting the isolation level for the transaction.
After navigating the database to find a desired object, often an application will
need to update the found object. This requires initiating a ReadOnly transaction
to search for the desired object, and then, once found, upgrading the transaction
from ReadOnly to ReadWrite in order to update the record. This is accomplished
by calling the following function:
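The function itself is not reproduced here; assuming it follows the common
naming mco_trans_upgrade() (treat the name and exact signature as an
assumption), the pattern would be:
mco_trans_start( db, MCO_READ_ONLY, MCO_TRANS_FOREGROUND, &t );
/* ... locate the desired object ... */
rc = mco_trans_upgrade( t );        /* promote ReadOnly to ReadWrite */
if ( MCO_S_OK == rc ) {
/* ... update the found object ... */
}
mco_trans_commit( t );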
If an error occurs during a transaction, the transaction enters an error state and
subsequent operations within that transaction will return MCO_E_TRANSACT.
In this case, to obtain the error code of the operation that initially caused the error
condition, use the following function:
MCO_RET mco_get_last_error(/*IN*/ mco_trans_h t);
Isolation Levels
MCO_DEFAULT_ISOLATION_LEVEL = 0x0,
MCO_READ_COMMITTED = 0x1,
MCO_REPEATABLE_READ = 0x2,
MCO_SERIALIZABLE = 0x4
It is possible to redefine the default transaction isolation level for the database
session. This is done via the following function:
MCO_TRANS_ISOLATION_LEVEL mco_trans_set_default_isolation_level(
mco_db_h db,
MCO_TRANS_ISOLATION_LEVEL level );
The application can inspect what transaction isolation levels are supported by the
currently running transaction manager via the following API function:
int mco_trans_get_supported_isolation_levels();
Conflicts
mco_trans_h t;
MCO_RET rc;
do {
mco_trans_start( db, MCO_READ_WRITE, MCO_TRANS_FOREGROUND, &t );
...<update database>...
rc = mco_trans_commit(t);
} while ( rc == MCO_E_CONFLICT );
Note: When the MVCC transaction manager is used, the application must be
able to tolerate transaction rollbacks due to conflicts as described above.
Persistent databases
When the MVCC transaction manager is used, in the case of a system crash, a
persistent database can contain undeleted old versions and working copies. Their
presence will not break the consistency of the database and does not prevent the
normal operation of the application, but it does unnecessarily consume space.
Detecting these stale object versions requires a complete scan of the database; for
this reason the automatic recovery process does not perform this cleanup
automatically.
eXtremeDB provides two methods for the removal of these unused versions. The
application can enable the repair process by setting the mode mask in the
mco_db_params_t structure to the value MCO_DB_MODE_MVCC_AUTO_VACUUM
when calling mco_db_open_dev(). Or the repair can be performed explicitly by
calling the following API function:
Two-phase Commit
Some applications require a more elaborate control of the transaction commit
processing; specifically, committing the transaction in two steps (phases). The
first phase writes the data into the database, inserts new data into indexes and
checks index restrictions (uniqueness) (altogether, the “pre-commit”) and returns
control to the application. The second phase finalizes the commit.
Please note that in order to use the two-phase commit the eXtremeDB run-time
must be two-phase commit enabled when the run-time is compiled. The two-
phase commit is enabled by the following #define in the mcocfg.h file:
#define MCO_CFG_2PHASE_COMMIT
In order to perform the two-phase commit, the application calls the commit
phases sequentially instead of calling mco_trans_commit(). After the first
commit phase returns, the application cannot perform any activities against the
database (except initiating the second commit phase or rolling back the
transaction). This process is illustrated in the following code segment:
…
if ( (mco_trans_commit_phase1(t1) == MCO_S_OK) &&
(global_transaction() == SUCCESS) )
{
mco_trans_commit_phase2(t1);
}
else
{
mco_trans_rollback(t1);
}
Note: At the time of writing, the two-phase commit API is not available for the
MVCC transaction manager.
...
rc = mco_trans_commit_phase1(trans);
if (rc == MCO_S_OK) {
/* commit to external database */
rc = mco_trans_iterate(trans, &my_iterator_callback,
my_iterator_context);
if (rc == MCO_S_OK) {
/* external commit succeeded */
mco_trans_commit_phase2(trans);
} else {
mco_trans_rollback(trans);
}
}
Pseudo-nested Transactions
Nested transactions might be necessary when two different application functions
may be called separately or call each other. To facilitate transaction nesting
eXtremeDB allows an application to call mco_trans_start() or
mco_trans_start_ex() before the current transaction is committed or aborted.
The runtime maintains an internal counter that is incremented each time a
transaction is started and decremented on each commit or rollback; the actual
commit occurs only when the counter returns to zero.
/* now commit the transaction to complete the insert of the first object */
return mco_trans_commit(t);
}
rc = Transaction_new(t, &trans);
if ( MCO_S_OK != rc )
{
mco_trans_rollback(t);
return 0;
}
Transaction_from_put(&trans, from);
Transaction_to_put(&trans, to);
return mco_trans_commit(t);
}
…
/* perform a simple nested transaction... */
uint4 from1 = 11, to1 = 16, from2 = 7, to2 = 17;
rc = insert_two(db, from1, to1, from2, to2);
sample_rc_check("\t Insert two objects", rc );
mco_db_disconnect(db);
}
sample_close_database(db_name, &dbmem);
}
mco_runtime_stop();
sample_pause_end("\n\n Press any key to continue . . . ");
return ( MCO_S_OK == rc ? 0 : 1 );
}
Cursor Control
eXtremeDB supports the traditional definition of cursors. A cursor is an entity that
represents a “result set”, a sequence of objects, associated with an index. The
cursor is created by a _search or _find operation (see chapter 5 “Generated API”)
and its position within the result set is changed using cursor navigation functions.
Applications use cursors and the cursor API to navigate through the database and
read or update database objects. A special data type, mco_cursor_h, is used to
reference a cursor.
Note: If the indexed field(s) with which the cursor is associated are updated,
changing the object’s position within the indexed objects, the cursor will also
change position when the transaction is committed or the object(s) are
checkpointed.
A cursor is obtained by one of the two following generated functions (see chapter
5 “Generated API”):
The next two functions are used to position a cursor. These functions must be
called in the context of a transaction (the first parameter) and accept a cursor as
the second parameter. In the event of a cursor being invalid, the error code
MCO_E_CURSOR_INVALID is returned.
The next two functions are used to navigate a cursor. These functions must be
called in the context of a transaction (the first parameter) and accept a cursor as
the second parameter. In the event of a cursor being invalid, the error code
MCO_E_CURSOR_INVALID is returned. If the end of the result set has been
reached, the status code MCO_S_CURSOR_END is returned.
The following function determines whether a cursor is valid and, if so, whether it
was created from a sequential search (list), hash- or tree-based search.
The following function determines the type of object pointed to by the current
cursor, returning the numeric value assigned to a class. (This value is defined in
the interface header file created by the schema compiler.)
MCO_RET mco_cursor_get_class_code(
/*IN*/ mco_cursor_h c,
/*OUT*/ uint2 *classcode);
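Putting these pieces together, a typical read-only iteration looks like the
following sketch (classname stands for a class with a list index; the generated
cursor and from-cursor function names follow the patterns described above):
mco_trans_h t;
mco_cursor_h c;
classname obj;
MCO_RET rc;
mco_trans_start( db, MCO_READ_ONLY, MCO_TRANS_FOREGROUND, &t );
classname_list_cursor( t, &c );               /* obtain a cursor */
for ( rc = mco_cursor_first( t, &c );
      MCO_S_OK == rc;
      rc = mco_cursor_next( t, &c ) )
{
classname_from_cursor( t, &c, &obj );         /* object at cursor position */
/* ... read fields via the generated _get methods ... */
}
mco_trans_commit( t );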
Before the application can call the memory management functions, the memory
object (heap) must be created. When the memory object is no longer needed it
should be destroyed to prevent memory leaks. The memory management API
provides standard C-style heap management.
int mco_heap_head_size(void);
The following code fragment illustrates the steps to prepare a heap with
mco_heap_head_size() and mco_heap_init().
void *start_address;
mco_heap_h memory_handle;
mco_runtime_start();
Basic Interfaces
oid-based Interfaces
If an oid is declared in the database DDL, the compiler generates a C structure
that represents the oid, and five interfaces. For example, if the schema contains
the oid declaration:
struct structname{
uint4 num_in;
};
declare oid structname[10000];
MCO_RET databasename_delete_object(
/*IN*/ mco_trans_h t,
/*IN*/ const databasename_oid * oid );
MCO_RET databasename_get_class_code(
/*IN*/ mco_trans_h t,
/*IN*/ const databasename_oid * oid,
/*OUT*/ uint2 *classcode );
The first method allows deletion of an object based on its oid, while the second
returns an integer that identifies the class of the object referenced by oid.
For classes declared with oid, two additional methods are generated: one that
locates an object based on its oid and another that obtains the oid of a known
object:
MCO_RET classname_oid_find(
/*IN*/ mco_trans_h t,
/*IN*/ const databasename_oid *id,
/*OUT*/ classname *handle );
autoid Interfaces
If a class is declared to have an autoid, the schema compiler will generate two
methods for the class:
/* database schema */
class referenced {
…
autoid[4000];
…
};
class referencing {
…
autoid_t refd_object;
…
};
autoid_t id;
mco_trans_start( db,
MCO_READ_WRITE,
MCO_TRANS_FOREGROUND,
&t );
The preceding code fragment shows the application code sequence of creating a
new object in the database that has the autoid attribute, retrieving the system-
assigned autoid value, and storing that value in a field of an object of another
class. Later, the autoid value is extracted from the referencing object and used to
locate the referenced object through its _autoid_find function.
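A condensed sketch of that sequence, using the two classes above and the
generated-function naming patterns (which are assumptions here), is:
referenced ref_obj;
referencing refg_obj;
autoid_t id;
mco_trans_h t;
mco_trans_start( db, MCO_READ_WRITE, MCO_TRANS_FOREGROUND, &t );
referenced_new( t, &ref_obj );                  /* runtime assigns the autoid */
referenced_autoid_get( &ref_obj, &id );         /* retrieve the assigned value */
referencing_new( t, &refg_obj );
referencing_refd_object_put( &refg_obj, id );   /* store the reference */
mco_trans_commit( t );
/* ... later, resolve the reference ... */
referencing_refd_object_get( &refg_obj, &id );
referenced_autoid_find( t, id, &ref_obj );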
event Interfaces
Embedded applications can be designed to respond to data events such as
creating, updating or deleting database objects. The eXtremeDB data definition
language provides grammar that the database designer uses to specify that
applications receive notification of the following database events: adding a new
object; deleting an object or all objects of a class; checkpoint events; and
updating an object or a specified field of an object. Events are specific to classes.
In other words, an add event for class Alpha doesn’t activate the notification
mechanism when an Omega object is added to the database.
The schema grammar describes what events will trigger application notifications.
How the application handles the events is determined at run-time by the event
interfaces. Events can be handled in two ways: synchronously and/or
asynchronously.
There is a small window of possibility for another instance of the event to occur
before the event handler has completed its task and can again wait on the event
(events are not queued). This window can be minimized if the handler delegates
the processing of the event to yet another thread, allowing the handler thread to
immediately wait on the event again. If no risk of an unhandled event can be
tolerated, either a synchronous event handler can be used, or the application can
maintain a separate table of unhandled events. Asynchronous events are activated
after the transaction commits. If, within the scope of a single transaction, several
objects are added, or deleted, or several fields are updated which have event
handlers waiting, all the handlers will be activated simultaneously.
Synchronous event handlers are called within the context of the same thread that
caused the event. Care should be taken not to cause extraordinary delays because
the handler has control of a transaction that, by definition, is a write transaction.
Specifically, the handler should not block on an indeterminate external event such
as user input.
Update events can be defined for a class (i.e. all fields of that class) or for a
specific field of a class by specifying the field name in the event declaration. As
with checkpoint events, the application must specify through the handler
registration interface whether the handler will be invoked before or after a field is
updated. Update handlers are activated by any interface method that will cause a
field’s contents to change, for example, classname_fieldname_put(),
classname_fieldname_erase(). If the event handler is called before the update
and the handler invokes classname_fieldname_get() on the field, it will
retrieve the current value in the database. Conversely, if the event is called after
the update, the handler will retrieve the value the application just put in the
database. The user-defined parameter can be used to provide additional
information to the handler such as the incoming value for a before-event handler,
the old value for an after-event handler, or a vector offset for an erase operation.
Note: Both synchronous and asynchronous handlers can be applied to any given
event. When using eXtremeDB in shared memory, synchronous event handlers
must belong to the same process that caused the event, or the results will be
unpredictable. In particular, do not register a synchronous event handler for class
Alpha in process A if it is possible that process B will insert/update/delete Alpha
objects. Use an asynchronous event handler instead.
Note: For update events, a class-wide event cannot be combined with field update
events for the same class.
The following code fragments illustrate the use of event handling. A sample
schema definition for a class with event notifications follows:
class dropped_call
{
uint4 trunk_id;
…
autoid;
/* event declarations reconstructed from the generated names below */
event <new> add_trunk;
event <update trunk_id> upd_trunk;
event <checkpoint> checkpoint_trunk;
event <delete> del_trunk;
};
The above class will cause the following definitions to be generated in the
interface header file:
#define upd_trunk 15
// 15 is only illustrative; the actual value is not important
#define add_trunk 16
#define checkpoint_trunk 17
#define del_trunk 18
MCO_RET mco_register_upd_trunk_handler(
/*IN*/ mco_trans_h t,
/*IN*/ mco_upd_trunk_handler,
/*IN*/ void *param,
/*IN*/ MCO_HANDLING_ORDER when);
MCO_RET mco_register_add_trunk_handler(
/*IN*/ mco_trans_h t,
/*IN*/ mco_add_trunk_handler,
/*IN*/ void *param);
MCO_RET mco_register_checkpoint_trunk_handler(
/*IN*/ mco_trans_h t,
/*IN*/ mco_checkpoint_trunk_handler,
/*IN*/ void *param,
/*IN*/ MCO_HANDLING_ORDER when);
MCO_RET mco_register_del_trunk_handler(
/*IN*/ mco_trans_h t,
/*IN*/ mco_del_trunk_handler,
/*IN*/ void *param);
MCO_RET mco_unregister_upd_trunk_handler(
/*IN*/ mco_trans_h t,
/*IN*/ mco_upd_trunk_handler);
MCO_RET mco_unregister_add_trunk_handler(
/*IN*/ mco_trans_h t,
/*IN*/ mco_add_trunk_handler);
MCO_RET mco_unregister_checkpoint_trunk_handler(
/*IN*/ mco_trans_h t,
/*IN*/ mco_checkpoint_trunk_handler);
MCO_RET mco_unregister_del_trunk_handler(
/*IN*/ mco_trans_h t,
/*IN*/ mco_del_trunk_handler);
To employ an asynchronous handler for one of the events above, the application
would create a thread and, within the thread function, call:
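(presumably, with the function name mco_event_wait treated as an assumption:)
rc = mco_event_wait( dbh, upd_trunk );   /* blocks until the event fires */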
Where ‘dbh’ is the database handle from the mco_db_connect() method and
‘upd_trunk’ is the value defined in the generated interface file to reference the
event of interest.
For the preceding class definition and its generated interfaces, the following code
fragments illustrate the use of synchronous event handling.
First, the application must register its event handler functions by calling a
function like the following:
int register_events(mco_db_h db)
{
MCO_RET rc;
mco_trans_h t;
mco_trans_start(db,
MCO_READ_WRITE,
MCO_TRANS_FOREGROUND,
&t);
mco_register_add_trunk_handler(t, &new_handler,
(void*) 0);
mco_register_checkpoint_trunk_handler(t, &checkpoint_handler,
(void*) 0,
MCO_BEFORE_UPDATE );
mco_register_del_trunk_handler(t, &delete_handler,
(void *) 0);
mco_register_upd_trunk_handler( t, &update_handler1,
(void *) 0,
MCO_BEFORE_UPDATE );
rc = mco_trans_commit(t);
return rc;
}
The bodies of the handler functions would look like the following:
/* Handler for the "<new>" event. Reads the autoid and prints it out
*/
MCO_RET new_handler( /*IN*/ mco_trans_h t,
/*IN*/ dropped_call * obj,
/*IN*/ MCO_EVENT_TYPE et,
/*INOUT*/ void *param)
{
int8 u8;
/* generated accessor; naming pattern assumed */
dropped_call_autoid_get( obj, &u8 );
printf( "\n\tNew dropped_call: autoid=%lld", (long long)u8 );
param = (int *)1;
return MCO_S_OK;
}
/* Handler for the "<delete>" event. Note that the handler
* is called before the current transaction is committed.
* Therefore, the object is still valid; the object handle
* is passed to the handler and is used to obtain the
* autoid of the object. The event's handler return value
* is passed into the "delete" function and is later
* examined by the mco_trans_commit(). If the value is
* anything but MCO_S_OK, the transaction is rolled back.
* In this sample every other delete transaction is
* committed.
*/
MCO_RET delete_handler( /*IN*/ mco_trans_h t,
/*IN*/ dropped_call * obj,
/*IN*/ MCO_EVENT_TYPE et,
/*INOUT*/ void *user_param)
{
int8 u8;
static int count = 0;
dropped_call_autoid_get( obj, &u8 );  /* naming pattern assumed */
/* any value other than MCO_S_OK rolls the transaction back */
return ( ++count % 2 ) ? MCO_S_OK : MCO_E_CONFLICT;
}
When the application is finished handling events, the events are unregistered by
calling a function like the following:
/* ... mco_unregister_*_handler() calls, mirroring register_events() ... */
rc = mco_trans_commit(t);
return rc;
}
The _delete functions permanently remove the object whose handle is passed
while the _delete_all functions remove all objects of the class from the database.
Memory pages occupied by the object(s) are returned back to the memory
manager, and can be re-used.
Action Interfaces
Individual database objects are accessed by passing the object handle returned by
a _new, _find or _from_cursor function to specific action functions. For each
field of an object and for each element of a structure declared in the schema file,
both _put and _get methods are generated. The semantic rules that the compiler
follows to generate _put and _get function names are simple: the class or structure
name is followed by the field name and then the action word; all separated by
underscores. Action words can be any of the following: put, get, at, put_range,
get_range, alloc, erase, pack and size.
The _put functions are called to update specific field values. Depending on the
type of field, the generated _put function will be one of two forms: for scalar type
fields it will be of the form:
while for char and string fields a pointer and length argument are required:
The _get functions are called to bind a field of an object to an application variable
and the form of the function will vary depending on the type of field. For scalar
type fields it will be of the form:
And if the field is a string then the function takes two extra parameters: the size of
the buffer to receive the string, and an OUT parameter to receive the actual
number of bytes returned. So the generated function will have the form:
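Following the naming rules above, the generated prototypes presumably take
forms like the following (a sketch; <type> stands for the field’s C type):
MCO_RET classname_fieldname_put( classname *handle, <type> value );                       /* scalar put */
MCO_RET classname_fieldname_put( classname *handle, const char *src, uint2 len );         /* char/string put */
MCO_RET classname_fieldname_get( classname *handle, <type> *result );                     /* scalar get */
MCO_RET classname_fieldname_get( classname *handle, char *buf, uint2 bufsz, uint2 *len ); /* string get */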
If a class has one or more indexes, then the field(s) on which the index is defined
will have an index component (hash table entry or tree node) in addition to the
actual field value. The index component is not inserted when the field’s _put
function is called, but rather when the write transaction containing this update is
committed. Or, alternatively, a _checkpoint function can be called to explicitly
create the index components for this object. The _checkpoint function completes
the object’s update before the transaction is committed, however if the application
decides to rollback the current transaction, all the updates for the object including
index components are discarded. (Committing a transaction implicitly
checkpoints all the objects modified (created/updated/deleted) in the transaction.)
The sequence, then, to create objects, is to _new space for the object, _put one or
more field values into the object, and optionally _checkpoint the object to create
index components. If a unique index constraint is violated, the checkpoint
method will return an appropriate code.
For fields of type string, an additional _size function is generated to return the
actual size of the string value:
MCO_RET classname_fieldname_size( classname *handle, uint2 *retlen );
The functions that operate on vector and array fields require an index argument
but are otherwise functionally equivalent to their scalar counterparts. The _put
functions for fields declared as vector or fixed-size array have the form:
For fixed length arrays and vectors additional _put_range methods are generated
to assign an array of values to a vector or array. The size of the IN array should be
less than or equal to the size of the vector as specified in the prior _alloc function
call, or the size of the array as defined in the database definition.
MCO_RET classname_fieldname_put_range(
/*IN*/ classname *handle,
/*IN*/ uint2 start_index,
/*IN*/ uint2 num,
/*IN*/ const <type> *src );
Please note that _put_range methods are only generated for vectors that consist of
simple scalar elements. For vectors of structures this method is not generated. The
reason is that for simple type vector elements the schema compiler can generate
optimized methods to assign values to them. This optimization is only possible if
the size of the vector element is known at compile time. Also note that it is never
necessary to use a _put_range method to set the vector; the _put function can
always be iterated to assign individual vector element values for the desired
range.
To access a specific element of a vector, the _at functions are provided. The form
of the _at function will vary depending on the type of elements stored in the
vector. For vectors of fixed-length fields it will have the form:
When allocating memory for vectors (see _alloc function below) of variable
length elements, it may be necessary to first determine the actual size of the
vector elements. The _at_len functions are generated for vectors of strings for
this purpose:
MCO_RET classname_fieldname_at_len(
/*IN*/ classname *handle,
/*IN*/ uint2 pos,
/*OUT*/ uint2 *retlen);
MCO_RET classname_fieldname_get_range(
/*IN*/ classname *handle,
/*IN*/ uint2 startIndex,
/*IN*/ uint2 num,
/*OUT*/ <type> *dest);
The _alloc functions reserve space for a vector field. The application must call the
_alloc function and supply the size of the vector in order to allocate a vector field
within a data layout. Invoking the _alloc function for a vector field of an existing
object will resize the vector. If the new size is less than the current size the vector
is truncated to the new size.
The _erase functions remove an element of a vector from the layout (and all
indexes the element is included in). Note that the vector size remains unchanged.
If an attempt is made to get the erased element, the runtime returns a null pointer
as a result. Also note that the erase method is only generated for vectors of
structures, not for vectors of basic types or strings. For vectors of basic types and
strings, the application should _put a recognizable value in the vector element that
it can interpret as null. (Note that the _erase functions are also generated for
optional struct fields.)
As may have been noticed, use of the _erase functions can leave unused elements
(“holes”) in vector fields. For this reason, the _pack functions are generated for
vector fields to remove “holes” so that the space occupied by the deleted element
is returned to the free database memory pool.
Often database classes contain many fields, with the consequence that fetching
and storing these objects requires a long sequence of _get and _put function calls
for each individual field. To simplify the coding work, the schema compiler
generates a C-language structure for all scalar fields and fixed-length arrays, and
additional _fixed_get and _fixed_put functions that can significantly reduce the
number of function calls required. But, as the name indicates, these functions can
only be generated for the fixed-size fields of a given class. If a class contains
fields of variable length (i.e. string or blob fields) then these fields must be
accessed with their individual _get and _put functions.
Using these functions, objects of the Record class can be written with two
function calls: Record_fixed_put() for the fixed size portion and Record_s_put()
for the variable length field of type string “s”. Similarly, the objects of this class
can be read with two function calls: Record_fixed_get() and Record_s_get().
Sample
To illustrate the generated functions described thus far, let’s consider the
following sub-schema:
#define SYMBOL_LEN 4
#define SYMBOL char<SYMBOL_LEN>
#define uint4 unsigned<4>
#define uint2 unsigned<2>
#define uint1 unsigned<1>
struct Id
{
uint4 num_in;
uint4 time_in;
};
class TestOne
{
vector< SYMBOL > vchars;
vector<Id> vids;
vector< signed<4> > vlong;
uint2 v2;
oid;
};
For the TestOne class, the DDL compiler will yield the following access methods:
/* oid-based methods */
MCO_RET TestOne_oid_find( /*IN*/ mco_trans_h t,
/*IN*/ const market_id *id,
/*OUT*/ TestOne *handle );
/*-------------------------------------------------------*/
/* struct Id methods */
Block Allocation
For persistent classes, eXtremeDB provides a “block allocator” to facilitate
objects’ locality of references, i.e. to assure that a group of objects will be stored
in a contiguous block on disk. This capability is provided in the form of a
generated API function classname_set_allocation_block_size(). The application
calls this function within a READ_WRITE transaction, with a block size sufficient to
store a group of objects of the given class (see the sketch following the list
below). This provides two benefits:
• It keeps the entire object within the same block, greatly improving the
read performance for complex objects
• It keeps groups of objects that had been added to the database at the same
time stored together within the same block, improving the sequential
access performance for objects of the same class
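A sketch of the call sequence (the parameter list is assumed from the description
above; the block size is illustrative):
mco_trans_start( db, MCO_READ_WRITE, MCO_TRANS_FOREGROUND, &t );
classname_set_allocation_block_size( t, 16*1024 );  /* block sized for a group of objects */
/* ... create the objects of this class ... */
mco_trans_commit( t );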
Note: Database file fragmentation can also be greatly reduced by use of the
file_extension_quantum element of the mco_db_params_t structure used when
opening or extending the database (see function mco_db_open_dev()). A non-
zero value for file_extension_quantum will cause the database runtime to allocate
space in the database file by the specified number of bytes as opposed to by the
page size (typically 4K, 8K, or 16K).
Note: When objects are deleted, the space is returned back to the database pool
and can be reused for indexes and other objects. But this space will not be reused
for objects of the same class because these new objects are allocated by blocks.
Thus, deleting objects does not reduce the locality of references.
Note: The block allocator works well when objects are not updated because
when a dynamic object is updated it is possible that a part (the variable length
part) of the object will be allocated outside the block. To lessen this effect, the
runtime always attempts to allocate the entire object on the same page (Disk
Manager page) as the object header.
The eXtremeDB DDL language supports a collation declaration for tree and hash
indexes on string-type fields as follows:
[unique] tree<string_field_name_1 [collate C1]
[, string_field_name_2 [collate C2]], …> index_name;
The application registers user-defined collations via the following API function:
mco_db_register_collations(dbname, mydb_get_collations());
Examples
Example 1:
File “schema.mco”:
class A {
string name;
tree <name collate Cname> tname;
};
The keyword collate is used in the schema definition file “schema.mco” to
indicate that a tree index tname is to be generated on string field name, using
collation Cname. This DDL instructs the database runtime to use a custom rule
named ‘Cname’ to compare the string field ‘name’. Note that the same collation
(rule) can be used multiple times in the same index, in different indexes within the
same class, or in different classes.
Example 2
File “schema.mco”:
declare database mydb;
/* the index declarations below are illustrative reconstructions */
class A {
string s;
char<20> c;
tree <s collate C1> ts;
hash <c collate C1> hc[1000];
};
class B {
string s;
nchar<20> nc;
tree <nc collate C2> tnc;
};
Note that in class A the same collation (“C1”) can be used in a tree and hash
index, and in class B a new collation (“C2”) must be defined because its base field
nc is of type nchar. To use the collation C1 in the tree indexes, the application
must implement a compare function with the following signature:
typedef int2 (*mco_compare_collation_f) ( mco_collate_h c1, uint2 len1,
mco_collate_h c2, uint2 len2);
The parameters are collation descriptors (as strings) c1 and c2 and their lengths
(number of symbols) len1 and len2. The compare function must return an integer
value indicating how the strings are compared: negative if c1 < c2, zero if c1 ==
c2 and positive if c1 > c2. This function is called by the runtime to compare field
values in two objects as well as to compare the field value with an external key
value.
The parameters are a descriptor c (as a string) and its length (number of symbols)
len. The function must return an integer hash code for the string. (Note that if the
compare function returns zero for two strings X and Y, i.e. X is equal to Y, the
hash function must generate the same hash code for X and Y.)
For this sample schema, the DDL compiler generates these compare function
stubs in mydb_coll.c:
The DDL compiler also generates the function applications will use to register the
specified collations with the eXtremeDB database runtime in mydb.h and mydb.c:
mco_collation_funcs_h mydb_get_collations(void);
Example 3
class Record
{
string name;
unsigned<4> value;
tree <name collate C1> tname; /* index declaration illustrative */
};
char * fruits[] = {
"banana", "PEAR", "plum", "Peach", "apricot", "Kiwi", "QUINCE",
"pineapple", "Lemon", "orange", "apple",
"pawpaw", "Fig", "mango", "MANDARIN", "Persimmon", "Grapefruit", 0
};
if ( MCO_S_OK == rc ) {
/* connect to database */
rc = mco_db_connect(db_name, &db);
if ( MCO_S_OK == rc ) {
{
Record_from_cursor(t, &c, &rec);
Record_name_get(&rec, buf, 11, &len);
printf("\n\t%-15s", buf);
}
rc = mco_trans_commit(t);
}
}
…
}
Note that the only additional step the main application needs to perform in order
to implement a specialized string collation is to register the collation prior to
connecting to the database. The sorting logic is handled by the collation compare
function. In this case the compare logic is simply to return the value returned by
the case-insensitive C runtime function stricmp().
Blob Support
eXtremeDB provides support for BLOB fields through BLOB interface functions.
BLOB elements are useful when it is necessary to keep streaming data, with no
known size limits. The semantics of the BLOB interfaces are very similar to the
standard _put/_get semantics.
Use the _get method to copy BLOB data to an application’s buffer; it allows
specification of a starting offset within the BLOB.
For the _get function, the ‘bufsz’ parameter is the size of the buffer passed by the
application in the ‘buf’ parameter. The ‘len’ output parameter is the actual
number of bytes copied to the buffer by the _get function (which will be <=
bufsz).
The _size function returns the size of a BLOB data element. This value can be
used to allocate sufficient memory to hold the BLOB, prior to calling the _get
function.
The _put function populates a BLOB field, possibly overwriting prior contents—
it allocates space and copies data from the application’s buffer; the size of the
BLOB must be specified.
The _append function is used to append data to an existing BLOB. This method is
provided so an application does not have to allocate a single buffer large enough
to hold the entire BLOB, but rather can conserve memory by writing the BLOB in
manageable pieces.
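Following the generated-function patterns above, the BLOB interfaces presumably
take forms like these (a sketch; exact integer types may differ by version):
MCO_RET classname_fieldname_get( classname *handle, uint4 offset, void *buf, uint4 bufsz, uint4 *len );
MCO_RET classname_fieldname_size( classname *handle, uint4 *size );
MCO_RET classname_fieldname_put( classname *handle, const void *buf, uint4 size );
MCO_RET classname_fieldname_append( classname *handle, const void *buf, uint4 size );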
Search Methods
Search interfaces locate desired objects or groups of objects by unique identifier
or by index. While exact match lookups by unique identifier (oid and autoid)
using the _find functions are extremely efficient for locating individual objects,
eXtremeDB also supports the following types of index searches that employ a
cursor to navigate a group of objects as an ordered or unordered result set: hash-
based (unique and non-unique), tree-based (including Patricia, rtree and kdtree)
and list-based.
Find Functions
Find methods (as indicated above in the oid-based and autoid-based find
methods) search the database based on an exact match of index values. By
definition, an exact match lookup on a unique index returns exactly zero (not
found) or one result. The _find functions, therefore, do not require and do not use
a cursor.
The _find functions are generated for classes that have one or more unique hash
indexes or one or more unique tree indexes declared. For each unique index, the
following interface is generated:
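The generated _find prototype presumably has the general form shown below (a
sketch; the key parameters mirror the index’s field list, with a length argument
for string keys):
MCO_RET classname_indexname_find( /*IN*/ mco_trans_h t,
                                  /*IN*/ <type> key [, /*IN*/ uint2 len ],
                                  /*OUT*/ classname *handle );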
• Establish a starting position in a sorted list with a known starting value and
optionally retrieve subsequent results in ascending or descending sorted order.
• Establish a starting position in a sorted list when only part of the starting value
is known, find the closest match, and optionally retrieve subsequent results in
ascending or descending sorted order.
• Establish a starting position as above, iterate over the sorted list until an upper
bound is reached, using the _compare() method to determine when the range
limit is reached.
The following two functions are generated to obtain a cursor for an index:
The _search functions generated for all non-unique hash and tree indexes are of
the following form:
After positioning a cursor with a search function or one of the general cursor
positioning functions (_first, _last, _next, _prev), the _from_cursor function is
used to obtain a reference (pointer) to the object:
The _locate function is used to position a tree index cursor based on an object
reference. The cursor must have been previously initiated using the
_index_cursor method. After positioning the cursor, the cursor positioning
functions, _next and _prev, can be used to iterate over the objects. The _locate
function applies only to tree-index-based cursors (i.e. not list or hash cursors):
MCO_RET classname_indexname_locate(
/*IN*/ mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
/*IN*/ classname * handle);
The compare function is used to compare the value of an object referenced by the
cursor with an application supplied value. The method returns a zero if the values
compared are equal, less than zero if the values stored in the object referenced by
the cursor are less than the values passed in, or greater than zero if the values
stored in the object referenced by the cursor are greater than the values passed in.
MCO_RET classname_indexname_compare(
/*IN*/ mco_trans_h trans,
/*IN*/ mco_cursor_h cursor,
/*IN*/ [const] <type> [*]param1,
[[/*IN*/ uint2 len1,]
[/*IN*/ [const] <type> [*]param2,
[/*IN*/ uint2 len2,] …],
/*OUT*/ int *result);
Pattern Search
In addition to the search capabilities described in the previous section,
eXtremeDB supports wildcard pattern matching ability. This is the capability to
search tree index entries matching patterns specified with wildcard characters for
single character and multiple character matches. By default, the question mark “?”
will match any single character in the specified position within the pattern, and
the asterisk “*” will match any combination of characters (including no
characters) in that position. If a match on the characters “?” or “*” is desired, the
wildcard characters themselves can be modified by specifying different characters
in the pattern search policy (see below).
For example, “G*e*” would return “Graves” and “Gorine”, while “Gr?ve*”
would match “Graves”, “Grove”, “Grover” and so on... In this example, ‘*’
matches zero, one, or more characters, while ‘?’ matches exactly one character.
Further, the pattern “G*E*” would match all uppercase entries like “GRAVES”,
“GORINE”. However, because the standard compare functions used to match
index values with search keys use case-sensitive compare functions (for example,
strcmp) the case specified in the search pattern will affect the search results.
To illustrate the use of these functions, suppose we have the following class
definition:
class PatternTest
{
string key1;
char<20> key2;
int4 key3;
tree <key1,key2,key3> i1;
tree <key2,key3> i2;
};
MCO_RET PatternTest_i1_pattern_size(
const char *key1,
uint2 sizeof_key1,
const char *key2,
uint2 sizeof_key2,
int4 key3,
/*OUT*/ uint4 *pattern_size);
MCO_RET PatternTest_i1_pattern_search(
mco_trans_h t,
void *allocated_pattern,
uint4 memsize,
PatternTest *obj,
const char *key1,
uint2 sizeof_key1,
const char *key2,
uint2 sizeof_key2,
int4 key3 );
MCO_RET PatternTest_i1_pattern_next(
mco_trans_h t,
void *allocated_pattern,
PatternTest *obj);
To use the ‘i1’ index to perform a pattern search, we take the following steps:
First, we allocate a buffer that the eXtremeDB run-time uses as a state machine
during the pattern search. The size of the buffer required is determined by the
classname_pattern_size() interface, for example:
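A sketch, with illustrative key values:
uint4 pattern_size = 0;
void *buf;
/* key lengths passed per the generated prototype above */
PatternTest_i1_pattern_size( "G*", 2, "*", 1, 0, &pattern_size );
buf = malloc( pattern_size );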
Now we can code a loop to retrieve the index entries that match the pattern we
have indicated:
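A sketch of such a loop (same illustrative keys as above; t and obj are declared
as in the surrounding examples):
rc = PatternTest_i1_pattern_search( t, buf, pattern_size, &obj,
                                    "G*", 2, "*", 1, 0 );
while ( MCO_S_OK == rc )
{
/* ... process the matching object ... */
rc = PatternTest_i1_pattern_next( t, buf, &obj );
}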
free(buf);
The following functions use the pattern policy structure to get and set the pattern
matching policy:
The following code snippet demonstrates how the pattern matching policy might
be changed:
mco_pattern_policy_t p;
/*
* NOTE: The change policy operation must be executed within a Read_Write
* transaction.
*/
mco_get_pattern_policy(trn, &p);
p.ignore_other_fields = MCO_YES;
mco_set_pattern_policy(trn, &p);
Note: The tree-index algorithm sorts the keys based on their relative weight,
determined from the compare function results, rather than on the key’s value.
Tree and hash indexes can be declared as userdef, in which case the application
must provide custom compare functions that are used by the database runtime
when building the index and during lookup. For example, consider the following
sample DDL:
class Obj {
unsigned<4> first_part1;
unsigned<2> first_part2;
unsigned<4> second_part1;
signed<2> second_part2;
string data;
For the tree index “first” it is necessary to implement two functions: the object-to-
object and object-to-key compare functions. These functions return a negative,
zero or a positive value depending on whether the first parameter is less than,
equal to, or greater than the second parameter. For hash indexes, the compare
function returns zero if the first and the second parameters are equal and non-zero
otherwise. The function prototypes are generated by the schema compiler and are
placed into a file named <dbname>_udf.c (where dbname is the database name in
the ‘declare database’ statement).
For example, for the above sample schema the following mydb_udf.c file would
be generated:
#include "mydb.h"
#include "mcowrap.h"
/*
* API for the user-defined index "first"
*/
/* object-to-object user-defined compare function */
int2 Obj_first_compare_obj ( Obj * handle1, Obj * handle2 ){
/* TODO: add your implementation here */
return 0;
}
/*
* API for the user-defined index "second"
*/
/* object-to-object user-defined compare function */
int2 Obj_second_compare_obj ( Obj * handle1, Obj * handle2 ){
/* TODO: add your implementation here */
return 0;
}
If a file with the name <dbname>_udf.c already exists (i.e. the schema was previously
compiled, then modified and compiled again), the DDL compiler will generate a
file <dbname>_udf.c.new and display a warning. In this case, it is the
responsibility of the programmer to decide which file should be included in the
project. If the user-defined indexes have been changed, then the .new file should
be renamed to the .c file.
Accessor macros for the parts of the external key are also defined (the macro
bodies are elided in this excerpt):
#define Obj_first_extkey_first_part1(k)
#define Obj_first_extkey_first_part2(k)
#define Obj_second_extkey_second_part1(ek)
#define Obj_second_extkey_second_part2(ek)
A sample implementation for the custom compare and hash functions could be as
follows (the function headers are elided in this excerpt; the first fragment ends an
object-to-object compare for "first", the second an object-to-external-key compare):
    if (o1_first_part2 != o2_first_part2)
        return 1;
    return 0;
}
    if ( o_first_part1 != Obj_first_extkey_first_part1(key) )
        return 1;
    if ( o_first_part2 != Obj_first_extkey_first_part2(key) )
        return 1;
    return 0;
}
/*
 * API for the user-defined index "second"
 */
/* the "second" compare functions follow the same pattern */
    return 0;
}
    return 0;
}
Before the custom index API functions can be used, the application must register
the custom compare and hash functions with the database runtime. This is done
via the mco_db_register_udf() API:
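A plausible prototype, inferred from the usage below (the exact declaration, including the type of the function-table pointer, is in mco.h):

MCO_RET mco_db_register_udf( const char *db_name, void *udfs );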
Where the db_name is the database name used for the mco_db_open() and udfs is
a pointer to the custom functions table. This pointer is obtained via the generated
<dbname>_get_udfs() API.
For example:
mco_runtime_start();
mco_db_open("MyDB", mydb_get_dictionary(), start_mem, DBSIZE, PAGESIZE);
mco_db_register_udf("MyDB", mydb_get_udfs());
mco_db_connect("MyDB", &db);
...<continue processing>...
If user-defined indexes are declared for the database, but custom functions are not
registered via the mco_db_register_udf(), the connection API will return the
MCO_E_NOUSERDEF_FUNCS return code.
Search Example
To demonstrate how key searches produce their result sets, assume we have a
simple key containing one uint4 field populated with the dataset [1,2,2,3,4,4,5],
and the following compare function:
int compare_uint4( uint4 a, uint4 b ) {
    if ( a == b ) return 0;
    if ( a > b ) return 1;
    return -1;
}
Following is a table representing the keys’ values and their respective weights
after the _search function is called:
Now consider a slightly less trivial example where the key is represented as a
structure containing two fields, uint4 and char[16]:
typedef struct
{
    uint4 f1;
    char f2[16];
} the_key;
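Given this key structure, the compare function might look like the following sketch (the function name and the use of memcmp are illustrative assumptions, not generated code):

#include <string.h>

int compare_the_key( const the_key *a, const the_key *b ) {
    if ( a->f1 != b->f1 )                 /* integers compared as above */
        return a->f1 > b->f1 ? 1 : -1;
    return memcmp( a->f2, b->f2, 16 );    /* char buffers compared as ASCII */
}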
The compare function compares integers as in the above sample and the char
buffers as ASCII characters. Assume that the dataset is as follows:
Further, if the tree index is declared in reverse (descending) order, the tree
algorithm uses the compare function results in the "reverse" order and the weight
tables will be as follows:
Search Algorithm
The index search algorithm operates with key weights, not with key values,
regardless of whether the index direction is ascending or descending. As
explained above, an index lookup for a specified key value is performed by
calling the _search() function which takes a cursor, a search operation and the
key value as parameters. The search operations and their rules require some
explanation.
Abbreviating the "cursor's current element" as CCE, the following table describes
how they are correlated by the eXtremeDB runtime (using the dataset
[1,2,2,3,4,4,5] and a search key value of 2):
MCO_LT
  1 = CCE  2  2  3  4  4  5      CCE points to key value 1
  mco_cursor_prev will return MCO_S_CURSOR_END
  mco_cursor_next will return MCO_S_OK and will move CCE to key value 2
MCO_LE
  1  2  2 = CCE  3  4  4  5      CCE points to key value 2
  mco_cursor_prev will return MCO_S_OK and will move CCE to key value 2 (the leftmost)
  mco_cursor_next will return MCO_S_OK and will move CCE to key value 3
MCO_EQ
  1  2 = CCE  2  3  4  4  5      CCE points to key value 2
  mco_cursor_prev will return MCO_S_OK and will move CCE to key value 1
  mco_cursor_next will return MCO_S_OK and will move CCE to key value 2 (the rightmost)
MCO_GE
  1  2 = CCE  2  3  4  4  5      CCE points to key value 2
  mco_cursor_prev will return MCO_S_OK and will move CCE to key value 1
  mco_cursor_next will return MCO_S_OK and will move CCE to key value 2 (the rightmost)
MCO_GT
  1  2  2  3 = CCE  4  4  5      CCE points to key value 3
  mco_cursor_prev will return MCO_S_OK and will move CCE to key value 2 (the rightmost)
  mco_cursor_next will return MCO_S_OK and will move CCE to key value 4 (the leftmost)

For a descending index the dataset is ordered [5,4,4,3,2,2,1] and the correlations are:

MCO_LT
  5  4  4  3 = CCE  2  2  1      CCE points to key value 3
  mco_cursor_prev will return MCO_S_OK and will move CCE to key value 4 (the rightmost)
  mco_cursor_next will return MCO_S_OK and will move CCE to key value 2 (the leftmost)
MCO_LE
  5  4  4  3  2  2 = CCE  1      CCE points to key value 2
  mco_cursor_prev will return MCO_S_OK and will move CCE to key value 2 (the leftmost)
  mco_cursor_next will return MCO_S_OK and will move CCE to key value 1
MCO_EQ
  5  4  4  3  2 = CCE  2  1      CCE points to key value 2
  mco_cursor_prev will return MCO_S_OK and will move CCE to key value 3
  mco_cursor_next will return MCO_S_OK and will move CCE to key value 2 (the rightmost)
MCO_GE
  5  4  4  3  2 = CCE  2  1      CCE points to key value 2
  mco_cursor_prev will return MCO_S_OK and will move CCE to key value 3
  mco_cursor_next will return MCO_S_OK and will move CCE to key value 2 (the rightmost)
MCO_GT
  5  4  4  3  2  2  1 = CCE      CCE points to key value 1
  mco_cursor_prev will return MCO_S_OK and will move CCE to key value 2 (the rightmost)
  mco_cursor_next will return MCO_S_CURSOR_END
The following pseudocode illustrates the canonical pattern for deleting objects
while traversing a cursor (the cursor is advanced before the current object is
deleted):
rc = open_cursor(cursor);
while (rc == MCO_S_OK && (rc = from_cursor(cursor, obj)) == MCO_S_OK) {
    rc = move_cursor(cursor);
    delete_obj(obj);
}
eXtremeDB allows a Patricia index to be declared over scalar and boolean data
types as well as arrays and vectors of those types. In fact the boolean data type,
new to version 4.0, was introduced to facilitate Patricia index implementation
where bit arrays are used to store IP addresses.
The boolean data type can be used to define a single bit field, a fixed size array of
bits, or a variable length array of bits.
class xyz{
    boolean b1;          // a bit field
    boolean b2[32];      // fixed-size array of 32 bits
    vector<boolean> b3;  // variable-length bit array
};
For a single bit field, the following interfaces are generated:
MCO_RET classname_fieldname_get( classname *handle,
                                 /*OUT*/ uint1 *result );
MCO_RET classname_fieldname_put( classname *handle,
                                 uint1 value );
For a fixed-size bit array the following interfaces are also generated:
For variable-size arrays (vectors), the following interfaces are also generated:
The only index type possible for the boolean data type is the Patricia trie index. A
single-bit field cannot be indexed, nor is it advisable to index a short array of bits.
(It would be faster to perform a table scan than to create an index for a 2- or 3-bit
field, and doing so avoids the memory consumption and CPU cycles needed to
maintain the index. Exactly where the tipping point lies is left as an exercise for
the reader.)
The Patricia index can be declared unique; in the absence of the unique keyword
it defaults to allowing duplicates. Unlike other eXtremeDB indexes, the Patricia
index cannot be compound; it is always declared for a single field.
For example:
class xyz{
    boolean b1[32];
    vector<boolean> b2;
    uint4 b3;
    char<10> b4[10];
    vector<string> b5;
    // Patricia index declarations (elided in this excerpt)
};
In addition to the standard tree index generated functions, the following functions
are generated for each Patricia index:
As explained above, Patricia indexes can be created over any scalar or boolean
type field. The generated functions applicable only to Patricia indexes are
_longest_match, _exact_match, _prefix_match and _next_match. These have
slightly different forms depending on the type of the field being indexed, as
explained below.
A Patricia index created over a scalar field will cause the following functions to
be generated:
classname_indexname_next_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
type mask,
int number_of_bits);
classname_indexname_prefix_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
type mask,
int number_of_bits);
classname_indexname_longest_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
type mask,
int number_of_bits);
classname_indexname_exact_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
type mask,
int number_of_bits);
Where type is the scalar type (for example, uint4) and mask is the key value to
match. If the indexed field is an array or vector of scalars, these functions have
the form:
classname_indexname_next_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
type * mask,
int number_of_bits);
classname_indexname_prefix_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
type * mask,
int number_of_bits);
classname_indexname_longest_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
type * mask,
int number_of_bits);
classname_indexname_exact_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
type * mask,
int number_of_bits);
Here type is the type of each element of the array/vector (for example, uint4) and
mask is the key value to match. If the indexed field is a boolean array, these
functions have the form:
classname_indexname_next_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
char* mask,
int number_of_bits);
classname_indexname_prefix_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
char* mask,
int number_of_bits);
classname_indexname_longest_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
char* mask,
int number_of_bits);
classname_indexname_exact_match( mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
char* mask,
int number_of_bits);
Here the mask represents a key value that is a packed bit array (each byte contains
8 bits).
_longest_match
The _longest_match() functions locate the record whose index value has the
longest match with the key, i.e. the greatest number of characters or bits, starting
from the beginning of the value, that match the key value.
For example assuming the following table with Patricia index on field “prefix”:
Table:
prefix operator
01 ATT
020 BCC
025 TNT
03 ANC
0355 NCC
0355 UDC
045 WTC
...
The _longest_match function called with a key value of 02456 would position the
cursor at record <025, TNT>; with a key value of 035567787, at record
<0355, UDC>; and with a key value of 03, at record <0355, UDC> as well. Notice
that the cursor is positioned at the last record matching the key value. In order to
walk through the result set and visit all records matching the key value, the
application would use the _next_match function.
_exact_match
The _exact_match functions locate the first record whose index value exactly
matches the key value supplied. If no exact matches are found
MCO_S_NOTFOUND is returned. For example, using the above table: the key
value of 02 would find record <020, BCC>, but the key value of 024 would cause
MCO_S_NOTFOUND to be returned.
_prefix_match
The _prefix_match functions are similar to _longest_match except that they find
the first object whose index matches the key, whereas _longest_match returns the
object with the longest (deepest) match. So, using the above table: the key value
of 02456 finds record <025, TNT>; the key value of 035567787 finds record
<0355, UDC>; and the key value of 03 finds record <03, ANC>.
_next_match
The _next_match functions are used, after the cursor is positioned within the
result set, to walk through the result set to visit all records matching the key value.
To traverse the database objects in order, mco_cursor_next() or
mco_cursor_prev() may be used, but they are not constrained by the key value
used to perform the search; i.e. iteration could continue beyond the range
specified in the key.
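A sketch of such a bounded traversal using the generated functions described above (classname, indexname, the mask value and the bit count are placeholders):

rc = classname_indexname_prefix_match(t, &c, mask, number_of_bits);
while ( rc == MCO_S_OK ) {
    /* obtain and process the current object via classname_from_cursor() */
    rc = classname_indexname_next_match(t, &c, mask, number_of_bits);
}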
Unlike the tree index compare functions that return <0, 0 or >0, the Patricia-
based compare function returns the number of the first different bit between
the key and the object pointed to by the cursor. This allows interrupting the
cursor traversal in a manner similar to the "standard" compare API. In addition,
the application is able to refine the cursor traversal. For example consider a
routing table containing the following values:
128.1.1.0
128.1.1.10
128.1.1.20
128.1.2.0
128.1.2.10
128.1.3.0
where the ipaddr key field is declared either as
boolean<32> ipaddr;
or
uint4 ipaddr;
Suppose our application is looking for the entire subnet 128.1.1. The search value
would be passed in as the bit pattern corresponding to 128.1.1.0.
The nodes_ipaddr_prefix_match() function called with this key value and length
24 (i.e. 24 bits) positions the cursor at the object with ipaddr equivalent to
128.1.1.0. Now the mco_cursor_next() function is called to advance to record
128.1.1.10, and the Patricia _compare() function returns the value 28, which
means that 28 bits of the record's ipaddr match the key value; because this is not
less than the 24-bit key length, the record is still within the subnet of interest.
Iteration continues in this manner until the cursor reaches record 128.1.2.0, where
_compare() returns 22.
At this point the application would conclude that it has left the region of interest,
since the key size is 24 bits and the mismatch was detected in the 22nd bit. In
other words, the key is no longer a prefix for the value in the object now pointed at
by the cursor.
R-Tree Index
R-Tree indexes (declared as rtree in the schema) are commonly used to speed
spatial searches, for example, find the rectangle that contains this point, or find all
rectangles that overlap this rectangle.
All manner of shapes can be stored and searched with the rtree index. For
example, a point is represented as a rectangle with width and height = 1 and a line
that has starting and ending coordinates of 15, 844 and 0, 3647 is stored as
rectangle with its upper left corner at 15, 844 and its lower right corner at 0, 3647.
[Figure: two line segments with endpoints (75, 15)-(20, 70) and (35, 25)-(20, 30), shown with their bounding rectangles]
A search to discover all lines that intersect with (75, 15) (20, 70) would return the
rectangle bounding (35, 25) (20, 30) because the rectangles overlap. The
application would extract additional information for the object, for example, that
it is a line and what its starting and ending coordinates are, and would conclude
that this line does not intersect the key line; and would continue to the next
overlapping rectangle returned by the index search.
Note that, any shape with coordinates {(X1, Y1), (X2, Y2), ... (Xn, Yn)} can be
stored and searched in this manner. For example, consider a polygon:
[Figure: a polygon with vertices (85, 50), (70, 33), (65, 63), (55, 45), (55, 30) and (35, 35), plotted on X and Y axes ranging from 0 to 100]
Here we have X coordinates of 35, 55, 65, 70, 85 and Y coordinates of 30, 33, 35,
45, 50, 63. The bounding rectangle is the rectangle with left top vertex
(Xmax,Ymin), and right bottom vertex (Xmin, Ymax) where Xmin = min(Xi),
Ymin=min(Yi), Xmax = max(Xi), Ymax=max(Yi). In this case, Xmax = 85,
Ymin = 30, Xmin = 35, Ymax = 63 and our rectangle top left and bottom right is
(85, 30) and (35, 63).
rtree searches can return rectangles that: 1) exactly match the given coordinates,
2) overlap the given coordinates, 3) wholly contain the given coordinates, or 4)
are within a given distance from a point.
The rectangles are processed as arrays of max and min coordinates. For example,
a two-dimensional rectangle is represented as the array:
xMin, yMin, xMax, yMax
and a three-dimensional box as the array:
xMin, yMin, zMin, xMax, yMax, zMax
To illustrate the use of the rtree index, suppose we create the following class
definition:
class rtree_class
{
    rect<int2> square[2];
    rtree <square> ridx;
};
The generated interfaces include:
#define rtree_class_square_length 4
MCO_RET rtree_class_square_get ( rtree_class *handle,
/*OUT*/ int2 *dest );
MCO_RET rtree_class_square_put ( rtree_class *handle,
const int2 *src );
MCO_RET rtree_class_from_cursor ( mco_trans_h t,
mco_cursor_h c,
/*OUT*/ rtree_class *handle );
To utilize the rtree index, we need some supporting data types and constants:
MCO_OVERLAP
MCO_CONTAIN
MCO_NEIGHBORHOOD
To conduct any search, as with other index types, we need to instantiate a cursor:
rect.l.x = - MCO_BOUND;
rect.l.y = - MCO_BOUND;
rect.r.x = - (MCO_BOUND + 2000) / 2;
rect.r.y = - (MCO_BOUND + 2000) / 2;
if ((rc = rtree_class_ridx_search(t, MCO_OVERLAP, &c, (int2*) &rect)) !=
MCO_S_OK)
{
printf("\n Couldn't find any overlapping rect, code = %d, line = %d\n",
rc, __LINE__ );
}
if ( MCO_S_OK == rc )
{
printf("\n Iterate cursor, condition = MCO_OVERLAP");
for ( i = 0, rc = mco_cursor_first(t, &c);
MCO_S_OK == rc;
rc = mco_cursor_next(t, &c), i++ )
; // do nothing, just count
printf("\n Found %d overlapping rects", i);
}
Similarly, to search for rectangles that are wholly contained by another rectangle,
the same pattern is used with the MCO_CONTAIN condition (the search call below
is reconstructed by analogy with the MCO_OVERLAP example above):
if ((rc = rtree_class_ridx_search(t, MCO_CONTAIN, &c, (int2*) &rect)) != MCO_S_OK)
{
    printf("\n Couldn't find any contained rect, code = %d\n", rc);
}
if ( MCO_S_OK == rc )
{
    for ( i = 0, rc = mco_cursor_first(t, &c);
          MCO_S_OK == rc;
          rc = mco_cursor_next(t, &c), i++ )
        ; // do nothing, just count
    printf("\n Found %d contained rects", i);
}
Note: An rtree index cursor has different semantics than a conventional tree
index cursor. Whereas the _search function of a conventional tree index positions
the cursor at the first match (or just before the nearest match in the case of a
partial key search), the rtree index cursor operates on the result set of the search.
In other words, for an rtree cursor, mco_cursor_first(), mco_cursor_next(),
mco_cursor_prev() and mco_cursor_last() operate within the set of objects
that match the given search conditions.
Kd-Tree Index
eXtremeDB 4.0 adds support for the k-dimensional tree index (declared kdtree in
the schema). A kdtree is a data structure for organizing points in a k-dimensional
space. kdtrees are a useful data structure for several applications, such as lookups
that involve a multidimensional search key. The kdtree is a binary tree in which
every node is a k-dimensional point. Every non-leaf node generates a splitting
hyperplane that divides the space into two subspaces. Points left of the hyperplane
represent the left sub-tree of that node and the points right of the hyperplane
represent the right sub-tree. The hyperplane direction is chosen in the following
way: every node split to sub-trees is associated with one of the k-dimensions, such
that the hyperplane is perpendicular to that dimension vector.
The kdtree index is defined in the schema using the kdtree keyword:
class Car
{
    string vendor;
    string model;
    string color;
    uint4 year;
    uint4 mileage;
    boolean automatic;
    boolean ac;
    uint4 price;
    char<3> state;
    string description;
    blob images;

    kdtree <year, mileage, color, model,
            vendor, automatic, ac, price> index;
};
Insert and delete operations for indexes are hidden from applications and are
performed automatically by the eXtremeDB runtime. The new API is only
required for search operations. The kdtree uses a Query-By-Example approach to
locate objects that match a given search condition. The application creates pattern
object(s) in the normal way and assigns values to the fields that are included in
the search criteria. The kdtree supports simple exact matches as well as range
lookups. In the latter case, two pattern objects should be specified: one for the
lower and one for the upper boundaries of the search condition. If a field value is
defined only for one boundary, it is considered an open interval that corresponds
to a greater-than-or-equal-to or less-than-or-equal-to search condition.
The following example demonstrates locating all “Ford Mustangs” (using the
sample schema above):
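A minimal sketch of such a lookup, assuming an active transaction t and assuming the STR() helper macro (used in the range-query snippet below) expands a literal to the value/length pair expected by the generated string _put functions:

Car pattern;
mco_cursor_t cursor;
MCO_RET rc;

Car_new(&pattern);                     /* create the pattern object */
Car_vendor_put(&pattern, STR(Ford));   /* fields included in the search criteria */
Car_model_put(&pattern, STR(Mustang));
Car_index_index_cursor(t, &cursor);
/* passing the same pattern object twice requests an exact match */
rc = Car_index_search(t, &cursor, &pattern, &pattern);
while ( rc == MCO_S_OK ) {
    Car car;
    Car_from_cursor(t, &cursor, &car);
    /* process the matching Car object here */
    rc = mco_cursor_next(t, &cursor);
}
Car_delete(&pattern);                  /* delete the pattern object */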
The code snippet creates a pattern object, specifying values for the "vendor"
and "model" fields, and then calls the Car_index_search() method with the same
pattern object passed twice, which indicates to the eXtremeDB runtime that the
lower and upper boundaries of the search are equal. Once the search returns,
the application can traverse the result set using the standard eXtremeDB cursor
mechanism. Note that the order of objects in the selection is unpredictable, but
only objects that match the specified search criteria are returned.
Note: It is necessary to keep the pattern objects as long as the cursor is being
used.
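The range-query snippet below begins after the pattern objects from and till have been created; a plausible setup (assumed, following the pattern-object creation convention shown later in this section) would be:

Car from, till;
mco_cursor_t cursor;
MCO_RET rc;

Car_new(&from);   /* lower-boundary pattern */
Car_new(&till);   /* upper-boundary pattern */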
Car_vendor_put(&till, STR(ford));
Car_price_put(&till, 30000);
Car_year_put(&from, 2000); Car_year_put(&till, 2006);
Car_mileage_put(&till, 100000);
printf("Range query results:\n");
Car_index_index_cursor(t, &cursor);
rc = Car_index_search(t, &cursor, &from, &till);
while (rc == MCO_S_OK) {
Car choice;
Car_from_cursor(t, &cursor, &choice);
print_car(&choice);
rc = mco_cursor_next(t, &cursor);
}
Car_delete(&from); /* delete pattern */
Car_delete(&till); /* delete pattern */
mco_trans_commit(t);
}
The code snippet above demonstrates selecting Ford models with a price no
greater than 30000, a model year between 2000 and 2006, and mileage no greater
than 100000. Note that it is possible to pass NULL instead of one or both
boundary values. Specifically, the function call Car_index_search(t, &cursor, 0,
0) would return all objects.
As a further example, consider a kdtree index used for spatial searches:
class SpatialObject {
    int4 left;
    int4 top;
    int4 right;
    int4 bottom;
    int type;
    …
    kdtree <left, top, right, bottom, type> index;
};
The following snippet locates objects that lie within the boundaries LEFT, TOP,
RIGHT, BOTTOM (with the type field bounded from above by RED):
SpatialObject low;
SpatialObject high;
SpatialObject_new(&low);
SpatialObject_new(&high);
SpatialObject_left_put(&low, LEFT);
SpatialObject_right_put(&high, RIGHT);
SpatialObject_top_put(&low, TOP);
SpatialObject_bottom_put(&high, BOTTOM);
SpatialObject_type_put(&high, RED);
SpatialObject_index_index(trans, &cursor);
int rc = SpatialObject_index_search(trans, &cursor, &low, &high);
while (rc == MCO_S_OK) {
SpatialObject obj;
SpatialObject_from_cursor(trans, &cursor, &obj);
...
rc = mco_cursor_next(trans, &cursor);
}
A search for objects that overlap the given rectangle inverts the boundary
assignments:
SpatialObject_left_put(&high, RIGHT);
SpatialObject_right_put(&low, LEFT);
SpatialObject_top_put(&high, BOTTOM);
SpatialObject_bottom_put(&low, TOP);
SpatialObject_index_index(trans, &cursor);
int rc = SpatialObject_index_search(trans, &cursor, &low, &high);
while (rc == MCO_S_OK) {
    SpatialObject obj;
    SpatialObject_from_cursor(trans, &cursor, &obj);
    ...
    rc = mco_cursor_next(trans, &cursor);
}
kdtree indexes are inherently unbalanced. While they are supported for persistent
classes in eXtremeDB, because they are unbalanced their on-disk performance may
be sub-optimal; kdtree indexes are therefore most useful for transient (in-memory)
classes.
Chapter 6: Programming
Considerations
Database application development introduces issues that go beyond the usual
embedded systems development concerns. This chapter describes some of these
and the eXtremeDB "best practices" for managing them.
eXtremeDB return codes fall into three categories:
• Status codes that indicate a state of the database runtime (MCO_S_*). If not
handled, these could lead to an error condition.
• Error codes that indicate that the runtime failed to perform the requested
operation (MCO_E_*). The application can handle the error and continue
running. If not handled, these could lead to an exception.
• Fatal error codes (exceptions) that indicate that any further use of the database
runtime is not possible. A fatal exception is usually a sign of a bug in either
the application code or the database runtime code.
The actual values of status codes and error codes are enumerated in mco.h. Status
codes are return codes that are less than or equal to 50 and have #define names
that are prefixed with MCO_S_. Error codes are return codes that are greater than
50 and have #define names that are prefixed with MCO_E_.
Status codes don’t indicate an error in a method, but merely the state of an
operation. For example, every eXtremeDB method, if successful, returns
MCO_S_OK, or if a search function finds no objects corresponding to the specified
key value, the status code MCO_S_NOTFOUND is returned.
Error codes, in contrast, indicate the runtime’s failure to complete a request. For
example, if an invalid handle has been passed to a method,
MCO_E_INVALID_HANDLE is returned. A status code returned by a function does
not affect the state of the transaction context within which the function was
executed, while a function returning an error code causes the enclosing
transaction to enter an error state.
The error state of the transaction is remembered by the eXtremeDB runtime and
any subsequent call to an eXtremeDB function within that transaction will return
with the MCO_E_TRANSACT code. In this case, the eXtremeDB runtime does not
attempt to execute the method, and no changes will be applied to the database.
Being aware of this can greatly simplify your application code, while keeping the
code size to a minimum. For example, it is not uncommon (and many vendors
recommend it) to check the return code after every call to a library function. This
leads to source code that looks like the following:
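A sketch of this verbose style (func1/func2/func3 are placeholder functions returning MCO_S_*-style codes):

uint2 foo() {
    uint2 rc, i;
    for ( i = 0; i < 10; i++ )
    {
        rc = func1();
        if ( MCO_S_OK != rc ) return rc;  /* check after every call */
        rc = func2();
        if ( MCO_S_OK != rc ) return rc;
        rc = func3();
        if ( MCO_S_OK != rc ) return rc;
    }
    return rc;
}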
In contrast, when programming with eXtremeDB you may simply check the return
code on each iteration of the loop:
uint2 foo() {
uint2 rc, i;
for( i = rc = 0; i < 10 && MCO_S_OK == rc ; i++ )
{
rc = func1();
rc |= func2();
rc |= func3();
}
return rc;
}
The third category of errors, fatal errors, are unrecoverable and cause the
eXtremeDB runtime to call the function mco_stop(). This function performs the
role of an assertion internal to eXtremeDB. If an error handler has been registered
(see mco_error_set_handler()), mco_stop() will, in turn, call this custom error
handler. Otherwise mco_stop() will enter an infinite loop. It is common practice
in embedded systems to employ a “watchdog” process. If the watchdog does not
receive a periodic message from the application process, it forces a reboot. So
entering an infinite loop causes the application to stop responding, which triggers
the watchdog to reboot the system.
The mco_stop() function is only called when the eXtremeDB runtime detects an
unrecoverable error, such as a corrupted stack. In such a case, a reboot is the only
viable course of action. In addition, mco_stop() is called if an assertion within
any runtime function fails. This usually means that the application did something
illegal from the eXtremeDB runtime's point of view, such as passing an invalid
transaction or object handle to a runtime function, or corrupting the runtime
internals in some way.
Further, any runtime function might perform a number of validations that can
result in a failed assertion. These validations vary depending on the
CHECK_LEVEL set in the runtime when eXtremeDB is compiled. The object code
distribution includes two runtimes: the debug runtime, which has the highest
CHECK_LEVEL and the release runtime, which has the lowest (minimal validations
are performed). Although the release runtime does some validations, these have
no negative impact on the overall performance. Developers are strongly advised
to use the debug runtime during the development cycle. Then, only when no fatal
errors are reported by eXtremeDB, switch to the release runtime. The only reason
one would use the release version during the development phase is to measure
application performance.
To debug a fatal error, the following procedure is suggested:
• Set a breakpoint inside the error handler, and run the application in the
debugger to examine the application's call stack when the error occurs.
• Note the last runtime function called and any other relevant information in
the stack trace.
• Check the appropriate application entity right before the fatal runtime call
was issued and make sure that the entity - transaction handle, object
handle, heap memory, etc. - is in fact corrupted.
• Go back through the stack and try to find the application code where
the entity was corrupted.
The following example demonstrates this procedure (note that this code is taken
from the 06_errorhandling_fatalerr sample):
mco_error_set_handler( &errhandler );
...
rc = mco_trans_start(db, MCO_READ_ONLY, MCO_TRANS_FOREGROUND, &t);
if ( MCO_S_OK == rc ) {
printf("\n\n\tThe following attempt to create a new record\n"
"\tshould cause the Error handler to be called with Fatal\n"
"\tError 340049 because it requires a READ_WRITE transaction.\n"
"\tThe type of transaction started was MCO_READ_ONLY...\n"
"\tNote: you will get error code instead of fatal error if\n"
"\tthe program was linked not against _check runtime\n");
/* anObject_new() should fail with error code 340049 =
MCO_ERR_TRN+49 */
rc = anObject_new(t, &rec);
if ( MCO_S_OK == rc ) {
    rc = anObject_data_put(&rec, data);
    ...
}
}
When the above code is executed it causes the error handler to be called with the
error code 340049 which generates the following output:
Checking Appendix B, the error code value of 340000 corresponds to the constant
MCO_ERR_TRN. This indicates an error in the transaction being performed. (The
added value of 49 indicates the line within the runtime function where the
assertion failed, causing mco_stop() to be called. This is useful if it is necessary
to contact McObject support – or if the developer has a source code license. For a
more detailed explanation of error codes see Appendix B.)
Following is the call stack (as displayed by the Visual Studio 2008 debugger):
06-errorhandling-fatalerr.exe!mco_w_new_obj_noid(mco_trans_t_ * t=0x004e0378,
unsigned int init_size=4, unsigned short class_code=1, mco_objhandle_t_ *
ret=0x0012facc) Line 494 + 0x14 bytes
06-errorhandling-fatalerr.exe!anObject_new(mco_trans_t_ * t=0x004e0378,
anObject_ * handle=0x0012facc) Line 120 + 0x2f bytes
If the developer has a source code license, the debugging technique is slightly
different. In this case it would be prudent to set a breakpoint in the mco_stop()
function itself. This results in the following call stack:
06-errorhandling-fatalerr.exe!mco_w_new_obj_noid(mco_trans_t_ * t=0x004e0378,
unsigned int init_size=4, unsigned short class_code=1, mco_objhandle_t_ *
ret=0x0012facc) Line 494 + 0x14 bytes
06-errorhandling-fatalerr.exe!anObject_new(mco_trans_t_ * t=0x004e0378,
anObject_ * handle=0x0012facc) Line 120 + 0x2f bytes
Here again the preceding line in the call stack indicates that function
mco_w_new_obj_noid() failed and the same chain of logic makes it clear that the
solution is to correct the transaction type.
For example, a process can register its process id as the connection context:
int pid;
#ifdef _WIN32
pid = GetCurrentProcessId();
#else
pid = getpid();
#endif
mco_db_connect_ctx(dbName, &pid, &db);
Note: It is also necessary to specify the size of this connection context in the
database parameters passed to mco_db_open_dev(). For example:
db_params.connection_context_size = sizeof(int);
A monitoring process can then periodically invoke the sniffer to inspect the active
connections:
while (1) {
    mco_db_sniffer(db, sniffer_callback,
                   MCO_SNIFFER_INSPECT_ACTIVE_CONNECTIONS);
    sleep(SNIFFER_INTERVAL);
}
mco_db_disconnect(db);
THREAD_RETURN(0);
}
Recovery actually consists of two stages. In the first stage we "grab" the dead
connection. Each connection has private (process-specific) pointers which must
be adjusted for use in the context of the process performing recovery. In the
second stage, internal functions are called to roll back any transactions that might
have been in progress and to release the dead connection's data structures.
To re-open an existing database image (and trigger recovery if needed), the
MCO_DB_OPEN_EXISTING mode mask is set:
mco_db_params_t db_params;
...
mco_db_params_init(&db_params);
...
if (...) {
db_params.mode_mask |= MCO_DB_OPEN_EXISTING;
}
...
rc = mco_db_open_dev(db_name... , &db_params);
The database runtime performs the necessary steps to ensure the consistency of
the database metadata and the database content. If mco_db_open_dev() returns
MCO_S_OK to the application, the application is able to connect to the database
normally by calling mco_db_connect().
Note that database recovery can fail under certain conditions (such as application
errors that corrupt the database runtime metadata). If recovery fails,
mco_db_open_dev() returns an error code. Please refer to the “Recovery from
failed processes” section above and the mco_db_sniffer() section in the Reference
Guide for further discussion about eXtremeDB recovery procedures. Also refer to
the NVRAM sample located in the /samples/core/02-open/nvram directory.
When the MVCC transaction manager is used, a crash can leave a persistent
database containing undeleted old versions and working copies. Their presence
does not break the consistency of the database and does not prevent the
application from working normally, but it does unnecessarily consume space.
Detecting these stale object versions requires a complete scan of the database; for
this reason the recovery process does not perform this function automatically.
Instead, the removal of the unused versions can be performed explicitly by calling
the mco_disk_database_vacuum() function:
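A minimal sketch of the call (assuming it takes the connected database handle; see the Reference Guide for the exact signature):

rc = mco_disk_database_vacuum(db);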
Alternatively, the application can enable this repair automatically by setting the
MCO_DB_MODE_MVCC_AUTO_VACUUM mode mask in the mco_db_params_t
structure when calling mco_db_open_dev().
Database Security
eXtremeDB (version 4.1 and later) provides two separate security mechanisms for
persistent databases: a page-level CRC32 check and database encryption through a
private key. These mechanisms can be used separately, or in combination. Both
mechanisms are enabled at the time the database is created. Once the CRC or
encryption is enabled, and the persistent database is allocated, it is not possible to
disable security in the current, or future, database sessions. Note that both security
mechanisms are page-level based, hence all data and indexes are protected.
CRC-32
The page-level CRC stores a 32-bit CRC32 for each page in the database. The CRC
is verified every time the page is loaded from persistent storage to memory. If the
database content is changed outside the database runtime, the CRC will not match,
unless it is also modified along with the page content.
The CRC check is enabled through the database parameters:
mco_db_params_init ( &db_params );
...
db_params.mode_mask |= MCO_DB_USE_CRC_CHECK;
...
rc = mco_db_open_dev(db_name, ..., &db_params );
By default the CRC is not calculated. If a page CRC does not match, the database
runtime returns MCO_E_DISK_CRC_MISMATCH (126) error code every time an
attempt is made to read the database (including an index lookup). In the debug
version of the runtime, a mismatched CRC leads to a fatal assertion (mco_stop() ).
Encryption
eXtremeDB encryption allows an application to read and write encrypted
databases using a page-level standard RC4 encryption algorithm (for an
explanation of RC4 see http://en.wikipedia.org/wiki/RC4). Both the content of the
database and the log files are encrypted.
db_params.cipher_key = "welcome";
It is not entirely impossible to break the encryption (there have been a number of
advances in this area). However combined with the CRC, we can make a rather
strong claim for security.
Cache Management
The cache management facilities described below could, for example, be used to
fine-tune the application's caching policies (see Prioritized Cache, below).
Connection cache
In addition to the disk manager cache (also often referred to as a “page pool”),
eXtremeDB (version 4.1 and later) provides a per-connection cache. The database
runtime “pins” a predefined number of pages from the page pool for each
connection. This is referred to as a “connection cache”. When a transaction loads
pages into the page pool, and the total number of pages loaded from the media is
less than the size of the connection cache, the database runtime makes sure that
these pages stay in the cache until the transaction is committed, or the database
connection is broken. By default the size of the connection cache is set to four
pages. It is not possible to modify the connection cache size (nor does it make
sense to).
The connection cache is enabled by default. The runtime provides two functions
that allow application control over the connection cache:
The first function enables or disables the connection cache: passing MCO_YES or
MCO_NO as the 'enable' parameter value enables or disables the cache, and the
function returns the current state of the connection cache. The second function
commits (resets) the connection cache to the database.
These two functions address a scenario with many connections and long-lasting
transactions. In this scenario, the connection cache could cause the page pool to
run out of free pages (each new transaction allocates its own connection cache,
but long transactions prevent those pages from being released back to the shared
page pool). To address this, the connection cache can be turned off or reset
frequently. Under normal circumstances, the application does not need to control
the connection cache.
Prioritized cache
eXtremeDB (version 4.1 and later) improves on basic Least Recently Used (LRU)
cache policies by allowing applications to influence how long certain pages
remain in the disk manager cache. The crux of the improvement is the addition of
a cache priority property to each page. When the LRU algorithm locates a
"victim", instead of immediately releasing the page (removing the page from the
tail of the L2 link list), the algorithm inspects its caching_priority field. If the
value is not zero, the caching_priority is decremented and the page is re-linked to
the beginning of the L2 list. A caching priority of zero means the default
behavior; a caching priority of 1 indicates that the page will be moved from the
head to the tail of the LRU list twice; a caching priority of 2 means three loops
through the LRU list, and so on. The higher the priority, the longer the page
remains linked to the LRU list (i.e. stays in cache).
At the time the database is created, the application can assign priorities to indexes,
memory allocator bitmap pages and object pages (excluding BLOBs). The
priorities are assigned through the mco_db_params_t_ structure
index_caching_priority, allocation_bitmap_caching_priority and
objects_caching_priority fields. By default all pages have the same priority (zero).
It is possible to change the caching priority for a class at runtime through the
generated classname_set_caching_priority API. Using the preset object priority as
a baseline, the relative priorities of some classes can be adjusted. For example,
large and rarely accessed objects can be assigned lower priority, while small
frequently accessed classes can be assigned a higher priority. The caching priority
assigned at runtime is stored in the database and is used until it is explicitly
overwritten.
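A sketch of adjusting a class's priority at runtime through the generated API (the parameters shown are assumptions; consult the generated header for the exact signature):

/* assumed signature: raise the caching priority of the Car class */
rc = Car_set_caching_priority(db, 2);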
Multi-file databases
eXtremeDB supports three types of multi-file devices for persistent databases:
In all three cases there are two ways of defining the file segments:
Disk IO
For persistent databases, disk I/O (reading from and writing to the disk) are the
most “expensive” operations in performance terms. To minimize the effect of disk
I/O, eXtremeDB implements a Disk Manager Cache that interacts with the
Operating System’s File system cache as shown in the diagram below:
[Diagram: database transactions pass through the Disk Manager (DM) Cache, maintained by the database runtime, which in turn interacts with the file system cache and the storage media]
To make intelligent decisions that will optimize the eXtremeDB Disk Manager
Cache performance for a specific application’s needs, the following section
explains possible impacts of Cache Size, Transaction Logging and Commit
Strategies.
Cache Size
Similar to the memory pool for an in-memory database, the cache for an on-disk
database is created by the application, and the address and size of the memory are
passed as parameters to the mco_db_open_dev() function (see the Cache Size
discussion later in this chapter).
[Diagram: database transactions write through the write_file() API into the file system buffer, which is flushed to the media by the _sync() API]
The impact on performance from how transactions are recorded in the log file is
determined by the selection of the Logging Policy. Because non-buffered I/O is
slow, file I/O is usually buffered: the database write() operation does not write
data directly to the persistent media but rather to the file system buffer, and the
file system then "flushes" the buffered data to disk during a file system _commit()
or _sync().
The eXtremeDB Logging Policy controls when the changes are committed to the
persistent storage.
In the event of a hardware or software failure, the runtime can recover the
database using this log.
Transaction Logging does not alter the all-in-memory architecture of eXtremeDB,
which retains its performance advantage over disk-based databases. Read
performance is unaffected by transaction logging, and write performance will far
exceed the write performance of traditional disk-based databases. The reason is
simple: eXtremeDB transaction logging requires exactly one write to the file
system for one database transaction. A disk-based database, however, will
perform many writes per transaction (data pages, index pages, transaction log,
etc.), and the larger the transaction and the more indexes that are modified, the
more writes are necessary.
If the no-logging option is selected, transaction processing is turned off and a log
file is not created. This will significantly increase update performance, but the application
will not be able to recover the database in the event of a crash, and transaction
rollback is also not available. This mode can be useful when the application needs
to quickly populate the database file. We recommend not using this option under
any other circumstances. As an alternative, consider setting the “relaxed
durability” mode via the mco_disk_transaction_policy() API. (See section
“Tuning the Logging Strategy to Your Application Needs” below.)
Note: Once the database file is created, the database can be re-opened in either of
the transactional modes described below.
When the Redo (write-ahead) logging policy is used, pages updated by a
transaction are kept in the page pool (cache) and guaranteed to never get swapped
out during the transaction (the "no steal" policy). Upon transaction commit, all
updated pages are first written into the log and then committed (flushed) to the
permanent storage. Only then are the updated pages written to the database (but
not flushed). If during the commit the log size becomes larger than the threshold
specified by mco_disk_set_log_params(), a checkpoint is created: all updated
pages are written to disk, updates are flushed to the permanent storage and the log
is truncated.
The obvious benefit of the Redo policy is a significantly reduced number of disk
writes, since only the log file needs to be flushed to disk at the time of transaction
commit. Furthermore, the log file is written sequentially, and so the cost of
syncing the log is much less than the cost of flushing the data pages. The
disadvantage of using WAL is that the algorithm can run out of memory when
there are many uncommitted updates. The transaction size is limited to the size of
the page pool (cache). Every time a page is made “dirty” (anything is changed on
the page), it must remain in cache. Our implementation does not allow any
swapping.
[Diagram: REDO logging — _put() operations update the database cache; at mco_commit() the log file is written (_write) and synced (_sync), and the modified data pages are then written (_write) to the database file without being flushed]
WAL’s central concept is that changes to the data must be written only after those
changes have been logged; that is, when log records have been committed to the
permanent storage.
When the Undo logging strategy is used, the log file contains entries that allow
the current transaction’s updates to be un-done. Briefly, the eXtremeDB
implementation of this approach is as follows: During the update, the runtime
marks the containing page as “dirty” and flags it in the bitmap of modified
pages, and the original page is written to the log file. Regardless of the number of
times the individual page is changed during the transaction, the original image of
the page is written to the log file only once. When the transaction is committed,
all modified pages are written and flushed to the database file and then the log file
is truncated. The recovery and the rollback procedures read all saved pages from
the log file, restoring the original images of the pages from the log file and
clearing the “dirty” bit for the page.
The advantages of using Undo Logging are that the algorithm never runs out of
memory and provides easy and efficient recovery. The disadvantages are that all
updates must be flushed to the database file at commit time. Writes to the
database file are usually random and are slower than writes to the log file, which
are sequential.
[Diagram: UNDO logging — updates set bits in the dirty-page bitmap and modify the database cache; the original pages are written (_write) and synced (_sync) to the log file, and at commit the modified pages are written (_write) and synced (_sync) to the database file]
Transaction Control
A transaction is a unit of work with the database (a single logical operation on the
data). eXtremeDB supports transactions that enforce the ACID properties. The
ACID (an acronym for Atomicity, Consistency, Isolation, Durability) model is one
of the oldest and most important concepts of database theory. It establishes four
goals that a database management system must strive to achieve: atomicity,
consistency, isolation, and durability. No database that fails to meet any of these
four goals can be considered reliable.
Atomicity states that database modifications must follow an “all or nothing” rule.
Each transaction is said to be “atomic.” If one part of the transaction fails, the
entire transaction fails. It is critical that the database management system
maintain the atomic nature of transactions in spite of any DBMS, operating
system or hardware failure.
Consistency states that only valid data will be written to the database. If, for
some reason, a transaction is executed that violates the database’s consistency
rules, the entire transaction will be rolled back and the database will be restored to
a state consistent with those rules. On the other hand, if a transaction successfully
executes, it will take the database from one state that is consistent with the rules to
another state that is also consistent with the rules.
Isolation requires that multiple transactions occurring at the same time not impact
each other's execution. For example, if Joe issues a transaction against a database
at the same time that Mary issues a different transaction, both transactions should
operate on the database in an isolated manner. This prevents Joe's transaction
from interfering with Mary's, and vice versa.
Durability ensures that any transaction committed to the database will not be lost.
Durability is ensured through the use of database backups and transaction logs
that facilitate the restoration of committed transactions in spite of any subsequent
software or hardware failures.
eXtremeDB enforces the ACID principles by requiring that all database access is
done within the context of a transaction. A transaction is said to be durable if,
upon return of control to the application after a transaction commit, the
transaction data can be recovered in the event of a failure of the application or the
system (assuming that the media on which the database and/or transaction log
resides is not compromised or corrupted). The previous section discussed how to
determine the logging policy for database recovery; below we explain how to
control the manner in which database changes are committed to disk.
In addition to choosing the Transaction Logging Policy, you must also choose the
Transaction Commit Policy. In order to guarantee the durability of transactions,
database systems must force all updates to be written through the database cache,
and the file system cache, onto the physical media (be it solid state or spinning
media). This flushing of the file system buffers is an expensive operation (in
terms of performance), but is the only way to guarantee the transaction Durability.
MCO_COMMIT_SYNC_FLUSH
This policy indicates that a database commit flushes the cache, synchronizes the
file system buffers for both database and log files and truncates the log file
(UNDO_LOG only). This policy provides durable transactions. The database can
be corrupted only if the physical media where the database and log files are
located is damaged.
MCO_COMMIT_BUFFERED
This policy indicates that the database cache does not get flushed to disk upon
transaction commit. Pages that were marked dirty by the current transaction are
left in the database cache. That applies to both the database and the log file pages.
This policy significantly reduces the number of I/O operations; the runtime only
writes dirty pages to disk during normal swapping. In case of application failure,
the database cache is destroyed and all changes made by all transactions
committed after the policy was set could be lost.
MCO_COMMIT_NO_SYNC
This policy indicates that the database runtime does not explicitly synchronize the
file system buffers with the file system media. Upon transaction commit, all
changes made by the transaction are transferred from the application space to the
operating system space and the log file is truncated (UNDO_LOG only). It is up to
the file system to determine when the data is actually written to the media. This
mode provides some performance advantages over the full synchronization mode,
but also risks losing transaction data in the event of a system crash (while
committed transactions are still in the file system cache).
Note: Failure will not cause database corruption, provided that the hardware and the
operating system are working properly. In this mode, the database is restored to a
consistent state from the log file when the application is restarted. It is assured that the
state of the database will be at least the same as it had been prior to setting the
MCO_COMMIT_NO_SYNC mode.
MCO_COMMIT_DELAYED
The size of the log file is checked only in mco_trans_commit(). If the log file size
is less than the specified threshold, mco_trans_commit() does not commit the
transaction to disk (as it is in the case of the MCO_COMMIT_NO_SYNC policy). If
the size exceeds the threshold, then the entire log is committed to disk (as it is in
the case of the MCO_COMMIT_SYNC_FLUSH policy).
This commit mode is only available if the logging policy is set to REDO log.
Since the log file size is checked only in mco_trans_commit(), it is still possible
to run out of page pool space if the total size of all pages modified by the
transaction exceeds half of the page pool. The transaction log is not truncated
after the commit; truncation is still controlled by the redo_log_limit parameter.
For the REDO_LOG transaction logging policy, the maximum size of the log file
can be established by calling the mco_disk_set_log_params() function. This
function must be called after mco_db_open_dev() (which establishes the
transaction logging policy).
In REDO_LOG mode, the application must establish the maximum size of the log
file. Once this size is reached, the runtime will commit changes to the database
file and truncate the log.
Cache Size
Like the memory pool for an in-memory database, the cache for an on-disk
database is created by the application and the address and size of the memory are
passed as parameters to the mco_db_open_dev() function. The memory can be
either shared memory or conventional memory. It must be shared memory if two
or more processes are to share the database.
When the MCO_UNDO transaction logging policy is used, eXtremeDB uses a “dirty
pages bitmap” to keep track of what pages can be purged from the cache during a
READ_WRITE transaction. The bitmap is allocated from the cache memory and its
size can be roughly calculated as max_database_size / page_size / 8. If the
database size is MCO_INFINITE_DATABASE_SIZE, the size of the bitmap is set to
1/16 of the size of the cache. The bitmap size remains unchanged until the disk
manager is destroyed.
In this case, the bitmap is allocated on the eXtremeDB heap. To avoid heap overflow the
application should set the disk_max_database_size parameter of the
mco_db_params_t structure appropriately. By default the eXtremeDB runtime
reserves 256K (512K in 64-bit configurations) for the extendable bitmap on the
eXtremeDB external heap (which will map a 128M database with a 128 byte
database page size).
Note: The dirty page bitmap is used only in case of the UNDO logging policy and is not
used for the REDO logging.
Objectives
This section explains some database design considerations with respect to
eXtremeDB. It is not an exhaustive treatment of the topic of database design.
That is a very large subject and well beyond the scope of this document. Rather,
our objective is to shed light on the workings of eXtremeDB in order that
developers can make informed database design decisions choosing from the many
available options.
Logical design considerations involve how you will conceptually organize the
data: what objects you will define, how they are interrelated, what means you will
employ to effect the relationships, what access methods are required, and the
performance requirements of your application. In this process, you will decide
what indexes are needed, which will be hash indexes and which will be tree
indexes, which classes will have the list property for sequential access, whether an
Object Identifier (oid) will be needed and what its structure should be, which
classes will have oids, whether to implement interclass relationships via oid and
ref, autoid and autoid_t, via indexes, or to de-normalize and use a vector instead.
The physical design considerations are page size, initial database size, incremental
extensions, whether certain fields can be optional, and whether classes can be
compact.
Page Size
As a rule of thumb, page size should be between 60 and 512 bytes; a 100 byte
page size works fine in most situations. Page size should be a multiple of 4, and
if it is not the runtime will adjust it internally. Almost all memory used by the
database instance is used to store objects or index (tree or hash) data. The
overhead imposed by the index memory managers is not really affected by the
page size when the page size is larger than 60 bytes. This is because the fixed
part of the index control data is typically between 4 and 12 bytes per page.
Therefore, the page size mostly affects the overhead imposed by object layout
managers.
Objects that have dynamic data fields, such as strings or vectors, always occupy
whole pages. Multiple fixed size objects can share a page. This means, for
example, that if the page size is 100 bytes and some object with dynamic fields
took 440 bytes including all control data, then 60 bytes (= 5*100 – 440) would be
wasted. It is not really possible to determine in advance the exact optimal page
size. It depends on what will be the object size distribution in the real world (at
runtime, what will be the actual sizes of the dynamic data), what will be the
dynamic sequence of operations, and what type of objects will be stored most
frequently, etc. To determine runtime memory requirements the calculator
functionality described in section “Database Status and Statistics Interfaces” of
Chapter 4 can be very helpful. The statistics generated by the calculator make it
easy to adjust the page size parameter in order to reduce the average memory
overhead for specific tests or the actual application itself.
The estimated number of objects with OIDs is used by the runtime to build the
OID hash table. Hash conflict
resolution algorithms are optimized for a certain number of entries, so the best
hash table size really depends on the number of hash entries (the number of
objects assigned an OID). There is no real harm in being somewhat off with the
estimate. But if the estimate is far from the real number of objects, the
performance penalty will be significant (for instance if the estimate is 100 and the
number of objects is tens of thousands). If the estimated number is too small, the
hash table will be smaller and conflicts will happen more frequently. Modest
underestimates will result in some insignificant performance penalty. If the
number is too large, conflicts will happen less frequently but the hash table will
take more memory. This parameter presents a familiar tradeoff between speed
and size. Providing a larger estimate for this number will improve hash index
performance at the cost of extra space allocated for hash entries.
Classes should use an OID if the application data model has one. In other words,
if objects described in the schema have some native identifying information and
that information is common between objects of different types, then an OID is a
natural way to represent this model. If the application’s objects are identified
differently depending on their type, then an OID should not be used. The OID
can have several fields, but they must be of fixed size.
Use of Structures
Note: eXtremeDB allows indexing by structure field(s) even when the structure
is used as a vector field.
The compact class qualifier limits the size of the class’ elements to 64K. This is
because 2-byte offsets are used instead of 4-byte offsets to address within each
object’s layout. Obviously, there is an overhead imposed by eXtremeDB to
support certain data layouts. A large portion of this overhead is due to the fact
that we support dynamic data types such as vectors, strings and optional fields.
For instance, each string field is implemented as an offset to the actual data. For a
compact class this offset is 2 bytes, otherwise it is 4 bytes. Another example is an
optional field. It is common in applications for some data to not be known at the
time of creation for a particular object. Instead of reserving space for such data
within each object, it can be declared as optional. eXtremeDB will place an offset
to the actual data within the data layout. Then if data is not present (or has been
erased) this offset is null. The space for the structure is only allocated when
necessary to store the data. All these offsets are 2-bytes in the compact model.
Note: The total 64K limit of a compact object size does not include BLOBs
defined for the class. It is still possible to have a large BLOB (> 64K in size) for
compact classes. Addressing within BLOBs is not affected by the compact
declaration.
You can use the -c or -compact mcocomp schema compiler options to make all
classes of a database compact.
For example, consider a class that contains two string fields and one optional
structure. For 1000 objects of this class, the compact declaration would save (at
least) 3*2*1000 = 6000 bytes of overhead (3 fields, 2 bytes less overhead each,
times 1000 objects equals 6,000 bytes).
The only limitation with compact classes is the total size of an object, 64K. If it is
known that objects of a class will always require less than 64K it is beneficial to
use the compact qualifier.
The “char<n>” declaration defines a fixed length byte array of ‘n’ bytes (where n
<= 64K). The “string” declaration defines a variable length byte array <= 64K.
In the case of char<n>, ‘n’ bytes will be consumed for this field by every object
(instance of the class). It is best to use char<n> when exactly ‘n’ bytes are used in
every instance, as for example in a social security number field that is a required
entry. In the case of a string element, eXtremeDB imposes 2 or 4 bytes of
overhead (depending on the compact qualifier, see above) for each instance.
Blob
An object having K allocated blobs has (4 + 8*K) bytes allocated within the
object layout. A 32-byte header is written for each blob when it is stored, within
the first blob page, plus 8 bytes for the 2nd through N-th pages.
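For example, an object with two allocated blobs (K = 2) carries 4 + 8*2 = 20
bytes of blob references within its layout; storing one of those blobs across
three pages then writes a 32-byte header on the first blob page plus 8 bytes on
each of the two subsequent pages, for 48 bytes of blob storage overhead.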
Vector
Like string and blob, a minimum of 2-bytes or 4-bytes of overhead is imposed for
each vector of each object. If the vector is of structures or strings, then the
overhead is 2 * (N+1) (compact) or 4 * (N+1) (normal) where N is the number of
elements in the vector. If the vector is a simple type, the overhead is only 2
(compact) or 4 (normal) bytes.
Vectors, unlike blobs, have structure. The elements of a vector can be efficiently
located by offset within the vector. In contrast, blob access methods are like
sequential file access—the blob is a sequence of bytes, exactly as a file is.
Because of this, it is always better to use a vector when the data is regular in
nature and needs to be accessed by the element number.
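For example, a vector of 10 structures in a normal (non-compact) class imposes
4 * (10 + 1) = 44 bytes of overhead (22 bytes in a compact class), whereas a
vector of 10 elements of a simple type imposes only the 4 (or 2) bytes for the
vector itself.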
This discussion applies to char/string (and Unicode variants) and fixed-size arrays
versus vectors.
eXtremeDB stores all fixed-size elements of a class on a single page and variable
length elements on separate pages (to allow them to grow). The page with the
fixed-size elements contains a 2- or 4-byte offset for each variable length field.
As a consequence, using variable length fields may actually use database space
less efficiently than defining a fixed length element even knowing that a portion
of the fixed length element may go unused.
For example, suppose we have a page size of 100 bytes and a character field that
might hold between 8 and 50 characters, with the average length being 18. A field
definition of char<50> will leave 32 bytes, on average, unused. But a string field
will use 2 (or 4) extra bytes and leave at least 50 bytes unused on the page that
eXtremeDB uses for the string field. In this circumstance, a fixed length character
field would be better. Conversely, a character field that must allow for up to 256
bytes would be better defined as a string field.
The same basic principle applies to the choice of fixed length arrays or variable
length vectors.
Voluntary Indexes
Voluntary indexes can be created, used for a series of transactions and then
destroyed. Because index creation is a relatively “heavy” operation, it does not
make sense to always create an index if all that is needed is to perform a few
searches at some particular time during execution. In this case the indexes can be
declared voluntary and built as needed prior to the search operation.
Voluntary indexes use the same algorithms and consume the same space as
regular indexes; they differ only by their ability to be created and destroyed
dynamically, and by the fact that voluntary indexes are not created automatically
when the database instance is created.
Also, it’s important to note that only tree type indexes can be voluntary.
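A hedged DDL sketch follows (the class, field and index names are illustrative;
the tree index syntax matches the tkey/hkey declarations used in the UDA
examples later in this chapter):
class Account
{
uint4 id;
uint4 balance;
voluntary tree <balance> by_balance;
};
The application then builds the index immediately before performing its
searches and destroys it afterward.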
eXtremeDB provides hash index algorithms and tree index algorithms of a rich
variety of types, modified for efficient operations in memory. The b-tree index
algorithm is the most general; it can be used for all kinds of searches and for
ordered fetches. A b-tree index can be unique or not unique and is searchable by
ranges and partial key values. In addition to the b-tree, eXtremeDB provides
specialized tree indexes including “Patricia Trie”, “R-Tree” and “Kd-Tree” that
are described in detail in section “Search Methods” in Chapter 5. A hash index is
suitable for search by equality only and can also be unique or non-unique. Hash
indexes can exhibit better average performance, for both insert and lookup
operations, compared to a tree index, but this also depends on the initial hash table
size and on key distribution. A hash index does not guarantee search time;
theoretically all different key values can produce the same hash value, and the
search will be very slow. So eXtremeDB implements a “dynamic hash table” to
optimize performance.
Because a sequential search of the linked list of key values for a given hash can
result in inefficient lookup times, if the size of the hash table is too small with
respect to the total number of objects in the class being indexed, the hash table is
rebuilt when necessary by extending the table size. The initial hash table is
allocated using the estimated number of objects specified for this class in the
database schema. The hash_load_factor parameter (a percentage value) passed
to mco_db_open_dev() in the mco_db_params_t structure is used to determine
when to extend (reallocate) the hash table. For example, if the initial hash table
size is 1000 and the hash_load_factor parameter is 50 (i.e. 50%), then the hash
table will be extended when the 501st object is inserted; if hash_load_factor is
150, then the hash table will be extended when the 1501st object is inserted.
Memory consumption is comparable for tree and hash indexes. A rough estimate
for a tree index is 10 bytes per entry (exact size depends on the order of
insertions/deletions); and H + 8 bytes per entry for a hash index, where the
constant H is fixed size space taken by the hash table and can be calculated as E /
5 * 4 where E is the estimated number of hash entries provided by you in the
database schema and 5 is a constant hash factor used by eXtremeDB. If
reallocation of the hash table is necessary, then the size will be H * 2.
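To put numbers to this: with an estimated E = 10,000 hash entries, the table
itself takes H = 10,000 / 5 * 4 = 8,000 bytes; indexing 10,000 objects then
consumes roughly 8,000 + 8 * 10,000 = 88,000 bytes, and if the table must be
reallocated its fixed portion grows to H * 2 = 16,000 bytes.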
List Attribute
Each list declaration will create an additional dynamic structure, which will
consume resources similar to those taken by a tree index. The list declaration is
useful when:
An object is characterized, in part, by the fact that when it is deleted all of its
dependent parts are deleted. To accomplish this, these dependent parts of an
object are stored using an object layout manager. In order to express one-to-many
relationships between parts of the object it may be very efficient to use a vector.
For example, a vector of strings will take only 2 or 4 bytes for the vector itself
plus 2 or 4 bytes overhead per string, whereas making a separate object from each
string will require at least one page for each string, so the overhead may be more
significant. Vectors are useful when the object model in an application already
contains dynamically structured items. Say, for example, an application needs to
collect radar measurements for various “Targets”, and that each measurement is a
set of structures. Its database could be defined as follows:
struct Target
{
uint2 x;
uint2 y;
uint2 dx;
uint2 dy;
uint2 type;
};
class Measurement
{
uint4 timestamp;
uint4 radar_number;
vector< Target > targets;
};
An alternative, normalized design stores each target as a separate object, with
a reference back to its measurement:
class Target2
{
uint4 m_unique_id; // ref to measurement
uint2 x;
uint2 y;
uint2 dx;
uint2 dy;
uint2 type;
};
class Measurement2
{
uint4 m_unique_id;
uint4 timestamp;
uint4 radar_number;
};
The first, vector-based design (Measurement with an embedded targets vector) is
faster and will take far less space because fewer objects have to be maintained
and fewer operations have to be performed.
Note: The direct attribute is not allowed for a vector of structures. (See the
explanation in section “Data Definition Language: struct declaration” above.)
Write
Using the class handle, assign application values to object fields via different
flavors of “put” methods.
Read
Using the transaction handle, call one of the search methods: list, hash, tree or
oid-based. If the search was successful, obtain a class handle.
The example below illustrates these scenarios in code using the following
extremely simple schema:
Simple Schema
oid;
list;
};
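Only the tail of this schema is shown above. A plausible full version, inferred
from the generated interface below, might look like the following (the field
names a, b, c and h, the index field lists, and the hash table size are
assumptions):
declare database simple;
struct simple_oid
{
uint4 seq;
};
declare oid simple_oid[1000];
class SimpleClass
{
uint2 a;
uint2 b;
string c;
uint2 h;
unique tree <a, b, c> UniqueIndex;
tree <a, c> NonUniqueIndex;
unique hash <h> hkey[1000];
oid;
list;
};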
Compiling this schema with the schema compiler mcocomp yields the following
interface header file:
Simple Interface
#ifndef __MCO__simple__H__
#define __MCO__simple__H__
#include "mco.h"
#ifndef MCO_DEBUG_MODE
#error DEBUG mode runtime must be used
#endif
/*-------------------------------------------------------*/
/* Handles and Class Codes */
/*-------------------------------------------------------*/
/* Dictionary */
mco_dictionary_h simple_getDictionary();
/*-------------------------------------------------------*/
/* Object Id definitions */
/*-------------------------------------------------------*/
/* class SimpleClass methods */
MCO_RET SimpleClass_UniqueIndex_cursor(
/*IN*/ mco_trans_h t,
/*OUT*/ mco_cursor_h c);
MCO_RET SimpleClass_UniqueIndex_search(
/*IN*/ mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
/*IN*/ OPCODE oper,
/*IN*/ uint2 v1,
/*IN*/ uint2 v2,
/*IN*/ const char *s3,
/*IN*/ uint2 lg3 );
MCO_RET SimpleClass_UniqueIndex_compare(
/*IN*/ mco_trans_h t,
/*IN*/ mco_cursor_h c,
/*IN*/ uint2 v1,
/*IN*/ uint2 v2,
/*IN*/ const char *s3,
/*IN*/ uint2 lg3,
/*OUT*/ int *result );
MCO_RET SimpleClass_UniqueIndex_locate(
/*IN*/ mco_trans_h t,
/*OUT*/ mco_cursor_h c,
/*IN*/ SimpleClass * handle);
MCO_RET SimpleClass_NonUniqueIndex_cursor(
/*IN*/ mco_trans_h t,
/*OUT*/ mco_cursor_h c );
MCO_RET SimpleClass_NonUniqueIndex_search(
/*IN*/ mco_trans_h t,
/*INOUT*/ mco_cursor_h c,
/*IN*/ OPCODE oper,
/*IN*/ uint2 v1,
/*IN*/ const char *s2,
/*IN*/ uint2 lg2 );
MCO_RET SimpleClass_NonUniqueIndex_compare(
/*IN*/ mco_trans_h t,
/*IN*/ mco_cursor_h c,
/*IN*/ uint2 v1,
/*IN*/ const char *s2,
/*IN*/ uint2 lg2,
/*OUT*/ int *result );
MCO_RET SimpleClass_NonUniqueIndex_locate(
/*IN*/ mco_trans_h t,
/*OUT*/ mco_cursor_h c,
/*IN*/ SimpleClass * handle);
/*------------------------------------------------------*/
/* struct Id methods */
#endif
The first code fragment creates a database named “SimpleDb” and allocates space
for the data repository starting at some user-defined memory address. After that,
the application connects to the database and obtains a database handle that is used
later for opening transactions.
#include <simple.h>
#define DATABASE_SEGMENT_SIZE 300 * 1024
#define MEMORY_PAGE_SIZE 128
const char * db_name = "SimpleDb";
int main(void)
{
MCO_RET rc;
mco_device_t dev;
mco_db_params_t db_params;
return 0;
}
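The body of main() is elided above. A hedged sketch of what it might contain
follows; mco_db_params_init() and the MCO_MEMORY_CONV /
MCO_MEMORY_ASSIGN_DATABASE constants follow the conventions of the eXtremeDB
samples and should be treated as assumptions:
mco_runtime_start();
/* describe one conventional-memory device for the database */
dev.type = MCO_MEMORY_CONV;
dev.assignment = MCO_MEMORY_ASSIGN_DATABASE;
dev.size = DATABASE_SEGMENT_SIZE;
dev.dev.conv.ptr = malloc( DATABASE_SEGMENT_SIZE );
/* initialize the database parameters */
mco_db_params_init( &db_params );
db_params.mem_page_size = MEMORY_PAGE_SIZE;
db_params.disk_page_size = 0; /* all-in-memory database */
/* create the database and connect to it */
rc = mco_db_open_dev( db_name, simple_getDictionary(), &dev, 1, &db_params );
if ( MCO_S_OK == rc ) {
rc = mco_db_connect( db_name, &db ); /* db: an mco_db_h declared above */
}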
The next code fragment demonstrates how an application can extend the amount
of memory used for data storage. On some CPU architectures the entire memory
arena is split into several non-contiguous memory regions. An application may
need to use multiple segments in order to store the necessary data. It is also
possible that the maximum amount of memory needed for the database is not
known in advance. eXtremeDB addresses these scenarios with the
mco_db_extend() function. The example also demonstrates the usage of the
reporting functions: mco_db_free_pages() and mco_db_total_pages().
#include <simple.h>
rc = mco_db_disconnect(db);
rc = mco_db_close(db_name);
while (n_segments) {
free(dev[--n_segments].dev.conv.ptr);
}
} else {
/* Connection failed: free memory allocated for main database device
and close the database without resetting rc */
free(dev[0].dev.conv.ptr);
mco_db_close(db_name);
}
}
/* stop eXtremeDB runtime */
mco_runtime_stop();
return ( MCO_S_OK == rc ? 0 : 1 );
}
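The mco_db_extend() call itself is elided from the fragment above. A hedged
sketch of that portion follows; the mco_db_extend(), mco_db_total_pages() and
mco_db_free_pages() signatures are assumptions, and EXTEND_SEGMENT_SIZE is a
hypothetical constant:
/* allocate a further conventional-memory segment and give it to the runtime */
dev[n_segments].dev.conv.ptr = malloc( EXTEND_SEGMENT_SIZE );
rc = mco_db_extend( db_name, dev[n_segments].dev.conv.ptr,
EXTEND_SEGMENT_SIZE );
if ( MCO_S_OK == rc ) {
uint4 total, free_pg;
n_segments++;
/* report current page usage */
mco_db_total_pages( db, &total );
mco_db_free_pages( db, &free_pg );
printf( "pages: %u total, %u free\n", (unsigned)total, (unsigned)free_pg );
}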
Populating a Database
The next code fragment illustrates writing to the database. The schema compiler-
generated “new” and “put” interfaces are used to create references to the
persistent data and write data into their permanent locations. “new” methods must
be called within the context of a write transaction. The code fragment also
demonstrates how to obtain a transaction handle, and use it with the “new”
method later on.
Note: Once an object is allocated, all the base type fields except optional fields
are by default initialized with zeros. Strings are made empty strings. In the
example below, it is important to note that a (unique) hash index is declared for
the field “h”. Therefore h must be assigned a unique value each time the function
is called, otherwise mco_trans_commit() will return an error code indicating an
attempt to create a duplicate.
SimpleClass hClass;
simple_oid id;
int donetrn = 0;
char src[] = "abcdefghigklmnop";
mco_trans_h t;
/* open a read-write transaction; the handle t is used by the
* "new" and "put" methods below
*/
rc = mco_trans_start( db, MCO_READ_WRITE, MCO_TRANS_FOREGROUND, &t );
if( rc ) return 0;
/*
* Allocate a new class and return a class handle. Two
* input parameters must be passed to the new interface:
* the transaction handle t and class id. If successful the
* output parameter hClass is a pointer to the newly
* allocated class.
*/
rc = SimpleClass_new (t, &id, &hClass);
if( rc ) goto Err;
/* Important!
* We must assign a unique value for this field since
* hash index is declared for it.
*/
SimpleClass_h_put(&hClass, vh );
/*
* commit the transaction unless there is a problem and
* return 1, otherwise rollback and return 0
*/
rc = mco_trans_commit (t); donetrn = 1;
/*
* Important! After the transaction is committed, the
* class handle is no longer valid. Any attempt to
* reference the created object would result in an error
* condition.
*/
if( rc ) goto Err;
return 1;
Err:
printf("\n %d error inserting object: %d", rc);
if( ! donetrn )
mco_trans_rollback (t);
return 0;
}
Search by OID
Searching for an object based on its oid is demonstrated in the next code
fragment. A search operation is performed within the context of a read
transaction.
#include "simple.h"
if( buff ) {
/* read the string */
rc = SimpleClass_c_get(&hClass,
buff,
(uint2)(sz+1),
&actual_sz);
if ( rc == MCO_S_OK )
printf( "\n\t object with oid=%d\n\t a=%d,b=%d,c=%s(%d)\n\n",
id.seq, a, b, buff, actual_sz );
else
printf( "\n\t error:%d", rc );
free(buff);
}
}
mco_trans_commit (t);
return rc == MCO_S_OK;
}
Cursor Operations
The next example demonstrates how an application can use cursors to navigate
the database. When using the MURSIW transaction manager, a cursor created
within a read-only, or read-write transaction, can still be valid after the transaction
is committed. This behavior is different from the behavior of object handles
which are only valid within a transaction. In other words, an application can
create a cursor in one transaction and use it in another.
Note: When using the MVCC transaction manager, a cursor in one transaction
may contain objects that are modified in another transaction. So, applications
should avoid using the same cursor in different transactions.
}
/* commit the transaction. This is only done to
* illustrate the fact that cursors can be used across
* the transaction boundaries
*/
rc = mco_trans_commit(trn);
/* re-open a transaction */
mco_trans_start ( db, MCO_READ_ONLY, MCO_TRANS_FOREGROUND, &trn );
/* commit */
mco_trans_commit(trn);
return;
Overview
In order to share the data between multiple processes, the eXtremeDB runtime
creates the database in shared memory. Multiple threads within a process share
the memory of that process. The shared memory that is used by the eXtremeDB
runtime is architecture and operating system dependent. In some environments,
the eXtremeDB runtime uses a System V shared memory mechanism (for
example, Sun Solaris and Linux) while for others it uses POSIX style shared
memory (for example, QNX Neutrino). On Microsoft Windows platforms there
is yet another shared memory mechanism. When a shared memory database is
created, the eXtremeDB runtime allocates two shared memory segments: one for
the eXtremeDB “registry” that keeps information about all database instances
created on the machine, and another segment for the data itself. The eXtremeDB
runtime shared memory implementation details are hidden from applications and
all the interactions with the database are done via eXtremeDB standard interfaces.
Implementation
Start Up
MCO_RET mco_runtime_start (void);
MCO_RET open_shared_db(
const char * db_name, /* name of the database */
mco_dictionary_h dict, /* pointer to schema */
mco_size_t db_sz, /* size of memory segment for in-mem part
* of the db */
uint2 mem_pg_sz, /* size of memory page */
uint2 max_conn_no /* max. number of connections */
)
{
mco_runtime_info_t info;
mco_db_params_t db_params;
mco_device_t dev;
db_params.mem_page_size = mem_pg_sz;
db_params.disk_page_size = 0;
db_params.db_max_connections = max_conn_no;
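A hedged sketch of how the body of open_shared_db() might continue follows;
the MCO_MEMORY_NAMED / MCO_MEMORY_ASSIGN_DATABASE constants follow the
mco_device_t layout shown under “Device Types” below, and the shared segment
name is hypothetical:
/* describe a named (shared) memory device for the database */
dev.type = MCO_MEMORY_NAMED;
dev.assignment = MCO_MEMORY_ASSIGN_DATABASE;
dev.size = db_sz;
sprintf( dev.dev.named.name, "%s-mem", db_name );
dev.dev.named.flags = 0;
dev.dev.named.hint = 0; /* let the runtime choose the mapping address */
return mco_db_open_dev( db_name, dict, &dev, 1, &db_params );
}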
Note: When using the eXtremeDB Direct Pointer Arithmetic library (DP) it is
necessary to map the shared memory segment to the same virtual memory
address in every process because in the DP implementation eXtremeDB uses
actual memory addresses (i.e. it performs pointer arithmetic to calculate the
locations of objects in an eXtremeDB database). The pointers must be the same
in every running instance of an eXtremeDB-based application, or pointer
arithmetic just doesn’t work. Setting the dev.named.hint parameter to zero causes
eXtremeDB to determine the actual shared memory segment address. But this
could fail when called from a second process attempting to open the shared
database. In this case it is the application’s responsibility to provide a valid “hint”
address.
There are several ways to determine where the runtime should map the shared
memory database. You could use the utility provided by your operating system to
gather memory usage information (the process memory map). These utilities will
usually tell you the code, data and stack memory usage of each process running
on your system and the libraries it is using. Examine the output and pick an
address outside the address ranges already in use. Or you could simply use the
address
MAP_ADDRESS that is currently defined as 0x20000000 in the eXtremeDB
SDK samples.
The above potential issues with respect to the MAP_ADDRESS can be avoided
by using the “Offset” library instead of the Direct Pointer Arithmetic library. The
“Offset” approach calculates an offset from the beginning address of the in-
memory database, to locate objects. Therefore, it does not depend on the in-
memory database starting at a common (and known) location for all processes.
However, the DP pointer arithmetic is about 5%–15% faster than calculating
offsets.
Shut Down
MCO_RET mco_runtime_stop(void);
The function performs clean-up for the process. Every mco_runtime_start() must
be paired with mco_runtime_stop().
Examples
void StartDB()
{
MCO_RET rc;
mco_db_h db;
void* start_mem = 0;
mco_runtime_start();
/* ... open the shared database here, e.g. via open_shared_db() above ... */
rc = mco_db_connect( dbname, &db );
rc = mco_db_disconnect( db );
rc = mco_db_close( dbname );
mco_runtime_stop();
}
Subsequent processes (once the shared memory for the database has been set up
by open_shared_db() in the initial process) should follow these steps:
void DbAttach()
{
MCO_RET rc;
mco_db_h db;
mco_runtime_start();
rc = mco_db_connect( dbname, &db );
/* ... use the database ... */
rc = mco_db_disconnect( db );
rc = mco_db_close( dbname );
mco_runtime_stop();
}
Overview
With MCORPC, the developer builds a framework that implements an
application-specific remote access API. The framework will implement C-
language remote procedure calls (RPC) callable from any application capable of
calling a C function. The specific details and the actions performed by these
remote procedures are irrelevant for the framework. An RPC function could be as
simple as the following for adding or retrieving an integer value:
add_record(int value);
update_record(int value);
But more often remote APIs require passing/retrieving compound data to/from the
eXtremeDB database. For this purpose a remote procedure call Interface
Definition Language (“IDL”) compiler mcorcomp is provided. The McObject
IDL contains definitions of the data structures to be passed and the function
prototypes of the interface functions expressed in C-Language (as opposed to a
CORBA-like IDL).
The remote access API, defined in the form of a C-language header file, is
processed by mcorcomp to produce an RPC dictionary, remote (client-side)
interfaces and proxy (server-side) stub functions to be completed by the developer
with implementation-specific application code.
The MCORPC library marshals and de-marshals the application’s data (passed as
arguments by the remote to the proxy functions). Because the framework is
communication protocol independent, an example implementation over TCP/IP is
provided to demonstrate how the network layer functions can be implemented, but
this “plumbing” can be any network medium.
RPC Framework
The client-server interface generated by mcorcomp consists of:
Note that it is not, strictly speaking, required that the server process carry out
eXtremeDB-related tasks. The MCORPC mechanism can be used to distribute any
application processing.
The client application invokes the envelope routine for a remote method in the
same manner it would invoke a local function.
mco_rpc_context_t <interface_name>_ctx;
int <interface_name>_write_stream
( void * buf_, unsigned int buf_sz,
void * param, unsigned int network_order );
int <interface_name>_read_stream
( void * buf_, unsigned int buf_sz,
void * param, unsigned int network_order,
unsigned int * read_sz );
where
These read/write functions return zero if the read / write was successful,
otherwise a non-zero value.
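As a hedged sketch of the transport layer (not the shipped TCP/IP example), a
write-stream function over a socket might look like this; the function name is
illustrative and the assumption that param carries a socket descriptor is ours:
#include <sys/socket.h> /* for send() */
int my_interface_write_stream( void * buf_, unsigned int buf_sz,
void * param, unsigned int network_order )
{
int sock = *(int *)param; /* assumption: param holds the socket fd */
unsigned int sent = 0; /* network_order is unused in this sketch */
while ( sent < buf_sz ) {
int n = send( sock, (char *)buf_ + sent, buf_sz - sent, 0 );
if ( n <= 0 )
return 1; /* non-zero: failure */
sent += (unsigned int)n;
}
return 0; /* zero: success */
}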
int <interface_name>_is_data_available
( mco_rpc_context_t * ctx, unsigned int * result );
This function returns 1 in the result output parameter if the transport has any data
and 0 if there is no data pending in the transport. The function takes the context
as a parameter.
And lastly, the error handler API is provided to trap any fatal runtime condition:
On the client-side, nothing else needs to be done to fully implement the transport.
To build the client, simply link together the transport function implementations,
the dictionary file (<interface_name>_dict.c) and the generated envelopes
(<interface_name>_client.c).
[Diagram: server-side development flow. The eXtremeDB DDL schema is processed
by the DDL compiler (SDK) into the generated implementation and header files;
the MCORPC compiler produces the serialize/deserialize C source code skeleton
files; these are built with a C/C++ compiler and linked with the eXtremeDB and
framework runtime libraries.]
In the diagram above, the boxes shaded grey are steps that require some
programming. The first step is to define the eXtremeDB database schema, which
is processed by the eXtremeDB schema compiler, MCOCOMP.EXE, and
produces the database .C and .H files (this is the normal eXtremeDB process).
The second step is to define the function prototypes for the functions to be called
from remote processes, and the data structures to be passed to them. For example,
consider an eXtremeDB class called “channel” and an RPC function:
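The prototype itself is not reproduced here; a hypothetical version (the
parameter list is an assumption, only the function name
extremedb_channel_update appears below) might be:
int extremedb_channel_update( int channel_id, float value );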
Please note that all business logic functions that return a value (either through a
parameter or a return code) are called only synchronously, while void functions
could be called asynchronously.
The IDL header file is processed by mcorcomp to produce the RPC dictionary,
envelope functions and server-side proxy functions.
The last step is to implement the server-side functions (in this example, the
extremedb_channel_update() function) that will be called by the client
envelope. This is the step indicated by the grey box labeled “Service code with
RPC implementations”.
Client-side Implementation
The following diagram illustrates the client side development and deployment
steps:
[Diagram: client-side development flow. The C header file specification of
remote procedures is processed by the MCORPC compiler into the
serialize/deserialize MCORPC envelopes; the resulting C/C++ source files are
built with a C/C++ compiler and linked with the framework runtime libraries
and the TCP communication layer.]
The mcorcomp compiler generates the proxy and envelope routines. The compiler
recognizes a number of keywords in the form of comments that are used to
declare string, union and array data types used as a part of the interface
declaration (see the example below).
The following C declarations are supported by the compiler and could be a part of
a remote interface:
• Pre-processor directives
• Pre-processor macros
Keywords:
union {
int a;
char * b /* string */;
char c;
} abc_union;
struct {
int active_union_member;
abc_union u /* active(active_union_member) */;
};
The actual type of the union field could be an integer, a zero-terminated string or
a single byte. The active keyword indicates to MCORPC how the application
treats the union: if the application treats the union data type u as “int a”, the
active_union_member should be assigned a zero; if the application treats the
union field as a zero-terminated string, the active_union_member should be set to
1, etc.
Example:
} test_struct_variable_t, * test_struct_variable_p;
Example:
#ifndef __TEST_INTERFACE_H
#define __TEST_INTERFACE_H
/* DATA DEFINITIONS */
/* a structure */
/* union definition */
/* union usage */
/* INTERFACE FUNCTIONS */
/* strings */
typedef char * zstring;
typedef int int5[5];
/* fixed-size array */
int test_int_int5( int5 ints );
/* structure */
int test_int_pstruct( test_struct_p pstruct );
/* variable-size parameters */
int test_int_variable_len( test_struct_variable_p p );
/* union as a parameter. */
int test_int_union( test_struct_union_p p );
#endif
Compiling the interface definition above results in the generation of the following
C implementation files:
Test_intf_server.h server
Test_intf_server.c server
Test_intf_dict.c server & client
Test_intf_client.c client
To complete the build, these files are compiled and linked with the network
interface layer implementation and the MCORPC library, mcorpc.lib (or
mcorpc_debug.lib).
How it Works
The database dictionary generated by the mcocomp schema compiler for all
native API calls is coded into the .c output file. In the output the dictionary is
followed by the generated schema specific TypeSafe API functions. These
comprise the application specific API (Native API) used by the application
developer to access specific fields, indexes and array elements.
Note that these individual functions all call low level “mco_wrapper” functions
like mco_w_new_obj_oid(), mco_w_obj_delete(), etc. These wrapper
functions provide a “generic” interface to access individual database objects.
However their implementation requires an intimate knowledge of the database
dictionary in order to correctly specify the integer values for function parameters.
This is the work of the mcocomp compiler.
The UDA API is designed to provide a similar “generic” interface for applications
that does not require intimate knowledge of the database dictionary. To this end,
the UDA “registry” (Dictionary and Meta-Dictionary) functions provide the
means to enumerate fields, indexes and array elements so that the application
developer can access them by name. The integer values returned by the registry
functions are then passed to UDA access functions like mco_uda_new(),
mco_uda_delete(), etc. Notice that the type-safety provided by the C compiler
when using native API calls is sacrificed for the flexibility of these generic UDA
access functions.
With these “helpers” in place, the following code snippet demonstrates calls to the
UDA access functions to create and update a database object:
mco_uda_object_handle_t rec;
mco_uda_value_t value;
uint4 key = 1999;
unsigned short Record_struct_no,
key_field_no,
tkey_index_no,
hkey_index_no;
Record_struct_no = get_struct_no("Record");
key_field_no = get_field_no(Record_struct_no, "key");
tkey_index_no = get_index_no("Record", "tkey");
hkey_index_no = get_index_no("Record", "hkey");
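A hedged sketch of how this snippet might continue (the transaction handle t is
assumed to have been opened already; the call pattern follows the
mco_uda_new()/mco_uda_put() example later in this chapter):
/* create a Record object and set its key field */
rc = mco_uda_new( t, Record_struct_no, 0 /* no oid */, 0, 0, &rec );
value.type = MCO_DD_UINT4;
value.v.u4 = key;
rc = mco_uda_put( &rec, key_field_no, 0, &value );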
Registry Functions
Before using any of the registry functions, the application must allocate space for
the meta-dictionary structure and initialize the allocated buffer with the
mco_metadict_init() API. Because an application may use more than one
database, the meta-dictionary contains a header and an entry for each
database dictionary. The size of the buffer should be obtained via the
mco_metadict_size() function, which returns the size of the meta-dictionary in
bytes (including the header).
The application needs to allocate the memory buffer and pass the pointer to the
buffer along with its size to the mco_metadict_init() function. The database
runtime will determine the maximum number of databases that can be registered
within the metadictionary. The application can access this number through the
metadict->n_maxentries field.
Note: The mcouda library does not allocate any dynamic memory. Therefore,
any memory buffers used by the UDA API are allocated by the application. The
buffer either can be declared statically or allocated on the heap. Descriptors are
often allocated on the application’s stack.
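A hedged sketch of this allocation sequence follows; the
mco_metadict_size()/mco_metadict_init() signatures and the
mco_metadict_header_t type name follow the eXtremeDB UDA samples and should be
treated as assumptions:
unsigned int metadict_size;
mco_metadict_header_t *metadict;
/* size the meta-dictionary for one database entry, allocate and initialize */
mco_metadict_size( 1, &metadict_size );
metadict = (mco_metadict_header_t *) malloc( metadict_size );
mco_metadict_init( metadict, metadict_size, 0 /* flags */ );
printf( "max databases: %u\n", metadict->n_maxentries );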
The flags parameter defines what happens during the initialization of the
dictionary. Currently the only supported value is:
MCO_METADICT_DONT_LOAD_EXISTING_DBS
This flag indicates that the automatic registration of opened databases is not done.
Once the meta-dictionary is registered, the following API functions can be
called to get the count of databases registered:
And to get a pointer to the dictionary based on its number, name, or connection
handle:
The following API functions will obtain a pointer to the field descriptor based on
its number or name:
#define MCO_DICT_II_UNIQUE 1
#define MCO_DICT_II_VOLUNTARY 2
#define MCO_DICT_II_LIST 4
#define MCO_DICT_II_AUTOID 8
#define MCO_DICT_II_TREE 0x10
#define MCO_DICT_II_HASH 0x20
#define MCO_DICT_II_USERDEF 0x40
The following API functions will obtain a pointer to the index descriptor based on
its number or name:
When an index is composed of multiple fields, each part of the index is defined by
the following descriptor:
Where the values for flags can have the default value of ‘0’ or :
The following API functions will obtain a pointer to the index field descriptor
within an index composed of multiple fields based on its number or name:
UDA Functions
As explained above, the UDA is a generic API. So the objects are defined by
descriptors that can contain any type of object, and the values stored in them are
defined by descriptors that can handle any type of data.
• For simple data types (int, float, double), assign the value to either v.u1 or
v.u2, etc.
• For strings, arrays and blobs set the pointer to the appropriate pointer type
(v.p.p.c, v.p.p.n or v.p.p.v), and specify the size in bytes in v.p.len.
• For structured fields, mco_uda_put() initializes the descriptor v.o, which, in
turn, is used to set field values for the structure.
• For simple types (integer, float, double) the value is returned in the
appropriate field (v.u1, v.u2, etc.,).
• For arrays and blobs it is necessary to assign the appropriate type pointer
(v.p.p.c, v.p.p.n or v.p.p.v) to the buffer that receives the data first, and specify
the size of the data in bytes in v.p.size. mco_uda_get() copies the value into
the buffer (or truncates the output if the buffer is not large enough) and also
returns the actual number of bytes received in the v.p.len.
• For structure fields, mco_uda_get() first initializes the v.o descriptor that will
be used to read the structure field values.
typedef struct
{
uint2 class_code; /* class code */
uint1 persistence; /* persistent or transient*/
} mco_uda_dict_class_storage_t;
Where the persistence field can have one of the following values:
#define MCO_UDA_CLASS_DEFAULT 0
#define MCO_UDA_CLASS_TRANSIENT 1
#define MCO_UDA_CLASS_PERSISTENT 2
To remove an object from the database call the following API function:
Put/Get Functions
To assign the field value for an object or structure use:
In order to assign the value, the application sets the field type in value->type. The
type must correspond to the DDL type:
Example
mco_uda_object_handle_t obj;
MCO_RET rc;
mco_uda_value_t v;
...
rc = mco_uda_new( t, Rec_class_no, 0 /* no oid */,
0 /* no init.*/, 0, &obj);
v.type = MCO_DD_UINT4;
v.v.u4 = 100;
rc = mco_uda_put( &obj, uint4_field_no, 0, &v );
v.type = MCO_DD_STRING;
v.v.p.len = 5;
v.v.p.p.c = "Hello";
rc = mco_uda_put( &obj, string_field_no, 0, &v );
v.type = MCO_DD_BLOB;
v.v.p.len = blob_size;
v.v.p.p.v = blob_value;
rc = mco_uda_put( &obj, blob_field_no, 0, &v );
Note: For simple types (integers, float/double) the field value is returned in the
corresponding mco_uda_value_t structure field.
For strings, byte arrays, and blobs, the application needs to allocate a buffer and
pass it into the API:
In addition, val->v.p.size has to hold the size of the buffer (in bytes).
The function copies the field value into the buffer and also returns the actual
number of symbols (bytes for blobs) copied in val->v.p.len.
It is possible to use the mco_uda_get() function to receive the size of the buffer
in advance. If the pointer (val->v.p.p.c, val->v.p.p.n or val->v.p.p.v) is set to zero,
the API just fills out val->v.p.size, and does not copy the actual value into the
buffer.
For structure-based fields (MCO_DD_STRUCT), the API fills out val->v.o, that
can be used to further pass it into the mco_uda_get() and gain access to the
structure fields.
Example 1
mco_uda_value_t val;
val.type = MCO_DD_STRING;
val.v.p.p.c = 0; /* first determine the actual size we need to allocate */
mco_uda_get(&obj, my_field_no, 0, &val);
val.v.p.p.c = malloc(val.v.p.size);
mco_uda_get(&obj, my_field_no, 0, &val); /* get the value */
....<whatever processing is necessary>
free(val.v.p.p.c); /* free up memory */
Example 2 (strings)
mco_uda_value_t val;
val.type = MCO_DD_STRING;
val.v.p.p.c = 0;
mco_uda_get(&obj, my_field_no, 0, &val);
val.v.p.size += sizeof(char); /* leave room for the terminator */
val.v.p.p.c = malloc(val.v.p.size); /* allocate for the field value size */
mco_uda_get(&obj, my_field_no, 0, &val); /* get the value */
....<processing results>
free(val.v.p.p.c); /* free up memory */
And similarly for Unicode (nchar) strings:
mco_uda_value_t val;
val.type = MCO_DD_NCHAR_STRING;
val.v.p.p.c = 0;
mco_uda_get(&obj, my_field_no, 0, &val);
val.v.p.size += sizeof(nchar);
val.v.p.p.c = malloc(val.v.p.size );
mco_uda_get(&obj, my_field_no, 0, &val);
....<processing results>
free(val.v.p.p.c);
Vector Functions
To get the size (length) of a vector or array:
Cursor Functions
As explained in the section “Search Methods” of Chapter 5, a cursor is used to navigate
through a group of records satisfying the search criteria on a specified index. The
function mco_uda_lookup() positions the cursor at the first object that satisfies
the search criteria.
To position the cursor at the first object that satisfies the search criteria:
To obtain information about the index associated with the cursor use:
To compare the value(s) referenced by the current position of the index cursor
with value(s) supplied by the application use:
User-defined Indexes
User-defined indexes for the eXtremeDB native API are explained in section
“User-defined Index Functions” in Chapter 5. As with native User-defined
Functions (udf) the UDA API requires that the application supply two compare
functions for tree indexes and two additional hash functions for hash indexes. For
tree indexes, provide one custom function that compares two objects and one that
compares an object to an external key value. For hash indexes, provide two pairs
of functions: two returning a hash code and two compare functions (if a user-
defined tree index is also defined then these compare functions are used for the
hash index as well).
These functions must then be registered with the runtime before cursor functions
can be called on these indexes by passing a parameter of the following type:
The application implements these compare functions with the following function
signatures:
/* Object - Object */
typedef int2(*mco_uda_compare_userdef_f)( mco_uda_object_handle_p obj1,
unsigned short index1,
mco_uda_object_handle_p obj2,
unsigned short index2,
void *user_context);
These compare functions must return <0, 0, or >0 depending on whether the
first value is less than, equal to, or greater than the second (the other
object, or the external key value).
In addition, for hash indexes, two custom functions need be implemented with the
following function signatures:
/* Hash - Object */
typedef uint4 (*mco_uda_hash_userdef_f)( mco_uda_object_handle_p obj,
unsigned short index,
void *user_context);
Note that hash index compare functions return 0 if and only if two objects (or
object and external key) are equal from the index point of view. This is necessary
for hash index operations because hash codes may be equal yet the objects (keys)
are not. When mco_uda_lookup() is called with a hash index, it will call the
user-defined compare function to assure that any matching hash code actually
exactly matches the indexed database field value.
Notice also that the compare functions receive application specific data passed
from the caller via the user_context parameter.
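A hedged sketch of an object-to-object compare function follows; the uint4 key
field, the key_field_no variable and the fact that the index-number parameters
are ignored are assumptions of this sketch:
static int2 my_compare( mco_uda_object_handle_p obj1, unsigned short index1,
mco_uda_object_handle_p obj2, unsigned short index2,
void *user_context )
{
mco_uda_value_t v1, v2;
v1.type = v2.type = MCO_DD_UINT4;
/* read the key field from both objects and compare the values */
mco_uda_get( obj1, key_field_no, 0, &v1 );
mco_uda_get( obj2, key_field_no, 0, &v2 );
return ( v1.v.u4 < v2.v.u4 ) ? -1 : ( v1.v.u4 > v2.v.u4 ) ? 1 : 0;
}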
In addition to the compare functions, the UDA API requires a udf map for the
internal implementation of index navigation. This udf map must be allocated and
passed to the runtime when the udfs are registered.
The following function queries the database dictionary to determine the amount of
memory to be allocated for the udf map:
The following code snippet demonstrates how to register udf compare functions:
/* allocate udfmap */
mco_uda_get_udfmap_size(metadict, 0, &udf_map_size);
udf_map = (mco_userdef_funcs_h) malloc(udf_map_size);
Note: The user-context parameter param is used by the external key compare
functions to call mco_uda_get() to retrieve the database field value to be
compared to the external key value.
The registration API must be called for all user-defined indexes, before the
application makes a call to mco_db_connect().
To register them with UDA however, it is first necessary to extract the desired
collations from the meta-dictionary. To facilitate this, the UDA Collation API
provides dictionary functions to count and extract collation definitions by name
and number, as well as to determine the collation map size.
Then, as with the core collation API, the helper functions, mco_uda_collate_get()
and mco_uda_collate_get_range(), are provided to facilitate the implementation
of the user-defined collation compare functions called by the UDA cursor
functions.
Example 1
Sample schema:
class Record
{
string name;
uint4 value;
/* compare values */
return STR_CMP(buf1, buf2);
}
uint4 coll_hash(mco_collate_h c, uint2 len)
{
mco_uda_value_t val;
char buf[20];
/* hash value */
return strlen(buf);
}
int main(void)
{
MCO_RET rc;
…
mco_dict_struct_info_t struct_info;
mco_dict_collation_info_t coll_info;
mco_uda_value_t value;
mco_uda_object_handle_t obj;
char buf[16];
…
if ( MCO_S_OK == rc ) {
/* connect to database */
rc = mco_db_connect(db_name, &db);
if ( MCO_S_OK == rc ) {
/* fill database with records setting field s to fruit names */
rc = mco_trans_start(db, MCO_READ_ONLY,
MCO_TRANS_FOREGROUND, &t);
if (rc == MCO_S_OK) {
/* using custom collate tree index iterate through the cursor */
rc = mco_uda_cursor(t, Record_no, tcoll_no, &c);
if (rc == MCO_S_OK) {
for (rc = mco_cursor_first(t, &c);
MCO_S_OK == rc;
rc = mco_cursor_next(t, &c))
{
UDA Programming
As with all eXtremeDB applications, the runtime must be started and initialized,
memory devices defined and an error handler mapped. Then the Meta-dictionary
is initialized and the database opened with mco_uda_db_open(). The following
example demonstrates a typical sequence for opening a database for UDA access:
int main(void)
{
MCO_RET rc;
mco_runtime_info_t info;
mco_device_t dev[4];
unsigned int n_dev, metadict_size;
mco_db_params_t db_params;
mco_runtime_start();
mco_error_set_handler(&sample_errhandler);
mco_get_runtime_info(&info);
n_dev += 3;
}
/* register dictionary */
rc = mco_metadict_register( metadict, dbName,
udaopen_get_dictionary(), 0);
printf("Register dictionary : %s\n", mco_ret_string(rc, 0));
For a better understanding of the UDA API, build and run the samples in the
directory “samples/16-uda” in the debugger.
The XML interfaces can also be used to facilitate simple schema evolution by
exporting the database to XML, adding/dropping fields, indexes, and classes, and
importing the saved XML into the new database.
Implementation
Standards
eXtremeDB XML is developed in accordance with the W3C SOAP encoding
recommendations. These recommendations can be found on the W3C web site:
http://www.w3.org/TR/soap12-part0/
http://www.w3.org/TR/soap12-part1/
http://www.w3.org/TR/soap12-part2/
http://www.w3.org/TR/xmlschema-0
http://www.w3.org/TR/xmlschema-1
XML Policy
The XML policy structure describes various behavior options of the XML
interface, such as string/blob encoding, XML indentation, etc. It is available for
the user at compile time and also at runtime via the policy APIs. These APIs and
available options are described in the file include/mcoxml.h that can be found in
your eXtremeDB installation.
The encode_spec field from the mco_xml_policy_t structure is ignored. The value
for the encode_spec field is set to MCO_YES regardless of the application’s
settings, meaning that all special characters except the LF are encoded. This
runtime behavior conforms to the XML encoding specifications.
Note: If oid or the autoid fields are specified for a class, the runtime processes
them as follows:
• when the XML is exported from the eXtremeDB database via the
classname_xml_get() method, the oid and autoid fields are always
written into the XML;
• for classes declared with oid, an XML document used to create a new
object must contain the oid values;
• whether specified or not, the oid value is ignored if the XML document is
used to update an object;
• whether specified or not, the autoid field is ignored in the incoming XML.
Returns the current policy. Note that it requires a transaction context. This
function always returns MCO_S_OK.
Example:
void ChangeXMLOutput(void)
{
mco_xml_policy_t policy;
mco_trans_h t;
mco_trans_start(db,MCO_READ_WRITE,MCO_TRANS_FOREGROUND,&t);
mco_xml_get_policy(t, &policy);
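/* ... adjust policy fields here (for example indentation, text or blob
coding) before applying them with mco_xml_set_policy() ... */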
mco_xml_set_policy(t, &policy);
mco_trans_commit(t);
}
struct Country
{
char<3> c_code;
string name;
};
struct Date
{
uint1 day;
char<3> month;
uint2 year;
};
struct Passport
{
char<8> series;
uint8 number;
};
struct Address
{
Country country;
string city;
string street;
};
struct Phone
{
int2 country;
char<5> area;
char<7> number;
};
struct Residence
{
Address where;
Date since;
optional Phone phone;
};
struct Office
{
Address where;
string organization;
string position;
vector<Phone> phone;
};
class Person
{
string name;
Residence residence[3];
optional Office office;
optional Phone mobile;
blob description;
oid;
autoid[100];
list;
};
For each class declared in the schema, the DDL compiler generates the following
interfaces:
for(;;)
{
// get Person object handle from the cursor
rc = Person_from_cursor ( t, &c, &p_obj);
if(rc) return rc;
// write Person object as XML through the 'doprint' function
rc = Person_xml_get(&p_obj, f, &do_print);
if(rc) return rc;
// advance the cursor, break loop if end of list
if ( mco_cursor_next(t, &c) != MCO_S_OK )
break;
}
rc = mco_trans_commit(t);
return rc;
}
int do_print( /*IN*/ void *stream_handle, /*IN*/ const void * from,
/*IN*/ unsigned nbytes)
{
// this simple example just writes the bytes to a FILE
// another example could write the bytes to a
// pipe to another process, a socket, etc.
FILE * f = (FILE*)stream_handle;
return fwrite( from, sizeof(char), nbytes, f );
}
In this example, the stream is a file handle. It could have been a pipe to another
process or any other type of stream. Person_xml_get() encodes the person
object referenced by the handle, and calls the helper function do_print(),
passing do_print() the stream (file, in this case) handle, a pointer to the XML
string, and the length of the XML string.
Note that the classname_xml_get() does not create any XML header. If an
application is going to create a monolithic document with multiple XML objects,
possibly of different XML tags, the application must create the appropriate XML
header and footer entries to make it a legal XML document.
This interface updates an existing object with the XML description. The entire
object is updated; it is not possible to selectively update fields. The class handle
will have been established from, for example, a cursor, and carries the transaction
context with it (hence, it is not necessary to pass a transaction handle to this
function).
Creates an object from the XML description. The first parameter is the
transaction context, the second parameter is the xml and the output parameter is
the new object handle. For example:
if ( c == EOF )
break; /* end-of-file, thus finished */
ptr = 1;
xml[0] = '<';
if ( c == EOF )
break; /* finished */
xml[ptr] = 0;
if ( strcmp(xml, "<Person>") != 0 )
exit(1);
/* read xml-object */
for (;;)
{
c = getc(file);
if ( c == EOF )
{
xml[ptr] = 0;
printf("\n Error - unexpected end of file: %s\n",
&xml[(ptr>50)?ptr-50:0]);
exit(4);
}
xml[ptr++] = c;
if ( c == '>' )
{
xml[ptr] = 0;
/* closing tag, the object is complete */
if ( strcmp("</Person>", &xml[ptr-9]) == 0 )
break;
}
}
/* write database.. */
rc = Person_xml_create(t, xml, &p_obj);
if ( rc != MCO_S_OK )
exit(0);
rc = mco_trans_commit(t);
return rc;
}
In the above example, an XML string is parsed to find the class tag. This example
only deals with Person objects, so if class tag is for any other type of object, the
procedure terminates. Otherwise, the XML is read up to the closing Person tag
(“</Person>”). Then Person_xml_create() is called, passing a transaction
handle, the XML string, and the handle of a Person object that will reference the
newly created object.
Note that if an XML document was created by eXtremeDB and contains just a
single XML object, there is no need to parse the opening and closing tags; just
read the entire XML into a buffer and pass it to classname_xml_create().
If the Person object represented in the XML string already existed in the database
(i.e. an attempt was made to violate a unique key constraint),
Person_xml_create() would fail with code MCO_S_DUPLICATE. If this is a
possibility, the application code should be written to find the key values within
the XML string and attempt to locate the object and call classname_xml_put()
or classname_xml_create() accordingly. See the following pseudo-code:
/* read xml-object */
/* write database.. */
if((rc = classname_fieldname_search(. . .)) == MCO_S_OK)
rc = classname_xml_put(. . .);
else
rc = classname_xml_create(. . .);
if ( rc != MCO_S_OK )
exit(0);
rc = mco_trans_commit(t);
/* ... and start over */
} /* loop on xml-objects */
• uint1, uint2, uint4, int1, int2, int4 are written as the appropriate integers
(unsigned or signed). The base is defined in the policy field int_base, which
could be 8, 10 (default) or 16. Octal numbers are coded with an initial “0”
(for example, 04567), and hexadecimal numbers with an initial “0x” (for
example, 0x1526A)
• uint8, int8 are written similarly to the other integers, but decimal format is not
allowed. The policy field quad_base sets up either the octal or hexadecimal
(default) form
• autoid is formatted as uint8
• date, time are formatted as uint4
• output for float and double depend on the policy’s float_format field value.
MCO_FLOAT_FIXED means floats are formatted as fixed-point numbers
(0.0025); MCO_FLOAT_EXPONENT means floats are represented in
exponent form—integer part, fractional part and exponent (for example, 2.5e-3)
• oid, ref are coded into hexadecimal format
• Blob depends on the blob_coding value: MCO_TEXT_ASCII means ASCII
as defined by the XML specifications, MCO_TEXT_BINHEX means
BINHEX (2 hexadecimal digits / byte) [according to RFC 1741],
MCO_TEXT_BASE64 means Base64 (default) [according to RFC 1521,
section 5.2—“Base64 Content-Transfer-Encoding”]
• char, string formatted in accordance to the policy’s text_coding field:
MCO_TEXT_ASCII means ASCII as defined by the XML specifications
(default), MCO_TEXT_BINHEX means BINHEX, and
MCO_TEXT_BASE64 means Base64
Exports the XML schema for the class classname. It must be called in the context
of an MCO_READ_ONLY transaction. The schema format is compliant with the
W3C specifications, which can be found in the following documents:
http://www.w3.org/TR/xmlschema-0
http://www.w3.org/TR/xmlschema-1
The current implementation of this function only supports the default XML
policy.
Example
The XML schema can be used in conjunction with tools, such as XMLSpy, to
validate the content of XML documents, which you can do prior to attempting to
import the XML document into eXtremeDB.
The XML document can also be used with XSLT, which is a language for
transforming XML documents into other XML documents. This might be a
necessary step to exchange data between eXtremeDB and an external system if
they don’t have identical representations for the data being exchanged. For
further information on XSLT, please refer to http://www.w3.org/TR/xslt20/.
blob: Binary data object; a byte array of any size, which can be greater than
64K in size. Example: blob jpeg;
The declaration
time start_tm[3];
defines an array of three time values. Any element except vectors, blobs and
optional structs can be a fixed size array. Fixed size arrays cannot be used in
indexes; for this, use a vector.
Database Types
Device Types
typedef struct mco_device_t_
{
unsigned int type; /* none, conv, named, file, raid, etc */
unsigned int assignment; /* none, db-segment, cache-segment, db-file,
log-file */
mco_size_t size;
union {
struct {
void * ptr;
} conv;
struct {
char name[MCO_MAX_MEMORY_NAME];
unsigned int flags;
void * hint;
} named;
struct {
int flags;
char name[MCO_MAX_FILE_NAME];
} file;
struct {
int flags;
char name[MCO_MAX_MULTIFILE_NAME];
mco_offs_t segment_size;
} multifile;
struct {
int flags;
char name[MCO_MAX_MULTIFILE_NAME];
int level;
} raid;
struct {
unsigned long handle;
} idesc;
} dev;
} mco_device_t, *mco_device_h;
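As a hedged example of filling in this structure (the MCO_MEMORY_* constant
names follow the eXtremeDB samples and the file name is hypothetical):
mco_device_t dev[2];
/* in-memory segment for the database */
dev[0].type = MCO_MEMORY_CONV;
dev[0].assignment = MCO_MEMORY_ASSIGN_DATABASE;
dev[0].size = 300 * 1024;
dev[0].dev.conv.ptr = malloc( dev[0].size );
/* file device for the persistent part */
dev[1].type = MCO_MEMORY_FILE;
dev[1].assignment = MCO_MEMORY_ASSIGN_PERSISTENT;
dev[1].size = 0;
strcpy( dev[1].dev.file.name, "simple.dbs" );
dev[1].dev.file.flags = 0;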
Transaction Priorities
typedef enum MCO_TRANS_PRIORITY_E_
{
MCO_TRANS_IDLE = 1,
MCO_TRANS_BACKGROUND = 2,
MCO_TRANS_FOREGROUND = 3,
MCO_TRANS_HIGH = 4,
MCO_TRANS_ISR = 77
}
MCO_TRANS_PRIORITY;
Transaction Types
typedef enum MCO_TRANS_TYPE_E_
{
MCO_READ_ONLY = 0,
MCO_READ_WRITE = 1
}
MCO_TRANS_TYPE;
Cursor Types
typedef enum MCO_CURSOR_TYPE_E_
{
MCO_LIST_CURSOR = 0,
MCO_TREE_CURSOR = 1,
MCO_HASH_CURSOR = 2
}
MCO_CURSOR_TYPE;
typedef struct mco_cursor_t_
{
char c[mco_cursor_size];
}
mco_cursor_t, /* cursor (structure) */
* mco_cursor_h; /* cursor handle (pointer) */
Class Statistics
typedef struct mco_class_stat_t_
{
uint4 objects_num;
uint4 total_pages; /* index pages are not counted */
uint4 core_space; /* in bytes, not counting blobs */
}
mco_class_stat_t,
* mco_class_stat_h;
Event Types
typedef enum MCO_EVENT_TYPE_E_
{
MCO_EVENT_NEW,
MCO_EVENT_UPDATE,
MCO_EVENT_DELETE,
MCO_EVENT_DELETE_ALL,
MCO_EVENT_CHECKPOINT,
MCO_EVENT_CLASS_UPDATE
}
MCO_EVENT_TYPE;
• Status Codes (S): indicate runtime states that can and will occur during
normal database operations;
• Non-Fatal Error Codes (E): indicate runtime error conditions that the
application can manage by responding appropriately; and
• Fatal Error Codes (ERR): indicate bugs in the application code that render
the eXtremeDB runtime unable to safely continue execution.
The following code snippet from the eXtremeDB runtime file mcocsr.c illustrates
the use of each of the three return code types:
#ifdef MCO_CFG_CHECKLEVEL_1
if (!CHECK_TRANSACTION((mco_db_connection_h)t))
{
mco_stop(MCO_ERR_TRN + 5);
}
#endif
return MCO_S_OK;
}
Note that the S and E type return codes are returned directly from the runtime function
to be managed by the application code. But the ERR type return codes are passed
to the runtime function mco_stop() to terminate execution. It calls the internal
function mco_stop__() with the actual file name and line number in source code
where the error condition was detected. The filename and line number can be seen
by examining the call stack to further aid in locating the source of the bug.
In the sections that follow, different categories of return codes are defined. But be
aware that eXtremeDB is a product in continual evolution and, once printed, this
list of error codes might become obsolete. The final source of return code values
for your release of eXtremeDB can always be found in the mco.h header file—
please consult this file if you encounter an error code not defined herein.
Status Codes
The following table lists status codes that might be returned by the eXtremeDB
runtime. These return codes do not indicate error conditions but rather runtime
states that can and will occur during normal database operations.
The following error codes in the range of 50–99 (and 999) indicate non-fatal error
conditions that the eXtremeDB runtime might return, that don’t fall into a specific
category:
The following error codes in the range of 100–199 indicate non-fatal error
conditions that might be returned by the eXtremeDB disk manager:
The following error codes in the range of 200–299 indicate non-fatal error
conditions that might be returned by the eXtremeDB runtime while processing
XML I/O:
The following error codes in the range of 300–399 indicate non-fatal error
conditions that might be returned by the eXtremeDB runtime while processing
Network requests:
HA Error Codes
The following error codes in the range of 400–499 indicate non-fatal error
conditions that might be returned by the eXtremeDB High Availability runtime
while processing API requests:
The following error codes in the range of 500–599 indicate non-fatal error
conditions that might be returned by the eXtremeDB runtime while processing
UDA API requests:
The following snippet of code from the eXtremeDB runtime file mcobtree.c
demonstrates the use of error code base values plus incremental number:
if (height > 0)
{
if ( node->header.kind != MCO_PAGE_TREE_NODE )
{
mco_stop(MCO_ERR_BTREE + 1);
}
}
else
{
if ( node->header.kind != MCO_PAGE_TREE_LEAF )
{
mco_stop(MCO_ERR_BTREE + 2);
}
}
The POSIX API is implemented in the mcofuni.c file and the extended
API is implemented in mcofu98.c. By default, the build system is
configured to use the Unix-98 standard for all flavors of Unix: Linux, Sun OS,
HP-UX, AIX.