Design Overview of Automatic Storage Management, Oracle 10g
1 Introduction
Automatic Storage Management (ASM) is a file system and volume manager built into the database
kernel that allows the practical management of thousands of disk drives with 24x7 availability.
It provides management across multiple nodes of a cluster for Oracle Real Application Clusters
(RAC) support as well as single SMP machines. It automatically does load balancing in parallel
across all available disk drives to prevent hot spots and maximize performance, even with
rapidly changing data usage patterns. It prevents fragmentation so that there is never a need to
relocate data to reclaim space. It does automatic online disk space reorganization for the
incremental addition or removal of storage capacity. It can maintain redundant copies of data to
provide fault tolerance, or it can be built on top of vendor supplied reliable storage mechanisms.
Data management is done by selecting the desired reliability and performance characteristics for
classes of data rather than with human interaction on a per file basis.
ASM solves many of the practical management problems of large Oracle databases. As the size
of a database server increases towards thousands of disk drives, or tens of nodes in a cluster, the
traditional techniques for management stop working. They do not scale efficiently, they become
too prone to human error, and they require independent effort on every node of a cluster. Other
tasks, such as manual load balancing, become so complex as to prohibit their application. These
problems must be solved for the reliable management of databases in the tens or hundreds of
terabytes. Oracle is uniquely positioned to solve these problems as a result of our existing Real
Application Cluster technology. Oracle’s control of the solution ensures it is reliable and
integrated with Oracle products.
This document is intended to give some insight into the internal workings of ASM. It is not a
detailed design document. It should be useful for people who need to support ASM.
2 Instances
Automatic Storage Management is part of the database kernel. It is linked into “/bin/oracle” so
that its code may be executed by all database processes.
One portion of the ASM code allows for the start-up of a special instance called an ASM Instance.
ASM Instances do not mount databases, but instead manage the metadata needed to make ASM
files available to ordinary database instances. Both ASM Instances and database instances have
access to some common set of disks. ASM Instances manage the metadata describing the layout
of the ASM files. Database instances access the contents of ASM files directly, communicating
with an ASM Instance only to get information about the layout of these files. This requires that a
second portion of the ASM code run in the database instance, in the I/O path.
Oracle Confidential 1
Each database instance using ASM has two new background processes called ASMB and RBAL.
The ASMB background process runs in a database instance and connects to a foreground
process in an ASM Instance. Over this connection, periodic messages are exchanged to update
statistics and to verify that both instances are healthy. All extent maps describing open files are
sent to the database instance via ASMB. If an extent of an open file is relocated or the status of a
disk is changed, messages are received by the ASMB process in the affected database instances.
During operations which require ASM intervention, such as a file creation by a database
foreground, the database foreground connects directly to the ASM Instance to perform the
operation. Each database instance maintains a pool of connections to its ASM instance to avoid
the overhead of reconnecting for every file operation.
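The pooling just described can be sketched with a tiny reusable pool; the class and the connect callback here are invented for illustration and are not the actual database-kernel mechanism:

```python
import queue

class AsmConnectionPool:
    """Minimal sketch: keep a few open connections to the local ASM
    instance and hand one out per file operation, instead of paying a
    connect/disconnect cost on every operation."""

    def __init__(self, connect, size=4):
        self._free = queue.Queue()
        for _ in range(size):
            self._free.put(connect())   # pre-establish the connections

    def run(self, operation):
        conn = self._free.get()         # borrow an existing connection
        try:
            return operation(conn)      # e.g. a file create or delete
        finally:
            self._free.put(conn)        # return it for the next caller
```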
Like RAC, the ASM Instances themselves may be clustered, using the existing Distributed Lock
Manager (DLM) infrastructure. There will be one ASM Instance per node on a cluster. As with
existing RAC configurations, ASM requires that the Operating System make the disks globally
visible to all of the ASM Instances, irrespective of node. Database instances only communicate
with ASM Instances on the same node, so that the independent failure of ASM or Oracle is
immediately detected by OS means. If there are several database instances for different
databases on the same node, they share the same single ASM Instance on that node.
If the ASM instance on one node fails, all the database instances connected to it will also fail. As
with RAC, the ASM and database instances on other nodes recover the dead instances and
continue operations. This is similar to what happens when a file system or volume manager in
the OS fails. However an ASM instance failure does not require rebooting the OS to bring it back
up.
The design allows for multiple ASM Instances on the same node in separate lock name spaces.
This arrangement may work better with existing OS security mechanisms if the database
instances need to be partitioned from each other for security reasons. However, this
configuration is not supported by any of the user interface tools such as the installer, DBCA, or
Enterprise Manager. It is not a recommended configuration.
A Disk Group can contain files for many different Oracle databases. Thus multiple database
instances serving different databases can access the same Disk Group even on a single system
without RAC. It is also possible to configure the ASM instances into a RAC cluster to allow
multiple non-RAC databases running on different hosts to share storage.
Cluster Synchronization Services (CSS) is used to ensure that each Disk Group is being modified
under at most one lock name space. There is a resource named after each Disk Group that
associates it with the name of the lock name space currently mounting that Disk Group. ASM
Instances must contact the Node Monitor at the time Disk Groups are mounted to ensure that
they are using a consistent lock name space.
Group Services is used to register the connection information needed by the database instances
to find ASM Instances. When an ASM Instance mounts a Disk Group, it registers the Disk Group
and connect string with Group Services. The database instance knows the name of the Disk
Group, and can therefore use it to look up the connect information for the correct ASM Instance.
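The registration and lookup above amount to a small registry keyed by Disk Group name; this is a hypothetical sketch for illustration, not the Group Services API:

```python
# Stand-in for Group Services: maps Disk Group name -> connect string.
_registry = {}

def register_disk_group(group_name: str, connect_string: str) -> None:
    """Called by an ASM Instance when it mounts a Disk Group."""
    _registry[group_name] = connect_string

def find_asm_instance(group_name: str) -> str:
    """Called by a database instance, which knows only the Disk Group
    name, to locate the ASM Instance serving that group."""
    return _registry[group_name]
```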
3 Disk Data Structures
This section gives a high level description of the most important data structures that ASM
maintains on persistent storage.
Each ASM Disk has an allocation table with one entry for every Allocation Unit on the ASM
Disk. Extent pointers for files in a
Disk Group give the ASM Disk number and Allocation Unit number where the extent resides.
An Allocation Unit is small enough that a file will contain many of them so that it can be spread
across many disks, and a disk will contain many of them so that it can be shared by many files.
An Allocation Unit is large enough that accessing an AU in one I/O operation will give very
good throughput. The time to access an Allocation Unit will be dominated by the transfer rate
of the disk rather than the time to seek to the beginning of the AU. Rebalancing of a disk group
is done one AU at a time.
Unless overridden by an administrator, ASM chooses an AU size of 1 MB. The override is set via
the hidden initialization parameter “_OSM_AUSIZE”.
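Given the default 1 MB AU, an extent pointer of the form (ASM Disk number, AU number) resolves to a physical location with simple arithmetic. This is an illustrative sketch, not ASM's on-disk encoding:

```python
AU_SIZE = 1024 * 1024  # default Allocation Unit size (1 MB) unless overridden

def extent_pointer_to_location(disk_number: int, au_number: int,
                               au_size: int = AU_SIZE):
    """Resolve an extent pointer (ASM Disk number, AU number) to a byte
    offset on that disk: the extent begins au_number AUs into the disk."""
    return disk_number, au_number * au_size

# The extent in AU 42 of disk 3 starts 42 MB into disk 3:
disk, offset = extent_pointer_to_location(3, 42)
```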
Oracle executables and ASCII files, such as alert logs and trace files, cannot be in an ASM disk
group.
The file blocks are then striped across each group of eight virtual extents in stripes of 128K. Thus
with an 8K block size, blocks 0 - 15 are on virtual extent 0, blocks 16 - 31 are on virtual extent
1, ..., blocks 112 - 127 are on virtual extent 7, and blocks 128 - 143 follow blocks 0 - 15 on virtual
extent 0. Blocks 1024 - 2047 are similarly striped across virtual extents 8 - 15.
Because ASM stripes files onto all disks, each disk will contain a mixture of primary and
secondary extents. Hence, all disks will still contribute appropriately to the workload even
though primaries are preferentially read. There is no performance benefit in attempting to
divide the read workload over all of the disks with a copy of the extent as is done in traditional
inflexible mirroring schemes.
The redundancy information includes the type and logical address of the block. A checksum is
also included to catch fractured blocks. All metadata blocks are 4K in size.
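As a sketch of the self-check this enables on every metadata read (the field layout and CRC choice here are invented for illustration; ASM's real on-disk format differs):

```python
import zlib

BLOCK_SIZE = 4096  # all metadata blocks are 4K

def make_block(block_type: int, logical_addr: int, payload: bytes) -> bytes:
    """Build a metadata block: type + logical address + payload + checksum."""
    body = block_type.to_bytes(2, "big") + logical_addr.to_bytes(6, "big") + payload
    body = body.ljust(BLOCK_SIZE - 4, b"\0")
    cksum = zlib.crc32(body)               # catches fractured (torn) writes
    return body + cksum.to_bytes(4, "big")

def read_block(block: bytes, want_type: int, want_addr: int) -> bytes:
    """Validate a block on read and return its payload."""
    body, cksum = block[:-4], int.from_bytes(block[-4:], "big")
    if zlib.crc32(body) != cksum:
        raise IOError("fractured block: checksum mismatch")
    if int.from_bytes(body[:2], "big") != want_type:
        raise IOError("wrong block type")
    if int.from_bytes(body[2:8], "big") != want_addr:
        raise IOError("block at wrong logical address")
    return body[8:]
```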
Each entry in the file directory is a single metadata block containing the description of the file
and pointers to its data or Indirect Extents. Indirect Extents are a sequence of metadata blocks
each of which points to one or more Data Extents. The allocation table is a sequence of metadata
blocks each of which describes the current usage of some number of Allocation Units.
A metadata block is the basic unit of caching for access to the metadata. Large sequential reads
of metadata may be done in some circumstances.
■ Disk size
Up to three disks in a disk group contain a copy of the root extent. The root extent contains the
beginning of the file directory. From the root extent, all extent maps for all files can be found.
The following information is replicated in the headers of disks containing one of the data
extents for the first virtual extent of the file directory.
■ AU within this disk containing a copy of the root extent.
■ Disk header redo for metadata relocation
Most files will only need one Indirect Extent, but files over 120 gigabytes can use multiple
indirect extent pointers in the directory entry to point to as many as 100 Virtual Indirect Extents.
There is only one level of indirection; an extent pointer in an Indirect Extent will always
point to a Data Extent and never to another Indirect Extent.
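The lookup path can be sketched as follows; the slot counts are placeholder values rather than ASM's real limits, but the shape shows why resolving any extent pointer costs at most one extra metadata read:

```python
def lookup_extent(dir_entry, n, direct_slots=60, ptrs_per_indirect=1000):
    """Resolve data-extent pointer n for a file.

    dir_entry is a dict with 'direct' (a list of extent pointers held in
    the file directory block itself) and 'indirect' (a list of Indirect
    Extents, each a list of extent pointers).
    """
    if n < direct_slots:
        return dir_entry["direct"][n]          # found in the directory block
    n -= direct_slots
    indirect = dir_entry["indirect"][n // ptrs_per_indirect]
    return indirect[n % ptrs_per_indirect]     # exactly one extra hop, never two
```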
Each instance has its own portion of the Active Change Directory (ACD), and each portion
contains a checkpoint record. The checkpoint record is written every 3 seconds,
and it describes where recovery needs to start reading redo if this instance crashes. The rest of
the instance’s ACD portion is circularly written with redo describing all metadata block
changes. This portion of a disk group is similar to the online logs for a database.
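In spirit, each instance's portion behaves like a circular log plus a checkpoint pointer; this toy sketch is illustrative only:

```python
class CircularRedo:
    """Toy model of one instance's circularly written redo portion."""

    def __init__(self, slots: int):
        self.buf = [None] * slots
        self.head = 0           # next slot to write
        self.checkpoint = 0     # recovery starts reading here

    def log(self, change):
        """Append redo describing one metadata block change."""
        self.buf[self.head % len(self.buf)] = change
        self.head += 1

    def take_checkpoint(self):
        """Periodic checkpoint: recovery need not replay anything earlier."""
        self.checkpoint = self.head

    def recover(self):
        """Replay redo from the last checkpoint up to the head."""
        return [self.buf[i % len(self.buf)]
                for i in range(self.checkpoint, self.head)]
```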
4 Rebalance
When one or more disks are added, dropped, or resized, the disk group is rebalanced to
ensure even use of all storage. Rebalancing does not relocate data based on I/O statistics nor is
it started as a result of statistics. It is completely driven by the size of the disks in the disk group.
It is automatically started when the storage configuration changes. There is also a command to
do it manually, but there should be no need to use the command.
The following sections give a rough description of the steps taken when rebalancing a disk
group.
4.1 Repartner
When a disk is added it needs partners so that it can contain primary extents with mirror copies
on its partners. Since ideally every disk will already have the maximum number of partners,
giving a new disk partners usually requires breaking some existing partnerships. When
dropping a disk, its existing partnerships will have to be broken. This leaves disks with fewer
than the ideal number of partners. The first phase of rebalance is therefore to calculate a new set
of
partnerships. The partnerships are chosen to minimize the amount of data that will be relocated
to break existing partnerships. This is one of the reasons it is better to add or drop multiple
disks at the same time.
4.5 Restart
There can only be one rebalance at a time. If anything interrupts an ongoing rebalance then it is
automatically restarted. For example, if a node fails, recovery restarts the rebalance where it
left off. The same happens if the administrator manually changes the
power setting. If there is another storage reconfiguration, then the entire rebalance is restarted
from the beginning.