

Design Overview of
Automatic Storage Management,
Oracle 10g

1 Introduction

Automatic Storage Management is a file system and volume manager built into the database
kernel that allows the practical management of thousands of disk drives with 24x7 availability.
It provides management across multiple nodes of a cluster for Oracle Real Application Clusters
(RAC) support as well as single SMP machines. It automatically does load balancing in parallel
across all available disk drives to prevent hot spots and maximize performance, even with
rapidly changing data usage patterns. It prevents fragmentation so that there is never a need to
relocate data to reclaim space. It does automatic online disk space reorganization for the
incremental addition or removal of storage capacity. It can maintain redundant copies of data to
provide fault tolerance, or it can be built on top of vendor supplied reliable storage mechanisms.
Data management is done by selecting the desired reliability and performance characteristics for
classes of data rather than with human interaction on a per file basis.
ASM solves many of the practical management problems of large Oracle databases. As the size
of a database server increases towards thousands of disk drives, or tens of nodes in a cluster, the
traditional techniques for management stop working. They do not scale efficiently, they become
too prone to human error, and they require independent effort on every node of a cluster. Other
tasks, such as manual load balancing, become so complex as to prohibit their application. These
problems must be solved for the reliable management of databases in the tens or hundreds of
terabytes. Oracle is uniquely positioned to solve these problems as a result of our existing Real
Application Cluster technology. Oracle’s control of the solution ensures it is reliable and
integrated with Oracle products.
This document is intended to give some insight into the internal workings of ASM. It is not a
detailed design document. It should be useful for people who need to support ASM.

2 Instances

Automatic Storage Management is part of the database kernel. It is linked into “/bin/oracle” so
that its code may be executed by all database processes.
One portion of the ASM code allows for the start-up of a special instance called an ASM Instance.
ASM Instances do not mount databases, but instead manage the metadata needed to make ASM
files available to ordinary database instances. Both ASM Instances and database instances have
access to some common set of disks. ASM Instances manage the metadata describing the layout
of the ASM files. Database instances access the contents of ASM files directly, communicating
with an ASM Instance only to get information about the layout of these files. This requires that a
second portion of the ASM code run in the database instance, in the I/O path.

Oracle Confidential 1

Each database instance using ASM has two new background processes called ASMB and RBAL.
The ASMB background process runs in a database instance and connects to a foreground
process in an ASM Instance. Over this connection, periodic messages are exchanged to update
statistics and to verify that both instances are healthy. All extent maps describing open files are
sent to the database instance via ASMB. If an extent of an open file is relocated or the status of a
disk is changed, messages are received by the ASMB process in the affected database instances.
During operations which require ASM intervention, such as a file creation by a database
foreground, the database foreground connects directly to the ASM Instance to perform the
operation. Each database instance maintains a pool of connections to its ASM instance to avoid
the overhead of reconnecting for every file operation.
Like RAC, the ASM Instances themselves may be clustered, using the existing Distributed Lock
Manager (DLM) infrastructure. There will be one ASM Instance per node on a cluster. As with
existing RAC configurations, ASM requires that the Operating System make the disks globally
visible to all of the ASM Instances, irrespective of node. Database instances only communicate
with ASM Instances on the same node, so that the independent failure of ASM or Oracle is
immediately detected by OS means. If there are several database instances for different
databases on the same node, they share the same single ASM Instance on that node.
If the ASM instance on one node fails, all the database instances connected to it will also fail. As
with RAC, the ASM and database instances on other nodes recover the dead instances and
continue operations. This is similar to what happens when a file system or volume manager in
the OS fails. However an ASM instance failure does not require rebooting the OS to bring it back
up.
The design allows for multiple ASM Instances on the same node in separate lock name spaces.
This arrangement may work better with existing OS security mechanisms if the database
instances need to be partitioned from each other for security reasons. However this
configuration is not supported by any of the user interface tools such as the installer, DBCA, or
Enterprise Manager. It is not a recommended configuration.
A Disk Group can contain files for many different Oracle databases. Thus multiple database
instances serving different databases can access the same Disk Group even on a single system
without RAC. It is also possible to configure the ASM instances into a RAC cluster to allow
multiple non-RAC databases running on different hosts to share storage.
Cluster Synchronization Services (CSS) is used to ensure that each Disk Group is being modified
under at most one lock name space. There is a resource named after each Disk Group that
associates it with the name of the lock name space currently mounting that Disk Group. ASM
Instances must contact the Node Monitor at the time Disk Groups are mounted to ensure that
they are using a consistent lock name space.
Group Services is used to register the connection information needed by the database instances
to find ASM Instances. When an ASM Instance mounts a Disk Group, it registers the Disk Group
and connect string with Group Services. The database instance knows the name of the Disk
Group, and can therefore use it to look up the connection information for the correct ASM Instance.

3 Disk Data Structures

This section gives a high level description of the most important data structures that ASM
maintains on persistent storage.


3.1 Disk Group


The Disk Group is the highest level data structure maintained by ASM. Each Disk Group is self
describing, containing its own file directory, disk directory, and other metadata. Any ASM file is
completely contained within a single Disk Group.
I/O load is evenly distributed over all the disks in a single Disk Group by allocating a
proportional number of extents on each disk according to its size. This ensures that all disks are
full when a disk group runs out of space. This presumes that all disks in a disk group have the
same I/O density (megabytes per second of transfer rate per gigabyte of capacity). Thus disks
in a disk group should have similar performance characteristics.
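The proportional-allocation policy described above can be modeled with a small sketch (a simplified illustration, not the actual kernel algorithm; the function name and integer-share rounding are assumptions):

```python
def extents_per_disk(disk_sizes_au, total_extents):
    """Distribute extents over disks in proportion to their sizes (in AUs).

    Each disk receives a share proportional to its capacity, so that all
    disks fill at the same rate and run out of space together.
    """
    total = sum(disk_sizes_au)
    return [total_extents * size // total for size in disk_sizes_au]


# A disk twice as large receives twice as many extents:
shares = extents_per_disk([100, 200, 100], 400)   # -> [100, 200, 100]
```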
ASM must scale to support at least 63 simultaneously mounted disk groups.

3.2 ASM Disk


An ASM Disk must be available for direct disk I/O from all database instances that use its Disk
Group. An ASM Disk should be a physical disk or a virtual disk that does not share resources
with any other disks in the same disk group. Since ASM load balances between all the disks in a
disk group, it makes no sense for two of the disks to actually be different areas on the same
physical disk.
Note that the name of an ASM Disk is not the same as the name the OS uses to access the disk.
The ASM Name is provided by the administrator when the ASM Disk is added to the Disk
Group. In a cluster, the same disk may be accessed by different OS names on different nodes.
When a Disk Group is mounted by an ASM Instance, a set of OS names is examined for the
purpose of finding all the ASM Disks. Each OS name is opened and its ASM header is read to
determine which ASM Disk it is. The mapping of an ASM Disk to an OS name is kept in the
memory of every database instance. The OS names are not stored on the ASM Disk since they can change
without warning.
There is no fundamental reason to limit an ASM Disk to 2^32 physical blocks. However, this is a
common restriction in operating system interfaces, and it is a limitation in the OSDs for phase
one development. If a disk is accessed through osmlib, however, there are no size restrictions.
ASM supports up to 10,000 disks at a time.

3.3 Failure Group


ASM disks within a disk group are partitioned into failure groups. Two disks are in the same
failure group if they share a common resource whose failure must be tolerated.
Because Failure Group configuration depends on the site’s configuration and the types of
failures that a particular installation is willing to accept, ASM provides an option for
administrators to manually specify Failure Groups. If the Failure Group specification is omitted,
ASM automatically places each disk into its own Failure Group.
ASM does not allocate redundant copies of data on disks that are in the same Failure Group.
The failure of an entire Failure Group is almost always a Temporary Failure.
Disk Groups with external redundancy do not use Failure Groups. Normal redundancy Disk
Groups require at least two Failure Groups. High Redundancy Disk Groups require at least
three Failure Groups.

3.4 Allocation Unit


The Allocation Unit (AU) is the fundamental unit of allocation within a disk group. The usable
space in an ASM Disk is a multiple of this size. There is a table at the beginning of every ASM


Disk with one entry for every Allocation Unit on the ASM Disk. Extent pointers for files in a
Disk Group give the ASM Disk number and Allocation Unit number where the extent resides.
An Allocation Unit is small enough that a file will contain many of them so that it can be spread
across many disks, and a disk will contain many of them so that it can be shared by many files.
An Allocation Unit is large enough that accessing an AU in one I/O operation will give very
good throughput. The time to access an Allocation Unit will be dominated by the transfer rate
of the disk rather than the time to seek to the beginning of the AU. Rebalancing of a disk group
is done one AU at a time.
Unless overridden by an administrator-supplied value, ASM chooses an AU size of 1 MB. The
administrator value is set via the hidden initialization parameter “_OSM_AUSIZE“.

3.5 ASM Files


ASM Files may be created, destroyed, resized, read, and written. An ASM File is allocated
within a single Disk Group, and is spread over many or all of the ASM Disks in the Disk Group.
Automatic Storage Management can move parts of a file while it is in active use.
The higher layers of the Oracle kernel communicate with ASM in terms of files. This is identical
to the way Oracle uses any file system or logical volume manager. ASM presents datafiles,
logfiles, controlfiles, archive logs, dumpsets,... to the upper layers of the kernel just as any file
system would.
An ASM file name starts with a "+" and the disk group name. When the file I/O layer in the
Oracle kernel sees a file name starting with "+" it routes the request to the ASM code rather than
calling the OSD’s to access the file. Thus the upper layers of the kernel are unaware that the file
is an ASM file rather than an OS file. ASM has no effect on concepts such as rowids or segments
since it simply implements datafiles for tablespaces to use.
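The routing decision in the file I/O layer can be illustrated with a minimal sketch (the helper names are hypothetical; real ASM path parsing is considerably more involved):

```python
def is_asm_file(path):
    """An ASM file name starts with '+' followed by the disk group name."""
    return path.startswith("+")


def disk_group_of(path):
    """Extract the disk group name, e.g. '+DATA/orcl/...' -> 'DATA'."""
    assert is_asm_file(path)
    return path[1:].split("/", 1)[0]
```

A name such as "+DATA/orcl/datafile/users.259.1" would be routed to the ASM code, while "/u01/oradata/users01.dbf" would go through the ordinary OSDs.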
ASM supports one million files in a disk group. ASM supports file sizes of at least 100 TB.

3.5.1 File Type


Only files with a known Oracle file type are allowed in an ASM disk group. If a file is copied into
an ASM disk group via FTP, the first block of the file is examined to determine its file type and
other information needed to construct the full file name. If the header is not recognized, the file
creation fails with an error. Only the following types of files can exist in an ASM disk group.
■ Control File
■ Datafile
■ Online Redo Log
■ Archive Log
■ Temporary data file
■ RMAN backup piece
■ Datafile Copy
■ SPFILE
■ Disaster Recovery Configuration
■ Flashback Log
■ Change Tracking Bitmap
■ DataPump Dumpset


Oracle executables and ASCII files, such as alert logs and trace files, cannot be in an ASM disk
group.

3.5.2 File Blocks


All the file types supported by ASM are read and written in file blocks. The size of a block is set
by the upper layer in the kernel when the file is created. The block size is always a power of two
so that an integral number of them will fit in one Allocation Unit.
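The power-of-two constraint can be expressed in a small sketch (the function name is hypothetical; a 1 MB AU is assumed for the example):

```python
def blocks_per_au(au_size, block_size):
    """File block size is a power of two, so an integral number of
    blocks always fits in one Allocation Unit."""
    assert block_size & (block_size - 1) == 0, "block size must be a power of two"
    assert au_size % block_size == 0
    return au_size // block_size


# With a 1 MB AU and an 8K file block, 128 blocks fit in one AU:
n = blocks_per_au(1024 * 1024, 8192)   # -> 128
```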

3.5.3 File Virtual Address Space


One of the primary functions of ASM is to map the blocks in a file to sectors in a LUN. The
sequence of blocks 0,1,2,... form the file virtual address space. This mapping can be rather
complicated. Here we describe the mapping starting at the bottom and working up toward the
file virtual address space.

3.5.3.1 Data Extents


The data extents are the raw storage used to hold the contents of a file. Each data extent is a
single Allocation Unit on a specific disk. The extent map for a file is a list of data extents giving
the disk and AU for each data extent. A data extent can be stale if it missed an update because
its disk is offline. In some cases a data extent may be missing because there is no failure group
available to allocate it.
In a future release of ASM, data extents of 4, 16, or 64 allocation units will be supported for
larger files.

3.5.3.2 Virtual Data Extents


A virtual data extent is a set of data extents that have the same contents. Mirroring is done at the
virtual extent level. Each virtual extent provides one extent of address space for file blocks. A
write to a file block is written to every online data extent in the virtual extent. A read of a file
block is sent to the primary extent of the extent set unless that disk is offline. For a file with no
redundancy (external redundancy disk group), every virtual extent is a single data extent.
For a file with normal mirroring, every virtual extent is composed of two data extents located on
two different disks in two different failure groups. The two data extents form an extent set and
are next to each other in the extent map. The primary extent is the even numbered data extent
and is followed by the odd numbered secondary extent. The extents in an extent set have to be
carefully allocated to ensure the data is safe from a hardware failure. When an extent is relocated,
it is usually necessary to relocate all the extents in the extent set to maintain the redundancy
requirements.
A high redundancy disk group uses triple mirroring by default. Virtual extents in a triple
mirrored file have one primary and two secondary data extents. The secondary extents are
allocated in different failure groups from each other so it requires at least three failure groups to
implement high redundancy.
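The adjacent layout of extent sets in the extent map can be sketched as follows (an illustrative model derived from the description above; the function name is hypothetical):

```python
def extent_set(virtual_extent, redundancy):
    """Data-extent indices forming the extent set for one virtual extent.

    Copies sit next to each other in the extent map; the first index is
    the primary. For normal redundancy (2 copies) the primary is the
    even-numbered data extent and the secondary the odd one.
    """
    first = virtual_extent * redundancy
    return list(range(first, first + redundancy))


# Normal redundancy: virtual extent 3 maps to primary 6, secondary 7.
assert extent_set(3, 2) == [6, 7]
```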

3.5.3.3 Striped blocks


For a file with coarse grained striping the file virtual address space is simply a concatenation of
all the virtual data extents. The primary data extents are allocated so that two extents for the
same file on the same disk are as far apart in the file address space as is reasonable. This spreads
the virtual data extents across all the disks in the disk group. It has a similar effect to one
megabyte striping in a traditional volume manager.
A file with fine-grained striping has its virtual data extents allocated identically to a file with
coarse grained striping. Its virtual extents are still one AU each. However, the file blocks are not laid
out linearly on each virtual extent. The file is always grown in multiples of eight virtual extents.


The file blocks are then striped across each group of eight virtual extents in stripes of 128K. Thus
with an 8K block size blocks 0 - 15 are on the virtual extent 0, blocks 16 - 31 are on virtual extent
1,..., blocks 112 - 127 are on virtual extent 7, and blocks 128 - 143 follow blocks 0-15 on virtual
extent 0. Blocks 1024 - 2047 are similarly striped across virtual extents 8 -15.
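The block-to-extent mapping above can be sketched directly (a model of the layout as described, assuming the 1 MB AU, 128K stripes, and groups of eight virtual extents; the function name is hypothetical):

```python
STRIPE = 128 * 1024          # fine-grain stripe width: 128K
GROUP = 8                    # virtual extents per striping group
AU = 1024 * 1024             # assumed 1 MB allocation unit


def fine_stripe(block, block_size=8192):
    """Map a file block to (virtual_extent, block_within_extent)
    under fine-grained striping."""
    per_stripe = STRIPE // block_size        # blocks per 128K stripe unit
    per_extent = AU // block_size            # blocks per virtual extent
    group, g = divmod(block, GROUP * per_extent)
    row, pos = divmod(g, GROUP * per_stripe)
    ve_in_group, off = divmod(pos, per_stripe)
    return group * GROUP + ve_in_group, row * per_stripe + off


# Matches the worked example: blocks 0-15 on extent 0, 16-31 on extent 1,
# block 128 follows block 15 on extent 0, block 1024 starts extent 8.
assert fine_stripe(16) == (1, 0)
assert fine_stripe(128) == (0, 16)
```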

3.6 Disk Partners


A Disk Partnership is a symmetric relationship between two disks in a high or normal
redundancy Disk Group. Disks in a Disk Group are partnered with a small number of other
disks in the same group. ASM automatically creates and maintains these relationships. Mirrored
copies of data are only allocated on disks which are partners of the disk containing the primary
data extent.
Disk partnering is used to reduce the chance of a double disk failure leading to data loss. In an
ASM configuration with thousands of disks, if mirroring were performed by randomly picking
secondary disks for mirrored copies, two drives failing would have a significant chance for data
loss. The reason is that there could be data with both its primary and mirrored data copies on
the two failing disks. Without disk partnering, chances of data loss with a two-disk failure
increase with the number of disks in a Disk Group. The disk partnering strategy limits the
number of disks protecting another disk's data copy. ASM limits the number of disk partners to
eight for any single disk. The smaller the number, the more resilient the system is to double disk
failures.
ASM selects partners for a disk from Failure Groups other than the Failure Group to which the
disk belongs, but an ASM Disk may have multiple partners that are in the same Failure Group.
Partners are chosen to be in as many different Failure Groups as possible. This ensures that a
disk with a copy of the lost disk’s data will be available following the failure of the shared
resource associated with the Failure Group.
If an ASM Disk fails, its protected extents can be rebuilt from the ASM Disk’s partners. By having
multiple partners, the extra I/O load of the rebuild is spread over multiple ASM Disks. This
reduces the mean time to repair the failure, since a higher I/O rate can be used to reconstruct
the lost data, and it evenly distributes the rebuild load over as many different hardware resources
as possible. This presumes that it is unlikely that two entire failure groups will fail at the same
time.
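A rough sketch of partner selection under these constraints follows (illustrative only; the real algorithm also keeps the relationship symmetric and revises partnerships as disks are added or dropped):

```python
from itertools import chain, zip_longest


def choose_partners(my_fgroup, disks_by_fgroup, max_partners=8):
    """Pick up to max_partners partners from other failure groups,
    round-robin across groups so partners span as many different
    failure groups as possible."""
    pools = [disks for fg, disks in sorted(disks_by_fgroup.items())
             if fg != my_fgroup]
    # Interleave the groups: one disk from each group in turn.
    interleaved = [d for d in chain.from_iterable(zip_longest(*pools))
                   if d is not None]
    return interleaved[:max_partners]


groups = {"FG1": ["A", "B"], "FG2": ["C", "D"], "FG3": ["E", "F"]}
# A disk in FG1 partners with disks from FG2 and FG3, alternating groups:
partners = choose_partners("FG1", groups)   # -> ['C', 'E', 'D', 'F']
```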
Partnering is not partitioning. Partnering is a symmetric relationship. If disk A lists disk B as its
partner then disk B will also list disk A as a partner. However, partnering is not a transitive
relationship. If disk A and B are partners, and disk B and C are partners, it is not always the case
that A and C are partners. In fact, adding the transitive property to the partnering relationship
results in partitioning. Thus, partitioning can always be expressed as partnering, but it is only a
subset of the possibilities that partnering can offer. The illustration shows (seen left to right) the
difference between partitioning, partitioning expressed as partnering, general partnering, and
partnering with varying numbers of disks per failure group.

[Figure: partitioning vs. partnering arrangements, left to right]

Partnering is superior to partitioning because it permits uniform load balancing even on
configurations which have irregular geometric arrangements of disks. It removes many
restrictions on how additional capacity must be arranged in order to be added to an existing
system. However, not all configurations will work well. This is a property of the configuration,
and not of ASM. ASM will work as well as possible for any given configuration.

3.7 ASM Mirroring Protection


ASM Mirroring Protection protects against the loss of data following the loss of a disk.
The ASM mirroring policy is a property of a file, and is the same for all virtual extents in that
file. This property is set at file creation time and cannot be subsequently changed. Typically,
each virtual extent is stored in two extents, but for very critical data it is also possible to do
triple mirroring, where three copies are kept for each virtual extent in a file.
ASM mirroring is more flexible than OS mirrored disks since it allows the redundancy to be
specified on a per file basis. Thus two files can share the same disk with one file being mirrored
while the other is not.
A mirrored virtual extent requires that two or more extent pointers be maintained for every Data
Extent. The Indirect Extents of a mirrored file keep the multiple pointers for every logical extent
of data together in the same block of metadata. This ensures they are always cached together
and simplifies locking the extent if it needs to be relocated.
ASM distinguishes between the primary and the secondary copies of a mirrored extent. ASM will
write updates of an extent to all copies at the same time. ASM will attempt to read the primary
copy first, and will only read one of the secondaries if the primary is unavailable.¹
Striping can be done on top of mirroring to achieve both throughput and reliability. See the
sections on striping for details.
ASM will protect metadata using triple mirroring even in an external redundancy disk group.

3.8 Metadata Blocks


All of the metadata is divided into blocks that have redundancy information to verify their
contents. This metadata includes all the directories, Indirect Extents, and allocation tables. The

¹ Because ASM stripes files onto all disks, each disk will contain a mixture of primary and
secondary extents. Hence, all disks will still contribute appropriately to the workload even
though primaries are preferentially read. There is no performance benefit in attempting to
divide the read workload over all of the disks with a copy of the extent as is done in traditional
inflexible mirroring schemes.


redundancy information includes the type and logical address of the block. A checksum is also
included to catch fractured blocks. All metadata blocks are 4K in size.
Each entry in the file directory is a single metadata block containing the description of the file
and pointers to its data or Indirect Extents. Indirect Extents are a sequence of metadata blocks
each of which points to one or more Data Extents. The allocation table is a sequence of metadata
blocks each of which describes the current usage of some number of Allocation Units.
A metadata block is the basic unit of caching for access to the metadata. Large sequential reads
of metadata may be done in some circumstances.

3.9 Physically Addressed Metadata


Each disk contains metadata about itself. If the disk fails, this metadata becomes unnecessary. It
is called physically addressed because when it is in the cache of an ASM instance it is tagged
with a disk number and block within disk. The physically addressed metadata is actually spread
across the disk so that it grows as the disk grows. With the normal AU size of one megabyte and
metadata block size of 4K, there is one AU of physically addressed metadata every 113,792 AUs.
To keep the block numbers within a ub4, block numbers are relative to the physically addressed
metadata blocks rather than to all the blocks on the disk. Thus block number 256 is the first block
of the second physically addressed metadata AU, and is 113,792 megabytes into the disk.
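The worked example above can be checked with a little arithmetic (a sketch assuming the default 1 MB AU and 4K metadata block; the function name is hypothetical):

```python
AU_SIZE = 1024 * 1024      # default 1 MB allocation unit
BLOCK = 4096               # metadata block size
STRIDE = 113_792           # AUs between physically addressed metadata AUs


def phys_metadata_offset(block_no):
    """Byte offset on disk of physically addressed metadata block
    block_no, numbered relative to the metadata blocks only."""
    per_au = AU_SIZE // BLOCK                  # 256 metadata blocks per AU
    au_index, blk = divmod(block_no, per_au)
    return au_index * STRIDE * AU_SIZE + blk * BLOCK


# Block 256 is the first block of the second physically addressed
# metadata AU, 113,792 megabytes into the disk:
assert phys_metadata_offset(256) == 113_792 * 1024 * 1024
```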

3.9.1 Disk Header


Block zero of a disk contains the disk header. Note that this is block zero as presented to ASM.
Many operating systems reserve the first block of a LUN to hold a partition table and other OS
information about the disk. ASM must not be given access to this block if it exists. On some
platforms the platform specific Oracle code will skip over the operating system block, while on
others the administrator must give ASM a disk partition that does not include the partition
table.
The disk header describes both the disk and attributes of its disk group. By looking at all
available disk headers ASM can discover all the disks that it is managing and their disk groups.
Bytes 32-39 of every disk header being used by ASM contain the 8 characters "ORCLDISK". This
is useful for identifying ASM disks. The 24 following bytes are preserved when ASM formats
the header of a disk being added to a disk group.
The following information about the disk group is replicated in the header of every disk in the
disk group.
■ Disk group name and creation timestamp
■ Physical sector size of all disks in disk group
■ Allocation unit size
■ Metadata block size
■ Software version compatibility
■ Default redundancy
The header also contains the following information about this particular disk.
■ Disk name (ASM name not OS name)
■ Disk number within disk group
■ Failure group name


■ Disk size
Up to three disks in a disk group contain a copy of the root extent. The root extent contains the
beginning of the file directory. From the root extent, all extent maps for all files can be found.
The following information is replicated in the headers of disks containing one of the data
extents for the first virtual extent of the file directory.
■ AU within this disk containing a copy of the root extent.
■ Disk header redo for metadata relocation

3.9.2 Free Space Table


Block 1 of every physically addressed Allocation Unit contains a free space table. It contains
approximate information about the amount of freespace available in each block of the allocation
table in the AU. This is used to avoid looking for free space in an allocation table block that is
completely allocated.

3.9.3 Allocation Table


The last 254 metadata blocks in every physically addressed AU are used to keep per-AU
allocation information. Each metadata block describes the state of 448 AUs. If an AU is
allocated to a file, the allocation table entry contains the number of the file and the data
extent number. Entries for free AUs are linked into a free list.
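These two numbers also determine the physically addressed metadata stride quoted in section 3.9 (a short arithmetic check, not kernel code):

```python
ALLOC_BLOCKS = 254   # allocation-table blocks per physically addressed AU
AUS_PER_BLOCK = 448  # AUs described by each allocation-table block

# Each physically addressed metadata AU therefore describes exactly the
# 113,792 AUs between it and the next one:
coverage = ALLOC_BLOCKS * AUS_PER_BLOCK
assert coverage == 113_792
```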

3.10 Partner and Status Table


Allocation Unit 1 (right after the first allocation table) is reserved for a copy of the Partner and
Status Table (PST). Up to five disks in a disk group will contain a copy of the PST. A majority of
the PSTs must be found with identical contents to obtain a valid version of the table. This is
necessary to determine which of the discovered disks actually have current data.
There is one entry in the PST for every disk in the disk group. The entry enumerates the disks
that are partners of the disk owning the entry. It also has flags to indicate if the disk is online for
reads and online for writes. This information is needed before recovery can proceed. Thus the
PST is not updated via redo.
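The majority rule can be sketched as follows (simplified; the real comparison is over PST structures on disk, not a string match, and the function name is hypothetical):

```python
from collections import Counter


def valid_pst(copies):
    """Return the PST contents if a strict majority of the discovered
    copies agree, else None."""
    if not copies:
        return None
    contents, count = Counter(copies).most_common(1)[0]
    return contents if count > len(copies) // 2 else None


# Two of three discovered copies agree, so the table is valid:
assert valid_pst(["A", "A", "B"]) == "A"
```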

3.11 Virtually Addressed Metadata


Most of the metadata for an ASM disk group is kept in ASM files. File numbers below 256 are
reserved for metadata files. They are often referred to as directories rather than files. A metadata
file is allocated like any other file except that it is always triple mirrored even in an external
redundancy disk group. Metadata is relocated when rebalancing just like any other data. The
blocks are addressed by their file number and block within file. Thus the location is virtual since
it may change as a result of rebalancing.

3.11.1 Indirect Extent


Indirect Extents provide the additional space for extent map storage required for files with more
than 60 data extents. If indirect extents are required, the later Extent Map entries within the
directory entry point to the Indirect Extents. Each Indirect Extent contains a portion of the
Extent Map for the file, which in turn has pointers to the Data Extents of the file.
Each indirect extent is triple mirrored, as is all virtual metadata. The indirect extent is composed
of metadata blocks addressed by file number and block number within the indirect pointer blocks.
Thus every file has two virtual address spaces: indirect pointer blocks and file data blocks.


Most files will only need one Indirect Extent, but files over 120 gigabytes can use multiple
indirect extent pointers in the directory entry to point to as many as 100 Virtual Indirect Extents.
There is only one level of indirection allowed; an extent pointer in an indirect extent will always
point to a data extent and never to another Indirect Extent.

3.11.2 File Directory (file #1)


The file directory contains a one block entry for every file in the Disk Group. Thus file X is
described by block X in the file directory. Since the file directory is file 1, block 1 of the file
directory describes the file directory itself. Block zero of the file directory is used to keep a
freelist of available directory entries. The file directory grows as needed to create more file
numbers.
Each directory block keeps the following information about its file:
■ File block size in bytes
■ File size in bytes (which is always a multiple of the file block size)
■ Oracle file type (datafile, online log, archived log, controlfile,...)
■ File redundancy (external, mirrored, triple mirrored)
■ Striping configuration (coarse vs. fine grained)
■ Direct extent pointers to the first 60 data extents: This covers between 60 and 20 virtual data
extents depending on the file’s redundancy.
■ 300 indirect extent pointers: Since indirect extents are triple mirrored this supports 100
virtual indirect extents.
■ Creation timestamp
■ Last update timestamp
■ Pointer to user alias and filename in alias directory
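The pointer capacities in the bullet list above follow directly from the file's redundancy (constants taken from the text; the function name is hypothetical):

```python
DIRECT_POINTERS = 60      # direct extent pointers in a directory entry
INDIRECT_POINTERS = 300   # indirect extent pointers in a directory entry


def virtual_extents_direct(redundancy):
    """Virtual data extents covered by the 60 direct pointers:
    60 for external, 30 for mirrored, 20 for triple mirrored files."""
    return DIRECT_POINTERS // redundancy


# Indirect extents are always triple mirrored, so 300 pointers
# support 100 virtual indirect extents:
virtual_indirect = INDIRECT_POINTERS // 3
```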
The entry points to the first 60 Data Extents and the Indirect extents if necessary. It contains
information about the file such as its size and how it is allocated. The Disk Group, file block size,
extent size and mirroring factor are given for both Indirect and Data Extents.
The file directory is itself a file. The first entry in the directory is the file directory itself. A file’s
number is an index into the file directory to the entry for that file. To detect stale file numbers a
file is also identified by a 32 bit incarnation number from the time of its creation. Thus Disk
Group id, file number, and incarnation uniquely identify a particular file.
Note that the first block in an ASM file is addressed as block zero. Block zero usually contains
port specific information. This ensures that a byte-for-byte copy of an ASM file can be copied to
an operating system file and used as the ASM file was used.

3.11.3 ASM Disk Directory (file #2)


The ASM Disk directory contains information about every ASM Disk known to the Disk Group.
Most of the information is replicated in the header of the disk. However the directory contains
status information that may not be on the disk. If the disk fails, its failure is recorded in the
directory and PST, but cannot be recorded on the disk itself.

3.11.4 Active Change Directory (file #3)


When it is necessary to make an atomic change to one or more metadata blocks, a log record is
written into the Active Change Directory (ACD). This log record is written in a single I/O. Each
ASM instance is assigned a 42 AU portion of the ACD for its redo generation. The first block of


each portion contains the checkpoint record. The checkpoint record is written every 3 seconds,
and it describes where recovery needs to start reading redo if this instance crashes. The rest of
the instance’s ACD portion is circularly written with redo describing all metadata block
changes. This portion of a disk group is similar to the online logs for a database.
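The per-instance checkpoint-plus-circular-redo layout can be sketched as follows. This is an
illustrative model only — block counts, record shapes, and method names are invented, and real
ACD writes are single block I/Os performed by the instance:

```python
class AcdPortion:
    """Sketch of one instance's slice of the Active Change Directory.

    Block 0 holds the checkpoint record; blocks 1..n-1 form a circular
    buffer of redo records describing metadata block changes."""

    def __init__(self, num_blocks):
        self.blocks = [None] * num_blocks   # block 0 reserved for checkpoint
        self.write_pos = 1                  # next redo block to write
        self.checkpoint = 1                 # recovery starts reading here

    def write_redo(self, record):
        """Append one redo record (a single simulated block write)."""
        self.blocks[self.write_pos] = record
        self.write_pos += 1
        if self.write_pos == len(self.blocks):
            self.write_pos = 1              # wrap, skipping the checkpoint block

    def take_checkpoint(self):
        """Record where recovery must start if this instance crashes.

        In ASM the checkpoint record is rewritten every few seconds."""
        self.checkpoint = self.write_pos
        self.blocks[0] = ("checkpoint", self.checkpoint)

    def recovery_scan(self):
        """Yield redo records from the checkpoint up to the write position."""
        pos = self.checkpoint
        while pos != self.write_pos:
            yield self.blocks[pos]
            pos = pos + 1 if pos + 1 < len(self.blocks) else 1

acd = AcdPortion(num_blocks=8)
acd.write_redo("set block 17")
acd.take_checkpoint()                 # recovery will start after this point
acd.write_redo("set block 42")
```

A recovering instance would replay only the redo written after the last checkpoint, which is
what `recovery_scan` returns.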

3.11.5 Continuing Operations Directory (file #4)


Some long-running operations cannot be described by a single record in the ACD. In these
cases, an entry is allocated in the Continuing Operations Directory (COD) to track the
operation and ensure that it completes. If a process dies without marking the entry complete,
a recovery process examines the entry and either completes or rolls back the operation.
There are two kinds of continuing operations: rollback and background. A background operation
is performed by an ASM instance background process. It is done as part of disk group
maintenance, not as an operation for a specific request. A background operation continues
until it is complete or the instance dies. If the instance dies, the recovering instance
resumes the background operation. Rebalancing the disk group is the best example of a
background operation.
A rollback operation is similar to a database transaction. It is started at the request of an
ASM foreground process. During the operation the disk group is in an inconsistent state; the
operation must either complete or roll back all of its changes to the disk group. The
foreground is usually performing the operation on behalf of a database instance. If the
database dies, the ASM foreground dies, or an unrecoverable error occurs, then the operation
must be terminated. Creating a file is a good example of a rollback operation: if an error
occurs while allocating space for the file, the partially created file must be deleted, and
if the database instance does not commit the creation, the file must be automatically
deleted. If the ASM instance dies, this must be done by the recovering instance.
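The recovery pass over the COD can be sketched as a dispatch on the two operation kinds. The
entry fields and callback names below are invented for illustration, not the actual COD
record format:

```python
BACKGROUND, ROLLBACK = "background", "rollback"

def recover_cod(entries, resume, undo):
    """Recovery scan over Continuing Operations Directory entries.

    Incomplete background operations (e.g. a rebalance) are resumed by the
    recovering instance; incomplete rollback operations (e.g. an
    uncommitted file creation) have their partial changes undone."""
    for entry in entries:
        if entry["complete"]:
            continue
        if entry["kind"] == BACKGROUND:
            resume(entry)   # e.g. restart the rebalance where it left off
        else:
            undo(entry)     # e.g. delete the partially created file

entries = [
    {"kind": BACKGROUND, "op": "rebalance",   "complete": False},
    {"kind": ROLLBACK,   "op": "create file", "complete": False},
    {"kind": ROLLBACK,   "op": "old op",      "complete": True},
]
resumed, undone = [], []
recover_cod(entries,
            resume=lambda e: resumed.append(e["op"]),
            undo=lambda e: undone.append(e["op"]))
```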

3.11.6 Template Directory (file #5)


The template directory provides named groups of attributes that may be applied during the
creation of new ASM files. Each template directory block is composed of an array of template
directory entries.
Since the number of templates is small, and they are only used at file creation time, the directory
is maintained as a densely packed unordered array of these records. Resolving a template name
requires a full scan of the directory.
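A name lookup over such a densely packed, unordered array is just a linear scan. The sketch
below is illustrative only — the attribute names are invented and are not the actual ASM
template fields:

```python
def resolve_template(templates, name):
    """Resolve a template name by a full scan of the directory.

    The directory is a densely packed, unordered array, so lookup is O(n);
    this is acceptable because templates are few and are consulted only at
    file creation time."""
    for entry in templates:
        if entry is not None and entry["name"] == name:
            return entry
    return None

# Hypothetical system templates created along with the disk group.
templates = [
    {"name": "DATAFILE",  "redundancy": "mirror", "striping": "coarse"},
    {"name": "ONLINELOG", "redundancy": "mirror", "striping": "fine"},
]
t = resolve_template(templates, "ONLINELOG")
```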
When a disk group is created, a number of “system” templates are automatically created. Each
system template corresponds to an Oracle file type and defines the default attributes for
that file type. These may be modified to suit the needs of a particular site.
In addition to the system templates, there are user-created templates, which permit further
customization of the kinds of files that can be created.

3.11.7 Alias Directory (file #6)


The alias directory provides a hierarchical naming system for all the files in a disk group.
A system file name is created for every file, based on the file type, the database instance,
and type-specific information such as the tablespace name. A user alias may also be created
if a full path name was given by the user when the file was created.
Multiple alias directory blocks are linked together to form a directory. Entries in a
directory may refer either to a file or to a subdirectory.
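Path resolution walks the linked directory blocks one component at a time. The block layout
and entry shapes below are invented for illustration; real alias directory blocks have a
different on-disk format:

```python
def resolve_alias(blocks, path):
    """Walk a hierarchical path through linked alias directory blocks.

    `blocks` maps a block id to that directory's entries; each entry is
    either ("dir", child_block_id) or ("file", file_number)."""
    current = 0                               # root directory block
    parts = [p for p in path.split("/") if p]
    for i, part in enumerate(parts):
        kind, target = blocks[current][part]  # KeyError if the name is absent
        if kind == "dir":
            current = target
        elif i == len(parts) - 1:
            return target                     # file number of the final component
        else:
            raise NotADirectoryError(part)
    raise IsADirectoryError(path)

blocks = {
    0: {"orcl": ("dir", 1)},
    1: {"datafile": ("dir", 2)},
    2: {"system.260.1": ("file", 260)},
}
fnum = resolve_alias(blocks, "/orcl/datafile/system.260.1")
```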

4 Rebalance

When one or more disks are added, dropped, or resized, the disk group is rebalanced to ensure
even use of all storage. Rebalancing does not relocate data based on I/O statistics, nor is
it started as a result of such statistics; it is driven entirely by the sizes of the disks in
the disk group. It starts automatically when the storage configuration changes. There is also
a command to start a rebalance manually, but there should rarely be a need to use it.
The following sections give a rough description of the steps taken when rebalancing a disk
group.

4.1 Repartner
When a disk is added, it needs partners so that it can hold primary extents whose mirror
copies reside on its partners. Since, ideally, every disk already has the maximum number of
partners, giving a new disk partners usually requires breaking some existing partnerships.
When a disk is dropped, its existing partnerships must be broken, which leaves other disks
with fewer than the ideal number of partners. The first phase of rebalance is therefore to
calculate a new set of partnerships. The new partnerships are chosen to minimize the amount
of data that must be relocated as existing partnerships are broken; this is one reason it is
better to add or drop multiple disks at the same time.

4.2 Calculate weights


The goal of rebalancing is for every disk in a disk group to have the same percentage of its
space allocated. Thus a larger disk needs to receive more of each file. A weight is
calculated for each disk to determine how much it should be given; the weight is affected
both by the size of the disk and by its partnerships.
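Ignoring partnership constraints, the size-proportional part of the weight calculation
reduces to each disk's share of total capacity. This is a simplified sketch, not the actual
ASM weighting algorithm:

```python
def allocation_weights(disk_sizes):
    """Weight each disk by its share of total capacity.

    Giving every disk extents in proportion to its size keeps all disks at
    the same percentage allocated; a disk twice the size of another should
    receive twice as many extents of each file."""
    total = sum(disk_sizes.values())
    return {disk: size / total for disk, size in disk_sizes.items()}

# disk_c is twice the size of the others, so it gets half of each file.
weights = allocation_weights({"disk_a": 100, "disk_b": 100, "disk_c": 200})
```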

4.3 Scan files


Rebalancing is done file by file, starting at file one and proceeding to the last file. The
extent map of each file is scanned to see how well the file is balanced. When a primary
extent is encountered that puts the file too far out of balance, that extent set is relocated
to another disk chosen to bring the file back into balance; this usually means relocating to
a recently added disk. It may also be necessary to relocate extents to enforce the new
partnerships.
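The scan-and-relocate decision can be sketched as a single pass over a file's extent map.
The tolerance threshold, bookkeeping, and target-selection rule below are invented for
illustration (and partnership constraints are ignored):

```python
def plan_relocations(extent_map, weights, tolerance=0.05):
    """One pass over a file's extent map, flagging extents to move.

    When a disk's share of the file exceeds its weight by more than
    `tolerance`, the next extent found on it is re-targeted to the most
    under-allocated disk -- typically a recently added, still-empty one."""
    counts = {disk: 0 for disk in weights}
    for disk in extent_map:
        counts[disk] += 1
    total = len(extent_map)
    moves = []
    for i, disk in enumerate(extent_map):
        if counts[disk] / total - weights[disk] > tolerance:
            target = min(weights, key=lambda d: counts[d] / total - weights[d])
            moves.append((i, disk, target))
            counts[disk] -= 1
            counts[target] += 1
    return moves

# disk "c" was just added and holds nothing; two extents move onto it.
moves = plan_relocations(["a", "b", "a", "b", "a", "b"],
                         {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3})
```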

4.4 Extent relocation


Relocating an extent requires coordination with any I/O to that extent. If the file is not
open, there is no problem. If the file is open, a message is sent to every database instance
that has the file open: before relocation begins, a message to lock the extent is sent. Any
new writes to the extent are held, but relocation can proceed. The extent is read from the
old location and written to the new location in one-megabyte I/Os. After relocation, an
unlock message is sent; any held writes are then allowed to proceed, and writes started
before the relocation began are reissued. There may be multiple slave processes performing
relocations at the same time; the power setting controls the number of slaves.
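The lock / copy / unlock ordering of the protocol can be sketched as below. All messaging is
simulated here with a fake instance object; the real mechanism is inter-instance messaging,
not method calls:

```python
def relocate_extent(open_instances, copy_extent):
    """Lock/copy/unlock protocol for moving one extent.

    Lock messages go to every database instance that has the file open; new
    writes to the extent are held while the copy proceeds, then released."""
    for inst in open_instances:
        inst.lock_extent()    # new writes to the extent are now held
    copy_extent()             # read old location, write new, in 1 MB I/Os
    for inst in open_instances:
        inst.unlock_extent()  # held writes proceed; pre-lock writes reissued

class FakeInstance:
    """Stand-in for a database instance that has the file open."""
    def __init__(self, log):
        self.log = log
    def lock_extent(self):
        self.log.append("lock")
    def unlock_extent(self):
        self.log.append("unlock")

log = []
relocate_extent([FakeInstance(log), FakeInstance(log)],
                lambda: log.append("copy"))
```

The point of the ordering is that the copy happens only after every open instance has held
its new writes, and no held write resumes until the copy is complete.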

4.5 Restart
There can be only one rebalance at a time. If anything interrupts an ongoing rebalance, it is
automatically restarted. For example, if a node fails, recovery restarts the rebalance where
it left off. The same happens if the administrator manually changes the power setting. If
there is another storage reconfiguration, however, the entire rebalance is restarted from the
beginning.
