In ZFS, storage devices are grouped into pools, called zpools. A pool provides all of the storage used by the file systems and volumes allocated from it. Let's begin by creating a simple zpool called datapool.
# zpool create datapool raidz disk1 disk2 disk3 disk4
That's all there is to it. We can use zpool status to see what our first pool looks like.
# zpool status datapool
  pool: datapool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        datapool    ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            disk1   ONLINE       0     0     0
            disk2   ONLINE       0     0     0
            disk3   ONLINE       0     0     0
            disk4   ONLINE       0     0     0

errors: No known data errors
What we can see from this output is that our new pool, datapool, has a single ZFS virtual device (vdev) called raidz1-0. That vdev is composed of the four disk files that we created in the previous step.
This type of vdev provides single-device parity protection, meaning that if one device develops an error, no data is lost, because it can be reconstructed from the remaining devices. This organization is commonly called 3+1: three data disks plus one parity disk.
ZFS provides additional types of data protection: raidz2 (two-device parity), raidz3 (three-device parity), mirroring, and none. We will look at some of these in later exercises.
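As a rough capacity sketch (a back-of-the-envelope calculation, not lab output; the four 1TB disks are a made-up example), each parity level gives up that many disks' worth of space:

```shell
# Hypothetical capacity math for four 1 TB disks in one vdev.
# raidz gives up 1 disk to parity, raidz2 gives up 2, raidz3 gives up 3,
# and a pair of 2-way mirrors gives up half the disks.
DISKS=4
SIZE_TB=1
echo "raidz  (3+1): $(( (DISKS - 1) * SIZE_TB )) TB usable"
echo "raidz2 (2+2): $(( (DISKS - 2) * SIZE_TB )) TB usable"
echo "raidz3 (1+3): $(( (DISKS - 3) * SIZE_TB )) TB usable"
echo "mirrors (2x2): $(( DISKS / 2 * SIZE_TB )) TB usable"
```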
Before continuing, let's take a look at the currently mounted file systems.
NOTE: In Oracle Solaris 11, the zfs list command shows how much space ZFS file systems consume. If you need to see how much space is available on a non-ZFS file system, such as one mounted over the network via NFS or another protocol, the traditional df(1) command still exists in Oracle Solaris 11 and can be used. System administrators familiar with df(1) can continue to use it, but zfs list is encouraged for ZFS file systems.
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
datapool 97.2K 1.41G 44.9K /datapool
rpool 7.91G 54.6G 39K /rpool
rpool/ROOT 6.36G 54.6G 31K legacy
rpool/ROOT/solaris 6.36G 54.6G 5.80G /
rpool/ROOT/solaris/var 467M 54.6G 226M /var
rpool/dump 516M 54.6G 500M -
rpool/export 6.52M 54.6G 33K /export
rpool/export/home 6.42M 54.6G 6.39M /export/home
rpool/export/home/oracle 31K 54.6G 31K /export/home/oracle
rpool/export/ips 63.5K 54.6G 32K /export/ips
rpool/export/ips/example 31.5K 54.6G 31.5K /export/ips/example
rpool/swap 1.03G 54.6G 1.00G -
Notice that when we created the pool, ZFS also created the first file system and mounted it. The default mountpoint is derived from the name of the pool, but can be changed if necessary. With ZFS there is no need to create a file system or make a directory on which to mount it, and no need to add entries to /etc/vfstab. All of this is done when the pool is created, making ZFS much easier to use than traditional file systems.
Before looking at some other types of vdevs, let's destroy the datapool, and see what happens.
# zpool destroy datapool
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
rpool 7.91G 54.6G 39K /rpool
rpool/ROOT 6.36G 54.6G 31K legacy
rpool/ROOT/solaris 6.36G 54.6G 5.80G /
rpool/ROOT/solaris/var 467M 54.6G 226M /var
rpool/dump 516M 54.6G 500M -
rpool/export 6.52M 54.6G 33K /export
rpool/export/home 6.42M 54.6G 6.39M /export/home
rpool/export/home/oracle 31K 54.6G 31K /export/home/oracle
rpool/export/ips 63.5K 54.6G 32K /export/ips
rpool/export/ips/example 31.5K 54.6G 31.5K /export/ips/example
rpool/swap 1.03G 54.6G 1.00G -
All file systems in the pool have been unmounted and the pool has been destroyed. The devices in
the vdev have also been marked as free so they can be used again.
Let's now create a simple pool using a two-way mirror instead of raidz.
# zpool create datapool mirror disk1 disk2
Run zpool status datapool again and you'll see that the vdev name has changed to mirror-0, indicating that data redundancy is provided by mirroring (redundant copies of the data) instead of parity, as it was in our first example.
What happens if you try to use a disk device that is already being used by another pool? Let's take a
look.
# zpool create datapool2 mirror disk1 disk2
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/disk1 is part of active pool 'datapool'
The usage error indicates that /dev/dsk/disk1 has been identified as being part of an existing pool
called datapool. The -f flag to the zpool create command can override the failsafe in case datapool
is no longer being used, but use that option with caution.
Adding capacity to a pool
Since we have two additional disk devices (disk3 and disk4), let's see how easy it is to grow a ZFS
pool.
# zpool list datapool
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
datapool 492M 89.5K 492M 0% 1.00x ONLINE -
# zpool add datapool mirror disk3 disk4
Running zpool status again shows that a second vdev (mirror-1) has been added to the pool. Let's look at the zpool listing now.
# zpool list datapool
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
datapool 984M 92.5K 984M 0% 1.00x ONLINE -
Notice that you don't have to grow file systems when the pool capacity increases. File systems can
use whatever space is available in the pool, subject to quota limitations, which we will see in a later
exercise.
Importing and exporting pools
ZFS zpools can also be exported, allowing all of the data and associated configuration information
to be moved from one system to another. For this example, let's use two of our SAS disks (c4t0d0
and c4t1d0).
# zpool create pool2 mirror c4t0d0 c4t1d0
As before, we have created a simple mirrored pool of two disks. This time the disk devices are real disks, not files, and we've told ZFS to use the entire disk (no slice number was included). If a disk is not labeled, ZFS writes a default label.
Now let's export pool2 so that another system can use it.
# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
datapool 984M 92.5K 984M 0% 1.00x ONLINE -
pool2 7.94G 83.5K 7.94G 0% 1.00x ONLINE -
rpool 19.9G 6.18G 13.7G 31% 1.00x ONLINE -
# zpool export pool2
# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
datapool 984M 92.5K 984M 0% 1.00x ONLINE -
rpool 19.9G 6.19G 13.7G 31% 1.00x ONLINE -
Let's import the pool, demonstrating another easy-to-use feature of ZFS.
# zpool import pool2
# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
datapool 984M 92.5K 984M 0% 1.00x ONLINE -
pool2 7.94G 132K 7.94G 0% 1.00x ONLINE -
rpool 19.9G 6.19G 13.7G 31% 1.00x ONLINE -
Notice that we didn't have to tell ZFS where the disks were located. All we told ZFS was the name
of the pool. ZFS looked through all of the available disk devices and reassembled the pool, even if
the device names had been changed.
What if we didn't know the name of the pool? ZFS can help there too.
# zpool export pool2
# zpool import
  pool: pool2
    id: 12978869377007843914
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        pool2       ONLINE
          mirror-0  ONLINE
            c4t0d0  ONLINE
            c4t1d0  ONLINE
Without an argument, ZFS will look at all of the disks attached to the system and will provide a list
of pool names that it can import. If it finds two pools of the same name, the unique identifier can be
used to select which pool you want imported.
Pool Properties
There are many pool properties that you might want to customize for your environment. To see a list of them, use zpool get.
# zpool get all pool2
NAME   PROPERTY       VALUE                 SOURCE
pool2  size           7.94G                 -
pool2  capacity       0%                    -
pool2  altroot        -                     default
pool2  health         ONLINE                -
pool2  guid           18069440062221314300  -
pool2  version        33                    default
pool2  bootfs         -                     default
pool2  delegation     on                    default
pool2  autoreplace    off                   default
pool2  cachefile      -                     default
pool2  failmode       wait                  default
pool2  listsnapshots  off                   default
pool2  autoexpand     off                   default
pool2  dedupditto     0                     default
pool2  dedupratio     1.00x                 -
pool2  free           7.94G                 -
pool2  allocated      216K                  -
pool2  readonly       off                   -
These properties are all described in the zpool(1M) man page; type man zpool for more information. To set a pool property, use zpool set. Note that not all properties can be changed (e.g., version, free, allocated).
# zpool set listsnapshots=on pool2
Pools are also versioned, and a pool can be upgraded to gain newer features. To see the pool versions this system supports, use zpool upgrade -v.
# zpool upgrade -v
VER DESCRIPTION
--- --------------------------------------------------------
1 Initial ZFS version
2 Ditto blocks (replicated metadata)
3 Hot spares and double parity RAID-Z
4 zpool history
5 Compression using the gzip algorithm
6 bootfs pool property
7 Separate intent log devices
8 Delegated administration
9 refquota and refreservation properties
10 Cache devices
11 Improved scrub performance
12 Snapshot properties
13 snapused property
14 passthrough-x aclinherit
15 user/group space accounting
16 stmf property support
17 Triple-parity RAID-Z
18 Snapshot user holds
19 Log device removal
20 Compression using zle (zero-length encoding)
21 Deduplication
22 Received properties
23 Slim ZIL
24 System attributes
25 Improved scrub stats
26 Improved snapshot deletion performance
27 Improved snapshot creation performance
28 Multiple vdev replacements
29 RAID-Z/mirror hybrid allocator
30 Encryption
31 Improved 'zfs list' performance
32 One MB blocksize
33 Improved share support
# zpool upgrade pool2
And that's it: nothing more complicated than zpool upgrade. Now you can use the features provided by newer zpool versions, such as log device removal (version 19) and snapshot user holds (version 18).
One word of warning - this pool can no longer be imported on a system running a zpool version
lower than 33.
We're done working with zpools. There are many more things you can do. If you want to explore,
see the man page for zpool (man zpool) and ask a lab assistant if you need help.
Let's now clean up before proceeding.
# zpool destroy pool2
# zpool destroy datapool
Exercise 2: Working with Datasets (File Systems, Volumes)
Now that we understand how to manage ZFS zpools, the next topic is file systems. We will use the term datasets, because a zpool can provide many different types of access, not just traditional file systems.
As we saw in the earlier exercise, a default dataset (file system) is automatically created when a zpool is created. Unlike other file system and volume managers, ZFS provides hierarchical datasets (peers, parents, children), allowing a single pool to provide many storage choices.
ZFS datasets are created, destroyed and managed using the zfs(1M) command. If you want to learn
more, read the associated manual page by typing man zfs.
To begin working with datasets, let's create a simple pool, again called datapool, and four additional datasets called bob, joe, fred, and pat.
# zpool create datapool mirror c4t0d0 c4t1d0
# zfs create datapool/bob
# zfs create datapool/joe
# zfs create datapool/fred
# zfs create datapool/pat
We can use zfs list to get basic information about all of our ZFS datasets.
# zfs list -r datapool
NAME USED AVAIL REFER MOUNTPOINT
datapool 238K 7.81G 35K /datapool
datapool/bob 31K 7.81G 31K /datapool/bob
datapool/fred 31K 7.81G 31K /datapool/fred
datapool/joe 31K 7.81G 31K /datapool/joe
datapool/pat 31K 7.81G 31K /datapool/pat
By using zfs list -r datapool, we are listing all of the datasets in the pool named datapool. As in the
earlier exercise, all of these datasets (file systems) have been automatically mounted.
If this were a traditional file system, you might think there was 39.05GB (7.81GB x 5) available for datapool and its 4 datasets, but the roughly 8GB in the pool is shared across all of the datasets. Let's see how that works.
# mkfile 1024m /datapool/bob/bigfile
Run zfs list -r datapool again. Notice that in the USED column, datapool/bob now shows 1GB in use. The other datasets show just their metadata overhead (21K), but their available space has been reduced to 6.81GB, because that is the amount of free space left to them after datapool/bob consumed the 1GB.
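The shared accounting can be sketched with simple arithmetic (sizes approximated in MB from the listings above; this is an illustration, not lab output):

```shell
# Every dataset draws from the same pool, so writing 1 GB in
# datapool/bob reduces the space available to all of its siblings.
POOL_AVAIL_MB=7997   # ~7.81 GB reported for each dataset
WRITE_MB=1024        # the 1 GB file created in datapool/bob
AFTER_MB=$(( POOL_AVAIL_MB - WRITE_MB ))
echo "every dataset now sees about ${AFTER_MB} MB (~6.81 GB) available"
```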
Hierarchical Datasets
A dataset can have children, just as a directory can have subdirectories. For datapool/fred, let's
create a dataset for documents, and then underneath that, additional datasets for pictures, video and
audio.
# zfs create datapool/fred/documents
# zfs create datapool/fred/documents/pictures
# zfs create datapool/fred/documents/video
# zfs create datapool/fred/documents/audio
Now let's set a quota of 2GB and a reservation of 1.5GB on datapool/fred, then run zfs list -r datapool again.
# zfs set quota=2g datapool/fred
# zfs set reservation=1.5g datapool/fred
The first thing to notice is that the available space for datapool/fred and all of its children is now 2GB, which is the quota we just set. Also notice that the quota is inherited by all of the children.
The reservation is a bit harder to see.
Original pool size 7.81GB
In use by datapool/bob 1.0GB
Reservation by datapool/fred 1.5GB
So, datapool/joe should see 7.81GB - 1.0GB - 1.5GB = 5.31GB available.
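That arithmetic can be checked mechanically (a sketch working in hundredths of a GB to stay in shell integer math):

```shell
# 7.81 GB pool, minus 1.0 GB used by bob, minus 1.5 GB reserved by fred.
POOL=781   # original pool size, in hundredths of a GB
BOB=100    # in use by datapool/bob
FRED=150   # reserved by datapool/fred
AVAIL=$(( POOL - BOB - FRED ))
printf 'datapool/joe sees %d.%02d GB available\n' $(( AVAIL / 100 )) $(( AVAIL % 100 ))
```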
Changing the mountpoint
With a traditional file system, in operating systems other than Oracle Solaris 11, changing a mountpoint would require:
• Unmounting the file system
• Making a new directory
• Editing /etc/vfstab
• Mounting the new file system
With ZFS it can be done with a single command. In the next example, let's move datapool/fred to a
directory just called /fred.
# zfs set mountpoint=/fred datapool/fred
Notice that the single command changed not only datapool/fred but also all of its children. No unmounting, no making directories; just change the mountpoint.
All of these properties are preserved across exporting and importing of zpools.
# zpool export datapool
# zpool import datapool
Everything comes back exactly where you left it, before the export.
ZFS Volumes (zvols)
So far we have only looked at one type of dataset: the file system. Now let's take a look at zvols and
what they do.
Volumes provide a block level (raw and cooked) interface into the zpool. Instead of creating a file
system where you place files and directories, a single object is created and then accessed as if it
were a real disk device. This would be used for things like raw database files, virtual machine disk
images and legacy file systems. Oracle Solaris also uses this for the swap and dump devices when
installed into a zpool.
# zfs list -r rpool
NAME USED AVAIL REFER MOUNTPOINT
rpool 7.91G 54.6G 39K /rpool
rpool/ROOT 6.36G 54.6G 31K legacy
rpool/ROOT/solaris 6.36G 54.6G 5.80G /
rpool/ROOT/solaris/var 467M 54.6G 226M /var
rpool/dump 516M 54.6G 500M -
rpool/export 6.49M 54.6G 33K /export
rpool/export/home 6.39M 54.6G 6.39M /export/home
rpool/export/ips 63.5K 54.6G 32K /export/ips
rpool/export/ips/example 31.5K 54.6G 31.5K /export/ips/example
rpool/swap 1.03G 54.6G 1.00G -
In this example, rpool/dump is the dump device for Solaris and it is 516MB. rpool/swap is the swap device and it is 1GB. As you can see, you can mix file systems and volumes within the same pool.
Use zfs create -V to create a volume. Unlike a file system dataset, you must specify the size of the device when you create it, but you can change it later if needed. It's just another dataset property.
# zfs create -V 2g datapool/vol1
Expanding a volume is just a matter of setting the dataset property volsize to a new value. Be
careful when lowering the value as this will truncate the volume and you could lose data. In this
next example, let's grow our volume from 2GB to 4GB. Since there is a UFS file system on it, we'll
use growfs to make the file system use the new space.
# zfs set volsize=4g datapool/vol1
# growfs /dev/zvol/rdsk/datapool/vol1
Warning: 4130 sector(s) in last cylinder unallocated
/dev/zvol/rdsk/datapool/vol1: 8388574 sectors in 1366 cylinders of 48 tracks,
128 sectors
4096.0MB in 86 cyl groups (16 c/g, 48.00MB/g, 11648 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
7472672, 7571104, 7669536, 7767968, 7866400, 7964832, 8063264, 8161696,
8260128, 8358560
Like zpools, ZFS file systems are also versioned, and can be upgraded with zfs upgrade. To see the supported file system versions, use zfs upgrade -v.
# zfs upgrade -v
The following filesystem versions are supported:
VER DESCRIPTION
--- --------------------------------------------------------
1 Initial ZFS filesystem version
2 Enhanced directory entries
3 Case insensitive and File system unique identifier (FUID)
4 userquota, groupquota properties
5 System attributes
Snapshots
A ZFS snapshot is a read-only, point-in-time copy of a dataset, created with zfs snapshot.
# zfs snapshot datapool/bob@now
The value after the @ denotes the name of the snapshot. Any number of snapshots can be taken.
# zfs snapshot datapool/bob@just-a-bit-later
# zfs snapshot datapool/bob@even-later-still
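If you ever need to pull the dataset and snapshot names apart in a script, ordinary shell parameter expansion does it (a small sketch, not part of the lab):

```shell
# Split dataset@snapshot at the @ separator.
snap="datapool/bob@just-a-bit-later"
dataset=${snap%@*}   # strip the @ and everything after it
name=${snap#*@}      # strip everything up to and including the @
echo "dataset=$dataset snapshot=$name"
```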
Let's delete these snapshots so they don't get in the way of our next example.
# zfs destroy datapool/bob@even-later-still
# zfs destroy datapool/bob@just-a-bit-later
# zfs destroy datapool/bob@now
Now that we can create these point in time snapshots, we can use them to create new datasets.
These are called clones. They are datasets, just like any other, but start off with the contents from
the snapshot. Even more interesting, these clones only require space for the data that differs from the snapshot. That means that if five clones are created from a single snapshot, only one copy of the common data is required.
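A quick space sketch (the per-clone delta size is a made-up number, just to illustrate the sharing):

```shell
# Five clones of a snapshot share one copy of the common data;
# each clone pays only for the blocks it changes.
COMMON_MB=1024   # data in the snapshot (our 1 GB file)
CLONES=5
DELTA_MB=2       # hypothetical per-clone changes
naive=$(( CLONES * COMMON_MB ))
actual=$(( COMMON_MB + CLONES * DELTA_MB ))
echo "full copies would need ${naive} MB; clones need about ${actual} MB"
```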
Remember that datapool/bob has a 1GB file in it? Let's snapshot it, and then clone it a few times to
see this.
# zfs snapshot datapool/bob@original
# zfs clone datapool/bob@original datapool/newbob
# zfs clone datapool/bob@original datapool/newfred
# zfs clone datapool/bob@original datapool/newpat
# zfs clone datapool/bob@original datapool/newjoe
Let's use zfs list to get a better idea of what's going on.
# zfs list -r -o space datapool
NAME                              AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
datapool                          1.19G  6.63G         0     40K              0      6.63G
datapool/bob                      1.19G  1.00G         0   1.00G              0          0
datapool/fred                     2.00G   159K         0     32K              0       127K
datapool/fred/documents           2.00G   127K         0     34K              0        93K
datapool/fred/documents/audio     2.00G    31K         0     31K              0          0
datapool/fred/documents/pictures  2.00G    31K         0     31K              0          0
datapool/fred/documents/video     2.00G    31K         0     31K              0          0
datapool/joe                      1.19G    31K         0     31K              0          0
datapool/newbob                   1.19G    18K         0     18K              0          0
datapool/newfred                  1.19G    18K         0     18K              0          0
datapool/newjoe                   1.19G    18K         0     18K              0          0
datapool/newpat                   1.19G    18K         0     18K              0          0
datapool/old                      1.19G    22K         0     22K              0          0
datapool/pat                      1.19G    31K         0     31K              0          0
datapool/vol1                     5.19G  4.13G         0    125M          4.00G          0
We can see that there's a 1GB file in datapool/bob. Right now, that's the dataset being charged with
the copy, although all of the clones can use it.
Now let's delete it in the original file system, and all of the clones, and see what happens.
# rm /datapool/*/bigfile
# zfs list -r -o space datapool
NAME                              AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
datapool                          1.19G  6.63G         0     40K              0      6.63G
datapool/bob                      1.19G  1.00G     1.00G     31K              0          0
datapool/fred                     2.00G   159K         0     32K              0       127K
datapool/fred/documents           2.00G   127K         0     34K              0        93K
datapool/fred/documents/audio     2.00G    31K         0     31K              0          0
datapool/fred/documents/pictures  2.00G    31K         0     31K              0          0
datapool/fred/documents/video     2.00G    31K         0     31K              0          0
datapool/joe                      1.19G    31K         0     31K              0          0
datapool/newbob                   1.19G    19K         0     19K              0          0
datapool/newfred                  1.19G    19K         0     19K              0          0
datapool/newjoe                   1.19G    19K         0     19K              0          0
datapool/newpat                   1.19G    19K         0     19K              0          0
datapool/old                      1.19G    22K         0     22K              0          0
datapool/pat                      1.19G    31K         0     31K              0          0
datapool/vol1                     5.19G  4.13G         0    125M          4.00G          0
Notice that the 1GB has not been freed (the pool's available space is still 1.19G), but the USEDSNAP value for datapool/bob has gone from 0 to 1GB, indicating that the snapshot is now holding that 1GB of data. To free that space you will have to delete the snapshot; in this case, you would also have to delete any clones derived from it.
# zfs destroy datapool/bob@original
cannot destroy 'datapool/bob@original': snapshot has dependent clones
use '-R' to destroy the following datasets:
datapool/newbob
datapool/newfred
datapool/newpat
datapool/newjoe
# zfs destroy -R datapool/bob@original
Now the 1GB that we deleted has been freed, because the snapshot holding it (along with its dependent clones) has been destroyed.
One last example and we'll leave snapshots. You can also take a snapshot of a dataset and all of its
children. A recursive snapshot is atomic, meaning that it is a consistent point in time picture of the
contents of all of the datasets. Use -r for a recursive snapshot.
# zfs snapshot -r datapool/fred@now
Compression
# mkfile 1g /datapool/bob/bigfile
Now let's turn on compression for datapool/bob and copy the original 1GB file. Verify that you now
have 2 separate 1GB files when this is done.
# zfs set compression=on datapool/bob
# cp /datapool/bob/bigfile /datapool/bob/bigcompressedfile
# ls -la /datapool/bob
total 2097450
drwxr-xr-x 2 root root 4 Nov 22 04:29 .
drwxr-xr-x 6 root root 6 Nov 22 04:26 ..
-rw------- 1 root root 1073741824 Nov 22 04:29 bigcompressedfile
-rw------T 1 root root 1073741824 Nov 22 04:28 bigfile
There are now 2 different 1GB files in /datapool/bob, but df only says 1GB is used. It turns out that
mkfile creates a file filled with zeroes. Those compress extremely well - too well, as they take up no
space at all. To make things even more fun, copy the compressed file back on top of the original and
they will both be compressed, and you'll get an extra 1GB of free space back in the pool.
# cp /datapool/bob/bigcompressedfile /datapool/bob/bigfile
# zfs list datapool/bob
NAME USED AVAIL REFER MOUNTPOINT
datapool/bob 31K 2.19G 31K /datapool/bob
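You can see the same effect outside ZFS: long runs of zeroes compress to almost nothing with any compressor. A quick sketch using gzip (gzip is just for illustration; ZFS uses its own compression and stores runs of zeroes without allocating blocks at all):

```shell
# Compress 1 MiB of zeroes and report the compressed size.
SIZE=$(dd if=/dev/zero bs=1024 count=1024 2>/dev/null | gzip -c | wc -c)
echo "1 MiB of zeroes -> ${SIZE} bytes after gzip"
```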
NFS
Each file system dataset has a property called sharenfs. This can be set to the values that you would
typically place in /etc/dfs/dfstab. See the manual page for share_nfs for details on specific settings.
Create a simple pool called datapool with 3 datasets, fred, barney and dino.
# zpool create datapool c4t0d0
# zfs create datapool/fred
# zfs create datapool/barney
# zfs create datapool/dino
Verify that no file systems are shared and that the NFS server is not running.
# share
# svcs nfs/server
STATE STIME FMRI
disabled 12:47:45 svc:/network/nfs/server:default
# zfs set sharenfs=ro datapool/fred
# zfs set sharenfs=on datapool/barney
# share
# svcs nfs/server
Not only are the file systems now shared via NFS, the NFS server is also running.
Now let's export the pool and notice that the NFS shares have gone away, but the NFS server is still
running.
# zpool export datapool
# share
# svcs nfs/server
STATE STIME FMRI
online 13:08:42 svc:/network/nfs/server:default
Notice that when the pool is imported, the NFS shares come back.
# zpool import datapool
# share
- /datapool/fred sec=sys,ro ""
- /datapool/barney rw ""
Notice the new vdev type of spare, which appears when a pool is created with spare devices (for example: zpool create datapool mirror disk1 disk2 spare disk3). Spares designate a set of devices to be used in case too many errors are reported on a device in a data vdev.
Now let's put some data in the new pool. /usr/share/man is a good source of data. The sync command simply flushes the file system buffers so that the following disk space usage command will be accurate.
# cp -r /usr/share/man /datapool
# sync
# df -h /datapool
Filesystem Size Used Avail Use% Mounted on
datapool 460M 73M 388M 16% /datapool
Let's destroy one of the mirror halves, and see what ZFS does about it.
# dd if=/dev/zero of=/dev/dsk/disk1 bs=1024k count=100 conv=notrunc
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 2.23626 s, 46.9 MB/s
# zpool scrub datapool
Since the data errors were injected silently, we had to tell ZFS to compare all of the replicas; zpool scrub does exactly that. When it finds an error, it generates an FMA error report and then tries to correct the error by rewriting the block and reading it again. If too many errors occur, or the rewrite/reread cycle still fails, a hot spare is requested, if available. Notice that the hot spare is automatically resilvered and the pool is returned to the desired availability.
This can also be seen in the FMA error and fault reports.
# fmstat
module ev_recv ev_acpt wait svc_t %w %b open solve memsz bufsz
cpumem-retire 0 0 0.0 5.8 0 0 0 0 0 0
disk-transport 0 0 0.0 1954.1 0 0 0 0 32b 0
eft 431 0 0.0 32.4 0 0 0 0 1.3M 0
ext-event-transport 1 0 0.0 15.8 0 0 0 0 46b 0
fabric-xlate 0 0 0.0 6.1 0 0 0 0 0 0
fmd-self-diagnosis 183 0 0.0 3.8 0 0 0 0 0 0
io-retire 0 0 0.0 6.0 0 0 0 0 0 0
sensor-transport 0 0 0.0 201.8 0 0 0 0 32b 0
ses-log-transport 0 0 0.0 196.3 0 0 0 0 40b 0
software-diagnosis 0 0 0.0 5.8 0 0 0 0 316b 0
software-response 0 0 0.0 6.1 0 0 0 0 316b 0
sysevent-transport 0 0 0.0 3527.4 0 0 0 0 0 0
syslog-msgs 1 0 0.0 220.9 0 0 0 0 0 0
zfs-diagnosis 4881 4843 0.6 11.0 1 0 1 1 184b 140b
zfs-retire 29 0 0.0 433.3 0 0 0 0 8b 0
The zfs-diagnosis module was invoked each time a ZFS error was discovered. Once the error threshold was exceeded, the zfs-retire agent was called to record the fault and start the hot-sparing process. An error log message was also written (syslog-msgs > 0).
Now let's put things back the way they were.
# zpool detach datapool disk1
# zpool replace datapool disk3 disk1
# zpool status datapool
pool: datapool
state: ONLINE
scan: resilvered 72.6M in 0h0m with 0 errors on Tue Apr 19 13:21:42 2011
config: