
Block Reclamation

Why is Block Space Reclamation needed?

To ensure a thin storage environment stays thin

ashwinwriter@gmail.com
July, 2014

Background: Thin provisioning allows administrators to allocate logical capacity that is greater than a storage system's total physical capacity. It does so by using on-demand block allocation based on host writes, rather than allocating all of the blocks during the initial volume creation. As a result of this on-demand approach to allocating actual physical storage capacity, customers can realize significant economic benefits by over-provisioning, or thin provisioning, their storage. By and large, this is due to not having to commit considerable storage capacity up front (as with thick provisioning) to users or business groups that often consume only a fraction of the allocated physical capacity.
Thin provisioning broke the direct relationship between storage purchasing and provisioning, which in the past had led to shockingly low levels of capacity utilization despite high levels of investment.
So, how did it all begin? Traditional storage provisioning maintains a one-to-one map between internal disk drives and the capacity used by servers. In the world of block storage, a server would see a fixed-size drive, volume or LUN, and every bit of that capacity would exist on hard disk drives residing in the storage array. The 100 GB C drive in a Windows server, for example, would access 100 GB of reserved RAID-protected capacity on a set of disk drives in a storage array.
The simplest implementation of thin provisioning is straightforward - Storage
capacity is aggregated into pools of same-sized pages, which are then
allocated to servers on demand rather than on initial creation. In our
example, the 100 GB C drive might contain only 10GB of files, and this space
alone would be mapped to 10 GB of capacity in the array. As new files are
written, the array would pull additional capacity from the free pool and
assign it to that server.
This type of allocate-on-write thin provisioning is fairly widespread today.
Most midrange and enterprise storage arrays, and some smaller devices,
include this capability either natively or as an added-cost option.

All was good, until the problem became apparent: such systems only stay THIN for a time. Most file systems use 'clear' space for new files to avoid fragmentation; deleted content is simply marked unused at the file-system layer rather than zeroed out or otherwise freed up at the storage layer. These systems will eventually end up occupying their entire allocation of storage, even without much additional new data being written.
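
This behaviour is easy to reproduce on a Linux host with a sparse file standing in for a thin LUN (a rough sketch; it requires root, and the file name and sizes are arbitrary):

# Create a 1 GB sparse "thin LUN" and put a file system on it
truncate -s 1G thin.img
mkfs.ext4 -F -q thin.img
mkdir -p mnt && mount -o loop thin.img mnt
du -h thin.img      # small: only file-system metadata has been allocated so far
dd if=/dev/zero of=mnt/bigfile bs=1M count=200
du -h thin.img      # ~200 MB: the backing store grew on demand
rm mnt/bigfile && sync
du -h thin.img      # still ~200 MB: the delete freed space only inside the file system

The same pattern plays out between a host file system and a thin LUN on an array: the array keeps the blocks allocated until something explicitly tells it they are free.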

Root cause: basically a lack of communication between applications and storage systems. File systems aren't generally thin-aware, and no mechanism exists to report when capacity is no longer needed. The key to effective thin provisioning is discovering opportunities to reclaim unused capacity. That is, as soon as data is deleted from the host file system, the freed storage space should automatically be reclaimed. Hence, even though thin provisioning worked wonders at the beginning, as files got deleted or moved on the host file system, the LUN/volume on the storage array eventually filled up, defeating the very purpose of thin provisioning.
So, what did it lack?
Thin provisioning lacked a space reclamation mechanism. It didn't know how to reclaim the dead space on the storage array.
Let's examine what exactly we mean by dead space.
In order to fully appreciate dead space reclamation, one must examine the host front-end and the storage back-end. Once a host writes to a thin-provisioned volume, physical capacity is allocated to the host file system. Sounds good so far! But unfortunately, if the host deletes the file, only the host file system frees up that space.

As seen in the illustration above, the physical capacity of the storage system remains unchanged. In other words, the storage system does not free up the capacity behind the deleted host file; this stranded capacity is commonly referred to as dead space, and the act of freeing it as hole punching.
NetApp has long made use of its SnapDrive plug-in:
Basically, storage managed by SnapDrive on the server/host system logically appears to come from a locally attached storage subsystem. In reality, the capacity comes from the NetApp system. One advantage of this is that it allows NetApp to use interfaces to the Windows API (specifically by becoming part of the device driver layer and using the IOCTL functions) to watch for file-system changes on the host, and inform the NetApp system of these changes via new and additional SCSI commands.

To address this thin provisioning limitation, the SCSI T10 Technical Committee
established the T10 SCSI Block Command 3 (SBC3) specification which defines
the UNMAP command for a diverse spectrum of storage devices including
hard disk drives (HDDs) and numerous other storage media.
Using SCSI UNMAP, IT administrators can now reclaim host file system space
and back-end storage dead space.
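
On a Linux host, an UNMAP can also be issued by hand with sg_unmap from the sg3_utils package, which is a handy way to confirm that a given LUN honours the command (a sketch; the device name and LBA range are placeholders, and the unmapped range is discarded, so do not aim it at data you care about):

# Check whether the LUN advertises logical block provisioning (VPD page 0xB2)
sg_vpd --page=lbpv /dev/sdb
# Unmap 2048 blocks starting at LBA 0x10000 on /dev/sdb
sg_unmap --lba=0x10000 --num=2048 /dev/sdb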
However, not only does SCSI UNMAP require T10 SBC3-compliant SCSI hardware, it also requires the necessary software application programming interfaces (APIs), such as those now included in Windows Server 2012 and Windows 8. Previous Windows OS releases do not support the necessary APIs.
Unmapping, in simple words, is basically de-allocating the relationship between an LBA and a physical block in a logical unit.
This is also known as HOLE PUNCHING on the file-system side.
What is hole punching?
Hole punching in file systems marks a portion of a file as unneeded, so that the storage associated with that portion of the file can then be freed.
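
On Linux, hole punching is exposed through the fallocate() system call; the util-linux fallocate command gives a quick way to observe it (a sketch with an arbitrary file and range):

# Create a 100 MB file, then punch a 20 MB hole starting at offset 10 MB
dd if=/dev/zero of=testfile bs=1M count=100
du -h testfile        # ~100 MB actually allocated
fallocate --punch-hole --keep-size --offset 10M --length 20M testfile
du -h testfile        # ~80 MB: the punched range no longer consumes space
ls -lh testfile       # the logical file size is still 100 MB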
Background: Block reclamation was always a SAN hardware feature, as it was easier to get control over the entire stack [OS/kernel/file system/driver/storage]. However, when a LUN is attached to the host system, the host takes over the LUN and formats it with either an open-source or a proprietary file system such as NTFS. For something like NTFS, where the data structures are proprietary and not officially documented, a storage vendor generally provides its own plug-in to take advantage of block reclamation, but not any longer.

The good news is that a number of popular OS/file-system vendors are now providing this feature natively in their new product releases by adopting the T10 SBC3 specification. For more information on T10, please see the last page.

OS & Storage vendors adopting T10 Standards to reclaim space

Microsoft Adopts the T10 Standard with Windows 2012/Windows 8

NTFS Filesystem : As files are added to an NTFS volume, more entries are
added to the MFT and so the MFT increases in size. When files are deleted
from an NTFS volume, their MFT entries are marked as free and may be
reused, but the MFT does not shrink. Thus, space used by these entries is not
reclaimed from the disk.
Microsoft solved this problem by adopting the T10 standard in Windows 2012/Windows 8.
By default, Windows 8 and Windows Server 2012 enable real-time space reclamation using SCSI UNMAP. That means it does not require any third-party APIs to reclaim the dead space.
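
Whether delete notifications (and therefore real-time UNMAP/TRIM) are enabled can be checked from an elevated prompt, and a retrim of a volume can be requested on demand from PowerShell (a sketch; the drive letter is an example):

# 0 = delete notifications (UNMAP/TRIM) are enabled, 1 = disabled
fsutil behavior query DisableDeleteNotify
# Ask the file system to resend UNMAP/TRIM for all free space on volume E:
Optimize-Volume -DriveLetter E -ReTrim -Verbose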
So, what's new with Windows 2012 that allows space reclaim?
A new API implementation, known as IOCTL DSM allocation, retrieves the logical block address (LBA) status of thin-provisioned LUNs. All logical blocks are grouped into slabs (clusters), which are classified into mapped, de-allocated or anchored states; Windows considers the latter two to be unmapped states. This is transparent to users and ensures the Windows thin provisioning framework, which includes space reclamation, performs as intended.
For further information about the Windows 2012 Thin Provisioning features,
reference the following link:
http://msdn.microsoft.com/en-us/library/windows/hardware/hh770514.aspx

Caution:
As previously described, anytime a large file is deleted, multi-level space reclamation occurs. This may impact performance depending on how often users or applications delete large files. Proper planning should help to alleviate any real-time space reclamation performance impacts and can be accomplished by establishing performance baselines.
If Windows space reclamation planning identifies a high probability of
performance impact, consider the following option:
Real-time space reclamation can be disabled for large file deletions via the following Windows registry key:
1. HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem
2. Set the DisableDeleteNotification value to 1
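
The same switch can also be flipped without editing the registry directly, using fsutil from an elevated prompt (to the best of my knowledge this toggles the same setting; verify on your release):

# Disable real-time delete notifications (UNMAP) for all volumes on this host
fsutil behavior set DisableDeleteNotify 1
# Re-enable real-time space reclamation later if desired
fsutil behavior set DisableDeleteNotify 0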
Note: This Windows registry setting affects all LUNs on that particular
Windows Host. For further information, visit the following:
Plan and Deploy Thin Provisioning
http://technet.microsoft.com/en-us/library/jj674351.aspx

Symantec Storage Foundation provides middleware functionality for its software-based dynamic disks, which goes beyond typical Windows NTFS volume definitions. This differentiation carries over to its alternate space reclamation approach. Rather than using SCSI UNMAP commands, Symantec Storage Foundation employs SCSI WRITE SAME commands to achieve the same end result.
Thin Reclamation using the Thin Reclamation API and the Thin Provisioning Reclamation Add-on:
http://www.symantec.com/business/support/index?page=content&id=HOWTO78517
http://public.dhe.ibm.com/common/ssi/ecm/en/tsw03164usen/TSW03164USEN.PDF

Red Hat Enterprise Linux 6 introduced SCSI UNMAP support in the ext4 file system to release space on SAN platforms that also implement the UNMAP command.
The Linux kernel uses discard requests to inform the storage that a given range of blocks is no longer in use.
How to use the discard option:
Create a new ext4 file system and mount it using the discard option. This is the piece that tells the file system to send the SCSI UNMAP command to the storage array when it is done with blocks of storage.
[root@redhat ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.0 (Santiago)
[root@redhat ~]# mkfs.ext4 -L DemoVol /dev/sdb
[root@redhat ~]# mount -o discard LABEL=DemoVol /files/
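
If a file system was not mounted with the discard option, a batch discard can be issued on demand instead (a sketch; fstrim ships with util-linux on later RHEL 6 updates and needs a storage stack that supports discards):

# Discard all unused blocks on the mounted file system in one pass
[root@redhat ~]# fstrim -v /files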
For more details, please see the Dell community page:
http://en.community.dell.com/techcenter/b/techcenter/archive/2011/06/29/native-free-space-recovery-in-red-hat-linux.aspx

VMware began supporting SCSI UNMAP commands when it introduced the vSphere 5.0 Storage APIs for Array Integration (VAAI) primitives. However, VMware discovered issues affecting Storage vMotion and VM snapshot consolidation, which led it to alter its SCSI UNMAP support in the newer release, vSphere 5.1. Specifically, vSphere 5.1 does not provide proactive or automatic space reclamation via SCSI UNMAP commands. Manual user intervention or scripts must be used to realize the SCSI UNMAP benefits, preferably outside of peak business hours.

SUMMARY

Note: With T10 SBC3 adoption by both OS & storage vendors, proprietary APIs will not be required.

Demonstration to test the block reclamation theory


Using NetApp SnapDrive API
Scenario: [All Thick]
Thick Volume = 4 GB
Thick LUN = 2 GB
Windows 2008 R2 NTFS file system = 1.97 GB [after formatting]
This is how the volume usage looks in OnCommand System Manager.

We copied 1.33GB of data on this LUN.

This is how LUN space usage stands after copying 1.33GB to the LUN.

We created a snapshot

Now, we intend to delete 655MB from the NTFS file system on host OS.

Shift + delete -> New folder (5) of 655MB size.

We checked the new available space on the volume (1.33 GB - 655 MB ≈ 703 MB), and it looks like NTFS has unmapped the blocks and regained the space.

However, when we checked the space usage on the LUN [storage array], it looks like the blocks freed on the host haven't been reflected on the LUN side.

It means the storage array [filer] has no idea about the blocks that have been freed on the file system, and it's still showing the old allocated space.

How NetApp uses SnapDrive API to tackle this problem

How to reclaim the unused BLOCKS?

NetApp SnapDrive: SnapDrive runs its Space Reclaimer scanner and informs the NetApp filer that these blocks should be freed on the storage subsystem.

SnapDrive predicts, based on the initial scan, that it can free up to 670 MB worth of blocks back to the storage pool. [Remember, we deleted 655 MB of data on the host NTFS file system.]

Enable the check box if you wish to [it is quite self-explanatory] and then click OK to begin Space Reclamation.
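
Besides the GUI shown above, the same scan can, as far as I recall, also be started from the SnapDrive command line; treat the exact syntax as an assumption and check the SnapDrive for Windows documentation for your release:

# Start Space Reclaimer against the mount point backing the LUN (syntax from memory, verify first)
sdcli spacereclaimer start -d G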

Once the space reclamation process was finished, we checked the LUN space
on the Filer. It looks like we have regained space on the LUN.

Bottom line: with a THICK LUN, there is no benefit as far as reclaiming the dead space goes, because you can't really give that space to any other volume; it is fixed to that volume. However, it does improve space reporting in NetApp System Manager and/or other reporting tools. Hence, you would no longer see 100% LUN usage in the reporting tool in spite of having plenty of space on the HOST FILE SYSTEM.

Unfortunately, as I said earlier, the space we reclaimed above has just gone back to the LUN, but not to the volume or aggregate.
The only way we could have made use of the reclaimed space is by giving it back to either:

Volume [by making the LUN THIN & the VOLUME THICK]

Usually, as a best practice, we create a one-to-one mapping between LUN and volume; hence, in this case, even if the space is given back to the volume, we cannot actually share the regained space with other LUNs, unless you have multiple LUNs sharing the same volume.

Or,

Aggregate [by making both LUN & VOLUME THIN]

VMware introduced Space Reclamation as part of VAAI (VMware Storage APIs for Array Integration)
vSphere 5.0 introduced the VAAI Thin Provisioning Block Space Reclamation (UNMAP) primitive. This feature was designed to efficiently reclaim deleted space to meet continuing storage needs. ESXi 5.x issues UNMAP commands for space reclamation during several operations.
When is 'UNMAP' called?
When you delete virtual machine files from a VMFS datastore, or migrate them through Storage vMotion, the datastore frees blocks of space and informs the storage array via the UNMAP command, so that the blocks can be reclaimed.
Soon a problem was discovered with UNMAP: poor system performance.
As a result, VMware recommended disabling UNMAP on ESXi 5.0 hosts with thin-provisioned LUNs.
For this reason, the UNMAP operation has been disabled by default since ESXi500-201112001 (ESXi 5.0 Patch 02) and ESXi 5.0 Update 1, and space reclamation is now a manual process. This means that tasks such as Storage Migration and Snapshot Consolidation do not automatically attempt UNMAP on the back-end LUN. If you continue to use an unpatched ESXi 5.0 host, you must manually disable UNMAP on all hosts. For more information, see Disabling VAAI Thin Provisioning Block Space Reclamation (UNMAP) in ESXi 5.0 (2007427).
ESXi 5.0 Update 1 includes an updated version of vmkfstools that provides an
option (-y) to send the UNMAP command to the storage arrays, regardless of
the ESXi host's global setting. This option also exists on earlier ESXi versions,
but does not reclaim the space when run.
Note: When you run vmkfstools --help, the -y option is not displayed in the
help output.
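
A typical manual run on ESXi 5.0 Update 1 or 5.1 looks like this (a sketch; the datastore name is a placeholder, and the percentage controls how much of the free space the temporary balloon file may consume):

# Change into the datastore to be reclaimed
cd /vmfs/volumes/datastore1
# Issue UNMAP using up to 60% of the datastore's free space for the temporary file
vmkfstools -y 60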
To avoid the use of UNMAP commands on Thin Provisioned LUNs:
1. Log in to your host using the Tech Support mode. For more information
on using Tech Support mode, see Tech Support Mode in ESXi 4.1 and 5.0
(1017910).
2. From your ESXi 5.0 host, run this command:
esxcli system settings advanced set --int-value 0 --option /VMFS3/EnableBlockDelete

3. To verify this setting, run this command:
esxcli system settings advanced list --option /VMFS3/EnableBlockDelete
Path: /VMFS3/EnableBlockDelete
Type: integer
Int Value: 0 <<<<<<<<<< 0 means Disabled
Default Int Value: 1
Min Value: 0
Max Value: 1
String Value:
Default String Value:
Valid Characters:
Description: Enable VMFS block delete

KB article: Disabling VAAI Thin Provisioning Block Space Reclamation
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2007427
KB article: Using vmkfstools to manually reclaim VMFS deleted blocks
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2014849

Note: To verify that you have a T10 storage array, consult the VMware
Compatibility Guide.
http://www.vmware.com/resources/compatibility/search.php

List of VAAI capable storage arrays:


http://v-reality.info/2010/10/list-of-vaai-capable-storage-arrays/

FAQ on THIN PROVISIONING (NetApp)
WHAT IS BASIC THIN PROVISIONING (GENERIC DEFINITION)?
Answer: Thin provisioning provides the ability to allocate space from a pool of
storage to a volume or LUN only when the data is written, rather than
preallocating the space. This allows the storage to be purchased
incrementally as it is needed, rather than purchasing large amounts of storage
upfront based on guesses about storage requirements.
SO, WHAT IS THICK PROVISIONING?
Answer: Thick provisioning is the traditional approach of fully preallocating all
space to a volume or LUN on its creation, rather than waiting for data to be
written to the volume or LUN.
WHAT ARE THE KEY BENEFITS OF USING NETAPP THIN PROVISIONING?
Answer: NetApp thin provisioning can increase storage utilization while providing the flexibility to address the challenges in a dynamic IT environment. Since space is not taken from the storage pool until data is written to a volume or LUN, the unused space is available to any thin-provisioned volume or LUN using that common shared pool. For more details about NetApp thin provisioning, refer to TR-3563, NetApp Thin Provisioning Increases Storage Utilization with On-Demand Allocation.
http://media.netapp.com/documents/tr-3563.pdf
CAN I GROW OR SHRINK THE SHARED STORAGE POOL (AGGREGATE)?
Answer: The aggregate can be expanded, but cannot be reduced.
CAN I ALLOCATE MORE STORAGE TO VOLUMES AND LUNS THAN IS
AVAILABLE IN THE AGGREGATE?
Answer: Yes, this is possible when volumes or LUNs use thin provisioning. This
is known as overcommitment.
IS THERE AN ADVANTAGE TO THIN PROVISIONING A LUN WITHIN A
VOLUME, BUT NOT THIN PROVISIONING THE VOLUME? WHEN WOULD IT
BE USED?
Answer: Doing this is useful if it is desirable to have the LUNs use the volume
as the shared pool of guaranteed space instead of the aggregate.

CAN I USE THIN PROVISIONING WITH OTHER NETAPP STORAGE EFFICIENCY FEATURES?
Answer: Yes. As a matter of fact, using other NetApp storage efficiency
features, such as deduplication and FlexClone, can provide even greater
storage utilization.
CAN THIN PROVISIONING BE DISABLED AT ANY TIME?
Answer: Yes. It is possible to turn off NetApp thin provisioning at any time for
volumes or LUNs.

WHERE WILL I SEE THE INCREASE IN STORAGE UTILIZATION WHEN I USE THIN PROVISIONING?
Answer: The easiest way to recognize the increase in storage utilization as a
result of using NetApp thin provisioning is to measure storage utilization with
the Operations Manager Storage Efficiency Dashboard and/or the storage
efficiency section of My AutoSupport.
CAN I SET THRESHOLDS BASED ON THE FULLNESS OF THE AGGREGATE?
Answer: Yes. In Operations Manager, use aggrFullThreshold and
aggrNearlyFullThreshold.
CAN I SET THRESHOLDS BASED ON THE FULLNESS OF THE VOLUME?
Answer: Yes. In Operations Manager, use volFullThreshold and
volNearlyFullThreshold.
CAN I SET THRESHOLDS BASED ON THE LEVEL OF OVERCOMMITMENT OF
THE AGGREGATE?
Answer: Yes. In Operations Manager, use aggrOvercommittedThreshold and
aggrNearlyOvercommittedThreshold.

CAN I SET THRESHOLDS BASED ON THE LEVEL OF OVERCOMMITMENT OF THE VOLUME?
Answer: Yes. In Operations Manager, use volOvercommittedThreshold and
volNearlyOvercommittedThreshold.
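
These thresholds can normally be adjusted either in the Operations Manager web UI or from the DataFabric Manager CLI; a hedged sketch using the option names from the answers above (exact CLI syntax may vary by release):

# Raise the aggregate nearly-full and full thresholds
dfm option set aggrNearlyFullThreshold=85
dfm option set aggrFullThreshold=95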
CAN I USE BOTH THIN AND THICK PROVISIONING TOGETHER?
Answer: Yes. It is possible to have thin-provisioned and thick-provisioned
volumes and LUNs within the same aggregate. Thin provisioning can be
enabled at any time without any performance impact.

Virtual machine disk provisioning methods (VMware)
VMDKs can be provisioned using two different methods, namely thick
provisioning and thin provisioning.
Thick provisioning can be categorized into two methods:
1. Lazy zeroed thick
2. Eager zeroed thick
Before we define what these two are, it is important to understand what
'zeroing' is.
Zeroing is the process of writing zeroes to the disk blocks corresponding to a VMDK, to make sure that any existing data in those blocks is not exposed via the new VMDK.

Eager zeroed thick provisioning: An eager zeroed thick disk, when created, gets all of the space allocation it needs, and all of the disk blocks allocated to it are zeroed out at the time of creation. Therefore, it takes longer to create than a lazy zeroed or thin-provisioned disk.

However, it offers better first-write performance; this is due to the fact that the disk blocks corresponding to an eager zeroed disk are already zeroed out during its creation.

Lazy zeroed thick provisioning: A lazy zeroed thick disk also gets all of the space allocation it needs at the time of creation, but unlike an eager zeroed disk, it DOES NOT write zeroes to all of the disk blocks. Each disk block is zeroed out only during the first write. Although it doesn't offer the first-write performance of an eager zeroed disk, all subsequent writes to the zeroed blocks have the same performance.

Thin-provisioned disk: This type of disk will not use all of the disk space assigned to it during creation. It will only consume the disk space needed by the data on the disk. For example, if you create a thin VMDK of 100 GB, it will not use 100 GB of space at the back-end. If a 100 MB file is added to the VMDK, then the VMDK will grow by 100 MB only.
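
On the ESXi command line, the three formats map to the -d option of vmkfstools when creating a VMDK (a sketch; the path and size are examples):

# Thin: grows on demand
vmkfstools -c 100g -d thin /vmfs/volumes/datastore1/demo/demo-thin.vmdk
# Lazy zeroed thick: space reserved up front, blocks zeroed on first write
vmkfstools -c 100g -d zeroedthick /vmfs/volumes/datastore1/demo/demo-lazy.vmdk
# Eager zeroed thick: space reserved and zeroed at creation time (slowest to create)
vmkfstools -c 100g -d eagerzeroedthick /vmfs/volumes/datastore1/demo/demo-eager.vmdk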

T10 Technical Committee

For more information on SCSI T10 SBC3 UNMAP:
1. Go to T10 website:
www.t10.org/
2. Click Search docs as shown in the figure below.

3. In the search box enter unmap and click search or press return key.

4. Search should return quite a few documents that you can refer to for more
information on UNMAP command and how it works internally.

Informational articles:
http://www.13thmonkey.org/documentation/SCSI/spc3r23.pdf
http://www.snia.org/sites/default/files2/SDC2011/presentations/monday/FrederickKnight_File_Systems_Thin_Provisioning.pdf
https://communities.netapp.com/community/netappblogs/efficiency/blog/2010/08/04/punching-holes
Thin Provisioning: [Must Read]
http://msdn.microsoft.com/en-us/library/windows/hardware/dn265487(v=vs.85).aspx
Note: For any current working draft, you will need to be a member of T10.

PS: This document is my own small effort to shed light on thin provisioning & block reclamation; there may be some information that is incorrect, and I hope the reader will point it out. Thanks!

Courtesy: T10 org, Symantec, IBM, Redhat, VMware, Dell, Microsoft & NetApp

ashwinwriter@gmail.com
July, 2014
