www.veritest.com • info@veritest.com
Network Appliance™
FAS3050, R200, and EMC Centera G4
Performance Comparison
Test report prepared under contract from Network Appliance
Executive Summary
Key Findings
The back-end storage architecture is not the only factor when evaluating performance. Archival performance also depends on the application use and application server configuration. For this comparison, we generated 80.4 million e-mails using Microsoft LoadSim 2003, populating eight Exchange 2003 servers.
It should be noted that we used a Network Appliance FAS3050 and a FAS3020 as the primary back-end
storage for the Exchange and SQL databases. These were mutually exclusive arrays, separate from the
archive storage platforms. The Exchange host servers connected to the FAS3050 and the FAS3020 via Fibre
Channel using a Brocade 3800 Silkworm Fibre Channel switch. A detailed diagram of the performance test
bed can be found in “Testing Methodology.”
The EMC Centera G4 configuration followed the EMC Centera Best Practices Guide. A total of 16 G4 nodes were deployed: we assigned an access role to 4 nodes and a storage role to the remaining 12 nodes. Each storage node contained four 500GB SATA drives, for a total of forty-eight 500GB drives. All nodes were dedicated to their assigned role. This 3:1 storage-node-to-access-node ratio exceeds the EMC best practices for small-object workloads (average object size under 100KB). The Centera was deployed using the default CPM data protection scheme as well as the Storage Strategy Performance setting to ensure optimal write performance in small-object environments such as this e-mail archival environment.
We compared the Centera G4 against two platforms from Network Appliance: the NearStore® R200 and a single-node FAS3050. Like the FAS3050, the R200 uses a single controller, or node. We configured the R200 with one
aggregate (or pool) of forty-eight 320GB SATA drives and four dedicated Gigabit Ethernet interfaces. The
FAS3050 was also configured with four dedicated Gigabit Ethernet interfaces and a single aggregate of
twenty-four 250GB SATA drives. Aside from capacity differences, both EMC and Network Appliance platforms
used 7200 RPM SATA drives. The Enterprise Vault archive servers communicated with the Centera via the Centera® access APIs. We used a standard CIFS (Windows share) connection to the FAS3050 and R200 for
their respective tests.
We ran two scenarios. In the first scenario, we conducted what was termed a 100% deployment, composed of
eight Exchange 2003 servers and eight Enterprise Vault servers, along with one Microsoft SQL server. We
believed that the eight Exchange 2003 servers would generate a sufficient workload for the eight Enterprise
Vault servers. In the second scenario, we conducted a 50% deployment using four Exchange 2003 servers
and four Enterprise Vault servers, with one Microsoft SQL server.
To create our simulated e-mail environment, we used the industry-standard Microsoft Exchange Server 2003
Load Simulator (LoadSim), available from Microsoft at
http://www.microsoft.com/downloads/details.aspx?familyid=92EB2EDC-3433-47CA-A5F8-0483C7DDEA85&displaylang=en. Using the LoadSim default message mixture, we generated 80.4 million e-
mail messages across 15,000 mailboxes, for an average of 5,360 e-mail messages per user. Each Exchange
Server also contained 5,000 unique e-mails (5 mailboxes with 1,000 messages each), generated using the
Mailsend.exe tool and containing a 50/50 split of embedded text and attached text with a unique string used
for retrieval by the Symantec Enterprise Vault Discovery Accelerator application. We stored these e-mails on
the FAS3050 and FAS3020 Fibre Channel connected arrays mentioned earlier. For the archive storage
hardware, we used an authorized service partner (Unisys) and the publicly available best practices for the
installation and configuration of the Centera G4. We also used the publicly available best practices
documentation from Network Appliance to provision the FAS3050 and the R200. We archived approximately
1.7 million e-mails from a population of 80.4 million for each execution of the Enterprise Vault archiving tasks
during the 100% deployment testing. The archival volume was half this value when running the 50%
deployment tests.
Overall we found that a single-node FAS3050 with twenty-four 250GB 7200RPM drives scaled performance
at a rate 42% greater than a 16-node Centera G4 with forty-eight 500GB 7200RPM drives, even as we
continued to fill the file systems on the FAS3050 past 50% full. Moving from the 50% deployment up to the
100% deployment, the FAS3050 archival rate increased from 157MB/minute to 261MB/minute, a
104MB/minute improvement. The Centera G4 moved from 116MB/minute to 189MB/minute, an improvement
of only 73MB/minute.
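The scaling figure is simply the ratio of the two rate increases. A minimal worked check, using only the rates reported above:

fas3050_50, fas3050_100 = 157.0, 261.0    # FAS3050 MB/minute at 50% and 100% deployment
centera_50, centera_100 = 116.0, 189.0    # Centera G4 MB/minute at 50% and 100% deployment

fas_gain = fas3050_100 - fas3050_50       # 104 MB/minute
centera_gain = centera_100 - centera_50   # 73 MB/minute

# (104 / 73) - 1 = 0.42, i.e. roughly 42% greater scaling
print(f"FAS3050 gain: {fas_gain:.0f} MB/minute, Centera G4 gain: {centera_gain:.0f} MB/minute")
print(f"Scaling advantage: {(fas_gain / centera_gain - 1) * 100:.0f}%")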
Figure: Increase in archival rate (MB/minute) from the 50% to the 100% deployment: FAS3050 +104MB/minute, Centera G4 +73MB/minute.
Please refer to “Testing Methodology,” later in this report, for complete details of how we conducted the
performance testing on the FAS3050, R200, and Centera G4.
Monitoring the archival process over the course of one hour, we found that the Network Appliance R200 (single node) achieved the highest archival performance rate, averaging 275MB/minute of content ingestion, 45% faster than the Centera G4 and 5% faster than the FAS3050. The FAS3050 (single node)
averaged 261MB/minute, while the Centera G4 ingested content at a rate of 189MB/minute, putting the
FAS3050 ingestion rate at a level 38% greater than that of the Centera G4. Moving forward, we considered
this the baseline archival rate for each platform.
Figure: Baseline archival rates (MB/minute) in the 100% deployment: R200 275, FAS3050 261, Centera G4 189.
With the EMC implementation of a RAIN architecture, the storage node containing a single failed disk must be
brought offline prior to replacing the failed disk. Each Centera storage node contains four 500GB SATA
drives. In consultation with EMC technical resources, we received the following instructions for conducting a
drive failure/pull test:
• If a single drive in a storage node is pulled, it cannot immediately be returned to the Centera. The drive must be returned to EMC for repair and a spare drive obtained.
• Upon receipt of the replacement drive, a specific single-drive replacement procedure must then be followed.
In order to simulate a drive failure given this constraint, all drive failure test data used in this report were
obtained by powering off a Centera storage node from the front panel and unplugging the dual power
supplies. We powered off the second storage node in the same manner. Powering the storage nodes back up
brought them back online.
A single-drive failure on the Network Appliance systems resulted in little or no performance degradation. The
R200's archival rate decreased 6.5% during this drive-failure test. The FAS3050 actually showed a slight statistical
improvement in throughput capacity (up 1.5%), albeit at a higher utilization rate. A storage node failure on the
Centera G4 resulted in a 22% drop in performance.
In an archival test while experiencing a single-drive failure, the FAS3050 had an ingestion rate 80% greater
than the Centera G4 (265MB/minute vs. 147MB/minute), while the R200 showed a 74% advantage over the
Centera G4.
Figure: Archival rates (MB/minute) during the single-drive failure test: R200, FAS3050, and Centera G4.
Double-Disk Failure
Today, the likelihood of experiencing a media/bit error during reconstruction of an eight-drive RAID 4/5 set
with 500GB SATA drives has reached 28% (see Appendix C, “Application Data Integrity with RAID-DP,” for
supporting information). With that in mind, we constructed a dual-drive failure scenario. To determine the performance impact on each system, we again measured archival rates using EV Archival Throughput as reported in the Enterprise Vault saveset SQL table.
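A minimal sketch of the arithmetic behind a figure of this magnitude, assuming the SATA unrecoverable error rate of 1 in 10^14 bits cited in Appendix C and a rebuild that must read all seven surviving 500GB drives in full:

# Likelihood of a media/bit error during reconstruction of an eight-drive
# RAID 4/5 set of 500GB SATA drives (assumptions noted above).
uer = 1e-14                    # unrecoverable errors per bit read (SATA spec)
bits_per_drive = 500e9 * 8     # one 500GB drive expressed in bits
surviving_drives = 7           # drives read in full to rebuild the failed drive

error_likelihood = bits_per_drive * surviving_drives * uer
print(f"Bit-error likelihood during rebuild: {error_likelihood:.0%}")   # ~28%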
In a two-drive failure scenario, the FAS3050 again maintained an archival rate within 5% of its baseline.
However, the R200 dropped 36% from its baseline archival rate, while the Centera fell 41% from its baseline.
Relative to each other under a two-drive failure scenario, the FAS3050 archived 124% (2.24 times) faster
than the Centera G4. The R200 experienced an archival rate that was 56.36% faster than the Centera G4.
Again, it’s important to note that the FAS3050 used half the data drives of the Centera G4.
Figure: Archival rates (MB/minute) in the 100% deployment for the R200, FAS3050, and Centera G4: baseline, single-drive failure, and dual-drive failure.
During the index rebuild test, we observed the largest impact to the FAS3050 of any test to this point; its performance dropped 22% from its baseline archival rate. Likewise, the R200 archival rate dropped to its lowest level, off 52% from its baseline archival rate. During this index rebuild test, the R200's archival rate was 18% below that of the Centera G4, and the Centera G4 itself dropped 17% from its baseline. Even so, the FAS3050 maintained a 30% performance advantage over the Centera G4. Another way to look at this is that
the FAS3050 during an index rebuild statistically outperformed the Centera G4 baseline.
Figure: Summary of all measured archival rates in the 100% deployment, including the index rebuild test.
For the 50% deployment, monitoring the archival process over the course of one hour, we found that the FAS3050 averaged 157MB/minute, while the Centera G4 ingested content at a rate of 116MB/minute, putting the FAS3050 ingestion rate at a level 35% greater than that of the Centera G4. Moving forward, we considered this the baseline archival rate for each platform in the 50% deployment.
Figure: Baseline archival rates (MB/minute) in the 50% deployment: FAS3050 157, Centera G4 116.
Single-Disk Failure
In order to test a drive failure condition, we physically pulled a single-disk drive from the drive shelf enclosures
on the FAS3050. We powered off a single storage node (vs. an access node) on the Centera G4. To
determine the performance impact on each system, we measured archival rates using EV Archival
Throughput as reported on the Enterprise Vault saveset SQL table.
As in the 100% deployment scenario, a single-drive failure on the FAS3050 showed no impact on its overall
archival performance. A storage node failure on the Centera showed a subsequent drop in archival
performance of 11%. The FAS3050 ingested content at a rate 54% greater than the Centera G4.
Figure: Archival rates (MB/minute) during the single-drive failure test in the 50% deployment: FAS3050 and Centera G4.
Double-Disk Failure
We then conducted a dual-drive failure test in the 50% deployment test environment. To determine the performance impact on each system, we again measured archival rates using EV Archival Throughput as reported in the Enterprise Vault saveset SQL table.
In a two-drive failure scenario, the FAS3050 maintained an archival rate within 5% of its baseline. With the
failure of two storage nodes, the Centera experienced a 12% decrease in performance relative to its baseline.
Relative to each other, the FAS3050 archived e-mail 48% faster than the Centera G4.
Figure: Archival rates (MB/minute) in the 50% deployment for the FAS3050 and Centera G4: baseline, single-drive failure, and dual-drive failure.
During the Discovery Accelerator search and retrieval of the 20,000 unique e-mails, the archival rate for the FAS3050 fell 7% from its baseline rate, dropping from 157MB/minute to 145MB/minute. The Centera G4 fell 20% from its baseline. Relative to each other, we found that during the discovery process the FAS3050 ingested content at a rate 58% greater than that of the Centera G4. The discovery process itself ran well on both systems, completing in 57 minutes on the FAS3050 and in 1 hour 5 minutes on the Centera.
The index rebuild dropped the FAS3050 content archival rate 24%, from 157MB/minute to 118MB/minute. Once again, by way of comparison, the FAS3050 during an index rebuild statistically outperformed the Centera G4 baseline. The Centera G4 archival performance dropped 14% during the index rebuild. Relative to the FAS3050 performance during its index rebuild, the Centera G4 was 18% slower.
Figure: Summary of all measured archival rates in the 50% deployment for each test.
Testing Methodology
For testing purposes, we created a typical medium-sized enterprise Microsoft Exchange e-mail environment
with 80+ million e-mails, spread evenly across 15,000 mailboxes, and archived a percentage of those e-mails
to three types of NAS storage solutions, using the Symantec Enterprise Vault application. We then measured
the archival throughput performance of Enterprise Vault for each of the NAS storage devices while the
archival tasks were running. Each test was conducted independently. We generated two deployment levels,
based upon differing numbers of available Enterprise Vault and Exchange servers.
The first deployment level consisted of four Enterprise Vault Servers and four Microsoft Exchange 2003
Servers configured in a one-to-one relationship. This deployment is referred to as the 50% deployment level.
The second deployment level consisted of eight Enterprise Vault servers and eight Microsoft Exchange 2003
servers, also configured in a one-to-one relationship. This deployment is referred to as the 100% deployment
level.
For the Microsoft Exchange servers, 15,000 users were generated using the LoadSim 2003 application to
populate the Exchange data evenly across the available Exchange Servers. We then created five unique
mailboxes per Exchange server, and used the mailsend.exe utility to generate 1,000 unique e-mails for each
of those accounts. Therefore, for the 50% testing there were 20,000 unique e-mails available for Discovery
Accelerator search and retrieval, and 40,000 unique e-mails available at the 100% level.
We started by determining the baseline throughput when only archival tasks were executing. We then
introduced scenarios that one would expect to encounter in a real-world deployment, and measured the
Enterprise Vault archival throughput during those situations. Each deployment was configured with a single
Windows 2003 Active Directory server to manage the AD domain and a single SQL Server 2000 to contain
the database information for the Enterprise Vault application. Further, we broke the Enterprise Vault archival
tasks into two rounds of execution.
The first round archived e-mail for 35 LoadSim-generated mailboxes per Exchange Server, along with the 5
unique mailboxes on each Exchange Server. For this first round of archiving tasks, we gave Enterprise Vault
45 minutes to ramp up. We then measured for 60 minutes to determine the baseline archival throughput. To
simulate a single-disk failure, we removed a single parity disk from the active aggregate on the NetApp NAS
devices, and powered off a single storage node on the Centera G4. We then measured the Enterprise Vault
archival throughput for 60 minutes to monitor for any performance change. We then simulated a double-disk
failure by removing a second disk in the aggregate on the NetApp NAS devices, and by powering off a
second storage node on the Centera G4. Again, we measured the archival throughput to monitor for any
change in performance. We then reinserted the disks on the NetApp NAS devices, powered on the downed
nodes on the Centera G4, and waited for the archival tasks to finish before moving on to the next scenario.
A second Enterprise Vault archiving task was executed using 40 newly enabled LoadSim-generated
mailboxes for each Exchange server. We again allowed Enterprise Vault 45 minutes to ramp up, and
monitored for 60 minutes to verify the baseline archival throughput rate. We then executed a Discovery
Accelerator search and retrieval to retrieve the unique e-mails previously archived, while the current archiving tasks were running.
Server performance impact was measured through Perfmon to monitor the relative health and performance
of the Enterprise Vault, Exchange, SQL, and domain controller servers. During the Centera G4 testing, the
Centera Viewer utility was used to monitor the performance of the G4. During the FAS3050 and R200
testing, the native Data ONTAP® command sysstat was used to monitor the performance of the
FAS3050/R200 devices. To monitor the performance of the Brocade 3800 Silkworm FCP switch, we ran the
portperfshow command continuously to monitor the throughput for each Exchange server in real time and to verify that there were no performance bottlenecks coming from the Exchange storage. We used the NetGear
administration GUI to monitor the Gigabit Ethernet switch to make sure that no errors or performance
bottlenecks were experienced on that device as well.
The following code shows the SQL query used to retrieve the EV archival throughput rate.
-- Enterprise Vault archival throughput, grouped to the minute
-- (smalldatetime truncates archiveddate to one-minute precision).
select "Archived Date" = cast(archiveddate as smalldatetime),
       -- items archived in that minute, extrapolated to an hourly rate
       "Hourly Rate" = count(*) * 60,
       -- average archived item size for that minute
       "Av Size" = sum(itemsize) / count(*)
from saveset
group by cast(archiveddate as smalldatetime)
order by "Archived Date" desc
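The report expresses throughput in MB/minute rather than items per hour; the conversion from the query output is not shown in the report, but a minimal sketch of one plausible conversion, assuming itemsize is recorded in bytes and using purely illustrative values, would be:

# Hypothetical conversion of one query result row to MB/minute.
hourly_rate = 180000            # illustrative "Hourly Rate" value (items per minute x 60)
av_size = 70000                 # illustrative "Av Size" value, assumed to be in bytes

items_per_minute = hourly_rate / 60
mb_per_minute = items_per_minute * av_size / (1024 * 1024)   # assuming 1MB = 1,048,576 bytes
print(f"Archival throughput: {mb_per_minute:.0f} MB/minute")  # ~200 MB/minute for these values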
Networking Equipment
• Fibre Channel switch: Brocade SilkWorm 3800, firmware v3.1.3
• Gigabit Ethernet non-blocking managed switch: NetGear GS748T
The dramatic increase in disk sizes, the relatively slower growth in disk bandwidth, the construction of disk
arrays containing larger numbers of disks, and the use of less reliable and poorer performing varieties of disk
such as ATA combine to increase the rate of double-disk failures. Algorithms that protect against double-disk failures are therefore needed to ensure adequate data integrity. For a complete discussion of algorithms that protect against two-disk failures, refer to the Usenix publication "Row-Diagonal Parity for Double Disk Failure Correction" by Peter Corbett, Bob English, Atul Goel, Tomislav Grcanac, Steven Kleiman, James Leong, and Sunitha Sankar: http://www.usenix.org/publications/library/proceedings/fast04/tech/corbett/corbett.pdf.
• Disk drive replacement: The failure of a disk drive in a single-parity RAID group presents a substantial risk
of data loss during the RAID rebuild/reconstruction process. Some disk replacements result from transient
drive, array, or loop problems and thus are not hard failures; but any event that causes a RAID rebuild
creates a risk. How often drives are replaced is typically a function of many factors, including physical
environment, I/O workload, drive age, drive production batch, drive design generation, etc. Experience to
date shows that SATA drives can be every bit as reliable as FC drives when they are used for their
appropriate workloads, but there is some indication that SATA drive failure rates may increase more than
FC as the drives age. More important than debating how often one or the other drive type fails is the understanding that drive failures will occur, presenting a risk of data loss without double-parity protection.
• Bit error likelihood: Typical non-recoverable (or uncorrectable) error rate (UER) specs are 1 error in 10^15
bits read for FC drives, and 1 in 10^14 for SATA drives.
Note:
• The specs assume that full data recovery procedures are available, but these entail more read retries
than RAID array system timeout limits allow, which may result in not achieving the spec error rate (that is,
a higher incidence of errors may result). The 0.2% rate applies to 300GB FC drives; it is the number of
bits per drive divided by the UER spec of 1 in 10^15. The same calculation for a 500GB SATA drive with
a UER spec of 1 in 10^14 results in a 4% rate.
• Double-parity RAID bit error likelihood: A more complex calculation that incorporates the probability of a
second drive failing during RAID reconstruction of another drive in the same 16-drive RAID-DP group and
a bit error occurring while both drives are in rebuild.
The probability of a second drive failing during reconstruction is the rate of single-disk failures per year, squared, times the reconstruction time in years, times the number of two-disk combinations in the array. This result is (0.029 failures/year)^2 x 7.9E-4 years x (16 x 15 / 2) combinations, which equals 0.08E-3 failures per year.
The probability of another media error in the array is the number of bits left to read times the error rate per bit: 500GB x 8 bits/byte x 14 drives x 1E-14 errors/bit = 0.56.
Multiplying, the result is 0.08E-3 x 0.56 = 0.045E-3 data loss failures per year; the sketch at the end of this appendix cross-checks these figures.
This is quite a lot higher than 1 in 30 million, so it is a conservative estimate; however, it is a
significant improvement in data protection relative to single-parity RAID.
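A minimal sketch that reproduces the arithmetic in this appendix, using only the values stated above (UER of 1 in 10^15 bits for FC and 1 in 10^14 bits for SATA, a 0.029 failures/year single-disk failure rate, a 7.9E-4 year reconstruction time, and a 16-drive RAID-DP group):

# Cross-check of the Appendix C figures.
UER_FC, UER_SATA = 1e-15, 1e-14              # unrecoverable errors per bit read

def bits(capacity_gb):
    return capacity_gb * 1e9 * 8             # drive capacity expressed in bits

# Per-drive bit-error likelihood for a full read (single-parity rebuild case).
print(f"300GB FC drive:   {bits(300) * UER_FC:.2%}")     # ~0.2%
print(f"500GB SATA drive: {bits(500) * UER_SATA:.2%}")   # ~4%

# Double-parity (RAID-DP) data-loss likelihood for a 16-drive group.
afr = 0.029                                  # single-disk failures per year
rebuild_years = 7.9e-4                       # reconstruction time in years
pairs = 16 * 15 / 2                          # two-disk combinations in the group
double_failure_rate = afr**2 * rebuild_years * pairs       # ~0.08E-3 per year
bit_error_during_rebuild = bits(500) * 14 * UER_SATA       # ~0.56
print(f"Data loss rate: {double_failure_rate * bit_error_during_rebuild:.2e} per year")  # ~0.045E-3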