
White Paper

PERFORMANCE AND TUNING TIPS FOR EMC® DOCUMENTUM® FOUNDATION SERVICES CONTENT TRANSFER

Abstract

This white paper describes the options for transferring content over Documentum Foundation Services (DFS). It uses experimental results to show how different factors affect performance, and it summarizes tuning tips for improving content transfer performance.

September 2011

Copyright © 2011 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

Table of Contents

Executive summary
    Audience
Introduction
Recommendations
    #1 Use Base64 Transfer Mode Only for Small Files
    #2 Use UCF to Optimize Large File Transmission
    #3 Allocate Sufficient JVM Heap Size
    #4 Re-use ActivityInfo to Avoid Creating New UCF Connections
    #5 Use DataPackage to Transfer Multiple DataObject Instances
    #6 Optimize UCF Server Configuration
Conclusion
References

Executive summary

EMC Documentum Foundation Services (DFS) supports standard web services transfer modes (Base64 and MTOM), as well as proprietary technologies (UCF and ACS) that optimize transfer of content in a distributed environment. DFS content transfer performance depends on a combination of factors, including the Content Server configuration, the Java Virtual Machine configuration, and the way the DFS consumer is used.

This white paper uses the results of a series of tests to show how these configuration settings affect performance. It provides tuning tips and recommendations for improving content transfer performance.

Audience

This white paper is intended for application developers using Documentum Foundation Services (DFS). It assumes that the readers possess a basic knowledge of DFS.

The paper focuses on the content transfer performance of DFS and provides optimization tips based on experimental results. Readers should refer to the Documentum Foundation Classes Guides for more information.

Introduction

DFS content transfer performance depends on a combination of many different factors. The purpose of this document is to provide basic guidance and recommendations for improving the performance of content transfer over DFS. A series of performance tests was executed in our performance laboratory to determine how different configuration settings affect the performance metrics. The recommendations are based on the test results we observed.

The results presented in this document were collected in an internal test environment where the Oracle database, Content Server and DFS server were deployed on separate VMware virtual machines, as shown in Figure 1. Each virtual machine was allocated 4 CPUs and 8 GB of memory and was connected to a LAN through a 1 Gbps network interface. A software network simulator was used on the DFS client to model different network conditions. Unless otherwise specified, the DFS consumer program was built with the Java productivity layer, but the recommendations should also apply to the .NET case.

As each customer environment is different, not all recommendations will have the same effect as observed in these tests, but they have been documented to provide options when optimizing a DFS solution.

Figure 1 Test environment diagram

Recommendations

#1 Use Base64 Transfer Mode Only for Small Files

DFS supports the standard WS transfer modes, Base64 and MTOM. In Base64 mode, the binary data is converted to characters, which are embedded into the SOAP envelope.

The experiment results in Figure 2 show that Base64 is quite efficient at transferring small files in a LAN environment (1 Gbps bandwidth, 0 ms latency). The left chart compares the response time of a single thread uploading a file, and the right chart compares the throughput of uploading a batch of files with multiple threads.

The performance of Base64 mode is comparable with MTOM mode for file sizes less than 100 KB. In fact, in the test environment, it is even slightly more efficient in transferring 10 KB files, as there is some overhead in encoding/decoding MTOM.

Figure 2 Comparison of the small file uploading performance over LAN

The disadvantage of Base64 is that the content is expanded to roughly 1.3 times its original size (Base64 emits 4 characters for every 3 bytes), which requires more CPU time, memory and bandwidth for the data transmission. Therefore, performance degrades significantly with large files. Figure 3 shows that the response time for uploading a 100 MB file was nearly doubled with Base64 compared with MTOM. As expected, when using multiple threads to upload a batch of files, MTOM could achieve a much higher throughput. In addition, throughput thrashing occurs as the number of threads increases. This is because processing Base64 content consumes a large amount of memory on the server side, and the frequent JVM garbage collection decreases the system throughput under heavy loads.

Figure 3 Comparison of the large file uploading performance over LAN
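The size expansion caused by Base64 can be checked with a few lines of standard Java. The sketch below is only an illustration that uses java.util.Base64 from the JDK; it is not part of DFS, and the class and method names are hypothetical.

```java
import java.util.Base64;

class Base64Overhead {
    /** Returns the encoded length divided by the raw length for a payload of the given size. */
    static double expansionFactor(int rawBytes) {
        byte[] raw = new byte[rawBytes]; // zero-filled; the content does not affect the encoded length
        byte[] encoded = Base64.getEncoder().encode(raw);
        return (double) encoded.length / raw.length;
    }

    public static void main(String[] args) {
        // Print the expansion factor for a few payload sizes.
        for (int size : new int[] {10_000, 100_000, 1_000_000}) {
            System.out.println(size + " bytes -> factor " + expansionFactor(size));
        }
    }
}
```

The factor stays close to 4/3 regardless of payload size, so the extra bytes, and the buffers needed to hold them, grow in proportion to the file.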

#2 Use UCF to Optimize Large File Transmission

Unified Client Facilities (UCF) is a remote content transfer application, and is available in the productivity layer in Java (remote mode only) and also in .NET. UCF provides a series of performance optimizations for content transmission. By default, UCF compresses the content and enables direct content transfer between the client machine and a Content Server host, as opposed to transferring the content from the Content Server to the DFS server and then on to the client machine, which is required for the other transfer mechanisms.

Figure 4 compares the uploading response time of MTOM mode and UCF mode. Note that in order to provide a byte-by-byte comparison, the UCF server was configured not to compress the file content before transmission, so the number of bytes actually transferred is the same for both MTOM and UCF. In a normal UCF configuration, the benefit might be even larger.

The DFS client uploads the file data to the Content Server directly rather than through the DFS server. Therefore, using UCF is effective in reducing the end-to-end response time for uploading a large file.

Figure 4 Comparison of MTOM and UCF

For small files, we have seen that MTOM has a better response time than UCF, as shown in Figure 5. Although the content must be transferred between the Content Server and the DFS server in MTOM, this cost is quite small since the servers are usually deployed in a LAN environment. On the other hand, the client makes extra communications with the servers to establish the connection before the UCF transmission. This contributes a large portion of the response time, especially when the network latency between the client and the server is high.

Figure 5 Importing a 100KB file (MTOM vs. UCF)

#3 Allocate Sufficient JVM Heap Size

In general, the memory usage of the DFS server increases with the number of active threads. Therefore, frequent garbage collection will decrease the throughput and, under heavy load, the memory requirements may be larger than the amount allocated to the JVM, resulting in OutOfMemoryError exceptions and/or JVM crashes.

For example, the maximum heap size of the DFS bundled with the Content Server is set to 256 MB by default, which might be too small to effectively handle many requests simultaneously. It might be necessary to tune the JVM heap size in these situations.

The lack-of-memory issue most likely occurs when using BASE64 or MTOM in the .NET client to transfer large files. As the content data is encoded within a SOAP message in BASE64, the DFS server will allocate several buffers to hold the received and the decoded content. For MTOM, the .NET client will use the buffered transfer mode provided by WCF, which means the entire content will be buffered in memory before transfer. Both result in unusually high memory usage, especially when transferring large content payloads.

The optimal JVM memory settings depend on the overall system workload and have to be tuned case by case. As a rule of thumb, we recommend setting the JVM heap size to 1,024 MB and making sure that both the DFS server and the client JVM run with enough memory.
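As a quick sanity check, a JVM can report the maximum heap it was started with (typically set with the standard -Xmx option, for example -Xmx1024m). The sketch below is a hypothetical helper, not part of DFS. The 10% slack is an assumption, added because Runtime.maxMemory() can report slightly less than the configured -Xmx value with some garbage collectors.

```java
class HeapCheck {
    /** Rule-of-thumb heap size from the recommendation above, in bytes (1,024 MB). */
    static final long RECOMMENDED_HEAP_BYTES = 1024L * 1024L * 1024L;

    /**
     * True if the given maximum heap is within 10% of the rule of thumb or above it.
     * The 10% slack is an assumption of this sketch, not a documented threshold.
     */
    static boolean meetsRecommendation(long maxHeapBytes) {
        return maxHeapBytes >= RECOMMENDED_HEAP_BYTES * 9 / 10;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap (MB): " + maxHeap / (1024 * 1024));
        if (!meetsRecommendation(maxHeap)) {
            System.out.println("Heap is below the 1,024 MB rule of thumb; consider raising -Xmx.");
        }
    }
}
```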

#4 Re-use ActivityInfo to Avoid Creating New UCF Connections

When using client-orchestrated UCF, the ActivityInfo instance can be cached and passed in all service operation calls. The following sample code snippet demonstrates how to import four documents through the same UCF connection.

    /* Shared ActivityInfo, re-used for every transfer in the loop */
    ActivityInfo theInfo = new ActivityInfo(false);
    for (int i = 0; i < 4; i++) {
        if (i == 3) {
            /* Close UCF connection after the last transfer */
            theInfo.setAutoCloseConnection(true);
        }

        /* Pass the same ActivityInfo with each call through the transfer profile */
        OperationOptions theOptions = new OperationOptions();
        ContentTransferProfile theTransferProfile = new ContentTransferProfile();
        theTransferProfile.setActivityInfo(theInfo);
        theOptions.setContentTransferProfile(theTransferProfile);

        /* Create DataPackage */
        …… ……

        theObjectService.create(theDataPackage, theOptions);
    }

Caching the ActivityInfo avoids creating a new UCF connection to the server for each subsequent content transfer operation. The achievable speedup is bounded above by the reciprocal of the fraction of total time spent on work other than creating UCF connections. For example, if connection setup accounts for half of the time of a single transfer, re-using the connection can at most double the throughput.

Table 1 lists the test results of transmission throughput when re-using UCF connections in a LAN environment. Each row in the table represents a set of results for a certain content size. The 2nd to 5th columns show the measured overall throughput when re-using one UCF connection for 1, 2, 10 and 100 transmissions. The last column is the throughput upper bound, which is estimated by regression analysis. It is clear that this approach is more effective for small files, since the time spent in creating new connections is relatively large.

Table 1 Throughput improvement by re-using UCF connection

Content size   1            2            10           100          Upper Bound
100 KB         3.41 Mbps    4.54 Mbps    6.15 Mbps    6.77 Mbps    6.84 Mbps
1 MB           27.4 Mbps    31.8 Mbps    39.6 Mbps    42.7 Mbps    42.9 Mbps
10 MB          92.9 Mbps    99.5 Mbps    107.5 Mbps   111.6 Mbps   111.8 Mbps
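One way to read Table 1 is as a fixed connection cost that is amortized over the transmissions sharing a connection. The sketch below is an assumed simplification, not the regression analysis used for the table: it takes the single-transmission throughput and the upper bound as inputs and predicts the throughput for n transmissions per connection. The class and method names are hypothetical.

```java
class UcfReuseModel {
    /**
     * Predicts throughput (Mbps) when one UCF connection is shared by n transmissions.
     * Assumes the time per megabit is a transfer cost plus a connection cost divided by n.
     */
    static double predictThroughput(double singleMbps, double upperBoundMbps, int n) {
        double transferCost = 1.0 / upperBoundMbps;              // seconds per Mbit with setup fully amortized
        double connectionCost = 1.0 / singleMbps - transferCost; // extra seconds per Mbit caused by one setup
        return 1.0 / (transferCost + connectionCost / n);
    }

    public static void main(String[] args) {
        // 100 KB row of Table 1: 3.41 Mbps for a single transmission, 6.84 Mbps upper bound.
        for (int n : new int[] {1, 2, 10, 100}) {
            System.out.println(n + " transmissions -> " + predictThroughput(3.41, 6.84, n) + " Mbps");
        }
        // The maximum speedup is the ratio of the upper bound to the single-transmission throughput.
        System.out.println("Max speedup: " + 6.84 / 3.41);
    }
}
```

For the 100 KB row this simple model lands within about 0.1 Mbps of the measured values. The fit is looser for the 1 MB and 10 MB rows, so it should be treated as a rough intuition aid rather than a predictor.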

#5 Use DataPackage to Transfer Multiple DataObject Instances

A DataPackage is a collection of DataObject instances, which is typically passed to, and returned by, DFS ObjectService operations. ObjectService operations process all the DataObject instances in the DataPackage sequentially.

Encapsulating multiple DataObject instances into a single DataPackage helps to improve performance by reducing the number of round trips between the DFS client and the server. Figure 6 compares the import throughput (24 threads) with different package sizes. The experimental results demonstrate that the throughput could be improved significantly by transferring a DataPackage containing multiple (e.g. >10) DataObject instances.

Figure 6 Throughput of transferring DataPackage with different size over LAN
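The batching idea can be sketched independently of the DFS API. The helper below is hypothetical and uses plain Java collections only: it splits a list of items into groups of a chosen size, where each group stands in for the DataObject instances that would be placed into one DataPackage and sent in one service call.

```java
import java.util.ArrayList;
import java.util.List;

class PackageBatcher {
    /** Splits items into consecutive groups of at most packageSize elements. */
    static <T> List<List<T>> partition(List<T> items, int packageSize) {
        if (packageSize <= 0) {
            throw new IllegalArgumentException("packageSize must be positive");
        }
        List<List<T>> packages = new ArrayList<>();
        for (int start = 0; start < items.size(); start += packageSize) {
            int end = Math.min(start + packageSize, items.size());
            packages.add(new ArrayList<>(items.subList(start, end))); // copy so each group is independent
        }
        return packages;
    }

    public static void main(String[] args) {
        List<String> documents = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            documents.add("doc-" + i); // placeholder names standing in for DataObject instances
        }
        // Each group corresponds to one round trip instead of one round trip per document.
        System.out.println("Round trips: " + partition(documents, 10).size());
    }
}
```

With 25 documents and a package size of 10, the client makes 3 service calls instead of 25.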

#6 Optimize UCF Server Configuration

The UCF server is deployed as part of the application in the DFS server. The server configuration file ucf.server.config.xml is located in /APP-INF/classes in DFS applications.

UCF compresses the content to reduce the number of bytes sent during the transmission. However, for some types of files, such as ZIP, compression achieves almost no size reduction. In this case, performance is adversely affected because of the overhead of the compression. A list of file formats is excluded from compression by default, as specified in the compression.exclusion.formats element. The user can add other file types that can hardly be compressed any further to optimize the content transfer performance.

Besides the compression ratio, whether or not to compress the content depends on the network condition, especially the bandwidth. Table 2 lists the test results of importing documents with a 50% compression ratio under different network conditions.

It is not surprising that UCF transmission without compression performs better over the LAN, because only a relatively small portion of the time is spent on the network link. Conversely, over a poor network with very limited bandwidth (e.g. WAN), reducing the content size saves a large amount of time.

Comparing the response times between LAN and WAN for the same file shows that the extra seconds spent in the WAN condition approximately equal the content size divided by the network bandwidth. We have also seen that the overhead of the compression increases linearly with the original size of the document (about 0.085 s/MB for the test environment).

The break-even bandwidth can be roughly estimated from these data. Let $r$ be the compression ratio (compressed size divided by original size) and $t$ the compression overhead in seconds per MB. On a link of bandwidth $B$ Mbps, compression saves $8(1-r)/B$ seconds per MB and costs $t$ seconds per MB, so the break-even point is $B = 8(1-r)/t$ Mbps. For the test scenario in Table 2 ($t = 0.085$ s/MB, $r = 0.5$), this is roughly 50 Mbps, which means it would be more efficient to disable compression when the network bandwidth is much larger than 50 Mbps.
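The break-even estimate described above can be written as a small calculation. The sketch below is an illustration under the same simplified model (compression time proportional to file size, transfer time equal to size divided by bandwidth). It takes the compression ratio as compressed size divided by original size, and all names are hypothetical.

```java
class CompressionBreakEven {
    /** Bandwidth (Mbps) at which compressing and not compressing take the same time. */
    static double breakEvenMbps(double compressedFraction, double overheadSecPerMb) {
        return 8.0 * (1.0 - compressedFraction) / overheadSecPerMb;
    }

    /** Estimated seconds to send sizeMb megabytes, with or without compression. */
    static double transferSeconds(double sizeMb, double bandwidthMbps,
                                  double compressedFraction, double overheadSecPerMb,
                                  boolean compress) {
        if (!compress) {
            return 8.0 * sizeMb / bandwidthMbps; // 8 bits per byte, full size on the wire
        }
        double compressionTime = sizeMb * overheadSecPerMb;
        double wireTime = 8.0 * sizeMb * compressedFraction / bandwidthMbps;
        return compressionTime + wireTime;
    }

    public static void main(String[] args) {
        // Values quoted in the text: 50% compression ratio, 0.085 s/MB overhead.
        System.out.println("Break-even (Mbps): " + breakEvenMbps(0.5, 0.085));
        System.out.println("40 MB at 10 Mbps, compressed: " + transferSeconds(40, 10, 0.5, 0.085, true));
        System.out.println("40 MB at 10 Mbps, uncompressed: " + transferSeconds(40, 10, 0.5, 0.085, false));
    }
}
```

The model ignores latency and any fixed per-request cost, so it only indicates on which side of the break-even point a given link falls.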

Table 2 Transmission time (in seconds) under different network conditions

File Size   Compress   LAN    WAN1    WAN2
40 MB       Yes        3.1    18.9    20.0
40 MB       No         2.2    34.7    35.2
80 MB       Yes        6.0    38.3    39.4
80 MB       No         4.1    69.0    69.8
120 MB      Yes        8.8    56.1    58.5
120 MB      No         6.2    103.3   104.3

* LAN: 1 Gbps, 0 ms; WAN1: 10 Mbps, 0 ms; WAN2: 10 Mbps, 20 ms

Conclusion

This document provides a list of recommendations that can be used as a reference guide for obtaining optimum performance for DFS content transfer.

The first step in improving performance is to choose a suitable content transfer mode. Despite the complexity of using DFS in real cases, we propose a simple selection rule based on the size of the target file. BASE64 works well for small files (e.g. less than 100 KB); MTOM is suitable for small and medium-sized files; and UCF is most efficient for large files.

Array processing is a common performance optimization technique that reduces the number of round trips between the client and the server. The DFS data model provides the DataPackage type, which can process multiple objects in a single server call. This increases the throughput when transferring a large number of documents, such as in a batch job. For UCF, the ActivityInfo can be cached to re-use the existing connection.

The last section briefly discusses UCF server tuning with respect to content compression. The overall impact of compression is determined by the file compression ratio, the network bandwidth and the time spent in compression. In general, disabling compression may improve the end-to-end response time when the compression ratio is low and/or when the network bandwidth is large.
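The size-based selection rule can be expressed as a small helper. The sketch below is hypothetical: it defines its own enum rather than using DFS types, the 100 KB limit comes from the text, and the 10 MB boundary between "medium" and "large" is an assumption that should be tuned for each environment.

```java
class TransferModeSelector {
    /** Local stand-in for the three transfer modes discussed in this paper. */
    enum Mode { BASE64, MTOM, UCF }

    /** Small-file limit from the text (100 KB). */
    static final long SMALL_FILE_LIMIT = 100L * 1024;

    /** ASSUMPTION: boundary between medium and large files; not specified in the paper. */
    static final long LARGE_FILE_LIMIT = 10L * 1024 * 1024;

    /** Picks a transfer mode from the file size alone. */
    static Mode choose(long sizeBytes) {
        if (sizeBytes < SMALL_FILE_LIMIT) {
            return Mode.BASE64;
        }
        if (sizeBytes < LARGE_FILE_LIMIT) {
            return Mode.MTOM;
        }
        return Mode.UCF;
    }

    public static void main(String[] args) {
        System.out.println("10 KB  -> " + choose(10L * 1024));
        System.out.println("1 MB   -> " + choose(1024L * 1024));
        System.out.println("100 MB -> " + choose(100L * 1024 * 1024));
    }
}
```

File size is only one input. Network latency and whether UCF connections are re-used also affect which mode performs best, so the thresholds should be validated with measurements.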

References

Documentum Foundation Services Development Guide