
Garbage Collection Tuning in the Java HotSpot Virtual Machine

Tony Printezis, Charlie Hunt


Sun Microsystems

Trademarks and Abbreviations

> Java Virtual Machine (JVM)
> Java HotSpot Virtual Machine (HotSpot JVM)

Who We Are

> Tony Printezis
  - GC Group / HotSpot JVM development team
  - Been working on the HotSpot JVM since 2006
  - 10+ years of GC experience
> Charlie Hunt
  - Java Platform Performance Engineering Group
  - Works with many Sun product teams and customers
  - 10+ years of Java technology performance work

And if you remember only one thing... GC Tuning is an Art!

GC Tuning is an Art

> Unfortunately, we can't give you a flawless recipe or a flowchart that will apply to all your GC tuning scenarios
> GC tuning involves a lot of common pattern recognition
> This pattern recognition requires experience
  - We have a lot of it. :-)

Agenda

> Introductions
> Brief GC Overview
> GC Tuning
  - Tuning the young generation
  - Tuning Parallel GC
  - Tuning CMS
> Monitoring the GC
> Conclusions

GCs in the HotSpot JVM

> Three available GCs:
  - Serial GC
  - Parallel GC / Parallel Old GC
  - Concurrent Mark-Sweep GC (CMS)

Heap Layout (same for all GCs)


Young Generation

Old Generation

Permanent Generation

Young Generation
Allocation (new Object())

Eden

Survivor Spaces

Old Generation

Promotion (survivors from the Young Generation)


Permanent Generation

Allocation (only directly from the JVM)




Your Dream GC

> You would really like a GC that has
  - Low GC overhead,
  - Low GC pause times, and
  - Good space efficiency
> Unfortunately, you'll have to pick two (any two!)

Heap Sizing Tuning Advice

Supersize it!


Heap Sizing Trade-Offs

> Generally, the larger the heap space, the better
  - For both young and old generation
  - Larger space: less frequent GCs, lower GC overhead, objects more likely to become garbage
  - Smaller space: faster GCs (not always! see later)
> Sometimes max heap size is dictated by available memory and/or max space the JVM can address
> You have to find a good balance between young and old generation size

Generation Size Roles

> Young Generation Size
  - Dictates frequency of minor GCs
  - Dictates how many objects will be reclaimed in the young generation
    - Along with tenuring threshold + survivor space size tuning
> Old Generation
  - Should comfortably hold the application's steady-state live size
  - Decrease the major GC frequency as much as possible

Two Very Important Points

> You should try to maximize the number of objects reclaimed in the young generation
  - This is probably the most important piece of advice when sizing a heap and/or tuning the young generation
> Your application's memory footprint should not exceed the available physical memory
  - This is probably the second most important piece of advice when sizing a heap
> The above apply to all our GCs

Sizing Heap Spaces

> -Xmx<size> : max heap size
  - young generation + old generation
> -Xms<size> : initial heap size
  - young generation + old generation
> -Xmn<size> : young generation size
> Applications with emphasis on performance tend to set -Xms and -Xmx to the same value
> When -Xms != -Xmx, heap growth or shrinking requires a Full GC

Should -Xms == -Xmx?

> Set -Xms to what you think would be your desired heap size
  - It's expensive to grow the heap
> If memory allows, set -Xmx to something larger than -Xms just in case
  - Maybe the application is hit with more load
  - Maybe the DB gets larger over time
> In most cases, it's better to do a Full GC and grow the heap than to get an OOM and crash
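As a rough illustration of how the sizing flags relate, the sketch below derives the old generation size implied by the heap and young generation sizes and assembles the corresponding flags. The helper names and the example figures (a 2048 MB heap, a 512 MB young generation, 900 MB of live data) are hypothetical and not taken from the talk.

```python
def old_gen_mb(heap_mb, young_mb):
    """Old generation size implied by the heap size and -Xmn."""
    if young_mb >= heap_mb:
        raise ValueError("young generation must be smaller than the heap")
    return heap_mb - young_mb


def heap_flags(initial_mb, max_mb, young_mb):
    """Build the sizing flags discussed above (sizes in megabytes)."""
    if initial_mb > max_mb:
        raise ValueError("-Xms must not exceed -Xmx")
    return ["-Xms%dm" % initial_mb, "-Xmx%dm" % max_mb, "-Xmn%dm" % young_mb]


# Hypothetical application: fixed 2 GB heap, 512 MB young generation.
flags = heap_flags(2048, 2048, 512)
old_mb = old_gen_mb(2048, 512)
live_mb = 900  # assumed steady-state live size, e.g. read off a GC log
print(" ".join(flags))
print("old generation: %d MB, headroom over live data: %d MB" % (old_mb, old_mb - live_mb))
```

With -Xms equal to -Xmx the old generation size is fixed, so the headroom figure is stable; when the two differ, the same arithmetic would have to be repeated for both ends of the range.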

Sizing Heap Spaces (ii)

> -XX:PermSize=<size> : permanent generation initial size
> -XX:MaxPermSize=<size> : permanent generation max size
> Applications with emphasis on performance almost always set -XX:PermSize and -XX:MaxPermSize to the same value
  - Growing or shrinking the permanent generation requires a Full GC too
> Unfortunately, the permanent generation occupancy is hard to predict


Young Generation Sizing

> Eden size determines
  - The frequency of minor GCs
  - Which objects will be reclaimed at age 0
    - Newly-allocated objects in Eden start from age 0
    - Their age is incremented at every minor GC
> Increasing the size of the Eden will not always affect minor GC times
  - Remember: minor GC times are proportional to the amount of objects they copy (i.e., the live objects), not the young generation size
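The link between Eden size and minor GC frequency can be sketched with simple arithmetic: Eden fills at the application's allocation rate, so the time between minor GCs is roughly the Eden size divided by that rate. The function name and the 100 MB/s allocation rate below are hypothetical, and a steady allocation rate is assumed.

```python
def minor_gc_interval_s(eden_mb, alloc_rate_mb_per_s):
    """Approximate time between minor GCs: Eden fills at the allocation rate."""
    if alloc_rate_mb_per_s <= 0:
        raise ValueError("allocation rate must be positive")
    return eden_mb / alloc_rate_mb_per_s


# Hypothetical application allocating 100 MB/s.
for eden_mb in (100, 400):
    interval = minor_gc_interval_s(eden_mb, 100.0)
    print("eden %4d MB -> minor GC every %.1f s" % (eden_mb, interval))
```

A larger Eden lengthens the interval, which gives newly-allocated objects more time to become garbage before their first minor GC.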

Young Object Survivor Ratio

[Figure, built up over three slides: survivor ratio (vertical axis) against newly-allocated object age (horizontal axis, from youngest at 0 to oldest)]

Sizing Heap Spaces (iii)

> -XX:NewSize=<size> : initial young generation size
> -XX:MaxNewSize=<size> : max young generation size
> -XX:NewRatio=<ratio> : ratio of old generation size to young generation size (e.g., 2 means the old generation is twice the size of the young generation)
> Applications with emphasis on performance tend to use -Xmn to size the young generation since it combines the use of -XX:NewSize and -XX:MaxNewSize

Tenuring

> -XX:TargetSurvivorRatio=<percent>, e.g., 50
  - How much of the survivor space should be filled
  - Typically leave extra space to deal with spikes
> -XX:InitialTenuringThreshold=<threshold> (Parallel GC only)
> -XX:MaxTenuringThreshold=<threshold>
> -XX:+AlwaysTenure
  - Never keep any objects in the survivor spaces
> -XX:+NeverTenure
  - Very bad idea!
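To make the TargetSurvivorRatio setting concrete, here is a minimal sketch of the arithmetic, assuming the target is simply that percentage of a single survivor space (this is the figure the JVM reports as "Desired survivor size" when tenuring distribution logging is on). The function name and the 12 MB survivor space are hypothetical.

```python
def desired_survivor_bytes(survivor_space_bytes, target_survivor_ratio=50):
    """Bytes of one survivor space that the JVM aims to leave occupied."""
    if not 0 < target_survivor_ratio <= 100:
        raise ValueError("TargetSurvivorRatio is a percentage")
    return survivor_space_bytes * target_survivor_ratio // 100


# Hypothetical 12 MB survivor space with the example ratio of 50.
print(desired_survivor_bytes(12 * 1024 * 1024, 50))
```

The remaining half of the survivor space in this example is the slack that absorbs spikes in the number of surviving objects.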

Tenuring Threshold Trade-Offs

> Try to retain as many objects as possible in the survivor spaces so that they can be reclaimed in the young generation
  - Less promotion into the old generation
  - Less frequent old GCs
> But also, try not to unnecessarily copy very long-lived objects between the survivors
  - Unnecessary overhead on minor GCs
> Not always easy to find the perfect balance
  - Generally: better to copy more than to promote more

Tenuring Distribution

> Monitor tenuring distribution with -XX:+PrintTenuringDistribution

    Desired survivor size 6684672 bytes, new threshold 8 (max 8)
    - age   1:    2315488 bytes,    2315488 total
    - age   2:      19528 bytes,    2335016 total
    - age   3:         96 bytes,    2335112 total
    - age   4:         32 bytes,    2335144 total

> Young generation seems well tuned here
  - We can even decrease the survivor space size

Tenuring Distribution (ii)

    Desired survivor size 3342336 bytes, new threshold 1 (max 6)
    - age   1:    3956928 bytes,    3956928 total

> Survivor space too small!
  - Increase survivor space and/or eden size

Tenuring Distribution (iii)

    Desired survivor size 3342336 bytes, new threshold 6 (max 6)
    - age   1:    2483440 bytes,    2483440 total
    - age   2:     501240 bytes,    2984680 total
    - age   3:      50016 bytes,    3034696 total
    - age   4:      49088 bytes,    3083784 total
    - age   5:      48616 bytes,    3132400 total
    - age   6:      50128 bytes,    3182528 total

> Might be able to do better
  - Either increase max tenuring threshold
  - Or even set max tenuring threshold to 2
    - If ages > 6 still have around 50K of surviving bytes
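The "new threshold" values in these logs can be reproduced with a short sketch. It assumes a simplified reading of the JVM's policy: walk the ages in order, accumulate the surviving bytes, and stop at the first age where the running total exceeds the desired survivor size, capped at the max tenuring threshold. The function names are hypothetical and the real implementation may differ in detail.

```python
import re

AGE_LINE = re.compile(r"- age\s+(\d+):\s+(\d+) bytes,\s+(\d+) total")


def parse_ages(log_text):
    """Return {age: bytes surviving at that age} from the tenuring output."""
    return {int(age): int(size) for age, size, _ in AGE_LINE.findall(log_text)}


def new_threshold(ages, desired_bytes, max_threshold):
    """First age whose running total exceeds desired_bytes, capped at the max
    (an assumed simplification of the JVM's policy)."""
    total = 0
    age = 1
    while age <= max_threshold:
        total += ages.get(age, 0)
        if total > desired_bytes:
            break
        age += 1
    return min(age, max_threshold)


too_small = """Desired survivor size 3342336 bytes, new threshold 1 (max 6)
- age   1:    3956928 bytes,    3956928 total"""

could_do_better = """Desired survivor size 3342336 bytes, new threshold 6 (max 6)
- age   1:    2483440 bytes,    2483440 total
- age   2:     501240 bytes,    2984680 total
- age   3:      50016 bytes,    3034696 total
- age   4:      49088 bytes,    3083784 total
- age   5:      48616 bytes,    3132400 total
- age   6:      50128 bytes,    3182528 total"""

print(new_threshold(parse_ages(too_small), 3342336, 6))
print(new_threshold(parse_ages(could_do_better), 3342336, 6))
```

In the first log the age-1 objects alone overflow the desired size, so the threshold collapses to 1; in the second the running total never reaches it, so the threshold stays at the maximum of 6.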

Stop-The-World Parallel GC Threads

> The number of parallel GC threads is controlled by -XX:ParallelGCThreads=<num>
> Default value assumes only one JVM per system
> Set the parallel GC thread number according to:
  - Number of JVMs deployed on the system / processor set / zone
  - CPU chip architecture
    - Multiple hardware threads per chip core, e.g., UltraSPARC T1 / T2


Parallel GC Ergonomics

> The Parallel GC has ergonomics
  - i.e., auto-tuning
> Ergonomics help in improving out-of-the-box GC performance
> To get maximum performance, most customers we know do manual tuning

Parallel GC Tuning Advice

> Tune the young generation as described so far
> Try to avoid / decrease the frequency of major GCs
> We know of customers who use the Parallel GC in low-pause environments
  - Avoid Full GCs by avoiding / minimizing promotion
  - Maximize heap size

NUMA

> Non-Uniform Memory Access
  - Applicable to most SPARC, Opteron, more recently Intel platforms
> -XX:+UseNUMA
> Splits the young generation into partitions
  - Each partition belongs to a CPU
> Allocates new objects into the partition that belongs to the allocating CPU
> Big win for some applications


CMS Tuning Advice

> Tune the young generation as described so far
> Need to be even more careful about avoiding premature promotion
  - Originally we were using an +AlwaysTenure policy
  - We have since changed our mind :-)
> Promotion in CMS is expensive (free lists)
> The more often promotion / reclamation happens, the more likely fragmentation will settle in

CMS Tuning Advice (ii)

> We know customers who tune their applications to do mostly minor GCs, even with CMS
  - CMS is used as a safety net, when application load exceeds what they have provisioned for
  - Schedule Full GCs at non-critical times (say, late at night) to tidy up the heap and minimize fragmentation

Fragmentation

> Two types
  - External fragmentation
    - No free chunk is large enough to satisfy an allocation
  - Internal fragmentation
    - Allocator rounds up allocation requests
    - Free space wasted due to this rounding up

Fragmentation (ii)

> The bad news: you can never eliminate it!
  - It has been proven
> The good news: you can decrease its likelihood
  - Decrease promotion into the CMS old generation
  - Be careful when coding
    - Large objects of various sizes are the main cause
> But, when is the heap fragmented anyway?

Concurrent CMS GC Threads

> Number of parallel CMS threads is controlled by -XX:ParallelCMSThreads=<num>
  - Available in post 6 JVMs
> Trade-Off
  - CMS cycle duration vs. concurrent overhead during a CMS cycle

Permanent Generation and CMS

> To date, classes will not be unloaded by default from the permanent generation when using CMS
  - Both -XX:+CMSClassUnloadingEnabled and -XX:+CMSPermGenSweepingEnabled need to be set to enable class unloading in CMS
  - The 2nd switch is not needed in post 6u4 JVMs

Setting CMS Initiating Threshold

> Again, a tricky trade-off!
> Starting a CMS cycle too early
  - Frequent CMS cycles
  - High concurrent overhead
> Starting a CMS cycle too late
  - Chance of an evacuation failure / Full GC
> Initiating heap occupancy should be (much) higher than the application steady-state live size
  - Otherwise, CMS will constantly do CMS cycles

Common CMS Scenarios

> Applications that promote non-trivial amounts of objects to the old generation
  - Old generation grows at a non-trivial rate
  - Very frequent CMS cycles
  - CMS cycles need to start relatively early
> Applications that promote very few or even no objects to the old generation
  - Old generation grows very slowly, if at all
  - Very infrequent CMS cycles
  - CMS cycles can start quite late

Initiating CMS Cycles

> CMS will try to automatically find the best initiating occupancy
  - It first does a CMS cycle early to collect stats
  - Then, it tries to start cycles as late as possible, but early enough not to run out of heap before the cycle completes
  - It keeps collecting stats and adjusting when to start cycles
  - Sometimes, the second cycle starts too late
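The "as late as possible, but early enough" constraint can be approximated with back-of-the-envelope arithmetic: the old generation needs enough free space to absorb promotions for the duration of a concurrent cycle. The sketch below is not how CMS computes its ergonomic occupancy; the function name, the safety factor, and the example figures are all hypothetical.

```python
def latest_initiating_percent(old_capacity_mb, promo_rate_mb_per_s,
                              cycle_duration_s, safety_factor=1.0):
    """Highest old-generation occupancy (percent) at which a concurrent cycle
    can start and still finish before promotions fill the remaining space."""
    needed_mb = promo_rate_mb_per_s * cycle_duration_s * safety_factor
    if needed_mb >= old_capacity_mb:
        return 0.0
    return 100.0 * (1.0 - needed_mb / old_capacity_mb)


# Hypothetical: 640 MB old generation, 15 MB/s promotion, 8 s concurrent cycle.
print("%.2f%%" % latest_initiating_percent(640, 15.0, 8.0))
print("%.2f%%" % latest_initiating_percent(640, 15.0, 8.0, safety_factor=2.0))
```

The result is only an upper bound. It ignores promotion spikes beyond the safety factor, and the chosen occupancy still has to sit well above the application's steady-state live size.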

Initiating CMS Cycles (ii)

> -XX:CMSInitiatingOccupancyFraction=<percent>
  - Occupancy percentage of CMS old generation that triggers a CMS cycle
> -XX:+UseCMSInitiatingOccupancyOnly
  - Don't use the ergonomic initiating occupancy

Initiating CMS Cycles (iii)

> -XX:CMSInitiatingPermOccupancyFraction=<percent>
  - Occupancy percentage of permanent generation that triggers a CMS cycle
  - Class unloading must be enabled

CMS Cycle Initiation Example

> Cycle started too early:

    [ParNew 390868K->296358K(773376K), 0.1882258 secs]
    [CMS-initial-mark 298458K(773376K), 0.0847541 secs]
    [ParNew 401318K->306863K(773376K), 0.1933159 secs]
    [CMS-concurrent-mark: 0.787/0.981 secs]
    [CMS-concurrent-preclean: 0.149/0.152 secs]
    [CMS-concurrent-abortable-preclean: 0.105/0.183 secs]
    [CMS-remark 374049K(773376K), 0.0353394 secs]
    [ParNew 407285K->312829K(773376K), 0.1969370 secs]
    [ParNew 405554K->311100K(773376K), 0.1922082 secs]
    [ParNew 404913K->310361K(773376K), 0.1909849 secs]
    [ParNew 406005K->311878K(773376K), 0.2012884 secs]
    [CMS-concurrent-sweep: 2.179/2.963 secs]
    [CMS-concurrent-reset: 0.010/0.010 secs]
    [ParNew 387767K->292925K(773376K), 0.1843175 secs]
    [CMS-initial-mark 295026K(773376K), 0.0865858 secs]
    [ParNew 397885K->303822K(773376K), 0.1995878 secs]

CMS Cycle Initiation Example (ii)

> Cycle started too late:

    [ParNew 742993K->648506K(773376K), 0.1688876 secs]
    [ParNew 753466K->659042K(773376K), 0.1695921 secs]
    [CMS-initial-mark 661142K(773376K), 0.0861029 secs]
    [Full GC 645986K->234335K(655360K), 8.9112629 secs]
    [ParNew 339295K->247490K(773376K), 0.0230993 secs]
    [ParNew 352450K->259959K(773376K), 0.1933945 secs]

CMS Cycle Initiation Example (iii)

> This is better:

    [ParNew 640710K->546360K(773376K), 0.1839508 secs]
    [CMS-initial-mark 548460K(773376K), 0.0883685 secs]
    [ParNew 651320K->556690K(773376K), 0.2052309 secs]
    [CMS-concurrent-mark: 0.832/1.038 secs]
    [CMS-concurrent-preclean: 0.146/0.151 secs]
    [CMS-concurrent-abortable-preclean: 0.181/0.181 secs]
    [CMS-remark 623877K(773376K), 0.0328863 secs]
    [ParNew 655656K->561336K(773376K), 0.2088224 secs]
    [ParNew 648882K->554390K(773376K), 0.2053158 secs]
    ...
    [ParNew 489586K->395012K(773376K), 0.2050494 secs]
    [ParNew 463096K->368901K(773376K), 0.2137257 secs]
    [CMS-concurrent-sweep: 4.873/6.745 secs]
    [CMS-concurrent-reset: 0.010/0.010 secs]
    [ParNew 445124K->350518K(773376K), 0.1800791 secs]
    [ParNew 455478K->361141K(773376K), 0.1849950 secs]
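One way to compare the three examples is to compute how full the heap was when each cycle began. The sketch below pulls the occupancy and capacity out of the CMS-initial-mark entries. The capacity in those entries matches the whole-heap figure on the ParNew lines, so the percentages are treated here as heap-wide occupancy, which is not the same quantity as the old-generation fraction used by -XX:CMSInitiatingOccupancyFraction. The function name is hypothetical.

```python
import re

INITIAL_MARK = re.compile(r"\[CMS-initial-mark (\d+)K\((\d+)K\)")


def initial_mark_occupancy(log_text):
    """Percent of the heap in use at each CMS-initial-mark entry in the log."""
    return [100.0 * int(used) / int(capacity)
            for used, capacity in INITIAL_MARK.findall(log_text)]


# First initial-mark entry from each of the three example logs.
samples = {
    "too early": "[CMS-initial-mark 298458K(773376K), 0.0847541 secs]",
    "too late": "[CMS-initial-mark 661142K(773376K), 0.0861029 secs]",
    "better": "[CMS-initial-mark 548460K(773376K), 0.0883685 secs]",
}
for label, line in samples.items():
    print("%-9s %.1f%%" % (label, initial_mark_occupancy(line)[0]))
```

The early cycle begins with the heap under 40% full, the late one at roughly 85%, and the better-tuned one at about 71%.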

Start CMS Cycles Explicitly

> If relying on explicit GCs and want them to be concurrent, use:
  - -XX:+ExplicitGCInvokesConcurrent
    - Requires a post 6 JVM
  - -XX:+ExplicitGCInvokesConcurrentAndUnloadClasses
    - Requires a post 6u4 JVM
> Useful when wanting to cause references / finalizers to be processed


Monitoring the GC

> Online
  - VisualVM: http://visualvm.dev.java.net/
  - VisualGC: http://java.sun.com/performance/jvmstat/
    - VisualGC is also available as a VisualVM plug-in
    - Can monitor multiple JVMs within the same tool
> Offline
  - GC Logging
  - PrintGCStats
  - GChisto

GC Logging in Production

> Don't be afraid to enable GC logging in production
  - Very helpful when diagnosing production issues
> Extremely low / non-existent overhead
  - Maybe some large files in your file system. :-)
  - We are surprised that customers are still afraid to enable it
> Real customer quote:
  - "If someone doesn't enable GC logging in production, I shoot them!"

Most Important GC Logging Parameters

> You need at least:
  - -XX:+PrintGCTimeStamps
    - Add -XX:+PrintGCDateStamps if you must
  - -XX:+PrintGCDetails
    - Preferred over -verbosegc as it's more detailed
> Also useful:
  - -Xloggc:<file>
    - Separates GC logging output from application output

PrintGCStats

> Summarizes GC logs
> Downloadable script from
  - http://java.sun.com/developer/technicalArticles/Programming/turbo/PrintGCStats.zip
> Usage
  - PrintGCStats -v cpus=<num> <gc log file>
  - Where <num> is the number of CPUs on the machine where the GC log was obtained
> It might not work with some of the printing flags

PrintGCStats Parallel GC

    what          count        total        mean       max    stddev
    gen0t(s)        193       11.470     0.05943     0.687    0.0633
    gen1t(s)          1        7.350     7.34973     7.350    0.0000
    GC(s)           194       18.819     0.09701     7.350    0.5272
    alloc(MB)       193    11244.609    58.26222   100.875   18.8519
    promo(MB)       193      807.236     4.18257    96.426    9.9291
    used0(MB)       193    16018.930    82.99964   114.375   17.4899
    used1(MB)         1      635.896   635.89648   635.896    0.0000
    used(MB)        194    91802.213   473.20728   736.490   87.8376
    commit0(MB)     193    17854.188    92.50874   114.500    9.8209
    commit1(MB)     193   123520.000   640.00000   640.000    0.0000
    commit(MB)      193   141374.188   732.50874   754.500    9.8209

    alloc/elapsed_time = 11244.609 MB /   77.237 s = 145.586 MB/s
    alloc/tot_cpu_time = 11244.609 MB / 1235.792 s =   9.099 MB/s
    alloc/mut_cpu_time = 11244.609 MB /  934.682 s =  12.030 MB/s
    promo/elapsed_time =   807.236 MB /   77.237 s =  10.451 MB/s
    promo/gc0_time     =   807.236 MB /   11.470 s =  70.380 MB/s
    gc_seq_load        =   301.110 s  / 1235.792 s =  24.366%
    gc_conc_load       =     0.000 s  / 1235.792 s =   0.000%
    gc_tot_load        =   301.110 s  / 1235.792 s =  24.366%
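The derived rows at the bottom of the summary are plain ratios of the totals, which the sketch below recomputes from the Parallel GC figures above. The variable names are hypothetical, and the interpretation of the intermediate quantities (total CPU time as elapsed time multiplied by the cpus argument, mutator CPU time as total CPU time minus GC CPU time) is an inference from the numbers rather than something the slide states.

```python
def rate(numerator, denominator):
    """Generic ratio used for the derived rows (MB/s, or a load fraction)."""
    return numerator / denominator


# Totals copied from the Parallel GC summary.
alloc_mb, promo_mb = 11244.609, 807.236
elapsed_s, tot_cpu_s, gc_cpu_s = 77.237, 1235.792, 301.110
mut_cpu_s = tot_cpu_s - gc_cpu_s  # inferred: CPU time left for the application

print("alloc/elapsed_time = %.3f MB/s" % rate(alloc_mb, elapsed_s))
print("alloc/tot_cpu_time = %.3f MB/s" % rate(alloc_mb, tot_cpu_s))
print("alloc/mut_cpu_time = %.3f MB/s" % rate(alloc_mb, mut_cpu_s))
print("promo/elapsed_time = %.3f MB/s" % rate(promo_mb, elapsed_s))
print("gc_seq_load        = %.3f%%" % (100 * rate(gc_cpu_s, tot_cpu_s)))
print("implied cpus       = %.0f" % rate(tot_cpu_s, elapsed_s))
```

Recomputing the rows this way is a quick consistency check when a summary looks suspicious, for example when the cpus argument did not match the machine the log came from.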

PrintGCStats CMS

    what          count        total        mean       max    stddev
    gen0(s)         110       24.381     0.22164     1.751    0.2038
    gen0t(s)        110       24.397     0.22179     1.751    0.2038
    cmsIM(s)          3        0.285     0.09494     0.108    0.0112
    cmsRM(s)          3        0.092     0.03074     0.032    0.0015
    GC(s)           113       24.774     0.21924     1.751    0.2013
    cmsCM(s)          3        2.459     0.81967     0.835    0.0146
    cmsCP(s)          6        0.971     0.16183     0.191    0.0272
    cmsCS(s)          3       14.620     4.87333     4.916    0.0638
    cmsCR(s)          3        0.036     0.01200     0.016    0.0035
    alloc(MB)       110    11275.000   102.50000   102.500    0.0000
    promo(MB)       110     1322.718    12.02471   104.608   11.8770
    used0(MB)       110    12664.750   115.13409   115.250    1.2157
    used(MB)        110    56546.542   514.05947   640.625   91.5858
    commit0(MB)     110    12677.500   115.25000   115.250    0.0000
    commit1(MB)     110    70400.000   640.00000   640.000    0.0000
    commit(MB)      110    83077.500   755.25000   755.250    0.0000

    alloc/elapsed_time = 11275.000 MB /   83.621 s = 134.835 MB/s
    alloc/tot_cpu_time = 11275.000 MB / 1337.936 s =   8.427 MB/s
    alloc/mut_cpu_time = 11275.000 MB /  923.472 s =  12.209 MB/s
    promo/elapsed_time =  1322.718 MB /   83.621 s =  15.818 MB/s
    promo/gc0_time     =  1322.718 MB /   24.397 s =  54.217 MB/s
    gc_seq_load        =   396.378 s  / 1337.936 s =  29.626%
    gc_conc_load       =    18.086 s  / 1337.936 s =   1.352%
    gc_tot_load        =   414.464 s  / 1337.936 s =  30.978%

GChisto

> Graphical GC log visualizer
> Under development
  - Currently, can only show pause times
> Open source at
  - http://gchisto.dev.java.net/
> It might not work with some of the printing flags

Demo

GChisto Demo



Conclusions

> Remember: GC tuning is an art
> The talk contained
  - Basic GC tuning concepts
  - How to monitor GCs
  - What to look out for
  - Examples of good tuning practices
> ...and practice makes perfect!

Tony Printezis, Charlie Hunt


tony.printezis@sun.com charlie.hunt@sun.com
