Garbage Collection

Garbage Collection Tuning in the Java HotSpot Virtual Machine
Tony Printezis, Charlie Hunt

Sun Microsystems

Trademarks and Abbreviations

> Java Virtual Machine (JVM)
> Java HotSpot Virtual Machine (HotSpot JVM)

Who We Are

> Tony Printezis
    - GC Group / HotSpot JVM development team
    - Been working on the HotSpot JVM since 2006
    - 10+ years of GC experience
> Charlie Hunt
    - Java Platform Performance Engineering Group
    - Works with many Sun product teams and customers
    - 10+ years of Java technology performance work

And if you remember only one thing... GC Tuning is an Art!
GC Tuning is an Art

> Unfortunately, we can't give you a flawless recipe or a flowchart that will apply to all your GC tuning scenarios
> GC tuning involves a lot of common pattern recognition
> This pattern recognition requires experience
    - We have a lot of it. :-)

Agenda

> Introductions
> Brief GC Overview
> GC Tuning
    - Tuning the young generation
    - Tuning Parallel GC
    - Tuning CMS
> Monitoring the GC
> Conclusions

GCs in the HotSpot JVM

> Three available GCs:
    - Serial GC
    - Parallel GC / Parallel Old GC
    - Concurrent Mark-Sweep GC (CMS)

Heap Layout (same for all GCs)

> Young Generation
    - Allocation (new Object())
    - Eden
    - Survivor Spaces
> Old Generation
    - Promotion (survivors from the Young Generation)
> Permanent Generation
    - Allocation (only directly from the JVM)

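The heap layout above can be observed from inside a running JVM. The following is a minimal sketch using the standard java.lang.management API; the class name HeapPools is made up for illustration, and the pool names printed depend on the JVM version and the collector in use, so the code does not assume any particular name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class HeapPools {
    /** Returns one "name (type)" line per memory pool reported by the running JVM. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryType type = pool.getType(); // HEAP or NON_HEAP
            lines.add(pool.getName() + " (" + type + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Prints whatever pools this JVM exposes (e.g. eden, survivor, old).
        describePools().forEach(System.out::println);
    }
}
```

Running it under different collectors is a quick way to see how each one names its generations.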
Your Dream GC
> You would really like a GC that has
    - Low GC overhead,
    - Low GC pause times, and
    - Good space efficiency
> Unfortunately, you'll have to pick two (any two!)

Heap Sizing Tuning Advice
Supersize it!
Heap Sizing Trade-Offs

> Generally, the larger the heap space, the better
    - For both young and old generation
    - Larger space: less frequent GCs, lower GC overhead, objects more likely to become garbage
    - Smaller space: faster GCs (not always! see later)
> Sometimes max heap size is dictated by available memory and/or max space the JVM can address
> You have to find a good balance between young and old generation size

Generation Size Roles

> Young Generation Size
    - Dictates frequency of minor GCs
    - Dictates how many objects will be reclaimed in the young generation
        - Along with tenuring threshold + survivor space size tuning
> Old Generation Size
    - Should comfortably hold the application's steady-state live size
    - Decrease the major GC frequency as much as possible

Two Very Important Points

> You should try to maximize the number of objects reclaimed in the young generation
    - This is probably the most important piece of advice when sizing a heap and/or tuning the young generation
> Your application's memory footprint should not exceed the available physical memory
    - This is probably the second most important piece of advice when sizing a heap
> The above apply to all our GCs

Sizing Heap Spaces

> -Xmx<size> : max heap size
    - young generation + old generation
> -Xms<size> : initial heap size
    - young generation + old generation
> -Xmn<size> : young generation size
> Applications with emphasis on performance tend to set -Xms and -Xmx to the same value
> When -Xms != -Xmx, heap growth or shrinking requires a Full GC

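To check what heap sizes a JVM actually ended up with, the values can be read back at run time. This is a minimal sketch using the standard MemoryMXBean; the class and method names (HeapSettings, fixedSize) are hypothetical. Reported values are aligned by the JVM and may not match the -Xms / -Xmx flags byte for byte, so treat the comparison as a hint only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper class, for illustration only.
final class HeapSettings {
    /** True when the reported initial and max heap sizes are identical. */
    static boolean fixedSize(MemoryUsage heap) {
        return heap.getInit() == heap.getMax();
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getInit() and getMax() return -1 when the value is undefined.
        System.out.println("init=" + heap.getInit()
                + " committed=" + heap.getCommitted()
                + " max=" + heap.getMax());
        System.out.println("reported as fixed size: " + fixedSize(heap));
    }
}
```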
Should -Xms == -Xmx?

> Set -Xms to what you think would be your desired heap size
    - It's expensive to grow the heap
> If memory allows, set -Xmx to something larger than -Xms just in case
    - Maybe the application is hit with more load
    - Maybe the DB gets larger over time
> In most cases, it's better to do a Full GC and grow the heap than to get an OOM and crash

Sizing Heap Spaces (ii)

> -XX:PermSize=<size> : permanent generation initial size
> -XX:MaxPermSize=<size> : permanent generation max size
> Applications with emphasis on performance almost always set -XX:PermSize and -XX:MaxPermSize to the same value
    - Growing or shrinking the permanent generation requires a Full GC too
> Unfortunately, the permanent generation occupancy is hard to predict

Young Generation Sizing

> Eden size determines
    - The frequency of minor GCs
    - Which objects will be reclaimed at age 0
        - Newly-allocated objects in Eden start from age 0
        - Their age is incremented at every minor GC
> Increasing the size of the Eden will not always affect minor GC times
    - Remember: minor GC times are proportional to the amount of objects they copy (i.e., the live objects), not the young generation size

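The link between Eden size and minor GC frequency can be put into rough numbers. The sketch below assumes a steady allocation rate and that a minor GC happens each time Eden fills; both are simplifications, and the class name and the figures used in main are hypothetical.

```java
// Hypothetical helper class, for illustration only.
final class MinorGcInterval {
    /** Approximate seconds between minor GCs: the time it takes allocation to fill Eden. */
    static double secondsBetweenMinorGcs(double edenMb, double allocMbPerSec) {
        if (allocMbPerSec <= 0) {
            throw new IllegalArgumentException("allocation rate must be positive");
        }
        return edenMb / allocMbPerSec;
    }

    public static void main(String[] args) {
        // Made-up numbers: doubling Eden doubles the interval between minor GCs.
        System.out.println(secondsBetweenMinorGcs(100.0, 50.0));
        System.out.println(secondsBetweenMinorGcs(200.0, 50.0));
    }
}
```

A longer interval gives objects more time to die before the next minor GC, which is why a larger Eden tends to increase the share reclaimed at age 0.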
Young Object Survivor Ratio (i) - (iii)

[Figure, three slides: survivor ratio (y-axis) against newly-allocated object age (x-axis), from 0 / youngest to oldest]

Sizing Heap Spaces (iii)

> -XX:NewSize=<size> : initial young generation size
> -XX:MaxNewSize=<size> : max young generation size
> -XX:NewRatio=<ratio> : young generation to old generation ratio, expressed as 1:<ratio>
> Applications with emphasis on performance tend to use -Xmn to size the young generation since it combines the use of -XX:NewSize and -XX:MaxNewSize

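As a worked example of -XX:NewRatio, the sketch below splits a heap by treating the ratio as old generation size divided by young generation size (so NewRatio=2 gives the young generation one third of the heap). The class name and the sizes are hypothetical, and a real JVM also aligns the resulting sizes, which this ignores.

```java
// Hypothetical helper class, for illustration only.
final class NewRatioSizing {
    /** Young generation size when old:young == newRatio:1 within the given heap. */
    static long youngGenMb(long heapMb, int newRatio) {
        if (newRatio < 1) {
            throw new IllegalArgumentException("newRatio must be at least 1");
        }
        return heapMb / (newRatio + 1);
    }

    public static void main(String[] args) {
        long heapMb = 768; // made-up heap size
        long young = youngGenMb(heapMb, 2);
        System.out.println("young=" + young + "MB old=" + (heapMb - young) + "MB");
    }
}
```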
Tenuring
> -XX:TargetSurvivorRatio=<percent>, e.g., 50
    - How much of the survivor space should be filled
        - Typically leave extra space to deal with spikes
> -XX:InitialTenuringThreshold=<threshold> (PGC only)
> -XX:MaxTenuringThreshold=<threshold>
> -XX:+AlwaysTenure
    - Never keep any objects in the survivor spaces
> -XX:+NeverTenure
    - Very bad idea!

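The effect of -XX:TargetSurvivorRatio can be sketched as simple arithmetic: the desired survivor occupancy is the given percentage of one survivor space. The class name is hypothetical, and the 13369344-byte survivor space is an assumption picked so that a ratio of 50 yields a desired size of 6684672 bytes; the actual survivor space size is not stated in the slides.

```java
// Hypothetical helper class, for illustration only.
final class SurvivorTarget {
    /** Desired survivor occupancy in bytes for a survivor space and a target percentage. */
    static long desiredSurvivorBytes(long survivorSpaceBytes, int targetSurvivorRatio) {
        if (targetSurvivorRatio < 0 || targetSurvivorRatio > 100) {
            throw new IllegalArgumentException("ratio must be a percentage");
        }
        return survivorSpaceBytes * targetSurvivorRatio / 100;
    }

    public static void main(String[] args) {
        // Assumed survivor space size; half of it is the fill target at ratio 50.
        System.out.println(desiredSurvivorBytes(13369344L, 50));
    }
}
```

The unused half is the headroom the slide refers to for absorbing spikes in surviving objects.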
Tenuring Threshold Trade-Offs

> Try to retain as many objects as possible in the survivor spaces so that they can be reclaimed in the young generation
    - Less promotion into the old generation
    - Less frequent old GCs
> But also, try not to unnecessarily copy very long-lived objects between the survivors
    - Unnecessary overhead on minor GCs
> Not always easy to find the perfect balance
    - Generally: better copy more, than promote more

Tenuring Distribution
> Monitor tenuring distribution with -XX:+PrintTenuringDistribution

Desired survivor size 6684672 bytes, new threshold 8 (max 8)
- age   1:    2315488 bytes,    2315488 total
- age   2:      19528 bytes,    2335016 total
- age   3:         96 bytes,    2335112 total
- age   4:         32 bytes,    2335144 total

> Young generation seems well tuned here
    - We can even decrease the survivor space size

Tenuring Distribution (ii)

Desired survivor size 3342336 bytes, new threshold 1 (max 6)
- age   1:    3956928 bytes,    3956928 total

> Survivor space too small!
    - Increase survivor space and/or eden size

Tenuring Distribution (iii)

Desired survivor size 3342336 bytes, new threshold 6 (max 6)
- age   1:    2483440 bytes,    2483440 total
- age   2:     501240 bytes,    2984680 total
- age   3:      50016 bytes,    3034696 total
- age   4:      49088 bytes,    3083784 total
- age   5:      48616 bytes,    3132400 total
- age   6:      50128 bytes,    3182528 total

> Might be able to do better
    - Either increase max tenuring threshold
    - Or even set max tenuring threshold to 2
        - If ages > 6 still have around 50K of surviving bytes

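The "new threshold" values in the three logs above are consistent with a simple rule: walk the ages from youngest to oldest, accumulate surviving bytes, and stop at the first age where the running total exceeds the desired survivor size; if that never happens, use the max threshold. The sketch below implements that rule as an assumption about how the numbers relate, not as the JVM's actual code, and the class name is hypothetical.

```java
// Hypothetical helper class, for illustration only.
final class TenuringThreshold {
    /**
     * bytesPerAge[i] holds the surviving bytes for age i + 1.
     * Returns the first age whose cumulative total exceeds the desired
     * survivor size (capped at maxThreshold), or maxThreshold if none does.
     */
    static int compute(long[] bytesPerAge, long desiredSurvivorBytes, int maxThreshold) {
        long total = 0;
        for (int age = 1; age <= bytesPerAge.length; age++) {
            total += bytesPerAge[age - 1];
            if (total > desiredSurvivorBytes) {
                return Math.min(age, maxThreshold);
            }
        }
        return maxThreshold;
    }

    public static void main(String[] args) {
        // Figures taken from the three tenuring distribution logs.
        System.out.println(compute(new long[] {2315488, 19528, 96, 32}, 6684672L, 8));
        System.out.println(compute(new long[] {3956928}, 3342336L, 6));
        System.out.println(compute(
                new long[] {2483440, 501240, 50016, 49088, 48616, 50128}, 3342336L, 6));
    }
}
```

In the second log the age 1 objects alone overflow the target, which is why the threshold collapses to 1 and objects get promoted after a single minor GC.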
Stop-The-World Parallel GC Threads

> The number of parallel GC threads is controlled by -XX:ParallelGCThreads=<num>
> Default value assumes only one JVM per system
> Set the parallel GC thread number according to:
    - Number of JVMs deployed on the system / processor set / zone
    - CPU chip architecture
        - Multiple hardware threads per chip core, e.g., UltraSPARC T1 / T2

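When several JVMs share a machine, one possible starting point is to split the hardware threads evenly between them. The even split below is an assumption made for illustration, not a rule from the talk, and the class name is hypothetical; the result would be passed to -XX:ParallelGCThreads=<num> and then adjusted based on measurements.

```java
// Hypothetical helper class, for illustration only.
final class GcThreads {
    /** Even share of hardware threads per JVM, never less than one. */
    static int threadsPerJvm(int hardwareThreads, int jvmCount) {
        if (hardwareThreads < 1 || jvmCount < 1) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        return Math.max(1, hardwareThreads / jvmCount);
    }

    public static void main(String[] args) {
        int hardwareThreads = Runtime.getRuntime().availableProcessors();
        int jvmCount = 4; // made-up deployment: four JVMs on this machine
        System.out.println("-XX:ParallelGCThreads=" + threadsPerJvm(hardwareThreads, jvmCount));
    }
}
```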
Parallel GC Ergonomics
> The Parallel GC has ergonomics
    - i.e., auto-tuning
> Ergonomics help in improving out-of-the-box GC performance
> To get maximum performance, most customers we know do manual tuning

Parallel GC Tuning Advice

> Tune the young generation as described so far
> Try to avoid / decrease the frequency of major GCs
> We know of customers who use the Parallel GC in low-pause environments
    - Avoid Full GCs by avoiding / minimizing promotion
    - Maximize heap size

NUMA
> Non-Uniform Memory Access
    - Applicable to most SPARC, Opteron, more recently Intel platforms
> -XX:+UseNUMA
> Splits the young generation into partitions
    - Each partition belongs to a CPU
> Allocates new objects into the partition that belongs to the allocating CPU
> Big win for some applications

CMS Tuning Advice

> Tune the young generation as described so far
> Need to be even more careful about avoiding premature promotion
    - Originally we were using an +AlwaysTenure policy
    - We have since changed our mind :-)
> Promotion in CMS is expensive (free lists)
> The more often promotion / reclamation happens, the more likely fragmentation will settle in

CMS Tuning Advice (ii)

> We know customers who tune their applications to do mostly minor GCs, even with CMS
    - CMS is used as a safety net, when application load exceeds what they have provisioned for
    - Schedule Full GCs at non-critical times (say, late at night) to tidy up the heap and minimize fragmentation

Fragmentation
> Two types
    - External fragmentation
        - No free chunk is large enough to satisfy an allocation
    - Internal fragmentation
        - Allocator rounds up allocation requests
        - Free space wasted due to this rounding up

Fragmentation (ii)
> The bad news: you can never eliminate it!
    - It has been proven
> The good news: you can decrease its likelihood
    - Decrease promotion into the CMS old generation
    - Be careful when coding
        - Large objects of various sizes are the main cause
> But, when is the heap fragmented anyway?

Concurrent CMS GC Threads

> Number of parallel CMS threads is controlled by -XX:ParallelCMSThreads=<num>
    - Available in post 6 JVMs
> Trade-Off
    - CMS cycle duration vs. concurrent overhead during a CMS cycle

Permanent Generation and CMS

> To date, classes will not be unloaded by default from the permanent generation when using CMS
    - Both -XX:+CMSClassUnloadingEnabled and -XX:+PermGenSweepingEnabled need to be set to enable class unloading in CMS
    - The 2nd switch is not needed in post 6u4 JVMs

Setting CMS Initiating Threshold

> Again, a tricky trade-off!
> Starting a CMS cycle too early
    - Frequent CMS cycles
    - High concurrent overhead
> Starting a CMS cycle too late
    - Chance of an evacuation failure / Full GC
> Initiating heap occupancy should be (much) higher than the application steady-state live size
    - Otherwise, CMS will constantly do CMS cycles

Common CMS Scenarios

> Applications that promote non-trivial amounts of objects to the old generation
    - Old generation grows at a non-trivial rate
    - Very frequent CMS cycles
    - CMS cycles need to start relatively early
> Applications that promote very few or even no objects to the old generation
    - Old generation grows very slowly, if at all
    - Very infrequent CMS cycles
    - CMS cycles can start quite late

Initiating CMS Cycles

> CMS will try to automatically find the best initiating occupancy
    - It first does a CMS cycle early to collect stats
    - Then, it tries to start cycles as late as possible, but early enough not to run out of heap before the cycle completes
    - It keeps collecting stats and adjusting when to start cycles
    - Sometimes, the second cycle starts too late

Initiating CMS Cycles (ii)

> -XX:CMSInitiatingOccupancyFraction=<percent>
    - Occupancy percentage of CMS old generation that triggers a CMS cycle
> -XX:+UseCMSInitiatingOccupancyOnly
    - Don't use the ergonomic initiating occupancy

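A rough upper bound for the initiating occupancy follows from the "early enough not to run out of heap" idea: the old generation must still have room for everything promoted while the cycle runs. The sketch below uses that simple headroom model; the class name and all figures are hypothetical, and since promotion rates spike in practice, a real setting would sit below the computed value.

```java
// Hypothetical helper class, for illustration only.
final class CmsInitiatingOccupancy {
    /**
     * Highest occupancy percentage at which a cycle can start and still finish
     * before promotion fills the old generation, under a constant promotion rate.
     */
    static int latestInitiatingPercent(double oldCapacityMb, double promoMbPerSec, double cycleSecs) {
        double headroomMb = promoMbPerSec * cycleSecs; // promoted during one cycle
        double percent = 100.0 * (oldCapacityMb - headroomMb) / oldCapacityMb;
        return (int) Math.max(0, Math.floor(percent));
    }

    public static void main(String[] args) {
        // Made-up numbers: 640 MB old generation, 16 MB/s promotion, 10 s cycle.
        System.out.println("-XX:CMSInitiatingOccupancyFraction="
                + latestInitiatingPercent(640.0, 16.0, 10.0));
    }
}
```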
Initiating CMS Cycles (iii)

> -XX:CMSInitiatingPermOccupancyFraction=<percent>
    - Occupancy percentage of permanent generation that triggers a CMS cycle
    - Class unloading must be enabled

CMS Cycle Initiation Example

> Cycle started too early:

[ParNew 390868K->296358K(773376K), 0.1882258 secs]
[CMS-initial-mark 298458K(773376K), 0.0847541 secs]
[ParNew 401318K->306863K(773376K), 0.1933159 secs]
[CMS-concurrent-mark: 0.787/0.981 secs]
[CMS-concurrent-preclean: 0.149/0.152 secs]
[CMS-concurrent-abortable-preclean: 0.105/0.183 secs]
[CMS-remark 374049K(773376K), 0.0353394 secs]
[ParNew 407285K->312829K(773376K), 0.1969370 secs]
[ParNew 405554K->311100K(773376K), 0.1922082 secs]
[ParNew 404913K->310361K(773376K), 0.1909849 secs]
[ParNew 406005K->311878K(773376K), 0.2012884 secs]
[CMS-concurrent-sweep: 2.179/2.963 secs]
[CMS-concurrent-reset: 0.010/0.010 secs]
[ParNew 387767K->292925K(773376K), 0.1843175 secs]
[CMS-initial-mark 295026K(773376K), 0.0865858 secs]
[ParNew 397885K->303822K(773376K), 0.1995878 secs]

CMS Cycle Initiation Example (ii)

> Cycle started too late:

[ParNew 742993K->648506K(773376K), 0.1688876 secs]
[ParNew 753466K->659042K(773376K), 0.1695921 secs]
[CMS-initial-mark 661142K(773376K), 0.0861029 secs]
[Full GC 645986K->234335K(655360K), 8.9112629 secs]
[ParNew 339295K->247490K(773376K), 0.0230993 secs]
[ParNew 352450K->259959K(773376K), 0.1933945 secs]

CMS Cycle Initiation Example (iii)

> This is better:

[ParNew 640710K->546360K(773376K), 0.1839508 secs]
[CMS-initial-mark 548460K(773376K), 0.0883685 secs]
[ParNew 651320K->556690K(773376K), 0.2052309 secs]
[CMS-concurrent-mark: 0.832/1.038 secs]
[CMS-concurrent-preclean: 0.146/0.151 secs]
[CMS-concurrent-abortable-preclean: 0.181/0.181 secs]
[CMS-remark 623877K(773376K), 0.0328863 secs]
[ParNew 655656K->561336K(773376K), 0.2088224 secs]
[ParNew 648882K->554390K(773376K), 0.2053158 secs]
...
[ParNew 489586K->395012K(773376K), 0.2050494 secs]
[ParNew 463096K->368901K(773376K), 0.2137257 secs]
[CMS-concurrent-sweep: 4.873/6.745 secs]
[CMS-concurrent-reset: 0.010/0.010 secs]
[ParNew 445124K->350518K(773376K), 0.1800791 secs]
[ParNew 455478K->361141K(773376K), 0.1849950 secs]

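Reading logs like the three examples above gets easier with a small parser. The sketch below handles only the ParNew entries in the exact shape shown on these slides; the class name is hypothetical, and other logging flags produce different line formats that this pattern will not match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class ParNewLine {
    // Matches e.g. "[ParNew 390868K->296358K(773376K), 0.1882258 secs]"
    private static final Pattern PAR_NEW = Pattern.compile(
            "\\[ParNew (\\d+)K->(\\d+)K\\((\\d+)K\\), (\\d+\\.\\d+) secs\\]");

    final long beforeK;
    final long afterK;
    final long capacityK;
    final double pauseSecs;

    private ParNewLine(long beforeK, long afterK, long capacityK, double pauseSecs) {
        this.beforeK = beforeK;
        this.afterK = afterK;
        this.capacityK = capacityK;
        this.pauseSecs = pauseSecs;
    }

    /** Parses one ParNew entry, or returns null if the line is some other kind of entry. */
    static ParNewLine parse(String line) {
        Matcher m = PAR_NEW.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new ParNewLine(Long.parseLong(m.group(1)), Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)), Double.parseDouble(m.group(4)));
    }

    /** Occupancy after the collection, as a percentage of the capacity in parentheses. */
    double occupancyAfterPercent() {
        return 100.0 * afterK / capacityK;
    }

    public static void main(String[] args) {
        ParNewLine p = parse("[ParNew 742993K->648506K(773376K), 0.1688876 secs]");
        System.out.printf("after=%dK occupancy=%.1f%% pause=%.3fs%n",
                p.afterK, p.occupancyAfterPercent(), p.pauseSecs);
    }
}
```

Tracking the occupancy after each ParNew just before a CMS-initial-mark entry is one way to spot the "too early" and "too late" patterns programmatically.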
Start CMS Cycles Explicitly

> If relying on explicit GCs and want them to be concurrent, use:
    - -XX:+ExplicitGCInvokesConcurrent
        - Requires a post 6 JVM
    - -XX:+ExplicitGCInvokesConcurrentAndUnloadClasses
        - Requires a post 6u4 JVM
> Useful when wanting to cause references / finalizers to be processed

Monitoring the GC
> Online
    - VisualVM: http://visualvm.dev.java.net/
    - VisualGC: http://java.sun.com/performance/jvmstat/
        - VisualGC is also available as a VisualVM plug-in
        - Can monitor multiple JVMs within the same tool
> Offline
    - GC Logging
    - PrintGCStats
    - GChisto

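Besides the tools listed above, basic GC counters can also be read from inside the application through the standard java.lang.management API. This is a minimal sketch; the class name is hypothetical, and the collector names it prints depend on which GC the JVM is running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, for illustration only.
final class GcCounters {
    /** Sum of collection counts over all collectors that report one. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is the approximate accumulated time in milliseconds.
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total collections: " + totalCollections());
    }
}
```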
GC Logging in Production
> Don't be afraid to enable GC logging in production
    - Very helpful when diagnosing production issues
> Extremely low / non-existent overhead
    - Maybe some large files in your file system. :-)
    - We are surprised that customers are still afraid to enable it
> Real customer quote:
    - "If someone doesn't enable GC logging in production, I shoot them!"

Most Important GC Logging Parameters

> You need at least:
    - -XX:+PrintGCTimeStamps
        - Add -XX:+PrintGCDateStamps if you must
    - -XX:+PrintGCDetails
        - Preferred over -verbosegc as it's more detailed
> Also useful:
    - -Xloggc:<file>
        - Separates GC logging output from application output

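To confirm that a production JVM was really started with the logging flags above, the startup arguments can be inspected at run time. The sketch below filters them with a plain prefix match; the class and method names are hypothetical, and the prefixes cover only the flags named on this slide.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class GcLoggingFlags {
    /** Returns the arguments that look like the GC logging flags named on the slide. */
    static List<String> gcLoggingFlags(List<String> jvmArgs) {
        List<String> found = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-XX:+PrintGC") || arg.startsWith("-Xloggc:")) {
                found.add(arg);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Arguments the running JVM was started with (excluding the main class and its args).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> flags = gcLoggingFlags(jvmArgs);
        System.out.println(flags.isEmpty() ? "GC logging flags not set" : flags.toString());
    }
}
```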
PrintGCStats
> Summarizes GC logs
> Downloadable script from
    - http://java.sun.com/developer/technicalArticles/Programming/turbo/PrintGCStats.zip
> Usage
    - PrintGCStats -v cpus=<num> <gc log file>
        - Where <num> is the number of CPUs on the machine where the GC log was obtained
> It might not work with some of the printing flags

PrintGCStats Parallel GC
what          count        total        mean        max    stddev
gen0t(s)        193       11.470     0.05943      0.687    0.0633
gen1t(s)          1        7.350     7.34973      7.350    0.0000
GC(s)           194       18.819     0.09701      7.350    0.5272
alloc(MB)       193    11244.609    58.26222    100.875   18.8519
promo(MB)       193      807.236     4.18257     96.426    9.9291
used0(MB)       193    16018.930    82.99964    114.375   17.4899
used1(MB)         1      635.896   635.89648    635.896    0.0000
used(MB)        194    91802.213   473.20728    736.490   87.8376
commit0(MB)     193    17854.188    92.50874    114.500    9.8209
commit1(MB)     193   123520.000   640.00000    640.000    0.0000
commit(MB)      193   141374.188   732.50874    754.500    9.8209

alloc/elapsed_time  =  11244.609 MB /    77.237 s  =  145.586 MB/s
alloc/tot_cpu_time  =  11244.609 MB /  1235.792 s  =    9.099 MB/s
alloc/mut_cpu_time  =  11244.609 MB /   934.682 s  =   12.030 MB/s
promo/elapsed_time  =    807.236 MB /    77.237 s  =   10.451 MB/s
promo/gc0_time      =    807.236 MB /    11.470 s  =   70.380 MB/s
gc_seq_load         =    301.110 s  /  1235.792 s  =   24.366%
gc_conc_load        =      0.000 s  /  1235.792 s  =    0.000%
gc_tot_load         =    301.110 s  /  1235.792 s  =   24.366%

PrintGCStats CMS
what          count        total        mean        max    stddev
gen0(s)         110       24.381     0.22164      1.751    0.2038
gen0t(s)        110       24.397     0.22179      1.751    0.2038
cmsIM(s)          3        0.285     0.09494      0.108    0.0112
cmsRM(s)          3        0.092     0.03074      0.032    0.0015
GC(s)           113       24.774     0.21924      1.751    0.2013
cmsCM(s)          3        2.459     0.81967      0.835    0.0146
cmsCP(s)          6        0.971     0.16183      0.191    0.0272
cmsCS(s)          3       14.620     4.87333      4.916    0.0638
cmsCR(s)          3        0.036     0.01200      0.016    0.0035
alloc(MB)       110    11275.000   102.50000    102.500    0.0000
promo(MB)       110     1322.718    12.02471    104.608   11.8770
used0(MB)       110    12664.750   115.13409    115.250    1.2157
used(MB)        110    56546.542   514.05947    640.625   91.5858
commit0(MB)     110    12677.500   115.25000    115.250    0.0000
commit1(MB)     110    70400.000   640.00000    640.000    0.0000
commit(MB)      110    83077.500   755.25000    755.250    0.0000

alloc/elapsed_time  =  11275.000 MB /    83.621 s  =  134.835 MB/s
alloc/tot_cpu_time  =  11275.000 MB /  1337.936 s  =    8.427 MB/s
alloc/mut_cpu_time  =  11275.000 MB /   923.472 s  =   12.209 MB/s
promo/elapsed_time  =   1322.718 MB /    83.621 s  =   15.818 MB/s
promo/gc0_time      =   1322.718 MB /    24.397 s  =   54.217 MB/s
gc_seq_load         =    396.378 s  /  1337.936 s  =   29.626%
gc_conc_load        =     18.086 s  /  1337.936 s  =    1.352%
gc_tot_load         =    414.464 s  /  1337.936 s  =   30.978%

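The derived lines at the bottom of each PrintGCStats report are plain ratios of the totals above them. The sketch below recomputes two of them from the Parallel GC report as a sanity check; the class and method names are hypothetical, and the interpretation of the columns is inferred from the report layout rather than from the script's source.

```java
// Hypothetical helper class, for illustration only.
final class GcStatsRates {
    /** Throughput-style rate, e.g. MB allocated per second of elapsed time. */
    static double rate(double amountMb, double seconds) {
        return amountMb / seconds;
    }

    /** GC load: share of total CPU time spent in GC, as a percentage. */
    static double loadPercent(double gcCpuSeconds, double totalCpuSeconds) {
        return 100.0 * gcCpuSeconds / totalCpuSeconds;
    }

    public static void main(String[] args) {
        // Totals copied from the Parallel GC report.
        System.out.printf("alloc/elapsed_time = %.3f MB/s%n", rate(11244.609, 77.237));
        System.out.printf("gc_seq_load        = %.3f%%%n", loadPercent(301.110, 1235.792));
    }
}
```

The same two helpers reproduce the CMS report's figures when fed its totals, which makes it easy to compare the two collectors on equal terms.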
GChisto
> Graphical GC log visualizer
> Under development
    - Currently, can only show pause times
> Open source at
    - http://gchisto.dev.java.net/
> It might not work with some of the printing flags

Demo
GChisto Demo
Conclusions
> Remember: GC tuning is an art
> The talk contained
    - Basic GC tuning concepts
    - How to monitor GCs
    - What to look out for
    - Examples of good tuning practices
> ...and practice makes perfect!

Tony Printezis, Charlie Hunt

tony.printezis@sun.com charlie.hunt@sun.com
