Sunteți pe pagina 1din 141

Java Performance Tuning

Atthakorn Chanthong

What is software tuning?

User Experience
The software has poor response time. I need it runs more faster

Software tuning is to make application runs faster

Many people think Java application is slow, why?

There are two major reasons

The first is the Bottleneck

The Bottleneck
Increased Memory Use Lots of Casts

Automatic memory management by Garbage Collector

All Object are allocated on the Heap.

Java application is not native

The second is The Bad Coding Practice

How to make it run faster?

The bottleneck is unavoidable

But the man could have a good coding practice

A good design

A good coding practice

Java application normally run fast enough

So the tuning game comes into play

Knowing the strategy

Tuning Strategy
1

Identify the main causes

Choose the quickest and easier one to fix

Fix it, and repeat again for other root cause

Inside the strategy

Tuning Strategy
Profile, Measure Problem Priority
Need more faster, repeat again

Yes, its better Identify the location of bottleneck


Still bad? The result isnt good enough

Test and compare Before/after alteration

Think a hypothesis

Code alteration Create a test scenario

How to measure the software performance?

We use the profiler

Profiler is a programming tool that can track the performance of another computer program

The two common usages of profiler are to analyze a software problem


Profiler

Profile application performance

Monitor application memory usage

How we get the profiler?

Dont pay for it!

An opensource profiler is all around

Some interesting opensource profilers

Opensource Profiler

JConsole

http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html

Opensource Profiler

Eclipse TPTP

http://www.eclipse.org/tptp/index.php

Opensource Profiler
NetBeans Built-in Profiler

http://profiler.netbeans.org/

Opensource Profiler
This is pulled out from NetBeans to act as standalone profiler

VisualVM

https://visualvm.dev.java.net/

And much more

Opensource Profiler
JRat Cougaar

DrMem

InfraRED

Profiler4j

Jmeasurement

TIJMP

We love opensource

Make the brain smart with good code practice

1st Rule Avoid Object-Creation

Object-Creation causes problem Why?

Avoid Object-Creation
Lots of objects in memory means GC does lots of work Program is slow down when GC starts

Avoid Object-Creation

Creating object costs time and CPU effort for application

Reuse objects where possible

Pool Management
Most container (e.g. Vector) objects could be reused rather than created and thrown away

Pool Management
VectorPoolManager

V1

V3

V4

V5

getVector()

returnVector()

V2

Pool Management
public void doSome() { for (int i=0; i < 10; i++) { Vector v = new Vector() do vector manipulation stuff } }

public static VectorPoolManager vpl = new VectorPoolManager(25) public void doSome() {

for (int i=0; i < 10; i++) {


Vector v = vectorPoolManager.getVector( ); do vector manipulation stuff vectorPoolManager.returnVector(v); } }

Canonicalizing Objects
Replace multiple object by a single object or just a few

Canonicalizing Objects
public class VectorPoolManager { private static final VectorPoolManager poolManager; private Vector[] pool; private VectorPoolManager(int size) { .... } public static Vector getVector() { if (poolManager== null) poolManager = new VectorPoolManager(20); ... return pool[pool.length-1]; } }

Singleton Pattern

Canonicalizing Objects
Boolean Boolean Boolean Boolean b1 b2 b3 b4 = = = = new new new new Boolean(true); Boolean(false); Boolean(false); Boolean(false); 4 objects in memory

Boolean Boolean Boolean Boolean

b1 b2 b3 b4

= = = =

Boolean.TRUE Boolean.FALSE Boolean.FALSE Boolean.FALSE 2 objects in memory

Canonicalizing Objects
String string = "55"; Integer theInt = new Integer(string);
No Cache

String string = "55"; Integer theInt = Integer.valueOf(string);

Object Cached

Canonicalizing Objects
private static class IntegerCache { private IntegerCache(){} static final Integer cache[] = new Integer[-(-128) + 127 + 1]; static { for(int i = 0; i < cache.length; i++) cache[i] = new Integer(i - 128); } } public static Integer valueOf(int i) { final int offset = 128; if (i >= -128 && i <= 127) { // must cache return IntegerCache.cache[i + offset]; } return new Integer(i); }

Caching inside Integer.valueOf()

Keyword, final
Use the final modifier on variable to create immutable internally accessible object

Keyword, final
public void doSome(Dimension width, Dimenstion height) { //Re-assign allow width = new Dimension(5,5); ... }

public void doSome(final Dimension width, final Dimenstion height) { //Re-assign disallow width = new Dimension(5,5); ... }

Auto-Boxing/Unboxing
Use Auto-Boxing as need not as always

Auto-Boxing/UnBoxing
Integer i = 0; //Counting by 10M while (i < 100000000) { i++; } Takes 2313 ms Why it takes 2313/125 =~ 20 times longer?

int p = 0; //Counting by 10M while (p < 100000000) { p++; } Takes 125 ms

Auto-Boxing/UnBoxing

Object-Creation made every time we wrap primitive by boxing

2nd Rule Knowing String Better

String is the Object mostly used in the application

Overlook the String


The software may have the poor performance

Compile-Time String Initialization


Use the string concatenation (+) operator to create Strings at compile-time.

Compile-Time Initialization
for (int i =0; i < loop; i++) { //Looping 10M rounds String x = "Hello" + "," +" "+ "World"; } Takes 16 ms

for (int i =0; i < loop; i++) { //Looping 10M rounds String x = new String("Hello" + "," +" "+ "World"); } Takes 672 ms

Runtime String Initialization


Use StringBuffers/StringBuilder to create Strings at runtime.

Runtime String Initialization


String name = "Smith"; for (int i =0; i < loop; i++) { //Looping 1M rounds String x = "Hello"; x += ","; x += " Mr."; x += name; }

Takes 10298 ms

String name = "Smith"; for (int i =0; i < loop; i++) { //Looping 1M rounds String x = (new StringBuffer()).append("Hello") .append(",").append(" ") .append(name).toString(); }

Takes 6187 ms

String comparison
Use appropriate method to compare the String

To Test String is Empty


for (int i =0; i < loop; i++) { //10m loops if (a != null && a.equals("")) { } }.

Takes 125 ms

for (int i =0; i < loop; i++) { //10m loops if (a != null && a.length() == 0) { } } Takes 31 ms

If two strings have the same length


String a = abc String b = cdf for (int i =0; i < loop; i++) { if (a.equalsIgnoreCase(b)) { } }

Takes 750 ms

String a = abc String b = cdf for (int i =0; i < loop; i++) { if (a.equals(b)) { } }

Takes 125 ms

If two strings have different length


String a = abc String b = cdfg for (int i =0; i < loop; i++) { if (a.equalsIgnoreCase(b)) { } }

Takes 780 ms

String a = abc String b = cdfg for (int i =0; i < loop; i++) { if (a.equals(b)) { } }

Takes 858 ms

String.equalsIgnoreCase() does only 2 steps

It checks for identity and then for Strings being the same size

Intern String To compare String by identity

Intern String

Normally, string can be created by two ways

Intern String
By new String()
String s = new String(This is a string literal.);

By String Literals
String s = This is a string literal.;

Intern String

Create Strings by new String()

JVM always allocate a new memory address for each new String created even if they are the same.

Intern String
String a = new String(This is a string literal.); String b = new String(This is a string literal.);

This is a string literal. The different memory address

This is a string literal.

Intern String
Create Strings by Literals Strings will be stored in Pool Double create Strings by laterals They will share as a unique instances

Intern String
String a = This is a string literal.; String b = This is a string literal.;

Same memory address This is a string literal.

Intern String
We can point two Stings variable to the same address if they are the same values. By using String.intern() method

Intern String
String a = new String(This is a string literal.).intern(); String b = new String(This is a string literal.).intern();

Same memory address This is a string literal.

Intern String
The idea is
Intern String could be used to compare String by identity

Intern String

What compare by identity means?

Intern String
If (a == b)
Identity comparison (by reference)

If (a.equals(b))
Value comparison

Intern String

By using reference so identity comparison is fast

Intern String

In traditionally style
String must be compare by equals() to avoid the negative result

Intern String
But Intern String
If Strings have different value they also have different address. If Strings have same value they also have the same address.

Intern String

So we can say that


(a == b) is equivalent to (a.equals(b))

Intern String
For these string variables
String a = "abc"; String b = "abc"; String c = new String("abc").intern()

They are pointed to the same address with the same value

Intern String
for (int i =0; i < loop; i++) { if (a.equals(b)) { } } Takes 312 ms

for (int i =0; i < loop; i++) { if (a == b) { } } Takes 32 ms

Intern String

Wow, Intern String is good Unfortunately, it makes code hard understand, use it carefully

Intern String
String.intern() comes with overhead as there is a step to cache
Use Intern String if they are planed to compare two or more times

char array instead of String


Avoid doing some stuffs by String object itself for optimal performance

char array
String x = "abcdefghijklmn"; for (int i =0; i < loop; i++) { if (x.charAt(5) == 'x') { } }

Takes 281 ms

String x = "abcdefghijklmn"; char y[] = x.toCharArray(); for (int i =0; i < loop; i++) { if ( (20 < y.length && 20 >= 0) && y[20] == 'x') { } Takes 156 ms }

3rd Rule Exception and Cast

Stop exception to be thrown if it is possible


Exception is really expensively to execute

Avoid Exception
Object obj = null; for (int i =0; i < loop; i++) { try { obj.hashCode(); } catch (Exception e) {} }

Takes 18563 ms

Object obj = null; for (int i =0; i < loop; i++) { if (obj != null) { obj.hashCode(); } }

Takes 16 ms

Cast as Less
We can reduce runtime cost by grouping cast object which is several used

Cast as Less
Integer io = new Integer(0); Object obj = (Object)io; for (int i =0; i < loop; i++) { if (obj instanceof Integer) { byte x = ((Integer) obj).byteValue(); double d = ((Integer) obj).doubleValue(); float f = ((Integer) obj).floatValue(); } } for (int i =0; i < loop; i++) { if (obj instanceof Integer) { Integer icast = (Integer)obj; byte x = icast.byteValue(); double d = icast.doubleValue(); float f = icast.floatValue(); } }

Takes 31 ms

Takes 16 ms

4th Rule The Rhythm of Motion

Loop Optimization
There are several ways to make a faster loop

Dont terminate loop with method calls

Eliminate Method Call


byte x[] = new byte[loop]; for (int i = 0; i < x.length; i++) { for (int j = 0; j < x.length; j++) { } } byte x[] = new byte[loop]; int length = x.length; for (int i = 0; i < length; i++) { for (int j = 0; j < length; j++) { } } Takes 62 ms

Takes 109 ms

Method Call generates some overhead in Object Oriented Paradigm

Use int to iterate over loop

Iterate over loop by int


for (int i = 0; i < length; i++) { for (int j = 0; j < length; j++) { } } Takes 62 ms

for (short i = 0; i < length; i++) { for (short j = 0; j < length; j++) { } } Takes 125 ms

VM is optimized to use int for loop iteration not by byte, short, char

Use System.arraycopy() for copying object instead of running over loop

System.arraycopy(.)
for (int i = 0; i < length; i++) { x[i] = y[i]; } Takes 62 ms

System.arraycopy(x, 0, y, 0, x.length);

Takes 16 ms

System.arraycopy() is native function It is efficiently to use

Terminate loop by primitive use not by function or variable

Terminate Loop by Primitive


for(int i = 0; i < countArr.length; i++) { for(int j = 0; j < countArr.length; j++) { } } for(int i = countArr.length-1; i >= 0; i--) { for(int j = countArr.length-1; j >= 0; j--) { } } Takes 298 ms Takes 424 ms

Primitive comparison is more efficient than function or variable comparison

The average time of switch vs. if-else is about equally in random case

Switch vs. If-else


for(int i = 0; i < loop; i++) { if (i%10== 0) { } else if (i%10 == 1) { ... } else if (i%10 == 8) { } else if (i%10 == 9) { } } Takes 2623 ms Takes 2608 ms for(int i = 0; i < loop; i++) { switch (i%10) { case 0: break; case 1: break; ... case 7: break; case 8: break; default: break; } }

Switch is quite fast if the case falls into the middle but slower than if-else in case of falling at the beginning or default case ** Test against a contiguous range of case values eg, 1,2,3,4,..

Recursive Algorithm
Recursive function is easy to read but it costs for each recursion

Tail Recursion
A recursive function for which each recursive call to itself is a reduction of the original call.

Recursive vs. Tail-Recursive


public static long factorial1(int n) { if (n < 2) return 1L; else return n*factorial1(n-1); }

Takes 172 ms
public static long factorial1a(int n) { if (n < 2) return 1L; else return factorial1b(n, 1L); } public static long factorial1b(int n, long result) { if (n == 2) return 2L*result; else return factorial1b(n-1, result*n); }

Takes 125 ms

Dynamic Cached Recursive Do cache to gain more speed

Dynamic-Cached Recursive
public static long factorial1(int n) { if (n < 2) return 1L; else return n*factorial1(n-1); }

Takes 172 ms
public static final int CACHE_SIZE = 15; public static final long[ ] factorial3Cache = new long[CACHE_SIZE]; public static long factorial3(int n) { if (n < 2) return 1L; else if (n < CACHE_SIZE) { if (factorial3Cache[n] == 0) factorial3Cache[n] = n*factorial3(n-1); return factorial3Cache[n]; } else return n*factorial3(n-1); }

Takes 94 ms

Recursion Summary
Dynamic-Cached Tail Recursive
is better than

Tail Recursive
is better than

Recursive

5th Rule Use Appropriate Collection

Accession ArrayList vs. LinkedList

Random Access
ArrayList al = new ArrayList(); for (int i =0; i < loop; i++) { al.get(i); } Takes 281 ms

LinkedList ll = new LinkedList(); for (int i =0; i < loop; i++) { ll.get(i); } Takes 5828 ms

Sequential Access
ArrayList al = new ArrayList(); for (Iterator i = al.iterator(); i.hasNext();) { i.next(); } Takes 1375 ms

LinkedList ll = new LinkedList(); for (Iterator i = ll.iterator(); i.hasNext();) { i.next(); } Takes 1047 ms

ArrayList is good for random access LinkedList is good for sequential access

Random vs. Sequential Access


ArrayList al = new ArrayList(); for (int i =0; i < loop; i++) { al.get(i); } Takes 281 ms

LinkedList ll = new LinkedList(); for (Iterator i = ll.iterator(); i.hasNext();) { i.next(); } Takes 1047 ms

Random Access is better than Sequential Access

Insertion ArrayList vs. LinkedList

Insertion at zero index


ArrayList al = new ArrayList(); for (int i =0; i < loop; i++) { al.add(0, Integer.valueOf(i)); } Takes 328 ms

LinkedList ll = new LinkedList(); for (int i =0; i < loop; i++) { ll.add(0, Integer.valueOf(i)); }

Takes 109 ms

LinkedList does insertion better than ArrayList

Vector is likely to ArrayList but it is synchronized version

Accession and Insertion Vector vs. ArrayList

Random Accession
ArrayList al = new ArrayList(); for (int i =0; i < loop; i++) { al.get(i); } Takes 281 ms

Vector vt = new Vector(); for (int i =0; i < loop; i++) { vt.get(i); } Takes 422 ms

Sequential Accession
ArrayList al = new ArrayList(); for (Iterator i = al.iterator(); i.hasNext();) { i.next(); } Takes 1375 ms

Vector vt = new Vector(); for (Iterator i = vt.iterator(); i.hasNext();) { i.next(); } Takes 1890 ms

Insertion
ArrayList al = new ArrayList(); for (int i =1; i < loop; i++) { al.add(0, Integer.valueOf(i)); } Takes 328 ms

Vector vt = new Vector(); for (int i =0; i < loop; i++) { vt.add(0, Integer.valueOf(i)); } Takes 360 ms

Vector is slower than ArrayList in every method Use Vector if only synchronize needed

Summary
Type ArrayList Random (get) 281 Sequential Insertion (Iterator) 1375 1047 1890 328 109 360

LinkedList 5828 Vector 422

Addition and Accession Hashtable vs HashMap

Addition
Hashtable ht = new Hashtable(); for (int i =0; i < loop; i++) { ht.put(Integer.valueOf(i), Integer.valueOf(i)); } Takes 453 ms HashMap hm = new HashMap(); for (int i =0; i < loop; i++) { hm.put(Integer.valueOf(i), Integer.valueOf(i)); } Takes 328 ms

Accession
Hashtable ht = new Hashtable(); for (int i =0; i < loop; i++) { ht.get(Integer.valueOf(i)); } Takes 94 ms HashMap hm = new HashMap(); for (int i =0; i < loop; i++) { hm.get(Integer.valueOf(i)); } Takes 47 ms

Hashtable is synchronized so it is slower than HashMap

Q&A

Reference
O'Reilly Java Performance Tuning 2nd http://www.javaperformancetuning.com http://www.glenmccl.com/jperf/

Future Topic
I/O Logging, and Console Output Sorting Threading Tweak JVM and GC Strategy

The End