Sunday, December 25, 2011

Java Performance Tuning, Profiling, and Memory Management

Java application performance is an abstract term until you face its real implications, and what it means depends on your interpretation of the word 'performance'. This article is meant to give the developer a perspective on the various aspects of JVM internals, and on the controls and switches that can be altered to suit your application. There is no single size that fits all; you need to customize the settings for your application.

You may be facing one of the issues listed below:

  1. The dreaded java.lang.OutOfMemoryError
  2. Your application is literally crawling.

Before we take the plunge into solving the issues, we first need to understand some of the theory behind the issues.

Theory

What does the JVM do? The Java Virtual Machine has two primary jobs:

  1. Executes Code
  2. Manages Memory
    This includes allocating memory from the OS, managing Java allocation including heap compaction, and removing garbage objects.

Besides the above, the JVM also handles tasks such as managing monitors (the locks behind synchronized).

Very Basic Java Theory

An object is created in the heap and is garbage-collected after there are no more references to it. Objects cannot be reclaimed or freed by explicit language directives. Objects become garbage when they're no longer reachable from the root set (e.g. static fields and local variables on thread stacks).

Objects inside the blue square are reachable from the thread root set, while objects outside the square (in red) are not.

The sequence of the garbage collection process is as follows:

1. Trace from the root set and find the objects that are not referenced at all.
2. Put the garbage objects found above in the finalizer queue.
3. Run finalize() on each of these instances.
4. Free the memory.
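Step 1 (root set tracing) can be sketched on a toy object graph. This is a minimal illustration, assuming a hypothetical `Node` class with an explicit reference list; the JVM does not represent objects this way, and finalization is left out. Note that the unreachable cycle `c <-> d` is still found to be garbage, which is why tracing collectors have no trouble with circular references.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy object: a name plus outgoing references.
class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();
    Node(String name) { this.name = name; }
}

class RootTracing {
    // Mark phase: visit every object reachable from the roots.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> reachable = new HashSet<>();
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (reachable.add(n)) {      // only expand on first visit, so cycles terminate
                work.addAll(n.refs);
            }
        }
        return reachable;
    }

    // Everything on the "heap" that the mark phase did not reach is garbage.
    static Set<String> garbage(List<Node> heap, List<Node> roots) {
        Set<Node> live = mark(roots);
        Set<String> dead = new LinkedHashSet<>();
        for (Node n : heap) {
            if (!live.contains(n)) dead.add(n.name);
        }
        return dead;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);   // a -> b
        c.refs.add(d);   // c -> d
        d.refs.add(c);   // d -> c: a cycle, but nothing in the root set points at it
        System.out.println(garbage(List.of(a, b, c, d), List.of(a)));  // prints [c, d]
    }
}
```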

Infant mortality in Java

Most of the objects (80%) in a typical Java application die young. But this may not be true for your application. Hence there is a need to figure out this rough infant mortality number so that you can tune the JVM accordingly.

JVM flavors

The Sun JVM understands the options -classic, -client and -server

  • classic : disables the Hotspot JIT compiler.
  • client (default): activates the Hotspot JIT for "client" applications.
  • server: activates the "server" Hotspot JIT: it requires a fair amount of time to warm up, but delivers the best performance for server applications.

Don't forget that, if you use them, -server or -client must be the first argument to the java command.

The Hotspot JVM uses adaptive optimization

  • The JVM begins by interpreting all code, but it monitors which code runs most often (the "hot spots").
  • It fires off a background thread that compiles the hot-spot bytecode to native code.
  • Because the Hotspot JVM compiles and optimizes only the "hot spot", it has more time than a traditional JIT to perform optimizations.
  • The Hotspot JVM keeps the old bytecodes around in case a method moves out of the hot spot.
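The interpret-then-compile idea can be modelled with a simple invocation counter. This is only a toy, assuming a hypothetical `HotMethod` class and threshold; HotSpot's real policy is far more involved, although it does expose a related knob, `-XX:CompileThreshold`, for the number of invocations before a method is compiled.

```java
// Toy model of counter-based adaptive optimization (hypothetical, not HotSpot's code).
class HotMethod {
    private final int compileThreshold;  // calls needed before the method counts as "hot"
    private int invocations;
    private boolean compiled;

    HotMethod(int compileThreshold) { this.compileThreshold = compileThreshold; }

    // Each call is "interpreted" until the method has proven hot.
    String invoke() {
        invocations++;
        if (!compiled && invocations >= compileThreshold) {
            compiled = true;  // in HotSpot this work happens on a background compiler thread
        }
        return compiled ? "native" : "interpreted";
    }

    // The retained bytecode is what makes falling back to the interpreter possible.
    void deoptimize() {
        compiled = false;
        invocations = 0;
    }

    boolean isCompiled() { return compiled; }

    public static void main(String[] args) {
        HotMethod m = new HotMethod(3);
        for (int i = 1; i <= 4; i++) {
            System.out.println("call " + i + ": " + m.invoke());
        }
        // calls 1-2 print "interpreted", calls 3-4 print "native"
    }
}
```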

Java Garbage Collector

The following describes what the Java Garbage Collector does.

Sun Classic (1.1 JVM), included for historical reasons

  • Mark, Sweep & Compact
    Mark: identify garbage
    Sweep: Find garbage on heap, de-allocate it
    Compact: collect all empty memory together
  • Eligibility for garbage collection is determined by walking across memory, determining reachability and then compacting the heap
  • Compaction is just copying the live objects so that they’re adjacent in memory
  • As a result, there is one large, contiguous block of free memory
  • The main problem with classic mark, sweep and compact is that all other threads have to be suspended while the garbage collector runs
  • Pause time is proportional to the number of objects on the heap
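The sweep and compact steps can be sketched on a toy heap modelled as an array of slots. This is a minimal sketch under stated assumptions: the `live` flags stand in for the result of the mark phase, and live objects are copied into a fresh array rather than slid in place (a real collector also has to fix up every reference to a moved object, and all other threads stay suspended while it runs).

```java
import java.util.Arrays;

class MarkSweepCompact {
    // Sweep + compact in one pass: copy each live object to the next free slot,
    // so all free memory ends up as one contiguous block at the end of the heap.
    static String[] compact(String[] heap, boolean[] live) {
        String[] result = new String[heap.length];
        int next = 0;                         // allocation pointer after compaction
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && live[i]) {
                result[next++] = heap[i];     // live object: now adjacent to the previous one
            }                                 // dead object: not copied, i.e. swept
        }
        return result;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "x", "B", null, "y", "C"};
        boolean[] live = {true, false, true, false, false, true};
        System.out.println(Arrays.toString(compact(heap, live)));
        // prints [A, B, C, null, null, null]
    }
}
```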


Sun HotSpot (1.2+ JVM)

  • Sun improved memory management in the Java 2 VMs by switching to a generational garbage collection scheme.
  • The Java Heap is separated into two regions (we will exclude the Permanent Generation for the time being):
    New Objects
    Old Objects
  • The New Object Region is subdivided into three smaller regions:
    1. Eden, where objects are allocated
    2. Survivor semi-spaces: From and To
  • The Eden area is set up like a stack: an object allocation is implemented as a pointer increment. When the Eden area is full, the GC does a reachability test and then copies all the live objects from Eden (and from the From survivor space) to the To region.
  • The labels on the regions are then swapped.
  • To becomes From: now the From area holds the surviving objects, and the new To area is empty.
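The copy-and-swap cycle described above can be simulated with plain lists. This is a toy sketch, assuming string "objects" and a caller-supplied `live` set in place of the reachability test; sizes, overflow into the tenured generation, and object aging are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy young generation: Eden plus the From/To survivor semi-spaces.
class YoungGen {
    final List<String> eden = new ArrayList<>();
    List<String> from = new ArrayList<>();
    List<String> to = new ArrayList<>();

    // Allocation in Eden is just an append (the JVM bumps a pointer).
    void allocate(String obj) { eden.add(obj); }

    // Minor GC: copy live objects from Eden and From into To, then swap the labels.
    void minorGc(Set<String> live) {
        for (String o : eden) if (live.contains(o)) to.add(o);
        for (String o : from) if (live.contains(o)) to.add(o);
        eden.clear();            // Eden is empty after every minor GC
        from.clear();            // old From contents were either copied or are garbage
        List<String> tmp = from; // swap: To becomes From, the emptied From becomes To
        from = to;
        to = tmp;
    }

    public static void main(String[] args) {
        YoungGen g = new YoungGen();
        g.allocate("a"); g.allocate("b"); g.allocate("c");
        g.minorGc(Set.of("a", "c"));
        System.out.println("from=" + g.from + " to=" + g.to + " eden=" + g.eden);
        // prints from=[a, c] to=[] eden=[]
    }
}
```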

JVM Heap:

Java Heap is divided into 3 generations: Young (Eden), Old (Tenured), and Permanent.

Arrangement of generations:

The diagram below shows how objects get created in the new generation and then move to the survivor spaces at every GC run, and if they survive long enough to be considered old, they get moved to the tenured generation. The number of GC cycles an object needs to survive to be considered old enough can be configured.
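The aging and promotion rule can be sketched as a per-object counter. This is an illustrative model only: the class and its `threshold` parameter are hypothetical, standing in for HotSpot's real `-XX:MaxTenuringThreshold` option, and the JVM's actual promotion decisions also depend on survivor space occupancy.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Tenuring {
    private final int threshold;                                // GCs to survive before promotion
    final Map<String, Integer> young = new LinkedHashMap<>();  // object -> GCs survived
    final List<String> tenured = new ArrayList<>();

    Tenuring(int threshold) { this.threshold = threshold; }

    void allocate(String obj) { young.put(obj, 0); }           // new objects start at age 0

    // One minor GC: dead objects are reclaimed, survivors age by one,
    // and objects that reach the threshold are promoted to the tenured generation.
    void minorGc(Set<String> live) {
        Iterator<Map.Entry<String, Integer>> it = young.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> e = it.next();
            if (!live.contains(e.getKey())) {
                it.remove();                                    // died young
                continue;
            }
            int age = e.getValue() + 1;
            if (age >= threshold) {
                tenured.add(e.getKey());                        // old enough: promote
                it.remove();
            } else {
                e.setValue(age);
            }
        }
    }

    public static void main(String[] args) {
        Tenuring t = new Tenuring(2);
        t.allocate("a"); t.allocate("b");
        t.minorGc(Set.of("a", "b"));   // a=1, b=1
        t.allocate("c");
        t.minorGc(Set.of("a", "c"));   // a promoted, b reclaimed, c=1
        System.out.println("young=" + t.young + " tenured=" + t.tenured);
        // prints young={c=1} tenured=[a]
    }
}
```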

By default, Java has 2 separate threads for GC, one each for the young generation (minor GC) and the old generation (major GC). The minor GC (smaller pause, but more frequent) cleans up garbage in the young generation, while the major GC (larger pause, but less frequent) cleans up garbage in the old generation. If the major GC also fails to free the required memory, the JVM increases the current heap size to make room for new objects. This whole cycle can go on until the current memory reaches the maximum memory for the JVM (default is 64MB for the client JVM), after which the JVM throws java.lang.OutOfMemoryError.

Permanent Generation

Class information is stored in the perm generation, as are constant strings. Strings interned dynamically in your application with String.intern() are also stored in the perm generation. Reflective objects (classes, methods, etc.) are stored there too: the perm generation holds all of the reflective data for the JVM.

JVM process memory

The Windows Task Manager just shows the memory usage of the java.exe task/process. It is not unusual for the total memory consumption of the VM to exceed the value of -Xmx, because the whole process has to fit within the process address space (PAS):

Managed Heap (Java heap, PERM, code cache) + Native Heap + Thread Memory <= 2 GB (PAS on Windows)

Code cache: contains the JIT-compiled (HotSpot) code.
Thread Memory = Thread_stack_size * Num_threads.
Managed Heap: the part whose size the developer controls through JVM options.
Java heap: this part of the memory is used when you create new Java objects.
Perm: for reflective data etc.
Native Heap: used for native allocations.
Thread Memory: used for thread allocations.

What you see in the Task Manager is the memory of the whole process, while what the profiler shows is the Java heap and (optionally) the PERM.
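The budget above is simple arithmetic, and a quick calculation shows why a large -Xmx plus many threads can exhaust a 2 GB address space. All inputs below are hypothetical example values (not measured defaults); the per-thread stack size is the one set with -Xss.

```java
// Back-of-the-envelope check of:
// Managed Heap (Java heap + PERM + code cache) + Native Heap + Thread Memory <= PAS
class ProcessBudget {
    // Thread Memory = Thread_stack_size * Num_threads (converted from KB to MB)
    static long threadMemoryMb(long stackSizeKb, int threads) {
        return stackSizeKb * threads / 1024;
    }

    // Remaining address space in MB; a negative result means the process cannot fit.
    static long headroomMb(long javaHeapMb, long permMb, long codeCacheMb,
                           long nativeHeapMb, long stackSizeKb, int threads, long pasMb) {
        long managed = javaHeapMb + permMb + codeCacheMb;
        long used = managed + nativeHeapMb + threadMemoryMb(stackSizeKb, threads);
        return pasMb - used;
    }

    public static void main(String[] args) {
        // Hypothetical: -Xmx1280m, -XX:MaxPermSize=256m, 48 MB code cache,
        // 200 MB native heap, 500 threads with 512 KB stacks, 2 GB PAS.
        System.out.println(headroomMb(1280, 256, 48, 200, 512, 500, 2048) + " MB left");
        // prints 14 MB left
    }
}
```

With only 14 MB of headroom in this example, a few dozen extra threads or a burst of native allocations would be enough to fail, even though the Java heap itself is far from full.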

Maximum PAS* by platform:

  1. x86 / Redhat Linux 32 bit: 2 GB
  2. x86 / Redhat Linux 64 bit: 3 GB
  3. x86 / Win98/2000/NT/Me/XP: 2 GB
  4. x86 / Solaris x86 (32 bit): 4 GB
  5. Sparc / Solaris 32 bit: 4 GB

Why GC needs tuning


Limits of vertical scaling

If F is the fraction of a calculation that is sequential (i.e. cannot benefit from parallelization), and (1 − F) is the fraction that can be parallelized, then the maximum speedup that can be achieved by using N processors is:
$$ S(N) = \frac{1}{F + \frac{1 - F}{N}} \qquad \text{(Amdahl's law)} $$

In the limit, as N -> infinity, the maximum speedup tends to 1/F. If F is only 10%, the problem can be sped up by only a maximum of a factor of 10, no matter how large the value of N used.
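Amdahl's law is easy to evaluate directly. The sketch below computes S(N) for a sequential fraction F and shows the speedup flattening out toward 1/F = 10 when F = 0.10.

```java
class Amdahl {
    // Maximum speedup with n processors when a fraction f of the work is sequential.
    static double speedup(double f, int n) {
        if (f < 0 || f > 1 || n < 1) {
            throw new IllegalArgumentException("need 0 <= F <= 1 and N >= 1");
        }
        return 1.0 / (f + (1.0 - f) / n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 8, 64, 1_000_000}) {
            System.out.printf("F=0.10, N=%d -> speedup %.2f%n", n, speedup(0.10, n));
        }
        // the speedup creeps toward, but never exceeds, 1/F = 10
    }
}
```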

So we assume that there is scope for leveraging multiple CPUs or multithreading. All right, enough theory. Can it solve my problem?

Problem Statements

  1. Application slow
    Your application may be crawling because it's spending too much time cleaning up garbage rather than running the app.

    Solution: Tune the JVM parameters, balancing GC pause times against GC frequency.
  2. Consumes too much memory
    The memory footprint of the application is related to the number and size of the live objects that are in the JVM at any given point in time. This can be due either to valid objects that are required to stay in memory, or to the programmer forgetting to remove references to unwanted objects (typically known as 'memory leaks' in Java parlance). As the memory footprint hits the threshold, the JVM throws java.lang.OutOfMemoryError.

java.lang.OutOfMemoryError can occur for 3 possible reasons:
1. Java heap space too low to create new objects. Increase it with -Xmx (java.lang.OutOfMemoryError: Java heap space).
java.lang.OutOfMemoryError: Java heap space
MaxHeap=30528 KB Total Heap=30528 KB Free Heap=170 KB Used Heap=30357 KB

2. Permanent generation too low. Increase it with -XX:MaxPermSize=256m (java.lang.OutOfMemoryError: PermGen space)
java.lang.OutOfMemoryError: PermGen space
MaxHeap=65088 KB TotalHeap=17616 KB FreeHeap=9692 KB UsedHeap=7923 KB

Heap
 def new generation   total 1280K, used 0K [0x02a70000, 0x02bd0000, 0x02f50000)
  eden space 1152K,   0% used [0x02a70000, 0x02a70000, 0x02b90000)
  from space 128K,   0% used [0x02bb0000, 0x02bb0000, 0x02bd0000)
  to   space 128K,   0% used [0x02b90000, 0x02b90000, 0x02bb0000)
 tenured generation   total 16336K, used 7784K [0x02f50000, 0x03f44000, 0x06a70000)
   the space 16336K,  47% used [0x02f50000, 0x036ea3f8, 0x036ea400, 0x03f44000)
 compacting perm gen  total 12288K, used 12287K [0x06a70000, 0x07670000, 0x07670000)
   the space 12288K,  99% used [0x06a70000, 0x0766ffd8, 0x07670000, 0x07670000)

3. java.lang.OutOfMemoryError: Out of swap space ...

The native (JNI) heap runs low on memory, even though the Java heap and the PermGen still have free memory. This typically happens if you make lots of heavy JNI calls while the Java heap objects occupy little space. In that scenario the GC might not feel the urge to clean up the Java heap, while the JNI heap keeps growing until it runs out of memory.

If you use the Java NIO packages, watch out for this issue: direct buffer allocation uses the native heap.

The limit on direct buffer memory can be raised with -XX:MaxDirectMemorySize=256M (default is 128).
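The difference between a heap buffer and a direct buffer can be checked with the standard `java.nio.ByteBuffer` API. In this sketch only the small `ByteBuffer` wrapper object lives on the Java heap for the direct case, which is why the GC may see little reason to run while native memory grows. The `kind` helper is a hypothetical name used for illustration.

```java
import java.nio.ByteBuffer;

class DirectBufferDemo {
    // Hypothetical helper: where does this buffer's storage live?
    static String kind(ByteBuffer b) {
        return b.isDirect() ? "native (direct)" : "Java heap";
    }

    public static void main(String[] args) {
        ByteBuffer heapBuf = ByteBuffer.allocate(1024);          // backed by a byte[] on the heap
        ByteBuffer directBuf = ByteBuffer.allocateDirect(1024);  // backed by native memory
        System.out.println("allocate():       " + kind(heapBuf));
        System.out.println("allocateDirect(): " + kind(directBuf));
        // Total direct allocation is capped by -XX:MaxDirectMemorySize; exceeding it raises
        // java.lang.OutOfMemoryError: Direct buffer memory, even when the heap has room.
    }
}
```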

Diagnosis:

There are some starting points for diagnosing the problem. You may start with the '-verbose:gc' flag on the java command and watch the memory footprint as the application progresses, until you find a spike. You may analyze the logs or use a light profiler like JConsole (part of the JDK) to check the memory graph. If you need details of the objects that are occupying memory at a certain point, you may use JProfiler or AppPerfect, which can provide the details of each object instance and all the inbound and outbound references to and from it. This is a memory-intensive procedure and not meant for production systems: depending on your application, these heavy profilers can slow the app down by up to 10 times.
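If attaching a profiler is not an option, a lightweight in-process snapshot of the same numbers can be logged periodically using the standard `java.lang.management` API. This is a sketch, assuming that printing once per second is enough to spot a spike; the interval and output format are arbitrary choices.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

class MemorySnapshot {
    static String snapshot() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();        // young + tenured generations
        MemoryUsage nonHeap = mem.getNonHeapMemoryUsage();  // e.g. code cache (and perm gen on older JVMs)
        long maxHeap = Runtime.getRuntime().maxMemory();    // reflects -Xmx (Long.MAX_VALUE if unbounded)
        return String.format("heap used=%dK committed=%dK max(-Xmx)=%dK non-heap used=%dK",
                heap.getUsed() / 1024, heap.getCommitted() / 1024,
                maxHeap / 1024, nonHeap.getUsed() / 1024);
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 3; i++) {
            System.out.println(snapshot());
            Thread.sleep(1000);
        }
    }
}
```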

Below are some of the ways you can zero-in on the issue.

A) GC outputs
-verbose:gc

This flag prints additional lines to the console, like the ones below:

[GC 65620K->50747K(138432K), 0.0279446 secs]
[Full GC 46577K->18794K(126848K), 0.2040139 secs]
The format is: combined size of live objects (young + tenured) before GC -> combined size of live objects (young + tenured) after GC (total heap size, not counting the space in the permanent generation), followed by the time the collection took.
-XX:+PrintHeapAtGC: prints more details (the heap layout before and after each collection).
-XX:+PrintGCTimeStamps additionally prints a time stamp at the start of each collection:
111.042: [GC 111.042: [DefNew: 8128K->8128K(8128K), 0.0000505 secs]
111.042: [Tenured: 18154K->2311K(24576K), 0.1290354 secs]
26282K->2311K(32704K), 0.1293306 secs]
The collection starts about 111 seconds into the execution of the application. The tenured generation usage was reduced to about 10% of its capacity (18154K->2311K(24576K)).
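When the logs get long, pulling out the before/after/capacity numbers programmatically helps find the spike. This is a minimal sketch: the regular expression is written against the sample lines above (with or without spaces around the arrow) and may need adjusting for other JVMs or log options.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcLogLine {
    // Matches "before->after(capacity)" triples such as 18154K->2311K(24576K).
    private static final Pattern SIZES =
            Pattern.compile("(\\d+)K\\s*->\\s*(\\d+)K\\((\\d+)K\\)");

    // Returns {beforeK, afterK, capacityK} for each match, in order of appearance.
    static List<long[]> sizes(String line) {
        List<long[]> out = new ArrayList<>();
        Matcher m = SIZES.matcher(line);
        while (m.find()) {
            out.add(new long[] {Long.parseLong(m.group(1)),
                                Long.parseLong(m.group(2)),
                                Long.parseLong(m.group(3))});
        }
        return out;
    }

    public static void main(String[] args) {
        long[] t = sizes("111.042: [Tenured: 18154K->2311K(24576K), 0.1290354 secs]").get(0);
        double occupancy = 100.0 * t[1] / t[2];
        System.out.printf("freed %dK, tenured now %.1f%% full%n", t[0] - t[1], occupancy);
        // prints freed 15843K, tenured now 9.4% full
    }
}
```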


B) Hprof output file
java -Xrunhprof:heap=sites,cpu=samples,depth=10,thread=y,doe=y
heap=sites tells the profiler to write information about memory utilization on the heap, indicating where it was allocated.
cpu=samples tells the profiler to do statistical sampling to determine CPU use.
depth=10 sets the depth of the recorded stack traces.
thread=y tells the profiler to identify the threads in the stack traces.
doe=y tells the profiler to produce a dump of the profiling data on exit.


C) -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=C:\OOM.txt

Dump the heap on OOM, and then analyze OOM.txt (a binary hprof-format file, despite the .txt name) with the jhat tool (bundled with the JDK).

The command below launches an HTTP server on port 7777. Open a browser at 'http://localhost:7777' to see the results.

jhat -port 7777 c:\OOM.txt
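A heap dump can also be taken on demand, without waiting for an OOM, through the HotSpot-specific `com.sun.management.HotSpotDiagnosticMXBean`. This sketch assumes a HotSpot-based JVM (Java 7+ for `getPlatformMXBean`); newer JDKs also insist that the file name end in `.hprof` and refuse to overwrite an existing file. The file name used in `main` is a hypothetical example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

class HeapDumper {
    static HotSpotDiagnosticMXBean bean() {
        return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    }

    // Is -XX:+HeapDumpOnOutOfMemoryError in effect for this JVM?
    static boolean dumpOnOomEnabled() {
        return Boolean.parseBoolean(bean().getVMOption("HeapDumpOnOutOfMemoryError").getValue());
    }

    // Writes an hprof-format dump; live=true dumps only reachable objects.
    static void dump(String path, boolean live) throws IOException {
        bean().dumpHeap(path, live);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("HeapDumpOnOutOfMemoryError=" + dumpOnOomEnabled());
        dump("heap-snapshot.hprof", true);  // then analyze it, e.g. with jhat
    }
}
```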


D) Profiling the app

You can profile the application to figure out Memory Leaks.

Java memory leaks (or what we like to call unintentionally retained objects) are often caused by saving an object reference in a class-level collection and forgetting to remove it at the proper time. The collection might be storing 100 objects, of which 95 might never be used. In this case those 95 objects are the memory leak, since the GC cannot free them while they are referenced by the collection.

There are also other kinds of problems with managing resources that impact performance, such as not closing JDBC Statements/ResultSets in a finally block (many JDBC drivers store a Statement reference in the Connection object).

A Java "memory leak" is more like holding a strong reference to an object even though it will never be needed again. The fact that you hold a strong reference to an object prevents the GC from deallocating it. Java "memory leaks" are therefore objects that are reachable but not "live": still referenced, yet never used again.
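The class-level collection leak described above looks roughly like this in code. `ListenerRegistry` is a hypothetical example, not a real library class: the static list keeps every registered object reachable, and only the explicit `unregister` call lets the GC reclaim it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical registry illustrating a leak through a static collection.
class ListenerRegistry {
    // Static: anything added here stays reachable for as long as the class is loaded.
    private static final List<Object> LISTENERS = new ArrayList<>();

    static void register(Object listener) { LISTENERS.add(listener); }

    // The fix: drop the reference once the object is no longer needed.
    static void unregister(Object listener) { LISTENERS.remove(listener); }

    static int size() { return LISTENERS.size(); }
}

class LeakDemo {
    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            Object screen = new byte[10_000];       // stands in for a short-lived object
            ListenerRegistry.register(screen);
            if (i >= 5) {
                ListenerRegistry.unregister(screen);  // without this, all 100 stay retained
            }
        }
        System.out.println("still referenced: " + ListenerRegistry.size());  // prints 5
    }
}
```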

Profilers built on JVMPI give a high level of detail.
Profilers: Hprof, JConsole, JProfiler, AppPerfect, YourKit, Eclipse Profiler, NetBeans Profiler, JMP, Extensible Java Profiler (EJP), TomcatProbe, Profiler4j

JConsole is good for summary-level info, tracking the memory footprint, checking thread deadlocks etc. It does not provide details of heap objects. For heap details you may use AppPerfect (licensed) or JProfiler.

E) For Native Heap issues.....

JRockit JDK (from BEA) provides better tools than the Sun JDK to peep inside the JNI heap (at least on Windows).

JRockit Runtime Analyzer: part of the JRockit install.
jrcmd PSID print_memusage: prints the memory usage of the JRockit process with ID PSID.
JRMC.exe: launch it from /bin and start recording.

Try to find a solution:

Based on the findings from the diagnosis, you may have to take these actions:

  1. Code change - For memory leak issues, it has to be a code change.
  2. JVM parameter tuning - You need to find the behavior of your app in terms of the ratio of young to old objects, and then tune the JVM accordingly. We'll talk about when to tune a parameter as we discuss the relevant params below.
  3. Memory parameters:
    Memory Size: overall size, individual region sizes
    -ms, -Xms sets the initial heap size (young and tenured generation ONLY, NOT Permanent)

    If the app starts with a large memory footprint, then you should set the initial heap to a large value so that the JVM does not consume cycles to keep expanding the heap.

    -mx, -Xmx sets the maximum heap size (young and tenured gen ONLY, NOT Perm) (default: 64MB)

    This is the most frequently tuned parameter, adjusted to suit the maximum memory requirements of the app. A low value overworks the GC as it tries to free space for new objects, and may lead to OOM. A very high value can starve other apps and induce swapping. Hence, profile the memory requirements to select the right value.

    -XX:PermSize=256m -XX:MaxPermSize=256m

    MaxPermSize default value: 32MB for -client and 64MB for -server.
    Tune this to increase the Permanent generation max size.

  4. GC parameters:

    -Xminf [0-1], -XX:MinHeapFreeRatio [0-100]
    sets the percentage of minimum free heap space - controls heap expansion rate

    -Xmaxf [0-1], -XX:MaxHeapFreeRatio [0-100]
    sets the percentage of maximum free heap space - controls when the VM will return unused heap memory to the OS

    -XX:NewRatio
    sets the ratio of the old to the new generation in the heap. A NewRatio of 5 sets the ratio of new to old at 1:5, making the new generation occupy 1/6th of the overall heap (defaults: client 8, server 2)

    -XX:SurvivorRatio
    sets the ratio of the survivor space to the Eden in the new object area. A SurvivorRatio of 6 sets the ratio of the three spaces to 1:1:6, making each survivor space 1/8th of the new object region
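The two ratios translate into region sizes with a little arithmetic, following the definitions above: new = heap / (NewRatio + 1), and each survivor = new / (SurvivorRatio + 2). The heap size below is a hypothetical example, and real JVMs round to alignment boundaries, so actual sizes can differ slightly.

```java
class GenerationSizes {
    // NewRatio = old/new, so the new generation gets 1/(NewRatio + 1) of the heap.
    static long youngMb(long heapMb, int newRatio) { return heapMb / (newRatio + 1); }

    // SurvivorRatio = eden/survivor with two survivors: survivor = young / (SurvivorRatio + 2).
    static long survivorMb(long youngMb, int survivorRatio) { return youngMb / (survivorRatio + 2); }

    public static void main(String[] args) {
        long heap = 1200;                       // hypothetical -Xms1200m -Xmx1200m
        long young = youngMb(heap, 2);          // -XX:NewRatio=2 (server default per the text)
        long survivor = survivorMb(young, 6);   // -XX:SurvivorRatio=6
        long eden = young - 2 * survivor;
        System.out.printf("young=%dMB old=%dMB eden=%dMB survivor=%dMB x2%n",
                young, heap - young, eden, survivor);
        // prints young=400MB old=800MB eden=300MB survivor=50MB x2
    }
}
```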

Garbage Collector Tuning:

Types of garbage collectors (not a complete list)

  1. Throughput collector: (default for Server JVM)
    • parallel version of the young generation collector.
    • -XX:+UseParallelGC (minor collection)
    • the tenured GC is the same as the serial collector (default GC for client JVM).
    • multiple threads to execute a minor collection
    • suited to applications with a large number of threads allocating objects, or a large Eden
    • -XX:+UseParallelOldGC (major collections also in parallel)
  2. Concurrent low pause collector:
    • collects the tenured generation and does most of the collection concurrently with the execution of the application, in an attempt to reduce the pause times needed to collect the tenured generation
    • -Xincgc or -XX:+UseConcMarkSweepGC
    • The application is paused for short periods during the collection. A parallel version of the young generation copying collector is used with the concurrent collector.
    • suited to multiprocessor machines and apps that have a relatively large set of long-lived data (a large tenured generation)
    • suited to apps where response time is more important than overall throughput, e.g. JAVA_OPTS="-Xms128M -Xmx1024M -XX:NewRatio=1 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:E:\loggc.txt"

    Flip side: synchronization overhead, fragmentation
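To confirm which collectors a running JVM actually picked after setting these flags, the standard `GarbageCollectorMXBean` API reports each collector's name, collection count and accumulated time. The names vary by JVM and collector (for example "PS Scavenge" and "PS MarkSweep" for the parallel collector), so this sketch just prints whatever is reported.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class CollectorReport {
    static List<String> report() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": " + gc.getCollectionCount() + " collections, "
                    + gc.getCollectionTime() + " ms total");
        }
        return lines;
    }

    public static void main(String[] args) {
        System.gc();  // only a request; counts may still be zero on a quiet JVM
        report().forEach(System.out::println);
    }
}
```

Comparing the collection counts and total times before and after a tuning change is a cheap way to see whether pauses moved in the direction you wanted.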

Performance Solution

  1. Application Software profiling
  2. Server and JVM tuning
  3. Right Hardware and OS
  4. Code improvement based on the behaviour of your application and the profiling results (easier said than done)
  5. Use the JVM the right way: optimal JVM params
  6. Client vs. server application (choose -client or -server accordingly)
  7. -XX:+UseParallelGC if you have multiple processors

Some Tips

· Unless you have problems with pauses, try granting as much memory as possible to the virtual machine

· Set -Xms and -Xmx to the same value, but be sure about the application behavior

· Be sure to increase the memory as you increase the number of processors, since allocation can be parallelized

· Don’t forget to tune the Perm generation

· Minimize the use of synchronization

· Use multithreading only where it helps, and be aware of the thread overheads. E.g. for a simple task like incrementing a counter from 1 to a billion, use a single thread: with multiple threads it took roughly ten times as long. I tested it on a dual-CPU WinXP machine with 8 threads.

· Avoid premature object creation. Creation should be as close to the actual place of use as possible. Very basic concept that we tend to overlook.

· JSPs are generally slower than servlets.

· Many custom class loaders (CLs) or heavy use of reflection: increase the Perm generation. Don't be PermGen-agnostic.

· Use soft references to keep caches from causing memory leaks. They enable smart caches that do not pin memory: the GC clears softly reachable objects automatically if the JVM runs low on memory.

· Use StringBuffer (or its unsynchronized counterpart StringBuilder, Java 5+) instead of String concatenation in loops

· Minimize JNI calls in your code

· XML APIs: be careful to make the correct choice between SAX and DOM. Use precompiled XPaths for better query performance.
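The soft-reference tip can be sketched as a small memory-sensitive cache built on `java.lang.ref.SoftReference`. This is a minimal sketch, not a production cache: `SoftCache` is a hypothetical name, it is not thread-safe, and cleared entries are only dropped lazily on lookup. The JVM guarantees that softly reachable objects are cleared before it throws OutOfMemoryError.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();

    void put(K key, V value) { map.put(key, new SoftReference<>(value)); }

    // Returns null if the value was never cached or the GC has reclaimed it.
    V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) return null;
        V value = ref.get();
        if (value == null) map.remove(key);  // referent cleared by the GC: drop the stale entry
        return value;
    }

    public static void main(String[] args) {
        SoftCache<String, byte[]> cache = new SoftCache<>();
        cache.put("report", new byte[1024 * 1024]);
        byte[] hit = cache.get("report");   // usually non-null unless memory was tight
        System.out.println(hit == null ? "evicted, recompute" : "cache hit: " + hit.length + " bytes");
    }
}
```

Callers must always be ready to recompute a value, since the cache may hand back null at any time under memory pressure.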

Conclusion:

There can be various bottlenecks in an application, and the JVM may be one of the culprits, for reasons such as a JVM not tuned optimally for your application, memory leaks, JNI issues etc. They need to be diagnosed, analyzed and then fixed.

Sun started splitting up the heap into generations with the Java 2 (HotSpot) VMs; earlier, the heap was a single generation. The heap was divided into generations with the objective of reducing the time taken by the garbage collection process by utilizing the memory better.
