Have you ever encountered a situation where your application's CPU maxes out and never comes down, even when traffic volume drops? Did you have to recycle the JVM to remediate the problem? Even after you recycle the JVM, does the CPU start to spike again after some time?
Full GC is an important phase of the garbage collection process. During this phase the entire JVM is frozen and every single object in memory is evaluated for garbage collection, so it is naturally a CPU-intensive operation. If the application happens to have a memory leak, Full GC will start to run repeatedly without reclaiming any memory. When Full GC runs repeatedly, the CPU spikes and never comes down.
Tactical Solution: To resolve the problem completely, the memory leak in the application has to be fixed. Resolving memory leaks might take some time. (Of course, it goes without saying that you can engage experts like me to resolve it quickly.) Until then, the tactical solution below can keep the application functioning in production. You need a script that checks the application's garbage collection log file every 2 minutes. If the script notices more than 3 Full GC runs in a 10-minute window, that particular JVM should be taken out of production traffic. The JVM should then be recycled after a thread dump and a heap dump have been captured. After recycling, the JVM should be placed back to take active traffic.
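The detection part of that script could be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical, and the regular expression assumes the JDK 8 style log line produced with -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps, where each event starts with the JVM uptime in seconds (for example "151.126: [Full GC ..."). Other log formats, such as unified logging in JDK 9 and later, would need a different pattern.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: flags a JVM whose GC log shows too many Full GC runs. */
final class FullGcMonitor {
    // Assumed log shape: "<uptime seconds>: [Full GC ..." (JDK 8 with -XX:+PrintGCTimeStamps).
    private static final Pattern FULL_GC = Pattern.compile("(\\d+\\.\\d+): \\[Full GC");

    /** Extracts the uptime, in seconds, of every Full GC line in the log. */
    static List<Double> fullGcTimes(List<String> logLines) {
        List<Double> times = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = FULL_GC.matcher(line);
            if (m.find()) {
                times.add(Double.parseDouble(m.group(1)));
            }
        }
        return times;
    }

    /**
     * Returns true if any sliding window of windowSeconds contains more than
     * maxRuns Full GC events. Expects the times in ascending order, which is
     * how they appear in the log.
     */
    static boolean tooManyFullGcs(List<Double> times, double windowSeconds, int maxRuns) {
        Deque<Double> window = new ArrayDeque<>();
        for (double t : times) {
            window.addLast(t);
            // Drop events that fell out of the window ending at t.
            while (t - window.peekFirst() > windowSeconds) {
                window.removeFirst();
            }
            if (window.size() > maxRuns) {
                return true;
            }
        }
        return false;
    }
}
```

A job scheduled every 2 minutes would read the log, call fullGcTimes, and then tooManyFullGcs(times, 600.0, 3). Taking the JVM out of rotation, capturing the dumps, and restarting depend on your environment and are not shown.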
Sometimes, due to a bug in your code or in a third-party library that you use, loop constructs (while, for, do..while) may run forever. Consider a while loop that keeps running until a condition called ‘myCondition’ is satisfied.
Due to a certain data condition or a bug in the code, ‘myCondition’ may never be satisfied. In that scenario the thread spins infinitely in the while loop, which causes the CPU to spike. Unless the JVM is restarted, the CPU will stay maxed out.
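A minimal sketch of such a loop is shown here. Only the name myCondition comes from this article; the class name SpinningWorker, the AtomicBoolean flag, and the thread name are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical worker whose loop only ends when myCondition becomes true. */
final class SpinningWorker {
    // Stands in for 'myCondition'. In the failure scenario nothing ever sets it to true.
    static final AtomicBoolean myCondition = new AtomicBoolean(false);

    /** Starts a thread that busy-waits on myCondition and returns it. */
    static Thread start() {
        Thread worker = new Thread(() -> {
            while (!myCondition.get()) {
                // No sleep and no blocking call here, so the thread stays
                // RUNNABLE and keeps one CPU core busy for as long as it loops.
            }
        }, "spinning-worker");
        worker.setDaemon(true); // lets this demo JVM exit even if the loop never ends
        worker.start();
        return worker;
    }
}
```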
Solution: When you observe the CPU maxing out and utilization not coming down, take two thread dumps about 10 seconds apart, right while the problem is happening. Note down every thread that is in the “runnable” state in the first thread dump, then compare the state of the same threads in the second thread dump. If those threads are still in the runnable state within the same method in the second dump, that indicates which part of the code the thread(s) are looping in. Once you know which part of the code is looping infinitely, it should be straightforward to address the problem.
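The comparison described above can also be sketched in code. The example below is an assumption-laden illustration rather than a replacement for real thread dumps: it takes the snapshots from inside the JVM through the standard ThreadMXBean API, and the class and method names (RunnableThreadDiff, snapshotRunnable, stillInSameMethod) are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that compares the RUNNABLE threads of two snapshots. */
final class RunnableThreadDiff {

    /** Maps each RUNNABLE thread's id to "threadName @ class.method" of its top stack frame. */
    static Map<Long, String> snapshotRunnable() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, String> runnable = new HashMap<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.RUNNABLE) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            if (stack.length > 0) {
                runnable.put(info.getThreadId(), info.getThreadName() + " @ "
                        + stack[0].getClassName() + "." + stack[0].getMethodName());
            }
        }
        return runnable;
    }

    /** Keeps only the threads that were RUNNABLE in the same method in both snapshots. */
    static Map<Long, String> stillInSameMethod(Map<Long, String> first, Map<Long, String> second) {
        Map<Long, String> suspects = new HashMap<>();
        for (Map.Entry<Long, String> entry : first.entrySet()) {
            if (entry.getValue().equals(second.get(entry.getKey()))) {
                suspects.put(entry.getKey(), entry.getValue());
            }
        }
        return suspects;
    }
}
```

In use, you would call snapshotRunnable(), wait about 10 seconds, call it again, and pass both maps to stillInSameMethod. Treat the result as a list of candidates only: threads waiting in native I/O calls are also reported as RUNNABLE, so each entry still needs a look at its full stack trace.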
When multiple threads try to access HashMap’s get() and put() APIs concurrently, threads can go into an infinite loop, because HashMap is not thread-safe. This problem doesn’t always happen, but on rare occasions it does.
Solution: When you observe the CPU maxing out and utilization not coming down, take a thread dump right while the problem is happening. Look at which threads are in the “runnable” state. If such a thread happens to be working in HashMap’s get() or put() API, that indicates the HashMap is causing the CPU spike. You can then replace that HashMap with ConcurrentHashMap.
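As a rough sketch of that replacement, assume a map of counters shared between threads. The class SharedCounts and its methods are hypothetical; the point is only that the field is declared as a ConcurrentHashMap instead of a HashMap, so callers need no other changes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical shared structure that used to be backed by a plain HashMap. */
final class SharedCounts {
    // Before the fix this was: new HashMap<>(), which is unsafe when shared across threads.
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();

    /** Adds one to the counter for key; merge is atomic on ConcurrentHashMap. */
    void increment(String key) {
        counts.merge(key, 1, Integer::sum);
    }

    /** Returns the current count for key, or 0 if the key has never been seen. */
    int get(String key) {
        return counts.getOrDefault(key, 0);
    }
}
```

Using merge rather than a separate get() followed by put() matters here: ConcurrentHashMap removes the risk of a corrupted map, but a read-then-write sequence split across two calls could still lose updates.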