Internal workings of Java Virtual Machine

JVM Architecture

Introduction

The Java Virtual Machine (JVM) is an abstract (virtual) computer defined by the JVM specification. Its primary function is to execute bytecode, and HotSpot serves as the reference implementation. HotSpot is available in 32-bit and 64-bit variants, is written primarily in C++, and is open source. The JVM architecture can be divided into three subsystems.

  • ClassLoader subsystem
    • A mechanism to load classes into JVM.
  • Run time data area
    • When a Java virtual machine runs a program, it needs memory to store many things, including bytecodes, objects, parameters, return values, local variables, and intermediate results of computations.
  • Execution Engine
    • A mechanism responsible for executing the instructions contained in the methods of loaded classes

Architecture

The reference implementation of the JVM is developed jointly by Oracle and the OpenJDK project. At the time of writing, JDK 17 is the latest release, specified by JSR 392 (Java Specification Request), and the documentation can be accessed from https://docs.oracle.com/javase/specs/index.html

This article gives an overview of the JVM architecture as defined in the specification.

ClassLoader

This has been explained in detail here: https://www.jonesjalapat.com/2021/09/05/how-a-class-is-loaded-in-java-classloader/

RunTimeData Area

When a Java virtual machine runs a program, it needs memory to store many things. This includes bytecodes and other information it extracts from loaded class files, objects, methods, return values, local variables, and intermediate results of computations. The Java virtual machine organizes the memory it needs to execute a program into several runtime data areas. Some of these areas are created when the JVM starts and destroyed when it exits, while others are per-thread: they are created when a thread starts and destroyed when that thread completes.

Method Area(Metaspace)

  • One Method area per JVM instance, shared by all threads.
  • The JVM places class information into the method area, which includes
    • fully qualified name, superclasses, interfaces, modifiers,
    • field information
    • method information
    • static variables
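    • a reference to the ClassLoader that loaded the class, and a reference to the Class object, which exposes the class information held in the method area through methods such as getSuperclass() and getClassLoader().
  • The method area may be garbage collected, depending on the VM implementation.
  • The specification describes the method area as logically part of the heap. Before Java 8, HotSpot implemented it as the Permanent Generation (PermGen).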
  • OutOfMemoryError can be thrown
  • Metaspace : OpenJDK Hotspot VM implementation of Method Area
    • Permanent generation has been completely removed in JDK 8.
    • From JDK 8 onwards, class metadata is stored in native memory, and this space is called Metaspace.
    • By default, class metadata allocation is limited only by the amount of available native memory; it can be capped with the -XX:MaxMetaspaceSize flag.
    • HotSpot uses a dedicated allocator (referred to as the Metaspace VM) to manage this memory.
    • As long as a class loader is alive, the metadata of the classes it loaded stays alive in Metaspace and cannot be freed.
  • Run-Time Constant Pool
    • constant pool table :
      • The constant pool can be thought of as an array accessed by index; the class file of every class the JVM loads contains one.
      • It stores the constants used in the class, such as literals (int, float, and String values) and symbolic references to classes, fields, and methods.
      • Run javap -v on a compiled class to view its constant pool.
    • run-time constant pool is a per-class or per-interface run-time representation of the constant_pool table in a class file. Each run-time constant pool is allocated from the Java Virtual Machine’s method area
    • When creating a class or interface, if the construction of the run-time constant pool requires more memory than can be made available in the method area of the Java Virtual Machine, the Java Virtual Machine throws an OutOfMemoryError.
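
The class information listed above can be inspected at run time through java.lang.Class. The following is a minimal sketch, assuming a HotSpot JVM on which the bootstrap class loader is reported as null; ClassMetadataDemo and describe are names made up for this illustration.

```java
import java.lang.reflect.Modifier;

// Hypothetical demo class: prints a few pieces of the per-class metadata
// that the JVM keeps for every loaded class.
class ClassMetadataDemo {

    // Builds a one-line summary of the metadata exposed by a Class object.
    static String describe(Class<?> type) {
        Class<?> parent = type.getSuperclass();      // null for Object and interfaces
        ClassLoader loader = type.getClassLoader();  // null means the bootstrap loader on HotSpot
        return type.getName()
                + " extends " + (parent == null ? "nothing" : parent.getName())
                + ", modifiers=" + Modifier.toString(type.getModifiers())
                + ", interfaces=" + type.getInterfaces().length
                + ", loader=" + (loader == null ? "bootstrap" : loader.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println(describe(String.class));             // a core class
        System.out.println(describe(ClassMetadataDemo.class));  // an application class
    }
}
```

Core classes such as String report the bootstrap loader, while application classes report the loader that found them on the class path.
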
Example code, used below to illustrate the constant pool:

package com.jonesjalapat.classloader;

public class Test {
	public static void main(String[] args) {
		System.out.println(" Hello World");
	}

}
Compiled code in readable format obtained from javap -c classFile

public class com.jonesjalapat.classloader.Test {
  public com.jonesjalapat.classloader.Test();
    Code:
       0: aload_0
       1: invokespecial #8                  // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: getstatic     #16                 // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc           #22                 // String  Hello World
       5: invokevirtual #24                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: return
}
Constant Pool table of the class 

Constant pool:
   #1 = Class              #2             // com/jonesjalapat/classloader/Test
   #2 = Utf8               com/jonesjalapat/classloader/Test
   #3 = Class              #4             // java/lang/Object
   #4 = Utf8               java/lang/Object
   #5 = Utf8               <init>
   #6 = Utf8               ()V
   #7 = Utf8               Code
   #8 = Methodref          #3.#9          // java/lang/Object."<init>":()V
   #9 = NameAndType        #5:#6          // "<init>":()V
  #10 = Utf8               LineNumberTable
  #11 = Utf8               LocalVariableTable
  #12 = Utf8               this
  #13 = Utf8               Lcom/jonesjalapat/classloader/Test;
  #14 = Utf8               main
  #15 = Utf8               ([Ljava/lang/String;)V
  #16 = Fieldref           #17.#19        // java/lang/System.out:Ljava/io/PrintStream;
  #17 = Class              #18            // java/lang/System
  #18 = Utf8               java/lang/System
  #19 = NameAndType        #20:#21        // out:Ljava/io/PrintStream;
  #20 = Utf8               out
  #21 = Utf8               Ljava/io/PrintStream;
  #22 = String             #23            //  Hello World
  #23 = Utf8                Hello World
  #24 = Methodref          #25.#27        // java/io/PrintStream.println:(Ljava/lang/String;)V
  #25 = Class              #26            // java/io/PrintStream
  #26 = Utf8               java/io/PrintStream
  #27 = NameAndType        #28:#29        // println:(Ljava/lang/String;)V
  #28 = Utf8               println
  #29 = Utf8               (Ljava/lang/String;)V
  #30 = Utf8               args
  #31 = Utf8               [Ljava/lang/String;
  #32 = Utf8               SourceFile
  #33 = Utf8               Test.java


Symbolic reference: loosely speaking, a symbolic reference is a name (a string) that the JVM later resolves into the actual class, field, or method.
For example, #16 is a valid index into the constant_pool table and is a symbolic reference to the field System.out, while #24 refers to the method PrintStream.println; #22, by contrast, points to a String constant. All references in the run-time constant pool are initially symbolic. They are derived from the binary structures defined in https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.4, for example CONSTANT_Class_info for a class and CONSTANT_String_info for a String.
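
One visible consequence of String constants living in the constant pool is that identical literals resolve to the same object. The sketch below illustrates this with reference comparisons; StringPoolDemo and its method names are made up for this example.

```java
// Hypothetical demo class: shows that equal string literals share one object,
// while "new String" always creates a separate object on the heap.
class StringPoolDemo {

    // Two identical literals resolve to the same pooled String instance.
    static boolean sameLiteral() {
        String first = "abc";
        String second = "abc";
        return first == second;
    }

    // An explicitly constructed String is a distinct object with equal content.
    static boolean freshObjectIsDistinct() {
        String literal = "abc";
        String fresh = new String("abc");
        return literal != fresh && literal.equals(fresh);
    }

    // intern() returns the pooled instance for the same character sequence.
    static boolean internReturnsPooled() {
        return new String("abc").intern() == "abc";
    }

    public static void main(String[] args) {
        System.out.println("same literal:      " + sameLiteral());
        System.out.println("fresh is distinct: " + freshObjectIsDistinct());
        System.out.println("intern is pooled:  " + internReturnsPooled());
    }
}
```
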

Heap

  • One heap per JVM instance, shared by all threads.
  • The JVM places all instantiated objects and arrays on the heap.
  • Heap storage for objects is reclaimed by an automatic storage management system (known as a garbage collector).
  • The heap is created on virtual machine start-up
  • OutOfMemoryError can be thrown.
  • Heap space is divided into
    • Eden Space: The pool from which memory is initially allocated for most objects.
    • Survivor Space: The pool containing objects that have survived the garbage collection of the Eden space.
    • Tenured Generation or Old Gen: The pool containing objects that have existed for some time in the survivor space.
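
The heap spaces above, together with Metaspace, can be listed from inside a running program through the standard java.lang.management API. This is a rough sketch; the pool names printed depend on the JVM and on the collector in use, and MemoryPoolsDemo and countPools are names invented for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical demo class: lists the memory pools the running JVM exposes.
class MemoryPoolsDemo {

    // Counts the pools of a given type (HEAP or NON_HEAP).
    static long countPools(MemoryType type) {
        return ManagementFactory.getMemoryPoolMXBeans().stream()
                .filter(pool -> pool.getType() == type)
                .count();
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();  // may be null if the pool is no longer valid
            long usedKb = usage == null ? -1 : usage.getUsed() / 1024;
            System.out.printf("%-16s %-35s used=%d KB%n",
                    pool.getType(), pool.getName(), usedKb);
        }
    }
}
```

On a HotSpot JVM the heap pools typically include Eden, Survivor, and Old Gen spaces, and the non-heap pools include Metaspace.
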

Java Stacks

  • Each thread has a separate Java stack, so the data on it is not shared with other threads and is thread-safe.
  • A Java stack consists of stack frames.
  • When a thread invokes a method, the Java virtual machine pushes a new frame onto that thread’s Java stack. When the method completes, the virtual machine pops and discards the frame for that method.
  • Local variables, method parameters, and intermediate results are stored in the frame itself; objects live on the heap, and the frame holds only references to them.
  • StackOverflowError and OutOfMemoryError can be thrown.
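
The push-a-frame-per-call behaviour described above can be observed by recursing until the thread's stack is exhausted. This is only a sketch: the depth reached varies with the JVM, the frame size, and the -Xss stack size setting, and StackDepthDemo is a made-up name.

```java
// Hypothetical demo class: counts how many frames fit on the current
// thread's Java stack before StackOverflowError is thrown.
class StackDepthDemo {

    private static int depth;

    // Each call pushes one more frame; there is deliberately no base case.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns the number of nested calls made before the stack overflowed.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError expected) {
            // The frames have been popped again by the time we get here.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before overflow: " + measureDepth());
    }
}
```
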

The code below illustrates how stack and heap memory are allocated.

package com.jonesjalapat.classloader;

import java.util.ArrayList;
import java.util.List;

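public class Test {

	int a = 10;                              // instance field: stored inside the Test object on the heap
	String string1 = "abc";                  // reference stored in the object; "abc" comes from the string pool
	List<String> list1 = new ArrayList<>();  // reference stored in the object; the list is a separate heap object

	public static void main(String[] args) {
		int b = 0;                           // primitive local: lives in main's stack frame
		Test testObject = new Test();        // reference on the stack, object on the heap
		String string2 = "abc";              // same pooled "abc" object that string1 refers to

		testObject.printList1(string2);      // pushes a new frame for printList1

		List<Integer> list2 = new ArrayList<>();
		list2.add(testObject.a);             // autoboxed to an Integer object
		list2.add(b);
		list2.forEach(System.out::println);
	}

	private void printList1(String string2) {
		list1.add(string2);
		list1.add(string1);
		list1.forEach(System.out::println);
	}

}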
Figure: memory allocation on the heap and stack for the code above; rectangles are references and squares are objects.

PC registers

  • Each thread has a separate program counter (pc) register.
  • It holds the address of the JVM instruction currently being executed by that thread.
  • While a thread is executing a native method, the value of its pc register is undefined.

Native method stacks

  • Each thread has a separate native method stack.
  • Stores the state of native method invocations in the thread

Execution Engine

The execution engine executes bytecode instructions one at a time as it encounters them. The specification does not prescribe how this must be implemented; the HotSpot JVM, as the reference implementation, can be loosely divided into the following components.

Interpreter

Java Virtual Machine instructions form a stream of bytecodes that are executed one after the other. An instruction consists of an opcode specifying the operation to be performed, followed by zero or more operands embodying the values to be operated upon.

Short description of the instruction format:
mnemonic operand1 operand2 ...

The mnemonic is the readable name of an opcode from the instruction set defined in the JVM specification. For example:
aaload = load a reference from an array
istore = store an int into a local variable
The complete list of instructions is given in the JVM specification.


An example of instructions picked from https://www.artima.com/insidejvm/ed2/jvm10.html
// Bytecode stream: 03 3b 84 00 01 1a 05 68 3b a7 ff f9
// Disassembly:
// Left column: offset of instruction from beginning of method
// |   Center column: instruction mnemonic and any operands
// |   |                   Right column: comment
   0   iconst_0           // 03
   1   istore_0           // 3b
   2   iinc 0, 1          // 84 00 01
   5   iload_0            // 1a
   6   iconst_2           // 05
   7   imul               // 68
   8   istore_0           // 3b
   9   goto 2             // a7 ff f9
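
To make the fetch-decode-execute cycle concrete, here is a toy interpreter for exactly the seven opcodes used in the stream above. It is an illustrative sketch and not how HotSpot's interpreter is built; ToyInterpreter, LOOP, and run are invented names, and the step limit exists only because the sample code loops forever.

```java
// Hypothetical toy interpreter for the bytecode stream shown above.
class ToyInterpreter {

    // 03 3b 84 00 01 1a 05 68 3b a7 ff f9
    static final byte[] LOOP = {
        0x03, 0x3b, (byte) 0x84, 0x00, 0x01, 0x1a, 0x05, 0x68, 0x3b,
        (byte) 0xa7, (byte) 0xff, (byte) 0xf9
    };

    // Executes at most maxSteps instructions and returns local variable 0.
    static int run(byte[] code, int maxSteps) {
        int[] locals = new int[1];   // local variable array of the frame
        int[] stack = new int[4];    // operand stack of the frame
        int sp = 0;                  // next free operand stack slot
        int pc = 0;                  // program counter: offset of the current opcode
        for (int step = 0; step < maxSteps; step++) {
            int opcode = code[pc] & 0xFF;
            switch (opcode) {
                case 0x03: stack[sp++] = 0; pc += 1; break;          // iconst_0
                case 0x05: stack[sp++] = 2; pc += 1; break;          // iconst_2
                case 0x1a: stack[sp++] = locals[0]; pc += 1; break;  // iload_0
                case 0x3b: locals[0] = stack[--sp]; pc += 1; break;  // istore_0
                case 0x68: {                                         // imul
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left * right;
                    pc += 1;
                    break;
                }
                case 0x84:                                           // iinc index, const
                    locals[code[pc + 1] & 0xFF] += code[pc + 2];
                    pc += 3;
                    break;
                case 0xa7: {                                         // goto (signed 16-bit offset)
                    int offset = (short) (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
                    pc += offset;
                    break;
                }
                default:
                    throw new IllegalStateException("unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
        return locals[0];
    }

    public static void main(String[] args) {
        // 2 set-up instructions plus three passes through the 6-instruction loop body.
        System.out.println(run(LOOP, 20));
    }
}
```

Each pass through the loop increments local variable 0 and then doubles it, so its value goes 2, 6, 14 after the first three passes.
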

Execution

Each thread of a running Java application is a distinct runtime instance of the virtual machine's execution engine. The engine ultimately executes the instructions on the processor.

JIT compiler

Most programs spend 80 to 90 percent of their time executing 10 to 20 percent of their code. The JVM monitors the running code, identifies the most frequently executed ("hot") methods, and compiles them to native machine code on background compiler threads, applying optimizations to that machine code as it does so. This adaptive compilation improves the performance of Java applications.
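
A small way to watch this happen is to call one method many times and then ask the JVM how long its compiler has been busy. The sketch below uses the standard CompilationMXBean; JitDemo and sumOfSquares are made-up names, the call counts are arbitrary, and the reported time depends entirely on the JVM. On HotSpot, running with -XX:+PrintCompilation additionally logs each method as it is compiled.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical demo class: runs a hot method repeatedly so the JIT
// has a reason to compile it, then reports total compilation time.
class JitDemo {

    // Small numeric method that becomes "hot" when called many times.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        long checksum = 0;
        for (int call = 0; call < 20_000; call++) {
            checksum += sumOfSquares(1_000);
        }
        System.out.println("checksum = " + checksum);

        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();  // null if there is no compiler
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println(jit.getName() + " spent "
                    + jit.getTotalCompilationTime() + " ms compiling");
        }
    }
}
```
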

GarbageCollector

Objects and arrays are never explicitly deallocated; instead, the garbage collector reclaims them automatically. The GC inspects the objects on the heap, checks whether they are still referenced, and releases the memory used by objects that are no longer needed. Since this is a critical component, it is covered in more detail below.

GC Process
  • Step 1: Mark
    • This is where the garbage collector identifies which pieces of memory are in use and which are not.
  • Step 2: Sweep(Normal Deletion)
    • Normal deletion removes unreferenced objects leaving referenced objects and pointers to free space.
  • Step 3: Compact
    • Moves the referenced objects together, which makes new memory allocation much easier and faster.

In the simplest collectors, application threads have to be stopped while memory is cleared. The work of the application pauses and the steps above begin: the garbage collector (Step 1) marks the objects that are still in use, (Step 2) reclaims the memory of everything left unmarked, and finally (Step 3) may optionally compact the heap and resize it if possible. Then the application threads are resumed and the cycle starts again. A full cycle of garbage collection is called an epoch.
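
The mark and sweep steps can be sketched on a toy object graph. This is a simplified model and not the algorithm of any real collector: ToyMarkSweep, Node, and the four-node graph are invented for illustration, and copying survivors into a fresh list only loosely stands in for compaction.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model of mark-and-sweep over a hand-built object graph.
class ToyMarkSweep {

    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();  // outgoing references
        boolean marked;
        Node(String name) { this.name = name; }
    }

    // Step 1: mark everything reachable from the roots.
    static void mark(Collection<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (node.marked) {
                continue;  // already visited; this also handles cycles
            }
            node.marked = true;
            pending.addAll(node.refs);
        }
    }

    // Steps 2 and 3: drop unmarked nodes and keep survivors packed together.
    static List<Node> sweep(List<Node> heap) {
        List<Node> survivors = new ArrayList<>();
        for (Node node : heap) {
            if (node.marked) {
                node.marked = false;  // reset for the next cycle
                survivors.add(node);
            }
        }
        return survivors;
    }

    // Builds a -> b (reachable) and c <-> d (unreachable cycle), then collects.
    static List<String> demo() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);
        List<Node> heap = List.of(a, b, c, d);
        mark(List.of(a));  // "a" is the only root
        List<String> names = new ArrayList<>();
        for (Node survivor : sweep(heap)) {
            names.add(survivor.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The cycle between c and d is reclaimed even though the two nodes reference each other, because neither is reachable from a root.
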

JVM Generations:

Marking and compacting every object in the JVM on each collection is inefficient. Analysis of applications has shown that most objects are short-lived, so fewer and fewer objects remain allocated over time. The heap is therefore broken up into smaller parts called generations: the Young Generation and the Tenured (Old) Generation. The key goal of this design is to minimize application pauses.

GC generations

In a simplified view of generational GC, a newly created object is first placed in the Eden space of the young generation. If it survives a young garbage collection, it is copied into one of the two survivor spaces (S0 or S1), and on each later young collection the survivors are copied from one survivor space to the other. An object that is still in use after enough collections is promoted to the Tenured space, that is, to the old generation.

GC Events
  • Minor event: happens when the Eden space is full, and moves live objects to a Survivor space.
  • Mixed event: a Minor event plus reclamation of some of the Tenured generation.
  • Full GC event: clears the young and old generation spaces together, and causes a stop-the-world pause.
GC Types
  • Serial Garbage Collector
    • This GC freezes (stops the world for) all application threads and performs garbage collection on a single thread, which avoids the communication overhead between GC threads.
  • Parallel Garbage Collector
    • This works like the Serial collector, except that the GC process runs on multiple threads. It was the default up to and including Java 8.
    • In other words, several GC threads share the work of a collection instead of a single thread doing all of it.
  • Concurrent Mark Sweep (CMS) Garbage Collector
    • The application and GC threads run concurrently. It is meant for applications that prefer shorter garbage collection pauses and can afford to share processor resources with the garbage collector.
    • The young generation is collected in parallel with short pauses, much as in the Parallel collector, while the old generation is collected concurrently. Even there, some phases such as the initial mark and the remark are stop-the-world, which is why CMS is also described as a mostly concurrent collector.
    • Applications tend to run slower on average, but there are fewer stop-the-world pauses.
  • G1(Garbage First) Garbage Collector

This server-style collector is for multiprocessor machines with a large amount of memory. It meets garbage collection pause-time goals with high probability while achieving high throughput, and it has been the default GC since Java 9. G1 partitions the heap into a set of equally sized heap regions, each a contiguous range of virtual memory, as shown in the figure.

G1 Heap Layout

G1 is a generational, incremental, parallel, and mostly concurrent collector. It also monitors pause-time goals in each of the stop-the-world pauses.

Some operations are always performed in stop-the-world pauses to improve throughput. Global marking is performed in parallel and concurrently with the application, whereas G1 does space-reclamation incrementally in steps and in parallel. G1 achieves predictability by tracking information about previous application behavior and garbage collection pauses to build a model of the associated costs. It uses this information to size the work done in the pauses. For example, G1 reclaims space in the most efficient areas first (that is the areas that are mostly filled with garbage, therefore the name). This helps in reducing the time-consuming Full GC.

G1 reclaims space by using evacuation: live objects found within selected memory areas to collect are copied into new memory areas, compacting them in the process. After an evacuation has been completed, the space previously occupied by live objects is reused for allocation by the application. For further details check the official doc https://docs.oracle.com/en/java/javase/13/gctuning/garbage-first-garbage-collector.html

  • Z Garbage Collector
    • ZGC is one of the newest collectors, designed to be highly scalable with low latency. It was introduced as an experimental feature in JDK 11, and the production version is available from Java 15.

There are numerous types of GC. Java 9 made G1 the default collector and deprecated Concurrent Mark Sweep, which was later removed in JDK 14. The primary problem with GC is that developers do not control when a collection occurs, so it can cause unpredictable pauses or disruption in the working of the application.
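
Which collector a given JVM is actually using can be checked at run time. The sketch below relies on the standard GarbageCollectorMXBean API; CollectorsDemo and collectorNames are made-up names, and the output differs per JVM and per flag. On HotSpot a collector can be selected with flags such as -XX:+UseSerialGC, -XX:+UseParallelGC, -XX:+UseG1GC, or -XX:+UseZGC.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: reports the garbage collectors active in this JVM.
class CollectorsDemo {

    // Names of the collectors, for example "G1 Young Generation" under G1.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms"
                    + ", pools=" + Arrays.toString(gc.getMemoryPoolNames()));
        }
    }
}
```
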

Native Method Interface

The JNI is a native programming interface. It allows Java code that runs inside a Java Virtual Machine (VM) to interoperate with applications and libraries written in other programming languages, such as C, C++, and assembly.

  • By programming through the JNI, you can use native methods to:
    • Create, inspect, and update Java objects (including arrays and strings).
    • Call Java methods.
    • Catch and throw exceptions.
    • Load classes and obtain class information.

Virtual Machine Errors

A Java Virtual Machine implementation throws an object that is an instance of a subclass of the class VirtualMachineError when an internal error or resource limitation prevents it from implementing the instructions.

  • InternalError: An internal error has occurred in the Java Virtual Machine implementation because of a fault in the software implementing the virtual machine, a fault in the underlying host system software, or a fault in the hardware. This error is delivered asynchronously when it is detected and may occur at any point in a program.
  • OutOfMemoryError: The Java Virtual Machine implementation has run out of either virtual or physical memory, and the automatic storage manager was unable to reclaim enough memory to satisfy an object creation request.
  • StackOverflowError: The Java Virtual Machine implementation has run out of stack space for a thread, typically because the thread is doing an unbounded number of recursive invocations as a result of a fault in the executing program.
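
The relationship between these errors can be verified directly, since all three are subclasses of VirtualMachineError. A minimal sketch follows; VmErrorsDemo and isVmError are names made up for this illustration.

```java
// Hypothetical demo class: checks which throwable types are VirtualMachineErrors.
class VmErrorsDemo {

    // True if the given type is VirtualMachineError or one of its subclasses.
    static boolean isVmError(Class<?> type) {
        return VirtualMachineError.class.isAssignableFrom(type);
    }

    public static void main(String[] args) {
        Class<?>[] candidates = {
            InternalError.class, OutOfMemoryError.class,
            StackOverflowError.class, RuntimeException.class
        };
        for (Class<?> candidate : candidates) {
            System.out.println(candidate.getSimpleName() + " -> " + isVmError(candidate));
        }
    }
}
```

Because these are Errors rather than Exceptions, a catch (Exception e) block does not intercept them.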
