Chapter 7. Performance Optimizations

Table of Contents
Code Size vs. Runtime Performance
Optimizing RAM Memory Demand
Summary

In this chapter we give an example of how to achieve optimal runtime performance for a Java application while reducing its code size and RAM demand to a minimum. As a small example application, we use Pendragon Software's embedded CaffeineMark (tm) 3.0.

Code Size vs. Runtime Performance

Using Smart Linking

When an application is built without any additional arguments to the builder, all classes that are part of the application, or that are referenced directly or indirectly by any class of the application, are included in the target application. Due to the large set of library classes that are available, this results in a fairly large application:

	> jamaica CaffeineMarkEmbeddedApp -destination caffeine
	Jamaica Builder Tool 2.2 Release 1
	Generating code for target 'linux-gnu-i686'
	 + caffeine__.c
	 * C compiling
	 * linking
	 * stripping
	Class file compaction gain: 58.915745% (1734516 ==> 712613)

	> filesize caffeine
	985416

Smart linking can significantly reduce the size of an application, since only those library classes, methods and fields that may actually be used by the application are included. Smart linking cannot analyse the use of reflection or code referenced by classes that are loaded dynamically at a later point. Classes that may be accessed via reflection, as well as dynamically loaded classes, therefore have to be excluded from smart linking using the option -includeClasses.
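
To see why reflection defeats this kind of analysis, consider the following minimal Java sketch. The class GreetingPlugin is never referenced directly in the code; it is only named by a string passed to Class.forName, and in a real application that string would typically come from a configuration file or user input, which no build-time analysis can see. A class used like this is the kind of class that has to be listed with -includeClasses. The names Plugin, GreetingPlugin and PluginLoader are hypothetical, and the exact argument syntax of -includeClasses is not shown here.

```java
// Hypothetical example of a class that is reachable only via reflection.
class PluginLoader {
    // Loads and instantiates a class whose name is known only at runtime.
    static Plugin load(String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        return (Plugin) cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // In a real application the name might come from a configuration
        // file; here it can be passed on the command line.
        String name = args.length > 0 ? args[0] : "GreetingPlugin";
        System.out.println(load(name).run());
    }
}

interface Plugin {
    String run();
}

// Nothing in the code refers to this class directly; it is only named
// by a string, so a linker that follows references would drop it.
class GreetingPlugin implements Plugin {
    public String run() {
        return "hello from GreetingPlugin";
    }
}
```

If GreetingPlugin were removed by smart linking, the call to Class.forName would fail at runtime with a ClassNotFoundException rather than at build time.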

	> jamaica CaffeineMarkEmbeddedApp -smart -destination caffeine_smart
	Jamaica Builder Tool 2.2 Release 1
	Generating code for target 'linux-gnu-i686'
	 + CaffeineMarkEmbeddedApp__.c
	 * C compiling
	 * linking
	 * stripping
	Class file compaction gain: 81.37746% (1734516 ==> 323011)

	> filesize caffeine_smart
	558780

In this case, smart linking reduced the compacted class file data by more than half (from 712613 to 323011 bytes), making the application about 426kB smaller (558780 instead of 985416 bytes). The effect of smart linking on runtime performance is negligible, as the overall scores of the two versions show:

	> ./caffeine
	Sieve score = 399 (98)
	Loop score = 328 (2017)
	Logic score = 330 (0)
	String score = 1397 (708)
	Float score = 284 (185)
	Method score = 121 (166650)
	Overall score = 357

	> ./caffeine_smart
	Sieve score = 382 (98)
	Loop score = 269 (2017)
	Logic score = 368 (0)
	String score = 1630 (708)
	Float score = 292 (185)
	Method score = 128 (166650)
	Overall score = 363
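
The figures quoted for this example can be checked against the builder output. The compaction gain printed by the builder is consistent with the fraction of class file data removed, (before - after) / before, expressed in percent. The following small Java program recomputes the two gains, the reduction achieved by smart linking alone, the saving in executable size and the ratio of the overall scores; the class name SmartLinkFigures is purely illustrative.

```java
// Recomputes the smart linking figures from the builder output and the
// file sizes shown above. The compaction gain printed by the builder is
// consistent with (before - after) / before, in percent.
class SmartLinkFigures {
    static double gainPercent(long before, long after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        long classData = 1734516;   // class file data before compaction
        long defaultData = 712613;  // compacted, default build
        long smartData = 323011;    // compacted, -smart build
        long defaultSize = 985416;  // executable size, default build
        long smartSize = 558780;    // executable size, -smart build

        System.out.printf("gain, default build: %.6f%%%n", gainPercent(classData, defaultData));
        System.out.printf("gain, -smart build:  %.6f%%%n", gainPercent(classData, smartData));
        System.out.printf("class data removed by -smart: %.1f%%%n", gainPercent(defaultData, smartData));
        System.out.println("executable bytes saved: " + (defaultSize - smartSize));
        System.out.printf("overall score ratio: %.3f%n", 363.0 / 357.0);
    }
}
```

With these inputs the recomputed gains agree with the builder's 58.915745% and 81.37746% to within rounding, smart linking alone removes about 55% of the compacted class data, and the two executables differ by 426636 bytes.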

Using Compilation

Compilation can increase the runtime performance of Java applications significantly: compiled code is typically about 20 to 30 times faster than interpreted code. However, since Java bytecode is very compact compared to machine code for CISC or RISC processors, fully compiled applications require significantly more memory.

Warning

Fully compiling an application leads to very poor turn-around times and may require significant amounts of memory during the C compilation phase. It is therefore recommended to use compilation only in conjunction with profiling or smart linking, as described below.

To compile the complete application, the option -compile needs to be set:

	> jamaica -compile CaffeineMarkEmbeddedApp -destination caffeine_compile
	Jamaica Builder Tool 2.2 Release 1
	Generating code for target 'linux-gnu-i686'
	 + caffeine_compile__.c
	 * C compiling
	 * linking
	 * stripping
	Class file compaction gain: 73.941086% (1734516 ==> 451996)

	> filesize caffeine_compile
	13961896

Combining compilation and smart linking results in a more reasonable build time and application size:

	> jamaica -smart -compile CaffeineMarkEmbeddedApp -destination caffeine_smart_compile
	Jamaica Builder Tool 2.2 Release 1
	Generating code for target 'linux-gnu-i686'
	 + caffeine_smart_compile__.c
	 * C compiling
	 * linking
	 * stripping
	Class file compaction gain: 85.13701% (1734516 ==> 257801)

	> filesize caffeine_smart_compile
	2794268

Compared to the interpreted version, the performance of the compiled version has improved significantly: without smart linking, the overall score rises from 357 to 13507. The versions built with and without smart linking perform similarly on most benchmarks; the differences that do exist, most notably in the Method score, are related to the larger code size and to optimization decisions made by the compiler:

	> ./caffeine_compile
	Sieve score = 15155 (98)
	Loop score = 21694 (2017)
	Logic score = 24529 (0)
	String score = 18345 (708)
	Float score = 7817 (185)
	Method score = 5253 (166650)
	Overall score = 13507

	> ./caffeine_smart_compile
	Sieve score = 15115 (98)
	Loop score = 21632 (2017)
	Logic score = 24564 (0)
	String score = 19222 (708)
	Float score = 8058 (185)
	Method score = 18448 (166650)
	Overall score = 16857

Enabling C compiler optimizations for code size or execution speed can have a significant effect on the size and speed of the application. These optimizations are enabled via the command line options -optimize=size or -optimize=speed, respectively.

	> jamaica -smart -compile -optimize=size CaffeineMarkEmbeddedApp -destination caffeine_smart_compile_size
	Jamaica Builder Tool 2.2 Release 1
	Generating code for target 'linux-gnu-i686'
	 + caffeine_smart_compile_size__.c
	 * C compiling
	 * linking
	 * stripping
	Class file compaction gain: 85.13701% (1734516 ==> 257801)

	> filesize caffeine_smart_compile_size
	1947644

	> jamaica -smart -compile -optimize=speed CaffeineMarkEmbeddedApp -destination caffeine_smart_compile_speed
	Jamaica Builder Tool 2.1beta Release 6
	Generating code for target 'linux-gnu-i686'
	 + caffeine_smart_compile_speed__.c
	 * C compiling
	 * linking
	 * stripping
	Class file compaction gain: 85.13701% (1734516 ==> 257801)

	> filesize caffeine_smart_compile_speed
	1960188

The resulting performance strongly depends on the C compiler that is employed, and it may even show anomalies such as better runtime performance for the version optimized for code size:

	> ./caffeine_smart_compile_size
	Sieve score = 20108 (98)
	Loop score = 61539 (2017)
	Logic score = 71198 (0)
	String score = 21438 (708)
	Float score = 10907 (185)
	Method score = 30134 (166650)
	Overall score = 29207

	> ./caffeine_smart_compile_speed
	Sieve score = 19888 (98)
	Loop score = 45763 (2017)
	Logic score = 69151 (0)
	String score = 21141 (708)
	Float score = 11713 (185)
	Method score = 26464 (166650)
	Overall score = 27283

Using Profiling

Generating a profile to guide compilation is a powerful tool for achieving faster turn-around times and smaller applications. The profile collects information on the runtime behaviour of an application and guides the compiler both in its optimization process and in selecting which methods to compile and which to leave in compact bytecode format.

Apart from performance data, the profile also records the use of reflection. An application that could not otherwise use smart linking because it relies on reflection can therefore profit from smart linking without manually listing all classes that are referenced via reflection.

To generate the profile, we first need to create a profiling version of the application using the builder option -profile.

The profiling run can be performed on a system other than the target system. The information collected in several profiling runs can be combined by simple file concatenation (e.g., by appending output with the shell redirection ">>"). Also, the profiling application does not need to be identical to the target application: profiling data can be generated by a different application that was built only for this purpose, e.g., one that uses a different main class, or one that terminates after a number of iterations while the final application runs perpetually.
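
The last point can be illustrated with a minimal Java sketch. The classes ProfilingMain, ProductionMain and Workload are hypothetical stand-ins, not part of CaffeineMark: both main classes drive the same workload, but the one intended for a dedicated profiling build runs a fixed number of iterations and then terminates, whereas the production entry point loops forever. How each main class is handed to the builder is not shown here; presumably it is named on the command line, as CaffeineMarkEmbeddedApp is above.

```java
// Hypothetical sketch: one workload, two entry points. ProfilingMain is
// meant for a dedicated profiling build and stops after a fixed number of
// iterations; ProductionMain is the entry point of the deployed
// application and runs perpetually.
class ProfilingMain {
    public static void main(String[] args) {
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        long result = Workload.runIterations(iterations);
        System.out.println("completed " + iterations + " iterations, result " + result);
    }
}

class ProductionMain {
    public static void main(String[] args) {
        long state = 1;
        // The deployed application never terminates.
        while (true) {
            state = Workload.step(state);
        }
    }
}

class Workload {
    // One unit of work: a stand-in for the application's real processing
    // (here a single step of a 64-bit linear congruential generator).
    static long step(long state) {
        return state * 6364136223846793005L + 1442695040888963407L;
    }

    // Runs a bounded number of steps, starting from a fixed seed.
    static long runIterations(int n) {
        long state = 1;
        for (int i = 0; i < n; i++) {
            state = step(state);
        }
        return state;
    }
}
```

Because both entry points execute the same Workload code, a profile recorded with the bounded variant should reflect the methods that matter for the perpetual one.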

For our example application, we generate a profile using jamaica as follows:

	> jamaica -smart -profile >prof CaffeineMarkEmbeddedApp

Alternatively, we could use jamaicavmp to run the application and generate profiling data:

	> jamaicavmp >prof CaffeineMarkEmbeddedApp

Now, an application can be compiled using this profiling data:

	> jamaica -smart -useProfile prof -optimize=speed CaffeineMarkEmbeddedApp -destination caffeine_useProfile10_speed
	Jamaica Builder Tool 2.2 Release 1
	Generating code for target 'linux-gnu-i686'
	 + caffeine_useProfile10_speed__.c
	 * C compiling
	 * linking
	 * stripping
	Class file compaction gain: 81.4599% (1734516 ==> 321581)

	> filesize caffeine_useProfile10_speed
	603548

The resulting application is only slightly larger than the interpreted, smart-linked version (603548 instead of 558780 bytes), but its runtime performance is close to that of the fully compiled, smart-linked versions built with -optimize=size and -optimize=speed:

	> ./caffeine_useProfile10_speed
	Sieve score = 20393 (98)
	Loop score = 46501 (2017)
	Logic score = 69074 (0)
	String score = 21915 (708)
	Float score = 11201 (185)
	Method score = 29851 (166650)
	Overall score = 27981
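
To put the profile-guided result in context, the sizes and overall scores reported in this chapter can be tabulated relative to the interpreted, smart-linked build. The numbers below are copied from the outputs above; the class BuildComparison is only an illustrative helper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Executable sizes (bytes, as reported by filesize) and overall
// CaffeineMark scores of the builds shown in this chapter, printed
// relative to the interpreted, smart-linked build.
class BuildComparison {
    static final String BASELINE = "caffeine_smart";

    static Map<String, long[]> results() {
        // build name -> { executable size in bytes, overall score }
        Map<String, long[]> m = new LinkedHashMap<>();
        m.put("caffeine", new long[] {985416, 357});
        m.put("caffeine_smart", new long[] {558780, 363});
        m.put("caffeine_compile", new long[] {13961896, 13507});
        m.put("caffeine_smart_compile", new long[] {2794268, 16857});
        m.put("caffeine_smart_compile_size", new long[] {1947644, 29207});
        m.put("caffeine_smart_compile_speed", new long[] {1960188, 27283});
        m.put("caffeine_useProfile10_speed", new long[] {603548, 27981});
        return m;
    }

    // Ratio of a build's size (column 0) or score (column 1) to the baseline.
    static double relative(Map<String, long[]> m, String build, int column) {
        return (double) m.get(build)[column] / m.get(BASELINE)[column];
    }

    public static void main(String[] args) {
        Map<String, long[]> m = results();
        System.out.printf("%-30s %10s %10s%n", "build", "size", "score");
        for (String build : m.keySet()) {
            System.out.printf("%-30s %9.2fx %9.1fx%n",
                build, relative(m, build, 0), relative(m, build, 1));
        }
    }
}
```

For caffeine_useProfile10_speed this gives about 1.08 times the size and about 77 times the overall score of caffeine_smart, whereas caffeine_smart_compile_speed needs about 3.5 times the size for a comparable score.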

When a profile is used to guide the compiler, by default 10% of the methods that were executed during the profiling run are compiled. This value results in a moderate code size increase compared to fully interpreted code and typically yields runtime performance very close to that of fully compiled code. The builder option -percentageCompiled can be used to change this default to any value between 0% and 100% for fine adjustment. Note that a value of 100% is not equivalent to setting the option -compile, since the percentage refers only to those methods that were executed during the profiling run: methods that were not executed at all during the profiling run are not compiled when -useProfile is used.
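
How the builder actually ranks and picks methods is not described here. As a rough mental model only, the following Java sketch keeps the methods that appear with a non-zero count in a profile, ranks them by that count and selects the given percentage of them. The class CompileSelection, the ranking by raw count and the rounding up are assumptions made for illustration, not Jamaica's implementation; the sketch does, however, show why 100% still excludes methods that never ran.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of percentage-based method selection from a
// profile. Not Jamaica's actual algorithm.
class CompileSelection {
    static List<String> select(Map<String, Long> counts, double percentage) {
        List<Map.Entry<String, Long>> executed = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (e.getValue() > 0) { // never-executed methods are not candidates
                executed.add(e);
            }
        }
        // Hottest methods first (assumption: ranking by raw profile count).
        executed.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()));
        // Assumption: round up, so any positive percentage selects something.
        int n = (int) Math.ceil(executed.size() * percentage / 100.0);
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            chosen.add(executed.get(i).getKey());
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (int i = 0; i < 10; i++) {
            counts.put("Bench.m" + i + "()V", (i + 1) * 1000L);
        }
        counts.put("Bench.unused()V", 0L);
        System.out.println("10%:  " + select(counts, 10));
        System.out.println("100%: " + select(counts, 100));
    }
}
```

With ten executed methods, 10% selects only the hottest one, and 100% selects all ten while still leaving out Bench.unused()V.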

Entries in the profile can be edited manually, for example to enforce compilation of a method that is performance critical. The profile generated for this application, for instance, contains the following entry for the method size() of class java.util.Vector:

	PROFILE: 64 (0%)        java/util/Vector.size()I

To enforce compilation of this method even when -percentageCompiled is not set to 100%, the count in this entry can be raised to a higher value, e.g.:

	PROFILE: 1000000 (0%)        java/util/Vector.size()I
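
Such an edit can be made in any text editor. For larger profiles, a small tool may be more convenient; the following minimal Java sketch raises the count of one method's entry. The class ProfileEditor is hypothetical, and it assumes only the entry layout shown above (the keyword PROFILE:, a count, a parenthesized percentage and a method descriptor). Any other lines the profile may contain are copied unchanged, and the percentage field is left as it is, just like in the manual edit above. In file mode it overwrites the profile in place, so keep a copy.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that raises the count of one PROFILE entry.
// Assumed entry layout: PROFILE: <count> (<percentage>)   <method>
class ProfileEditor {
    private static final Pattern ENTRY =
        Pattern.compile("PROFILE:(\\s+)(\\d+)(\\s+\\([^)]*\\)\\s+)(\\S+)(\\s*)");

    // Returns the line with the count raised to at least minCount if it is
    // the entry for the given method; any other line is returned unchanged.
    static String raiseCount(String line, String method, long minCount) {
        Matcher m = ENTRY.matcher(line);
        if (!m.matches() || !m.group(4).equals(method)) {
            return line;
        }
        long count = Math.max(Long.parseLong(m.group(2)), minCount);
        return "PROFILE:" + m.group(1) + count + m.group(3) + m.group(4) + m.group(5);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 3) {
            // Usage: ProfileEditor <profile file> <method> <count>
            Path file = Paths.get(args[0]);
            List<String> out = new ArrayList<>();
            for (String line : Files.readAllLines(file)) {
                out.add(raiseCount(line, args[1], Long.parseLong(args[2])));
            }
            Files.write(file, out);
        } else {
            String entry = "PROFILE: 64 (0%)        java/util/Vector.size()I";
            System.out.println(raiseCount(entry, "java/util/Vector.size()I", 1000000));
        }
    }
}
```

Run without arguments, the sketch rewrites the java.util.Vector.size() entry from the example above to a count of 1000000.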