Table of Contents
- Introduction
- Language Basics
- Language Statements
- Class statement
- Version statement
- Name statement
- Superclass statement
- Interfaces statement
- Signature statement
- Deprecated statement
- Synthetic statement
- Source file statement
- Constant statements
- Utf8 constant statement
- Class reference statement
- Name and type statement
- Field reference statement
- Method reference statement
- Interface method reference statement
- String statement
- Integer value statement
- Long integer value statement
- Floating point value statement
- Double-precision floating point value statement
- Method type statement
- Method handle statement
- Dynamic method reference statement
- Field statement
- Descriptor statement
- Constant value statement
- Method statement
- Exceptions statement
- Variable statement
- Instruction statements
- Exception Handler statement
- Line numbers statement
- Debug variables statement
- Max stack statement
- Max locals statement
- Stackmap statement
- Inner class statement
- Enclosing method statement
- Bootstrap method statement
- Annotation statement
- Annotation members
- Annotation type statement
- Annotation parameter index statement
- Annotation element statement
- Annotation target statement
- Return target
- Receiver type target
- Field type target
- Parameter type target
- Parameter type bound target
- Type parameter target
- Supertype target
- Exception type target
- New type target
- Instanceof type target
- Method reference type target
- Constructor reference type target
- Cast type target
- Constructor type argument target
- Method type argument target
- Constructor reference type argument target
- Method reference type argument target
- Catch type target
- Variable type target
- Annotation target path statement
- Annotation default statement
- Unknown attribute
- Macros
Introduction
A Java assembler (or assembly) language is a low-level programming language for the Java Virtual Machine (JVM) in which there is a strong correspondence between the language constructs and the JVM bytecode. There is no official standard for the syntax of a Java assembler language. The JDK does contain a disassembler of sorts (but no assembler), the javap utility. Unfortunately, the syntax of javap's output was designed with a focus on human readability rather than as possible input to an assembler. Accordingly, no one appears to have tried to develop an assembler for this syntax. From 1996 onwards Jasmin gradually established itself as a de facto standard for a Java assembler language. Its syntax, however, is poorly documented and at times seems to lag considerably behind the development of the JVM itself.
This reference describes the syntax of the Java assembler language as defined by Lilac - an open source assembler/disassembler tool suite for the JVM.
Please note: this is a reference for the Java assembler language, not for the Java language itself or the Java Virtual Machine. Throughout this document, a thorough knowledge of Java language concepts and a basic understanding of the JVM's inner workings are assumed.
Notation
Throughout this specification the syntax definitions are given in Extended Backus-Naur Form (EBNF) notation.
Language Basics
Lexical Structure
On the lexical level a Java assembler program is a sequence of words and separators. The language has the following separators: , ; : { } -> . As in other programming languages, the words of a Java assembler program fall into three broad categories: keywords, literals and identifiers.
Literals
There are four different literal types: integer literals, floating point literals, string literals and base64 literals. The syntax of the first three is the same as in the Java language. A base64 literal is a base64-encoded byte sequence enclosed in square brackets. Additionally, as in the Java language, there is a special null literal, which stands for a reference pointing to nowhere.
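To make the base64 literal form concrete, here is a minimal Python sketch that strips the square brackets and decodes the content. It is only an illustration of the literal format described above; the function name `decode_base64_literal` is hypothetical and not part of Lilac, and the real assembler may treat whitespace or padding differently.

```python
import base64
import binascii


def decode_base64_literal(literal: str) -> bytes:
    """Decode a bracketed base64 literal such as "[SGk=]" into raw bytes.

    Hypothetical helper for illustration; not Lilac's actual implementation.
    """
    text = literal.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("a base64 literal must be enclosed in square brackets")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(text[1:-1], validate=True)
    except binascii.Error as exc:
        raise ValueError(f"invalid base64 content: {exc}") from exc


if __name__ == "__main__":
    print(decode_base64_literal("[SGk=]"))
```

The decoded bytes are what ends up in the class file; the literal itself is only a textual encoding of them.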
Here are some examples of Java assembler literals:
    "Hello Word"
    1234
    1235.56
    [UG9seWZvbiB6d2l0c2NoZXJuZCBhw59lbiBNw6R4Y2hlbnMgVsO2Z2VsIFLDvGJlbiwgSm9naHVydCB1bmQgUXVhcms=]
    null
Identifiers
The syntax of a Java assembler identifier is defined as follows:
    identifier = quoted_identifier|unquoted_identifier;
    quoted_identifier = "'", javaidentifier, "'";
    unquoted_identifier = javaidentifier, {'.', javaidentifier} ;
A javaidentifier from the definition above is a Java language identifier as defined in the Java Language specification.
Here are some examples of a valid Java assembler identifier:
    this
    'name'
    out
    in_12
    this$0
    System.out
Note: to avoid conflicts with assembler keywords (which aren't the same as in Java), it's possible to use single quotes around an identifier.
Binary Identifiers
Besides Java identifiers, so-called binary identifiers may be used in the special case of directly referencing a class or interface. The syntax of a binary identifier is as defined in the JVM specification, with one slight difference: so that the assembler can distinguish Java identifiers from binary identifiers, a binary identifier must always contain a slash and may therefore start with one.
Here is the corresponding EBNF expression:
    binary identifier = ['/'], javaidentifier, {'/', javaidentifier} ;
And here are some examples of a valid Java assembler binary identifier:
    java/lang/Thread
    /java/lang/Thread
    /Thread
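The rule above, together with the requirement that a binary identifier always contains a slash, can be sketched as a small Python check. This is an illustration under assumptions, not Lilac's parser: the pattern `_JAVA_ID` is an ASCII-only approximation of a Java identifier, and the function name `is_binary_identifier` is hypothetical.

```python
import re

# ASCII-only approximation of a Java identifier (assumption: the real rule
# also admits Unicode letters and excludes Java keywords).
_JAVA_ID = r"[A-Za-z_$][A-Za-z0-9_$]*"

# Mirrors the EBNF rule: ['/'], javaidentifier, {'/', javaidentifier}
_BINARY_ID = re.compile(rf"/?{_JAVA_ID}(?:/{_JAVA_ID})*")


def is_binary_identifier(text: str) -> bool:
    """Return True if text matches the rule and contains at least one slash."""
    return "/" in text and _BINARY_ID.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("java/lang/Thread", "/Thread", "Thread", "System.out"):
        print(candidate, is_binary_identifier(candidate))
```

The explicit slash test is what separates `/Thread` (a binary identifier) from `Thread` (a plain Java identifier), which the EBNF rule alone would not do.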
Comments
In addition to the words and separators a Java assembler program might also contain comments, whose syntax is, again, the same as in the Java Language.
Here are some examples of a comment:
    //This is a single line comment
    /** And this is
    A multiple-line one **/
Syntactic Structure
Names and labels
On the syntactic level Java assembler has two different kinds of identifiers: names and labels. Names identify an entity declared in a program; in the context of Java assembler those entities can be variables and constants.
The following example shows two constants, a string constant referencing a utf8 constant by its name:
    const utf8 helloword_content "Hello World";
    const string helloword helloword_content;
Additionally, an assembler program may contain names of external classes or interfaces, which have to be formatted as binary identifiers. The following example shows a macro class reference statement referencing a class.
const classref java/lang/Object;
Labels, on the other hand, identify a location inside the program to which, for example, the control flow of the program might be transferred. This is illustrated in the following example, where the if_acmpne instruction transfers control to the return instruction:
    if_acmpne end;
    iconst_1;
    putfield MyClass.int_field;
    end: return;
Statements
On the syntactic level a Java assembler program is a sequence of statements, which come in three flavors: simple statements, block statements and macro instructions.
A block statement generally consists of some keywords followed by a sequence of member statements enclosed in curly brackets as illustrated in the following example:
    private final field {
        name value_name;
        descriptor type_desc;
    }
In the example we see a field statement which in turn contains two further statements: a name statement and a descriptor statement.
A simple statement is just a sequence of keywords, identifiers and literals possibly separated by some separators and, this is important, always terminated by ;
Note: the definition given above is the most general definition of a simple statement. For particular statements there are always additional syntactic rules governing the occurrence and order of keywords, identifiers, literals and separators.
Here are some examples of a simple statement:
    const nameandtype hash_nat hash_name, hash_desc;
    line ir0, 532;
    lookupswitch 1->ir36,3->ir41,100->ir46,default->ir53;
    append else, {int};
    const classref java/lang/Thread;
Because block statements themselves contain other statements, the whole syntactic structure of a Java assembler program can in effect be seen as a forest of statements, with block statements being the parent nodes of their member statements. Because on the semantic level there is the additional requirement that a Java assembler source file contains exactly one class statement, this forest is in fact just a tree with the class statement at the root.
The syntax of macro instructions will be introduced below in an extra chapter on the topic.
Language Statements
Having described above the general lexical and syntactic structure of a Java assembler program, the following chapters will cover the syntax and semantics of the particular statements used in Java assembler.
Class statement
A class statement specifies a class type. As already mentioned above, there must be exactly one class statement in a Java assembler source file. Note, however, that in Java assembler the term "class" encompasses not only class types as defined by the Java language specification (classes in the narrow sense and enums) but also interface types (interfaces and annotation types).
A class statement, which is a block statement, consists of some class modifier keywords followed by the keyword class and then by some class member statements enclosed in curly brackets as defined in the following EBNF expression:
    class statement = {class modifier}, 'class', '{',{class member},'}';
    class modifier = 'public'|'final'|'abstract'|'super'|'interface'|'synthetic'|'annotation'|'enum';
    class member = version|name|superclass|interfaces|source file|signature|synthetic|deprecated|constant|method|field|annotation|type annotation|bootstrap method|inner class|enclosing method|unknown attribute;
Class modifiers
Class modifiers in the Java assembler correspond one to one to the access flags defined in the JVM specification as shown in the following table:
Assembler modifier keyword | Access flag from the JVM spec |
---|---|
public | ACC_PUBLIC |
final | ACC_FINAL |
abstract | ACC_ABSTRACT |
super | ACC_SUPER |
interface | ACC_INTERFACE |
synthetic | ACC_SYNTHETIC |
annotation | ACC_ANNOTATION |
enum | ACC_ENUM |
Read more about the meaning of the modifiers, as well as about the rules governing the allowed combinations of modifiers, in the JVM specification.
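Since each modifier keyword corresponds to one access flag, the access_flags value of the generated class file is the bitwise OR of the flags of all modifiers present. The following Python sketch illustrates this; the numeric values are the class access flag values from the JVM specification, while the function name `class_access_flags` is hypothetical and the sketch does not check which combinations are legal.

```python
# Class access flag values as listed in the JVM specification.
CLASS_ACCESS_FLAGS = {
    "public": 0x0001,      # ACC_PUBLIC
    "final": 0x0010,       # ACC_FINAL
    "super": 0x0020,       # ACC_SUPER
    "interface": 0x0200,   # ACC_INTERFACE
    "abstract": 0x0400,    # ACC_ABSTRACT
    "synthetic": 0x1000,   # ACC_SYNTHETIC
    "annotation": 0x2000,  # ACC_ANNOTATION
    "enum": 0x4000,        # ACC_ENUM
}


def class_access_flags(modifiers):
    """Combine class modifier keywords into a single access_flags value."""
    flags = 0
    for modifier in modifiers:
        if modifier not in CLASS_ACCESS_FLAGS:
            raise ValueError(f"unknown class modifier: {modifier}")
        flags |= CLASS_ACCESS_FLAGS[modifier]
    return flags


if __name__ == "__main__":
    # Modifiers of a declaration such as "public interface abstract class".
    print(hex(class_access_flags(["public", "interface", "abstract"])))
```

The same scheme would apply to field and method modifiers, each with its own flag table.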
Class members
As mentioned above a class statement contains member statements. The following table lists all statements which can serve as members of a class statement. The second column of the table defines for each statement how many instances of it are allowed or required to exist within a class statement.
class member | how many |
---|---|
version | exactly one |
name | exactly one |
superclass | exactly one (except for the java.lang.Object class) |
interfaces | zero or one |
synthetic | zero or one |
deprecated | zero or one |
source file | zero or one |
signature | zero or one |
constant | zero or more |
method | zero or more |
field | zero or more |
annotation | zero or more |
bootstrap method | zero or more |
inner class | zero or more |
enclosing method | zero or more |
unknown attribute | zero or more |
Class statement example
Here is an example of a class declaration including some members:
    public interface abstract class {
        version 52.0;
        name ThisClass;
        extends Object;
        const utf8 run_desc "()V";
        const utf8 RuntimeVisibleAnnotations_utf8 "RuntimeVisibleAnnotations";
        const utf8 SourceFile_utf8 "SourceFile";
        const utf8 Object_name "java/lang/Object";
        const utf8 ThisClass_name "java/lang/Runnable";
        const utf8 run_name "run";
        const classref Object Object_name;
        const classref ThisClass ThisClass_name;
        const utf8 type_desc "Ljava/lang/FunctionalInterface;";
        const utf8 source_file_name "Runnable.java";
        source file source_file_name;
        annotation {
            type type_desc;
        }
        public abstract method {
            name run_name;
            descriptor run_desc;
        }
    }
Version statement
A version statement specifies the version of the class file format to use in the class file generated by the assembler. It is a simple statement which has as its single argument a floating point literal specifying the actual version number.
version statement = 'version', floating point literal, ';' ;
Example:
version 52.0;
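A class file stores its version as two integers, a major and a minor version. The following Python sketch shows one plausible way to read the literal, assuming that the digits before the decimal point give the major version and the digits after it the minor version; this mapping is an assumption, not something this reference states. The helper names are hypothetical. The relation between major version and Java release (major version minus 44, from Java 5 on) follows the class file version numbering used by the JDK.

```python
def split_class_file_version(literal: str):
    """Split a version literal such as "52.0" into (major, minor).

    Assumption: the part before the point is the major version and the
    part after it is the minor version. Working on the text avoids any
    floating point rounding.
    """
    major_text, _, minor_text = literal.partition(".")
    return int(major_text), int(minor_text or "0")


def java_release(major: int) -> int:
    """Map a class file major version (49 or later) to its Java release."""
    if major < 49:
        raise ValueError("only major versions from 49 (Java 5) on are handled")
    return major - 44


if __name__ == "__main__":
    major, minor = split_class_file_version("52.0")
    print(major, minor, java_release(major))
```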
Name statement
A name statement specifies the name of a class, method, field, inner class or annotation element. It is a simple statement which has as its single argument the name of a utf8 constant or class reference constant, which in turn specifies the actual name, as defined in the following EBNF expression:
name statement = 'name', (utf8 constant|class reference constant), ';' ;
Note: for the name of a class the argument must be a class reference constant.
Example:
name ThisClass;
Superclass statement
A superclass statement specifies the direct superclass of the current class. It is a simple statement which has as its single argument the name of a class reference constant, which in turn specifies the actual superclass, as defined in the following EBNF expression:
superclass statement = 'extends', class reference constant, ';' ;
Example:
extends Object;
There must always exist a superclass statement within a class statement except for one special case: a class with the name java.lang.Object which is defined as the root of the Java class hierarchy.
Interfaces statement
An interfaces statement specifies the superinterfaces of the current class. It is a simple statement with multiple arguments, each of which is the name of a class reference constant. The class reference constants in turn specify the actual superinterfaces of the current class. Every constant name may be preceded by a label. The following EBNF expression defines the syntax of an interfaces statement:
interfaces statement = 'implements', [label, ':'], class reference constant, {',',[label, ':'], class reference constant} , ';' ;
Example:
implements runnable: Runnable,Serializable;
Signature statement
A signature statement specifies a signature for a class, method or field. It is a simple statement whose single argument is the name of a utf8 constant, which in turn specifies the actual signature, as defined in the following EBNF expression:
signature statement = 'signature', utf8 constant, ';' ;
Example:
signature ThisClassSignature;
Note: like the JVM itself, the Java assembler doesn't enforce the correct syntax of a signature. Read more about this syntax in the JVM specification.
Deprecated statement
This statement marks a class, method or field as deprecated. It consists of a single keyword as defined in the following EBNF expression:
deprecated statement = 'deprecated', ';' ;
Synthetic statement
This statement marks a class, method or field as synthetic, that is, as having been generated by the compiler itself without a corresponding declaration in the underlying source file. It consists of a single keyword as defined in the following EBNF expression:
synthetic statement = 'synthetic', ';' ;
Source file statement
The source file statement specifies the source file from which the current class has been generated by a compiler. This information is mainly used for debugging purposes. It is a simple statement whose single argument is the name of a utf8 constant, which in turn specifies the actual source file name, as defined in the following EBNF expression:
source file statement = 'source file', utf8 constant, ';' ;
Example:
source file SourceFileName;
Constant statements
A constant statement declares a constant which may be used by other statements in the program. Java assembler uses constants not only as symbolic names for literals, as in high-level programming languages, but also to hold dynamic linking information such as class, method or field references.
Utf8 constant statement
A utf8 constant statement declares a utf8 string constant. It has as its single argument a string literal, which specifies the value of the constant. The syntax of the statement is as follows:
utf8 constant statement = 'const', 'utf8', name, string literal, ';' ;
Example:
const utf8 helloworld "Hello World";
Class reference statement
A class reference statement declares a class reference constant. It has as a single argument the name of an utf8 constant which in turn specifies the actual class name. The utf8 constant must contain either a valid binary class name or a valid array type descriptor. The syntax of the statement is as follows:
class reference constant statement = 'const', 'classref',name, utf8 constant, ';' ;
Example:
const classref Object object_utf8;
Name and type statement
A name and type statement declares a name and type constant, that is, a constant combining a name and a type descriptor. The statement has the names of two utf8 constants as arguments. The first utf8 constant contains a valid field or method name, the second a valid field or method descriptor as defined in the JVM specification. The syntax of the statement is as follows:
name and type constant statement = 'const', 'nameandtype',name, utf8 constant, utf8 constant, ';' ;
Example:
const nameandtype hash_nat hash_name, hash_desc;
Field reference statement
A field reference statement declares a field reference constant. It has two arguments. The first argument specifies the name of a class reference constant, which in turn specifies the actual class or interface owning the field. The second argument specifies the name of a name and type constant, which in turn specifies the name and the type of the field. The syntax of the statement is as follows:
field reference statement = 'const', 'fieldref', name, class reference constant, name and type constant, ';' ;
Example:
const fieldref hash ThisClass, hash_nat;
Method reference statement
A method reference statement declares a class method reference constant. Note that this statement can be used only to define references to class methods, while interface method references are defined using an interface method reference statement. The statement has two arguments: the first argument specifies the name of a class reference constant, which in turn specifies the actual class owning the method; the second argument specifies the name of a name and type constant, which in turn specifies the name and the type of the method. The syntax of the statement is as follows:
method reference statement = 'const', 'methodref',name, class reference constant, name and type constant, ';' ;
Example:
const methodref Character.isBmpCodePoint Character, Character.isBmpCodePoint_nat;
Interface method reference statement
An interface method reference statement declares an interface method reference constant. Note that this statement can be used only to define references to interface methods, while class method references are defined using a method reference statement. The statement has two arguments: the first argument specifies the name of a class reference constant, which in turn specifies the actual interface in which the method resides; the second argument specifies the name of a name and type constant, which in turn specifies the name and the type of the method. The syntax of the statement is as follows:
interface method reference statement = 'const', 'intfmethodref', name, class reference constant, name and type constant, ';' ;
Example:
const intfmethodref CharSequence.length CharSequence, AbstractStringBuilder.length_nat;
String statement
A string statement declares a string constant, which can be used by instructions. The statement has as a single argument the name of an utf8 constant, which specifies the actual string content. The syntax of the statement is as follows:
string statement = 'const', 'string', name, utf8 constant, ';' ;
Example:
const string charsetName utf8_319;
Integer value statement
An integer value statement declares an integer constant. It has as a single argument an integer literal, which specifies the value of the constant. The syntax of the statement is as follows:
integer statement = 'const', 'int', name, integer literal, ';' ;
Example:
const int int_15 124567;
Long integer value statement
A long integer value statement declares a long integer constant. It has as a single argument an integer literal, which specifies the value of the constant. The syntax of the statement is as follows:
long integer statement = 'const', 'long', name, integer literal, ';' ;
Example:
const long long_15 124567;
Floating point value statement
A floating point value statement declares a floating point constant. It has as a single argument a floating point literal, which specifies the value of the constant. The syntax of the statement is as follows:
floating point value statement = 'const', 'float', name, floating point literal, ';' ;
Example:
const float float_15 123.5;
Double-precision floating point value statement
A double-precision floating point value statement declares a double-precision floating point constant. It has as its single argument a floating point literal, which specifies the value of the constant. The syntax of the statement is as follows:
double-precision floating point value statement = 'const', 'double', name, floating point literal, ';' ;
Example:
const double double_15 123.5;
Method type statement
A method type statement declares a method type constant. It has as a single argument the name of an utf8 constant, which in turn specifies a valid method descriptor as defined in the JVM specification. The syntax of the statement is as follows:
method type statement = 'const', 'methodtype', name, utf8 constant, ';' ;
Example:
const methodtype mType mType_utf8;
Method handle statement
A method handle statement declares a method handle constant. The value of such constant is a typed, directly executable reference to an underlying method, constructor, field, or similar low-level operation, with optional transformations of arguments or return values. The syntax of the statement is as follows:
method handle statement = 'const', ('getfield'|'getstatic'|'putfield'|'putstatic'|'invokespecial'|'invokevirtual'|'invokeinterface'|'invokestatic'|'newinvokespecial'), 'methodhandle', name, constant, ';' ;
As can be seen from the definition above, a method handle defined by a method handle statement belongs to one of 9 different "kinds" (described in the JVM specification), each specified by one of the 9 corresponding modifier keywords. The statement has as its single argument the name of a constant which, depending on the "kind", can be a field reference, a method reference or an interface method reference constant.
Example:
const getstatic methodhandle System.out;
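The dependency between the "kind" of a method handle and the type of constant it may reference can be summarized in a small lookup table. The following Python sketch is an illustration only: the reference kind numbers and the permitted constant types follow the JVM specification, the constant type names reuse the keywords of this reference (fieldref, methodref, intfmethodref), and the function name `check_method_handle` is hypothetical. Whether Lilac performs such a check is not stated here.

```python
# kind keyword -> (reference_kind number, permitted constant types).
# Note: per the JVM specification, the interface method reference option
# for invokestatic and invokespecial applies to class file version 52.0
# and later only.
METHOD_HANDLE_KINDS = {
    "getfield": (1, {"fieldref"}),
    "getstatic": (2, {"fieldref"}),
    "putfield": (3, {"fieldref"}),
    "putstatic": (4, {"fieldref"}),
    "invokevirtual": (5, {"methodref"}),
    "invokestatic": (6, {"methodref", "intfmethodref"}),
    "invokespecial": (7, {"methodref", "intfmethodref"}),
    "newinvokespecial": (8, {"methodref"}),
    "invokeinterface": (9, {"intfmethodref"}),
}


def check_method_handle(kind: str, constant_type: str) -> int:
    """Return the reference_kind number, or raise if the pairing is invalid."""
    if kind not in METHOD_HANDLE_KINDS:
        raise ValueError(f"unknown method handle kind: {kind}")
    number, allowed = METHOD_HANDLE_KINDS[kind]
    if constant_type not in allowed:
        raise ValueError(f"{kind} cannot reference a {constant_type} constant")
    return number


if __name__ == "__main__":
    print(check_method_handle("getstatic", "fieldref"))
```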
Dynamic method reference statement
A dynamic method reference statement declares a dynamic method reference constant which can be used as an argument of the invokedynamic instruction. The statement has two arguments - the first argument specifies the name of a bootstrap method, the second argument specifies the name of a name and type constant. Read more about the exact meaning of these two arguments in the JVM specification. The syntax of the statement is as follows:
dynamic method reference statement = 'const', 'dynref', name, bootstrapmethod,',',name and type constant,';' ;
Example:
const dynref inv1 bootstrapmethod1, name_and_type1;
Field statement
A field statement declares a field within a class type. It is a block statement, which consists of some field modifier keywords followed by the keyword field and then by some field member statements enclosed in curly brackets, as defined in the following EBNF expression:
    field statement = {field modifier}, 'field', '{',{field member},'}' ;
    field modifier = 'public'|'private'|'protected'|'static'|'final'|'volatile'|'transient'|'synthetic'|'enum' ;
    field member = name|descriptor|constant value|signature|synthetic|deprecated|annotation|type annotation|unknown attribute ;
Field modifiers
Field modifiers in the Java assembler correspond one to one to the access flags defined in the JVM specification as shown in the following table:
Assembler modifier keyword | Access flag from the JVM spec |
---|---|
public | ACC_PUBLIC |
private | ACC_PRIVATE |
protected | ACC_PROTECTED |
static | ACC_STATIC |
final | ACC_FINAL |
volatile | ACC_VOLATILE |
transient | ACC_TRANSIENT |
synthetic | ACC_SYNTHETIC |
enum | ACC_ENUM |
Read more about the meaning of the modifiers, as well as about the rules governing the allowed combinations of modifiers, in the JVM specification.
Field members
The following table lists all statements which can serve as members of a field statement. The second column of the table defines for each statement how many instances of it are allowed or required to exist within a field statement.
field member | how many |
---|---|
name | exactly one |
descriptor | exactly one |
constant value | zero or one |
synthetic | zero or one |
deprecated | zero or one |
signature | zero or one |
annotation | zero or more |
unknown attribute | zero or more |
Field statement example
    private static final field {
        name serialVersionUID_name; // serialVersionUID
        descriptor serialVersionUID_desc; // J
        constant value long_158; // -6849794470754667710
    }
Descriptor statement
A descriptor statement specifies the descriptor of a method or a field. It is a simple statement which has as its single argument the name of a utf8 constant, which in turn specifies the actual descriptor string, as defined in the following EBNF expression:
descriptor statement = 'descriptor', utf8 constant, ';' ;
Dependent on the statement's context the descriptor string must contain either a valid field descriptor or a valid method descriptor.
Example:
descriptor method_descriptor;
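To illustrate the difference between the two descriptor forms, here is a Python sketch that classifies a descriptor string as a field or a method descriptor. It follows the descriptor grammar of the JVM specification in simplified form (for example, it does not enforce the limit on array dimensions), and the function name `descriptor_kind` is hypothetical.

```python
import re

# A field type: optional array dimensions, then a base type letter or an
# object type of the form L<binary class name>;
_FIELD = r"\[*(?:[BCDFIJSZ]|L[^.;\[]+;)"

FIELD_DESCRIPTOR = re.compile(_FIELD)
# A method descriptor: parameter types in parentheses, then a return type
# (a field type or V for void).
METHOD_DESCRIPTOR = re.compile(rf"\((?:{_FIELD})*\)(?:{_FIELD}|V)")


def descriptor_kind(text: str):
    """Return "method", "field", or None if text is neither."""
    if METHOD_DESCRIPTOR.fullmatch(text):
        return "method"
    if FIELD_DESCRIPTOR.fullmatch(text):
        return "field"
    return None


if __name__ == "__main__":
    for sample in ("()V", "J", "Ljava/lang/FunctionalInterface;", "V"):
        print(sample, descriptor_kind(sample))
```

In a field statement the referenced utf8 constant would be expected to classify as "field", in a method statement as "method".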
Constant value statement
A constant value statement specifies an initial value for a static field as described in the JVM specification. It is a simple statement which has as a single argument the name of a long, float, double, or string constant which in turn contains the actual value to assign as defined in the following EBNF expression:
constant value statement = 'constant','value', constant, ';' ;
Example:
constant value string123;
Method statement
A method statement declares a method within a class type. It is a block statement, which consists of some method modifier keywords followed by the keyword method and then by some method member statements enclosed in curly brackets, as defined in the following EBNF expression:
    method statement = {method modifier}, 'method', '{',{method member},'}' ;
    method modifier = 'public'|'private'|'protected'|'static'|'final'|'synchronized'|'bridge'|'varargs'|'native'|'abstract'|'strict'|'synthetic' ;
    method member = name|descriptor|exception|signature|synthetic|deprecated|annotation|parameter annotation|type annotation|annotation default|stack map|unknown attribute|variable|instruction|exception handler|line number table|variable table|variable type table|max stack|max locals;
Method modifiers
Method modifiers in the Java assembler correspond one to one to the access flags defined in the JVM specification as shown in the following table:
Assembler modifier keyword | Access flag from the JVM spec |
---|---|
public | ACC_PUBLIC |
private | ACC_PRIVATE |
protected | ACC_PROTECTED |
static | ACC_STATIC |
final | ACC_FINAL |
synchronized | ACC_SYNCHRONIZED |
bridge | ACC_BRIDGE |
varargs | ACC_VARARGS |
native | ACC_NATIVE |
abstract | ACC_ABSTRACT |
strict | ACC_STRICT |
synthetic | ACC_SYNTHETIC |
Read more about the meaning of the modifiers, as well as about the rules governing the allowed combinations of modifiers, in the JVM specification.
Method members
The following table lists all statements which can serve as members of a method statement. The second column of the table defines for each statement how many instances of it are allowed or required to exist within a method statement.
method member | how many |
---|---|
name | exactly one |
descriptor | exactly one |
exceptions | zero or more |
synthetic | zero or one |
deprecated | zero or one |
signature | zero or one |
annotation | zero or more |
annotation default | zero or one |
stack map | zero or one |
unknown attribute | zero or more |
variable | zero or more |
instruction | zero or more |
exception handler | zero or more |
line numbers | zero or more |
debug variables | zero or more |
max stack | zero or one |
max locals | zero or one |
Note: method members can appear in any order within a method statement. The order doesn't have any semantic meaning; however, different orders may result in class files that differ at the binary level while being semantically identical.
Method statement example
    static method {
        name clinit0_name; // <clinit>
        descriptor method_desc$1; // ()V
        line numbers {
            line ir0, 129;
            line ir7, 1171;
        }
        maxstack 3;
        //Instructions
        ir0: iconst_0;
        anewarray ObjectStreamField;
        putstatic serialPersistentFields;
        ir7: new String$CaseInsensitiveComparator;
        dup;
        aconst_null;
        invokespecial String$CaseInsensitiveComparator.init0;
        putstatic CASE_INSENSITIVE_ORDER;
        return;
    }
Exceptions statement
An exceptions statement specifies exceptions which may be thrown by the surrounding method. It is a simple statement with multiple arguments. All arguments specify a name of a class reference constant. The class reference constants in turn specify the actual exception classes thrown by the method. Every class reference constant may be preceded by a label. The following EBNF expression defines the syntax of the exceptions statement:
exceptions statement = 'throws', [label, ':'], class reference constant, {',',[label, ':'], class reference constant} , ';' ;
Example:
throws ioexception: IOException,IllegalArgumentException;
Variable statement
A variable statement declares a local variable within a method statement. The statement specifies the type and the name of the variable, the allowed types being double, float, int, long, object and returnadress. Additionally, the index of the variable within the local variable array may be specified, either as an absolute index value or as an offset relative to the index of another variable. Note that, depending on its type, a variable can occupy one or two slots in the local variable array. For variables whose declarations do not contain an index specification, the assembler assigns an implicit index, which is the slot following the last slot occupied by the previously declared variables.
Note: The JVM specification allows multiple variables occupying the same slot in the local variable array.
The exact syntax of the variable statement is as defined in the following EBNF expression:
    variable statement = 'var', variable type, identifier, [index], ';' ;
    variable type = 'double'|'float'|'int'|'long'|'object'|'returnadress' ;
    index = 'at', (relative offset|non-negative integer) ;
    relative offset = identifier, [('+'|'-'), non-negative integer] ;
Example:
    var int a;
    var float b;
    var double d1 at 0;
    var double d2 at b - 1;
    var double d2 at b;
    var double d3;
    var double d4 at b + 1;
The example above illustrates various ways to specify the index of a variable. The integer variable a has the implicit index 0 and occupies one slot. The floating point variable b also has an implicit index, this time 1, and again occupies one slot. The double-precision variable d1 has the explicitly specified index 0 (remember: multiple variables can occupy the same slots) and occupies two slots. The variable d2 again has an explicitly specified index; this time, however, the index has been specified as an offset relative to the slot occupied by b. And so on.
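The index assignment described above can be sketched in Python as follows. This is a rough model under assumptions, not Lilac's implementation: the implicit index is taken to be one past the highest slot occupied by any earlier declaration, a relative index resolves against the most recent declaration of the named variable, and the function name `assign_indices` is hypothetical.

```python
import re

# Number of local variable slots per type (double and long take two).
SLOTS = {"double": 2, "long": 2, "float": 1, "int": 1, "object": 1, "returnadress": 1}

# var <type> <name> [at <number> | at <name> [+|- <number>]] ;
_DECL = re.compile(
    r"var\s+(\w+)\s+([\w$]+)"
    r"(?:\s+at\s+(?:(\d+)|([\w$]+)(?:\s*([+-])\s*(\d+))?))?\s*;"
)


def assign_indices(source: str):
    """Return (name, index) pairs in declaration order."""
    result = []
    by_name = {}
    next_free = 0  # assumption: one past the highest slot used so far
    for match in _DECL.finditer(source):
        vtype, name, absolute, base, sign, offset = match.groups()
        if absolute is not None:
            index = int(absolute)
        elif base is not None:
            delta = int(offset) if offset else 0
            index = by_name[base] + (-delta if sign == "-" else delta)
        else:
            index = next_free
        by_name[name] = index
        result.append((name, index))
        next_free = max(next_free, index + SLOTS[vtype])
    return result


if __name__ == "__main__":
    print(assign_indices("var int a; var float b; var double d1 at 0; var double d3;"))
```

Under these assumptions, a variable declared without an index after a two-slot variable at index 1 lands at index 3, because slots 1 and 2 are already occupied.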
Instruction statements
An instruction statement specifies an instruction from the JVM instruction set within a method statement. An instruction statement generally starts with a mnemonic keyword which might be followed by one or more arguments. An argument of an instruction can be a name of a constant or a local variable, a label of an instruction or a literal. The instruction statement might be preceded by a label.
Correspondence between the mnemonic keywords in Java assembler and the JVM instruction set
In general the mnemonic keywords of the Java assembler correspond one to one with the instructions from the JVM instruction set. There are however the following exceptions:
- The argument-less load and store instructions from the JVM instruction set, such as aload_0 or fstore_1, don't have a corresponding mnemonic keyword. Instead, the assembler always generates them when an aload, astore, iload, istore, dload, dstore, fload or fstore instruction statement has as its argument a local variable whose index lies between 0 and 3. If in such a case you want the assembler to generate a "normal" load or store instruction, you have to precede the mnemonic keyword of the instruction with the modifier keyword normal.
- To generate the instruction wide from the JVM instruction set, you have to precede the mnemonic keyword of an instruction statement with the modifier keyword wide.
Examples:
normal aload this;
wide astore this;
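The rule for the short-form load and store instructions can be summarized in a small Python sketch. It is a hedged illustration, not the assembler's code: the function name and its parameters are made up, and the set of mnemonics is simply the one listed in the text above.

```python
# Illustrative sketch only; the function and parameter names are assumptions.
# Mnemonics for which the text above says a short form is generated.
SHORT_FORM_MNEMONICS = {
    "aload", "astore", "iload", "istore",
    "dload", "dstore", "fload", "fstore",
}


def emitted_instruction(mnemonic, index, normal=False):
    """Return the JVM instruction a load/store statement would produce.

    The short form (for example aload_0) is chosen when the variable
    index lies between 0 and 3, unless the 'normal' modifier is given.
    """
    if mnemonic in SHORT_FORM_MNEMONICS and 0 <= index <= 3 and not normal:
        return f"{mnemonic}_{index}"
    return f"{mnemonic} {index}"


print(emitted_instruction("aload", 0))               # short form
print(emitted_instruction("aload", 0, normal=True))  # forced long form
print(emitted_instruction("fstore", 7))              # index too large
```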
Instruction categories
Most JVM instructions belong to one of the following four categories: argumentless instructions, constant instructions, variable instructions and branch instructions. Additionally, there are some instructions which don't fit into any of those broad categories but stand on their own.
Argumentless instructions
These are the instructions which don't have any explicit arguments (they may, however, require operands pushed onto the stack before execution). The majority of the JVM instructions belong to this category. The following list contains all argumentless instructions with every entry linked to the corresponding section in the JVM specification:
aaload, aastore, aconst_null, areturn, arraylength, athrow, baload, bastore, caload, castore, d2f, d2i, d2l, dadd, daload, dastore, dcmpg, dcmpl, dconst_0, dconst_1, ddiv, dmul, dneg, drem, dreturn, dsub, dup, dup_x1, dup_x2, dup2, dup2_x1, dup2_x2, f2d, f2i, f2l, fadd, faload, fastore, fcmpg, fcmpl, fconst_0, fconst_1, fconst_2, fdiv, fmul, fneg, frem, freturn, fsub, i2b, i2c, i2d, i2f, i2l, i2s, iadd, iaload, iand, iastore, iconst_m1, iconst_0, iconst_1, iconst_2, iconst_3, iconst_4, iconst_5, idiv, imul, ineg, ior, irem, ireturn, ishl, ishr, isub, iushr, ixor, l2d, l2f, l2i, ladd, laload, land, lastore, lcmp, lconst_0, lconst_1, ldiv, lmul, lneg, lor, lrem, lreturn, lshl, lshr, lsub, lushr, lxor, monitorenter, monitorexit, nop, pop, pop2, return, saload, sastore, swap
Example:
aaload;
Instructions with constant arguments
These are the instructions which have the name of a constant as their single argument. Depending on the instruction, there may be restrictions on the types of constants allowed as argument. The following list contains all instructions from this category with every entry linked to the corresponding section in the JVM specification:
invokeinterface, invokedynamic, anewarray, checkcast, getfield, getstatic, instanceof, invokespecial, invokestatic, invokevirtual, ldc2_w, ldc2, new, putfield, putstatic
Example:
new java.lang.String;
Instructions with local variable arguments
These are the instructions which have the name of a local variable as their single argument. Depending on the instruction, there may be restrictions on the types of variables allowed as argument. The following list contains all instructions from this category with every entry linked to the corresponding section in the JVM specification:
aload, astore, dload, dstore, fload, fstore, iload, istore, lload, lstore, ret
Example:
aload this;
Branch instructions
These are the instructions which order the virtual machine to transfer, conditionally or unconditionally, the execution flow to an instruction other than the next one. Branch instructions have as their single argument the label of the instruction to which the execution flow will be transferred. The following list contains all instructions from this category with every entry linked to the corresponding section in the JVM specification:
goto, goto_w, if_acmpeq, if_acmpne, if_icmpeq, if_icmpge, if_icmpgt, if_icmple, if_icmplt, if_icmpne, ifeq, ifge, ifgt, ifle, iflt, ifne, ifnonnull, ifnull, jsr, jsr_w
Example:
goto end;
Integer literal instructions
These are the instructions which take an integer literal as argument. There are two instructions of this type: bipush, taking argument values from -128 to 127, and sipush, taking argument values from -32768 to 32767.
Examples:
bipush 120; sipush 1000;
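The value ranges above suggest a simple way to pick the most compact instruction for pushing an integer. The following Python sketch shows this common JVM technique; it is an assumption-laden illustration (the function name is hypothetical) and does not claim that the assembler performs this selection for you.

```python
# Illustrative sketch only; the function name is an assumption.
def push_instruction_for(value):
    """Pick the most compact JVM instruction that pushes an int value."""
    if -1 <= value <= 5:
        # iconst_m1 and iconst_0 .. iconst_5 need no argument at all
        return "iconst_m1" if value == -1 else f"iconst_{value}"
    if -128 <= value <= 127:
        return f"bipush {value}"   # one signed byte
    if -32768 <= value <= 32767:
        return f"sipush {value}"   # one signed 16-bit value
    # anything larger has to be loaded from an integer constant
    return "ldc"


for value in (3, 120, 1000, 100000):
    print(value, "->", push_instruction_for(value))
```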
iinc instruction
The iinc instruction takes two arguments: the name of a local variable and an integer literal in the value range from -128 to 127.
Example:
iinc counter,2;
newarray instruction
The newarray instruction, which creates an array of a primitive type and pushes a reference to it onto the stack, has a special syntax consisting of two keywords: the newarray keyword followed by a keyword designating the primitive type, which can be one of the following: boolean, byte, char, double, float, int, long, short.
multianewarray instruction
The multianewarray instruction, which creates a multidimensional array, takes two arguments: the name of a class reference constant and an integer literal in the range from 1 to 255, which specifies the number of dimensions.
Example:
multianewarray Object,3;
Switch instructions
Switch instructions, which include tableswitch and lookupswitch, are special branch instructions which select their target depending on the integer value on top of the stack. A switch instruction has multiple comma-separated arguments with a special syntax: an integer literal or the keyword default, followed by -> and then by the label of the target instruction.
Example:
tableswitch 0->target0,1->target1,default->defaulttarget;
Bytecode ranges
Some statements require so-called bytecode ranges as arguments. A bytecode range is simply a sequence of instructions specified by two instruction labels separated by ->. The first label is the label of the first instruction in the range. The second label is either the label of the last instruction in the sequence or the label of the first instruction after the sequence. In the first case the range is an including bytecode range, otherwise it is an excluding bytecode range. When specifying an excluding bytecode range, the second label may be omitted, indicating that the range includes all instructions from the first instruction to the last instruction in the method.
Example:
begin->end
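The difference between the two kinds of ranges becomes clearer when they are translated into bytecode offsets. Class file structures such as the exception table store a range as an inclusive start offset and an exclusive end offset. The Python sketch below shows how both kinds of range could be mapped to such a pair; the function name, the parameters and the representation of instructions as (label, offset, length) tuples are assumptions made for this illustration.

```python
# Illustrative sketch only; names and data layout are assumptions.
def range_to_offsets(instructions, first, second=None, including=False):
    """Map a bytecode range to (start_pc, end_pc), end_pc being exclusive.

    instructions is a list of (label, offset, length) tuples in code order.
    """
    by_label = {label: (offset, length)
                for label, offset, length in instructions
                if label is not None}
    start_pc = by_label[first][0]
    if second is None:
        if including:
            raise ValueError("only an excluding range may omit the second label")
        # open-ended range: runs to the end of the method's code
        _, last_offset, last_length = instructions[-1]
        return start_pc, last_offset + last_length
    offset, length = by_label[second]
    # including: the labelled instruction itself still belongs to the range
    end_pc = offset + length if including else offset
    return start_pc, end_pc


code = [("begin", 0, 1), (None, 1, 3), ("end", 4, 1), ("after", 5, 1)]
print(range_to_offsets(code, "begin", "end", including=True))
print(range_to_offsets(code, "begin", "after"))
print(range_to_offsets(code, "begin"))
```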
Exception Handler statement
An exception handler statement specifies an exception handler within a method. This statement is neither a simple statement nor a block statement but has its own special syntax, which is defined in the following EBNF expression:
exception handler statement = [label, ':'], 'try', including bytecode range, 'catch', (class reference constant|'all'), 'go', 'to', label, ';' ;
The statement starts with the try keyword, possibly preceded by a label. It is followed by an including bytecode range to which the handler applies. Next comes the keyword catch, followed either by the name of a class reference constant, which specifies the exception class handled by the handler, or by the keyword all, which indicates that the handler handles all exceptions. Finally, the keywords go and to are followed by the label of the instruction to which the handler transfers the execution flow if an exception occurs.
Example:
firstHandler: try begin -> end catch RuntimeException go to handlerBegin;
Line numbers statement
A line numbers statement specifies a mapping between the line numbers of the original source file and the instructions of the surrounding method. It is a block statement which contains a sequence of line number statements as defined in the following EBNF expression:
line numbers statement = 'line', 'numbers', '{', {line number statement}, '}' ;
Example:
line numbers { line label, 5; }
Line number statement
A line number statement specifies the correspondence between an instruction and a line in the original source file. This is a simple statement with two arguments: the first is the label of the instruction, the second is an integer literal specifying the number of the source file line. The exact syntax is as follows:
line number statement = 'line', label, ',', line number, ';' ;
Example:
line label, 5;
Debug variables statement
A debug variables statement specifies a mapping between the local variables in a method and variables in the original source file. It is a block statement which contains a sequence of debug variable statements as defined in the following EBNF expression:
debug variables statement = 'debug', 'variables', ['types'], '{', {debug variable statement}, '}' ;
Example:
debug variables { var this, ir0, this_name, this_desc; }
Note: the optional keyword types indicates that the mapping includes signatures of the source variables. The absence of the keyword indicates that the descriptors are included.
Debug variable statement
A debug variable statement specifies a mapping between a local variable in a method and a variable in the original source file. It is a simple statement with four arguments: the name of the local variable, an excluding bytecode range to which the mapping applies, the name of a utf8 constant specifying the name of the variable in the original source file, and finally the name of a utf8 constant specifying, depending on the type of the surrounding block statement, either the descriptor of the variable in the original source file or its signature.
The exact syntax is defined in the following EBNF expression:
debug variable statement = 'var', name, ',', excluding bytecode range, ',', utf8 constant, ',', utf8 constant, ';' ;
Example:
var this, ir0, this_name, this_desc;
Max stack statement
A max stack statement specifies the maximum runtime size of the stack of a method. If this member is absent in a method statement the Java assembler will calculate this size itself. It is a simple statement with one argument - an integer literal specifying the actual stack size as defined in the following EBNF expression:
max stack statement = 'maxstack', integer literal, ';' ;
Example:
maxstack 5;
Max locals statement
A max locals statement specifies the size of the local variable array of a method. If this member is absent in a method statement the Java assembler will infer this size from the variable statements. It is a simple statement with one argument - an integer literal specifying the actual size as defined in the following EBNF expression:
max locals statement = 'maxlocals', integer literal, ';' ;
Example:
maxlocals 5;
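How the size of the local variable array follows from the variable declarations can be sketched as below. This Python fragment only illustrates the underlying JVM rule (the array must be large enough for the highest slot in use, and long and double take two slots); the function name and the (type, index) input format are assumptions, and the sketch presumes that all indices have already been resolved.

```python
# Illustrative sketch only; names and input format are assumptions.
def infer_max_locals(variables):
    """Return the smallest local variable array size that fits all variables.

    variables is an iterable of (var_type, index) pairs with resolved indices.
    """
    def width(var_type):
        # long and double occupy two consecutive slots
        return 2 if var_type in ("double", "long") else 1

    return max((index + width(var_type) for var_type, index in variables),
               default=0)


# int a at 0, float b at 1, double d1 at 0, double d4 at 2
print(infer_max_locals([("int", 0), ("float", 1), ("double", 0), ("double", 2)]))
```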
Stackmap statement
A stackmap statement specifies the stackmap of a method. There are two variants of this statement. The first, simple variant, to be used by developers, consists of just the single keyword stackmap. This variant tells the assembler to generate the stackmap of the method itself. The second variant specifies the entire, complicated stackmap structure explicitly and is primarily intended for use by the disassembler, in order to ensure a perfect assembler/disassembler round trip. This specification gives no detailed explanation of the second variant, as it shouldn't be used by developers. You can, however, find additional information in the JVM specification. The syntax of the second variant is defined in the following EBNF expression:
stackmap statement = 'stackmap', '{', {stackmap frame}, '}' ;
stackmap frame = append|chop|same frame|same frame extended|same locals|same locals extended|full ;
append = 'append', label, typeslist, ';' ;
chop = 'chop', label, integer literal, ';' ;
same frame = 'same', label, ';' ;
same frame extended = 'same', 'extended', label, ';' ;
same locals = 'same', 'locals', label, typeslist, ';' ;
same locals extended = 'same', 'locals', 'extended', label, typeslist, ';' ;
full = 'full', label, typeslist, typeslist, ';' ;
typeslist = '{', {'double'|'float'|'int'|'long'|'null'|'uninitialized'|'unititializedthis'|'top'}, '}' ;
Example (first variant):
stackmap;
Example (second variant):
stackmap {
  full ir17, {object ThisClass,object classref_214,int,int}, {};
  same ir30;
  same ir49;
}
Inner class statement
An inner class statement within a class statement declares an "inner class"-relationship between two classes. One of the classes participating in the relationship is usually the current class, though this is not a requirement. The inner class statement is a block statement, which consists of some modifier keywords followed by the keywords inner and class and then by some inner class member statements enclosed in curly brackets, as defined in the following EBNF expression:
inner class statement = {inner class modifier}, 'inner', 'class', '{', {inner class member}, '}' ;
inner class modifier = 'public'|'private'|'protected'|'static'|'final'|'interface'|'abstract'|'synthetic'|'annotation'|'enum' ;
inner class member = name|inner|outer ;
Example:
public inner class { inner Foo$; outer Foo; name Foo_name; }
Inner class modifiers
Inner class modifiers in the Java assembler correspond one to one to the access flags defined in the JVM specification as shown in the following table:
Assembler modifier keyword | Access flag in the JVM specification |
---|---|
public | ACC_PUBLIC |
private | ACC_PRIVATE |
protected | ACC_PROTECTED |
static | ACC_STATIC |
final | ACC_FINAL |
interface | ACC_INTERFACE |
abstract | ACC_ABSTRACT |
synthetic | ACC_SYNTHETIC |
annotation | ACC_ANNOTATION |
enum | ACC_ENUM |
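
In the class file the modifiers end up as a single bit mask. The following Python sketch combines modifier keywords into that mask, using the flag values that the JVM specification defines for inner class access flags; the dictionary and function names are made up for this illustration.

```python
# Illustrative sketch only; the names are assumptions, the flag values
# are those defined for inner_class_access_flags in the JVM specification.
INNER_CLASS_FLAGS = {
    "public": 0x0001,      # ACC_PUBLIC
    "private": 0x0002,     # ACC_PRIVATE
    "protected": 0x0004,   # ACC_PROTECTED
    "static": 0x0008,      # ACC_STATIC
    "final": 0x0010,       # ACC_FINAL
    "interface": 0x0200,   # ACC_INTERFACE
    "abstract": 0x0400,    # ACC_ABSTRACT
    "synthetic": 0x1000,   # ACC_SYNTHETIC
    "annotation": 0x2000,  # ACC_ANNOTATION
    "enum": 0x4000,        # ACC_ENUM
}


def access_flags(modifiers):
    """Combine modifier keywords into the access flag bit mask."""
    flags = 0
    for modifier in modifiers:
        flags |= INNER_CLASS_FLAGS[modifier]
    return flags


print(hex(access_flags(["public", "static", "final"])))
```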
Inner class members
An inner class statement contains member statements, referred to in the definition above as inner class members. The following table lists all statements which can serve as members of an inner class statement. The second column of the table defines for each statement how many instances of it are allowed or required to exist within an inner class statement.
inner class member | how many |
---|---|
inner | exactly one |
outer | zero or one |
name | zero or one |
Inner statement
This statement within an inner class statement specifies the inner end of the "inner class"-relationship. It has as its single argument the name of a class reference constant, which in turn specifies the actual class, as defined in the following EBNF expression:
inner statement = 'inner', class reference, ';' ;
Outer statement
This statement within an inner class statement specifies the outer end of the "inner class"-relationship. It has as its single argument the name of a class reference constant, which in turn specifies the actual class, as defined in the following EBNF expression:
outer statement = 'outer', class reference, ';' ;
Enclosing method statement
An enclosing method statement specifies the enclosing method for the inner end of an "inner class"-relationship between two classes (see also Inner class statement). This is a simple statement with two arguments, of which the second is optional. The first argument is the name of a class reference constant, which specifies the class of the enclosing method. The second argument is the name of a name and type constant, which specifies the name and the type of the enclosing method. The absence of the second argument means that the inner class isn't enclosed in a method. The exact syntax of the statement is defined in the following EBNF expression:
enclosing method statement = 'enclosing', 'method', class reference, [',', name and type], ';' ;
Example:
enclosing method InnerClassTests, nameandtype_26;
Bootstrap method statement
A bootstrap method statement specifies a bootstrap method which can be referenced by a dynamic method reference. It is a simple statement with multiple (at least one) arguments. The statement consists of the keywords bootstrap and method followed by the name of the method and then by comma separated arguments. The first argument specifies a name of a method handle constant. Further arguments, if present, specify names of further constants to be used as arguments of the bootstrap method as defined in the JVM specification. The exact syntax of the statement is as defined in the following EBNF expression:
bootstrap method statement = 'bootstrap', 'method', method name, method handle constant, {',', argument constant}, ';' ;
Example:
bootstrap method firstBootstrapMethod method_handle, string1, string2;
Annotation statement
An annotation statement declares either an annotation for a class, method, method parameter, field or type declaration, or a nested annotation within another annotation. It is a block statement which consists of the keyword annotation followed by annotation member statements enclosed in curly brackets and possibly preceded by some of the following keywords: invisible, parameter, type, as defined in the following EBNF expression:
annotation statement = {'invisible'|'parameter'|'type'}, 'annotation', '{', {annotation member}, '}' ;
annotation member = annotation type|annotation element ;
The meaning of the keywords preceding annotation is as follows:
invisible - marks the annotation as invisible at runtime
parameter - states that the current annotation is a method parameter annotation. In this case the presence of the annotation parameter index as member is also required. Note: parameter annotations are only allowed within a method statement.
type - states that the current annotation is a type annotation. In this case the presence of the annotation target as member is also required.
Example (Deprecated annotation):
annotation {
  type type_desc; // Ljava/lang/Deprecated;
}
Annotation members
An annotation statement contains member statements, referred to in the definition above as annotation members. The following table lists all statements which can serve as members of an annotation statement. The second column of the table defines for each statement how many instances of it are allowed or required to exist within an annotation statement.
annotation member | how many |
---|---|
annotation type | exactly one |
annotation element | zero or more |
annotation parameter index | zero or one |
annotation target | zero or one |
annotation target path | zero or one |
Annotation type statement
An annotation type statement declares the type (annotation class) of the surrounding annotation. It is a simple statement which has as its single argument the name of a class reference constant, which in turn specifies the annotation class, as defined in the following EBNF expression:
annotation type statement = 'type', class reference, ';' ;
Example:
type type_desc;
Annotation parameter index statement
An annotation parameter index statement specifies the index of the method parameter for which the surrounding annotation is being declared. It is a simple statement which has as its single argument an integer literal specifying the actual index, as defined in the following EBNF expression:
annotation parameter index statement = 'index', integer literal, ';' ;
The index specified has to be a valid index of a parameter of the method for which the annotation has been declared.
Example:
index 0;
Annotation element statement
An annotation element statement specifies the value of an annotation element within an annotation. An annotation element statement, which is a block statement, consists of the keyword element followed by annotation element member statements enclosed in curly brackets, as defined in the following EBNF expression:
annotation element statement = 'element', '{', {annotation element member}, '}' ;
annotation element member = name|annotation element value ;
Example:
element { name element1_name; string value empty_string; }
Annotation element members
An annotation element statement contains member statements, referred to in the definition above as annotation element members. The following table lists all statements which can serve as members of an annotation element statement. The second column of the table defines for each statement how many instances of it are allowed or required to exist within an annotation element statement.
annotation element member | how many |
---|---|
name | exactly one |
annotation element value | zero or one |
annotation | zero or one |
An annotation element always has exactly two members: a name statement, specifying the name of the element, and a value statement, which is either an annotation element value statement or a nested annotation.
Annotation element value statement
An annotation element value statement either specifies a value for an annotation element or serves as a member of another, array-valued annotation element value. This value can be a simple value, an enumeration value or an array value. Each of these three varieties has a different syntax, which is explained below:
Simple value
The syntax for a simple value is that of a simple statement which consists of a type keyword followed by the keyword value and then by a single argument. The single argument is the name of a constant, the exact type of which depends on the type of the value specified by the type keyword. The EBNF expression of the syntax is as follows:
simple value = type keyword, 'value', constant name, ';' ;
type keyword = 'byte'|'boolean'|'char'|'class'|'float'|'double'|'int'|'long'|'short'|'string' ;
Example:
boolean value int_0;
Enumeration value
The syntax for an enumeration value is that of a simple statement which has two arguments, as defined in the following EBNF expression:
enumeration value = 'enum', 'value', utf8 constant name, utf8 constant name, ';' ;
The first argument is the name of a utf8 constant specifying the type descriptor of an enumeration class; the second is the name of a utf8 constant which specifies the name of the enumeration member.
Example:
enum value type_desc, utf8_DAYS;
Array value
The syntax for an array value is that of a block statement. This block statement can contain any number of nested annotation element values, as defined in the following EBNF expression:
array value = 'array','value','{',{annotation element value},'}' ;
Example:
array value { boolean value int_0; boolean value int_1; }
Annotation target statement
An annotation target statement denotes the kind of target on which a type annotation appears. The various kinds of targets correspond to the contexts of the Java programming language in which types can be used in declarations and expressions. Dependent on the actual target the statement can have various syntactic forms, which will be listed below.
Return target
States that the annotation appears on the return type of a method.
Syntax:
return target statement = 'targets', 'return', 'type', ';' ;
Example:
targets return type;
Receiver type target
States that the annotation appears on the type of the receiver parameter of a method.
Syntax:
receiver type target statement = 'targets', 'receiver', 'type', ';' ;
Example:
targets receiver type;
Field type target
States that the annotation appears on the type of a field.
Syntax:
field type target statement = 'targets', 'field', 'type', ';' ;
Example:
targets field type;
Parameter type target
States that the annotation appears on the type of a method parameter. This statement has an integer literal argument which denotes the index of the parameter.
Syntax:
parameter type target statement = 'targets', 'parameter', 'type', integer literal, ';' ;
Example:
targets parameter type 0;
Parameter type bound target
States that the annotation appears on the type in the bound of a type parameter of a generic class, interface, method or constructor. This statement has two integer literal arguments, where the first denotes the index of the type parameter and the second the index of the annotated type within the bound.
Syntax:
parameter type bound target statement = 'targets', 'type', 'parameter', 'bound', integer literal, ',', integer literal, ';' ;
Example:
targets type parameter bound 0, 1;
Type parameter target
States that the annotation appears on the type parameter of a generic class, interface, method or constructor. This statement has an integer literal as argument which denotes the index of the type parameter.
Syntax:
type parameter target statement = 'targets', 'type', 'parameter', integer literal, ';' ;
Example:
targets type parameter 0;
Supertype target
States that the annotation appears on the type in the extends or implements clause of a class. This statement has an optional argument: a label which, if present, denotes the annotated type from the implements clause. The absence of this argument indicates that the annotation appears in the extends clause.
Syntax:
supertype target statement = 'targets', 'supertype', [interface label], ';' ;
Example:
targets supertype runnable;
Exception type target
States that the annotation appears on the exception type in the throws clause of a method. This statement has as single argument a label denoting the actual exception type from the throws clause.
Syntax:
exception type target statement = 'targets', 'throws', exception label, ';' ;
Example:
targets throws illegalargumentexception;
New type target
States that the annotation appears on the type in a new expression. This statement has as its single argument a label which points to the corresponding new instruction.
Syntax:
new type target statement = 'targets', 'new', label, ';' ;
Example:
targets new newLabel;
Instanceof type target
States that the annotation appears on the type in an instanceof expression. This statement has as its single argument a label which points to the corresponding instanceof instruction.
Syntax:
instanceof type target statement = 'targets', 'instanceof', label, ';' ;
Example:
targets instanceof instanceofLabel;
Method reference type target
States that the annotation appears on the type in a method reference expression. This statement has as single argument a label which points to the corresponding bytecode instruction.
Syntax:
method reference type target statement = 'targets', 'method', 'reference', label, ';' ;
Example:
targets method reference methRefLabel;
Constructor reference type target
States that the annotation appears on the type in a constructor reference expression. This statement has as single argument a label which points to the corresponding bytecode instruction.
Syntax:
constructor reference type target statement = 'targets', 'constructor', 'reference', label, ';' ;
Example:
targets constructor reference methRefLabel;
Cast type target
States that the annotation appears on the type in a cast expression. This statement has two arguments. The first argument is the label of the corresponding bytecode instruction, the second is an integer literal, which specifies the index of the actual type in the cast expression. A value of 0 in the second argument specifies the first (or only) type in the cast operator. The possibility of more than one type in a cast expression arises from a cast to an intersection type.
Syntax:
cast type target statement = 'targets', 'cast', 'type', label, ',', integer literal, ';' ;
Example:
targets cast type castLabel, 0;
Constructor type argument target
States that the annotation appears on the type argument in a generic constructor invocation. This statement has two arguments. The first argument is the label of the corresponding bytecode instruction, the second is an integer literal, which specifies the index of the actual type argument.
Syntax:
constructor type argument target statement = 'targets', 'constructor', 'type', 'argument', label, ',', integer literal, ';' ;
Example:
targets constructor type argument newLabel, 0;
Method type argument target
States that the annotation appears on the type argument in a generic method invocation. This statement has two arguments. The first argument is the label of the corresponding bytecode instruction, the second is an integer literal, which specifies the index of the actual type argument.
Syntax:
method type argument target statement = 'targets', 'method', 'type', 'argument', label, ',', integer literal, ';' ;
Example:
targets method type argument methodIvokeLabel, 0;
Constructor reference type argument target
States that the annotation appears on the type argument in a generic constructor reference invocation. This statement has two arguments. The first argument is the label of the corresponding bytecode instruction, the second is an integer literal, which specifies the index of the actual type argument.
Syntax:
constructor reference type argument target statement = 'targets', 'constructor', 'reference', 'type', 'argument', label, ',', integer literal, ';' ;
Example:
targets constructor reference type argument label, 0;
Method reference type argument target
States that the annotation appears on the type argument in a generic method reference invocation. This statement has two arguments. The first argument is the label of the corresponding bytecode instruction, the second is an integer literal, which specifies the index of the actual type argument.
Syntax:
method reference type argument target statement = 'targets', 'method', 'reference', 'type', 'argument', label, ',', integer literal, ';' ;
Example:
targets method reference type argument label, 0;
Catch type target
States that the annotation appears on the type in a catch clause. This statement has as its single argument the label of the corresponding exception handler.
Syntax:
catch type target statement = 'targets', 'catch', 'type', label, ';' ;
Example:
targets catch type handlerLabel;
Variable type target
States that the annotation appears on the type of a local variable or resource variable. This is a block statement with the following syntax:
variable type target statement = 'targets', ['resource'], 'var', 'types', '{', {excluding bytecode range},'}' ;
The member statements of the block statement are excluding bytecode ranges specifying the areas of the method's bytecode where the variable is valid.
Example:
targets var types { begin -> end }
Annotation target path statement
An annotation target path statement specifies the exact location of the annotation within an array, nested or parameterized type (see also the explanations in the JVM specification). It is a simple statement with a variable number of arguments, each of which specifies one step, from left to right, towards the precise location of the annotation in the type. The exact syntax of the statement is defined in the following EBNF expression:
annotation target path statement = 'target', 'path', path part, {path part}, ';' ;
path part = 'array' | 'nested' | ('type', 'argument', 'bound') | ('type', 'argument', '(', integer literal, ')') ;
Examples:
target path array;
target path type argument bound;
target path type argument(1);
Read more about the meaning of the different argument types in the JVM specification.
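In the class file each step of a type path is stored as a pair of a kind and a type argument index. The Python sketch below shows one plausible encoding of the path parts into such pairs. The kind values 0 to 3 are those of the JVM specification; the assignment of the assembler keywords to these kinds, as well as all names in the code, are assumptions made for this illustration.

```python
# Illustrative sketch only. Kind values follow the JVM specification;
# the mapping from assembler keywords to kinds is an assumption.
PATH_KINDS = {
    "array": 0,                # deeper in an array type
    "nested": 1,               # deeper in a nested type
    "type argument bound": 2,  # bound of a wildcard type argument
}
TYPE_ARGUMENT_KIND = 3         # type argument of a parameterized type


def encode_target_path(parts):
    """Encode path parts as (kind, type_argument_index) pairs.

    A part is either one of the keys of PATH_KINDS or a tuple
    ("type argument", index).
    """
    encoded = []
    for part in parts:
        if isinstance(part, tuple):
            name, index = part
            if name != "type argument":
                raise ValueError(f"unexpected indexed path part: {name}")
            if not 0 <= index <= 255:
                raise ValueError("type argument index must fit into one byte")
            encoded.append((TYPE_ARGUMENT_KIND, index))
        else:
            # the index is always 0 for the kinds without an argument
            encoded.append((PATH_KINDS[part], 0))
    return encoded


print(encode_target_path(["array", ("type argument", 1)]))
```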
Annotation default statement
An annotation default statement specifies the default value of an annotation type element. Because annotation type elements are technically methods of annotation classes, annotation default statements are always members of the corresponding method statements. An annotation default statement is a block statement with exactly one member: an annotation element value which specifies the actual value. The syntax is defined in the following EBNF expression:
annotation default statement = 'annotation', 'default', '{', annotation element value, '}' ;
Example:
annotation default { boolean value int_0; }
Unknown attribute
An unknown attribute statement specifies an attribute of the class file whose syntax is unknown to the assembler. This statement is primarily intended for use by the disassembler, in order to represent attributes that are either specific to a JVM implementation or, though standard, not implemented by the assembler; the second case ensures compatibility with future versions of the JVM. It is a simple statement with a base64 literal as its single argument. The base64 literal specifies the binary content of the attribute. The exact syntax of the statement is defined in the following EBNF expression:
unknown attribute statement = 'unknown', 'attribute', ['code'], base64 literal, ';' ;
Example:
unknown attribute [UG9seWZvbiB6d2l0c2NoZXJuZCBhw59lbiBNw6R4Y2hlbnMgVsO2Z2VsIFLDvGJlbiwgSm9naHVydCB1bmQgUXVhcms=];
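To inspect the content of such an attribute, the base64 literal can be decoded with any standard base64 library. The following Python sketch does this for the literal from the example above; the function name is hypothetical, and the assumption that the literal is always enclosed in square brackets is taken from the example only.

```python
import base64


# Illustrative sketch only; the function name is an assumption.
def decode_attribute_literal(literal):
    """Strip the surrounding brackets and return the raw attribute bytes."""
    if not (literal.startswith("[") and literal.endswith("]")):
        raise ValueError("expected a literal of the form [...]")
    # validate=True rejects characters outside the base64 alphabet
    return base64.b64decode(literal[1:-1], validate=True)


literal = ("[UG9seWZvbiB6d2l0c2NoZXJuZCBhw59lbiBNw6R4Y2hlbnMgVsO2Z2VsIFLDvGJlbiwg"
           "Sm9naHVydCB1bmQgUXVhcms=]")
payload = decode_attribute_literal(literal)
print(len(payload), "bytes")
```

The decoded bytes are the raw content of the attribute as it appears in the class file; how to interpret them depends entirely on the attribute in question.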
Macros
The statements which have been described up until now are of the "low level" kind. The "low level" property in this context means especially two things:
- There is a close correspondence between the low level statements and the structures of the class file format as described in the JVM specification.
- Assembler files consisting entirely of low level statements have the property of being (almost always) round-trip-proof, that is, if you assemble the file and disassemble it again you'll get the same sequence of the low level statements as in original file.
In principle, low level statements are entirely sufficient to produce anything acceptable to the JVM. However, after using the lilac assembler for a while for various reverse engineering purposes, the author realized that, unfortunately, "sufficient" doesn't mean "comfortable". Consider the simple task of logging the content of a variable to standard output, something you need time and again while trying to understand how a disassembled class works. To achieve the same result as with this simple Java statement:
System.out.println("The content of the variable a is: "+a);
a being a local integer variable, you have to define no fewer than 31 constants:
const classref System System_name;
const utf8 System_name "java/lang/System";
const utf8 out_name "out";
const utf8 out_desc "Ljava/io/PrintStream;";
const nameandtype System.out_nat out_name,out_desc;
const fieldref System.out System,System.out_nat;
const classref PrintStream PrintStream_name;
const utf8 PrintStream_name "java/io/PrintStream";
const utf8 println_name "println";
const utf8 println_desc "(Ljava/lang/String;)V";
const nameandtype PrintStream.println_nat println_name,println_desc;
const methodref PrintStream.println PrintStream,PrintStream.println_nat;
const utf8 StringBuffer_name "java/lang/StringBuffer";
const classref StringBuffer StringBuffer_name;
const utf8 init0_name "<init>";
const utf8 init0_desc "()V";
const nameandtype StringBuffer.init0_nat init0_name, init0_desc;
const methodref StringBuffer.init0 StringBuffer, StringBuffer.init0_nat;
const utf8 append_name "append";
const utf8 appendI_desc "(I)Ljava/lang/StringBuffer;";
const nameandtype StringBuffer.appendI_nat append_name, appendI_desc;
const methodref StringBuffer.appendI StringBuffer, StringBuffer.appendI_nat;
const utf8 appendS_desc "(Ljava/lang/String;)Ljava/lang/StringBuffer;";
const nameandtype StringBuffer.appendS_nat append_name, appendS_desc;
const methodref StringBuffer.appendS StringBuffer, StringBuffer.appendS_nat;
const utf8 toString_name "toString";
const utf8 toString_desc "()Ljava/lang/String;";
const nameandtype StringBuffer.toString_nat toString_name, toString_desc;
const methodref StringBuffer.toString StringBuffer, StringBuffer.toString_nat;
const utf8 prefix_content "The content of the variable a is: ";
const string prefix prefix_content;
and 10 instructions:
getstatic System.out;
new StringBuffer;
dup;
invokespecial StringBuffer.init0;
ldc prefix;
invokevirtual StringBuffer.appendS;
iload a;
invokevirtual StringBuffer.appendI;
invokevirtual StringBuffer.toString;
invokevirtual PrintStream.println;
And you have to do all of that every time you need such a trivial feature. Of course, one may say that this is what assembler programming is all about: a lot of routine work. But programmers are lazy, the author is a programmer, and so, after putting himself through the ordeal of defining constants multiple times, he understood that shortcuts were urgently needed. The idea of macro statements or, for short, macros (which of course wasn't anything new) was born and, after some months of work, introduced in version 1.1 of lilac.
There are three different kinds of macros in lilac:
Macro constant statements
Those are just shortcuts to reduce the amount of typing necessary to define a constant.
For example, the following macro class reference statement:
const classref java/lang/String;
is equivalent to the following two low level constant statements:
const utf8 String_name "java/lang/String";
const classref String String_name;
Macro variants of field and method statements
For example the following macro field statement declares a string array field, just like in Java itself.
public java/lang/String [] string_array_field;
On encountering this declaration the assembler will generate all constants needed by a non-macro field statement.
Macro instructions
Macro instructions are extensions to the original Java assembler which provide the means to generate a commonly used instruction sequence together with the necessary constant declarations. For example, the following macro instruction .invokevirtual:
.invokevirtual(toString, ref);
calls the method toString (toString is the name of a method reference, which is defined elsewhere) on the local variable ref.
The rest of this chapter covers the syntax and semantics of various macro statements in more detail.
Java type
The following descriptions of the various macro statements make use of the construct java type which is very similar to the type construct in Java itself as specified in the following EBNF expression:
java type = primitive type|class type|array type ;
primitive type = 'byte'|'boolean'|'char'|'double'|'float'|'int'|'long'|'short' ;
class type = identifier|binary identifier ;
array type = (primitive type|class type), '[', ']', {'[', ']'} ;
Examples:
int
boolean [] []
String
String []
java/lang/Object
java/lang/Runtime [] []
If a java type contains a reference to a class type, this reference can be expressed either as an identifier specifying a class constant or a binary identifier which directly specifies the class.
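Since macro statements derive descriptors from java types, it can help to see how such a type maps to a JVM field descriptor. The following Java sketch shows the mapping defined by the JVM specification. It is not lilac's implementation; the class name, the method name and the map that stands in for the declared class reference constants are hypothetical. The sketch also assumes that any class type containing a slash is a binary identifier and that anything else is the name of a class constant.

```java
import java.util.Map;

final class JavaTypeDescriptors {

    // Primitive type keywords and their JVM descriptor characters.
    private static final Map<String, String> PRIMITIVES = Map.of(
        "byte", "B", "boolean", "Z", "char", "C", "double", "D",
        "float", "F", "int", "I", "long", "J", "short", "S");

    // classConstants maps a class constant name (e.g. "String") to the
    // binary name it refers to (e.g. "java/lang/String"); this is an assumption
    // standing in for the assembler's constant table.
    static String toDescriptor(String javaType, Map<String, String> classConstants) {
        String compact = javaType.replaceAll("\\s+", "");
        int dimensions = 0;
        while (compact.endsWith("[]")) {
            dimensions++;
            compact = compact.substring(0, compact.length() - 2);
        }
        String element;
        if (PRIMITIVES.containsKey(compact)) {
            element = PRIMITIVES.get(compact);
        } else if (compact.contains("/")) {
            element = "L" + compact + ";";           // binary identifier
        } else {
            String binaryName = classConstants.get(compact);
            if (binaryName == null) {
                throw new IllegalArgumentException("unknown class constant: " + compact);
            }
            element = "L" + binaryName + ";";        // class constant name
        }
        return "[".repeat(dimensions) + element;
    }

    public static void main(String[] args) {
        Map<String, String> constants = Map.of("String", "java/lang/String");
        System.out.println(toDescriptor("int", constants));                     // I
        System.out.println(toDescriptor("boolean [] []", constants));           // [[Z
        System.out.println(toDescriptor("String []", constants));               // [Ljava/lang/String;
        System.out.println(toDescriptor("java/lang/Runtime [] []", constants)); // [[Ljava/lang/Runtime;
    }
}
```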
Macro string statement
A macro string statement instructs the assembler to create the two constants needed to declare a string constant: a utf8 constant holding the text and the string constant that refers to it.
The syntax of the statement is as follows:
macro string statement = 'const', 'string', name, string literal, ';' ;
Example:
const string hello_world_str "Hello world!";
Macro class reference statement
A macro class reference statement instructs the assembler to create all constants needed to declare a class reference constant.
The syntax of the statement is as follows:
macro class reference statement = 'const', 'classref', (binary identifier|array type), ['as', class reference name], ';' ;
If the 'as' class reference name part is present (it is required when a reference to an array type is being declared), the defined class reference constant will have that name. Otherwise the last part of the binary identifier will be used as the name.
Examples:
const classref java/lang/String;
The above statement declares a class reference constant named String which specifies a reference to the class java.lang.String.
const classref int[] as int_array;
The above statement declares a class reference constant named int_array which specifies a reference to the array type [I.
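The expansion of a macro class reference statement into low level statements can be sketched in Java as follows. This is an illustration only: the class and method names are hypothetical, the _name suffix for the generated utf8 constant is copied from the String_name example in the macro overview and may not be what the assembler actually generates, and array types are left out.

```java
import java.util.List;

final class ClassRefMacroSketch {

    // Returns the two low level statements for a macro class reference.
    // asName may be null, in which case the last part of the binary
    // identifier is used as the constant name.
    static List<String> expand(String binaryIdentifier, String asName) {
        String name = asName != null
            ? asName
            : binaryIdentifier.substring(binaryIdentifier.lastIndexOf('/') + 1);
        return List.of(
            "const utf8 " + name + "_name \"" + binaryIdentifier + "\";",
            "const classref " + name + " " + name + "_name;");
    }

    public static void main(String[] args) {
        // Prints the utf8 statement followed by the classref statement.
        expand("java/lang/String", null).forEach(System.out::println);
    }
}
```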
Macro field reference statement
A macro field reference statement instructs the assembler to create all constants needed to declare a field reference constant.
The syntax of the statement is as follows:
macro field reference statement = 'const', 'fieldref', java type, field name, 'from', (class reference constant name|binary identifier), ['as', field reference name], ';' ;
The name of the generated field reference constant, by which it can be accessed in the rest of the source code, is taken from the 'as' field reference name clause if present. If the clause is absent, the name is generated as follows. If the declaring class of the field is specified by the name of a class reference constant, the name is the concatenation of the class constant name, a dot, and the field name. If the declaring class is specified by a binary identifier, the last part of this identifier is used in place of the class constant name.
Examples:
const fieldref String 'name' from Car;
The above statement declares a field reference constant named Car.name which specifies a reference to the string field name from the (invented) class com/example/Car. Both the class java/lang/String and the class com/example/Car have been specified by class reference constants.
const fieldref String 'name' from com/example/Car;
The above statement again declares a field reference constant named Car.name which specifies a reference to the string field name from the (invented) class com/example/Car. This time, however, the class com/example/Car has been specified by a binary identifier.
const fieldref String 'name' from com/example/Car as car_name;
This third variant of the two statements above again declares a field reference constant which specifies a reference to the string field name from the (invented) class com/example/Car. This time, however, the name of the resulting field reference constant has been specified explicitly as car_name.
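The naming rule for generated reference constants is short enough to write down as code. The Java sketch below reproduces the rule as described in this section; the class and method names are hypothetical, and it is not taken from the lilac sources.

```java
final class ReferenceNames {

    // declaringClass is either a class constant name ("Car") or a binary
    // identifier ("com/example/Car"); asName is the optional 'as' name or null.
    static String constantName(String declaringClass, String memberName, String asName) {
        if (asName != null) {
            return asName;
        }
        // For a constant name there is no slash, so the whole name is kept.
        String classPart = declaringClass.substring(declaringClass.lastIndexOf('/') + 1);
        return classPart + "." + memberName;
    }

    public static void main(String[] args) {
        System.out.println(constantName("Car", "name", null));                   // Car.name
        System.out.println(constantName("com/example/Car", "name", null));       // Car.name
        System.out.println(constantName("com/example/Car", "name", "car_name")); // car_name
    }
}
```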
Macro method reference statement
A macro method reference statement instructs the assembler to create all constants needed to declare a method reference constant or an interface method reference constant.
The syntax of the statement is as follows:
macro method reference statement = 'const', ('methodref'|'intfmethodref'), method return type, method name, '(', [method parameters], ')', 'from', (class reference constant name|binary identifier), ['as', method reference name], ';' ;
method return type = java type|'void' ;
method parameters = method parameter, {',', method parameter} ;
method parameter = java type, parameter name ;
The above EBNF expressions describe a syntax which is very similar to the definition of a method signature in the Java language. The name of the generated method reference is determined in the same fashion as described above for the macro field reference statement.
Examples:
const intfmethodref void run() from java/lang/Runnable;
The above statement declares an interface method reference constant named Runnable.run which specifies a reference to the method run in the interface java/lang/Runnable.
const methodref java/io/Writer append(CharSequence s, int start, int end) from java/io/Writer as writer_append;
The above statement declares a method reference constant named writer_append which specifies a reference to one of the several append methods in the class java/io/Writer. Note that the references to the class java/io/Writer have been specified using a binary identifier, while for the reference to the interface java/lang/CharSequence a class reference constant has been used.
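Among the constants generated for a method reference is a utf8 constant holding the method descriptor. The Java sketch below shows how the descriptors for the two examples above are composed from the descriptors of the return type and the parameter types, following the JVM specification. The class and method names are hypothetical and the code does not claim to mirror lilac's implementation; the inputs are already-resolved type descriptors.

```java
final class MethodDescriptors {

    // Joins parameter descriptors inside parentheses and appends the return
    // descriptor, as the JVM class file format requires. Parameter names
    // are not part of a descriptor.
    static String methodDescriptor(String returnDescriptor, String... parameterDescriptors) {
        return "(" + String.join("", parameterDescriptors) + ")" + returnDescriptor;
    }

    public static void main(String[] args) {
        // void run()
        System.out.println(methodDescriptor("V"));
        // java/io/Writer append(CharSequence s, int start, int end)
        System.out.println(methodDescriptor(
            "Ljava/io/Writer;", "Ljava/lang/CharSequence;", "I", "I"));
    }
}
```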
Macro field statement
A macro field statement is a high-level variant of the field statement which instructs the assembler to create automatically all constants needed to specify the name and the signature of the field to be declared.
The syntax of the statement is as follows:
macro field statement = {field modifier}, java type, name, ('{', {field member}, '}'|';') ;
field modifier = 'public'|'private'|'protected'|'static'|'final'|'volatile'|'transient'|'synthetic'|'enum' ;
field member = constant value|signature|synthetic|deprecated|annotation|type annotation|unknown attribute ;
The syntax is a modification of the original field statement: the header is replaced by a Java-like field declaration (omitting the keyword field), and the statement may be terminated with ; when there are no field members. The field modifiers and field members are the same as in the original statement, except that the name statement and the descriptor statement are no longer allowed as members, because the name and the descriptor of the field are derived from the statement header.
Examples:
public int size;
private String title { deprecated; }
Macro method statement
A macro method statement is a high-level variant of the method statement which instructs the assembler to create automatically all constants needed to specify the name and the signature of the method to be declared. Additionally the statement generates local variables corresponding to method parameters as well as the special variable this which can then be used as parameters in subsequent instructions just like ordinary variables.
macro method statement = {method modifier}, [method return type], method name, '(', [method parameters], ')', ('{', {method member}, '}'|';') ;
method return type = java type|'void' ;
method parameters = method parameter, {',', method parameter} ;
method parameter = java type, parameter name ;
method modifier = 'public'|'private'|'protected'|'static'|'final'|'synchronized'|'bridge'|'varargs'|'native'|'abstract'|'strict'|'synthetic' ;
method member = exception|signature|synthetic|deprecated|annotation|parameter annotation|type annotation|annotation default|stack map|unknown attribute|variable|instruction|exception handler|line number table|variable table|variable type table|max stack|max locals ;
The syntax is a modification of the original method statement: the header is replaced by a Java-like method declaration (omitting the keyword method), and the statement may be terminated with ; when there are no method members. The method modifiers and method members are the same as in the original statement, except that the name statement and the descriptor statement are no longer allowed as members, because the name and the descriptor of the method are derived from the statement header. Note that additionally to valid java identifiers special words
Examples:
static <clinit>() {
  line numbers {
    line ir0, 129;
    line ir7, 1171;
  }
  maxstack 3;
  //Instructions
  ir0: iconst_0;
  anewarray ObjectStreamField;
  putstatic serialPersistentFields;
  ir7: new String$CaseInsensitiveComparator;
  dup;
  aconst_null;
  invokespecial String$CaseInsensitiveComparator.init0;
  putstatic CASE_INSENSITIVE_ORDER;
  return;
}
public abstract void init(String title);
Macro instructions
Macro instructions tell the assembler to create a particular sequence of instructions, together with any constants needed, to perform an operation. The syntax and the semantics of a macro instruction call are similar to a function or method call in other programming languages.
A macro instruction usually expects some parameters of a particular type, which are specified in the macro instruction call, and may return a value after its execution; the value is pushed onto the JVM stack and may be used by subsequent instructions.
At the moment only built-in macro instructions can be used, as there is no way to extend the assembler with user-defined macro instructions (apart from extending the source code of lilac itself).
Macro instruction statement
A macro instruction statement instructs the assembler to generate a sequence of instructions and the constants as defined by a particular built-in macro instruction. By convention, this sequence of instructions never changes the values on the JVM stack except for pushing the result (a.k.a. return value) of the macro instruction on top of it. Some macro instructions don't have a return value and so don't change the JVM stack at all.
The general syntax of a macro instruction statement is as follows:
macro instruction statement = macro instruction identifier, '(', [macro instruction parameters], ')' ;
macro instruction parameters = macro instruction parameter, {',', macro instruction parameter} ;
macro instruction parameter = [cast expression], (constant name|field name|variable name|literal|macro instruction statement) ;
Some important remarks to the above definition:
- Unlike a simple statement, a macro instruction statement uses parentheses to group parameters together, just like in Java.
- A macro instruction identifier starts (by convention) with a dot.
- The following entities may be used as parameters of a macro instruction statement: literals, local variables, constants (including field and method references), local fields (referred to by their names), and results of other macro instructions.
- Macro instruction statements (or rather their return values) may be used recursively as parameters of other macro instruction statements.
- As macro instructions may (and usually do) expect parameters of a particular type, cast expressions may be used to tell the assembler to generate the necessary type conversion instructions.
Note: field references and local fields as parameters of a macro instruction don't refer to the field's current value but to the field itself. To use a field's value as a macro instruction parameter you have to use the built-in macro instruction .getfield.
Example:
.invokevirtual(concat,this,.invokevirtual(toString,this),(Byte)arg1,(Boolean)arg2,(Char)arg3,(Double)arg4,(Float)arg5,(Int)arg6,(Long)arg7,(Short)arg8);
Macro parameter type conversions
While generating the sequence of instructions for a macro instruction, the assembler tries to perform the necessary type conversions of parameters (boxing and unboxing, numerical conversions, string conversions, etc.) on the fly. Additionally, a type conversion can be stated explicitly via a cast expression (see above). Note, however, that the assembler doesn't check whether the specified conversion is really possible, so the generated instructions may still be rejected later in the verification phase, either by the assembler itself or by the JVM. A further important point is that the assembler doesn't derive the type of the local variables corresponding to the current method parameters from the method signature, because a variable's content can always be changed while the method executes.
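For readers less familiar with the conversion kinds named above, the following Java sketch shows one plain Java equivalent of each. Which instructions lilac actually emits for a given conversion is not specified here; the class and method names are hypothetical and serve only to label the four kinds.

```java
final class ConversionKinds {

    // Boxing: primitive int to java.lang.Integer.
    static Integer box(int value) { return Integer.valueOf(value); }

    // Unboxing: java.lang.Integer to primitive int.
    static int unbox(Integer value) { return value.intValue(); }

    // Numerical conversions: widening int to long, narrowing double to int.
    static long widen(int value) { return (long) value; }
    static int narrow(double value) { return (int) value; }

    // String conversion: any value to its string representation.
    static String stringify(Object value) { return String.valueOf(value); }

    public static void main(String[] args) {
        System.out.println(box(5) + " " + unbox(Integer.valueOf(6)) + " "
            + widen(7) + " " + narrow(3.9) + " " + stringify(true));
    }
}
```

Narrowing conversions such as double to int silently drop information, which is one reason an explicit cast expression may be preferable to relying on automatic conversion.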
Built-in macro instructions
The following sections describe lilac's built-in macro instructions one after another: their purpose, parameters and, where appropriate, return value. As already said above, the return value of a macro instruction is pushed onto the JVM stack.
.getfield
Syntax:
.getfield(instance?, field reference)
Purpose:
Obtains the value from a field.
Parameters:
instance - the object to which the field to obtain belongs. This parameter is omitted if the field is static.
field reference - the field reference or a local field name which specifies the field to obtain.
Returns:
The field's value.
Example:
.getfield(System.out);
.putfield
Syntax:
.putfield(instance?, field reference, value)
Purpose:
Sets the value of a field.
Parameters:
instance - the object to which the field to set belongs. This parameter is omitted if the field is static.
field reference - the field reference or a local field name which specifies the field to set.
value - value to which the field should be set.
Returns:
The field's value.
Example:
.putfield(this, name, "Max");
.invokevirtual
Syntax:
.invokevirtual(method reference, instance?, ...parameters)
Purpose:
Invokes a non-static, non-private method on a class instance.
Parameters:
method reference - the method reference or a local method name which specifies the method to invoke.
instance - the instance on which to invoke the method.
parameters - parameters to pass to the method.
Returns:
The result of the invocation, or nothing if the method is void.
Example:
.invokevirtual(Object.equals, this, object2);
.invokeinterface
Syntax:
.invokeinterface(interface method reference, instance?, ...parameters)
Purpose:
Invokes an interface method on an interface instance.
Parameters:
interface method reference - the interface method reference which specifies the method to invoke.
instance - the instance on which to invoke the method.
parameters - parameters to pass to the method.
Returns:
The result of the invocation, or nothing if the method is void.
Example:
.invokeinterface(Comparable.compareTo, this, object2);
.invokespecial
Syntax:
.invokespecial(method reference, instance?, ...parameters)
Purpose:
Invokes a private non-static method on a class instance.
Parameters:
method reference - the method reference or a local method name which specifies the method to invoke.
instance - the instance on which to invoke the method.
parameters - parameters to pass to the method.
Returns:
The result of the invocation, or nothing if the method is void.
Example:
.invokespecial(initialize, this, 1,2,"Max");
.invokestatic
Syntax:
.invokestatic(method reference, ...parameters)
Purpose:
Invokes a static method.
Parameters:
method reference - the method reference or a local method name which specifies the method to invoke.
parameters - parameters to pass to the method.
Returns:
The result of the invocation, or nothing if the method is void.
Example:
.invokestatic(Integer.parseInt, "12",10);
.new - creating a class instance
Syntax:
.new(class reference, constructor reference,...parameters)
Purpose:
Creates an instance of a class and initializes it using the passed constructor.
Parameters:
class reference - the class to which the instance to create belongs.
constructor reference - the constructor used to initialize the new instance.
parameters - parameters to pass to the constructor.
Returns:
The new initialized instance.
Example:
.new(Integer,Integer.init,5);
.new - creating an array instance
Syntax:
.new(array reference, size)
Purpose:
Creates an instance of an array type with the given size.
Parameters:
array reference - the array type to which the instance to create belongs.
size - the size of the array to create.
Returns:
The new initialized array instance.
Example:
.new(IntegerArray,10);
.concat
Syntax:
.concat(parameters)
Purpose:
Creates a string representation of every parameter and concatenates the resulting strings.
Parameters:
parameters - parameters to concatenate.
Returns:
The resulting string.
Example:
.concat("Name = ",name,", size = ", 5);
.println
Syntax:
.println(parameters)
Purpose:
Creates a string representation of every parameter, concatenates the resulting strings and prints the resulting line to the console.
Parameters:
parameters - parameters to concatenate and print.
Returns:
No result.
Example:
.println("Name = ",name,", size = ", 5);
.sprintln
Syntax:
.sprintln(out, parameters)
Purpose:
Creates a string representation of every parameter, concatenates the resulting strings and prints the resulting line to the stream passed as the first parameter.
Parameters:
out - the stream.
parameters - parameters to concatenate and print.
Returns:
No result.
Example:
.sprintln(.getfield(System.out), "Name = ",name,", size = ", 5);
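The behaviour of the three string macros .concat, .println and .sprintln corresponds roughly to the following Java sketch. It uses StringBuffer because the hand-written low level sequence in the Macros introduction does; whether the instructions generated by lilac use StringBuffer, and whether .println targets System.out, are assumptions. The class and method names are hypothetical.

```java
import java.io.PrintStream;

final class StringMacroSketch {

    // Rough equivalent of .concat: append the string form of every parameter.
    static String concat(Object... parameters) {
        StringBuffer buffer = new StringBuffer();
        for (Object parameter : parameters) {
            buffer.append(parameter);
        }
        return buffer.toString();
    }

    // Rough equivalent of .sprintln: print the concatenated line to a given stream.
    static void sprintln(PrintStream out, Object... parameters) {
        out.println(concat(parameters));
    }

    // Rough equivalent of .println, assuming "the console" means System.out.
    static void println(Object... parameters) {
        sprintln(System.out, parameters);
    }

    public static void main(String[] args) {
        String name = "Max";
        println("Name = ", name, ", size = ", 5); // prints: Name = Max, size = 5
    }
}
```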