Applying Compiler Techniques to Iterate At Blazing Speed

Applying Compiler Techniques to Iterate at Blazing Speed @pascallouis julien@kaching.com

Engineering at kaChing: TDD from day one. Full regression suite runs in less than 3 minutes. Deploy to production 30+ times a day. People have written and launched new features during the interview process.

Agenda Apply Compiler Techniques ? Profit!

Seriously… (Java focused). Software Analysis: anatomy of a compiler, creating meta tests. Leveraging Types: levels of interpretation, descriptors and signatures. DRY your code (fewer bugs, greater reach for experts, higher testability).

Software Analysis Running a series of analyses on the code base. Catch common mistakes due to distracted developers, new hires or bad APIs.

Anatomy of a Compiler: Source Code → Lexical Analysis → Tokens → Syntactic Analysis → Abstract Syntax Tree → Semantic Analysis → Annotated Abstract Syntax Tree → Intermediate Representation Generation → Intermediate Representation → Optimization → Optimized Intermediate Representation → Machine Code Generation → Target Code

int x 1; int y x + 2; Lexical Analysis IDENT(int) IDENT(x) ASGN NUMBER(1) SEMICOLON IDENT(int) IDENT(y) ASGN IDENT(x) PLUS NUMBER(2) SEMICOLON

IDENT(int) IDENT(x) ASGN NUMBER(1) SEMICOLON IDENT(int) IDENT(y) ASGN IDENT(x) PLUS NUMBER(2) SEMICOLON Syntactic Analysis PROGRAM( LET(x, int, 1), LET(y, int, PLUS(x, 2)))

PROGRAM( LET(x, int, 1), LET(y, int, PLUS(x, 2))) Semantic Analysis: Symbols PROGRAM( LET(x, int, 1), LET(y, int, PLUS(x, 2)))

PROGRAM( LET(x, int, 1), LET(y, int, PLUS(x, 2))) Semantic Analysis: Types PROGRAM( LET(x: int, int, 1: int), LET(y: int, int, PLUS(x: int, 2: int): int))

Optimizations Simple optimizations can be done on the Abstract Syntax Tree Other optimizations require specialized representations Static Single Assignment form Control Flow Graph

PROGRAM( LET(x: int, int, 1: int), LET(y: int, int, PLUS(1: int, 2: int): int)) Constant folding PROGRAM( LET(x: int, int, 1: int), LET(y: int, int, 3: int))
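Constant folding on the syntax tree can be sketched in a few lines. The node classes `Num`, `Var` and `Plus` below are hypothetical stand-ins for the slide's tree notation, not the representation of any real compiler; the fold only collapses an addition whose two operands are already literals.

```java
final class ConstantFolding {
    interface Expr {}

    static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
    }

    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
    }

    static final class Plus implements Expr {
        final Expr left;
        final Expr right;
        Plus(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    /** Folds bottom-up: PLUS(Num, Num) becomes a single Num; anything else is rebuilt as is. */
    static Expr fold(Expr e) {
        if (!(e instanceof Plus)) {
            return e;
        }
        Plus plus = (Plus) e;
        Expr left = fold(plus.left);
        Expr right = fold(plus.right);
        if (left instanceof Num && right instanceof Num) {
            return new Num(((Num) left).value + ((Num) right).value);
        }
        return new Plus(left, right);
    }

    public static void main(String[] args) {
        Expr folded = fold(new Plus(new Num(1), new Plus(new Num(2), new Num(4))));
        System.out.println(((Num) folded).value);
    }
}
```

An addition that still contains a variable is left alone; replacing the variable by its known value first is a separate step (constant propagation).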

int x1 (a1 + b1) / c1; int y1(a1+ b1) / d1; Common sub-expression elimination int temp  a + b; int x  temp / d; inty  temp / d;

int x1 (a1 + b1) / c1; a2a1 + 1; int y1(a2+ b1) / d1; Common sub-expression elimination intx (a + b) / c; a  a + 1; inty (a + b) / d;

Anatomy of a Compiler: Lexical Analysis → Syntactic Analysis → Semantic Analysis → Intermediate Representation Generation → Optimization → Machine Code Generation

PMD: Source Code → Annotated Abstract Syntax Tree

joeq: Source Code → Intermediate Representation

scalac: Lexical Analysis → Syntactic Analysis → Semantic Analysis → Intermediate Representation Generation → Optimization → Machine Code Generation

Finding bad code snippets: Describe bad code snippets using regular expressions. The analysis is done on the source code, before lexical analysis, because information such as whitespace is lost afterwards. Extremely easy to implement.

@CodeSnippets({ @Check(paths = {"src", "srctest"}, snippets = { @Snippet("bif("), @Snippet("super()") }), @Check(paths = {"srctest"}, snippets = { @Snippet("@Ignores ") }) }) @RunWith(BadCodeSnippetsRunner.class) public class BadCodeSnippetsTest { }

for(Snippet s : snippets) { if(patterns.get(s).matcher(line).find()) { uses.get(s).add(file); } }
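The loop above can be fleshed out into a self-contained sketch. Everything here is an assumption made for illustration: the class name `BadSnippetScan`, the in-memory map of file names to lines (the real runner presumably walks the configured paths), and the two example patterns. It is not the `BadCodeSnippetsRunner` from the slides.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class BadSnippetScan {
    /** For each snippet (a regular expression), lists the files with at least one matching line. */
    static Map<String, List<String>> scan(Map<String, List<String>> linesByFile, List<String> snippets) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        Map<String, List<String>> uses = new LinkedHashMap<>();
        for (String s : snippets) {
            patterns.put(s, Pattern.compile(s)); // compile once, reuse for every line
            uses.put(s, new ArrayList<>());
        }
        for (Map.Entry<String, List<String>> file : linesByFile.entrySet()) {
            for (String line : file.getValue()) {
                for (String s : snippets) {
                    boolean alreadyRecorded = uses.get(s).contains(file.getKey());
                    if (!alreadyRecorded && patterns.get(s).matcher(line).find()) {
                        uses.get(s).add(file.getKey());
                    }
                }
            }
        }
        return uses;
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("A.java", List.of("if(x) {", "super();"));
        files.put("B.java", List.of("if (x) {"));
        // Illustrative patterns: "if" glued to its parenthesis, and an explicit super() call.
        System.out.println(scan(files, List.of("\\bif\\(", "super\\(\\)")));
    }
}
```

A test would then assert that every list in the returned map is empty and report the offending files otherwise.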

Forbidden Calls
scala> BigDecimal.valueOf(1).equals(BigDecimal.valueOf(1.0))
res1: Boolean = false
scala> BigDecimal.valueOf(1).compareTo(BigDecimal.valueOf(1.0)) == 0
res2: Boolean = true
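The same pitfall in plain Java: `equals` compares value and scale, while `compareTo` compares numeric value only. The helper name `sameValue` is an assumption for this sketch, shown as one possible replacement once `BigDecimal#equals` is forbidden.

```java
import java.math.BigDecimal;

final class BigDecimalEquality {
    /** Compares numeric value only, ignoring scale. */
    static boolean sameValue(BigDecimal a, BigDecimal b) {
        return a.compareTo(b) == 0;
    }

    public static void main(String[] args) {
        BigDecimal one = BigDecimal.valueOf(1);            // unscaled value 1, scale 0
        BigDecimal onePointZero = BigDecimal.valueOf(1.0); // unscaled value 10, scale 1
        System.out.println(one.equals(onePointZero));      // false: the scales differ
        System.out.println(sameValue(one, onePointZero));  // true: same numeric value
    }
}
```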

@ForbiddenCalls( { @Check(paths = {"bin"}, forbiddenMethods = { "java.math.BigDecimal#equals(java.lang.Object)", }) }) @RunWith(ForbiddenCallsTestRunner.class) public class ForbiddenCallsTest { }

Finding Forbidden Calls Must be done after the typed AST is created. a.equals(b)

void doStuff(BigDecimal a, BigDecimal b) { boolean c = a.equals(b); }
Code:
0: aload_1
1: aload_2
2: invokevirtual #2;
5: istore_3
6: return

void doStuff(BigDecimal a, BigDecimal b) { boolean c = a.equals(b); }
Code:
0: aload_1
1: aload_2
2: invokevirtual #2;
5: istore_3
6: return
Bytes: 2: 0xb6 3: 0x00 4: 0x02

const #2 = Method #14.#15;
const #14 = class #18;
const #15 = NameAndType #19:#20;
const #18 = Asciz java/math/BigDecimal;
const #19 = Asciz equals;
const #20 = Asciz (Ljava/lang/Object;)Z;

void doStuff(BigDecimal a, BigDecimal b) { boolean c = a.equals(b); }
Code:
0: aload_1
1: aload_2
2: invokevirtual java/math/BigDecimal.equals(Ljava/lang/Object;)Z
5: istore_3
6: return
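How the operand bytes `0x00 0x02` lead to the readable method reference can be sketched with a toy model of the constant pool. The map below hard-codes only the six entries shown on the slides and is an assumption for illustration; a real constant pool is a tagged binary table read from the class file.

```java
import java.util.HashMap;
import java.util.Map;

final class ConstantPoolDemo {
    // Toy pool: an entry is either a String (Asciz) or an int[] of indices into the pool.
    static final Map<Integer, Object> POOL = new HashMap<>();
    static {
        POOL.put(2, new int[] {14, 15});  // Method #14.#15
        POOL.put(14, new int[] {18});     // class #18
        POOL.put(15, new int[] {19, 20}); // NameAndType #19:#20
        POOL.put(18, "java/math/BigDecimal");
        POOL.put(19, "equals");
        POOL.put(20, "(Ljava/lang/Object;)Z");
    }

    /** The two operand bytes of invokevirtual form a big-endian, unsigned 16-bit pool index. */
    static int operandIndex(int highByte, int lowByte) {
        return ((highByte & 0xFF) << 8) | (lowByte & 0xFF);
    }

    /** Follows Method -> (class, NameAndType) -> strings, as javap does when it prints the call. */
    static String resolveMethod(int methodIndex) {
        int[] method = (int[]) POOL.get(methodIndex);
        int[] owner = (int[]) POOL.get(method[0]);
        int[] nameAndType = (int[]) POOL.get(method[1]);
        return POOL.get(owner[0]) + "." + POOL.get(nameAndType[0]) + POOL.get(nameAndType[1]);
    }

    public static void main(String[] args) {
        System.out.println(resolveMethod(operandIndex(0x00, 0x02)));
    }
}
```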

ClassFile { u4 magic; u2 minor_version; u2 major_version; u2 constant_pool_count; cp_info constant_pool[constant_pool_count-1]; u2 access_flags; u2 this_class; u2 super_class; u2 interfaces_count; u2 interfaces[interfaces_count]; u2 fields_count; field_info fields[fields_count]; u2 methods_count; method_info methods[methods_count]; u2 attributes_count; attribute_info attributes[attributes_count]; }
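Reading the first four fields of this structure needs nothing beyond the standard library. The sketch below is an assumption-level illustration (the class `ClassFileHeader` is made up for this purpose); it stops at `constant_pool_count`, because the entries that follow are variable-length and need a tag-driven parser.

```java
import java.nio.ByteBuffer;

final class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    /** Parses magic, minor_version, major_version and constant_pool_count from raw class bytes. */
    static ClassFileHeader parse(byte[] classBytes) {
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer buffer = ByteBuffer.wrap(classBytes);
        int magic = buffer.getInt();                 // u4 magic
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a class file");
        }
        int minor = buffer.getShort() & 0xFFFF;      // u2 minor_version
        int major = buffer.getShort() & 0xFFFF;      // u2 major_version
        int poolCount = buffer.getShort() & 0xFFFF;  // u2 constant_pool_count
        return new ClassFileHeader(minor, major, poolCount);
    }
}
```

The bytes can come from any compiled class, for example by reading a `.class` file from disk. Libraries such as ASM do this parsing and expose the result through visitors.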

ClassFile { u4 magic; u2 minor_version; u2 major_version; u2 constant_pool_count; cp_info constant_pool[constant_pool_count-1]; u2 access_flags; u2 this_class; u2 super_class; u2 interfaces_count; u2 interfaces[interfaces_count]; u2 fields_count; field_info fields[fields_count]; u2 methods_count; method_info methods[methods_count]; u2 attributes_count; attribute_info attributes[attributes_count]; } const #18 = Asciz java/math/BigDecimal; const #19 = Asciz equals; const #20 = Asciz (Ljava/lang/Object;)Z;

ClassFile { u4 magic; u2 minor_version; u2 major_version; u2 constant_pool_count; cp_info constant_pool[constant_pool_count-1]; u2 access_flags; u2 this_class; u2 super_class; u2 interfaces_count; u2 interfaces[interfaces_count]; u2 fields_count; field_info fields[fields_count]; u2 methods_count; method_info methods[methods_count]; u2 attributes_count; attribute_info attributes[attributes_count]; } method_info { u2 access_flags; u2 name_index; u2 descriptor_index; u2 attributes_count; attribute_info attributes[attributes_count]; }

method_info { u2 access_flags; u2 name_index; u2 descriptor_index; u2 attributes_count; attribute_info attributes[attributes_count]; } Code_attribute { u2 attribute_name_index; u4 attribute_length; u2 max_stack; u2 max_locals; u4 code_length; u1 code[code_length]; u2 exception_table_length; { u2 start_pc; u2 end_pc; u2 handler_pc; u2 catch_type; } exception_table[exception_table_length]; u2 attributes_count; attribute_info attributes[attributes_count]; }

Code_attribute { u2 attribute_name_index; u4 attribute_length; u2 max_stack; u2 max_locals; u4 code_length; u1 code[code_length]; u2 exception_table_length; { u2 start_pc; u2 end_pc; u2 handler_pc; u2 catch_type; } exception_table[exception_table_length]; u2 attributes_count; attribute_info attributes[attributes_count]; } 0: aload_1 1: aload_2 2: invokevirtual #2; 5: istore_3 6: return

ASM @Override public void visitMethodInsn(int opcode, String owner, String name, String descriptor) { … }

ASM @Override public void visitMethodInsn(int opcode /* 0xb6 */, String owner /* "java.math.BigDecimal" */, String name /* "equals" */, String descriptor /* "(Ljava/lang/Object;)Z" */) { … }

Failed Assertion junit.framework.AssertionFailedError: com.kaching.trading.core.Trade#execute() calls java.math.BigDecimal#equals(java.lang.Object) on line 273

Visibility Test class Lists { … @VisibleForTesting static int computeArrayListCapacity(int size) { return (int) Math.min(5L + size + (size / 10), Integer.MAX_VALUE); } }

Visibility Test class QuoteHttpClient implements QuoteClient { @Inject HttpClient client; Quote getQuote(Symbol<?> symbol) { return …; } }

@Visibilities({ @Check(paths = {"bin"}, visibilities = { @Visibility(value = VisibleForTesting.class, intent = PRIVATE), @Visibility(value = Inject.class, intent = PRIVATE) }) }) @RunWith(VisibilityTestRunner.class) public class VisibilityTest { }

Two Passes: (1) Find all classes, fields and methods annotated with the specified annotations. (2) Find all instructions referring to these classes, fields and methods.

ASM @Override public AnnotationVisitor visitAnnotation(String descriptor, boolean visible) { … }

ASM @Override public AnnotationVisitor visitAnnotation(String descriptor /* "Lcom/google/common/annotations/VisibleForTesting;" */, boolean visible /* false */) { … }

boolean isVisibleBy(ParsedElement location, ParsedClass currentClass) { Annotation annotation = annotations.get(location); if (annotation != null) { … } else { return true; /* let's trust the compiler :) */ } }

Failed Assertion junit.framework.AssertionFailedError: com.kaching.account.ApplicationUploader#upload() refers to @VisibleForTesting method com.kaching.account.Customer#getState() on line 149

Java 5+ Type System Primitives Objects Generics Wildcards Intersection types

Erasure Object eraser() { return new ArrayList<String>(); } Object obj = eraser(); // impossible to recover that obj is a list of strings
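Erasure can be observed directly: at run time, lists with different element types share one class object, so the element type cannot be recovered from the instance. The class name `ErasureDemo` is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class ErasureDemo {
    static Object eraser() {
        return new ArrayList<String>();
    }

    /** Both lists are plain ArrayList instances at run time; the type arguments are gone. */
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> integers = new ArrayList<>();
        return strings.getClass() == integers.getClass();
    }

    public static void main(String[] args) {
        Object obj = eraser();
        System.out.println(obj.getClass());     // only the raw class is known
        System.out.println(sameRuntimeClass());
    }
}
```

Declarations are a different matter: the generic types of fields, methods and supertypes are recorded in the class file.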

Compiled classes
$ cat MustBeSerializable.java
import java.io.Serializable;
interface MustBeSerializable<T extends Serializable> {}
$ cat ExtendsMustBeSerializable.java
class Value {}
class ExtendsMustBeSerializable implements MustBeSerializable<Value> {}

Compiled classes
$ javac MustBeSerializable.java
$ rm MustBeSerializable.java
$ ls
MustBeSerializable.class ExtendsMustBeSerializable.java
$ javac ExtendsMustBeSerializable.java -cp .
ExtendsMustBeSerializable.java:2: type parameter Value is not within its bound
class ExtendsMustBeSerializable implements MustBeSerializable<Value> {}
                                                              ^
1 error

Compiled classes: The compiler must write type information into the class file to preserve semantics. When compiling other classes, it needs to read that type information back and check against those contracts.

Taking a peek at classes
$ javap -v MustBeSerializable | grep -A 1 'Signature;'
const #3 = Asciz Signature;
const #4 = Asciz <T::Ljava/io/Serializable;>Ljava/lang/Object;;

Signatures. Primitives: B for byte, C for char, D for double, … Objects: Lclassname; such as Ljava/lang/String; Arrays: [B for byte[], [[D for double[][]. Void: V. … 8 pages of documentation.
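The JDK can produce these descriptor strings itself, which is handy for checking one's reading of the rules. The sketch uses `java.lang.invoke.MethodType`; the wrapper class `Descriptors` is an assumption. It covers erased descriptors only, not generic signatures such as the `<T::Ljava/io/Serializable;>` form.

```java
import java.lang.invoke.MethodType;

final class Descriptors {
    /** Builds the JVM method descriptor for the given return and parameter types. */
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // boolean equals(Object)
        System.out.println(descriptor(boolean.class, Object.class));
        // void f(byte[], double[][])
        System.out.println(descriptor(void.class, byte[].class, double[][].class));
    }
}
```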

With ASM org.objectweb.asm.signature.* org.objectweb.asm.Type

With Reflection java.lang.reflect.Type: Class, GenericArrayType(Type component), ParameterizedType(Type raw, Type[] arguments), TypeVariable(Type[] bounds, String name), WildcardType(Type[] lowerBounds, Type[] upperBounds)

Some Examples: String.class is a Class<String>. List<Integer> is ParameterizedType(List.class, Integer.class). List<int[]> is ParameterizedType(List.class, GenericArrayType(int.class)).

Some Examples: Map<? extends Shape, ? super Area> is ParameterizedType(Map.class, { WildcardType({}, Shape.class), WildcardType(Area.class, {}) })
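The wildcard example can be checked with reflection on a declared field. In this sketch, `Number` and `Integer` stand in for the slide's `Shape` and `Area`, and the class and field names are assumptions. One detail differs from the slide's shorthand: for `? super X`, reflection reports `Object` as the implicit upper bound rather than an empty array.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.WildcardType;
import java.util.Map;

final class WildcardInspection {
    // Hypothetical field whose declared generic type is inspected.
    static Map<? extends Number, ? super Integer> sample;

    /** Returns the index-th type argument of the field's declared type as a WildcardType. */
    static WildcardType argument(int index) {
        try {
            Type declared = WildcardInspection.class.getDeclaredField("sample").getGenericType();
            ParameterizedType map = (ParameterizedType) declared; // raw type is Map.class
            return (WildcardType) map.getActualTypeArguments()[index];
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(argument(0).getUpperBounds()[0]); // bound of "? extends Number"
        System.out.println(argument(1).getLowerBounds()[0]); // bound of "? super Integer"
    }
}
```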

With Reflection java.lang.Class getGenericSuperclass() getGenericInterfaces()

Concrete Examples Unification Just-In-Time Providers

Unification MyClass implements Callable<String> { … } ((ParameterizedType) MyClass.class.getGenericInterfaces()[0]).getActualTypeArguments()[0] is String.class!
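A runnable version of the direct case, where the class implements the parameterized interface itself. The wrapper class `DirectTypeArgument` and the helper name are assumptions; the cast to `ParameterizedType` is needed because `getGenericInterfaces()` returns plain `Type` values.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.concurrent.Callable;

final class DirectTypeArgument {
    static final class MyClass implements Callable<String> {
        @Override
        public String call() {
            return "hello";
        }
    }

    /** Reads the first type argument of the first directly implemented generic interface. */
    static Type firstInterfaceArgument(Class<?> type) {
        ParameterizedType first = (ParameterizedType) type.getGenericInterfaces()[0];
        return first.getActualTypeArguments()[0];
    }

    public static void main(String[] args) {
        System.out.println(firstInterfaceArgument(MyClass.class));
    }
}
```

This only works when the interface is implemented directly and with a concrete argument; an intermediate generic superclass requires following type variables through the hierarchy.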

Unification But if we have MyClass extends AbstractCallable<String> { … } AbstractCallable<T> implements Callable<T> { … } Unification.getActualTypeArgument(MyClass.class, Callable.class, 0);
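One way such a lookup could work is to walk the supertypes while carrying a map from type variables to the types bound to them. This is a sketch under stated assumptions, not kaChing's `Unification` implementation: it reuses the method name from the slide for readability, and it substitutes only arguments that are themselves a type variable, so nested cases such as `Map<K, V>` come back with their variables unresolved.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

final class Unification {
    abstract static class AbstractCallable<T> implements Callable<T> {}

    static final class MyClass extends AbstractCallable<String> {
        @Override
        public String call() {
            return "hello";
        }
    }

    /** Resolves the index-th type argument of target as seen from start, or null if unknown. */
    static Type getActualTypeArgument(Class<?> start, Class<?> target, int index) {
        return resolve(start, target, index, new HashMap<TypeVariable<?>, Type>());
    }

    private static Type resolve(Type current, Class<?> target, int index,
                                Map<TypeVariable<?>, Type> bindings) {
        Class<?> raw;
        if (current instanceof ParameterizedType) {
            ParameterizedType parameterized = (ParameterizedType) current;
            raw = (Class<?>) parameterized.getRawType();
            TypeVariable<?>[] parameters = raw.getTypeParameters();
            Type[] arguments = parameterized.getActualTypeArguments();
            Map<TypeVariable<?>, Type> extended = new HashMap<>(bindings);
            for (int i = 0; i < parameters.length; i++) {
                Type argument = arguments[i];
                // An argument that is a variable of the subtype is replaced by its binding.
                if (argument instanceof TypeVariable && bindings.containsKey(argument)) {
                    argument = bindings.get(argument);
                }
                extended.put(parameters[i], argument);
            }
            bindings = extended;
        } else if (current instanceof Class) {
            raw = (Class<?>) current;
        } else {
            return null;
        }
        if (raw == target) {
            return bindings.get(target.getTypeParameters()[index]);
        }
        Type superclass = raw.getGenericSuperclass(); // null for interfaces and Object
        if (superclass != null) {
            Type found = resolve(superclass, target, index, bindings);
            if (found != null) {
                return found;
            }
        }
        for (Type implemented : raw.getGenericInterfaces()) {
            Type found = resolve(implemented, target, index, bindings);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(getActualTypeArgument(MyClass.class, Callable.class, 0));
    }
}
```

Here `T` of `AbstractCallable` is bound to `String` when the superclass is visited, and that binding is carried over to the type parameter of `Callable` one level up.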

Unification – Want to Try? class MergeOfIntegerAndString extends Merge<Integer, String> {} class Merge<K, V> implements OneTypeParam<Map<K, V>> {} interface OneTypeParam<T> {}

Guice class QuoteHttpClient implements QuoteClient { @Inject HttpClient client; Quote getQuote(Symbol<?> symbol) { return …; } }

Providers bind(Repository.class) .toProvider(new Provider<Repository>() { Repository get() { return new RepositoryImpl(…); } });

Tedious Providers bind(new TypeLiteral<Marshaller<User>>() {}) .toProvider(new Provider<…>() { Marshaller<User> get() { return TwoLattes .createMarshaller(User.class); } });

It gets tedious… @Inject Marshaller<User> @Inject Marshaller<Portfolio> @Inject Marshaller<Watchlist> …

TwoLattes.createMarshaller(Foo.class)

Just-In-Time Providers: Pattern matching on types. Marshaller<?> is a pattern for Marshaller<User>, Marshaller<Portfolio>, … Patterns can be arbitrarily complex, including wildcards, intersection types, etc. http://github.com/pascallouisperez/guice-jit-providers

Tools
PMD http://pmd.sourceforge.net/
Javassist http://www.csg.is.titech.ac.jp/~chiba/javassist/
FindBugs http://findbugs.sourceforge.net/
Joeq http://suif.stanford.edu/~courses/cs243/joeq/index.html
ASM http://asm.ow2.org/
Guice http://code.google.com/p/google-guice/

References
JVM spec http://www.amazon.com/Java-Virtual-Machine-Specification-2nd/dp/0201432943
Class File spec http://java.sun.com/docs/books/jvms/second_edition/ClassFileFormat-Java5.pdf
Super Type Tokens http://gafter.blogspot.com/2006/12/super-type-tokens.html
Unifying Type Parameters in Java http://eng.kaching.com/2009/10/unifying-type-parameters-in-java.html
Type Safe Bit Fields Using Higher Kinded Types http://eng.kaching.com/2010/08/type-safe-bit-fields-using-higher.html
