Applying Compiler Techniques to Iterate At Blazing Speed

In this session, we will present real-life applications of compiler techniques that help kaChing achieve ultra confidence and power its incredible 5-minute commit-to-production cycle [1]. We'll talk about idempotency analysis [2], dependency detection, on-the-fly optimisations, automatic memoization [3], type unification [4] and more! This talk is not suitable for the faint-hearted... If you want to dive deep, learn about advanced JVM topics, devour bytecode and see first-hand applications of theoretical computer science, join us.

[1] http://eng.kaching.com/2010/05/deployment-infrastructure-for.html

[2] http://en.wikipedia.org/wiki/Idempotence

[3] http://en.wikipedia.org/wiki/Memoization

[4] http://eng.kaching.com/2009/10/unifying-type-parameters-in-java.html

Published in: Technology

  • A token is a string of characters, categorized according to the rules as a symbol.
  • Abstract interpretation
  • In compiler design, static single assignment form (often abbreviated as SSA form or simply SSA) is an intermediate representation (IR) in which every variable is assigned exactly once. Existing variables in the original IR are split into versions, new variables typically indicated by the original name with a subscript, so that every definition gets its own version.
  • In functional language compilers, such as those for Scheme, ML and Haskell, continuation-passing style (CPS) is generally used where one might expect to find SSA in a compiler for Fortran or C. SSA is formally equivalent to a well-behaved subset of CPS
  • Internal name of the method’s owner class, method’s name and method’s descriptor.
  • Pretty tricky

    1. 1. Applying Compiler Techniques to Iterate at Blazing Speed<br />@pascallouis<br />julien@kaching.com<br />
    2. 2. Engineering at kaChing<br />TDD from day one<br />Full regression suite runs in less than 3 minutes<br />Deploy to production 30+ times a day<br />People have written and launched new features during interview process<br />
    3. 3. Agenda<br />Apply Compiler Techniques<br />?<br />Profit!<br />
    4. 4. Seriously…<br />(Java focused)<br />Software Analysis<br />Anatomy of a compiler<br />Creating meta tests <br />Leveraging Types<br />Levels of interpretation<br />Descriptors and signatures<br />DRY your code (less bugs, greater reach for experts, higher testability)<br />
    5. 5. Software Analysis<br />Running a series of analyses on the code base.<br />Catch common mistakes due to distracted developers, new hires or bad APIs. <br />
    6. 6. Anatomy of a Compiler<br />Source Code<br />Lexical Analysis<br />Tokens<br />Syntactic Analysis<br />Abstract Syntax Tree<br />Semantic Analysis<br />Annotated Abstract Syntax Tree<br />Intermediate Representation Generation<br />Intermediate Representation<br />Optimization<br />Optimized Intermediate Representation<br />Machine Code Generation<br />Target Code<br />
    7. 7. int x 1; <br />int y x + 2;<br />Lexical Analysis<br />IDENT(int) IDENT(x) ASGN NUMBER(1) SEMICOLON<br />IDENT(int) IDENT(y) ASGN IDENT(x) PLUS NUMBER(2) SEMICOLON<br />
    8. 8. IDENT(int) IDENT(x) ASGN NUMBER(1) SEMICOLON<br />IDENT(int) IDENT(y) ASGN IDENT(x) PLUS NUMBER(2) SEMICOLON<br />Syntactic Analysis<br />PROGRAM(<br /> LET(x, int, 1),<br /> LET(y, int, PLUS(x, 2)))<br />
    9. 9. PROGRAM(<br /> LET(x, int, 1),<br /> LET(y, int, PLUS(x, 2)))<br />Semantic Analysis: Symbols<br />PROGRAM(<br /> LET(x, int, 1),<br /> LET(y, int, PLUS(x, 2)))<br />
    10. 10. PROGRAM(<br /> LET(x, int, 1),<br /> LET(y, int, PLUS(x, 2)))<br />Semantic Analysis: Types<br />PROGRAM(<br /> LET(x: int, int, 1: int),<br /> LET(y: int, int, PLUS(x: int, 2: int): int))<br />
    11. 11. Optimizations<br />Simple optimizations can be done on the Abstract Syntax Tree<br />Other optimizations require specialized representations<br />Static Single Assignment form<br />Control Flow Graph<br />
    12. 12. PROGRAM(<br /> LET(x: int, int, 1: int),<br /> LET(y: int, int, PLUS(3: int, 2: int): int))<br />Constant folding<br />PROGRAM(<br /> LET(x: int, int, 1: int),<br /> LET(y: int, int, 5: int))<br />
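Constant folding on a tree like the one above can be sketched as a bottom-up rewrite. The node classes below (`Num`, `Var`, `Plus`) are hypothetical stand-ins for the slide's AST, and type annotations are left out to keep the sketch short.

```java
// Hypothetical mini-AST, named for illustration only.
final class ConstantFoldingSketch {
  interface Expr {}

  static final class Num implements Expr {
    final int value;
    Num(int value) { this.value = value; }
  }

  static final class Var implements Expr {
    final String name;
    Var(String name) { this.name = name; }
  }

  static final class Plus implements Expr {
    final Expr left;
    final Expr right;
    Plus(Expr left, Expr right) { this.left = left; this.right = right; }
  }

  // Fold children first, then replace PLUS(constant, constant) with its sum.
  static Expr fold(Expr e) {
    if (e instanceof Plus) {
      Plus p = (Plus) e;
      Expr l = fold(p.left);
      Expr r = fold(p.right);
      if (l instanceof Num && r instanceof Num) {
        return new Num(((Num) l).value + ((Num) r).value);
      }
      return new Plus(l, r);
    }
    return e; // constants and variables are already in simplest form
  }

  // Renders a tree in the slide's notation.
  static String show(Expr e) {
    if (e instanceof Num) return Integer.toString(((Num) e).value);
    if (e instanceof Var) return ((Var) e).name;
    Plus p = (Plus) e;
    return "PLUS(" + show(p.left) + ", " + show(p.right) + ")";
  }
}
```

Folding `PLUS(3, 2)` gives `5`, as on the slide; a sum that still involves a variable is only folded in its constant sub-trees.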
    13. 13. int x1 = (a1 + b1) / c1;<br />int y1 = (a1 + b1) / d1;<br />Common sub-expression elimination<br />int temp = a + b;<br />int x = temp / c;<br />int y = temp / d;<br />
    14. 14. int x1 = (a1 + b1) / c1;<br />a2 = a1 + 1;<br />int y1 = (a2 + b1) / d1;<br />Common sub-expression elimination<br />int x = (a + b) / c;<br />a = a + 1;<br />int y = (a + b) / d;<br />
    15. 15. Anatomy of a Compiler<br />Semantic Analysis<br />Intermediate Representation Generation<br />Syntactic Analysis<br />Optimization<br />Lexical Analysis<br />Machine Code Generation<br />
    16. 16. javac<br />Target Code<br />Source Code<br />
    17. 17. PMD<br />Annotated Abstract Syntax Tree<br />Source Code<br />
    18. 18. joeq<br />Intermediate Representation<br />Source Code<br />
    19. 19. scalac<br />Semantic Analysis<br />Intermediate Representation Generation<br />Syntactic Analysis<br />Optimization<br />Lexical Analysis<br />Machine Code Generation<br />
    20. 20. Finding bad code snippets<br />Describe bad code snippets using regular expressions.<br />Analysis is done on the raw source code, before lexical analysis, because information such as whitespace is lost once the code is tokenized.<br />Extremely easy to implement.<br />
    21. 21. @CodeSnippets({<br />@Check(paths = {"src", "srctest"}, snippets = {<br />@Snippet("if"),<br />@Snippet("super")<br /> }),<br />@Check(paths = {"srctest"}, snippets = {<br />@Snippet("@Ignore ")<br /> })<br />})<br />@RunWith(BadCodeSnippetsRunner.class)<br />public class BadCodeSnippetsTest {<br />}<br />
    22. 22. for(Snippet s : snippets) {<br />if(patterns.get(s).matcher(line).find()) {<br />uses.get(s).add(file);<br /> }<br />}<br />
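The loop above can be fleshed out into a self-contained sketch. The class `SnippetScanSketch` and its `scan` method are assumptions made for illustration; the talk's actual `BadCodeSnippetsRunner` is not shown in the slides and may work differently (for example, it records files rather than line numbers).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical scanner, named for illustration only.
final class SnippetScanSketch {
  // Returns, for each snippet regex that matched, the 1-based line numbers where it was found.
  static Map<String, List<Integer>> scan(List<String> lines, List<String> snippets) {
    Map<String, Pattern> patterns = new LinkedHashMap<>();
    for (String snippet : snippets) {
      patterns.put(snippet, Pattern.compile(snippet));
    }
    Map<String, List<Integer>> uses = new LinkedHashMap<>();
    for (int i = 0; i < lines.size(); i++) {
      for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
        // find() looks for the pattern anywhere in the raw line, whitespace included.
        if (entry.getValue().matcher(lines.get(i)).find()) {
          uses.computeIfAbsent(entry.getKey(), k -> new ArrayList<>()).add(i + 1);
        }
      }
    }
    return uses;
  }
}
```

Because the scan runs on raw text, a pattern such as `"@Ignore "` can depend on the trailing space, which a token-based analysis would no longer see.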
    23. 23. Forbidden Calls<br />scala> BigDecimal.valueOf(1).equals(BigDecimal.valueOf(1.0))<br />res1: Boolean = false<br />scala> BigDecimal.valueOf(1).compareTo(BigDecimal.valueOf(1.0)) == 0<br />res2: Boolean = true<br />
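The same behaviour can be reproduced in plain Java. `BigDecimal.equals` compares both value and scale, while `compareTo` compares numeric value only; the helper names below are illustrative.

```java
import java.math.BigDecimal;

// Hypothetical helper, named for illustration only.
final class BigDecimalEqualitySketch {
  // equals() is true only when unscaled value AND scale match ("1" vs "1.0" differ).
  static boolean sameRepresentation(BigDecimal a, BigDecimal b) {
    return a.equals(b);
  }

  // compareTo() ignores scale and compares the numeric value.
  static boolean sameValue(BigDecimal a, BigDecimal b) {
    return a.compareTo(b) == 0;
  }
}
```

This is why a code base that handles money may want to forbid `BigDecimal#equals` outright and route every comparison through `compareTo`.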
    24. 24. @ForbiddenCalls( {<br />@Check(paths = {"bin"}, forbiddenMethods = {<br />"java.math.BigDecimal#equals(java.lang.Object)",<br /> })<br />})<br />@RunWith(ForbiddenCallsTestRunner.class)<br />public class ForbiddenCallsTest {<br />}<br />
    25. 25. Finding Forbidden Calls<br />Must be done after the typed AST is created. <br />a.equals(b)<br />
    26. 26. void doStuff(BigDecimal a, BigDecimal b) {<br />boolean c = a.equals(b); <br />}<br />Code:<br /> 0: aload_1<br /> 1: aload_2<br /> 2: invokevirtual #2;<br /> 5: istore_3<br /> 6: return<br />
    27. 27. void doStuff(BigDecimal a, BigDecimal b) {<br />boolean c = a.equals(b); <br />}<br />Code:<br /> 0: aload_1<br /> 1: aload_2<br /> 2: invokevirtual #2;<br /> 5: istore_3<br /> 6: return<br />2: 0xb6<br />3: 0x00<br />4: 0x02<br />
    28. 28. const #2 = Method #14.#15;<br />const #14 = class #18;<br />const #15 = NameAndType #19:#20;<br />const #18 = Asciz java/math/BigDecimal;<br />const #19 = Asciz equals;<br />const #20 = Asciz (Ljava/lang/Object;)Z;<br />
    29. 29. void doStuff(BigDecimal a, BigDecimal b) {<br />boolean c = a.equals(b); <br />}<br />Code:<br /> 0: aload_1<br /> 1: aload_2<br /> 2: invokevirtual java/math/BigDecimal.equals(Ljava/lang/Object;)Z<br /> 5: istore_3<br /> 6: return<br />
    30. 30. ClassFile {<br /> u4 magic;<br /> u2 minor_version;<br /> u2 major_version;<br /> u2 constant_pool_count;<br /> cp_info constant_pool[constant_pool_count-1];<br /> u2 access_flags;<br /> u2 this_class;<br /> u2 super_class;<br /> u2 interfaces_count;<br /> u2 interfaces[interfaces_count];<br /> u2 fields_count;<br /> field_info fields[fields_count];<br /> u2 methods_count;<br /> method_info methods[methods_count];<br /> u2 attributes_count;<br /> attribute_info attributes[attributes_count];<br />}<br />
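To make the layout above concrete, here is a minimal sketch that decodes only the first four fields of a class file (`magic`, `minor_version`, `major_version`, `constant_pool_count`). The class `ClassFileHeaderSketch` and its methods are assumptions for illustration; a real analysis would continue into the constant pool and method table, typically through a library such as ASM.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical reader, named for illustration only.
final class ClassFileHeaderSketch {
  static final class Header {
    final int magic;
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    Header(int magic, int minorVersion, int majorVersion, int constantPoolCount) {
      this.magic = magic;
      this.minorVersion = minorVersion;
      this.majorVersion = majorVersion;
      this.constantPoolCount = constantPoolCount;
    }
  }

  // Decodes u4 magic, u2 minor_version, u2 major_version, u2 constant_pool_count.
  // Class files are big-endian, which is also ByteBuffer's default byte order.
  static Header parse(byte[] firstTenBytes) {
    ByteBuffer buf = ByteBuffer.wrap(firstTenBytes);
    int magic = buf.getInt();
    int minor = buf.getShort() & 0xFFFF;
    int major = buf.getShort() & 0xFFFF;
    int constantPoolCount = buf.getShort() & 0xFFFF;
    return new Header(magic, minor, major, constantPoolCount);
  }

  // Loads the header of an already-loaded class from its .class resource.
  static Header readFor(Class<?> type) {
    String resource = "/" + type.getName().replace('.', '/') + ".class";
    try (InputStream in = type.getResourceAsStream(resource)) {
      if (in == null) {
        throw new IOException("class file not found: " + resource);
      }
      byte[] head = new byte[10];
      new DataInputStream(in).readFully(head);
      return parse(head);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

For any valid class file, `magic` is `0xCAFEBABE`; the constant pool that follows is where entries such as `java/math/BigDecimal` and `equals` live.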
    31. 31. ClassFile {<br /> u4 magic;<br /> u2 minor_version;<br /> u2 major_version;<br /> u2 constant_pool_count;<br /> cp_info constant_pool[constant_pool_count-1];<br /> u2 access_flags;<br /> u2 this_class;<br /> u2 super_class;<br /> u2 interfaces_count;<br /> u2 interfaces[interfaces_count];<br /> u2 fields_count;<br /> field_info fields[fields_count];<br /> u2 methods_count;<br /> method_info methods[methods_count];<br /> u2 attributes_count;<br /> attribute_info attributes[attributes_count];<br />}<br />const #18 = Asciz java/math/BigDecimal;<br />const #19 = Asciz equals;<br />const #20 = Asciz (Ljava/lang/Object;)Z;<br />
    32. 32. ClassFile {<br /> u4 magic;<br /> u2 minor_version;<br /> u2 major_version;<br /> u2 constant_pool_count;<br /> cp_info constant_pool[constant_pool_count-1];<br /> u2 access_flags;<br /> u2 this_class;<br /> u2 super_class;<br /> u2 interfaces_count;<br /> u2 interfaces[interfaces_count];<br /> u2 fields_count;<br /> field_info fields[fields_count];<br /> u2 methods_count;<br /> method_info methods[methods_count];<br /> u2 attributes_count;<br /> attribute_info attributes[attributes_count];<br />}<br />method_info {<br /> u2 access_flags;<br /> u2 name_index;<br /> u2 descriptor_index;<br /> u2 attributes_count;<br /> attribute_info attributes[attributes_count];<br />}<br />
    33. 33. method_info {<br /> u2 access_flags;<br /> u2 name_index;<br /> u2 descriptor_index;<br /> u2 attributes_count;<br /> attribute_info attributes[attributes_count];<br />}<br />Code_attribute {<br /> u2 attribute_name_index;<br /> u4 attribute_length;<br /> u2 max_stack;<br /> u2 max_locals;<br /> u4 code_length;<br /> u1 code[code_length];<br /> u2 exception_table_length;<br /> {<br /> u2 start_pc;<br /> u2 end_pc;<br /> u2 handler_pc;<br /> u2 catch_type;<br /> } exception_table[exception_table_length];<br /> u2 attributes_count;<br /> attribute_info attributes[attributes_count];<br />}<br />
    34. 34. Code_attribute {<br /> u2 attribute_name_index;<br /> u4 attribute_length;<br /> u2 max_stack;<br /> u2 max_locals;<br /> u4 code_length;<br /> u1 code[code_length];<br /> u2 exception_table_length;<br /> {<br /> u2 start_pc;<br /> u2 end_pc;<br /> u2 handler_pc;<br /> u2 catch_type;<br /> } exception_table[exception_table_length];<br /> u2 attributes_count;<br /> attribute_info attributes[attributes_count];<br />}<br /> 0: aload_1<br /> 1: aload_2<br /> 2: invokevirtual #2;<br /> 5: istore_3<br /> 6: return<br />
    35. 35. ASM<br />@Override<br />public void visitMethodInsn(<br />int opcode, <br /> String owner, <br /> String name, <br /> String descriptor) {<br /> ….<br />}<br />
    36. 36. ASM<br />@Override<br />public void visitMethodInsn(<br />int opcode, 0xb6<br /> String owner, "java/math/BigDecimal"<br /> String name, "equals"<br /> String descriptor) { "(Ljava/lang/Object;)Z"<br /> ….<br />}<br />
    37. 37. Failed Assertion<br />junit.framework.AssertionFailedError:<br />com.kaching.trading.core.Trade#execute()<br /> calls java.math.BigDecimal#equals(java.lang.Object)<br /> on line 273<br />
    38. 38. Visibility Test<br />class Lists {<br /> …<br />@VisibleForTesting<br />static int computeArrayListCapacity(int size) {<br />return (int) Math.min(<br /> 5L + size + (size / 10), Integer.MAX_VALUE);<br /> }<br />}<br />
    39. 39. Visibility Test<br />class QuoteHttpClient implements QuoteClient {<br />@Inject HttpClient client;<br /> Quote getQuote(Symbol<?> symbol) {<br />return …;<br /> }<br />}<br />
    40. 40. @Visibilities({<br />@Check(paths = {"bin"}, visibilities = {<br />@Visibility(value = VisibleForTesting.class, intent = PRIVATE),<br />@Visibility(value = Inject.class, intent = PRIVATE)<br /> })<br />})<br />@RunWith(VisibilityTestRunner.class)<br />public class VisibilityTest {<br />}<br />
    41. 41. Two Passes<br />Find all classes, fields and methods annotated with the specified annotations.<br />Find all instructions referring to these classes, fields and methods.<br />
    42. 42. ASM<br />@Override<br />public AnnotationVisitor visitAnnotation(<br /> String descriptor, <br />boolean visible) {<br /> …<br />}<br />
    43. 43. ASM<br />@Override<br />public AnnotationVisitor visitAnnotation(<br /> String descriptor, "Lcom/google/common/annotations/VisibleForTesting;"<br />boolean visible) { false<br /> …<br />}<br />
    44. 44. boolean isVisibleBy(<br />ParsedElement location,<br />ParsedClass currentClass) {<br /> Annotation annotation = annotations.get(location);<br />if (annotation != null) {<br /> …<br /> } else {<br />return true; // let's trust the compiler :)<br />}<br />}<br />
    45. 45. Failed Assertion<br />junit.framework.AssertionFailedError:<br />com.kaching.account.ApplicationUploader#upload()<br /> refers to @VisibleForTesting method<br />com.kaching.account.Customer#getState() on line 149<br />
    46. 46. Java 5+ Type System<br />Primitives<br />Objects<br />Generics<br />Wildcards<br />Intersection types<br />
    47. 47. Erasure<br />Object eraser() { return new ArrayList<String>(); }<br />Object obj = eraser();<br />// impossible to recover that obj is a list of strings<br />
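A small sketch of what erasure means at run time, assuming nothing beyond the JDK. The second method shows the usual workaround, in the spirit of the super type tokens article listed in the references: an anonymous subclass records its generic superclass in its class file, so the type argument becomes readable again. The class and method names are illustrative.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, named for illustration only.
final class ErasureSketch {
  static Object eraser() {
    return new ArrayList<String>();
  }

  // Instances of ArrayList<String> and ArrayList<Integer> share one runtime class.
  static boolean sameRuntimeClass() {
    return eraser().getClass() == new ArrayList<Integer>().getClass();
  }

  // An anonymous subclass keeps "extends ArrayList<String>" in its class file.
  static Type elementTypeOfAnonymousSubclass() {
    List<String> list = new ArrayList<String>() {};
    ParameterizedType superType =
        (ParameterizedType) list.getClass().getGenericSuperclass();
    return superType.getActualTypeArguments()[0];
  }
}
```

The object itself carries no type argument; only declarations (classes, fields, method signatures) do.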
    48. 48. Compiled classes<br />$ cat MustBeSerializable.java<br />import java.io.Serializable;<br />interface MustBeSerializable<T extends Serializable> {}<br />$ cat ExtendsMustBeSerializable.java<br />class Value {}<br />class ExtendsMustBeSerializable implements MustBeSerializable<Value> {}<br />
    49. 49. Compiled classes<br />$ javac MustBeSerializable.java<br />$ rm MustBeSerializable.java<br />$ ls<br />MustBeSerializable.class ExtendsMustBeSerializable.java <br />$ javac ExtendsMustBeSerializable.java -cp .<br />ExtendsMustBeSerializable.java:2: type parameter Value is not within its bound<br />class ExtendsMustBeSerializable implements MustBeSerializable<Value> {}<br /> ^<br />1 error<br />
    50. 50. Compiled classes<br />The compiler must write type information into the class file to preserve proper semantics.<br />When compiling other classes, it needs to read that type information and check against those contracts.<br />
    51. 51. Taking a peek at classes<br />$ javap -v MustBeSerializable | grep -A 1 'Signature;'<br />const #3 = Asciz Signature;<br />const #4 = Asciz <T::Ljava/io/Serializable;>Ljava/lang/Object;;<br />
    52. 52. Signatures<br />
    53. 53. Signatures<br />Primitives<br />B for byte, C for char, D for double, …<br />Objects<br />Lclassname; such as Ljava/lang/String;<br />Arrays<br />[B for byte[], [[D for double[][]<br />Void<br />V<br />… 8 pages of documentation <br />
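The encoding rules listed above can be sketched as a short recursive function over `java.lang.Class`. This covers erased descriptors only, not the generic `Signature` attribute, and the class `DescriptorSketch` with its two methods is an assumption made for illustration.

```java
// Hypothetical helper, named for illustration only.
final class DescriptorSketch {
  // Encodes one type: a single letter for primitives and void,
  // "[" plus the component for arrays, "Lname;" for classes.
  static String descriptor(Class<?> type) {
    if (type == void.class) return "V";
    if (type == boolean.class) return "Z";
    if (type == byte.class) return "B";
    if (type == char.class) return "C";
    if (type == short.class) return "S";
    if (type == int.class) return "I";
    if (type == long.class) return "J";
    if (type == float.class) return "F";
    if (type == double.class) return "D";
    if (type.isArray()) return "[" + descriptor(type.getComponentType());
    return "L" + type.getName().replace('.', '/') + ";";
  }

  // Encodes a method descriptor: parameter types in parentheses, then the return type.
  static String methodDescriptor(Class<?> returnType, Class<?>... parameterTypes) {
    StringBuilder sb = new StringBuilder("(");
    for (Class<?> parameter : parameterTypes) {
      sb.append(descriptor(parameter));
    }
    return sb.append(')').append(descriptor(returnType)).toString();
  }
}
```

For example, `DescriptorSketch.methodDescriptor(boolean.class, Object.class)` produces the `(Ljava/lang/Object;)Z` string seen earlier for `BigDecimal#equals`.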
    54. 54. With ASM<br />org.objectweb.asm.signature.*<br />org.objectweb.asm.Type<br />
    55. 55. With Reflection<br />java.lang.reflect.Type<br />Class<br />GenericArrayType(Type component)<br />ParameterizedType(Type raw, Type[] arguments)<br />TypeVariable(Type[] bounds, String name)<br />WildcardType(Type[] lowerBounds, Type[] upperBounds)<br />
    56. 56. Some Examples<br />String.class<br /> Class<String><br />List<Integer> ParameterizedType(List.class, Integer.class)<br />List<int[]> ParameterizedType(List.class, GenericArrayType(int.class))<br />
    57. 57. Some Examples<br />Map<? extends Shape, ? super Area> ParameterizedType(Map.class, {WildcardType({}, Shape.class), WildcardType(Area.class, {}) })<br />
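The wildcard example can be inspected with plain reflection. In this sketch `Number` and `Integer` stand in for the slide's `Shape` and `Area`, which are not defined in the deck, and the class name is illustrative. One detail differs from the slide's shorthand: for `? super X`, `getUpperBounds()` reports `Object` rather than an empty array.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.WildcardType;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, named for illustration only.
final class WildcardSketch {
  // Stand-in for Map<? extends Shape, ? super Area>.
  static Map<? extends Number, ? super Integer> example;

  // Describes the lower and upper bounds of each wildcard type argument of the field.
  static List<String> describe() {
    Field field;
    try {
      field = WildcardSketch.class.getDeclaredField("example");
    } catch (NoSuchFieldException e) {
      throw new AssertionError(e);
    }
    ParameterizedType mapType = (ParameterizedType) field.getGenericType();
    List<String> result = new ArrayList<>();
    for (Type argument : mapType.getActualTypeArguments()) {
      WildcardType wildcard = (WildcardType) argument;
      result.add("lower=" + names(wildcard.getLowerBounds())
          + " upper=" + names(wildcard.getUpperBounds()));
    }
    return result;
  }

  private static String names(Type[] types) {
    List<String> names = new ArrayList<>();
    for (Type type : types) {
      names.add(type.getTypeName());
    }
    return "[" + String.join(", ", names) + "]";
  }
}
```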
    58. 58. With Reflection<br />java.lang.Class<br />getGenericSuperclass()<br />getGenericInterfaces()<br />
    59. 59. Concrete Examples<br />Unification<br />Just-In-Time Providers<br />
    60. 60. Unification<br />MyClass implements Callable<String> { …<br />MyClass.class<br /> .getGenericInterfaces()[0] .getActualTypeArguments()[0] String.class!<br />
    61. 61. Unification<br />But if we have<br />MyClass extends AbstractCallable<String> { …<br />AbstractCallable<T> implements Callable<T> { …<br />Unification.getActualTypeArgument(MyClass.class, Callable.class, 0);<br />
    62. 62. Unification – Want to Try?<br />class MergeOfIntegerAndString extends Merge<Integer, String> {}<br />class Merge<K, V> <br />implements OneTypeParam<Map<K, V>> {} <br />interface OneTypeParam<T><br />
    63. 63. Guice<br />class QuoteHttpClient implements QuoteClient {<br />@Inject HttpClient client;<br /> Quote getQuote(Symbol<?> symbol) {<br />return …;<br /> }<br />}<br />
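One way to resolve a type argument through an intermediate class is to walk the hierarchy while recording what each type variable is bound to. The sketch below is not kaChing's `Unification` class, whose source is not in the slides; `UnificationSketch` is a simplified stand-in that resolves variables bound to classes or other variables and does not substitute inside nested parameterized types such as `Map<K, V>`.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical, simplified resolver; named for illustration only.
final class UnificationSketch {
  abstract static class AbstractCallable<T> implements Callable<T> {}

  static class MyClass extends AbstractCallable<String> {
    @Override public String call() { return "hello"; }
  }

  // Finds what the index-th type parameter of target is bound to when seen from start.
  static Type getActualTypeArgument(Class<?> start, Class<?> target, int index) {
    return search(start, target, index, new HashMap<TypeVariable<?>, Type>());
  }

  private static Type search(
      Type type, Class<?> target, int index, Map<TypeVariable<?>, Type> bindings) {
    Class<?> raw;
    if (type instanceof ParameterizedType) {
      ParameterizedType parameterized = (ParameterizedType) type;
      raw = (Class<?>) parameterized.getRawType();
      TypeVariable<?>[] variables = raw.getTypeParameters();
      Type[] arguments = parameterized.getActualTypeArguments();
      for (int i = 0; i < variables.length; i++) {
        // Record e.g. T -> String, resolving through earlier bindings.
        Type resolved = resolve(arguments[i], bindings);
        if (!resolved.equals(variables[i])) {
          bindings.put(variables[i], resolved);
        }
      }
    } else if (type instanceof Class) {
      raw = (Class<?>) type;
    } else {
      return null;
    }
    if (raw == target) {
      return resolve(raw.getTypeParameters()[index], bindings);
    }
    Type superclass = raw.getGenericSuperclass();
    if (superclass != null) {
      Type found = search(superclass, target, index, bindings);
      if (found != null) return found;
    }
    for (Type implemented : raw.getGenericInterfaces()) {
      Type found = search(implemented, target, index, bindings);
      if (found != null) return found;
    }
    return null; // target is not a supertype of start
  }

  // Follows variable-to-type bindings until reaching an unbound variable or a concrete type.
  private static Type resolve(Type type, Map<TypeVariable<?>, Type> bindings) {
    while (type instanceof TypeVariable && bindings.containsKey(type)) {
      type = bindings.get(type);
    }
    return type;
  }
}
```

With these definitions, `UnificationSketch.getActualTypeArgument(MyClass.class, Callable.class, 0)` returns `String.class`, even though `MyClass` never mentions `Callable` directly.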
    64. 64. Providers<br />bind(Repository.class) .toProvider(new Provider<Repository>() { Repository get() { return new RepositoryImpl(…); } });<br />
    65. 65. Tedious Providers<br />bind(new TypeLiteral<Marshaller<User>>() {}) .toProvider(new Provider<…>() { Marshaller<User> get() { return TwoLattes .createMarshaller(User.class); } });<br />
    66. 66. It gets tedious…<br />@Inject Marshaller<User><br />@Inject Marshaller<Portfolio><br />@Inject Marshaller<Watchlist><br />….<br />Lots and lots and lots of bindings<br />TwoLattes.createMarshaller(Foo.class)<br />
    67. 67. Just-In-Time Providers<br />bindJit(new TypeLiteral<Marshaller<?>>() {}) .toProvider(new JitProvider<…>() { Marshaller<?> get(Key key) { return TwoLattes.createMarshaller(<br />extractClassFromKey(key)); } });<br />
    68. 68. Just-In-Time Providers<br />Pattern matching on types<br />Marshaller<?> is a pattern for Marshaller<User>, Marshaller<Portfolio>, …<br />Can be arbitrarily complex, including wildcards, intersection types etc.<br />http://github.com/pascallouisperez/guice-jit-providers<br />
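The simplest case of matching a requested type against a pattern like `Marshaller<?>` can be sketched with reflection alone. Everything here is hypothetical: `TypePatternSketch`, its nested `Marshaller`, `User` and `TypeToken` types, and `extractClass` are illustrative stand-ins, not the guice-jit-providers API, and only a single top-level wildcard is handled.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical matcher, named for illustration only.
final class TypePatternSketch {
  interface Marshaller<T> {}

  static class User {}

  // Super type token: an anonymous subclass captures T in its generic superclass.
  abstract static class TypeToken<T> {
    final Type type =
        ((ParameterizedType) getClass().getGenericSuperclass()).getActualTypeArguments()[0];
  }

  static Type marshallerOfUser() {
    return new TypeToken<Marshaller<User>>() {}.type;
  }

  // Matches requested against the pattern rawPattern<?>.
  // Returns the class bound to the wildcard, or null when the type does not match.
  static Class<?> extractClass(Type requested, Class<?> rawPattern) {
    if (!(requested instanceof ParameterizedType)) {
      return null;
    }
    ParameterizedType parameterized = (ParameterizedType) requested;
    if (parameterized.getRawType() != rawPattern) {
      return null;
    }
    Type argument = parameterized.getActualTypeArguments()[0];
    return argument instanceof Class ? (Class<?>) argument : null;
  }
}
```

A just-in-time provider could use the extracted class (here `User.class`) to build the requested marshaller on demand instead of requiring one explicit binding per type.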
    69. 69. Tools<br />PMD http://pmd.sourceforge.net/<br />Javassist http://www.csg.is.titech.ac.jp/~chiba/javassist/<br />FindBugs http://findbugs.sourceforge.net/<br />Joeq http://suif.stanford.edu/~courses/cs243/joeq/index.html<br />ASM http://asm.ow2.org/<br />Guice http://code.google.com/p/google-guice/<br />
    70. 70. References<br />JVM spec http://www.amazon.com/Java-Virtual-Machine-Specification-2nd/dp/0201432943<br />Class File spec http://java.sun.com/docs/books/jvms/second_edition/ClassFileFormat-Java5.pdf<br />Super Type Tokens http://gafter.blogspot.com/2006/12/super-type-tokens.html<br />Unifying Type Parameters in Java http://eng.kaching.com/2009/10/unifying-type-parameters-in-java.html<br />Type Safe Bit Fields Using Higher Kinded Types http://eng.kaching.com/2010/08/type-safe-bit-fields-using-higher.html<br />
