
Data-Centric Metaprogramming by Vlad Ureche

  1. DATA-CENTRIC METAPROGRAMMING Vlad Ureche
  2. Vlad Ureche PhD in the Scala Team @ EPFL. Soon to graduate ;) ● Working on program transformations focusing on data representation ● Author of miniboxing, which improves generics performance by up to 20x ● Contributed to the Scala compiler and to the scaladoc tool. @VladUreche vlad.ureche@gmail.com scala-miniboxing.org
  3. Research ahead!* (* This may not make it into a product, but you can play with it nevertheless.)
  4. STOP Please ask if things are not clear!
  5. Motivation Transformation Applications Challenges Conclusion Spark
  6. Motivation Comparison graph from http://fr.slideshare.net/databricks/spark-summit-eu-2015-spark-dataframes-simple-and-fast-analysis-of-structured-data and used with permission.
  7. Motivation Comparison graph from http://fr.slideshare.net/databricks/spark-summit-eu-2015-spark-dataframes-simple-and-fast-analysis-of-structured-data and used with permission. Performance gap between RDDs and DataFrames
  8. Motivation RDD DataFrame
  9. Motivation RDD ● strongly typed ● slower DataFrame
  10. Motivation RDD ● strongly typed ● slower DataFrame ● dynamically typed ● faster
  11. Motivation RDD ● strongly typed ● slower DataFrame ● dynamically typed ● faster
  12. Motivation RDD ● strongly typed ● slower DataFrame ● dynamically typed ● faster ? ● strongly typed ● faster
  13. Motivation RDD ● strongly typed ● slower DataFrame ● dynamically typed ● faster Dataset ● strongly typed ● faster
  14. Motivation RDD ● strongly typed ● slower DataFrame ● dynamically typed ● faster Dataset ● strongly typed ● faster mid-way
  15. Motivation RDD ● strongly typed ● slower DataFrame ● dynamically typed ● faster Dataset ● strongly typed ● faster mid-way Why just mid-way? What can we do to speed them up?
  16. Object Composition
  17. Object Composition class Vector[T] { … }
  18. Object Composition class Vector[T] { … } The Vector collection in the Scala library
  19. Object Composition class Employee(...) ID NAME SALARY class Vector[T] { … } The Vector collection in the Scala library
  20. Object Composition class Employee(...) ID NAME SALARY class Vector[T] { … } The Vector collection in the Scala library Corresponds to a table row
  21. Object Composition class Employee(...) ID NAME SALARY class Vector[T] { … }
  22. Object Composition class Employee(...) ID NAME SALARY class Vector[T] { … }
  23. Object Composition class Employee(...) ID NAME SALARY class Vector[T] { … } Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY] …
  24. Object Composition class Employee(...) ID NAME SALARY class Vector[T] { … } Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY] … Traversal requires dereferencing a pointer for each employee.
  25. A Better Representation Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY] …
  26. A Better Representation EmployeeVector: [ID ID …] [NAME NAME …] [SALARY SALARY …] vs. Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY] …
  27. A Better Representation ● more efficient heap usage ● faster iteration EmployeeVector: [ID ID …] [NAME NAME …] [SALARY SALARY …] vs. Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY] …
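
To make the column layout concrete, here is a minimal struct-of-arrays sketch of what an EmployeeVector could look like. All names and the API are hypothetical illustrations, not the code the compiler plugin actually generates:

    // Hypothetical struct-of-arrays layout for Vector[Employee].
    case class Employee(id: Int, name: String, salary: Float)

    class EmployeeVector(
        val ids: Array[Int],        // one contiguous array per field
        val names: Array[String],
        val salaries: Array[Float]) {

      def length: Int = ids.length

      // Rebuilds an Employee on demand; a scan over a single column
      // (e.g. salaries) follows no per-element pointers at all.
      def apply(i: Int): Employee = Employee(ids(i), names(i), salaries(i))

      def totalSalary: Float = {
        var sum = 0f
        var i = 0
        while (i < length) { sum += salaries(i); i += 1 }
        sum
      }
    }
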
  28. The Problem ● Vector[T] is unaware of Employee
  29. The Problem ● Vector[T] is unaware of Employee – Which makes Vector[Employee] suboptimal
  30. The Problem ● Vector[T] is unaware of Employee – Which makes Vector[Employee] suboptimal ● Not limited to Vector; other classes are also affected
  31. The Problem ● Vector[T] is unaware of Employee – Which makes Vector[Employee] suboptimal ● Not limited to Vector; other classes are also affected – Spark pain point: functions/closures
  32. The Problem ● Vector[T] is unaware of Employee – Which makes Vector[Employee] suboptimal ● Not limited to Vector; other classes are also affected – Spark pain point: functions/closures – We'd like a "structured" representation throughout
  33. The Problem ● Vector[T] is unaware of Employee – Which makes Vector[Employee] suboptimal ● Not limited to Vector; other classes are also affected – Spark pain point: functions/closures – We'd like a "structured" representation throughout Challenge: No means of communicating this to the compiler
  34. Choice: Safe or Fast
  35. Choice: Safe or Fast This is where my work comes in...
  36. Data-Centric Metaprogramming ● a compiler plug-in that allows tuning the data representation ● Website: scala-ildl.org
  37. Motivation Transformation Applications Challenges Conclusion Spark
  38. Transformation Definition Application
  39. Transformation Definition ● can't be automated ● based on experience ● based on speculation ● one-time effort Application
  40. Transformation Definition (programmer) ● can't be automated ● based on experience ● based on speculation ● one-time effort Application
  41. Transformation Definition (programmer) ● can't be automated ● based on experience ● based on speculation ● one-time effort Application ● repetitive and complex ● affects code readability ● is verbose ● is error-prone
  42. Transformation Definition (programmer) ● can't be automated ● based on experience ● based on speculation ● one-time effort Application (compiler, automated) ● repetitive and complex ● affects code readability ● is verbose ● is error-prone
  43. Transformation Definition (programmer) ● can't be automated ● based on experience ● based on speculation ● one-time effort Application (compiler, automated) ● repetitive and complex ● affects code readability ● is verbose ● is error-prone
  44. Data-Centric Metaprogramming object VectorOfEmployeeOpt extends Transformation { type Target = Vector[Employee] type Result = EmployeeVector def toResult(t: Target): Result = ... def toTarget(t: Result): Target = ... def bypass_length: Int = ... def bypass_apply(i: Int): Employee = ... def bypass_update(i: Int, v: Employee) = ... def bypass_toString: String = ... ... }
  45. Data-Centric Metaprogramming object VectorOfEmployeeOpt extends Transformation { type Target = Vector[Employee] type Result = EmployeeVector def toResult(t: Target): Result = ... def toTarget(t: Result): Target = ... def bypass_length: Int = ... def bypass_apply(i: Int): Employee = ... def bypass_update(i: Int, v: Employee) = ... def bypass_toString: String = ... ... } What to transform? What to transform to?
  46. Data-Centric Metaprogramming object VectorOfEmployeeOpt extends Transformation { type Target = Vector[Employee] type Result = EmployeeVector def toResult(t: Target): Result = ... def toTarget(t: Result): Target = ... def bypass_length: Int = ... def bypass_apply(i: Int): Employee = ... def bypass_update(i: Int, v: Employee) = ... def bypass_toString: String = ... ... } How to transform?
  47. Data-Centric Metaprogramming object VectorOfEmployeeOpt extends Transformation { type Target = Vector[Employee] type Result = EmployeeVector def toResult(t: Target): Result = ... def toTarget(t: Result): Target = ... def bypass_length: Int = ... def bypass_apply(i: Int): Employee = ... def bypass_update(i: Int, v: Employee) = ... def bypass_toString: String = ... ... } How to run methods on the updated representation?
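
As a rough sketch, the elided bodies could be filled in as follows, reusing the hypothetical EmployeeVector from earlier. The stand-in Transformation trait and the explicit receiver parameter on the bypass_* methods are my additions so the snippet stands alone; the plugin's actual API may differ in these details:

    // Minimal stand-in so the sketch is self-contained; the real trait
    // comes from the plugin (scala-ildl.org).
    trait Transformation { type Target; type Result }

    object VectorOfEmployeeOpt extends Transformation {
      type Target = Vector[Employee]   // what to transform
      type Result = EmployeeVector     // what to transform to

      // How to transform: conversions between the two representations.
      def toResult(t: Target): Result =
        new EmployeeVector(t.map(_.id).toArray,
                           t.map(_.name).toArray,
                           t.map(_.salary).toArray)

      def toTarget(t: Result): Target =
        Vector.tabulate(t.length)(i => t(i))

      // How to run methods on the updated representation: one bypass_*
      // method per Vector method, written against EmployeeVector
      // (receiver made explicit here so the sketch type-checks).
      def bypass_length(v: Result): Int = v.length
      def bypass_apply(v: Result, i: Int): Employee = v(i)
      def bypass_update(v: Result, i: Int, e: Employee): Unit = {
        v.ids(i) = e.id; v.names(i) = e.name; v.salaries(i) = e.salary
      }
      def bypass_toString(v: Result): String =
        s"EmployeeVector of ${v.length} employees"
    }
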
  48. Transformation Definition (programmer) ● can't be automated ● based on experience ● based on speculation ● one-time effort Application (compiler, automated) ● repetitive and complex ● affects code readability ● is verbose ● is error-prone
  49. Transformation Definition (programmer) ● can't be automated ● based on experience ● based on speculation ● one-time effort Application (compiler, automated) ● repetitive and complex ● affects code readability ● is verbose ● is error-prone
  50. http://infoscience.epfl.ch/record/207050?ln=en
  51. Motivation Transformation Applications Challenges Conclusion Spark
  52. Motivation Transformation Applications Challenges Conclusion Spark Open World Best Representation? Composition
  53. Scenario class Employee(...) ID NAME SALARY class Vector[T] { … }
  54. Scenario class Employee(...) ID NAME SALARY class Vector[T] { … } Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY] …
  55. Scenario class Employee(...) ID NAME SALARY class Vector[T] { … } Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY] … EmployeeVector: [ID ID …] [NAME NAME …] [SALARY SALARY …]
  56. Scenario class Employee(...) ID NAME SALARY class Vector[T] { … } Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY] … EmployeeVector: [ID ID …] [NAME NAME …] [SALARY SALARY …] class NewEmployee(...) extends Employee(...) ID NAME SALARY DEPT
  57. Scenario class Employee(...) ID NAME SALARY class Vector[T] { … } Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY] … EmployeeVector: [ID ID …] [NAME NAME …] [SALARY SALARY …] class NewEmployee(...) extends Employee(...) ID NAME SALARY DEPT
  58. Scenario class Employee(...) ID NAME SALARY class Vector[T] { … } Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY] … EmployeeVector: [ID ID …] [NAME NAME …] [SALARY SALARY …] class NewEmployee(...) extends Employee(...) ID NAME SALARY DEPT Oooops...
  59. Open World Assumption ● Globally anything can happen
  60. Open World Assumption ● Globally anything can happen ● Locally you have full control: – Make class Employee final or – Limit the transformation to code that uses Employee
  61. Open World Assumption ● Globally anything can happen ● Locally you have full control: – Make class Employee final or – Limit the transformation to code that uses Employee How?
  62. Open World Assumption ● Globally anything can happen ● Locally you have full control: – Make class Employee final or – Limit the transformation to code that uses Employee How? Using Scopes!
  63. Scopes transform(VectorOfEmployeeOpt) { def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] = for (employee ← employees) yield employee.copy( salary = (1 + by) * employee.salary ) }
  64. Scopes transform(VectorOfEmployeeOpt) { def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] = for (employee ← employees) yield employee.copy( salary = (1 + by) * employee.salary ) }
  65. Scopes transform(VectorOfEmployeeOpt) { def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] = for (employee ← employees) yield employee.copy( salary = (1 + by) * employee.salary ) } Now the method operates on the EmployeeVector representation.
  66. Scopes ● Can wrap statements, methods, even entire classes – Inlined immediately after the parser – Definitions are visible outside the "scope"
  67. Scopes ● Can wrap statements, methods, even entire classes – Inlined immediately after the parser – Definitions are visible outside the "scope" ● Mark locally closed parts of the code – Incoming/outgoing values go through conversions – You can reject unexpected values
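
Hand-written for illustration, this is roughly what the compiler arranges at a scope boundary: incoming values are converted to the optimized representation, the body runs against it, and outgoing values are converted back. All names reuse the hypothetical sketches above; in reality the conversions are inserted automatically:

    def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] = {
      // incoming value: convert into the scope's representation
      val repr: EmployeeVector = VectorOfEmployeeOpt.toResult(employees)
      // the loop body now runs column-wise, allocating no Employees
      val updated = new EmployeeVector(
        repr.ids, repr.names, repr.salaries.map(s => (1 + by) * s))
      // outgoing value: convert back before it escapes the scope
      VectorOfEmployeeOpt.toTarget(updated)
    }
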
  68. Motivation Transformation Applications Challenges Conclusion Spark Open World Best Representation? Composition
  69. Best Representation? Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY] …
  70. Best Representation? It depends. Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY] …
  71. Best ...? It depends. EmployeeVector: [ID ID …] [NAME NAME …] [SALARY SALARY …] Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY] …
  72. Best ...? It depends. Tungsten repr. <compressed binary blob> EmployeeVector: [ID ID …] [NAME NAME …] [SALARY SALARY …] Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY] …
  73. Best ...? It depends. EmployeeJSON { id: 123, name: "John Doe", salary: 100 } Tungsten repr. <compressed binary blob> EmployeeVector: [ID ID …] [NAME NAME …] [SALARY SALARY …] Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY] …
  74. Scopes allow mixing data representations transform(VectorOfEmployeeOpt) { def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] = for (employee ← employees) yield employee.copy( salary = (1 + by) * employee.salary ) }
  75. Scopes transform(VectorOfEmployeeOpt) { def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] = for (employee ← employees) yield employee.copy( salary = (1 + by) * employee.salary ) } Operating on the EmployeeVector representation.
  76. Scopes transform(VectorOfEmployeeCompact) { def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] = for (employee ← employees) yield employee.copy( salary = (1 + by) * employee.salary ) } Operating on the compact binary representation.
  77. Scopes transform(VectorOfEmployeeJSON) { def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] = for (employee ← employees) yield employee.copy( salary = (1 + by) * employee.salary ) } Operating on the JSON-based representation.
  78. Motivation Transformation Applications Challenges Conclusion Spark Open World Best Representation? Composition
  79. Composition ● Code can be – Left untransformed (using the original representation) – Transformed using different representations
  80. Composition ● Code can be – Left untransformed (using the original representation) – Transformed using different representations calling ● Original code ● Transformed code ● Original code ● Transformed code ● Same transformation ● Different transformation
  81. Composition calling ● Original code ● Transformed code ● Original code ● Transformed code ● Same transformation ● Different transformation
  82. Composition calling ● Original code ● Transformed code ● Original code ● Transformed code ● Same transformation ● Different transformation
  83. Composition calling ● Original code ● Transformed code ● Original code ● Transformed code ● Same transformation ● Different transformation Easy one. Do nothing
  84. Composition calling ● Original code ● Transformed code ● Original code ● Transformed code ● Same transformation ● Different transformation
  85. Composition calling ● Original code ● Transformed code ● Original code ● Transformed code ● Same transformation ● Different transformation
  86. Composition calling ● Original code ● Transformed code ● Original code ● Transformed code ● Same transformation ● Different transformation
  87. Composition calling ● Original code ● Transformed code ● Original code ● Transformed code ● Same transformation ● Different transformation Automatically introduce conversions between values in the two representations, e.g. EmployeeVector → Vector[Employee] or back
  88. Composition calling ● Original code ● Transformed code ● Original code ● Transformed code ● Same transformation ● Different transformation
  89. Composition calling ● Original code ● Transformed code ● Original code ● Transformed code ● Same transformation ● Different transformation
  90. Composition calling ● Original code ● Transformed code ● Original code ● Transformed code ● Same transformation ● Different transformation
  91. Composition calling ● Original code ● Transformed code ● Original code ● Transformed code ● Same transformation ● Different transformation Hard one. Do not introduce any conversions. Even across separate compilation
  92. Composition calling ● Original code ● Transformed code ● Original code ● Transformed code ● Same transformation ● Different transformation
  93. Composition calling ● Original code ● Transformed code ● Original code ● Transformed code ● Same transformation ● Different transformation Hard one. Automatically introduce double conversions (and warn the programmer), e.g. EmployeeVector → Vector[Employee] → CompactEmpVector
  94. Composition calling ● Original code ● Transformed code ● Original code ● Transformed code ● Same transformation ● Different transformation
  95. Composition calling overriding ● Original code ● Transformed code ● Original code ● Transformed code ● Same transformation ● Different transformation
  96. Scopes trait Printer[T] { def print(elements: Vector[T]): Unit } class EmployeePrinter extends Printer[Employee] { def print(employee: Vector[Employee]) = ... }
  97. Scopes trait Printer[T] { def print(elements: Vector[T]): Unit } class EmployeePrinter extends Printer[Employee] { def print(employee: Vector[Employee]) = ... } Method print in the class implements method print in the trait
  98. Scopes trait Printer[T] { def print(elements: Vector[T]): Unit } class EmployeePrinter extends Printer[Employee] { def print(employee: Vector[Employee]) = ... }
  99. Scopes trait Printer[T] { def print(elements: Vector[T]): Unit } transform(VectorOfEmployeeOpt) { class EmployeePrinter extends Printer[Employee] { def print(employee: Vector[Employee]) = ... } }
  100. Scopes trait Printer[T] { def print(elements: Vector[T]): Unit } transform(VectorOfEmployeeOpt) { class EmployeePrinter extends Printer[Employee] { def print(employee: Vector[Employee]) = ... } } The signature of method print changes according to the transformation → it no longer implements the trait
  101. Scopes trait Printer[T] { def print(elements: Vector[T]): Unit } transform(VectorOfEmployeeOpt) { class EmployeePrinter extends Printer[Employee] { def print(employee: Vector[Employee]) = ... } } The signature of method print changes according to the transformation → it no longer implements the trait Taken care of by the compiler for you!
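
One plausible shape for what the compiler generates is an explicit bridge: the transformed method works on the optimized representation, and a method with the original signature converts and delegates, so the trait is still implemented. A hand-written approximation using the earlier hypothetical names:

    trait Printer[T] { def print(elements: Vector[T]): Unit }

    class EmployeePrinter extends Printer[Employee] {
      // transformed method: operates on the optimized representation
      def printOptimized(employees: EmployeeVector): Unit = {
        var i = 0
        while (i < employees.length) { println(employees(i)); i += 1 }
      }
      // bridge with the original signature: convert, then delegate
      def print(employees: Vector[Employee]): Unit =
        printOptimized(VectorOfEmployeeOpt.toResult(employees))
    }
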
  102. Motivation Transformation Applications Challenges Conclusion Spark Open World Best Representation? Composition
  103. Column-oriented Storage EmployeeVector: [ID ID …] [NAME NAME …] [SALARY SALARY …] Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY] …
  104. Column-oriented Storage EmployeeVector: [ID ID …] [NAME NAME …] [SALARY SALARY …] Vector[Employee]: [ID NAME SALARY] [ID NAME SALARY] … iteration is 5x faster
  105. Retrofitting value class status (3,5) [heap layout: reference → Header | 3 | 5]
  106. Retrofitting value class status Tuples in Scala are specialized but are still objects (not value classes) = not as optimized as they could be (3,5) [heap layout: reference → Header | 3 | 5]
  107. Retrofitting value class status (3,5) encoded as 0L + 3 << 32 + 5 Tuples in Scala are specialized but are still objects (not value classes) = not as optimized as they could be [heap layout: reference → Header | 3 | 5]
  108. Retrofitting value class status (3,5) encoded as 0L + 3 << 32 + 5 Tuples in Scala are specialized but are still objects (not value classes) = not as optimized as they could be [heap layout: reference → Header | 3 | 5] 14x faster, lower heap requirements
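
Written out long-hand, the encoding on the slide packs the two 32-bit Ints of a pair into one 64-bit Long, so no tuple object is allocated. A small sketch (helper names mine) that adds the masking the slide's shorthand glosses over, to avoid sign-extension surprises:

    object PackedIntPair {
      def pack(fst: Int, snd: Int): Long =
        (fst.toLong << 32) | (snd.toLong & 0xFFFFFFFFL)
      def fst(p: Long): Int = (p >>> 32).toInt
      def snd(p: Long): Int = p.toInt
    }

    // PackedIntPair.pack(3, 5) stands in for (3, 5);
    // fst/snd stand in for _1/_2.
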
  109. Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum
  110. Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum List(2,3,4)
  111. Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum List(2,3,4) List(4,6,8)
  112. Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum List(2,3,4) List(4,6,8) 18
  113. Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum List(2,3,4) List(4,6,8) 18
  114. Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum List(2,3,4) List(4,6,8) 18 transform(ListDeforestation) { List(1,2,3).map(_ + 1).map(_ * 2).sum }
  115. Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum List(2,3,4) List(4,6,8) 18 transform(ListDeforestation) { List(1,2,3).map(_ + 1).map(_ * 2).sum } accumulate function
  116. Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum List(2,3,4) List(4,6,8) 18 transform(ListDeforestation) { List(1,2,3).map(_ + 1).map(_ * 2).sum } accumulate function accumulate function
  117. Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum List(2,3,4) List(4,6,8) 18 transform(ListDeforestation) { List(1,2,3).map(_ + 1).map(_ * 2).sum } accumulate function accumulate function compute: 18
  118. Deforestation List(1,2,3).map(_ + 1).map(_ * 2).sum List(2,3,4) List(4,6,8) 18 transform(ListDeforestation) { List(1,2,3).map(_ + 1).map(_ * 2).sum } accumulate function accumulate function compute: 18 6x faster
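
Hand-rolled, the fusion amounts to accumulating the mapped functions and applying them once per element inside the final fold, so the intermediate lists List(2,3,4) and List(4,6,8) are never built. A sketch of the idea, not the transformation's actual output:

    val fused: Int => Int = ((_: Int) + 1).andThen(_ * 2)          // accumulated maps
    val sum = List(1, 2, 3).foldLeft(0)((acc, x) => acc + fused(x)) // compute: 18
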
  119. Motivation Transformation Applications Challenges Conclusion Spark Open World Best Representation? Composition
  120. Research ahead!* (* This may not make it into a product, but you can play with it nevertheless.)
  121. Spark ● Optimizations – DataFrames do deforestation – DataFrames do predicate push-down – DataFrames do code generation ● Code is specialized for the data representation ● Functions are specialized for the data representation
  122. Spark ● Optimizations – RDDs don't do deforestation – RDDs don't do predicate push-down – RDDs don't do code generation ● Code is not specialized for the data representation ● Functions are not specialized for the data representation
  123. Spark ● Optimizations – RDDs don't do deforestation – RDDs don't do predicate push-down – RDDs don't do code generation ● Code is not specialized for the data representation ● Functions are not specialized for the data representation This is what makes them slower
  124. Spark ● Optimizations – Datasets do deforestation – Datasets do predicate push-down – Datasets do code generation ● Code is specialized for the data representation ● Functions are specialized for the data representation
  125. User Functions [diagram: user function f: X → Y]
  126. User Functions [diagram: serialized data → encoded data → decode → X → user function f → Y]
  127. User Functions [diagram: serialized data → encoded data → decode → X → user function f → Y → encode → encoded data]
  128. User Functions [diagram: serialized data → encoded data → decode → X → user function f → Y → encode → encoded data] decode: allocates an object encode: allocates an object
  129. User Functions [diagram: serialized data → encoded data → decode → X → user function f → Y → encode → encoded data] decode: allocates an object encode: allocates an object
  130. User Functions [diagram: serialized data → encoded data → decode → X → user function f → Y → encode → encoded data]
  131. User Functions [diagram: serialized data → encoded data → decode → X → user function f → Y → encode → encoded data] Modified user function (automatically derived by the compiler)
  132. User Functions [diagram: serialized data → encoded data → modified user function (automatically derived by the compiler) → encoded data]
  133. User Functions [diagram: serialized data → encoded data → modified user function (automatically derived by the compiler) → encoded data] Nowhere near as simple as it looks
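
A toy version of the problem and of the derived function, with a row crudely encoded as a single Long holding two Int fields. Everything here is a simplification made up for illustration; real encoded rows are far more involved:

    case class Point(x: Int, y: Int)

    val decode: Long => Point = p => Point((p >>> 32).toInt, p.toInt)
    val encode: Point => Long =
      pt => (pt.x.toLong << 32) | (pt.y.toLong & 0xFFFFFFFFL)

    // the user function, written against objects
    val f: Point => Point = pt => Point(pt.x + 1, pt.y)

    // what runs without the transformation: decode, apply, encode,
    // allocating Point objects along the way
    val slow: Long => Long = p => encode(f(decode(p)))

    // what a derived function could look like: it updates the encoded
    // Long directly and never allocates a Point (hand-derived here;
    // adds 1 to the x field, ignoring overflow)
    val fast: Long => Long = p => p + (1L << 32)
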
  134. Challenge: Transformation not possible ● Example: Calling an outside (untransformed) method
  135. Challenge: Transformation not possible ● Example: Calling an outside (untransformed) method ● Solution: Issue compiler warnings
  136. Challenge: Transformation not possible ● Example: Calling an outside (untransformed) method ● Solution: Issue compiler warnings – Explain why it's not possible: due to the method call
  137. Challenge: Transformation not possible ● Example: Calling an outside (untransformed) method ● Solution: Issue compiler warnings – Explain why it's not possible: due to the method call – Suggest how to fix it: enclose the method in a scope
  138. Challenge: Transformation not possible ● Example: Calling an outside (untransformed) method ● Solution: Issue compiler warnings – Explain why it's not possible: due to the method call – Suggest how to fix it: enclose the method in a scope ● Reuse the machinery in miniboxing scala-miniboxing.org
  139. Challenge: Internal API changes
  140. Challenge: Internal API changes ● Spark internals rely on Iterator[T] – Requires materializing values – Needs to be replaced throughout the code base – By rather complex buffers
  141. Challenge: Internal API changes ● Spark internals rely on Iterator[T] – Requires materializing values – Needs to be replaced throughout the code base – By rather complex buffers ● Solution: Extensive refactoring/rewrite
  142. Challenge: Automation
  143. Challenge: Automation ● Existing code should run out of the box
  144. Challenge: Automation ● Existing code should run out of the box ● Solution: – Adapt data-centric metaprogramming to Spark – Trade generality for simplicity – Do the right thing for most of the cases
  145. Challenge: Automation ● Existing code should run out of the box ● Solution: – Adapt data-centric metaprogramming to Spark – Trade generality for simplicity – Do the right thing for most of the cases Where are we now?
  146. Prototype
  147. Prototype Hack
  148. Prototype Hack ● Modified version of Spark core – RDD data representation is configurable
  149. Prototype Hack ● Modified version of Spark core – RDD data representation is configurable ● It's very limited: – Custom data repr. only in map, filter and flatMap – Otherwise we revert to costly objects – Large parts of the automation still need to be done
  150. Prototype Hack sc.parallelize(/* 1 million */ records). map(x => ...). filter(x => ...). collect()
  151. Prototype Hack sc.parallelize(/* 1 million */ records). map(x => ...). filter(x => ...). collect()
  152. Prototype Hack sc.parallelize(/* 1 million */ records). map(x => ...). filter(x => ...). collect() Not yet 2x faster, but 1.45x faster
  153. Motivation Transformation Applications Challenges Conclusion Spark Open World Best Representation? Composition
  154. Conclusion ● Object-oriented composition → inefficient representation
  155. Conclusion ● Object-oriented composition → inefficient representation ● Solution: data-centric metaprogramming
  156. Conclusion ● Object-oriented composition → inefficient representation ● Solution: data-centric metaprogramming – Opaque data → Structured data
  157. Conclusion ● Object-oriented composition → inefficient representation ● Solution: data-centric metaprogramming – Opaque data → Structured data – Is it possible? Yes.
  158. Conclusion ● Object-oriented composition → inefficient representation ● Solution: data-centric metaprogramming – Opaque data → Structured data – Is it possible? Yes. – Is it easy? Not really.
  159. Conclusion ● Object-oriented composition → inefficient representation ● Solution: data-centric metaprogramming – Opaque data → Structured data – Is it possible? Yes. – Is it easy? Not really. – Is it worth it? You tell me!
  160. Thank you! Check out scala-ildl.org.
  161. Deforestation and Language Semantics ● Notice that we changed language semantics: – Before: collections were eager – After: collections are lazy – This can lead to effects reordering
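
For example, with side-effecting functions the reordering becomes visible: eager lists run the first map to completion before the second starts, while a fused pipeline interleaves them per element:

    List(1, 2, 3).map { x => println(s"first: $x");  x + 1 }
                 .map { x => println(s"second: $x"); x * 2 }
    // eager: first: 1, first: 2, first: 3, second: 2, second: 3, second: 4
    // fused: first: 1, second: 2, first: 2, second: 3, first: 3, second: 4
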
  162. Deforestation and Language Semantics ● Such transformations are only acceptable with programmer consent – JIT compilers/staged DSLs can't change semantics – Metaprogramming (macros) can, but it should be documented/opt-in
  163. Code Generation ● Also known as – Deep Embedding – Multi-Stage Programming ● Awesome speedups, but restricted to small DSLs ● SparkSQL uses code gen to improve performance – By 2-4x over Spark
  164. Low-level Optimizers ● Java JIT Compiler – Access to the low-level code – Can assume a (local) closed world – Can speculate based on profiles
  165. Low-level Optimizers ● Java JIT Compiler – Access to the low-level code – Can assume a (local) closed world – Can speculate based on profiles ● Best optimizations break semantics – You can't do this in the JIT compiler! – Only the programmer can decide to break semantics
  166. Scala Macros ● Many optimizations can be done with macros – :) Lots of power – :( Lots of responsibility ● Scala compiler invariants ● Object-oriented model ● Modularity
  167. Scala Macros ● Many optimizations can be done with macros – :) Lots of power – :( Lots of responsibility ● Scala compiler invariants ● Object-oriented model ● Modularity ● Can we restrict macros so they're safer? – Data-centric metaprogramming
