DATA-CENTRIC
METAPROGRAMMING
Vlad Ureche
PhD in the Scala Team @ EPFL. Soon to graduate ;)
● Working on program transformations focusing on data representation
● Author of miniboxing, which improves generics performance by up to 20x
● Contributed to the Scala compiler and to the scaladoc tool.
@VladUreche
vlad.ureche@gmail.com
scala-miniboxing.org
Research ahead*
* This may not make it into a product. But you can play with it nevertheless.
Please ask if things are not clear!
Motivation
Transformation
Applications
Challenges
Conclusion
Spark
Motivation
Comparison graph from http://fr.slideshare.net/databricks/spark-summit-eu-2015-spark-dataframes-simple-and-fast-analysis-of-structured-data, used with permission.
Performance gap between RDDs and DataFrames
Motivation
RDD
● strongly typed
● slower
DataFrame
● dynamically typed
● faster
Dataset
● strongly typed
● faster mid-way
Why just mid-way?
What can we do to speed them up?
Object Composition
class Employee(...)
ID NAME SALARY
(corresponds to a table row)
class Vector[T] { … }
(the Vector collection in the Scala library)
Vector[Employee]
ID NAME SALARY
ID NAME SALARY
Traversal requires dereferencing a pointer for each employee.
A Better Representation
Vector[Employee]
ID NAME SALARY
ID NAME SALARY
EmployeeVector
ID ID ...
NAME NAME ...
SALARY SALARY ...
● more efficient heap usage
● faster iteration
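The two layouts can be sketched in a few lines. This is a minimal illustration, assuming Employee has the three fields shown on the slide; the field types, the EmployeeVector members and the Layouts helper are hypothetical and are not taken from the scala-ildl code base.

```scala
// Hypothetical row type: the slide only shows ID, NAME, SALARY, so the field types are assumptions.
final case class Employee(id: Int, name: String, salary: Float)

// Hypothetical column-oriented container: one array per field instead of one object per row.
final class EmployeeVector(val ids: Array[Int],
                           val names: Array[String],
                           val salaries: Array[Float]) {
  def length: Int = ids.length
  // Rebuilds a row object on demand.
  def apply(i: Int): Employee = Employee(ids(i), names(i), salaries(i))
}

object Layouts {
  // Row layout: each element access follows a reference to a separate Employee object.
  def totalSalaryRows(es: Vector[Employee]): Float =
    es.foldLeft(0f)((acc, e) => acc + e.salary)

  // Column layout: the loop scans a single primitive array.
  def totalSalaryColumns(ev: EmployeeVector): Float = {
    var sum = 0f
    var i = 0
    while (i < ev.length) { sum += ev.salaries(i); i += 1 }
    sum
  }
}
```

Both functions compute the same total; only the memory access pattern differs.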
The Problem
● Vector[T] is unaware of Employee
– Which makes Vector[Employee] suboptimal
● Not limited to Vector, other classes also affected
– Spark pain point: Functions/closures
– We'd like a "structured" representation throughout
Challenge: No means of communicating this to the compiler
Choice: Safe or Fast
This is where my work comes in...
Data-Centric Metaprogramming
● Compiler plug-in that allows tuning the data representation
● Website: scala-ildl.org
Transformation
Definition (programmer)
● can't be automated
● based on experience
● based on speculation
● one-time effort
Application (compiler, automated)
● repetitive and complex
● affects code readability
● is verbose
● is error-prone
Data-Centric Metaprogramming
object VectorOfEmployeeOpt extends Transformation {
  // What to transform? What to transform to?
  type Target = Vector[Employee]
  type Result = EmployeeVector
  // How to transform?
  def toResult(t: Target): Result = ...
  def toTarget(t: Result): Target = ...
  // How to run methods on the updated representation?
  def bypass_length: Int = ...
  def bypass_apply(i: Int): Employee = ...
  def bypass_update(i: Int, v: Employee) = ...
  def bypass_toString: String = ...
  ...
}
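To make the three parts of a transformation description concrete, here is a self-contained sketch. The Transformation trait below is a hypothetical stand-in written for this example, not the plug-in's real base type, and the explicit receiver parameter on the bypass-style methods is an assumption, since the slide elides their bodies and full signatures.

```scala
final case class Employee(id: Int, name: String, salary: Float)

final class EmployeeVector(val ids: Array[Int],
                           val names: Array[String],
                           val salaries: Array[Float])

// Hypothetical stand-in for the plug-in's Transformation type (assumption).
trait Transformation {
  type Target
  type Result
  def toResult(t: Target): Result
  def toTarget(r: Result): Target
}

object VectorOfEmployeeSketch extends Transformation {
  // What to transform, and what to transform it to.
  type Target = Vector[Employee]
  type Result = EmployeeVector

  // How to transform: split rows into columns, and back.
  def toResult(t: Target): Result =
    new EmployeeVector(t.map(_.id).toArray, t.map(_.name).toArray, t.map(_.salary).toArray)

  def toTarget(r: Result): Target =
    Vector.tabulate(r.ids.length)(i => Employee(r.ids(i), r.names(i), r.salaries(i)))

  // How to run methods on the new representation without converting back.
  // Names and signatures are illustrative only.
  def bypassLength(r: Result): Int = r.ids.length
  def bypassApply(r: Result, i: Int): Employee = Employee(r.ids(i), r.names(i), r.salaries(i))
}
```

The conversions are each other's inverse, which is what allows values to cross the boundary between transformed and untransformed code.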
http://infoscience.epfl.ch/record/207050?ln=en
Motivation
Transformation
Applications
Challenges
Conclusion
Spark
Open World
Best Representation?
Composition
Scenario
class Employee(...)
ID NAME SALARY
class Vector[T] { … }
Vector[Employee]
ID NAME SALARY
ID NAME SALARY
EmployeeVector
ID ID ...
NAME NAME ...
SALARY SALARY ...
class NewEmployee(...) extends Employee(...)
ID NAME SALARY DEPT
Oooops...
Open World Assumption
● Globally anything can happen
● Locally you have full control:
– Make class Employee final or
– Limit the transformation to code that uses Employee
How? Using Scopes!
Scopes
transform(VectorOfEmployeeOpt) {
  def indexSalary(employees: Vector[Employee],
                  by: Float): Vector[Employee] =
    for (employee <- employees)
      yield employee.copy(
        salary = (1 + by) * employee.salary
      )
}
Now the method operates on the EmployeeVector representation.
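As a rough illustration of what the scope changes, the sketch below shows the slide's method running on the original representation next to a hand-written version over columns. The columnar version is only an approximation of the effect of the transformation, written by hand for this example; it is not output of the plug-in, and the Employee and EmployeeVector definitions are assumptions.

```scala
final case class Employee(id: Int, name: String, salary: Float)

final class EmployeeVector(val ids: Array[Int],
                           val names: Array[String],
                           val salaries: Array[Float])

object IndexSalary {
  // The method from the slide, on the original Vector[Employee] representation.
  def indexSalary(employees: Vector[Employee], by: Float): Vector[Employee] =
    for (employee <- employees)
      yield employee.copy(salary = (1 + by) * employee.salary)

  // Hand-written columnar counterpart (assumption): only the salary column is
  // rebuilt, the untouched id and name columns are shared with the input.
  def indexSalaryColumns(employees: EmployeeVector, by: Float): EmployeeVector =
    new EmployeeVector(employees.ids, employees.names,
                       employees.salaries.map(s => (1 + by) * s))
}
```

No Employee object is allocated in the columnar version, which is where the savings would come from.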
Scopes
● Can wrap statements, methods, even entire classes
– Inlined immediately after the parser
– Definitions are visible outside the "scope"
● Mark locally closed parts of the code
– Incoming/outgoing values go through conversions
– You can reject unexpected values
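Rejecting unexpected values at the scope boundary could look like the sketch below. It assumes a non-final Employee class and the NewEmployee subclass from the scenario; the CheckedConversion object and the choice of failing with an exception are illustrative assumptions, not the plug-in's documented behavior.

```scala
class Employee(val id: Int, val name: String, val salary: Float)

// A subclass defined elsewhere, as in the open-world scenario.
class NewEmployee(id: Int, name: String, salary: Float, val dept: String)
  extends Employee(id, name, salary)

final class EmployeeVector(val ids: Array[Int],
                           val names: Array[String],
                           val salaries: Array[Float])

object CheckedConversion {
  // Incoming values pass through this conversion; anything that is not exactly
  // an Employee would lose fields (such as dept), so it is rejected instead.
  def toResult(t: Vector[Employee]): EmployeeVector = {
    t.foreach { e =>
      require(e.getClass == classOf[Employee],
              s"unexpected subclass: ${e.getClass.getName}")
    }
    new EmployeeVector(t.map(_.id).toArray, t.map(_.name).toArray, t.map(_.salary).toArray)
  }
}
```

Making Employee final removes the need for the check, which is the other option listed on the slide.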
Best Representation?
It depends.
Vector[Employee]
ID NAME SALARY
ID NAME SALARY
EmployeeVector
ID ID ...
NAME NAME ...
SALARY SALARY ...
Tungsten repr.
<compressed binary blob>
EmployeeJSON
{
  id: 123,
  name: "John Doe",
  salary: 100
}
Scopes allow mixing data representations
transform(VectorOfEmployeeOpt) {
  def indexSalary(employees: Vector[Employee],
                  by: Float): Vector[Employee] =
    for (employee <- employees)
      yield employee.copy(
        salary = (1 + by) * employee.salary
      )
}
Operating on the EmployeeVector representation.
transform(VectorOfEmployeeCompact) {
  def indexSalary(employees: Vector[Employee],
                  by: Float): Vector[Employee] =
    for (employee <- employees)
      yield employee.copy(
        salary = (1 + by) * employee.salary
      )
}
Operating on the compact binary representation.
transform(VectorOfEmployeeJSON) {
  def indexSalary(employees: Vector[Employee],
                  by: Float): Vector[Employee] =
    for (employee <- employees)
      yield employee.copy(
        salary = (1 + by) * employee.salary
      )
}
Operating on the JSON-based representation.
Composition
● Code can be
– Left untransformed (using the original representation)
– Transformed using different representations
Composition
Calling:
● Original code calling original code
– Easy one. Do nothing
● Original code calling transformed code, or transformed code calling original code
– Automatically introduce conversions between values in the two representations
– e.g. EmployeeVector → Vector[Employee] or back
● Transformed code calling transformed code, same transformation
– Hard one. Do not introduce any conversions, even across separate compilation
● Transformed code calling transformed code, different transformation
– Hard one. Automatically introduce double conversions (and warn the programmer)
– e.g. EmployeeVector → Vector[Employee] → CompactEmpVector
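The double conversion between two different transformations can be pictured as going through the shared original type. The sketch below is an assumption-laden illustration: Columnar and Compact are hypothetical conversion objects, and the byte layout of CompactEmpVector is invented for the example using java.io data streams.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

final case class Employee(id: Int, name: String, salary: Float)

final class EmployeeVector(val ids: Array[Int],
                           val names: Array[String],
                           val salaries: Array[Float])

// Hypothetical compact binary representation (layout invented for this sketch).
final class CompactEmpVector(val bytes: Array[Byte])

object Columnar {
  def toResult(t: Vector[Employee]): EmployeeVector =
    new EmployeeVector(t.map(_.id).toArray, t.map(_.name).toArray, t.map(_.salary).toArray)
  def toTarget(r: EmployeeVector): Vector[Employee] =
    Vector.tabulate(r.ids.length)(i => Employee(r.ids(i), r.names(i), r.salaries(i)))
}

object Compact {
  def toResult(t: Vector[Employee]): CompactEmpVector = {
    val buf = new ByteArrayOutputStream()
    val out = new DataOutputStream(buf)
    out.writeInt(t.length) // record count first, then one record after another
    t.foreach { e => out.writeInt(e.id); out.writeUTF(e.name); out.writeFloat(e.salary) }
    out.flush()
    new CompactEmpVector(buf.toByteArray)
  }
  def toTarget(r: CompactEmpVector): Vector[Employee] = {
    val in = new DataInputStream(new ByteArrayInputStream(r.bytes))
    val n = in.readInt()
    (0 until n).map(_ => Employee(in.readInt(), in.readUTF(), in.readFloat())).toVector
  }
}

object DoubleConversion {
  // EmployeeVector -> Vector[Employee] -> CompactEmpVector
  def columnarToCompact(ev: EmployeeVector): CompactEmpVector =
    Compact.toResult(Columnar.toTarget(ev))
}
```

The intermediate Vector[Employee] is materialized in full, which is the cost the warning to the programmer would point at.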
Composition
Overriding:
Scopes
trait Printer[T] {
  def print(elements: Vector[T]): Unit
}
class EmployeePrinter extends Printer[Employee] {
  def print(employee: Vector[Employee]) = ...
}
Method print in the class implements method print in the trait.
transform(VectorOfEmployeeOpt) {
  class EmployeePrinter extends Printer[Employee] {
    def print(employee: Vector[Employee]) = ...
  }
}
The signature of method print changes according to the transformation → it no longer implements the trait.
Taken care of by the compiler for you!
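One plausible way to keep the trait contract intact is a bridge method that converts the argument and forwards to the transformed method. The slides do not say how the compiler handles this, so the sketch below is an assumption written by hand; printColumns, toColumns and the printed buffer are hypothetical names used only for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

final case class Employee(id: Int, name: String, salary: Float)

final class EmployeeVector(val ids: Array[Int],
                           val names: Array[String],
                           val salaries: Array[Float])

trait Printer[T] {
  def print(elements: Vector[T]): Unit
}

class EmployeePrinter extends Printer[Employee] {
  // Collects output lines so the example can be checked without capturing stdout.
  val printed: ArrayBuffer[String] = ArrayBuffer.empty[String]

  private def toColumns(t: Vector[Employee]): EmployeeVector =
    new EmployeeVector(t.map(_.id).toArray, t.map(_.name).toArray, t.map(_.salary).toArray)

  // Stand-in for the transformed method: it takes the columnar representation,
  // so on its own it would no longer implement Printer.print.
  def printColumns(employees: EmployeeVector): Unit = {
    var i = 0
    while (i < employees.ids.length) {
      printed += s"${employees.ids(i)} ${employees.names(i)} ${employees.salaries(i)}"
      i += 1
    }
  }

  // Bridge (assumed mechanism): keeps the trait's signature and converts on entry.
  def print(employees: Vector[Employee]): Unit = printColumns(toColumns(employees))
}
```

Callers that only know the Printer trait keep working, while transformed callers could invoke the columnar method directly.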
Column-oriented Storage
Vector[Employee]
ID NAME SALARY
ID NAME SALARY
EmployeeVector
ID ID ...
NAME NAME ...
SALARY SALARY ...
iteration is 5x faster
Retrofitting value class status
(3,5)
reference → Header | 3 | 5
Tuples in Scala are specialized but are still objects (not value classes) = not as optimized as they could be
(3,5)
(3L << 32) + 5
14x faster, lower heap requirements
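The encoding of a pair of integers into a single long value can be written out as follows. PackedPair is a hypothetical helper for illustration; the mask on the second component is needed so that negative values do not overwrite the upper half.

```scala
object PackedPair {
  // Upper 32 bits hold the first component, lower 32 bits hold the second.
  def pack(first: Int, second: Int): Long =
    (first.toLong << 32) | (second.toLong & 0xFFFFFFFFL)

  // Unsigned shift brings the upper half down; toInt keeps the low 32 bits.
  def first(p: Long): Int = (p >>> 32).toInt

  // toInt truncates to the low 32 bits, restoring the sign of the second component.
  def second(p: Long): Int = p.toInt
}
```

For example, PackedPair.pack(3, 5) yields the same value as (3L << 32) + 5, with no tuple object on the heap.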
Deforestation
List(1,2,3).map(_ + 1).map(_ * 2).sum
→ List(2,3,4) → List(4,6,8) → 18
transform(ListDeforestation) {
  List(1,2,3).map(_ + 1).map(_ * 2).sum
}
→ accumulate function → accumulate function → compute: 18
6x faster
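The "accumulate function, then compute" idea can be sketched with a small wrapper that composes the mapped functions and only traverses the list when a result is requested. DeferredList is a hypothetical class written for this example and is not the representation used by the ListDeforestation transformation.

```scala
// Hypothetical deferred list: map records the function instead of building a new list.
final class DeferredList[A, B](source: List[A], f: A => B) {
  // Accumulate: compose the new function onto the pending one, allocate no list.
  def map[C](g: B => C): DeferredList[A, C] =
    new DeferredList[A, C](source, f.andThen(g))

  // Compute: a single traversal applies the composed function and sums the results.
  def sum(implicit num: Numeric[B]): B =
    source.foldLeft(num.zero)((acc, a) => num.plus(acc, f(a)))
}

object DeferredList {
  def apply[A](source: List[A]): DeferredList[A, A] =
    new DeferredList[A, A](source, (a: A) => a)
}
```

With this wrapper, DeferredList(List(1, 2, 3)).map(_ + 1).map(_ * 2).sum gives 18 without creating List(2,3,4) or List(4,6,8).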
Research ahead*
* This may not make it into a product. But you can play with it nevertheless.
Spark
● Optimizations
– DataFrames do deforestation
– DataFrames do predicate push-down
– DataFrames do code generation
● Code is specialized for the data representation
● Functions are specialized for the data representation
Spark
● Optimizations
– RDDs do not do deforestation
– RDDs do not do predicate push-down
– RDDs do not do code generation
● Code is not specialized for the data representation
● Functions are not specialized for the data representation
This is what makes them slower
Spark
● Optimizations
– Datasets do deforestation
– Datasets do predicate push-down
– Datasets do code generation
● Code is specialized for the data representation
● Functions are specialized for the data representation
User Functions
Diagram (serialized data): encoded data → decode → X → user function f → Y → encode → encoded data
● decode: Allocate object
● encode: Allocate object
User Functions
Diagram (serialized data): encoded data → modified user function (automatically derived by the compiler) → encoded data
Nowhere near as simple as it looks
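The difference between the two pipelines can be illustrated on a toy encoding. Everything below is an assumption made for the example: the fixed 8-byte layout, the Encoded object and the hand-written applyOnEncoded function, which stands in for what a compiler-derived function might do. It does not reflect Spark's actual encoders or the Tungsten format.

```scala
import java.nio.ByteBuffer

// Simplified record with fixed-width fields only (the name column is left out on purpose).
final case class Employee(id: Int, salary: Float)

object Encoded {
  // Assumed layout: 4-byte id at offset 0, 4-byte salary at offset 4.
  def encode(e: Employee): ByteBuffer = {
    val b = ByteBuffer.allocate(8)
    b.putInt(0, e.id)
    b.putFloat(4, e.salary)
    b
  }

  def decode(b: ByteBuffer): Employee = Employee(b.getInt(0), b.getFloat(4))

  // The user function, written against objects.
  val raise: Employee => Employee = e => e.copy(salary = e.salary * 2)

  // Generic path: decode, apply f, encode. Allocates an object on each side of f.
  def applyViaObjects(b: ByteBuffer): ByteBuffer = encode(raise(decode(b)))

  // Hand-written stand-in for the derived function: updates the salary field
  // directly in the encoded data, with no intermediate Employee object.
  def applyOnEncoded(b: ByteBuffer): ByteBuffer = {
    b.putFloat(4, b.getFloat(4) * 2)
    b
  }
}
```

Both paths produce the same encoded result here; deriving the second one automatically for arbitrary user functions is the hard part the slide alludes to.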
Challenge: Transformation not possible
● Example: Calling outside (untransformed) method
● Solution: Issue compiler warnings
– Explain why it's not possible: due to the method call
– Suggest how to fix it: enclose the method in a scope
● Reuse the machinery in miniboxing
scala-miniboxing.org
Challenge: Internal API changes
● Spark internals rely on Iterator[T]
– Requires materializing values
– Needs to be replaced throughout the code base
– By rather complex buffers
● Solution: Extensive refactoring/rewrite
Challenge: Automation
● Existing code should run out of the box
● Solution:
– Adapt data-centric metaprogramming to Spark
– Trade generality for simplicity
– Do the right thing for most of the cases
Where are we now?
Prototype Hack
● Modified version of Spark core
– RDD data representation is configurable
● It's very limited:
– Custom data repr. only in map, filter and flatMap
– Otherwise we revert to costly objects
– Large parts of the automation still need to be done
Prototype Hack
sc.parallelize(/* 1 million */ records).
  map(x => ...).
  filter(x => ...).
  collect()
Not yet 2x faster, but 1.45x faster
Conclusion
● Object-oriented composition → inefficient representation
● Solution: data-centric metaprogramming
– Opaque data → Structured data
– Is it possible? Yes.
– Is it easy? Not really.
– Is it worth it? You tell me!
Thank you!
Check out scala-ildl.org.
Deforestation and Language Semantics
● Notice that we changed language semantics:
– Before: collections were eager
– After: collections are lazy
– This can lead to effects reordering
● Such transformations are only acceptable with programmer consent
– JIT compilers/staged DSLs can't change semantics
– metaprogramming (macros) can, but it should be documented/opt-in
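The reordering of effects is easy to observe with the standard library alone. In this sketch the fused pipeline is approximated with an Iterator, which is an assumption made for illustration rather than the mechanism used by the deforestation transformation; EffectOrder is a hypothetical name.

```scala
import scala.collection.mutable.ArrayBuffer

object EffectOrder {
  // Eager lists: the first map runs over every element before the second map starts.
  def eager(): (List[Int], List[String]) = {
    val log = ArrayBuffer.empty[String]
    val result = List(1, 2, 3)
      .map { x => log += s"a$x"; x + 1 }
      .map { x => log += s"b$x"; x * 2 }
    (result, log.toList)
  }

  // Lazy pipeline: each element flows through both functions before the next one is read.
  def fused(): (List[Int], List[String]) = {
    val log = ArrayBuffer.empty[String]
    val result = List(1, 2, 3).iterator
      .map { x => log += s"a$x"; x + 1 }
      .map { x => log += s"b$x"; x * 2 }
      .toList
    (result, log.toList)
  }
}
```

Both versions return List(4, 6, 8), but the eager log reads a1, a2, a3, b2, b3, b4 while the lazy one reads a1, b2, a2, b3, a3, b4, so code that depends on the order of side effects can behave differently.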
Code Generation
● Also known as
– Deep Embedding
– Multi-Stage Programming
● Awesome speedups, but restricted to small DSLs
● SparkSQL uses code gen to improve performance
– By 2-4x over Spark
Low-level Optimizers
● Java JIT Compiler
– Access to the low-level code
– Can assume a (local) closed world
– Can speculate based on profiles
● Best optimizations break semantics
– You can't do this in the JIT compiler!
– Only the programmer can decide to break semantics
Scala Macros
● Many optimizations can be done with macros
– :) Lots of power
– :( Lots of responsibility
● Scala compiler invariants
● Object-oriented model
● Modularity
● Can we restrict macros so they're safer?
– Data-centric metaprogramming

More Related Content

What's hot

OrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionalityOrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionalityCurtis Mosters
 
Introduction to df
Introduction to dfIntroduction to df
Introduction to dfMohit Jaggi
 
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Michael Rys
 
Introduction to Spark SQL & Catalyst
Introduction to Spark SQL & CatalystIntroduction to Spark SQL & Catalyst
Introduction to Spark SQL & CatalystTakuya UESHIN
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Anyscale
 
Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)Stefan Urbanek
 
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Julian Hyde
 
Apache Spark sql
Apache Spark sqlApache Spark sql
Apache Spark sqlaftab alam
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Cloudera, Inc.
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineeringJulian Hyde
 
Bubbles – Virtual Data Objects
Bubbles – Virtual Data ObjectsBubbles – Virtual Data Objects
Bubbles – Virtual Data ObjectsStefan Urbanek
 
Sasi, cassandra on full text search ride
Sasi, cassandra on full text search rideSasi, cassandra on full text search ride
Sasi, cassandra on full text search rideDuyhai Doan
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkDatabricks
 
Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5Haoyuan Li
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache CalciteJulian Hyde
 
Marmagna desai
Marmagna desaiMarmagna desai
Marmagna desaijmsthakur
 
For Beginners - Ado.net
For Beginners - Ado.netFor Beginners - Ado.net
For Beginners - Ado.netTarun Jain
 
Stratosphere System Overview Big Data Beers Berlin. 20.11.2013
Stratosphere System Overview Big Data Beers Berlin. 20.11.2013Stratosphere System Overview Big Data Beers Berlin. 20.11.2013
Stratosphere System Overview Big Data Beers Berlin. 20.11.2013Robert Metzger
 
Cubes – pluggable model explained
Cubes – pluggable model explainedCubes – pluggable model explained
Cubes – pluggable model explainedStefan Urbanek
 

What's hot (20)

OrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionalityOrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionality
 
Introduction to df
Introduction to dfIntroduction to df
Introduction to df
 
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Data-Centric Metaprogramming by Vlad Ureche