More expressive types for Spark with Frameless
Miguel Pérez Pasalodos
@Kamugo
Raise your hand if...
● You use Spark in production
Raise your hand if...
● You use Spark in production
● You use Spark with Scala
Raise your hand if...
● You use Spark in production
● You use Spark with Scala
● You know what the typeclass pattern is
Raise your hand if...
● You use Spark in production
● You use Spark with Scala
● You know what the typeclass pattern is
● You know what generic programming or Shapeless is
Raise your hand if...
● You use Spark in production
● You use Spark with Scala
● You know what the typeclass pattern is
● You know what generic programming or Shapeless is
● You’ve used Spark with Frameless before
Spark API evolution
RDDs
trait Person { val name: String }
case class Teacher(id: Int, name: String, salary: Double) extends Person
case class Student(id: Int, name: String) extends Person
RDDs
trait Person { val name: String }
case class Teacher(id: Int, name: String, salary: Double) extends Person
case class Student(id: Int, name: String) extends Person
val people: RDD[Person] = sc.parallelize(List(
Teacher(1, "Emma", 60000),
Student(2, "Steve"),
Student(3, "Arnold")
))
Lambdas are (almost) type-safe
val names = people.map(person => person.name)
val names = people.map {
case Teacher(_, name, _) => s"Teacher $name"
case Student(_, name) => s"Student $name"
}
Lambdas are (almost) type-safe
val names = people.map(person => person.name)
val names = people.map {
case Teacher(_, name, _) => s"Teacher $name"
case Student(_, name) => s"Student $name"
}
Possible MatchError at runtime
RDDs
● Basically, a lazy distributed immutable collection
● Compile-time type-safe
● Schema-less
● "How-to" transformations: nothing gets optimized (sketch after this list)
● Limited datasources
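A quick sketch of that "how-to" point, reusing the RDD above (assuming nothing beyond the code already shown):

// Two chained steps stay two opaque JVM functions; Spark pipelines them
// within a stage but cannot inspect, merge, or reorder them
val teacherNames = people
  .filter(_.isInstanceOf[Teacher]) // opaque to Spark: just a function
  .map(_.name)                     // same: Spark only knows how to run it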
Our model from now on
case class Person(id: Int, name: String, age: Short)
DataFrames
val people: DataFrame = List(
Person(1, "Miguel", 26),
Person(2, "Sarah", 28),
Person(2, "John", 32)
).toDF()
Mandatory schema
scala> people.printSchema()
root
|-- id: integer (nullable = false)
|-- name: string (nullable = true)
|-- age: short (nullable = false)
scala> people.filter($"age" !== 26).filter($"age" !== 27).explain(true)
== Parsed Logical Plan ==
'Filter NOT ('age = 27)
+- Filter NOT (cast(age#133 as int) = 26)
+- LocalRelation [id#131, name#132, age#133]
== Optimized Logical Plan ==
Filter (NOT (cast(age#133 as int) = 26) && NOT (cast(age#133 as int) = 27))
+- LocalRelation [id#131, name#132, age#133]
Query optimization
They’re not type-safe :(
val names: DataFrame = people.select("namee")
They’re not type-safe :(
AnalysisException: cannot resolve '`namee`'
given input columns: [id, name, age]
Runtime
val names: DataFrame = people.select("namee")
DataFrames
● Mandatory schema
● Declarative "what-to" specification that Spark can optimize
● Compatible with SQL (sketch after this list)
● Not type-safe
● Extensible DataSource API
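A short sketch of the SQL bullet, assuming the usual spark session in scope:

// The same DataFrame is queryable with plain SQL through a temp view
people.createOrReplaceTempView("people")
val adults = spark.sql("SELECT name FROM people WHERE age > 27")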
Datasets
val people: Dataset[Person] = List(
Person(1, "Miguel", 26),
Person(2, "Sarah", 28),
Person(2, "John", 32)
).toDS()
Datasets
● Try to get the best of both worlds
● We can use lambdas as in RDDs!
○ What about performance?
● Full DataFrame API, since DataFrame = Dataset[Row]
● They seem type-safe
We can use the DataFrame API
val names: DataFrame = people.select("namee")
Still not type-safe :(
AnalysisException: cannot resolve '`namee`'
given input columns: [id, name, age]
Runtime
val names: DataFrame = people.select("namee")
But… we can cast them!
val names: Dataset[Int] = people.select("name").as[Int]
But… we can cast them! ...and fail :(
AnalysisException: Cannot up cast `name` from
string to int as it may truncate
Runtime
val names: Dataset[Int] = people.select("name").as[Int]
Lambdas...
val names: Dataset[String] = people.map(_.namee)
Lambdas… are type-safe!
Error: value namee is not a member of Person
Compile time
val names: Dataset[String] = people.map(_.namee)
What about performance?
● 2²⁵ randomly generated people
● 20 parquet files
● 4 cores
people.filter(_.age == 26).count() VS people.filter($"age" === 26).count()
What about performance?
[Benchmark chart: filter(_.age == 26) vs filter($"age" === 26)]
Encoders?
class Car(name: String)
spark.createDataset(List(
new Car("Tesla Model S")
))
Encoders?
Unable to find encoder for type stored in a Dataset
Compile time
class Car(name: String)
spark.createDataset(List(
new Car("Tesla Model S")
))
Encoders?
case class PersonCar(personId: Int, car: Car)
val cars: Dataset[PersonCar] = spark.createDataset(List(
PersonCar(1, new Car("Tesla Model S"))
))
Encoders?
UnsupportedOperationException: No Encoder found for
Car
- field (class: "Car", name: "car")
- root class: "PersonCar"
case class PersonCar(personId: Int, car: Car)
val cars: Dataset[PersonCar] = spark.createDataset(List(
PersonCar(1, new Car("Tesla Model S"))
))
Runtime
Frameless to the rescue!
Frameless
● Wraps the Spark API
● Type-safe non-lambda methods
● No run-time performance differences
● Provides a way to define custom encoders
● Actions are also lazy
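A sketch of the setup, assuming sbt; the version string is a placeholder to match against your Spark release:

// build.sbt
libraryDependencies += "org.typelevel" %% "frameless-dataset" % "<version>"
// in code: import frameless.syntax._ (should provide the .typed extension used below)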
Typed Datasets
val peopleFL: TypedDataset[Person] = people.typed
val names: TypedDataset[String] = peopleFL.select(peopleFL('namee))
Typed Datasets
No column Symbol with
shapeless.tag.Tagged[String("namee")] of type A in
Person
Compile time
val peopleFL: TypedDataset[Person] = people.typed
val names: TypedDataset[String] = peopleFL.select(peopleFL('namee))
Column operations are also supported
scala> val agesDivided = peopleFL.select(peopleFL('age)/2)
agesDivided: TypedDataset[Double]
Column operations are also supported
scala> val agesDivided = peopleFL.select(peopleFL('age)/2)
agesDivided: TypedDataset[Double]
val intToString = (x: Int) => x.toString
val udf = peopleFL.makeUDF(intToString)
scala> val result = peopleFL.select(udf(peopleFL('age)))
result: TypedDataset[String]
Aggregations
case class AvgAge(name: String, age: Double)
val ageByName: TypedDataset[AvgAge] = {
peopleFL.groupBy(peopleFL('name)).agg(avg(peopleFL('age)))
}.as[AvgAge]
Custom type encoders: Injection
sealed trait Gender
case object Female extends Gender
case object Male extends Gender
case object Other extends Gender
case class PersonGender(id: Int, gender: Gender)
TypedDataset.create(peopleGender)
Custom encoders: Injection
sealed trait Gender
case object Female extends Gender
case object Male extends Gender
case object Other extends Gender
case class PersonGender(id: Int, gender: Gender)
TypedDataset.create(peopleGender)
Cannot find implicit value for value encoder
Compile time
Custom encoders: Injection
implicit val genderToInt: Injection[Gender, Int] = Injection(
{
case Female => 1; case Male => 2; case Other => 3
},{
case 1 => Female; case 2 => Male; case 3 => Other
}
)
scala> TypedDataset.create(peopleGender)
res0: TypedDataset[PersonGender] = [id: int, gender: int]
Lazy actions
val numPeopleJob: Job[Long] = people.count().withDescription("...")
val num: Long = numPeopleJob.run()
Lazy actions
val numPeopleJob: Job[Long] = people.count().withDescription("...")
val num: Long = numPeopleJob.run()
val sampleJob = for {
num <- people.count()
sample <- people.take((num/10).toInt)
} yield sample
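Nothing touches the cluster until the composed job is run (a sketch; run() needs the same context as before):

val sample = sampleJob.run() // only now are both actions executed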
How?
Encoders are typeclasses
val peopleList = List(Person(1, "Miguel", 26))
val people = spark.createDataset(peopleList)
def createDataset[T : Encoder](data: Seq[T]): Dataset[T]
Encoders are typeclasses
val peopleList = List(Person(1, "Miguel", 26))
val people = spark.createDataset(peopleList)
def createDataset[T : Encoder](data: Seq[T]): Dataset[T]
// It’s the same as
def createDataset[T](data: Seq[T])(implicit encoder: Encoder[T])
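For anyone who didn't raise a hand on typeclasses, a toy sketch of the same pattern (not Spark code; all names here are made up):

trait Show[A] { def show(a: A): String }

object Show {
  // instances live in the companion object, where implicit search finds them
  implicit val intShow: Show[Int] = new Show[Int] {
    def show(a: Int): String = s"Int($a)"
  }
}

// [A : Show] desugars to an implicit parameter, just like [T : Encoder]
def describe[A: Show](a: A): String = implicitly[Show[A]].show(a)

describe(42) // "Int(42)": the compiler supplies Show.intShow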
Encoders are typeclasses
● Instances provided by SQLImplicits class
● That’s why we need import spark.implicits._ everywhere!
implicit def newSequenceEncoder[T <: Seq[_] : TypeTag]: Encoder[T] =
ExpressionEncoder() // <- Reflection at runtime!
Reflection is not our friend
class Car(name: String)
val cars = Seq(Car("Tesla"))
val ds: Dataset[Car] = spark.createDataset(cars)
Unable to find encoder for type stored in a Dataset.
Compile time
Reflection is not our friend
class Car(name: String)
val cars = Seq(new Car("Tesla"))
val ds: Dataset[Car] = spark.createDataset(cars)
val ds2: Dataset[Seq[Car]] = spark.createDataset(Seq(cars))
Dataset[Car] fails at compile time: Unable to find encoder for type stored in a Dataset.
Dataset[Seq[Car]] fails at runtime: No encoder found for Car
How different are the Frameless encoders?
def create[A](data: Seq[A])(
implicit
encoder: TypedEncoder[A],
sqlContext: SQLContext
): TypedDataset[A]
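With TypedEncoder resolved by the compiler, the earlier Car example fails before anything runs; a sketch (exact message may differ):

class Car(name: String) // not a case class: no TypedEncoder can be derived

// Does not compile, roughly:
// could not find implicit value for parameter encoder: TypedEncoder[Car]
TypedDataset.create(Seq(new Car("Tesla Model S")))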
Recursive implicit resolution!
implicit def mapEncoder[A: NotCatalystNullable, B](
implicit
encodeA: TypedEncoder[A],
encodeB: TypedEncoder[B]
): TypedEncoder[Map[A, B]]
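The same mechanism, sketched with a toy typeclass (made-up names):

trait Enc[A]
object Enc {
  implicit val intEnc: Enc[Int] = new Enc[Int] {}
  implicit val strEnc: Enc[String] = new Enc[String] {}
  // an Enc[Map[A, B]] exists only if Enc[A] and Enc[B] exist
  implicit def mapEnc[A, B](implicit a: Enc[A], b: Enc[B]): Enc[Map[A, B]] =
    new Enc[Map[A, B]] {}
}

implicitly[Enc[Map[Int, String]]]     // compiles: built from intEnc and strEnc
// implicitly[Enc[Map[Int, Boolean]]] // would not: no Enc[Boolean] instance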
How to know if our class has a column?
// We were calling people('name)
class TypedDataset[T] {
def apply[A](column: Witness.Lt[Symbol])(
implicit
exists: TypedColumn.Exists[T, column.T, A],
encoder: TypedEncoder[A]
): TypedColumn[T, A]
}
How to know if our class has a column?
// Inside object TypedColumn
trait Exists[T, K, V]
object Exists {
  implicit def deriveRecord[T, H <: HList, K, V](
    implicit
    lgen: LabelledGeneric.Aux[T, H],
    selector: Selector.Aux[H, K, V]
  ): Exists[T, K, V] = new Exists[T, K, V] {}
}
Concepts we need to understand first
● Generic programming and HList
● Literal types
● Phantom types
● Type tagging
● Dependent types
Generic programming!
HList = HNil | ::[A, H <: HList]
Generic programming!
val genericMe = 1 :: "Miguel" :: (26: Short) :: HNil
scala> :type genericMe
::[Int, ::[String, ::[Short, HNil]]]
HList = HNil | ::[A, H <: HList]
Shapeless Generic typeclass
val genericPerson = Generic[Person]
val genericMe = 1 :: "Miguel" :: (26: Short) :: HNil
scala> val me = genericPerson.from(genericMe)
me: Person = Person(1,Miguel,26)
scala> val genericMeAgain = genericPerson.to(me)
genericMeAgain: genericPerson.Repr = 1 :: Miguel :: 26 :: HNil
Literal types
● A type for each value!
● Gives the compiler power to know about values
import shapeless.syntax.singleton._ // provides .narrow (assuming shapeless 2.x)
var three = 3.narrow
three: Int(3) = 3
Literal types
scala> three+three
res8: Int = 6
scala> three = 4
<console>:38: error: type mismatch;
found : Int(4)
required: Int(3)
trait Increasable
def inc(x: Int with Increasable) = x + 1

scala> inc(3.asInstanceOf[Int with Increasable])
res0: Int = 4

scala> inc(3)
error: type mismatch; found: Int(3); required: Int with Increasable
Phantom types and type tagging
● Phantom type: no runtime behaviour
● Type tagging: assign a phantom type to other types
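Shapeless packages the unchecked cast from the previous slide behind its tag helper (its tagged type is Int with Tagged[Increasable], written Int @@ Increasable); a minimal sketch:

import shapeless.tag
import shapeless.tag.@@

trait Increasable // phantom: no values, no runtime footprint

val three: Int @@ Increasable = tag[Increasable](3)
def inc(x: Int @@ Increasable): Int = x + 1

inc(three) // 4
// inc(3)  // does not compile: plain Int is not tagged with Increasable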
All combined with Shapeless!
"name" ->> 1
res1: Int with KeyTag[String("name"),Int] = 1
All combined with Shapeless!
"name" ->> 1
res1: Int with KeyTag[String("name"),Int] = 1
val me = ("id" ->> 1) :: ("name" ->> "Miguel") :: ("age" ->> 26) :: HNil
::Int with KeyTag[String("id"),Int],
::String with KeyTag[String("name"),String],
::Short with KeyTag[String("age"),Short],
::HNil
LabelledGeneric
val genericPerson = LabelledGeneric[Person]
Int with KeyTag[Symbol with Tagged[String("id")], Int] ::
String with KeyTag[Symbol with Tagged[String("name")], String] ::
Short with KeyTag[Symbol with Tagged[String("age")], Short] ::
HNil
Dependent types
trait Generic[A] {
type Repr
def to(value: A): Repr
}
def getRepr[A](v: A)(gen: Generic[A]): gen.Repr = gen.to(v)
// Is it not the same as this?
def getRepr[A, R](v: A)(gen: Generic2[A, R]): R = ???
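Not quite: with two parameters, every signature must name R explicitly, while the dependent version lets the compiler compute it from the instance it finds. The Aux alias (a common shapeless convention) bridges the two styles; a sketch:

object Generic {
  // conventional alias exposing the type member as a parameter
  type Aux[A, R] = Generic[A] { type Repr = R }
}

// dependent: the result type gen.Repr is computed, never named by the caller
def getRepr[A](v: A)(implicit gen: Generic[A]): gen.Repr = gen.to(v)

// parameterized: works, but R is an extra unknown to thread through
// every intermediate signature
def getRepr2[A, R](v: A)(implicit gen: Generic.Aux[A, R]): R = gen.to(v)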
Shapeless Witness
trait Witness {
type T
val value: T
}
def getField[A,K,V](value: A with KeyTag[K,V])
(implicit witness: Witness.Aux[K]) = witness.value
// Aux[K] = Witness { type T = K }
scala> getField("name" ->> 1)
res0: String("name") = name
Shapeless Witness
Witness.Aux[A] = Witness { type T = A }
scala> val witness = Witness('name)
witness: Witness.Aux[Symbol with Tagged[String("name")]]
Witness.Lt[A] = Witness { type T <: A }
// Tagged Symbol is a subtype of Symbol. So previous line is also...
witness: Witness.Lt[Symbol]
Back to Frameless
// We were calling people('name)
class TypedDataset[T] {
def apply[A](column: Witness.Lt[Symbol])(
implicit
exists: TypedColumn.Exists[T, column.T, A],
encoder: TypedEncoder[A]
): TypedColumn[T, A]
}
Back to Frameless
// Inside object TypedColumn
trait Exists[T, K, V]
object Exists {
  implicit def deriveRecord[T, H <: HList, K, V](
    implicit
    lgen: LabelledGeneric.Aux[T, H],
    selector: Selector.Aux[H, K, V]
  ): Exists[T, K, V] = new Exists[T, K, V] {}
}
To use it or not to use it
Pros:
● Type-safe with the same performance
● Injections for custom types
● Lazy jobs with descriptions
Cons:
● Slower compilation
● Not yet stable, no official Spark backward compatibility
More expressive types for Spark with Frameless
Miguel Pérez Pasalodos
@Kamugo
