# Advanced Functional Programming in Scala

This is an introduction to some of the advanced concepts in functional programming using Scala:
- Higher kind projections
- Contravariant functors
- Streams
- Views
- Type classes
- Stacked mixins models
- Cake pattern
- Magnet pattern
- View bounds
- F-bound polymorphism
- Dataflow back pressure
- Continuation passing style

### Advanced Functional Programming in Scala

1. 1. Advanced Functional Programming in Scala Patrick Nicolas Oct 2013 Rev. July 2015 patricknicolas.blogspot.com www.slideshare.net/pnicolas
2. 2. This is an overview of some interesting advanced features of Scala. It is not meant to be a tutorial and assumes that you are familiar with the key constructs of the language. Some of the examples are extracted from Scala for Machine Learning (Packt Publishing).
3. 3. Scala has a lot of features: actors, composed futures, F-bound polymorphism, reactive programming, ... Advanced functional programming is among them.
4. 4. Higher kind projection, contravariant functors, monadic composition, streams, views, type classes, stacked mixins models, cake pattern, magnet pattern, view bounds, F-bound polymorphism, dataflow back pressure, continuation passing style
5. 5. Higher kind projection: Functors and monads are defined for higher kinds with a single type parameter, M[_]. The problem is to define monadic composition for objects that belong to categories with two or more type parameters, M[_, _] (e.g. Function1[U, V]). Scala supports functorial and monadic operations for multi-type categories using higher kind type projection.
6. 6. Higher kind projection: Let us consider a covariant functor F that applies a morphism f within a category C, defined as $\forall a, b \in C,\ f: a \to b,\quad F(a \to b) = F(a) \to F(b)$. The definition of a functor in Scala relies on a single type higher kind M. (*) Functors are an important concept in algebraic topology, used for example in defining an algebra for tensors.
7. 7. Higher kind projection: How can we define a functor for classes that have multiple parameterized types? Let’s consider the definition of a tensor using Scala Function1. The covariant vectors CoVector (resp. contravariant vectors Vector) are created through a projection onto the covariant (resp. contravariant) parameterized type T of Function1.
8. 8. Higher kind projection: The implementation of the functor for the Vector type uses the projection of the higher kind Function1 to its covariant component, accessing the inner type Vector of Tensor with the # operator. The map method applies covariant composition, compose of Function1.
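
The slide's code is not part of this transcript, so the following is only a minimal sketch of the idea, assuming Scala 2.13. The shapes of `Functor` and `Tensor`, and the name `vectorFunctor`, are assumptions made for illustration; only `Tensor`, `Vector`, `CoVector`, `Function1` and `compose` come from the slides.

```scala
object HigherKindProjection {
  // Functor over a higher kind with a single type parameter.
  trait Functor[M[_]] {
    def map[U, V](m: M[U])(f: U => V): M[V]
  }

  // Assumed shape of Tensor: each inner type fixes one parameter of Function1.
  trait Tensor[T] {
    type Vector[U] = Function1[T, U]   // free parameter in the output (covariant) position
    type CoVector[U] = Function1[U, T] // free parameter in the input (contravariant) position
  }

  // Tensor[T]#Vector projects the two-parameter Function1 onto a
  // single-parameter kind, so it fits the M[_] slot of Functor.
  def vectorFunctor[T]: Functor[Tensor[T]#Vector] =
    new Functor[Tensor[T]#Vector] {
      def map[U, V](vu: T => U)(f: U => V): T => V = f.compose(vu)
    }
}
```

For instance, `vectorFunctor[String].map[Int, Int](_.length)(_ * 2)` yields a `String => Int` that doubles the length of its argument.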
10. 10. Contravariant functors: Some categories of objects, such as covariant tensors or functions parameterized on the input (contravariant) type (i.e. T => Function1[T, U] for a given type U), require that the order of morphisms be reversed. Morphisms on a contravariant argument type are transported through contravariant functors.
11. 11. Contravariant functors: Let us consider a contravariant functor F that applies a morphism f within a category C, defined as $\forall a, b \in C,\ f: a \to b,\quad F(a \to b) = F(b) \to F(a)$. The definition of a contravariant functor in Scala relies on a single type higher kind M.
12. 12. Contravariant functors: The implementation of the contravariant functor for the CoVector type uses the projection of the higher kind Function1 to its contravariant component, accessing the inner type CoVector of Tensor with the # operator. The map method applies contravariant composition, andThen of Function1.
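
A minimal sketch of such a contravariant functor, assuming Scala 2.13. The trait name `CoFunctor`, the factory `coVectorFunctor` and the exact shape of `Tensor` are assumptions; the slides only name `CoVector`, `Tensor`, `Function1` and `andThen`.

```scala
object ContravariantProjection {
  // Contravariant functor: the morphism f runs from V to U, i.e. in reverse.
  trait CoFunctor[M[_]] {
    def map[U, V](m: M[U])(f: V => U): M[V]
  }

  // Assumed shape of Tensor: CoVector fixes the output type of Function1.
  trait Tensor[T] {
    type CoVector[U] = Function1[U, T]
  }

  // Tensor[T]#CoVector projects Function1[_, T] onto a single-parameter kind.
  def coVectorFunctor[T]: CoFunctor[Tensor[T]#CoVector] =
    new CoFunctor[Tensor[T]#CoVector] {
      // First apply f: V => U, then the covector U => T.
      def map[U, V](cu: U => T)(f: V => U): V => T = f.andThen(cu)
    }
}
```

Mapping a covector `Int => Double` with a function `String => Int` produces a covector `String => Double`, which shows the reversed direction of the morphism.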
14. 14. Monadic composition: It is quite common to compose functions, methods or data transformations, iteratively or recursively. Monads extend the concept of a functor to support the composition of computations into a chain.
15. 15. Monadic composition: Monads are abstract structures in algebraic topology, related to category theory. A category C is a structure which has: objects {a, b, c, ...}; morphisms (or maps) on objects, f: a -> b; composition of morphisms, f: a -> b, g: b -> c => g o f: a -> c. Monads enable the “monadic” composition, or chaining, of functions or computations on a single type argument.
16. 16. Monadic composition: Let’s consider the definition of a kernel function $\mathcal{K}_f$ as the composition of two functions, $g \circ h$: $\mathcal{K}_f(\mathbf{x}, \mathbf{y}) = g\left(\sum_i h(x_i, y_i)\right)$. We create a monad to generate any kind of kernel function $\mathcal{K}_f$ by composing its components: $g_1 \circ g_2 \circ \dots \circ g_n \circ h$.
17. 17. Monadic composition: A monad extends a functor with a binding method (flatMap). The monadic definition of the kernel function component h
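18. 18. Monadic composition: Examples of kernel functions. Radial basis function kernel: $\mathcal{K}(\mathbf{x}, \mathbf{y}) = e^{-\frac{1}{2}\left(\frac{\lVert \mathbf{x}-\mathbf{y} \rVert}{\sigma}\right)^2}$, with $h: (x, y) \to x - y$ and $g: x \to e^{-\frac{1}{2\sigma^2} x^2}$. Polynomial kernel: $\mathcal{K}(\mathbf{x}, \mathbf{y}) = (1 + \mathbf{x} \cdot \mathbf{y})^d$, with $h: (x, y) \to x \cdot y$ and $g: x \to (1 + x)^d$.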
19. 19. Monadic composition: The monadic composition consists of chaining flatMap invocations with the functor's map, which preserves morphisms on kernel functions. The for comprehension is syntactic sugar for this iterative monadic composition.
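
As an illustration of chaining flatMap and map through a for comprehension, here is a simplified, monad-like sketch, assuming Scala 2.13. The class `KF`, the helpers `polynomial` and `squareRoot`, and the choice of composing a polynomial component with a square root are all assumptions; the deck's actual definitions are not in the transcript.

```scala
object KernelMonad {
  type F1 = Double => Double            // component g
  type F2 = (Double, Double) => Double  // component h

  // K(x, y) = g(sum_i h(x_i, y_i))
  final case class KF(g: F1, h: F2) {
    def apply(x: Seq[Double], y: Seq[Double]): Double =
      g(x.zip(y).map { case (a, b) => h(a, b) }.sum)

    // map transforms the g component and keeps h.
    def map(f: F1 => F1): KF = KF(f(g), h)
    // flatMap hands the g component to the next step of the chain.
    def flatMap(f: F1 => KF): KF = f(g)
  }

  // Polynomial kernel components: h(x, y) = x * y, g(s) = (1 + s)^d
  def polynomial(d: Int): KF = KF(s => math.pow(1.0 + s, d), _ * _)
  // Hypothetical second component, used only to have something to compose.
  val squareRoot: KF = KF(s => math.sqrt(s), _ * _)

  // Desugars to polynomial(2).flatMap(g1 => squareRoot.map(g2 => g1.andThen(g2))).
  // The h component of the result is the one of the last generator.
  val composed: KF = for {
    g1 <- polynomial(2)
    g2 <- squareRoot
  } yield g1.andThen(g2)
}
```

With x = (1, 2) and y = (3, 4), the inner sum is 11, the polynomial component gives 144 and the square root brings the result to 12.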
21. 21. Streams: Some problems require processing very large data sets of unknown size, for which the execution may have to be aborted or re-applied. Streams reduce memory consumption by allocating and releasing chunks of data (or slices, or time series) while allowing reuse of intermediate results.
22. 22. Streams: The large data set is converted into a stream, then broken down into manageable slices. The slices are instantiated, processed (e.g. by a loss function) and released back to the garbage collector, one at a time. [Figure: a data stream X0, X1, ..., Xn, ..., Xm on the heap; each slice Xi is allocated with .take, traversed to compute the loss function $\frac{1}{2m}\sum_n \left(y_n - f(\mathbf{w}|x_n)\right)^2 + \lambda \lVert \mathbf{w} \rVert^2$, then released to the garbage collector with .drop.]
23. 23. Views and streams: Slices of NOBS observations are allocated (take), processed, then released (drop), one at a time.
24. 24. Views and streams: The reference streamRef has to be weak in order to have the slices garbage collected. Otherwise, the memory consumption increases with each new batch of data. (*) Alternatives: define strmRef as a def or use StreamIterator.
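
The take/process/drop cycle can be sketched as follows, assuming Scala 2.12 or 2.13 (where `Stream` still exists; `LazyList` is the 2.13 replacement). This sketch follows the "define the stream as a def" alternative mentioned above rather than a weak reference. The names `observations` and `squaredError`, and the synthetic data, are assumptions for illustration.

```scala
import scala.annotation.tailrec

object StreamSlices {
  type Obs = (Double, Double) // (x, y)

  // A def rather than a val: no long-lived reference pins the head of the
  // stream, so slices that were already traversed can be garbage collected.
  def observations(n: Int): Stream[Obs] =
    Stream.tabulate(n)(i => (i.toDouble, 2.0 * i))

  // Sum of squared errors (y - f(x))^2, computed one slice at a time.
  @tailrec
  def squaredError(data: Stream[Obs], sliceSize: Int, f: Double => Double, acc: Double = 0.0): Double = {
    require(sliceSize > 0, "sliceSize must be positive")
    if (data.isEmpty) acc
    else {
      val slice = data.take(sliceSize)                    // allocate one slice
      val loss = slice.map { case (x, y) => val e = y - f(x); e * e }.sum
      // drop releases the slice; the tail call reuses the stack frame.
      squaredError(data.drop(sliceSize), sliceSize, f, acc + loss)
    }
  }
}
```

Whether memory is actually reclaimed also depends on the caller not keeping its own reference to the head of the stream.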
25. 25. Views and streams: Comparing list, stream and stream with weak references. [Figure: comparison chart with an "Operating zone" label.]
27. 27. Views: Scientific computations require chaining complex data transformations on large data sets. There is not always a need to process all the elements of the data set. Scala allows the creation of a view on collections that are the result of a data transformation. The elements are instantiated only when needed.
28. 28. Views Accessing an element of the list requires allocating the entire list in memory. Accessing an element of the view requires allocating only this element in memory.
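
The difference can be made visible by counting how many elements are actually transformed. This is a small sketch assuming Scala 2.12 or 2.13; the `Counter` helper and the two method names are hypothetical.

```scala
object ViewExample {
  // Mutable counter used only to observe how many elements get transformed.
  final class Counter { var n: Int = 0 }

  // Strict collection: map transforms every element before the access.
  def accessStrict(size: Int, index: Int, c: Counter): Double = {
    val xs = Vector.range(0, size).map { x => c.n += 1; math.sqrt(x.toDouble) }
    xs(index)
  }

  // View: map is deferred, only the accessed element is transformed.
  def accessView(size: Int, index: Int, c: Counter): Double = {
    val xs = Vector.range(0, size).view.map { x => c.n += 1; math.sqrt(x.toDouble) }
    xs(index)
  }
}
```

A view recomputes an element on every access, so it pays off when only a few elements of the transformed collection are read.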
30. 30. Type classes: Scala library classes cannot always be sub-classed. Wrapping a library component in a helper class clutters the design. Type classes extend the functionality of classes without cluttering name spaces (alternative to type classes). The purpose of reusability goes beyond refactoring code. It includes leveraging existing, well understood concepts and semantics.
31. 31. Type classes: Let’s consider the definition of a tensor as being either a vector or a covector. Let’s extend the concept of tensor with a metric. A metric is computed as the inner product or composition of a covector and a vector. The computation is implemented by the method Metric.apply.
32. 32. Type classes: The inner object Metric defines the implicit conversion.
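
The deck's implementation of Metric is not in the transcript. One possible reading, given here only as a sketch under assumptions (Scala 2.13), is a type class `Metric[C]` with instances in its companion object and an implicit class that adds an `inner` method to existing types. Apart from `Metric` and `apply`, every name below is hypothetical.

```scala
object TensorMetric {
  type Vec = Vector[Double]
  type CoVec = Vec => Double // a covector maps a vector to a scalar

  // Type class: how a value of type C acts on a vector.
  trait Metric[C] {
    def apply(c: C, v: Vec): Double
  }

  object Metric {
    // A covector given as a function is simply applied (composition).
    implicit val functionMetric: Metric[CoVec] = new Metric[CoVec] {
      def apply(c: CoVec, v: Vec): Double = c(v)
    }
    // A covector given by its coefficients uses the inner product.
    implicit val coefficientsMetric: Metric[Vec] = new Metric[Vec] {
      def apply(c: Vec, v: Vec): Double = c.zip(v).map { case (a, b) => a * b }.sum
    }
  }

  // Adds `inner` to any type that has a Metric instance, without subclassing.
  implicit class MetricOps[C](c: C) {
    def inner(v: Vec)(implicit m: Metric[C]): Double = m(c, v)
  }
}
```

After `import TensorMetric._`, both `Vector(1.0, 2.0).inner(Vector(3.0, 4.0))` and a function-valued covector can use the same `inner` syntax, and neither `Vector` nor `Function1` had to be wrapped or extended.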
34. 34. Stacked mixins models: Traditional programming languages compare unfavorably to science-oriented languages such as R because of their inability to follow a strict mathematical formalism: 1. variable declaration, 2. model definition, 3. instantiation. Scala stacked traits and abstract values preserve the core formalism of mathematical expressions.
35. 35. Stacked mixins models: Declaration: $f \in \mathbb{R}^n \to \mathbb{R}^n$, $g \in \mathbb{R}^n \to \mathbb{R}$. Model: $h = g \circ f$. Instantiation: $f(x) = e^x$, $g(\mathbf{x}) = \sum_i x_i$.
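
The three steps map onto traits with abstract values, a class with a self type, and an anonymous instantiation. The sketch below assumes Scala 2.13; the trait and class names `F`, `G` and `H` are chosen to mirror the formulas and are not taken from the deck.

```scala
object StackedMixins {
  type V = Vector[Double]

  // 1. Declaration: abstract values only state the signatures of f and g.
  trait F { val f: V => V }
  trait G { val g: V => Double }

  // 2. Model: h = g o f, expressed against the declarations.
  class H { self: F with G =>
    def apply(x: V): Double = g(f(x))
  }

  // 3. Instantiation: f(x) = e^x, g(x) = sum_i x_i
  val h: H with F with G = new H with F with G {
    val f: V => V = _.map(x => math.exp(x))
    val g: V => Double = _.sum
  }
}
```

Evaluating `h(Vector(0.0, 0.0))` gives 2.0, since each exponential is 1.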
37. 37. Stacked mixins models: Building machine learning apps requires configurable, dynamic workflows. Factory design patterns have been used to model dynamic systems (GoF). Dependency injection has gained popularity for creating configurable systems (e.g. Spring framework). Leverage mixins, inheritance and abstract values to create models and weave data transformations.
38. 38. Stacked mixins models: Multiple models and algorithms are typically evaluated by weaving computation tasks. A learning platform is a framework that defines computational tasks, wires the tasks (data flow) and deploys the tasks (*). It overcomes the limitations of monadic composition (3 levels of dynamic binding…). (*) Actor-based deployment.
39. 39. Stacked mixins models: Even the simplest workflow, defined as a pipeline of data transformations, requires a flexible design …
40. 40. Stacked mixins models: Summary of the 3 configurability layers of the Cake pattern: 1. Given the objective of the computation, select the best sequence of modules/tasks (e.g. Modeling: Preprocessing + Training + Validating). 2. Given the profile of the data input, select the best data transformation for each module (e.g. Data preprocessing: Kalman, DFT, moving average…). 3. Given the computing platform, select the best implementation for each data transformation (e.g. Kalman: KalmanOnAkka, Spark…).
41. 41. Implementation of Preprocessing module Stacked mixins models
42. 42. Implementation of Preprocessing module using discrete Fourier … and discrete Kalman filter Stacked mixins models
43. 43. Stacked mixins models: [Figure: modules Loading, Preprocessing, Reducing, Training and Validating; component labels Preprocessor, DFTFilter, Kalman, EM, PCA, SVM, MLP, Reducer and Supervisor; workflows Clustering and Modeling.] Clustering workflow = preprocessing task -> reducing task. Modeling workflow = preprocessing task -> model training task -> model validation.
44. 44. Stacked mixins models: A simple clustering workflow requires a preprocessor and a reducer. The computation sequence exec transforms a time series of elements of type U and returns a time series of type W as an option.
45. 45. A model is created by processing the original time series of type TS[T] through a preprocessor, a training supervisor and a validator Stacked mixins models
46. 46. Putting all together for a conditional path execution … Stacked mixins models
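
A reduced sketch of the clustering workflow wired with the cake pattern, assuming Scala 2.13. To stay short it fixes the element type to Double instead of the generic U and W, and it uses two placeholder transformations (`MovingAverage`, `TopValues`) that are assumptions, not the DFT, Kalman, EM or PCA components named in the deck.

```scala
object CakeWorkflow {
  type TS = Vector[Double] // simplified time series

  // Each module declares an abstract component and nests its implementations.
  trait PreprocessingModule {
    val preprocessor: Preprocessor
    trait Preprocessor { def execute(xt: TS): Option[TS] }
    final class MovingAverage(period: Int) extends Preprocessor {
      def execute(xt: TS): Option[TS] =
        if (period < 1 || xt.size < period) None
        else Some(xt.sliding(period).map(_.sum / period).toVector)
    }
  }

  trait ReducingModule {
    val reducer: Reducer
    trait Reducer { def execute(xt: TS): Option[TS] }
    // Placeholder reducer: keeps the n largest values.
    final class TopValues(n: Int) extends Reducer {
      def execute(xt: TS): Option[TS] =
        if (xt.isEmpty) None else Some(xt.sorted.takeRight(n))
    }
  }

  // The workflow only states which modules it needs (self type).
  class Clustering { self: PreprocessingModule with ReducingModule =>
    def exec(xt: TS): Option[TS] = preprocessor.execute(xt).flatMap(reducer.execute)
  }

  // Wiring: the concrete components are selected at instantiation.
  val workflow: Clustering with PreprocessingModule with ReducingModule =
    new Clustering with PreprocessingModule with ReducingModule {
      val preprocessor: Preprocessor = new MovingAverage(2)
      val reducer: Reducer = new TopValues(2)
    }
}
```

Swapping the preprocessor or the reducer only changes the two lines of the instantiation; the `Clustering` class itself stays untouched, and a failing step short-circuits the pipeline through `Option`.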
48. 48. Magnet pattern: Method overloading in Scala has limitations: type erasure in the JVM causes collisions between the argument types of overloaded methods; overloaded methods cannot be lifted into a function; code may be unnecessarily duplicated. The magnet pattern overcomes these limitations by encapsulating the return type and redefining the overloaded methods as implicit functions.
49. 49. Magnet pattern: Let’s consider the following three incarnations of the method test. These methods have different return types. The first and last methods conflict because of type erasure on T => List[Double].
50. 50. Magnet pattern: Step 1: Define a generic return type and constructor. Step 2: Implement the test methods as implicits.
51. 51. Magnet pattern: Step 3: Implement the lifted function test as follows. The first call invokes the implicit fromTN and the second triggers the implicit fromT. The return type is inferred from the type of the argument.
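
The three steps can be sketched as below, assuming Scala 2.13. The names `fromT` and `fromTN` come from the slides, but their argument types and bodies here (a sum, and the first n elements) are assumptions, as is the name `TestMagnet`.

```scala
import scala.language.implicitConversions

object MagnetPattern {
  // Step 1: generic return type, exposed as a type member of the magnet.
  sealed trait TestMagnet {
    type Result
    def apply(): Result
  }

  // Step 2: each former overload becomes an implicit conversion to the magnet.
  object TestMagnet {
    implicit def fromT(xs: List[Double]): TestMagnet { type Result = Double } =
      new TestMagnet {
        type Result = Double
        def apply(): Double = xs.sum
      }

    implicit def fromTN(args: (List[Double], Int)): TestMagnet { type Result = List[Double] } =
      new TestMagnet {
        type Result = List[Double]
        def apply(): List[Double] = args._1.take(args._2)
      }
  }

  // Step 3: a single method whose return type depends on the magnet.
  def test(magnet: TestMagnet): magnet.Result = magnet()
}
```

A call such as `test((List(1.0, 2.0, 3.0), 2))` selects `fromTN` and is typed as `List[Double]`, while `test(List(1.0, 2.0, 3.0))` selects `fromT` and is typed as `Double`. Because `test` is now one method, it can also be lifted into a function value.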
53. 53. View bound: Context bounds cannot be used to bind the parameterized type of a generic class to a primitive type. Scala view bounds allow developers to create classes with parameterized types associated with a Scala or Java primitive type.
54. 54. View bound: Let’s consider a class whose parameterized type can be manipulated as a Float. A context bound is not permissible. Constraining the type with an upper bound Float does not work, as Float is a final class.
55. 55. View bound The solution is to bind the class type to a Float using an implicit conversion (or view) The <% directive is the short notation for
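
A view bound `T <% Float` is equivalent to an extra implicit parameter of type `T => Float`. The sketch below, assuming Scala 2.13, writes that long form directly, since the `<%` shorthand is deprecated in recent Scala 2 releases. The class `Scaled` and the type `Percent` are hypothetical.

```scala
import scala.language.implicitConversions

object ViewBound {
  // Long form of: class Scaled[T <% Float](x: T)
  class Scaled[T](x: T)(implicit toFloat: T => Float) {
    def times(factor: Float): Float = toFloat(x) * factor
  }

  // Hypothetical user type that can be viewed as a Float.
  final case class Percent(value: Float)
  object Percent {
    implicit def percentToFloat(p: Percent): Float = p.value / 100f
  }
}
```

`new Scaled(2.5f)` compiles because a Float is trivially viewable as a Float, and `new Scaled(Percent(50f))` compiles because the conversion in the companion object of `Percent` is found implicitly. A type without such a view is rejected at compile time.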
57. 57. F-Bound polymorphism: It is important to write code that catches errors at compile time. How can we enforce type integrity in subclasses? F-Bound polymorphism is a parametric type polymorphism that constrains the subtypes to themselves using bounds.
58. 58. F-Bound polymorphism: Let’s create a trait that defines a discriminative learning model with methods to manipulate data. The classes Svm and Mlp implement the Discriminative trait. The problem is that nothing prevents the creation of a class Nnet that impersonates an Svm class.
59. 59. F-Bound polymorphism: One solution is to restrict (or bound) the type to a Discriminative class. It prevents a new class from inserting itself into the hierarchy ... but does not guarantee the type integrity for existing classes.
60. 60. F-Bound polymorphism: The self reference guarantees the integrity of each existing and new subclass. F-Bound polymorphism is a self-referenced bound polymorphism.
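
The combination of the bound and the self reference can be sketched as follows, assuming Scala 2.13. `Discriminative`, `Svm`, `Mlp` and `Nnet` are named in the slides; the members `weights`, `withWeights` and `scaled` are assumptions added to give the trait something to return.

```scala
object FBound {
  // The bound T <: Discriminative[T] restricts T to the hierarchy;
  // the self type `self: T =>` forces each subclass to pass itself as T.
  trait Discriminative[T <: Discriminative[T]] { self: T =>
    def weights: Vector[Double]
    def withWeights(w: Vector[Double]): T
    // Returns the concrete subtype, not the base trait.
    def scaled(factor: Double): T = withWeights(weights.map(_ * factor))
  }

  final class Svm(val weights: Vector[Double]) extends Discriminative[Svm] {
    def withWeights(w: Vector[Double]): Svm = new Svm(w)
  }

  final class Mlp(val weights: Vector[Double]) extends Discriminative[Mlp] {
    def withWeights(w: Vector[Double]): Mlp = new Mlp(w)
  }

  // Rejected by the compiler: the self type of Nnet does not conform to Svm.
  // final class Nnet(val weights: Vector[Double]) extends Discriminative[Svm] {
  //   def withWeights(w: Vector[Double]): Svm = new Svm(w)
  // }
}
```

Without the self type, the commented-out `Nnet` would compile, because `Svm` satisfies the bound on its own.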
62. 62. Data flow back pressure: Scala actors provide a reliable way to deploy workflows on a distributed environment. However, some nodes may experience slow processing and create performance bottlenecks. The remedy is a data flow control mechanism that handles back pressure on the bounded mailboxes of upstream actors.
63. 63. Data flow back pressure: An actor-based workflow has to consider cascading failures (=> supervision strategy) and cascading bottlenecks (=> mailbox back-pressure strategy). [Figure labels: Workers; Router, Dispatcher, …]
64. 64. Data flow back pressure: Message passing scheme to process various data streams with transformations. [Figure: actors Controller, Workers and Watcher operating on a Dataset with bounded mailboxes; messages Load ->, Compute ->, <- GetStatus, Status ->, Completed ->.]
65. 65. Data flow back pressure: Worker actors process the data chunk msg.xt sent by the controller with the transformation msg.fct. The message is sent by the collector to trigger the computation.
66. 66. Data flow back pressure: The Watcher actor monitors the message queues and reports to the collector with a Status message. The GetStatus message sent by the collector has no payload.
67. 67. Data flow back pressure: The Controller creates the workers, a bounded mailbox for each worker actor (msgQueues) and the watcher actor.
68. 68. Data flow back pressure: The Controller loads the data sets per chunk upon receiving the message Load from the main program. It processes the results of the computation from the workers (Completed) and throttles the input to the workers for each Status message.
69. 69. Data flow back pressure: The Load message is implemented as a loop that creates data chunks whose size is adjusted according to the load computed by the watcher and forwarded to the controller (Status).
70. 70. Data flow back pressure: A simple throttle increases or decreases the size of the batch of observations given the current load and a specified watermark. Selecting a faster/slower and less/more accurate version of an algorithm can also be used in the regulation strategy.
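
The throttling rule itself does not depend on the actor framework and can be sketched as a pure function, assuming Scala 2.13. The use of two watermarks, the 25% step and the size limits are assumptions; the deck only states that the batch size follows the load and a watermark.

```scala
object BackPressure {
  // load: fraction of the mailbox capacity currently in use, between 0 and 1.
  final case class Throttle(
      lowWatermark: Double,
      highWatermark: Double,
      minSize: Int,
      maxSize: Int,
      step: Double = 0.25) {
    require(lowWatermark < highWatermark, "watermarks must be ordered")
    require(minSize > 0 && minSize <= maxSize, "invalid size limits")

    // Shrinks the batch above the high watermark, grows it below the low one.
    def next(batchSize: Int, load: Double): Int = {
      val adjusted =
        if (load > highWatermark) batchSize * (1.0 - step)
        else if (load < lowWatermark) batchSize * (1.0 + step)
        else batchSize.toDouble
      math.max(minSize, math.min(maxSize, adjusted.round.toInt))
    }
  }
}
```

In the workflow described above, the controller would call `next` with the load reported in each Status message before cutting the following data chunk.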
71. 71. Data flow back pressure: The feedback control loop adjusts the size of the batches given the load in the mailboxes and the complexity of the computation.
72. 72. Data flow back pressure, notes: • The feedback control loop should be smoothed (moving average, Kalman…) • A larger variety of data flow control actions is possible, such as adding more workers, increasing queue capacity, … • The watch dog should handle dead letters in case of a failure of the feedback control or the workers. • Reactive streams, introduced in Akka 2.2+, have sophisticated TCP-based propagation and back pressure control flows.
74. 74. Delimited continuation: Continuation Passing Style (CPS) is a technique that abstracts a computation unit as a data structure in order to control the state of a computer program, workflow or sequence of data transformations. Continuations are used to ‘jump’ to a method that produces a call to the current method. They can be regarded as a ‘functional GOTO’.
75. 75. Delimited continuation: A data transformation (or computation unit) can be extended (continued) with another transformation known as a continuation. The continuation is provided as an argument of the original transformation. Let’s consider the following workflow. The first workflow is not a continuation, the second is.
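
The contrast between a direct-style transformation and one that takes its continuation as an argument can be sketched in plain Scala, assuming Scala 2.13 and no compiler plugin. The functions `normalize`, `normalizeK`, `maxK` and `largestShare` are hypothetical and stand in for the workflow shown on the slide.

```scala
object Cps {
  // Direct style: the caller decides what happens with the returned value.
  def normalize(xs: Vector[Double]): Vector[Double] = {
    val total = xs.sum
    xs.map(_ / total)
  }

  // Continuation passing style: the next step k is passed in as an argument.
  def normalizeK[R](xs: Vector[Double])(k: Vector[Double] => R): R =
    k(normalize(xs))

  def maxK[R](xs: Vector[Double])(k: Double => R): R =
    k(xs.max)

  // The workflow is the nesting of continuations:
  // normalize, then take the maximum, then express it as a percentage.
  def largestShare(xs: Vector[Double]): Double =
    normalizeK(xs)(ys => maxK(ys)(m => 100.0 * m))
}
```

Since every step receives the rest of the computation as a value, a step may skip it, run it more than once or store it for later, which is what makes continuations a control-flow tool.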
76. 76. Delimited continuation A delimited continuation is a section of the workflow that is reified into a function returning a value. This technique relies on control delimiters (shift/reset) to make the continuation composable and reusable.
77. 77. Wait a minute, there is more… More Scala nuggets: • Domain specific language • Reactive streams • Back-pressure strategy using connection state