- Higher kind projections

- Contravariant functors

- Monadic composition

- Streams

- Views

- Type classes

- Stacked mixins models

- Cake pattern

- Magnet pattern

- View bounds

- F-bound polymorphism

- Dataflow back pressure

- Continuation passing style

The transition from Java and Python to Scala is not that easy: it goes beyond selecting Scala for its obvious benefits:

- support for functional concepts
- ability to leverage open source libraries and frameworks if needed
- fast and distributed enough to handle large data sets

Scala was the most logical choice.

Scientific programming may very well involve different roles in a project:

- Mathematicians for formulas
- Data scientists for data processing and modeling
- Software engineers for implementation
- DevOps and performance engineers for deployment in production

In order to ease the pain, we tend to learn and adopt Scala incrementally within a development team. The problem is that you end up with an inconsistent code base with different levels of quality, and the team develops a somewhat negative attitude toward the language.

The solution is to select a list of problems or roadblocks (in our case, machine learning) and compare the solution in Scala with the ones in Java, Python ... (You sell the outcome, not the process.)

Presentation: a set of diverse Scala features or constructs for which the entire team agreed that Scala is a far better solution than Python or Java.

Here is a list of some of the features of Scala that are particularly valuable for writing scientific workflows, machine learning algorithms and complex analytics solutions.

Covariant

Contravariant

A bilinear form such as Tensor product

Inner product

n-differential forms

…

1. One Div Zero: Monads are Elephants, Part 2. J. Iry, blog, 2007. http://james-iry.blogspot.com/2007/10/monads-are-elephants-part-2.html

2. Monad Design for the Web, §7 A Review of Collections as Monads. L.G. Meredith, Artima, 2012.

3. Introduction to Machine Learning, §Nonparametric Regression: Smoothing Models. E. Alpaydin, MIT Press, 2007.

4. A Short Introduction to Learning with Kernels. B. Schölkopf, Max-Planck-Institut für Biologische Kybernetik; A. Smola, Australian National University, 2005. http://alex.smola.org/papers/2003/SchSmo03c.pdf

The purpose here is to generate and experiment with any kind of explicit kernel by defining and composing two functions, g and h.

The function h operates on each pair of features (components) of the two vectors.

The function g is the transformation of the dot product of the two vectors. The dot product is computed by applying the function h to all the elements and computing the sum.

The “dot” product K is computed by traversing the two observations (vectors of features), computing the sum and finally applying the g transform. The variable type is the type of the function g (F1 = Double => Double).

The map and flatMap transformations apply to the function g, the transformation on the inner product.

The flatMap method is implemented by creating a new Kernel and applying the transformation to only one of the components of the Kernel function: function h (in red). This “partial” monadic operation is good enough for building Kernel functions on the fly.

Polynomial functions and radial basis functions are two of the most commonly used kernel functions. Note: The source code is shown here to illustrate the fact that the implementation in any other language would be a lot messier and would not fit on any of these slides.

Note that the composed kernel function uses the h function of the last invocation in the for comprehension.

The method does not expose the functor or KF classes that wrap the components of the kernel function.
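As a rough illustration of the notes above, the following sketch wraps the two components g and h in a small class with map and flatMap, so that kernels can be composed in a for comprehension. Only the alias F1 = Double => Double comes from the notes; the alias F2, the shape of the class KF, the helper kernel and the two sample components are assumptions made for this example and are not the code shown on the slides.

```scala
object KernelSketch {
  type F1 = Double => Double            // type of g, as stated in the notes
  type F2 = (Double, Double) => Double  // assumed type of h

  // Hypothetical wrapper for the two components of a kernel function.
  final case class KF[G](g: G, h: F2) {
    // Functor: transform g, keep h.
    def map[H](f: G => H): KF[H] = KF(f(g), h)
    // Binding: the kernel returned by f wins, so a chain keeps the h of its last step.
    def flatMap[H](f: G => KF[H]): KF[H] = f(g)
  }

  // K(x, y) = g(sum_i h(x_i, y_i))
  def kernel(kf: KF[F1])(x: Seq[Double], y: Seq[Double]): Double =
    kf.g(x.zip(y).map { case (a, b) => kf.h(a, b) }.sum)

  // Polynomial kernel of degree d: h(x, y) = x * y, g(s) = (1 + s)^d
  def polynomial(d: Int): KF[F1] =
    KF[F1](s => math.pow(1.0 + s, d.toDouble), (a, b) => a * b)

  // A second, purely illustrative component: g(s) = sqrt(s)
  val root: KF[F1] = KF[F1](s => math.sqrt(s), (a, b) => a * b)

  // Monadic composition: g is sqrt applied after the polynomial g, h is taken from `root`.
  val composed: KF[F1] =
    for { g1 <- polynomial(2); g2 <- root } yield g1.andThen(g2)
}
```

For x = (1, 2) and y = (3, 4), `KernelSketch.kernel(KernelSketch.polynomial(2))(Seq(1.0, 2.0), Seq(3.0, 4.0))` evaluates (1 + 11)^2, and the composed kernel takes the square root of that value.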

Streams vs. iterators: an iterator does not allow you to dynamically select a chunk of memory, or to preserve it for future computation if necessary.

It is not uncommon to have to train a model as labeled data becomes available (online training). In this case, the size of the data set can be very large and unknown. The processing of the data would result in high memory consumption and a heavy load on the garbage collector.

Finally, traversing the entire data set (that is, allocating a large memory chunk) may not even be needed, as some computations may abort if some condition is met.

Scala’s streams can help!

Allocate a slice of the data set (memory chunk) from the heap using the take method.

Release the memory chunk back to the garbage collector through the drop method.

This example is taken from “Scala for Machine Learning”, Packt Publishing.

Most machine learning algorithms consist of minimizing a loss function or maximizing a likelihood. For each iteration (or recursion), the cumulative loss is computed and passed to the optimizer (e.g. gradient descent or a variant) to update the model parameters.

Each slice of n observations is allocated, processed by the loss function, then released back to the garbage collector.

An observation is defined as y = f(x), where x is the feature set (containing, for instance, the age, ethnicity and body temperature of a patient) and y is the label value, such as the diagnosed disease. The tail recursion allocates the next slice of STEP observations through the take method, computes the loss, nextLoss, then drops the slice. The reference is recursively redefined as the reference to the remaining stream.
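The take/drop recursion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the book: the observation type Obs, the single-parameter linear model y = w * x, and the names sliceLoss and totalLoss are all hypothetical. It also does not address the retention problem caused by a lingering reference to the head of the stream.

```scala
import scala.annotation.tailrec

object StreamLossSketch {
  // Hypothetical observation: a single feature x and its label y.
  type Obs = (Double, Double)

  // Squared-error loss of the assumed linear model y = w * x over one slice.
  def sliceLoss(w: Double, slice: Seq[Obs]): Double =
    slice.map { case (x, y) => val e = y - w * x; e * e }.sum

  // Stream is deprecated since Scala 2.13 in favour of LazyList; it is used
  // here because it is the collection the notes refer to.
  def totalLoss(data: Stream[Obs], step: Int, w: Double): Double = {
    @tailrec
    def nextLoss(rest: Stream[Obs], loss: Double): Double =
      if (rest.isEmpty) loss
      else {
        val slice = rest.take(step).toList  // allocate the next slice of `step` observations
        // drop moves past the slice: only the remaining stream is passed on
        nextLoss(rest.drop(step), loss + sliceLoss(w, slice))
      }
    nextLoss(data, 0.0)
  }
}
```

The recursion stops as soon as the remaining stream is empty, so the size of the data set does not have to be known in advance.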

The problem is that the garbage collector cannot reclaim the memory because the first reference to the stream is created outside the recursion. The solution is to declare the reference to the stream as weak, so that the chunks of memory associated with the slices/batches of observations already processed can be reclaimed.

In this case, the weak reference has been used to show Java concepts are still relevant.

A list

A stream with standard reference

A stream with weak reference

In the first scenario, and as expected, the memory for the entire data set is allocated before processing. The memory requirement for the stream with a strong reference increases each time a new slice is instantiated, because the memory block is held by the reference to the original stream. Only the stream with a weak reference guarantees that only the memory for a slice of STEP observations is needed through the entire execution.

Design patterns were introduced by the “Gang of Four” in “Design Patterns: Elements of Reusable Object-Oriented Software” some 20 years ago. The list of design patterns includes Builder, Prototype, Factory Method, Composite, Bridge and, obviously, Singleton.

Those patterns are not very convenient for weaving data transformations (these transformations being defined as classes or interfaces). This is where dependency injection, popularized by the Spring framework, comes into play.

Beyond composition and inheritance, Scala enables us to implement and chain data transformations/reductions by stacking the traits that declare these transformations or reductions.

Here is another example

Declaration of variables 𝑥 ∈ ℝ, 𝑦 ∈ ℝ

Declaration of model 𝑓(𝑥, 𝑦) = 𝑥 + 𝑦

Instantiation of variables 𝑥 = 5, 𝑦 = 7; 𝑓(5, 7) = 12
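The three steps above map naturally onto abstract values and stacked traits. The sketch below is only one possible rendering; the trait names Variables and Model and the object name are hypothetical.

```scala
object MixinModelSketch {
  // 1. Declaration of the variables x and y as abstract values
  trait Variables {
    val x: Double
    val y: Double
  }

  // 2. Declaration of the model f(x, y) = x + y, stacked on top of the variables
  trait Model { self: Variables =>
    def f: Double = x + y
  }

  // 3. Instantiation of the variables: x = 5, y = 7
  val instance = new Variables with Model {
    val x = 5.0
    val y = 7.0
  }
}
```

The model is declared without knowing the values it will be applied to, which mirrors the separation between declaration, model and instantiation in the mathematical notation.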

Note: As for the third point, deployment of tasks, it usually involves an actor-based (non-blocking) distributed architecture such as Akka or Spark. We will mention it briefly later in this presentation when introducing the mailbox back-pressure mechanism.

Once defined, the modules are woven (chained) together by making sure that the output of a module/task matches the input of the subsequent one.

Notes:

The training module can be broken down further into generative and discriminative models.

Real-world applications are significantly more complex and would include REST service, DAO to access relational database, caches….

The terms “module”, “tasks” or “computational tasks” are used interchangeably in this section.

Preprocessing of a data set is performed by a processor of type Preprocessor that is defined at run time. Therefore the preprocessor has to be declared as an abstract value.

The three preprocessors defined in the preprocessing module are the Kalman filter, the moving average (MovAv) and the discrete Fourier filter (DFTF).

The two inner classes, Kalman and DFTF, act as adapters or stubs to the actual implementations of those algorithms. This allows an implementation to exist in multiple versions. For instance, filtering.Kalman is a trait with several implementations of the algorithm (single host, distributed, using Spark…).

Such a design allows you to

Select the type of preprocessing algorithm within the Preprocessing module or namespace

Select the implementation of this particular type of algorithm/preprocessor in the filtering package
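The two selection levels listed above can be sketched as a module with an abstract preprocessor value and inner adapter classes. This is a simplified illustration, not the module shown on the slides: the moving average is used because it is short enough to implement inline, and the names PreprocessingModule, LocalMovingAverage, Identity and execute are hypothetical.

```scala
object PreprocessingSketch {
  // Hypothetical stand-in for the `filtering` package: one trait per algorithm,
  // with possibly several implementations (single host, distributed, ...).
  object filtering {
    trait MovingAverage { def smooth(xs: Vector[Double], period: Int): Vector[Double] }
    object LocalMovingAverage extends MovingAverage {
      def smooth(xs: Vector[Double], period: Int): Vector[Double] =
        xs.sliding(period).map(w => w.sum / w.size).toVector
    }
  }

  trait PreprocessingModule {
    // Selected at run time, hence declared as an abstract value.
    val preprocessor: Preprocessor

    trait Preprocessor { def execute(xs: Vector[Double]): Vector[Double] }

    // Inner class acting as an adapter to an implementation in `filtering`.
    class MovAv(period: Int, impl: filtering.MovingAverage = filtering.LocalMovingAverage)
        extends Preprocessor {
      def execute(xs: Vector[Double]): Vector[Double] = impl.smooth(xs, period)
    }

    // A second choice of algorithm, kept trivial for the sake of the example.
    class Identity extends Preprocessor {
      def execute(xs: Vector[Double]): Vector[Double] = xs
    }
  }

  // First choose the algorithm (MovAv), then its implementation (LocalMovingAverage).
  val module = new PreprocessingModule {
    val preprocessor: Preprocessor = new MovAv(2, filtering.LocalMovingAverage)
  }
}
```

Swapping `new MovAv(...)` for `new Identity` changes the algorithm without touching the code that calls `module.preprocessor.execute`.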

Modeling is therefore implemented as a stack of 3 traits, each representing a transformation or reduction on data sets.

In this simple case, a clustering task is triggered if anomalyDetection is needed, training a model is launched otherwise.

Such conditional execution paths are important for complex analyses or lengthy computations that require unattended execution (e.g. overnight or over the weekend).

Note: The overrides of the abstract values for the Modeling workflow are omitted here for the sake of clarity.
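A compact sketch of the two workflows and of the conditional path may help here. It assumes that each task is reduced to a plain function held in an abstract value; the type alias TS, the function types, the method run and the toy transformations (absolute value, distinct, mean) are all placeholders rather than the classes used in the presentation.

```scala
object WorkflowSketch {
  type TS = Vector[Double]

  // One trait per task; each task is an abstract value resolved at instantiation.
  trait Preprocessing { val preprocessor: TS => TS }
  trait Reducing      { val reducer: TS => TS }
  trait Training      { val supervisor: TS => Double }
  trait Validating    { val validator: Double => Boolean }

  // Clustering workflow = preprocessing -> reducing
  trait Clustering extends Preprocessing with Reducing {
    def exec(xs: TS): Option[TS] =
      if (xs.isEmpty) None else Some(reducer(preprocessor(xs)))
  }

  // Modeling workflow = preprocessing -> training -> validation (a stack of 3 traits)
  trait Modeling extends Preprocessing with Training with Validating {
    def exec(xs: TS): Option[Double] =
      if (xs.isEmpty) None
      else Some(supervisor(preprocessor(xs))).filter(validator)
  }

  // Conditional path: clustering for anomaly detection, model training otherwise.
  def run(xs: TS, anomalyDetection: Boolean): Option[Either[TS, Double]] =
    if (anomalyDetection) {
      val clustering = new Clustering {
        val preprocessor: TS => TS = _.map(x => math.abs(x))
        val reducer: TS => TS = _.distinct
      }
      clustering.exec(xs).map(ts => Left[TS, Double](ts))
    } else {
      val modeling = new Modeling {
        val preprocessor: TS => TS = _.map(x => math.abs(x))
        val supervisor: TS => Double = ts => ts.sum / ts.size
        val validator: Double => Boolean = _ >= 0.0
      }
      modeling.exec(xs).map(m => Right[TS, Double](m))
    }
}
```

The compiler rejects an instantiation that leaves one of the abstract values undefined, which is what makes the stacking safe.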

Summary: This factory pattern operates on three levels of componentization, with dynamic selection of:

1- Workflow or sequence of tasks according to the objective of the computation (e.g. Clustering => Preprocessing)

2- Task processing algorithm according to the data (e.g. Preprocessing => Kalman filter)

3- Implementation of task processing according to the environment (e.g. Kalman filter => Implementation on Apache Spark)

A strategy to control the flow (or back-pressure flow) is needed to regulate the data flow across all modules.

This example uses a back-pressure handling mechanism that consists of monitoring bounded mailboxes. This is a simplistic approach to flow control, described for the sake of illustrating the concept. As we will see later, there is a far more effective mechanism to deal with back-pressure.

Scala/Akka actors are resilient because they are defined within a hierarchical context in which an actor becomes a supervisor of other actors. In this slide, a router is a supervising actor for the workers and, depending on the selected strategy, is responsible for restarting a worker in case of failure.

But, what about the case for which the load (number of messages in the

Upon receiving the message ‘Compute’, the workers process data given a transformation function. The workers return the processed data through a Completed message.

The purpose of the watchdog actor, ‘Watcher’, is to monitor the utilization of the mailboxes and report it to the Controller.

This is a simple feedback control:

1- The watcher monitors the utilization of the mailboxes (average length)

2- The controller adjusts the size of each batch in the load message handler (throttling)

3- The workers process the next batch

The load on a worker depends on three variables

1- The amount of data to process

2- The complexity of the data transformation

3- The underlying system (cores, memory..)

The controller provides the slice of data to be processed by the workers, msg.xt, as well as the data transformation, msg.fct.

As far as the configuration is concerned, the Controller generates a list of workers, the bounded mailboxes (msgQueues) for the workers and, ultimately, the watcher actor.

The worker and watcher are created within the Controller context using the Akka actorOf factory method.

The controller:

- loads, partitions and distributes batches of data points to the worker actors (message: Load)

- processes the results of the computation in the workers (message: Completed)

- throttles the flow up or down upon receiving the status on the utilization of the mailboxes from the watcher (message: Status)

The composition of the messages processed by the controller is self-explanatory. The controller:

- adjusts the size of the next batch if required (throttle method)

- extracts the next batch of data from the input stream

- partitions and distributes the batch across the worker actors

- sends the partition along with the data transformation to the workers

The complexity of the data transformation has an impact on the load on the workers. It varies from 0 (simple map operation) to 2 (complex data processing involving recursion or iterations).

The throttle intensity ranges between -6 (rapid decrease of size of batches) and +6 (rapid increase of size of batches of data)
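To make the throttling step concrete, here is a deliberately simple sketch of a throttle function that does not depend on Akka. The Watermark class, the halving/doubling policy and the size limits are assumptions chosen for illustration; the presentation's own throttle uses an intensity scale that is not reproduced here.

```scala
object ThrottleSketch {
  // Hypothetical watermarks, expressed as fractions of the mailbox capacity.
  final case class Watermark(low: Double, high: Double)

  // Assumed policy: halve the batch above the high watermark,
  // double it below the low watermark, otherwise leave it unchanged.
  def throttle(batchSize: Int, load: Double, wm: Watermark,
               minSize: Int = 8, maxSize: Int = 4096): Int =
    if (load > wm.high) math.max(minSize, batchSize / 2)
    else if (load < wm.low) math.min(maxSize, batchSize * 2)
    else batchSize
}
```

In an actor-based setting, a function of this kind would be called by the controller each time a Status message reports the current mailbox utilization.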

The top graph displays the actual utilization of the mail boxes with capacity of 512 messages as regulated by the feedback control loop (executed by the controller).

The feedback control loop could be smoothed with a moving average technique or Kalman filter to avoid erratic behavior

We would need to provide a larger range of options for control actions besides adjusting the size of data batches: increasing the number of workers, the mailbox capacity, the caching strategy, ... A fine-grained set of actions also reduces the risk of an unstable system.

The watchdog should be able to handle dead letters in case of failure (mailbox overflowing).

Reactive streams control the back-pressure flow at the TCP connection level. This is far more accurate and responsive than monitoring mailbox utilization.

- 1. Advanced Functional Programming in Scala Patrick Nicolas Oct 2013 Rev. July 2015 patricknicolas.blogspot.com www.slideshare.net/pnicolas
- 2. This is an overview of some interesting advanced features of Scala. It is not meant to be a tutorial and assumes that you are familiar with the key constructs of the language. Some of the examples are extracted from Scala for Machine Learning – Packt Publishing
- 3. Scala has a lot of features ….. Actors Composed futures F-bound Reactive Advanced functional programming? ... among them
- 4. Higher kind projection Contravariant functors Monadic composition Streams Views Type classes Stacked mixins models Cake pattern Magnet pattern View bounds F-bound polymorphism Dataflow back pressure Continuation passing style
- 5. Higher kind projection Functors and monads are defined as single-type higher kinds: M[_]. The problem is to define monadic composition for objects that belong to categories with two or more types, M[_, _] (e.g. Function1[U, V]). Scala supports functorial and monadic operations for multi-type categories using higher kind type projection.
- 6. Higher kind projection Let us consider a covariant functor F that applies a morphism f within a category C, defined as $\forall a, b \in C,\ f: a \to b,\quad F(a \to b) = F(a) \to F(b)$. The definition of a functor in Scala relies on a single type higher kind M. (*) Functors are important concepts in algebraic topology, used for example in defining an algebra for tensors.
- 7. Higher kind projection How can we define a functor for classes that have multiple parameterized types? Let’s consider the definition of a tensor using Scala Function1. The covariant CoVector (resp. contravariant Vector) vectors are created through a projection onto the covariant (resp. contravariant) parameterized type T of Function1.
- 8. Higher kind projection The implementation of the functor for the Vector type uses the projection of the higher kind Function1 to its covariant component by accessing # the inner type Vector of Tensor The map applies covariant composition, compose of Function1
- 10. Contravariant functors Some categories of objects such as covariant tensors or function parameterized on the input or contravariant type (i.e. T => Function1[T, U] for a given type U), require the order of morphisms be reversed. Morphisms on contravariant argument type are transported through contravariant functors.
- 11. Contravariant functors Let us consider a contravariant functor F that applies a morphism f within a category C, defined as $\forall a, b \in C,\ f: a \to b,\quad F(a \to b) = F(b) \to F(a)$. The definition of a contravariant functor in Scala relies on a single type higher kind M.
- 12. Contravariant functors The implementation of the contravariant functor for the CoVector type uses the projection of the higher kind Function1 to its covariant component by accessing # the inner type CoVector of Tensor The map applies covariant composition, andThen of Function1
- 14. Monadic composition It is quite common to compose functions, methods or data transformations, iteratively or recursively. Monads extend the concept of functor to support the composition (or chaining) of computations.
- 15. Monadic composition Monads are abstract structures in algebraic topology related to category theory. A category C is a structure which has ● objects {a, b, c...} ● morphisms or maps on objects f: a->b ● composition of morphisms f: a->b, g: b->c => g o f: a->c Monads enable the “monadic” composition or chaining of functions or computations on a single type argument.
- 16. Monadic composition Let’s consider the definition of a kernel function Kf as the composition of 2 functions g o h: $\mathcal{K}_f(\mathbf{x}, \mathbf{y}) = g\left(\sum_i h(x_i, y_i)\right)$. We create a monad to generate any kind of kernel function Kf by composing their components g: g1 o g2 o … o gn o h
- 17. A monad extends a functor with binding method (flatMap) The monadic definition of the kernel function component h Monadic composition
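- 18. Monadic composition Example of kernel functions. Radial basis function kernel: $\mathcal{K}(\mathbf{x}, \mathbf{y}) = e^{-\frac{1}{2}\left(\frac{\|\mathbf{x}-\mathbf{y}\|}{\sigma}\right)^2}$, with $h: (x, y) \to x - y$ and $g: x \to e^{-\frac{1}{2\sigma^2}x^2}$. Polynomial kernel: $\mathcal{K}(\mathbf{x}, \mathbf{y}) = (1 + \mathbf{x} \cdot \mathbf{y})^d$, with $h: (x, y) \to x \cdot y$ and $g: x \to (1 + x)^d$.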
- 19. Monadic composition The monadic composition consists of chaining the flatMap invocations on the functor, map, that preserves morphisms on kernel functions. The for comprehension is syntactic sugar for the iterative monadic composition.
- 21. Streams Streams reduce memory consumption by allocating and releasing chunk of data (or slice or time series) while allowing reuse of intermediate results. Some problems lend themselves to process very large data sets of unknown size for which the execution may have to be aborted or re-applied
- 22. Streams The large data set is converted into a stream, then broken down into manageable slices. The slices are instantiated, processed (e.g. by a loss function) and released back to the garbage collector, one at a time. [Diagram: a data stream X0, X1, … Xn, … Xm on the heap; a slice Xi is allocated with .take, traversed by the loss function $\frac{1}{2m}\sum_n \left(y_n - f(\mathbf{w}|x_n)\right)^2 + \lambda \|\mathbf{w}\|^2$, then released to the garbage collector with .drop]
- 23. Views and streams Slices of NOBS observations are allocated one at a time (take), processed, then released (drop).
- 24. The reference streamRef has to be weak, in order to have the slices garbage collected. Otherwise the memory consumption increases with each new batch of data. (*) Alternatives: define strmRef as a def or use StreamIterator Views and streams
- 25. Comparing list, stream and stream with weak references. Views and streams Operating zone
- 27. Views Scientific computations require chaining complex data transformations on large data set. There is not always a need to process all elements of the dataset. Scala allows the creation of a view on collections that are the result of a data transformation. The elements are instantiated only once needed.
- 28. Views Accessing an element of the list requires allocating the entire list in memory. Accessing an element of the view requires allocating only this element in memory.
- 30. Type classes Scala library classes cannot always be sub-classed. Wrapping a library component in a helper class clutters the design. Type classes extend class functionality without cluttering name spaces (alternative to type classes). The purpose of reusability goes beyond refactoring code. It includes leveraging existing, well-understood concepts and semantics.
- 31. Type classes Let’s consider the definition of a tensor as being either a vector or a covector. Let’s extend the concept of tensor: a metric is computed as the inner product or composition of a covector and a vector. The computation is implemented by the method Metric.apply
- 32. Type classes The inner object Metric defines the implicit conversion
- 34. Stacked mixins models Scala stacked traits and abstract values preserve the core formalism of mathematical expressions. Traditional programming languages compare unfavorably to scientific languages such as R because of their inability to follow a strict mathematical formalism: 1. Variable declaration 2. Model definition 3. Instantiation
- 35. Stacked mixins models Declaration: $f \in \mathbb{R}^n \to \mathbb{R}^n$, $g \in \mathbb{R}^n \to \mathbb{R}$. Model: $h = g \circ f$. Instantiation: $f(x) = e^x$, $g(\mathbf{x}) = \sum_i x_i$.
- 37. Stacked mixins models Building machine learning apps requires configurable, dynamic workflows. Leverage mixins, inheritance and abstract values to create models and weave data transformations. Factory design patterns have been used to model dynamic systems (GoF). Dependency injection has gained popularity for creating configurable systems (e.g. Spring framework).
- 38. Stacked mixins models Multiple models and algorithms are typically evaluated by weaving computation tasks. A learning platform is a framework that • Defines computational tasks • Wires the tasks (data flow) • Deploys the tasks (*) Overcomes the limitations of monadic composition (3 levels of dynamic binding…) (*) Actor-based deployment
- 39. Even the simplest workflow, defined as a pipeline of data transformations requires a flexible design … Stacked mixins models
- 40. Stacked mixins models Summary of the 3 configurability layers of Cake pattern 1. Given the objective of the computation, select the best sequence of module/tasks (i.e. Modeling: Preprocessing + Training + Validating) 2. Given the profile of data input, select the best data transformation for each module (i.e. Data preprocessing: Kalman, DFT, Moving average….) 3. Given the computing platform, select the best implementation for each data transformation (i.e. Kalman: KalmanOnAkka, Spark…)
- 41. Implementation of Preprocessing module Stacked mixins models
- 42. Implementation of Preprocessing module using discrete Fourier … and discrete Kalman filter Stacked mixins models
- 43. d d Preprocessing Loading Reducing Training Validating Preprocessor DFTFilter Kalman EM PCA SVM MLP Reducer Supervisor Clustering Clustering workflow = preprocessing task -> Reducing task Modeling workflow = preprocessing task -> model training task -> model validation Modeling Stacked mixins models
- 44. Stacked mixins models A simple clustering workflow requires a preprocessor and a reducer. The computation sequence exec transforms a time series of elements of type U and returns a time series of type W as an option.
- 45. A model is created by processing the original time series of type TS[T] through a preprocessor, a training supervisor and a validator Stacked mixins models
- 46. Putting all together for a conditional path execution … Stacked mixins models
- 48. Magnet pattern Method overloading in Scala has limitations: • Type erasure in the JVM causes collisions between the argument types of overloaded methods • Overloaded methods cannot be lifted into a function • Code may be unnecessarily duplicated The magnet pattern overcomes these limitations by encapsulating the return type and redefining the overloaded methods as implicit functions.
- 49. Magnet pattern Let’s consider the following three incarnations of the method test These methods have different return types. The first and last methods conflict because of type erasure on T => List[Double]
- 50. Magnet pattern Step 1: Define generic return type and constructor Step 2: Implement the test methods as implicits
- 51. Magnet pattern Step 3: Implement the lifted function test as follows The first call invokes the implicit fromTN and the second triggers the implicit fromT. The return type is inferred from the type of argument
- 53. View bound A context bound cannot be used to bind the parameterized type of a generic class to a primitive type. Scala view bounds allow developers to create classes with parameterized types associated with a Scala or Java primitive type.
- 54. View bound Let’s consider a class whose parameterized type can be manipulated as a Float. A context bound is not permissible. Constraining the type with an upper bound Float does not work, as Float is a final class.
- 55. View bound The solution is to bind the class type to a Float using an implicit conversion (or view) The <% directive is the short notation for
- 57. F-Bound polymorphism F-Bound polymorphism is a parametric type polymorphism that constrains the subtypes to themselves using bounds. It is important to write code that catches errors at compile time. How can we enforce type integrity in subclasses?
- 58. F-Bound polymorphism Let’s create a trait that defines a discriminative learning model with a method to manipulate data. The classes Svm and Mlp implement the Discriminative trait. The problem is that nothing prevents the creation of a class Nnet that impersonates an Svm class.
- 59. F-Bound polymorphism One solution is to restrict (or bound) the type to a Discriminative class. It prevents a new class from inserting itself into the hierarchy... but does not guarantee the type integrity of existing classes.
- 60. F-Bound polymorphism The self reference guarantees the integrity of each existing and new subclass. F-Bound polymorphism is a self-referenced bound polymorphism.
- 62. Data flow back pressure A data flow control mechanism handling back pressure on bounded mail boxes of upstream actors. Scala actors provide a reliable way to deploy workflows on a distributed environment. However, some nodes may experience slow processing and create performance bottlenecks.
- 63. Data flow back pressure Actor-based workflow has to consider - Cascading failures => supervision strategy - Cascading bottleneck => Mailbox back-pressure strategy Workers Router, Dispatcher, …
- 64. Messages passing scheme to process various data streams with transformations. Dataset Workers Controller Watcher Load-> Compute-> Bounded mailboxes <- GetStatus Status -> Completed-> Data flow back pressure
- 65. Data flow back pressure Worker actors process the data chunk msg.xt sent by the controller with the transformation msg.fct. Message sent by collector to trigger computation.
- 66. Data flow back pressure The watcher actor monitors the message queues and reports to the collector with a Status message. The GetStatus message sent by the collector has no payload.
- 67. Controller creates the workers, bounded mailbox for each worker actor (msgQueues) and the watcher actor. Data flow back pressure
- 68. Data flow back pressure The Controller loads the data sets per chunk upon receiving the message Load from the main program. It processes the results of the computation from the workers (Completed) and throttles the input to the workers for each Status message.
- 69. Data flow back pressure The Load message is implemented as a loop that creates data chunks whose size is adjusted according to the load computed by the watcher and forwarded to the controller in the Status message.
- 70. Simple throttle increases/decreases size of the batch of observations given the current load and specified watermark. Data flow back pressure Selecting faster/slower and less/more accurate version of algorithm can also be used in the regulation strategy
- 71. Feedback control loop adjusts the size of the batches given the load in mail boxes and complexity of the computation Data flow back pressure
- 72. Data flow back pressure Notes • The feedback control loop should be smoothed (moving average, Kalman…) • A larger variety of data flow control actions is needed, such as adding more workers, increasing queue capacity, … • The watchdog should handle dead letters in case of a failure of the feedback control or the workers. • Reactive streams, introduced in Akka 2.2+, have a sophisticated TCP-based propagation and back pressure control flow
- 74. Delimited continuation Continuation Passing Style (CPS) is a technique that abstracts a computation unit as a data structure in order to control the state of a computer program, workflow or sequence of data transformations. Continuations are used to ‘jump’ to a method that produces a call to the current method. They can be regarded as a ‘functional GOTO’.
- 75. Delimited continuation A data transformation (or computation unit) can be extended (continued) with another transformation known as a continuation. The continuation is provided as an argument of the original transformation. Let’s consider the following workflow. The first workflow is not a continuation, the second is.
- 76. Delimited continuation A delimited continuation is a section of the workflow that is reified into a function returning a value. This technique relies on control delimiters (shift/reset) to make the continuation composable and reusable.
- 77. More Scala nuggets… • Domain specific language • Reactive streams • Back-pressure strategy using connection state Wait a minute, there is more…..
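
Going back to the slides on continuations, the difference between a direct-style workflow and one written in continuation-passing style can be sketched without the shift/reset delimiters (which require the Scala continuations plugin and are not used here). The function names normalize, total, normalizeK and totalK are hypothetical and only serve the illustration.

```scala
object CpsSketch {
  // Direct style: each transformation returns its result to the caller.
  def normalize(xs: Vector[Double]): Vector[Double] = {
    val max = xs.max
    xs.map(_ / max)
  }
  def total(xs: Vector[Double]): Double = xs.sum

  // Continuation-passing style: the rest of the workflow is passed as the argument k.
  def normalizeK[R](xs: Vector[Double])(k: Vector[Double] => R): R = k(normalize(xs))
  def totalK[R](xs: Vector[Double])(k: Double => R): R = k(total(xs))

  val data = Vector(1.0, 2.0, 4.0)

  // The first workflow is not a continuation, the second is.
  val direct: Double = total(normalize(data))
  val cps: Double = normalizeK(data)(ys => totalK(ys)(s => s))
}
```

Both workflows compute the same value; in the second one each step decides when, and whether, to invoke what comes next.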

A partial function is a function which is defined for only some of its possible inputs.

A partially applied function is a function of reduced arity that is the result of applying a subset of the required arguments. (e.g. from currying)