Leveraging Spire for complex time allocation logic
1. Leveraging Spire for complex
time allocation logic
Vladimir Pavkin
Scalar
April 7th, 2018
2. About the speaker
• Software engineer at Evolution Gaming
• Maintainer:
• Moment.js façade for Scala.js: https://github.com/vpavkin/scala-js-momentjs
• DTC (Datetime Type Classes): https://github.com/vpavkin/dtc
• 4 years of Scala & Scala.js
• Last 2 years: internal scheduling system development
24. Interval API
• Point membership
• Subset/superset relation of two intervals
• Intersection
• Complement & subtraction (return list of intervals)
• Interval arithmetic
26. Interval Set
IntervalTrie
• Tree
• Insanely fast
• Requires fast convertibility to Long
IntervalSeq
• Flat sorted array
• Still very fast
Fast Immutable Interval Sets, Scala World 2017
27. Calendar scheduling
Task 1:
Find a period,
when Alice, Bob and Charlie
can have a 1 hour meeting.
It must happen
between 9:00 AM and 7:00 PM
29. Calendar scheduling
type Entry = Interval[ZonedDateTime]
type Entries = IntervalSeq[ZonedDateTime]
def duration(e: Entry): Option[Duration] = e.fold {
  case (ValueBound(a), ValueBound(b)) => Some(Duration.between(a, b))
  case _ => None
}
30. Calendar scheduling
val AliceEntries: List[Entry] = List(/*...*/)
val BobEntries: List[Entry] = List(/*...*/)
val CharlieEntries: List[Entry] = List(/*...*/)
val WorkDay: Entry = Interval.closed(datetime(9, 0), datetime(19, 0))
val OneHour: Duration = Duration.ofHours(1L)
type Entry = Interval[ZonedDateTime]
type Entries = IntervalSeq[ZonedDateTime]
41. Resource allocation
• Work is required for specific periods.
• 5 cashiers needed 24/7
• 10 more cashiers needed every day 12:00 – 20:00
• 1 cleaner needed 24/7
• 2 cleaners needed every day 12:00 – 20:00
• People work for specific periods
• 50 cashiers work 2/2, 8 hour shifts
• 7 cleaners work 3/1, 8 hour shifts
42. Resource allocation
case class Coverage(requirement: Entries, work: Entries) {
  def covered: Entries = requirement & work
  def uncovered: Entries = ~work & requirement
  def unused: Entries = ~requirement & work
}
45. Your programs must be able to construct only values that are valid in your domain
46. Benefits of separate, domain compliant data structures
• Full control over what kind of intervals can be created
• Readability and ubiquitous language
• Rich models with helper methods (e.g. duration)
• Optimizations for particular scenarios
• Control over serialization
47. Domain integration concerns
• Types of intervals that make sense, e.g. what is (-∞, …) ?
• Interval bounds constraints (edge cases with adjacent periods)
• One good consistent approach is to always use [x, y)-shaped intervals
• Time values precision constraints (seconds, minutes, etc.)
• Can allow you to use a faster IntervalTrie
• Ordering and equality semantics (strict or instant-based)
• Avoid leaking abstraction (delegate to intervals under the hood)
• Total order constraint in the constructors (PR pending)
Hello everyone! I’m really excited to be here and to see you all at my talk. It’s called “Leveraging Spire…”.
So the topic of working with time intervals might not be close to every developer’s heart (though it has certainly become close to mine). Anyway, for those with a practical interest, I’ll try to provide some interesting points. And others may pick up ideas for their own area of interest, or be better prepared to face time allocation problems in the future.
… And this last project is where I got a lot of exposure to, and experience with, the problems I am talking about today.
This slide is credit to the developers who created Spire.
But before we start looking into code I’m going to fill in some gaps. Probably most of you are familiar with the topic, but for completeness I’ll just quickly glance over Interval notation.
We’ll take real number intervals as the most commonly known occurrence.
Here you see some examples of the notation with a sometimes very useful graphic representation.
Round bracket represents open (or exclusive) bounds – element at the boundary itself is not included (marked with an empty, white point)
Square bracket represents closed (or inclusive) bounds (marked with full black point)
2 special bounds: negative and positive infinity. When both are present we’re talking about the set of all real numbers
Empty interval is an inverse of that.
And there are also degenerate intervals that include only one element.
Important fundamental property of a real interval is that there’s no way to iterate all elements…
… – there’s an infinite number of them. Unless you fix some precision – and then it’s no longer a real number interval.
This is similar to what we can say about time interval – without specified precision we can’t iterate all moments.
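To make the precision point concrete, here is a minimal sketch that uses only java.time, not Spire. The helper name moments and the one-minute step are assumptions made for illustration: once a step is fixed, the moments of a bounded period [start, end) can be listed one by one.

```scala
import java.time.{Duration, LocalDateTime}

object PrecisionSketch {
  // Hypothetical helper: enumerate the moments of [start, end) at a fixed step.
  // Without a fixed step there is nothing to iterate over.
  def moments(start: LocalDateTime, end: LocalDateTime, step: Duration): List[LocalDateTime] = {
    require(!step.isNegative && !step.isZero, "step must be positive")
    Iterator.iterate(start)(_.plus(step)).takeWhile(_.isBefore(end)).toList
  }

  val nine: LocalDateTime = LocalDateTime.of(2018, 4, 7, 9, 0)

  // Five moments at minute precision: 09:00, 09:01, 09:02, 09:03, 09:04.
  val firstFiveMinutes: List[LocalDateTime] =
    moments(nine, nine.plusMinutes(5), Duration.ofMinutes(1))
}
```

Choosing a different step gives a different list, which is exactly why a time interval without a stated precision cannot be enumerated.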
Interval set is no more than a set of non-intersecting intervals, which is again – a set of numbers.
So, by intuition, a single interval is at the same time a singleton interval set.
Here are some examples. As you can see, constituents of the interval set are joined with union operation.
Intervals and Interval sets are regular sets, so they support all known set operations. Intersection and union are of the most interest to us.
Who is familiar with set union and intersection?
Another important operation is the complement (or inverse).
It’s a set that contains all elements, that were not included in the original. Here’s an example of complement of an interval set.
Last operation is to subtract one interval set from another. It can be defined as intersection with a complement. Again, it’s not specific for interval sets, it’s a common set operation.
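The identity "subtraction is intersection with a complement" can be checked on ordinary Scala sets. This is a sketch under a simplifying assumption: a small finite universe (the hours of one day) stands in for the real line, and the names universe, complement and subtract are made up for the example.

```scala
object SetOpsSketch {
  // Finite stand-in for "all values": the hours of one day.
  val universe: Set[Int] = (0 until 24).toSet

  // Everything in the universe that is not in s.
  def complement(s: Set[Int]): Set[Int] = universe -- s

  // Subtraction defined as intersection with the complement.
  // Assumes both arguments are subsets of the universe.
  def subtract(a: Set[Int], b: Set[Int]): Set[Int] = a & complement(b)

  val workHours: Set[Int] = (9 until 19).toSet
  val lunch: Set[Int] = Set(13)
  val withoutLunch: Set[Int] = subtract(workHours, lunch)
}
```

The result agrees with the built-in set difference, workHours -- lunch.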
Now that we have refreshed what we know about intervals, let's see what problems they solve.
Major purpose of intervals and interval sets is modelling uncertainty.
One data structure we know that models uncertainty is List: instead of one value you can get a list of different values.
But lists are always iterable, they model discrete uncertainty. Intervals provide means for uncertainty that is continuous, when the output value can be anything between two specified boundaries.
Fuzzy calculations is a related application of intervals.
Intervals are also a very natural instrument for modelling complex constraints.
But we can also use intervals for…
Scheduling!
And the reason we can do it is that we can use interval sets for values of any totally ordered type…
This is the signature of Interval class from Spire.
We can create lawful well-behaving interval sets as long as we have a total order defined.
As it turns out, a context bound in the constructor has its downsides, but we’ll come back to it later.
So… they want order - let’s bring them order then!
One caveat here, is that for ZonedDateTime we can have at least two different lawful total orders.
The first one requires time-zone equality for values to be equal.
The second one is more relaxed and compares only instants.
Both are fine depending on your task. It’s just a thing to be aware of.
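The difference between the two orders is easy to see with java.time alone. The sketch below uses scala.math.Ordering as a stand-in for Spire's Order type class (an assumption made to keep the example dependency-free), and the names strict and instantBased are invented here.

```scala
import java.time.{ZoneOffset, ZonedDateTime}

object ZdtOrders {
  // Strict order: ZonedDateTime.compareTo also looks at local date-time and zone,
  // so values in different zones are never equal.
  val strict: Ordering[ZonedDateTime] = new Ordering[ZonedDateTime] {
    def compare(a: ZonedDateTime, b: ZonedDateTime): Int = a.compareTo(b)
  }

  // Relaxed order: only the instant on the timeline matters.
  val instantBased: Ordering[ZonedDateTime] =
    Ordering.by((z: ZonedDateTime) => z.toInstant)

  val utc: ZonedDateTime = ZonedDateTime.of(2018, 4, 7, 10, 0, 0, 0, ZoneOffset.UTC)

  // The same moment, expressed with a +02:00 offset.
  val sameInstantElsewhere: ZonedDateTime = utc.withZoneSameInstant(ZoneOffset.ofHours(2))
}
```

Under strict the two values differ, under instantBased they compare as equal. Whichever order is picked decides whether an interval bounded by one value contains the other.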
Now we’re ready to take a closer look at intervals in Spire.
This ADT closely matches the definition – we have all types of bounds covered here:
open, closed, infinity and a special case for the empty interval (both bounds will be empty in this case).
Very naturally it consists of lower and upper bound.
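As a rough illustration of such a bound ADT, here is a simplified imitation, not Spire's actual source. The names Bound, Unbound, Open, Closed and SimpleInterval are chosen for this sketch, the special empty bound is left out, and scala.math.Ordering replaces Spire's Order.

```scala
object BoundSketch {
  sealed trait Bound[+A]
  case object Unbound extends Bound[Nothing]             // infinity on either side
  final case class Open[A](value: A) extends Bound[A]    // the value itself is excluded
  final case class Closed[A](value: A) extends Bound[A]  // the value itself is included

  // An interval is just a lower and an upper bound.
  final case class SimpleInterval[A](lower: Bound[A], upper: Bound[A]) {
    // Point membership: x must satisfy both bounds.
    def contains(x: A)(implicit ord: Ordering[A]): Boolean = {
      val aboveLower = lower match {
        case Unbound   => true
        case Open(v)   => ord.gt(x, v)
        case Closed(v) => ord.gteq(x, v)
      }
      val belowUpper = upper match {
        case Unbound   => true
        case Open(v)   => ord.lt(x, v)
        case Closed(v) => ord.lteq(x, v)
      }
      aboveLower && belowUpper
    }
  }

  val halfOpen: SimpleInterval[Int] = SimpleInterval[Int](Open(0), Closed(1))     // (0, 1]
  val everything: SimpleInterval[Int] = SimpleInterval[Int](Unbound, Unbound)     // (-∞, +∞)
}
```

Because the ADT is public in this sketch, nothing stops a caller from building an interval whose lower bound is above its upper bound, which is one reason to hide it behind constructors.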
The difference here is that…
It doesn’t provide public ADT tree.
Instead there is a bunch of smart constructors. And the only API you can work with is this abstract Interval class.
Which is rich enough to cover most cases and provides a fold over bounds when some custom functionality is needed.
Rudiger contributed this feature and among other things has done a fantastic job on optimizing performance.
Interval sets come in two flavours.
IntervalTrie is the fastest. Performance doesn’t come for free – it requires that your values are convertible to Long while preserving order. And the conversion should be relatively fast, not to ruin all the gains of the optimized internals.
It’s still quite a realistic requirement, for example most numeric primitives can be used with IntervalTrie. Instants can be used as well, given the millisecond precision is enough.
If you need nanos, or for whatever other reason can’t fit into 64 bits, IntervalSeq is your choice.
See talk for implementation details, really amazing engineering.
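The Long-convertibility requirement can be illustrated without any interval library. The sketch below assumes a hypothetical toKey function based on Instant.toEpochMilli. It keeps the order of instants, but instants that differ only below one millisecond collapse to the same key.

```scala
import java.time.Instant

object LongKeySketch {
  // Hypothetical order-preserving key at millisecond precision.
  // Anything finer than a millisecond is dropped.
  def toKey(i: Instant): Long = i.toEpochMilli

  val base: Instant = Instant.parse("2018-04-07T09:00:00Z")
  val oneMilliLater: Instant = base.plusMillis(1)
  val oneNanoLater: Instant = base.plusNanos(1)

  val keepsOrder: Boolean = toKey(base) < toKey(oneMilliLater)    // distinct keys, same order
  val losesNanos: Boolean = toKey(base) == toKey(oneNanoLater)    // different instants, same key
}
```

If the domain needs to tell base and oneNanoLater apart, a millisecond key is not enough.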
This example is quite simple, but it’s an everyday thing for many people, that’s why I decided to start with it.
The real world challenges us with much more diverse and complex problems of the same nature.
So let’s begin solving this task with Spire intervals.
Just some preparation here – type aliases for conciseness and readability.
Entry is an interval of datetime values, and Entries is an interval set. To provide a correct result we also need a notion of interval duration. For now let’s define it in this simple manner: duration will be defined for all bounded intervals.
Any questions to this slide so far?
Here we just define the data to work with.
In real world entries will come from some storage.
Work day is the interval we work in (datetime is just a helper to concisely create a zoned date time value on our specific date)
OneHour is a cached duration for filtering results.
So all the preparation is done, we can now solve the task. Solution fits on one slide.
First we unify calendar entries of all participants in a single interval set. This OR pipe is union operation.
So this set marks all points in time when at least one person is occupied.
And the complement of that will be the set of moments when all people are free! Now, this complement is unbounded, and here’s where our WorkDay comes in.
By intersecting this FreeTime set with the work day interval we get what we want: intervals where all three participants are free.
Last two lines do the filtering, so that result only contains intervals at least 1 hour in duration.
.intervals materializes interval set to a list of intervals, so that we can filter them by duration.
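The same four steps (union, complement, intersection with the work day, filter by duration) can be traced in a dependency-free sketch. This is not the Spire code from the slide: Span, union and freeWithin are hand-rolled stand-ins, periods are half-open and limited to one day, and the calendars are made up.

```scala
import java.time.{Duration, LocalTime}

object MeetingSketch {
  // Half-open period [start, end) within a single day; assumes start < end.
  final case class Span(start: LocalTime, end: LocalTime) {
    def duration: Duration = Duration.between(start, end)
  }

  // Union: sort by start, then merge spans that overlap or touch.
  def union(spans: List[Span]): List[Span] = {
    val sorted = spans.sortWith((a, b) => a.start.isBefore(b.start))
    val merged = sorted.foldLeft(List.empty[Span]) {
      case (last :: done, s) if !s.start.isAfter(last.end) =>
        val end = if (s.end.isAfter(last.end)) s.end else last.end
        Span(last.start, end) :: done
      case (acc, s) => s :: acc
    }
    merged.reverse
  }

  // Complement of the busy set, intersected with the window.
  // Expects `busy` to be sorted and merged, i.e. the output of `union`.
  def freeWithin(window: Span, busy: List[Span]): List[Span] = {
    val relevant = busy.filter(b => b.end.isAfter(window.start) && b.start.isBefore(window.end))
    val gapStarts = window.start :: relevant.map(_.end)
    val gapEnds = relevant.map(_.start) :+ window.end
    gapStarts.zip(gapEnds).collect { case (s, e) if s.isBefore(e) => Span(s, e) }
  }

  private def t(h: Int, m: Int): LocalTime = LocalTime.of(h, m)

  // Made-up calendars, not the data from the talk.
  val alice: List[Span] = List(Span(t(9, 0), t(10, 30)), Span(t(13, 0), t(14, 0)))
  val bob: List[Span] = List(Span(t(10, 0), t(11, 15)), Span(t(15, 0), t(17, 30)))
  val charlie: List[Span] = List(Span(t(12, 0), t(13, 30)), Span(t(17, 0), t(18, 0)))

  val workDay: Span = Span(t(9, 0), t(19, 0))
  val oneHour: Duration = Duration.ofHours(1L)

  // Moments when at least one person is occupied.
  val busy: List[Span] = union(alice ++ bob ++ charlie)

  // Free gaps of at least one hour: 14:00-15:00 and 18:00-19:00.
  val slots: List[Span] =
    freeWithin(workDay, busy).filter(_.duration.compareTo(oneHour) >= 0)
}
```

The 45-minute gap between 11:15 and 12:00 is found by freeWithin and then dropped by the duration filter. An interval set library removes the need to write union and freeWithin by hand.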
Here’s the result for the data from the original calendar, the same one that we came up with manually.
Here for each employee we build an interval set of all moments when they are occupied. So we get a list of interval sets.
To solve the task we need to find the common subset for all of them. This can be done by incrementally intersecting all of them together. & is intersection operation.
The initial aggregate for the fold is the set of all time. It’s similar to how we start with Int.MaxValue when folding a list of numbers to find the minimum.
| and & work polymorphically with IntervalSeqs and Intervals
So if someone noticed a pattern here – well done! Let’s take a closer look at the code samples we have seen here.
Couple details here.
This is a common property of sets in general and not specific to interval sets. It just helps to rediscover these things in such cases. To be precise, sets form a distributive lattice, which in turn provides two semilattices: join & meet.
For the union monoid the identity element is the empty interval.
For the intersection monoid the identity element is the set of all values.
The existence of these lawful instances gives us a lot of proven logic for free. This is quite nice to have.
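Both monoids can be tried out on plain Scala sets. In this sketch a finite universe of 24 hours plays the role of "the set of all values", and the names allHours, unionAll and intersectAll are assumptions made for the example.

```scala
object FoldIdentities {
  // Finite stand-in for the set of all values.
  val allHours: Set[Int] = (0 until 24).toSet

  // Union monoid: the fold starts from the empty set.
  def unionAll(sets: List[Set[Int]]): Set[Int] =
    sets.foldLeft(Set.empty[Int])(_ | _)

  // Intersection monoid: the fold starts from the set of all values.
  def intersectAll(sets: List[Set[Int]]): Set[Int] =
    sets.foldLeft(allHours)(_ & _)

  val shifts: List[Set[Int]] =
    List((8 until 16).toSet, (12 until 20).toSet, (10 until 14).toSet)

  val anyoneWorking: Set[Int] = unionAll(shifts)        // hours 8 to 19
  val everyoneWorking: Set[Int] = intersectAll(shifts)  // hours 12 and 13
}
```

Folding an empty list returns the identity in both cases, which is what makes the starting values safe choices.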
So we took a wonderful FP purists break, now back to our problems…
There are lots of problems to solve here: minimize overtime, minimize idle time, comply with local legislation and so on.
One of the two is usually flexible, which allows us to pursue the best match possible in the current circumstances.
Manageable by hand when up to 100 people. Several thousands require machine help.
Problems and solutions vary significantly from case to case, so it’s hard to provide a simple meaningful example. In general, interval sets prove to be invaluably helpful in these scenarios.
I’ll just show a small building block that can help in such problems.
So this is one instance of work resource allocation. For example this can represent one 24h cashier requirement and people assigned to cover this requirement.
Almost for free we get all the attributes of this particular allocation – covered, uncovered and idle (unused) work time.
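To see the three attributes with concrete numbers, here is a sketch in which whole hours of a day replace real interval sets. The alias Hours and the sample requirement and shift are assumptions of the example, while the shape of Coverage follows the slide.

```scala
object CoverageSketch {
  // Discrete stand-in for an interval set: a set of hour slots.
  type Hours = Set[Int]

  final case class Coverage(requirement: Hours, work: Hours) {
    def covered: Hours = requirement & work     // required and staffed
    def uncovered: Hours = requirement -- work  // required but nobody works
    def unused: Hours = work -- requirement     // staffed but not required
  }

  // Requirement 12:00-20:00, one person working 08:00-16:00.
  val sample: Coverage = Coverage(requirement = (12 until 20).toSet, work = (8 until 16).toSet)
}
```

Here sample.covered is hours 12 to 15, sample.uncovered is hours 16 to 19 and sample.unused is hours 8 to 11.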
There’s another dimension of real-world application of Spire and we should look at this as well. I’m talking about integration of all these goodies with your particular business domain.
I think it’s very good that people are paying much more attention to relation between application code and business domain.
The recent rise in popularity of DDD shows that people see value in ubiquitous language and other domain driven development tools.
I tend to agree that it allows us to create simpler and more maintainable systems.
That’s why I raise this topic here. A huge argument against using intervals directly in your domain code is that …
Intervals and interval sets are too broad.
Most domains don’t need the whole timeline with negative and positive infinity. Actually, many would do pretty well with just bounded value intervals on hand.
When you’re able to instantiate something you’re not going to ever use, you’re basically creating problems for yourself. Even putting naming problems aside for a while, …
This is very important. If you are not a fan of throwing exceptions every now and then (which I hope most of you are not)… then you should seriously consider separate domain compliant data structures.
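One possible shape for such a domain compliant structure is sketched below. It is an illustration only: the name Period, the smart constructor from and the choice of [start, end) semantics are assumptions, and a real model would delegate set operations to intervals under the hood.

```scala
import java.time.{Duration, Instant}

object DomainPeriodSketch {
  // Always bounded, non-empty and shaped [start, end): start inclusive, end exclusive.
  final class Period private (val start: Instant, val end: Instant) {
    // Total for every Period, because unbounded periods cannot be constructed.
    def duration: Duration = Duration.between(start, end)

    def contains(t: Instant): Boolean = !t.isBefore(start) && t.isBefore(end)

    def overlaps(other: Period): Boolean =
      start.isBefore(other.end) && other.start.isBefore(end)
  }

  object Period {
    // Smart constructor: invalid input yields None instead of an exception.
    def from(start: Instant, end: Instant): Option[Period] =
      if (start.isBefore(end)) Some(new Period(start, end)) else None
  }

  val nine: Instant = Instant.parse("2018-04-07T09:00:00Z")
  val ten: Instant = nine.plusSeconds(3600)

  val firstHour: Option[Period] = Period.from(nine, ten)
  val secondHour: Option[Period] = Period.from(ten, ten.plusSeconds(3600))
  val backwards: Option[Period] = Period.from(ten, nine) // None
}
```

With half-open bounds the adjacent periods firstHour and secondHour do not overlap, and the moment 10:00 belongs to exactly one of them.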
So when you’re integrating Spire intervals into your application or library, it helps to think about the following concerns.