Category theory, Monads, and Duality in the world of (BIG) Data

Bart De Smet's ECOOP 2011 Keynote talk.

  • Speaker tips:
      • So far, we’ve seen specific operators to create new observable sequences
          • Like preprogrammed (parameterized) implementations of the IObservable<T> interface
      • Sometimes we just want to implement the interface
          • Can be simplified using Create (general pattern in Rx)
          • Is really the Subscribe method as a lambda
      • Slide omits what the lambda should return, an Action delegate that’s used to create the IDisposable that’s returned…
          • There’s also CreateWithDisposable
          • We chose to omit this from the slide (and be imprecise) to focus on the flow of data here
      • Typically, an observable creates concurrency upon subscription in order to send out the messages
          • Mention this but defer it till later, where we mention IScheduler
      • Also notice the use of a Subscribe extension method
          • Again… this shows how to mimic anonymous interface implementations that C# lacks
  • Speaker tips:
      • Assume we’re in the debugger
      • Set a breakpoint on the onNext lambda body
      • Start the program using F10
      • Now let’s see what Subscribe will do…
  • Speaker tips:
      • Slide is just here for animation of F10
      • Press F10 again
      • Subscribe will new up an IObserver<int> object
      • This will get passed to the Create method’s lambda parameter as the “observer” parameter
  • Speaker tips:
      • Emphasize the asynchronous nature of Subscribe
      • Main thread has moved on beyond the asynchronous Subscribe call
      • Still assume the body of Create has introduced concurrency, i.e. calls to OnNext and OnCompleted are scheduled to happen in the background
      • We’ll let the debugger go using F5 to see our breakpoint getting hit
  • Speaker tips:
      • Bang – the breakpoint got hit!
      • Notice where the main thread sits, indicated in gray
          • Though we’re blocking that thread in the sample, it could be doing other useful work…
          • …while the observable notifies us about data being available
      • This shouldn’t be new to the audience
          • E.g. when using += (o, e) => {…} to set up an event handler
          • Syntactical location of a breakpoint can belong to a whole different thread compared to code close-by!
  • Speaker tips:
      • Big message: primitive constructor operators are great, but we’d like to do something of more interest…
      • Rx doesn’t aim at replacing existing sources of asynchrony in the framework
          • Instead we can bridge with those worlds
      • A first common source of asynchrony is .NET events, which suffer from some problems:
      • Nobody thinks of a mouse as a database of points
          • Mouse database is not “preprogrammed” (like: “give me a mouse that can move once across the screen”) but is an infinite source
          • Have to dot into the EventArgs object to obtain the data (sometimes it isn’t even there!)
      • Events cannot be grabbed
          • No objects that can be passed to a method, stored in a field, put in an array, etc.
          • How’d you write a GPS visualizer that expects to get passed an event producing points? Can’t pass a .NET event along!
      • Composition suffers
          • Everyone has to write logic in event handlers, e.g. an if to filter
          • Can’t hire a mathematician to write a generic filter that works with all events
          • Also, we’d like a filtered event still to be an event (stay in the same “world”), so we have to settle for procedural code today
      • Resource maintenance requires state
          • Have to remember what you gave to += in order to get rid of it using -=. Use a field?
          • Same C# code passed to -= won’t work (there is no value equality between delegates based on what code they contain)
      • Notice resource management gets even harder in the face of composition
          • Say that applying a hypothetical generic filter to an event gives you a new event “object”
          • Now if you unhook a handler from the filtered event, you want it to unhook from the original event
          • Cascading effect with lots of state maintenance!
  • Speaker tips:
      • How does Rx improve on this?
      • FromEvent method: the reflective overload is being used here
      • Notice: the slide omits a few things…
          • The generic parameter for EventArgs
          • The fact that it returns an IObservable<IEvent<…>>; correct this in the demo
          • Rationale: focus on the essence here
      • Comparison to before:
          • Look at the type to see the (no longer hidden) data source: a source of Point values (thanks to generics)
          • Objects a la IObservable<Point> can be passed, e.g. to our GPS visualizer
          • Just like LINQ to Objects does, we can define operators on objects, so compositionality enters the picture
          • Resource maintenance can be done using a “subscription handle”
      • Yes, you still need to store it somewhere, but you don’t need to remember what you gave it
          • The old world is like subscribing to a magazine but keeping your hands on the check so you can take it back!
          • In the new world you get an unsubscription card (the Dispose method) you can send in to unsubscribe…
      • Notice state maintenance for unsubscription can be encapsulated now too
          • E.g. a Merge operator that merges n observable sequences into one
          • Subscription causes a subscribe on all of the sequences
          • Unsubscribe should unsubscribe from all of the underlying sequences
          • Needs a list of IDisposable objects: we have an algebra over IDisposable in System.Disposables
          • Can hide those from the outside world!
  • Category theory, Monads, and Duality in the world of (BIG) Data

    1. Category theory, Monads, and Duality in the world of (BIG) Data
        Bart J.F. De Smet
        bartde@microsoft.com
        Cloud Programmability Team
    2. What’s in a name?
        Cloud Programmability Team
        Logical role:
          Research-oriented team
          Collaboration with MSR
        Physical placement:
          Oasis within the product team
          Close to the SQL Server business
        (Dual) 80/20 rule of success
        Portfolio
          Language Integrated Query (LINQ)
          XML literals in Visual Basic 9
          Reactive Extensions (Rx)
          Various undisclosed projects
        Democratizing the cloud
    3. Take One
        Democratizing data access with LINQ
    4. A quick reminder on LINQ
        Solving the impedance mismatch between objects and data through querying.
    5. Back to the future
        5+ years ago
        Censored
    6. Democratizing data access
        var res = from p in ctx.Products
                  where p.UnitPrice > 100
                  group p by p.Category into g
                  select new { Category = g.Key, Avg = g.Average() };
        “Lost in translation”
        var res = ctx.Products
                     .Where(p => p.UnitPrice > 100)
                     .GroupBy(p => p.Category)
                     .Select(g => new { Category = g.Key, Avg = g.Average() });
        (In-memory) iterators
        Query providers
    7. C# 3.0 compilation to C# 2.0
    8. Language Integrated Monads
        IEnumerable<T>
        IQueryable<T>
        IEnumerable<R> SelectMany<T, R>(this IEnumerable<T> source,
                                        Func<T, IEnumerable<R>> selector)
        SelectMany
    9. Maybe baby!
        Billion
        Null-propagating dot
        string s = name?.ToUpper();
        Syntactic sugar
        name.SelectMany(_ => _.ToUpper(),
                        s => s)
        from _ in name
        from s in _.ToUpper()
        select s
        Compiler
        Can use extension method
    10. Closing the loop
        LINQ to Haskell
    11. Take two
        Democratizing event processing with Rx
    12. Once upon a time…
        Tier splitting
        “LINQ to Events”
    13. Reactive Extensions (Rx)
        GPS
        RSS feeds
        Stock tickers
        Social media
        UI events
        Server management
    14. Pull-based data access
        interface IEnumerable<out T>
        {
            IEnumerator<T> GetEnumerator();
        }
        interface IEnumerator<out T> : IDisposable
        {
            bool MoveNext();
            T Current { get; }
            void Reset();
        }
        You could get stuck
    15. Duality in the world around us (Or… the Dutch are cheap)
        Electricity: inductor and capacitor
        Logic: De Morgan’s Law
          $\neg(A \lor B) \equiv \neg A \land \neg B$
          $\neg(A \land B) \equiv \neg A \lor \neg B$
        Programming?
    16. Duality as the secret sauce? Give me a recipe
        http://en.wikipedia.org/wiki/Dual_(category_theory)
        Reversing arrows… Input becomes output and vice versa
        Making a U-turn in synchrony
    17. Distilling the essence: Properties and unchecked exceptions
        interface IEnumerable<out T>
        {
            IEnumerator<T> GetEnumerator();
        }
        interface IEnumerator<out T> : IDisposable
        {
            bool MoveNext();
            T Current { get; }
        }
    18. Distilling the essence: Properties and unchecked exceptions
        interface IEnumerable<out T>
        {
            IEnumerator<T> GetEnumerator();
        }
        interface IEnumerator<out T> : IDisposable
        {
            bool MoveNext() throws Exception;
            T GetCurrent();
        }
    19. Distilling the essence: Embracing a (more) functional style
        interface IEnumerable<out T>
        {
            IEnumerator<T> GetEnumerator();
        }
        interface IEnumerator<out T> : IDisposable
        {
            bool MoveNext() throws Exception;
            T GetCurrent();
        }
    20. Distilling the essence: Embracing a (more) functional style
        interface IEnumerable<out T>
        {
            IEnumerator<T> GetEnumerator();
        }
        interface IEnumerator<out T> : IDisposable
        {
            (void | T | Exception) MoveNext();
        }
        () -> (() -> (void | T | Exception))
    21. Flipping the arrows: Purely mechanical transformation
        () -> (() -> (void | T | Exception))
        ((void | T | Exception) -> ()) -> ()
    22. Harvesting the result: So much for abstract nonsense
        interface IBar<out T>
        {
            void Qux(IFoo<T> foo);
        }
        interface IFoo<in T>
        {
            void Wibble();
            void Wobble(T value);
            void Wubble(Exception error);
        }
    23. Harvesting the result: The observer pattern in disguise
        interface IObservable<out T>
        {
            void Subscribe(IObserver<T> observer);
        }
        interface IObserver<in T>
        {
            void OnCompleted();
            void OnNext(T value);
            void OnError(Exception error);
        }
    24. The observer pattern revisited
        Stateful!
    25. Interface hierarchy
        interface IObservable<out T>
        {
            IDisposable Subscribe(IObserver<T> observer);
        }
    26. Message grammar
        source1: OnNext(42), OnNext(43), OnCompleted
        source2: OnNext("Hello"), OnError(error)
        Grammar: OnNext* [OnError | OnCompleted]
    27. Observable.Create<T> operator
        IObservable<int> o = Observable.Create<int>(observer => {
            // Assume we introduce concurrency (see later)…
            observer.OnNext(42);
            observer.OnCompleted();
            return () => { /* unsubscribe action */ };
        });
        IDisposable subscription = o.Subscribe(
            onNext: x => { Console.WriteLine("Next: " + x); },
            onError: ex => { Console.WriteLine("Oops: " + ex); },
            onCompleted: () => { Console.WriteLine("Done"); }
        );
        C# doesn’t have anonymous interface implementation, so we provide various extension methods that take lambdas.
        C# 4.0 named parameter syntax
    28. Observable.Create<T> operator
        IObservable<int> o = Observable.Create<int>(observer => {
            // Assume we introduce concurrency (see later)…
            observer.OnNext(42);
            observer.OnCompleted();
            return () => { /* unsubscribe action */ };
        });
        IDisposable subscription = o.Subscribe(
            onNext: x => { Console.WriteLine("Next: " + x); },
            onError: ex => { Console.WriteLine("Oops: " + ex); },
            onCompleted: () => { Console.WriteLine("Done"); }
        );
        Thread.Sleep(30000); // Main thread is blocked…
        F10
    29. Observable.Create<T> operator
        IObservable<int> o = Observable.Create<int>(observer => {
            // Assume we introduce concurrency (see later)…
            observer.OnNext(42);
            observer.OnCompleted();
            return () => { /* unsubscribe action */ };
        });
        IDisposable subscription = o.Subscribe(
            onNext: x => { Console.WriteLine("Next: " + x); },
            onError: ex => { Console.WriteLine("Oops: " + ex); },
            onCompleted: () => { Console.WriteLine("Done"); }
        );
        Thread.Sleep(30000); // Main thread is blocked…
        F10
    30. Observable.Create<T> operator
        IObservable<int> o = Observable.Create<int>(observer => {
            // Assume we introduce concurrency (see later)…
            observer.OnNext(42);
            observer.OnCompleted();
            return () => { /* unsubscribe action */ };
        });
        IDisposable subscription = o.Subscribe(
            onNext: x => { Console.WriteLine("Next: " + x); },
            onError: ex => { Console.WriteLine("Oops: " + ex); },
            onCompleted: () => { Console.WriteLine("Done"); }
        );
        Thread.Sleep(30000); // Main thread is blocked…
        F5
    31. Observable.Create<T> operator
        IObservable<int> o = Observable.Create<int>(observer => {
            // Assume we introduce concurrency (see later)…
            observer.OnNext(42);
            observer.OnCompleted();
            return () => { /* unsubscribe action */ };
        });
        IDisposable subscription = o.Subscribe(
            onNext: x => { Console.WriteLine("Next: " + x); },
            onError: ex => { Console.WriteLine("Oops: " + ex); },
            onCompleted: () => { Console.WriteLine("Done"); }
        );
        Thread.Sleep(30000); // Main thread is blocked…
        Breakpoint got hit
    32. Iterators dualized
        Synchronous
        IEnumerable<int> GetXs() {
            for (int i = 0; i < 10; i++)
                yield return i * i;
            yield break;
        }
        foreach (var x in GetXs()) {
            Console.WriteLine(x);
        }
        Asynchronous
        IObservable<int> GetXs() {
            return Observable.Create<int>(o => {
                for (int i = 0; i < 10; i++)
                    o.OnNext(i * i);
                o.OnCompleted();
                return () => { /* unsubscribe action */ };
            });
        }
        GetXs().Subscribe(x => {
            Console.WriteLine(x);
        });
    33. Compositionality matters
        IObservable<T> Merge<T>(this IObservable<T> left,
                                IObservable<T> right)
        {
            return Create<T>(observer => {
                // Ignoring a few details for OnCompleted
                var d1 = left.Subscribe(observer);
                var d2 = right.Subscribe(observer);
                return new CompositeDisposable(d1, d2);
            });
        }
        Lazy evaluation
    34. Bridging Rx with the World: Why .NET events aren’t first-class…
        Hidden data source
        How to pass around?
        form1.MouseMove += (sender, args) => {
            if (args.Location.X == args.Location.Y)
                // I’d like to raise another event
        };
        form1.MouseMove -= /* what goes here? */
        Lack of composition
        Resource maintenance?
    35.
    36. Bridging Rx with the World: …but observable sequences are first-class
        Source of Point values
        Objects can be passed
        IObservable<Point> mouseMoves = Observable.FromEvent(frm, "MouseMove");
        var filtered = mouseMoves
            .Where(pos => pos.X == pos.Y);
        var subscription = filtered.Subscribe(…);
        subscription.Dispose();
        Can define operators
        Resource maintenance!
    37. Composition and Querying: It’s the continuation monad!
        // IObservable<string> from TextChanged events
        var changed = Observable.FromEvent(txt, "TextChanged");
        var input = (from e in changed
                     let text = ((TextBox)e.Sender).Text
                     where text.Length >= 3
                     select text)
                    .DistinctUntilChanged()
                    .Throttle(TimeSpan.FromSeconds(1));
        // Bridge with the dictionary web service
        var svc = new DictServiceSoapClient();
        var lookup = Observable.FromAsyncPattern<string, DictionaryWord[]>
                         (svc.BeginLookup, svc.EndLookup);
        // Compose both sources using SelectMany
        var res = from term in input
                  from words in lookup(term)
                  select words;
        input.SelectMany(term => lookup(term))
    38. Introducing schedulers
        How to be asynchronous?
        Different ways to introduce concurrency
        Parameterization by schedulers
        interface IScheduler
        {
            DateTimeOffset Now { get; }
            IDisposable Schedule<T>(
                T state,
                Func<IScheduler, T, IDisposable> f);
            // Overloads for time-based scheduling
        }
    39. Example: creation operators
        static class Observable
        {
            static IObservable<T> Return<T>(T value,
                                            IScheduler scheduler)
            {
                return Create<T>(observer =>
                {
                    var state = new { value, observer };
                    return scheduler.Schedule(state, (self, s) =>
                    {
                        s.observer.OnNext(s.value);
                        s.observer.OnCompleted();
                    });
                });
            }
        }
        Resource mgmt
        Avoiding closures (serialization)
    40. Operational layering
    41. IQbservable<T>
        [Diagram: LINQ provider quadrants]
        How? Translatable (expression trees) vs. fixed (MSIL); homo-iconic
        What? Pull (interactive) vs. push (reactive); duality
        Where? Concurrency (IScheduler): message loops, distributed, worker pools, threads
        IQbservable<T>: LINQ to Twitter
        IQueryable<T>: LINQ to SQL
        IEnumerable<T>: LINQ to Objects
        IObservable<T>: LINQ to Events
        LINQ to *.*
        Conversions: ToQbservable, ToQueryable, AsObservable, AsEnumerable, AsQbservable, AsQueryable, ToObservable, ToEnumerable
    42. Take THREE
        Democratizing cloud data processing with CoSQL
    43. NoSQL is CoSQL!
    44. The NoSQL trend
    45. Object graphs
        var _1579124585 = new Product {
            Title = "The Right Stuff", Author = "Tom Wolfe",
            Year = 1979, Pages = 304,
            Keywords = new[] { "Book", "Hardcover", "American" },
            Ratings = new[] { "****", "4 stars" },
        };
        var Products = new[] { _1579124585 };
    46. Queries over object graphs
        var q = from product in Products
                where product.Ratings.Any(rating => rating == "****")
                select new { product.Title, product.Keywords };
    47. The O/R paradox
        Objects
          Fully compositional
          value ::= scalar
                  | new { …, name = value, … }
        Tables
          Non compositional
          value ::= new { …, name = scalar, … }
    48. Relational (de)normalization
    49. Queries over tables
        var q = from product in Products
                from rating in Ratings
                where product.ID == rating.ProductId
                   && rating == "****"
                from keyword in Keywords
                where product.ID == keyword.ProductID
                select new { product.Title, keyword.Keyword };
        var q = from product in Products
                join rating in Ratings
                  on product.ID equals rating.ProductId
                where rating == "****"
                select product into FourStarProducts
                from fourstarproduct in FourStarProducts
                join keyword in Keywords
                  on product.ID equals keyword.ProductID
                select new { product.Title, keyword.Keyword };
    50. Welcome to O/R voodoo
        var q = from product in Products
                where product.Ratings.Any(rating => rating == "****")
                select new { product.Title, product.Keywords };
    51. What did we gain?
        Ad-hoc queries?
        But what about scale…
        The relational Gods invented indexes
        Going against the PK-FK flow…
        from p1 in WWW
        from p2 in WWW
        where p2.Contains(p1.URL)
        select new { p1, p2 };
    52. Job security?
    53. Spot the difference
    54. Duality to the rescue again?
    55. Consequences of duality
    56. More work in the area
    57. Thank you!
        Bart J.F. De Smet
        bartde@microsoft.com
        Cloud Programmability Team
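
The pull-to-push dualization walked through on slides 14 to 32 can be tried without Rx itself. What follows is a minimal sketch, assuming only the `IObservable<T>` and `IObserver<T>` interfaces that ship in the .NET `System` namespace. `DelegateObserver`, `DelegateObservable`, `ActionDisposable` and `DualityDemo` are hypothetical names introduced for illustration; they are not how Rx implements `Observable.Create` or the lambda-taking `Subscribe` overloads. Notifications are sent synchronously inside `Subscribe` to keep the result deterministic, whereas the slides assume a scheduler introduces concurrency.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: an observer built from three delegates, standing in
// for the anonymous interface implementation that C# lacks.
sealed class DelegateObserver<T> : IObserver<T>
{
    private readonly Action<T> _onNext;
    private readonly Action<Exception> _onError;
    private readonly Action _onCompleted;

    public DelegateObserver(Action<T> onNext, Action<Exception> onError, Action onCompleted)
    {
        _onNext = onNext;
        _onError = onError;
        _onCompleted = onCompleted;
    }

    public void OnNext(T value) => _onNext(value);
    public void OnError(Exception error) => _onError(error);
    public void OnCompleted() => _onCompleted();
}

// Hypothetical helper: wraps an unsubscribe action in an IDisposable and
// makes sure the action runs at most once.
sealed class ActionDisposable : IDisposable
{
    private readonly Action _dispose;
    private bool _done;

    public ActionDisposable(Action dispose) { _dispose = dispose; }

    public void Dispose()
    {
        if (_done) return;
        _done = true;
        _dispose();
    }
}

// Hypothetical helper: an observable whose Subscribe method is a lambda
// that returns the unsubscribe action.
sealed class DelegateObservable<T> : IObservable<T>
{
    private readonly Func<IObserver<T>, Action> _subscribe;

    public DelegateObservable(Func<IObserver<T>, Action> subscribe) { _subscribe = subscribe; }

    public IDisposable Subscribe(IObserver<T> observer) =>
        new ActionDisposable(_subscribe(observer));
}

public static class DualityDemo
{
    // Pull: the consumer asks for each value.
    public static IEnumerable<int> PullSquares()
    {
        for (int i = 0; i < 10; i++)
            yield return i * i;
    }

    // Push: the producer hands each value to the observer.
    public static IObservable<int> PushSquares() =>
        new DelegateObservable<int>(observer =>
        {
            for (int i = 0; i < 10; i++)
                observer.OnNext(i * i);
            observer.OnCompleted();
            return () => { /* nothing to unsubscribe from in this sketch */ };
        });

    // Consumes both sequences and records what was seen, in order.
    public static List<string> Run()
    {
        var log = new List<string>();

        foreach (var x in PullSquares())
            log.Add("pull " + x);

        var observer = new DelegateObserver<int>(
            x => log.Add("push " + x),
            ex => log.Add("error " + ex.Message),
            () => log.Add("done"));

        using (PushSquares().Subscribe(observer)) { }

        return log;
    }
}
```

Calling `DualityDemo.Run()` from any entry point yields the ten squares once via `foreach` and once via `OnNext`, followed by a single completion entry, which matches the `OnNext* [OnError | OnCompleted]` message grammar of slide 26.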
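
The speaker notes for slide 36 point out that a merged sequence has to unsubscribe from all of its underlying sources, and that this bookkeeping can be hidden behind one subscription handle. The sketch below illustrates that idea under stated assumptions: `SimpleSource`, `Unsubscriber`, `Merged`, `CollectingObserver` and `MergeDemo` are hypothetical types written for this example, and the combined handle is a plain lambda rather than Rx's `CompositeDisposable`. As on slide 33, completion handling is ignored.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical handle that runs an unsubscribe action at most once.
public sealed class Unsubscriber : IDisposable
{
    private readonly Action _action;
    private bool _done;

    public Unsubscriber(Action action) { _action = action; }

    public void Dispose()
    {
        if (_done) return;
        _done = true;
        _action();
    }
}

// Hypothetical minimal source: remembers its observers and forwards values.
public sealed class SimpleSource<T> : IObservable<T>
{
    private readonly List<IObserver<T>> _observers = new List<IObserver<T>>();

    public int ObserverCount => _observers.Count;

    public void Publish(T value)
    {
        // Iterate over a copy so observers may unsubscribe while notified.
        foreach (var observer in _observers.ToArray())
            observer.OnNext(value);
    }

    public IDisposable Subscribe(IObserver<T> observer)
    {
        _observers.Add(observer);
        return new Unsubscriber(() => _observers.Remove(observer));
    }
}

// Merge of two sources: subscribing subscribes to both, and the returned
// handle unsubscribes from both.
public sealed class Merged<T> : IObservable<T>
{
    private readonly IObservable<T> _left;
    private readonly IObservable<T> _right;

    public Merged(IObservable<T> left, IObservable<T> right)
    {
        _left = left;
        _right = right;
    }

    public IDisposable Subscribe(IObserver<T> observer)
    {
        var d1 = _left.Subscribe(observer);
        var d2 = _right.Subscribe(observer);
        return new Unsubscriber(() => { d1.Dispose(); d2.Dispose(); });
    }
}

// Observer that simply records every value it receives.
public sealed class CollectingObserver<T> : IObserver<T>
{
    public List<T> Values { get; } = new List<T>();

    public void OnNext(T value) => Values.Add(value);
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}

public static class MergeDemo
{
    public static List<int> Run(SimpleSource<int> a, SimpleSource<int> b)
    {
        var sink = new CollectingObserver<int>();
        IDisposable subscription = new Merged<int>(a, b).Subscribe(sink);

        a.Publish(1);
        b.Publish(2);
        subscription.Dispose(); // one handle unhooks both sources
        a.Publish(3);           // no longer delivered to the sink

        return sink.Values;
    }
}
```

The caller of `MergeDemo.Run` only ever holds one `IDisposable`; the two inner handles stay private to `Merged<T>`, which is the encapsulation the notes describe.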
