Category theory, Monads, and Duality in the world of (BIG) Data

Bart De Smet's ECOOP 2011 Keynote talk.

  1. 1. Category theory, Monads, and Duality in the world of (BIG) Data<br />Bart J.F. De Smet<br />bartde@microsoft.com<br />Cloud Programmability Team<br />
  2. 2. What’s in a name?<br />Cloud Programmability Team<br />Logical role:<br />Research-oriented team<br />Collaboration with MSR<br />Physical placement:<br />Oasis within the product team<br />Close to the SQL Server business<br />(Dual) 80/20 rule of success<br />Portfolio<br />Language Integrated Query (LINQ)<br />XML literals in Visual Basic 9<br />Reactive Extensions (Rx)<br />Various undisclosed projects<br />Democratizing the cloud<br />
  3. 3. Take One<br />Democratizing data access with LINQ<br />
  4. 4. A quick reminder on LINQ<br />Solving the impedance mismatch between objects and data through querying.<br />
  5. 5. Back to the future<br />5+ years ago<br />Censored<br />
  6. 6. Democratizing data access<br />var res = from p in ctx.Products<br />where p.UnitPrice > 100<br />group p by p.Category into g<br />select new { Category = g.Key, Avg = g.Average(p => p.UnitPrice) };<br />“Lost in translation”<br />var res = ctx.Products<br /> .Where(p => p.UnitPrice > 100)<br /> .GroupBy(p => p.Category)<br /> .Select(g => new { Category = g.Key, Avg = g.Average(p => p.UnitPrice) });<br />(In-memory) iterators<br />Query providers<br />
  7. 7. C# 3.0 compilation to C# 2.0<br />
  8. 8. Language Integrated Monads<br />IEnumerable<T><br />IQueryable<T><br />IEnumerable<R> SelectMany<T, R>(this IEnumerable<T> source, Func<T, IEnumerable<R>> selector)<br />SelectMany<br />
  9. 9. Maybe baby!<br />Billion<br />Null-propagating dot<br />string s = name?.ToUpper();<br />Syntactic sugar<br />name.SelectMany(_ => _.ToUpper(),<br /> (_, s) => s)<br />from _ in name<br />from s in _.ToUpper()<br />select s<br />Compiler<br />Can use extension method<br />
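
The null-propagating translation on slide 9 can be sketched with an ordinary extension method. This is a minimal illustration, not the talk's code: the class `NullPropagation` and the method `Upper` are hypothetical names, and the sketch assumes `null` plays the role of the empty "maybe" value for any reference type. The compiler rewrites the two `from` clauses into a call to `SelectMany` with a two-argument result selector.

```csharp
using System;

// Hypothetical helper, not part of the BCL or of the talk's code:
// treats any reference type as a "maybe" value, with null as the empty case.
public static class NullPropagation
{
    // The method shape the C# compiler binds to for two consecutive 'from' clauses.
    public static R SelectMany<T, C, R>(
        this T source,
        Func<T, C> selector,
        Func<T, C, R> resultSelector)
        where T : class
        where C : class
        where R : class
    {
        if (source == null) return null;      // nothing to bind, propagate null
        C inner = selector(source);
        if (inner == null) return null;       // the second step produced nothing
        return resultSelector(source, inner);
    }

    // Behaves like name?.ToUpper(), written as a query expression.
    public static string Upper(string name)
    {
        // Translated to: name.SelectMany(n => n.ToUpper(), (n, s) => s)
        return from n in name
               from s in n.ToUpper()
               select s;
    }
}
```

Calling `NullPropagation.Upper("bart")` goes through both steps, while `NullPropagation.Upper(null)` short-circuits in the first check and never invokes `ToUpper`.
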
  10. 10. Closing the loop<br />LINQ to Haskell <br />
  11. 11. Take two<br />Democratizing event processing with Rx<br />
  12. 12. Once upon a time…<br />Tier splitting<br />“LINQ to Events”<br />
  13. 13. Reactive Extensions (Rx)<br />GPS<br />RSS feeds<br />Stock tickers<br />Social<br />media<br />UI events<br />Server management<br />
  14. 14. Pull-based data access<br />interface IEnumerable<out T><br />{<br />IEnumerator<T> GetEnumerator();<br />}<br />interface IEnumerator<out T> : IDisposable<br />{<br />bool MoveNext();<br /> T Current { get; }<br />void Reset();<br />}<br />You could get stuck<br />
  15. 15. Duality in the world around us<br />(Or… the Dutch are cheap)<br />Electricity: inductor and capacitor<br />Logic: De Morgan’s Law<br />Programming?<br />¬(𝐴∨𝐵) ≡ ¬𝐴∧¬𝐵<br />¬(𝐴∧𝐵) ≡ ¬𝐴∨¬𝐵<br />
  16. 16. Duality as the secret sauce?<br />Give me a recipe<br />http://en.wikipedia.org/wiki/Dual_(category_theory)<br />Reversing arrows…<br />Input becomes output and vice versa<br />Making a U-turn in synchrony<br />
  17. 17. Distilling the essence<br />Properties and unchecked exceptions<br />interface IEnumerable<out T><br />{<br />IEnumerator<T> GetEnumerator();<br />}<br />interface IEnumerator<out T> : IDisposable<br />{<br />bool MoveNext();<br /> T Current { get; }<br />}<br />
  18. 18. Distilling the essence<br />Properties and unchecked exceptions<br />interface IEnumerable<out T><br />{<br />IEnumerator<T> GetEnumerator();<br />}<br />interface IEnumerator<out T> : IDisposable<br />{<br />bool MoveNext() throws Exception;<br /> T GetCurrent();<br />}<br />
  19. 19. Distilling the essence<br />Embracing a (more) functional style<br />interface IEnumerable<out T><br />{<br />IEnumerator<T> GetEnumerator();<br />}<br />interface IEnumerator<out T> : IDisposable<br />{<br />bool MoveNext() throws Exception;<br /> T GetCurrent();<br />}<br />
  20. 20. Distilling the essence<br />Embracing a (more) functional style<br />interface IEnumerable<out T><br />{<br />IEnumerator<T> GetEnumerator();<br />}<br />interface IEnumerator<out T> : IDisposable<br />{<br /> (void | T | Exception) MoveNext();<br />}<br />() -> (() -> (void | T | Exception))<br />
  21. 21. Flipping the arrows<br />Purely mechanical transformation<br />() -> (() -> (void | T | Exception))<br />((void | T | Exception) -> ()) -> ()<br />
  22. 22. Harvesting the result<br />So far for abstract nonsense<br />interface IBar<out T><br />{<br />void Qux(IFoo<T> foo);<br />}<br />interface IFoo<in T><br />{<br />void Wibble();<br /> void Wobble(T value);<br /> void Wubble(Exception error);<br />}<br />
  23. 23. Harvesting the result<br />The observer pattern in disguise<br />interface IObservable<out T><br />{<br />void Subscribe(IObserver<T> observer);<br />}<br />interface IObserver<in T><br />{<br />void OnCompleted();<br /> void OnNext(T value);<br /> void OnError(Exception error);<br />}<br />
  24. 24. The observer pattern revisited<br />Stateful!<br />
  25. 25. Interface hierarchy<br />interface IObservable<out T><br />{<br />IDisposable Subscribe(IObserver<T> observer);<br />}<br />
  26. 26. Message grammar<br />OnNext(42)<br />OnNext(43)<br />OnCompleted<br />source1<br />OnNext("Hello")<br />OnError(error)<br />source2<br />OnNext* [OnError | OnCompleted]<br />
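
The grammar `OnNext* [OnError | OnCompleted]` on slide 26 says that a sequence ends with at most one terminal message and that nothing follows it. One way to picture the contract is a wrapper that silently drops anything arriving after a terminal message. The sketch below uses only the framework's `System.IObserver<T>`; `GrammarObserver<T>` and `RecordingObserver<T>` are illustrative names, not Rx types, and the wrapper is deliberately not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: forwards messages to an inner observer only while
// the sequence is still open, i.e. before OnError or OnCompleted was seen.
// Not thread-safe; a real implementation would need synchronization.
public sealed class GrammarObserver<T> : IObserver<T>
{
    private readonly IObserver<T> _inner;
    private bool _stopped;

    public GrammarObserver(IObserver<T> inner) { _inner = inner; }

    public void OnNext(T value)
    {
        if (!_stopped) _inner.OnNext(value);
    }

    public void OnError(Exception error)
    {
        if (_stopped) return;
        _stopped = true;              // terminal message: close the sequence
        _inner.OnError(error);
    }

    public void OnCompleted()
    {
        if (_stopped) return;
        _stopped = true;              // terminal message: close the sequence
        _inner.OnCompleted();
    }
}

// Records every message it receives so the wrapper's effect can be inspected.
public sealed class RecordingObserver<T> : IObserver<T>
{
    public readonly List<string> Log = new List<string>();
    public void OnNext(T value) { Log.Add("OnNext(" + value + ")"); }
    public void OnError(Exception error) { Log.Add("OnError"); }
    public void OnCompleted() { Log.Add("OnCompleted"); }
}
```

Feeding the wrapper `OnNext(42)`, `OnNext(43)`, `OnCompleted`, and then a stray `OnNext(44)` leaves only the first three messages in the recorder's log.
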
  27. 27. Observable.Create<T> operator<br />IObservable<int> o = Observable.Create<int>(observer => {<br /> // Assume we introduce concurrency (see later)…<br />observer.OnNext(42);<br />observer.OnCompleted();<br />return () => { /* unsubscribe action */ };<br />});<br />IDisposable subscription = o.Subscribe(<br />onNext: x => { Console.WriteLine("Next: " + x); },<br />onError: ex => { Console.WriteLine("Oops: " + ex); },<br />onCompleted: () => { Console.WriteLine("Done"); }<br />);<br />C# doesn’t have anonymous interface implementation, so we provide various extension methods that take lambdas.<br />C# 4.0 named parameter syntax<br />
  28. 28. Observable.Create<T> operator<br />IObservable<int> o = Observable.Create<int>(observer => {<br />// Assume we introduce concurrency (see later)…<br />observer.OnNext(42);<br />observer.OnCompleted();<br />return () => { /* unsubscribe action */ };<br />});<br />IDisposable subscription = o.Subscribe(<br />onNext: x => { Console.WriteLine("Next: " + x); },<br />onError: ex => { Console.WriteLine("Oops: " + ex); },<br />onCompleted: () => { Console.WriteLine("Done"); }<br />);<br />Thread.Sleep(30000); // Main thread is blocked…<br />F10<br />
  29. 29. Observable.Create<T> operator<br />IObservable<int> o = Observable.Create<int>(observer => {<br />// Assume we introduce concurrency (see later)…<br />observer.OnNext(42);<br />observer.OnCompleted();<br />return () => { /* unsubscribe action */ };<br />});<br />IDisposable subscription = o.Subscribe(<br />onNext: x => { Console.WriteLine("Next: " + x); },<br />onError: ex => { Console.WriteLine("Oops: " + ex); },<br />onCompleted: () => { Console.WriteLine("Done"); }<br />);<br />Thread.Sleep(30000); // Main thread is blocked…<br />F10<br />
  30. 30. Observable.Create<T> operator<br />IObservable<int> o = Observable.Create<int>(observer => {<br /> // Assume we introduce concurrency (see later)…<br />observer.OnNext(42);<br />observer.OnCompleted();<br />return () => { /* unsubscribe action */ };<br />});<br />IDisposable subscription = o.Subscribe(<br />onNext: x => { Console.WriteLine("Next: " + x); },<br />onError: ex => { Console.WriteLine("Oops: " + ex); },<br />onCompleted: () => { Console.WriteLine("Done"); }<br />);<br />Thread.Sleep(30000); // Main thread is blocked…<br />F5<br />
  31. 31. Observable.Create<T> operator<br />IObservable<int> o = Observable.Create<int>(observer => {<br /> // Assume we introduce concurrency (see later)…<br />observer.OnNext(42);<br />observer.OnCompleted();<br />return () => { /* unsubscribe action */ };<br />});<br />IDisposable subscription = o.Subscribe(<br />onNext: x => { Console.WriteLine("Next: " + x); },<br />onError: ex => { Console.WriteLine("Oops: " + ex); },<br />onCompleted: () => { Console.WriteLine("Done"); }<br />);<br />Thread.Sleep(30000); // Main thread is blocked…<br />Breakpoint got hit<br />
  32. 32. Iterators dualized<br />IObservable<int> GetXs() {<br />return Observable.Create<int>(o => {<br />for (int i = 0;<br /> i < 10;<br /> i++)<br />o.OnNext(i * i);<br />o.OnCompleted();<br />return () => { };<br /> });<br />}<br />GetXs().Subscribe(x => {<br />Console.WriteLine(x);<br />});<br />IEnumerable<int> GetXs() {<br />for (int i = 0;<br /> i < 10;<br /> i++)<br />yield return i * i;<br />yield break;<br />}<br />foreach (var x in GetXs()) {<br />Console.WriteLine(x);<br />}<br />Synchronous<br />Asynchronous<br />
  33. 33. Compositionality matters<br />IObservable<T> Merge<T>(this IObservable<T> left,<br />IObservable<T> right)<br />{<br />return Create<T>(observer => {<br />// Ignoring a few details for OnCompleted<br />var d1 = left.Subscribe(observer);<br />var d2 = right.Subscribe(observer);<br />return new CompositeDisposable(d1, d2);<br /> });<br />}<br />Lazy evaluation<br />
  34. 34. Bridging Rx with the World<br />Why .NET events aren’t first-class…<br />Hidden data source<br />How to pass around?<br />form1.MouseMove += (sender, args) => {<br />if (args.Location.X == args.Location.Y)<br />// I’d like to raise another event<br />};<br />form1.MouseMove -= /* what goes here? */<br />Lack of composition<br />Resource maintenance?<br />
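
The `Merge` on slide 33 notes that it ignores "a few details for OnCompleted". One plausible reading of those details: the merged sequence should complete only after both sources have completed, and it should forward at most one terminal message. The sketch below is an assumption-laden illustration, not the Rx implementation. `MiniRx`, `Of`, and the nested helper classes are hypothetical stand-ins so the example runs without the Rx library; only `System.IObservable<T>` and `System.IObserver<T>` come from the framework.

```csharp
using System;

// Hypothetical stand-in for the Rx library, kept minimal on purpose.
public static class MiniRx
{
    private sealed class Observable<T> : IObservable<T>
    {
        private readonly Func<IObserver<T>, IDisposable> _subscribe;
        public Observable(Func<IObserver<T>, IDisposable> subscribe) { _subscribe = subscribe; }
        public IDisposable Subscribe(IObserver<T> observer) { return _subscribe(observer); }
    }

    private sealed class Observer<T> : IObserver<T>
    {
        private readonly Action<T> _next;
        private readonly Action<Exception> _error;
        private readonly Action _completed;
        public Observer(Action<T> next, Action<Exception> error, Action completed)
        { _next = next; _error = error; _completed = completed; }
        public void OnNext(T value) { _next(value); }
        public void OnError(Exception error) { _error(error); }
        public void OnCompleted() { _completed(); }
    }

    private sealed class Disposable : IDisposable
    {
        private readonly Action _dispose;
        public Disposable(Action dispose) { _dispose = dispose; }
        public void Dispose() { _dispose(); }
    }

    public static IObservable<T> Create<T>(Func<IObserver<T>, IDisposable> subscribe)
    {
        return new Observable<T>(subscribe);
    }

    // Lambda-based Subscribe, in the spirit of the overloads mentioned on slide 27.
    public static IDisposable Subscribe<T>(this IObservable<T> source,
        Action<T> onNext, Action<Exception> onError, Action onCompleted)
    {
        return source.Subscribe(new Observer<T>(onNext, onError, onCompleted));
    }

    // Synchronous test source: pushes the given values, then completes.
    public static IObservable<T> Of<T>(params T[] values)
    {
        return Create<T>(observer =>
        {
            foreach (var v in values) observer.OnNext(v);
            observer.OnCompleted();
            return new Disposable(() => { });
        });
    }

    public static IObservable<T> Merge<T>(this IObservable<T> left, IObservable<T> right)
    {
        return Create<T>(observer =>
        {
            var gate = new object();   // serializes messages from both sources
            int running = 2;           // sources that have not completed yet
            bool done = false;         // true once a terminal message was forwarded
            var branch = new Observer<T>(
                x => { lock (gate) { if (!done) observer.OnNext(x); } },
                ex => { lock (gate) { if (!done) { done = true; observer.OnError(ex); } } },
                () => { lock (gate) { if (!done && --running == 0) { done = true; observer.OnCompleted(); } } });
            var d1 = left.Subscribe(branch);
            var d2 = right.Subscribe(branch);
            return new Disposable(() => { d1.Dispose(); d2.Dispose(); });
        });
    }
}
```

With `MiniRx.Of(1, 2).Merge(MiniRx.Of(3))`, a subscriber sees the three values followed by a single completion. The sketch does not unsubscribe from the other source when one fails; it only stops forwarding, which keeps the example short.
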
  35. 35.
  36. 36. Bridging Rx with the World<br />…but observable sequences are first-class<br />Source of Point values<br />Objects can be passed<br />IObservable<Point> mouseMoves = Observable.FromEvent(frm, "MouseMove");<br />var filtered = mouseMoves<br />.Where(pos => pos.X == pos.Y);<br />var subscription = filtered.Subscribe(…);<br />subscription.Dispose();<br />Can define operators<br />Resource maintenance!<br />
  37. 37. Composition and Querying<br />It’s the continuation monad!<br />// IObservable<string> from TextChanged events<br />var changed = Observable.FromEvent(txt, "TextChanged");<br />var input = (from e in changed<br />let text = ((TextBox)e.Sender).Text<br />where text.Length >= 3<br />select text)<br /> .DistinctUntilChanged()<br /> .Throttle(TimeSpan.FromSeconds(1));<br />// Bridge with the dictionary web service<br />var svc = new DictServiceSoapClient();<br />var lookup = Observable.FromAsyncPattern<string, DictionaryWord[]>(svc.BeginLookup, svc.EndLookup);<br />// Compose both sources using SelectMany<br />var res = from term in input<br />from words in lookup(term)<br />select words;<br />input.SelectMany(term => lookup(term))<br />
  38. 38. Introducing schedulers<br />How to be asynchronous?<br />Different ways to introduce concurrency<br />Parameterization by schedulers<br />interface IScheduler<br />{<br />DateTimeOffset Now { get; }<br />IDisposable Schedule<T>(<br /> T state,<br />Func<IScheduler, T, IDisposable> f);<br />// Overloads for time-based scheduling<br />}<br />
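
To make "parameterization by schedulers" on slide 38 concrete, here is a minimal sketch with two schedulers behind the same interface: one runs work immediately on the caller, the other defers it until a loop drains a queue. The `IScheduler` declared here is a local copy of the slide's shape with the time-based overloads left out. `ImmediateScheduler`, `QueueScheduler`, and `NoopDisposable` are names chosen for this illustration and are not claimed to match Rx's own types.

```csharp
using System;
using System.Collections.Generic;

// Local copy of the interface shape from the slide (time-based overloads omitted).
public interface IScheduler
{
    DateTimeOffset Now { get; }
    IDisposable Schedule<T>(T state, Func<IScheduler, T, IDisposable> f);
}

// Hypothetical helper: a disposable that does nothing.
public sealed class NoopDisposable : IDisposable
{
    public static readonly NoopDisposable Instance = new NoopDisposable();
    public void Dispose() { }
}

// Hypothetical scheduler that introduces no concurrency: work runs inline.
public sealed class ImmediateScheduler : IScheduler
{
    public DateTimeOffset Now { get { return DateTimeOffset.Now; } }

    public IDisposable Schedule<T>(T state, Func<IScheduler, T, IDisposable> f)
    {
        return f(this, state);
    }
}

// Hypothetical scheduler that defers work until Run() is called, loosely
// modelling a message loop. Disposing the returned handle cancels the item.
public sealed class QueueScheduler : IScheduler
{
    private sealed class WorkItem : IDisposable
    {
        public bool Cancelled;
        public void Dispose() { Cancelled = true; }
    }

    private readonly Queue<Action> _queue = new Queue<Action>();

    public DateTimeOffset Now { get { return DateTimeOffset.Now; } }

    public IDisposable Schedule<T>(T state, Func<IScheduler, T, IDisposable> f)
    {
        var item = new WorkItem();
        // The disposable returned by f is ignored here to keep the sketch short.
        _queue.Enqueue(() => { if (!item.Cancelled) f(this, state); });
        return item;
    }

    // Drains the queue, including work scheduled while draining.
    public void Run()
    {
        while (_queue.Count > 0) _queue.Dequeue()();
    }
}
```

The state is passed explicitly rather than captured, matching the "avoiding closures" remark on the following slide. Code written against `IScheduler` stays the same whichever scheduler is supplied; only the moment and place of execution change.
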
  39. 39. Example: creation operators<br />static class Observable<br />{<br />static IObservable<T> Return<T>(T value,<br />IScheduler scheduler)<br /> {<br />return Create<T>(observer =><br /> {<br />var state = new { value, observer };<br />return scheduler.Schedule(state, (self, s) =><br /> {<br />s.observer.OnNext(s.value);<br />s.observer.OnCompleted();<br />return Disposable.Empty;<br /> });<br /> });<br /> }<br />}<br />Resource mgmt<br />Avoiding closures (serialization)<br />
  40. 40. Operational layering<br />
  41. 41. IQbservable<T><br />What? Duality: Pull (interactive) vs. Push (reactive)<br />How? Translatable (Expression trees, Homo-iconic) vs. Fixed (MSIL)<br />Where? Concurrency (IScheduler): Message loops, Distributed, Worker pools, Threads<br />IQueryable<T>: LINQ to SQL (pull, translatable)<br />IQbservable<T>: LINQ to Twitter (push, translatable)<br />IEnumerable<T>: LINQ to Objects (pull, fixed)<br />IObservable<T>: LINQ to Events (push, fixed)<br />LINQ to *.*<br />Conversions: ToQbservable, ToQueryable, AsObservable, AsEnumerable, AsQbservable, AsQueryable, ToObservable, ToEnumerable<br />
  42. 42. Take THREE<br />Democratizing cloud data processing with CoSQL<br />
  43. 43. NoSQL is CoSQL!<br />
  44. 44. The NoSQL trend<br />
  45. 45. Object graphs<br />var _1579124585 = new Product {<br /> Title = "The Right Stuff", Author = "Tom Wolfe",<br /> Year = 1979, Pages = 304,<br /> Keywords = new[] { "Book", "Hardcover", "American" },<br /> Ratings = new[] { "****", "4 stars" },<br />};<br />var Products = new[] { _1579124585 };<br />
  46. 46. Queries over object graphs<br />var q = from product in Products<br />where product.Ratings.Any(rating => rating == "****")<br />select new { product.Title, product.Keywords };<br />
  47. 47. The O/R paradox<br />Objects<br />Fully compositional<br /> value ::= scalar<br /> | new {…, name = value, … }<br />Tables<br />Non-compositional<br /> value ::= new {…, name = scalar, … }<br />
  48. 48. Relational (de)normalization<br />
  49. 49. Queries over tables<br />var q = from product in Products<br />from rating in Ratings<br />where product.ID == rating.ProductId<br /> && rating == "****"<br />from keyword in Keywords<br />where product.ID == keyword.ProductID<br />select new { product.Title, keyword.Keyword };<br />var q = from product in Products<br />join rating in Ratings<br />on product.ID equals rating.ProductId<br />where rating == "****"<br />select product into FourStarProducts<br />from fourstarproduct in FourStarProducts<br />join keyword in Keywords<br />on product.ID equals keyword.ProductID<br />select new { product.Title, keyword.Keyword };<br />
  50. 50. Welcome to O/R voodoo<br />var q = from product in Products<br />where product.Ratings.Any(rating => rating == "****")<br />select new { product.Title, product.Keywords };<br />
  51. 51. What did we gain?<br />Ad-hoc queries?<br />But what about scale…<br />The relational Gods invented indexes<br />Going against the PK-FK flow…<br />from p1 in WWW<br />from p2 in WWW<br />where p2.Contains(p1.URL)<br />select new { p1, p2 };<br />
  52. 52. Job security?<br />
  53. 53. Spot the difference<br />
  54. 54. Duality to the rescue again?<br />
  55. 55. Consequences of duality<br />
  56. 56. More work in the area<br />
  57. 57. Thank you!<br />Bart J.F. De Smet<br />bartde@microsoft.com<br />Cloud Programmability Team<br />
