Ziggrid
Processing Data in Near Real-Time Using Couchbase
Christopher Tse (Head of R&D, McGraw-Hill Education)
McGraw Hill Couchbase SF 2013

  1. Ziggrid: Processing Data in Near Real-Time Using Couchbase. Christopher Tse (Head of R&D, McGraw-Hill Education) and Gareth Powell, Ph.D. (Chief Scientist, Ziniki Network). CouchConf SF 2013, Sep 13, 2013
  2. @ McGraw-Hill Education + Research & Development
  3. @ McGraw-Hill Education + Research & Development
  4. Leveraging EmberJS, a JavaScript MVC framework, to rethink the teaching and learning experiences on the Web and on mobile devices. HTML
  5. Collecting and analyzing multiple streams of student engagement, performance, and demographics for dashboards. (Diagram labels: Data; FACT; Dimension, five times)
  6. EdSense: Real-time Reactions. (Diagram labels: Action, Collections, Learning Style, Engagement, User Intents, Recommendations, Reaction, Activity Log, Previously, Achievements, Efficacy)
  7. EdSense: Real-time Reactions. (Diagram labels: Action, Collections, Learning Style, Engagement, User Intents, Recommendations, Reaction, Activity Log, Previously, Achievements, Efficacy)
  8. Learning Portal
      • Designed and built as a collaboration between MHE Labs and Couchbase
      • Serves as proof-of-concept and testing harness for Couchbase + ElasticSearch integration
      • Available for download and further development as open source code: http://github.com/couchbaselabs/learningportal
      • Unveiled during CouchConf SF 2012
  9. SQL Hi!
  10. SQL ETL
  11. When we say SQL, we mean some sort of query language, so we can declaratively express the logic for the machine to calculate and process. But processing complex, multi-layered queries upon request can be slow.
      When we say ETL, we mean to extract, transform and load in steps, so we can store the results from the intermediate or final steps of our calculations. But stored data gets out of sync with reality, and refresh is often expensive.
  12. SQL ETL
  13. SQL ETL: Logic
  14. SQL ETL: Logic, Steps
  15. SQL ETL: Logic, Steps, Fresh Data
  16. SQL ETL: Logic, Steps, Fresh Data, Fast Access
  17. SQL ETL: Logic in Steps, Fresh Data & Fast Access
  18. FRP: Logic in Steps, Fresh Data & Fast Access
  19. Introducing Functional Reactive Programming (FRP)
  20. WTF is FRP? Functional reactive programming (FRP) is a programming paradigm for reactive programming using the building blocks of functional programming. The key traits of FRP are:
      • The concept of "behaviors" or "signals" which model values that vary over continuous time.
      • The concept of "events" which have occurrences at finitely many points in time.
      • A means to change the FRP system in response to events, generally termed "switching".
      • The separation of evaluation details such as sampling rate from the reactive model.
      An additional common but contentious trait is a notion of consistency when ordering events (not just within one stream). Variants include synchrony and glitch freedom. The semantic model of FRP in side-effect free languages is typically in terms of continuous functions, and typically over time. In contrast, integration with a host language that has side effects is typically given in terms of data flow or dependency graphs by extending the typical operational semantics to manipulate and use them.
  21. WTF is FRP? TL;DR (the full definition from slide 20 is shown again)
  22. Hint
  23. Excel is FRP
  24. Excel is FRP. Functional: every cell is either a value or an f(x) that generates a value.
  25. Excel is FRP. Functional: every cell is either a value or an f(x) that generates a value. Reactive: if you change one cell, all the other cells that refer to it change immediately.
  26. Excel is FRP. Functional: every cell is either a value or an f(x) that generates a value. Reactive: if you change one cell, all the other cells that refer to it change immediately.
  27. Excel is FRP. Functional: every cell is either a value or an f(x) that generates a value. Reactive: if you change one cell, all the other cells that refer to it change immediately. Programming: yes, you are programming when you create a model in an Excel spreadsheet.
  28. Excel is FRP. Start with a simple sum(): adding numbers within one worksheet.
  29. Excel is FRP. Start with a simple sum(): adding numbers within one worksheet. Add more tabs: to reflect higher-level aggregates.
  30. Excel is FRP. Start with a simple sum(): adding numbers within one worksheet. Add more tabs: to reflect higher-level aggregates. Draw fancy graphs: that visualize the valuable aggregates.
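The "Excel is FRP" slides describe cells that are either values or functions of other cells, with changes flowing to every dependent cell. A minimal Python sketch of that idea follows. The `Sheet` class and its method names are hypothetical, invented for illustration; this is not how Excel or Ziggrid is implemented, and it assumes the formulas never form a cycle.

```python
class Sheet:
    """Toy spreadsheet: each cell is a constant or a formula over other cells."""

    def __init__(self):
        self._values = {}      # cell name -> current value
        self._formulas = {}    # cell name -> (function, list of input cell names)
        self._dependents = {}  # cell name -> set of cells that read it

    def set_value(self, name, value):
        """Store a constant and push the change to every dependent cell."""
        self._values[name] = value
        self._propagate(name)

    def set_formula(self, name, func, inputs):
        """Define a cell as func(*inputs) and record the dependency edges."""
        self._formulas[name] = (func, inputs)
        for source in inputs:
            self._dependents.setdefault(source, set()).add(name)
        self._recompute(name)
        self._propagate(name)

    def get(self, name):
        return self._values.get(name)

    def _recompute(self, name):
        func, inputs = self._formulas[name]
        args = [self._values.get(i) for i in inputs]
        # Only compute once every input has a value.
        if all(a is not None for a in args):
            self._values[name] = func(*args)

    def _propagate(self, name):
        # Naive depth-first push; a cell reachable by two paths is recomputed twice.
        for dependent in self._dependents.get(name, ()):
            self._recompute(dependent)
            self._propagate(dependent)


sheet = Sheet()
sheet.set_value("A1", 2)
sheet.set_value("A2", 3)
sheet.set_formula("A3", lambda a, b: a + b, ["A1", "A2"])  # like =SUM(A1:A2)
sheet.set_formula("B1", lambda total: total * 10, ["A3"])  # a higher-level aggregate
print(sheet.get("B1"))  # 50

sheet.set_value("A1", 5)  # changing one cell updates A3 and then B1
print(sheet.get("B1"))  # 80
```

The dependency map is what makes the sheet reactive: nothing polls for changes, and only cells downstream of an edit are touched.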
  31. The world runs on Excel. :)
  32. The world runs on Excel. :)
  33. What if...
      Data Model: instead of cells inside sheets, we have documents in JSON.
      Calculating: instead of when you open the file, we have all the time in the cloud.
      Visualization: instead of supported chart types, we have anything drawable in HTML5.
      Language: instead of =SUM(A1:B10), we have function Sum() { ... }
  34. What if... (same comparison as slide 33)
  35. Ziggrid is FRP
  36. Ziggrid is FRP. Stores values in JSON, inside a Couchbase cluster. Specifies f(x) in JSON, and also builds a dependency graph.
  37. Ziggrid is FRP. Stores values in JSON, inside a Couchbase cluster. Specifies f(x) in JSON, and also builds a dependency graph. Pushes data out via JSON, so clients can render data in HTML5, etc.
  38. Ziggrid is FRP ("The Ziggurat"). Stores values in JSON, inside a Couchbase cluster. Specifies f(x) in JSON, and also builds a dependency graph. Pushes data out via JSON, so clients can render data in HTML5, etc.
  39. Ziggrid is FRP ("The Ziggurat"). Stores values in JSON, inside a Couchbase cluster. Specifies f(x) in JSON, and also builds a dependency graph. Pushes data out via JSON, so clients can render data in HTML5, etc. JSON
  40. Layers of the Ziggurat: Raw Events, Enhanced Events, Summaries, Rankings, Correlations, Snapshots, Composites
  41. Gareth Powell, Ph.D. Functional Programming Expert. Wrote doctoral thesis on Haskell.
  42. Gareth Powell, Ph.D. Functional Programming Expert. Wrote doctoral thesis on Haskell. Baseball Fanatic.
  43. Example: Baseball Data Analysis Model
      Layers: Raw Events, Enhanced Events, Summaries, Rankings, Correlations, Snapshots, Composites
      Baseball examples: Plate Appearances; Player Situation; Outcome; Player Totals; Correlate vs Situation; Snapshots of Player Totals; Player Profile; Snapshots of Correlation; Game Results; Leaderboards (HR, AVG, PROD); Win / Loss Record
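Slide 43 pairs raw plate appearances with summary layers such as player totals, and the later slides say all calculations are defined with commutative and associative operators. A rough Python sketch of why that property matters: totals can be merged in any order and in any batching, so a summary can be updated as events arrive. The field names `hit`, `atBats`, and `hits` are hypothetical and are not taken from the Ziggrid model.

```python
from functools import reduce

EMPTY = {"atBats": 0, "hits": 0}  # identity element for merge()


def event_to_totals(event):
    """Lift one raw plate-appearance event into a one-event summary."""
    return {"atBats": 1, "hits": 1 if event["hit"] else 0}


def merge(a, b):
    """Combine two summaries; addition makes this commutative and associative."""
    return {"atBats": a["atBats"] + b["atBats"], "hits": a["hits"] + b["hits"]}


def summarize(events):
    """Fold any batch of events into a single summary."""
    return reduce(merge, map(event_to_totals, events), EMPTY)


events = [{"hit": True}, {"hit": False}, {"hit": True}, {"hit": False}]

all_at_once = summarize(events)
out_of_order = summarize(list(reversed(events)))
in_two_batches = merge(summarize(events[:1]), summarize(events[1:]))

print(all_at_once)  # {'atBats': 4, 'hits': 2}
print(all_at_once == out_of_order == in_two_batches)  # True
print(all_at_once["hits"] / all_at_once["atBats"])  # 0.5
```

Because partial summaries combine to the same result, a stored total can absorb new events without replaying the full history.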
  44. LIVE DEMO
  45. Beane Counter Architecture
      Front-end: HTML5 Data Tables and SVG Visualization (Ember.js + D3.js via WebSockets)
      Middleware: Model Description, Calculation, and Event Chaining (Java via Memcached Protocol)
      Backend: Raw and Aggregated Data Storage and Indexing (Couchbase JSON Store + Incremental MapReduce)
  46. Ziggrid Models
      • Data model described in JSON structure

      {
        "name": "plateAppearance",
        "fields": [
          {
            "name": "team", // The team identifier from the Retrosheet Event file
            "type": "string",
            "key": true
          },
          {
            "name": "player", // The player identifier from the Retrosheet Event file
            "type": "string",
            "key": true
          },
          {
            "name": "season", // Year represented as YYYY
            "type": "string",
            "key": true
          },
          {
            "name": "dayOfYear", // 1-365, proxy for which game it was
            "type": "number",
            "key": true
          },
          {
            "name": "inning", // 1-9 for regular innings
            "type": "number",
            "key": true
          },
          ...
      }
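To make the model definition on slide 46 concrete, here is a small Python sketch that loads the five fields shown and checks a record against them. It is an illustration only: `check_record`, `key_fields`, and the sample record are hypothetical and do not come from Ziggrid. The slide's `//` comments are dropped because standard JSON does not allow comments, and the fields elided by `...` on the slide are left out.

```python
import json

# Only the five fields visible on the slide; the rest are elided there.
MODEL = json.loads("""
{
  "name": "plateAppearance",
  "fields": [
    {"name": "team", "type": "string", "key": true},
    {"name": "player", "type": "string", "key": true},
    {"name": "season", "type": "string", "key": true},
    {"name": "dayOfYear", "type": "number", "key": true},
    {"name": "inning", "type": "number", "key": true}
  ]
}
""")

# Assumed mapping from the model's type names to Python types.
PYTHON_TYPES = {"string": str, "number": (int, float)}


def check_record(model, record):
    """Return a list of problems; an empty list means the record fits the model."""
    problems = []
    for field in model["fields"]:
        name, expected = field["name"], PYTHON_TYPES[field["type"]]
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {field['type']}, got {type(value).__name__}"
            )
    return problems


def key_fields(model):
    """Names of the fields flagged with "key": true."""
    return [f["name"] for f in model["fields"] if f.get("key")]


# Hypothetical sample values, not real Retrosheet data.
good = {"team": "AAA", "player": "example01", "season": "2013",
        "dayOfYear": 120, "inning": 7}
bad = {"player": "example01", "season": "2013", "dayOfYear": 120, "inning": "7"}

print(check_record(MODEL, good))  # []
print(check_record(MODEL, bad))
print(key_fields(MODEL))
```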
  47. Ziggrid Algorithms
      • Data model described in JSON structure
      • Define all calculations via commutative and associative operators

      {
        "enhanced": "situation",
        "from": "plateAppearance",
        "enhance": {
          "player": "player",
          "season": "season",
          "dayOfYear": "dayOfYear",
          "atbat": {
            "op": "+",
            "args": [{ "op": "*", "args": [ 3, "inning" ] }, "outs", -3 ]
          },
          "bases": "bases",
          "lead": {
            "op": "group",
            "value": {
              "op": "ifelse",
              "test": "home",
              "true": { "op": "-", "lhs": "homeScore", "rhs": "awayScore" },
              "false": { "op": "-", "lhs": "awayScore", "rhs": "homeScore" }
            },
            "dividers": [ -3, -1, 0, 2 ], // (-inf, -3], (-3, -1], (-1, 0], (0, 2],
            "moreThan": 3 // (2,inf)
          },
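The `atbat` and `lead` definitions on slide 47 read as small expression trees: strings name fields of the source event, numbers are literals, and objects apply an `op`. The Python sketch below evaluates trees of that shape. It is a guess at the semantics, not Ziggrid's Java implementation. In particular, it assumes `group` returns the upper bound of the bucket the value falls into and returns `moreThan` above the last divider, which is one reading of the interval comments on the slide. The sample record is hypothetical.

```python
def evaluate(expr, record):
    """Evaluate one expression tree against a source record."""
    if isinstance(expr, str):           # a string is a field reference
        return record[expr]
    if isinstance(expr, (int, float)):  # a number is a literal
        return expr
    op = expr["op"]
    if op == "+":
        return sum(evaluate(arg, record) for arg in expr["args"])
    if op == "*":
        product = 1
        for arg in expr["args"]:
            product *= evaluate(arg, record)
        return product
    if op == "-":
        return evaluate(expr["lhs"], record) - evaluate(expr["rhs"], record)
    if op == "ifelse":
        branch = "true" if evaluate(expr["test"], record) else "false"
        return evaluate(expr[branch], record)
    if op == "group":
        # Assumption: label each bucket by its upper bound.
        value = evaluate(expr["value"], record)
        for upper in expr["dividers"]:
            if value <= upper:
                return upper
        return expr["moreThan"]
    raise ValueError(f"unsupported op: {op!r}")


# The part of the "enhance" map that is visible on the slide.
ENHANCE = {
    "player": "player",
    "season": "season",
    "dayOfYear": "dayOfYear",
    "atbat": {"op": "+",
              "args": [{"op": "*", "args": [3, "inning"]}, "outs", -3]},
    "bases": "bases",
    "lead": {
        "op": "group",
        "value": {
            "op": "ifelse",
            "test": "home",
            "true": {"op": "-", "lhs": "homeScore", "rhs": "awayScore"},
            "false": {"op": "-", "lhs": "awayScore", "rhs": "homeScore"},
        },
        "dividers": [-3, -1, 0, 2],
        "moreThan": 3,
    },
}


def enhance(record):
    """Build the enhanced event by evaluating every field definition."""
    return {name: evaluate(expr, record) for name, expr in ENHANCE.items()}


# Hypothetical plate appearance: home team batting in the 7th, 2 outs, down 4-5.
event = {"player": "example01", "season": "2013", "dayOfYear": 120,
         "inning": 7, "outs": 2, "bases": 0,
         "home": True, "homeScore": 4, "awayScore": 5}

situation = enhance(event)
print(situation["atbat"])  # 3*7 + 2 - 3 = 20
print(situation["lead"])   # lead of -1 lands in (-3, -1], labelled -1
```

Keeping the calculation as data means the same definition can be stored next to the values it produces and interpreted by more than one engine.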
  48. Ziggrid Composites
      • Data model described in JSON structure
      • Define all calculations via commutative and associative operators
      • Projecting data via composite definition

      {
        "composeInto": "profile",
        "from": "correlate_on_situation_groupedBy_player_and_season",
        "key": [ "player/", { "field": "player" } ],
        "fields": { "clutchness": "correlation" }
      },
      {
        "leaderboard": "hotness",
        "from": "snapshot_playerSeasonToDate",
        "groupby": [ [ "season", "dayOfYear" ] ],
        "sortby": [ "average" ],
        "order": "desc",
        "values": [ "player" ]
      },
      {
        "composeInto": "profile",
        "from": "snapshot_playerSeasonToDate",
        "key": [ "player/", { "field": "player" } ],
        "fields": { "hotness": "average" }
      }
      ...
      ]
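The `hotness` leaderboard on slide 48 names a source, a grouping, a sort field, an order, and the values to emit. The Python sketch below shows one plausible way to read such a definition: group the source rows, sort each group, and project the listed values. It assumes the first inner list of `groupby` is the grouping to apply. The function name and sample rows are hypothetical, and the deck notes that Ziggrid itself computes leaderboards with Couchbase views rather than in application code like this.

```python
from collections import defaultdict

LEADERBOARD = {
    "leaderboard": "hotness",
    "from": "snapshot_playerSeasonToDate",
    "groupby": [["season", "dayOfYear"]],
    "sortby": ["average"],
    "order": "desc",
    "values": ["player"],
}


def build_leaderboards(definition, rows):
    """Return {group key: ranked list of value tuples} for one definition."""
    group_fields = definition["groupby"][0]  # assumption: one grouping per inner list
    sort_fields = definition["sortby"]
    descending = definition["order"] == "desc"

    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[f] for f in group_fields)].append(row)

    boards = {}
    for key, members in groups.items():
        ranked = sorted(members,
                        key=lambda r: tuple(r[f] for f in sort_fields),
                        reverse=descending)
        boards[key] = [tuple(r[f] for f in definition["values"]) for r in ranked]
    return boards


# Hypothetical snapshot rows standing in for "snapshot_playerSeasonToDate".
rows = [
    {"season": "2013", "dayOfYear": 100, "player": "p1", "average": 0.250},
    {"season": "2013", "dayOfYear": 100, "player": "p2", "average": 0.400},
    {"season": "2013", "dayOfYear": 101, "player": "p1", "average": 0.300},
]

boards = build_leaderboards(LEADERBOARD, rows)
print(boards[("2013", 100)])  # [('p2',), ('p1',)]
print(boards[("2013", 101)])  # [('p1',)]
```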
  49. Ziggrid is 100% Open Source. Let’s work together! https://github.com/Ziniki-Network/Ziggrid
  50. Future Improvements
      • GREATER SCALABILITY: Using the Couchbase View Engine to do more of the processing in the database via Incremental MapReduce. Currently, only the leaderboards are computed using views.
      • DEEPER ANALYTICS: Expand the functions supported by Ziggrid to perform transformations, statistical calculations typical of Big Data analysis, and even ones for machine learning.
      • EASIER MODEL DEVELOPMENT: Allow in-browser development of new models using a subset of data. We need to finish developing a pure JavaScript-based Ziggrid processing engine.
      • REDUCED LATENCY: Using the UPR protocol to be notified of changes inside Couchbase to allow more immediate, and thus more real-time, propagation of events up the Ziggurat.
  51. Hadoop
  52. Hadoop: Big Data
  53. Hadoop: Big Data, But Slow
  54. Zebras for
  55. Thanks to 2 members of the Ember.js Core Team who helped us design and code the sexy Ember + D3.js + WebSockets front-end: @machty, @stefanpenner
  56. Questions? Follow me on Twitter: @christse