Clojure at BackType

Presentation to a combined meetup of the Bay Area Lisp and Bay Area Clojure groups, covering three Clojure projects at BackType:

Cascalog - Batch processing in Clojure
ElephantDB - Database written in Clojure
Storm - Distributed, fault-tolerant, reliable stream processing and RPC

  1. Clojure at BackType: How we learned to stop worrying and love the parentheses. Nathan Marz, BackType, @nathanmarz
  2. BackType: Data Services (APIs), Social Media Analytics Dashboard
  3. APIs • Conversational graph for URL • Comment search • # Tweets / URL • Influence scores • Top sites • Trending links stream • etc.
  4. URL Profiles
  5. Site comparisons
  6. Influencer Profiles
  7. Twitter Account Analytics
  8. Topic Analysis
  9. Topic Analysis
  10. BackType’s Challenges
  11. BackType’s Challenges: Complex analytics
  12. BackType’s Challenges: Complex analytics on lots of data (> 30TB)
  13. BackType’s Challenges: Complex analytics on lots of data (> 30TB) in realtime
  14. Clojure at BackType • Cascalog • ElephantDB • Storm
  15. Let’s build an app
  16. Let’s build an app
  17. Cascalog abstraction layers: Cascalog (variables and logic), Cascading (tuples, data workflows), MapReduce (key/value pairs, aggregation)
  18. Cascalog basics: The “age” dataset
  19. Cascalog basics
  20. Cascalog basics: Define and execute a query
  21. Cascalog basics: Define and execute a query. Where to emit results
  22. Cascalog basics: Define and execute a query. Where to emit results. Output variables
  23. Cascalog basics: Define and execute a query. Where to emit results. Output variables. “Predicates”: constrain the output variables
  24. Predicates
  25. Predicates: Input fields
  26. Predicates: Input fields, output fields
  27. Predicates: Fields can be constants or variables
  28. Predicates: Fields can be constants or variables. Variables are prefixed with ? or !
  29. Predicates
  30. Predicates • Functions • Filters • Aggregators • Generators: finite sources of tuples
  31. Example #1: Generator, filter
  32. Example #2: Generator, function
  33. Example #3: Generator, aggregator, filter
  34. Join example
  35. Join example: Triggers a join
  36. Join example
  37. Join example: Joins are an implementation detail
  38. Cascalog demo!
  39. Composability: “Predicate macro”
  40. Composability: Using a predicate macro; what it expands to
  41. Contrast to Pig: Pig’s AVG is 300 lines of code
  42. Let’s build an app
  43. Graph Schema (diagram): nodes Alice, Bob, Tweet: 123, Tweet: 456; edges labeled Reactor and Reaction; properties Gender: female, Reshare: true, Content: “Data is fun!”, Content: “RT @bob Data is fun!”
  44. ElephantDB, generation of domain of data: Key/value pairs are pre-sharded and indexed in MapReduce, producing Shard 0 through Shard 5 on the distributed filesystem
  45. ElephantDB, serving domain of data: Shard 0 through Shard 5 on the DFS; three ElephantDB servers
  46. Storm: Stream processing, distributed RPC
  47. Stream processing • Automatically distributes computation • Horizontally scalable • Fault-tolerant • Guarantees processing of messages
  48. Stream processing (diagram): Queue, Storm cluster, DBs
  49. What is a query? Raw data -> View
  50. What is a query? Tweets -> # Tweets for a URL
  51. What is a query? Tweets -> Influence score for a person
  52. Computing a query: Fully precompute view (Raw data -> DB -> Query)
  53. Computing a query: Do a live compute from scratch (Raw data -> Query)
  54. Computing a query: Precompute subviews, then compute query from intermediate DBs (Raw data -> DBs -> Query)
  55. Distributed RPC: Application -> Queue: “I want to know X, and return the results to me at Y”
  56. Distributed RPC (diagram): App queries, Queue, Storm cluster, DBs
  57. (BackType is hiring)
  58. Questions?
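
Slides 44 and 45 describe ElephantDB as key/value pairs that are pre-sharded in a batch job and then served shard by shard. The following is a minimal Python sketch of that idea only, not ElephantDB's code or API. The function names, the MD5-based hash, and the contiguous assignment of shards to servers are all assumptions made for illustration; the slides do not say how keys map to shards or shards to servers.

```python
import hashlib

NUM_SHARDS = 6   # slides 44-45 show Shard 0 through Shard 5
NUM_SERVERS = 3  # slide 45 shows three ElephantDB servers


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key to a shard id with a stable hash (hash choice is an assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def build_shards(pairs, num_shards=NUM_SHARDS):
    """Batch step: split key/value pairs into one dict per shard."""
    shards = [{} for _ in range(num_shards)]
    for key, value in pairs:
        shards[shard_for(key, num_shards)][key] = value
    return shards


def assign_shards(num_shards, num_servers):
    """Give each server a contiguous block of shard ids (assumed layout)."""
    per_server = -(-num_shards // num_servers)  # ceiling division
    return {
        server: list(range(server * per_server,
                           min((server + 1) * per_server, num_shards)))
        for server in range(num_servers)
    }


def lookup(key, shards, assignment):
    """Find the server that holds the key's shard, then read the value."""
    shard_id = shard_for(key, len(shards))
    server = next(s for s, ids in assignment.items() if shard_id in ids)
    return server, shards[shard_id].get(key)


# Usage: tweet counts per URL (made-up data), sharded and then queried.
pairs = [("http://example.com/a", 12), ("http://example.com/b", 3)]
shards = build_shards(pairs)
assignment = assign_shards(NUM_SHARDS, NUM_SERVERS)
server, count = lookup("http://example.com/a", shards, assignment)
```

Because the hash is stable, the batch job that writes the shards and the servers that read them agree on where each key lives without any coordination at query time.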
