Clojure at BackType


Presentation to a combined meetup of Bay Area Lisp and Bay Area Clojure groups. Presented three Clojure projects at BackType:

Cascalog - Batch processing in Clojure
ElephantDB - Database written in Clojure
Storm - Distributed, fault-tolerant, reliable stream processing and RPC

  • Clojure at BackType

    1. Clojure at BackType: How we learned to stop worrying and love the parentheses. Nathan Marz, BackType, @nathanmarz
    2. BackType: Data Services (APIs), Social Media Analytics Dashboard
    3. APIs: • Conversational graph for URL • Comment search • #Tweets / URL • Influence scores • Top sites • Trending links stream • etc.
    4. URL Profiles
    5. Site comparisons
    6. Influencer Profiles
    7. Twitter Account Analytics
    8. Topic Analysis
    9. Topic Analysis
    10. BackType’s Challenges
    11. BackType’s Challenges: Complex analytics
    12. BackType’s Challenges: Complex analytics on lots of data (> 30TB)
    13. BackType’s Challenges: Complex analytics on lots of data (> 30TB) in realtime
    14. Clojure at BackType: • Cascalog • ElephantDB • Storm
    15. Let’s build an app
    16. Let’s build an app
    17. Cascalog: Abstraction levels: Cascalog (variables and logic), Cascading (tuples, data workflows), MapReduce (key/value pairs, aggregation)
    18. Cascalog basics: The “age” dataset
    19. Cascalog basics
    20. Cascalog basics: Define and execute a query
    21. Cascalog basics: Define and execute a query; where to emit results
    22. Cascalog basics: Define and execute a query; where to emit results; output variables
    23. Cascalog basics: Define and execute a query; where to emit results; output variables; “Predicates”: constrain the output variables
    24. Predicates
    25. Predicates: Input fields
    26. Predicates: Input fields, output fields
    27. Predicates: Fields can be constants or variables
    28. Predicates: Fields can be constants or variables. Variables are prefixed with ? or !
    29. Predicates
    30. Predicates: • Functions • Filters • Aggregators • Generators: finite sources of tuples
    31. Example #1: Generator, Filter
    32. Example #2: Generator, Function
    33. Example #3: Generator, Aggregator, Filter
    34. Join example
    35. Join example: Triggers a join
    36. Join example
    37. Join example: Joins are an implementation detail
    38. Cascalog demo!
    39. Composability: “Predicate macro”
    40. Composability: Using a predicate macro, and what it expands to
    41. Contrast to Pig: Pig’s AVG is 300 lines of code
    42. Let’s build an app
    43. Graph Schema (diagram): nodes Alice, Bob, Tweet: 123, Tweet: 456; edges labeled Reactor and Reaction; properties Gender: female, Reshare: true, Content: “Data is fun!”, Content: “RT @bob Data is fun!”
    44. ElephantDB: Generation of domain of data. Key/value pairs; pre-shard and index in MapReduce; Shards 0 to 5 on a distributed filesystem
    45. ElephantDB: Serving domain of data. Shards on the DFS are loaded by ElephantDB Servers (one holding Shards 0, 1, 2; another holding Shards 3, 4, 5)
    46. Storm: Stream Processing, Distributed RPC
    47. Stream processing: • Automatically distributes computation • Horizontally scalable • Fault-tolerant • Guarantees processing of messages
    48. Stream processing (diagram): Queue, Storm cluster, DBs
    49. What is a query? (diagram: Raw data, View)
    50. What is a query? (diagram: Tweets, # Tweets for a URL)
    51. What is a query? (diagram: Tweets, Influence Score for a person)
    52. Computing a query: Fully precompute view (diagram: Raw data, DB, Query)
    53. Computing a query: Do a live compute from scratch (diagram: Raw data, Query)
    54. Computing a query: Precompute subviews; compute query from intermediate dbs (diagram: Raw data, DBs, Query)
    55. Distributed RPC: Application, Queue: “I want to know X, and return the results to me at Y”
    56. Distributed RPC (diagram): App queries, Queue, Storm cluster, DBs
    57. (BackType is hiring)
    58. Questions?
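
The slide transcript names the kinds of Cascalog predicates (generators, filters, functions, aggregators) and notes that a join is triggered implicitly, but the code shown on the slides did not survive extraction. The following is a minimal sketch of those ideas in plain Python, not Cascalog's API and not the code from the talk. The contents of the "age" dataset, the `GENDER` dataset, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory datasets. In the slides' terms these play the role
# of generators: finite sources of tuples.
AGE = [("alice", 28), ("bob", 33), ("chris", 40), ("david", 25)]
GENDER = [("alice", "f"), ("bob", "m"), ("chris", "m"), ("emily", "f")]


def people_younger_than(limit):
    """Generator + filter: keep persons whose age is below `limit`."""
    return [person for person, age in AGE if age < limit]


def age_doubled():
    """Generator + function: derive a new output field from an input field."""
    return [(person, age * 2) for person, age in AGE]


def count_by_gender_under(limit):
    """Join + filter + aggregator.

    Referring to the same `person` in both datasets is what makes this a
    join; the matching rows are then counted per gender.
    """
    age_of = dict(AGE)
    counts = defaultdict(int)
    for person, gender in GENDER:
        # Inner join: persons missing from the age dataset are skipped.
        if person in age_of and age_of[person] < limit:
            counts[gender] += 1
    return dict(counts)
```

In this sketch the join has to be written out by hand as a dictionary lookup. The point made on the "Join example" slides is that a declarative query leaves that step to the system, so the author only states which variables must match.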
