Clojure at BackType
Presentation to a combined meetup of Bay Area Lisp and Bay Area Clojure groups. Presented three Clojure projects at BackType:

Cascalog - Batch processing in Clojure
ElephantDB - Database written in Clojure
Storm - Distributed, fault-tolerant, reliable stream processing and RPC

0 Comments
33 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
13,086
On Slideshare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
162
Comments
0
Likes
33
Embeds 0
No embeds

No notes for slide
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • Clojure at BackType

    1. Clojure at BackType: How we learned to stop worrying and love the parentheses. Nathan Marz, BackType, @nathanmarz
    2. BackType: Data Services (APIs); Social Media Analytics Dashboard
    3. APIs: • Conversational graph for URL • Comment search • #Tweets / URL • Influence scores • Top sites • Trending links stream • etc.
    4. URL Profiles
    5. Site comparisons
    6. Influencer Profiles
    7. Twitter Account Analytics
    8. Topic Analysis
    9. Topic Analysis
    10. BackType’s Challenges
    11. BackType’s Challenges: Complex analytics
    12. BackType’s Challenges: Complex analytics on lots of data (> 30TB)
    13. BackType’s Challenges: Complex analytics on lots of data (> 30TB) in realtime
    14. Clojure at BackType: • Cascalog • ElephantDB • Storm
    15. Let’s build an app
    16. Let’s build an app
    17. Cascalog. Abstraction: Cascalog (variables and logic); Cascading (tuples, data workflows); key/value pairs, MapReduce aggregation
    18. Cascalog basics: The “age” dataset
    19. Cascalog basics
    20. Cascalog basics: Define and execute a query
    21. Cascalog basics: Define and execute a query; where to emit results
    22. Cascalog basics: Define and execute a query; where to emit results; output variables
    23. Cascalog basics: Define and execute a query; where to emit results; output variables; “Predicates”: constrain the output variables
    24. Predicates
    25. Predicates: Input fields
    26. Predicates: Input fields; output fields
    27. Predicates: Fields can be constants or variables
    28. Predicates: Fields can be constants or variables. Variables are prefixed with ? or !
    29. Predicates
    30. Predicates: • Functions • Filters • Aggregators • Generators: finite sources of tuples
    31. Example #1: Generator, Filter
    32. Example #2: Generator, Function
    33. Example #3: Generator, Aggregator, Filter
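
The code shown on the Example #1 to #3 slides did not survive in this transcript. As a rough illustration of the four predicate kinds only, here is a plain-Python model; it is not Cascalog syntax, and the `AGE` rows and all helper names are hypothetical.

```python
# Plain-Python model of the four predicate kinds (not Cascalog's API).
# AGE is a hypothetical stand-in for the "age" dataset: (person, age) tuples.
AGE = [("alice", 28), ("bob", 33), ("chris", 40), ("david", 25)]

def generator(rows):
    """Generator: a finite source of tuples."""
    yield from rows

def younger_than(limit):
    """Filter: keeps a tuple only when the test on its fields passes."""
    return lambda person, age: age < limit

def double_age(person, age):
    """Function: derives a new output field from the input fields."""
    return (person, age, age * 2)

def count(rows):
    """Aggregator: folds many tuples into a single value."""
    return sum(1 for _ in rows)

# Generator + filter: people younger than 30.
young = [p for p, a in generator(AGE) if younger_than(30)(p, a)]

# Generator + function: each person with a derived field.
doubled = [double_age(p, a) for p, a in generator(AGE)]

# Generator + filter + aggregator: how many people are younger than 30.
young_count = count(r for r in generator(AGE) if younger_than(30)(*r))
```

The three results mirror the shape of the three example slides: a filtered list, a list with an extra derived field, and a single aggregated number.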
    34. Join example
    35. Join example: Triggers a join
    36. Join example
    37. Join example: Joins are an implementation detail
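
In Cascalog, reusing the same variable in more than one generator is what triggers a join. The slide's own query is not in the transcript, so the sketch below only models the idea in plain Python: the caller states that the same person appears in two sources, and the matching strategy stays hidden. Both datasets and the helper name are hypothetical.

```python
# Plain-Python model of an implicit join (not Cascalog's API).
# Both datasets are hypothetical (person, value) tuples.
AGE = [("alice", 28), ("bob", 33), ("emily", 25)]
GENDER = [("alice", "f"), ("bob", "m"), ("gary", "m")]

def join_on_first_field(left, right):
    """Hash join on the shared first field of two tuple sources."""
    index = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    # Keep only keys present in both sources (inner join).
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, [])]

# The caller never chooses a join algorithm; that is an implementation detail.
rows = join_on_first_field(AGE, GENDER)
```

Swapping the hash join for a different strategy would not change the caller, which is the sense in which the join is an implementation detail.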
    38. Cascalog demo!
    39. Composability: “Predicate macro”
    40. Composability: Using a predicate macro; expands to
    41. Contrast to Pig: Pig’s AVG is 300 lines of code
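
The predicate macro on the slide is not preserved here. Assuming, from the contrast with Pig's AVG, that it was an average built from smaller aggregators, the composition idea can be sketched in plain Python. This is not Cascalog syntax, and the sample values are hypothetical.

```python
# Plain-Python model of composing aggregators (not Cascalog's API).
def count(values):
    """Aggregator: number of values."""
    return len(values)

def total(values):
    """Aggregator: sum of values."""
    return sum(values)

def average(values):
    """Composite aggregator built only from the two smaller ones above."""
    return total(values) / count(values)

ages = [28, 33, 40, 25]  # hypothetical sample values
avg_age = average(ages)
```

The composite is a few lines because it reuses existing pieces instead of reimplementing aggregation from scratch.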
    42. Let’s build an app
    43. Graph Schema (diagram labels): Alice; Bob; Tweet: 123; Tweet: 456; Reactor; Reactor; Reaction; Property: Gender: female; Property: Reshare: true; Property: Content: “RT @bob Data is fun!”; Property: Content: “Data is fun!”
    44. ElephantDB: Generation of domain of data. Key/value pairs; pre-shard and index in MapReduce; Shard 0 through Shard 5 on the distributed filesystem
    45. ElephantDB: Serving domain of data. DFS: Shard 0 through Shard 5; three ElephantDB Servers
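
The two ElephantDB slides describe a batch step that pre-shards key/value pairs and a serving step that reads them. A minimal sketch of that split follows. The hash-modulo scheme, the function names, and the sample data are assumptions for illustration, not ElephantDB's actual implementation.

```python
import hashlib

NUM_SHARDS = 6  # the slides show Shard 0 through Shard 5

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard with a stable hash (assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def build_shards(pairs, num_shards: int = NUM_SHARDS):
    """Batch step: split key/value pairs into per-shard indexes."""
    shards = [dict() for _ in range(num_shards)]
    for key, value in pairs:
        shards[shard_for(key, num_shards)][key] = value
    return shards

def lookup(shards, key):
    """Serving step: only the shard that owns the key is consulted."""
    return shards[shard_for(key, len(shards))].get(key)

# Hypothetical domain of data: tweet counts per URL.
pairs = [("http://a.example/", 3), ("http://b.example/", 7)]
shards = build_shards(pairs)
```

A stable hash is used instead of Python's built-in `hash`, which is randomized per process for strings and would send the batch and serving steps to different shards.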
    46. Storm: Stream Processing; Distributed RPC
    47. Stream processing: • Automatically distributes computation • Horizontally scalable • Fault-tolerant • Guarantees processing of messages
    48. Stream processing: Queue; Storm cluster; DB, DB, DB
    49. What is a query? Raw data; View
    50. What is a query? Tweets; # Tweets for a URL
    51. What is a query? Tweets; Influence Score for a person
    52. Computing a query: Raw data; fully precompute view; DB; Query
    53. Computing a query: Raw data; do a live compute from scratch; Query
    54. Computing a query: Raw data; precompute subviews; DB, DB, DB; compute query from intermediate DBs; Query
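
The three "Computing a query" slides contrast a live scan of raw data with precomputed views and subviews. The sketch below compares the live approach with the subview approach for the "# Tweets for a URL" query. The hourly bucketing, the sample data, and the function names are hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical raw data: one (url, hour) tuple per tweet.
TWEETS = [("u1", 0), ("u1", 0), ("u1", 1), ("u2", 1), ("u1", 2)]

def live_count(tweets, url, start, end):
    """Live compute from scratch: scan all raw data for every query."""
    return sum(1 for u, h in tweets if u == url and start <= h <= end)

def precompute_subviews(tweets):
    """Batch step: tweet count per (url, hour) bucket."""
    return Counter(tweets)

def count_from_subviews(subviews, url, start, end):
    """Query step: combine only the buckets in the requested range."""
    return sum(subviews[(url, h)] for h in range(start, end + 1))

subviews = precompute_subviews(TWEETS)
```

Both paths return the same answer; the subview path touches one bucket per hour in the range instead of every raw tweet.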
    55. Distributed RPC: Application; Queue; “I want to know X, and return the results to me at Y”
    56. Distributed RPC: App queries; Queue; Storm cluster; DBs
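
The request shape on the Distributed RPC slide, "I want to know X, and return the results to me at Y", can be modeled with an in-process queue. This is a single-process sketch under stated assumptions, not Storm's API: the dictionary standing in for the reply path, the `TWEET_COUNTS` data, and the function names are all hypothetical.

```python
import queue

requests = queue.Queue()  # stands in for the request queue
results = {}              # return address -> result (stands in for the reply path)

# Hypothetical precomputed data the cluster can read.
TWEET_COUNTS = {"http://a.example/": 3, "http://b.example/": 7}

def submit(question: str, return_to: str) -> None:
    """Application side: 'I want to know X, return the results to me at Y'."""
    requests.put({"x": question, "y": return_to})

def drain() -> None:
    """Cluster side: answer every queued request at its return address."""
    while not requests.empty():
        req = requests.get()
        results[req["y"]] = TWEET_COUNTS.get(req["x"], 0)

submit("http://a.example/", "request-1")
submit("http://c.example/", "request-2")
drain()
```

Carrying the return address inside the request is what lets the application and the workers stay decoupled by the queue.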
    57. (BackType is hiring)
    58. Questions?