Clojure at BackType

Presentation to a combined meetup of Bay Area Lisp and Bay Area Clojure groups. Presented three Clojure projects at BackType:

Cascalog - Batch processing in Clojure
ElephantDB - Database written in Clojure
Storm - Distributed, fault-tolerant, reliable stream processing and RPC

Usage Rights

© All Rights Reserved

Clojure at BackType: Presentation Transcript

  • Clojure at BackType: How we learned to stop worrying and love the parentheses. Nathan Marz, BackType, @nathanmarz
  • BackType: Data Services (APIs); Social Media Analytics Dashboard
  • APIs: conversational graph for a URL, comment search, # tweets per URL, influence scores, top sites, trending links stream, etc.
  • URL Profiles
  • Site comparisons
  • Influencer Profiles
  • Twitter Account Analytics
  • Topic Analysis
  • BackType’s Challenges: complex analytics on lots of data (> 30TB) in realtime
  • Clojure at BackType: Cascalog, ElephantDB, Storm
  • Let’s build an app
  • Cascalog, layers of abstraction (diagram): Cascalog (variables and logic); Cascading (tuples, data workflows); MapReduce (key/value pairs, aggregation)
  • Cascalog basics: the “age” dataset
  • Cascalog basics: define and execute a query. Slide annotations: where to emit results; output variables; “predicates”, which constrain the output variables
  • Predicates: input fields and output fields
  • Predicates: fields can be constants or variables; variables are prefixed with ? or !
  • Predicates: functions, filters, aggregators, and generators (finite sources of tuples)
  • Example #1: generator, filter
  • Example #2: generator, function
  • Example #3: generator, aggregator, filter
  • Join example (slide annotation: “Triggers a join”)
  • Join example: joins are an implementation detail
  • Cascalog demo!
  • Composability: “predicate macro”
  • Composability: using a predicate macro, and what it expands to
  • Contrast to Pig: Pig’s AVG is 300 lines of code
  • Let’s build an app
  • Graph Schema (diagram): nodes Alice, Bob, Tweet: 123, and Tweet: 456; edges labeled Reactor and Reaction; properties Gender: female, Reshare: true, Content: “Data is fun!”, and Content: “RT @bob Data is fun!”
  • ElephantDB, generation of a domain of data (diagram): key/value pairs are pre-sharded and indexed in MapReduce, producing Shards 0 to 5 on the distributed filesystem
  • ElephantDB, serving a domain of data (diagram): Shards 0 to 5 on the DFS, served by three ElephantDB servers
  • Storm: stream processing, distributed RPC
  • Stream processing: automatically distributes computation; horizontally scalable; fault-tolerant; guarantees processing of messages
  • Stream processing (diagram): Queue, Storm cluster, three DBs
  • What is a query? (diagram: Raw data, View)
  • What is a query? (diagram: Tweets, # Tweets for a URL)
  • What is a query? (diagram: Tweets, Influence Score for a person)
  • Computing a query: fully precompute the view (diagram: Raw data, DB, Query)
  • Computing a query: do a live compute from scratch (diagram: Raw data, Query)
  • Computing a query: precompute subviews, then compute the query from the intermediate DBs (diagram: Raw data, DBs, Query)
  • Distributed RPC (diagram: Application, Queue): “I want to know X, and return the results to me at Y”
  • Distributed RPC (diagram): App queries, Queue, Storm cluster, DBs
  • (BackType is hiring)
  • Questions?
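
The slide code did not survive in this transcript, so the following is only a rough sketch of the four predicate kinds named in the Cascalog slides (generators, filters, functions, aggregators). It is plain Python, not Cascalog's API: the helper names and the rows of the "age" dataset are hypothetical, chosen only to mirror the shape of Examples #1 to #3.

```python
from collections import defaultdict

# Hypothetical stand-in for the "age" dataset: (person, age) tuples.
AGE = [("alice", 28), ("bob", 33), ("chris", 40), ("david", 25), ("emily", 25)]


def generator(tuples):
    """Generator: a finite source of tuples."""
    yield from tuples


def apply_filter(rows, keep):
    """Filter: keeps only the tuples for which the test holds."""
    return (row for row in rows if keep(row))


def apply_function(rows, fn):
    """Function: appends an output field computed from the input fields."""
    return (row + (fn(row),) for row in rows)


def count_by(rows, key):
    """Aggregator: counts tuples per group."""
    counts = defaultdict(int)
    for row in rows:
        counts[key(row)] += 1
    return dict(counts)


# Shape of Example #1 (generator + filter): people younger than 30.
young = [person for person, _ in apply_filter(generator(AGE), lambda r: r[1] < 30)]

# Shape of Example #2 (generator + function): add a doubled-age field.
doubled = list(apply_function(generator(AGE), lambda r: r[1] * 2))

# Shape of Example #3 (generator + aggregator + filter): ages shared by more than one person.
counts = count_by(generator(AGE), key=lambda r: r[1])
shared_ages = {age: n for age, n in counts.items() if n > 1}
```

In Cascalog itself these steps are written declaratively as predicates inside a single query, and the planner decides how to execute them; the explicit function chaining here is only for illustration.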