Cascalog at Strange Loop

Presentation of Cascalog at Strange Loop on October 15th, 2010.

http://github.com/nathanmarz/cascalog

Presentation Transcript

  • Cascalog: data processing on Hadoop without the hassle. Nathan Marz, BackType (@nathanmarz)
  • What is Cascalog? A stack of abstractions: Cascalog (variables and logic), Cascading (tuples, data workflows), MapReduce (key/value pairs, aggregation)
  • Cascalog’s components: Cascading (the job execution engine) + Datalog (basis of the API design) + Clojure (the host programming language)
  • Clojure: a general-purpose programming language; a dialect of Lisp that compiles to Java bytecode
  • Clojure: a “programmable programming language”; it is easy to build domain-specific languages (DSLs) in Clojure
  • Clojure examples (code => result): (+ 1 2 3) => 6; (> 20 18) => true; (defn incr [x] (+ 1 x)) followed by (incr 3) => 4
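To make the "programmable programming language" point concrete, here is a minimal sketch of a macro in plain Clojure. The `unless` macro is a hypothetical illustration and does not come from the talk; it only shows that new control constructs can be added to the language by ordinary user code, which is the mechanism DSLs such as Cascalog build on.

```clojure
;; Hypothetical example: a tiny control construct defined by user code.
;; `unless` runs its body only when the test is false; otherwise it yields nil.
(defmacro unless [test & body]
  `(if ~test nil (do ~@body)))

;; The body runs because (> 1 2) is false.
(unless (> 1 2) :ran)

;; The body is skipped because the test is true.
(unless true :ran)
```

The macro receives its arguments as unevaluated code and returns new code, so the body is never evaluated when the test holds.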
  • Cascalog basics: the “age” dataset
  • Cascalog basics: define and execute a query. The query states where to emit results, the output variables, and the “predicates” that constrain the output variables
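The slide's code is not captured in this transcript. As a rough sketch of the three parts named above, the following uses the `?<-` operator from `cascalog.api`, which defines and executes a query in one step. The contents of the `age` dataset here are made up for illustration, and defining a generator as a plain vector of tuples is an assumption that holds only for Cascalog versions that accept in-memory generators.

```clojure
(use 'cascalog.api)

;; Hypothetical stand-in for the "age" dataset: [person age] tuples.
(def age [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25] ["emily" 25]])

;; ?<- defines and executes a query.
(?<- (stdout)            ; where to emit results
     [?person]           ; output variables
     (age ?person 25))   ; predicate: constrains ?person to people aged 25
```

The constant `25` in the predicate acts as a filter on the second field of `age`, so no separate comparison predicate is needed.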
  • Predicates: each predicate has input fields and output fields. Fields can be constants or variables; variables are prefixed with ? or !
  • Predicates come in four kinds: functions, filters, aggregators, and generators (finite sources of tuples)
  • Example #1: generator, filter
  • Example #2: generator, function
  • Example #3: generator, aggregator, filter
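The queries shown on these three slides are not in the transcript. The sketch below gives one query for each combination of predicate kinds, in the style of the Cascalog introductory tutorial. The `age` data is hypothetical, and the in-memory vector generator is again an assumption about the Cascalog version in use.

```clojure
(use 'cascalog.api)
(require '[cascalog.ops :as c])

;; Hypothetical [person age] tuples.
(def age [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25] ["emily" 25]])

;; Generator + filter: people younger than 30.
(?<- (stdout) [?person ?age]
     (age ?person ?age)
     (< ?age 30))

;; Generator + function: :> binds the function's output to a new variable.
(?<- (stdout) [?person ?next-age]
     (age ?person ?age)
     (inc ?age :> ?next-age))

;; Generator + filter + aggregator: count the people younger than 30.
(?<- (stdout) [?count]
     (age _ ?age)
     (< ?age 30)
     (c/count ?count))
```

A predicate with no `:>` output and a boolean result, such as `(< ?age 30)`, is treated as a filter; the same operation given an output variable would act as a function.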
  • Join example (slide annotation: “Triggers a join”): joins are an implementation detail
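The join query itself is missing from the transcript. In Cascalog, using the same variable in more than one generator is what makes the query join those sources on that variable, which is why the join never has to be written out. A minimal sketch, with hypothetical `age` and `gender` datasets and assuming in-memory vector generators are supported:

```clojure
(use 'cascalog.api)

;; Hypothetical datasets keyed by person name.
(def age    [["alice" 28] ["bob" 33] ["david" 25]])
(def gender [["alice" "f"] ["bob" "m"] ["emily" "f"]])

;; ?person appears in both generators, so the two sources are joined on it.
(?<- (stdout) [?person ?age ?gender]
     (age ?person ?age)
     (gender ?person ?gender))
```

Because both variables are `?`-prefixed (non-nullable), this behaves as an inner join: only people present in both datasets appear in the result.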
  • Demo time!
  • Why another query language for Hadoop? Existing tools cause too much accidental complexity
  • Accidental complexity: complexity caused by the tool used to solve a problem rather than by the problem itself
  • Accidental complexity: distinct query languages cause accidental complexity. Example: SQL injection
  • Query language: we want the ability to abstract and the ability to compose
  • Abstraction: a Clojure function that returns a subquery
  • Abstraction: defining and using a custom operation
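The code for these two slides is not in the transcript. The following sketch shows both ideas under some assumptions: `younger-than` and `double-age` are hypothetical names, the data is made up, and `defmapop` is the Cascalog 1.x macro for custom operations (later versions may prefer a different macro).

```clojure
(use 'cascalog.api)

;; Hypothetical [person age] tuples.
(def age [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25] ["emily" 25]])

;; A Clojure function that returns a subquery.
;; <- defines a query without executing it.
(defn younger-than [max-age]
  (<- [?person ?age]
      (age ?person ?age)
      (< ?age max-age)))

;; A custom operation: an ordinary function body applied to each tuple.
(defmapop double-age [a]
  (* 2 a))

;; The subquery is used as a generator and the custom operation as a function.
(let [young (younger-than 30)]
  (?<- (stdout) [?person ?doubled]
       (young ?person ?age)
       (double-age ?age :> ?doubled)))
```

Since the subquery is an ordinary Clojure value, it can be parameterized, passed around, and reused like any other value.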
  • Composability: a dynamic query with a parameterized operation
  • Composability: the “predicate macro”
  • Composability: using a predicate macro, and what it expands to
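The predicate macro shown on the slides is not in the transcript. As a sketch of the idea, a predicate macro can be written as a `<-` template whose variable vector uses `:>` to separate inputs from outputs; when used in a query it expands into its constituent predicates. The `average` definition below follows the pattern from Cascalog's own write-ups, but treat the exact form (including `div` from `cascalog.api`) as an assumption about Cascalog 1.x, and the data as hypothetical.

```clojure
(use 'cascalog.api)
(require '[cascalog.ops :as c])

;; Hypothetical [person age] tuples.
(def age [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25] ["emily" 25]])

;; Predicate macro: average is count plus sum plus a division.
;; !v is the input; !avg is the output; !c and !s are intermediate variables.
(def average
  (<- [!v :> !avg]
      (c/count !c)
      (c/sum !v :> !s)
      (div !s !c :> !avg)))

;; Using it expands to the three predicates above with ?age in place of !v.
(?<- (stdout) [?avg]
     (age _ ?age)
     (average ?age :> ?avg))
```

Building `average` out of the existing `count` and `sum` aggregators means it inherits whatever optimizations those aggregators already have.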
  • Contrast to Pig: “Average” is 300 lines of code in Pig
  • Optimized aggregators in Cascalog: implementation of count and sum
  • Why another query language for Hadoop? Existing tools cause too much accidental complexity
  • Composability: value normalization example #1
  • Composability: value normalization example #2
  • Composability: the value normalization algorithm. For each id, select the value with the biggest timestamp
  • Composability: implementing value normalization
  • Composability: using value normalization
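The Cascalog implementation from these slides is not in the transcript. To illustrate only the algorithm stated above (for each id, select the value with the biggest timestamp), here is a sketch in plain Clojure over an in-memory collection. It is not a Cascalog query, and the record shape with `:id`, `:value`, and `:timestamp` keys and the name `latest-values` are assumptions made for the example.

```clojure
;; Hypothetical record shape: {:id ..., :value ..., :timestamp ...}
(defn latest-values
  "Returns a map from id to the value carrying the largest timestamp."
  [records]
  (into {}
        (for [[id recs] (group-by :id records)]
          [id (:value (apply max-key :timestamp recs))])))

(latest-values
  [{:id 1 :value "a" :timestamp 10}
   {:id 1 :value "b" :timestamp 20}
   {:id 2 :value "c" :timestamp 5}])
;; => {1 "b", 2 "c"}
```

In a distributed setting the same grouping-then-selecting shape maps naturally onto an aggregation per id, which is what makes it a good candidate for a reusable operation.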
  • Try Cascalog yourself! Project page: http://www.github.com/nathanmarz/cascalog. Introductory tutorial: http://nathanmarz.com/blog/introducing-cascalog/. 5 minutes to install Clojure, Hadoop, and Cascalog locally! See the project README
  • BackType is hiring. Think Cascalog’s cool? Come build amazing software at BackType: http://www.backtype.com/jobs
  • Questions? Follow me on Twitter at @nathanmarz, or email nathan.marz@gmail.com