Cascalog at Hadoop Day

My talk about Cascalog at Hadoop Day in Seattle.

Cascalog at Hadoop Day: presentation transcript

  • Cascalog: Data processing on Hadoop without the hassle. Nathan Marz, BackType, @nathanmarz
  • What is Cascalog? A stack of abstractions: Cascalog (variables and logic) on top of Cascading (tuples, data workflows) on top of MapReduce (key/value pairs, aggregation)
  • Cascalog’s components: Cascading (the job execution engine) + Datalog (basis of the API design) + Clojure (the host programming language)
  • Clojure: a general-purpose programming language; a dialect of Lisp that compiles to Java bytecode
  • Clojure: a “programmable programming language”; it is easy to build domain-specific languages (DSLs) in Clojure
  • Clojure examples (code → result):
      (+ 1 2 3) → 6
      (> 20 18) → true
      (defn incr [x] (+ 1 x)) then (incr 3) → 4
  • Cascalog basics: the “age” dataset
  • Cascalog basics: define and execute a query, annotated with where to emit results, the output variables, and the “predicates”, which constrain the output variables
  • Predicates: each predicate has input fields and output fields. Fields can be constants or variables; variables are prefixed with ? or !
  • Predicates come in four kinds: functions, filters, aggregators, and generators (finite sources of tuples). All variables must be “ground” by a generator
  • Example #1: generator + filter
  • Example #2: generator + function
  • Example #3: generator + aggregator + filter
  • Join example: using the same variable in more than one generator triggers a join
  • Demo time!
  • Why another query language for Hadoop? Existing tools cause too much accidental complexity
  • Accidental complexity: complexity caused by the tool used to solve a problem rather than by the problem itself
  • Accidental complexity: a query language that is separate from the host programming language causes accidental complexity; example: SQL injection
  • Query language: we want the ability to abstract and the ability to compose
  • Abstraction: a Clojure function that returns a subquery
  • Abstraction: defining and using a custom operation
  • Composability: a dynamic query with a parameterized operation
  • Composability: a “predicate macro”
  • Composability: using a predicate macro, and what it expands to
  • Sneak peek: optimized join + join pushdown
  • Sneak peek: query plan
      • Compute old-people
      • Push the join with old-people into the follows-count subquery (map-only join)
      • Compute follows-count
      • Compute the final query
  • Try Cascalog yourself!
      • Project page: http://www.github.com/nathanmarz/cascalog
      • Introductory tutorial: http://nathanmarz.com/blog/introducing-cascalog/
      • It takes 5 minutes to install Clojure, Hadoop, and Cascalog locally! See the project README
  • BackType is hiring! Think Cascalog’s cool? Come build amazing software at BackType: http://www.backtype.com/jobs
  • Questions? Follow me on Twitter at @nathanmarz or email nathan.marz@gmail.com
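The “Cascalog basics” slides annotate a query whose code did not survive in this transcript. As a rough plain-Clojure analogy (not Cascalog itself), the sketch below computes what a query along the lines of `(?<- (stdout) [?person] (age ?person ?age) (< ?age 30))` from the introductory tutorial produces: the generator binds `?person` and `?age` for each tuple, the filter predicate constrains `?age`, and only the output variable `?person` is emitted. The `age-data` tuples and the `young-people` function are hypothetical.

```clojure
;; Hypothetical stand-in for the "age" dataset: one [person age] tuple per row.
(def age-data [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25]])

(defn young-people
  "Plain-Clojure analogy of a query with output variable ?person, a generator
  over the age tuples, and the filter (< ?age 30)."
  [tuples]
  (for [[person age] tuples   ; generator: binds ?person and ?age per tuple
        :when (< age 30)]     ; filter predicate constrains ?age
    [person]))                ; emit only the output variables

(println (young-people age-data))
```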
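The “Predicates” slides say each predicate has input and output fields, and that a field is either a constant or a variable prefixed with `?` or `!`. The hypothetical helpers below sketch those two rules on predicates written as plain Clojure lists, assuming the `:>` keyword marks where the output fields begin (the separator Cascalog uses for this). They illustrate the parsing rules only; they are not Cascalog's own parser.

```clojure
(defn field-kind
  "Classify a predicate field: a symbol whose name starts with ? or ! is a
  variable; anything else (numbers, strings, ...) is treated as a constant."
  [field]
  (if (and (symbol? field) (contains? #{\? \!} (first (name field))))
    :variable
    :constant))

(defn parse-predicate
  "Split a predicate written as (op in1 in2 ... :> out1 out2 ...) into its
  operation, input fields and output fields. In this sketch, a predicate
  without :> has only input fields."
  [[op & fields]]
  (let [[inputs [_ & outputs]] (split-with #(not= :> %) fields)]
    {:op op
     :inputs (vec inputs)
     :outputs (vec outputs)}))

(def pred (parse-predicate '(+ ?age 1 :> ?next-age)))
(println pred)
(println (map field-kind (:inputs pred)))
```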
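The rule that all variables must be “ground” by a generator can be illustrated with a small check. This is a simplified reading of the slide, not Cascalog's implementation: it assumes a variable counts as bound once a generator emits it, or once a function emits it from inputs that are already bound, and it represents each predicate as a hypothetical map with `:kind`, `:inputs` and `:outputs` keys.

```clojure
(defn variable?
  "A field is a variable when it is a symbol whose name starts with ? or !."
  [field]
  (and (symbol? field) (contains? #{\? \!} (first (name field)))))

(defn ground-vars
  "Start from the variables that generators bind, then repeatedly add the
  outputs of any predicate whose input variables are all bound, until nothing
  new is added."
  [preds]
  (let [generators (filter #(= :generator (:kind %)) preds)]
    (loop [bound (set (filter variable? (mapcat :outputs generators)))]
      (let [newly  (for [p preds
                         :when (every? bound (filter variable? (:inputs p)))
                         v (:outputs p)
                         :when (variable? v)]
                     v)
            bound' (into bound newly)]
        (if (= bound' bound)
          bound
          (recur bound'))))))

(defn ungrounded-vars
  "Variables that appear somewhere in the query but are never bound."
  [preds]
  (let [all-vars (filter variable? (mapcat #(concat (:inputs %) (:outputs %)) preds))]
    (set (remove (ground-vars preds) all-vars))))

;; Hypothetical queries written as predicate maps.
(def ok-query
  [{:kind :generator :outputs '[?person ?age]}
   {:kind :function  :inputs '[?age] :outputs '[?double-age]}
   {:kind :filter    :inputs '[?double-age 60]}])

(def bad-query
  [{:kind :generator :outputs '[?person]}
   {:kind :filter    :inputs '[?age 30]}])   ; nothing ever binds ?age

(println (ungrounded-vars ok-query) (ungrounded-vars bad-query))
```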
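The code for “Example #2: generator + function” is not in this transcript. As a plain-Clojure analogy, a function predicate reads the values of its input variables and binds new output variables. In this sketch the function returns a sequence of output tuples, so it can emit several rows per input, as a sentence-splitting operation would. The `sentences` data and both functions are hypothetical.

```clojure
(require '[clojure.string :as str])

;; Hypothetical generator: one [?sentence] tuple per row.
(def sentences [["the quick fox"] ["jumps"]])

(defn split-words
  "Hypothetical function predicate: reads the input variable ?sentence and
  returns one [?word] output tuple per word."
  [[sentence]]
  (map vector (str/split sentence #"\s+")))

(defn apply-function
  "Apply a function predicate to every generator tuple, appending each output
  tuple it returns to the input tuple (one result row per output tuple)."
  [tuples f]
  (for [t tuples
        out (f t)]
    (into t out)))

(println (apply-function sentences split-words))
```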
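“Example #3: generator + aggregator + filter” also lost its code. The sketch below is a plain-Clojure analogy of that shape over a hypothetical `follows` dataset of `[?person ?followed]` tuples: group on `?person`, aggregate each group into a count, then filter on the aggregated value. Grouping on the output variable that no aggregator produces mirrors how Cascalog groups implicitly; here the grouping is written out by hand with `group-by`.

```clojure
;; Hypothetical "follows" generator: [?person ?followed] tuples.
(def follows [["alice" "bob"] ["alice" "chris"] ["alice" "david"]
              ["bob" "alice"] ["chris" "alice"] ["chris" "bob"]])

(defn follows-count-over
  "Count how many people each ?person follows, keeping only counts above n."
  [tuples n]
  (->> tuples
       (group-by first)                                  ; group on ?person
       (map (fn [[person rows]] [person (count rows)]))  ; aggregator: count
       (filter (fn [[_ cnt]] (> cnt n)))                 ; filter on the aggregate
       (sort-by first)))

(println (follows-count-over follows 1))
```

Because the filter's input is the aggregator's output, it can only run once each group has been counted.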
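The “Join example” slide relies on the fact that in Cascalog, using the same variable in more than one generator joins those generators. The sketch below is a single-process analogy of that inner join on a shared `?person` variable. It is a simple hash join (index one side by the key, look the other side up) and is not how Cascading executes joins on Hadoop. The datasets are hypothetical.

```clojure
;; Hypothetical generators: (age ?person ?age) and (gender ?person ?gender).
(def age-data    [["alice" 28] ["bob" 33] ["chris" 40]])
(def gender-data [["alice" "f"] ["bob" "m"] ["david" "m"]])

(defn join-on-first
  "Inner join two generators on their shared first variable (?person):
  index the right side by the join key, then look up each left tuple."
  [left right]
  (let [index (group-by first right)]
    (for [[k & lvals] left
          [_ & rvals] (get index k)]   ; no match means no output row
      (into (into [k] lvals) rvals))))

(println (join-on-first age-data gender-data))
```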
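The “Abstraction” and “Composability” slides show a Clojure function that returns a subquery and a query built from a parameterized operation. In Cascalog such a function would return an unexecuted query built with `<-`; in the plain-Clojure analogy below, a lazy seq plays that role, so nothing is computed until the result is consumed. The `older-than` function and its data are hypothetical.

```clojure
;; Hypothetical stand-in for the "age" dataset.
(def age-data [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25]])

(defn older-than
  "Hypothetical subquery builder: returns a lazy seq of [?person] tuples for
  people older than n. Nothing is evaluated until the result is consumed."
  [n]
  (for [[person age] age-data
        :when (> age n)]
    [person]))

;; The same abstraction, instantiated with different parameters.
(println (older-than 30))
(println (older-than 35))
```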
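The “predicate macro” slides show a macro that expands to other predicates. If a query's predicates are ordinary data, that expansion is just splicing, as the plain-Clojure analogy below shows. This is not Cascalog's predicate-macro mechanism: `expand-predicates`, `age-between` and its internal `?age` variable are hypothetical. Because this sketch reuses a fixed `?age`, a real implementation would need fresh variable names to avoid clashing with the caller's variables.

```clojure
(defn expand-predicates
  "Splice predicate macros into a query's predicate list: any predicate whose
  operation names a macro is replaced by the predicates the macro returns."
  [macros preds]
  (mapcat (fn [[op & args :as pred]]
            (if-let [m (get macros op)]
              (apply m args)
              [pred]))
          preds))

;; Hypothetical predicate macro: (age-between ?p lo hi) expands to a generator
;; plus two filters over an internal ?age variable.
(defn age-between [person lo hi]
  [(list 'age person '?age)
   (list '>= '?age lo)
   (list '< '?age hi)])

(def query-preds '[(age-between ?person 20 30) (gender ?person "f")])

(println (expand-predicates {'age-between age-between} query-preds))
```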
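The “Sneak peek” plan pushes the join with old-people into the follows-count subquery as a map-only join. The single-process analogy below, with hypothetical data and a hypothetical `min-age` parameter (the slide gives no threshold), shows why that helps. When old-people is small enough to hold in memory, each follows tuple can be checked against it as it streams past, so tuples for other people never reach the counting step.

```clojure
;; Hypothetical datasets: [person age] and [person followed] tuples.
(def age-data [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25]])
(def follows  [["alice" "bob"] ["bob" "alice"] ["bob" "chris"]
               ["chris" "alice"] ["david" "bob"]])

(defn follows-count-of-old-people
  "Compute old-people first, then apply the join with it inside the
  follows-count step, so tuples for other people are dropped before counting."
  [min-age]
  (let [;; step 1: compute old-people as a small in-memory set
        old-people (set (for [[p a] age-data :when (> a min-age)] p))
        ;; step 2: map-only join, a per-tuple membership test against that set
        pushed     (filter (fn [[p _]] (contains? old-people p)) follows)]
    ;; steps 3-4: count follows per remaining person; that is the final result here
    (into (sorted-map)
          (map (fn [[p rows]] [p (count rows)]) (group-by first pushed)))))

(println (follows-count-of-old-people 30))
```

Counting follows for everyone and joining afterward gives the same answer. The pushdown only shrinks the data the aggregation has to process.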