Cascalog: an Interactive Query Language for Hadoop__HadoopSummit2010

Hadoop Summit 2010 - Developers Track

Cascalog: an Interactive Query Language for Hadoop
Nathan Marz, BackType

Presentation Transcript

  • Cascalog: an Interactive Query Language for Hadoop. Nathan Marz, BackType
  • What is Cascalog? Cascading (the job execution engine) + Datalog (basis of the API design) + Clojure (the host programming language)
  • Why another query language for Hadoop? Existing tools cause too much Accidental Complexity
  • Accidental complexity: complexity caused by the tool used to solve a problem rather than the problem itself
  • Accidental complexity in existing tools: in both Pig and Hive, the query language is different from the programming language
  • When the query tool is separate from the programming language:
    - Friction when embedding custom operations
    - Interlacing queries with regular application logic is unnatural
    - Generating queries dynamically is difficult
  • Clojure
    - General-purpose programming language
    - Dialect of Lisp that compiles to Java bytecode
    - “Programmable programming language”: easy to build domain-specific languages (DSLs) in Clojure
  • Clojure examples (Clojure code => result)
    - (+ 1 2 3) => 6
    - (> 20 18) => true
    - (defn incr [x] (+ 1 x)) followed by (incr 3) => 4
  • Cascalog: a domain-specific language in Clojure for processing data using Hadoop
  • Cascalog
    - Full power of a general-purpose programming language available at all times
    - Cascalog is a Clojure library
    - Example query: (?<- (stdout) [?p] (age ?p 25))
  • Demo time!
  • Some of Cascalog’s features
    - Inner and outer joins
    - Aggregators
    - Functions
    - Subqueries
    - Sorting
    - Read from and write to arbitrary data sources
      › HDFS
      › HBase
      › MySQL
      › Etc.
  • When the query tool is separate from the programming language:
    - Friction when embedding custom operations
    - Interlacing queries with regular application logic is unnatural
    - Generating queries dynamically is difficult
  • Cascalog, on the other hand...
    - Custom operations are defined just like any other function
    - Interlacing queries with regular application logic is trivial
    - Generating queries dynamically is easy and idiomatic
  • Try Cascalog yourself!
    - Project page: http://www.github.com/nathanmarz/cascalog
    - Introductory tutorial: http://nathanmarz.com/blog/introducing-cascalog/
    - 5 minutes to install Clojure, Hadoop, and Cascalog locally! See the project README
  • Questions? Twitter: @nathanmarz Email: nathan.marz@gmail.com
  • More benefits of being a Clojure DSL
    - Excellent module system
    - Interactive REPL
    - Make use of any Clojure function in queries
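
The claims on the slides above (queries are ordinary Clojure code, any Clojure function can be used in a query, and queries can be generated dynamically) can be sketched in a REPL session. This is a minimal sketch, not the demo from the talk. It assumes Cascalog is on the classpath and that the `cascalog.playground` namespace with its sample `age` dataset, as described in the introductory tutorial, is available. The function `people-aged` and the variable `?next` are hypothetical names introduced here for illustration.

```clojure
;; Sketch only: assumes a REPL with Cascalog on the classpath and the
;; sample datasets from cascalog.playground (see the introductory tutorial).
(use 'cascalog.playground)
(bootstrap) ; pulls in cascalog.api and related namespaces

;; A query in the shape of the slide's example:
;; print every person whose age is exactly 25.
(?<- (stdout) [?p] (age ?p 25))

;; A custom operation is just a Clojure function (the same incr as in the
;; Clojure examples slide). :> names the variable that receives its output.
(defn incr [x] (+ 1 x))

(?<- (stdout) [?p ?next]
     (age ?p ?a)
     (incr ?a :> ?next))

;; Generating queries dynamically: a plain function that returns a query.
;; `people-aged` is a hypothetical helper; <- defines a query without
;; running it, and n is an ordinary Clojure local used as a constant.
(defn people-aged [n]
  (<- [?p] (age ?p n)))

;; ?- executes an already-defined query against a sink.
(?- (stdout) (people-aged 25))
```

Because `people-aged` is a regular function, it can be called from application logic, mapped over a list of ages, or composed with other query-building functions, which is the sense in which the slides call dynamic query generation idiomatic.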