Cascalog: an Interactive Query Language for Hadoop (Hadoop Summit 2010)
 
Hadoop Summit 2010 - Developers Track
Nathan Marz, BackType

    Presentation Transcript

    • Cascalog: an Interactive Query Language for Hadoop. Nathan Marz, BackType
    • What is Cascalog? Cascading (the job execution engine) + Datalog (basis of the API design) + Clojure (the host programming language)
    • Why another query language for Hadoop? Existing tools cause too much accidental complexity.
    • Accidental complexity: complexity caused by the tool used to solve a problem rather than by the problem itself.
    • Accidental complexity in existing tools (Pig, Hive): the query language is different from the programming language.
    • When the query tool is separate from the programming language: embedding custom operations causes friction; interlacing queries with regular application logic is unnatural; generating queries dynamically is difficult.
    • Clojure  General purpose programming language  Dialect of Lisp that compiles to Java bytecode  “Programmable programming language”: Easy to build Domain Specific Languages (DSL) in Clojure
    • Clojure examples
        Clojure code               Result
        (+ 1 2 3)                  6
        (> 20 18)                  true
        (defn incr [x] (+ 1 x))
        (incr 3)                   4
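The claim that DSLs are easy to build in Clojure rests largely on macros, which rewrite code before it is compiled. The sketch below defines a toy macro, `select-where`, that is purely hypothetical and is not Cascalog's syntax; it only shows how a library can add query-like syntax that expands into ordinary function calls.

```clojure
;; Hypothetical mini-DSL for illustration only (NOT Cascalog's API).
;; (select-where coll [x] body...) expands into (filter (fn [x] body...) coll),
;; so new "query" syntax costs nothing beyond an ordinary function call.
(defmacro select-where
  "Return the items of coll for which body, with each item bound to sym, is truthy."
  [coll [sym] & body]
  `(filter (fn [~sym] ~@body) ~coll))

;; Usage: keep numbers greater than 7.
(select-where [1 5 10 15] [n] (> n 7))
;; => (10 15)
```

Because the expansion is plain Clojure, any function (here `>`) can appear inside the DSL without extra glue, which is the property the Cascalog slides emphasize.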
    • Cascalog: a domain-specific language in Clojure for processing data using Hadoop
    • Cascalog: the full power of a general-purpose programming language is available at all times, because Cascalog is a Clojure library. Example query: (?<- (stdout) [?p] (age ?p 25))
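The example query asks for every person `?p` whose `age` tuple has the value 25 and writes the results to standard output. As an in-memory analogy only (this is not Cascalog and does not run on Hadoop), the same logic over a small, made-up `age` dataset of `[person age]` tuples looks like this:

```clojure
;; In-memory analogy of (age ?p 25): NOT Cascalog, no Hadoop involved.
;; The `age` data below is invented for illustration.
(def age
  [["alice" 28] ["bob" 25] ["chris" 40] ["david" 25]])

(defn people-aged
  "Return the person field of every [person age] tuple whose age equals n."
  [tuples n]
  (for [[p a] tuples
        :when (= a n)]
    p))

;; Print each match, loosely mirroring the (stdout) sink in the query.
(doseq [p (people-aged age 25)]
  (println p))
```

In Cascalog the constant 25 in the predicate plays the role of the `:when` filter here, and the planner turns the query into Cascading jobs instead of a local loop.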
    • Demo time!
    • Some of Cascalog’s features: inner and outer joins, aggregators, functions, subqueries, sorting, and reading from and writing to arbitrary data sources (HDFS, HBase, MySQL, etc.)
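To make "inner joins" and "aggregators" concrete, here is a plain-Clojure sketch of what an inner join on a shared person field followed by a grouped count computes. It is an analogy under stated assumptions: the `age` and `gender` datasets and the helper names `inner-join` and `count-by` are hypothetical, and Cascalog itself expresses this declaratively by reusing a variable across predicates rather than calling helpers like these.

```clojure
;; Plain-Clojure analogy of an inner join plus a count aggregation.
;; Datasets and helper names are hypothetical; this is NOT Cascalog's API.
(def age    [["alice" 28] ["bob" 25] ["chris" 40]])
(def gender [["alice" "f"] ["bob" "m"] ["chris" "m"] ["emily" "f"]])

(defn inner-join
  "Join two collections of [key value] tuples on their first element,
  keeping only keys present in both (emily has no age, so she is dropped)."
  [left right]
  (let [right-by-key (group-by first right)]
    (for [[k lv] left
          [_ rv] (get right-by-key k)]
      [k lv rv])))

(defn count-by
  "Count tuples per value of (f tuple), like a grouped count aggregator."
  [f tuples]
  (frequencies (map f tuples)))

;; Number of people with a known age, grouped by gender.
(count-by #(nth % 2) (inner-join age gender))
;; => {"f" 1, "m" 2}
```

An outer join would differ only in keeping unmatched left-side keys (with `nil` for the missing value) instead of dropping them.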
    • When the query tool is separate from the programming language: embedding custom operations causes friction; interlacing queries with regular application logic is unnatural; generating queries dynamically is difficult.
    • Cascalog, on the other hand: custom operations are defined just like any other function; interlacing queries with regular application logic is trivial; generating queries dynamically is easy and idiomatic.
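The contrast on these two slides is easier to see in code. The following sketch uses only plain Clojure (none of it is Cascalog's API; `tokenize` and `word-count-query` are hypothetical names) to show a custom operation written as an ordinary function, and a word-count "query" whose filtering step is chosen at runtime.

```clojure
;; Plain-Clojure analogy: NOT Cascalog's API.
;; A custom operation is an ordinary function, and the query pipeline is
;; assembled at runtime from regular application values.
(require '[clojure.string :as str])

(defn tokenize
  "Custom operation: lower-case a sentence and split it on whitespace."
  [sentence]
  (str/split (str/lower-case sentence) #"\s+"))

(defn word-count-query
  "Build a word-count query; the stopword filter is added only when
  stopwords is non-empty, i.e. the query's shape is decided dynamically."
  [stopwords]
  (fn [sentences]
    (cond->> (mapcat tokenize sentences)
      (seq stopwords) (remove stopwords)
      true            frequencies)))

(def sentences ["The quick fox" "the lazy dog" "A quick dog"])

((word-count-query #{"the" "a"}) sentences)
;; => {"quick" 2, "fox" 1, "lazy" 1, "dog" 2}
```

In an external query language, the conditional filter would typically mean building a query string; here it is just a `cond->>` branch, which is the kind of interlacing the slide describes.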
    • Try Cascalog yourself! Project page: http://www.github.com/nathanmarz/cascalog ; introductory tutorial: http://nathanmarz.com/blog/introducing-cascalog/ ; installing Clojure, Hadoop, and Cascalog locally takes 5 minutes (see the project README).
    • Questions? Twitter: @nathanmarz Email: nathan.marz@gmail.com
    • More benefits of being a Clojure DSL: an excellent module system, an interactive REPL, and the ability to use any Clojure function in queries.