Cascalog at Hadoop Summit

My presentation about Cascalog at Hadoop Summit 2010.

  • Speaker notes:
      - UDFs, custom duct tape for registering and finding dependencies, separate files
      - Separate files, testing?, error handling
      - Things that you didn’t think were possible become idiomatic: compose queries, parameterize, pass queries and operations around

Cascalog at Hadoop Summit: Presentation Transcript

  • Cascalog: an Interactive Query Language for Hadoop. Nathan Marz, BackType.
  • What is Cascalog? Cascading (the job execution engine) + Datalog (basis of the API design) + Clojure (the host programming language)
  • Why another query language for Hadoop? Existing tools cause too much accidental complexity.
  • Accidental complexity: complexity caused by the tool used to solve a problem rather than by the problem itself.
  • Accidental complexity in existing tools (Pig, Hive): the query language is different from the programming language.
  • When the query tool is separate from the programming language:
      - Friction when embedding custom operations
      - Interlacing queries with regular application logic is unnatural
      - Generating queries dynamically is difficult
  • Clojure:
      - General purpose programming language
      - Dialect of Lisp that compiles to Java bytecode
      - “Programmable programming language”: easy to build Domain Specific Languages (DSLs) in Clojure
  • Clojure examples (code → result):
      (+ 1 2 3)                           → 6
      (> 20 18)                           → true
      (defn incr [x] (+ 1 x)) (incr 3)    → 4
  • Cascalog: a Domain Specific Language in Clojure for processing data using Hadoop.
  • Cascalog: the full power of a general purpose programming language is available at all times. Cascalog is a Clojure library. Example query: (?<- (stdout) [?p] (age ?p 25))
  • Demo time!
  • Some of Cascalog’s features:
      - Inner and outer joins
      - Aggregators
      - Functions
      - Subqueries
      - Sorting
      - Read from and write to arbitrary data sources: HDFS, HBase, MySQL, etc.
  • When the query tool is separate from the programming language:
      - Friction when embedding custom operations
      - Interlacing queries with regular application logic is unnatural
      - Generating queries dynamically is difficult
  • Cascalog, on the other hand:
      - Custom operations are defined just like any other function
      - Interlacing queries with regular application logic is trivial
      - Generating queries dynamically is easy and idiomatic
  • Try Cascalog yourself!
      - Project page: http://www.github.com/nathanmarz/cascalog
      - Introductory tutorial: http://nathanmarz.com/blog/introducing-cascalog/
      - 5 minutes to install Clojure, Hadoop, and Cascalog locally! See the project README.
  • Questions? Twitter: @nathanmarz Email: nathan.marz@gmail.com
  • More benefits of being a Clojure DSL:
      - Excellent module system
      - Interactive REPL
      - Make use of any Clojure function in queries
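
The claims made on the slides above (custom operations defined like ordinary functions, queries generated dynamically, any Clojure function usable in a query) can be illustrated with a short sketch. This is not code from the talk. The `age` dataset, the `decade` operation, and the `older-than` function are hypothetical names chosen for illustration, and the sketch assumes a Cascalog 1.x release on the classpath in which `defmapop` is available from `cascalog.api` and an in-memory vector of tuples can be used as a generator. Other releases may spell some of these differently.

```clojure
(use 'cascalog.api)

;; Hypothetical in-memory dataset of [person age] tuples (an assumption
;; for illustration; the data used in the talk's demo is not in the transcript).
(def age
  [["alice" 28]
   ["bob" 33]
   ["chris" 40]
   ["david" 25]])

;; A custom operation, written much like an ordinary function.
;; It rounds an age down to its decade, so 33 becomes 30.
(defmapop decade [a]
  (* 10 (quot a 10)))

;; A query is a value, so a plain Clojure function can build one from a
;; parameter. `>` here is the regular clojure.core function used as a filter.
(defn older-than [min-age]
  (<- [?p ?d]
      (age ?p ?a)          ; bind each person and age from the dataset
      (> ?a min-age)       ; keep only people older than the parameter
      (decade ?a :> ?d)))  ; apply the custom operation to the age

;; Execute the generated query and print the resulting tuples.
(?- (stdout) (older-than 30))
```

Because `older-than` returns a query instead of running one, its result can be passed around, used as a subquery in another query, or executed against a different sink, which is the kind of composition the speaker notes describe.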