Cascalog: an Interactive Query Language for Hadoop (Hadoop Summit 2010)

Hadoop Summit 2010 - Developers Track
Cascalog: an Interactive Query Language for Hadoop
Nathan Marz, BackType


  1. Cascalog: an Interactive Query Language for Hadoop. Nathan Marz, BackType
  2. What is Cascalog? Cascading (the job execution engine) + Datalog (basis of the API design) + Clojure (the host programming language)
  3. Why another query language for Hadoop? Existing tools cause too much accidental complexity
  4. Accidental complexity: complexity caused by the tool used to solve a problem rather than by the problem itself
  5. Accidental complexity in existing tools (Pig, Hive): the query language is different from the programming language
  6. When the query tool is separate from the programming language
       • Friction when embedding custom operations
       • Interlacing queries with regular application logic is unnatural
       • Generating queries dynamically is difficult
  7. Clojure
       • General purpose programming language
       • Dialect of Lisp that compiles to Java bytecode
       • “Programmable programming language”: easy to build Domain Specific Languages (DSLs) in Clojure
  8. Clojure examples (Clojure code => result)
       (+ 1 2 3)  =>  6
       (> 20 18)  =>  true
       (defn incr [x] (+ 1 x)) (incr 3)  =>  4
  9. Cascalog: a Domain Specific Language in Clojure for processing data using Hadoop
  10. Cascalog
       • Full power of a general purpose programming language available at all times
       • Cascalog is a Clojure library
       • Example query: (?<- (stdout) [?p ?a] (age ?p 25))
  11. Demo time!
  12. Some of Cascalog’s features
       • Inner and outer joins
       • Aggregators
       • Functions
       • Subqueries
       • Sorting
       • Read from and write to arbitrary data sources: HDFS, HBase, MySQL, etc.
  13. When the query tool is separate from the programming language
       • Friction when embedding custom operations
       • Interlacing queries with regular application logic is unnatural
       • Generating queries dynamically is difficult
  14. Cascalog, on the other hand...
       • Custom operations are defined just like any other function
       • Interlacing queries with regular application logic is trivial
       • Generating queries dynamically is easy and idiomatic
  15. Try Cascalog yourself!
       • Project page: http://www.github.com/nathanmarz/cascalog
       • Introductory tutorial: http://nathanmarz.com/blog/introducing-cascalog/
       • 5 minutes to install Clojure, Hadoop, and Cascalog locally! See the project README
  16. Questions? Twitter: @nathanmarz. Email: nathan.marz@gmail.com
  17. More benefits of being a Clojure DSL
       • Excellent module system
       • Interactive REPL
       • Make use of any Clojure function in queries
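
The claims on slides 10, 14 and 17 (queries are ordinary Clojure code, custom operations are ordinary functions, and queries can be generated dynamically) can be sketched roughly as follows. This is a minimal sketch and not code from the talk. It assumes Cascalog is on the classpath and that `cascalog.api` provides `?<-`, `<-`, `?-` and `stdout`, as in the 1.x releases. The namespace name, the in-memory `age` dataset, and the helpers `decade`, `print-under-30` and `younger-than` are hypothetical names made up for illustration.

```clojure
(ns cascalog-sketch.core
  (:use cascalog.api))

;; Hypothetical in-memory dataset of [person age] tuples.
;; A vector of tuples can act as a generator for local experiments.
(def age
  [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25]])

;; A custom operation, written like any other Clojure function.
;; It maps an age to the start of its decade.
(defn decade [a]
  (* 10 (quot a 10)))

;; Define and run a query that prints each person under 30
;; together with their age and the decade computed by `decade`.
(defn print-under-30 []
  (?<- (stdout) [?p ?a ?d]
       (age ?p ?a)
       (< ?a 30)
       (decade ?a :> ?d)))

;; Dynamic query generation: a plain function that builds and
;; returns a query for a given source and age limit.
(defn younger-than [age-src limit]
  (<- [?p ?a]
      (age-src ?p ?a)
      (< ?a limit)))

;; Execute a generated query, writing the tuples to stdout.
(defn print-younger-than [limit]
  (?- (stdout) (younger-than age limit)))
```

At a REPL, `(print-under-30)` and `(print-younger-than 30)` would each launch a local job. Because `younger-than` is a regular function, it can be called from application code, passed around, or composed with other functions, which is the point slide 14 makes about interlacing queries with application logic.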
