Cascalog: an Interactive Query Language for Hadoop__HadoopSummit2010

Hadoop Summit 2010 - Developers Track
Cascalog: an Interactive Query Language for Hadoop
Nathan Marz, BackType

Published in: Technology

Transcript

  • 1. Cascalog: an Interactive Query Language for Hadoop Nathan Marz BackType
  • 2. What is Cascalog? Cascading (the job execution engine) + Datalog (basis of the API design) + Clojure (the host programming language)
  • 3. Why another query language for Hadoop? Existing tools cause too much Accidental Complexity
  • 4. Accidental complexity Complexity caused by the tool used to solve a problem rather than the problem itself
  • 5. Accidental complexity in existing tools Pig, Hive: the query language is different from the programming language
  • 6. When the query tool is separate from the programming language
      • Friction when embedding custom operations
      • Interlacing queries with regular application logic is unnatural
      • Generating queries dynamically is difficult
  • 7. Clojure
      • General purpose programming language
      • Dialect of Lisp that compiles to Java bytecode
      • “Programmable programming language”: easy to build Domain Specific Languages (DSLs) in Clojure
  • 8. Clojure examples (Clojure code => result)
      • (+ 1 2 3) => 6
      • (> 20 18) => true
      • (defn incr [x] (+ 1 x)) (incr 3) => 4
  • 9. Cascalog Domain Specific Language in Clojure for processing data using Hadoop
  • 10. Cascalog
      • Full power of a general purpose programming language available at all times
      • Cascalog is a Clojure library
      • Example query: (?<- (stdout) [?p ?a] (age ?p 25))
  • 11. Demo time!
  • 12. Some of Cascalog’s features
      • Inner and outer joins
      • Aggregators
      • Functions
      • Subqueries
      • Sorting
      • Read from and write to arbitrary data sources (HDFS, HBase, MySQL, etc.)
  • 13. When the query tool is separate from the programming language
      • Friction when embedding custom operations
      • Interlacing queries with regular application logic is unnatural
      • Generating queries dynamically is difficult
  • 14. Cascalog, on the other hand...
      • Custom operations defined just like any other function
      • Interlacing queries with regular application logic is trivial
      • Generating queries dynamically is easy and idiomatic
  • 15. Try Cascalog yourself!
      • Project page: http://www.github.com/nathanmarz/cascalog
      • Introductory tutorial: http://nathanmarz.com/blog/introducing-cascalog/
      • 5 minutes to install Clojure, Hadoop, and Cascalog locally! See the project README
  • 16. Questions? Twitter: @nathanmarz Email: nathan.marz@gmail.com
  • 17. More benefits of being a Clojure DSL
      • Excellent module system
      • Interactive REPL
      • Make use of any Clojure function in queries
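
The Clojure expressions on slide 8, and the "programmable programming language" point on slide 7, can be tried directly at a REPL. The following is a minimal sketch in plain Clojure. The `unless` macro is a hypothetical illustration that does not appear in the talk; it is included only to suggest how a macro adds a new construct to the language, which is the mechanism DSLs such as Cascalog build on.

```clojure
;; Slide 8 examples, evaluated at a Clojure REPL.
(+ 1 2 3)   ; sum of three numbers
(> 20 18)   ; comparison, returns a boolean

(defn incr
  "Returns x plus one."
  [x]
  (+ 1 x))

(incr 3)

;; Hypothetical macro (not from the talk): a new control construct
;; assembled from ordinary code at compile time.
(defmacro unless
  "Evaluates body only when test is falsey; otherwise returns nil."
  [test & body]
  `(if ~test nil (do ~@body)))

(unless (> 1 2) :ran)
```

Because the macro rewrites code before it is evaluated, the body of `unless` is never run when the test is truthy, which a plain function could not guarantee.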

×
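The claims on slides 10, 14 and 17 (queries are ordinary Clojure code, can be generated by functions, and can call any Clojure function) can be sketched roughly as follows. This assumes the Cascalog library and a local Hadoop setup are on the classpath, as described in the project README. The `age` dataset and the `younger-than` function are hypothetical names chosen for illustration, not code from the talk, and the first query is a variant of the slide 10 example in which every output variable is bound by a predicate.

```clojure
;; Assumes Cascalog (namespace cascalog.api) is on the classpath.
(require '[cascalog.api :refer [<- ?<- ??- stdout]])

;; Hypothetical in-memory dataset of [person age] tuples. A vector of
;; tuples can stand in for a real data source in local experiments.
(def age [["alice" 28] ["bob" 33] ["david" 25] ["emily" 25]])

;; Variant of the slide 10 query: print every person whose age is 25.
(?<- (stdout) [?p] (age ?p 25))

;; A query generated by an ordinary function. max-age is a plain
;; Clojure argument and < is the regular clojure.core function.
(defn younger-than
  "Builds a query for the people younger than max-age."
  [max-age]
  (<- [?p ?a] (age ?p ?a) (< ?a max-age)))

;; ??- runs one or more queries and returns one seq of tuples per
;; query, so the result can be used by the surrounding program.
(def result (first (??- (younger-than 30))))
```

Since `younger-than` returns a query value rather than running it, the same function could feed a larger query or be called with different thresholds from regular application logic.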