Cascalog at May Bay Area Hadoop User Group

Presentation about Cascalog, a Clojure-based query language for Hadoop.

Published in Technology

Transcript

  • 1. Cascalog. Nathan Marz, BackType. Powerful and easy-to-use data analysis tool for Hadoop
  • 2. About Me: Tech Lead at BackType; have been working on many-terabyte-scale systems for two years; ETL workflows; data warehouses
  • 3. Presentation Overview: 1) High-level introduction to Cascalog; 2) Demo; 3) Cascalog at BackType
  • 4. What is Cascalog? Query language for Hadoop; queries are written as regular Clojure code; alternative to Pig and Hive
  • 5. What is Clojure? Functional language that compiles to Java bytecode; Lisp-based; first-class integration with Java
  • 6. Features: Inner and outer joins; aggregators; functions; subqueries; sorting; arbitrary inputs and outputs
  • 7. What sets Cascalog apart?
  • 8. What sets Cascalog apart? Fully integrated in a general purpose programming language
  • 9. What sets Cascalog apart? Full power of Clojure available at all times
  • 10. What sets Cascalog apart? Full power of Clojure available at all times
  • 11. What sets Cascalog apart? Custom operations: no UDF interface, just Clojure functions
  • 12. What sets Cascalog apart? Dynamic queries: write functions that return queries; manipulate queries as first-class entities in the language
  • 13. What sets Cascalog apart? Use Cascalog side by side with other code: appends and distributed copies; consolidation; application logic
  • 14. Easy Experimentation: Ships with a test dataset that can be queried locally (the “playground”); 5 minutes to set up Hadoop, Clojure, and Cascalog locally (see README)
  • 15. Demo time!
  • 16. Cascalog at BackType: BackType collects data about conversations around the web: tweets; blog comments; social news; people
  • 17. Cascalog at BackType
  • 18. Cascalog at BackType Cascalog is used to:
  • 19. Cascalog at BackType: Cascalog is used to: identify influencers
  • 20. Cascalog at BackType: Cascalog is used to: identify influencers; determine number of people exposed to URLs on Twitter
  • 21. Cascalog at BackType: Cascalog is used to: identify influencers; determine number of people exposed to URLs on Twitter; identify “interesting tweets”
  • 22. Cascalog at BackType: Cascalog is used to: identify influencers; determine number of people exposed to URLs on Twitter; identify “interesting tweets”; study social engagement of domains over time
  • 23. Cascalog at BackType: Cascalog is used to: identify influencers; determine number of people exposed to URLs on Twitter; identify “interesting tweets”; study social engagement of domains over time; etc.
  • 24. Cascalog at BackType: Input and output. Cascalog reads from MySQL databases and HDFS; Cascalog writes to Cassandra and HDFS
  • 25. Cascalog at BackType: Rapid development. Local playground dataset for development; develop queries in the REPL
  • 26. Cascalog Roadmap: Optimized joins: replicated joins, Bloom joins; negations; recursion
  • 27. Questions? Project page: http://www.github.com/nathanmarz/cascalog Tutorial: http://nathanmarz.com/blog/introducing-cascalog Follow me on Twitter: @nathanmarz
  • 28. Clojure and Cascalog: Provided by Clojure: module system; dynamic queries; custom operations; interactive REPL
  • 29. Cascading and Cascalog: Provided by Cascading: tuple abstraction and tuple manipulation; workflow to MapReduce translation; read and write from anywhere with Taps
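
The slides on custom operations and dynamic queries (11, 12, and 28) describe queries as ordinary values in the host language: operations are plain functions, and functions can build and return queries. The sketch below illustrates that idea only. It is written in Python rather than Clojure, it runs over an in-memory list rather than Hadoop, and it is not Cascalog syntax. Every name in it (`where`, `select`, `compose`, `younger_than`, `PEOPLE`, `run`) is hypothetical and chosen for illustration.

```python
from typing import Callable, Iterable, Iterator

# A "query" here is just a function from rows to rows (an assumption made
# for this sketch; Cascalog's real queries compile to MapReduce jobs).
Row = dict
Query = Callable[[Iterable[Row]], Iterator[Row]]


def where(predicate: Callable[[Row], bool]) -> Query:
    """Build a query that keeps rows for which predicate(row) is true.

    The predicate is a plain function: no separate UDF interface.
    """
    def query(rows: Iterable[Row]) -> Iterator[Row]:
        return (row for row in rows if predicate(row))
    return query


def select(*fields: str) -> Query:
    """Build a query that keeps only the named fields of each row."""
    def query(rows: Iterable[Row]) -> Iterator[Row]:
        return ({f: row[f] for f in fields} for row in rows)
    return query


def compose(*queries: Query) -> Query:
    """Chain queries left to right; the result is itself a query."""
    def query(rows: Iterable[Row]) -> Iterator[Row]:
        for q in queries:
            rows = q(rows)
        return iter(rows)
    return query


def younger_than(limit: int) -> Query:
    """A function that returns a query, parameterised by an age limit."""
    return compose(where(lambda row: row["age"] < limit), select("person"))


# Hypothetical stand-in for a small local test dataset.
PEOPLE = [
    {"person": "alice", "age": 28},
    {"person": "bob", "age": 33},
    {"person": "chris", "age": 40},
]


def run(query: Query, rows: Iterable[Row]) -> list:
    """Execute a query eagerly and collect the results."""
    return list(query(rows))
```

Calling `run(younger_than(30), PEOPLE)` builds a query at call time and evaluates it against the sample rows. Because a query is an ordinary value, it can be stored, passed to other functions, or combined with further steps through `compose`.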