Cascalog

Presentation I gave at the Bay Area Clojure Meetup Group on May 6th, 2010. I also demoed examples from the introductory tutorial:

http://nathanmarz.com/blog/introducing-cascalog/

  1. Cascalog: Nathan Marz, BackType. Powerful and easy-to-use data analysis tool for Hadoop
  2. About Me: Tech Lead at BackType; have been working on many-terabyte-scale systems for two years (ETL workflows, data warehouses)
  3. What is Hadoop? A distributed filesystem plus a MapReduce framework; scales to thousands of machines and petabytes of data
  4. What is Cascalog? A Clojure-based query language for Hadoop with Datalog-inspired syntax; queries compile to one or more MapReduce jobs; the tool I wish I had two years ago
  5. Features: inner and outer joins, aggregators, functions, subqueries, sorting, high performance
  6. What sets Cascalog apart? Super simple; full power of Clojure always available; easy to extend with custom operations; dynamic queries; arbitrary inputs and outputs
  8. Experiment with Cascalog: ships with a test dataset that can be queried locally (the “playground”); 5 minutes to set up Hadoop, Clojure, and Cascalog locally (see the README)
  9. News feed generator: ranks events in a social network for each person based on “importance” and recency; 38 lines of code
  10. Demo time!
  11. News Feed: “Follows” and “Action” data sources, stored as text files on HDFS
  12. News Feed
  13. News Feed: custom aggregator to produce a news feed in JSON-like form
  14. News Feed: custom function to score each item in the feed
  15. News Feed: data sources
  16. News Feed: subquery to compute the follower count for each person
  17. News Feed: tie everything together in a single Cascalog query
  18. Questions?
      Project page: http://www.github.com/nathanmarz/cascalog
      Tutorial: http://nathanmarz.com/blog/introducing-cascalog
      Follow me on Twitter: @nathanmarz
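
The code shown on the News Feed slides (11 through 17) did not survive in this transcript. As a rough illustration of the pieces those slides name (two data sources, a follower-count subquery, a scoring function, a per-person aggregator, and a join tying them together), here is a hypothetical plain-Python sketch. It is not Cascalog and not the 38-line query from the talk: the tuple layouts, the sample rows, the names `follower_counts`, `score`, and `news_feeds`, the scoring formula (follower count divided by one plus the item's age), and the `limit` cutoff are all assumptions made for illustration. In Cascalog, each step would instead be written as predicates in a query and compiled to MapReduce jobs.

```python
import json
from collections import Counter, defaultdict

# Hypothetical sample data standing in for the "Follows" and "Action" sources.
# Assumed layouts: follows rows are (follower, followee);
# action rows are (person, action, timestamp).
FOLLOWS = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "carol"),
    ("carol", "bob"),
]
ACTIONS = [
    ("bob", "posted a photo", 90),
    ("carol", "joined a group", 95),
    ("carol", "shared a link", 60),
]


def follower_counts(follows):
    """Subquery: number of followers per person (group by followee, then count)."""
    return Counter(followee for _follower, followee in follows)


def score(num_followers, timestamp, now):
    """Hypothetical scoring function: more followers raise the score, age lowers it."""
    age = now - timestamp
    return num_followers / (1 + age)


def news_feeds(follows, actions, now, limit=2):
    """Join follows with actions, score each item, and aggregate one feed per person."""
    counts = follower_counts(follows)

    # Index actions by the person who performed them, so the join is a lookup.
    actions_by_person = defaultdict(list)
    for person, action, ts in actions:
        actions_by_person[person].append((action, ts))

    # Inner join on followee == acting person: followees with no actions add nothing.
    items = defaultdict(list)
    for follower, followee in follows:
        for action, ts in actions_by_person[followee]:
            items[follower].append(
                {"actor": followee, "action": action,
                 "score": score(counts[followee], ts, now)}
            )

    # Aggregator: keep each person's highest-scoring items as JSON-like dicts.
    return {
        person: sorted(feed, key=lambda item: item["score"], reverse=True)[:limit]
        for person, feed in items.items()
    }


if __name__ == "__main__":
    print(json.dumps(news_feeds(FOLLOWS, ACTIONS, now=100), indent=2))
```

Running the script prints one feed per person. For example, alice follows both bob and carol, and her feed starts with carol's “joined a group” item: carol has more followers than bob, and that item is also newer.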
