Cascalog

Presentation I gave at the Bay Area Clojure Meetup Group on May 6th, 2010. I also demoed examples from the introductory tutorial:

http://nathanmarz.com/blog/introducing-cascalog/

Published in: Technology

  • Cascalog

    1. 1. Cascalog: Nathan Marz, BackType. Powerful and easy-to-use data analysis tool for Hadoop
    2. 2. About Me: Tech Lead at BackType. Have been working on many-terabyte-scale systems for two years: ETL workflows, data warehouses.
    3. 3. What is Hadoop? Distributed filesystem; MapReduce framework; scales to thousands of machines and petabytes of data.
    4. 4. What is Cascalog? Clojure-based query language for Hadoop with Datalog-inspired syntax; queries compile to one or more MapReduce jobs; the tool I wish I had two years ago.
    5. 5. Features: inner and outer joins; aggregators; functions; subqueries; sorting; high performance.
    6. 6. What sets Cascalog apart? Super simple; full power of Clojure always available; easy to extend with custom operations; dynamic queries; arbitrary inputs and outputs.
    7. 7. What sets Cascalog apart? Super simple; full power of Clojure always available; easy to extend with custom operations; dynamic queries; arbitrary inputs and outputs.
    8. 8. Experiment with Cascalog: ships with a test dataset that can be queried locally (the “playground”); 5 minutes to set up Hadoop, Clojure, and Cascalog locally (see README).
    9. 9. News feed generator: ranks events in a social network for each person based on “importance” and recency; 38 lines of code.
    10. 10. Demo time!
    11. 11. News Feed: “Follows” and “Action” data sources; text files on HDFS (Follows, Action).
    12. 12. News Feed
    13. 13. News Feed: custom aggregator to produce a news feed in JSON-like form.
    14. 14. News Feed: custom function to score each item in the feed.
    15. 15. News Feed: data sources.
    16. 16. News Feed: subquery to compute follower count for each person.
    17. 17. News Feed: tie everything together in a single Cascalog query.
    18. 18. Questions? Project page: http://www.github.com/nathanmarz/cascalog Tutorial: http://nathanmarz.com/blog/introducing-cascalog Follow me on Twitter: @nathanmarz
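
The code shown on the News Feed slides did not survive in this transcript. As a rough illustration of the steps those slides name (a follower-count subquery, a scoring function, and an aggregation into a JSON-like feed per person), here is a minimal sketch in plain Python. It is not Cascalog syntax and does not run on Hadoop. The sample data, the field names, and the half-life scoring formula are all assumptions made for illustration; the talk only says that items are ranked by "importance" and recency.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the "Follows" and "Action" sources.
FOLLOWS = [("alice", "bob"), ("carol", "bob"), ("alice", "carol")]  # (follower, followed)
ACTIONS = [("bob", "posted a photo", 100), ("carol", "wrote a post", 160)]  # (person, action, timestamp)


def follower_counts(follows):
    """Count followers per person (the role of the follower-count subquery)."""
    counts = defaultdict(int)
    for _follower, followed in follows:
        counts[followed] += 1
    return dict(counts)


def score(follower_count, timestamp, now, half_life=100.0):
    """Assumed scoring rule: follower count, halved every `half_life` time units of age."""
    age = now - timestamp
    return follower_count * 0.5 ** (age / half_life)


def news_feeds(follows, actions, now, limit=10):
    """Join follows with actions, score each item, and build one ranked feed per follower."""
    counts = follower_counts(follows)

    # Index actions by the person who performed them, so the join is a lookup.
    by_actor = defaultdict(list)
    for person, action, ts in actions:
        by_actor[person].append((action, ts))

    feeds = defaultdict(list)
    for follower, followed in follows:
        for action, ts in by_actor.get(followed, []):
            feeds[follower].append({
                "person": followed,
                "action": action,
                "score": score(counts.get(followed, 0), ts, now),
            })

    # Highest-scoring items first, truncated to `limit` entries per feed.
    return {
        person: sorted(items, key=lambda item: item["score"], reverse=True)[:limit]
        for person, items in feeds.items()
    }


if __name__ == "__main__":
    for person, feed in news_feeds(FOLLOWS, ACTIONS, now=200).items():
        print(person, feed)
```

In this sketch a person who follows nobody gets no feed entry at all, which mirrors an inner join between the two sources; an outer join would instead keep such people with an empty feed.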
