Cascalog
Presentation I gave at the Bay Area Clojure Meetup Group on May 6th, 2010. I also demoed examples from the introductory tutorial:

http://nathanmarz.com/blog/introducing-cascalog/

Published in: Technology

  • Transcript

    • 1. Cascalog. Nathan Marz, BackType. Powerful and easy-to-use data analysis tool for Hadoop
    • 2. About Me: Tech Lead at BackType. Have been working on many-terabyte-scale systems for two years: ETL workflows, data warehouses.
    • 3. What is Hadoop? A distributed filesystem and a MapReduce framework. Scales to thousands of machines and petabytes of data.
    • 4. What is Cascalog? A Clojure-based query language for Hadoop with Datalog-inspired syntax. Queries compile to one or more MapReduce jobs. The tool I wish I had two years ago.
    • 5. Features: inner and outer joins; aggregators; functions; subqueries; sorting; high performance.
    • 6. What sets Cascalog apart? Super simple; full power of Clojure always available; easy to extend with custom operations; dynamic queries; arbitrary inputs and outputs.
    • 7. What sets Cascalog apart? Super simple; full power of Clojure always available; easy to extend with custom operations; dynamic queries; arbitrary inputs and outputs.
    • 8. Experiment with Cascalog: ships with a test dataset that can be queried locally (the “playground”). 5 minutes to set up Hadoop, Clojure, and Cascalog locally; see the README.
    • 9. News feed generator: ranks events in a social network for each person based on “importance” and recency. 38 lines of code.
    • 10. Demo time!
    • 11. News Feed: “Follows” and “Action” data sources; text files on HDFS.
    • 12. News Feed
    • 13. News Feed: custom aggregator to produce a news feed in JSON-like form.
    • 14. News Feed: custom function to score each item in the feed.
    • 15. News Feed: data sources.
    • 16. News Feed: subquery to compute the follower count for each person.
    • 17. News Feed: tie everything together in a single Cascalog query.
    • 18. Questions? Project page: http://www.github.com/nathanmarz/cascalog Tutorial: http://nathanmarz.com/blog/introducing-cascalog Follow me on Twitter: @nathanmarz
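
The transcript does not include the code shown on slides 11 to 17. As a rough illustration of what the news feed query computes, here is a plain-Python sketch (not Cascalog, and not the author's code). The sample records, the field layout, the function names, and the scoring formula are all assumptions made for illustration. The slides only say that items are ranked by "importance" and recency and that a subquery computes follower counts.

```python
from collections import Counter, defaultdict

# Hypothetical sample data. The slides only say the sources are text files on HDFS.
FOLLOWS = [("alice", "bob"), ("carol", "bob"), ("alice", "carol")]  # (follower, followed)
ACTIONS = [
    ("bob", "posted a photo", 100),  # (person, action, timestamp)
    ("carol", "wrote a post", 90),
]


def follower_counts(follows):
    """Count followers per person (the role of the subquery on slide 16)."""
    return Counter(followed for _, followed in follows)


def score(follower_count, timestamp, now):
    """Assumed scoring rule: more followers and newer items score higher."""
    age = max(now - timestamp, 0)
    return follower_count / (1.0 + age)


def news_feeds(follows, actions, now):
    """Join follows with actions, score each item, and rank items per person."""
    counts = follower_counts(follows)

    # Index actions by the person who performed them.
    actions_by_person = defaultdict(list)
    for person, action, timestamp in actions:
        actions_by_person[person].append((action, timestamp))

    # Each follow edge makes the followed person's actions candidate feed items.
    feeds = defaultdict(list)
    for follower, followed in follows:
        for action, timestamp in actions_by_person[followed]:
            item_score = score(counts[followed], timestamp, now)
            feeds[follower].append((item_score, followed, action))

    # Highest score first; drop the score from the final feed.
    return {
        person: [(who, action) for _, who, action in sorted(items, reverse=True)]
        for person, items in feeds.items()
    }


if __name__ == "__main__":
    for person, feed in news_feeds(FOLLOWS, ACTIONS, now=100).items():
        print(person, feed)
```

In the talk, each of these steps is expressed as part of a single Cascalog query that compiles to MapReduce jobs. The sketch above only mirrors the logic in memory on a few records.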