Cascalog

Presentation I gave at the Bay Area Clojure Meetup Group on May 6, 2010. I also demoed examples from the introductory tutorial:

http://nathanmarz.com/blog/introducing-cascalog/

Cascalog Presentation Transcript

  • 1. Cascalog. Nathan Marz, BackType. Powerful and easy-to-use data analysis tool for Hadoop
  • 2. About Me: Tech Lead at BackType; have been working on many-terabyte-scale systems for two years; ETL workflows; data warehouses
  • 3. What is Hadoop? Distributed filesystem; MapReduce framework; scales to thousands of machines and petabytes of data
  • 4. What is Cascalog? Clojure-based query language for Hadoop with Datalog-inspired syntax; queries compile to one or more MapReduce jobs; the tool I wish I had two years ago
  • 5. Features: inner and outer joins; aggregators; functions; subqueries; sorting; high performance
  • 6-7. What sets Cascalog apart? Super simple; full power of Clojure always available; easy to extend with custom operations; dynamic queries; arbitrary inputs and outputs
  • 8. Experiment with Cascalog: ships with a test dataset that can be queried locally (the “playground”); 5 minutes to set up Hadoop, Clojure, and Cascalog locally (see the README)
  • 9. News feed generator: ranks events in a social network for each person based on “importance” and recency; 38 lines of code
  • 10. Demo time!
  • 11. News Feed: “Follows” and “Action” data sources, stored as text files on HDFS
  • 12. News Feed
  • 13. News Feed: custom aggregator to produce a news feed in JSON-like form
  • 14. News Feed: custom function to score each item in the feed
  • 15. News Feed: data sources
  • 16. News Feed: subquery to compute the follower count for each person
  • 17. News Feed: tie everything together in a single Cascalog query
  • 18. Questions? Project page: http://www.github.com/nathanmarz/cascalog Tutorial: http://nathanmarz.com/blog/introducing-cascalog Follow me on Twitter: @nathanmarz
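
Slide 4 says that Cascalog queries compile to one or more MapReduce jobs. The Python sketch below shows roughly what a join between two sources (such as the “Follows” and “Action” data from the news feed demo) turns into. It simulates one common strategy, a reduce-side inner join, in memory. The map phase tags each tuple with its source and emits it under the join key. The shuffle groups tuples by key. The reduce phase then pairs up matching tuples. This is not Cascalog's generated code or Hadoop's API: the function names and sample tuples are hypothetical, and a real job would run these phases distributed across a cluster.

```python
from collections import defaultdict
from itertools import product


def map_phase(sources):
    """Map: tag every tuple with its source name and emit it under its first field (the join key)."""
    for tag, tuples in sources.items():
        for key, *rest in tuples:
            yield key, (tag, tuple(rest))


def shuffle(pairs):
    """Shuffle: group mapper output by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_inner_join(key, values, left, right):
    """Reduce: pair every left tuple with every right tuple that shares the key."""
    lefts = [rest for tag, rest in values if tag == left]
    rights = [rest for tag, rest in values if tag == right]
    for l_rest, r_rest in product(lefts, rights):
        yield (key, *l_rest, *r_rest)


def inner_join(sources, left, right):
    """Run map, shuffle, and reduce in memory; keys are sorted to mimic Hadoop's sort step."""
    groups = shuffle(map_phase(sources))
    rows = []
    for key in sorted(groups):
        rows.extend(reduce_inner_join(key, groups[key], left, right))
    return rows


# Hypothetical sample data: follows is (person, follower), actions is (person, action).
follows = [("alice", "bob"), ("alice", "carol"), ("dave", "alice")]
actions = [("alice", "posted photo"), ("erin", "joined group")]
print(inner_join({"follows": follows, "actions": actions}, "follows", "actions"))
# [('alice', 'bob', 'posted photo'), ('alice', 'carol', 'posted photo')]
```

Keys with tuples from only one source ("dave", "erin") produce no rows. An outer join would instead emit them padded with empty values.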
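
The news feed demo (slides 9 and 11-17) combines three pieces: a follower-count subquery, a custom scoring function, and a custom aggregator. The slides show neither the code nor the scoring formula, so what follows is only a plain-Python sketch of the same dataflow under explicit assumptions. `follows` holds hypothetical (person, follower) pairs. `actions` holds (actor, action, timestamp in hours) triples. `score()` uses a made-up rule that takes the log-scaled follower count as “importance” and divides it by age to reward recency. This is not the 38-line Cascalog query from the talk.

```python
import math
from collections import Counter, defaultdict


def follower_counts(follows):
    """Subquery: count followers per person from (person, follower) pairs."""
    return Counter(person for person, _follower in follows)


def score(actor_followers, action_time, now):
    """Hypothetical scoring rule: log-scaled importance divided by (1 + age in hours)."""
    age = max(0.0, now - action_time)
    return math.log1p(actor_followers) / (1.0 + age)


def news_feeds(follows, actions, now, limit=10):
    """Aggregate scored items into a JSON-like feed (list of dicts) per follower."""
    counts = follower_counts(follows)
    by_actor = defaultdict(list)
    for actor, action, timestamp in actions:
        by_actor[actor].append((action, timestamp))

    feeds = defaultdict(list)
    for person, follower in follows:
        # Every action by a followed person becomes a candidate item for the follower.
        for action, timestamp in by_actor.get(person, []):
            feeds[follower].append({
                "actor": person,
                "action": action,
                "score": score(counts[person], timestamp, now),
            })

    # Highest score first, truncated to the feed length.
    return {
        follower: sorted(items, key=lambda item: item["score"], reverse=True)[:limit]
        for follower, items in feeds.items()
    }


follows = [("alice", "bob"), ("carol", "bob"), ("dave", "bob"), ("alice", "erin")]
actions = [("alice", "posted photo", 9.0), ("carol", "joined group", 10.0)]
feeds = news_feeds(follows, actions, now=10.0)
print([item["actor"] for item in feeds["bob"]])  # ['carol', 'alice']
```

Here the steps run one after another in memory. According to slide 17, the talk expresses them as a single Cascalog query over the two data sources.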