Cascalog
 


Presentation I gave at the Bay Area Clojure Meetup Group on May 6th, 2010. I also demoed examples from the introductory tutorial:


http://nathanmarz.com/blog/introducing-cascalog/



Cascalog Presentation Transcript

  • Cascalog. Nathan Marz, BackType. Powerful and easy-to-use data analysis tool for Hadoop
  • About Me: Tech Lead at BackType; have been working on many-terabyte-scale systems for two years (ETL workflows, data warehouses)
  • What is Hadoop? A distributed filesystem; a MapReduce framework; scales to thousands of machines and petabytes of data
  • What is Cascalog? A Clojure-based query language for Hadoop with Datalog-inspired syntax; queries compile to one or more MapReduce jobs; the tool I wish I had two years ago
  • Features: inner and outer joins; aggregators; functions; subqueries; sorting; high performance
  • What sets Cascalog apart? Super simple; the full power of Clojure is always available; easy to extend with custom operations; dynamic queries; arbitrary inputs and outputs
  • Experiment with Cascalog: ships with a test dataset (the “playground”) that can be queried locally; 5 minutes to set up Hadoop, Clojure, and Cascalog locally (see the README)
  • News feed generator: ranks events in a social network for each person based on “importance” and recency; 38 lines of code
  • Demo time!
  • News Feed: “Follows” and “Action” data sources, stored as text files on HDFS (Follows and Action samples shown on slide)
  • News Feed
  • News Feed: custom aggregator to produce a news feed in JSON-like form
  • News Feed: custom function to score each item in the feed
  • News Feed: data sources
  • News Feed: subquery to compute the follower count for each person
  • News Feed: tie everything together in a single Cascalog query
  • Questions? Project page: http://www.github.com/nathanmarz/cascalog | Tutorial: http://nathanmarz.com/blog/introducing-cascalog | Follow me on Twitter: @nathanmarz
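
The slides state that a Cascalog query compiles to one or more MapReduce jobs, and that joins are among its core features. As a rough illustration of what one such job does, here is a plain-Python, in-memory simulation of a reduce-side inner join: the mapper tags each record with its source and keys it by the join field, the shuffle groups records by key, and the reducer emits the cross product of matching records. This is not Cascalog code and not the plan Cascalog actually generates (which is more involved); the `run_mapreduce`, `join_mapper`, and `join_reducer` names and the `AGE`/`GENDER` sample data are hypothetical.

```python
from collections import defaultdict
from itertools import product


def run_mapreduce(inputs, mapper, reducer):
    """Simulate one MapReduce job in memory: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):  # Hadoop also hands keys to reducers in sorted order
        output.extend(reducer(key, groups[key]))
    return output


def join_mapper(record):
    # Hypothetical record layout: (source_tag, person, value).
    source, person, value = record
    yield person, (source, value)


def join_reducer(person, tagged):
    # Split the grouped values back out by source, then emit every match (inner join).
    ages = [v for s, v in tagged if s == "age"]
    genders = [v for s, v in tagged if s == "gender"]
    for age, gender in product(ages, genders):
        yield (person, age, gender)


# Hypothetical sample data, not the playground dataset.
AGE = [("alice", 28), ("bob", 33), ("emily", 25)]
GENDER = [("alice", "f"), ("bob", "m"), ("george", "m")]

inputs = [("age", p, a) for p, a in AGE] + [("gender", p, g) for p, g in GENDER]

if __name__ == "__main__":
    print(run_mapreduce(inputs, join_mapper, join_reducer))
    # [('alice', 28, 'f'), ('bob', 33, 'm')]
```

People who appear in only one source ("emily", "george") produce no output, which is the inner-join behavior; an outer join would instead emit them with a placeholder for the missing side.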
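
The news feed demo combines a subquery (follower count per person), a custom scoring function, and a custom aggregator that builds a JSON-like feed per person. The 38-line Cascalog source is not reproduced in these slides, so the following is only a plain-Python sketch of the same data flow under stated assumptions, not the demo's code. The input schemas (follows as `(follower, followee)`, actions as `(actor, action, timestamp)`), the sample data, the helper names, and the scoring formula (the actor's follower count as “importance”, halved every `half_life` time units as “recency”) are all hypothetical.

```python
from collections import defaultdict
from heapq import nlargest

# Hypothetical inputs; in the talk these are text files on HDFS.
FOLLOWS = [  # (follower, followee)
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
    ("dave", "carol"), ("carol", "bob"),
]
ACTIONS = [  # (actor, action, timestamp)
    ("bob", "posted a photo", 100),
    ("carol", "joined a group", 90),
    ("carol", "wrote a note", 40),
]


def follower_counts(follows):
    """Subquery analogue: number of followers for each person."""
    counts = defaultdict(int)
    for _follower, followee in follows:
        counts[followee] += 1
    return counts


def score(followers, timestamp, now, half_life=50.0):
    """Hypothetical scoring function: importance decays by half every half_life units."""
    age = now - timestamp
    return followers * 0.5 ** (age / half_life)


def news_feeds(follows, actions, now, limit=2):
    counts = follower_counts(follows)
    # Index actions by actor so follows can be joined on followee == actor.
    by_actor = defaultdict(list)
    for actor, action, ts in actions:
        by_actor[actor].append((action, ts))
    candidates = defaultdict(list)
    for follower, followee in follows:
        for action, ts in by_actor.get(followee, []):
            candidates[follower].append((score(counts[followee], ts, now), followee, action))
    # Aggregator analogue: keep the top `limit` items per person as JSON-like dicts.
    return {
        person: [{"actor": a, "action": act, "score": round(s, 3)}
                 for s, a, act in nlargest(limit, items)]
        for person, items in candidates.items()
    }


if __name__ == "__main__":
    feeds = news_feeds(FOLLOWS, ACTIONS, now=100)
    print(feeds["alice"])
```

With these made-up numbers, Alice's feed ranks Carol's recent action above Bob's newer one because Carol has more followers; changing `half_life` shifts the balance between importance and recency.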