Cascalog
                      Nathan Marz, BackType



Powerful and easy-to-use data analysis tool for Ha...
About Me


Tech Lead at BackType

Have been working on many-terabyte scale
systems for two years

 ETL workflows

 Data wa...
What is Hadoop?

Distributed Filesystem

MapReduce Framework



Scales to thousands of machines and petabytes of
data
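
As a rough illustration of the MapReduce model named on this slide, the sketch below runs a word count in plain Python inside a single process. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for illustration, and a real cluster would spread each phase across many machines.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word, 1)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three phases: map, then shuffle, then reduce."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical two-line input.
    print(word_count(["a b a", "b c"]))
```

The point of the split is that the map and reduce steps only ever see one record or one key at a time, which is what lets a framework run them in parallel.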
What is Cascalog?


Clojure-based query language for Hadoop with
Datalog-inspired syntax

Queries compile to one or more M...
Features

Inner and outer joins

Aggregators

Functions

Subqueries

Sorting

High performance
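
To make the difference between the inner and outer joins listed above concrete, here is a minimal plain-Python sketch. It only illustrates join semantics and is not Cascalog syntax; the helper names (`inner_join`, `full_outer_join`) and the `AGE` and `LOCATION` sample data are hypothetical.

```python
def inner_join(left, right):
    """Keep only the keys that appear in both inputs."""
    return {key: (left[key], right[key]) for key in left if key in right}


def full_outer_join(left, right):
    """Keep every key from either input; a missing side becomes None."""
    keys = sorted(set(left) | set(right))
    return {key: (left.get(key), right.get(key)) for key in keys}


# Hypothetical sample data, keyed by person name.
AGE = {"alice": 28, "bob": 33}
LOCATION = {"bob": "SF", "chris": "NYC"}

if __name__ == "__main__":
    print(inner_join(AGE, LOCATION))
    print(full_outer_join(AGE, LOCATION))
```

With this data only `bob` survives the inner join, while the outer join keeps all three people and fills the gaps with `None`.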
What sets Cascalog apart?

Super simple

Full power of Clojure always available

Easy to extend with custom operations

Dy...
Experiment with Cascalog

Ships with test
dataset that can be
queried locally (the
“playground”)

5 minutes to set up
Hadoo...
News feed generator

Ranks events in
social network
for each person
based on
“importance”
and recency


38 lines of code
Demo time!
News Feed
“Follows” and “Action” data sources

 Text files on HDFS
       Follows               Action
News Feed
News Feed
   Custom Aggregator to produce a
     news feed in JSON-like form
News Feed

             Custom Function
            to score each item
                in the feed
News Feed



            Data sources
News Feed

            Subquery to compute
             follower count for
                 each person
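
The code shown on this slide did not survive in the text, so the following is only a plain-Python sketch of what a follower-count subquery computes, not the deck's Cascalog code. It assumes the "Follows" source is a list of `(follower, followed)` pairs and the "Action" source a list of `(actor, action)` pairs. Those record shapes, the function names, and the way the count is attached to each action are all assumptions.

```python
from collections import Counter


def follower_counts(follows):
    """Count followers per person from (follower, followed) pairs."""
    return Counter(followed for _follower, followed in follows)


def actions_with_follower_count(actions, follows):
    """Attach the actor's follower count to each (actor, action) record."""
    counts = follower_counts(follows)  # plays the role of the subquery
    # Counter returns 0 for people nobody follows.
    return [(actor, action, counts[actor]) for actor, action in actions]


if __name__ == "__main__":
    # Hypothetical sample data.
    follows = [("alice", "bob"), ("chris", "bob"), ("bob", "alice")]
    actions = [("bob", "posted"), ("chris", "liked")]
    print(actions_with_follower_count(actions, follows))
```

Computing the counts once and reusing them in the outer step mirrors the idea of a subquery: an intermediate result that the main query consumes.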
News Feed




  Tie everything
together in a single
  Cascalog query
Questions?


Project page:
http://www.github.com/nathanmarz/cascalog

Tutorial:
http://nathanmarz.com/blog/introducing-cas...

Presentation I gave at Bay Area Clojure Meetup Group on May 6th, 2010. Also demoed examples from introductory tutorial:

http://nathanmarz.com/blog/introducing-cascalog/

Published in: Technology

  • Transcript of "Cascalog"

    1. Cascalog: Nathan Marz, BackType. Powerful and easy-to-use data analysis tool for Hadoop.
    2. About Me: Tech Lead at BackType; have been working on many-terabyte scale systems for two years; ETL workflows; data warehouses.
    3. What is Hadoop? Distributed filesystem; MapReduce framework; scales to thousands of machines and petabytes of data.
    4. What is Cascalog? Clojure-based query language for Hadoop with Datalog-inspired syntax; queries compile to one or more MapReduce jobs; the tool I wish I had two years ago.
    5. Features: inner and outer joins; aggregators; functions; subqueries; sorting; high performance.
    6. What sets Cascalog apart? Super simple; full power of Clojure always available; easy to extend with custom operations; dynamic queries; arbitrary inputs and outputs.
    7. What sets Cascalog apart? Super simple; full power of Clojure always available; easy to extend with custom operations; dynamic queries; arbitrary inputs and outputs.
    8. Experiment with Cascalog: ships with test dataset that can be queried locally (the “playground”); 5 minutes to set up Hadoop, Clojure, and Cascalog locally (see README).
    9. News feed generator: ranks events in social network for each person based on “importance” and recency; 38 lines of code.
    10. Demo time!
    11. News Feed: “Follows” and “Action” data sources; text files on HDFS.
    12. News Feed
    13. News Feed: custom aggregator to produce a news feed in JSON-like form.
    14. News Feed: custom function to score each item in the feed.
    15. News Feed: data sources.
    16. News Feed: subquery to compute follower count for each person.
    17. News Feed: tie everything together in a single Cascalog query.
    18. Questions? Project page: http://www.github.com/nathanmarz/cascalog Tutorial: http://nathanmarz.com/blog/introducing-cascalog Follow me on Twitter: @nathanmarz