Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data analysis with Cascalog, Nathan Marz, BackType
    Presentation Transcript

    • Cascalog
      • Nathan Marz, BackType
      Powerful and easy-to-use data analysis tool for Hadoop
    • About Me
      • Tech Lead at BackType
      • Have been working on many-terabyte scale systems for two years
        • ETL workflows
        • Data warehouses
    • Presentation Overview
      • High level introduction to Cascalog
      • Demo
      • Cascalog at BackType
    • What is Cascalog?
      • Query language for Hadoop
      • Queries are written as regular Clojure code
      • Alternative to Pig and Hive
    • What is Clojure?
      • Functional language that compiles to Java bytecode
      • Lisp-based
      • First-class integration with Java
    • Features
      • Inner and outer joins
      • Aggregators
      • Functions
      • Subqueries
      • Sorting
      • Arbitrary inputs and outputs
    • What sets Cascalog apart?
      • Fully integrated in a general-purpose programming language
      • Full power of Clojure available at all times
    • What sets Cascalog apart?
      • Custom operations
        • No UDF interface
        • Just Clojure functions
    • What sets Cascalog apart?
      • Dynamic queries
        • Write functions that return queries
        • Manipulate queries as first-class entities in the language
    • What sets Cascalog apart?
      • Use Cascalog side by side with other code
        • Appends and Distributed Copies
        • Consolidation
        • Application logic
    • Easy Experimentation
      • Ships with test dataset that can be queried locally (the “playground”)
      • 5 minutes to set up Hadoop, Clojure, and Cascalog locally; see the README
    • Demo time!
    • Cascalog at BackType
      • BackType collects data about conversations around the web
        • Tweets
        • Blog comments
        • Social news
        • People
    • Cascalog at BackType
      • Cascalog is used to:
        • Identify influencers
        • Determine number of people exposed to URLs on Twitter
        • Identify “interesting tweets”
        • Study social engagement of domains over time
        • Etc, etc.
    • Cascalog at BackType
      • Input and output
        • Cascalog reads from MySQL databases
        • Cascalog writes to Cassandra
    • Cascalog at BackType
      • Rapid development
        • Local playground dataset for development
        • Develop queries in the REPL
    • Cascalog Roadmap
      • Optimized joins:
        • Replicated joins
        • Bloom joins
      • Negations
      • Recursion
    • Questions?
      • Project page: http://www.github.com/nathanmarz/cascalog
      • Tutorial: http://nathanmarz.com/blog/introducing-cascalog
      • Follow me on Twitter: @nathanmarz
    • Clojure and Cascalog
      • Provided by Clojure:
        • Module system
        • Dynamic queries
        • Custom operations
        • Interactive REPL
    • Cascading and Cascalog
      • Provided by Cascading:
        • Tuple abstraction and tuple manipulation
        • Workflow to MapReduce translation
        • Read and write from anywhere with Taps
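
The "Custom operations" and "Dynamic queries" slides can be illustrated with a minimal sketch against the playground dataset. It assumes the Cascalog 1.x namespaces `cascalog.api` and `cascalog.playground` and the playground's `age` dataset, as described in the project tutorial; other versions may name things differently. `young?` and `people-younger-than` are hypothetical names chosen for illustration and do not come from the talk.

```clojure
;; Assumption: Cascalog 1.x is on the classpath; namespace names may
;; differ in other versions.
(use 'cascalog.api)         ; <-, ?<-, ?-, stdout
(use 'cascalog.playground)  ; small in-memory test datasets such as `age`

;; Custom operation: an ordinary Clojure function (hypothetical name).
(defn young? [age-in-years]
  (< age-in-years 30))

;; With no output variables bound via :>, the function acts as a filter.
(?<- (stdout) [?person]
     (age ?person ?a)
     (young? ?a))

;; Dynamic query: a function that builds and returns a query
;; (hypothetical name). `max-age` is a normal Clojure value that the
;; query treats as a constant.
(defn people-younger-than [max-age]
  (<- [?person ?a]
      (age ?person ?a)
      (< ?a max-age)))

;; The returned query is a value: it can be executed directly ...
(?- (stdout) (people-younger-than 30))

;; ... or used as a generator (subquery) inside another query.
(let [under-25 (people-younger-than 25)]
  (?<- (stdout) [?person]
       (under-25 ?person _)))
```

Because queries are plain values, they can be passed to functions, stored in collections, or composed with other Clojure code, which is the sense in which the slides call them first-class.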