BigDataCamp 2011, Chris K Wensel
Published in: Technology, Business
    1. BigDataCamp 2011, Chris K Wensel
    2. Concurrent, Inc.
       • Founded in Spring of 2008
       • Cascading core development
       • Support, Training, & OEM Licensing
    3. So What is Cascading?
    4. In a Nutshell
       [Stack diagram: Processing API, Integration API, Scheduler API; Physical Planner; Scheduler]
       Alternative Java API to MapReduce with built-in Processing Planner and Workload Scheduler
    5. On Many Platforms
       [Stack diagram: Processing API, Integration API, Scheduler API; Physical Planner; Scheduler; Platform]
       • Apache Hadoop
       • Amazon Elastic MapReduce
       • MapR
       • EMC/GreenPlum
       • and more
    6. But How is Cascading Used?
    7. RazorFish/BestBuy
       [Stack diagram: Java (unit, regression, & integration testing); Processing API, Integration API, Scheduler API; Physical Planner; Scheduler; Platform]
       • E-Commerce visitor/customer behavior classification
       • Rule processing against proprietary logs
       • Backend system integration
    8. FlightCaster
       [Stack diagram: JVM Language/DSL (scripting, ad-hoc queries, etc); Logical Planner; Processing API, Integration API, Scheduler API; Physical Planner; Scheduler; Platform]
       • They predict flight delays 6 hrs in advance
       • Created own API/DSL in Clojure
       • Used to build predictive models
    9. Etsy
       [Stack diagram: JVM Language/DSL (scripting, ad-hoc queries, etc); Logical Planner; Processing API, Integration API, Scheduler API; Physical Planner; Scheduler; Platform]
       • Online retailer
       • Forked own API/DSL in JRuby
         • Cascading.JRuby - avail on github
    10. What
       • User behavior on site
       • Data driven site features
         • Taste Test
         • Facebook gift recommender
         • Suggested Shops
         • Top Query List
         • plus many more on the way
    11. BackType
       [Stack diagram: JVM Language/DSL (scripting, ad-hoc queries, etc); Logical Planner; Processing API, Integration API, Scheduler API; Physical Planner; Scheduler; Platform]
       • Marketing intelligence
       • Created Cascalog
         • an API/DSL in Clojure, avail on github
    12. Ion Flux
       [Stack diagram: Java (unit, regression, & integration testing); Processing API, Integration API, Scheduler API; Physical Planner; Scheduler; Platform]
       Gene sequencing
    13. Who Else?
       http://concurrentinc.com/casestudies/
    14. How is Cascading Different?
    15. Pig/Hive
       [Stack diagram: Query Syntax, Extension API; Logical Planner; Processing API, Integration API, Scheduler API; Physical Planner; Scheduler; Platform]
       Great for ad-hoc queries, but hard to operationalize
    16. Oozie/Azkaban
       [Stack diagram: Scheduler Syntax; Processing API, Integration API, Scheduler API; Physical Planner; Scheduler; Platform]
       • Great for gluing command line apps together
       • JVM scripting language + Cascading is less brittle and with more degrees of freedom
    17. But They are Complementary
       • No reason Oozie (or Talend) can’t be used to drive Cascading apps
       • No reason Cascading can’t drive raw MR/Pig/Hive processes (see Riffle)
    18. Architecture isn’t Innovation
       [Diagram labels: collection, cleansing, processing, delivery; event, data, signal, info, knowledge; normalization, scoring, mining]
       The point of computing systems is to make data more valuable
       Everything else is an implementation detail
       Copyright Concurrent, Inc. 2011. All rights reserved.
    19. Cascading 2.0
       • Removed dependencies on Hadoop
       • Improved Processing Planner architecture
       • Improved integration APIs
    20. To Do
       • Support more platforms, including in-memory stream processing
       • Make Planner more intelligent and leverage more complex data flow topologies
       • Integrate with more systems and applications
    21. We are Hiring
       http://www.concurrentinc.com/careers/
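
The "In a Nutshell" slide describes Cascading as a Java API to MapReduce with a built-in processing planner. The slides do not show how that planner works, so the following is only a toy sketch of the general idea, written in Python for brevity: a linear chain of steps is cut into MapReduce jobs wherever a grouping step would force a new shuffle. The names `Step`, `Job`, and `plan` are hypothetical and are not part of Cascading's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """One logical operation in a linear pipeline (hypothetical type)."""
    name: str
    kind: str  # "each" = per-record operation, "group" = needs a shuffle/sort


@dataclass
class Job:
    """One MapReduce job produced by the toy planner (hypothetical type)."""
    map_steps: List[str] = field(default_factory=list)
    group_by: Optional[str] = None
    reduce_steps: List[str] = field(default_factory=list)


def plan(steps: List[Step]) -> List[Job]:
    """Split a linear chain of steps into MapReduce jobs.

    Assumption for this sketch: per-record steps before a grouping run on the
    map side, per-record steps after it run on the reduce side, and a second
    grouping cannot share a job with the first, so it starts a new job.
    """
    jobs: List[Job] = []
    current = Job()
    for step in steps:
        if step.kind == "group":
            if current.group_by is not None:
                # The current job already has its shuffle; close it.
                jobs.append(current)
                current = Job()
            current.group_by = step.name
        elif step.kind == "each":
            if current.group_by is None:
                current.map_steps.append(step.name)
            else:
                current.reduce_steps.append(step.name)
        else:
            raise ValueError(f"unknown step kind: {step.kind!r}")
    # Keep the last job only if it actually contains work.
    if current.map_steps or current.group_by is not None:
        jobs.append(current)
    return jobs


# Illustrative pipeline: parse logs, count per user, then rank by count.
steps = [
    Step("parse_log_line", "each"),
    Step("group_by_user", "group"),
    Step("count_visits", "each"),
    Step("group_by_count", "group"),
    Step("take_top", "each"),
]
jobs = plan(steps)
for number, job in enumerate(jobs, start=1):
    print(number, job)
```

The logical description of the work stays independent of how many jobs it turns into. That separation is what lets a planner be improved, or pointed at a different platform, without rewriting the application.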