BigDataCamp 2011

Published in: Technology, Business

    1. BigDataCamp 2011, Chris K Wensel
    2. Concurrent, Inc.
       • Founded in Spring of 2008
       • Cascading core development
       • Support, Training, & OEM Licensing
    3. So What is Cascading?
    4. In a Nutshell
       [Diagram layers: Processing API, Integration API, Scheduler API, Physical Planner, Scheduler]
       Alternative Java API to MapReduce with built-in Processing Planner and Workload Scheduler
    5. On Many Platforms
       [Diagram layers: Processing API, Integration API, Scheduler API, Physical Planner, Scheduler, Platform]
       • Apache Hadoop
       • Amazon Elastic MapReduce
       • MapR
       • EMC/GreenPlum
       • and more**
    6. But How is Cascading Used?
    7. RazorFish/BestBuy
       [Diagram layers: Java (unit, regression, & integration testing), Processing API, Integration API, Scheduler API, Physical Planner, Scheduler, Platform]
       • E-Commerce visitor/customer behavior classification
       • Rule processing against proprietary logs
       • Backend system integration
    8. FlightCaster
       [Diagram layers: JVM Language/DSL (scripting, ad-hoc queries, etc), Logical Planner, Processing API, Integration API, Scheduler API, Physical Planner, Scheduler, Platform]
       • They predict flight delays 6 hrs in advance
       • Created own API/DSL in Clojure
       • Used to build predictive models
    9. Etsy
       [Diagram layers: JVM Language/DSL (scripting, ad-hoc queries, etc), Logical Planner, Processing API, Integration API, Scheduler API, Physical Planner, Scheduler, Platform]
       • Online retailer
       • Forked own API/DSL in JRuby
         • Cascading.JRuby - avail on github
    10. What
       • User behavior on site
       • Data driven site features
         • Taste Test
         • Facebook gift recommender
         • Suggested Shops
         • Top Query List
         • plus many more on the way
    11. BackType
       [Diagram layers: JVM Language/DSL (scripting, ad-hoc queries, etc), Logical Planner, Processing API, Integration API, Scheduler API, Physical Planner, Scheduler, Platform]
       • Marketing intelligence
       • Created Cascalog
         • an API/DSL in Clojure, avail on github
    12. Ion Flux
       [Diagram layers: Java (unit, regression, & integration testing), Processing API, Integration API, Scheduler API, Physical Planner, Scheduler, Platform]
       • Gene sequencing
    13. Who Else?
       http://concurrentinc.com/casestudies/
    14. How is Cascading Different?
    15. Pig/Hive
       [Diagram layers: Query Syntax, Extension API, Logical Planner, Processing API, Integration API, Scheduler API, Physical Planner, Scheduler, Platform]
       Great for ad-hoc queries, but hard to operationalize
    16. Oozie/Azkaban
       [Diagram layers: Scheduler Syntax, Processing API, Integration API, Scheduler API, Physical Planner, Scheduler, Platform]
       • Great for gluing command line apps together
       • JVM scripting language + Cascading is less brittle and with more degrees of freedom
    17. But They are Complementary
       • No reason Oozie (or Talend) can’t be used to drive Cascading apps
       • No reason Cascading can’t drive raw MR/Pig/Hive processes (see Riffle)
    18. Architecture isn’t Innovation
       [Diagram labels: collection, cleansing, processing, delivery; event, data, signal, info, knowledge; normalization, scoring, mining]
       The point of computing systems is to make data more valuable
       Everything else is an implementation detail
       Copyright Concurrent, Inc. 2011. All rights reserved.
    19. Cascading 2.0
       • Removed dependencies on Hadoop
       • Improved Processing Planner architecture
       • Improved integration APIs
    20. To Do
       • Support more platforms, including in-memory stream processing
       • Make Planner more intelligent and leverage more complex data flow topologies
       • Integrate with more systems and applications
    21. We are Hiring
       http://www.concurrentinc.com/careers/
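
The slides describe Cascading as a Java processing API and label a data path of collection, cleansing, processing, and delivery (slide 18). As a rough illustration of that staged idea only, here is a minimal sketch in plain Java. It deliberately does not use the Cascading library: the class `PipelineSketch` and its methods `cleanse`, `countWords`, `render`, and `assemble` are hypothetical names invented for this example, and there is no planner or scheduler here, just function composition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical toy types for illustration only; this is NOT the Cascading API.
public class PipelineSketch {

    // Cleansing stage: trim, lower-case, and drop blank lines.
    static List<String> cleanse(List<String> rawLines) {
        List<String> out = new ArrayList<>();
        for (String line : rawLines) {
            String trimmed = line.trim().toLowerCase();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    // Processing stage: split each line on whitespace and count the words.
    // A TreeMap keeps the keys sorted so the output order is deterministic.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Delivery stage: render each count as a "word=count" line.
    static List<String> render(Map<String, Integer> counts) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.add(e.getKey() + "=" + e.getValue());
        }
        return out;
    }

    // Assemble the stages into one reusable function, declared before it runs.
    static Function<List<String>, List<String>> assemble() {
        Function<List<String>, List<String>> cleansing = PipelineSketch::cleanse;
        Function<List<String>, Map<String, Integer>> processing = PipelineSketch::countWords;
        Function<Map<String, Integer>, List<String>> delivery = PipelineSketch::render;
        return cleansing.andThen(processing).andThen(delivery);
    }

    public static void main(String[] args) {
        // "Collection" is stubbed with an in-memory list of raw lines.
        List<String> collected = Arrays.asList("  Big Data ", "", "big DATA camp");
        for (String line : assemble().apply(collected)) {
            System.out.println(line);
        }
    }
}
```

The point of the sketch is only that the work is declared as an assembly of stages first and executed afterwards. How Cascading itself plans and schedules such assemblies is not shown in the slides and is not modeled here.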
