AMP Camp 5 Intro


"AMP Camp 5 Intro" talk by Mike Franklin of the UC Berkeley AMPLab at AMP Camp 5

  1. 1. Welcome and AMPLab Overview. Michael Franklin, November 20, 2014, UC BERKELEY
  3. 3. AMPLab Overview. Project launched Jan 2011, 6-year planned duration. Personnel: ~65 students, postdocs, faculty and staff. Funding: government/industry partnership (NSF Expedition Award, DARPA XData, DoE, 20+ companies). Key outputs: BDAS open source stack and apps (including Apache Spark). Publications: top venues in ML, systems, databases and others. “… the University of California, Berkeley’s AMPLab has already left an indelible mark on world of information technology, and even the web. But we haven’t yet experienced the full impact of the group, … Not even close.” -- Derrick Harris, GigaOm, August 2014
  4. 4. The AMPLab Faculty (UC BERKELEY): Michael Franklin (Databases), Michael Jordan (Machine Learning), Ion Stoica (Systems), Dave Patterson (Systems), Scott Shenker (Networks), Alex Bayen (Mobile Sensing), David Culler (Systems/Sensing), Ken Goldberg (Crowdsourcing), Anthony Joseph (Security), Randy Katz (Systems), Michael Mahoney (ML), Ben Recht (Machine Learning), Raluca Popa (Systems/Security, joining in Summer 2015)
  5. 5. Industrial Engagement • Industrial-strength open source software • Used by sponsors, start-ups and many others • Regular interactions with top industry technologists: twice-yearly 3-day offsite retreats, AMPCamp training, some site visits
  6. 6. AMP: Integrating 3 Key Resources. Algorithms: • Machine Learning, Statistical Methods • Prediction, Business Intelligence. Machines: • Clusters and Clouds • Warehouse Scale Computing. People: • Crowdsourcing, Human Computation • Data Scientists, Analysts
  7. 7. Our View of the Big Data Challenge. Massive, Diverse and Growing Data. Time, Money, Answer Quality. Step 1: Improve efficiency (e.g., Spark, Tachyon). Step 2: Enable intelligent tradeoffs (e.g., BlinkDB, SampleClean)
  8. 8. The Research Challenge: Integration + Extreme Elasticity + Tradeoffs + More Sophisticated Analytics = Extreme Complexity
  9. 9. Arc of our Research Program. Early work on Foundations (Yrs 1-2): Algorithms: Bag of Little Bootstraps; Machines: Mesos and Spark; People: CrowdDB Prototype. Filling out the Analytics Stack (Yrs 3-4) <you are here>: Algorithms: ML Pipelines, Async Algorithms, Concurrency Control; Machines: Tachyon, SQL, Graphs, Streams, R, Performance; People: Hybrid Human/Machine Data Cleaning/Integration. Moving Up the Stack/Expanding the Footprint (Yrs 5-6): Algorithms: MLlib build out, Declarative ML (MLBase); Machines: New Storage/Processing Archs, Data/Model
  10. 10. Big Data Ecosystem Evolution. General batch processing: MapReduce. Specialized systems (iterative, interactive and streaming apps): Pregel, Dremel, GraphLab, Storm, Giraph, Drill, Tez, Impala, S4, …
  11. 11. AMPLab Unification Philosophy. Don’t specialize MapReduce: generalize it! Two additions to Hadoop MR can enable all the models shown earlier: (1) General Task DAGs, (2) Data Sharing. For users: fewer systems to use, less data movement. (Diagram: Spark, Streaming, GraphX, …, SparkSQL, MLbase)
  12. 12. Berkeley Data Analytics Stack (open source software). Layers: In-house Apps, Access and Interfaces, Processing Engine, Storage, Resource Virtualization. In-house applications: Cancer Genomics, Energy Debugging, Smart Buildings. Components shown: SampleClean, MLBase, SparkR, Velox Model Serving, BlinkDB, Spark Streaming, Shark, GraphX, MLlib, Spark, Tachyon, HDFS, S3, …, Mesos, Yarn
  13. 13. Berkeley Data Analytics Stack (open source software). Layers: In-house Apps, Access and Interfaces, Processing Engine, Storage, Resource Virtualization. In-house applications: Cancer Genomics, Energy Debugging, Smart Buildings. Components shown: SampleClean, MLBase, SparkR, Velox Model Serving, SparkSQL, Shark, BlinkDB, Spark Streaming, GraphX, MLlib, Spark, Tachyon, HDFS, S3, …, Mesos, Yarn (two components carry an “Apache” label)
  14. 14. Some Academic Accolades. Ph.D. and postdoc alumni (2013/14) have accepted faculty jobs at: Brown, Harvey Mudd, MIT (3), Stanford, UCLA, UT Austin. Best Paper Awards: BPOE 14, EuroSys 13, ICDE 13, NSDI 12, SIGCOMM 12. Best Demo: SIGMOD 12, VLDB 11. CACM “Research Highlight” selections: 2014 and 2015
  15. 15. About AMPCamp. Today: • BDAS and Stack Component Overviews • Hands-On Exercises • Use Cases • Reception and Networking. Tomorrow: • Research and ML Overviews • Advanced Hands-On Exercises (including genomics). History: AMPCamp I @ Berkeley, August 2012; AMPCamp II @ Strata NYC, Feb 2013; AMPCamp III @ Berkeley, August 2013; AMPCamp IV @ Strata Santa Clara, Feb 2014; AMPCamp V @ Berkeley, Nov 2014. Also “Spark Camp”: AMPCamp spinoff
  16. 16. AMPCamp Made Possible By: Rachit Agarwal, Elaine Angelino, Peter Bailis, Dan Crankshaw, Ankur Dave, Joseph Gonzalez, Daniel Haas, Sanjay Krishnan, Haoyuan Li, Frank Austin Nothaft, Xinghao Pan, Pedro Rodriguez, Ginger Smith, Evan Sparks, Shivaram Venkataraman, Jiannan Wang, Zongheng Yang, Ameet Talwalkar, Jey Kottalam, Kattt Atchley, Carlyn Chinen, Boban Zarkovich, Jon Kuroda
  17. 17. To find out more or get involved: franklin@berkeley.edu (UC BERKELEY). Thanks to NSF CISE Expeditions in Computing, DARPA XData, Founding Sponsors: Amazon Web Services, Google, and SAP, the Thomas and Stacy Siebel Foundation, and all our industrial sponsors and partners.
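
Slide 11 names two additions to Hadoop MapReduce: general task DAGs and data sharing. The following is a minimal Python sketch of that idea, not Spark's API and not code from the talk. The `Dataset` class, the `from_list` helper, and the `evaluations` counter are all hypothetical names invented for this illustration. Transformations are chained lazily into a small DAG, and `cache()` lets two downstream computations share one intermediate result instead of recomputing it.

```python
from typing import Callable, Iterable, List, Optional


class Dataset:
    """Toy lazy dataset: one node in a task DAG (hypothetical, not Spark)."""

    def __init__(self, compute: Callable[[], List], name: str):
        self._compute = compute          # how to produce this node's data
        self.name = name
        self._cache_enabled = False
        self._cached: Optional[List] = None
        self.evaluations = 0             # times this node was actually computed

    def map(self, fn: Callable) -> "Dataset":
        # Adds a child node; nothing runs until collect() is called.
        return Dataset(lambda: [fn(x) for x in self.collect()],
                       f"map({self.name})")

    def filter(self, pred: Callable) -> "Dataset":
        return Dataset(lambda: [x for x in self.collect() if pred(x)],
                       f"filter({self.name})")

    def cache(self) -> "Dataset":
        # Data sharing: keep the result in memory after the first computation.
        self._cache_enabled = True
        return self

    def collect(self) -> List:
        if self._cached is not None:
            return self._cached
        self.evaluations += 1
        result = self._compute()
        if self._cache_enabled:
            self._cached = result
        return result


def from_list(items: Iterable, name: str = "source") -> Dataset:
    data = list(items)
    return Dataset(lambda: list(data), name)


# Two downstream results share one cached intermediate node.
source = from_list(range(10))
squares = source.map(lambda x: x * x).cache()
even_squares = squares.filter(lambda x: x % 2 == 0).collect()
total = sum(squares.collect())

print(total, even_squares, squares.evaluations)
```

Without the `cache()` call, `squares` would be recomputed for each downstream use. With it, the shared node is computed once, which is the "less data movement" benefit in miniature.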