
The BDAS Open Source Community

AMPCamp 5 presentation on "The BDAS Open Source Community" by Prof. Ion Stoica of the UC Berkeley AMPLab and Databricks

Published in: Software

  1. The BDAS Open Source Community. Ion Stoica, UC Berkeley and Databricks.
  2. Growing Beyond AMPLab. As the software matures and becomes successful, more and more contributors come from outside AMPLab. New startups have anchored development: Databricks (Spark stack), Mesosphere (Mesos), … This enables AMPLab to focus more resources on future systems instead of software maintenance.
  3. Apache Spark. [Stack diagram. Labels: Cancer Genomics, Energy Debugging, Smart Buildings; MLBase, SparkR, Velox Model Serving, Sample Clean, Spark Streaming, SparkSQL, BlinkDB, GraphX, MLlib; Apache Spark (core); Tachyon; HDFS, S3, …; Apache Mesos; Yarn.]
  4. Apache Spark. Open source: end of 2010. Apache project: 2013. Over time it has grown to include key libraries: Spark Streaming, SparkSQL, MLlib, GraphX. It is becoming a platform for Big Data apps.
  5. Apache Spark Today. [Two bar charts of activity in the past 6 months, "Commits" and "Lines of Code Changed", comparing MapReduce, YARN, HDFS, Storm, and Spark.] 2-3x more activity than Hadoop, Storm, MongoDB, NumPy, D3, Julia, …
  6. Meetups Around the World
  7. Monthly Contributors. [Chart: monthly contributors, 2011 to 2014, with "Databricks founded" marked.] 370+ contributors in the last 12 months.
  8. Spark Stack (2013). [Stack diagram. Labels: Cancer Genomics, Energy Debugging, Smart Buildings; BlinkDB, Spark Streaming, MLlib, MLBase, Sample Clean, Shark; Apache Spark (core); Tachyon; HDFS, S3, …; Apache Mesos; Yarn.]
  9. Last Year's Developments. [Stack diagram. Labels: Cancer Genomics, Energy Debugging, Smart Buildings; BlinkDB, MLBase, SparkR, SparkSQL (overlaid on Shark), GraphX, MLlib, Spark Streaming, Sample Clean, Velox Model Serving; Apache Spark (core); Tachyon; HDFS, S3, …; Apache Mesos; Yarn.]
  10. Wide Adoption. All major Hadoop distributions include Spark. Beyond Hadoop.
  11. Wide Adoption. All major Hadoop distributions include Spark (partners). Beyond Hadoop (partners). Databricks spurred Spark’s enterprise growth.
  12. Apache Mesos. [Stack diagram. Labels: Cancer Genomics, Energy Debugging, Smart Buildings; MLBase, SparkR, Velox Model Serving, Sample Clean, Spark Streaming, SparkSQL, BlinkDB, GraphX, MLlib; Apache Spark; Tachyon; HDFS, S3, …; Apache Mesos; Yarn.]
  13. Apache Mesos. Open source: 2010. Apache project: 2012. Used in production at Twitter for the past 2.5 years: 10,000+ machines, 500+ engineers using it. Most development moved outside Berkeley starting in 2012.
  14. Monthly Contributors. [Chart: monthly contributors, with "Mesosphere founded" marked.] 65 contributors in the last 12 months.
  15. BDAS Stack. [Stack diagram. Labels: Cancer Genomics, Energy Debugging, Smart Buildings; MLBase, SparkR, Velox Model Serving, Sample Clean, Spark Streaming, SparkSQL, BlinkDB, GraphX, MLlib; Apache Spark; Tachyon; HDFS, S3, …; Apache Mesos; Yarn.]
  16. Release Growth. Tachyon 0.1 (Dec ‘12): 1 contributor. Tachyon 0.2 (Apr ‘13): 3 contributors. Tachyon 0.3 (Oct ‘13): 15 contributors. Tachyon 0.4 (Feb ‘14): 30 contributors. Tachyon 0.5 (July ‘14): 46 contributors.
  17. Fast Growing Community. [Chart: Berkeley contributors vs. non-Berkeley contributors (20+ companies).] ~80% of contributors are already outside AMPLab.
  18. Reaching Tipping Point
  19. Research to Real-World Impact. [Chart. Axis: Research to Real-world Impact. Legend: AMPLab/Berkeley vs. non-Berkeley committers / commits. Projects shown: MLlib, Spark Streaming, Spark SQL, Apache Spark (core), Apache Mesos, GraphX, Tachyon, Succinct, Velox, ADAM, BlinkDB.]
  20. Impact on AMPLab. Created a blueprint and ecosystem for other BDAS components to succeed: MLlib, GraphX, Tachyon, … Enabled AMPLab to increase its focus on new research projects: Velox, ADAM, Succinct, …
