Spark Summit EU 2015: Revolutionizing Big Data in the Enterprise with Spark


CEO of Databricks, Ion Stoica, talks about the impact of Apache Spark in the enterprise.

Published in: Software

  1. Revolutionizing Big Data in the Enterprise with Spark. Ion Stoica, October 28, 2015
  2. We Have Seen a Lot: Worked with 100s of companies to run Spark in production over five years. Collaborate with all major Hadoop and Big Data vendors
  3. How Does Spark Change Enterprise Big Data? • Unifying data sources • Unifying data processing
  4. Unifying Data Sources
  5. Traditional Data Warehouses: Need to process data from • Multiple sources • Different data stores and locations • Different formats. Traditional solutions: ETL data into data warehouse, … Slow to access and combine data. (Diagram labels: ETL, Data Warehouse)
  6. Just-In-Time (JIT) Data Warehouse
  7. JIT Data Warehouse: Process data in place or stream it • No need to wait for data to be ETLed. (Diagram labels: ETL, Data Warehouse)
  8. JIT Data Warehouse: Process data in place or stream it • No need to wait for data to be ETLed. Cache data in memory or SSDs. Low latency and easy to combine data: value!
  9. Analogy: Download & Play vs. Stream/cache & Play
  10. Analogy: ETL & Query vs. Stream/Cache + Query. (Diagram labels: Data Source A, Data Source B, ETL, Data Warehouse)
  11. Top-3 Media Company: Data sources • Traditional data warehouse: Customer transaction and profile data • S3: Clickstream and historical logs • Elasticsearch: User-submitted reviews and comments • Kafka: Streaming online event data. Build Spark-based JIT Data Warehouse to perform real-time analytics
  12. Unifying Data Processing
  13. Spark: Unified Engine. Unified support for • Batch • Streaming • ML/Graphs • … Components: Core, Spark Streaming, SparkSQL, MLlib, GraphX, SparkR. Easy to manage, learn, and combine functionality
  14. Analogy: First cellular phones • Specialized devices • Unified device (smartphone) • Better Games • Better GPS • Better Phone
  15. Analogy: Batch processing • Specialized systems • Unified system • Real-time analytics • Instant fraud detection • Better Apps
  16. Large On-line Service Company: Leverages • Interactive query processing • ML, and combines data from S3, Redshift, and HBase to provide • data analytics for product management team • advanced predictive analytics to deliver new services (e.g., customized inventory displays tailored to each user)
  17. Demo
  18. Demo Setting: Core, Spark Streaming, SparkSQL, MLlib, HDFS, Redshift
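
The "JIT Data Warehouse" idea from slides 7 and 8 (read sources in place on demand, cache them in memory, then combine them) can be illustrated with a small sketch. This is plain Python, not Spark and not the demo from the talk; the `JitWarehouse` class, the loader functions, and the sample rows are all hypothetical and exist only to show the pattern of lazy loading, caching, and joining across sources.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class JitWarehouse:
    """Toy illustration (hypothetical, not a Spark API): sources are read
    in place on first use and cached in memory instead of being ETLed."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Iterable[Row]]] = {}
        self._cache: Dict[str, List[Row]] = {}
        self.loads: Dict[str, int] = {}  # how many times each source was read

    def register(self, name: str, loader: Callable[[], Iterable[Row]]) -> None:
        # Registering is cheap: nothing is read until a query needs the source.
        self._loaders[name] = loader
        self.loads[name] = 0

    def rows(self, name: str) -> List[Row]:
        # Read the source on first access, then serve it from the cache.
        if name not in self._cache:
            self.loads[name] += 1
            self._cache[name] = list(self._loaders[name]())
        return self._cache[name]

    def join(self, left: str, right: str, key: str) -> List[Row]:
        # Simple in-memory hash join (inner join) on a shared key.
        index: Dict[object, List[Row]] = {}
        for r in self.rows(right):
            index.setdefault(r[key], []).append(r)
        return [
            {**l, **r}
            for l in self.rows(left)
            for r in index.get(l[key], [])
        ]


# Hypothetical sources standing in for, e.g., a warehouse table and a log store.
def load_profiles() -> List[Row]:
    return [{"user_id": 1, "name": "Ada"}, {"user_id": 2, "name": "Lin"}]


def load_clicks() -> List[Row]:
    return [
        {"user_id": 1, "page": "/home"},
        {"user_id": 1, "page": "/cart"},
        {"user_id": 3, "page": "/home"},
    ]


wh = JitWarehouse()
wh.register("profiles", load_profiles)
wh.register("clicks", load_clicks)

first = wh.join("clicks", "profiles", "user_id")   # triggers both loads
second = wh.join("clicks", "profiles", "user_id")  # served from the cache
```

In this sketch the second query reads neither source again, which is the low-latency benefit slide 8 attributes to caching. A real deployment would rely on Spark's own data source connectors and caching rather than hand-written loaders.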