Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

The Next AMPLab: Real-Time, Intelligent, and Secure Computing

1,439 views

Published on

Spark Summit EU talk by Ion Stoica

Published in: Data & Analytics
  • Be the first to comment

The Next AMPLab: Real-Time, Intelligent, and Secure Computing

  1. 1. The next AMPLab: Real-time Intelligent Secure Execution Ion Stoica October 26, 2016
  2. 2. Berkeley’s AMPLab 2011 – 2016 • Mission: “Make sense of big data” • 8 faculty, 60+ students Governmental and industrial founding 2 Algorithms Machine s People
  3. 3. AMPLab Goal and Impact 3 Goal: Next generation of open source data analytics stack for industry & academia Berkeley Data Analytics Stack (BDAS) …
  4. 4. What is next?
  5. 5. RISE: Real-time Intelligent Secure Execution
  6. 6. From batch data to advanced analytics AMPLab 6 From live data to real-time decisions RISELab
  7. 7. Why? Data only as valuable as the decisions it enables 7
  8. 8. Why? What does this mean? • Faster decisions better than slower decisions • Decisions on fresh data better than decisions on stale data • Decisions on personalized data better than on generic data 8 Data only as valuable as the decisions it enables
  9. 9. Goal Real-time decisions on live data with strong security 9 decide in ms the current state of the environment privacy, confidentiality, integrity
  10. 10. Typical decision system 10 Decision System DecisionData Preproces s (e.g., train) Intermediate data (e.g., model) Query engine Automatic decision engine update latency decision latency Want low update latency & low decision latency
  11. 11. Why is it hard? Want high quality decisions • Sophisticated, e.g., fraud, forecast, fleet of drones • Accuracy, low false positives and negatives • Robust to noisy and unforseen data Want low latency for both updates and decisions Want strong security: privacy, confidential, integrity 11
  12. 12. Example: Zero-time defense 12 Problem: zero-day attacks can compromise millions of hosts in seconds Solution: analyze network flows to detect attacks and patch hosts/software in real-time • Intermediate data: create attack model • Decision: detect attack, patch Quality sophisticated, accurate, robust Latency update (sec ) / decision (ms) Security privacy (encourage users to share logs), integrity
  13. 13. Application Quality Latency SecurityUpdate Decisio n Zero-time defense sophisticated, accurate, robust sec ms privacy, integrity Parking assistant sophisticated, robust sec sec privacy Disease discovery sophisticated, accurate hours sec/min privacy, integrity IoT (smart buildings) sophisticated, robust min/hour sec privacy, integrity Earthquake warning sophisticated, accurate, robust min ms integrity Chip manufacturing sophisticated, accurate, robust min sec/min confidentiality, integrity Fraud detection sophisticated, accurate min ms privacy, integrity “Fleet” driving sophisticated, accurate, robust sec sec privacy, integrity Virtual companion sophisticated, robust min/hour sec integrity Video QoS at scale sophisticated min ms/sec privacy, integrity
  14. 14. Challenges 14 Automated decisions on live data are hard Poor security: exploits are daily occurrences One-off solutions, expensive, slow to build Real-time, sophisticated decisions that guarantee worst-case behavior on noisy and unforseen live data Ensure privacy and integrity without impacting functionality General platform: Secure Real-time Decision Stack RISE Lab
  15. 15. Research directions Systems: 100x lower latency, 1,000x higher concurrency than today’s Spark Machine learning: Robust, on-line ML algorithms Security: achieve privacy, confidentiality, and integrity without impacting performance or functionality 15
  16. 16. Early work Drizzle Opaque 16
  17. 17. Streaming Micro-batching vs. record-at-a-time Micro-batching (e.g., Spark) inherits batch’s properties • fault-tolerance • straggler mitigation • optimizations • unification with other libraries Record-at-a-time (e.g., Storm, Flink), typically lower latency 17
  18. 18. Yahoo’s streaming benchmark Input: 20M JSON ad-events / second, 100 campaigns Output: ad counts per campaign over a 10sec window Latency: (end of window) – (time last event was processed) SLA: 1sec Findings: Storm, Flink provide indeed lower latency than Spark 18 Streaming systemads ad counts per campaign
  19. 19. Spark Streaming 19 … Workers Master Process batchSchedule tasks task task task
  20. 20. 20 Workers Master … Spark Streaming Cluster status
  21. 21. Spark Streaming 21 … Workers Master Process batchSchedule tasks task task task
  22. 22. 22 Workers Master … Spark Streaming Cluster status
  23. 23. Drizzle Goal: reduce Spark streaming latency by at least 10x Key observation: consecutive iterations use same DAG Solution: push scheduling decisions to workers 23
  24. 24. Group scheduling 24 Workers Master … Spark Streaming Drizzle Workers Master … task s task s task s
  25. 25. 25 Workers Master … Spark Streaming Drizzle Workers Master …… task s task s task s Process batch
  26. 26. 26 Workers Master … Spark Streaming Drizzle Workers Master …… task s task s task s Process batch
  27. 27. 27 Workers Master … Spark Streaming Drizzle Workers Master …… task s task s task s Cluster status
  28. 28. Latency 28 0 0.2 0.4 0.6 0.8 1 0 500 1000 1500 2000 2500 3000 Spark Flink Final Event Latency (ms) CDF
  29. 29. Latency 29 0 0.2 0.4 0.6 0.8 1 0 500 1000 1500 2000 2500 3000 Spark Drizzle Flink Final Event Latency (ms) CDF Similar latency to Flink
  30. 30. Latency, w/ ReduceBy optimization 30 0 0.2 0.4 0.6 0.8 1 0 200 400 600 800 Spark Drizzl e Flink CDF Final Event Latency (ms) Aggregate counters on map side to reduce shuffle traffic
  31. 31. Latency, w/ ReduceBy optimization 31 0 0.2 0.4 0.6 0.8 1 0 200 400 600 800 Spark Drizzl e Flink CDF Final Event Latency (ms) Aggregate counters on map side to reduce shuffle traffic
  32. 32. Fault tolerance 32 1 10 100 1000 10000 100000 150 200 250 300 350 Latency(ms) Drizzle Spark Flink four nines SLA: 8.6 sec per day exceeding SLA Recovers 5x faster than Flink with 10x lower latency Time (seconds)
  33. 33. Early results Drizzle Opaque 33
  34. 34. State-of-the-art security today Authentication, encryption at-rest and in-motion 34 Spark Core Spark Streaming Spark SQL MLlib GraphX OS (e.g., Linux), Cluster Manager (e.g., Kubernetes), Hypervisor (e.g., Xen) private/public cluster Not enough if OS or hypervisor compromised, and attacker get root access
  35. 35. State-of-the-art security today Authentication, encryption at-rest and in-motion 35 Spark Core Spark Streaming Spark SQL MLlib GraphX OS (e.g., Linux), Cluster Manager (e.g., Kubernetes), Hypervisor (e.g., Xen) private/public cluster Not enough if attacker can observe network and memory access patters
  36. 36. Opaque Leverage Intel’s SGX: hardware enclave Implement secure distributed relational algebra 36 Execution Spark Streaming Spark SQL MLlib GraphX Query Optimizer (Catalyst) enc-filter enc-join enc-agg enc-sort
  37. 37. Opaque: two modes Encryption mode • Protect against compromised software (e.g., OS) • Full data encryption, authentication, and computation verification in hardware enclave Oblivious mode • Additionally, hide data access pattern 37
  38. 38. Opaque: Big Data Benchmark 38 0.01 0.1 1 10 100 Query 1 Query 2 Query 3 Runtime(s) SparkSQL Opaque encryption Opaque oblivious
  39. 39. Opaque: Big Data Benchmark 39 0.01 0.1 1 10 100 Query 1 Query 2 Query 3 Runtime(s) SparkSQL Opaque encryption Opaque oblivious Encrypted operators implemented in C++
  40. 40. Opaque: Big Data Benchmark 40 0.01 0.1 1 10 100 Query 1 Query 2 Query 3 Runtime(s) SparkSQL Opaque encryption Opaque oblivious Up to 100x slower but 1,000x faster than state-of-the- art
  41. 41. Next AMPLab: RISELab Already promising results Expect much more over the next five years! 41 Goal: develop Secure Real-time Decision Stack, an open source platform, tools and algorithms for real-time decisions on live data with strong security
  42. 42. Thank you
  43. 43. AMPLab alumni presenting here 43
  44. 44. Example: “Fleet” driving Problem: suboptimal driving decisions Solution: collect & leverage info from other cars and drivers in real-time • Intermediate data: automatically annotate maps, actions of other drivers • Decision: avoid obstacles, congestions 44 Quality sophisticated, accurate, noise tolerant Performanc e sec (decision) / sec (update) Security privacy, data integrity
  45. 45. Not only hypothetical Attacks getting root access by exploiting OS/DBs vulnerabilities Attacks exploiting access pattern leakages 45
  46. 46. 46 Workers Master … Spark Streaming Drizzle Workers Master …… task s task s task s Process batch

×