
Spark Summit EU talk by Jim Dowling


  1. 1. SPARK SUMMIT EUROPE 2016: On-Premise Spark-as-a-Service on YARN. Jim Dowling: Associate Prof @ KTH, Stockholm; Senior Researcher, SICS Swedish ICT; CEO, Logical Clocks AB. Twitter: @jim_dowling
  2. 2. Spark-as-a-Service in Sweden • SICS ICE: datacenter research and test environment • Hopsworks: Spark/Kafka/Flink/Hadoop-as-a-service – Built on Hops Hadoop (www.hops.io) – Over 100 active users – Spark is the platform of choice
  3. 3. HopsFS Architecture [diagram labels: NameNodes, NDB, Leader, HDFS Client, DataNodes]
  4. 4. Hops-YARN Architecture [diagram labels: ResourceMgrs, NDB, Scheduler, YARN Client, NodeManagers, Resource Trackers, Heartbeats (70-95%), AM Reqs (5-30%)]
  5. 5. Pluggable DB: Data Abstraction Layer [diagram labels: NameNode (Apache v2), DAL API (Apache v2), NDB-DAL-Impl (GPL v2), Other DB (Other License); jars: hops-2.7.3.jar, dal-ndb-2.7.3-7.5.4.jar]
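As a rough illustration of the pluggable design on this slide, the sketch below separates a storage interface from its implementation so that the backing database can be swapped without touching the caller. The names `InodeDataAccess` and `InMemoryInodeDataAccess`, and their methods, are hypothetical and are not the Hops DAL API; an in-memory map stands in for NDB.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class DalSketch {
    // Hypothetical DAL-style interface: the caller codes against this only.
    interface InodeDataAccess {
        void put(long inodeId, String path);
        Optional<String> findPath(long inodeId);
    }

    // Stand-in implementation; a real one would talk to NDB or another database.
    static final class InMemoryInodeDataAccess implements InodeDataAccess {
        private final Map<Long, String> rows = new HashMap<>();

        @Override
        public void put(long inodeId, String path) {
            rows.put(inodeId, path);
        }

        @Override
        public Optional<String> findPath(long inodeId) {
            return Optional.ofNullable(rows.get(inodeId));
        }
    }

    public static void main(String[] args) {
        // Only this line names the concrete implementation.
        InodeDataAccess dal = new InMemoryInodeDataAccess();
        dal.put(1L, "/Projects/demo/data.csv"); // illustrative path
        System.out.println(dal.findPath(1L).orElse("<missing>"));
    }
}
```

Keeping the interface and the implementation in separate artifacts is also what would allow them to carry different licenses, as the jar names on the slide suggest.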
  6. 6. HopsFS Throughput vs Apache HDFS. NDB setup: nodes using Xeon E5-2620 2.40GHz processors and 10GbE. NameNodes: Xeon E5-2620 2.40GHz processor machines and 10GbE.
  7. 7. HOPSWORKS
  8. 8. Project-Based Multi-Tenancy • A project is a collection of – Users with Roles – HDFS DataSets – Kafka Topics – Notebooks, Jobs • Per-Project quotas – Storage in HDFS – CPU in YARN • Uber-style Pricing • Sharing across Projects – Datasets/Topics [diagram labels: project; dataset 1 … dataset N (HDFS); Topic 1 … Topic N (Kafka)]
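To make the per-project quota idea concrete, here is a minimal sketch of a project that tracks members with roles and charges writes against a storage quota. The class `ProjectSketch`, the role names, and every method are hypothetical; this is not how Hopsworks models projects, only one way the bullets above could be represented. A CPU quota for YARN could be tracked the same way.

```java
import java.util.HashMap;
import java.util.Map;

final class ProjectSketch {
    // Hypothetical role names, for illustration only.
    enum Role { OWNER, MEMBER }

    private final String name;
    private final Map<String, Role> members = new HashMap<>();
    private final long storageQuotaBytes;
    private long storageUsedBytes = 0;

    ProjectSketch(String name, long storageQuotaBytes) {
        this.name = name;
        this.storageQuotaBytes = storageQuotaBytes;
    }

    String name() {
        return name;
    }

    void addMember(String email, Role role) {
        members.put(email, role);
    }

    Role roleOf(String email) {
        return members.get(email); // null if the user is not in the project
    }

    // Charges a write against the project quota; rejects it if it would not fit.
    boolean tryUseStorage(long bytes) {
        if (bytes < 0 || bytes > storageQuotaBytes - storageUsedBytes) {
            return false;
        }
        storageUsedBytes += bytes;
        return true;
    }

    long storageRemainingBytes() {
        return storageQuotaBytes - storageUsedBytes;
    }

    public static void main(String[] args) {
        ProjectSketch project = new ProjectSketch("demo", 100);
        project.addMember("alice@example.com", Role.OWNER);
        System.out.println(project.tryUseStorage(60)); // fits within the quota
        System.out.println(project.tryUseStorage(60)); // would exceed the quota
        System.out.println(project.storageRemainingBytes());
    }
}
```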
  9. 9. Dynamic Roles for Hadoop/Kafka [diagram labels: Alice@gmail.com, Authenticate, NSA__Alice, Users__Alice, Projects, Secure Impersonation, HopsFS, HopsYARN, Kafka, X.509 Certificates]
  10. 10. Look Ma, No Kerberos! • For each project, a user is issued an X.509 certificate containing the project-specific userID. • Inspired by Netflix's BLESS system. • Services are also issued X.509 certificates. – Both user and service certs are signed by the same CA. – Services extract the userID from RPCs to identify the caller.
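The sketch below shows one way a service could recover a project-specific userID from a certificate's subject name. It assumes the userID is stored in the CN attribute and follows the `project__user` pattern seen in the slide 9 labels (`NSA__Alice`); both are assumptions, since the slides do not say which certificate field Hops uses. The class and method names are hypothetical, and the other DN attributes in the example are made up.

```java
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

final class CertUserSketch {
    // Builds a project-specific userID in the "project__user" style (assumed).
    static String projectUser(String project, String user) {
        return project + "__" + user;
    }

    // Extracts the CN value from a subject DN; assumes the userID lives in CN.
    static String userIdFromSubject(String subjectDn) {
        try {
            for (Rdn rdn : new LdapName(subjectDn).getRdns()) {
                if ("CN".equalsIgnoreCase(rdn.getType())) {
                    return rdn.getValue().toString();
                }
            }
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("malformed subject DN: " + subjectDn, e);
        }
        throw new IllegalArgumentException("no CN in subject DN: " + subjectDn);
    }

    public static void main(String[] args) {
        // Illustrative DN; the O and C attributes are invented for the example.
        String dn = "CN=" + projectUser("NSA", "Alice") + ",O=Example,C=SE";
        System.out.println(userIdFromSubject(dn));
    }
}
```

With a real certificate, the subject DN string could be obtained from `X509Certificate.getSubjectX500Principal().getName()` after the TLS layer has verified the certificate chain against the CA.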
  11. 11. [diagram labels: Alice@gmail.com, Add/Del Users, Project Mgr, Insert/Remove Certs, Distributed Database, Root CA, Cert Signing Requests, Project-User Certificates, Services (Hadoop, Spark, Kafka, etc.)]
  12. 12. Spark Streaming on YARN with Hopsworks [diagram steps: 1. Launch Spark Job; 2. Get certs, service endpoints; 3. YARN Job, config; 4. Materialize certs; 5. Read Certs; 6. Get Schema; 7. Consume/Produce; 8. Authenticate. Other labels: Alice@gmail.com, Hopsworks, Distributed Database, YARN Private LocalResources, Spark Streaming App, KafkaUtil]
  13. 13. Spark Stream Producer in Secure Kafka SparkConf sparkConf = … JavaSparkContext jsc = … 1. Discover: Schema Registry and Kafka Broker Endpoints 2. Create: Kafka Properties file with certs and broker details 3. Create: producer using Kafka Properties 4. Download: the Schema for the Topic from the Schema Registry 5. Distribute: X.509 certs to all hosts on the cluster 6. Clean up securely // write to Kafka [slide labels: Developer, Operations]
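Step 2 in the list above (a Kafka properties file with certs and broker details) might look roughly like the following. The property keys are the standard Kafka client SSL settings; the class name, broker address, file paths, and password are placeholders invented for this sketch, and the slides do not show what Hopsworks actually generates.

```java
import java.util.Properties;

final class KafkaSslPropsSketch {
    // Builds client properties for a TLS-secured Kafka connection.
    // All argument values are supplied by the caller; nothing is discovered here.
    static Properties sslClientProps(String brokers, String keystorePath,
                                     String truststorePath, String password) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", brokers);
        props.setProperty("security.protocol", "SSL");
        // Keystore holds the client's own X.509 certificate and private key.
        props.setProperty("ssl.keystore.location", keystorePath);
        props.setProperty("ssl.keystore.password", password);
        props.setProperty("ssl.key.password", password);
        // Truststore holds the CA certificate used to verify the brokers.
        props.setProperty("ssl.truststore.location", truststorePath);
        props.setProperty("ssl.truststore.password", password);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder endpoint, paths, and password for illustration only.
        Properties props = sslClientProps(
                "broker.example.com:9092",
                "/tmp/demo/keystore.jks",
                "/tmp/demo/truststore.jks",
                "changeit");
        System.out.println(props.getProperty("bootstrap.servers"));
        System.out.println(props.getProperty("security.protocol"));
    }
}
```

A `Properties` object like this is what a Kafka producer or consumer constructor accepts (step 3), once serializer settings are added.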
  14. 14. Spark Streaming Producer in Hopsworks List<String> topics = KafkaUtil.getTopics(); … SparkProducer sparkProducer = KafkaUtil.getSparkProducer(topic); … Map<String, String> message = … sparkProducer.produce(message); … sparkProducer.close(); https://github.com/hopshadoop/hops-kafka-examples
  15. 15. Spark Streaming Consumer in Hopsworks JavaStreamingContext jssc = … List<String> topics = KafkaUtil.getTopics(); … SparkConsumer consumer = KafkaUtil.getSparkConsumer(jssc, topics); … // Avro schema downloaded by framework here GenericRecord genericRecord = KafkaUtil.getRecordInjections().get(topic); … jssc.start(); jssc.awaitTermination(); https://github.com/hopshadoop/hops-kafka-examples
  16. 16. Zeppelin Support for Spark/Livy
  17. 17. Livy to launch Spark 2.0 Jobs [Image from: http://gethue.com]
  18. 18. Debugging Spark with Dr. Elephant • Project-specific view of performance/correctness issues for completed Spark Jobs • Customizable heuristics • Doesn’t show killed jobs
  19. 19. Karamel/Chef for Automated Installation [diagram labels: Google Compute Engine, Bare Metal]
  20. 20. Demo
  21. 21. Summary • Hopsworks provides first-class support for Spark-as-a-Service – Streaming or Batch Jobs – Zeppelin Notebooks • Hopsworks simplifies writing secure Spark Streaming applications with Kafka. http://github.com/hopshadoop Hops [Hadoop For Humans]
  22. 22. Hops Team Active: Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Konstantin Popov, Antonios Kouzoupis, Ermias Gebremeskel. Alumni: Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu.
  23. 23. SPARK SUMMIT EUROPE 2016 THANK YOU. www.hops.io
