Secure Streaming-as-a-Service with Kafka/Spark/Flink in Hopsworks
1. FEBRUARY 9, 2017, WARSAW
HopsWorks: Secure Streaming-as-a-Service with Kafka/Spark/Flink
Theofilos Kakantousis
Research Engineer@RISE SICS, Co-founder@Logical Clocks AB
Slides by Jim Dowling, Theofilos Kakantousis
Streaming-as-a-Service in Sweden
• SICS ICE: datacenter research and test environment
• Hopsworks: Spark/Flink/Kafka/Hadoop-as-a-service
• Built on Hops Hadoop (www.hops.io)
• Over 100 active users
• Spark/Flink/Kafka are the platforms of choice
Where did it go wrong for Hadoop?
• Data Engineers/Scientists
  • Where is the user-friendly tooling and self-service?
  • How do I install/operate anything other than a sandbox VM?
• Operations Folks
  • The security model has become incomprehensible (Sentry/Ranger)
  • Major distributions are not open enough for patching
  • Sensitive data still requires its own cluster
• Why not just use AWS EMR/GCE/Databricks/etc.?
Is this Hadoop?
[Diagram: a layered stack]
• Platform: On-Premise, AWS, GCE, Azure
• Resource Manager: YARN, Mesos, Kubernetes
• Storage: HDFS, S3, GCS, WFS
• Processing: MR, Spark, Flink, TensorFlow, Hive, HBase, Presto, Kafka
How about this?
[Diagram: the same layered stack of platforms, resource managers, storage systems, and processing frameworks as the previous slide]
Bigger, Faster*
• 16x the throughput of HDFS on a Spotify workload
*USENIX FAST 2017: "HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases"
Project-Based Multi-Tenancy
• Projects feel familiar*
  • Users, Data, Programs
  • Like GitHub
• Data Sharing feels familiar*
  • Like Dropbox shared folders
[Diagram: projects Proj-All, Proj-X, and Proj-42 with a shared topic, a topic, the dataset /Projs/My/Data, and CompanyDB]
*As measured by MRI activity in the perirhinal cortex
https://www.sciencenews.org/blog/scicurious/familiar-feeling-comes-deep-brain
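The project and sharing model on this slide can be illustrated with a minimal access check. This sketch assumes that each dataset or topic has exactly one owning project plus a set of projects it has been shared with. `SharedResource`, `shareWith`, and `canAccess` are hypothetical names, not Hopsworks APIs.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of a dataset or Kafka topic owned by exactly one project. */
final class SharedResource {
    private final String name;
    private final String ownerProject;
    // Projects the owner has shared this resource with (like a Dropbox shared folder).
    private final Set<String> sharedWith = new HashSet<>();

    SharedResource(String name, String ownerProject) {
        this.name = name;
        this.ownerProject = ownerProject;
    }

    String name() {
        return name;
    }

    /** Grants every member of another project access to this resource. */
    void shareWith(String project) {
        sharedWith.add(project);
    }

    /** A project may use the resource if it owns it or the resource was shared with it. */
    boolean canAccess(String project) {
        return ownerProject.equals(project) || sharedWith.contains(project);
    }

    public static void main(String[] args) {
        SharedResource topic = new SharedResource("SharedTopic", "Proj-X");
        topic.shareWith("Proj-42");
        System.out.println(topic.name() + " owner: " + topic.canAccess("Proj-X"));   // true
        System.out.println(topic.name() + " shared: " + topic.canAccess("Proj-42")); // true
        System.out.println(topic.name() + " other: " + topic.canAccess("Proj-Z"));   // false
    }
}
```

The check works at the project level rather than per user, which mirrors the GitHub/Dropbox analogy. Adding a user to a project gives them access to everything that project owns or has had shared with it.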
User Roles
Data Owner
  - Import/export data
  - Manage membership
  - Share datasets and topics
Data Scientist
  - Write and run code
Self-Service Administration – No Administrator Needed
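The two roles can be read as a small permission table. This minimal sketch assumes that each role is allowed exactly the actions the slide lists for it. The slide does not say whether a Data Owner may also run code, so the sketch does not grant that. `Role`, `Action`, and `permits` are hypothetical names.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical actions, taken from the User Roles slide. */
enum Action {
    IMPORT_EXPORT_DATA,
    MANAGE_MEMBERSHIP,
    SHARE_DATASETS_AND_TOPICS,
    WRITE_AND_RUN_CODE
}

/** Hypothetical project roles; each role permits only the actions the slide lists for it. */
enum Role {
    DATA_OWNER(EnumSet.of(Action.IMPORT_EXPORT_DATA, Action.MANAGE_MEMBERSHIP,
            Action.SHARE_DATASETS_AND_TOPICS)),
    DATA_SCIENTIST(EnumSet.of(Action.WRITE_AND_RUN_CODE));

    private final Set<Action> allowed;

    Role(Set<Action> allowed) {
        this.allowed = allowed;
    }

    /** True if a member holding this role may perform the given action. */
    boolean permits(Action action) {
        return allowed.contains(action);
    }
}

class RolesDemo {
    public static void main(String[] args) {
        // Membership is managed by a Data Owner inside the project, not by a cluster administrator.
        System.out.println(Role.DATA_OWNER.permits(Action.MANAGE_MEMBERSHIP));     // true
        System.out.println(Role.DATA_SCIENTIST.permits(Action.MANAGE_MEMBERSHIP)); // false
    }
}
```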
Dynamic Roles for Hadoop/Kafka
[Diagram: alice@gmail.com authenticates and is mapped to a separate identity per project (ProjectA__Alice, ProjectB__Alice); each project identity uses SSL/TLS certificates to access HopsFS, HopsYARN, and Kafka]
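The slide shows one login, alice@gmail.com, mapped to a separate identity per project, written ProjectA__Alice and ProjectB__Alice. This minimal sketch builds and splits such names. It assumes the `__` separator seen on the slide and assumes the username part (here `Alice`) is already known. The `ProjectUser` helper is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for per-project identities such as ProjectA__Alice. */
final class ProjectUser {
    static final String SEPARATOR = "__";

    private ProjectUser() {}

    /** Combines a project name and a username into one project-specific identity. */
    static String of(String project, String user) {
        // The first "__" must be the separator, so the project name may not contain "__" or end with "_".
        if (project.isEmpty() || project.contains(SEPARATOR) || project.endsWith("_")) {
            throw new IllegalArgumentException("invalid project name: " + project);
        }
        return project + SEPARATOR + user;
    }

    /** Returns the project part of a project-specific identity. */
    static String projectOf(String identity) {
        return identity.substring(0, separatorIndex(identity));
    }

    /** Returns the user part of a project-specific identity. */
    static String userOf(String identity) {
        return identity.substring(separatorIndex(identity) + SEPARATOR.length());
    }

    private static int separatorIndex(String identity) {
        int i = identity.indexOf(SEPARATOR);
        if (i < 0) {
            throw new IllegalArgumentException("not a project identity: " + identity);
        }
        return i;
    }

    public static void main(String[] args) {
        List<String> projects = Arrays.asList("ProjectA", "ProjectB");
        for (String project : projects) {
            System.out.println(of(project, "Alice")); // ProjectA__Alice, then ProjectB__Alice
        }
    }
}
```

In this sketch's model, HopsFS, HopsYARN, and Kafka see ProjectA__Alice and ProjectB__Alice as distinct principals. That is what allows project-based access checks to keep ProjectA's data out of a job Alice runs in ProjectB.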
Look Ma, No Kerberos!
• For each project, a user is issued an SSL/TLS (X.509) certificate that is used for both authentication and encryption.
• Access to Kafka resources is project-based.
  • Custom Authorizer
• Services are also issued SSL/TLS certificates.
• User and service certificates are signed by the same CA.
• Services extract the user ID from RPCs to identify the caller.
  • HADOOP-13836
• Draws on ideas from Netflix's BLESS system
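One way a service can recover the caller's user ID from its certificate is to read the Common Name (CN) of the certificate subject. This minimal sketch assumes the CN holds the project-specific identity (for example ProjectA__Alice). The slide does not say which certificate field Hopsworks uses, and `CallerIdentity` is a hypothetical helper. It relies only on JDK classes (`LdapName`, `Rdn`, `X509Certificate`).

```java
import java.security.cert.X509Certificate;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

/** Hypothetical helper that derives the caller's identity from its X.509 certificate. */
final class CallerIdentity {
    private CallerIdentity() {}

    /** Returns the CN attribute of an RFC 2253 distinguished name such as "CN=ProjectA__Alice,O=Example". */
    static String commonName(String subjectDn) {
        try {
            for (Rdn rdn : new LdapName(subjectDn).getRdns()) {
                if ("CN".equalsIgnoreCase(rdn.getType())) {
                    return rdn.getValue().toString();
                }
            }
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("malformed DN: " + subjectDn, e);
        }
        throw new IllegalArgumentException("no CN in: " + subjectDn);
    }

    /** Identifies the caller from the peer certificate presented during the TLS handshake. */
    static String callerOf(X509Certificate peerCertificate) {
        // X500Principal.getName() returns the subject DN in RFC 2253 format.
        return commonName(peerCertificate.getSubjectX500Principal().getName());
    }

    public static void main(String[] args) {
        System.out.println(commonName("CN=ProjectA__Alice,OU=Hops,O=Example")); // ProjectA__Alice
    }
}
```

In a real server, `callerOf` would be fed from `SSLSession.getPeerCertificates()`, whose first element is the peer's own certificate. The chain is verified against the shared CA during the handshake, so the service only sees CNs from certificates that CA has signed.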
Distributing Certs for Spark/Flink Streaming
[Diagram: certificate distribution between alice@gmail.com, Hopsworks, a distributed database, YARN, YARN private LocalResources, and the Spark/Flink streaming app using HopsUtil]
1. Launch Spark/Flink job
2. Get certs, service endpoints
3. YARN job, config
4. Materialize certs
5. Read certs
6. Get schema
7. Consume/Produce
8. Authenticate
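Step 4 (materializing certs) places the user's certificates on the hosts that run the job, where only that job should be able to read them. The slide does not show how the files are written. The following minimal sketch assumes a POSIX file system and owner-only permissions. `CertMaterializer` and the file names are hypothetical, and on non-POSIX systems the permission attributes throw `UnsupportedOperationException`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileAttribute;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: writes certificate files that only the job's OS user can read (POSIX only). */
final class CertMaterializer {
    private CertMaterializer() {}

    /** Creates a fresh directory that only its owner can list, read, or write. */
    static Path privateDirectory(String prefix) throws IOException {
        FileAttribute<Set<PosixFilePermission>> ownerOnly =
                PosixFilePermissions.asFileAttribute(PosixFilePermissions.fromString("rwx------"));
        return Files.createTempDirectory(prefix, ownerOnly);
    }

    /** Writes bytes to dir/fileName; the file is created as rw------- before any data is written. */
    static Path materialize(Path dir, String fileName, byte[] material) throws IOException {
        FileAttribute<Set<PosixFilePermission>> ownerOnly =
                PosixFilePermissions.asFileAttribute(PosixFilePermissions.fromString("rw-------"));
        Path file = Files.createFile(dir.resolve(fileName), ownerOnly);
        Files.write(file, material);
        return file;
    }

    /** Deletes the materialized files and their directory once the job no longer needs them. */
    static void cleanup(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path f : stream) {
                files.add(f);
            }
        }
        for (Path f : files) {
            Files.delete(f);
        }
        Files.delete(dir);
    }
}
```

Creating each file with restrictive permissions up front, instead of writing it and tightening permissions afterwards, avoids a window in which another local user could read the key material.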
Simplifying Spark/Flink Streaming Apps
• Spark/Flink streaming applications need to know:
  • Credentials for Hadoop, Kafka, InfluxDB, Logstash
  • Endpoints for the Kafka broker, Kafka Schema Registry, ResourceManager, NameNode, InfluxDB, Logstash
• The HopsUtil API hides this complexity.
  • Location and security are transparent to applications.
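Location transparency can be pictured as application code asking for an endpoint by name while the launcher supplies the actual address. This minimal sketch assumes the endpoints arrive as key/value pairs, for example environment variables. The `Endpoint` names, the variable names, and `EndpointResolver` are hypothetical and are not the HopsUtil API.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical names for the services a streaming job talks to (from the slide). */
enum Endpoint { KAFKA_BROKERS, SCHEMA_REGISTRY, RESOURCE_MANAGER, NAMENODE, INFLUXDB, LOGSTASH }

/** Hypothetical resolver: the launcher injects endpoints; application code only asks for them by name. */
final class EndpointResolver {
    private final Map<Endpoint, String> endpoints = new EnumMap<>(Endpoint.class);

    /** Copies any configured endpoints from a key/value source such as System.getenv(). */
    EndpointResolver(Map<String, String> source) {
        for (Endpoint e : Endpoint.values()) {
            String value = source.get(e.name());
            if (value != null && !value.isEmpty()) {
                endpoints.put(e, value);
            }
        }
    }

    /** Returns the endpoint, failing only when a job actually requests one that was not configured. */
    String get(Endpoint e) {
        String value = endpoints.get(e);
        if (value == null) {
            throw new IllegalStateException("endpoint not configured: " + e);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> launcherEnv = new HashMap<>();
        launcherEnv.put("KAFKA_BROKERS", "broker1.example.com:9091");
        EndpointResolver resolver = new EndpointResolver(launcherEnv);
        System.out.println(resolver.get(Endpoint.KAFKA_BROKERS)); // broker1.example.com:9091
    }
}
```

The resolver fails only when a missing endpoint is actually requested. A job that never logs to InfluxDB therefore does not need an InfluxDB endpoint configured.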
Secure Kafka Application
Tasks for the developer and for operations:
1. Discover: Schema Registry and Kafka broker endpoints
2. Create: a Kafka properties file with certs and broker details
3. Create: a Producer/Consumer using the Kafka properties
4. Download: the schema for the topic from the Schema Registry
5. Distribute: X.509 certs to all hosts in the cluster
6. Clean up securely
All of these steps are now done automatically by Hopsworks' HopsUtil library.
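Step 2 (creating Kafka properties with certs and broker details) maps onto standard Kafka client settings. This minimal sketch assumes that JKS key and trust stores have already been materialized on the host and, for brevity, that a single password protects both. `KafkaSslProps` and the paths and password in `main` are hypothetical. The property keys themselves are standard Kafka client configuration names.

```java
import java.util.Properties;

/** Hypothetical helper for step 2: Kafka client properties for TLS authentication and encryption. */
final class KafkaSslProps {
    private KafkaSslProps() {}

    static Properties build(String brokers, String keystorePath, String truststorePath, String password) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", brokers);
        // TLS both encrypts traffic and authenticates the client by its certificate.
        props.setProperty("security.protocol", "SSL");
        // Keystore: the project user's certificate and private key.
        props.setProperty("ssl.keystore.location", keystorePath);
        props.setProperty("ssl.keystore.password", password);
        props.setProperty("ssl.key.password", password);
        // Truststore: the CA certificate that signed the broker certificates.
        props.setProperty("ssl.truststore.location", truststorePath);
        props.setProperty("ssl.truststore.password", password);
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("broker1.example.com:9091", "/srv/certs/keystore.jks",
                "/srv/certs/truststore.jks", "changeit");
        props.list(System.out);
    }
}
```

The resulting `Properties` can be passed to the `KafkaProducer` or `KafkaConsumer` constructors from the kafka-clients library, once key and value serializers or deserializers are added. This is the kind of boilerplate HopsUtil is described as hiding.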
Spark Producer in HopsWorks
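import java.util.Map;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
// HopsUtil and SparkProducer come from the HopsUtil library (import not shown on the slide)
SparkConf sparkConf = new SparkConf();
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
String topic = HopsUtil.getTopic(); // Optional
SparkProducer producer = HopsUtil.getSparkProducer(topic); // certs and broker endpoints are resolved by HopsUtil
Map<String, String> message = …
producer.produce(message);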
https://github.com/hopshadoop/hops-kafka-examples
Karamel/Chef for Automated Installation
• Targets: Google Compute Engine, bare metal
Summary
• Hops is the only European distribution of Hadoop
  • More scalable, tinker-friendly, and open source
• HopsWorks provides first-class support for Spark/Flink-Kafka-as-a-Service
• HopsWorks provides best-in-class support for secure streaming applications with Kafka
  • Streaming or batch jobs
Hops Team
Active:
Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Roberto Bampi, Fabio Buso, Fanti Machmount Al Samisti, Braulio Grana, Adam Alpire, Zahin Azher Rashid, Robin Andersson, ArunaKumari Yedurupaka, Tobias Johansson, August Bonds, Tiago Brito, Filotas Siskos.
Alumni:
Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu.
Hops
Hadoop for humans
Thank you!
Hops: http://www.hops.io
http://github.com/hopshadoop
@hopshadoop
www.logicalclocks.com