FEBRUARY 9, 2017, WARSAW
HopsWorks: Secure Streaming-as-a-Service with Kafka/Spark/Flink
Theofilos Kakantousis
Research Engineer@RISE SICS, Co-founder@Logical Clocks AB
Slides by Jim Dowling, Theofilos Kakantousis
Streaming-as-a-Service in Sweden
• SICS ICE: datacenter research and test environment
• Hopsworks: Spark/Flink/Kafka/Hadoop-as-a-service
• Built on Hops Hadoop (www.hops.io)
• Over 100 active users
• Spark/Flink/Kafka the platforms of choice
Hadoop is not a cool kid anymore!
Where did it go wrong for Hadoop?
• Data Engineers/Scientists
• Where is the User-Friendly tooling and Self-Service?
• How do I install/operate anything other than a sandbox VM?
• Operations Folks
• Security model has become incomprehensible (Sentry/Ranger)
• Major distributions not open enough for patching
• Sensitive data still requires its own cluster
• Why not just use AWS EMR/GCE/Databricks/etc ?!?
Is this Hadoop?
• Resource Manager: Mesos, Kubernetes, YARN
• Storage: HDFS, GCS, S3, WFS
• Platform: On-Premise, AWS, GCE, Azure
• Processing: MR, Tensorflow, Spark, Flink, Hive, HBase, Presto, Kafka
How about this?
• Resource Manager: Mesos, Kubernetes, YARN
• Storage: HDFS, GCS, S3, WFS
• Platform: On-Premise, AWS, GCE, Azure
• Processing: MR, Tensorflow, Spark, Flink, Hive, HBase, Presto, Kafka
Here’s HopsHadoop Distribution
• Resource Manager: YARN
• Storage: HDFS
• Platform: On-Premise, AWS, GCE
• Processing: Elasticsearch, Tensorflow, Spark, Flink, Kibana, Logstash, Kafka, Grafana
• HopsWorks
Bigger, Faster*
16x performance on Spotify workload
*Usenix FAST 2017, HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases
HopsFS Architecture
Diagram components: HDFS Client, NameNodes, Leader, NDB, DataNodes
Nobody gets a cluster... everybody gets a project!
HopsWorks
Simplified Semantics
Hops Hadoop
• Projects
– Datasets/Topics
– Per-Project Users
– Jobs/Notebooks
– SSL/TLS Certificates
Hadoop
• Clusters
• Users
• Jobs/Applications
• Files
• ACLs
• Sys Admins
• Kerberos
Project-Based Multi-Tenancy
• Projects feel familiar*
• Users, Data, Programs
• Like GitHub
• Data Sharing feels familiar*
• Dropbox shared folders
Diagram labels: Proj-42, Proj-X, Proj-All, Topic, Shared Topic, /Projs/My/Data, CompanyDB
*As measured by MRI activity in the perirhinal cortex
https://www.sciencenews.org/blog/scicurious/familiar-feeling-comes-deep-brain
User Roles
Data Owner
-Import/Export data
-Manage Membership
-Share DataSets, Topics
Data Scientist
-Write and Run code
Self-Service Administration – No Administrator Needed
Dynamic Roles for Hadoop/Kafka
Diagram: alice@gmail.com → Authenticate → ProjectA__Alice / ProjectB__Alice → HopsFS, HopsYARN, Projects, Kafka (SSL/TLS Certificates)
Look Ma, No Kerberos!
• For each project, a user is issued an SSL/TLS (X.509) certificate for both authentication and encryption.
• Project-based access to Kafka resources.
• Custom Authorizer
• Services are also issued with SSL/TLS certificates.
• Both user and service certs are signed with the same CA.
• Services extract the userID from RPCs to identify the caller.
• HADOOP-13836
• Draws on ideas from Netflix’s BLESS system
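The idea of a service identifying the caller from its certificate can be sketched as follows. This is a minimal illustration, not the Hops implementation: it assumes the project-specific user name (such as ProjectA__Alice) is carried in the CN field of the certificate subject, and the class name ProjectUserExtractor, the "__" separator constant, and the example DN are hypothetical.

```java
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

/** Hypothetical helper: maps a certificate subject DN to a (project, user) pair. */
final class ProjectUserExtractor {

    /** Assumed separator, taken from the example names such as ProjectA__Alice. */
    static final String SEPARATOR = "__";

    /** Returns the CN value of a distinguished name, or null if there is no CN. */
    static String commonName(String subjectDn) {
        try {
            for (Rdn rdn : new LdapName(subjectDn).getRdns()) {
                if ("CN".equalsIgnoreCase(rdn.getType())) {
                    return rdn.getValue().toString();
                }
            }
            return null;
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("Malformed subject DN: " + subjectDn, e);
        }
    }

    /** Splits "ProjectA__Alice" into {"ProjectA", "Alice"}. */
    static String[] projectAndUser(String projectUser) {
        int idx = projectUser.indexOf(SEPARATOR);
        if (idx <= 0 || idx + SEPARATOR.length() >= projectUser.length()) {
            throw new IllegalArgumentException("Not a project-specific user: " + projectUser);
        }
        return new String[] {
            projectUser.substring(0, idx),
            projectUser.substring(idx + SEPARATOR.length())
        };
    }

    public static void main(String[] args) {
        // In a real service the DN would come from the peer certificate,
        // e.g. X509Certificate.getSubjectX500Principal().getName().
        String cn = commonName("CN=ProjectA__Alice,O=Example,C=SE");
        String[] parts = projectAndUser(cn);
        System.out.println("project=" + parts[0] + ", user=" + parts[1]);
    }
}
```

Because the project is part of the identity, an authorizer only needs this pair to decide whether the caller may touch a given project's resources.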
SSL/TLS Certificate Generation
Users don’t see the certificates.
Users authenticate using:
• LDAP
• password
• 2-Factor Authentication
Diagram labels: alice@gmail.com; Hopsworks (Project Mgr: Add/Del Users, Insert/Remove Certs, Cert Signing Requests); Distributed Database; Intermediate Certificate Authority; Root CA; HDFS, Spark, Kafka, YARN
Distributing Certs for Spark/Flink Streaming
Diagram components: alice@gmail.com, Hopsworks, Distributed Database, YARN Private LocalResources, HopsUtil, Spark/Flink Streaming App
1. Launch Spark/Flink Job
2. Get certs, service endpoints
3. YARN Job, config
4. Materialize certs
5. Read Certs
6. Get Schema
7. Consume/Produce
8. Authenticate
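Steps 4 and 5 above (certs materialized on the host, then read by the application) could look roughly like the sketch below. It is an illustration under assumptions, not HopsUtil's code: the class LocalCertReader, the temporary file, and the password are hypothetical, and the JVM's default keystore type stands in for whatever format the platform actually localizes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.KeyStoreException;

/** Hypothetical reader for a keystore that was localized next to the application. */
final class LocalCertReader {

    /** Step 5: load a keystore file from the local file system. */
    static KeyStore load(Path file, char[] password) {
        try (InputStream in = Files.newInputStream(file)) {
            KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
            ks.load(in, password);
            return ks;
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException("Cannot read keystore " + file, e);
        }
    }

    /** Stand-in for step 4: writes an empty keystore to a temporary file. */
    static Path writeEmptyKeystore(char[] password) {
        try {
            Path file = Files.createTempFile("keystore", ".tmp");
            KeyStore empty = KeyStore.getInstance(KeyStore.getDefaultType());
            empty.load(null, password); // initialize an empty in-memory keystore
            try (OutputStream out = Files.newOutputStream(file)) {
                empty.store(out, password);
            }
            return file;
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException("Cannot write keystore", e);
        }
    }

    /** Number of entries in the keystore stored at the given path. */
    static int countEntries(Path file, char[] password) {
        try {
            return load(file, password).size();
        } catch (KeyStoreException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        char[] password = "changeit".toCharArray(); // placeholder password
        Path file = writeEmptyKeystore(password);
        System.out.println("entries in keystore: " + countEntries(file, password));
        Files.delete(file); // clean up the materialized file
    }
}
```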
Simplifying Spark/Flink Streaming Apps
• Spark/Flink Streaming Applications need to know
• Credentials
• Hadoop, Kafka, InfluxDB, Logstash
• Endpoints
• Kafka Broker, Kafka SchemaRegistry, ResourceManager, NameNode, InfluxDB, Logstash
• The HopsUtil API hides this complexity.
• Location/security transparent to applications
Secure Kafka Application
Developer
1. Discover: Schema Registry and Kafka Broker endpoints
2. Create: Kafka Properties file with certs and broker details
3. Create: Producer/Consumer using Kafka Properties
4. Download: the Schema for the Topic from the Schema Registry
5. Distribute: X.509 certs to all hosts on the cluster
6. Clean up securely
All of these steps are now done automatically by Hopsworks’ HopsUtil library
Operations
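To make step 2 concrete, here is a minimal sketch of the properties a developer would otherwise assemble by hand for a TLS-secured Kafka client. The property keys are standard Kafka client settings; the class name SecureKafkaProps, the broker address, and the file names are hypothetical placeholders, and this is not what HopsUtil does internally.

```java
import java.util.Properties;

/** Hypothetical builder for the client properties of a TLS-secured Kafka connection. */
final class SecureKafkaProps {

    static Properties build(String brokers, String keystore, String truststore, char[] password) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", brokers);            // discovered broker endpoints
        props.setProperty("security.protocol", "SSL");              // TLS for encryption and client auth
        props.setProperty("ssl.keystore.location", keystore);       // client certificate and key
        props.setProperty("ssl.keystore.password", new String(password));
        props.setProperty("ssl.key.password", new String(password));
        props.setProperty("ssl.truststore.location", truststore);   // CA that signed the broker certs
        props.setProperty("ssl.truststore.password", new String(password));
        return props;
    }

    public static void main(String[] args) {
        // All values below are placeholders for illustration only.
        Properties p = build("broker1:9091", "keystore.jks", "truststore.jks",
                "secret".toCharArray());
        // Print the non-secret settings in a stable order.
        p.stringPropertyNames().stream()
                .sorted()
                .filter(k -> !k.endsWith("password"))
                .forEach(k -> System.out.println(k + "=" + p.getProperty(k)));
    }
}
```

The resulting Properties object is what step 3 would pass to a producer or consumer constructor.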
Spark Producer in HopsWorks
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
String topic = HopsUtil.getTopic(); //Optional
SparkProducer producer = HopsUtil.getSparkProducer(topic);
Map<String, String> message = …
producer.produce(message);
https://github.com/hopshadoop/hops-kafka-examples
Spark Streaming Consumer in HopsWorks
JavaStreamingContext jssc =
    new JavaStreamingContext(sparkConf, Durations.seconds(2));
String topic = HopsUtil.getTopic(); //Optional
String consumerGroup = HopsUtil.getConsumerGroup(); //Optional
SparkConsumer consumer =
    HopsUtil.getSparkConsumer(jssc, topic, consumerGroup);
JavaInputDStream<ConsumerRecord<String, byte[]>> messages =
    consumer.createDirectStream();
jssc.start();
https://github.com/hopshadoop/hops-kafka-examples
Hops simplifies Secure Spark/Flink/Kafka
https://github.com/hopshadoop/hops-kafka-examples
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
props.put(SCHEMA_REGISTRY_URL, restApp.restConnect);
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
    org.apache.kafka.common.serialization.StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
    io.confluent.kafka.serializers.KafkaAvroSerializer.class);
// Settings from the legacy producer API
props.put("producer.type", "sync");
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("request.required.acks", "1");
// Hard-coded keystore path and passwords
props.put("ssl.keystore.location", "/var/ssl/kafka.client.keystore.jks");
props.put("ssl.keystore.password", "test1234");
props.put("ssl.key.password", "test1234");
String userSchema = "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"User\","
    + "\"fields\": [{\"name\": \"name\", \"type\": \"string\"}]}";
Schema.Parser parser = new Schema.Parser();
Schema schema = parser.parse(userSchema);
GenericRecord avroRecord = new GenericData.Record(schema);
avroRecord.put("name", "testUser");
Producer<String, Object> producer = new KafkaProducer<>(props);
ProducerRecord<String, Object> message = new ProducerRecord<>("topicName", avroRecord);
producer.send(message);
Lots of Hard-Coded Endpoints Here!
SparkProducer producer = HopsUtil.getSparkProducer();
Map<String, String> message = …
producer.produce(message);
Massively Simplified Code for Hops/Flink/Kafka
Open Source Support Systems
• Apache Kafka
• Hops Hadoop
• Security, Storage/Compute
• ELK Stack
• Real-time Logs
• Grafana/InfluxDB
• Monitoring
• Apache Zeppelin
• Interactive Analytics
Hopsworks
Self-Service UI
Unified Security Model
DEMO
Kafka Service UI
Manage & Share
• Topics
• ACLs
• Schemas
Streaming Applications – Logging
Logging analytics - Kibana
Streaming Applications – Metrics
Monitoring analytics - Grafana
Karamel/Chef for Automated Installation
Google Compute Engine BareMetal
Summary
• Hops is the only European distribution of Hadoop
• More scalable, tinker-friendly, and open-source.
• HopsWorks provides first-class support for Spark/Flink-Kafka-as-a-Service
• HopsWorks provides best-in-class support for secure streaming applications with Kafka
• Streaming or Batch Jobs
Hops Team
Active:
Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Roberto Bampi, Fabio Buso, Fanti Machmount Al Samisti, Braulio Grana, Adam Alpire, Zahin Azher Rashid, Robin Andersso, ArunaKumari Yedurupaka, Tobias Johansson, August Bonds, Tiago Brito, Filotas Siskos.
Alumni:
Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu.
Hops
Hadoop for humans
Thank you!
http://www.hops.io
http://github.com/hopshadoop
@hopshadoop
www.logicalclocks.com
Hadoop for humans

power system scada applications and uses
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 

Secure Streaming-as-a-Service with Kafka/Spark/Flink in Hopsworks

  • 1. FEBRUARY 9, 2017, WARSAW
      HopsWorks: Secure Streaming-as-a-Service with Kafka/Spark/Flink
      Theofilos Kakantousis, Research Engineer@RISE SICS, Co-founder@Logical Clocks AB
      Slides by Jim Dowling, Theofilos Kakantousis
  • 2. Streaming-as-a-Service in Sweden
      • SICS ICE: datacenter research and test environment
      • Hopsworks: Spark/Flink/Kafka/Hadoop-as-a-service
      • Built on Hops Hadoop (www.hops.io)
      • Over 100 active users
      • Spark/Flink/Kafka the platforms of choice
  • 3. Hadoop is not a cool kid anymore!
  • 4. Where did it go wrong for Hadoop?
      • Data Engineers/Scientists
          – Where is the user-friendly tooling and self-service?
          – How do I install/operate anything other than a sandbox VM?
      • Operations Folks
          – Security model has become incomprehensible (Sentry/Ranger)
          – Major distributions not open enough for patching
          – Sensitive data still requires its own cluster
      • Why not just use AWS EMR/GCE/Databricks/etc.?
  • 5. Is this Hadoop? [Stack diagram]
      • Processing: MR, Tensorflow, Spark, Flink, Hive, HBase, Presto, Kafka
      • Resource Manager: Mesos, Kubernetes, YARN
      • Storage: HDFS, GCS, S3, WFS
      • Platform: On-Premise, AWS, GCE, Azure
  • 6. How about this? [Same stack diagram]
      • Processing: MR, Tensorflow, Spark, Flink, Hive, HBase, Presto, Kafka
      • Resource Manager: Mesos, Kubernetes, YARN
      • Storage: HDFS, GCS, S3, WFS
      • Platform: On-Premise, AWS, GCE, Azure
  • 7. Here’s the Hops Hadoop Distribution [Stack diagram]
      • Processing: Elasticsearch, Tensorflow, Spark, Flink, Kibana, Logstash, Kafka, Grafana
      • Also shown: HopsWorks
      • Resource Manager: YARN
      • Storage: HDFS
      • Platform: On-Premise, AWS, GCE
  • 8. Bigger, Faster*
      • 16x performance on Spotify workload
      *USENIX FAST 2017, HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases
  • 9. HopsFS Architecture [Diagram labels: HDFS Client, NameNodes, Leader, NDB, DataNodes]
  • 10. Nobody gets a cluster... everybody gets a project! (HopsWorks)
  • 11. Simplified Semantics
      Hops Hadoop
      • Projects
          – Datasets/Topics
          – Per-Project Users
          – Jobs/Notebooks
          – SSL/TLS Certificates
      Hadoop
      • Clusters
      • Users
      • Jobs/Applications
      • Files
      • ACLs
      • Sys Admins
      • Kerberos
  • 12. Project-Based Multi-Tenancy
      • Projects feel familiar*
          – Users, Data, Programs
          – Like GitHub
      • Data Sharing feels familiar*
          – Dropbox shared folders
      [Diagram labels: Proj-All, Proj-X, Proj-42, Shared Topic, Topic, /Projs/My/Data, CompanyDB]
      *As measured by MRI activity in the perirhinal cortex https://www.sciencenews.org/blog/scicurious/familiar-feeling-comes-deep-brain
  • 13. User Roles
      • Data Owner
          – Import/Export data
          – Manage Membership
          – Share DataSets, Topics
      • Data Scientist
          – Write and Run code
      Self-Service Administration – No Administrator Needed
  • 14. Dynamic Roles for Hadoop/Kafka [Diagram labels: alice@gmail.com, Authenticate, ProjectA__Alice, ProjectB__Alice, HopsFS, HopsYARN, Projects, Kafka, SSL/TLS Certificates]
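The per-project identities on this slide (ProjectA__Alice, ProjectB__Alice) suggest that a project name and a user name are joined with a double underscore. The sketch below illustrates that naming scheme only. The class `ProjectUser` and its methods are hypothetical and are not taken from the Hopsworks code base, and the rule that names may not themselves contain the separator is an assumption made so that parsing is unambiguous.

```java
/** Hypothetical helper: composes and splits project-specific user names. */
final class ProjectUser {
    // Separator as it appears in the slide's examples (ProjectA__Alice).
    static final String SEPARATOR = "__";

    final String project;
    final String user;

    ProjectUser(String project, String user) {
        // Assumption: neither part may contain the separator.
        if (project.contains(SEPARATOR) || user.contains(SEPARATOR)) {
            throw new IllegalArgumentException("names must not contain " + SEPARATOR);
        }
        this.project = project;
        this.user = user;
    }

    /** Returns the per-project identity, e.g. "ProjectA__Alice". */
    String qualifiedName() {
        return project + SEPARATOR + user;
    }

    /** Splits a per-project identity back into its project and user parts. */
    static ProjectUser parse(String qualifiedName) {
        int idx = qualifiedName.indexOf(SEPARATOR);
        if (idx <= 0 || idx + SEPARATOR.length() >= qualifiedName.length()) {
            throw new IllegalArgumentException("not a project user: " + qualifiedName);
        }
        return new ProjectUser(
                qualifiedName.substring(0, idx),
                qualifiedName.substring(idx + SEPARATOR.length()));
    }
}
```

With this scheme the same person holds a distinct identity in every project, so a service only needs the identity string to tell which project a request belongs to.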
  • 15. Look Ma, No Kerberos!
      • For each project, a user is issued an SSL/TLS (X.509) certificate for both authentication and encryption.
      • Project-based access on Kafka resources.
          – Custom Authorizer
      • Services are also issued SSL/TLS certificates.
      • Both user and service certs are signed by the same CA.
      • Services extract the userID from RPCs to identify the caller.
          – HADOOP-13836
      • Draws on ideas from Netflix's BLESS system
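A rough sketch of how a service might turn a client certificate into a project-scoped access decision follows. It assumes, purely for illustration, that the certificate's subject CN carries the per-project user name (for example `ProjectA__Alice`) and that a topic is accessible to its owning project plus any projects it has been shared with. The class `ProjectAuthorizerSketch`, its methods, and the example DN values are hypothetical. This is not the Hopsworks authorizer or the change tracked in HADOOP-13836.

```java
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;
import java.util.Set;

/** Hypothetical sketch of certificate-based, per-project authorization. */
final class ProjectAuthorizerSketch {

    /** Extracts the CN attribute from an X.509 subject DN string. */
    static String commonName(String subjectDn) {
        try {
            for (Rdn rdn : new LdapName(subjectDn).getRdns()) {
                if ("CN".equalsIgnoreCase(rdn.getType())) {
                    return rdn.getValue().toString();
                }
            }
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("malformed DN: " + subjectDn, e);
        }
        throw new IllegalArgumentException("no CN in DN: " + subjectDn);
    }

    /** Assumption: the CN has the form "<project>__<user>". */
    static String projectOf(String projectUser) {
        int idx = projectUser.indexOf("__");
        if (idx <= 0) {
            throw new IllegalArgumentException("not a project user: " + projectUser);
        }
        return projectUser.substring(0, idx);
    }

    /** Allows access if the caller's project owns the topic or the topic is shared with it. */
    static boolean mayAccess(String subjectDn, String topicOwnerProject, Set<String> sharedWith) {
        String project = projectOf(commonName(subjectDn));
        return project.equals(topicOwnerProject) || sharedWith.contains(project);
    }
}
```

In a real TLS handler the DN string would typically come from `X509Certificate.getSubjectX500Principal().getName()` on the peer certificate; here it is passed in as a plain string so the sketch stays self-contained.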
  • 16. SSL/TLS Certificate Generation
      • Users don’t see the certificates.
      • Users authenticate using:
          – LDAP
          – password
          – 2-Factor Authentication
      [Diagram labels: alice@gmail.com, Hopsworks, Project Mgr, Add/Del Users, Insert/Remove Certs, Distributed Database, Cert Signing Requests, Intermediate Certificate Authority, Root CA, HDFS, Spark, Kafka, YARN]
  • 17. Distributing Certs for Spark/Flink Streaming
      1. Launch Spark/Flink Job
      2. Get certs, service endpoints
      3. YARN Job, config
      4. Materialize certs
      5. Read Certs
      6. Get Schema
      7. Consume/Produce
      8. Authenticate
      [Diagram labels: alice@gmail.com, Hopsworks, Distributed Database, YARN Private LocalResources, Spark/Flink Streaming App, HopsUtil]
  • 18. Simplifying Spark/Flink Streaming Apps
      • Spark/Flink Streaming Applications need to know
          – Credentials: Hadoop, Kafka, InfluxDB, Logstash
          – Endpoints: Kafka Broker, Kafka SchemaRegistry, ResourceManager, NameNode, InfluxDB, Logstash
      • The HopsUtil API hides this complexity.
          – Location/security transparent to applications
  • 19. Secure Kafka [Slide side labels: Application Developer, Operations]
      1. Discover: Schema Registry and Kafka Broker endpoints
      2. Create: Kafka Properties file with certs and broker details
      3. Create: Producer/Consumer using Kafka Properties
      4. Download: the Schema for the Topic from the Schema Registry
      5. Distribute: X.509 certs to all hosts on the cluster
      6. Clean up securely
      All of these steps are now done automatically by Hopsworks’ HopsUtil library.
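As an illustration of step 2 in the list above, the sketch below assembles the SSL-related client settings into a `java.util.Properties` object using only the JDK. The property keys are standard Kafka client configuration names. The class `SecureKafkaProps`, the method `sslClientProps`, and the idea of one shared password for keystore, key, and truststore are assumptions for this example and do not describe what HopsUtil does internally.

```java
import java.util.Properties;

/** Hypothetical helper: builds the SSL settings a Kafka client needs. */
final class SecureKafkaProps {

    static Properties sslClientProps(String brokers,
                                     String keyStorePath,
                                     String trustStorePath,
                                     String password) {
        Properties props = new Properties();
        // Where to find the cluster (step 1 supplies this value).
        props.setProperty("bootstrap.servers", brokers);
        // Use TLS for both encryption and client authentication.
        props.setProperty("security.protocol", "SSL");
        // Client certificate and private key.
        props.setProperty("ssl.keystore.location", keyStorePath);
        props.setProperty("ssl.keystore.password", password);
        props.setProperty("ssl.key.password", password);
        // CA certificates used to verify the brokers.
        props.setProperty("ssl.truststore.location", trustStorePath);
        props.setProperty("ssl.truststore.password", password);
        return props;
    }
}
```

A real producer or consumer would additionally need key and value (de)serializer settings before these properties can be passed to a Kafka client constructor.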
  • 20. Spark Producer in HopsWorks
      JavaSparkContext jsc = new JavaSparkContext(sparkConf);
      String topic = HopsUtil.getTopic(); // Optional
      SparkProducer producer = HopsUtil.getSparkProducer(topic);
      Map<String, String> message = …
      producer.produce(message);
      https://github.com/hopshadoop/hops-kafka-examples
  • 21. Spark Streaming Consumer in HopsWorks
      JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(2));
      String topic = HopsUtil.getTopic(); // Optional
      String consumerGroup = HopsUtil.getConsumerGroup(); // Optional
      SparkConsumer consumer = HopsUtil.getSparkConsumer(jssc, topic, consumerGroup);
      JavaInputDStream<ConsumerRecord<String, byte[]>> messages = consumer.createDirectStream();
      jssc.start();
      https://github.com/hopshadoop/hops-kafka-examples
  • 22. Hops simplifies Secure Spark/Flink/Kafka
      https://github.com/hopshadoop/hops-kafka-examples
      Without HopsUtil (Lots of Hard-Coded Endpoints Here!):
      Properties props = new Properties();
      props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
      props.put(SCHEMA_REGISTRY_URL, restApp.restConnect);
      props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
          org.apache.kafka.common.serialization.StringSerializer.class);
      props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
          io.confluent.kafka.serializers.KafkaAvroSerializer.class);
      // The next three keys are settings of the legacy Scala producer, kept as on the slide.
      props.put("producer.type", "sync");
      props.put("serializer.class", "kafka.serializer.StringEncoder");
      props.put("request.required.acks", "1");
      props.put("ssl.keystore.location", "/var/ssl/kafka.client.keystore.jks");
      props.put("ssl.keystore.password", "test1234");
      props.put("ssl.key.password", "test1234");
      String userSchema = "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"User\","
          + "\"fields\": [{\"name\": \"name\", \"type\": \"string\"}]}";
      Schema.Parser parser = new Schema.Parser();
      Schema schema = parser.parse(userSchema);
      GenericRecord avroRecord = new GenericData.Record(schema);
      avroRecord.put("name", "testUser");
      Producer<String, Object> producer = new KafkaProducer<>(props);
      ProducerRecord<String, Object> message = new ProducerRecord<>("topicName", avroRecord);
      producer.send(message);
      With HopsUtil (Massively Simplified Code for Hops/Flink/Kafka):
      SparkProducer producer = HopsUtil.getSparkProducer();
      Map<String, String> message = …
      producer.produce(message);
  • 23. Open Source Support Systems
      • Apache Kafka
      • Hops Hadoop
          – Security, Storage/Compute
      • ELK Stack
          – Real-time Logs
      • Grafana/InfluxDB
          – Monitoring
      • Apache Zeppelin
          – Interactive Analytics
      [Diagram labels: Hopsworks Self-Service UI, Unified Security Model]
  • 24. DEMO
  • 25. Kafka Service UI: Manage & Share
      • Topics
      • ACLs
      • Schemas
  • 26. Streaming Applications – Logging: logging analytics with Kibana
  • 27. Streaming Applications – Metrics: monitoring analytics with Grafana
  • 28. Karamel/Chef for Automated Installation [Diagram labels: Google Compute Engine, BareMetal]
  • 29. Summary
      • Hops is the only European distribution of Hadoop
          – More scalable, tinker-friendly, and open-source.
      • HopsWorks provides first-class support for Spark/Flink-Kafka-as-a-Service
      • HopsWorks provides best-in-class support for secure streaming applications with Kafka
          – Streaming or Batch Jobs
  • 30. Hops Team
      Active: Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Roberto Bampi, Fabio Buso, Fanti Machmount Al Samisti, Braulio Grana, Adam Alpire, Zahin Azher Rashid, Robin Andersso, ArunaKumari Yedurupaka, Tobias Johansson, August Bonds, Tiago Brito, Filotas Siskos.
      Alumni: Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu.
      Hops: Hadoop for humans
  • 31. Thank you!
      Hops: Hadoop for humans
      http://www.hops.io
      http://github.com/hopshadoop
      @hopshadoop
      www.logicalclocks.com