Bringing Streaming Data
To The Masses
Lowering The “Cost Of Admission”
For Your Streaming Data Platform
San Francisco, CA
October 17th, 2018
What is Kafka?
About Me
• Bob Lehmann
• Started life as an Electrical Engineer, switched to IT 20 years ago
• Have worked with data in many capacities – sensors and controls, manufacturing process data, ERP systems, enterprise data, etc.
• Architect and manage the Enterprise DataHub at Bayer
• Live in St. Louis, MO
Who are we?
[Figure labels: Inbred (Parent 1), Inbred (Parent 2), Hybrid]
The Corn “Galaxy”
The Journey Starts Here
Circa 2014…
• Siloed IT org with different tech stacks (Bayer IT org > 4000)
• MANY legacy systems and platforms
• Bayer adopted “cloud first” philosophy
• Embraced open source (finally 🙂)
• Cross functional team of architects was established to define strategies and architectures
DIRECTIVE: Develop a strategy for cloud-based enterprise wide analytics
Houston, We Have A Data Problem
Legacy
• Data sprawl
• Data inconsistency
• Difficult to find data
• Can’t propagate changes fast enough
Cloud
• Increased data sprawl
• Can’t forklift applications to cloud
• Cloud apps need on-prem data and vice versa
Volume
Variety
Velocity
Veracity
Let’s Clean Up This Mess!
[Diagrams: (1) a tangle of point-to-point integrations among relational databases, apps, caches and derived stores, a relational data warehouse, an ODS, Hadoop, Splunk, ActiveMQ, a key-value store, log aggregation and monitoring, connected via polling for changes, Data Guard, CSV dumps, transforms, HTTP, NFS and rsync; (2) a Kafka-centered layout connecting apps, OLTP queries, log search, monitoring, real-time analytics, social graph, search, newsfeed, OLAP, Samza, key-value storage, Oracle, security & fraud, Hadoop and Teradata.]
Courtesy: Jay Kreps
The Enterprise DataHub – Original Concept
- Kafka clusters on prem and in AWS
- Datacenter agnostic
- Establish cross-datacenter connection
- Replicate across datacenters
- Apps only interact with local Kafka cluster
- Use AVRO schemas
[Diagram annotations: VPN; Use Schema Registry; MirrorMaker?; Maybe GCP in the future?]
Enterprise DataHub POC
Circa 2015
[Architecture diagram: two Kafka clusters (Topics 1-6 each, with ZooKeeper, Schema Registry and REST Proxy) linked over a VPN tunnel by six MirrorMaker instances on an EC2 instance. Also shown: Oracle, SQL Server, Postgres RDS, Cloud Foundry producers and consumers, and network monitor, ticketing and other monitoring applications.]
Confluent 1.0 / Kafka 0.8
What a Long, Strange Trip It’s Been!
First Phase: The Launch
Second Phase: Reaching Orbit
Third Phase: Escaping Gravity
Current Phase: To Infinity And Beyond…
[Roadmap labels: Kafka, Kafka Streams, KSQL, Kafka Connect, Portal, Documentation Site]
First Phase - The Launch
September, 2016
• Confluent 2.0 / Kafka 0.9
• Security via SSL certs – developed patch to dynamically load broker certs
• Replicant - Process to replace MirrorMaker
• Basic platform monitoring
• Most user interaction via command line tools
[Architecture diagram: two Kafka clusters (Topics 1-6 each, with Schema Registry and REST Proxy) linked over the VPN tunnel by Replicant instances running on EC2 Container Service. Core platform: Monitoring, Documentation Portal, Kafka Manager.]
Phase 1 Results
• Java/Scala Developers
• AWS skills
• Linux command line
• Data movement from on-prem to cloud
• Customer 360 – bidirectional movement
Phase 2 - Reaching Orbit
• Self Service User Portal
• Improved replication process - Replikant
• Security Improvements
• Infrastructure automation
• Monitoring for topics and consumers
• Slack integration for alerts
• Initial evaluation of Kafka Connect
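As a rough illustration of the consumer monitoring and alerting items above, the sketch below computes per-partition consumer lag (log end offset minus last committed offset) and formats an alert line when a threshold is exceeded. The function names, the threshold and the message format are assumptions made for illustration; the deck does not describe how the DataHub monitoring is implemented, and the offsets here are hard-coded rather than read from a cluster.

```python
from typing import Dict, List


def partition_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition = log end offset - last committed offset.

    Assumption: a partition with no committed offset is treated as
    committed at 0, so its lag equals its end offset.
    """
    return {p: max(end - committed.get(p, 0), 0) for p, end in end_offsets.items()}


def lag_alerts(topic: str, group: str, lag: Dict[int, int], threshold: int) -> List[str]:
    """Build one alert line per partition whose lag exceeds the threshold."""
    return [
        f"[{group}] {topic}-{p} lag {n} exceeds {threshold}"
        for p, n in sorted(lag.items())
        if n > threshold
    ]


# Hypothetical topic, consumer group and offsets.
end_offsets = {0: 1500, 1: 900}
committed = {0: 1450, 1: 100}

lag = partition_lag(end_offsets, committed)
alerts = lag_alerts("orders", "billing-app", lag, threshold=500)
for line in alerts:
    print(line)  # in a real setup this text could be posted to a chat webhook
```

Keeping the lag calculation separate from the alert formatting means the same numbers could also be shown to end users in a portal.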
[Architecture diagram: two Replikant clusters replicate between two Kafka clusters (Topics 1-6 each, with Schema Registry and REST Proxy) over the VPN tunnel. Core platform: Monitoring, Documentation Portal, Kafka Manager. User services: Self-Service Portal (DataHub Portal), Topic and Consumer Monitoring, Consumer and Replikant Monitoring, Slack Integration.]
Phase 2 Results
• Java/Scala Developers
• AWS skills
• Linux command line
• Data movement from on-prem to cloud
• Customer 360 – bidirectional movement
• Python, Node, etc.
• Some Analytics
• Some BI
• Event sourcing
• Company360
• Exadata CDC
• Incremental migration to the cloud
Phase 3 - Escaping Gravity
• Kubernetes / Kafka Connect
• Expansion to Google Cloud
• CDC from SAP using Informatica Data Replication
• Integration with Data Historian
• Detailed training class
[Architecture diagram: two Replikant clusters and two Kafka clusters (Topics 1-6 each, with Schema Registry and REST Proxy) linked over the VPN tunnel, with the core platform (Monitoring, Documentation Portal, Kafka Manager) and user services (Self-Service Portal, Consumer and Replikant Monitoring, Slack Integration). Two Kubernetes clusters run Kafka Connect connectors: JDBC, S3, JMS and Elasticsearch in one; JDBC and JMS in the other. Also shown: Data Historian ingestion into EMR (Hive, Spark, Presto) on S3/Parquet.]
Kafka Connect and Kubernetes
Kafka Connect
• Code-free, simple
• Connector universe is expanding rapidly
• Secure - SSL connection
• AVRO support
Connectors In Use
• JDBC (Oracle, Postgres, MySQL, SQL Server, Teradata, Redshift)
• JMS
• S3
• File
• Elasticsearch
Kubernetes
• Highly scalable
• Cluster in each environment
• Keeps processing local to the environment
• Efficient use of resources
• Increased security
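To make the "code-free" point concrete, here is a minimal sketch of what defining a JDBC source connector can look like: a JSON document that would be submitted to the Kafka Connect REST API (POST to `/connectors`). The property names are standard settings of the Confluent JDBC source connector; the connector name, connection URL, table, key column and topic prefix are hypothetical placeholders, not values from the DataHub. The sketch only builds the payload and does not contact a Connect worker.

```python
import json
from typing import Any, Dict


def jdbc_source_connector(name: str, connection_url: str, table: str,
                          key_column: str, topic_prefix: str) -> Dict[str, Any]:
    """Build the request body for creating a JDBC source connector."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "connection.url": connection_url,
            "table.whitelist": table,
            # Poll for new rows using a strictly increasing column.
            "mode": "incrementing",
            "incrementing.column.name": key_column,
            # Rows from <table> are written to the topic <topic_prefix><table>.
            "topic.prefix": topic_prefix,
            "tasks.max": "1",
        },
    }


# All argument values below are illustrative assumptions.
payload = jdbc_source_connector(
    name="example-jdbc-source",
    connection_url="jdbc:postgresql://db.example.internal:5432/sales",
    table="customers",
    key_column="id",
    topic_prefix="pg.",
)
body = json.dumps(payload, indent=2)
print(body)
```

Because the connector is plain configuration, it can be kept in version control and applied per environment, which fits the idea of one Connect cluster per environment.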
Expansion To Other Datacenters
[Diagram labels: North America Datacenter, Greenhouse Datacenter, Future Datacenter, Different Region]
DataHub Tech Stack
Phase 3 Results
• Java/Scala Developers
• AWS skills
• Linux command line
• Data movement from on-prem to cloud
• Customer 360 – bidirectional movement
• Data movement between all datacenters on central platform
• LIMS Integration
• Serverless apps in AWS
• SAP/Oracle CDC
• Product360 in GCP - Go
• BI / Reporting
• Analytics platform
• Python, Node, etc.
• Some Analytics
• Some BI
• Event sourcing
• Company360
• Exadata CDC
• Incremental migration to the cloud
Current Phase – To Infinity And Beyond
• Bring stream processing to the masses!
• Data validation across the pipeline
• SQL interface for Kafka (using Presto)
• Improve topic discoverability and reuse
• Expose consumer metrics to end users
[Architecture diagram: two Replikant clusters and two Kafka clusters (Topics 1-6 each, with Schema Registry and REST Proxy) linked over the VPN tunnel. Each of the two Kubernetes clusters hosts I/O connectors (File, JDBC and JMS in one; File, JDBC, S3 and Elasticsearch in the other) and stream processing (KSQL, Kafka Streams, custom stream processors). Also shown: Data Historian ingestion into EMR (Hive, Spark, Presto) on S3/Parquet, the Presto SQL engine and the Haystack metadata platform.]
Many Clients In Many Places
Managed By DataHub Team
[Architecture diagram: clients surrounding the platform. Components shown: GoldenGate CDC, Oracle, SQL Server, Teradata, Exadata, Neo4J, MySQL, Cassandra, Redshift, Postgres, EMR, S3/Parquet, legacy apps (WebLogic), Cloud Foundry producers, consumers and applications, and integration with salesforce.com. At the center: two Replikant clusters and two Kafka clusters (Topics 1-6 each, with Schema Registry and REST Proxy) linked over the VPN tunnel, plus the core platform (Monitoring, Documentation Portal, Kafka Manager) and user services (Self-Service Portal, Consumer and Replikant Monitoring, Slack Integration). Other labels: “First Mile” Processing, Automatic ingestion.]
Use Case – CDC From SAP to Data Historian
Schema Splitter converts an input stream with a “generic” schema (many different tables flowing through one topic) to individual table streams with table-specific schemas
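The splitting step described above can be sketched in a few lines. This is only an illustration of the idea, not the DataHub implementation: the generic record layout (a `table` field plus a `payload` dict), the type mapping and the sample rows are all assumptions, and the records are processed in memory rather than read from Kafka.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

# Assumed mapping from Python value types to Avro primitive type names.
AVRO_TYPES = {str: "string", int: "long", float: "double", bool: "boolean"}


def split_by_table(records: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Route each generic record to a per-table stream, keyed by table name."""
    streams: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for rec in records:
        streams[rec["table"]].append(rec["payload"])
    return dict(streams)


def table_schema(table: str, rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Derive a table-specific, Avro-style record schema from the rows seen.

    Simplification: unknown value types fall back to "string".
    """
    fields: Dict[str, str] = {}
    for row in rows:
        for name, value in row.items():
            fields.setdefault(name, AVRO_TYPES.get(type(value), "string"))
    return {
        "type": "record",
        "name": table,
        "fields": [{"name": n, "type": t} for n, t in fields.items()],
    }


# Hypothetical generic stream carrying rows from two tables.
generic_stream = [
    {"table": "CUSTOMER", "payload": {"id": 1, "name": "Acme"}},
    {"table": "ORDERS", "payload": {"order_id": 10, "total": 99.5}},
    {"table": "CUSTOMER", "payload": {"id": 2, "name": "Globex"}},
]

streams = split_by_table(generic_stream)
schemas = {table: table_schema(table, rows) for table, rows in streams.items()}
```

In a streaming deployment each per-table stream would map to its own topic, with the derived schema registered for that topic.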
[Data flow diagram. Components shown: SAP, Oracle, ETL and Informatica Data Replication; Kafka with Schema Registry and ZooKeeper; Replikant instances and a Replikant cluster across the VPN tunnel; a Kubernetes cluster running the Schema Splitter, a KSQL filter, a KSQL aggregation and a Kafka Streams processor; per-schema and per-table streams (DB Schema 1-4, Table 1-3); Data Historian ingestion into EMR (Hive, Spark, Presto) on S3/Parquet; Teradata staging and final tables.]
Topic Reuse
• Not as good as we would like. Why?
• Discoverability
• Developers are not altruistic when creating schemas
MetaData Platform
• Haystack is our enterprise metadata platform
• Kafka topic metadata is automatically synced to Haystack
• Haystack links back to the DataHub portal
• Will be able to search for topics in Haystack and immediately find the topic in the DataHub portal
Presto
• Presto is being implemented as an enterprise data virtualization solution, not just for the DataHub
• Will also be used to provide data validation across the pipeline via SQL
  Example: Join a topic in Kafka to a table in Postgres to confirm that all data has transferred correctly
• Will also be used to provide a SQL interface in the DataHub portal to allow querying for specific messages
• Developed a patch to the Presto Kafka connector to connect with SSL and deserialize AVRO
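The validation example above amounts to an anti-join: find keys present in the Kafka topic that have no matching row in the Postgres table. In Presto this would be written in SQL across the two connectors; since the deck gives no catalog, schema or table names, the sketch below mimics the same check on in-memory rows. The `id` key column and the sample data are assumptions.

```python
from typing import Any, Dict, Iterable, List


def missing_keys(topic_rows: Iterable[Dict[str, Any]],
                 table_rows: Iterable[Dict[str, Any]],
                 key: str = "id") -> List[Any]:
    """Return keys seen in the topic that are absent from the table.

    Equivalent in spirit to a LEFT JOIN from topic to table that keeps
    only the rows where the table side is NULL.
    """
    loaded = {row[key] for row in table_rows}
    return [row[key] for row in topic_rows if row[key] not in loaded]


# Hypothetical messages from a topic and rows from the target table.
topic_rows = [{"id": 1}, {"id": 2}, {"id": 3}]
table_rows = [{"id": 1}, {"id": 2}]

missing = missing_keys(topic_rows, table_rows)
print(missing)
```

An empty result means every message key has arrived in the table; a non-empty result identifies exactly which records to investigate.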
Future Tech Stack
Phase 4 – Current Phase
• Product360 in GCP - Go
• BI / Reporting
• Analytics platform
• Java/Scala Developers
• AWS skills
• Linux command line
• Data movement from on-prem to cloud
• Customer 360 – bidirectional movement
• Data Stewards
• Everyone else!!
• Global streaming
• IOT data
• SAP Hana
• Data movement between all datacenters on central platform
• LIMS Integration
• Serverless apps in AWS
• SAP/Oracle CDC
• Python, Node, etc.
• Some Analytics
• Some BI
• Event sourcing
• Company360
• Exadata CDC
• Incremental migration to the cloud
Future
• Move as much ETL as possible to the streaming layer
• Monitoring and auditing of data flow across the pipeline
• Consumer monitoring and configurable alerting
• Improved data governance
• Integrate with enterprise security platform
The Enterprise DataHub is a living, scalable, robust central nervous system for data that facilitates the seamless acquisition, transport and processing of information in real time across multiple datacenter and cloud environments.
THANK YOU!
Bob Lehmann
robert.lehmann@bayer.com

More Related Content

What's hot

Kafka for Real-Time Event Processing in Serverless Environments
Kafka for Real-Time Event Processing in Serverless EnvironmentsKafka for Real-Time Event Processing in Serverless Environments
Kafka for Real-Time Event Processing in Serverless Environmentsconfluent
 
Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...
Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...
Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...HostedbyConfluent
 
How to mutate your immutable log | Andrey Falko, Stripe
How to mutate your immutable log | Andrey Falko, StripeHow to mutate your immutable log | Andrey Falko, Stripe
How to mutate your immutable log | Andrey Falko, StripeHostedbyConfluent
 
Matching the Scale at Tinder with Kafka
Matching the Scale at Tinder with Kafka Matching the Scale at Tinder with Kafka
Matching the Scale at Tinder with Kafka confluent
 
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at UberDisaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uberconfluent
 
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...HostedbyConfluent
 
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...Flink Forward
 
Creating an Elastic Platform Using Kafka and Microservices in OpenShift
Creating an Elastic Platform Using Kafka and Microservices in OpenShift Creating an Elastic Platform Using Kafka and Microservices in OpenShift
Creating an Elastic Platform Using Kafka and Microservices in OpenShift confluent
 
Kafka in the Enterprise—A Two-Year Journey to Build a Data Streaming Platform...
Kafka in the Enterprise—A Two-Year Journey to Build a Data Streaming Platform...Kafka in the Enterprise—A Two-Year Journey to Build a Data Streaming Platform...
Kafka in the Enterprise—A Two-Year Journey to Build a Data Streaming Platform...confluent
 
What's new in confluent platform 5.4 online talk
What's new in confluent platform 5.4 online talkWhat's new in confluent platform 5.4 online talk
What's new in confluent platform 5.4 online talkconfluent
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...HostedbyConfluent
 
Processing IoT Data from End to End with MQTT and Apache Kafka
Processing IoT Data from End to End with MQTT and Apache Kafka Processing IoT Data from End to End with MQTT and Apache Kafka
Processing IoT Data from End to End with MQTT and Apache Kafka confluent
 
Stream Processing with Kafka and KSQL in Jupiter | Namit Mahuvakar, Jupiter
Stream Processing with Kafka and KSQL in Jupiter | Namit Mahuvakar, JupiterStream Processing with Kafka and KSQL in Jupiter | Namit Mahuvakar, Jupiter
Stream Processing with Kafka and KSQL in Jupiter | Namit Mahuvakar, JupiterHostedbyConfluent
 
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...confluent
 
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...HostedbyConfluent
 
Cloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQ
Cloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQCloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQ
Cloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQHostedbyConfluent
 
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...HostedbyConfluent
 
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...HostedbyConfluent
 
Building Event Streaming Microservices with Spring Boot and Apache Kafka | Ja...
Building Event Streaming Microservices with Spring Boot and Apache Kafka | Ja...Building Event Streaming Microservices with Spring Boot and Apache Kafka | Ja...
Building Event Streaming Microservices with Spring Boot and Apache Kafka | Ja...HostedbyConfluent
 
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...HostedbyConfluent
 

What's hot (20)

Kafka for Real-Time Event Processing in Serverless Environments
Kafka for Real-Time Event Processing in Serverless EnvironmentsKafka for Real-Time Event Processing in Serverless Environments
Kafka for Real-Time Event Processing in Serverless Environments
 
Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...
Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...
Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...
 
How to mutate your immutable log | Andrey Falko, Stripe
How to mutate your immutable log | Andrey Falko, StripeHow to mutate your immutable log | Andrey Falko, Stripe
How to mutate your immutable log | Andrey Falko, Stripe
 
Matching the Scale at Tinder with Kafka
Matching the Scale at Tinder with Kafka Matching the Scale at Tinder with Kafka
Matching the Scale at Tinder with Kafka
 
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at UberDisaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
 
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
 
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...
 
Creating an Elastic Platform Using Kafka and Microservices in OpenShift
Creating an Elastic Platform Using Kafka and Microservices in OpenShift Creating an Elastic Platform Using Kafka and Microservices in OpenShift
Creating an Elastic Platform Using Kafka and Microservices in OpenShift
 
Kafka in the Enterprise—A Two-Year Journey to Build a Data Streaming Platform...
Kafka in the Enterprise—A Two-Year Journey to Build a Data Streaming Platform...Kafka in the Enterprise—A Two-Year Journey to Build a Data Streaming Platform...
Kafka in the Enterprise—A Two-Year Journey to Build a Data Streaming Platform...
 
What's new in confluent platform 5.4 online talk
What's new in confluent platform 5.4 online talkWhat's new in confluent platform 5.4 online talk
What's new in confluent platform 5.4 online talk
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
 
Processing IoT Data from End to End with MQTT and Apache Kafka
Processing IoT Data from End to End with MQTT and Apache Kafka Processing IoT Data from End to End with MQTT and Apache Kafka
Processing IoT Data from End to End with MQTT and Apache Kafka
 
Stream Processing with Kafka and KSQL in Jupiter | Namit Mahuvakar, Jupiter
Stream Processing with Kafka and KSQL in Jupiter | Namit Mahuvakar, JupiterStream Processing with Kafka and KSQL in Jupiter | Namit Mahuvakar, Jupiter
Stream Processing with Kafka and KSQL in Jupiter | Namit Mahuvakar, Jupiter
 
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
 
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
 
Cloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQ
Cloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQCloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQ
Cloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQ
 
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
 
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
 
Building Event Streaming Microservices with Spring Boot and Apache Kafka | Ja...
Building Event Streaming Microservices with Spring Boot and Apache Kafka | Ja...Building Event Streaming Microservices with Spring Boot and Apache Kafka | Ja...
Building Event Streaming Microservices with Spring Boot and Apache Kafka | Ja...
 
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
 

Similar to Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Your Streaming Data Platform

Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Introduction to Kafka
Introduction to KafkaIntroduction to Kafka
Introduction to KafkaAkash Vacher
 
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...HostedbyConfluent
 
HBase Meetup @ Cask HQ 09/25
HBase Meetup @ Cask HQ 09/25HBase Meetup @ Cask HQ 09/25
HBase Meetup @ Cask HQ 09/25Cask Data
 
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)Spark Summit
 
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017Monal Daxini
 
What's New in IBM Streams V4.1
What's New in IBM Streams V4.1What's New in IBM Streams V4.1
What's New in IBM Streams V4.1lisanl
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams APIconfluent
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Timothy Spann
 
Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017Nitin Kumar
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingTimothy Spann
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven productsLars Albertsson
 
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...confluent
 
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ... The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...Josef Adersberger
 
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...QAware GmbH
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Guido Schmutz
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleApache Kafka TLV
 
BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comCedric Vidal
 

Similar to Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Your Streaming Data Platform (20)

Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Introduction to Kafka
Introduction to KafkaIntroduction to Kafka
Introduction to Kafka
 
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
 
HBase Meetup @ Cask HQ 09/25
HBase Meetup @ Cask HQ 09/25HBase Meetup @ Cask HQ 09/25
HBase Meetup @ Cask HQ 09/25
 
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
 
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
 
What's New in IBM Streams V4.1
What's New in IBM Streams V4.1What's New in IBM Streams V4.1
What's New in IBM Streams V4.1
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
 
Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
 
Chti jug - 2018-06-26
Chti jug - 2018-06-26Chti jug - 2018-06-26
Chti jug - 2018-06-26
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven products
 
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
 
Jug - ecosystem
Jug -  ecosystemJug -  ecosystem
Jug - ecosystem
 
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ... The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
 
BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.com
 

Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Your Streaming Data Platform

  • 1. Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Your Streaming Data Platform. San Francisco, CA, October 17th, 2018
  • 3. About Me • Bob Lehmann • Started life as an Electrical Engineer, switched to IT 20 years ago • Have worked with data in many capacities – sensors and controls, manufacturing process data, ERP systems, enterprise data, etc. • Architect and manage the Enterprise DataHub at Bayer • Live in St. Louis, MO
  • 8. The Journey Starts Here. Circa 2014… • Siloed IT org with different tech stacks (Bayer IT org > 4000) • MANY legacy systems and platforms • Bayer adopted “cloud first” philosophy • Embraced open source (finally :)) • Cross-functional team of architects was established to define strategies and architectures. DIRECTIVE: Develop a strategy for cloud-based, enterprise-wide analytics
  • 9. Houston, We Have A Data Problem. Legacy: • Data sprawl • Data inconsistency • Difficult to find data • Can’t propagate changes fast enough. Cloud: • Increased data sprawl • Can’t forklift applications to cloud • Cloud apps need on-prem data and vice versa. Volume, Variety, Velocity, Veracity
  • 10. Let’s Clean Up This Mess! [Two diagrams, courtesy of Jay Kreps. First: point-to-point integrations among relational databases, apps, caches and derived stores, a relational data warehouse, ODS, Hadoop, Splunk, ActiveMQ, a key-value store, log aggregation and monitoring, connected via polling, CSV dumps, transforms, HTTP, NFS, rsync and Data Guard. Second: Kafka at the center, feeding log search, monitoring, real-time analytics, social graph, search, newsfeed, OLAP, Samza, key-value storage, Oracle, security & fraud, Hadoop, Teradata and apps.]
  • 11. The Enterprise DataHub – Original Concept - Kafka clusters on-prem and in AWS - Datacenter agnostic - Establish cross-datacenter connection - Replicate across datacenters - Apps only interact with local Kafka cluster - Use Avro schemas [Diagram callouts: VPN; Use Schema Registry; MirrorMaker?; Maybe GCP in the future?]
  • 12. Enterprise DataHub POC, circa 2015 (Confluent 1.0 / Kafka 0.8) [Diagram: two Kafka clusters (Topics 1–6 each), each with REST Proxy, Schema Registry and ZooKeeper, linked over a VPN tunnel by MirrorMaker instances running on an EC2 instance. Connected systems shown: Oracle, SQL Server, Cloud Foundry producers, a network monitor application, a ticketing application, other monitoring apps, Postgres RDS and Cloud Foundry consumers.]
  • 13. What a Long, Strange Trip It’s Been! First Phase: The Launch • Second Phase: Reaching Orbit • Third Phase: Escaping Gravity • Current Phase: To Infinity And Beyond… [Timeline labels: Kafka, Kafka Streams, KSQL, Kafka Connect, Portal, Documentation Site]
  • 14. First Phase - The Launch (September 2016) • Confluent 2.0 / Kafka 0.9 • Security via SSL certs – developed patch to dynamically load broker certs • Replicant – process to replace MirrorMaker • Basic platform monitoring • Most user interaction via command line tools [Diagram: two Kafka clusters (Topics 1–6 each), each with REST Proxy and Schema Registry, linked over a VPN tunnel by Replicant instances running on EC2 Container Service; supporting components: Core Platform Monitoring, Documentation Portal, Kafka Manager.]
  • 15. Phase 1 Results • Java/Scala Developers • AWS skills • Linux command line • Data movement from on-prem to cloud • Customer 360 – bidirectional movement
  • 16. Phase 2 - Reaching Orbit • Self Service User Portal • Improved replication process – Replikant • Security improvements • Infrastructure automation • Monitoring for topics and consumers • Slack integration for alerts • Initial evaluation of Kafka Connect [Diagram: two Kafka clusters (Topics 1–6 each), each with REST Proxy and Schema Registry, linked over a VPN tunnel by two Replikant clusters; supporting components: Core Platform Monitoring, Documentation Portal, Kafka Manager, User Self-Service Portal, Consumer and Replikant Monitoring, Slack Integration.]
  • 21. Phase 2 Results • Java/Scala Developers • AWS skills • Linux command line • Data movement from on-prem to cloud • Customer 360 – bidirectional movement • Python, Node, etc. • Some Analytics • Some BI • Event sourcing • Company360 • Exadata CDC • Incremental migration to the cloud
  • 22. Stage 3 - Leaving Orbit • Kubernetes / Kafka Connect • Expansion to Google Cloud • CDC from SAP using Informatica Data Replication • Integration with Data Historian • Detailed training class [Diagram: the Phase 2 platform (Kafka clusters, Replikant clusters, REST Proxy, Schema Registry, VPN tunnel, monitoring, portals, Kafka Manager, Slack integration) plus Kubernetes clusters running JDBC, S3, JMS and Elasticsearch connectors, and Data Historian ingestion into EMR (Hive, Spark, Presto) over S3/Parquet.]
  • 23. Kafka Connect and Kubernetes. Kafka Connect: • Code-free, simple • Connector universe is expanding rapidly • Secure – SSL connection • Avro support. Connectors In Use: • JDBC (Oracle, Postgres, MySQL, SQL Server, Teradata, Redshift) • JMS • S3 • File • Elasticsearch. Kubernetes: • Highly scalable • Cluster in each environment • Keeps processing local to the environment • Efficient use of resources • Increased security
  • 26. Phase 3 Results • Java/Scala Developers • AWS skills • Linux command line • Data movement from on-prem to cloud • Customer 360 – bidirectional movement • Data movement between all datacenters on central platform • LIMS Integration • Serverless apps in AWS • SAP/Oracle CDC • Product360 in GCP - Go • BI / Reporting • Analytics platform • Python, Node, etc. • Some Analytics • Some BI • Event sourcing • Company360 • Exadata CDC • Incremental migration to the cloud
  • 27. Current Phase – To Infinity And Beyond • Bring stream processing to the masses! • Data validation across the pipeline • SQL interface for Kafka (using Presto) • Improve topic discoverability and reuse • Expose consumer metrics to end users [Diagram: two Kafka clusters (Topics 1–6 each) with REST Proxy and Schema Registry, linked over a VPN tunnel by Replikant clusters; a Kubernetes cluster in each environment hosting I/O connectors (File, JDBC, JMS, S3, Elasticsearch) and stream processing (KSQL, Kafka Streams, custom stream processors); Data Historian ingestion into EMR (Hive, Spark, Presto) over S3/Parquet; Presto SQL Engine; Haystack Metadata Platform.]
  • 28. Many Clients In Many Places [Diagram: the DataHub platform (Kafka clusters, Replikant clusters, REST Proxy, Schema Registry, VPN tunnel, Core Platform Monitoring, Documentation Portal, Kafka Manager, User Self-Service Portal, Consumer and Replikant Monitoring, Slack Integration), with a “Managed By DataHub Team” label, surrounded by client systems: GoldenGate CDC, Oracle, SQL Server, Teradata, Exadata, Neo4j, Cloud Foundry producers, consumers and applications, legacy apps (WebLogic), EMR, S3/Parquet, Postgres, MySQL, Cassandra, Redshift, and integration with salesforce.com.]
  • 30. Use Case – CDC From SAP to Data Historian. Schema Splitter converts an input stream with a “generic” schema (many different tables flowing through one topic) to individual table streams with table-specific schemas. [Diagram: SAP → Informatica Data Replication → Kafka (with Schema Registry and ZooKeeper) → Replikant cluster over a VPN tunnel → Kafka; a Kubernetes cluster running Schema Splitter, KSQL Filter, KSQL Agg and a Kafka Streams processor; Data Historian ingestion into EMR (Hive, Spark, Presto) over S3/Parquet; also shown: Oracle ETL, Teradata Staging, Teradata Final, DB Schemas 1–4 and Tables 1–3.]
  • 31. Topic Reuse • Not as good as we would like. Why? • Discoverability • Developers are not altruistic when creating schemas
  • 32. MetaData Platform • Haystack is our enterprise metadata platform • Kafka topic metadata is automatically synced to Haystack • Haystack links back to the DataHub portal • Will be able to search for topics in Haystack and immediately find the topic in the DataHub portal
  • 33. Presto • Presto is being implemented as an enterprise data virtualization solution, not just for the DataHub • Will also be used to provide data validation across the pipeline via SQL. Example: join a topic in Kafka to a table in Postgres to confirm that all data has transferred correctly • Will also be used to provide a SQL interface in the DataHub portal to allow querying for specific messages • Developed a patch to the Presto Kafka connector to connect with SSL and deserialize Avro
  • 35. Phase 4 – Current Phase • Product360 in GCP - Go • BI / Reporting • Analytics platform • Java/Scala Developers • AWS skills • Linux command line • Data movement from on-prem to cloud • Customer 360 – bidirectional movement • Data Stewards • Everyone else!! • Global streaming • IoT data • SAP HANA • Data movement between all datacenters on central platform • LIMS Integration • Serverless apps in AWS • SAP/Oracle CDC • Python, Node, etc. • Some Analytics • Some BI • Event sourcing • Company360 • Exadata CDC • Incremental migration to the cloud
  • 36. Future • Move as much ETL as possible to the streaming layer • Monitoring and auditing of data flow across the pipeline • Consumer monitoring and configurable alerting • Improved data governance • Integrate with enterprise security platform
  • 37. The Enterprise DataHub is a living, scalable, robust central nervous system for data that facilitates the seamless acquisition, transport and processing of information in real time across multiple datacenter and cloud environments.
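
Slide 30 describes a Schema Splitter that turns one "generic" stream carrying many tables into individual per-table streams with table-specific schemas. The routing idea might look roughly like the sketch below. It is plain Python with no Kafka client, and it is not the DataHub's actual implementation: the record layout (`table` and `payload` keys), the function names `split_by_table` and `infer_schema`, and the `sap.<table>` topic naming are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

# Hypothetical generic CDC record shape (an assumption, not from the slides):
#   {"table": "<source table name>", "payload": {<column>: <value>, ...}}
Record = Dict[str, Any]


def split_by_table(records: Iterable[Record],
                   topic_prefix: str = "sap") -> Dict[str, List[Dict[str, Any]]]:
    """Group generic records into per-table streams, keyed by output topic name."""
    streams: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        # Hypothetical naming convention: one output topic per source table.
        topic = f"{topic_prefix}.{record['table'].lower()}"
        streams[topic].append(record["payload"])
    return dict(streams)


def infer_schema(rows: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Derive a simple table-specific field -> type-name mapping from the rows."""
    schema: Dict[str, str] = {}
    for row in rows:
        for field, value in row.items():
            # Keep the first type seen for each field.
            schema.setdefault(field, type(value).__name__)
    return schema


if __name__ == "__main__":
    generic_stream = [
        {"table": "MARA", "payload": {"matnr": "1", "weight": 2.5}},
        {"table": "KNA1", "payload": {"kunnr": "9"}},
        {"table": "MARA", "payload": {"matnr": "2", "weight": 1.0}},
    ]
    for topic, rows in split_by_table(generic_stream).items():
        print(topic, infer_schema(rows), len(rows))
```

In a real pipeline the per-table schema would more plausibly be an Avro schema registered in Schema Registry rather than inferred from values; the inference step here only stands in for that to keep the sketch self-contained.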