SlideShare a Scribd company logo
1 of 42
Download to read offline
Tame your router data
with Apache Kafka and Apache Druid
Rachel Pedreschi
rachel.pedreschi@imply.io
Eric Graham
eric.graham@imply.io
Tell ‘em what you are gonna tell ‘em
! The Who? Intro to your (slightly) nervous speakers
! The Why? What is the problem?
! The How? Introducing the OSS stack to solve all the world’s ills
! The Demo. So much demo.
2
The Who
3
Eric Graham. 

The Man, The Legend.
The one that wrote the paper
that got us accepted to this conference.
Rachel Pedreschi. 

Mostly Overhead.
The one that wrote the abstract
that got us accepted to this conference.
4
Part of the problem - The Data
5
Streaming Telemetry Flow Syslog Augmentation
A recent advancement to replace
SNMP. Provides streaming interface
vs. older pull model. Gives network
operators much quicker response to
deviations.
detailed network analysis around
TCP/IP flows through routers,
switches and firewalls. Flow data
includes src/dst MAC, src/dst IP,
Protocol, src/dst port, in/out
interface ID, TCP flags, TOS, BGP
information, Bytes/Packets and
more
System logs for routers and
switches
Routing, DNS, usernames make
visibility that much clearer
Telegraf, pipeline, sflowd
Tools - examples: PMACCT, Cento,
NIFI/NFDump
Syslog-ng
ksql, kstream, lookup tables, BGP
routing
Used to collect metrics on interface
stats, cpu, memory, disk space and
more.
Get detailed information on TCP/IP
packets
Textual information on whats going
on
Clearer visibility to make rapid decisions
Let’s make the data part of the solution!
6
OSS to the rescue!
7
Network analytics pipeline
Streaming architectures are true-to-life and enable faster decision cycles.
8Confidential. Do not redistribute.
Routers, Switches, Firewalls,
Hosts
Ingest
Application
Hostname mapping
Microservice name
Application name
Routing lookups
Enhance the data
Syslog
BGP, Flow
The Answer: Apache Kafka and Apache Druid
! Both built for modern data
architectures.
! Both can handle data at scale.
(largest Druid cluster over
2000 servers, 50Pb raw data)
! Full redundancy.
! Druid was developed for real-
time analytics.
! Both work in harmony together
helping get answers fast.
9
!10
What the heck is Apache Druid and Why
Should I Care?
11
!12
!13
!14
!15
The 90s: data warehouses and data marts
Tightly coupled architecture with limited flexibility.
Data
Data
Data
Data Sources
ETL Data
Warehouse
Processing Store and Compute
Analytics
Reporting
Data mining
Querying
Confidential. Do not redistribute. 16
!17
!18
The 2000s - present: data lakes
Separation of storage and compute enables flexibility in tools.
19
Data
Data
Data
Mapreduce
Reporting and Analytics
ELT
Data
Warehouse
ML/AI Engine
Search
system
Data
Lake
StorageData Sources
Confidential. Do not redistribute.
!20
The Now: data rivers
Streaming architectures enable faster decision cycles.
21
Data
Data
Data
Data Sources
Message bus
Data
Lake
Streaming OLAP
Confidential. Do not redistribute.
The problem
22
The problem
23
Typical Big Data++ Challenges
! Scale: when data is large, we need a lot of servers
! Speed: aiming for sub-second response time
! Complexity: too much fine grain to precompute
! High dimensionality: 10s or 100s of dimensions
! Concurrency: many users and tenants
! Freshness: load from streams
24
What were the options?
25
Search
platform
OLAP
! Real-time ingestion
! Flexible schema
! Full text search
! Batch ingestion
! Efficient storage
! Fast analytic queries
Timeseries
database
! Optimized storage for
time-based datasets
! Time-based functions
26
! Batch ingestion
! Efficient storage
! Fast analytic queries
Confidential. Do not redistribute.
Search
platform
OLAP
! Real-time ingestion
! Flexible schema
! Full text search
Timeseries
database
! Optimized storage for
time-based datasets
! Time-based functions
high performance
analytics database for
event-driven data
27
These guys have played a Druid…
28
Source: http://druid.io/druid-powered.html and imply.io
+ many more!
Gratuitous Customer Quote
“The performance is great ... some of the tables that we have internally in
Druid have billions and billions of events in them, and we’re scanning
them in under a second.”
29
Source: https://www.infoworld.com/article/2949168/hadoop/yahoo-struts-its-hadoop-stuff.html
From Yahoo:
Shall we take a look?
30
Network analytics pipeline
Streaming architectures are true-to-life and enable faster decision cycles.
31Confidential. Do not redistribute.
Routers, Switches, Firewalls,
Hosts
Ingest
Application
Hostname mapping
Microservice name
Application name
Routing lookups
Enhance the data
Syslog
BGP, Flow
!32
curl -X POST -H 'Content-Type:
application/json' -d @supervisor-spec.json
http://localhost:8090/druid/indexer/v1/
supervisor
33
Use Cases
34
Use Case: Network troubleshooting
35
Use Case: Network troubleshooting
! Dashboards that include logs, flow and snmp (single pane of glass) for quick cross dataset
visualizations.
! Visualize spikes and dips and easily filter on specific data.
! Enhance the data to visualize names and not IPs/MAC addresses – but get the IPs when you
need them.
! Dashboards to show most interesting, common areas of interest.
! Alerting notifications for threshold breaches or deviation from normal.
! Is it the network or application? Enhanced datasets provide quick answers.
36
Use Case: DDOS and security
! Visualize spikes and dips and easily filter on specific data. (Geo, Attack vectors, known bad
actors)
! DDOS specific alerting (UDP badports, TCP Flags, Number of unique IPs, Overall increase)
! Hooks to multiple notification channels for always on notifications.
! Webhooks for integration with back office systems.
! Easily drill-down into
37
Use Case: BGP Analytics
! PMACCT can collect and add BGP information by peering with a BGP speaker.
! Use Kafka KSQL or Kstream to augment data with BGP information.
! Visualize the BGP AS_PATH (where you traffic is going across the Internet).
! Who are your top transit or peering partners.
! Top Source and Destination ASNs.
! Top BGP communities.
38
Download
Druid community site (current): http://druid.io/
Druid community site (new): https://druid.apache.org/
Imply distribution: https://imply.io/get-started
39
Contribute
40
https://github.com/apache/druid
Stay in touch
41
@druidio
Join the community!
http://druid.io/community
Come by our booth for a druid t-shirt and to learn more!
Follow the Druid project on Twitter!
Thank you!
!42
Hold for applause…

More Related Content

What's hot

Cluster management with Kubernetes
Cluster management with KubernetesCluster management with Kubernetes
Cluster management with KubernetesSatnam Singh
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
Apache Hadoop YARN
Apache Hadoop YARNApache Hadoop YARN
Apache Hadoop YARNAdam Kawa
 
Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data AnalysisApache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data AnalysisDataWorks Summit/Hadoop Summit
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaJiangjie Qin
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?Kai Wähner
 
Data Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaData Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaDataWorks Summit
 
Transforming Infrastructure into Code - Importing existing cloud resources u...
Transforming Infrastructure into Code  - Importing existing cloud resources u...Transforming Infrastructure into Code  - Importing existing cloud resources u...
Transforming Infrastructure into Code - Importing existing cloud resources u...Shih Oon Liong
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupDatabricks
 
Well architected ML platforms for Enterprise Data Science
Well architected ML platforms for Enterprise Data ScienceWell architected ML platforms for Enterprise Data Science
Well architected ML platforms for Enterprise Data ScienceLeela Krishna Kandrakota
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimDatabricks
 
Understanding Kubernetes
Understanding KubernetesUnderstanding Kubernetes
Understanding KubernetesTu Pham
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
 
SRV401 Deep Dive on Amazon Elastic File System (Amazon EFS)
SRV401 Deep Dive on Amazon Elastic File System (Amazon EFS)SRV401 Deep Dive on Amazon Elastic File System (Amazon EFS)
SRV401 Deep Dive on Amazon Elastic File System (Amazon EFS)Amazon Web Services
 

What's hot (20)

ELK Stack
ELK StackELK Stack
ELK Stack
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
Cluster management with Kubernetes
Cluster management with KubernetesCluster management with Kubernetes
Cluster management with Kubernetes
 
Integrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data LakesIntegrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data Lakes
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Apache Hadoop YARN
Apache Hadoop YARNApache Hadoop YARN
Apache Hadoop YARN
 
Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data AnalysisApache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?
 
Data Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaData Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and Kafka
 
Transforming Infrastructure into Code - Importing existing cloud resources u...
Transforming Infrastructure into Code  - Importing existing cloud resources u...Transforming Infrastructure into Code  - Importing existing cloud resources u...
Transforming Infrastructure into Code - Importing existing cloud resources u...
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Elk
Elk Elk
Elk
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
Well architected ML platforms for Enterprise Data Science
Well architected ML platforms for Enterprise Data ScienceWell architected ML platforms for Enterprise Data Science
Well architected ML platforms for Enterprise Data Science
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
 
Machine Learning Operations & Azure
Machine Learning Operations & AzureMachine Learning Operations & Azure
Machine Learning Operations & Azure
 
Understanding Kubernetes
Understanding KubernetesUnderstanding Kubernetes
Understanding Kubernetes
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
SRV401 Deep Dive on Amazon Elastic File System (Amazon EFS)
SRV401 Deep Dive on Amazon Elastic File System (Amazon EFS)SRV401 Deep Dive on Amazon Elastic File System (Amazon EFS)
SRV401 Deep Dive on Amazon Elastic File System (Amazon EFS)
 

Similar to How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply Data) Kafka Summit NYC 2019

Open Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOCOpen Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOCSheetal Dolas
 
Lecture12 ie321 dr_atifshahzad - networks
Lecture12 ie321 dr_atifshahzad - networksLecture12 ie321 dr_atifshahzad - networks
Lecture12 ie321 dr_atifshahzad - networksAtif Shahzad
 
Big Data to SMART Data : Process Scenario
Big Data to SMART Data : Process ScenarioBig Data to SMART Data : Process Scenario
Big Data to SMART Data : Process ScenarioCHAKER ALLAOUI
 
eProsima RPC over DDS - OMG June 2013 Berlin Meeting
eProsima RPC over DDS - OMG June 2013 Berlin MeetingeProsima RPC over DDS - OMG June 2013 Berlin Meeting
eProsima RPC over DDS - OMG June 2013 Berlin MeetingJaime Martin Losa
 
Splunk app for stream
Splunk app for stream Splunk app for stream
Splunk app for stream csching
 
Realtime Detection of DDOS attacks using Apache Spark and MLLib
Realtime Detection of DDOS attacks using Apache Spark and MLLibRealtime Detection of DDOS attacks using Apache Spark and MLLib
Realtime Detection of DDOS attacks using Apache Spark and MLLibRyan Bosshart
 
Kentik Network@Scale (Dan Ellis)
Kentik Network@Scale (Dan Ellis)Kentik Network@Scale (Dan Ellis)
Kentik Network@Scale (Dan Ellis)gvillain
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
 
Large-Scale System Integration with DDS for SCADA, C2, and Finance
Large-Scale System Integration with DDS for SCADA, C2, and FinanceLarge-Scale System Integration with DDS for SCADA, C2, and Finance
Large-Scale System Integration with DDS for SCADA, C2, and FinanceRick Warren
 
Splunk Stream - Einblicke in Netzwerk Traffic
Splunk Stream - Einblicke in Netzwerk TrafficSplunk Stream - Einblicke in Netzwerk Traffic
Splunk Stream - Einblicke in Netzwerk TrafficSplunk
 
Introduction to networking
Introduction to networkingIntroduction to networking
Introduction to networkingMohsen Sarakbi
 
Insider Threat Visualization - HITB 2007, Kuala Lumpur
Insider Threat Visualization - HITB 2007, Kuala LumpurInsider Threat Visualization - HITB 2007, Kuala Lumpur
Insider Threat Visualization - HITB 2007, Kuala LumpurRaffael Marty
 
Malware vs Big Data
Malware vs Big DataMalware vs Big Data
Malware vs Big DataFrank Denis
 
Solving the Really Big Tech Problems with IoT
 Solving the Really Big Tech Problems with IoT Solving the Really Big Tech Problems with IoT
Solving the Really Big Tech Problems with IoTEric Kavanagh
 

Similar to How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply Data) Kafka Summit NYC 2019 (20)

Open Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOCOpen Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOC
 
Cisco OpenSOC
Cisco OpenSOCCisco OpenSOC
Cisco OpenSOC
 
Lecture12 ie321 dr_atifshahzad - networks
Lecture12 ie321 dr_atifshahzad - networksLecture12 ie321 dr_atifshahzad - networks
Lecture12 ie321 dr_atifshahzad - networks
 
Big Data to SMART Data : Process Scenario
Big Data to SMART Data : Process ScenarioBig Data to SMART Data : Process Scenario
Big Data to SMART Data : Process Scenario
 
eProsima RPC over DDS - OMG June 2013 Berlin Meeting
eProsima RPC over DDS - OMG June 2013 Berlin MeetingeProsima RPC over DDS - OMG June 2013 Berlin Meeting
eProsima RPC over DDS - OMG June 2013 Berlin Meeting
 
Tech
TechTech
Tech
 
Splunk app for stream
Splunk app for stream Splunk app for stream
Splunk app for stream
 
Realtime Detection of DDOS attacks using Apache Spark and MLLib
Realtime Detection of DDOS attacks using Apache Spark and MLLibRealtime Detection of DDOS attacks using Apache Spark and MLLib
Realtime Detection of DDOS attacks using Apache Spark and MLLib
 
Kentik Network@Scale (Dan Ellis)
Kentik Network@Scale (Dan Ellis)Kentik Network@Scale (Dan Ellis)
Kentik Network@Scale (Dan Ellis)
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
Large-Scale System Integration with DDS for SCADA, C2, and Finance
Large-Scale System Integration with DDS for SCADA, C2, and FinanceLarge-Scale System Integration with DDS for SCADA, C2, and Finance
Large-Scale System Integration with DDS for SCADA, C2, and Finance
 
BIG DATA
BIG DATABIG DATA
BIG DATA
 
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
 
Splunk Stream - Einblicke in Netzwerk Traffic
Splunk Stream - Einblicke in Netzwerk TrafficSplunk Stream - Einblicke in Netzwerk Traffic
Splunk Stream - Einblicke in Netzwerk Traffic
 
Introduction to networking
Introduction to networkingIntroduction to networking
Introduction to networking
 
Big Data: hype or necessity?
Big Data: hype or necessity?Big Data: hype or necessity?
Big Data: hype or necessity?
 
Insider Threat Visualization - HITB 2007, Kuala Lumpur
Insider Threat Visualization - HITB 2007, Kuala LumpurInsider Threat Visualization - HITB 2007, Kuala Lumpur
Insider Threat Visualization - HITB 2007, Kuala Lumpur
 
Networking 101 english
Networking 101   englishNetworking 101   english
Networking 101 english
 
Malware vs Big Data
Malware vs Big DataMalware vs Big Data
Malware vs Big Data
 
Solving the Really Big Tech Problems with IoT
 Solving the Really Big Tech Problems with IoT Solving the Really Big Tech Problems with IoT
Solving the Really Big Tech Problems with IoT
 

More from confluent

Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutesconfluent
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Eraconfluent
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 

More from confluent (20)

Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 

Recently uploaded

Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...FIDO Alliance
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FIDO Alliance
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfFIDO Alliance
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsUXDXConf
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsStefano
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxJennifer Lim
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxDavid Michel
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?Mark Billinghurst
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceSamy Fodil
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101vincent683379
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoTAnalytics
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfFIDO Alliance
 
THE BEST IPTV in GERMANY for 2024: IPTVreel
THE BEST IPTV in  GERMANY for 2024: IPTVreelTHE BEST IPTV in  GERMANY for 2024: IPTVreel
THE BEST IPTV in GERMANY for 2024: IPTVreelreely ones
 
The UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, OcadoThe UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, OcadoUXDXConf
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastUXDXConf
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekCzechDreamin
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfFIDO Alliance
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...CzechDreamin
 
Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKUXDXConf
 
Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024TopCSSGallery
 

Recently uploaded (20)

Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering Teams
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
THE BEST IPTV in GERMANY for 2024: IPTVreel
THE BEST IPTV in  GERMANY for 2024: IPTVreelTHE BEST IPTV in  GERMANY for 2024: IPTVreel
THE BEST IPTV in GERMANY for 2024: IPTVreel
 
The UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, OcadoThe UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, Ocado
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at Comcast
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAK
 
Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024
 

How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply Data) Kafka Summit NYC 2019

  • 1. Tame your router data with Apache Kafka and Apache Druid Rachel Pedreschi rachel.pedreschi@imply.io Eric Graham eric.graham@imply.io
  • 2. Tell ‘em what you are gonna tell ‘em ! The Who? Intro to your (slightly) nervous speakers ! The Why? What is the problem? ! The How? Introducing the OSS stack to solve all the world’s ills ! The Demo. So much demo. 2
  • 3. The Who 3 Eric Graham. 
 The Man, The Legend. The one that wrote the paper that got us accepted to this conference. Rachel Pedreschi. 
 Mostly Overhead. The one that wrote the abstract that got us accepted to this conference.
  • 4. 4
  • 5. Part of the problem - The Data 5 Streaming Telemetry Flow Syslog Augmentation A recent advancement to replace SNMP. Provides streaming interface vs. older pull model. Gives network operators much quicker response to deviations. detailed network analysis around TCP/IP flows through routers, switches and firewalls. Flow data includes src/dst MAC, src/dst IP, Protocol, src/dst port, in/out interface ID, TCP flags, TOS, BGP information, Bytes/Packets and more System logs for routers and switches Routing, DNS, usernames make visibility that much clearer Telegraf, pipeline, sflowd Tools - examples: PMACCT, Cento, NIFI/NFDump Syslog-ng ksql, kstream, lookup tables, BGP routing Used to collect metrics on interface stats, cpu, memory, disk space and more. Get detailed information on TCP/IP packets Textual information on whats going on Clearer visibility to make rapid decisions
  • 6. Let’s make the data part of the solution! 6
  • 7. OSS to the rescue! 7
  • 8. Network analytics pipeline Streaming architectures are true-to-life and enable faster decision cycles. 8Confidential. Do not redistribute. Routers, Switches, Firewalls, Hosts Ingest Application Hostname mapping Microservice name Application name Routing lookups Enhance the data Syslog BGP, Flow
  • 9. The Answer: Apache Kafka and Apache Druid ! Both built for modern data architectures. ! Both can handle data at scale. (largest Druid cluster over 2000 servers, 50Pb raw data) ! Full redundancy. ! Druid was developed for real- time analytics. ! Both work in harmony together helping get answers fast. 9
  • 10. !10
  • 11. What the heck is Apache Druid and Why Should I Care? 11
  • 12. !12
  • 13. !13
  • 14. !14
  • 15. !15
  • 16. The 90s: data warehouses and data marts Tightly coupled architecture with limited flexibility. Data Data Data Data Sources ETL Data Warehouse Processing Store and Compute Analytics Reporting Data mining Querying Confidential. Do not redistribute. 16
  • 17. !17
  • 18. !18
  • 19. The 2000s - present: data lakes Separation of storage and compute enables flexibility in tools. 19 Data Data Data Mapreduce Reporting and Analytics ELT Data Warehouse ML/AI Engine Search system Data Lake StorageData Sources Confidential. Do not redistribute.
  • 20. !20
  • 21. The Now: data rivers Streaming architectures enable faster decision cycles. 21 Data Data Data Data Sources Message bus Data Lake Streaming OLAP Confidential. Do not redistribute.
  • 24. Typical Big Data++ Challenges ! Scale: when data is large, we need a lot of servers ! Speed: aiming for sub-second response time ! Complexity: too much fine grain to precompute ! High dimensionality: 10s or 100s of dimensions ! Concurrency: many users and tenants ! Freshness: load from streams 24
  • 25. What were the options? 25 Search platform OLAP ! Real-time ingestion ! Flexible schema ! Full text search ! Batch ingestion ! Efficient storage ! Fast analytic queries Timeseries database ! Optimized storage for time-based datasets ! Time-based functions
  • 26. 26 ! Batch ingestion ! Efficient storage ! Fast analytic queries Confidential. Do not redistribute. Search platform OLAP ! Real-time ingestion ! Flexible schema ! Full text search Timeseries database ! Optimized storage for time-based datasets ! Time-based functions high performance analytics database for event-driven data
  • 27. 27
  • 28. These guys have played a Druid… 28 Source: http://druid.io/druid-powered.html and imply.io + many more!
  • 29. Gratuitous Customer Quote “The performance is great ... some of the tables that we have internally in Druid have billions and billions of events in them, and we’re scanning them in under a second.” 29 Source: https://www.infoworld.com/article/2949168/hadoop/yahoo-struts-its-hadoop-stuff.html From Yahoo:
  • 30. Shall we take a look? 30
  • 31. Network analytics pipeline Streaming architectures are true-to-life and enable faster decision cycles. 31Confidential. Do not redistribute. Routers, Switches, Firewalls, Hosts Ingest Application Hostname mapping Microservice name Application name Routing lookups Enhance the data Syslog BGP, Flow
  • 32. !32 curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json http://localhost:8090/druid/indexer/v1/ supervisor
  • 33. 33
  • 35. Use Case: Network troubleshooting 35
  • 36. Use Case: Network troubleshooting ! Dashboards that include logs, flow and snmp (single pane of glass) for quick cross dataset visualizations. ! Visualize spikes and dips and easily filter on specific data. ! Enhance the data to visualize names and not IPs/MAC addresses – but get the IPs when you need them. ! Dashboards to show most interesting, common areas of interest. ! Alerting notifications for threshold breaches or deviation from normal. ! Is it the network or application? Enhanced datasets provide quick answers. 36
  • 37. Use Case: DDOS and security ! Visualize spikes and dips and easily filter on specific data. (Geo, Attack vectors, known bad actors) ! DDOS specific alerting (UDP badports, TCP Flags, Number of unique IPs, Overall increase) ! Hooks to multiple notification channels for always on notifications. ! Webhooks for integration with back office systems. ! Easily drill-down into 37
  • 38. Use Case: BGP Analytics ! PMACCT can collect and add BGP information by peering with a BGP speaker. ! Use Kafka KSQL or Kstream to augment data with BGP information. ! Visualize the BGP AS_PATH (where you traffic is going across the Internet). ! Who are your top transit or peering partners. ! Top Source and Destination ASNs. ! Top BGP communities. 38
  • 39. Download Druid community site (current): http://druid.io/ Druid community site (new): https://druid.apache.org/ Imply distribution: https://imply.io/get-started 39
  • 41. Stay in touch 41 @druidio Join the community! http://druid.io/community Come by our booth for a druid t-shirt and to learn more! Follow the Druid project on Twitter!
  • 42. Thank you! !42 Hold for applause…