Tame your router data
with Apache Kafka and Apache Druid
Rachel Pedreschi
rachel.pedreschi@imply.io
Eric Graham
eric.graham@imply.io
Tell ‘em what you are gonna tell ‘em
• The Who? Intro to your (slightly) nervous speakers
• The Why? What is the problem?
• The How? Introducing the OSS stack to solve all the world’s ills
• The Demo. So much demo.
The Who
Eric Graham. The Man, The Legend.
The one that wrote the paper that got us accepted to this conference.
Rachel Pedreschi. Mostly Overhead.
The one that wrote the abstract that got us accepted to this conference.
Part of the problem - The Data
Streaming Telemetry
  What it is: A recent advancement to replace SNMP. Provides a streaming interface vs. the older pull model. Gives network operators much quicker response to deviations.
  Tools: Telegraf, pipeline, sflowd
  Use: Collect metrics on interface stats, CPU, memory, disk space and more.

Flow
  What it is: Detailed network analysis around TCP/IP flows through routers, switches and firewalls. Flow data includes src/dst MAC, src/dst IP, protocol, src/dst port, in/out interface ID, TCP flags, TOS, BGP information, bytes/packets and more.
  Tools (examples): PMACCT, Cento, NIFI/NFDump
  Use: Get detailed information on TCP/IP packets.

Syslog
  What it is: System logs for routers and switches.
  Tools: Syslog-ng
  Use: Textual information on what's going on.

Augmentation
  What it is: Routing, DNS and usernames make visibility that much clearer.
  Tools: ksql, kstream, lookup tables, BGP routing
  Use: Clearer visibility to make rapid decisions.
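To make the flow fields listed above concrete, here is a minimal sketch of one flow record serialized as JSON, the kind of message a collector might publish to a Kafka topic. The class name, field names and sample values are all assumptions for illustration; they are not the schema of PMACCT or any other tool named on the slide.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FlowRecord:
    """One flow event. Field names are hypothetical, not a collector's schema."""
    timestamp: str      # ISO-8601 event time
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int
    in_iface: int       # ingress interface ID
    out_iface: int      # egress interface ID
    tcp_flags: int      # bitmask of TCP flags seen in the flow
    tos: int
    byte_count: int
    packet_count: int


def to_kafka_value(record: FlowRecord) -> bytes:
    """Encode the record as UTF-8 JSON, a common choice for Kafka message values."""
    return json.dumps(asdict(record), sort_keys=True).encode("utf-8")


# Sample values use documentation-reserved address ranges.
example = FlowRecord(
    timestamp="2019-04-02T12:00:00Z",
    src_mac="00:00:5e:00:53:01",
    dst_mac="00:00:5e:00:53:02",
    src_ip="192.0.2.10",
    dst_ip="198.51.100.20",
    protocol="tcp",
    src_port=51514,
    dst_port=443,
    in_iface=1,
    out_iface=2,
    tcp_flags=0x18,
    tos=0,
    byte_count=1500,
    packet_count=3,
)
payload = to_kafka_value(example)
```

The resulting bytes could be handed to whichever Kafka producer client is in use; the producer itself is left out so the sketch stays free of external dependencies.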
Let’s make the data part of the solution!
OSS to the rescue!
Network analytics pipeline
Streaming architectures are true-to-life and enable faster decision cycles.
Confidential. Do not redistribute.
[Diagram labels: Routers, switches, firewalls, hosts; Syslog; BGP, Flow; Ingest; Enhance the data (hostname mapping, microservice name, application name, routing lookups); Application]
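The "Enhance the data" stage can be pictured as a lookup join. The sketch below is a plain-Python stand-in for that idea, not the KSQL or Kafka Streams job the talk has in mind; the lookup tables, field names and the `enhance` function are hypothetical.

```python
# Hypothetical lookup tables; in a real pipeline these might be fed from
# DNS, a CMDB or a service registry.
HOSTNAMES = {
    "192.0.2.10": "web-01",
    "198.51.100.20": "db-01",
}
APPLICATIONS = {
    ("tcp", 443): "https",
    ("udp", 53): "dns",
}


def enhance(flow: dict) -> dict:
    """Return a copy of the flow with names added; the original IPs are kept."""
    enriched = dict(flow)
    # Fall back to the raw IP when no hostname mapping exists.
    enriched["src_host"] = HOSTNAMES.get(flow["src_ip"], flow["src_ip"])
    enriched["dst_host"] = HOSTNAMES.get(flow["dst_ip"], flow["dst_ip"])
    enriched["application"] = APPLICATIONS.get(
        (flow["protocol"], flow["dst_port"]), "unknown"
    )
    return enriched


raw = {"src_ip": "192.0.2.10", "dst_ip": "203.0.113.7",
       "protocol": "tcp", "dst_port": 443}
enriched = enhance(raw)
```

Keeping the IP fields alongside the names means dashboards can show names by default while the addresses stay available for drill-down.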
The Answer: Apache Kafka and Apache Druid
• Both built for modern data architectures.
• Both can handle data at scale (largest Druid cluster: over 2000 servers, 50 PB raw data).
• Full redundancy.
• Druid was developed for real-time analytics.
• Both work in harmony together, helping get answers fast.
What the heck is Apache Druid and Why Should I Care?
The 90s: data warehouses and data marts
Tightly coupled architecture with limited flexibility.
[Diagram labels: Data sources; ETL (processing); Data warehouse (store and compute); Analytics (reporting, data mining, querying)]
The 2000s - present: data lakes
Separation of storage and compute enables flexibility in tools.
[Diagram labels: Data sources; Data lake (storage); MapReduce; ELT; Data warehouse; ML/AI engine; Search system; Reporting and analytics]
The Now: data rivers
Streaming architectures enable faster decision cycles.
[Diagram labels: Data sources; Message bus; Data lake; Streaming OLAP]
The problem
Typical Big Data++ Challenges
• Scale: when data is large, we need a lot of servers
• Speed: aiming for sub-second response time
• Complexity: too much fine grain to precompute
• High dimensionality: 10s or 100s of dimensions
• Concurrency: many users and tenants
• Freshness: load from streams
What were the options?
Search platform
• Real-time ingestion
• Flexible schema
• Full text search
OLAP
• Batch ingestion
• Efficient storage
• Fast analytic queries
Timeseries database
• Optimized storage for time-based datasets
• Time-based functions
Search platform, OLAP and timeseries database capabilities in one:
a high performance analytics database for event-driven data
These guys have played a Druid…
Source: http://druid.io/druid-powered.html and imply.io
+ many more!
Gratuitous Customer Quote
From Yahoo:
“The performance is great ... some of the tables that we have internally in Druid have billions and billions of events in them, and we’re scanning them in under a second.”
Source: https://www.infoworld.com/article/2949168/hadoop/yahoo-struts-its-hadoop-stuff.html
Shall we take a look?
Network analytics pipeline
Streaming architectures are true-to-life and enable faster decision cycles.
[Diagram repeated from the earlier "Network analytics pipeline" slide.]
curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json http://localhost:8090/druid/indexer/v1/supervisor
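The curl command above submits a file named supervisor-spec.json, which the slides do not show. As a rough sketch only, the snippet below builds a minimal Kafka supervisor spec in the layout documented for recent Druid releases, as best I understand it; older releases, possibly including the one demoed here, used a different layout with a parser block. The datasource name, topic, broker address, column names and granularity settings are all assumptions, so check the ingestion docs for your Druid version before using any of it.

```python
import json


def build_supervisor_spec(datasource, topic, bootstrap_servers, dimensions):
    """Build a minimal Kafka supervisor spec (layout assumed from recent Druid docs)."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Assumes each message carries an ISO-8601 "timestamp" field.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": dimensions},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "NONE",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "kafka",
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "taskCount": 1,
                "replicas": 1,
                "taskDuration": "PT1H",
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


# Hypothetical names: a "flows" topic on a local broker.
spec = build_supervisor_spec(
    datasource="flows",
    topic="flows",
    bootstrap_servers="localhost:9092",
    dimensions=["src_ip", "dst_ip", "protocol", "src_port", "dst_port"],
)
spec_json = json.dumps(spec, indent=2)
```

Saving `spec_json` to supervisor-spec.json would give the curl command something to post.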
Use Cases
Use Case: Network troubleshooting
• Dashboards that include logs, flow and SNMP (single pane of glass) for quick cross-dataset visualizations.
• Visualize spikes and dips and easily filter on specific data.
• Enhance the data to visualize names and not IPs/MAC addresses – but get the IPs when you need them.
• Dashboards to show the most interesting, common areas of interest.
• Alerting notifications for threshold breaches or deviation from normal.
• Is it the network or the application? Enhanced datasets provide quick answers.
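One simple reading of "deviation from normal" is a rolling z-score: compare each new sample with the mean and standard deviation of the preceding window. The sketch below assumes that interpretation; the function name, window size and threshold are illustrative choices, not anything the talk specifies.

```python
from statistics import mean, pstdev


def find_deviations(samples, window=5, threshold=3.0):
    """Return indexes of samples far from the mean of the preceding window."""
    alerts = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all counts as a deviation.
            if samples[i] != mu:
                alerts.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            alerts.append(i)
    return alerts


# Hypothetical per-minute packet rates on one interface, with one spike.
rates = [100, 102, 98, 101, 99, 100, 450, 101]
alerts = find_deviations(rates)
```

Here only the spike at index 6 is flagged. The sample after it is not, because the spike has already inflated the window's standard deviation, which is one reason production alerting often uses more robust baselines.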
Use Case: DDoS and security
• Visualize spikes and dips and easily filter on specific data (geo, attack vectors, known bad actors).
• DDoS-specific alerting (UDP bad ports, TCP flags, number of unique IPs, overall increase).
• Hooks to multiple notification channels for always-on notifications.
• Webhooks for integration with back office systems.
• Easily drill-down into
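Two of the DDoS signals mentioned above, number of unique IPs and TCP flags, can be sketched as a small aggregation over a window of flow records. This is an illustrative example under assumed field names and an arbitrary limit, not the alerting logic used in the demo.

```python
from collections import defaultdict

# Standard TCP flag bits.
SYN = 0x02
ACK = 0x10


def ddos_indicators(flows, unique_src_limit=3):
    """Per destination: count distinct sources and SYN-without-ACK flows."""
    sources = defaultdict(set)
    syn_only = defaultdict(int)
    for flow in flows:
        dst = flow["dst_ip"]
        sources[dst].add(flow["src_ip"])
        flags = flow["tcp_flags"]
        if flow["protocol"] == "tcp" and flags & SYN and not flags & ACK:
            syn_only[dst] += 1
    # Report only destinations contacted by many distinct sources.
    return {
        dst: {"unique_sources": len(srcs), "syn_only_flows": syn_only[dst]}
        for dst, srcs in sources.items()
        if len(srcs) >= unique_src_limit
    }


# Hypothetical window of flows: three sources send bare SYNs to one target.
window = [
    {"src_ip": "192.0.2.1", "dst_ip": "203.0.113.10", "protocol": "tcp", "tcp_flags": SYN},
    {"src_ip": "192.0.2.2", "dst_ip": "203.0.113.10", "protocol": "tcp", "tcp_flags": SYN},
    {"src_ip": "192.0.2.3", "dst_ip": "203.0.113.10", "protocol": "tcp", "tcp_flags": SYN},
    {"src_ip": "192.0.2.1", "dst_ip": "198.51.100.5", "protocol": "tcp", "tcp_flags": SYN | ACK},
]
suspects = ddos_indicators(window)
```

In a streaming setup the same counts would more likely come from a windowed query against the flow datasource; the in-memory version only shows the shape of the calculation.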
Use Case: BGP Analytics
• PMACCT can collect and add BGP information by peering with a BGP speaker.
• Use KSQL or Kafka Streams to augment data with BGP information.
• Visualize the BGP AS_PATH (where your traffic is going across the Internet).
• Who are your top transit or peering partners?
• Top source and destination ASNs.
• Top BGP communities.
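Once flows carry BGP attributes, the "top N" questions above reduce to summing bytes by some key. The sketch below assumes hypothetical `dst_as`, `as_path` and `byte_count` fields and treats the first ASN in the AS_PATH as the transit or peering partner; real collectors may name and format these fields differently.

```python
from collections import Counter


def top_by_bytes(flows, key_fn, n=3):
    """Sum byte counts per key and return the n largest as (key, bytes) pairs."""
    totals = Counter()
    for flow in flows:
        totals[key_fn(flow)] += flow["byte_count"]
    return totals.most_common(n)


# Hypothetical enriched flows; ASNs come from the documentation-reserved range.
flows = [
    {"dst_as": 64500, "as_path": "64496 64500", "byte_count": 1000},
    {"dst_as": 64501, "as_path": "64497 64501", "byte_count": 400},
    {"dst_as": 64500, "as_path": "64496 64499 64500", "byte_count": 500},
]

# Top destination ASNs by traffic volume.
top_destinations = top_by_bytes(flows, lambda f: f["dst_as"])
# Top first-hop ASNs, read here as the transit or peering partner.
top_partners = top_by_bytes(flows, lambda f: int(f["as_path"].split()[0]))
```

The same grouping expressed as a query over the flow datasource would let the dashboard recompute it for any time range or filter.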
Download
Druid community site (current): http://druid.io/
Druid community site (new): https://druid.apache.org/
Imply distribution: https://imply.io/get-started
Contribute
https://github.com/apache/druid
Stay in touch
Follow the Druid project on Twitter! @druidio
Join the community! http://druid.io/community
Come by our booth for a Druid t-shirt and to learn more!
Thank you!
Hold for applause…

How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply Data) Kafka Summit NYC 2019
