SlideShare a Scribd company logo
Washington DC Area Apache 
Spark Interactive 
Spark Bake-off 
Team Name: Silvio Fiorito 
Solution Title: Real-time Packet Analysis using Spark
Spark Bake-off 
Page: 2 
Team Introductions 
 Silvio Fiorito 
– Background in development and app security 
– Started working with Hadoop in 2012 
– Started using Spark at v0.6 in early 2013 
– Built a few prototypes for low-latency query 
services with Spark/Shark and then 
SparkSQL 
– Twitter: @granturing
Spark Bake-off 
Page: 3 
Solution Overview 
 Real-time TCP packet analysis of geographically 
distributed hosts 
– Must support high throughput from many hosts 
– 3 demo VMs ( 2 x Azure & 1 x AWS) 
 Local Flume agent pushes events to Azure Event Hub 
 Events are partitioned and persisted up to 7 days 
 Spark Streaming app ingests streams 
– Reconstruct packets 
– Lookups for geo-ip and port description 
– Clusters using pre-trained k-means model 
– Saves data to Azure Table Storage and publishes on 
Service Bus Topic
Spark Bake-off 
Page: 4 
Solution Overview
Spark Bake-off 
Page: 5 
Sample Dashboard with Power BI
Spark Bake-off 
Page: 6 
Final Comments & Questions 
 With more time 
– Add true anomaly detection with MLLib 
– Test on hosts with real traffic 
– Wire up end-to-end with d3.js viz and 
SparkSQL backend 
– Integrate with existing IDS/IPS rules 
– Bad IPs lookup

More Related Content

What's hot

Scalable Scientific Computing with Dask
Scalable Scientific Computing with DaskScalable Scientific Computing with Dask
Scalable Scientific Computing with Dask
Uwe Korn
 
Tableapp architecture migration story for GCPUG.TW
Tableapp architecture migration story for GCPUG.TWTableapp architecture migration story for GCPUG.TW
Tableapp architecture migration story for GCPUG.TW
Yen-Wen Chen
 
Getting to Know Airflow
Getting to Know AirflowGetting to Know Airflow
Getting to Know Airflow
Rosanne Hoyem
 
Using Libvirt with Cluster API to manage baremetal Kubernetes
Using Libvirt with Cluster API to manage baremetal KubernetesUsing Libvirt with Cluster API to manage baremetal Kubernetes
Using Libvirt with Cluster API to manage baremetal Kubernetes
Himani Agrawal
 
Building an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowBuilding an analytics workflow using Apache Airflow
Building an analytics workflow using Apache Airflow
Yohei Onishi
 
Building Robust Pipelines with Airflow
Building Robust Pipelines with AirflowBuilding Robust Pipelines with Airflow
Building Robust Pipelines with Airflow
Erin Shellman
 
Spark Summit EU talk by John Musser
Spark Summit EU talk by John MusserSpark Summit EU talk by John Musser
Spark Summit EU talk by John Musser
Spark Summit
 
How Kubernetes scheduler works
How Kubernetes scheduler worksHow Kubernetes scheduler works
How Kubernetes scheduler works
Himani Agrawal
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Spark Summit
 
Prometheus kubernetes tech talk
Prometheus kubernetes tech talkPrometheus kubernetes tech talk
Prometheus kubernetes tech talk
Chandresh Pancholi
 
What is Spark
What is SparkWhat is Spark
What is Spark
Bruno Faria
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyft
Tao Feng
 
Introduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with StormIntroduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with Storm
Brandon O'Brien
 
8月27日pmcaff分享
8月27日pmcaff分享8月27日pmcaff分享
8月27日pmcaff分享
Song Hou
 
Prometheus - Utah Software Architecture Meetup - Clint Checketts
Prometheus - Utah Software Architecture Meetup - Clint CheckettsPrometheus - Utah Software Architecture Meetup - Clint Checketts
Prometheus - Utah Software Architecture Meetup - Clint Checketts
clintchecketts
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engine
Walter Liu
 
From AWS to GCP, TABLEAPP Architecture Story
From AWS to GCP, TABLEAPP Architecture StoryFrom AWS to GCP, TABLEAPP Architecture Story
From AWS to GCP, TABLEAPP Architecture Story
Yen-Wen Chen
 
Two Years In Production With Kubernetes - An Experience Report
Two Years In Production With Kubernetes - An Experience ReportTwo Years In Production With Kubernetes - An Experience Report
Two Years In Production With Kubernetes - An Experience Report
Kasper Nissen
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoop
clairvoyantllc
 
Lunar Way and the Cloud Native "stack"
Lunar Way and the Cloud Native "stack"Lunar Way and the Cloud Native "stack"
Lunar Way and the Cloud Native "stack"
Kasper Nissen
 

What's hot (20)

Scalable Scientific Computing with Dask
Scalable Scientific Computing with DaskScalable Scientific Computing with Dask
Scalable Scientific Computing with Dask
 
Tableapp architecture migration story for GCPUG.TW
Tableapp architecture migration story for GCPUG.TWTableapp architecture migration story for GCPUG.TW
Tableapp architecture migration story for GCPUG.TW
 
Getting to Know Airflow
Getting to Know AirflowGetting to Know Airflow
Getting to Know Airflow
 
Using Libvirt with Cluster API to manage baremetal Kubernetes
Using Libvirt with Cluster API to manage baremetal KubernetesUsing Libvirt with Cluster API to manage baremetal Kubernetes
Using Libvirt with Cluster API to manage baremetal Kubernetes
 
Building an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowBuilding an analytics workflow using Apache Airflow
Building an analytics workflow using Apache Airflow
 
Building Robust Pipelines with Airflow
Building Robust Pipelines with AirflowBuilding Robust Pipelines with Airflow
Building Robust Pipelines with Airflow
 
Spark Summit EU talk by John Musser
Spark Summit EU talk by John MusserSpark Summit EU talk by John Musser
Spark Summit EU talk by John Musser
 
How Kubernetes scheduler works
How Kubernetes scheduler worksHow Kubernetes scheduler works
How Kubernetes scheduler works
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
 
Prometheus kubernetes tech talk
Prometheus kubernetes tech talkPrometheus kubernetes tech talk
Prometheus kubernetes tech talk
 
What is Spark
What is SparkWhat is Spark
What is Spark
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyft
 
Introduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with StormIntroduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with Storm
 
8月27日pmcaff分享
8月27日pmcaff分享8月27日pmcaff分享
8月27日pmcaff分享
 
Prometheus - Utah Software Architecture Meetup - Clint Checketts
Prometheus - Utah Software Architecture Meetup - Clint CheckettsPrometheus - Utah Software Architecture Meetup - Clint Checketts
Prometheus - Utah Software Architecture Meetup - Clint Checketts
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engine
 
From AWS to GCP, TABLEAPP Architecture Story
From AWS to GCP, TABLEAPP Architecture StoryFrom AWS to GCP, TABLEAPP Architecture Story
From AWS to GCP, TABLEAPP Architecture Story
 
Two Years In Production With Kubernetes - An Experience Report
Two Years In Production With Kubernetes - An Experience ReportTwo Years In Production With Kubernetes - An Experience Report
Two Years In Production With Kubernetes - An Experience Report
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoop
 
Lunar Way and the Cloud Native "stack"
Lunar Way and the Cloud Native "stack"Lunar Way and the Cloud Native "stack"
Lunar Way and the Cloud Native "stack"
 

Viewers also liked

Databricks-Oreilly- Spark Developer
Databricks-Oreilly- Spark DeveloperDatabricks-Oreilly- Spark Developer
Databricks-Oreilly- Spark Developer
Joby Johny
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
Sascha Dittmann
 
Azure api app métricas com application insights
Azure api app métricas com application insightsAzure api app métricas com application insights
Azure api app métricas com application insights
Nicolas Takashi
 
Microsoft NYC 14
Microsoft NYC 14Microsoft NYC 14
Microsoft NYC 14
SwitchPitch
 
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
Mike Martin
 
Big data streaming with Apache Spark on Azure
Big data streaming with Apache Spark on AzureBig data streaming with Apache Spark on Azure
Big data streaming with Apache Spark on Azure
Willem Meints
 
Go Serverless with Azure Functions
Go Serverless with Azure FunctionsGo Serverless with Azure Functions
Go Serverless with Azure Functions
Jim O'Neil
 
Fraud Detection using Hadoop
Fraud Detection using HadoopFraud Detection using Hadoop
Fraud Detection using Hadoop
hadooparchbook
 
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightEnterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
Paco Nathan
 
Azure IOT
Azure IOTAzure IOT
2016-08-25 TechExeter - going serverless with Azure
2016-08-25 TechExeter - going serverless with Azure2016-08-25 TechExeter - going serverless with Azure
2016-08-25 TechExeter - going serverless with Azure
Steve Lee
 
Software scope
Software scopeSoftware scope
Software scope
Shubham Dubey
 
Going serverless
Going serverlessGoing serverless
Going serverless
TechExeter
 
Azure HDInsight
Azure HDInsightAzure HDInsight
Azure HDInsight
Koray Kocabas
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
Ruhani Arora
 
Spark on Azure HDInsight - spark meetup seattle
Spark on Azure HDInsight - spark meetup seattleSpark on Azure HDInsight - spark meetup seattle
Spark on Azure HDInsight - spark meetup seattle
Judy Nash
 
Azure functions
Azure functionsAzure functions
Azure functions
vivek p s
 
Open up to a better learning ecosystem
Open up to a better learning ecosystemOpen up to a better learning ecosystem
Open up to a better learning ecosystem
Katie Bradford
 
Azure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloud
Azure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloudAzure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloud
Azure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloud
Toradex
 
Going serverless
Going serverlessGoing serverless
Going serverless
Jeremy Green
 

Viewers also liked (20)

Databricks-Oreilly- Spark Developer
Databricks-Oreilly- Spark DeveloperDatabricks-Oreilly- Spark Developer
Databricks-Oreilly- Spark Developer
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
 
Azure api app métricas com application insights
Azure api app métricas com application insightsAzure api app métricas com application insights
Azure api app métricas com application insights
 
Microsoft NYC 14
Microsoft NYC 14Microsoft NYC 14
Microsoft NYC 14
 
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
 
Big data streaming with Apache Spark on Azure
Big data streaming with Apache Spark on AzureBig data streaming with Apache Spark on Azure
Big data streaming with Apache Spark on Azure
 
Go Serverless with Azure Functions
Go Serverless with Azure FunctionsGo Serverless with Azure Functions
Go Serverless with Azure Functions
 
Fraud Detection using Hadoop
Fraud Detection using HadoopFraud Detection using Hadoop
Fraud Detection using Hadoop
 
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightEnterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
 
Azure IOT
Azure IOTAzure IOT
Azure IOT
 
2016-08-25 TechExeter - going serverless with Azure
2016-08-25 TechExeter - going serverless with Azure2016-08-25 TechExeter - going serverless with Azure
2016-08-25 TechExeter - going serverless with Azure
 
Software scope
Software scopeSoftware scope
Software scope
 
Going serverless
Going serverlessGoing serverless
Going serverless
 
Azure HDInsight
Azure HDInsightAzure HDInsight
Azure HDInsight
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
 
Spark on Azure HDInsight - spark meetup seattle
Spark on Azure HDInsight - spark meetup seattleSpark on Azure HDInsight - spark meetup seattle
Spark on Azure HDInsight - spark meetup seattle
 
Azure functions
Azure functionsAzure functions
Azure functions
 
Open up to a better learning ecosystem
Open up to a better learning ecosystemOpen up to a better learning ecosystem
Open up to a better learning ecosystem
 
Azure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloud
Azure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloudAzure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloud
Azure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloud
 
Going serverless
Going serverlessGoing serverless
Going serverless
 

Similar to DC Spark bake off - Realtime TCP Packet Analysis using Spark and Azure Event Hubs

Spark Summit EU talk by Jim Dowling
Spark Summit EU talk by Jim DowlingSpark Summit EU talk by Jim Dowling
Spark Summit EU talk by Jim Dowling
Spark Summit
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
Paco Nathan
 
On-premise Spark as a Service with YARN
On-premise Spark as a Service with YARN On-premise Spark as a Service with YARN
On-premise Spark as a Service with YARN
Jim Dowling
 
Seattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp APISeattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp API
shareddatamsft
 
Seattle spark-meetup-032317
Seattle spark-meetup-032317Seattle spark-meetup-032317
Seattle spark-meetup-032317
Nan Zhu
 
Create a Varnish cluster in Kubernetes for Drupal caching - DrupalCon North A...
Create a Varnish cluster in Kubernetes for Drupal caching - DrupalCon North A...Create a Varnish cluster in Kubernetes for Drupal caching - DrupalCon North A...
Create a Varnish cluster in Kubernetes for Drupal caching - DrupalCon North A...
Ovadiah Myrgorod
 
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksOverview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
DataWorks Summit/Hadoop Summit
 
Overview of Apache Fink: the 4 G of Big Data Analytics Frameworks
Overview of Apache Fink: the 4 G of Big Data Analytics FrameworksOverview of Apache Fink: the 4 G of Big Data Analytics Frameworks
Overview of Apache Fink: the 4 G of Big Data Analytics Frameworks
Slim Baltagi
 
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics FrameworksOverview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
Slim Baltagi
 
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Spark Summit
 
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Yao Yao
 
H2O PySparkling Water
H2O PySparkling WaterH2O PySparkling Water
H2O PySparkling Water
Sri Ambati
 
grlc: Bridging the Gap Between RESTful APIs and Linked Data
grlc: Bridging the Gap Between RESTful APIs and Linked Datagrlc: Bridging the Gap Between RESTful APIs and Linked Data
grlc: Bridging the Gap Between RESTful APIs and Linked Data
Albert Meroño-Peñuela
 
Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsHeadaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous Applications
Databricks
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Databricks
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
Timothy Spann
 
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Landon Robinson
 
Developing apache spark jobs in .net using mobius
Developing apache spark jobs in .net using mobiusDeveloping apache spark jobs in .net using mobius
Developing apache spark jobs in .net using mobius
shareddatamsft
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
Anyscale
 
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
Databricks
 

Similar to DC Spark bake off - Realtime TCP Packet Analysis using Spark and Azure Event Hubs (20)

Spark Summit EU talk by Jim Dowling
Spark Summit EU talk by Jim DowlingSpark Summit EU talk by Jim Dowling
Spark Summit EU talk by Jim Dowling
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
 
On-premise Spark as a Service with YARN
On-premise Spark as a Service with YARN On-premise Spark as a Service with YARN
On-premise Spark as a Service with YARN
 
Seattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp APISeattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp API
 
Seattle spark-meetup-032317
Seattle spark-meetup-032317Seattle spark-meetup-032317
Seattle spark-meetup-032317
 
Create a Varnish cluster in Kubernetes for Drupal caching - DrupalCon North A...
Create a Varnish cluster in Kubernetes for Drupal caching - DrupalCon North A...Create a Varnish cluster in Kubernetes for Drupal caching - DrupalCon North A...
Create a Varnish cluster in Kubernetes for Drupal caching - DrupalCon North A...
 
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksOverview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
 
Overview of Apache Fink: the 4 G of Big Data Analytics Frameworks
Overview of Apache Fink: the 4 G of Big Data Analytics FrameworksOverview of Apache Fink: the 4 G of Big Data Analytics Frameworks
Overview of Apache Fink: the 4 G of Big Data Analytics Frameworks
 
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics FrameworksOverview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
 
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
 
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
 
H2O PySparkling Water
H2O PySparkling WaterH2O PySparkling Water
H2O PySparkling Water
 
grlc: Bridging the Gap Between RESTful APIs and Linked Data
grlc: Bridging the Gap Between RESTful APIs and Linked Datagrlc: Bridging the Gap Between RESTful APIs and Linked Data
grlc: Bridging the Gap Between RESTful APIs and Linked Data
 
Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsHeadaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous Applications
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
 
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
 
Developing apache spark jobs in .net using mobius
Developing apache spark jobs in .net using mobiusDeveloping apache spark jobs in .net using mobius
Developing apache spark jobs in .net using mobius
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
 
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
 

Recently uploaded

UI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design SystemUI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design System
Peter Muessig
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
Green Software Development
 
zOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL DifferenceszOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL Differences
YousufSait3
 
fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.
AnkitaPandya11
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
Philip Schwarz
 
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
safelyiotech
 
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
mz5nrf0n
 
Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
Alberto Brandolini
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
XfilesPro
 
Lecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptxLecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptx
TaghreedAltamimi
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
Peter Muessig
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
Grant Fritchey
 
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesE-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
Quickdice ERP
 
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
kalichargn70th171
 
Preparing Non - Technical Founders for Engaging a Tech Agency
Preparing Non - Technical Founders for Engaging  a  Tech AgencyPreparing Non - Technical Founders for Engaging  a  Tech Agency
Preparing Non - Technical Founders for Engaging a Tech Agency
ISH Technologies
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
Marcin Chrost
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
Remote DBA Services
 
SQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure MalaysiaSQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure Malaysia
GohKiangHock
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
Patrick Weigel
 

Recently uploaded (20)

UI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design SystemUI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design System
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
 
zOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL DifferenceszOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL Differences
 
fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
 
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
 
Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
 
Lecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptxLecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptx
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
 
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesE-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
 
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
 
Preparing Non - Technical Founders for Engaging a Tech Agency
Preparing Non - Technical Founders for Engaging  a  Tech AgencyPreparing Non - Technical Founders for Engaging  a  Tech Agency
Preparing Non - Technical Founders for Engaging a Tech Agency
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
 
SQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure MalaysiaSQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure Malaysia
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
 

DC Spark bake off - Realtime TCP Packet Analysis using Spark and Azure Event Hubs

  • 1. Washington DC Area Apache Spark Interactive Spark Bake-off Team Name: Silvio Fiorito Solution Title: Real-time Packet Analysis using Spark
  • 2. Spark Bake-off Page: 2 Team Introductions  Silvio Fiorito – Background in development and app security – Started working with Hadoop in 2012 – Started using Spark at v0.6 in early 2013 – Built a few prototypes for low-latency query services with Spark/Shark and then SparkSQL – Twitter: @granturing
  • 3. Spark Bake-off Page: 3 Solution Overview  Real-time TCP packet analysis of geographically distributed hosts – Must support high throughput from many hosts – 3 demo VMs ( 2 x Azure & 1 x AWS)  Local Flume agent pushes events to Azure Event Hub  Events are partitioned and persisted up to 7 days  Spark Streaming app ingests streams – Reconstruct packets – Lookups for geo-ip and port description – Clusters using pre-trained k-means model – Saves data to Azure Table Storage and publishes on Service Bus Topic
  • 4. Spark Bake-off Page: 4 Solution Overview
  • 5. Spark Bake-off Page: 5 Sample Dashboard with Power BI
  • 6. Spark Bake-off Page: 6 Final Comments & Questions  With more time – Add true anomaly detection with MLLib – Test on hosts with real traffic – Wire up end-to-end with d3.js viz and SparkSQL backend – Integrate with existing IDS/IPS rules – Bad IPs lookup