SlideShare a Scribd company logo
Anomaly Detection Engine for
Cloud Activities using Flink
Flink Forward Berlin 2018
Yonatan Most & Avihai Berkovitz
Microsoft Cloud App Security
Microsoft Cloud App Security
Discover and
assess risks
Control access
in real time
Detect
threats
Protect your
information
Identify cloud apps on your
network, gain visibility into shadow
IT, and get risk assessments and
ongoing analytics.
Manage and limit cloud app
access based on conditions and
session context, including user
identity, device, and location.
Identify high-risk usage and
detect unusual behavior using
Microsoft threat intelligence
and research.
Get granular control over data
and use built-in or custom
policies for data sharing and
data loss prevention.
Threat detection: Microsoft Intelligent Security Graph, Office ATP
Information Protection: Office 365 & Azure Information Protection
Identity: Azure AD and Conditional Access
To your cloud appsExtend Microsoft security
+ more
Microsoft Cloud App Security
• The primary data type in the product
• 16,000+ supported SaaS applications
• Dozens of activity types (logon, upload, export, create user…)
• Locations, devices, target objects, impersonated users…
• Out of order by up to 24 hours!
Cloud activities
Cloud activities
• Goal: detect anomalous user behavior and alert the admin
• The core flow is:
• Analyze all the activities and extract security-oriented insights
• Maintain a behavioral model for every user, and update it inline
• Detect outliers, suspicious behavior or potentially malicious activity
• Cross-reference with previous alerts and other users, to prevent false positives and “alert fatigue”
• Reach a decision and raise an alert within seconds
Activity-based threat protection
• Scalability
• Fault tolerance & recovery
• Real time processing
• Short time from ingestion to
detection
• Extendibility
Detection engine requirements
• Support for massive state size
• State persistency
• State consistency
• Fast, parallel access and update of
state
Sounds familiar?
• We needed a stateful stream processing framework
• We really didn’t want to build one in-house
• We tested 10 frameworks
• Azure ML, Azure Stream Analytics, Microsoft Orleans, Apache Storm, Apache Samza, Apache Spark streaming, Apache Ignite, Apache Beam…
• We chose Flink!
We embarked on a search for a framework
Anubis
• Our anomaly detection engine
• A single Flink job running the entire flow
• Ingests activities
• Outputs alerts
• 8 clusters across multiple datacenters
• Our largest cluster:
• 40 machines
• 25,000 events per second
• 1.3 TB of state
Anubis
Anubis
User
model
Group
model
Events
Anubis scalability
Scalability in… Requires…
Event rate per cluster Increase cluster as needed (stability cost)
Event rate per user group
Key by user where possible
Special treatment of group-keyed operators
Event rate per user Capping the user event rate
Features models and detections
Key by feature / model / detection +
increase cluster as needed
Number of clusters Easy cluster management
• Multiple clusters in multiple datacenters
• Custom standalone cluster setup
• Automation scripts
• Machine setup
• Flink version upgrade
• Job deployment and upgrade
• Kubernetes-based deployment solution is in progress
Flink @ Microsoft - cluster
• We use Azure EventHubs as sources and sinks
• We use Azure Blob Storage for state backend
Flink @ Microsoft – connectors
• The job and its state should last forever
• We also deploy new versions frequently and update the state objects
• We created a custom versioned framework based on Kryo
Flink @ Microsoft – upgrades
• Custom wrappers for operators, state access, serializers, collectors
• Adding log metadata about current operator, context, timing
• Adding metrics using the built-in framework
• Around 15,000 metrics per process!
• Everything flows to Splunk
• Investigation
• Dashboards
• Alerts
Flink @ Microsoft – monitoring
Flink @ Microsoft – monitoring
Flink @ Microsoft – monitoring
Flink @ Microsoft – monitoring
Flink @ Microsoft – monitoring
Flink @ Microsoft – monitoring
Flink @ Microsoft – monitoring
• Two other Flink jobs already in production
• Another two in development
• Kubernetes-based deployment solution is in progress
• Helping other groups within Microsoft build Flink jobs
What’s next?
© Copyright Microsoft Corporation. All rights reserved.
Thank you.
Questions?

More Related Content

What's hot

Palestra de abertura: Evolução e visão do Elastic Observability
Palestra de abertura: Evolução e visão do Elastic ObservabilityPalestra de abertura: Evolução e visão do Elastic Observability
Palestra de abertura: Evolução e visão do Elastic Observability
Elasticsearch
 
Search for all with Elastic Enterprise Search
Search for all with Elastic Enterprise Search Search for all with Elastic Enterprise Search
Search for all with Elastic Enterprise Search
Elasticsearch
 
How KeyBank Used Elastic to Build an Enterprise Monitoring Solution
How KeyBank Used Elastic to Build an Enterprise Monitoring SolutionHow KeyBank Used Elastic to Build an Enterprise Monitoring Solution
How KeyBank Used Elastic to Build an Enterprise Monitoring Solution
Elasticsearch
 
Monitoring real-life Azure applications: When to use what and why
Monitoring real-life Azure applications: When to use what and whyMonitoring real-life Azure applications: When to use what and why
Monitoring real-life Azure applications: When to use what and why
Karl Ots
 
Automate Your Container Deployments Securely
Automate Your Container Deployments SecurelyAutomate Your Container Deployments Securely
Automate Your Container Deployments Securely
DevOps.com
 
Siscale Lightning Talk: Automated Root Cause Analysis with Elastic Stack
Siscale Lightning Talk: Automated Root Cause Analysis with Elastic StackSiscale Lightning Talk: Automated Root Cause Analysis with Elastic Stack
Siscale Lightning Talk: Automated Root Cause Analysis with Elastic Stack
Elasticsearch
 
Infrastructure monitoring made easy, from ingest to insight
Infrastructure monitoring made easy, from ingest to insightInfrastructure monitoring made easy, from ingest to insight
Infrastructure monitoring made easy, from ingest to insight
Elasticsearch
 
Intelligent Cloud Conference 2018 - Building secure cloud applications with A...
Intelligent Cloud Conference 2018 - Building secure cloud applications with A...Intelligent Cloud Conference 2018 - Building secure cloud applications with A...
Intelligent Cloud Conference 2018 - Building secure cloud applications with A...
Tom Kerkhove
 
Elasticsearch on Azure
Elasticsearch on AzureElasticsearch on Azure
Elasticsearch on Azure
Elasticsearch
 
The full picture of Openstack in real-time
The full picture of Openstack in real-timeThe full picture of Openstack in real-time
The full picture of Openstack in real-time
Dynatrace
 
Monitoring
MonitoringMonitoring
Monitoring
strikr .
 
Elastic APM : développez vos logs et vos indicateurs pour obtenir une vue com...
Elastic APM : développez vos logs et vos indicateurs pour obtenir une vue com...Elastic APM : développez vos logs et vos indicateurs pour obtenir une vue com...
Elastic APM : développez vos logs et vos indicateurs pour obtenir une vue com...
Elasticsearch
 
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...
Elasticsearch
 
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
DevOps.com
 
Your Agile, Modern Data Delivery Platform
Your Agile, Modern Data Delivery PlatformYour Agile, Modern Data Delivery Platform
Your Agile, Modern Data Delivery Platform
syed_javed
 
What’s Evolving in the Elastic Stack
What’s Evolving in the Elastic StackWhat’s Evolving in the Elastic Stack
What’s Evolving in the Elastic Stack
Elasticsearch
 
Modern Web-site Development Pipeline
Modern Web-site Development PipelineModern Web-site Development Pipeline
Modern Web-site Development Pipeline
GlobalLogic Ukraine
 
Migrating a legacy logging system: Etsy’s journey to Elastic Cloud
Migrating a legacy logging system: Etsy’s journey to Elastic CloudMigrating a legacy logging system: Etsy’s journey to Elastic Cloud
Migrating a legacy logging system: Etsy’s journey to Elastic Cloud
Elasticsearch
 
Search for All with Elastic Enterprise Search
Search for All with Elastic Enterprise Search Search for All with Elastic Enterprise Search
Search for All with Elastic Enterprise Search
Elasticsearch
 
Intelligent Cloud Conference 2018 - Next Generation of Data Integration with ...
Intelligent Cloud Conference 2018 - Next Generation of Data Integration with ...Intelligent Cloud Conference 2018 - Next Generation of Data Integration with ...
Intelligent Cloud Conference 2018 - Next Generation of Data Integration with ...
Tom Kerkhove
 

What's hot (20)

Palestra de abertura: Evolução e visão do Elastic Observability
Palestra de abertura: Evolução e visão do Elastic ObservabilityPalestra de abertura: Evolução e visão do Elastic Observability
Palestra de abertura: Evolução e visão do Elastic Observability
 
Search for all with Elastic Enterprise Search
Search for all with Elastic Enterprise Search Search for all with Elastic Enterprise Search
Search for all with Elastic Enterprise Search
 
How KeyBank Used Elastic to Build an Enterprise Monitoring Solution
How KeyBank Used Elastic to Build an Enterprise Monitoring SolutionHow KeyBank Used Elastic to Build an Enterprise Monitoring Solution
How KeyBank Used Elastic to Build an Enterprise Monitoring Solution
 
Monitoring real-life Azure applications: When to use what and why
Monitoring real-life Azure applications: When to use what and whyMonitoring real-life Azure applications: When to use what and why
Monitoring real-life Azure applications: When to use what and why
 
Automate Your Container Deployments Securely
Automate Your Container Deployments SecurelyAutomate Your Container Deployments Securely
Automate Your Container Deployments Securely
 
Siscale Lightning Talk: Automated Root Cause Analysis with Elastic Stack
Siscale Lightning Talk: Automated Root Cause Analysis with Elastic StackSiscale Lightning Talk: Automated Root Cause Analysis with Elastic Stack
Siscale Lightning Talk: Automated Root Cause Analysis with Elastic Stack
 
Infrastructure monitoring made easy, from ingest to insight
Infrastructure monitoring made easy, from ingest to insightInfrastructure monitoring made easy, from ingest to insight
Infrastructure monitoring made easy, from ingest to insight
 
Intelligent Cloud Conference 2018 - Building secure cloud applications with A...
Intelligent Cloud Conference 2018 - Building secure cloud applications with A...Intelligent Cloud Conference 2018 - Building secure cloud applications with A...
Intelligent Cloud Conference 2018 - Building secure cloud applications with A...
 
Elasticsearch on Azure
Elasticsearch on AzureElasticsearch on Azure
Elasticsearch on Azure
 
The full picture of Openstack in real-time
The full picture of Openstack in real-timeThe full picture of Openstack in real-time
The full picture of Openstack in real-time
 
Monitoring
MonitoringMonitoring
Monitoring
 
Elastic APM : développez vos logs et vos indicateurs pour obtenir une vue com...
Elastic APM : développez vos logs et vos indicateurs pour obtenir une vue com...Elastic APM : développez vos logs et vos indicateurs pour obtenir une vue com...
Elastic APM : développez vos logs et vos indicateurs pour obtenir une vue com...
 
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...
 
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
 
Your Agile, Modern Data Delivery Platform
Your Agile, Modern Data Delivery PlatformYour Agile, Modern Data Delivery Platform
Your Agile, Modern Data Delivery Platform
 
What’s Evolving in the Elastic Stack
What’s Evolving in the Elastic StackWhat’s Evolving in the Elastic Stack
What’s Evolving in the Elastic Stack
 
Modern Web-site Development Pipeline
Modern Web-site Development PipelineModern Web-site Development Pipeline
Modern Web-site Development Pipeline
 
Migrating a legacy logging system: Etsy’s journey to Elastic Cloud
Migrating a legacy logging system: Etsy’s journey to Elastic CloudMigrating a legacy logging system: Etsy’s journey to Elastic Cloud
Migrating a legacy logging system: Etsy’s journey to Elastic Cloud
 
Search for All with Elastic Enterprise Search
Search for All with Elastic Enterprise Search Search for All with Elastic Enterprise Search
Search for All with Elastic Enterprise Search
 
Intelligent Cloud Conference 2018 - Next Generation of Data Integration with ...
Intelligent Cloud Conference 2018 - Next Generation of Data Integration with ...Intelligent Cloud Conference 2018 - Next Generation of Data Integration with ...
Intelligent Cloud Conference 2018 - Next Generation of Data Integration with ...
 

Similar to Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detection Engine for Cloud Activities using Flink"

ThroughTheLookingGlass_EffectiveObservability.pptx
ThroughTheLookingGlass_EffectiveObservability.pptxThroughTheLookingGlass_EffectiveObservability.pptx
ThroughTheLookingGlass_EffectiveObservability.pptx
Grace Jansen
 
CyberCrime in the Cloud and How to defend Yourself
CyberCrime in the Cloud and How to defend Yourself CyberCrime in the Cloud and How to defend Yourself
CyberCrime in the Cloud and How to defend Yourself Alert Logic
 
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
Grid Dynamics
 
20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers Software20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers Software
DevOps Chicago
 
Presentacion de solucion cloud de navegacion segura
Presentacion de solucion cloud de navegacion seguraPresentacion de solucion cloud de navegacion segura
Presentacion de solucion cloud de navegacion segura
RogerChaucaZea
 
Hybrid cloud monitoring - Mumbai seminar
Hybrid cloud monitoring - Mumbai seminarHybrid cloud monitoring - Mumbai seminar
Hybrid cloud monitoring - Mumbai seminar
ManageEngine, Zoho Corporation
 
The Art of Container Monitoring
The Art of Container MonitoringThe Art of Container Monitoring
The Art of Container Monitoring
Derek Chen
 
DCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at NetflixDCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at Netflix
Docker, Inc.
 
NextGen Endpoint Security for Dummies
NextGen Endpoint Security for DummiesNextGen Endpoint Security for Dummies
NextGen Endpoint Security for Dummies
Atif Ghauri
 
Alabama CyberNow 2018: Cloud Hardening and Digital Forensics Readiness
Alabama CyberNow 2018: Cloud Hardening and Digital Forensics ReadinessAlabama CyberNow 2018: Cloud Hardening and Digital Forensics Readiness
Alabama CyberNow 2018: Cloud Hardening and Digital Forensics Readiness
Toni de la Fuente
 
The Joy of Proactive Security
The Joy of Proactive SecurityThe Joy of Proactive Security
The Joy of Proactive Security
Andy Hoernecke
 
All Your Security Events Are Belong to ... You!
All Your Security Events Are Belong to ... You!All Your Security Events Are Belong to ... You!
All Your Security Events Are Belong to ... You!
Xavier Mertens
 
All your logs are belong to you!
All your logs are belong to you!All your logs are belong to you!
All your logs are belong to you!
Security BSides London
 
366864108 azure-security
366864108 azure-security366864108 azure-security
366864108 azure-security
ober64
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Data Con LA
 
CodeMotion Amsterdam 2018 - Microservices in action at the Dutch National Police
CodeMotion Amsterdam 2018 - Microservices in action at the Dutch National PoliceCodeMotion Amsterdam 2018 - Microservices in action at the Dutch National Police
CodeMotion Amsterdam 2018 - Microservices in action at the Dutch National Police
Bert Jan Schrijver
 
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...
Codemotion
 
Azure 101: Shared responsibility in the Azure Cloud
Azure 101: Shared responsibility in the Azure CloudAzure 101: Shared responsibility in the Azure Cloud
Azure 101: Shared responsibility in the Azure Cloud
Paulo Renato
 
Regulated Reactive - Security Considerations for Building Reactive Systems in...
Regulated Reactive - Security Considerations for Building Reactive Systems in...Regulated Reactive - Security Considerations for Building Reactive Systems in...
Regulated Reactive - Security Considerations for Building Reactive Systems in...
Ryan Hodgin
 
Microsoft Azure Security Overview
Microsoft Azure Security OverviewMicrosoft Azure Security Overview
Microsoft Azure Security Overview
Alert Logic
 

Similar to Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detection Engine for Cloud Activities using Flink" (20)

ThroughTheLookingGlass_EffectiveObservability.pptx
ThroughTheLookingGlass_EffectiveObservability.pptxThroughTheLookingGlass_EffectiveObservability.pptx
ThroughTheLookingGlass_EffectiveObservability.pptx
 
CyberCrime in the Cloud and How to defend Yourself
CyberCrime in the Cloud and How to defend Yourself CyberCrime in the Cloud and How to defend Yourself
CyberCrime in the Cloud and How to defend Yourself
 
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
 
20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers Software20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers Software
 
Presentacion de solucion cloud de navegacion segura
Presentacion de solucion cloud de navegacion seguraPresentacion de solucion cloud de navegacion segura
Presentacion de solucion cloud de navegacion segura
 
Hybrid cloud monitoring - Mumbai seminar
Hybrid cloud monitoring - Mumbai seminarHybrid cloud monitoring - Mumbai seminar
Hybrid cloud monitoring - Mumbai seminar
 
The Art of Container Monitoring
The Art of Container MonitoringThe Art of Container Monitoring
The Art of Container Monitoring
 
DCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at NetflixDCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at Netflix
 
NextGen Endpoint Security for Dummies
NextGen Endpoint Security for DummiesNextGen Endpoint Security for Dummies
NextGen Endpoint Security for Dummies
 
Alabama CyberNow 2018: Cloud Hardening and Digital Forensics Readiness
Alabama CyberNow 2018: Cloud Hardening and Digital Forensics ReadinessAlabama CyberNow 2018: Cloud Hardening and Digital Forensics Readiness
Alabama CyberNow 2018: Cloud Hardening and Digital Forensics Readiness
 
The Joy of Proactive Security
The Joy of Proactive SecurityThe Joy of Proactive Security
The Joy of Proactive Security
 
All Your Security Events Are Belong to ... You!
All Your Security Events Are Belong to ... You!All Your Security Events Are Belong to ... You!
All Your Security Events Are Belong to ... You!
 
All your logs are belong to you!
All your logs are belong to you!All your logs are belong to you!
All your logs are belong to you!
 
366864108 azure-security
366864108 azure-security366864108 azure-security
366864108 azure-security
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
 
CodeMotion Amsterdam 2018 - Microservices in action at the Dutch National Police
CodeMotion Amsterdam 2018 - Microservices in action at the Dutch National PoliceCodeMotion Amsterdam 2018 - Microservices in action at the Dutch National Police
CodeMotion Amsterdam 2018 - Microservices in action at the Dutch National Police
 
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...
 
Azure 101: Shared responsibility in the Azure Cloud
Azure 101: Shared responsibility in the Azure CloudAzure 101: Shared responsibility in the Azure Cloud
Azure 101: Shared responsibility in the Azure Cloud
 
Regulated Reactive - Security Considerations for Building Reactive Systems in...
Regulated Reactive - Security Considerations for Building Reactive Systems in...Regulated Reactive - Security Considerations for Building Reactive Systems in...
Regulated Reactive - Security Considerations for Building Reactive Systems in...
 
Microsoft Azure Security Overview
Microsoft Azure Security OverviewMicrosoft Azure Security Overview
Microsoft Azure Security Overview
 

More from Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
Flink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
Flink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
Flink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
Flink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
Flink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
Flink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Flink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
Flink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
Flink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
Flink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
Flink Forward
 

More from Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 

Recently uploaded

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 

Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detection Engine for Cloud Activities using Flink"

  • 1. Anomaly Detection Engine for Cloud Activities using Flink Flink Forward Berlin 2018 Yonatan Most & Avihai Berkovitz Microsoft Cloud App Security
  • 2. Microsoft Cloud App Security Discover and assess risks Control access in real time Detect threats Protect your information Identify cloud apps on your network, gain visibility into shadow IT, and get risk assessments and ongoing analytics. Manage and limit cloud app access based on conditions and session context, including user identity, device, and location. Identify high-risk usage and detect unusual behavior using Microsoft threat intelligence and research. Get granular control over data and use built-in or custom policies for data sharing and data loss prevention. Threat detection: Microsoft Intelligent Security Graph, Office ATP Information Protection: Office 365 & Azure Information Protection Identity: Azure AD and Conditional Access To your cloud appsExtend Microsoft security + more
  • 4. • The primary data type in the product • 16,000+ supported SaaS applications • Dozens of activity types (logon, upload, export, create user…) • Locations, devices, target objects, impersonated users… • Out of order by up to 24 hours! Cloud activities
  • 6. • Goal: detect anomalous user behavior and alert the admin • The core flow is: • Analyze all the activities and extract security-oriented insights • Maintain a behavioral model for every user, and update it inline • Detect outliers, suspicious behavior or potentially malicious activity • Cross-reference with previous alerts and other users, to prevent false positives and “alert fatigue” • Reach a decision and raise an alert within seconds Activity-based threat protection
  • 7. • Scalability • Fault tolerance & recovery • Real time processing • Short time from ingestion to detection • Extendibility Detection engine requirements • Support for massive state size • State persistency • State consistency • Fast, parallel access and update of state
  • 9. • We needed a stateful stream processing framework • We really didn’t want to build one in-house • We tested 10 frameworks • Azure ML, Azure Stream Analytics, Microsoft Orleans, Apache Storm, Apache Samza, Apache Spark streaming, Apache Ignite, Apache Beam… • We chose Flink! We embarked on a search for a framework
  • 10. Anubis • Our anomaly detection engine • A single Flink job running the entire flow • Ingests activities • Outputs alerts
  • 11. • 8 clusters across multiple datacenters • Our largest cluster: • 40 machines • 25,000 events per second • 1.3 TB of state Anubis
  • 13. Anubis scalability Scalability in… Requires… Event rate per cluster Increase cluster as needed (stability cost) Event rate per user group Key by user where possible Special treatment of group-keyed operators Event rate per user Capping the user event rate Features models and detections Key by feature / model / detection + increase cluster as needed Number of clusters Easy cluster management
  • 14. • Multiple clusters in multiple datacenters • Custom standalone cluster setup • Automation scripts • Machine setup • Flink version upgrade • Job deployment and upgrade • Kubernetes-based deployment solution is in progress Flink @ Microsoft - cluster
  • 15. • We use Azure EventHubs as sources and sinks • We use Azure Blob Storage for state backend Flink @ Microsoft – connectors
  • 16. • The job and its state should last forever • We also deploy new versions frequently and update the state objects • We created a custom versioned framework based on Kryo Flink @ Microsoft – upgrades
  • 17. • Custom wrappers for operators, state access, serializers, collectors • Adding log metadata about current operator, context, timing • Adding metrics using the built-in framework • Around 15,000 metrics per process! • Everything flows to Splunk • Investigation • Dashboards • Alerts Flink @ Microsoft – monitoring
  • 18. Flink @ Microsoft – monitoring
  • 19. Flink @ Microsoft – monitoring
  • 20. Flink @ Microsoft – monitoring
  • 21. Flink @ Microsoft – monitoring
  • 22. Flink @ Microsoft – monitoring
  • 23. Flink @ Microsoft – monitoring
  • 24. • Two other Flink jobs already in production • Another two in development • Kubernetes-based deployment solution is in progress • Helping other groups within Microsoft build Flink jobs What’s next?
  • 25. © Copyright Microsoft Corporation. All rights reserved. Thank you. Questions?