Mitigating 1 Million Security Threats
Arun Janarthnam
Architect @ Citrix
Who or what did we protect?
What kind of threats did we protect against?
Cloud
Work used to be a place
Anywhere in the world, anywhere on the internet
On-prem
External Apps
One digital workspace platform to empower secure hybrid work
Citrix DaaS & VDI
Workspace
Secure Private Access
ShareFile
* Not the full list of Citrix products
98% of Fortune 500
100M users in 100+ countries
Over 400,000 customers
[Diagram labels: Citrix Analytics, Telemetry; Workspace, Citrix DaaS & VDI, Secure Private Access, ShareFile; Cloud, On-prem, External Apps; Files, Apps, User, Location, Device, Network, URL, Operations, Credentials + 2FA]
Citrix Security Analytics
[Diagram labels: Citrix Analytics; User, Location, Device, Operations, URL]
DATA EXFILTRATION
Excessive file sharing, downloads, printing, cut and paste to local computer
INSIDER THREATS
Presence of malware files, excessive file uploads/downloads, risky/blocked website access
COMPROMISED USERS
Impossible travel, unusual authentication failures, suspicious logons, excessive file deletion
COMPROMISED ENDPOINTS
Devices that are jailbroken, unmanaged, or have blacklisted apps
[Diagram labels: Risk Indicators, Risk score; URL, App, Network, Operations, Files]
How did we detect and mitigate threats?
Citrix Analytics Architecture
Data Platform
ML Platform
App Platform
Security Analytics
Performance Analytics
Networking Analytics
SPA / ZTNA
Usage Analytics
Citrix Analytics Platform
10+ Citrix products sending ~40 different data streams
Data Ingestion
Citrix Products
Streaming Ingestion
Batch Ingestion
Prod Data Lake
Raw Data
Common Data Processing
Processed Data
✓ User Correlation
✓ Data Augmentation
✓ Data Transformation
User Correlation
Sources: Workspace, ShareFile, Active Directory
User Correlation Module (Stream)
Batch Job (every 15 mins)
distinct cas_user_name
User Lookup Table
username    email               AAA user name
user1       user1@citrix.com    aaaUser1
User Correlation (the diagram adds two "update" labels)
Can't update billions of records. A join on username solves the duplicate-user issue.
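The join on username mentioned on the User Correlation slide can be illustrated with a small, pure-Python sketch. It is not the production job: the lookup rows, the `identity` column name, and the "Gateway" source are assumptions made for illustration; only `cas_user_name` and the three sample identities come from the slide. The idea shown is that raw events are never rewritten; each event is joined to a small lookup table that maps every known identity to one canonical name.

```python
# Hypothetical lookup rows: every known identity of a person mapped to one
# canonical name (the slide calls this column cas_user_name).
USER_LOOKUP = [
    {"identity": "user1", "cas_user_name": "user1"},
    {"identity": "user1@citrix.com", "cas_user_name": "user1"},
    {"identity": "aaaUser1", "cas_user_name": "user1"},
]


def build_index(rows):
    """Index the lookup table by lower-cased identity for constant-time joins."""
    return {row["identity"].lower(): row["cas_user_name"] for row in rows}


def correlate(events, index):
    """Left-join events to the lookup table; unknown users keep their own name."""
    out = []
    for event in events:
        raw = event["user"]
        canonical = index.get(raw.lower(), raw)
        # Build a new record instead of updating the stored event in place.
        out.append({**event, "cas_user_name": canonical})
    return out


events = [
    {"source": "Workspace", "user": "user1"},
    {"source": "ShareFile", "user": "user1@citrix.com"},
    {"source": "Gateway", "user": "aaaUser1"},  # assumed source for the AAA name
]
correlated = correlate(events, build_index(USER_LOOKUP))
distinct_users = {e["cas_user_name"] for e in correlated}
```

Three differently named events collapse to one distinct user, while the original event records stay untouched.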
Augmentation & Transformation
• IP to Location Enrichment
• Accuracy & coverage are paramount.
• Handle “locale” city names
• Enrich URL data
• Normalize field names and values
• Data Quality checks
• Add licensing information
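As an illustration of the "normalize field names and values" and "data quality checks" bullets, here is a minimal sketch. The alias table, the required-field list, and the `quality_ok` flag are hypothetical; the slides do not describe the actual normalization rules.

```python
import re

# Hypothetical alias table: product-specific spellings mapped to one field name.
FIELD_ALIASES = {"clientip": "client_ip", "ip": "client_ip", "username": "user_name"}
# Hypothetical list of fields every event must carry.
REQUIRED_FIELDS = ("user_name", "event_type")


def normalize_name(name):
    """Convert 'Client-IP' or 'clientIp' style names to snake_case, then apply aliases."""
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)  # split camelCase
    snake = re.sub(r"[^0-9A-Za-z]+", "_", snake).strip("_").lower()
    return FIELD_ALIASES.get(snake.replace("_", ""), snake)


def normalize_event(raw):
    """Normalize keys, trim string values, and flag events missing required fields."""
    event = {}
    for key, value in raw.items():
        if isinstance(value, str):
            value = value.strip()
        event[normalize_name(key)] = value
    missing = [field for field in REQUIRED_FIELDS if not event.get(field)]
    event["quality_ok"] = not missing
    return event


clean = normalize_event(
    {"UserName": " user1 ", "Event-Type": "Authentication", "clientIp": "10.0.0.5"}
)
```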
Insights
[Diagram: the ingestion pipeline (Citrix Products, Streaming and Batch Ingestion, Prod Data Lake, Common Data Processing, Raw and Processed Data) extended with Streaming Apps, Batch Apps, Custom Indicators, Graph Loaders, Derived Data, Anonymize, Data Exploration Lake, and ML Development]
Detecting possible security threats
Data Exploration Platform
Detect data drift, concept drift & any anomalies
A Platform to generate Insights
Data Collection
Data Discovery
Feature Engineering
Model Experimentation & Training
Model Evaluation
Model Deployment & Serving
Monitor & Improve
Prod Data Lake
Data Exploration Lake
Model Development (Databricks Notebook & Auto ML)
Fast Data Exploration (Databricks SQL)
Data Profiling & Quality (Spark & DB SQL)
Experiment Tracking
Model Registry
Testing Frameworks
CI/CD pipelines
Metadata Storage
Feature Store
Model Monitoring
Feedback API
Pipeline Management: Azure Data Factory, Databricks Scheduler, DLT
Streaming ML Jobs
Batch ML Jobs
Model Serving on AKS
Detect anomalies
Reduce False Positives
Data Catalog
Anomaly Detection Models
Risk Score ML Job
Networking ML Jobs
Frameworks are a dev’s best friend
Prod Data Lake
Processed Data
Risk Score, BOT Detection, WAF Violations
Batch App Framework
Potential Data Exfiltration, Suspicious Logon
Streaming App Framework
First Time Indicators, Excessive operation Indicators
Custom Indicator Framework
Offline Feature Store
Online Feature Store
State Store
Framework Features
• Avoid template code
• Feature Generation
• Feature Store
• State Management
• Custom Listeners
• Offset Monitoring
• Sample & Skeleton Apps
Citrix ADC
Custom Indicators (CI)
Customers want to create their own risk indicators
Example: "Alert me when a user located in India launches a session from a new device"
A zero-code Spark platform
Security Admin
Templates for easy CI creation: Condition, Trigger
CI Controller
Meta Store
State Store
Processed Data
Custom Indicator Framework
Create structured streaming queries
CI Design
CI 1: Geofence crossing
Data Source: Citrix Gateway
Condition: Event-Type = "VPN_AI" AND Country != "United States"
User has successful authentication from outside their country of operation
CI 2: Excessive authentication failure
Data Source: Citrix Gateway
Condition: Event-Type = "Authentication" AND Status-Code != "Successful login"
CI 3: First time access from new IP
Data Source: Citrix Gateway
Condition: Event-Type = "Authentication" AND Status-Code = "Successful login" AND Client-IP-Type != "private"
CI 4: Geofence crossing
Data Source: ShareFile
Condition: Is-Employee = "False" AND Operation-Name = "Login" AND Country != "United States"
Login of a non-employee from outside the country of operation
Triggers: every time, first time, 3 times in 1 min
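A custom indicator, as the slide presents it, is a data source plus a condition plus a trigger. The sketch below models that shape in plain Python to make the structure concrete. The class, the trigger strings, and the pairing of each trigger with a particular CI are assumptions for illustration; the real framework expresses conditions as Spark Structured Streaming queries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class CustomIndicator:
    name: str
    data_source: str
    condition: Callable[[Dict[str, str]], bool]
    trigger: str  # assumed encoding: "every_time", "first_time", or "3_in_60s"


CI1 = CustomIndicator(
    name="Geofence crossing",
    data_source="Citrix Gateway",
    condition=lambda e: e.get("Event-Type") == "VPN_AI"
    and e.get("Country") != "United States",
    trigger="every_time",  # assumed pairing
)
CI2 = CustomIndicator(
    name="Excessive authentication failure",
    data_source="Citrix Gateway",
    condition=lambda e: e.get("Event-Type") == "Authentication"
    and e.get("Status-Code") != "Successful login",
    trigger="3_in_60s",  # assumed pairing
)


def matching(event: Dict[str, str], source: str, indicators: List[CustomIndicator]) -> List[str]:
    """Return the names of all indicators whose source and condition match the event."""
    return [ci.name for ci in indicators if ci.data_source == source and ci.condition(event)]


failed_login = {"Event-Type": "Authentication", "Status-Code": "Invalid credentials"}
hits = matching(failed_login, "Citrix Gateway", [CI1, CI2])
```

A condition match alone does not raise an alert; the trigger decides whether this hit is the first, the third within a minute, and so on.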
Developed ~3.5 years ago
Spark Structured Streaming Query for CI 1, CI 2, CI 3, CI 4 (one query per CI)
• Not efficient, as the same data is read again and again.
• A pre-Spark 2.4 bug (SPARK-19185) caused issues when multiple Kafka consumers on the same executor read the same topic with different offsets.
ONE Spark Structured Streaming Query for C1, C2 & C3
CASE
WHEN ({condition for C1}) AND tenantId = '{tenantId}'
THEN 'C1 UUID'
WHEN ({condition for C2}) AND tenantId = '{tenantId}'
THEN 'C2 UUID'
WHEN ({condition for C3}) AND tenantId = '{tenantId}'
THEN 'C3 UUID'
ELSE NULL
END
• Broadcast trigger conditions for all CIs.
• Store relevant information for a CI + user in state.
• Logic to trigger an alert is coded in a custom “MapGroupsWithState” function.
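The trigger logic described in these bullets can be sketched without Spark. Below, a dictionary keyed by (CI, user) stands in for the streaming state, and `should_alert` plays the role of the custom state function. The trigger encoding and the reset-after-alert behaviour are assumptions; the slides do not show the real function.

```python
from collections import defaultdict, deque


class TriggerState:
    """Per (CI, user) state: timestamps of recent hits plus a seen-before flag."""

    def __init__(self):
        self.hits = deque()
        self.seen = False


def should_alert(state, trigger, ts):
    """Update state for one matching event at time ts (seconds); return True to alert."""
    kind = trigger["kind"]
    if kind == "every_time":
        return True
    if kind == "first_time":
        first = not state.seen
        state.seen = True
        return first
    if kind == "excessive":
        state.hits.append(ts)
        # Drop hits that have fallen out of the time window.
        while state.hits and ts - state.hits[0] >= trigger["window_s"]:
            state.hits.popleft()
        if len(state.hits) >= trigger["count"]:
            state.hits.clear()  # assumed: reset so one burst raises one alert
            return True
        return False
    raise ValueError(f"unknown trigger kind: {kind}")


states = defaultdict(TriggerState)
excessive = {"kind": "excessive", "count": 3, "window_s": 60}
key = ("CI2", "user1")
results = [should_alert(states[key], excessive, ts) for ts in (0, 20, 40, 200, 210)]

first_time = {"kind": "first_time"}
first_results = [should_alert(states[("CI3", "user1")], first_time, ts) for ts in (0, 5)]
```

Three failures within 60 seconds raise one alert; the two later failures start a fresh window.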
First-time triggers
• Load initial state into Spark memory to support “first-time” indicators.
• Shouldn’t raise “first-time” alerts for newly onboarded customers.
• On-demand state load via Redis to reduce “state” size.
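One way to read the on-demand load bullet is as a lazily filled cache in front of a fast key-value store. In this sketch a plain dictionary stands in for Redis, and the `tenant_onboarding` flag is an assumed mechanism for keeping newly onboarded customers quiet while their baseline is learned; neither detail is confirmed by the slides.

```python
class FirstTimeState:
    """In-memory cache of seen values per user, filled lazily from a backing store."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # stand-in for Redis: user -> set of values
        self.cache = {}
        self.loads = 0

    def _seen(self, user):
        if user not in self.cache:  # load only when the user is actually active
            self.loads += 1
            self.cache[user] = set(self.backing_store.get(user, ()))
        return self.cache[user]

    def is_first_time(self, user, value, tenant_onboarding=False):
        seen = self._seen(user)
        if value in seen:
            return False
        seen.add(value)
        # Assumption: while a tenant is onboarding, learn values but stay silent.
        return not tenant_onboarding


history = {"user1": {"10.1.1.1"}, "dormant": {"10.2.2.2"}}
state = FirstTimeState(history)
known_ip = state.is_first_time("user1", "10.1.1.1")
new_ip = state.is_first_time("user1", "52.0.0.9")
onboarding = state.is_first_time("newbie", "8.8.8.8", tenant_onboarding=True)
```

The dormant user's history is never loaded, which is the point of loading state on demand.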
State Store:
• To support 1 billion states, 100 GB of HDFS space and 240 GB+ of local disk space are required.
• So, we extended the local state store with RocksDBStateStoreProvider.
• Looking forward to Project Lightspeed, which, in addition to other streaming enhancements, has improved state management.
Filter events by conditions
Apply trigger conditions via Spark state operations
RocksDB State Store
• The read problem is solved by adding all conditions to a single SQL query.
• If a condition is met, the corresponding CI’s UUID is emitted.
• This DataFrame is later exploded with the relevant CI information.
• Max of 1000 conditions in one query
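To make the single-query idea concrete, the following sketch assembles one CASE expression from a list of indicators, with the 1000-condition cap from the slide. The dictionary keys, the `ci_uuid` alias, and the function name are hypothetical, and the condition strings are assumed to have been validated already.

```python
def build_case_expression(indicators, max_conditions=1000):
    """Build one SQL CASE expression that emits the UUID of the matching CI.

    indicators: list of dicts with 'uuid', 'tenant_id', and 'condition'
    (a SQL predicate that is assumed to be validated elsewhere).
    """
    if len(indicators) > max_conditions:
        raise ValueError("too many conditions for one query")
    lines = ["CASE"]
    for ci in indicators:
        lines.append(
            f"  WHEN ({ci['condition']}) AND tenantId = '{ci['tenant_id']}' THEN '{ci['uuid']}'"
        )
    lines.append("  ELSE NULL")
    lines.append("END AS ci_uuid")
    return "\n".join(lines)


cis = [
    {
        "uuid": "c1-uuid",
        "tenant_id": "tenantA",
        "condition": "`Event-Type` = 'VPN_AI' AND Country != 'United States'",
    },
    {
        "uuid": "c2-uuid",
        "tenant_id": "tenantA",
        "condition": "`Event-Type` = 'Authentication' AND `Status-Code` != 'Successful login'",
    },
]
case_sql = build_case_expression(cis)
```

Note that a SQL CASE returns the first branch that matches, so an event satisfying several conditions yields a single UUID from this expression.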
[Diagram labels: Gateway Topic, ShareFile Topic, CI Hits Topic]
Option 1 · Optimizing filter conditions · Handling different triggers
CI Challenges
Adding new Custom Indicators or modifying existing CIs
• Both new-CI and update-CI operations result in a query restart.
• For now, this is not an issue. But as the number of CIs scales up, restart frequency will increase, which in turn will affect other CIs in the group and cause us to miss our SLOs.
• In the pipeline: use “foreachBatch” to get fresh CI details.
• Similarly, the new session window capability introduced in Spark 3.2 is promising. Some of the simple window-based CIs can be moved to session windows.
Stale state
• Since users can define varying time limits for “excessive” triggers, the state timeout is set to a bigger number (days).
• States of CIs with smaller time limits can’t be timed out in a timely manner.
• A state can only be accessed when an event with the corresponding key arrives (like event_type + user_name).
• So, there is no easy way to access a given user’s state and delete it.
• One option is to send a dummy user event, access the user state, and delete it if its last “true” access was x months ago.
• This is not an issue now, but with the expected 5-10x growth it will affect our cost.
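The dummy-event option can be sketched as follows. A dictionary stands in for the keyed state store, and the `dummy` flag and `last_true_access` field are hypothetical names; the point is only that a synthetic event gives the job a chance to inspect a key and delete state that no real event has touched for a long time.

```python
DAY = 86_400  # seconds


def handle_event(state_store, key, event, now, max_idle_s):
    """Process one event for a key; return 'updated', 'kept', or 'deleted'.

    state_store: dict mapping key -> {'last_true_access': unix_seconds}.
    A dummy event never refreshes the timestamp; it only checks for staleness.
    """
    state = state_store.get(key)
    if event.get("dummy"):
        if state is not None and now - state["last_true_access"] > max_idle_s:
            del state_store[key]
            return "deleted"
        return "kept"
    if state is None:
        state = state_store.setdefault(key, {})
    state["last_true_access"] = now
    return "updated"


store = {}
key = ("Authentication", "user1")
outcomes = [
    handle_event(store, key, {"Status-Code": "Successful login"}, 0, 90 * DAY),
    handle_event(store, key, {"dummy": True}, 10 * DAY, 90 * DAY),
    handle_event(store, key, {"dummy": True}, 100 * DAY, 90 * DAY),
]
```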
Things to consider while creating streaming jobs on-demand
• In a multi-tenant system, implement strong multi-tenancy checks. A rogue user can slip in a bogus tenant condition and gain access to other tenants’ data.
• Only allow whitelisted field names and validate user values.
• When combining multiple queries into one, even one query with wrong syntax will fail the whole group.
• Estimate the number of hits by running that condition on a database where the data is stored at rest. This helps both to estimate the memory requirements for a query and to inform the user of the number of alerts they will receive.
• While building initial state, load the state on demand. For example, don’t load all user data during job start. There is a decent chance a good % of users will be dormant.
• Instead, load the state in Redis or a similar fast DB and lazily load that data.
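The first two considerations, tenant isolation and whitelisted fields, can be sketched as a small validator. The allowed field and operator sets, the value rules, and the function names are all assumptions for illustration; a real system would likely parse conditions into a syntax tree rather than assemble strings.

```python
ALLOWED_FIELDS = {"Event-Type", "Status-Code", "Country", "Client-IP-Type"}
ALLOWED_OPS = {"=", "!="}


def validate_clause(field, op, value):
    """Return one SQL predicate, or raise ValueError if any part is not allowed."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"field not allowed: {field!r}")
    if op not in ALLOWED_OPS:
        raise ValueError(f"operator not allowed: {op!r}")
    if not isinstance(value, str) or not value or len(value) > 128:
        raise ValueError("value must be a non-empty string of at most 128 characters")
    if any(ch in value for ch in "'\"\\;") or "--" in value:
        raise ValueError("value contains a forbidden character")
    return f"`{field}` {op} '{value}'"


def build_condition(clauses, tenant_id):
    """AND the user clauses together; the tenant filter is appended by the server."""
    if not clauses:
        raise ValueError("at least one clause is required")
    if not tenant_id.isalnum():
        raise ValueError("invalid tenant id")
    user_part = " AND ".join(validate_clause(*clause) for clause in clauses)
    return f"({user_part}) AND tenantId = '{tenant_id}'"


def try_build(clauses, tenant_id):
    """Return the condition string, or None when validation rejects the input."""
    try:
        return build_condition(clauses, tenant_id)
    except ValueError:
        return None


ok = try_build([("Event-Type", "=", "Authentication")], "tenantA")
rogue_field = try_build([("tenantId", "=", "other")], "tenantA")
rogue_value = try_build([("Country", "=", "x' OR tenantId = 'other")], "tenantA")
```

Because `tenantId` is not in the whitelist and the tenant filter is added outside the user-controlled part, a user cannot name another tenant in a condition.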
Can we stop a real-life attack?
Let’s look at a very recent “actual” attack
1. An engineer with a malware-affected laptop logs on
2. Hacker 1 steals credentials
3. Hacker 1 sells credentials
4. Hacker 2 buys credentials
5. Hacker 2 runs an exhaustion attack
6. Hacker 2 gets access
Detecting security threats
Citrix Products
Streaming Ingestion
Processed Data
Risk indicators: Suspicious logon; Unusual authentication failures; Device with blacklisted app; EPA scan failure; Unmanaged device detected; Impossible Travel; Potential Data Exfiltration
• If Hacker 2 was in a faraway location, the impossible travel indicator would be triggered.
• It is highly probable that Hacker 2 was logging on from a new location, device, and network (with respect to the employee), which would trigger the suspicious logon risk indicator.
• A security alert would be raised during the “exhaustion attack”.
• These risk indicators have a decent chance of being triggered.
• The hacker might have avoided an EPA scan. When the employee logged on, a blacklisted app or any malware app might have been detected.
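The slides do not say how the impossible travel indicator is computed. A common approach, shown here purely as an assumption, is to compare the great-circle distance between two consecutive logons with the time between them and flag speeds no traveller could reach. The 900 km/h threshold and the 50 km rule for simultaneous logons are illustrative values.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def is_impossible_travel(prev, curr, max_kmh=900.0):
    """prev and curr are (lat, lon, unix_seconds) for two consecutive logons.

    max_kmh is an assumed threshold, roughly the cruising speed of an airliner.
    """
    hours = (curr[2] - prev[2]) / 3600.0
    km = haversine_km(prev[0], prev[1], curr[0], curr[1])
    if hours <= 0:
        return km > 50.0  # assumed rule for simultaneous logons far apart
    return km / hours > max_kmh


san_francisco = (37.77, -122.42, 0)
bengaluru_one_hour_later = (12.97, 77.59, 3600)
london = (51.5074, -0.1278, 0)
paris_two_hours_later = (48.8566, 2.3522, 7200)

flagged = is_impossible_travel(san_francisco, bengaluru_one_hour_later)
not_flagged = is_impossible_travel(london, paris_two_hours_later)
```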
How would Citrix have detected attacks of this type?
Configured Risk Indicators
Some of these indicators are pre-configured for our customers.
< 1 minute detection time
Spark Streaming & Custom Indicator Framework code
Mitigating Threats
Actions:
• Log Off, Lock/disable user
• Start session recording
• Reduce privilege or restrict access.
• Enforce logon again with 2FA.
• Notify via Email
Defining Policies to handle threats
Default Policies
Security
Admin
Mitigating Threats
Processed Data
Streaming / Batch Apps
Custom Indicators
Derived Data
Alert Listener Service
Alert Processing Service
Citrix Products
Webhooks
Policy Service
Policy UI
Security Admin
Workspace
Actions:
• Log Off user
• Start session recording
• Reduce privilege or restrict access.
• Enforce logon again with 2FA.
• Notify via Email
From Risk Indicators to Actions
• Confirms action result
• Generates security alerts
• Decorates alerts
• Takes actions on alerts as defined by customers
• Defines policies, which are the actions that need to be taken when certain alerts happen
Actions
Service Bus
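A policy, as described here, maps an alert to the actions a customer wants taken. The sketch below shows one possible shape for that lookup. The policy fields, the `min_risk` threshold, and the action identifiers are hypothetical; only the indicator names and the kinds of action come from the slides.

```python
# Hypothetical policies as a security admin might define them.
POLICIES = [
    {"indicator": "Impossible Travel", "min_risk": 0,
     "actions": ["enforce_2fa", "notify_email"]},
    {"indicator": "Potential Data Exfiltration", "min_risk": 80,
     "actions": ["log_off_user", "start_session_recording"]},
    {"indicator": "Potential Data Exfiltration", "min_risk": 0,
     "actions": ["notify_email"]},
]


def actions_for(alert, policies):
    """Return the de-duplicated, ordered actions of every policy matching the alert."""
    actions = []
    for policy in policies:
        same_indicator = policy["indicator"] == alert["indicator"]
        risky_enough = alert.get("risk_score", 0) >= policy["min_risk"]
        if same_indicator and risky_enough:
            for action in policy["actions"]:
                if action not in actions:
                    actions.append(action)
    return actions


high_risk = actions_for(
    {"indicator": "Potential Data Exfiltration", "user": "user1", "risk_score": 92},
    POLICIES,
)
low_risk = actions_for(
    {"indicator": "Potential Data Exfiltration", "user": "user2", "risk_score": 35},
    POLICIES,
)
```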
Some metrics
[Diagram labels: Citrix Products, Ingestion, Processed Data, Streaming / Batch Apps, Custom Indicators, Derived Data, Alert Listener Service, Alert Processing Service, Citrix Products]
• Latency for streaming path: ~1 min
• Latency for batch path: ~15 mins
• ~7 billion events per day
• Peak: 100k events/sec
• Expected growth: 10x in next 2 years
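A quick back-of-the-envelope calculation relates these figures to each other. It uses only the approximate numbers on the slide; the assumption that peak load grows by the same 10x factor as overall volume is mine.

```python
EVENTS_PER_DAY = 7_000_000_000  # "~7 billion events per day"
PEAK_PER_SEC = 100_000          # "Peak: 100k events/sec"
GROWTH = 10                     # "Expected growth: 10x"
SECONDS_PER_DAY = 86_400

avg_per_sec = EVENTS_PER_DAY / SECONDS_PER_DAY
peak_to_avg = PEAK_PER_SEC / avg_per_sec
projected_peak = PEAK_PER_SEC * GROWTH  # assumes peak scales with total volume

print(f"average rate: {avg_per_sec:,.0f} events/sec")
print(f"peak / average: {peak_to_avg:.2f}")
print(f"projected peak: {projected_peak:,} events/sec")
```

The average works out to roughly 81k events per second, so the quoted peak is only about 1.2 times the average.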
Data Mesh Offering Micro Services
< 1 minute: emails & other notifications
In last 6 months: 7 million risk indicators triggered.
In last 6 months: 1 million mitigative actions triggered.
Thank You
Mitigating One Million Security Threats With Kafka and Spark With Arun Janarthnam | Current 2022

 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 

Mitigating One Million Security Threats With Kafka and Spark With Arun Janarthnam | Current 2022

  • 1. Mitigating 1 Million Security Threats Arun Janarthnam Architect @ Citrix
  • 2. Who or what did we protect? What kind of threats did we protect against?
  • 3.
  • 4. Work used to be a place. Anywhere in the world, anywhere on the internet: Cloud, On-prem, External Apps
  • 5. One digital workspace platform to empower secure hybrid work: Citrix DaaS & VDI, Workspace, Secure Private Access, ShareFile (* not the full list of Citrix products). 98% of Fortune 500. 100M users in 100+ countries. Over 400,000 customers. Cloud, External Apps, On-prem
  • 6. Citrix Analytics telemetry: User, Location, Device, Operations, URL, Network, Files, Apps. Products: Workspace, Citrix DaaS & VDI, Secure Private Access, ShareFile. Credentials + 2FA
  • 7. Citrix Security Analytics (Files, Apps, User, Location, Device, Operations, URL, Network): Risk Indicators and Risk score.
      • DATA EXFILTRATION: excessive file sharing, downloads, printing, cut and paste to local computer.
      • INSIDER THREATS: presence of malware files, excessive file uploads/downloads, risky/blocked website access.
      • COMPROMISED USERS: impossible travel, unusual authentication failures, suspicious logons, excessive file deletion.
      • COMPROMISED ENDPOINTS: devices that are jailbroken, unmanaged or have blacklisted apps.
  • 8. How did we detect and mitigate threats?
  • 9. Citrix Analytics Architecture. Citrix Analytics Platform: Data Platform, ML Platform, App Platform. Offerings: Security Analytics, Performance Analytics, Networking Analytics, SPA / ZTNA, Usage Analytics. 10+ Citrix products sending ~40 different data streams
  • 10. Data Ingestion: Citrix Products, Streaming Ingestion, Batch Ingestion, Prod Data Lake (Raw Data, Processed Data). Common Data Processing: ✓ User Correlation ✓ Data Augmentation ✓ Data Transformation. User Correlation: products such as Workspace and ShareFile identify the same user differently: username (user1), email (user1@citrix.com), AAA user name (aaaUser1). Components: User Correlation Module (Stream), Active Directory, distinct cas_user_name, Batch Job (every 15 mins), User Lookup Table
  • 11. Data Ingestion: User Correlation
  • 12. User Correlation: update, update. Can’t update billions of records. A join on username solves the duplicate user issue. Batch Job (every 15 mins), User Lookup Table
  • 13. User Correlation: update, update, lookups. Can’t update billions of records. A join on username solves the duplicate user issue. Batch Job (every 15 mins), User Lookup Table
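The lookup-table idea on the user correlation slides can be sketched in plain Python. This is only an illustration under stated assumptions: the record layout, the `DirectoryEntry` type and the function names are hypothetical, and only `cas_user_name` comes from the slides. The point it shows is that events are resolved against a small alias table at read time instead of rewriting historical records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectoryEntry:
    """One directory user with the aliases different products may report (hypothetical layout)."""
    cas_user_name: str   # canonical name, as named on the slide
    username: str
    email: str
    aaa_user_name: str


def build_lookup(entries):
    """Map every known alias (lower-cased) to the canonical cas_user_name."""
    lookup = {}
    for entry in entries:
        for alias in (entry.username, entry.email, entry.aaa_user_name):
            lookup[alias.lower()] = entry.cas_user_name
    return lookup


def correlate(event, lookup):
    """Return a copy of the event with a canonical user attached.

    Unknown users keep their raw identifier, so no event is dropped.
    """
    raw = event["user"]
    return {**event, "cas_user_name": lookup.get(raw.lower(), raw)}


entries = [DirectoryEntry("user1", "user1", "user1@citrix.com", "aaaUser1")]
lookup = build_lookup(entries)
events = [
    {"source": "Workspace", "user": "user1"},
    {"source": "ShareFile", "user": "user1@citrix.com"},
    {"source": "Gateway", "user": "aaaUser1"},
]
resolved = [correlate(e, lookup) for e in events]
```

In the talk the table is refreshed by a batch job every 15 minutes; here `build_lookup` simply stands in for that refresh.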
  • 14. Data Ingestion: Augmentation & Transformation (part of Common Data Processing: ✓ User Correlation ✓ Data Augmentation ✓ Data Transformation) • IP to location enrichment • Accuracy and coverage are paramount • Handle “locale” city names • Enrich URL data • Normalize field names and values • Data quality checks • Add licensing information
  • 15. Insights: detecting possible security threats. Citrix Products, Streaming Ingestion, Batch Ingestion, Prod Data Lake, Common Data Processing, Raw Data, Processed Data, Derived Data, Streaming Apps, Batch Apps, Custom Indicators, Graph Loaders. Data Exploration Platform: Anonymize, Data Exploration Lake, ML Development
  • 16. A Platform to generate Insights.
      • Stages: Data Collection, Data Discovery, Feature Engineering, Model Experimentation & Training, Model Evaluation, Model Deployment & Serving, Monitor & Improve.
      • Building blocks: Prod Data Lake, Data Exploration Lake, Data Catalog, Model Development (Databricks Notebook & Auto ML), Fast Data Exploration (Databricks SQL), Data Profiling & Quality (Spark & DB SQL), Experiment Tracking, Model Registry, Testing Frameworks, CI/CD pipelines, Metadata Storage, Feature Store, Model Monitoring, Feedback API, Streaming ML Jobs, Batch ML Jobs, Model Serving on AKS.
      • Pipeline Management: Azure Data Factory, Databricks Scheduler, DLT.
      • ML jobs: Anomaly Detection Models, Risk Score ML Job, Networking ML Jobs.
      • Detect anomalies. Reduce false positives. Detect data drift, concept drift and any anomalies.
  • 17. Frameworks are a dev’s best friend.
      • Frameworks: Batch App Framework, Streaming App Framework, Custom Indicator Framework.
      • Apps and indicators: Risk Score, BOT Detection, WAF Violations, Potential Data Exfiltration, Suspicious Logon, First Time Indicators, Excessive operation Indicators.
      • Data and stores: Prod Data Lake, Processed Data, Citrix ADC, Offline Feature Store, Online Feature Store, State Store.
      • Framework Features: avoid template code, feature generation, feature store, state management, custom listeners, offset monitoring, sample & skeleton apps.
  • 18. Custom Indicators (CI): customers want to create their own risk indicators, for example “Alert me when a user located in India launches a session from a new device”. A zero-code Spark platform: templates for easy CI creation (Condition, Trigger) for the Security Admin. Custom Indicator Framework: CI Controller, State Store, Meta Store, Processed Data. Creates Structured Streaming queries.
  • 19. CI Design
      Example CIs:
      • CI 1: Geofence crossing. Data Source: Citrix Gateway. Event-Type = "VPN_AI" AND Country != "United States". User has successful authentication from outside their country of operation.
      • CI 2: Excessive authentication failure. Data Source: Citrix Gateway. Event-Type = "Authentication" AND Status-Code != "Successful login".
      • CI 3: First time access from new IP. Data Source: Citrix Gateway. Event-Type = "Authentication" AND Status-Code = "Successful login" AND Client-IP-Type != "private".
      • CI 4: Geofence crossing. Data Source: ShareFile. Is-Employee = "False" AND Operation-Name = "Login" AND Country != "United States". Login of a non-employee from outside of country of operation.
      • Triggers: every time, first time, 3 times in 1 min.
      Option 1: one Spark Structured Streaming query per CI (CI 1, CI 2, CI 3, CI 4), reading the Gateway Topic or Sharefile Topic and writing to the CI Hits Topic.
      • Not efficient, as the same data is read again and again.
      • A pre-Spark 2.4 bug (SPARK-19185) caused issues when multiple Kafka consumers from the same executor read the same topic with different offsets.
      • Developed ~3.5 years ago.
      Optimizing filter conditions: ONE Spark Structured Streaming query for C1, C2 & C3:
      CASE WHEN ({condition for C1}) and tenantId = '{tenantId}' THEN 'C1 UUID' WHEN ({condition for C2}) and tenantId = '{tenantId}' THEN 'C2 UUID' WHEN ({condition for C3}) and tenantId = '{tenantId}' THEN 'C3 UUID' ELSE null END
      • The read problem is solved by adding all conditions to a single SQL query.
      • If a condition is met, the corresponding CI’s UUID is emitted.
      • This DataFrame is later exploded with the relevant CI information.
      • Max of 1000 conditions in one query.
      Handling different triggers:
      • Broadcast trigger conditions for all CIs.
      • Store relevant information for a CI + user in state.
      • The logic to trigger an alert is coded in a custom “MapGroupsWithState” function.
      First-time triggers:
      • Load initial state into Spark memory to support “first-time” indicators.
      • Shouldn’t raise “first-time” alerts for newly onboarded customers.
      • On-demand state load via Redis to reduce “state” size.
      State Store:
      • To support 1 billion states, 100G of HDFS space and 240G+ of local disk space are required.
      • So, extended the local state store with RocksDBStateStoreProvider.
      • Looking forward to Project Lightspeed which, in addition to other streaming enhancements, has improved state management.
      Pipeline: filter events by conditions, then apply trigger conditions via Spark state operations (RocksDB State Store).
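The single-query idea from the CI Design slide can be tried out without Spark. The sketch below builds one `CASE` expression from several CI conditions and runs it with Python's built-in `sqlite3` module, which stands in for Spark SQL here. The column names (`tenant_id`, `event_type` and so on), the UUID strings and the helper names are hypothetical; the conditions mirror CI 1 to CI 3 from the slide. Note that a `CASE` expression yields at most one UUID per row, the first matching branch.

```python
import sqlite3

# Hypothetical CI definitions: (CI UUID, tenant id, SQL condition).
CIS = [
    ("ci-1-uuid", "t1", "event_type = 'VPN_AI' AND country != 'United States'"),
    ("ci-2-uuid", "t1",
     "event_type = 'Authentication' AND status_code != 'Successful login'"),
    ("ci-3-uuid", "t2",
     "event_type = 'Authentication' AND status_code = 'Successful login' "
     "AND client_ip_type != 'private'"),
]


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_case(cis) -> str:
    """Combine all CI conditions into one CASE expression (cis must be non-empty)."""
    branches = [
        f"WHEN ({cond}) AND tenant_id = {sql_literal(tenant)} THEN {sql_literal(uuid)}"
        for uuid, tenant, cond in cis
    ]
    return "CASE " + " ".join(branches) + " ELSE NULL END"


def matched_cis(rows, cis):
    """Scan the events once and return (user_name, ci_uuid) for every hit."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (tenant_id, user_name, event_type, "
        "status_code, country, client_ip_type)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)", rows)
    query = (
        f"SELECT user_name, ci FROM "
        f"(SELECT user_name, {build_case(cis)} AS ci FROM events) "
        f"WHERE ci IS NOT NULL ORDER BY user_name"
    )
    hits = conn.execute(query).fetchall()
    conn.close()
    return hits


ROWS = [
    ("t1", "alice", "VPN_AI", "", "India", "public"),
    ("t1", "bob", "Authentication", "Failed login", "United States", "private"),
    ("t2", "carol", "Authentication", "Successful login", "United States", "public"),
    ("t2", "dave", "Authentication", "Failed login", "United States", "public"),
]
hits = matched_cis(ROWS, CIS)
```

Every event is read once regardless of how many CIs exist, which is the property the slide is after. Each hit would then be joined with the CI metadata, the step the slide calls exploding the DataFrame.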
  • 20. CI Challenges
      Adding new Custom Indicators or modifying existing CIs:
      • Both new and update CI operations result in a query restart.
      • For now, this is not an issue. But as the number of CIs scales up, restart frequency will increase, which in turn will affect other CIs in the group and cause us to miss our SLOs.
      • In the pipeline: use “foreachBatch” to get fresh CI details.
      • Similarly, the new session window capability introduced in Spark 3.2 is promising. Some of the simple window-based CIs can be moved to session windows.
      Stale state:
      • Since users can define varying time limits for the “excessive” trigger, the state timeout is set to a bigger number (days).
      • States of CIs with smaller time limits can’t be timed out in a timely manner.
      • A state can only be accessed when an event with the corresponding key happens (like event_type + user_name).
      • So, there is no easy way to access a given user’s state and delete it.
      • One option is to send a dummy user event, access the user state, and delete it if its last “true” access was x months ago.
      • This is not an issue now, but with the expected 5-10x growth this will affect our cost.
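The “excessive” trigger and the stale-state concern above can be illustrated outside Spark. The class below is a plain-Python stand-in for the per-key state that a mapGroupsWithState function would hold; the class name, method names and the explicit `sweep` are assumptions made for illustration, not the framework's code. It shows why a per-CI window matters: state for a short window can be dropped as soon as that window has passed, without waiting for a global timeout of days.

```python
from collections import deque


class ExcessiveTrigger:
    """Fires when `threshold` hits for one key fall inside `window_s` seconds."""

    def __init__(self, threshold: int, window_s: float):
        self.threshold = threshold
        self.window_s = window_s
        self.state = {}  # key -> deque of hit timestamps (seconds)

    def on_hit(self, key, ts: float) -> bool:
        """Record a hit and return True if the trigger fires."""
        hits = self.state.setdefault(key, deque())
        hits.append(ts)
        # Drop hits that have fallen out of the window.
        while hits and ts - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.threshold:
            hits.clear()  # start counting afresh after an alert
            return True
        return False

    def sweep(self, now: float) -> int:
        """Delete state whose newest hit is older than the window; return count."""
        stale = [
            key for key, hits in self.state.items()
            if not hits or now - hits[-1] >= self.window_s
        ]
        for key in stale:
            del self.state[key]
        return len(stale)


# "3 times in 1 min", keyed by (event_type, user_name) as on the slide.
trigger = ExcessiveTrigger(threshold=3, window_s=60)
key_u = ("Authentication", "user1")
key_v = ("Authentication", "user2")
fired = [trigger.on_hit(key_u, t) for t in (0, 10, 20)]
slow = [trigger.on_hit(key_v, t) for t in (0, 70)]
removed = trigger.sweep(now=200)
```

In Spark the state for a key is only reachable when an event for that key arrives, which is why the slide suggests a dummy event; the `sweep` method here has no such restriction and is only meant to show the intended outcome.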
  • 21. Things to consider while creating streaming jobs on-demand
      • In a multi-tenant system, implement strong multi-tenancy checks. A rogue user can slip in a bogus tenant condition and gain access to other tenants’ data.
      • Only allow whitelisted field names and validate user values.
      • When combining multiple queries into one, even one query with wrong syntax will fail the whole group.
      • Estimate the number of hits by running the condition on a database where the data is stored at rest. This helps both to estimate the memory requirements for a query and to inform users of the number of alerts they will receive.
      • While building initial state, load the state on demand. For example, don’t load all user data during job start. There is a decent chance that a good percentage of users will be dormant.
      • Instead, load the state into Redis or a similar fast database and lazily load that data.
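The whitelist and multi-tenancy advice above could look roughly like the following sketch. It assumes that user conditions arrive as structured (field, operator, value) clauses rather than free-form SQL; the allowed field set, the operator set, the length limit and the function name are all hypothetical. The tenant filter is added by the server and can never be supplied by the user.

```python
ALLOWED_FIELDS = {"event_type", "status_code", "country", "client_ip_type"}
ALLOWED_OPS = {"=", "!="}
MAX_VALUE_LEN = 256  # arbitrary limit chosen for this sketch


def build_condition(tenant_id, clauses):
    """Validate user clauses and return (sql, params) scoped to one tenant.

    Values are passed as parameters, never spliced into the SQL text.
    """
    if not clauses:
        raise ValueError("at least one clause is required")
    parts, params = [], []
    for field, op, value in clauses:
        if field not in ALLOWED_FIELDS:
            raise ValueError(f"field not allowed: {field!r}")
        if op not in ALLOWED_OPS:
            raise ValueError(f"operator not allowed: {op!r}")
        if not isinstance(value, str) or len(value) > MAX_VALUE_LEN:
            raise ValueError(f"invalid value for {field!r}")
        parts.append(f"{field} {op} ?")
        params.append(value)
    # The tenant filter wraps the user clauses, so they cannot widen the scope.
    sql = "tenant_id = ? AND (" + " AND ".join(parts) + ")"
    return sql, [tenant_id, *params]


sql, params = build_condition("t1", [("country", "!=", "United States")])
```

Because `tenant_id` is not in the whitelist, a clause such as `("tenant_id", "=", "t2")` is rejected before any query is built. Validating each CI on its own also keeps one malformed condition from failing a whole combined query.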
  • 22. Can we stop a real-life attack?
  • 23. Let’s look at a very recent “actual” attack. An engineer with a malware-affected laptop: 1. logs on. Hacker 1: 2. steals credentials, 3. sells credentials. Hacker 2: 4. buys credentials, 5. exhaustion attack, 6. gets access.
  • 24. Detecting security threats: how would Citrix have detected attacks of this type?
      Configured Risk Indicators (some of these indicators are pre-configured for our customers): Suspicious logon, Unusual authentication failures, Device with blacklisted app, EPA scan failure, Unmanaged device detected, Impossible Travel, Potential Data Exfiltration.
      • If Hacker 2 was in a faraway location, the impossible travel indicator will be triggered.
      • It’s highly probable that Hacker 2 was logging on from a new location, device and network (with respect to the employee), which will trigger the suspicious logon risk indicator.
      • A security alert is raised during the “exhaustion attack”.
      • These risk indicators have a decent chance of being triggered: the hacker might have avoided an EPA scan, and when the employee logged on, a blacklisted app or any malware app might have been detected.
      • < 1 minute detection time (Spark Streaming & Custom Indicator Framework code). Pipeline: Citrix Products, Streaming Ingestion, Processed Data.
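An impossible travel check of the kind named above is commonly built on great-circle distance and implied speed. The slides do not say how Citrix computes it, so the following is a generic sketch: the speed threshold, the minimum distance and the tuple layout are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAX_SPEED_KMH = 1000.0    # assumption: faster than this is not plausible travel
MIN_DISTANCE_KM = 100.0   # assumption: ignore noise from imprecise IP geolocation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(prev, curr):
    """prev and curr are (epoch_seconds, lat, lon) for two logons of one user."""
    distance = haversine_km(prev[1], prev[2], curr[1], curr[2])
    if distance < MIN_DISTANCE_KM:
        return False
    hours = (curr[0] - prev[0]) / 3600.0
    if hours <= 0:
        return True  # far apart at effectively the same moment
    return distance / hours > MAX_SPEED_KMH


bangalore = (12.97, 77.59)
london = (51.51, -0.13)
one_hour_later = is_impossible_travel((0, *bangalore), (3600, *london))
next_day = is_impossible_travel((0, *bangalore), (86400, *london))
```

This also depends on the IP to location enrichment described earlier in the deck: a wrong city for either logon produces a false alert, which is one reason accuracy and coverage matter there.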
  • 25. Mitigating Threats: defining policies to handle threats (Default Policies, Security Admin). Actions: • Log off, lock/disable user • Start session recording • Reduce privilege or restrict access • Enforce logon again with 2FA • Notify via email
  • 26. Mitigating Threats: from Risk Indicators to Actions.
      • Components: Processed Data, Streaming / Batch Apps, Custom Indicators, Derived Data, Alert Listener Service, Alert Processing Service, Policy Service, Policy UI, Actions Service Bus, Webhooks, Citrix Products, Workspace.
      • Steps: generate security alerts; decorate alerts; take actions on alerts as defined by customers; confirm the action result.
      • The Security Admin defines policies, which are the actions that need to be taken when certain alerts happen.
      • Actions: log off user; start session recording; reduce privilege or restrict access; enforce logon again with 2FA; notify via email.
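The mapping from alerts to actions described above can be sketched as a small policy lookup. Everything named here is hypothetical (the `Policy` fields, the risk-score threshold, the action identifiers); the slide only states that policies are defined by the Security Admin and tie alerts to actions such as logging off a user or notifying via email.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """When `indicator` fires with at least `min_risk_score`, take `actions`."""
    indicator: str
    min_risk_score: int
    actions: tuple


def actions_for(alert, policies):
    """Collect the actions of every matching policy, in order, without repeats."""
    selected = []
    for policy in policies:
        if policy.indicator != alert["indicator"]:
            continue
        if alert["risk_score"] < policy.min_risk_score:
            continue
        for action in policy.actions:
            if action not in selected:
                selected.append(action)
    return selected


POLICIES = [
    Policy("Impossible travel", 0, ("notify_email", "enforce_2fa_logon")),
    Policy("Impossible travel", 80, ("log_off_user", "notify_email")),
    Policy("Potential data exfiltration", 50, ("start_session_recording",)),
]

high = actions_for({"indicator": "Impossible travel", "user": "user1", "risk_score": 90}, POLICIES)
low = actions_for({"indicator": "Impossible travel", "user": "user1", "risk_score": 10}, POLICIES)
```

In the architecture on the slide, the selected actions would be handed to the product that can carry them out, and the result confirmed back; that delivery step is left out of this sketch.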
  • 27. Some metrics
      • ~7 billion events per day. Peak: 100k events/sec. Expected growth: 10x in the next 2 years.
      • Latency for streaming path: ~1 min. Latency for batch path: ~15 mins.
      • Alerts to actions (Alert Listener Service, Alert Processing Service, microservices, emails & other notifications): < 1 minute.
      • In the last 6 months: 7 million risk indicators triggered.
      • In the last 6 months: 1 million mitigative actions triggered.
      • Pipeline: Citrix Products, Ingestion, Processed Data, Streaming / Batch Apps, Custom Indicators, Derived Data, Data Mesh Offering.