Ada Sharoni (Software Engineering Architect) @ Hunters:
Imagine you had to manage thousands of Spark applications that automatically spin up on demand with every customer interaction.
Our unique constraints at Hunters have led us to adopt an architecture and concepts that we believe many other companies will find useful.
In this lecture we will share our solutions and insights into running many lightweight, cheap Spark applications on Kubernetes that can easily survive frequent restarts and smartly share resources on Spot EC2 instances.
Lessons Learnt from Running Thousands of On-demand Spark Applications
1. Lessons Learnt from Running Thousands of On-demand Spark Applications
Ada Sharoni
Software Engineering Architect
@Hunters
2. Ada Sharoni
Software Engineering Architect @Hunters
• Software Engineering Architect @Hunters for ~3 years
• ML & Big Data
• Fun Fact: I started out as a Hardware Engineer
https://twitter.com/AdaSharoni
https://www.linkedin.com/in/ada-sharoni-47ba26b8/
3. Hunters
Security Operations Platform
• Help security teams understand the full attack story
• Correlate existing telemetry and sources across surfaces
Infection → Download → Persist → Command & Control → Lateral Movement → Data Exfiltration
• Network
• Cloud
• SaaS
• Endpoint
• Email
• etc
5. Streaming Security Data in Real-Time
Flexible Ingestion
● Multiple formats
● Multiple sources
● Streaming in real time
[Diagram: Data Sources, ingestion, Data Lake, S3, Detection Layer, Auto Investigation, Knowledge Graph]
7. ETL Logic as Configuration
[Diagram: Kafka, S3, Azure Blob, …; Fruit Ninja Ingestion Apps; Data Lake; Detection Layer; Auto Investigation; Knowledge Graph]
8. ETL Logic as Configuration
[Diagram: Kafka, S3, Azure Blob; Fruit Ninja Ingestion Apps (per dataflow); App Config, Decoder Config, Transformer Config, …, Schema; Data Lake; Detection Layer; Auto Investigation; Knowledge Graph]
9. Why ETL Logic as Configuration ?
1. Security logs come in all shapes and sizes -> takes time to research!
2. There’s a lot of domain knowledge and expertise involved
3. Engineering teams cannot be a bottleneck
4. Business logic should be easily developed & deployed by the masses
5. SLA to production should be FAST to allow for rapid iteration
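To make the "ETL logic as configuration" idea concrete, here is a minimal sketch in plain Python (not the Hunters implementation, and without Spark). The decoder and transformer names, the config keys, and the sample record are all hypothetical; the point is only that a new dataflow is described by a config dictionary rather than by new code.

```python
import json
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Hypothetical decoder registry: config name -> function parsing one raw line.
DECODERS: Dict[str, Callable[[str], Record]] = {
    "json": json.loads,
}


def rename(record: Record, mapping: Dict[str, str]) -> Record:
    """Rename fields according to mapping, leaving other fields untouched."""
    return {mapping.get(key, key): value for key, value in record.items()}


def drop(record: Record, fields: List[str]) -> Record:
    """Remove the listed fields from the record."""
    return {key: value for key, value in record.items() if key not in fields}


# Hypothetical transformer registry: config "op" name -> transformation.
TRANSFORMERS: Dict[str, Callable[[Record, Any], Record]] = {
    "rename": rename,
    "drop": drop,
}


def run_dataflow(config: Dict[str, Any], raw_lines: List[str]) -> List[Record]:
    """Decode each raw line, then apply the configured transformers in order."""
    decode = DECODERS[config["decoder"]]
    output = []
    for line in raw_lines:
        record = decode(line)
        for step in config["transformers"]:
            record = TRANSFORMERS[step["op"]](record, step["args"])
        output.append(record)
    return output


# The whole dataflow is data: changing it needs no engineering work.
APP_CONFIG = {
    "decoder": "json",
    "transformers": [
        {"op": "rename", "args": {"src_ip": "source_ip"}},
        {"op": "drop", "args": ["debug"]},
    ],
}

result = run_dataflow(
    APP_CONFIG,
    ['{"src_ip": "10.0.0.1", "debug": true, "event": "login"}'],
)
print(result)
```

In a setup like this, an analyst adding a new log source would only edit `APP_CONFIG`; the engineering team owns the registries.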
18. Architecture Solution
[Diagram: customers, analysts, new integration; Portal; ETL Logic CICD; Controller Service; App Config; deploy; Fruit Ninja Ingestion Apps (per dataflow); Kafka, S3, Azure Blob; Decoder Config, Transformer Config, …, Schema; Data Lake; Detection Layer; Auto Investigation; Knowledge Graph]
Controller Service (or ArgoCD)
1. Track & redeploy upon app config changes
2. Redeploy upon ETL logic (decoder/transformer) version bump
3. App can run on a specific branch
4. App can run with specific Spark config overrides
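The controller behaviour on this slide can be sketched as a comparison between what is deployed and what is desired. The following is a minimal illustration under stated assumptions, not the actual Controller Service: the `DesiredState` fields, the fingerprinting approach, and the version strings are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class DesiredState:
    """Everything that defines one ingestion app (hypothetical field names)."""
    app_config: Dict[str, Any]
    etl_logic_version: str            # decoder/transformer version
    branch: str = "main"              # an app can run on a specific branch
    spark_overrides: Dict[str, str] = field(default_factory=dict)


def fingerprint(state: DesiredState) -> str:
    """Stable hash of the desired state; any change produces a new value."""
    payload = json.dumps(asdict(state), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_redeploy(deployed: Optional[str], desired: DesiredState) -> bool:
    """Redeploy when nothing is deployed yet or the fingerprint differs."""
    return deployed != fingerprint(desired)


v1 = DesiredState(app_config={"decoder": "json"}, etl_logic_version="1.4.0")
deployed = fingerprint(v1)

# An ETL logic version bump and a Spark config override both trigger a redeploy.
bumped = DesiredState(app_config={"decoder": "json"}, etl_logic_version="1.5.0")
tuned = DesiredState(
    app_config={"decoder": "json"},
    etl_logic_version="1.4.0",
    spark_overrides={"spark.executor.memory": "2g"},
)

print(needs_redeploy(deployed, v1))
print(needs_redeploy(deployed, bumped))
print(needs_redeploy(deployed, tuned))
```

A GitOps tool such as ArgoCD performs a similar desired-versus-live comparison on Kubernetes manifests, which is presumably why the slide names it as an alternative.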
23. -> This is how it can look on Kubernetes
[Diagram: four Master pods and many Worker pods spread across Nodes A, B, C and D]
24. -> and with auto scaling …
[Diagram: the same Master and Worker pods across Nodes A, B, C and D, plus a new Node E … hosting additional Worker pods]
25. Why Kubernetes?
1. Cost $$$ (vs. operation)
2. Easy to manage infra (after initial effort)
3. Fast Autoscaling (1 kB/hr → 1 TB/hr in minutes)
4. Easy to achieve Isolation (low overhead per app)
a. app per customer
b. app per data flow …
26. K8S Challenges
1. Learn k8s ecosystem
2. Initial setup
a. Create k8s cluster
b. Create node group
c. Setup auto scaling
d. Setup IAM policy
e. Install Spark Operator
28. Spark Operator
[Diagram: API Server, Scheduler, Cluster Autoscaler, Spark Operator, Spark Application (custom resource), Master Pod, Worker Pods]
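For readers who have not seen the "Spark Application (custom resource)" from this slide, the sketch below builds one as a Python dictionary and prints it as JSON (which is also valid YAML input for `kubectl apply -f`). It assumes the open-source Spark Operator's `sparkoperator.k8s.io/v1beta2` API; the app name, namespace, image, file path, versions, and resource sizes are hypothetical placeholders. Note that the CRD calls the slide's master and worker pods "driver" and "executor".

```python
import json
from typing import Any, Dict


def spark_application_manifest(
    name: str,
    namespace: str,
    image: str,
    main_file: str,
    executors: int,
    spark_conf: Dict[str, str],
) -> Dict[str, Any]:
    """Build a SparkApplication custom resource for the Spark Operator."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",
            "pythonVersion": "3",
            "mode": "cluster",
            "image": image,
            "mainApplicationFile": main_file,
            "sparkVersion": "3.1.1",          # placeholder version
            "sparkConf": spark_conf,          # per-app Spark config overrides
            "restartPolicy": {"type": "Always"},
            "driver": {"cores": 1, "memory": "1g", "serviceAccount": "spark"},
            "executor": {"cores": 1, "instances": executors, "memory": "1g"},
        },
    }


manifest = spark_application_manifest(
    name="ingest-customer-a-dataflow-x",      # hypothetical app-per-dataflow name
    namespace="ingestion",
    image="example.com/ingestion:latest",     # hypothetical image
    main_file="local:///opt/app/ingest.py",   # hypothetical path
    executors=2,
    spark_conf={"spark.sql.shuffle.partitions": "8"},
)
print(json.dumps(manifest, indent=2))
```

Once such a resource is applied, the operator creates the driver pod, which in turn requests its executor pods; the Cluster Autoscaler adds nodes when those pods cannot be scheduled.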
34. Resilience to sudden shutdowns
1. If you have custom source connectors:
a. Save state in (checkpointed) Metadata Log
b. Graceful shutdown
[Diagram: Spark Custom Source Connector with fetch(), checkpoint and shutdown(), connected to the Source, a cache and the Metadata Log]
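The fetch / checkpoint / shutdown cycle named on this slide can be sketched without Spark as follows. This is a toy illustration, not the Hunters connector: the `MetadataLog` file format, the in-memory `source` list, and the class and method signatures are assumptions made for the example.

```python
import json
import os
import tempfile
from typing import List


class MetadataLog:
    """Persists the last committed offset so a restarted app can resume."""

    def __init__(self, path: str) -> None:
        self.path = path

    def read_offset(self) -> int:
        if not os.path.exists(self.path):
            return 0
        with open(self.path) as handle:
            return json.load(handle)["offset"]

    def commit(self, offset: int) -> None:
        # Write to a temp file, then rename, so a crash never leaves a torn log.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as handle:
            json.dump({"offset": offset}, handle)
        os.replace(tmp_path, self.path)


class SourceConnector:
    """Toy connector: reads batches from a list standing in for the source."""

    def __init__(self, source: List[str], log: MetadataLog, batch_size: int = 2) -> None:
        self.source = source
        self.log = log
        self.batch_size = batch_size
        self.offset = log.read_offset()   # resume from the last checkpoint
        self.running = True

    def fetch(self) -> List[str]:
        if not self.running:
            return []
        return self.source[self.offset:self.offset + self.batch_size]

    def checkpoint(self, batch: List[str]) -> None:
        # Advance the offset only after the batch was fully processed.
        self.offset += len(batch)
        self.log.commit(self.offset)

    def shutdown(self) -> None:
        # Graceful shutdown: stop fetching; committed state is already on disk.
        self.running = False


events = ["e1", "e2", "e3", "e4", "e5"]
log = MetadataLog(os.path.join(tempfile.mkdtemp(), "metadata.json"))

first = SourceConnector(events, log)
batch_one = first.fetch()
first.checkpoint(batch_one)
first.shutdown()                          # e.g. the Spot instance is reclaimed

second = SourceConnector(events, log)     # restarted app
batch_two = second.fetch()
print(batch_one, batch_two)
```

On Kubernetes a pod receives SIGTERM before it is killed, so in a real deployment `shutdown()` would typically be called from a signal handler or shutdown hook.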
43. Summary
1. Isolation - break down applications per functionality, per tenant
2. “What you see is what you get” - get everything out to config (incl. logic!)
a. Easier management
b. Easier debugging & testing in Production
c. Easier deployment
3. Kubernetes is awesome (and cheap!)