In-Flux Limiting for a Multi-Tenant Logging Service
Ambud Sharma & Suma Cherukuri
Cloud Platform Engineering @ Symantec

1. In-Flux Limiting for a Multi-Tenant Logging Service
Ambud Sharma & Suma Cherukuri, Cloud Platform Engineering @ Symantec

2. Overview
• Who are we?
• Architecture
• Streaming Pipeline
• Influx Issue
• Influx Limiting Design & Solution
• Conclusion
• Q & A

3. Who are we?
• Symantec's internal cloud team
• Host applications generating over $1B in revenue
• Team
  – Logging as a Service (LaaS): Elasticsearch/Kibana
  – Metering as a Service (MaaS): InfluxDB/Grafana
  – Alerting as a Service (AaaS): Hendrix
We are hiring! Also check out Hendrix: https://github.com/Symantec/hendrix

4. Our Data
Logs
• Application and system log data from VMs and containers
• Used for troubleshooting
Metrics
• Application and system telemetry
• Used for Application Performance Monitoring
Log Event:
{
  "message": "User logged in from 1.1.1.1",
  "@version": "1",
  "@timestamp": "2014-07-16T06:49:39.919Z",
  "host": "value",
  "path": "/opt/logstash/sample.log",
  "tenant_id": "291167ebed3221a006eb",
  "apikey": "06be8a-28ef-4568-8cb8-612",
  "string_boolean": "true",
  "host_ip": "192.168.99.01"
}
Metric Event:
{
  "@version": "1",
  "@timestamp": "2014-07-16T06:49:39.919Z",
  "host": "host1.symantec.com",
  "tenant_id": "291167ebed3221a006ebf6",
  "apikey": "06be8a-28ef-4568-8cb8-618",
  "value": 0.65,
  "name": "cpu"
}

5. LMM Architecture
[Diagram: components shown are Customer Agents, Logstash, Kafka, Log Topology, Metrics Topology, Redis, Elasticsearch, InfluxDB, and Users; one region is labeled "Open to customers".]

6. Streaming Pipeline
• Validate events against the schema to optimize indexing
• Authenticate events to route data to the correct index
• Keep one index per tenant per day
Pipeline: Kafka → Validate → Auth → Index

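A minimal Python sketch of the Validate → Auth → Index stages, assuming a hypothetical required-field set, a hypothetical in-memory `API_KEYS` registry that maps API keys to tenant IDs, and a hypothetical `logs-<tenant_id>-<yyyy.mm.dd>` naming scheme for the per-tenant, per-day index. The sample values come from the log event on slide 4; the real pipeline runs these stages as Storm bolts, which this sketch does not model.

```python
import json
from datetime import datetime
from typing import Optional

# Hypothetical schema: fields every event must carry (assumption, not from the slides).
REQUIRED_FIELDS = {"@timestamp", "tenant_id", "apikey"}

# Hypothetical API-key registry: apikey -> tenant_id (assumption).
API_KEYS = {"06be8a-28ef-4568-8cb8-612": "291167ebed3221a006eb"}


def validate(event: dict) -> bool:
    """Validate: the event carries every required field."""
    return REQUIRED_FIELDS.issubset(event)


def authenticate(event: dict) -> bool:
    """Auth: the event's apikey is registered to the tenant it claims."""
    key = event.get("apikey")
    return key in API_KEYS and API_KEYS[key] == event.get("tenant_id")


def index_name(event: dict) -> str:
    """Index: one index per tenant per day (hypothetical naming scheme)."""
    ts = datetime.strptime(event["@timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return f"logs-{event['tenant_id']}-{ts:%Y.%m.%d}"


def route(raw: str) -> Optional[str]:
    """Return the target index, or None if the event is rejected."""
    event = json.loads(raw)
    if not validate(event) or not authenticate(event):
        return None
    return index_name(event)
```

Deriving the index name from the event's own `@timestamp` (rather than arrival time) keeps late events in the day they belong to; whether the original pipeline does this is not stated.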
7. Influx Issue
• You know your data store's performance limits (determine its sustainable events per second, EPS, from benchmarks and capacity planning)
• Tenants send a lot of data, and the ingestion rate is never steady
• Ingestion spikes are bound to happen in a real-time streaming application
• Wouldn't it be great if you could normalize these spikes?

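To make "find EPS from benchmark/capacity" concrete, the sketch below buckets event timestamps into whole seconds and reports every second whose count exceeds a capacity figure. The `capacity_eps` value stands in for whatever a benchmark of the data store produced; it is a hypothetical input, not a number from the talk.

```python
from collections import Counter
from typing import Dict, Iterable


def eps_per_second(timestamps: Iterable[float]) -> Counter:
    """Count events per whole second (timestamps in epoch seconds)."""
    return Counter(int(ts) for ts in timestamps)


def spikes(timestamps: Iterable[float], capacity_eps: int) -> Dict[int, int]:
    """Return {second: event count} for each second that exceeds capacity."""
    counts = eps_per_second(timestamps)
    return {sec: n for sec, n in sorted(counts.items()) if n > capacity_eps}


# Example: a burst in second 1 against a hypothetical capacity of 3 EPS.
arrivals = [0.1, 0.5, 1.2, 1.3, 1.4, 1.9, 2.0]
print(spikes(arrivals, capacity_eps=3))  # {1: 4}
```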
8. Influx Limiting
• Normalize the EPS curve using buffers
• Like a hydroelectric dam, explicitly allocate EPS capacity to tenants
[Figure: EPS curve before and after influx limiting]

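One way to read the dam analogy: each tenant gets its own buffer, and once per second the buffers are drained at that tenant's allocated EPS, so a burst is spread over the following seconds instead of hitting the data store at once. The `Dam` class and the allocations in the example are hypothetical; the slides do not specify how the buffers are implemented.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Tuple


class Dam:
    """Per-tenant buffers released at an explicitly allocated EPS."""

    def __init__(self, allocation: Dict[str, int]):
        self.allocation = allocation  # tenant -> events released per second
        self.buffers: Dict[str, Deque[Any]] = {t: deque() for t in allocation}

    def ingest(self, tenant: str, event: Any) -> None:
        """Absorb incoming events, however bursty, into the tenant's buffer."""
        self.buffers[tenant].append(event)

    def release(self) -> List[Tuple[str, Any]]:
        """Call once per second: emit at most each tenant's allocation."""
        out = []
        for tenant, quota in self.allocation.items():
            buf = self.buffers[tenant]
            for _ in range(min(quota, len(buf))):
                out.append((tenant, buf.popleft()))
        return out


# A burst of 7 events from tenant A, allocated 2 EPS, drains over 4 seconds.
dam = Dam({"A": 2, "B": 5})
for i in range(7):
    dam.ingest("A", i)
print([len(dam.release()) for _ in range(5)])  # [2, 2, 2, 1, 0]
```

The sum of all allocations is what the data store actually sees, so it can be kept at or below the benchmarked capacity regardless of how spiky any single tenant is.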
9. Design Options
Approach 1
• Route to a separate Kafka topic
• No back-pressure in the primary queue
• Secondary queue is drained at a slower pace
• Events may appear out of order
Approach 2
• Controlled back-pressure in the primary queue
• Selectively reduce the ingestion rate for tenants
• Events will always appear in order

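A toy simulation contrasting the two options, assuming events arrive in per-second batches and each tenant may emit `quota` events per tick. Approach 1 diverts overflow to a secondary queue drained at `drain` events per tick; Approach 2 leaves overflow queued in the primary queue, which is a simplification of back-pressure (a real Kafka consumer would slow or pause consumption rather than hold events in memory). All names here are hypothetical.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple

Event = Tuple[str, int]  # (tenant, sequence number)


def approach1(ticks: List[List[Event]], quota: int, drain: int) -> List[Event]:
    """Overflow goes to a secondary queue that drains `drain` events per tick."""
    secondary: Deque[Event] = deque()
    out: List[Event] = []
    for batch in ticks:
        counts: Counter = Counter()
        for event in batch:
            tenant = event[0]
            if counts[tenant] < quota:
                counts[tenant] += 1
                out.append(event)        # primary queue keeps flowing
            else:
                secondary.append(event)  # routed to the separate topic
        for _ in range(min(drain, len(secondary))):
            out.append(secondary.popleft())  # slower drain
    out.extend(secondary)  # flush the rest after the last tick
    return out


def approach2(ticks: List[List[Event]], quota: int) -> List[Event]:
    """Overflow stays in the primary queue (simplified back-pressure)."""
    pending: Deque[Event] = deque()
    out: List[Event] = []
    i = 0
    while i < len(ticks) or pending:
        if i < len(ticks):
            pending.extend(ticks[i])
        i += 1
        counts: Counter = Counter()
        held: Deque[Event] = deque()
        for event in pending:
            tenant = event[0]
            if counts[tenant] < quota:
                counts[tenant] += 1
                out.append(event)
            else:
                held.append(event)  # held back, original order kept
        pending = held
    return out


ticks = [[("A", 1), ("A", 2), ("A", 3), ("A", 4)], [("A", 5)]]
print([s for _, s in approach1(ticks, quota=2, drain=1)])  # [1, 2, 3, 5, 4]
print([s for _, s in approach2(ticks, quota=2)])           # [1, 2, 3, 4, 5]
```

In Approach 1, event 5 arrives in a fresh tick under quota and overtakes event 4, which is still waiting in the secondary queue; in Approach 2 a throttled tenant's backlog is always emitted before its newer events.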
10. Customer Requirements
• Customers want threshold quotas defined for them
• Thresholds are defined as policies (an event threshold per window, with the window duration in seconds)
• Policies are saved in a data store
Tenant A: { "threshold": 100, "window": 90 }
Tenant B: { "threshold": 700, "window": 10 }
Tenant C: { "threshold": 900, "window": 1 }

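The three example policies can be loaded and compared as below. The tenant keys and the JSON layout of the policy store are assumptions; the `threshold` and `window` values are from the slide. Dividing threshold by window gives each tenant's long-run average allowed EPS: about 1.11 for Tenant A (100 / 90), 70 for Tenant B (700 / 10), and 900 for Tenant C (900 / 1).

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    threshold: int  # events allowed per window
    window: int     # window duration in seconds

    @property
    def average_eps(self) -> float:
        """Long-run average events per second this policy permits."""
        return self.threshold / self.window


# Hypothetical policy-store document; values match the slide.
POLICY_JSON = """{
  "tenant_a": {"threshold": 100, "window": 90},
  "tenant_b": {"threshold": 700, "window": 10},
  "tenant_c": {"threshold": 900, "window": 1}
}"""

policies = {t: Policy(**p) for t, p in json.loads(POLICY_JSON).items()}
for tenant, policy in policies.items():
    print(f"{tenant}: {policy.average_eps:.2f} EPS on average")
```

The average alone hides burst tolerance: Tenant A's 90-second window lets it send all 100 events in a single second and then nothing for the rest of the window.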
11. Bolt Design
1. Track the "event rate" for each tenant over the policy window
2. If the threshold is exceeded, throttle; otherwise allow the events
3. Reset the window when the time interval is complete (tumbling window)
Pipeline: Kafka → Validate → Auth → Throttle → Index

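A minimal sketch of the throttle bolt's per-tenant logic, written as plain Python rather than a Storm bolt. It assumes policies are `(threshold, window_seconds)` pairs and that each window is aligned to a multiple of its length; the injectable `clock` exists only so the example can be tested. The slide does not say what happens to a throttled event (under Approach 2 it would be held back in the primary queue), so `allow` simply returns False.

```python
import time
from typing import Callable, Dict, Tuple


class Throttle:
    """Per-tenant tumbling-window counter (hypothetical helper)."""

    def __init__(self, policies: Dict[str, Tuple[int, int]],
                 clock: Callable[[], float] = time.monotonic):
        self.policies = policies  # tenant -> (threshold, window in seconds)
        self.clock = clock
        self.window_start: Dict[str, float] = {}
        self.count: Dict[str, int] = {}

    def allow(self, tenant: str) -> bool:
        threshold, window = self.policies[tenant]
        now = self.clock()
        start = self.window_start.get(tenant)
        # 3. Reset when the interval is complete (tumbling window).
        if start is None or now - start >= window:
            self.window_start[tenant] = now - (now % window)
            self.count[tenant] = 0
        # 1. Track the event count for this tenant's current window.
        self.count[tenant] += 1
        # 2. Allow up to the threshold, throttle beyond it.
        return self.count[tenant] <= threshold
```

This version checks the clock on every event; the scheduled-task pattern on the next slide moves the reset out of the per-event path.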
12. Scheduled-Task Design Pattern
• The clock is maintained using the Storm tick tuple
• A tenant's counter is incremented when an event is received from that tenant
• Counters are reset when the clock time modulo the tenant's throttle duration is 0
[Diagram legend: tenant throttle counter; tenant throttle duration (modulated). Flow: Is clock time % throttle duration = 0? If yes, reset the counters for each tenant in this slice; if no, there is nothing to reset.]

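The sketch below shows one reading of the scheduled-task pattern: instead of checking timestamps on every event, a once-per-second tick advances a shared clock, and tenants are grouped into slices by throttle duration so that each tick needs one modulo check per distinct duration rather than one per tenant. Grouping by duration is an interpretation of "this slice" in the diagram, and `TickThrottle` is a hypothetical name. In Storm, a bolt can request tick tuples through the `topology.tick.tuple.freq.secs` setting; here `tick()` is simply called directly.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class TickThrottle:
    """Tenant counters reset by a shared tick clock, sliced by duration."""

    def __init__(self, policies: Dict[str, Tuple[int, int]]):
        # policies: tenant -> (threshold, throttle duration in seconds)
        self.threshold = {t: th for t, (th, _) in policies.items()}
        self.slices: DefaultDict[int, List[str]] = defaultdict(list)
        for tenant, (_, duration) in policies.items():
            self.slices[duration].append(tenant)
        self.counters = {t: 0 for t in policies}
        self.clock = 0  # seconds elapsed, advanced once per tick

    def on_event(self, tenant: str) -> bool:
        """Increment the tenant's counter; True while under its threshold."""
        self.counters[tenant] += 1
        return self.counters[tenant] <= self.threshold[tenant]

    def tick(self) -> List[str]:
        """Handle one tick; return the tenants whose counters were reset."""
        self.clock += 1
        reset: List[str] = []
        for duration, tenants in self.slices.items():
            if self.clock % duration == 0:  # Is time % throttle duration = 0?
                for tenant in tenants:
                    self.counters[tenant] = 0
                reset.extend(tenants)
        return reset
```

With the slide's three policies, every tick resets Tenant C, every tenth tick also resets Tenant B, and every ninetieth tick resets all three tenants.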
13. Results
• Reduced EPS to Elasticsearch
• We can normalize the flow rate based on load

14. Conclusion
• Overview of real-time log and metric indexing
• Approaches to rate limiting in a real-time streaming application
• Design pattern to efficiently perform counting in Storm
That's all folks!

15. Questions?
