Sumo Logic Confidential
Christian Beedgen, CTO & Co-Founder, @raychaser
March 19th, 2015
Can You Process
10 Trillion Logs Per Day?
Who Is Christian?
• Co-Founder & CTO, Sumo Logic since 2010
• Server guy, Chief Architect at ArcSight, 2001 - 2009
Agenda
• Why We Are What We Are
• How We Set Things Up
• What We Have Learned
Purpose, Practice, Philosophy
Why We Are What We Are
Machine Data Analytics
Why Is Sumo Logic A Service
Simply the best
way to deliver
Machine Data
Analytics
Full control 
product
development
efficiency
A Conscious, Fundamental Decision
Why Machine Data Analytics As A Service
• Machine data is actually Big Data
• Big Data enterprise software is expensive and painful
• As a service, we deliver more value at a lower cost
The Big Data Imperative
What Is Machine Data
Actually, It’s Machine Generated Data
Curt Monash: “Data that was produced entirely by machines OR data that is more about observing humans than recording their choices.”
Daniel Abadi: "Machine-generated data is data that is generated as a result of a decision of an independent computational agent or a measurement of an event that is not caused by a human action."
Example
• Timestamp with time zone!
• Log level
• Host ID & module name (process/service)
• Code location or class
• Authentication context
• Key-value pairs
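As a rough illustration, a log line that carries all six elements listed above could look like the one in this sketch. The line layout, field names, and values are hypothetical and are not an actual Sumo Logic format.

```python
import re
from datetime import datetime

# Hypothetical log line; the layout and all values are made up for illustration.
LINE = ("2015-03-19 10:15:30,123 -0700 INFO [host=prod-index-5] [module=index] "
        "[class=ShardWriter] [auth=customer:0042] flushed=true shard_count=3")

PATTERN = re.compile(
    r"(?P<ts>\S+ \S+ [+-]\d{4}) (?P<level>[A-Z]+) "
    r"\[host=(?P<host>[^\]]+)\] \[module=(?P<module>[^\]]+)\] "
    r"\[class=(?P<cls>[^\]]+)\] \[auth=(?P<auth>[^\]]+)\] (?P<rest>.*)"
)

def parse(line):
    m = PATTERN.match(line)
    if m is None:
        raise ValueError("unrecognized log line")
    fields = m.groupdict()
    # %z keeps the UTC offset, so the timestamp is unambiguous.
    fields["ts"] = datetime.strptime(fields["ts"], "%Y-%m-%d %H:%M:%S,%f %z")
    # Trailing key=value pairs become a dictionary.
    fields["kv"] = dict(pair.split("=", 1) for pair in fields.pop("rest").split())
    return fields

record = parse(LINE)
print(record["level"], record["host"], record["kv"])
```

Without the time zone offset, the same timestamp could point at several different instants, which is why it leads the list.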
Machine Data Is Big Data
• Volume
– Machine Data is voluminous and will continue to grow
– Our own application easily creates 1 TB of logs per day
• Velocity
– Machine Data occurs in real-time, and it is time-stamped
– Needs to be processed in real-time as well
• Variety
– Machine Data is unstructured, or poly-structured at best
– Some standard schemas, but sure enough not for the apps you built
V For Big Data
Enterprise Software
Is An Exercise In
Undifferentiated Heavy-Lifting
(For Your Customer)
Big Data Makes Things Worse
• Cluster-able, modern, N+1 scalable Enterprise Software?
– Old architectures from back when the products were conceived
– How many Enterprise Software applications deploy on Hadoop?
• It’s not just the cost of the software license…
– Lead time and cost of servers, storage, networking, plus the cost of HA, DR
– Cost of the people that are maintaining the infrastructure
• Have you ever upgraded Enterprise Software?
– Latest and greatest is always appealing but not always well tested
– Upgrade the Oracle database, migrate the configuration, migrate the data, …
A Eulogy For Enterprise Software
More Value At A Lower Cost
NO HEAVY-LIFTING REQUIRED
• Using vs. running the product
• Easy on, easy off
LOWER TCO
• No server, storage, admin cost
• Vendor economies of scale
A Different Business Model
Who Watches The Watchmen?
• How much monitoring does your monitoring solution require?
• How much does your monitoring solution add to your monitored footprint?
• Do you know what infinite recursion is all about?
This Stuff Actually Matters In Real Life
Velocity Increases
Visibility Is Perfect
Cost Decreases
Deployment Environment Is Controlled
• No more surprises due to out-of-control circumstances
– Misread technical documentation, sheer ignorance, lack of time
– Obscure runtime problems that are hard to track down
• Code and test against the actual runtime environment
– Control over the full stack removes a lot of variables
– Actually testing in production is still hard, but at least there’s progress
Control Affords Predictability
Only One Production Branch
• Fewer dimensions in the testing matrix
– One and only one product and version to test
– No more bugs on arcane platforms that not even the developers know
• Every release of enterprise software decreases velocity
– Customers try to avoid the upgrade hassle and cost
– Laggard customers fall further behind, forcing support for older and older versions
Fear The Cartesian Product
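The size of the testing matrix can be made concrete with a little arithmetic. The sketch below uses an entirely hypothetical support matrix; the versions, operating systems, and databases are assumptions chosen only to show how the combinations multiply.

```python
from itertools import product

# Hypothetical support matrix for an on-premises product.
product_versions = ["5.0", "5.5", "6.0", "6.1"]
operating_systems = ["RHEL 5", "RHEL 6", "Windows 2008", "Solaris 10"]
databases = ["Oracle 10g", "Oracle 11g", "MySQL 5"]

# Every supported combination is something that has to be tested.
on_prem = list(product(product_versions, operating_systems, databases))

# As a service there is one version running on one controlled stack.
service = list(product(["current"], ["our own stack"], ["our own stack"]))

print(len(on_prem))   # 4 * 4 * 3 combinations
print(len(service))   # exactly one combination
```

Each additional supported version multiplies the whole matrix instead of adding one row to it.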
The Point Here Is That If You Don’t Exploit The Visibility You Now Have, You Really Should Whack Yourself In The Face Daily
Cost Decreases Along Many Dimensions
• Lower support costs
– Much better visibility leads to faster MTTI of customer issues
– Can also reinvest savings into even better support and have happier customers
• What is the true cost of pissed off customers?
– Immediate financial impact due to churn
– Mid-term brand damage because word travels quickly
• Test and maintenance spend reduction
– Test matrix is greatly reduced
– No maintenance developer spend
Obvious & Hidden Dimensions Play Here
What Decisions Did We Make?
Don’t Forget, A Bunch Of Software Developers Started This Company
We Need To
Focus on what we know and what we have learned
Do everything in code that can be done in code
Evolve from software architects to system architects
Acknowledge our responsibility for the runtime behavior
How We Set Things Up
Multitenancy
Adaptability
Multitenancy
Better Economics
• No fixed, per-customer costs
– No fixed provisioned infrastructure as compared to managed service offerings
– Customers are cattle, not pets – no per-customer administration cost
• Provision for what is actually used
– Customer usage is not uniform but varies by time of day/week/year
– The good old “Sum of peaks vs. peak of sums” argument
Differentiated Pricing
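The "sum of peaks vs. peak of sums" argument can be worked through with a few numbers. The hourly ingest figures below are invented for illustration and do not come from real customers.

```python
# Hypothetical hourly ingest rates (GB/hour) for three customers over six hours.
usage = {
    "customer_a": [10, 80, 20, 10, 10, 10],
    "customer_b": [10, 10, 10, 70, 10, 10],
    "customer_c": [30, 30, 30, 30, 30, 90],
}

# Dedicated infrastructure must be sized for every customer's own peak.
sum_of_peaks = sum(max(series) for series in usage.values())

# A shared system only has to handle the highest combined load.
combined = [sum(hour) for hour in zip(*usage.values())]
peak_of_sums = max(combined)

print(sum_of_peaks, peak_of_sums)
```

Because the individual peaks fall in different hours, the shared system in this example needs half the capacity of three separately provisioned ones.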
Just one typical Sumo Logic customer - 8x Variance!
Here’s another one – spike at 2.5x of steady state…
Or… Sweet, incremental, unfettered growth
Even More Product Development Efficiency
• No version drift
– Only one version of the code, only one version of the configuration
– No time wasted debugging custom, one-off stuff
• Much better update cycle times
– No per-customer configuration stands in the way
– No manual steps anywhere
There Simply Is Just One System
Adaptability
Architect
For An
Extreme Rate
Of Change
Change
Change Is Our Success
Scale, Success
Success Leads To Scale
• Charging for ingested data is our business model
• We make more money if we sell more daily ingested data
• We need to be able to scale to ever more daily data
Pretty Clever, Huh?
Scaling Implies Change
• Scaling challenges assumptions about system behavior
• To adapt to the new reality, changes are required
• So in order to scale we need to be able to make changes
There’s The Catch!
We Don’t Know Nothing
• Cannot afford a test system the size of Prod
• Anything short of Prod doesn’t accurately reflect reality
• Reality will surprise you and now the unknown is known
…But That We Do Know
Software-Defined
Software
Because We Are Software Developers
Change Needs To Be Fully Automated
• Change needs to be applied at minimum latency
• There is absolutely no room for error
• Not exactly the ideal tasks for humans
And By Automation, We Mean Software
How?
Reuse Ruthlessly
Decompose Vertically
Decouple Dramatically
Layer Horizontally
Everything Continuously
Stand On The Shoulders Of Giants
• Developers have always known how to do this
– Operating system, programming languages, libraries
– Focus on the value that your code can add
• Today, we also reuse on the level of services
– AWS has turned the datacenter into an API
– I have not seen a datacenter in 8+ years
The View Is So Much Better Up There
Don’t Reinvent The Wheel
Don’t re-write Lucene
Don’t re-write messaging
Invent New Wheels
Do create distributed indexing
Do write your own query engine
Reuse Ruthlessly
Reduce The Area Of Responsibility For Change
Identify The Main Functions Of The System
Then, Break Those Down Into Major Parts
We need to
Ingest data and index it so it can be queried
Provide ad-hoc query capabilities for analytics
Be able to update certain query results continuously
Also, shared stuff: Configuration, Encryption, API
Ingestion Path
Receiver → Bus → Index / Raw / CQ → S3
Analytics Path
Query
Service
CQ
S3
Decompose Vertically
Experts Will Emerge To Deal With Change In Any Area
Internal SOA
• Now that you have things decomposed…
• …they can be decoupled!
• Avro over messaging bus, or RPC
• Documented protocols
• No poking at private parts
We Always Thought About It That Way Intuitively
Decouple Dramatically
You Will Have To Change Engines Mid-Flight
No Magic Here
• Services can reuse code
– Communication, configuration and utility libraries
– Global functionality, service discovery, feature flags
• Service-level layering
– Lower level utility services reused by higher level services
– Can also work great if you want to support multiple implementation languages
This Is Just Obvious Best Practice
Layer Horizontally
Always Try To Limit The Size Of Any Change
Continuous Integration
Continuous Delivery
Continuous Automation
Everything Continuously
Build A Change Delivery Highway That You Can Trust
What We Have Learned
Decomposition
Supports
Scaling
Scale, Success
In 2010, we knew that success would look something like this…
Our Service Momentum
Massive Data and Usage Growth
[Charts: Daily Ingest (# of GB/TB ingested per day) and Searches (# of searches per day), 2012 to 2014]
Decomposition Supports Scaling
• Full physical separation between services
– Each node runs exactly one service
– Each service is run on a cluster of nodes (N+2)
• Each service can scale independently
– We require wildly different numbers of nodes based on the service
– Some services can run on 3 nodes, some require 1000s
We Have Absolutely Experienced Scale Induced By Success
Software-Defined
Software
Build, Run – It Is All The Same
• You don’t just write code, you run a system
• You write more code to run the system
• We have all become system architects
• Deployment, operations – built by the Chief Architect
• There is more going on here than just config management
No Functionality, No Feature That Doesn’t Need To Be Operated
dsh
• Model-driven, describe desired state, run to make it so
• High performance due to parallelization
• Covers all layers of the stack – AWS, OS, Sumo Logic
• Easy to use and extend, scriptable CLI
• Developer-friendly, Scala-based, high-level APIs
A Custom Command Line Program To Operate Sumo Logic
Deployments Are Model-Driven
• Model contains concepts
– Deployment
– Cluster
– AWS Resources (Amazon S3, Amazon Elastic Load Balancing, Amazon DynamoDB, Amazon RDS, etc.)
– Software assemblies
– AWS configuration (IAM users, security groups, etc.)
• Human-readable names: prod-index-5
She Is A Model & She Looks Good
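A minimal sketch of what such a model might look like, written in Python for brevity even though dsh itself is described as Scala-based. The class names, fields, and naming rule are assumptions for illustration, not the actual dsh model.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Cluster:
    service: str          # e.g. "index"
    size: int             # desired number of instances

@dataclass
class Deployment:
    name: str                                   # e.g. "prod"
    clusters: list = field(default_factory=list)
    s3_buckets: list = field(default_factory=list)
    security_groups: list = field(default_factory=list)

    def instance_names(self):
        """Human-readable names in the style of prod-index-5."""
        return [f"{self.name}-{c.service}-{i}"
                for c in self.clusters
                for i in range(1, c.size + 1)]

prod = Deployment("prod", clusters=[Cluster("receiver", 2), Cluster("index", 5)])
print(prod.instance_names())
```

The model only states what should exist; nothing in it says how to get there.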
Differential Deployment
• Start by finding existing resources
– Use tagging where it is available
– Name prefixes (“prod_xxx”) where it isn’t (security groups, IAM, …)
• Fix differences to model
– Start “missing” instances
– Change security group rules, missing IAM users
• Proceed with caution
– Never delete anything that holds data
– Amazon EBS, Amazon DynamoDB, Amazon S3, Amazon RDS
As Good As An Episode Of House, M.D.
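The differential approach can be sketched as a comparison between the desired model and what was actually found. The resource tuples and the `plan` function below are hypothetical; in particular, deleting surplus resources that hold no data is an assumption, while the rule of never deleting data-holding resources comes from the slide.

```python
# Resource kinds that hold data and must never be deleted automatically.
DATA_HOLDING = {"ebs", "dynamodb", "s3", "rds"}

def plan(desired, existing):
    """Compare two sets of (kind, name) pairs and return the actions to take."""
    actions = []
    for resource in sorted(desired - existing):
        actions.append(("create", resource))
    for kind, name in sorted(existing - desired):
        if kind in DATA_HOLDING:
            actions.append(("keep", (kind, name)))   # leave for a human to review
        else:
            actions.append(("delete", (kind, name)))
    return actions

desired = {("instance", "prod-index-1"), ("instance", "prod-index-2"),
           ("s3", "prod-raw")}
existing = {("instance", "prod-index-1"), ("instance", "prod-index-9"),
            ("s3", "prod-raw"), ("ebs", "prod-old-volume")}

for action in plan(desired, existing):
    print(action)
```

Running the same plan twice against an unchanged environment yields the same actions, which is what makes the "run to make it so" style safe to repeat.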
Programmable Infrastructure Is Real
Embrace, Evolve & Include It In The Architecture
Microservices
How I Actually Visualize Microservices
Factoring Is Still Important
• Highly cohesive, loosely coupled is an ideal
• The same ideal OO is striving for
• Here’s a snapshot of the current Sumo factoring
[Diagram: services grouped into Deployment wide services, Ingest, Search, Internal tools]
Services shown: receiver, hornetq-forge, forge, cqsplitter, search, cloud collector, service, api, concierge, stream, katta, glass, ganglia, bill, mix, meta, config, zookeeper, appvault, org, raw, hornetq-inbound, cocoa, bloom filter, analyticscsi, cqmerger, rework, view, autoview, depman, hornetq-internal, hornetq-metadata, nrt
• 2 to the power of 5 services (“32”), 170+ modules
• Don’t even ask about the # of dependencies
• At least 3 of each – everything is a separately scalable cluster
Refactor-able Infrastructure
• The same old thing all over again
• Now that infrastructure is code, keep refactoring
• Split things that don’t belong, join others
• Service abstractions can help keep impact low
• Moving around the code, vs. the protocols
Service Groups Scale
• This is really a level of granularity optimization
• One system: too heavy – 32 systems: too fleeting
• Build a service group, deploy against baseline, test
• During deployment, deploy by service group
• Balances crosscutting integration tests with turnaround
What Is Left To Do?
• Still a notion of a common version across all services
– Weekly “major” releases, end of quarter release freezes
– Even releasing one week of changes can perturb the force majorly
• True continuous delivery
– Red/black deployments
– How to do this in a system with a very high write rate?
• Tooling to support partial updates
– Our own system is a great way to monitor and make decisions
– Work in progress…
You Already Know How To Build Systems
Everything Now Has One More Layer Of Abstraction
There Is
No Place
Like
Production
You Cannot Simulate The Big Datas
• There’s simply no substitute for Production
• This doesn’t mean you shouldn’t have nightly, staging, …
• This doesn’t mean you shouldn’t have integration tests
• This doesn’t mean you shouldn’t test manually
• But there’s just a class of issues you will not find
• You can’t move Production data into testing
• You can’t afford a second Production size system
So Now What?
• Instrument, instrument, instrument
• Monitor, monitor, monitor
• Alert on symptoms
• Basically, don’t worry about 100% CPU, etc.
• Alert on customer impact
• Message ingestion delayed, search takes too long, …
https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcApr8Q/mobilebasic?pli=1&viewopt=127#h.dmn6k1rdb6jf
Don’t Alert If
You Don’t
Have A
Playbook
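One way to hold yourself to both rules, alerting on customer-facing symptoms and never alerting without a playbook, is to make the playbook a required part of the alert definition. The metric names, thresholds, and playbook paths below are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alert:
    name: str
    metric: str        # a customer-facing symptom, not a cause such as CPU load
    threshold: float
    playbook: str      # where the on-call engineer finds the response steps

    def __post_init__(self):
        if not self.playbook:
            raise ValueError(f"alert {self.name!r} has no playbook")

ALERTS = [
    Alert("ingest-delay", "ingest_delay_seconds", 120.0, "wiki/playbooks/ingest-delay"),
    Alert("slow-search", "search_latency_seconds", 30.0, "wiki/playbooks/slow-search"),
]

def firing(alerts, metrics):
    """Return the names of alerts whose symptom metric exceeds its threshold."""
    return [a.name for a in alerts if metrics.get(a.metric, 0.0) > a.threshold]

print(firing(ALERTS, {"ingest_delay_seconds": 300.0, "search_latency_seconds": 4.0,
                      "cpu_percent": 100.0}))
```

Note that the 100% CPU reading is collected but never pages anyone; only the delayed ingestion does.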
Dev Ops
One
Goal
It’s Not Just Another Layer Of Abstraction
The Damn Thing Actually Is Always On
The Perils Of
Horizontal Scaling
bad(N+2)
Horizontal Scaling Gone Bad
• The ideal scenario: work stealing
• One queue of tasks, bunch of workers
• Grab from queue, work work work, happy
Horizontal Scaling Gone Bad
• Scaling out a multi-tenant processing system
• 1000s of customers, 1000s of machines
• Parallelism is good, but locality has to be considered
• 1 customer distributed over 1000 machines is bad
• No single machine getting enough load for that customer
• Batches & shards will become too small
• Metadata and in-memory structures grow too much
Horizontal Scaling Gone Bad
[Diagram sequence: a grid of 25 Index nodes. First customer 1 is spread across all 25 nodes, then customers 1 and 2, and finally customers 1 through 8 each land on every node.]
Horizontal Scaling With Partitioning
[Diagram sequence: the same grid of 25 Index nodes. Customer 1 is assigned to a small subset of nodes, then customer 2 to another subset, and finally customers 1 through 8 each occupy only a few nodes.]
Partitioning By Customer
• Each cluster elects a leader node via ZooKeeper
• Leader runs the partitioning logic
• Set[Customer], Set[Instance] → Map[Instance, Set[Customer]]
• Partitioning written to ZooKeeper
• Example: indexer node knows which customer’s message blocks to pull from message bus
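The shape of the partitioning function, customers and instances in, a map from instance to customers out, can be sketched as follows. The hash-based placement and the `fan_out` parameter are assumptions made for illustration; the slides do not describe the actual partitioning logic, and some of the systems are said to partition based on load instead.

```python
import hashlib

def partition(customers, instances, fan_out=2):
    """Map[Instance, Set[Customer]]: each customer lands on `fan_out` instances."""
    ordered = sorted(instances)
    assignment = {instance: set() for instance in ordered}
    for customer in sorted(customers):
        # A stable hash picks the starting instance, so the result is deterministic.
        digest = hashlib.sha256(customer.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(ordered)
        for offset in range(min(fan_out, len(ordered))):
            assignment[ordered[(start + offset) % len(ordered)]].add(customer)
    return assignment

customers = {"customer-1", "customer-2", "customer-3"}
instances = {f"prod-index-{i}" for i in range(1, 6)}
for instance, assigned in sorted(partition(customers, instances).items()):
    print(instance, sorted(assigned))
```

Bounding the fan-out is the point: a customer's data stays on a few nodes, so batches and shards stay large enough to be efficient.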
When Not To Scale (Without Bounds)
Less Is More, Locality Matters
Copy & Paste Scaling
So You Keep Adding Customers…
• Your current system is getting achy
• Next order of magnitude on the horizon
• You start to understand what you need to rebuild
• Your quarter ends in 21 days…
Sometimes, You Need To Break The Rules
• Copy & Paste Scaling™
• Copy your deployment descriptor files & metadata
• Point them at a different region and pull the trigger
• Instant 2x scaling!
Choose Your Battles Wisely
Sometimes, Pragmatism Will Win Over Fundamentalism
Fin
@raychaser
Nostalgia For The Future
The Limits Of Physical Separation
• Every service runs on its own set of instances
– We have consciously reinforced service decoupling by full physical separation
– This was very important but we now have the discipline to keep things loose!
• Instances are right-sized for each service
– Is this really the best approach for cost efficiency?
– Are we not using CPU on one cluster but heavy I/O and vice-versa on another?
• Denser packing and more dynamic placement
– Just one type of instance plus Mesos, Kubernetes, etc. to schedule the processes?
– Docker makes sense in this context, but we are JVM-based…
Data Is A Movement
• Heavily based around the idea of a physical pipeline
– This causes an enormous amount of data movement
– Should we be moving the data to the computation?
• Data movement is logically hard as well in light of partitioning
– Some of our systems attempt to partition based on load
– Others use more static assignment of tenants to nodes in a cluster
• Locality for caches in light of partitioning
– In-memory and ephemeral disk caches bound to instances
– Dynamically adjusting resources much harder in this scenario
State-Based Auctioning
• Work-stealing based on a closed loop system
• Every instance is a data instance for memory and “disk”
• Every piece of data is tagged with tenant, etc.
• Clients don’t address RPCs to instances, just submit request
• Instances compete based on their local knowledge
• Caller will get a promise from the auction winning instance
• Ultimately caller will get the result
• Periodically, state across instances is centralized
• Quotas and limits are computed and distributed to instances
• Prevent over-distribution to maintain locality or cost-envelope
Assembly Details
Receiver
• HTTPS endpoint behind Elastic Load Balancing
• Decompress messages from Collector
• Extract timestamps from messages
• Aggregate messages per-customer into blocks
• Flush blocks to message bus
• Ack to Collector
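The per-customer aggregation step can be sketched as a small buffer that flushes a block once it is full. The class name, the block size, and the `flush` callback standing in for the message bus are all hypothetical.

```python
from collections import defaultdict

class BlockAggregator:
    """Collects messages per customer and flushes full blocks to a bus callback."""

    def __init__(self, flush, block_size=3):
        self.flush = flush              # stand-in for publishing to the message bus
        self.block_size = block_size
        self.pending = defaultdict(list)

    def add(self, customer, message):
        block = self.pending[customer]
        block.append(message)
        if len(block) >= self.block_size:
            self.flush(customer, list(block))
            block.clear()

    def flush_all(self):
        for customer, block in self.pending.items():
            if block:
                self.flush(customer, list(block))
                block.clear()

sent = []
aggregator = BlockAggregator(lambda customer, block: sent.append((customer, block)))
for i in range(4):
    aggregator.add("customer-1", f"message {i}")
aggregator.add("customer-2", "hello")
aggregator.flush_all()
print(sent)
```

A real receiver would presumably also flush on a timer and only acknowledge the Collector after the block has reached the bus; both are left out here.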
Raw
• Receive message blocks from message bus
• Encrypt message blocks
• Different key for every day for every customer
• Flush encrypted message blocks to Amazon S3
• Copy blocks as CSV to customer’s Amazon S3 bucket
• Ack to message bus
Index
• Receive message blocks from message bus
• Cache message block on disk and ack to message bus
• Add message blocks to Lucene indexes
• Deal with wildly varying timestamps
• Flush index shards to Amazon S3
• Update meta data database with index shard info
Continuous Query
• Receive message blocks from message bus
• Evaluate each message against all search expressions
• Push matching messages into respective pipelines
• Ack to message bus
• Flush results periodically for pickup by client
• Persist checkpoints periodically to Amazon S3
Query
• Fully distributed streaming query engine
• Materialize messages matching search expression
• Push messages through a pipeline of operators
• First stage – non-aggregation operators
• Second stage – aggregation operators
• Present both raw message results as well as aggregates
• Results update periodically for interactive UI experience
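The two-stage pipeline can be illustrated with plain generators: messages matching the search expression flow through a non-aggregation operator and then into an aggregation operator. The sample messages and operator names are invented, and this single-process sketch says nothing about how the real engine distributes the work.

```python
from collections import Counter

messages = [
    "INFO login user=alice",
    "ERROR timeout user=bob",
    "ERROR timeout user=alice",
    "INFO logout user=alice",
]

def matching(stream, keyword):
    """Materialize messages that match the search expression."""
    return (m for m in stream if keyword in m)

def extract_user(stream):
    """Non-aggregation operator: turns each message into one field value."""
    return (m.split("user=", 1)[1] for m in stream)

def count_by(stream):
    """Aggregation operator: consumes the whole stream and counts values."""
    return Counter(stream)

raw_results = list(matching(messages, "ERROR"))
aggregates = count_by(extract_user(raw_results))
print(raw_results)
print(aggregates)
```

Keeping both `raw_results` and `aggregates` mirrors the slide: the UI presents the raw messages as well as the aggregated view.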
Software-Defined Software
Making It Fast
• Parallelize all the things
– Upload to Amazon S3 while booting instances while creating IAM users while setting up security groups while…
– Hyper-concurrent rolling restarts
• Fast enough for development
– Write new code or fix a bug, compile locally
– Push code to development deployment and make it live
• Optimize data transfers
– Use Amazon S3 hashes to only transfer new files
– Only upload changed JARs
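Transferring only what changed comes down to comparing hashes before uploading. In the sketch below, `remote_hashes` is a plain dictionary standing in for whatever hash listing the object store returns; the file names and contents are made up, and no real S3 call is shown.

```python
import hashlib

def md5_hex(data):
    return hashlib.md5(data).hexdigest()

def files_to_upload(local_files, remote_hashes):
    """Return names of local files that are missing remotely or differ by hash."""
    return sorted(name for name, data in local_files.items()
                  if remote_hashes.get(name) != md5_hex(data))

local_files = {"receiver.jar": b"build 42", "index.jar": b"build 42", "util.jar": b"v1"}
remote_hashes = {"receiver.jar": md5_hex(b"build 41"), "util.jar": md5_hex(b"v1")}
print(files_to_upload(local_files, remote_hashes))
```

Only the new and the changed JAR are selected; the unchanged one is skipped, which is what keeps a development push fast.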
Making It Reliable
• Check prerequisites before you even try
– Does Prod account have room for this many instances?
– Do I have the required permissions for the AWS APIs?
– Any model discrepancies I can’t automatically resolve? Too many Amazon EBS volumes?
• Handle common failures automatically
– No m1.large in us-east-1b? Move Amazon EBS volumes to us-west-1c and try there
– Hitting the AWS API rate limit? Throttle and try again
– SSH didn’t come up on the instance? Kill it and launch another
– Eventual consistency in AWS? Query until it has the expected state (tags)
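Two of these recoveries, throttling on rate limits and polling until an eventually consistent read catches up, follow a common pattern. The exception class, function names, and delays below are hypothetical stand-ins rather than real AWS SDK calls.

```python
import time

class RateLimited(Exception):
    """Stand-in for a throttling error returned by a cloud API."""

def call_with_backoff(operation, attempts=5, base_delay=0.01, sleep=time.sleep):
    """Retry `operation` on throttling, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return operation()
        except RateLimited:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)

def wait_until(read_state, expected, attempts=10, delay=0.01, sleep=time.sleep):
    """Poll an eventually consistent read until it returns the expected state."""
    for _ in range(attempts):
        if read_state() == expected:
            return True
        sleep(delay)
    return False

calls = {"count": 0}
def flaky_tag_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimited()
    return "tagged"

print(call_with_backoff(flaky_tag_call))
```

Both helpers give up after a bounded number of attempts, so a persistent failure surfaces to the operator instead of looping forever.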
Making It Secure
• Different AWS accounts
– Per developer
– Production
• account.xml
– All credentials for one AWS account (AWS keys, SSH keys)
– Password-protected
• IAM
– One user per Sumo component
– Minimal IAM policy
– Inject AWS credentials
• Security Groups
– Part of the model
– Minimal privileges
Making It Safe
• Let mistakes happen at most once
• Add safeguards to prevent operator mistakes
• Type in the deployment name before deleting anything
• Disallow risky operations in production (shutdown Prod)
• Don’t allow -SNAPSHOT code to be deployed in production
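Safeguards like these are cheap to express as guard functions that run before any action. The function names and the literal deployment name "prod" are assumptions for illustration; the actual dsh checks are not shown in the slides.

```python
def check_deploy(deployment, version):
    """Refuse to ship unreleased builds to production."""
    if deployment == "prod" and version.endswith("-SNAPSHOT"):
        raise PermissionError("SNAPSHOT builds may not be deployed to production")

def check_shutdown(deployment):
    """Shutting down production is never allowed from the tool."""
    if deployment == "prod":
        raise PermissionError("shutting down production is not allowed")

def confirm_delete(deployment, typed_name):
    """Deleting anything requires typing the deployment name first."""
    if typed_name != deployment:
        raise ValueError("typed name does not match the deployment")
    return True

check_deploy("dev-alice", "1.2.3-SNAPSHOT")      # fine outside production
print(confirm_delete("dev-alice", "dev-alice"))
```

Each guard encodes a mistake that has happened once, so that it cannot happen a second time.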
Making It Easy
• Automate best practices
– Distribute instances over availability zones evenly
– Register instances in Elastic Load Balancing and match AZs to instances
– Tag all resources consistently
• Consistent naming
– Generate SSH with logical names
Making It Affordable
• Developers forget to shut stuff down
– Deployment reaper automatically shuts down deployments
– Daily cost emails
• Per-team budgets
– Manager responsible for keeping within budget
Pitfalls
• Base AMI plus scripted installation prevents auto scaling
• Security group updates cause TCP disconnects
• This is fixed in the VPC stack, however
• Parallelism can cause stampedes (for example, Amazon DynamoDB)
• Tagging API rate limits are easy to hit

More Related Content

What's hot

What's hot (20)

Cloud Migration for Financial Services - Toronto - October 2016
Cloud Migration for Financial Services - Toronto - October 2016Cloud Migration for Financial Services - Toronto - October 2016
Cloud Migration for Financial Services - Toronto - October 2016
 
You Can't Protect What you Can't See. AWS Security Best Practices - Session S...
You Can't Protect What you Can't See. AWS Security Best Practices - Session S...You Can't Protect What you Can't See. AWS Security Best Practices - Session S...
You Can't Protect What you Can't See. AWS Security Best Practices - Session S...
 
Opening Keynote - AWS Summit SG 2017
Opening Keynote - AWS Summit SG 2017Opening Keynote - AWS Summit SG 2017
Opening Keynote - AWS Summit SG 2017
 
Serverless solutions - AWS Summit SG 2017
Serverless solutions - AWS Summit SG 2017 Serverless solutions - AWS Summit SG 2017
Serverless solutions - AWS Summit SG 2017
 
AWSome Day Brussels - Keynote
AWSome Day Brussels - Keynote AWSome Day Brussels - Keynote
AWSome Day Brussels - Keynote
 
AWS Innovate 2016: Digital Workloads on Amazon Web Services- Santanu Dutt
AWS Innovate 2016: Digital Workloads on Amazon Web Services- Santanu DuttAWS Innovate 2016: Digital Workloads on Amazon Web Services- Santanu Dutt
AWS Innovate 2016: Digital Workloads on Amazon Web Services- Santanu Dutt
 
AWS Cloud Experience CA: Soluciones SAP en AWS: maximice su valor de negocio
AWS Cloud Experience CA: Soluciones SAP en AWS: maximice su valor de negocioAWS Cloud Experience CA: Soluciones SAP en AWS: maximice su valor de negocio
AWS Cloud Experience CA: Soluciones SAP en AWS: maximice su valor de negocio
 
AWS re:Invent 2016: Delighting Customers Through Device Data with Salesforce ...
AWS re:Invent 2016: Delighting Customers Through Device Data with Salesforce ...AWS re:Invent 2016: Delighting Customers Through Device Data with Salesforce ...
AWS re:Invent 2016: Delighting Customers Through Device Data with Salesforce ...
 
Can you process 10 trillion logs per day software architecture conference 2015

  • 1. Sumo Logic Confidential Christian Beedgen, CTO & Co-Founder, @raychaser March 19th, 2015 Can You Process 10 Trillion Logs Per Day?
  • 2. Who Is Christian? • Co-Founder & CTO, Sumo Logic since 2010 • Server guy, Chief Architect at ArcSight, 2001 - 2009
  • 3. Agenda • Why We Are What We Are • How We Set Things Up • What We Have Learned Purpose, Practice, Philosophy
  • 5. Why We Are What We Are
  • 9. Why Is Sumo Logic A Service Simply the best way to deliver Machine Data Analytics Full control → product development efficiency A Conscious, Fundamental Decision
  • 11. Why Machine Data Analytics As A Service • Machine data is actually Big Data • Big Data enterprise software is expensive and painful • As a service, we deliver more value at a lower cost The Big Data Imperative
  • 12. What Is Machine Data Actually, It’s Machine Generated Data Curt Monash: “Data that was produced entirely by machines OR data that is more about observing humans than recording their choices.” Daniel Abadi: “Machine-generated data is data that is generated as a result of a decision of an independent computational agent or a measurement of an event that is not caused by a human action.”
  • 13. Example • Timestamp with time zone!
  • 14. Example • Timestamp with time zone! • Log level
  • 15. Example • Timestamp with time zone! • Log level • Host ID & module name (process/service)
  • 16. Example • Timestamp with time zone! • Log level • Host ID & module name (process/service) • Code location or class
  • 17. Example • Timestamp with time zone! • Log level • Host ID & module name (process/service) • Code location or class • Authentication context
  • 18. Example • Timestamp with time zone! • Log level • Host ID & module name (process/service) • Code location or class • Authentication context • Key-value pairs
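A minimal sketch, in Python, of a log line that carries the fields listed on the Example slides. The formatter class, the `auth` and `kv` record attributes, and the host and module values are assumptions made for illustration; the talk does not show Sumo Logic's actual log format.

```python
import logging
from datetime import datetime, timezone


class KeyValueFormatter(logging.Formatter):
    """Hypothetical formatter: timestamp with zone, level, host, module,
    code location, authentication context, then key=value pairs."""

    def __init__(self, host: str, module: str) -> None:
        super().__init__()
        self.host = host
        self.module = module

    def format(self, record: logging.LogRecord) -> str:
        # ISO 8601 timestamp with an explicit UTC offset (the time zone).
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        stamp = ts.isoformat(timespec="milliseconds")
        # `auth` and `kv` are assumed extras passed via logging's `extra=`.
        auth = getattr(record, "auth", "-")
        kv = getattr(record, "kv", {})
        pairs = " ".join(f"{k}={v}" for k, v in sorted(kv.items()))
        # The logger name and line number stand in for the code location.
        location = f"{record.name}:{record.lineno}"
        line = (f"{stamp} {record.levelname} {self.host} {self.module} "
                f"[{location}] auth={auth} {record.getMessage()} {pairs}")
        return line.rstrip()
```

Attached to a handler, a call such as `logger.info("block flushed", extra={"auth": "alice", "kv": {"msgs": 17}})` would then produce one line containing every field, which keeps later parsing simple.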
  • 19. Machine Data Is Big Data • Volume – Machine Data is voluminous and will continue to grow – Our own application easily creates 1 TB of logs per day • Velocity – Machine Data occurs in real-time, and it is time-stamped – Needs to be processed in real-time as well • Variety – Machine Data is unstructured, or poly-structured at best – Some standard schemas, but sure enough not for the apps you built V For Big Data
  • 20. Enterprise Software Is An Exercise In Undifferentiated Heavy-Lifting (For Your Customer)
  • 21. Big Data Makes Things Worse • Cluster-able, modern, N+1 scalable Enterprise Software? – Old architectures from back when the products were conceived – How many Enterprise Software applications deploy on Hadoop? • It’s not just the cost of the software license… – Lead time and cost of servers, storage, networking, plus the cost of HA, DR – Cost of the people that are maintaining the infrastructure • Have you ever upgraded Enterprise Software? – Latest and greatest is always appealing but not always well tested – Upgrade the Oracle database, migrate the configuration, migrate the data, … A Eulogy For Enterprise Software
  • 22. More Value At A Lower Cost NO HEAVY-LIFTING REQUIRED • Using vs. running the product • Easy on, easy off A Different Business Model LOWER TCO • No server, storage, admin cost • Vendor economies of scale
  • 23. Who Watches The Watchmen? • How much monitoring does your monitoring solution require? • How much does your monitoring solution add to your monitored footprint? • Do you know what infinite recursion is all about? This Stuff Actually Matters In Real Life
  • 24. Why Is Sumo Logic A Service Simply the best way to deliver Machine Data Analytics Full control → product development efficiency A Conscious, Fundamental Decision
  • 25. Velocity Increases Visibility Is Perfect Cost Decreases
  • 27. Deployment Environment Is Controlled • No more surprises due to out of control circumstances – Misread technical documentation, sheer ignorance, lack of time – Obscure runtime issues creating hard to track down issues Control Affords Predictability
  • 29. Deployment Environment Is Controlled • No more surprises due to out of control circumstances – Misread technical documentation, sheer ignorance, lack of time – Obscure runtime issues creating hard to track down issues • Code and test against the actual runtime environment – Control over the full stack removes a lot of variables – Actually testing in production is still hard, but at least there’s progress Control Affords Predictability
  • 30. Only One Production Branch • Fewer dimensions in the testing matrix – One and only one product and version to test – No bugs on arcane platforms that not even the developers know • Every release of enterprise software decreases velocity – Customers try to avoid the upgrade hassle and cost – Laggard customers fall further behind, forcing support for older and older versions Fear The Cartesian Product
  • 31. Velocity Increases Visibility Is Perfect Cost Decreases
  • 32. The Point Here Is That If You Don’t Exploit The Visibility You Now Have, You Really Should Whack Yourself In The Face Daily
  • 33. Velocity Increases Visibility Is Perfect Cost Decreases
  • 34. Cost Decreases Along Many Dimensions • Lower support costs – Much better visibility leads to faster MTTI of customer issues – Can also reinvest savings into even better support and have happier customers • What is the true cost of pissed off customers? – Immediate financial impact due to churn – Mid-term brand damage because word travels quickly • Test and maintenance spend reduction – Test matrix is greatly reduced – No maintenance developer spend Obvious & Hidden Dimensions Play Here
  • 35. Why Is Sumo Logic A Service Simply the best way to deliver Machine Data Analytics Full control → product development efficiency A Conscious, Fundamental Decision
  • 36. What Decisions Did We Make? Don’t Forget, A Bunch Of Software Developers Started This Company We need to: • Focus on what we know and what we have learned • Do everything in code that can be done in code • Evolve from software architects to system architects • Acknowledge our responsibility for the runtime behavior
  • 37. How We Set Things Up
  • 40. Better Economics • No fixed, per-customer costs – No fixed provisioned infrastructure as compared to managed service offerings – Customers are cattle, not pets – no per-customer administration cost • Provision for what is actually used – Customer usage is not uniform but varies by time of day/week/year – The good old “Sum of peaks vs. peak of sums” argument Differentiated Pricing
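The "sum of peaks vs. peak of sums" argument can be illustrated with a small worked example. The usage numbers below are made up; only the comparison itself comes from the slide.

```python
def sum_of_peaks(usage: dict[str, list[float]]) -> float:
    """Capacity needed if every customer gets dedicated, peak-sized infrastructure."""
    return sum(max(series) for series in usage.values())


def peak_of_sums(usage: dict[str, list[float]]) -> float:
    """Capacity needed if all customers share one pool sized for the combined peak."""
    return max(sum(slot) for slot in zip(*usage.values()))


# Made-up ingest rates (GB per hour) for three customers over four hours.
usage = {
    "customer_a": [8, 1, 1, 1],
    "customer_b": [1, 8, 1, 1],
    "customer_c": [1, 1, 8, 1],
}
print(sum_of_peaks(usage))  # 24
print(peak_of_sums(usage))  # 10
```

Because the customers peak at different times, the shared pool needs 10 units of capacity where dedicated per-customer provisioning would need 24.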
  • 41. Just one typical Sumo Logic customer - 8x Variance!
  • 43. Here’s another one – spike at 2.5x of steady…
  • 44. Or… Sweet, incremental, unfettered growth
  • 45. Even More Product Development Efficiency • No version drift – Only one version of the code, only one version of the configuration – No time wasted debugging custom, one-off stuff • Much better update cycle times – No per-customer configuration stands in the way – No manual steps anywhere There Simply Is Just One System
  • 47. Architect For An Extreme Rate Of Change
  • 48. Change Is Our Success [Diagram: Success, Scale, Change]
  • 49. Success Leads To Scale • Charging for ingested data is our business model • We make more money if we sell more daily ingested data • We need to be able to scale to ever more daily data Pretty Clever, Huh?
  • 50. Scaling Implies Change • Scaling challenges assumptions about system behavior • To adapt to the new reality, changes are required • So in order to scale we need to be able to make changes There’s The Catch!
  • 51. We Don’t Know Nothing • Cannot afford a test system the size of Prod • Anything short of Prod doesn’t accurately reflect reality • Reality will surprise you and now the unknown is known …But That We Do Know
  • 53. Change Needs To Be Fully Automated • Change needs to be applied at minimum latency • There is absolutely no room for error • Not exactly the ideal tasks for humans And By Automation, We Mean Software
  • 55. Reuse Ruthlessly Decompose Vertically Decouple Dramatically Layer Horizontally Everything Continuously
  • 57. Stand On The Shoulders Of Giants • Developers have always known how to do this – Operating system, programming languages, libraries – Focus on the value that your code can add • Today, we also reuse on the level of services – AWS has turned the datacenter into an API – I have not seen a datacenter in 8+ years The View Is So Much Better Up There
  • 58. Don’t Reinvent The Wheel • Don’t re-write Lucene • Don’t re-write messaging Invent New Wheels • Do create distributed indexing • Do write your own query engine
  • 59. Reuse Ruthlessly Reduce The Area Of Responsibility For Change
  • 60. Reuse Ruthlessly Decompose Vertically Decouple Dramatically Layer Horizontally Everything Continuously
  • 61. Identify The Main Functions Of The System Then, Break Those Down Into Major Parts We need to: • Ingest data and index it so it can be queried • Provide ad-hoc query capabilities for analytics • Be able to update certain query results continuously • Also, shared stuff: Configuration, Encryption, API
  • 62. Ingestion Path Receiver Bus Index Raw CQ S3
  • 63. Analytics Path Query Service CQ S3
  • 64. Decompose Vertically Experts Will Emerge To Deal With Change In Any Area
  • 65. Reuse Ruthlessly Decompose Vertically Decouple Dramatically Layer Horizontally Everything Continuously
  • 68. Internal SOA • Now that you have things decomposed… • …they can be decoupled! • Avro over messaging bus, or RPC • Documented protocols • No poking at private parts We Always Thought About It That Way Intuitively
  • 71. Decouple Dramatically You Will Have To Change Engines Mid-Flight
  • 72. Reuse Ruthlessly Decompose Vertically Decouple Dramatically Layer Horizontally Everything Continuously
  • 74. No Magic Here • Services can reuse code – Communication, configuration and utility libraries – Global functionality, service discovery, feature flags • Service-level layering – Lower level utility services reused by higher level services – Can also work great if you want to support multiple implementation languages This Is Just Obvious Best Practice
  • 75. Layer Horizontally Always Try To Limit The Size Of Any Change
  • 76. Reuse Ruthlessly Decompose Vertically Decouple Dramatically Layer Horizontally Everything Continuously
  • 80. Everything Continuously Build A Change Delivery Highway That You Can Trust
  • 81. Reuse Ruthlessly Decompose Vertically Decouple Dramatically Layer Horizontally Everything Continuously
  • 82. Architect For An Extreme Rate Of Change
  • 83. What We Have Learned
  • 86. In 2010, we knew that success would look something like this…
  • 88. Our Service Momentum Massive Data and Usage Growth [Charts: Daily Ingest (# of GB/TB ingested per day) and Searches (# of searches per day), 2012 to 2014]
  • 89. Decomposition Supports Scaling • Full physical separation between services – Each node runs exactly one service – Each service is run on a cluster of nodes (N+2) • Each service can scale independently – We require wildly different numbers of nodes based on the service – Some services can run on 3 nodes, some require 1000s We Have Absolutely Experienced Scale Induced By Success
  • 91. Build, Run – It Is All The Same • You don’t just write code, you run a system • You write more code to run the system • We have all become system architects • Deployment, operations – built by the Chief Architect • There is more going on here than just config management No Functionality, No Feature That Doesn’t Need To Be Operated
  • 92. dsh • Model-driven, describe desired state, run to make it so • High performance due to parallelization • Covers all layers of the stack – AWS, OS, Sumo Logic • Easy to use and extend, scriptable CLI • Developer-friendly, Scala-based, high-level APIs A Custom Command Line Program To Operate Sumo Logic
  • 94. Deployments Are Model-Driven • Model contains concepts – Deployment – Cluster – AWS Resources (Amazon S3, Amazon Elastic Load Balancing, Amazon DynamoDB, Amazon RDS, etc.) – Software assemblies – AWS configuration (IAM users, security groups, etc.) • Human-readable names: prod-index-5 Sie Ist Ein Model & Sie Sieht Gut Aus (“She’s A Model And She’s Looking Good”)
  • 97. Differential Deployment • Start by finding existing resources – Use tagging where it is available – Name prefixes (“prod_xxx”) where it isn’t (security groups, IAM, …) • Fix differences to model – Start “missing” instances – Change security group rules, missing IAM users • Proceed with caution – Never delete anything that holds data – Amazon EBS, Amazon DynamoDB, Amazon S3, Amazon RDS As Good As An Episode Of House, M.D.
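The differential deployment steps above can be sketched as a comparison between a model and the resources that already exist. This is not dsh (which the talk describes as Scala-based); the `Resource` type, the `discover` and `plan` functions, and the decision to list unmodeled stateless resources as removal candidates are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Resource kinds that hold data; per the slide, these are never deleted.
DATA_KINDS = {"ebs", "dynamodb", "s3", "rds"}


@dataclass(frozen=True)
class Resource:
    kind: str  # e.g. "instance", "security_group", "s3"
    name: str  # human-readable, e.g. "prod-index-5"


def discover(all_resources: set[Resource], prefix: str) -> set[Resource]:
    """Find the resources of one deployment by name prefix (a stand-in for tags)."""
    return {r for r in all_resources if r.name.startswith(prefix)}


def plan(model: set[Resource], existing: set[Resource]) -> dict[str, list[Resource]]:
    """Compare the model with what exists and list the differences."""
    order = lambda r: (r.kind, r.name)
    extra = existing - model
    return {
        # Resources in the model that are missing and need to be started.
        "create": sorted(model - existing, key=order),
        # Assumption: unmodeled resources that hold no data may be removed.
        "remove": sorted((r for r in extra if r.kind not in DATA_KINDS), key=order),
        # Anything that holds data is only reported, never deleted.
        "keep": sorted((r for r in extra if r.kind in DATA_KINDS), key=order),
    }
```

Running `plan` repeatedly is safe in this sketch: once the model and the existing resources agree, every list comes back empty except for data-holding leftovers, which always need a human decision.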
  • 98. Programmable Infrastructure Is Real Embrace, Evolve & Include It In The Architecture
  • 101. How I Actually Visualize Microservices
  • 102. Factoring Is Still Important • Highly cohesive, loosely coupled is an ideal • The same ideal OO is striving for • Here’s a snapshot of the current Sumo factoring
  • 103. [Diagram of services, grouped under Deployment wide services, Ingest, Search, Internal tools: receiver hornetq-forge forge cqsplitter search cloud collector service api concierge stream katta glass, ganglia bill mix meta config zookeeper appvault org raw hornetq-inbound cocoa bloom filter analyticscsi cqmerger rework view autoview depman hornetq-internal hornetq-metadata nrt] • 2 to the power of 5 services (“32”), 170+ modules • Don’t even ask about the # of dependencies • At least 3 of each – everything is a separately scalable cluster
  • 104. Refactor-able Infrastructure • The same old thing all over again • Now that infrastructure is code, keep refactoring • Split things that don’t belong, join others • Service abstractions can help keep impact low • Moving around the code, vs. the protocols
  • 105. Service Groups Scale • This is really a level of granularity optimization • One system: too heavy – 32 systems: too fleeting • Build a service group, deploy against baseline, test • During deployment, deploy by service group • Balances crosscutting integration tests with turnaround
  • 106. What Is Left To Do? • Still a notion of a common version across all services – Weekly “major” releases, end of quarter release freezes – Even releasing one week of changes can perturb the force majorly • True continuous delivery – Red/black deployments – How to do this in a system with a very high write rate? • Tooling to support partial updates – Our own system is a great way to monitor and make decisions – Work in progress…
  • 107. You Already Know How To Build Systems Everything Now Has One More Layer Of Abstraction
  • 108. There Is No Place Like Production
  • 109. You Cannot Simulate The Big Datas • There’s simply no substitute for Production • This doesn’t mean you shouldn’t have nightly, staging, … • This doesn’t mean you shouldn’t have integration tests • This doesn’t mean you shouldn’t test manually • But there’s just a class of issues you will not find • You can’t move Production data into testing • You can’t afford a second Production size system
  • 110. So Now What? • Instrument, instrument, instrument • Monitor, monitor, monitor • Alert on symptoms • Basically, don’t worry about 100% CPU, etc. • Alert on customer impact • Message ingestion delayed, search takes too long, … https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcApr8Q/mobilebasic?pli=1&viewopt=127#h.dmn6k1rdb6jf
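A minimal sketch of alerting on symptoms rather than causes. The rule type, metric names, thresholds, and playbook paths are hypothetical; the talk names only the symptoms (delayed ingestion, slow search) and the advice not to alert on things like 100% CPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymptomRule:
    name: str
    metric: str       # a customer-facing measurement, not a resource gauge
    threshold: float  # alert when the metric exceeds this value
    playbook: str     # what to do when the alert fires


# Hypothetical rules and thresholds, chosen only for illustration.
RULES = [
    SymptomRule("ingestion delayed", "ingest_delay_seconds", 60.0, "playbooks/ingest-delay"),
    SymptomRule("search too slow", "search_p95_seconds", 10.0, "playbooks/slow-search"),
]


def evaluate(rules: list[SymptomRule], metrics: dict[str, float]) -> list[str]:
    """Return one alert message per rule whose metric is over its threshold."""
    alerts = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is not None and value > rule.threshold:
            alerts.append(f"{rule.name}: {rule.metric}={value} (see {rule.playbook})")
    return alerts
```

There is deliberately no CPU rule: a host at 100% CPU raises nothing here unless customers actually see delayed ingestion or slow searches.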
  • 111. Don’t Alert If You Don’t Have A Playbook
  • 117. It’s Not Just Another Layer Of Abstraction The Damn Thing Actually Is Always On
  • 118. The Perils Of Horizontal Scaling
  • 120. Horizontal Scaling Gone Bad • The ideal scenario: work stealing • One queue of tasks, bunch of workers • Grab from queue, work work work, happy
  • 121. Horizontal Scaling Gone Bad • Scaling out a multi-tenant processing system • 1000s of customers, 1000s of machines • Parallelism is good, but locality has to be considered • 1 customer distributed over 1000 machines is bad • No single machine getting enough load for that customer • Batches & shards will become too small • Metadata and in-memory structures grow too much
  • 122. Horizontal Scaling Gone Bad [Diagram: a cluster of 25 Index nodes]
  • 123. Horizontal Scaling Gone Bad [Diagram: customer 1 spread across all 25 Index nodes]
  • 124. Horizontal Scaling Gone Bad [Diagram: customers 1 and 2 each spread across all 25 Index nodes]
  • 125. Horizontal Scaling Gone Bad [Diagram: customers 1 through 8 each spread across all 25 Index nodes]
  • 126. Horizontal Scaling With Partitioning [Diagram: 25 Index nodes, customer 1 placed on 4 of them]
  • 127. Horizontal Scaling With Partitioning [Diagram: 25 Index nodes, customer 1 placed on 4 nodes and customer 2 on 6 other nodes]
  • 128. Horizontal Scaling With Partitioning [Diagram: 25 Index nodes, customers 1 through 8 each placed on a subset of the nodes]
  • 129. Partitioning By Customer • Each cluster elects a leader node via Zookeeper • Leader runs the partitioning logic • Set[Customer], Set[Instance] → Map[Instance, Set[Customer]] • Partitioning written to Zookeeper • Example: indexer node knows which customer’s message blocks to pull from message bus
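A minimal sketch of partitioning logic with the shape given on the slide: it takes a set of customers and a set of instances and returns a map from instance to customers. The greedy least-loaded placement and the fixed `width` parameter are assumptions; the talk does not describe the actual algorithm, and leader election and the write to Zookeeper are omitted.

```python
def partition(customers: set[str], instances: set[str], width: int = 2) -> dict[str, set[str]]:
    """Assign each customer to at most `width` instances to preserve locality."""
    if not instances:
        raise ValueError("at least one instance is required")
    assignment: dict[str, set[str]] = {i: set() for i in sorted(instances)}
    k = min(width, len(instances))
    # Sorting makes the result deterministic, so any leader computes the same map.
    for customer in sorted(customers):
        # Pick the k instances with the fewest customers so far (ties by name).
        targets = sorted(assignment, key=lambda i: (len(assignment[i]), i))[:k]
        for instance in targets:
            assignment[instance].add(customer)
    return assignment
```

A real partitioner would presumably weight customers by ingest volume rather than counting them equally. The point of the sketch is only that each customer lands on a bounded number of nodes instead of being spread over the whole cluster.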
  • 130. Sumo Logic Confidential When Not To Scale (Without Bounds) Less Is More, Locality Matters
  • 131. Sumo Logic Confidential Copy & Paste Scaling
  • 132. Sumo Logic Confidential So You Keep Adding Customers… • Your current system is getting achy • Next order of magnitude on the horizon • You start to understand what you need to rebuild • Your quarter ends in 21 days…
  • 133. Sumo Logic Confidential Sometimes, You Need To Break The Rules • Copy & Paste ScalingTM • Copy your deployment descriptor files & metadata • Point them at a different region and pull the trigger • Instant 2x scaling!
  • 134. Choose Your Battles Wisely Sometimes, Pragmatism Will Win Over Fundamentalism
  • 139. The Limits Of Physical Separation • Every service runs on its own set of instances – We have consciously reinforced service decoupling by full physical separation – This was very important, but we now have the discipline to keep things loose! • Instances are right-sized for each service – Is this really the best approach for cost efficiency? – Are we leaving CPU idle on an I/O-heavy cluster, and I/O idle on a CPU-heavy one? • Denser packing and more dynamic placement – Just one type of instance plus Mesos, Kubernetes, etc. to schedule the processes? – Docker makes sense in this context, but we are JVM-based…
  • 140. Data Is A Movement • Heavily based around the idea of a physical pipeline – This causes an enormous amount of data movement – Should we be moving the data to the computation? • Data movement is logically hard as well in light of partitioning – Some of our systems attempt to partition based on load – Others use more static assignment of tenants to nodes in a cluster • Locality for caches in light of partitioning – In-memory and ephemeral disk caches bound to instances – Dynamically adjusting resources much harder in this scenario
  • 141. State-Based Auctioning • Work-stealing based on a closed loop system • Every instance is a data instance for memory and “disk” • Every piece of data is tagged with tenant, etc. • Clients don’t address RPCs to instances, just submit request • Instances compete based on their local knowledge • Caller will get a promise from the auction winning instance • Ultimately caller will get the result • Periodically, state across instances is centralized • Quotas and limits are computed and distributed to instances • Prevent over-distribution to maintain locality or cost-envelope
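The auction idea can be illustrated with a small sketch in which each instance bids from its own local knowledge and declines once it hits a centrally computed quota. Everything concrete here is an assumption for illustration: the `Instance` fields, the bid formula (a locality bonus minus current load), and the `quota` default are hypothetical, and the slide describes this as an idea under consideration rather than a finished design.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Instance:
    name: str
    cached_tenants: Set[str] = field(default_factory=set)  # tenants whose data is local
    active_requests: int = 0
    quota: int = 4  # hypothetical limit distributed by the periodic central step

    def bid(self, tenant: str) -> Optional[float]:
        """Return a bid (higher is better), or None to sit out when at quota."""
        if self.active_requests >= self.quota:
            return None
        locality_bonus = 10.0 if tenant in self.cached_tenants else 0.0
        return locality_bonus - self.active_requests


def run_auction(instances: List[Instance], tenant: str) -> Optional[Instance]:
    """Pick the highest bidder; a real caller would get a promise from the winner."""
    bids = [(inst.bid(tenant), inst) for inst in instances]
    valid = [(amount, inst) for amount, inst in bids if amount is not None]
    if not valid:
        return None
    _, winner = max(valid, key=lambda pair: pair[0])
    winner.active_requests += 1
    return winner
```

The quota check is what keeps a popular tenant from being spread across every instance, which is the "prevent over-distribution" point on the slide.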
  • 143. Receiver • HTTPS endpoint behind Elastic Load Balancing • Decompress messages from Collector • Extract timestamps from messages • Aggregate messages per-customer into blocks • Flush blocks to message bus • Ack to Collector
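The "aggregate per customer, then flush" steps can be sketched as a small buffer. This is an illustration only: the `BlockAggregator` class, the size-based flush threshold, and the `flush` callback standing in for the message bus are assumptions, not details from the talk.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class BlockAggregator:
    """Buffer messages per customer and flush a block once it is full."""

    def __init__(self, flush: Callable[[str, List[str]], None], max_messages: int = 3):
        self._flush = flush                # stand-in for publishing to the message bus
        self._max_messages = max_messages  # hypothetical block size threshold
        self._pending: Dict[str, List[str]] = defaultdict(list)

    def add(self, customer: str, message: str) -> None:
        block = self._pending[customer]
        block.append(message)
        if len(block) >= self._max_messages:
            self.flush_customer(customer)

    def flush_customer(self, customer: str) -> None:
        # Remove the block before publishing so it cannot be flushed twice.
        block = self._pending.pop(customer, [])
        if block:
            self._flush(customer, block)
```

A real receiver would presumably also flush on a timer and only ack the Collector once the block is safely on the bus.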
  • 144. Raw • Receive message blocks from message bus • Encrypt message blocks • Different key for every day for every customer • Flush encrypted message blocks to Amazon S3 • Copy blocks as CSV to customer’s Amazon S3 bucket • Ack to message bus
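"Different key for every day for every customer" amounts to a lookup keyed by (customer, day). The sketch below shows only that lookup, under loud assumptions: the `DailyKeyRing` class, the in-memory storage, and the random 256-bit keys are hypothetical, and the slide says nothing about how keys are generated, stored, or rotated. The encryption step itself is omitted.

```python
import secrets
from datetime import date
from typing import Dict, Tuple


class DailyKeyRing:
    """Hand out one random 256-bit key per (customer, day); in-memory only."""

    def __init__(self) -> None:
        self._keys: Dict[Tuple[str, date], bytes] = {}

    def key_for(self, customer: str, day: date) -> bytes:
        slot = (customer, day)
        if slot not in self._keys:
            # First block for this customer on this day: create a fresh key.
            self._keys[slot] = secrets.token_bytes(32)
        return self._keys[slot]
```

Scoping keys this narrowly limits how much data any single key can expose.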
  • 145. Index • Receive message blocks from message bus • Cache message block on disk and ack to message bus • Add message blocks to Lucene indexes • Deal with wildly varying timestamps • Flush index shards to Amazon S3 • Update metadata database with index shard info
  • 146. Continuous Query • Receive message blocks from message bus • Evaluate each message against all search expressions • Push matching messages into respective pipelines • Ack to message bus • Flush results periodically for pickup by client • Persist checkpoints periodically to Amazon S3
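Evaluating every message against every standing search and routing matches to the right pipeline can be sketched as below. Regular expressions stand in for Sumo Logic's search expressions, and `route_messages` is a hypothetical name; checkpointing and acks are not shown.

```python
import re
from typing import Dict, List


def route_messages(messages: List[str], searches: Dict[str, str]) -> Dict[str, List[str]]:
    """Evaluate every message against every search; collect matches per search."""
    compiled = {name: re.compile(pattern) for name, pattern in searches.items()}
    pipelines: Dict[str, List[str]] = {name: [] for name in searches}
    for message in messages:
        for name, pattern in compiled.items():
            if pattern.search(message):
                pipelines[name].append(message)
    return pipelines
```

The cost grows with messages times searches, which is the core scaling challenge of this stage.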
  • 147. Query • Fully distributed streaming query engine • Materialize messages matching search expression • Push messages through a pipeline of operators • First stage – non-aggregation operators • Second stage – aggregation operators • Present both raw message results as well as aggregates • Results update periodically for interactive UI experience
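The two-stage pipeline can be shown in miniature on a single machine: a per-message (non-aggregation) operator extracts a field, then an aggregation operator counts by that field, and both the raw matches and the aggregate are returned. The substring match, the "first token is the log level" convention, and the name `run_query` are assumptions for illustration; the real engine is distributed and streaming.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def run_query(messages: Iterable[str], term: str) -> Tuple[List[str], Dict[str, int]]:
    """Return (raw matching messages, count of matches per log level)."""
    raw: List[str] = []
    counts: Counter = Counter()
    for message in messages:
        if term not in message:           # materialize only matching messages
            continue
        raw.append(message)
        level = message.split(" ", 1)[0]  # stage 1: non-aggregation (field extraction)
        counts[level] += 1                # stage 2: aggregation (count by level)
    return raw, dict(counts)
```

Because the loop consumes messages one at a time, partial counts could be emitted periodically, which is how results can keep updating in an interactive UI.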
  • 149. Making It Fast • Parallelize all the things – Upload to Amazon S3 while booting instances while creating IAM users while setting up security groups while… – Hyper-concurrent rolling restarts • Fast enough for development – Write new code or fix a bug, compile locally – Push code to development deployment and make it live • Optimize data transfers – Use Amazon S3 hashes to only transfer new files – Only upload changed JARs
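"Use Amazon S3 hashes to only transfer new files" boils down to comparing local content hashes with what the bucket already holds. The sketch below assumes the remote hashes have already been listed into a dictionary (no AWS calls are made) and that they are MD5 hex digests, which matches the S3 ETag only for single-part uploads without KMS encryption. The name `files_to_upload` is hypothetical.

```python
import hashlib
from typing import Dict, List


def files_to_upload(local_files: Dict[str, bytes], remote_md5: Dict[str, str]) -> List[str]:
    """Return names whose content hash is missing from, or differs in, the remote listing."""
    changed = []
    for name, content in sorted(local_files.items()):
        digest = hashlib.md5(content).hexdigest()
        if remote_md5.get(name) != digest:
            changed.append(name)
    return changed
```

For a deployment made mostly of unchanged JARs, this turns a full upload into a handful of small transfers.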
  • 150. Making It Reliable • Check prerequisites before you even try – Does Prod account have room for this many instances? – Do I have the required permissions for the AWS APIs? – Any model discrepancies I can’t automatically resolve? Too many Amazon EBS volumes? • Handle common failures automatically – No m1.large in us-east-1b? Move Amazon EBS volumes to us-west-1c and try there – Hitting the AWS API rate limit? Throttle and try again – SSH didn’t come up on the instance? Kill it and launch another – Eventual consistency in AWS? Query until it has the expected state (tags)
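"Throttle and try again" is commonly implemented as a retry with exponential backoff. The sketch below is one such implementation, not the tool described in the talk: `RateLimited` is a hypothetical stand-in for whatever throttling error the API client raises, and the delay values are arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for an API throttling error."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited, doubling the wait after each failed attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the operator
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The same loop shape works for the eventual-consistency case: poll a describe call until the expected tags show up, and give up after a bounded number of attempts.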
  • 151. Making It Secure • Different AWS accounts – Per developer – Production • account.xml – All credentials for one AWS account (AWS keys, SSH keys) – Password-protected • IAM – One user per Sumo component – Minimal IAM policy – Inject AWS credentials • Security Groups – Part of the model – Minimal privileges
  • 152. Making It Safe • Let mistakes happen at most once • Add safeguards to prevent operator mistakes • Type in the deployment name before deleting anything • Disallow risky operations in production (shutdown Prod) • Don’t allow -SNAPSHOT code to be deployed in production
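Two of these safeguards are simple enough to sketch directly. The function names and the `"prod"` environment label are assumptions; the `-SNAPSHOT` suffix is the usual Maven convention for unreleased builds.

```python
def confirm_delete(deployment: str, typed_name: str) -> bool:
    """Allow deletion only if the operator typed the deployment name exactly."""
    return typed_name.strip() == deployment


def check_deployable(version: str, environment: str) -> None:
    """Refuse to deploy unreleased (-SNAPSHOT) builds to production."""
    if environment == "prod" and version.endswith("-SNAPSHOT"):
        raise ValueError(f"refusing to deploy {version} to production")
```

Both checks are cheap to run before any destructive or production-facing operation starts.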
  • 153. Making It Easy • Automate best practices – Distribute instances over availability zones evenly – Register instances in Elastic Load Balancing and match AZs to instances – Tag all resources consistently • Consistent naming – Generate SSH with logical names
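Distributing instances evenly over availability zones is a small arithmetic problem: give every zone the same base count and hand the remainder to the first few zones. A minimal sketch, with `spread_over_zones` as a hypothetical name:

```python
from typing import Dict, List


def spread_over_zones(instance_count: int, zones: List[str]) -> Dict[str, int]:
    """Split instance_count across zones so that counts differ by at most one."""
    if not zones:
        raise ValueError("need at least one availability zone")
    base, extra = divmod(instance_count, len(zones))
    # The first `extra` zones each take one additional instance.
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}
```

For example, 7 instances over three zones come out as 3, 2, and 2.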
  • 154. Making It Affordable • Developers forget to shut stuff down – Deployment reaper automatically shuts down deployments – Daily cost emails • Per-team budgets – Manager responsible for keeping within budget
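A deployment reaper needs some rule for deciding what to shut down. The slide does not state the rule, so the sketch below assumes one: reap any deployment that has been idle longer than a threshold, unless it is on a protected list. The name `deployments_to_reap`, the 12-hour default, and the `"prod"` label are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, FrozenSet, List


def deployments_to_reap(
    last_used: Dict[str, datetime],
    now: datetime,
    max_idle: timedelta = timedelta(hours=12),
    protected: FrozenSet[str] = frozenset({"prod"}),
) -> List[str]:
    """Return unprotected deployments that have been idle longer than max_idle."""
    return sorted(
        name
        for name, seen in last_used.items()
        if name not in protected and now - seen > max_idle
    )
```

Returning the list rather than shutting anything down keeps the decision testable and lets the caller log or email it first.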
  • 155. Pitfalls • Base AMI plus scripted installation prevents auto scaling • Security group updates cause TCP disconnects • This is fixed in the VPC stack, however • Parallelism can cause stampedes (for example, Amazon DynamoDB) • Tagging API rate limits are easy to hit

Editor's Notes

  1. I am known to have a lot of cats in my presentations. So I decided that for once there will be no cats in my talk today. https://c1.staticflickr.com/1/209/499409303_bfe28ec063.jpg
  2. We need to increase ARR
  3. Computer, network, and other equipment logs; satellite and similar telemetry (espionage or science); location data, RFID chip readings, GPS system output; temperature and other environmental sensor readings; sensor readings from factories, pipelines, etc.; output from many kinds of medical devices.
  4. Once at ArcSight, because of an issue with AIX, IBM literally wheeled in this furniture. We had to get an electrician to do some power stuff before we could turn it on. Then all it did was flash two numbers on seven-segment displays. But their field service was insane: the guy was there at 5 a.m. the next day. But the time we lost on this… http://www.nasi.com/images/IBM_p5-595.jpg
  5. ArcSight example. This is a death spiral, and we lived it. It was like chasing our tail.
  6. Complete visibility and observability. You need to measure to learn. It is very hard to measure when you don't have control. The only thing to fear is ourselves. If our stuff is shit, we have nobody to blame but ourselves. http://29.media.tumblr.com/tumblr_l7hgrtcrIV1qb4ej3o1_500.jpg
  7. And this is why Sumo Logic is a service
  8. http://dogcare.dailypuppy.com/DM-Resize/photos.demandstudios.com/getty/article/117/230/78462347.jpg?w=600&h=600&keep_ratio=1&webp=1
  9. Netflix
  10. http://www.lyellg.com/images/dog_jump.jpg
  11. These tradeoffs will be different for everybody. And sometimes you get them wrong.
  12. I would really like to say that we have discovered some formerly hidden knowledge, but it has all been there…
  13. http://imageshack.com/f/155/teeheeheedog.jpg
  14. http://www.maniacworld.com/images/it-sounds-familiar.jpg
  15. Holy recursion, Batman!!
  16. This is actually my dog. His name is Sumo. We practice layering at home, as you can tell.
  17. Oops, what is that cat doing in here??
  18. http://www.slideshare.net/alvarosanchezmariscal/stateless-authentication-for-microservices
  19. http://static3.jadedpixel.com/s/files/1/0010/2052/files/Puppy_dogs.jpg
  20. Also, state…
  21. Also, state…
  22. Also, state…