InfluxDB @ Wayfair
Nuance, at Scale
Introduction
What is Wayfair?
Website / App / Services, e-commerce focused on home goods
“Everyone Should Live in a Home They Love”
We are a tech-focused company
● We innovate for a better customer experience
HQ in Boston, w/ a growing EU presence in Berlin
● More than 2,300 Engineers and Data Scientists
Leaving Graphite
● We’ve had Graphite for years, and it mostly worked
● BUT it became harder and harder to maintain. Chaos reigned
○ So many developers creating so many series => many disk-full errors
○ Carbon storage units are independent, and graphite-web doesn’t maintain a holistic view
● We ALSO wanted better insights, beyond means and fixed percentiles
● Horizontal expansion with Carbon was tough because of its storage model
○ Existing series had to be relocated, based on a consistent hash
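carbon-relay can route series by consistent hashing, so growing a Carbon cluster means moving existing series (and their Whisper files) to whichever node now owns them. The sketch below is a minimal, hypothetical consistent-hash ring, not carbon-relay's code: the node names, series names, MD5-based hash, and 100 virtual nodes per server are all illustrative assumptions. It shows that adding a fourth node relocates a sizable fraction of series, and that every relocated series lands on the new node.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash taken from the first 8 bytes of MD5 (illustrative choice).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (hypothetical sketch)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node owns `replicas` points on the ring.
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}:{i}"), node))

    def node_for(self, series: str) -> str:
        # Walk clockwise to the first ring point at or after the series hash,
        # wrapping around to the start of the ring if needed.
        h = _hash(series)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]


series = [f"rum.page{i}.timer.mean" for i in range(10_000)]  # hypothetical series names
ring = HashRing(["carbon-a", "carbon-b", "carbon-c"])
before = {s: ring.node_for(s) for s in series}

ring.add_node("carbon-d")
moved = sum(1 for s in series if ring.node_for(s) != before[s])
# Roughly a quarter of the series move in expectation; each move is a file copy.
print(f"{moved / len(series):.0%} of series must be relocated")
```

Only the series that the new node takes over move, which is the best case for consistent hashing; with Whisper files on disk, even that fraction is an expensive, manual migration.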
Deciding on InfluxDB
● Resilience & HA: replication + restoring from a backup
○ Internal metrics for capacity planning and tracking overall system health
● Granular retention policies help with control over sharding logic
○ For scaling, it helped to define them per instance
● SQL-like API was great for training new developers
● Ability to capture raw data for tracking rare events
● Active development and official support channels
○ Tooling ecosystem: Telegraf, Kapacitor, Chronograf
○ Cloud friendly
Challenge #1: Tackling RUM
Why I’m Here
I manage the Storefront Performance Team at Wayfair
● We want our website and apps to be fast
Our job: Amplify a Performance Culture
● Captains, Consulting, and R&D
● In Storefront and Beyond
Challenge: Scale alongside a growing company
RUM: Real User Monitoring
Extremely noisy because of networks, devices, and other processes
Points are drastically affected whenever a customer is:
● On bad WiFi
● Running too many open tabs
● In Battery Saver mode
We retaliate by collecting TONS of data
Data size scales directly with traffic: many requests per second across hundreds of hosts
What’s That Noise? Check COUNT
With low RUM counts overnight, there’s even more noise
BUT, after looking at the raw data:
● Caches are cold
● Performance is actually worse
We track well over 300 unique pages
● Boss-Level Cardinality Challenge
TODO: sample differently based on volume
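The TODO above, sampling differently based on volume, could look something like this minimal sketch. It assumes a hypothetical per-page target of 600 Points per minute: busy pages are sampled down toward that target, while quiet pages (and quiet overnight hours) keep every Point, because they are already the noisiest. The function names and traffic numbers are illustrative, not Wayfair's.

```python
def sampling_rate_for(requests_per_minute: float, target_points_per_minute: float = 600.0) -> float:
    """Hypothetical volume-aware sampling rate for one page.

    Busy pages are sampled down toward `target_points_per_minute`; quiet pages
    (or quiet hours) keep every Point, since they are already noisy.
    """
    if requests_per_minute <= target_points_per_minute:
        return 1.0
    return target_points_per_minute / requests_per_minute


def estimated_count(sampled_points: int, rate: float) -> float:
    """Scale a sampled COUNT back up to an estimate of the true request volume."""
    return sampled_points / rate


# Hypothetical traffic per page, in requests per minute.
traffic = {"index": 30_000, "product": 12_000, "gift_cards": 40}
rates = {page: sampling_rate_for(rpm) for page, rpm in traffic.items()}
print(rates)  # {'index': 0.02, 'product': 0.05, 'gift_cards': 1.0}
```

Storing the rate alongside each Point keeps COUNT interpretable: 12 Points sampled at 0.02 stand for roughly 600 requests.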
When to tag HOST
A HOST tag with hundreds of unique values destroyed RUM cardinality
● When adding a tag, ask: “Is GROUP BY useful here?”
● Thank you, InfluxData: SELECT ... INTO
● Dropping the HOST tag meant much faster SELECTs
Avoid proxy measurements:
● Tag what is actionable, where it matters
● RUM: a bad proxy for CPU / system load monitoring
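Why a HOST tag is so costly: the number of series in a measurement is bounded by the product of the distinct values of its tags, so one tag with hundreds of values multiplies everything else. The sketch below uses hypothetical distinct-value counts for the RUM tags from this deck; the helper names are illustrative.

```python
from math import prod


def max_series_cardinality(distinct_values_per_tag: dict) -> int:
    """Upper bound on series in one measurement: product of distinct values per tag."""
    return prod(distinct_values_per_tag.values())


def actual_series_cardinality(points: list, tag_keys: list) -> int:
    """Distinct tag-value combinations that actually occur in `points` (list of dicts)."""
    return len({tuple(p.get(k) for k in tag_keys) for p in points})


# Hypothetical distinct-value counts for the RUM measurement's tags.
rum_tags = {"platform": 3, "store": 5, "route": 300, "dc": 3}
print(max_series_cardinality(rum_tags))                    # 13500
print(max_series_cardinality({**rum_tags, "host": 400}))   # 5400000
```

Because each host lives in a single data center, the real count with HOST is below that bound, but it still grows with every host added, and a tag nobody uses in GROUP BY pays that cost for nothing.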
Addendum: Observer Effect
https://en.wikipedia.org/wiki/Observer_effect_(information_technology)
UDP is great, but it’s not a solution to all problems
● We ran a test where we hit our PHP max_children limit on some hosts
● The server needed more time to send all of its data out
● Failures with > 800 Points per request, plus DNS lookup delay
● register_shutdown_function => reduced visibility
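The slides don't say how the PHP client packed its UDP traffic, so the following is only a sketch under stated assumptions. It groups line-protocol lines into datagrams capped at a hypothetical 1,400 bytes (under a typical 1,500-byte Ethernet MTU) and resolves the destination once per batch rather than per Point. It reduces per-Point overhead; it does not remove the observer effect, since sending still costs request time.

```python
import socket

MAX_PAYLOAD_BYTES = 1400  # assumption: stay under a typical 1500-byte Ethernet MTU


def batch_lines(lines, max_bytes=MAX_PAYLOAD_BYTES):
    """Group line-protocol lines into newline-separated payloads of <= max_bytes."""
    batches, current, size = [], [], 0
    for line in lines:
        encoded = line.encode("utf-8") + b"\n"
        if len(encoded) > max_bytes:
            raise ValueError("single line exceeds the maximum payload size")
        if size + len(encoded) > max_bytes:
            # Current batch is full: close it and start a new one.
            batches.append(b"".join(current))
            current, size = [], 0
        current.append(encoded)
        size += len(encoded)
    if current:
        batches.append(b"".join(current))
    return batches


def send_points(lines, host, port):
    """Resolve the address once, then send each batch as a single datagram."""
    family, _, _, _, sockaddr = socket.getaddrinfo(host, port, type=socket.SOCK_DGRAM)[0]
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        for payload in batch_lines(lines):
            sock.sendto(payload, sockaddr)


# 800 Points from one request become a handful of datagrams.
lines = [f"rum,route=index speed_index={i}" for i in range(800)]
batches = batch_lines(lines)
print(len(batches), max(len(b) for b in batches))
# send_points(lines, "localhost", 8089)  # hypothetical Telegraf UDP listener
```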
Lions and Tigers and Interns
Storefront InfluxDB Tiger Team
We have many thousands of PHP files w/ StatsD instrumentation
Shutting off Graphite meant we needed to teach developers how to build a schema
● Hundreds of developers over many groups, each with their own style
● “I still don’t understand tags and fields”
● “Can’t we just change the measurement name?”
Tiger Team: a small group of engineers turns the tide
Dedicated Slack channel + many small projects
Interns Let Loose
Undergraduate students from Northeastern joined the Tiger Team Party Bus
● A Python script that rewrote parts of our PHP code
● Consulting with other groups
● Driving large swaths of the conversion
Advanced DAO Instrumentation
● Different sampling rates for memcache
● Recording long-running SQL queries
Don’t underestimate what a few people can do
Tiger Challenge: StatsD vs. Schema
StatsD:
● rum.client_timers.desktop.wayfair_com.index.speed_index.bo1.timer.mean
● rum.client_timers.desktop.wayfair_com.index.page_load.bo1.timer.mean
InfluxDB:
● Measurement: rum
● Tags: platform=desktop; dc=bo1; store=wayfair_com; route=index
● Fields: speed_index=800; total_page_load_time=2300
It’s a feature, not a bug; BUT features require thinking
Developer Experience
Crossing Streams
I’m the primary owner of our PHP InfluxDB Client
I’m the one you call
● Inherited from a developer who moved on to another group
● PHP is our most common language at Wayfair, though there are others
Many problems ensued from mixing StatsD and InfluxDB paradigms
● Uniqueness for accumulators works, IF you only have strings
● Add a Schema, with Tags and Fields, and There Will Be Bugs
Back to the Drawing Board
Rebuilding from scratch is super expensive
● I had to do it anyway: the API was wrong, and expectations always failed
● Key advice: build clients w/ the right mental model
End result: one client that serves our entire PHP codebase
● Works for Storefront and beyond
Standard Software Best Practices:
● Composition over Inheritance. Fluent Interfaces. Separate Responsibilities
Accumulators != Points
An Accumulator (Counter / Stopwatch) is distinct from a single Point
● Counter::findOrCreate(array $uniqueness, Influx_Point $initialPoint);
● Goal: one Point with value 2, instead of two Points each w/ value 1
● Importantly, $uniqueness is separated from the value itself
Use case:
● Two versions of a system: one uses SQLite, the other doesn’t
● SQLite is slower per iteration, but has no 100ms startup cost
● Which version is faster overall? Track it across the entire request
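To make the accumulator idea concrete, here is a hypothetical Python analog of `Counter::findOrCreate(array $uniqueness, Influx_Point $initialPoint)`. It is not the Wayfair PHP client: the class names, the `count` field key, and the flush-at-request-end step are assumptions. The uniqueness dict is kept separate from the value, and is normalized into a sorted tuple, which is why its values must be hashable and sortable (strings, in this sketch).

```python
from dataclasses import dataclass, field


@dataclass
class Point:
    measurement: str
    tags: dict
    fields: dict = field(default_factory=dict)


class Counter:
    """Hypothetical analog of Counter::findOrCreate($uniqueness, $initialPoint).

    Incrementing twice under the same uniqueness yields ONE Point with value 2,
    instead of two Points each with value 1.
    """

    _registry = {}  # per-request state: uniqueness key -> Counter

    def __init__(self, point, field_key):
        self.point = point
        self.field_key = field_key
        self.point.fields.setdefault(field_key, 0)

    @classmethod
    def find_or_create(cls, uniqueness, initial_point, field_key="count"):
        # Normalize the uniqueness dict into a hashable, order-independent key.
        key = tuple(sorted(uniqueness.items()))
        if key not in cls._registry:
            cls._registry[key] = cls(initial_point, field_key)
        return cls._registry[key]  # initial_point is ignored if the key exists

    def increment(self, by=1):
        self.point.fields[self.field_key] += by
        return self  # fluent interface

    @classmethod
    def flush(cls):
        """Emit one Point per accumulator (e.g. at the end of a request) and reset."""
        points = [c.point for c in cls._registry.values()]
        cls._registry.clear()
        return points


for _ in range(2):
    Counter.find_or_create(
        {"measurement": "sqlite_lookup", "variant": "sqlite"},
        Point("sqlite_lookup", {"variant": "sqlite"}),
    ).increment()

points = Counter.flush()
print(len(points), points[0].fields)  # 1 {'count': 2}
```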
Beware the `value` Field Key
Field type conflicts: lost data, tons of noise, confusion for developers
Frequent developer retaliation: one measurement per field
● Same cardinality, but much harder to organize
● TooSpecificMeasurementStopwatch: Float
● TooSpecificMeasurementCounter: Integer
● Chronograf gets slow
● Dashboards are hard to create
● InfluxQL can’t do math across measurements; re-combining them fixes that
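InfluxDB 1.x rejects a point whose field type differs from the type already stored for that field in the shard, which is how a shared `value` key written as a float by stopwatches and as an integer by counters loses data. A client-side guard like this hypothetical sketch can catch the conflict before the write. The class name and message format are illustrative; the type names mirror InfluxDB's float, integer, string, and boolean field types.

```python
class FieldTypeGuard:
    """Hypothetical client-side check: remember the first type seen for each
    (measurement, field key) and report points that would conflict."""

    def __init__(self):
        self._types = {}

    @staticmethod
    def _influx_type(value):
        if isinstance(value, bool):  # bool before int: bool is a subclass of int
            return "boolean"
        if isinstance(value, int):
            return "integer"
        if isinstance(value, float):
            return "float"
        if isinstance(value, str):
            return "string"
        raise TypeError(f"unsupported field value: {value!r}")

    def check(self, measurement, fields):
        """Return conflict messages; an empty list means the point is safe to write."""
        conflicts = []
        for key, value in fields.items():
            new_type = self._influx_type(value)
            seen = self._types.setdefault((measurement, key), new_type)
            if seen != new_type:
                conflicts.append(f"{measurement}.{key}: {new_type} conflicts with existing {seen}")
        return conflicts


guard = FieldTypeGuard()
print(guard.check("timing", {"value": 1.5}))  # []
print(guard.check("timing", {"value": 3}))    # ['timing.value: integer conflicts with existing float']
# Distinct, descriptive field keys in ONE measurement avoid the conflict entirely.
print(guard.check("timing", {"duration_ms": 1.5, "count": 3}))  # []
```

The last line is the alternative to "one measurement per field": keep related fields together, but give each a name that implies a single type.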
Language Limitations
PHP is still one of my favorite languages, but its simplicity can be problematic
● Nothing is shared from one request to the next
● We’ve thought about using SQLite / cron, but it’s complicated
With some C# systems at Wayfair, measurements go to a separate thread
● Easier to aggregate Points across individual requests
● Helps with the Observer Effect
● Complexity also implies a management cost
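PHP can't hold state across requests, which is what the C# "separate thread" pattern provides. The sketch below shows the idea in Python for consistency with the other examples; it is not Wayfair's C# code, and the class and series names are hypothetical. Request handlers only enqueue (cheap, no network I/O on the request path), while a worker thread aggregates across requests until a flush.

```python
import queue
import threading
from collections import defaultdict


class BackgroundAggregator:
    """Hypothetical sketch: request handlers enqueue measurements cheaply,
    and a worker thread aggregates them across requests."""

    def __init__(self):
        self._queue = queue.Queue()
        self._totals = defaultdict(int)
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, series_key, value=1):
        # Called on the request path: just an enqueue, no network I/O.
        self._queue.put((series_key, value))

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:  # sentinel: stop the worker
                self._queue.task_done()
                break
            key, value = item
            with self._lock:
                self._totals[key] += value
            self._queue.task_done()

    def flush(self):
        """Wait for pending items, then return (and reset) the aggregated totals."""
        self._queue.join()
        with self._lock:
            totals, self._totals = dict(self._totals), defaultdict(int)
        return totals

    def close(self):
        self._queue.put(None)
        self._worker.join()


agg = BackgroundAggregator()
# Four simulated "requests", each recording 250 memcache hits.
requests = [threading.Thread(target=lambda: [agg.record("memcache_hit") for _ in range(250)])
            for _ in range(4)]
for r in requests:
    r.start()
for r in requests:
    r.join()
totals = agg.flush()
print(totals)  # {'memcache_hit': 1000}
agg.close()
```

The trade-off on the slide shows up directly: one aggregated Point replaces a thousand, but there is now a thread, a queue, and a shutdown path to manage.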
Fighting the Firehose
Feature Toggles FTW
At Wayfair, we deploy code many times every day
We built a robust system for toggling branches of our code on and off
● Uses percentages and many other fine-grained filters
● A single adjustment propagates instantly across all systems
● Not tied to any deployment process, so they’re always unblocked
Designed to safely test new functionality in Production
● Works exceedingly well at scaling down measurements
● “Do we have enough data at 7%?” => $influxPoint->setSamplingRate(0.07);
Feature Toggle: Example
Helps w/ volume, not cardinality: a percentage becomes a per-request Boolean
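A minimal sketch of "Percentage => Boolean", assuming a hash-based toggle (Wayfair's toggle system and its filters are not shown in the slides; the function names and toggle name here are hypothetical). Hashing the toggle name with a unit ID into one of 10,000 buckets gives each request a stable yes/no. Each sampled Point then stands for 1 / rate requests, which is what a call like `setSamplingRate(0.07)` needs to record.

```python
import hashlib


def in_rollout(toggle_name: str, unit_id: str, rate: float) -> bool:
    """Hypothetical percentage toggle: turn a rate in [0, 1] into a stable
    per-unit boolean by hashing (toggle, unit) into one of 10,000 buckets."""
    digest = hashlib.sha256(f"{toggle_name}:{unit_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < round(rate * 10_000)


def point_weight(rate: float) -> float:
    """Each sampled Point stands in for 1 / rate requests when estimating totals."""
    return 1.0 / rate


# "Do we have enough data at 7%?": sample RUM Points for 7% of request IDs.
request_ids = [f"req-{i}" for i in range(100_000)]
sampled = [r for r in request_ids if in_rollout("rum_sampling", r, 0.07)]
print(len(sampled) / len(request_ids))    # close to 0.07
print(len(sampled) * point_weight(0.07))  # close to 100,000
```

This cuts volume only: a 7% sample can still hit every page, host, and tag combination, so series cardinality is unchanged.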
Dev, meet Ops
We give great opportunities for junior developers
Our Timeseries team has built up resilience
● PHP => Telegraf => Kafka => Telegraf => InfluxDB
● Limits threats to any one piece
● MirrorMaker allows for multi-DC cohesion
Tremor, built by Wayfair, allows us to shape any traffic
● We can blacklist / rate limit a given measurement by inspecting line protocol
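Inspecting line protocol to shape traffic only requires reading the measurement name, which is everything before the first unescaped comma or space. The Python below is not Tremor's configuration or API, only a hypothetical sketch of the idea: a blocklist plus a per-measurement token bucket. The limits, the blocked name, and the injectable clock (used here to make the example deterministic) are all assumptions.

```python
import time


def measurement_of(line: str) -> str:
    """Return the measurement name: everything before the first unescaped ',' or ' '."""
    out, i = [], 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            out.append(line[i + 1])  # keep the escaped character, drop the backslash
            i += 2
            continue
        if ch in ", ":
            break
        out.append(ch)
        i += 1
    return "".join(out)


class MeasurementLimiter:
    """Drop lines for blocked measurements; token-bucket rate limit the rest."""

    def __init__(self, rate_per_sec, burst, blocked=(), clock=time.monotonic):
        self.rate, self.burst = rate_per_sec, burst
        self.blocked = set(blocked)
        self.clock = clock
        self._buckets = {}  # measurement -> (tokens, last refill time)

    def allow(self, line: str) -> bool:
        name = measurement_of(line)
        if name in self.blocked:
            return False
        now = self.clock()
        tokens, last = self._buckets.get(name, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)  # refill
        if tokens >= 1:
            self._buckets[name] = (tokens - 1, now)
            return True
        self._buckets[name] = (tokens, now)
        return False


fake_now = [0.0]  # deterministic clock for the example
limiter = MeasurementLimiter(rate_per_sec=1, burst=2, blocked={"debug_spam"},
                             clock=lambda: fake_now[0])
print([limiter.allow("rum,route=index speed_index=800") for _ in range(3)])  # [True, True, False]
fake_now[0] = 1.0
print(limiter.allow("rum,route=index speed_index=800"))  # True
print(limiter.allow("debug_spam,host=a x=1"))            # False
```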
Infrastructure Updates
Yo Dawg
We heard you like Influx, so we put Influx in your Influx so you can... measure how your clusters are doing when the target system is under attack
2018 Cluster Layout:
● C1: General: for most measurements
● C2: Storefront: specific raw data at high volume
● C3: monitors C1, C2, Kafka, Puppet, Celery, ++
● Data Centers in Boston, Seattle, and Beyond
“Influx is Slow!”
It makes me want to cry every time I hear this
● My response: “Do you know what you’re asking Influx to do?”
Developers try to fetch the 99th percentile over 30+ days of data
● We have a 30-second timeout for Grafana, etc.
● We often hit that limit when processing over 400 million Points
Problem: developers were used to 10-second aggregations in Graphite
● They only had count, mean, and 90th percentile at that window
Mitigation Strategies
We’ve tried a variety of solutions, w/ InfluxData, to provide that 10-second windowing:
● Telegraf plugins
● Continuous Queries: high CPU load
● Kapacitor: our best path forward yet
Future: processing line protocol further with Tremor
Challenge: speed_index vs. count_speed_index, etc.
● Users want magic: swap out a Retention Policy and see the same data
● Danger: percentile(90th_speed_index, 90) => “what does this mean?”
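Why `percentile(90th_speed_index, 90)` is dangerous: a p90 computed from per-window p90s weights every window equally, no matter how many requests it held. This sketch rolls hypothetical raw Points into 10-second windows with count, mean, and p90, then compares. It uses a nearest-rank percentile, and InfluxQL's exact rounding may differ. Storing a count alongside the mean (as with `count_speed_index`) lets means be recombined correctly; there is no equivalent fix for percentiles.

```python
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile for integer p in 1..100 (InfluxQL's rounding may differ)."""
    ordered = sorted(values)
    rank = -(-p * len(ordered) // 100)  # ceiling division without floats
    return ordered[max(rank, 1) - 1]


def downsample(points, window_s=10):
    """Roll raw (timestamp_s, value) points into per-window count / mean / p90."""
    windows = defaultdict(list)
    for ts, value in points:
        windows[ts - ts % window_s].append(value)
    return {
        start: {"count": len(vals), "mean": sum(vals) / len(vals), "p90": percentile(vals, 90)}
        for start, vals in sorted(windows.items())
    }


# Hypothetical data: a busy daytime window and a quiet overnight window.
busy = [(0, 100)] * 95 + [(1, 1000)] * 5   # 100 points in window [0, 10)
quiet = [(20, 2000)] * 10                  # 10 points in window [20, 30)
raw = busy + quiet

rollup = downsample(raw)
p90_of_p90s = percentile([w["p90"] for w in rollup.values()], 90)
true_p90 = percentile([v for _, v in raw], 90)
print(p90_of_p90s, true_p90)  # 2000 1000

# Means CAN be recombined, provided each window also stored its count.
total = sum(w["count"] for w in rollup.values())
recombined_mean = sum(w["count"] * w["mean"] for w in rollup.values()) / total
true_mean = sum(v for _, v in raw) / len(raw)
```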
Scaling Up
6 Data Centers & Growing
● On-premise and Cloud
● Pictured: our first 3 DCs
2 Telegrafs + Microphone (Kafka)
Resilient, whole-system view
Speaking to Strengths
Looking at Our Data
With downsampled Graphite, moving means / medians were less helpful
● InfluxDB gives us all the functions we could want
● GROUP BY time(:interval:) is super helpful for analysis
InfluxDB lets us follow the advice of John Rauser
● Grafana is excellent for our Wayfair Operations Center
● Full analysis of the Points themselves, however, is a treasure trove
● We can look at our raw data
625,441 Points vs. 30,466
Graphing Outliers
Keep a window of the most recent 128 raw (unsampled) Points
Calculate the mean and standard deviation of this raw data
For each next Point, check whether it is more than 2 standard deviations from the mean
● If it is an outlier, record that Point
● Otherwise, move on to the next Point
About 5% of the data (30,466 of 625,441 Points) for the same picture, w/ < 50 lines of Python
Means / medians / etc. are clearly different, but “reality” is sometimes overrated
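The procedure above fits in a few lines of Python. This is a reconstruction for illustration, not the author's script. Three details are assumptions the slides don't settle: it uses the population standard deviation (`statistics.pstdev`), it starts testing only once the 128-Point window is full, and every raw Point, outlier or not, enters the window.

```python
from collections import deque
from statistics import mean, pstdev


def find_outliers(values, window_size=128, threshold=2.0):
    """Return (index, value) pairs for Points more than `threshold` standard
    deviations from the mean of the previous `window_size` raw Points."""
    window = deque(maxlen=window_size)  # most recent raw Points, unsampled
    outliers = []
    for i, value in enumerate(values):
        if len(window) == window_size:  # assumption: wait until the window is full
            mu, sigma = mean(window), pstdev(window)
            if abs(value - mu) > threshold * sigma:
                outliers.append((i, value))  # record the outlier Point
        window.append(value)  # every raw Point enters the window
    return outliers


# Hypothetical timings: a steady pattern, one spike, then back to normal.
values = [100, 102, 98, 101, 99] * 40 + [500] + [100] * 10
print(find_outliers(values))  # [(200, 500)]
```

Recomputing the mean and deviation over the full window for every Point is simple but quadratic-ish; keeping running sums of values and squares would make each step constant time.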
Q&A: 5 Minutes
Wayfair Storefront Performance Monitoring with InfluxEnterprise, by Richard Laskey
Best Practices for Leveraging the Apache Arrow Ecosystem
 
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
 
Power Your Predictive Analytics with InfluxDB
Power Your Predictive Analytics with InfluxDBPower Your Predictive Analytics with InfluxDB
Power Your Predictive Analytics with InfluxDB
 
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
 
Build an Edge-to-Cloud Solution with the MING Stack
Build an Edge-to-Cloud Solution with the MING StackBuild an Edge-to-Cloud Solution with the MING Stack
Build an Edge-to-Cloud Solution with the MING Stack
 
Meet the Founders: An Open Discussion About Rewriting Using Rust
Meet the Founders: An Open Discussion About Rewriting Using RustMeet the Founders: An Open Discussion About Rewriting Using Rust
Meet the Founders: An Open Discussion About Rewriting Using Rust
 
Introducing InfluxDB Cloud Dedicated
Introducing InfluxDB Cloud DedicatedIntroducing InfluxDB Cloud Dedicated
Introducing InfluxDB Cloud Dedicated
 
Gain Better Observability with OpenTelemetry and InfluxDB
Gain Better Observability with OpenTelemetry and InfluxDB Gain Better Observability with OpenTelemetry and InfluxDB
Gain Better Observability with OpenTelemetry and InfluxDB
 
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
 
How Delft University's Engineering Students Make Their EV Formula-Style Race ...
How Delft University's Engineering Students Make Their EV Formula-Style Race ...How Delft University's Engineering Students Make Their EV Formula-Style Race ...
How Delft University's Engineering Students Make Their EV Formula-Style Race ...
 
Introducing InfluxDB’s New Time Series Database Storage Engine
Introducing InfluxDB’s New Time Series Database Storage EngineIntroducing InfluxDB’s New Time Series Database Storage Engine
Introducing InfluxDB’s New Time Series Database Storage Engine
 
Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena
 
Understanding InfluxDB’s New Storage Engine
Understanding InfluxDB’s New Storage EngineUnderstanding InfluxDB’s New Storage Engine
Understanding InfluxDB’s New Storage Engine
 
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDBStreamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
 
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
 
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
 
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
 
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
 
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
 

Recently uploaded

Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 

Recently uploaded (20)

Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 

Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard Laskey

  • 9. What’s That Noise? Check COUNT
With low RUM counts overnight, there’s even more noise
BUT, after looking at the raw data:
● Caches are cold
● Performance is actually worse
We track >> 300 unique pages
● Boss-Level Cardinality Challenge
TODO: sample differently based on volume
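Slide 9 leaves volume-based sampling as a TODO. The following is one possible approach, sketched under stated assumptions. It picks each page's sampling rate from its observed COUNT, which thins busy pages to a target volume and keeps quiet overnight traffic whole. `target_count` is a hypothetical knob, not a Wayfair setting.

```python
def sampling_rate(observed_count: int, target_count: int = 1000) -> float:
    """Return a rate in (0, 1] aiming for roughly target_count points per window.

    target_count is a hypothetical tuning value chosen for illustration.
    """
    if observed_count <= target_count:
        # Low-traffic pages (or overnight hours): keep every point.
        return 1.0
    return target_count / observed_count


def expected_points(observed_count: int, target_count: int = 1000) -> float:
    """Expected number of points written once sampling is applied."""
    return observed_count * sampling_rate(observed_count, target_count)
```

Because the rate is known when each point is written, a sampled COUNT can be scaled back up by dividing by that rate.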
  • 10. When to tag HOST
RUM cardinality was destroyed by a HOST tag w/ hundreds of unique values
● When adding a tag, ask: “Is GROUP BY useful here?”
● Thank you, InfluxData: SELECT .. INTO
● Dropping the HOST tag meant much faster SELECTs
Avoid proxy measurements:
● Tag what is actionable, where it matters
● RUM: a bad proxy for CPU / system load monitoring
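Series cardinality is the number of distinct tag-value combinations in a measurement, so one tag with hundreds of values multiplies it. The sketch below counts distinct series over a toy set of RUM points. The routes, platforms, and 300 host names are made up for illustration.

```python
from itertools import product


def series_cardinality(points, tag_keys):
    """Count distinct series: unique combinations of the given tag values."""
    return len({tuple(p[k] for k in tag_keys) for p in points})


# Hypothetical RUM points: 3 routes x 2 platforms, served from 300 hosts.
routes = ["index", "pdp", "browse"]
platforms = ["desktop", "mobile"]
hosts = [f"host{i:03d}" for i in range(300)]
points = [
    {"route": r, "platform": p, "host": h}
    for r, p, h in product(routes, platforms, hosts)
]

without_host = series_cardinality(points, ["route", "platform"])
with_host = series_cardinality(points, ["route", "platform", "host"])
print(without_host, with_host)  # 6 1800
```

If nobody ever groups RUM data by host, those extra 1,794 series are pure cost.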
  • 11. Addendum: Observer Effect
https://en.wikipedia.org/wiki/Observer_effect_(information_technology)
UDP is great, but it’s not a solution to all problems
● We ran a test where we hit our PHP max_children limit on some hosts
● The server needed more time to send all the data out
● Failure w/ > 800 Points / request + DNS delay
● register_shutdown_function => reduced visibility
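UDP sends are cheap individually, but sending 800+ points per request, each one paying for a DNS lookup, still adds up. The Python sketch below addresses the two costs the slide names: it resolves DNS once per flush and packs line-protocol lines into a few datagrams. The 1400-byte payload limit assumes a typical 1500-byte MTU. This is not the Wayfair PHP client. It also does not make sending free: it only reduces per-point overhead.

```python
import socket


def batch_lines(lines, max_bytes=1400):
    """Group line-protocol lines into newline-joined payloads of at most max_bytes.

    A single line longer than max_bytes is sent on its own.
    """
    batches, current, size = [], [], 0
    for line in lines:
        encoded = line.encode("utf-8")
        extra = len(encoded) + (1 if current else 0)  # +1 for the "\n" separator
        if current and size + extra > max_bytes:
            batches.append(b"\n".join(current))
            current, size = [], 0
            extra = len(encoded)
        current.append(encoded)
        size += extra
    if current:
        batches.append(b"\n".join(current))
    return batches


def send_udp(lines, host, port, max_bytes=1400):
    """Resolve DNS once, then send each batch as a single datagram."""
    addr = socket.gethostbyname(host)  # one lookup per flush, not one per point
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for payload in batch_lines(lines, max_bytes):
            sock.sendto(payload, (addr, port))
```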
  • 12. Lions and Tigers and Interns
  • 13. Storefront InfluxDB Tiger Team
We have many thousands of PHP files w/ StatsD instrumentation
Shutting off Graphite meant we needed to teach schema design
● Hundreds of developers across many groups, each with their own style
● “I still don’t understand tags and fields”
● “Can’t we just change the measurement name?”
Tiger Team: a small group of engineers turns the tide
Dedicated Slack channel + many small projects
  • 14. Interns Let Loose
Undergraduate students from Northeastern joined the Tiger Team Party Bus
● Python script which rewrote parts of our PHP code
● Consulting with other groups
● Driving large swaths of conversion
Advanced DAO Instrumentation
● Different sampling rates for memcache
● Record long-running SQL queries
Don’t underestimate what a few people can do
  • 15. Tiger Challenge: StatsD vs. Schema
StatsD:
● rum.client_timers.desktop.wayfair_com.index.speed_index.bo1.timer.mean
● rum.client_timers.desktop.wayfair_com.index.page_load.bo1.timer.mean
InfluxDB:
● Measurement: rum
● Tags: platform=desktop; dc=bo1; store=wayfair_com; route=index
● Fields: speed_index=800; total_page_load_time=2300
It’s a feature, not a bug; BUT features require thinking
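The slide maps dotted StatsD names onto a measurement with tags and fields. The Python sketch below performs that conversion for names shaped exactly like the slide's example. It assumes the fixed segment positions shown there and applies the slide's `page_load` → `total_page_load_time` rename. It also skips line-protocol escaping, because these tag values contain no commas, spaces, or equals signs. The timestamp is arbitrary.

```python
from collections import defaultdict

# Segment positions in names like the slide's example:
# rum.client_timers.<platform>.<store>.<route>.<metric>.<dc>.timer.mean
TAG_POSITIONS = {"platform": 2, "store": 3, "route": 4, "dc": 6}
METRIC_POSITION = 5
FIELD_RENAMES = {"page_load": "total_page_load_time"}  # rename taken from the slide


def statsd_to_line_protocol(metrics, timestamp_ns):
    """Fold dotted StatsD names into InfluxDB line protocol:
    one point per tag set, with each metric becoming a field on that point."""
    points = defaultdict(dict)
    for name, value in metrics.items():
        parts = name.split(".")
        measurement = parts[0]
        tags = tuple(sorted((k, parts[i]) for k, i in TAG_POSITIONS.items()))
        metric = parts[METRIC_POSITION]
        points[(measurement, tags)][FIELD_RENAMES.get(metric, metric)] = float(value)

    lines = []
    for (measurement, tags), fields in points.items():
        tag_str = ",".join(f"{k}={v}" for k, v in tags)
        field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
        lines.append(f"{measurement},{tag_str} {field_str} {timestamp_ns}")
    return lines


lines = statsd_to_line_protocol(
    {
        "rum.client_timers.desktop.wayfair_com.index.speed_index.bo1.timer.mean": 800,
        "rum.client_timers.desktop.wayfair_com.index.page_load.bo1.timer.mean": 2300,
    },
    timestamp_ns=1_700_000_000_000_000_000,
)
print(lines[0])
# rum,dc=bo1,platform=desktop,route=index,store=wayfair_com speed_index=800.0,total_page_load_time=2300.0 1700000000000000000
```

Two StatsD series collapse into one point. That is the "feature" on the slide: it only works if everyone agrees on which segments are tags and which are fields.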
  • 17. Crossing Streams
I’m the primary owner for our PHP InfluxDB Client
I’m the one you called
● Inherited from a developer who moved on to another group
● PHP is our most common language at Wayfair, though there are others
Many problems ensued from mixing StatsD and InfluxDB paradigms
● Uniqueness for accumulators works, IF you only have strings
● Add a Schema, with Tags and Fields, and There Will Be Bugs
  • 18. Back to the Drawing Board
Rebuilding from scratch is super expensive
● I had to do it anyway. The API was wrong. Expectations always failed
● Key advice: build clients w/ the right mental model
End result: one client that serves all of our PHP codebase
● Works for Storefront and beyond
Standard Software Best Practices:
● Composition over Inheritance. Fluent Interfaces. Separate Responsibilities
  • 19. Accumulators != Points
An Accumulator (Counter / Stopwatch) is distinct from a single Point
● Counter::findOrCreate(array $uniqueness, Influx_Point $initialPoint);
● Goal: one Point with value 2, instead of two Points each w/ value 1
● Importantly, $uniqueness is separated from the value itself
Use case:
● Two versions of a system, one which uses SQLite, another which doesn’t
● SQLite slower per iteration, but has no 100ms startup cost
● Which version is faster overall? Track per entire request
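The idea behind `Counter::findOrCreate` can be sketched in Python: a registry keyed only by the uniqueness tags, so repeated increments within a request land on a single Point. The names `Point`, `Counter`, `find_or_create`, and `flush`, and the `count` field, are hypothetical stand-ins for illustration. They are not the Wayfair PHP client's API.

```python
from dataclasses import dataclass, field


@dataclass
class Point:
    """Minimal stand-in for an InfluxDB point (hypothetical)."""
    measurement: str
    tags: dict
    fields: dict = field(default_factory=dict)


class Counter:
    """Accumulator: many increments within one request become one Point."""
    _registry = {}  # request-scoped in this sketch; cleared by flush()

    def __init__(self, point):
        self.point = point
        self.point.fields.setdefault("count", 0)

    @classmethod
    def find_or_create(cls, uniqueness, initial_point):
        # The key comes only from `uniqueness`, never from the value being
        # accumulated, so calls with the same key share one Counter.
        key = tuple(sorted(uniqueness.items()))
        if key not in cls._registry:
            cls._registry[key] = cls(initial_point)
        return cls._registry[key]

    def increment(self, by=1):
        self.point.fields["count"] += by
        return self

    @classmethod
    def flush(cls):
        """Return all accumulated Points (e.g. at end of request) and reset."""
        points = [c.point for c in cls._registry.values()]
        cls._registry.clear()
        return points


# Two increments with the same uniqueness key produce one Point with count 2.
for _ in range(2):
    Counter.find_or_create(
        {"measurement": "sqlite_calls", "variant": "with_sqlite"},
        Point("sqlite_calls", {"variant": "with_sqlite"}),
    ).increment()

points = Counter.flush()
```

A Stopwatch would follow the same pattern, summing elapsed time instead of a count. Summing per request is what makes "which version is faster overall?" answerable.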
  • 20. Beware the `value` Field Key
Field type conflicts: lost data, tons of noise, confusion for developers
Frequent developer retaliation: one measurement per field
● Same cardinality, but much harder to organize
● TooSpecificMeasurementStopwatch: Float
● TooSpecificMeasurementCounter: Integer
● Chronograf gets slow
● Dashboards are hard to create
● InfluxQL can’t do math across measurements; fixed by re-combining them
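InfluxDB rejects a write whose field type differs from the type already stored for that field key in the shard. As a result, a shared `value` key that sometimes carries a float (stopwatch) and sometimes an integer (counter) loses data. One possible client-side guard is sketched below. It records the first type seen for each (measurement, field) pair and flags later disagreements before anything is sent. The class is a hypothetical helper, not part of any InfluxDB library.

```python
class FieldTypeRegistry:
    """Remember the first Python type seen for each (measurement, field) pair
    and report later writes that disagree, before they reach InfluxDB."""

    def __init__(self):
        self._types = {}

    def check(self, measurement, fields):
        conflicts = []
        for key, value in fields.items():
            kind = type(value)
            seen = self._types.setdefault((measurement, key), kind)
            if seen is not kind:
                conflicts.append((key, seen.__name__, kind.__name__))
        return conflicts


registry = FieldTypeRegistry()
first = registry.check("timing", {"value": 1.5})  # stopwatch writes a float
second = registry.check("timing", {"value": 3})   # counter writes an int
print(second)  # [('value', 'float', 'int')]

# Descriptive field keys in one measurement avoid both the conflict and
# the one-measurement-per-field sprawl.
third = registry.check("timing", {"elapsed_ms": 1.5, "count": 3})
```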
  • 21. Language Limitations
PHP is still one of my favorite languages, but its simplicity can be problematic
● Nothing is shared from one request to the next
● We’ve thought about using SQLite / cron, but it’s complicated
With some C# systems at Wayfair, measurements go to a separate thread
● Easier to aggregate Points across individual requests
● Helps with the Observer Effect
● Complexity also implies a management cost
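The separate-thread pattern described for the C# systems is shown here as a Python sketch for a long-lived process, which is exactly what per-request PHP lacks. Request threads only enqueue values, and one worker thread sums them, so many requests collapse into one total per key. This is an illustration of the idea, not Wayfair's C# code. The class and method names are hypothetical.

```python
import queue
import threading
from collections import defaultdict


class BackgroundAggregator:
    """Request threads call record(); a single worker thread sums the values,
    so many requests collapse into one total per key at flush time."""

    def __init__(self):
        self._queue = queue.Queue()
        self._totals = defaultdict(int)
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, key, amount=1):
        # Only an in-memory enqueue on the request path; no network I/O here.
        self._queue.put((key, amount))

    def _run(self):
        while True:
            item = self._queue.get()
            try:
                if item is None:  # shutdown sentinel
                    return
                key, amount = item
                with self._lock:
                    self._totals[key] += amount
            finally:
                self._queue.task_done()

    def flush(self):
        """Wait until every queued item is summed, then return and reset totals."""
        self._queue.join()
        with self._lock:
            totals, self._totals = dict(self._totals), defaultdict(int)
        return totals

    def stop(self):
        self._queue.put(None)
        self._worker.join()


# Usage: four "request" threads each record 100 page views.
agg = BackgroundAggregator()
workers = [
    threading.Thread(target=lambda: [agg.record("page_view") for _ in range(100)])
    for _ in range(4)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
totals = agg.flush()
agg.stop()
print(totals)  # {'page_view': 400}
```

The request path never waits on the network, which is the Observer Effect benefit. The extra thread, queue, and shutdown handling are the management cost.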
  • 23. Feature Toggles FTW
At Wayfair, we deploy code many times every day
We built a robust system for toggling branches of our code off and on
● Uses percentages and many other fine-grained filters
● A single adjustment propagates instantly across all systems
● Not tied to any deployment process, so they’re always unblocked
Designed to safely test new functionality in Production
● Works exceedingly well at scaling down measurements
● “Do we have enough data at 7%?” => $influxPoint->setSamplingRate(0.07);
  • 24. Feature Toggle: Example
Helps w/ volume, not cardinality. Percentage => Boolean
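"Percentage => Boolean" can be sketched as deterministic bucketing. The sketch hashes the toggle name and a subject ID into one of 10,000 buckets and enables the toggle for the lowest N percent. Scaling sampled counts back up by the rate mirrors what `setSamplingRate(0.07)` implies. The function names and request IDs are hypothetical. Wayfair's toggle system is not shown in the deck.

```python
import hashlib


def toggle_enabled(toggle_name: str, subject_id: str, percentage: float) -> bool:
    """Percentage => Boolean: hash (toggle, subject) into one of 10,000 buckets
    and enable the toggle for the lowest `percentage` percent of buckets."""
    digest = hashlib.sha256(f"{toggle_name}:{subject_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percentage * 100


def estimated_total(sampled_count: int, sampling_rate: float) -> float:
    """Scale a count taken at `sampling_rate` back up to the full population."""
    return sampled_count / sampling_rate


# Usage: record RUM points for roughly 7% of (hypothetical) request IDs.
request_ids = [f"req-{n}" for n in range(20_000)]
sampled = [r for r in request_ids if toggle_enabled("rum_sampling", r, 7)]
fraction = len(sampled) / len(request_ids)
estimate = estimated_total(len(sampled), 0.07)
```

Hashing instead of calling a random generator keeps a given subject consistently in or out of the sample. As the slide notes, this cuts volume but not cardinality: sampled points still carry every tag combination.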
  • 25. Dev, meet Ops
We give great opportunities for junior developers
Our Timeseries team has built up resilience
● PHP => Telegraf => Kafka => Telegraf => InfluxDB
● Limits threats to any one piece
● MirrorMaker allows for multi-DC cohesion
Tremor, built by Wayfair, allows us to shape any traffic
● We can blacklist / rate limit a given measurement by inspecting line protocol
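Tremor has its own scripting language, and the sketch below does not use it. It shows the same idea in plain Python. In line protocol, the measurement name is the text before the first unescaped comma or space. That name can drive a blacklist and a per-measurement token bucket. The `rate`, `burst`, and measurement names are hypothetical.

```python
import time


def measurement_of(line):
    """Return the (still escaped) measurement name: the text before the
    first unescaped ',' or ' ' in a line-protocol line."""
    i = 0
    while i < len(line):
        if line[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if line[i] in ", ":
            break
        i += 1
    return line[:i]


class MeasurementGate:
    """Drop blacklisted measurements; rate-limit the rest per measurement
    with a token bucket (`rate` tokens/second, at most `burst` tokens)."""

    def __init__(self, blacklist, rate, burst, clock=time.monotonic):
        self.blacklist = set(blacklist)
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self._buckets = {}  # measurement -> (tokens, last_refill_time)

    def allow(self, line):
        name = measurement_of(line)
        if name in self.blacklist:
            return False
        now = self.clock()
        tokens, last = self._buckets.get(name, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1:
            self._buckets[name] = (tokens, now)
            return False
        self._buckets[name] = (tokens - 1, now)
        return True


# Usage with a fake clock so the behaviour is easy to follow.
now = [0.0]
gate = MeasurementGate({"noisy_debug"}, rate=1.0, burst=2, clock=lambda: now[0])
results = [
    gate.allow("noisy_debug,host=a value=1"),       # blacklisted
    gate.allow("rum,route=index speed_index=800"),  # burst token 1
    gate.allow("rum,route=index speed_index=810"),  # burst token 2
    gate.allow("rum,route=index speed_index=820"),  # bucket empty
]
now[0] = 1.0  # one second later, one token has refilled
results.append(gate.allow("rum,route=index speed_index=830"))
print(results)  # [False, True, True, False, True]
```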
  • 27. Yo Dawg
We heard you like Influx, so we put Influx in your Influx so you can .. measure how your clusters are doing when the target system is under attack
2018 Clusters Layout:
● C1: General: for most measurements
● C2: Storefront: specific raw data at high volume
● C3: monitors C1, C2, Kafka, Puppet, Celery, ++
● Data Centers in Boston, Seattle, and Beyond
  • 28. “Influx is Slow!”
It makes me want to cry every time I hear this
● My response: “do you know what you’re asking Influx to do?”
Developers try to fetch the 99th percentile on 30+ days of data
● We have a 30 second timeout for Grafana, etc.
● We often hit that limit when processing over 400 million Points
Problem: developers were used to 10 second aggregations in Graphite
● They only had count, mean, and 90th percentile at that window
  • 29. Mitigation Strategies
We’ve tried a variety of solutions, w/ InfluxData, to provide that 10 second windowing:
● Telegraf plugins
● Continuous Queries: CPU load
● Kapacitor: our best path forward yet
Future: processing line protocol further with Tremor
Challenge: speed_index vs. count_speed_index, etc.
● Users want magic: swap out a Retention Policy and see the same data
● Danger: percentile(90th_speed_index, 90) => “what does this mean?”
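The "Danger" bullet can be made concrete. Precomputing count, mean, and 90th percentile per 10-second window (the job a Continuous Query or Kapacitor task would do) is fine for dashboards. Taking a percentile of those stored 90th percentiles, however, answers a different question than the 90th percentile of the raw data. The sketch below uses nearest-rank percentiles and made-up data: one busy window and one window with a single slow point, much like low overnight COUNTs.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def downsample(points, window_s=10):
    """Roll raw (timestamp_s, value) points into per-window count / mean / p90."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window_s].append(value)
    return {
        start: {"count": len(v), "mean": sum(v) / len(v), "p90": percentile(v, 90)}
        for start, v in sorted(buckets.items())
    }


# Hypothetical data: 100 points (values 1..100) in window 0, one slow point in window 10.
raw = [(i % 10, float(v)) for i, v in enumerate(range(1, 101))] + [(15, 1000.0)]
rollup = downsample(raw)

true_p90 = percentile([v for _, v in raw], 90)
p90_of_p90s = percentile([w["p90"] for w in rollup.values()], 90)
print(true_p90, p90_of_p90s)  # 91.0 1000.0

# Means, unlike percentiles, can be recombined correctly by weighting with COUNT.
weighted_mean = sum(w["count"] * w["mean"] for w in rollup.values()) / sum(
    w["count"] for w in rollup.values()
)
```

A single slow point in a sparse window dominates the percentile-of-percentiles. Storing `count` alongside `mean` allows an exact recombined mean. Percentiles have no such repair from stored p90 values alone.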
  • 30. Scaling Up
6 Data Centers & Growing
● On-premise and Cloud
● Pictured: our first 3 DCs
2 Telegrafs + Microphone (Kafka)
Resilient, whole-system view
  • 32. Looking at Our Data
With downsampled Graphite, moving means / medians were less helpful
● InfluxDB gives us all the functions we could want
● GROUP BY time(:interval:) is super helpful w/ analysis
InfluxDB lets us follow the advice of John Rauser
● Grafana is excellent for our Wayfair Operations Center
● Full analysis of the points themselves, however, is a treasure trove
● We can look at our raw data
  • 33. 625,441 Points vs. 30,466
  • 34. Graphing Outliers
Keep a window of the most recent 128 Points, unsampled
Calculate the Mean and Standard Deviation of this raw data
For each next Point, check whether it is more than 2 Standard Deviations from the Mean
● If we have an outlier, record that Point
● Else move on to the next Point
5% of the data for the same picture, w/ < 50 lines of Python
Means / Medians / etc. are clearly different, but “reality” is sometimes overrated
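The steps on this slide fit in a short Python generator. Slide 33's 625,441 vs. 30,466 points is the roughly 4.9% this kind of filter kept. The sketch rests on three assumptions the slide does not spell out: it uses the population standard deviation, it skips checks until the window holds 128 points, and every point (outlier or not) enters the window afterwards. This is an illustration, not the author's script.

```python
import statistics
from collections import deque


def outliers(values, window=128, threshold=2.0):
    """Yield (index, value) for points more than `threshold` standard deviations
    from the mean of the previous `window` raw points."""
    recent = deque(maxlen=window)
    for i, value in enumerate(values):
        if len(recent) == window:
            mean = statistics.fmean(recent)
            stdev = statistics.pstdev(recent)
            if abs(value - mean) > threshold * stdev:
                yield i, value
        recent.append(value)  # the window always holds the latest raw points


# Hypothetical series: steady 10/12 alternation with one spike at index 200.
series = [10.0, 12.0] * 100 + [100.0] + [10.0, 12.0] * 10
print(list(outliers(series)))  # [(200, 100.0)]
```

Recomputing the mean and standard deviation over the full window is O(window) per point. Running sums of values and squares would make it O(1) if throughput mattered.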