Russell Turner and Seth Porta are site reliability engineers at Domino's Pizza who are responsible for ensuring the best online customer experience. They initially tested Splunk in 2009 and found it provided faster searches, real-time insights, better reporting, and faster alerting compared to their previous log aggregation tool. Domino's now uses Splunk across two data centers to monitor various systems and applications. Splunk has helped reduce troubleshooting time from hours to minutes, identify sales trends to inform marketing decisions, and save $300,000 versus alternate tools. Domino's plans to expand their use of Splunk for more operational analysis and apps.
2. Domino’s Pizza Overview
World leader in pizza delivery
More than 10,000 corporate and franchised stores in US and international markets
Online ordering, apps for Android, Kindle and iPhone
Founded in 1960
2012 sales: $7.4 billion
3. Our Background and Role
Russell Turner, Manager of Site Reliability Engineering
Seth Porta, Site Reliability Engineer
Our team is responsible for ensuring our online customers have the best experience possible
– Maintain ecommerce uptime
– Middleware, infrastructure, servers, global and local load balancing
– Architecture and deployment of new business initiatives
– Closely tied into development workflow
4. How We Started?
POC’ed Splunk for the first time in 2009 (within the Infrastructure team)
Needed a solution to analyze and aggregate logging data from our OS (Linux and Solaris) and middleware in a timely manner
InfoSec team used HP ArcSight for log aggregation, but Splunk offered the following advantages:
– Faster and easier searches in Splunk
– Real-time insights
– Better reporting with Apache access logs
– Much faster alerting in Splunk
– Cost and scalability
– Ease of deployment
5. Splunk Usage at Domino’s
Range of uses from monitoring to business insights including:
– What we are selling, orders per minute, coupon usage, etc.
– Online ordering trends, efficiency of marketing promotions
– Splunk gives us answers 24-48 hours before the same analysis is available from our data warehousing tools
– Significant reduction in troubleshooting time
– Streamlined developer insight into debugging development code
– Overall Order to Cash system health monitoring
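To make the "orders per minute" metric above concrete, here is a minimal sketch of the underlying calculation. The log format, the field names (`event`, `channel`, `coupon`) and the sample lines are hypothetical and only for illustration; in Splunk itself this would be a search over indexed events rather than hand-written parsing.

```python
from collections import Counter
from datetime import datetime

# Hypothetical order-log lines: an ISO timestamp followed by key=value fields.
SAMPLE_LOG = """\
2013-05-01T18:00:05 event=order_placed channel=web coupon=5050
2013-05-01T18:00:41 event=order_placed channel=iphone coupon=none
2013-05-01T18:01:12 event=order_placed channel=web coupon=5050
2013-05-01T18:01:30 event=page_view channel=web
"""


def orders_per_minute(log_text):
    """Count order_placed events in each one-minute bucket."""
    counts = Counter()
    for line in log_text.splitlines():
        timestamp, _, rest = line.partition(" ")
        # Turn "key=value key=value" into a dict of fields.
        fields = dict(pair.split("=", 1) for pair in rest.split())
        if fields.get("event") == "order_placed":
            minute = datetime.fromisoformat(timestamp).replace(second=0)
            counts[minute.strftime("%H:%M")] += 1
    return dict(counts)


print(orders_per_minute(SAMPLE_LOG))  # {'18:00': 2, '18:01': 1}
```

The same bucketing idea extends to coupon usage by counting on the `coupon` field instead of the event type.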
6. Splunk at Domino’s Today
Splunk deployed across two data centers (live and failover): MI Datacenter and KY Datacenter
Two indexers with Distributed Search
Four different production environments
Teams using Splunk: Site Reliability team, InfoSec and web developers
360+ forwarders
25-40GB data indexed per day
Dozen unique users per month
Splunk Apps: Deployment Monitor, Google Maps
7. Results with Splunk
Reduced MTTR
– Issue resolution from 2-3 hours to less than 5 minutes.
Cost Savings
– Saved $300,000 vs. alternate APM tools
– Engineering resources freed up for other needs.
Proactive Alerting and Baselining
– Real-time alerting = proactivity
– Historical baselining has been huge
Operational Intelligence
– Tracking: business relevant information, trends, promotion success, customer behavior
9. Using Splunk For Data Correlation?
Domino’s Splunk Environment:
– Proprietary application logs (over 20 types)
– Logs from 800 virtual and physical servers, Linux/Solaris
– Middleware, database and system logs
– Apache web server logs
Before Splunk
– Gathering logs manually
– Sifting through aggregated Java messages from middleware (Grep)
– Reactive
Enter Splunk
– “Million times easier with Splunk”
– Proactive alarms alert us to dips in our sales
– Baselining and trending
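The combination of baselining and proactive alarms on sales dips can be sketched as a simple threshold check. This is an illustration only, not the alert logic described in the slides: the function name `is_sales_dip`, the three-sigma threshold and the sample counts are all assumptions.

```python
from statistics import mean, stdev


def is_sales_dip(history, current, sigmas=3.0):
    """Return True if `current` is more than `sigmas` standard
    deviations below the mean of the historical values."""
    baseline = mean(history)
    spread = stdev(history)
    return current < baseline - sigmas * spread


# Hypothetical orders-per-minute counts for the same minute on earlier days.
history = [118, 124, 121, 130, 115, 127, 122]

print(is_sales_dip(history, 120))  # False: within the normal range
print(is_sales_dip(history, 60))   # True: far below the baseline
```

Comparing against the same time slot on earlier days, rather than a fixed number, is what lets an alert tolerate the normal daily rhythm of orders.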
10. AHA! Moments with Splunk
Russell:
IT team started taking Splunk home and working on their own time with Splunk.
“Splunk is much bigger than monitoring tool. We are sitting on a gold mine of data!”
Seth:
“When asked to show response times of data stores, I was able to provide answers within 30s (just pipe one search into another) and got a list of stores instantly. In the past we had to work on that for weeks.”
11. Splunk For Operational Analysis of Payment Processing
Measuring response time for various order channels
Instant analysis of cash vs. credit card ordering performance
Troubleshooting card processor issues
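As a rough illustration of the analysis on this slide, the sketch below groups response times by order channel and by payment type. The record layout, the channel names and the millisecond values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (channel, payment_type, response_ms) records.
EVENTS = [
    ("web", "credit", 420),
    ("web", "cash", 180),
    ("iphone", "credit", 510),
    ("android", "credit", 470),
    ("web", "credit", 380),
    ("iphone", "cash", 200),
]


def mean_response_by(events, key_index):
    """Average response_ms grouped by the tuple field at key_index
    (0 = channel, 1 = payment type)."""
    groups = defaultdict(list)
    for event in events:
        groups[event[key_index]].append(event[2])
    return {key: mean(values) for key, values in groups.items()}


print(mean_response_by(EVENTS, 1))  # {'credit': 445, 'cash': 190}
print(mean_response_by(EVENTS, 0))  # per-channel averages
```

A persistent gap between credit and cash response times, as in this made-up data, is the kind of signal that would point toward the card processor.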
12. Splunk for GEO Sales Tracking
• Splunk RESTful APIs integrate with Domino’s GEO sales tracking applications (Java based)
• Sales monitoring by regions
• We have been able to identify ISP outages in certain regions
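For readers unfamiliar with the REST integration mentioned above, the sketch below builds (but does not send) a request to Splunk's search export endpoint, `/services/search/jobs/export`, on the management port. It is written in Python for brevity, whereas the applications on this slide are Java based. The host name, the index name and the `event` and `region` fields are hypothetical, and authentication is omitted.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_export_request(base_url, spl, earliest="-15m"):
    """Build (without sending) a POST to Splunk's search export endpoint."""
    body = urlencode({
        # Searches submitted over REST start with the "search" command.
        "search": "search " + spl,
        "earliest_time": earliest,
        "output_mode": "json",
    }).encode("utf-8")
    return Request(
        base_url + "/services/search/jobs/export",
        data=body,
        method="POST",
    )


# Hypothetical host, index and field names.
request = build_export_request(
    "https://splunk.example.com:8089",
    "index=orders event=order_placed | stats count by region",
)
print(request.full_url)
print(request.get_method())  # POST
```

A real client would add credentials and pass the request to `urllib.request.urlopen`, then read the streamed JSON results line by line.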
13. Splunk for Domino’s Marketing
Before Splunk
• Someone at midnight pulling data and crunching numbers daily
Splunk dashboard to track 50% off online coupon promotion
Results
• Automated information
• Report submitted to our leadership team, including the CIO and CEO
• Monitoring promotion success in real-time
14. Best Practice Recommendations
Build a full blown POC to demonstrate Splunk’s value
Find real use cases to demonstrate Splunk’s effectiveness
Plan your Splunk deployment (distributed environment); understand where config files live
Splunk documentation is helpful – use it!
Leverage the huge online community
Take scoping seriously
15. Splunk at Domino’s: Future
Create real-time dashboards for any department to view OLO health, not just reports mailed to the LT
Use Splunk for more key performance analyses
Expand Splunk Apps deployment: Linux and Unix monitoring, VMware App, F5 integration
Optimize middleware application logs for Splunk consumption
Start to leverage Splunk to monitor Corporate applications built on our stack (Liferay) and Store health
16. Summary
Splunk empowers us to better utilize our technology to gain a competitive edge
Helps to ensure exceptional customer satisfaction
Enables us to be agile and make marketing decisions based on current promotion success
Splunk not only helps us save cost but boosts morale as well