SplunkLive! Atlanta Mar 2013 - The Home Depot



  1. The Home Depot. #SplunkLive #Splunk. Copyright © 2012 Splunk, Inc.
  2. The Home Depot: The Home Depot® is the world's largest home improvement specialty retailer, with more than 2,200 retail stores in the United States, Canada, and Mexico.
  3. About the Speaker
     • IT Architect
     • IT Operations and Enterprise Monitoring
       – Ensure operational stability of the IT environment
       – Incident and Change Management
       – Ensure applications are “Production Ready”
  4. How often do you hear… “Hey, what is this Splunk thing?”
  5. Splunk lets you collect, search, and view your data.
  6. Before Splunk…
     • Manual Collection: No enterprise pipeline for machine data. Difficult to collect data; required ad-hoc scripts.
     • No Enterprise IT Search: No data store for operational metrics. Difficult to correlate data sources.
     • Limited Visualization: Analysis with command-line tools or reports. Difficult to spot trends or anomalies.
     Limited Operational Visibility
  7. Where we are today: 120K searches/day, 500 GB index/day
  8. Where we are today
     • 2.1K sites
     • 120K searches/day
     • 25K forwarders
     • 500 GB index/day
     • 200+ sourcetypes
  9. How do we use Splunk?
     • Machine data pipeline
     • Operational dashboards and reports
     • Proactive alerts and notifications
     • IT search engine
  10. After Splunk…
     • Reduced MTTR: Much faster to find and resolve impacting issues. Reduced Sev 1 & 2 outages by 43% year over year.
     • Improved Reporting: CIO and executive reports for high-level status. Operational reports for support teams.
     • Application Monitoring: Standard logging APIs simplify log file collection. Collect app stats from shared infrastructure.
     Splunk used by 200+ users in the IT organization
  11. Reduced Incident Resolution Times
     • Collect standard metrics for 1200+ applications
     • War rooms for critical outages
     • Splunk dashboards
       – Quickly identify patterns or outliers
       – Hours → Minutes
  12. Improved Reporting
     • CIO Dashboard
       – High-level status by organization
       – Highlights noisiest, most error-prone applications
       – Increased visibility reduces errors and outages
     • IT Ops Dashboards
       – Monitoring for 600+ applications
       – Near real-time status of batch job success/failure across the chain
       – Track metrics and identify patterns across all stores (not just a subset)
     • Proactive Reports & Notifications
       – Hourly and daily operational reports to monitor and maintain system health
       – Integrated with event management and ticketing systems
  13. Application Monitoring
     • Monitor applications through production load balancers
     • Simple logging APIs
       – Java, Python, Syslog
       – Timestamp, label, and key-value pairs
     • Transaction visibility
       – End-to-end application tracing across multiple hops
  14. Splunk + Microsoft SharePoint
     • Splunk is integrated with SharePoint for custom lookups and a persistent data store.
     • Allows users to manage their own lookup lists for thresholds and metadata.
  15. Splunk + Application Load Balancing
     • Automated site failover for an internal application.
     • A scripted input monitors application server health using the load balancer API and triggers a site failover after the number of server failures exceeds a threshold.
  16. Best Practices & Recommendations
     • Search Heads
       – Metrics!
       – Provide user training on writing efficient searches.
       – Use multiple pools for performance.
     • Indexers
       – No such thing as too much CPU or too many IOPS.
       – Spec lots of RAM for high search volumes.
     • Deployment
       – Automate the deployment to forwarders and core servers.
       – Make collection simple for users.
     Simple design → Scalable Splunk → Happy Users
  17. Thank You!
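
Slide 13 describes simple logging APIs that emit a timestamp, a label, and key-value pairs. The deck does not show the API itself, so the following is only a minimal Python sketch of what such a helper might look like. The function name `format_event`, the `label=` field name, and the quoting rule are assumptions made for illustration, not The Home Depot's actual library.

```python
from datetime import datetime, timezone


def format_event(label, fields, now=None):
    """Build one log line: ISO-8601 timestamp, a label, then key=value pairs.

    Hypothetical helper; the layout is an assumption based on the
    "timestamp, label, and key-value pairs" description on the slide.
    """
    now = now or datetime.now(timezone.utc)
    parts = [now.isoformat(), f"label={label}"]
    for key, value in fields.items():
        text = str(value)
        # Quote empty values and values containing whitespace so that
        # each key=value pair stays a single token on the line.
        if text == "" or any(ch.isspace() for ch in text):
            text = '"' + text.replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


if __name__ == "__main__":
    # Example event with made-up field names.
    print(format_event("checkout", {"store": 121, "status": "ok", "elapsed_ms": 42}))
```

Keeping every event on one line with explicit key=value pairs is a common choice because such fields can usually be extracted at search time without custom parsing rules.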
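
Slide 15 describes a scripted input that checks application server health through the load balancer API and triggers a site failover once the number of failed servers exceeds a threshold. The decision logic might be sketched as below. The deck names neither the load balancer nor its API, so `fetch_health` and `trigger_failover` are hypothetical callables standing in for those calls, and the printed status line format is likewise an assumption.

```python
def count_failed(server_health):
    """Count servers reported as unhealthy in a {name: is_healthy} mapping."""
    return sum(1 for healthy in server_health.values() if not healthy)


def check_site(fetch_health, trigger_failover, max_failures):
    """Run one health check; fail over if failures exceed max_failures.

    fetch_health and trigger_failover are hypothetical stand-ins for
    load balancer API calls, which the slide does not specify.
    Returns True if a failover was triggered.
    """
    health = fetch_health()
    failed = count_failed(health)
    # A Splunk scripted input indexes whatever the script writes to stdout,
    # so emit one key=value status line per run.
    print(f"failed_servers={failed} threshold={max_failures}")
    if failed > max_failures:
        trigger_failover()
        return True
    return False


if __name__ == "__main__":
    # Made-up health data: two of three servers are down, threshold is 1.
    sample = {"app01": False, "app02": False, "app03": True}
    check_site(lambda: sample, lambda: print("action=site_failover"), max_failures=1)
```

Passing the API calls in as functions keeps the threshold logic testable without a live load balancer.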