Putting Data to Work by Splunking All the Things at Target - Gartner AADI 2012

A presentation titled "Putting Data to Work by Splunking All the Things at Target" that Dan Cundiff from Target Corporation and Leena Joshi from Splunk gave at Gartner AADI 2012.

Published in: Technology
  • Founded 2004, first software release in 2006. HQ: San Francisco; regional HQs: London, Hong Kong. Over 600 employees, based in 10 countries. Q2 revenue: $44.5 million, +71% year-over-year. Free download to massive scale. On-premise, in the cloud and SaaS. 4,400+ customers in over 80 countries. 54 of the Fortune 100.
  • So where did we come up with this name? It’s from the term Spelunking – to explore underground caves. Splunking is to explore large amounts of machine data.
  • Machine data is an incredibly valuable resource, but organizations rarely get the value they need from it. Splunk helps these organizations solve a very difficult problem: collecting, storing and analyzing this data to provide strategic insights for IT and the business. Our mission is simple: to take machine data and make it accessible, usable and valuable to everyone, and hopefully this will include your organization.
  • Splunk is the leading enterprise solution for managing and analyzing machine data. It provides a unified way to organize and to extract actionable insights from the massive amounts of machine data generated across diverse sources. One person can download and implement Splunk in hours, rather than having a team of people take months or even years to deploy a solution. You can connect to your data in a few clicks and create powerful dashboards with a few more. Key capabilities: Splunk collects machine data securely and reliably from wherever it’s generated. Splunk stores and indexes all of the data in real time in a centralized location and protects it with role-based access controls. Splunk turns your machine data into a NoSQL data fabric that can be searched, browsed, navigated, analyzed and visualized. This enables IT professionals and businesses to solve a wide range of mission-critical problems, all without the inherent limitations of traditional approaches. Search and analyze live streaming data and terabytes of historically indexed data from one place. Splunk automatically monitors your data for trends and specific patterns of activity or behavior, then immediately notifies the people who need to know. Powerful search, drilldown and reporting capabilities meet the needs of novice users and expert analysts alike. Easy-to-create dashboards put critical insights from your machine data into the hands of the people who need them.
  • Here’s the context for all the material that follows. “Enterprise Services” program is all about…
  • Logs scattered everywhere = complex ecosystem. Looming horizon = data explosion. Story: going live, millions of hits start coming in, and we try to figure out what is actually happening.
  • 4 hours. No joke. We were drawn to innovate: just try something new and see what happens.
  • A list of consumers of the Locations service over a 24-hour period. Story: we identified a bad API key before the developer knew what was wrong.
  • We’re taking a look at our infrastructure design because of this.
  • Able to report on non-functional requirements. Going forward, we can do a better job of not over-estimating infrastructure needs, which saves money, avoids leaving idle inventory on the shelf, and opens the door to putting the right money in the right places.
  • You saw the original map at the beginning of our presentation; as we expose more APIs, what can we learn from them?
  • How are we adhering to this advice? We have accomplished many of these metrics already. Most of these are achievable with Splunk.
  • The more you have in Splunk, the more complete the monitoring picture can be.
  • Great for perf/load testing; see all the errors in one place. You can even put the Jenkins logs in Splunk and show the results across all APIs being developed.
  • Allow apps to have multiple ways to get logs into Splunk. No UF (universal forwarder) on consumer devices. Build transactions across multiple layers of the infrastructure. Use UFs on endpoints everywhere = FASTEST. Else, consolidate and mount Splunk = FAST. Else, use the CLS RESTful API = SLOW.
  • “Nothing is wrong. Your data is wrong.” Getting people to trust what Splunk is telling us. Story about one of the nodes being down; initially people didn’t believe the data was right.

    1. Splunk Company Overview
       – Company (NASDAQ: SPLK): founded 2004, first software release in 2006
       – HQ: San Francisco; regional HQs: London, Hong Kong
       – Over 600 employees, based in 10 countries
       – FY 12 revenue: $121MM; FY 13 guidance: $183MM; Q2 FY 13 revenue: $44.5 million
       – Business model / products: free download to massive scale; software deployed on-premise and in the cloud; Splunk Storm delivered via a SaaS model
       – 4,400+ customers; customers in over 80 countries; 54 of the Fortune 100
       – Largest license: 100 terabytes per day
    2. Target Turns Machine Data into Application Intelligence. Leena Joshi, Splunk; Dan Cundiff, Target Corporation. Copyright © 2012 Splunk, Inc.
    3. Agenda
       – Splunk Overview: the machine data opportunity
       – Splunk at Target: why Target chose Splunk; results with Splunk; best practice advice
    4. Turn Machine Data into Application Intelligence
    5. Splunk
       – Spelunking: to explore underground caves
       – Splunking: to explore and visualize large amounts of machine data
    6. Mission: make machine data accessible, usable and valuable to everyone.
    7. Splunk Collects and Indexes Any Machine Data (diagram)
       – Customer-facing data: click-stream data, shopping cart data, online transaction data
       – Outside the datacenter: manufacturing, logistics…; CDRs & IPDRs; power consumption; RFID data; GPS data
       – Data types: logfiles, configs, messages, traps, alerts, metrics, scripts, changes, tickets
       – Windows: registry, event logs, file system, sysinternals
       – Linux/Unix: configurations, syslog, file system, ps, iostat, top
       – Virtualization & Cloud: hypervisor, guest OS, apps, cloud
       – Applications: web logs, Log4J, JMS, JMX, .NET events, code and scripts
       – Databases: configurations, audit/query logs, tables, schemas
       – Networking: configurations, syslog, SNMP, netflow
    8. Splunk Collects and Indexes Any Machine Data (same diagram as slide 7, with callouts)
       – Any amount, any location, any source
       – No upfront schema
       – No custom connectors
       – No RDBMS
       – No need to filter/forward logs
    9. Turning Machine Data into Operational Intelligence: integrated collection, storage and visualization
       – Real-time collection and indexing
       – Ad hoc search
       – Monitor and alert
       – Report and analyze
       – Custom dashboards
       – Developer platform
    10. Turning Machine Data into Operational Intelligence: machine data, through integrated collection, storage and visualization, becomes operational intelligence
       – Business insights: gain real-time insight from your machine data to make better-informed business decisions
       – Operational visibility: gain operational visibility to make better-informed IT decisions
       – Proactive monitoring: monitor infrastructure to identify issues, problems and attacks before they impact your customers and services
       – Search and investigation: find and fix problems across the organization using machine data
    11. Enabling Application Intelligence for Dev & Production (diagram)
       – Talks to every technology in your stack
       – Correlates data across the different tiers: find causal links
       – Built for Big Data: visualize, analyze, trend all your data at scale
       – Diagram components: end user devices, networking/load balancing, web, app, services, servers, databases, messaging, legacy systems, security, virtualization, storage
    12. Operational Intelligence Across Use Cases: Application Management, IT Ops, Web Intelligence, Business Analytics, Internet of Things, Security, Compliance. Developer framework.
    13. Broad Adoption Across 4,400+ Customers: over half the Fortune 100 (customer logos by industry)
       – Financial Services & Insurance, Retail, Technology, Cloud and Online Services
       – Government, Healthcare, Manufacturing, Media & Entertainment
       – Energy and Utilities, Education, Telecommunications, Travel and Leisure
    14. Putting Data to Work by Splunking All the Things at Target. Dan Cundiff, Target Corporation
    15. Target Corporation
    16. About Me
       – Technical Architect
       – 7+ years development experience working across several groups: security, social media and knowledge management, and service oriented architectures
       – Currently focused on API development, creating RESTful APIs that are used in and outside of the enterprise across a wide range of devices, applications, and business partners
       – Enjoy automating - all the things - exchanging pro tips on continuous integration and deployment
       – @pmotch
    17. Context: Enterprise Services @ Target
       – Data and transactional APIs for all the domains in our business: products (inventory, price, description, etc), locations, coupons, etc
       – APIs exposed inside and outside
       – Mostly RESTful APIs, some pub sub/messaging
       – Used by mobile devices, applications, partners on the outside, etc.
       – Constantly evolving, rapidly improving, all the time
    18. Part Problem. Part Opportunity.
       – First API go-live:
          – Millions of log events per day (grep/cut/sed/awk not cutting it)
          – Logs scattered everywhere
          – Limited access to logs
          – Needed end to end visibility of web services
          – Needed ability to discover information in logs
          – Can we be pro-active? Faster reactive?
       – Looming horizon:
          – BILLIONS of log events coming
          – Questions changing every day from business, support, execs, developers
    19. Solution. Gave Splunk a Try.
       – Installed Splunk on a lab server
       – Hooked up Splunk to the logs
       – Quickly created 15+ searches and reports
       – Generated a dashboard for visibility and trending
       – Total time to do all this in Splunk: ~4 hours
    20. Why Splunk?
       – Find What We Don’t Know: understand “Normal”; actionable events; identify tolerances; find things we didn’t know existed
       – Proactive: indicators of outliers, anomalies, percentage changes, standard deviations; quick and flexible dashboards; drilldown
       – Full Stack Visibility: API gateway; network (load balancers, firewalls); web/app; OS
       – Community!: community (Splunkbase, blogs, etc); Google-able™; app store!
    21. Splunk delivers us a new type of intelligence.
    22. Understanding “Normal”
       – Overall volume of requests
       – API response time SLAs
       – Error code by proportion
       – Error code by volume
       – All the data in one place allows us to track multiple indicators of “Normal”
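One of the indicators of “Normal” on this slide, error code by proportion, lends itself to the standard-deviation check mentioned under “Proactive” on slide 20. The following is a minimal Python sketch, assuming the hourly error proportions have already been extracted from the logs; the function name, the sample values, and the 3-sigma threshold are illustrative assumptions, not the rules Target actually used.

```python
from statistics import mean, stdev

def is_outlier(history, latest, sigmas=3.0):
    """Return True if `latest` is more than `sigmas` standard deviations
    away from the mean of `history` (the baseline of "normal")."""
    mu = mean(history)
    sd = stdev(history)  # sample standard deviation; needs >= 2 points
    if sd == 0:
        # A perfectly flat baseline: any change at all is unusual.
        return latest != mu
    return abs(latest - mu) > sigmas * sd

# Hypothetical hourly error proportions (errors / total requests).
baseline = [0.010, 0.012, 0.011, 0.009, 0.010, 0.013]

quiet_hour = is_outlier(baseline, 0.011)  # within the usual range
bad_hour = is_outlier(baseline, 0.080)    # far above the usual range
print(quiet_hour, bad_hour)
```

The same check works for request volume or response time; only the series fed in as `history` changes.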
    23. Better Understand Consumers
       – Who and how is it being used?
       – What’s their experience?
    24. Better Understand Consumers, Part 2
       – Load testing in production?
    25. Understanding Our Infrastructure
       – Expected design vs actual implementation
       – Not balancing workload as expected
    26. Understanding Providers
       – How are providers responding?
       – Is overhead added to the API response?
    27. Requirements Feedback Loop
       – Requirement: 200 tps
       – Actual: ~20 tps
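The gap between the required and the observed transactions per second (tps) can be measured directly from request timestamps in the logs. Below is a minimal Python sketch, assuming each request is logged with an ISO 8601 timestamp; the function name and the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime

def peak_and_mean_tps(timestamps):
    """Bucket request timestamps into one-second bins and return
    (peak tps, mean tps over the observed time span)."""
    per_second = Counter(
        datetime.fromisoformat(ts).replace(microsecond=0) for ts in timestamps
    )
    first, last = min(per_second), max(per_second)
    # Inclusive span in seconds, so idle seconds lower the mean.
    span_seconds = (last - first).total_seconds() + 1
    total_requests = sum(per_second.values())
    return max(per_second.values()), total_requests / span_seconds

# Hypothetical request timestamps taken from an API access log.
requests = [
    "2012-12-01T09:30:00.100",
    "2012-12-01T09:30:00.900",
    "2012-12-01T09:30:01.200",
    "2012-12-01T09:30:03.000",
]
peak, average = peak_and_mean_tps(requests)
print(peak, average)
```

Comparing `peak` against the stated requirement shows how much headroom the original sizing assumed.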
    28. Real-time Intelligence from APIs
       – Where are people searching?
       – Where should we build our next store(s)?
       – How far are people traveling?
       – What time of day?
       – Mobile vs website?
       – iOS vs Android?
       – International?
    29. Metrics for APIs (source: http://blog.programmableweb.com/2012/08/02/the-api-measurement-secret-know-what-metrics-matter/)
       – Traffic metrics: total calls, top methods, call chains, quota faults
       – Service metrics: performance, availability, error rates, code defects
       – Support metrics: support tickets, response time, community metrics
       – Business metrics: direct revenue, indirect revenue, market share, costs
       – Developer metrics: total developer count, # of active developers, top developers, trending apps, retention
       – Marketing metrics: developer registrations, developer portal funnel, traffic sources, event metrics
    30. In progress and future stuff.
    31. Splunking all the Things
       – Consumer apps
       – Provider systems
       – OS, firewalls, proxies
       – External API gateway logs
       – Anything in between (middleware, integrations, etc)
       – Correlate with logs from apps degrees away (e.g. .com web logs)
       – Development (perf test results, git, Jenkins/CI, wiki, etc)
    32. Dashboards
       – Global dashboard summarizing all APIs
       – BI dashboards
       – Executive dashboards
       – Custom dashboards for different roles bring the right information to the appropriate fingertips
    33. Dashboards, Part 2
       – Environment dashboards for each API: CI, Test, Stage, Prod
    34. Dashboards, Part 3
       – Alert trending dashboards for each API
    35. Splunking Continuous Integration
       – Drill down into CI results linked straight from Jenkins
       – Filtered by date OR transaction GUID
    36. Splunking Continuous Integration, Part 2
       – We practice code as documentation
       – Every commit, Jenkins runs, extracts documentation from code, puts it in the respective wiki pages (pretty cool! – automated / no humans)
       – Splunk monitors wiki changes using the MediaWiki API
       – Monitor CI + human wiki changes
       – https://github.com/pmotch/wikislurp
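Polling a wiki for changes through the MediaWiki API can be sketched as follows. This is not the wikislurp code; it is a minimal Python illustration that builds a `list=recentchanges` query and separates CI-generated edits from human ones. The wiki URL, the `jenkins` account name, and the sample response are assumptions made for the example; the response shape follows the standard MediaWiki JSON format.

```python
from urllib.parse import urlencode

API_URL = "https://wiki.example.com/w/api.php"  # hypothetical wiki endpoint
CI_USER = "jenkins"  # hypothetical account the CI job edits under

def recent_changes_url(limit=50):
    """Build a MediaWiki API URL that lists the most recent page changes."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|user|timestamp|comment",
        "rclimit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"

def split_by_author(response):
    """Separate changed page titles into (CI edits, human edits)."""
    ci, human = [], []
    for change in response["query"]["recentchanges"]:
        (ci if change["user"] == CI_USER else human).append(change["title"])
    return ci, human

# Sample payload in the shape MediaWiki returns for list=recentchanges.
sample = {"query": {"recentchanges": [
    {"title": "Locations API", "user": "jenkins",
     "timestamp": "2012-11-30T10:00:00Z"},
    {"title": "Logging standards", "user": "alice",
     "timestamp": "2012-11-30T10:05:00Z"},
]}}

url = recent_changes_url(limit=10)
ci_pages, human_pages = split_by_author(sample)
print(url)
print(ci_pages, human_pages)
```

Writing each change out as one log line would let Splunk index wiki activity alongside the Jenkins results.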
    37. Common Logging Service
       – CLS is our strategy for getting logs from all places into Splunk
       – How:
          – Use UFs on end points everywhere
          – Else, consolidate and mount Splunk
          – Else, use CLS RESTful API
       – Enables end-to-end visibility:
          – Insert GUIDs across all the hops in the transaction
       – Use out of the box log formats (e.g. Log4j)
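Carrying one GUID across every hop is what makes it possible to stitch a transaction back together from separate logs. A minimal Python sketch of the idea follows, assuming the GUID travels in a request header; the header name `X-Transaction-Id`, the hop names, and the in-memory log are hypothetical stand-ins, not the CLS implementation.

```python
import uuid
from collections import defaultdict

GUID_HEADER = "X-Transaction-Id"  # hypothetical header name

def ensure_guid(headers):
    """Reuse the caller's GUID if one is present; otherwise mint a new
    one, which should only happen at the first hop."""
    out = dict(headers)
    out.setdefault(GUID_HEADER, str(uuid.uuid4()))
    return out

def log_hop(log, hop, headers):
    """Record which hop handled the transaction (stand-in for a log line)."""
    log.append({"guid": headers[GUID_HEADER], "hop": hop})

log = []
headers = ensure_guid({})          # gateway: no GUID yet, one is created
log_hop(log, "gateway", headers)
headers = ensure_guid(headers)     # API layer: GUID is passed through
log_hop(log, "api", headers)
headers = ensure_guid(headers)     # provider system: same GUID again
log_hop(log, "provider", headers)

# Grouping by GUID reconstructs the end-to-end path of each transaction.
hops_by_guid = defaultdict(list)
for entry in log:
    hops_by_guid[entry["guid"]].append(entry["hop"])
print(dict(hops_by_guid))
```

Once the logs are indexed, the same grouping is a search on the GUID field across all sources.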
    38. Best Practice Advice
    39. Lessons
       – RTFM:
          – Keep logs flat
          – Keep timestamp (ISO8601) at the beginning
          – k=v
       – Iterate quick, push to prod; minimal tweaks to Splunk
       – Flatten out of box audit events (XML)
          – Toggle at runtime
       – Don’t re-invent the wheel, use what your system provides, Splunk can handle it!
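The three logging rules under RTFM (flat, timestamp first, k=v) can be illustrated with a small formatter. This is a minimal Python sketch under those rules; the function name and the field names are invented for the example and do not come from the talk.

```python
from datetime import datetime, timezone

def format_event(fields, when=None):
    """Render one flat log line: ISO 8601 timestamp first, then
    space-separated key=value pairs."""
    if when is None:
        when = datetime.now(timezone.utc)
    parts = [when.isoformat(timespec="milliseconds")]
    for key, value in fields.items():
        text = str(value)
        # Quote values containing spaces or '=' so each pair stays intact.
        if " " in text or "=" in text:
            text = '"' + text.replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)

line = format_event(
    {"api": "locations", "status": 200, "duration_ms": 42,
     "msg": "store lookup ok"},
    when=datetime(2012, 12, 1, 9, 30, 0, tzinfo=timezone.utc),
)
print(line)
```

Lines in this shape need little or no parsing configuration, because the timestamp is where the indexer looks first and each field is already named.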
    40. Lessons, Part 2
       – Don’t pre-optimize up front: governance, standards, alerting, access controls
       – Optimize as needed
    41. Lessons, Part 3
       – Create a community
    42. Lessons, Part 4
       – Create best practices, standards, etc in a wiki
    43. Challenges: Organizational
       – “Stop. We already have tools that do this. Use those.”
          – tgtMAKE saves the day
          – tgtMAKE = R&D
          – R&D = $, servers, flak shelter, people network
       – “Make it real” strategy:
          – Demo to as many key players as possible
          – Drum up interest
          – Show actual value
    44. Challenges: Organizational, Part 2
       – The data can’t be trusted?
    45. Recap: Be bold. Tooling matters. Sell it. Splunk all the things! Iterate, adapt, change quickly.
    46. We’re hiring (come talk to me)
    47. Resources
       – Speaker emails: dan.cundiff AT target.com, ljoshi AT splunk.com
       – Splunk download: www.splunk.com/goto/download
       – Splunk Storm SaaS Service: www.splunkstorm.com/
    48. Thank You
