SplunkLive! Customer Presentation - Cardinal Health


At SplunkLive! Columbus, November 2013

  • Patrick Farrell: If a developer wanted to identify the root cause of a problem, they could have significant difficulty locating the log information. Originally, our environment was larger, with more servers in the mix, so finding where a problem occurred was almost a never-ending process: we had to physically log in to each box, examine the many log files on it, and then move on to the next one to see where the problem originated.
    Patrick Farrell: It would definitely increase our outage time, our downtime, because trying to locate a problem was quite a challenge. By adding Splunk to the mix, especially once we managed to stabilize our environment, we have definitely seen benefits, not necessarily in support, but in reducing or cleaning up issues that exist in our application.
    Patrick Farrell: Correct. Those customers currently on the site who are experiencing difficulties may be presented with a "contact the help desk" type of message, and I'll see that. With that information, our goal is to make a proactive attempt to contact the customer. We haven't gone this far yet, but we could, for example, automatically pop up a message on the customer's screen saying, "Hey, would you like to speak with customer support about the issue you're experiencing?" That kind of push tells the customer, "We're here for you; come take advantage of the opportunity to speak with us about the issue you're experiencing."
  • Patrick Farrell: There are 30-some forwarders, 33 or so right now. We collect log data from custom application logs, HTTP logs, JVM-type logs such as verbose GC logs, System.out, and System.err. So we have a number of source files, and it's not just Order Express that uses it. Our EDI group uses the same Splunk installation as well; they have well over 200 individual source files managed and indexed by Splunk, on the order of about 20 source types just for them. By the way, to switch gears for a second: (Cathy), I just pinged (Scott). He said that Splunk was really the only tool being considered. They briefly looked at an IBM tool, but he said it was far more expensive and less functional than Splunk.
    Patrick Farrell: Right now we're consolidated onto a single virtual machine, and I'll tell you it's an undersized one. Our production server alone handles about 60 gigabytes a day of log volume, all going through that single VM. It's a Linux system running the deployment server, license master, indexer, and search head all in one virtual machine.
  • Patrick Farrell: What we use it for in our stage environment, specifically, is to analyze performance-testing results. This is probably one of the biggest benefits we've seen from Splunk from an application development standpoint: cleaning up the code. Like I said, we have a large development team and everybody is off doing their own thing. When you bring it all together, put it out there, and look at the finished product, you see, "Wow, maybe there's a million severe error messages an hour in the production logs."
    Patrick Farrell: And you look at that and say, "A million severe error messages an hour. Do I really need that? My system is still functioning, I'm not getting alerted, so why is it doing this?" So we're using it to go back, almost retroactively, and find the places in the logs where people were printing worthless log statements. To give you one example, I found a statement that was printed 1.2 million times an hour and had nothing in it.
    Patrick Farrell: As I said before, I was a developer, and the team I developed for is called Inventory Manager. That particular piece of Order Express, the larger application, uses these logs. As the developer who wrote the logs, I knew the most about what was going into them, and I had a lot of control over the information and how I wrote it to the log. Ultimately it was very advantageous to change the way I was writing these logs so that they were naturally useful to Splunk.
    That information allowed me to build some pretty interesting dashboards, first from an operational standpoint and then more from a business perspective. From an operational perspective, I show, for example, the top 10 accounts using the system.
    Patrick Farrell: Correct. There is information like that in these logs that is completely dead; there's really no use for it. So we're going back and taking those statements out, because when you add them up, a single statement may produce gigabytes of data per day in production. For us there's a monetary benefit to taking those nasty statements out, cleaning them up, and moving on. Not to mention they shouldn't be there in the first place: our system will run faster if we don't have to write these silly statements to the log. So it's that kind of retroactive work at the moment. We do it in our stage environment, and we also use the stage environment to look for the most frequently occurring messages, for example the pump command.
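The "find the noisiest log statements" pass Farrell describes can be sketched in Splunk's search language. This is a hypothetical query, not taken from the presentation; the index and field names (`order_express`, `log_level`) are placeholders for whatever the application's logs actually use:

```spl
index=order_express sourcetype=app_log log_level=SEVERE earliest=-1h
| cluster showcount=true t=0.9
| sort - cluster_count
| head 10
| table cluster_count, _raw
```

The `cluster` command groups near-identical events, so a statement that fires 1.2 million times an hour surfaces at the top of the list and can be traced back to the offending log call.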
  • Patrick Farrell: I see the most frequently invoked operations and their accounts throughout the day. I see the longest-running operations. I see database SQL contention, for example whether there are any transaction timeouts at the database level. I also see the overall number of business and system exceptions individually. Technically, in production we shouldn't have any business exceptions; those are business-rule violations that should be caught during testing. So on that graph, or the radial gauge we use, I should basically see zero all the time. If I don't, we have defects in production, and the question is where those defects are. Are they on the service layer? On the front end? Is it a front-end validation that failed and the service layer caught it? Then of course there are system exceptions, which are completely unexpected cases that the system caught; I see those too. I also see which users are having the most difficulty with the system. Additionally, on the business side of things, I track the types of transactions being executed on the system and (when) throughout the day they are executed. I track how certain pieces of functionality are being used. For example, a report may have one particular input with five options, and maybe users are only using two of them, so we're supporting functionality for three options that nobody is using.
    That knowledge gives us the ability to go back and say, "Well, if nobody's using this functionality over the course of a month, two months, three months, or more, do we really need to keep it? Should we retire it?"
    Patrick Farrell: ... the more (I thought) about it, the more I realized, "OK, I can extract the execution-time field. That's really useful; now I can do aggregated searches across all my (boxes)." Then I thought about it some more, and now I've got subsearches and this rich search language, the search processing language that Splunk has. I quickly fell in love with it. I can do amazing things with my data that I could never do from the UNIX prompt; I can do that in Splunk and take it to a whole new level. That's essentially what excited me about the product: the richness of the search processing language.
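The execution-time extraction Farrell mentions would look roughly like this in SPL. The `rex` pattern, field names, and index are assumptions for illustration; the presentation does not show the actual log format:

```spl
index=order_express sourcetype=app_log
| rex "execution time: (?<exec_time_ms>\d+)"
| stats avg(exec_time_ms) AS avg_ms, max(exec_time_ms) AS max_ms, count BY operation, host
| sort - avg_ms
```

One search like this aggregates across every forwarder host at once, which is exactly what replaced the old box-by-box grep and awk workflow.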
  • I would say the biggest business impact is the ability to identify issues in a complex environment quickly, which reduces outage time. That's probably our biggest benefit, because as a large e-commerce application doing as much business as we do on a daily basis, you don't want to be down for long. Every second you're down is orders you're not receiving, and those customers will be happy to take their business somewhere else. So you really want to get your systems running: you want to identify problems quickly and get them resolved so that you're not alienating your customer base.

Transcript

  • 1. Copyright © 2013 Splunk Inc. Cardinal Health Patrick Farrell Sr. Software Engineer
  • 2. My Background and Role Patrick Farrell, Sr. Software Engineer – Resident Splunk Administrator and Champion – Started using Splunk two years ago as a developer for our eCommerce platform – Responsible for Splunk administration, maintenance, custom application development, and dashboards – Splunk Community of Practice owner at Cardinal Health
  • 3. Company Overview – Founded in 1971 – Over 30,000 employees – Headquarters in Dublin, Ohio – Ranked #19 on the Fortune 500 – Cardinal Health helps pharmacies, hospitals, ambulatory surgery centers and physician offices focus on patient care while reducing costs, enhancing efficiency and improving quality
  • 4. Before Splunk Manual search on 30+ servers using Unix command-line programs (Awk, Grep, Tail) Operational support and development groups spent hours on root cause analysis and problem resolution No insight into customer usage of our applications No ability to be proactive with customer support
  • 5. Splunk at Cardinal Health Data sources – Application Logs – Access Logs, System Out, System Err, GC, and other custom application logs – 25 individual source types – 250+ individual sources – Indexer, Search Head, Deployment Server, and License Master – 60 GB per day – 30+ forwarders (5 server classes) – Splunk used in pre-production and production environments – More than thirty individuals actively using Splunk on a regular basis
  • 6. Splunk Use Cases “Splunk is our Swiss Army Knife” Improving Root Cause Analysis Gathering Customer Usage Statistics Increasing Efficiency Proactive Customer Support
  • 7. Return on Investment “One of the most important benefits of using Splunk from an application development standpoint is illustrated by how it has helped us clean up our logging code.”
  • 8. Increased Efficiency With 100+ developers on a single application, there can be lines of erroneous code – 1.2 million severe error messages / hour Splunk is used to analyze application logs during performance/endurance testing The punct field is your friend Key benefit: Splunk helps us clean up our code – Capacity savings (storage, license) – Improved efficiency (speed) – Reduced spam
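`punct` is a field Splunk extracts automatically: the punctuation pattern of each raw event. Grouping by it is a quick way to find the handful of log statements producing millions of near-identical events, like the 1.2 million severe messages per hour mentioned on the slide. A sketch, with the index and severity filter as assumptions:

```spl
index=ecommerce sourcetype=app_logs log_level=SEVERE
| stats count BY punct
| sort - count
| head 10
```

The top patterns point straight at the noisiest logging statements, which is what makes `punct` useful for cleaning up logging code.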
  • 9. Improved Systems Uptime and Performance Writing Splunk-friendly code – Inventory Manager Splunk's search processing language allowed us to easily perform analysis once considered impossible from the Unix prompt. Analytics for: – Most active accounts – Most invoked operations – SQL database contention – Longest-running operations – Exceptions encountered
  • 10. Inventory Manager Operational Dashboard
  • 11. Improving Customer Satisfaction • Splunk alerts us when customers see the contact help desk message on our site – Reach out to customer immediately • Immediate support = happier customers = more revenue • Gathering customer usage data to identify which functionality should be enhanced or retired
  • 12. Reducing Root Cause Analysis Time Searching logs across many application servers can take hours. Remember, time is money! Now an alert or search helps us identify most issues in seconds!
  • 13. Reducing Root Cause Analysis Times Normal Execution Scenario
  • 14. Reducing Root Cause Analysis Time Load Reduction Scenario
  • 15. Reducing Root Cause Analysis Times Abnormal Execution Scenario
  • 16. Results with Splunk – Reduced Downtime: The most important benefit to our large ecommerce application is reduced downtime. Every minute of downtime results in a significant loss of revenue. – Increased Efficiencies: We were able to reduce our daily indexing volume by 3 GB by identifying and eliminating defects that produced in excess of 1.2 million severe events per hour. Thank you, punct! – Application Enhancements: We can determine the focus of future enhancements by monitoring how our customers are using the site. Likewise, we can also identify unused functionality. – Searching and Reporting: Ability to drill down to specific areas and find issues in seconds instead of hours. – Improved Customer Satisfaction – Reduced MTTR
  • 17. Best Practice Recommendations Splunk is an amazing platform as long as you are prepared for it! Create a roadmap that outlines how you intend to use Splunk and where you would like to take the product within your organization. Plan your environment and account for future growth (users, searches, license volume, hardware capacity, storage, etc.).
  • 18. Best Practice Recommendations Generate a unique identifier for each transaction and write it to the log as part of each event so that you may easily identify all related events. Take advantage of automatic field extraction using key-value pairs or use a logging format such as JSON that can provide automatic field extraction. Capture execution time in log events for an added dimension.
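A minimal sketch of the logging pattern recommended above, written in Python for illustration (the platform's actual language and logger are not specified in the talk; the function and field names here are hypothetical). Each event carries a unique transaction ID, key=value pairs for Splunk's automatic field extraction, and an execution time:

```python
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("ecommerce")

def handle_request(operation, account):
    # One ID per transaction, stamped on every related event so all of
    # them can be correlated later (e.g. `stats ... BY txn_id` in Splunk).
    txn_id = uuid.uuid4().hex
    start = time.monotonic()
    log.info("txn_id=%s operation=%s account=%s status=start",
             txn_id, operation, account)
    # ... real work would happen here ...
    elapsed_ms = (time.monotonic() - start) * 1000
    # key=value pairs are auto-extracted as fields; execution_time_ms adds
    # the timing dimension recommended on the slide.
    log.info("txn_id=%s operation=%s account=%s status=done execution_time_ms=%.1f",
             txn_id, operation, account, elapsed_ms)
    return txn_id

handle_request("getInventory", "ACME-001")
```

With events in this shape, every field is searchable without manual extraction rules, and a single transaction ID ties the start and end events of one request together across servers.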
  • 19. Future Plans Expanding use of Splunk to our Medical eCommerce Platform Creation of additional operational and business dashboards Evaluate the possibility of using Splunk in DEV and QA
  • 20. Thank You