Austin Radiological Association Using AccelOps - Demo


Published in: Technology, Business

Overview and use of AccelOps integrated data center and cloud service monitoring platform from an end user perspective

Prepared by Michael Coté ( November, 2010

Overview

In this portion of the RedMonkTV video, Austin Radiological Association's Todd Thomas (CIO) and Geoff Christy (Senior Network Engineer) demonstrate how they're using AccelOps to help manage ARA's infrastructure. Watch the full video at

About ARA

The Austin Radiological Association recently changed the IT management platform it uses to monitor and manage its distributed IT and data center. ARA is one of the largest providers of outpatient imaging services and professional services in central Texas, serving the majority of area hospitals and thousands of referring physicians in the community. In addition, ARA operates as an outsourced solution provider and manages a turnkey digital imaging application for 10 regional SaaS clients. The company has invested heavily in IT, ITIL service processes and IT automation to support a variety of health care, imaging and business applications.

About AccelOps

AccelOps describes itself as:

AccelOps integrated data center and cloud monitoring solutions bring unparalleled operational intelligence, service insight, efficiency and security to enterprises and service providers. Delivered as a scalable virtual appliance or SaaS, the AccelOps platform cross-correlates and manages diverse operational data on-premise, off-premise and in cloud environments to provide proactive performance, availability, security, change and business service management. AccelOps enables service delivery with end-to-end visibility, efficient root-cause analysis, reduced MTTR and compliance. Take us for a test drive at:

Copyright © RedMonk, LLC 2010
Transcript
Michael Coté: So here we are going to see a demo of AccelOps in action here at the Austin Radiological Association. For people who didn't see the interview portion, do you want to quickly introduce yourself?
Geoff Christy: Hi! I am Geoff Christy. I am the Senior Network Engineer here at Austin Radiological Association in Austin, Texas.
Michael Coté: So you want to introduce yourself?
Todd Thomas: My name is Todd Thomas. I am the Chief Information Officer at Austin Radiological Association.
Michael Coté: When you are looking at this - or your team is looking at this - what's the thing that you most often jump into? What do you start doing at the start of the day, if you will?
Geoff Christy: Actually, I jump to this manually. So this is the screen you go to by default. But one of the nice things about their dashboards is they’ve got this little Home option and you can choose any dashboard you like, even one that you create yourself. So there is an entire section of self-created dashboards, and any dashboard you like - in my case, my home is my Incident dashboard - that’s where you go when you log in. So you choose your most important view and that’s where you start off your day. So, each of my engineers, like my server team, has a server dashboard... that has some critical information for them. They don't care necessarily about all the incidents that are potentially happening across the entire infrastructure; they are just looking at their, "hey, how are my servers doing today?" when they jump right in, and they go straight to this dashboard. They give you an excessive amount of dashboards, which is great, because anything you are looking for from your IIS servers is already prepackaged, pre-canned. They’ve got all of these dashboards available for you to use. But at the same time, you have full customization to say, "hey, yeah, that’s great, but I just want to see what my Internet links are doing."
Michael Coté: Right.
Geoff Christy: So you can jump over and create a small dashboard that says, "hey, here is my traffic on my Internet links. Do I need to upgrade?" Each of these small windows is a report, exactly like a report that you would do under analytics. If you wanted to run your top CPU report for a certain amount of time, you would just go in and run a report. All of those reports are available down here in this interface as an option just to add to a dashboard, and then it will automatically run that report for the times that you set. Like in here you can customize it and you can change it from seeing these as line views to, "hey, I just want to see all of the information like a table view," or change the refresh intervals. If you don't care about having the most up-to-date information, you go to 15 minutes or go down to 2. You can change your timeframe, and each report sort of has some different options, but anywhere from 30 minutes to a week is your typical option. So at the end, you're able to say, "hey, I want to see the top 25 in a table instead of a nice big line graph," and there is a list of all of the top applications by CPU on each server.
Michael Coté: Alright. So you have changed the view from line graphs, representing some metric, to basically a more detailed list of what's happening on that same view that you have. So you can kind of --
Geoff Christy: Right, more of the raw data.
Michael Coté: You can kind of pivot between those different views, if you will.
Geoff Christy: Exactly! Then with each option here you can say, "hey, look, here is our application running 99.86% of the CPU," and these little circular buttons give you the ability to drill down on that, and it will take you into the reporting tool under analytics and you can start doing some more deep dives, stretch out the period of time to look at the actual data if you are interested.
Michael Coté: Right.
Geoff Christy: In this area, you are able to expand and add more detail as you need to. Some of the reports can very easily give you too much data. Like we went into a server report and said, "hey, for all of my Windows servers, I want to see the top CPU."
Michael Coté: Right.
Geoff Christy: And you ran this for a certain amount of time over the past hour, and all of this is a web-based interface. So there is no Java, there is no application to run, and it's pretty fast for a web-based interface.
Michael Coté: Yeah.
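As a rough illustration of the top-CPU report described above, the following Python sketch aggregates CPU samples per host and application over a time window and keeps the top entries. It is not AccelOps code: the sample layout, the `top_cpu` function name, and the choice of average, maximum, and minimum columns are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def top_cpu(samples, now, window=timedelta(hours=1), limit=25):
    """Summarize CPU samples per (host, application) inside a time window.

    `samples` is an iterable of (host, application, timestamp, cpu_percent)
    tuples; this layout is an assumption made for the example.
    Returns up to `limit` rows of (host, application, avg, max, min),
    sorted by average CPU, highest first.
    """
    grouped = defaultdict(list)
    for host, app, ts, cpu in samples:
        if now - window <= ts <= now:  # keep only samples inside the window
            grouped[(host, app)].append(cpu)

    rows = [
        (host, app, sum(vals) / len(vals), max(vals), min(vals))
        for (host, app), vals in grouped.items()
    ]
    rows.sort(key=lambda row: row[2], reverse=True)  # highest average first
    return rows[:limit]


if __name__ == "__main__":
    # Hypothetical data: host and process names are invented for illustration.
    now = datetime(2010, 11, 1, 12, 0)
    samples = [
        ("srv1", "oracle.exe", now - timedelta(minutes=10), 99.0),
        ("srv1", "oracle.exe", now - timedelta(minutes=5), 100.0),
        ("srv2", "w3wp.exe", now - timedelta(minutes=20), 40.0),
    ]
    for row in top_cpu(samples, now):
        print(row)
```

Changing `window` corresponds to changing the report timeframe, and `limit` corresponds to the "top 25" table size mentioned in the demo.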
Geoff Christy: So here are all of these reports with all of the CPU and you are like, "oh great, maybe I just wanted to see a few of these or maybe I wanted to see more details." So you can change the query in this action in a very structured way or you can change what you visualize in this section. Right now, I've got the Average, Max and Min CPU. You can change each of these values in any way that you want to, also changing whether it's ascending or descending. When I was testing some of the other ones [IT Management products], they gave you some of this flexibility, but a lot of them made you feel like you had to learn SQL or learn how to do really in-depth queries, and it was really difficult to figure out what you were looking for. This one: they give you a very detailed list of everything that you can possibly pull into here. So if you said, "hey, yeah, that's the host IP that I am looking for but I wanted to add like a host interface name," it comes up pretty quickly. You have many different options like, "hey, I want the host interface name to equal" and then you put in some data, or if you go into something that they populate, like Event Types or Event Type Groups equals, you can also click this button over here and it starts bringing you into the CMDB, which we are about to get into. Here you can just pull out what you are looking for and start adding in the different fields you were trying to find.
Michael Coté: That's another interesting aspect. Like you are saying, there is this vast soup of data and different things that you can search over and it doesn't -- I mean from what you are showing me, it doesn't seem like the tool kind of limits the set of that "soup" you can look over.
Geoff Christy: Exactly. In some tools, they said, "hey, yeah, here is your reporting but you are limited in choosing a host name, or a machine type or you want a CPU report." Well, here is your CPU report.
Michael Coté: Or it's just for networking or something like that.
Geoff Christy: Or it's just for networking, or it's just for servers and there was no merging of it. In this, I can look up the CPU on every device of my entire network if I wanted to review, "hey, what are my top talkers?" and it could stretch between my network devices or server. And you can extend it to network devices, continue your deep-dive search. One of the nice things about these reports is, "hey, I don't want to see this report; I want to see another one." On the fly, we can change what each of these graphs looks like; you can even change how the graphs are viewed. In some cases, where you have multiple settings and you want to compare the average to the max, they have added in the ability to run the average on the top line and max on the bottom and you can compare and say, "hey, how does this look?" And even more detail. I mean they are throwing so many different pieces at you, you can change this report to look any way that you want.
And then you can export these into PDF files or CSV files. So you can hand them off to whoever needs the information in more detail and doesn’t have access to this tool. They did this on their own.
Michael Coté: Well, we are looking at the dashboard. Another area that AccelOps does a lot of is sort of security stuff, and I think it’s an interesting way to demo the kind of cross-correlation that they do.
Todd Thomas: So what we have done with this dashboard is we have taken all of the security events in the environment, whether they're coming from the firewall, Active Directory, or intrusion prevention system, or Nessus scanner, and aggregated all of that information into this particular dashboard. So we are seeing things like top outbound ports from the firewall. We are seeing top blocked destinations from the firewall. We are seeing, across all devices in the network, logon successes or failures, which are coming from the domain. We are seeing recon or info leak statistics, which are coming from our intrusion prevention system. This is a top security event category by count chart. We can get top security events by severity, top security incidents by severity. We can get top devices or users by failed log-ins. So this is aggregating information again from the domain layer. Top network scanners by event count. So this is also coming from our IPS device. So what's really interesting about this particular software is that it can take all of these different events from all these different devices and we just populate them into a single security dashboard. So I don't need yet another tool to do security event information management. Everything that my security manager wants to see has been aggregated in all of these different dashboards. Like everything else within the product, we can start drilling down into some of these event categories, then it will pop you over to the analytics tools. We can get some very sophisticated details on each of these events. So if I try and drill down into this code injection: on that code injection detection, these are some details from the IPS. We can see it classifies it at a specific severity level, tells us the source IP, tells us the destination IP. On this particular public IP address, I can get WHOIS information on it as well as drill down into Cisco’s reputation center base or Project Honey Pot. I can get a geolocation of that particular IP address. I can get all kinds of information on this particular IP address to help me determine whether or not I want to write an IPS rule to just outright block this in the future.
Michael Coté: So, you can really drill down to the core, so to speak. There is no end of the drilling down to correlate between stuff.
Todd Thomas: Yeah. Again, all within a single product; I don't need to use my IPS device or IPS interface anymore, because we are putting all that information in, and even a dashboard for the security.
Michael Coté: Man! You guys have a lot of stuff.
Geoff Christy: We do.
Michael Coté: That’s exciting.
Geoff Christy: Yup. In the critical and warning state, I have 163 devices under hardware summary. Not always the best thing in the world to notice.
Michael Coté: That’s right.
Geoff Christy: Like there's - the dashboards are instantly useful. So here is my hardware summary. There is the top device that is critical currently and it's got a bad power supply.
Michael Coté: Oh yeah!
Geoff Christy: I mean without trying, you have enough information.
Michael Coté: It can make it difficult to figure out when you should be going home.
Geoff Christy: Well, that’s why I hope my CIO isn’t always looking over my shoulder.
Todd Thomas: I was looking for "William Cannon" - we have problems.
Michael Coté: The joke being that he actually is as we are recording this.
Geoff Christy: Yes.
Michael Coté: One of the areas that you are touching on a little bit was how the -- obviously, a lot of what you are pulling from, if not all of it, comes from the CMDB that’s built into the product. It's sort of the central database at the core of everything, in addition to traditional mapping and relationships and storing things like that. What was the process like populating this CMDB?
Geoff Christy: Actually, relatively easy. The logic was a little different than some of the other tools I had. You started off -- we’re back to the basic admin pages. You started off with a setup wizard, which pretty much leads you through getting yourself set up to use the product. Once you're done with this section, you should have a lot of the things already set up, including scheduled scans of your network to bring in new devices and even limiting some of the polling that this can do. So this is the incidents area. In this location, you can look at all the incidents that are running through your system to say, "hey, here is a CPU that’s critical at 99.12%, and should I be sending an alarm to my end-users?" Like, here is a service interface critical, which I am sending to my server team as a notification.
Michael Coté: So you set up a rule or a threshold or whatever you might want to call it that says when this event comes in and has this value, this setting is above or below, and there’re all sorts of ways of specifying that, but someone needs to start worrying about it essentially.
Geoff Christy: Well, and that's another thing that brought us to this product. A lot of products out there have incident dashboards, a lot of products out there have rules that alarm when the CPU is too high and you set these thresholds. But what this product gave you were rules, which a lot of them didn't. These are totally customizable alarms that say when something is down. So, in this case, hey, a device is down. In the typical application, you don't have a choice. They give you a rule that fires off and tells you it’s down and you might be able to tweak some minor parts of it, but a ping up, ping down, all of that’s done in the background. In AccelOps, all of this is done under your control.
You get to choose how long it takes for it to be done; all of the event logic to create the incident is under your control. Some of the reports get really, really detailed like, "hey, here are all of the applications that potentially go down - if any of them go down, an important application will stop." So it can get really, really detailed or it can be as simple as the device is down, which is, "hey, if the average ping loss equals 100% and there is more than one of these events, tell me the device is down, and then clear when it's less than 100%."
Michael Coté: When you're specifying those rules, is it similar to the analytics where you have a GUI that you can use, but you could also type it manually if you wanted to?
Geoff Christy: It's exactly like it. I mean there are some minor details that you have to get into, but at the end, it's the exact same logic as writing a report.
Michael Coté: Right.
Geoff Christy: You say it's the same kind of attributes: host IP and network devices and then an attribute that says ping loss equals 100%. So you have a filter that says what the rule applies to and then aggregate detection that says, how does it fire up?
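The "device is down" rule described above (a filter saying what the rule applies to, an aggregate condition that fires it, and a clear condition) could be sketched in Python roughly as follows. This is only an illustration of the logic, not the AccelOps rule syntax: the `PingEvent` and `DeviceDownRule` names are hypothetical, and the sketch assumes the 100%-loss events are counted consecutively per host with no time window.

```python
from dataclasses import dataclass


@dataclass
class PingEvent:
    host_ip: str
    avg_ping_loss: float  # percent, 0-100


class DeviceDownRule:
    """Fire when a host reports more than `min_events` events of 100% ping
    loss in a row; clear as soon as loss drops below 100%."""

    def __init__(self, applies_to, min_events=1):
        self.applies_to = set(applies_to)  # filter: which hosts the rule covers
        self.min_events = min_events       # aggregate: "more than one of these events"
        self.loss_counts = {}              # consecutive 100%-loss events per host
        self.down = set()                  # hosts currently flagged as down

    def process(self, event):
        """Return "DOWN", "CLEARED", or None for a single incoming event."""
        if event.host_ip not in self.applies_to:
            return None
        if event.avg_ping_loss >= 100.0:
            count = self.loss_counts.get(event.host_ip, 0) + 1
            self.loss_counts[event.host_ip] = count
            if count > self.min_events and event.host_ip not in self.down:
                self.down.add(event.host_ip)
                return "DOWN"
            return None
        # Loss is below 100%: reset the counter and clear any open incident.
        self.loss_counts[event.host_ip] = 0
        if event.host_ip in self.down:
            self.down.discard(event.host_ip)
            return "CLEARED"
        return None


if __name__ == "__main__":
    rule = DeviceDownRule(applies_to={"10.0.0.1"})  # hypothetical host IP
    for loss in (100.0, 100.0, 100.0, 20.0):
        print(loss, rule.process(PingEvent("10.0.0.1", loss)))
```

Keeping the filter, the firing condition, and the clear condition as separate pieces mirrors the structure described in the demo, where each part of the event logic is adjustable.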
Once you have learned the logic of their events and their attributes, and what it's looking for, you pretty much have the entire system available for you.
Michael Coté: But I am curious to see how you use a tool to monitor something that's virtual and how it kind of makes doing that a little easier for you.
Geoff Christy: Going back to my server guys, they always have issues in trying to figure out exactly which VM host, in a clustering environment and in an environment where VMs can automatically, dynamically move between servers. You don't always know where they are or how much processing they are really taking without going into the console and looking at it. There are some alarms that they can do inside of that console and they could probably work it out eventually to get it all brought up to the point where you have an alarming tool on VM, but in this, I pull it in, I treat it just like any other device, but they have the intelligence to say, like here, that's one of my VM consoles, here are all the VMs running on it, including their physical CPU and the like. I can create rules on this just the same as I can on anything else and start creating rules that say, hey, here is where your VM is, here is how much CPU it's taking, whether it's taking 5% or how much it's taking from the individual server. There is some really good information here in the virtual summary, which my server guys have been using to track down the servers, figure out where they are in some cases. It’s like "hey, where is that server?" and they have been able to find them here and get more information.
Michael Coté: Another aspect that's interesting about your business is you are a service provider for other people. I mean you have a lot of remote locations that you are dealing with and it's not all sort of like centralized IT. I mean can you show how AccelOps is helping manage that kind of service provider-y topology that you have to deal with?
Geoff Christy: So the way we do that is through business services, and in services you can create, like, a grouping of devices or applications and you can go into detail. Like this is one of my sites, and the business service will go critical and alarm to me to say, "William Cannon has an issue," when any device, which includes the APC unit or any of the three network devices there, has any kind of event. Like today, William Cannon rebooted three hours ago from a power issue. So you can come in here and you can see all the different events that we had: interface is down, fan hardware warnings, and you can go into detail and find out more information about what happened today. This dashboard can give somebody who says, "hey, I saw a William Cannon event," a place to go in and get even more detail. There is a nice dashboard at the top that summarizes all that information, and you look at current alarms and reports to say, "hey, what services are currently critical and what device underneath them is making that critical?" and then you can even dive into what incident is making that critical.
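The roll-up described above, where a business service goes critical when any device grouped under it has an open event, can be illustrated with a short Python sketch. The data layout, the `critical_services` function, and the device names are hypothetical and chosen only to mirror the William Cannon example; this is not how AccelOps stores or evaluates business services.

```python
def critical_services(services, open_incidents):
    """Map each critical service to the member devices causing it.

    `services` maps a service name to its member devices, and
    `open_incidents` maps a device to its list of open incidents.
    A service counts as critical when any member has an open incident.
    """
    critical = {}
    for service, members in services.items():
        culprits = {
            device: open_incidents[device]
            for device in members
            if open_incidents.get(device)  # skip devices with no incidents
        }
        if culprits:
            critical[service] = culprits
    return critical


if __name__ == "__main__":
    # Hypothetical inventory: one site with a UPS and three network devices.
    services = {
        "William Cannon": ["apc-ups", "switch-1", "switch-2", "router-1"],
        "DNS": ["dns-1"],
    }
    open_incidents = {
        "apc-ups": ["power issue"],
        "switch-1": ["interface down", "fan hardware warning"],
    }
    for service, culprits in critical_services(services, open_incidents).items():
        print(service, "is critical because of", culprits)
```

The returned mapping answers both questions from the demo: which services are currently critical, and which device and incident underneath them is responsible.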
Todd Thomas: I think what's really interesting about this is that when you build out a business service, William Cannon, for example, you can add things in like, "I want to monitor these key server processes," and that can roll up into that. You can monitor at the OS level and those stats can pull up into your service. You can monitor at the server level, you can monitor at the network level. So whether it's security, performance or availability events on any of those items, all of that can bubble up into your business service. So you can get very, very granular in terms of all of the components that are defining that business service. What we want to be able to do is, if I am a Desktop support analyst, I want to be able to translate into business terms if I am talking to an end user: yes, we have a problem with PACS, not that the Oracle.exe process is spinning out of control using up all the CPU and that's why you are not getting these products.
Geoff Christy: Like here is my DNS process; included in the DNS process is in reality just my applications. So if any DNS.exe is dying, it will alarm on the DNS business service saying, you know, DNS has an issue. We are not too sure my CIO would go for me showing the users' information. Under analytics, one of the things that did sell me is there is an identity and location report.
Michael Coté: Alright, and we have talked about this as something you guys were kind of looking towards in the future of being able to track people, if not people, coming in and out of the system in addition to the chunks of silicon running around.
Geoff Christy: Correct. One of the things we had a problem with, especially our helpdesk, was understanding where our user really was, not where the user was saying he was. So this tool -- it wasn’t something we were looking for in the beginning, but the integration into AD and then tying all of this correlated information together to say, "hey, I logged into this workstation today. This is where I have been." So tracking where it says I have been and being able to say, "yes, I have logged into a device" wasn’t something that we were looking for in a product when we first started, but it was one of the reasons that AccelOps was chosen. Some of the other small aspects are the details that it can get into. Like from a network perspective, it's got all my configuration. So I don't need another tool to do configuration management. They already do it, including old backups, being able to do the differentials, all of the things you want in a tool to go with the configuration.
Michael Coté: Right. So it's extracting that stuff out for you and saving history.
Geoff Christy: It's extracting the stuff out and saving histories, and you can go back and look at each timestamp when things change and it keeps track of all of it. For my server guys, some of the nice things that they get into, I will go into their specific folder, is it keeps track of all the software on a system including patches, running applications, what patches have been installed. So that's the kind of information that you might be looking at from another tool that would be pulling in what are all my applications, what do I have installed on all these servers, and there are a lot of tools out there to do that. Now, it's in my monitoring device.
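To make the configuration differentials and software tracking described above more concrete, here is a small Python sketch using the standard `difflib` module and set arithmetic. It only illustrates the idea of comparing two saved snapshots; the function names, the sample configuration text, and the package names are assumptions for the example and say nothing about how AccelOps implements this internally.

```python
import difflib


def config_diff(old_text, new_text, old_label="previous", new_label="current"):
    """Return a unified diff (list of lines) between two config snapshots."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


def software_changes(previous, current):
    """Return (added, removed) names between two software inventory snapshots."""
    previous, current = set(previous), set(current)
    return sorted(current - previous), sorted(previous - current)


if __name__ == "__main__":
    # Hypothetical switch configuration captured at two points in time.
    old = "hostname sw1\ninterface Gi0/1\n shutdown"
    new = "hostname sw1\ninterface Gi0/1\n no shutdown"
    print("\n".join(config_diff(old, new)))

    # Hypothetical software inventories for one server.
    added, removed = software_changes(["IIS", "KB123"], ["IIS", "KB123", "KB456"])
    print("added:", added, "removed:", removed)
```

Storing each snapshot with its timestamp and diffing consecutive pairs gives the kind of change history the speakers describe; anything in `added` that has no matching change request would be a candidate for an alert.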
Michael Coté: Then, of course, like all of the pieces of data, you can report on that and just go nuts with reporting over all the software.
Todd Thomas: Well, it's nice now, because it's tracking this, so if a piece of software gets installed outside of our change management processes, it will actually show that this new piece of software has changed in the environment, and we'll want that as an event.
Michael Coté: Well, great! I appreciate you spending all that time to dive into this. It's always one thing to see a vendor demo or something, but to actually see a real instance: that is nice.
Geoff Christy: No problem. I am glad to have been able to show it to you.
About RedMonk

RedMonk is the first and only "maker" focused industry analyst firm. We believe that developers, operations staff, and those who are on the front lines of implementing and using IT are the most important constituency in technology. We focus on how new and old technologies are being applied by these makers to run businesses and help achieve the goals of their organizations. RedMonk advises both buyers and sellers of technology, providing all of our research for free at in the form of blogs, podcasts, videos, presentations, and other mediums.

While it’s impossible given the breadth to simply distill our coverage and views, the core thesis that guides much of our work is that technology adoption is increasingly a bottom-up proposition. The supporting evidence abounds; think Linux, Apache, MySQL, PHP, Firefox, Cloud Computing, Eclipse, and the consumerization of IT. All of these are successful because they’ve built from the ground floor, often in grassroots fashion. So the question we pose to you is this: you may have analysts that help you understand top down. Who do you have that does bottom up?