The document discusses 3 case studies of companies using SL Corporation's RTView Enterprise Monitor to solve challenges monitoring TIBCO applications across 3 industries. A retail company used it to monitor TIBCO applications across 4,000 stores. A transportation company used it to monitor heterogeneous applications including TIBCO. An insurance company used it to monitor shared TIBCO services. RTView provided centralized monitoring and alerting for TIBCO applications.
Launch of a new store operations system at 4,000 stores, with limited visibility into critical business applications at each store
Significant revenue impact if the new system didn’t work
Did not plan for troubleshooting at the store level
Stores needed to coordinate flawlessly with central business processes and database
Key applications include inventory, search, and account lookup. These processes enable a store to check inventory at another store, track inventory, upload or sync data to a data center, transfer customer records from one store to another, etc.
Store Operations Team blind to problems with TIBCO applications at an individual store until the store calls
TIBCO infrastructure supports key business processes
Extremely reactive
Lack of visibility into basic health state; for example, no idea whether a store’s EMS servers and/or BW engines are up or down
No learning mechanisms: Patterns? Systematic errors? Does this store always have problems?
Unable to see whether current store performance is normal, because it cannot be compared to historical trends for specific dates and times
Lack of visibility into store performance has a direct impact on customers
Central, single-pane-of-glass view of the health of TIBCO infrastructure at all 4,000 stores
Regional traffic-light map of all states, with drill-down to the TIBCO infrastructure at individual stores
EMS and BW monitors at each store send performance data to central EMS and BW monitors for custom views and alerts
Central access to historical trends indicates when something is not normal
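The idea of using historical trends to flag abnormal behavior can be sketched as a comparison against a baseline. This is a minimal illustration only, not how RTView Enterprise Monitor implements it: the function name `is_abnormal`, the three-standard-deviation threshold, and the sample message counts are all assumptions made for the example.

```python
from statistics import mean, stdev

def is_abnormal(current, history, threshold=3.0):
    """Return True if `current` is more than `threshold` standard
    deviations away from the mean of `history`.

    `history` is assumed to hold readings of the same metric taken at
    comparable times (for example, the same weekday and hour in
    previous weeks), so that like is compared with like.
    """
    if len(history) < 2:
        return False  # too little history to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu  # flat history: any change is a deviation
    return abs(current - mu) / sigma > threshold

# Hypothetical pending-message counts on one store's EMS server,
# sampled at the same weekday and hour over five previous weeks.
history = [120, 130, 125, 118, 127]

print(is_abnormal(126, history))  # close to the historical mean
print(is_abnormal(900, history))  # far above the historical mean
```

Comparing against readings from the same date and time window matters because a store's normal load on a weekend afternoon may look like a fault on a weekday morning.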
Store Operations Team has real-time visibility to know if any store has a problem
Proactive monitoring instead of reacting to a phone call from a store
Central support now has visibility into the health state of a store and can fix a problem before the store realizes there is one
Improved availability of critical business processes
Estimated 70% reduction, since deployment, in TIBCO infrastructure degradations impacting store productivity (figure unverified: it is a placeholder pending a check of the POC documentation, and needs to be validated before use)
Historical context provides a baseline for understanding normal store performance
Too many alerts; didn’t know which were important or who to call in the middle of the night
No visibility into the dependencies among the infrastructure and middleware components that support a business service
Running EMS and BW along with Oracle databases, WebLogic, and Oracle Coherence
Had to really dig to identify the source of a problem
Spent a lot of time looking at log files
Lack of visibility into the history of configuration items (CIs)
Weight alerts by assigning criticality
Something unimportant can be assigned a lower criticality
Filter alerts to define those that merit notification
Custom correlation of alerts: able to define exactly when they are notified. For example: only send me an email if 4 of the 6 engines go down. If only 1 or 2 are down, show as red on my service-level display, but don’t send me an email.
Rapidly deployed pre-configured solution packages for all middleware technologies
Simply plugged in solution packages for TIBCO EMS and BW, as well as Oracle database, WebLogic Server (WLS), and Coherence
Solution packages are pre-configured with all of the caches and typical alerts
CMDB functionality used to create on-the-fly service definitions
Service 1 is dependent on these components (i.e., 2 BW engines, 5 WLS instances, etc.), defined within RTView EM
Component- and CI-level information bubbles up to application-level displays
If Service 1 is red, able to drill back down not only to the component, but to the exact metric that caused the problem (e.g., the message count on EMS Server 7)
With history, can determine what went wrong and when, for faster resolution
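The "4 of the 6 engines" correlation rule described above can be sketched as follows. This is an illustrative sketch only and does not use RTView EM's actual rule syntax or API: `AlertDecision`, `correlate_engine_alerts`, and the engine names are hypothetical. The source does not say what happens when exactly 3 engines are down; the sketch assumes any number of down engines below the email threshold shows red without sending an email.

```python
from dataclasses import dataclass

@dataclass
class AlertDecision:
    display_status: str  # "green" or "red" on the service-level display
    send_email: bool     # whether the on-call contact is emailed

def correlate_engine_alerts(engine_up, email_threshold=4):
    """Decide display color and notification from engine states.

    `engine_up` maps an engine name to True (up) or False (down).
    Any engine down turns the display red; an email is sent only when
    at least `email_threshold` engines are down.
    """
    down = sum(1 for up in engine_up.values() if not up)
    status = "red" if down > 0 else "green"
    return AlertDecision(display_status=status,
                         send_email=down >= email_threshold)

# Hypothetical group of six BW engines, two of them down.
engines = {f"bw-engine-{i}": True for i in range(1, 7)}
engines["bw-engine-2"] = False
engines["bw-engine-5"] = False

decision = correlate_engine_alerts(engines)
print(decision.display_status, decision.send_email)
```

Separating the display status from the notification decision is what lets a minor outage stay visible on the display without waking anyone up.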
Shared services among all groups in the organization
Central team responsible for maintaining, managing and monitoring these services on behalf of hundreds of critical applications
Overwhelmed
Anticipate a 30% increase in the number of applications in 2013
Increasing number of users
Increasing demands on the performance of shared services
Unable to correlate critical applications with underlying technologies and service users
No concept of a CMDB
Unable to efficiently maintain the health of TIBCO infrastructure
Lack of visibility into EMS, BusinessWorks (BW) and BusinessEvents (BE)
If there was a problem, they had no idea what caused it and would simply restart the app
BE is a black box: had missing or “limbo state” transactions in BE
No insight into resource allocation and usage
Single-pane-of-glass views for TIBCO EMS, BW and BE
Displays for each of EMS, BW and BE
Mapping critical applications to both infrastructure and users of shared services
History for all three TIBCO technologies
Trace back to where a component broke
Visibility into resource allocation and usage of EMS and BW
Could see if a single host has 10 BW engines using 90% of CPU while another box is only using 10%
Also got better visibility across multiple data centers: agents running at each data center report to RTView EM
Integration with other existing systems. For example, alerts from RTView EM were integrated with their existing SNMP, Tivoli and HP OpenView monitoring systems
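The per-host view of resource usage mentioned above (one host with 10 BW engines at 90% CPU, another box at 10%) amounts to grouping engine-level readings by host. The sketch below shows that aggregation under stated assumptions: the function `cpu_by_host`, the host and engine names, and the sample figures are hypothetical, and this is not RTView EM's data model.

```python
from collections import defaultdict

def cpu_by_host(samples):
    """Aggregate per-engine CPU readings into per-host totals.

    `samples` is an iterable of (host, engine, cpu_percent) tuples.
    Returns {host: {"engines": count, "cpu_pct": total}}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for host, _engine, cpu_pct in samples:
        totals[host] += cpu_pct
        counts[host] += 1
    return {h: {"engines": counts[h], "cpu_pct": totals[h]} for h in totals}

# Hypothetical readings: ten engines crowded onto host-a, two on host-b.
samples = [("host-a", f"bw-{i}", 9.0) for i in range(10)]
samples += [("host-b", f"bw-{i}", 5.0) for i in range(10, 12)]

usage = cpu_by_host(samples)
busiest = max(usage, key=lambda h: usage[h]["cpu_pct"])
idlest = min(usage, key=lambda h: usage[h]["cpu_pct"])
print(busiest, usage[busiest])
print(idlest, usage[idlest])
```

A summary like this makes an uneven allocation obvious and indicates which host has headroom for new applications.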
Greatly reduced time to troubleshoot problems
Able to just look at the display and drill down to the failed component
What took hours now takes ~10 minutes
From reactive to proactive
Visibility into what’s happening currently
Contextual alerts before something breaks, plus historical tracing that shows what had been happening, and what was happening at the exact point of a break or degradation
Now know their resource allocation and usage
Can see exactly where and how they can accommodate new apps, both across TIBCO infrastructure and across data centers
Able to leverage alerts in existing SNMP, Tivoli and HP OpenView monitoring systems