Produced by Wellesley Information Services, LLC, publisher of SAPinsider. © 2018 Wellesley Information Services. All rights reserved.
How to Stabilise and Improve an SAP BusinessObjects BI 4.2 Enterprise Shared Service Environment
Martin Macmaster
BI Brainz
• Learn how to investigate your SAP BusinessObjects BI 4.2 environment and diagnose issues causing outages and stability problems
• Understand the various options available to resolve the issues you find and to stabilise your SAP BusinessObjects BI 4.2 environment
• Consider factors which could have led to the issues on your landscape, and processes and safeguards you can put into place to avoid future issues
• Identify areas that can be improved to boost the resilience of your SAP BusinessObjects BI 4.2 platform
In This Session
• Example use case background
• Understanding what happened in the BI environment
• Troubleshooting the problems
• Forming a task force to solve the problems
• Wrap-up
What We’ll Cover
Example Use Case Background
• Based on a series of outages at a customer in 2017
• A large enterprise shared-service SAP BusinessObjects 4.2 landscape
  - 70k users in Production
  - Four on-premise tiers
  - One cloud tier
• All servers are virtual
• Multiple outages per week over a four-month period
Background
• SAP BusinessObjects Platform 4.2 SP02
  - Many private fixes/hotfixes
• SAP Analysis, edition for Microsoft Office 2.2.3
• SAP Design Studio 1.6 SP01
• SAP Lumira 1.30
• Tomcat 8.0 + non-default SAP JVM
• A10 load balancer
• Microsoft Windows Server 2012 R2
• SQL Server 2014
Landscape Details
• Sandbox tier: Two cloud servers
  - Dedicated DB server
• Development tier: Three on-premise servers (one each: web, core, proc)
• Acceptance tier: Three on-premise servers (one each: web, core, proc)
  - DEV/ACC share a database server
• Quality tier: Five on-premise servers (one web, one core, three proc)
  - Dedicated DB server
• Production tier: 10 on-premise servers (two web, two core, six proc)
  - Dedicated DB server
  - Load balancer
Landscape Details (cont.)
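The tier layout on this slide can be written down as a small data structure, which makes it easy to check the totals. This is only an illustrative sketch: the dictionary layout and the function names are assumptions, and database servers are left out of the counts because the slide lists them separately.

```python
# Application server counts per tier, as listed on the slide.
# Database servers are not included in these numbers.
TIERS = {
    "Sandbox": {"location": "cloud", "servers": 2},
    "Development": {"location": "on-premise", "servers": 3},
    "Acceptance": {"location": "on-premise", "servers": 3},
    "Quality": {"location": "on-premise", "servers": 5},
    "Production": {"location": "on-premise", "servers": 10},
}


def total_servers(tiers):
    """Sum the application servers across every tier."""
    return sum(tier["servers"] for tier in tiers.values())


def tiers_in(tiers, location):
    """Return the names of the tiers hosted in the given location."""
    return [name for name, tier in tiers.items() if tier["location"] == location]


print(total_servers(TIERS))
print(tiers_in(TIERS, "on-premise"))
print(tiers_in(TIERS, "cloud"))
```

The result agrees with the Background slide: four on-premise tiers and one cloud tier.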
• Users access BI Launchpad via:
  - Active Directory Single Sign-On (AD SSO)
  - SAP via NetWeaver® Single Sign-On (SAP SSO)
  - Manual enterprise authentication
• Connected to:
  - Nine BW systems using STS SSO
  - Six SAP HANA systems via SAML
  - Several other Oracle, SQL Server and DB2 databases
• Three satellite systems replicate content from Production
• 3000+ scheduled jobs per day
Authentication and Connected Data Sources
Understanding What Happened in the BI Environment
• After 18 months of stability, users started losing access to Production
• SSO attempts into BI Launchpad hung
• Attempts to log in manually (using CMC or /loginNoSso.jsp) would present the login page, but would also hang once user details were entered
• Scheduled jobs started to fail
• Once Production started having issues, other tiers (but not all tiers) started having similar issues: any attempt to log in would hang
• This started happening multiple times per week
• In addition, there was a surge in other issues on the landscape
What Happened?
Troubleshooting the Problems
• First we needed to isolate the issues
• In our example we have issues on Production (the priority) and the lower tiers
  - In Production, we have issues with end-user access and with back-end scheduling of jobs
• The quickest route to resolution is a full cluster restart
  - However, doing this immediately may be overkill and also removes any possibility of learning more about the issue
  - If you can, hold off on the full restart until you’ve learned more about the problem
    - There’s always a push from the business to have the system back up and running quickly, but there’s always the chance the problem will recur
    - Better to learn more
Troubleshooting – Where to Start?
• Does this occur for all users (AD, SAP, enterprise)?
  - Production: Yes
  - Other tiers: Yes
• Can the users access the system via client tools/Central Configuration Manager (CCM)?
  - Production: No
  - Other tiers: Yes
• Can you use a different environment’s Tomcat to log in to the impacted environments?
  - Production: No
  - Other tiers: Yes
• Do other web apps work (admin tools/dswsbobje)?
  - Production: No, although the pages are served
  - Other tiers: Yes
Isolate the Issues – A Summary
• Does the system resolve itself without a restart?
  - Production: No
  - Other tiers: On occasion, yes
• Does restarting Tomcat resolve the issue?
  - Production: No
  - Other tiers: Yes
• Does a full restart resolve the issue?
  - Production: Yes
  - Other tiers: Not required
Isolate the Issues – A Summary (cont.)
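The answers on the two summary slides above can be recorded as data, with a rough rule of thumb applied to each tier. This is a minimal sketch: the key names and the `likely_layer` rule are assumptions made for illustration, not an SAP-documented diagnostic.

```python
# Answers from the two "Isolate the Issues" slides.
# True = yes, False = no, None = not required.
OBSERVATIONS = {
    "Production": {
        "all_user_types_affected": True,
        "client_tools_work": False,
        "other_tier_tomcat_can_log_in": False,
        "other_web_apps_work": False,
        "recovers_without_restart": False,
        "tomcat_restart_fixes": False,
        "full_restart_fixes": True,
    },
    "Other tiers": {
        "all_user_types_affected": True,
        "client_tools_work": True,
        "other_tier_tomcat_can_log_in": True,
        "other_web_apps_work": True,
        "recovers_without_restart": True,  # "on occasion"
        "tomcat_restart_fixes": True,
        "full_restart_fixes": None,
    },
}


def likely_layer(obs):
    """Hypothetical rule of thumb: where does the problem most likely sit?"""
    if obs["client_tools_work"] and obs["tomcat_restart_fixes"]:
        # The back end still answers when Tomcat is bypassed.
        return "web application server (Tomcat)"
    if not obs["client_tools_work"] and not obs["tomcat_restart_fixes"]:
        # Nothing gets in, and restarting Tomcat changes nothing.
        return "back end (CMS and the services behind it)"
    return "undetermined"


for tier, obs in OBSERVATIONS.items():
    print(f"{tier}: {likely_layer(obs)}")
```

Writing the answers down this way makes the contrast explicit: Production and the other tiers show the same symptom to end users, but the checks point at different layers.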
• Tracing has two benefits
  - SAP will always ask for log files for server-side issues
  - If you know how to read the trace files, you can try to find the problem yourself
• SAP Client Plug-In is available from the SAP support site
  - Allows you to trace a single user session, and only that session, through the system
• To activate the tracing, launch the plug-in and launch IE from within the plug-in
  - Depending on your IT policies it may be best to run as administrator
• Once it’s running:
  1. Set the trace level to high, and
  2. Before trying to reproduce your issue, click “Start Transaction”
Tracing the Issues
• Download the plug-in from https://launchpad.support.sap.com/#/notes/1861180
SAP Client Plug-In
• There appears to be “virus-like” behaviour
• Once Production is in trouble, other tiers have issues
  - However, these are not the same issues
• When Production goes, it needs a full restart
• Outage seems to be caused by a breakdown in CORBA communications across the system
Initial Findings
• In the other tiers, the application is still accessible via means other than that system’s own Tomcat
  - There is no CORBA breakdown within the other tiers
• It’s not clear what is causing the breakdown in Production
• After several weeks of problems, we decided to form a task force to thoroughly investigate the issues
Initial Findings (cont.)
Forming a Task Force to Solve the Problems
• Internal support organisation coordinated the task force
• Representatives invited from:
  - SAP
  - Infrastructure provider
  - Network support
  - SMEs from an internal Centre of Excellence
• As outages were occurring across multiple tiers at the same time, shared features were identified and ruled in or out
• The following areas were investigated in parallel, led by the Centre of Excellence:
  - Application
  - Infrastructure
  - Database
  - Web application server
The Task Force
• As with all issues like this, the best place to start is the Client Plug-In from SAP
  - With this plug-in you can trace an individual workflow from beginning to end
• During the outages, PRD is inaccessible, so it’s not possible to interact with the system other than to attempt to log in
• To analyse trace files properly, you need a good working knowledge of how the system should behave
Application Investigation
• A good way to gain that knowledge is to compare working and non-working workflows
• A working login to the system will look like the trace excerpt below
Application Investigation (cont.)
• Compare that to a trace file from a login attempt during the outages
• As you can see from the above, nothing happens after the login attempt
  - In the working trace file you can see the workflow communicating with processes within the system
  - So, when the login attempt occurs, Tomcat cannot communicate with the other processes in the system
Application Investigation (cont.)
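The comparison described above, a working trace against a failing one, can be sketched in a few lines. The trace format here is hypothetical (pipe-delimited, with the component name in the third field) and does not reproduce the real BI4 trace layout; the column index, the sample lines and the function names are all assumptions for illustration.

```python
def components_seen(trace_lines, column=2, sep="|"):
    """Collect the component names that appear in a trace.

    Assumes a hypothetical delimited format where the component name
    sits in the given column; adjust to the real trace layout.
    """
    seen = set()
    for line in trace_lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) > column and fields[column].strip():
            seen.add(fields[column].strip())
    return seen


def missing_components(working, failing, **kwargs):
    """Components present in the working trace but absent from the failing one."""
    return sorted(components_seen(working, **kwargs) - components_seen(failing, **kwargs))


# Made-up sample lines, for illustration only.
working_trace = [
    "10:00:01|INFO|tomcat|login request received",
    "10:00:01|INFO|cms|logon request handled",
    "10:00:02|INFO|tomcat|session created",
]
failing_trace = [
    "11:30:15|INFO|tomcat|login request received",
]

print(missing_components(working_trace, failing_trace))
```

A component that shows up only in the working trace is a candidate for the point where communication stops.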
• Early in the investigation there was a theory that the threads were getting stuck on looking up other servers
  - As a result an investigation took place into DNS
• The relevant host IP addresses were already in place in the operating system’s hosts file
• No issue was found at DNS level
Infrastructure Investigation – DNS
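One simple way to test a theory like the one above is to time name resolution for each server from the machine that hosts Tomcat. The sketch below uses only the Python standard library; the function name and the returned dictionary layout are assumptions, and port 6400 is taken from the cluster entry shown later in the deck.

```python
import socket
import time


def time_lookup(hostname, port=6400):
    """Resolve a host name and report how long the lookup took."""
    start = time.perf_counter()
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
        addresses = sorted({info[4][0] for info in infos})
        error = None
    except socket.gaierror as exc:
        # Resolution failed; keep the reason so it can be reported.
        addresses, error = [], str(exc)
    elapsed = time.perf_counter() - start
    return {"host": hostname, "addresses": addresses, "seconds": elapsed, "error": error}


result = time_lookup("localhost")
print(result)
```

Running this for every host named in the landscape, during and outside an outage, would show whether lookups are slow or failing. Entries in the hosts file are normally picked up by the same resolver call, so this also confirms that those entries are being used.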
• Customer infrastructure runs on VMware virtualisation
• Working with the infrastructure provider, one potential area of interest was vMotion
  - vMotion is functionality within VMware that moves a VM from one host to another, depending on the resources remaining on the host
  - Therefore, by default, the application is sharing resources with other application servers
• Previously, a Tableau implementation at the customer had been destabilised whenever a vMotion event occurred for one of the servers
• From the vMotion logs we could tell that vMotion was occurring on a regular basis for all of the Production BI4 servers
  - The logs did not show these occurring at the time of outages
Infrastructure Investigation – Virtualisation
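Checking whether vMotion events line up with outages, as described above, comes down to comparing two lists of timestamps. The following is a sketch under stated assumptions: the timestamps are invented for illustration, and the 15-minute lead time is an arbitrary choice, not a value from the investigation.

```python
from datetime import datetime, timedelta


def events_near_outages(events, outages, lead_time=timedelta(minutes=15)):
    """Return (event, outage_start) pairs where an event falls inside an
    outage window, or within lead_time before the outage started."""
    hits = []
    for event in events:
        for start, end in outages:
            if start - lead_time <= event <= end:
                hits.append((event, start))
    return hits


# Hypothetical timestamps, not taken from the customer's logs.
vmotion_events = [
    datetime(2017, 6, 1, 2, 10),
    datetime(2017, 6, 3, 23, 45),
]
outages = [
    (datetime(2017, 6, 2, 9, 0), datetime(2017, 6, 2, 11, 30)),
]

print(events_near_outages(vmotion_events, outages))
```

An empty result, as with this sample data, matches what the logs showed in this case: the migrations happened regularly but not at the time of the outages.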
• However, the logs did show the VMs were not deployed on dedicated hosts
• A request was made to move the VMs for Production to dedicated hosts
• While this was not a part of the issue, it is an improvement that was made to the system as a result of the outages
Infrastructure Investigation – Virtualisation (cont.)
• At the time of the system build, the decision was made to run a single database server for both the CMS and audit databases
  - DB server has 8 CPUs and 40GB RAM
  - DB runs SQL Server 2014
  - Based on our sizing calculations this should have been enough to deal with the usage
• From checking the Event Viewer logs, we noticed errors stating there was not enough free memory to launch SQL Server Management tools
• Errors on the servers running the CMS processes also showed issues at DB level
  - SAP BusinessObjects BI platform CMS: Lost a CMS system database connection to "PRDCMS" – 8 CMS system database connections are remaining
    - Reason: [Microsoft][SQL Server Native Client 11.0]TCP Provider: The semaphore timeout period has expired.
• The default number of connections to the database is 14, which is set at application level
Database Investigation
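Messages like the one quoted above can be pulled out of exported event logs to see how far the connection pool has dropped. This is a minimal sketch: the regular expression is written against the single message shown on the slide, the function name is an assumption, and the "lost" figure is only meaningful if the pool was at the default of 14 to begin with.

```python
import re

DEFAULT_CONNECTIONS = 14  # default stated on the slide

# Tolerates doubled quotes around the database name, which can appear
# when the event log is exported to CSV.
LOST_CONNECTION = re.compile(
    r'Lost a CMS system database connection to "+(?P<db>[^"]+)"+'
    r"\D+(?P<remaining>\d+) CMS system database connections are remaining"
)


def parse_lost_connection(message):
    """Extract the database name and remaining connection count, or None."""
    match = LOST_CONNECTION.search(message)
    if match is None:
        return None
    remaining = int(match.group("remaining"))
    return {
        "database": match.group("db"),
        "remaining": remaining,
        "lost": DEFAULT_CONNECTIONS - remaining,
    }


sample = (
    'SAP BusinessObjects BI platform CMS: Lost a CMS system database connection '
    'to "PRDCMS" – 8 CMS system database connections are remaining'
)
print(parse_lost_connection(sample))
```

For the message on the slide, eight connections remaining out of a default of 14 means six had already been dropped.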
• Upon closer inspection, the SQL Server process (sqlservr.exe) consumed 36GB from server start-up
• The DBAs had configured SQL Server to use 36GB RAM at all times
  - This was not giving Windows enough room to operate efficiently
Database Investigation (cont.)
• With this setting, SQL Server will hold as much of the databases in memory as the allocation allows
  - However, the CMS DB was 5GB in size and the audit DB was 80GB in size
  - Therefore, the full extent of the databases could not be loaded into memory
• Would the CMS and audit databases benefit from existing in memory?
  - CMS DB would benefit from existing in memory
  - Audit DB was being extracted on a regular basis to be stored in HANA, so the copy in SQL Server was only for the most recent information
  - As a result, there is no real benefit from the audit DB being loaded into memory
Database Investigation (cont.)
• When SQL Server loads databases into memory it will still have to flush the information from memory to the data files on the system
  - Theory: When this happened for the audit DB, due to resource sharing, it caused delayed responses to CMS process requests
    - This had a subsequent impact on the stability of the system, causing the breakdown in CORBA communication
• Unfortunately, the logging available on SQL Server was not capable of showing us these flushes and whether or not they corresponded to the outages
• Regardless of whether or not this is the root cause of the issue, the SQL Server database is evidently struggling for resources
Database Investigation (cont.)
• Three recommendations came out of this part of the investigation
  1. Increase the RAM on the SQL Server database server without increasing the SQL Server memory allocation
     - Short-term solution: Can be completed quickly with VMware and will give the operating system more resources
  2. Create a SQL Server cluster to run the CMS and audit DBs, keeping them in a single clustered instance
     - Not an option supplied by the infrastructure provider
     - To have this added to the catalogue would have required a long project, at which point the hardware would still have to be ordered
  3. Add a new database server to run the audit database
     - Medium-term solution: Quicker than Option 2, but still longer than Option 1
Database Investigation (cont.)
• Following Recommendation 1 resolved the CORBA communication breakdown
  - Chosen as the fastest route to resolution, while Recommendation 3 would be implemented further down the line
• RAM was increased by 8GB
• No changes were made to the SQL Server RAM usage
Database Investigation (cont.)
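The memory figures from the database slides can be worked through as simple arithmetic. The numbers below are the ones quoted in the deck; the variable and function names are illustrative, and "headroom" here simply means server RAM minus the fixed SQL Server allocation.

```python
# Figures quoted on the database slides, all in GB.
TOTAL_RAM_BEFORE = 40
SQL_SERVER_ALLOCATION = 36
RAM_ADDED = 8
CMS_DB_SIZE = 5
AUDIT_DB_SIZE = 80


def os_headroom(total_ram, sql_allocation):
    """RAM left for Windows and everything else on the server."""
    return total_ram - sql_allocation


headroom_before = os_headroom(TOTAL_RAM_BEFORE, SQL_SERVER_ALLOCATION)
headroom_after = os_headroom(TOTAL_RAM_BEFORE + RAM_ADDED, SQL_SERVER_ALLOCATION)
combined_db_size = CMS_DB_SIZE + AUDIT_DB_SIZE

print(f"Headroom before the change: {headroom_before} GB")
print(f"Headroom after the change:  {headroom_after} GB")
print(f"Combined database size:     {combined_db_size} GB")
print(f"Both databases fit in the allocation: {combined_db_size <= SQL_SERVER_ALLOCATION}")
print(f"CMS DB alone fits in the allocation:  {CMS_DB_SIZE <= SQL_SERVER_ALLOCATION}")
```

Adding 8GB while leaving the allocation at 36GB triples what is left for the operating system, from 4GB to 12GB. The 85GB of combined data never fits in the allocation, whereas the 5GB CMS database does comfortably.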
• While the database looks like the culprit, we still have the puzzle of the virus-like behaviour
• The lower tiers were only impacted at Tomcat level, after Production Tomcats were already impacted
• In Production there was a load balancer in front of the two Tomcats
  - This was the only tier where there was a load balancer
• Our infrastructure provider suggested we test new settings directly in Production
  - Led to instability and a huge number of sessions in Tomcat Manager that started causing issues with the MDAS
  - As a result the decision was made to add a new load balancer in QUA to allow testing in lower tiers
Web Application Server Investigation
• The result of the increase in MDAS Tomcat sessions led to a discussion around the load balancer model
  - Our load balancer was not certified by SAP at this time
  - SAP and the load balancer vendor worked together to get the load balancer certified and provide best practice settings
• Network monitoring was installed on several servers in all tiers to try to determine a link between the environments
• Separate BI4 environments should not talk to each other unless directed to by an end user (e.g., Promotion Management)
  - We found that the Tomcat servers in affected lower tiers were trying to communicate at a CORBA level with Production during a login workflow, which should not happen
    - Log files revealed that Tomcat was talking to several different environments
Web Application Server Investigation (cont.)
• Checking the log files in the lower tiers, we found the systems that suffered outages always seemed to have a reference to Production
• Log files showed Tomcat reads the other clusters from the system
Web Application Server Investigation (cont.)
• A network monitoring tool was installed on several servers
• Output of this tool shows lower tiers are communicating with the higher tiers
Web Application Server Investigation (cont.)
• The only place in the Tomcat setup that would identify other BI4 systems was the clusterinfo.1400.properties file
  - This is a file stored in Tomcat\logs\.businessobjects, used to resolve a cluster name (@clustername) when logging into a BI4 system
  - Typically contains info such as:
    - @bi-prd=192.168.1.101:6400;server1.bibrainz.com:6400;192.168.1.102:6400;server2.bibrainz.com:6400;36d51257319b49c13d1bb05f48608164
Web Application Server Investigation (cont.)
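An entry like the one above can be split into its parts to see which clusters and hosts a Tomcat server knows about. This sketch is based only on the single example line on the slide: the function name is an assumption, and the trailing token is treated as opaque because the deck does not say what it represents.

```python
def parse_cluster_entry(line):
    """Split one clusterinfo entry into cluster name, members and other tokens."""
    name, _, value = line.strip().partition("=")
    parts = [part for part in value.split(";") if part]
    # host:port pairs are the cluster members; anything else is kept as-is.
    members = [part for part in parts if ":" in part]
    other_tokens = [part for part in parts if ":" not in part]
    return {"cluster": name, "members": members, "other_tokens": other_tokens}


entry = (
    "@bi-prd=192.168.1.101:6400;server1.bibrainz.com:6400;"
    "192.168.1.102:6400;server2.bibrainz.com:6400;"
    "36d51257319b49c13d1bb05f48608164"
)
parsed = parse_cluster_entry(entry)
print(parsed["cluster"])
for member in parsed["members"]:
    print("  ", member)
```

Listing the cluster names found in the file on a lower-tier Tomcat server is a quick way to confirm whether it holds a reference to Production.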
• When the BOE web application then tried to run a ManagedService() call against PRD, the threads got stuck
  - There’s no way to restrict access through the application to this file
  - Any changes to the file would be performed by the Service Account, so the permissions couldn’t be locked down at OS level
  - Permissions would be updated any time a user tries to resolve a different system name in the application (e.g., Promotion Management, again)
• Solution
  - Make the file read-only
  - After making this change the “virus-like” behaviour ceased
    - If Production went down, only Production went down
Web Application Server Investigation (cont.)
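Setting a file read-only, as in the solution above, can be scripted so that it is repeatable across servers. The sketch below uses the Python standard library and a temporary file as a stand-in; the function names are assumptions, and the deck does not say how the change was actually made. On Windows, `os.chmod` only controls the read-only flag, which is the attribute that matters here.

```python
import os
import stat
import tempfile


def make_read_only(path):
    """Clear every write bit on the file (sets the read-only flag on Windows)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def is_read_only(path):
    """True when the owner write bit is not set."""
    return not (os.stat(path).st_mode & stat.S_IWUSR)


# Demonstrate on a temporary stand-in file rather than a real server file.
handle, demo_path = tempfile.mkstemp(suffix=".properties")
os.close(handle)

make_read_only(demo_path)
print(is_read_only(demo_path))

# Restore write access so the stand-in file can be removed again.
os.chmod(demo_path, stat.S_IRUSR | stat.S_IWUSR)
os.remove(demo_path)
```

A check like `is_read_only` is also useful in routine monitoring, to catch the flag being reset after a patch or redeployment.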
• Monitoring is a big topic and one that comes up regularly at customers
  - SAP encouraged the implementation of SAP Solution Manager
    - This was already in progress; however, an error in the NCS instrumentation functionality within BI4 had other effects
      - Scheduled jobs would begin to fail regularly when the NCS instrumentation was set
      - The only way to resolve this was a restart after setting the instrumentation to 0
• At our customer there are various monitoring tools in place
  - Some from the infrastructure provider, some from the support organisation, but none were particularly useful
Other Areas Investigated
Wrap-up
• Two distinct issues
  - Production outage
  - Outages on the lower tiers (“virus-like” behaviour)
• Production outage was caused by instabilities in the CMS
  - There were resource issues on the CMS and audit DB server, and once this destabilised the CMS, the CORBA communication across the application failed
    - Increasing the resources available on the database server resolved the instabilities
• Outages in the lower tiers were caused by the web application server attempting to communicate with the Production system, using information it read from the clusterinfo.1400.properties file
  - Clearing the file out and only leaving the information that was needed in the file was the first step
  - The file can then be set to read-only
What Was the Cause of Instability?
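The first step described above, clearing the file down to the entries that are actually needed, can be sketched as a filter over its lines. The function name, the cluster name `@bi-dev` and the `example.com` hosts are hypothetical; only the `@name=...` entry shape is taken from the example in the deck.

```python
def keep_only(lines, allowed_clusters):
    """Keep blank lines, comments, and entries whose cluster name is allowed."""
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "!")):
            # Blank lines and properties-style comments pass through.
            kept.append(line)
            continue
        cluster_name = stripped.partition("=")[0]
        if cluster_name in allowed_clusters:
            kept.append(line)
    return kept


# Hypothetical file contents for a Development Tomcat server.
original = [
    "# cluster lookup entries",
    "@bi-dev=dev1.example.com:6400;0123abcd",
    "@bi-prd=prd1.example.com:6400;prd2.example.com:6400;4567ef01",
]

for line in keep_only(original, {"@bi-dev"}):
    print(line)
```

In this sketch the Production entry is dropped and the local entry survives, which is the state the file should be in before it is set to read-only.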
• Understanding the behaviour of the virtualisation software led to requesting dedicated hosts for the Production environment
• While the vMotion functionality was not found to be a cause, it’s an on-going concern for the support organisation
  - Will do performance testing to investigate pros and cons of keeping this functionality active for the BI4 application
• One recommendation from the database investigation was to add a separate audit DB server
  - While this was a medium-term solution, it would help separate some of the issues with the audit DB from the CMS
• Investigation into the load balancer led to:
  - Realisation that the provided load balancer was not certified by SAP
  - SAP and the load balancer vendor working together to provide best practice configuration settings, which have improved the performance of the application
Other Improvements
• https://launchpad.support.sap.com/#/notes/1861180
  - SAP Client Plug-In
    - Requires login credentials to the SAP ONE Support Launchpad
• https://sapinsider.wispubs.com/Assets/Q-and-As/2014/November/QA-with-Johann-Kottas-on-BI-Reporting-Performance
  - “What’s Slowing Down Your BI Reports? Q&A on End-to-End Analysis of SAP BusinessObjects BI Performance with Johann Kottas” (SAPinsider, November 2014).
• https://blogs.sap.com/2016/07/12/solutions-to-optimise-the-performance-in-sap-businessobjects-reports-and-dashboard/
  - “Solutions to Optimise the Performance in SAP BusinessObjects Reports and Dashboard” (SAP Community Blogs, July 2016).
• https://wiki.scn.sap.com/wiki/display/BOBJ/Sizing+and+Deploying+SAP+BusinessObjects+BI+4.x+Platform+and+Add-Ons
  - “Sizing and Deploying SAP BusinessObjects BI 4.x Platform and Add-Ons” (SAP Community WIKI, April 2018).
Where to Find More Information
Key Points to Take Home
• Isolate the issues to determine where your problems are
• Understand your full environment: Application, infrastructure, web application, and database
• Every part matters, from the resources available to your database server to the way your application is used
• Understanding trace files can help pinpoint causes
• When investigating stability issues, you may stumble across improvements that can be applied to your environment
Please remember to complete your session evaluation
Thank You
Any Questions?
Martin Macmaster
martin.macmaster@bibrainz.com
@MartBIBrainz
Your Turn!
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names mentioned are the trademarks of their respective companies. Wellesley Information Services is neither owned nor controlled by SAP SE.
Disclaimer
Wellesley Information Services, 20 Carematrix Drive, Dedham, MA 02026

More Related Content

What's hot

Monitor every app, in every stage, with free and open Elastic APM
Monitor every app, in every stage, with free and open Elastic APMMonitor every app, in every stage, with free and open Elastic APM
Monitor every app, in every stage, with free and open Elastic APMElasticsearch
 
Microsoft power platform
Microsoft power platformMicrosoft power platform
Microsoft power platformJenkins NS
 
[Webinar] WSO2 Enterprise Integrator 7.1.0 Release
[Webinar] WSO2 Enterprise Integrator 7.1.0 Release[Webinar] WSO2 Enterprise Integrator 7.1.0 Release
[Webinar] WSO2 Enterprise Integrator 7.1.0 ReleaseWSO2
 
SAP Cloud Platform - The Business Platform for the Intelligent Enterprise
SAP Cloud Platform - The Business Platform for the Intelligent EnterpriseSAP Cloud Platform - The Business Platform for the Intelligent Enterprise
SAP Cloud Platform - The Business Platform for the Intelligent EnterpriseSAP Cloud Platform
 
Splunk Distributed Management Console
Splunk Distributed Management Console                                         Splunk Distributed Management Console
Splunk Distributed Management Console Splunk
 
Introduction to Microsoft SharePoint Online Capabilities, Security, Deploymen...
Introduction to Microsoft SharePoint Online Capabilities, Security, Deploymen...Introduction to Microsoft SharePoint Online Capabilities, Security, Deploymen...
Introduction to Microsoft SharePoint Online Capabilities, Security, Deploymen...Microsoft Private Cloud
 
SAP Cloud Platform Product Overview
SAP Cloud Platform Product OverviewSAP Cloud Platform Product Overview
SAP Cloud Platform Product OverviewSAP Cloud Platform
 
Infrastructure as Code for Beginners
Infrastructure as Code for BeginnersInfrastructure as Code for Beginners
Infrastructure as Code for BeginnersDavid Völkel
 
Office 365 and using SharePoint Online
Office 365 and using SharePoint OnlineOffice 365 and using SharePoint Online
Office 365 and using SharePoint OnlineCliff Ashcroft
 
Splunk App for Stream
Splunk App for StreamSplunk App for Stream
Splunk App for StreamSplunk
 
API Strategy Introduction
API Strategy IntroductionAPI Strategy Introduction
API Strategy IntroductionDoug Gregory
 
SCCM_Overview_Updated.pptx
SCCM_Overview_Updated.pptxSCCM_Overview_Updated.pptx
SCCM_Overview_Updated.pptxVasanVasanth2
 
Mainframe Modernization with AWS: Patterns and Best Practices
Mainframe Modernization with AWS: Patterns and Best PracticesMainframe Modernization with AWS: Patterns and Best Practices
Mainframe Modernization with AWS: Patterns and Best PracticesAmazon Web Services
 
OpenText Content Suite Platform and OpenText Extended ECM: What’s New in Rele...
OpenText Content Suite Platform and OpenText Extended ECM: What’s New in Rele...OpenText Content Suite Platform and OpenText Extended ECM: What’s New in Rele...
OpenText Content Suite Platform and OpenText Extended ECM: What’s New in Rele...OpenText
 
Oracle SOA Suite Overview - Integration in a Service-Oriented World
Oracle SOA Suite Overview - Integration in a Service-Oriented WorldOracle SOA Suite Overview - Integration in a Service-Oriented World
Oracle SOA Suite Overview - Integration in a Service-Oriented WorldOracleContractors
 

What's hot (20)

Introduction to SAP BTP
Introduction to SAP BTPIntroduction to SAP BTP
Introduction to SAP BTP
 
Monitor every app, in every stage, with free and open Elastic APM
Monitor every app, in every stage, with free and open Elastic APMMonitor every app, in every stage, with free and open Elastic APM
Monitor every app, in every stage, with free and open Elastic APM
 
Microsoft power platform
Microsoft power platformMicrosoft power platform
Microsoft power platform
 
[Webinar] WSO2 Enterprise Integrator 7.1.0 Release
[Webinar] WSO2 Enterprise Integrator 7.1.0 Release[Webinar] WSO2 Enterprise Integrator 7.1.0 Release
[Webinar] WSO2 Enterprise Integrator 7.1.0 Release
 
SAP Cloud Platform - The Business Platform for the Intelligent Enterprise
SAP Cloud Platform - The Business Platform for the Intelligent EnterpriseSAP Cloud Platform - The Business Platform for the Intelligent Enterprise
SAP Cloud Platform - The Business Platform for the Intelligent Enterprise
 
Splunk Distributed Management Console
Splunk Distributed Management Console                                         Splunk Distributed Management Console
Splunk Distributed Management Console
 
Introduction to Microsoft SharePoint Online Capabilities, Security, Deploymen...
Introduction to Microsoft SharePoint Online Capabilities, Security, Deploymen...Introduction to Microsoft SharePoint Online Capabilities, Security, Deploymen...
Introduction to Microsoft SharePoint Online Capabilities, Security, Deploymen...
 
Ui path| RPA
Ui path| RPAUi path| RPA
Ui path| RPA
 
SAP Cloud Platform Product Overview
SAP Cloud Platform Product OverviewSAP Cloud Platform Product Overview
SAP Cloud Platform Product Overview
 
Infrastructure as Code for Beginners
Infrastructure as Code for BeginnersInfrastructure as Code for Beginners
Infrastructure as Code for Beginners
 
SAP on Azure - Deck
SAP on Azure - DeckSAP on Azure - Deck
SAP on Azure - Deck
 
Office 365 and using SharePoint Online
Office 365 and using SharePoint OnlineOffice 365 and using SharePoint Online
Office 365 and using SharePoint Online
 
Splunk App for Stream
Splunk App for StreamSplunk App for Stream
Splunk App for Stream
 
API Strategy Introduction
API Strategy IntroductionAPI Strategy Introduction
API Strategy Introduction
 
SCCM_Overview_Updated.pptx
SCCM_Overview_Updated.pptxSCCM_Overview_Updated.pptx
SCCM_Overview_Updated.pptx
 
Mainframe Modernization with AWS: Patterns and Best Practices
Mainframe Modernization with AWS: Patterns and Best PracticesMainframe Modernization with AWS: Patterns and Best Practices
Mainframe Modernization with AWS: Patterns and Best Practices
 
OpenText Content Suite Platform and OpenText Extended ECM: What’s New in Rele...
OpenText Content Suite Platform and OpenText Extended ECM: What’s New in Rele...OpenText Content Suite Platform and OpenText Extended ECM: What’s New in Rele...
OpenText Content Suite Platform and OpenText Extended ECM: What’s New in Rele...
 
Oracle SOA Suite Overview - Integration in a Service-Oriented World
Oracle SOA Suite Overview - Integration in a Service-Oriented WorldOracle SOA Suite Overview - Integration in a Service-Oriented World
Oracle SOA Suite Overview - Integration in a Service-Oriented World
 
SAP Business One
SAP Business OneSAP Business One
SAP Business One
 
Overview of SharePoint Server 2019 Public Preview
Overview of SharePoint Server 2019 Public PreviewOverview of SharePoint Server 2019 Public Preview
Overview of SharePoint Server 2019 Public Preview
 

Similar to How to Stabilise and Improve an SAP BusinessObjects BI 4.2 Enterprise Shared Service Environment

IBM Connections – Managing Growth and Expansion
IBM Connections – Managing Growth and ExpansionIBM Connections – Managing Growth and Expansion
IBM Connections – Managing Growth and ExpansionLetsConnect
 
Building block development in managed hosting - Angelo Rossi, Manager, Comple...
Building block development in managed hosting - Angelo Rossi, Manager, Comple...Building block development in managed hosting - Angelo Rossi, Manager, Comple...
Building block development in managed hosting - Angelo Rossi, Manager, Comple...Blackboard APAC
 
Why retail companies can't afford database downtime
Why retail companies can't afford database downtimeWhy retail companies can't afford database downtime
Why retail companies can't afford database downtimeDBmaestro - Database DevOps
 
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...BI Brainz
 
Instalacion de windows server 2012
Instalacion de windows server 2012Instalacion de windows server 2012
Instalacion de windows server 2012Salazar Jorge
 
Citrix and Desktop Migration Success
Citrix and Desktop Migration SuccessCitrix and Desktop Migration Success
Citrix and Desktop Migration SuccesseG Innovations
 
SharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi VončinaSharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi VončinaSPC Adriatics
 
SharePoint 2016 Platform Adoption Lessons Learned and Advanced Troubleshooting
SharePoint 2016 Platform Adoption   Lessons Learned and Advanced TroubleshootingSharePoint 2016 Platform Adoption   Lessons Learned and Advanced Troubleshooting
SharePoint 2016 Platform Adoption Lessons Learned and Advanced TroubleshootingJohn Calvert
 
Cognos Performance Tuning Tips & Tricks
Cognos Performance Tuning Tips & TricksCognos Performance Tuning Tips & Tricks
Cognos Performance Tuning Tips & TricksSenturus
 
SPSNYC SharePoint Worst Practices
SPSNYC SharePoint Worst PracticesSPSNYC SharePoint Worst Practices
SPSNYC SharePoint Worst PracticesScott Hoag
 
Road to agile: federal government case study
Road to agile: federal government case studyRoad to agile: federal government case study
Road to agile: federal government case studyDavid Marsh
 
Windows 2012 R2 Multi Server Management
Windows 2012 R2 Multi Server ManagementWindows 2012 R2 Multi Server Management
Windows 2012 R2 Multi Server ManagementSharkrit JOBBO
 
Performance Testing
Performance TestingPerformance Testing
Performance TestingAnu Shaji
 
Performance Tuning in the Trenches
Performance Tuning in the TrenchesPerformance Tuning in the Trenches
Performance Tuning in the TrenchesDonald Belcham
 
Open source: Top issues in the top enterprise packages
Open source: Top issues in the top enterprise packagesOpen source: Top issues in the top enterprise packages
Open source: Top issues in the top enterprise packagesRogue Wave Software
 
IBM Connect 2017: Back from the Dead: When Bad Code Kills a Good Server
IBM Connect 2017: Back from the Dead: When Bad Code Kills a Good ServerIBM Connect 2017: Back from the Dead: When Bad Code Kills a Good Server
IBM Connect 2017: Back from the Dead: When Bad Code Kills a Good ServerSerdar Basegmez
 
Adding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance TestAdding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance TestRodolfo Kohn
 
Why advanced monitoring is key for healthy
Why advanced monitoring is key for healthyWhy advanced monitoring is key for healthy
Why advanced monitoring is key for healthyDenodo
 

How to Stabilise and Improve an SAP BusinessObjects BI 4.2 Enterprise Shared Service Environment

  • 1. How to Stabilise and Improve an SAP BusinessObjects BI 4.2 Enterprise Shared Service Environment
      Martin Macmaster, BI Brainz
      Produced by Wellesley Information Services, LLC, publisher of SAPinsider. © 2018 Wellesley Information Services. All rights reserved.
  • 2. In This Session
      • Learn how to investigate your SAP BusinessObjects BI 4.2 environment and diagnose issues causing outages and stability problems
      • Understand the various options available to resolve the issues you find and to stabilise your SAP BusinessObjects BI 4.2 environment
      • Consider factors which could have led to the issues on your landscape, and processes and safeguards you can put into place to avoid future issues
      • Identify areas that can be improved to boost the resilience of your SAP BusinessObjects BI 4.2 platform
  • 3. What We’ll Cover
      • Example use case background
      • Understanding what happened in the BI environment
      • Troubleshooting the problems
      • Forming a task force to solve the problems
      • Wrap-up
  • 4. Example Use Case Background
  • 5. Background
      • Based on a series of outages at a customer in 2017
      • A large enterprise shared-service SAP BusinessObjects 4.2 landscape
          - 70k users in Production
          - Four on-premise tiers
          - One cloud tier
      • All servers are virtual
      • Multiple outages per week over a four-month period
  • 6. Landscape Details
      • SAP BusinessObjects Platform 4.2 SP02
          - Many private fixes/hotfixes
      • SAP Analysis, edition for MS Office 2.2.3
      • SAP Design Studio 1.6 SP01
      • SAP Lumira 1.30
      • Tomcat 8.0 + non-default SAP JVM
      • A10 load balancer
      • Microsoft Windows 2012 R2
      • SQL Server 2014
  • 7. Landscape Details (cont.)
      • Sandbox tier: Two cloud servers
          - Dedicated DB server
      • Development tier: Three on-premise servers – one each web, core, proc
      • Acceptance tier: Three on-premise servers – one each web, core, proc
          - DEV/ACC share a database server
      • Quality tier: Five on-premise servers – one web, one core, three proc
          - Dedicated DB server
      • Production tier: 10 on-premise servers – two web, two core, six proc
          - Dedicated DB server
          - Load balancer
  • 8. Authentication and Connected Data Sources
      • Users access BI Launchpad via:
          - Active Directory Single Sign-On (AD SSO)
          - SAP via NetWeaver® Single Sign-On (SAP SSO)
          - Manual enterprise authentication
      • Connected to:
          - Nine BW systems using STS SSO
          - Six SAP HANA systems via SAML
          - Several other Oracle, SQL Server and DB2 databases
      • Three satellite systems replicate content from Production
      • 3000+ scheduled jobs per day
  • 9. Understanding What Happened in the BI Environment
  • 10. What Happened?
      • After 18 months of stability, users started losing access to Production
      • SSO attempts into BI Launchpad hung
      • Attempts to log in manually (using CMC or /loginNoSso.jsp) would present the login page, but on entering user details it would also hang
      • Scheduled jobs started to fail
      • Once Production started having issues, other tiers (but not all tiers) started having similar issues – any attempt to log in would hang
      • Started happening multiple times per week
      • In addition, there was a surge in other issues on the landscape
  • 12. Troubleshooting – Where to Start?
      • First we needed to isolate the issues
      • In our example we have issues on Production (the priority) and the lower tiers
          - In Production, we have issues with end-user access and back-end scheduling of jobs
      • The quickest route to resolution is to do a full cluster restart
          - However, doing this immediately may be overkill and also removes any possibility of learning more about the issue
          - Hold off on the full restart until you’ve learned more about the problem, if you can
              - There’s always a push from the business to have the system back up and running quickly, but there’s always the chance the issue will recur
              - Better to learn more
  • 13. Isolate the Issues – A Summary
      • Does this occur for all users (AD, SAP, enterprise)?
          - Production: Yes
          - Other tiers: Yes
      • Can the users access the system via client tools/Central Configuration Manager (CCM)?
          - Production: No
          - Other tiers: Yes
      • Can you use a different environment’s Tomcat to log in to the impacted environments?
          - Production: No
          - Other tiers: Yes
      • Do other web apps work (admin tools/dswsbobje)?
          - Production: No – although the pages are served
          - Other tiers: Yes
  • 14. Isolate the Issues – A Summary (cont.)
      • Does the system resolve itself without a restart?
          - Production: No
          - Other tiers: On occasion, yes
      • Does restarting Tomcat resolve the issue?
          - Production: No
          - Other tiers: Yes
      • Does a full restart resolve the issue?
          - Production: Yes
          - Other tiers: Not required
  • 15. Tracing the Issues
      • Tracing has two benefits
          - SAP will always ask for log files for server-side issues
          - If you know how to read the trace files, you can try to find the problem yourself
      • SAP Client Plug-In is available from the SAP support site
          - Allows you to trace a single user session – and only that session – through the system
      • To activate the tracing, launch the plug-in and launch IE from within the plug-in
          - Depending on your IT policies it may be best to run as administrator
      • Once it’s running:
          1. Set the trace level to high
          2. Before trying to reproduce your issue, click “Start Transaction”
  • 16. SAP Client Plug-In
      • Download the plug-in from https://launchpad.support.sap.com/#/notes/1861180
  • 17. Initial Findings
      • There appears to be “virus-like” behaviour
      • Once Production is in trouble, other tiers have issues
          - However, these are not the same issues
      • When Production goes down, it needs a full restart
      • The outage seems to be caused by a breakdown in CORBA communications across the system
  • 18. Initial Findings (cont.)
      • In the other tiers, the application is still accessible via means other than that system’s own Tomcat
          - There is no CORBA breakdown within the other tiers
      • It’s not clear what is causing the breakdown in Production
      • After several weeks of problems, we decided to form a task force to thoroughly investigate the issues
  • 19. Forming a Task Force to Solve the Problems
  • 20. The Task Force
      • Internal support organisation coordinated the task force
      • Representatives invited from:
          - SAP
          - Infrastructure provider
          - Network support
          - SMEs from an internal Centre of Excellence
      • As outages were occurring across multiple tiers at the same time, shared features were identified and ruled in or out
      • The following areas were investigated in parallel, led by the Centre of Excellence:
          - Application
          - Infrastructure
          - Database
          - Web application server
  • 21. Application Investigation
      • As with all issues like this, the best place to start is the Client Plug-In from SAP
          - With this plug-in you can trace an individual workflow from beginning to end
      • During the outages, PRD is inaccessible, so it’s not possible to interact with the system other than to attempt to log in
      • Properly analysing trace files requires a good working knowledge of how the system should behave
  • 22. Application Investigation (cont.)
      • A good place to start gaining that knowledge is to compare working and non-working workflows
      • A working login to the system will look like the trace excerpt below
  • 23. Application Investigation (cont.)
      • Compare that to a trace file from a login attempt during the outages
      • As you can see from the above, after the attempt to log in nothing happens
          - In the working trace file you can see the workflow communicating with processes within the system
          - So, when the login attempt occurs, Tomcat cannot communicate with the other processes in the system
  • 24. Infrastructure Investigation – DNS
      • Early in the investigation there was a theory that the threads were getting stuck on looking up other servers
          - As a result an investigation took place into DNS
      • The relevant host IP addresses were already in place in the operating system’s hosts file
      • No issue was found at DNS level
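A lookup theory like the one above can be checked with a small script that resolves each server name and times the call. The following is a minimal sketch, not the tooling used in the investigation: the function name `check_resolution` and the injectable `resolver` parameter are assumptions introduced here for illustration.

```python
import socket
import time


def check_resolution(hosts, resolver=socket.gethostbyname):
    """Resolve each host name, recording the address (or error) and elapsed seconds.

    `resolver` is injectable (a hypothetical convenience) so the function can be
    exercised without touching the network.
    """
    results = {}
    for host in hosts:
        started = time.perf_counter()
        try:
            address = resolver(host)
            error = None
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            address = None
            error = str(exc)
        results[host] = {
            "address": address,
            "error": error,
            "seconds": time.perf_counter() - started,
        }
    return results
```

Calling `check_resolution(["server1.bibrainz.com", "server2.bibrainz.com"])` on each BI server would show whether any name fails to resolve or resolves slowly; slow or failing entries would support the stuck-lookup theory, while fast, consistent answers (as found here) would rule it out.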
  • 25. Infrastructure Investigation – Virtualisation
      • Customer infrastructure runs on VMware virtualisation
      • Working with the infrastructure provider, one potential area of interest was vMotion
          - vMotion is functionality within VMware that moves a VM from one host to another host depending on the resources remaining on the host
          - Therefore, by default, the application is sharing resources with other application servers
      • Previously, a Tableau implementation at the customer had been destabilised whenever a vMotion event occurred for one of the servers
      • From the vMotion logs we could tell that vMotion was occurring on a regular basis for all of the Production BI4 servers
          - The logs did not show these occurring at the time of outages
  • 26. Infrastructure Investigation – Virtualisation (cont.)
      • However, the logs did show the VMs were not deployed on dedicated hosts
      • A request was made to move the VMs for Production to dedicated hosts
      • While this was not a part of the issue, it is an improvement that was made to the system as a result of the outages
  • 27. Database Investigation
      • At the time of the system build, the decision was made to run a single database server for both CMS and audit databases
          - DB server is 8 CPU, 40GB RAM
          - DB runs SQL Server 2014
          - Based on our sizing calculations this should have been enough to deal with the usage
      • From checking the event viewer logs, we noticed errors stating there was not enough free memory to launch SQL Server Management tools
      • Errors on the servers running the CMS processes also showed issues on DB level
          - SAP BusinessObjects BI platform CMS: Lost a CMS system database connection to "PRDCMS" – 8 CMS system database connections are remaining
              - Reason: [Microsoft][SQL Server Native Client 11.0]TCP Provider: The semaphore timeout period has expired.
      • The default number of connections to the database is 14, which is set at application level
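The error message quoted above can be picked out of exported event log text to see how far the connection pool has drained. This is a minimal sketch under stated assumptions: the pattern is built only from the single message shown on the slide, the helper name `parse_lost_connection` is hypothetical, and the "lost" figure assumes the pool was at the default of 14.

```python
import re

# Default CMS system database connection count stated on the slide.
DEFAULT_CMS_DB_CONNECTIONS = 14

# Pattern derived from the one message quoted on the slide (an assumption:
# other BI versions may word it differently). \W+ absorbs the dash separator.
LOST_CONNECTION = re.compile(
    r'Lost a CMS system database connection to "+(?P<db>[^"]+)"+'
    r"\W+(?P<remaining>\d+) CMS system database connections are remaining"
)


def parse_lost_connection(message, pool_size=DEFAULT_CMS_DB_CONNECTIONS):
    """Return database name, remaining and lost connections, or None if no match."""
    match = LOST_CONNECTION.search(message)
    if match is None:
        return None
    remaining = int(match.group("remaining"))
    return {
        "database": match.group("db"),
        "remaining": remaining,
        "lost": pool_size - remaining,
    }
```

For the message on the slide this reports 8 connections remaining, which against the default of 14 would mean 6 had already been lost.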
  • 28. Database Investigation (cont.)
      • Upon closer inspection, the SQLServer.exe process consumed 36GB from server start-up
      • The DBAs had configured SQL Server to use 36GB RAM at all times
          - This left Windows only 4GB of the server’s 40GB, not enough room to operate efficiently
  • 29. Database Investigation (cont.)
      • While using this setting, SQL Server will load all databases into memory
          - However, the CMS DB was 5GB in size, the audit DB was 80GB in size
          - Therefore, the full extent of the databases could not be loaded into memory
      • Would the CMS and audit databases benefit from existing in memory?
          - CMS DB would benefit from existing in memory
          - Audit DB was being extracted on a regular basis to be stored in HANA, so the copy in SQL Server was only for the most recent information
          - As a result, there is no real benefit from the audit DB being loaded into memory
  • 30. Database Investigation (cont.)
      • When SQL Server loads databases into memory it will still have to flush the information from memory to the data files on the system
          - Theory: When this happened for the audit DB, due to resource sharing, it caused delayed responses to CMS process requests
              - This had a subsequent impact on the stability of the system, causing the breakdown in CORBA communication
      • Unfortunately, the logging available on SQL Server was not capable of showing us these flushes and whether or not they corresponded to the outages
      • Regardless of whether or not this is the root cause of the issue, the SQL Server database is evidently struggling for resources
  • 31. Database Investigation (cont.)
      • Three recommendations came out of this part of the investigation
          1. Increase the RAM on the SQL Server database server without increasing the SQL Server memory allocation
              - Short-term solution: Can be completed quickly with VMware and will give the operating system more resources
          2. Create a SQL Server cluster to run the CMS and audit DBs, keeping them in a single clustered instance
              - Not an option supplied by the infrastructure provider
              - To have this added to the catalogue would have required a long project, at which point the hardware would still have to be ordered
          3. Add a new database server to run the audit database
              - Medium-term solution: Quicker than Option 2, but still longer than Option 1
  • 32. Database Investigation (cont.)
      • Following Recommendation 1 resolved the CORBA communication breakdown
          - Chosen as the fastest route to resolution, while Recommendation 3 would be implemented further down the line
      • RAM was increased by 8GB
      • No changes were made to the SQL Server RAM usage
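The effect of Recommendation 1 is easiest to see as arithmetic on the figures quoted in the database slides (40GB server, 36GB fixed SQL Server allocation, 5GB CMS DB, 80GB audit DB, 8GB added). The sketch below only restates those numbers; the function name `os_headroom_gb` is introduced here for illustration.

```python
def os_headroom_gb(total_ram_gb, sql_server_gb):
    """RAM left for Windows and everything else once SQL Server takes its fixed share."""
    return total_ram_gb - sql_server_gb


SQL_SERVER_ALLOCATION_GB = 36   # fixed allocation configured by the DBAs
ORIGINAL_RAM_GB = 40            # DB server RAM at build time
ADDED_RAM_GB = 8                # increase applied under Recommendation 1

headroom_before = os_headroom_gb(ORIGINAL_RAM_GB, SQL_SERVER_ALLOCATION_GB)
headroom_after = os_headroom_gb(ORIGINAL_RAM_GB + ADDED_RAM_GB, SQL_SERVER_ALLOCATION_GB)

# Combined database size versus the SQL Server allocation.
cms_db_gb, audit_db_gb = 5, 80
fits_in_memory = (cms_db_gb + audit_db_gb) <= SQL_SERVER_ALLOCATION_GB

print(headroom_before)  # 4
print(headroom_after)   # 12
print(fits_in_memory)   # False
```

Because the SQL Server allocation was left unchanged, the whole 8GB went to the operating system, tripling its headroom from 4GB to 12GB, while the 85GB of databases still could not fit in the 36GB allocation.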
  • 33. Web Application Server Investigation
      • While the database looks like the culprit, we still have the puzzle of the virus-like behaviour
      • The lower tiers were only impacted at Tomcat level, after Production Tomcats were already impacted
      • In Production there was a load balancer in front of the two Tomcats
          - This was the only tier where there was a load balancer
      • Our infrastructure provider suggested we test new settings directly in Production
          - This led to instability and a huge number of sessions in Tomcat Manager that started causing issues with the MDAS
          - As a result the decision was made to add a new load balancer in QUA to allow testing in lower tiers
  • 34. Web Application Server Investigation (cont.)
      • The increase in MDAS Tomcat sessions led to a discussion around the load balancer model
          - Our load balancer was not certified by SAP at this time
          - SAP and the load balancer vendor worked together to get the load balancer certified and provide best practice settings
      • Network monitoring was installed on several servers in all tiers to try to determine a link between the environments
      • Separate BI4 environments should not talk to each other unless directed to by an end user (e.g., Promotion Management)
          - We found that the Tomcat servers in affected lower tiers were trying to communicate at a CORBA level with Production during a login workflow, which should not happen
              - Log files revealed that Tomcat was talking to several different environments
  • 35. Web Application Server Investigation (cont.)
      • Checking the log files in the lower tiers, we found the systems that suffered outages always seemed to have a reference to Production
      • Log files showed Tomcat reads the other clusters from the system
  • 36. Web Application Server Investigation (cont.)
      • A network monitoring tool was installed on several servers
      • Output of this tool shows lower tiers are communicating with the higher tiers
  • 37. Web Application Server Investigation (cont.)
      • The only place in the Tomcat setup that would identify other BI4 systems was the clusterinfo.1400.properties file
          - This is a file stored in Tomcatlogs.businessobjects used to resolve a cluster name (@clustername) when logging into a BI4 system
          - Typically contains info such as:
              - @bi-prd=192.168.1.101:6400;server1.bibrainz.com:6400;192.168.1.102:6400;server2.bibrainz.com:6400;36d51257319b49c13d1bb05f48608164
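To audit which clusters a given Tomcat knows about, an entry in the form shown above can be split into its cluster name and member addresses. This is a minimal sketch based only on the single sample line from the slide: the function name `parse_cluster_entry` is hypothetical, and the trailing hexadecimal token is kept as-is because the slides do not say what it represents.

```python
def parse_cluster_entry(line):
    """Split one '@name=host:port;host:port;...;token' entry into its parts.

    Layout is inferred from the one sample on the slide (an assumption):
    entries containing ':' are treated as host:port members, anything else
    is returned untouched under 'other'.
    """
    name, _, value = line.strip().partition("=")
    parts = [part for part in value.split(";") if part]
    members = [part for part in parts if ":" in part]
    other = [part for part in parts if ":" not in part]
    return {"cluster": name, "members": members, "other": other}


sample = (
    "@bi-prd=192.168.1.101:6400;server1.bibrainz.com:6400;"
    "192.168.1.102:6400;server2.bibrainz.com:6400;"
    "36d51257319b49c13d1bb05f48608164"
)
entry = parse_cluster_entry(sample)
print(entry["cluster"])  # @bi-prd
print(len(entry["members"]))  # 4
```

Running this over every line of the file on a lower-tier web server would list each cluster name it can resolve, making a stray reference to Production easy to spot.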
  • 38. Web Application Server Investigation (cont.)
      • When the BOE web application then tried to run a ManagedService() call against PRD, the threads got stuck
          - There’s no way to restrict access through the application to this file
          - Any changes to the file would be performed by the Service Account, so the permissions couldn’t be locked down at OS level
          - Permissions would be updated any time a user tries to resolve a different system name in the application (e.g., Promotion Management, again)
      • Solution
          - Make the file read-only
          - After making this change the “virus-like” behaviour ceased
              - If Production went down, only Production went down
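Setting the read-only attribute can be scripted so it is applied consistently on every web server. The sketch below is one possible way to do it with the Python standard library, not the method used at the customer; the helper names are introduced here, and the path to clusterinfo.1400.properties must be supplied by the caller. On Windows, `os.chmod` only toggles the read-only flag, which is the behaviour wanted here.

```python
import os
import stat


def make_read_only(path):
    """Clear every write bit on the file; on Windows this sets the read-only flag."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def is_read_only(path):
    """Report whether the owner write bit (the Windows read-only flag) is cleared."""
    return not (os.stat(path).st_mode & stat.S_IWUSR)
```

A periodic check with `is_read_only` could confirm the attribute has not been reverted, for example after a patch or redeployment of the web applications.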
  • 39. Other Areas Investigated
      • Monitoring is a big topic and one that comes up regularly at customers
          - SAP encouraged the implementation of SAP Solution Manager
              - This was already in progress, however an error in the NCS instrumentation functionality within BI4 had other effects
                  - Scheduled jobs would begin to fail regularly when the NCS instrumentation was set
                  - The only way to resolve this was a restart after setting the instrumentation to 0
      • At our customer there are various monitoring tools in place
          - Some from the infrastructure provider, some from the support organisation, but none were particularly useful
  • 41. What Was the Cause of Instability?
      • Two distinct issues
          - Production outage
          - Outages on the lower tiers (“virus-like” behaviour)
      • Production outage was caused by instabilities in the CMS
          - There were resource issues on the CMS and audit DB server, and once this destabilised the CMS, the CORBA communication across the application failed
              - Increasing the resources available on the database server resolved the instabilities
      • Outages in the lower tiers were caused by the web application server attempting to communicate with the Production system, using information it read from the clusterinfo.1400.properties file
          - Clearing the file out and leaving only the information that was needed was the first step
          - The file can then be set to read-only
  • 42. Other Improvements
      • Understanding the behaviour of the virtualisation software led to requesting dedicated hosts for the Production environment
      • While the vMotion functionality was not found to be a cause, it’s an on-going concern for the support organisation
          - Will do performance testing to investigate pros and cons of keeping this functionality active for the BI4 application
      • One recommendation from the database investigation was to add a separate audit DB server
          - While this was a medium-term solution, it would help separate some of the issues with the audit DB from the CMS
      • Investigation into the load balancer led to:
          - Realisation that the provided load balancer was not certified by SAP
          - SAP and the load balancer vendor working together to provide best practice configuration settings, which have improved the performance of the application
  • 43. Where to Find More Information
      • https://launchpad.support.sap.com/#/notes/1861180
          - SAP Client Plug-In
              - Requires login credentials to the SAP ONE Support Launchpad
      • https://sapinsider.wispubs.com/Assets/Q-and-As/2014/November/QA-with-Johann-Kottas-on-BI-Reporting-Performance
          - “What’s Slowing Down Your BI Reports? Q&A on End-to-End Analysis of SAP BusinessObjects BI Performance with Johann Kottas” (SAPinsider, November 2014).
      • https://blogs.sap.com/2016/07/12/solutions-to-optimise-the-performance-in-sap-businessobjects-reports-and-dashboard/
          - “Solutions to Optimise the Performance in SAP BusinessObjects Reports and Dashboard” (SAP Community Blogs, July 2016).
      • https://wiki.scn.sap.com/wiki/display/BOBJ/Sizing+and+Deploying+SAP+BusinessObjects+BI+4.x+Platform+and+Add-Ons
          - “Sizing and Deploying SAP BusinessObjects BI 4.x Platform and Add-Ons” (SAP Community WIKI, April 2018).
  • 44. Key Points to Take Home
      • Isolate the issues to determine where your problems are
      • Understand your full environment: Application, infrastructure, web application, and database
      • Every part matters – from the resources available to your database server to the way your application is used
      • Understanding trace files can help pinpoint causes
      • When investigating stability issues, you may stumble across improvements that can be applied to your environment
  • 45. Your Turn!
      • Thank you. Any questions?
      • Martin Macmaster, martin.macmaster@bibrainz.com, @MartBIBrainz
      • Please remember to complete your session evaluation
  • 46. Disclaimer
      SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names mentioned are the trademarks of their respective companies. Wellesley Information Services is neither owned nor controlled by SAP SE.