Pascal Benois: performance_troubleshooting-spsbe18


Where to start? - the first 2 hours of performance troubleshooting
• The performance cheat sheet: cover all the basics before you start
• Data collection and log mining
• Common techniques to improve performance

Speaker notes
  • When troubleshooting performance in a SharePoint 2010 environment, ask the following questions before you attempt any in-depth analysis with the troubleshooting tools.
    Where is the bottleneck? Identify where the bottleneck occurs: is a single page, a site, or an entire Web application affected? Or is the issue sporadic, indicating a server or disk subsystem issue? Once you have identified the scope of the bottleneck, you can start looking for patterns.
    Any strange patterns? Does the issue occur every day at a fixed time, or is it completely intermittent? Does it affect only a subset of users? Once you know both the scope of the issue and any patterns in its occurrence, you can start looking for the potential cause.
    Any errors or unexpected status codes in any of the logs? This seems obvious, but performance problems, especially in SharePoint, are often masked by error messages that do not clearly explain why an entire site collection, or a single page, is slow.
    Are there any customizations in place? As already covered, customizations are one of the key causes of performance issues in SharePoint. If customizations are in place, are other sites or pages with the same customizations experiencing the same issues? What happens if you temporarily disable the customizations?
    Have any software boundaries and limits been breached? Large lists, large content databases, or any other breached software boundary should be investigated immediately; large lists and content databases in particular are known to cause performance problems in SharePoint. What happens if you edit the list views or split the content databases?
    What does analysis of common performance counters show? Are there any indicators of CPU, memory, disk, or network bottlenecks? If so, what appears to be causing them? What do the SQL Server-specific performance counters show?
  • What: first determine the type of memory leak. Is it a managed or an unmanaged leak?
    How: what is actually causing the leak? Is it a connection object, a file whose handle is not closed, etc.?
    Where: which function, routine, or logic is causing the leak?
  • The first thing to establish is the type of memory leak: managed or unmanaged. To tell them apart, measure two performance counters. The first is the Private Bytes counter for the application's process, which we already saw in the previous session. The second is '# Bytes in all Heaps': select '.NET CLR Memory' as the performance object, select '# Bytes in all Heaps' from the counter list, and then select the application that has the memory leak. Private Bytes includes the managed heaps, so if both counters grow together the leak is managed; if Private Bytes keeps growing while # Bytes in all Heaps stays flat, the leak is unmanaged.
  • On a heavily accessed site, caching frequently accessed pages, objects in a page, and binary large objects (BLOBs), even for a short time, can result in substantial throughput gains. For example, while a page is in the output cache, subsequent requests for that page are served from the output cache, without executing the code that created it, for the specified cache duration. For BLOBs, when a front-end Web server handles a request for a file that is not cached, the disk-based cache gets the file from SQL Server, saves it to disk, and serves it to the client that requested it. Future requests for the same file handled by that front-end Web server are then served from the copy on disk instead of from SQL Server.
    A well-planned caching strategy increases performance and available capacity on the sites concerned. However, tuning cache settings correctly requires careful planning and monitoring. For example, you can use the Publishing Cache Hit Ratio performance counter to monitor the cache hit ratio. Aim for 90% or above, and increase the memory allocated to the object cache if the target is not met. A site with a lot of read/write activity should, however, expect a lower cache hit ratio.
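
The managed-versus-unmanaged reasoning in the notes above (and the "steady increase" check on slide 34 of the transcript) can be sketched as follows. This is a minimal Python illustration, assuming you have exported equally spaced samples of Private Bytes and # Bytes in all Heaps for the same process, for example from a Performance Monitor log. Fitting a least-squares slope is just one simple way to decide whether a counter is "steadily increasing"; the function names and both thresholds are hypothetical, not taken from any tool.

```python
from typing import Sequence


def slope(samples: Sequence[float]) -> float:
    """Least-squares slope of equally spaced samples, in units per sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def classify_leak(private_bytes: Sequence[float], heap_bytes: Sequence[float],
                  min_growth: float = 1_000_000, managed_share: float = 0.5) -> str:
    """Classify a suspected leak from Private Bytes and # Bytes in all Heaps samples.

    min_growth (bytes per sample) and managed_share are hypothetical thresholds.
    """
    if len(private_bytes) != len(heap_bytes):
        raise ValueError("both counters must be sampled at the same instants")
    private_growth = slope(private_bytes)
    heap_growth = slope(heap_bytes)
    if private_growth < min_growth:
        return "no leak detected"  # Private Bytes is not steadily increasing
    if heap_growth >= managed_share * private_growth:
        return "managed leak"  # the managed heaps account for most of the growth
    return "unmanaged leak"  # Private Bytes grows while the managed heaps do not


if __name__ == "__main__":
    mb = 1_000_000
    private = [200 * mb + i * 10 * mb for i in range(12)]  # made-up samples
    print(classify_leak(private, [80 * mb] * 12))
    print(classify_leak(private, [80 * mb + i * 9 * mb for i in range(12)]))
```

Because Private Bytes already contains the managed heaps, comparing the two growth rates, rather than the absolute values, is what separates the two cases.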
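
The decision rule in the caching note above (aim for a hit ratio of 90% or more, raise the object cache size if it falls short, but expect less on read/write-heavy sites) is simple arithmetic. A minimal Python sketch, assuming you have raw hit and miss counts; in practice the Publishing cache hit ratio counter already reports the ratio, so `hit_ratio` here only illustrates the calculation. The function names and the `read_write_heavy` flag are hypothetical.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Share of object cache lookups that were served from the cache."""
    total = hits + misses
    if total == 0:
        raise ValueError("no cache lookups recorded")
    return hits / total


def object_cache_advice(hits: int, misses: int, target: float = 0.90,
                        read_write_heavy: bool = False) -> str:
    """Apply the 90% rule of thumb from the caching note."""
    ratio = hit_ratio(hits, misses)
    if ratio >= target:
        return f"OK: {ratio:.1%} meets the {target:.0%} target"
    if read_write_heavy:
        # Frequent changes tend to invalidate cached objects, so a lower ratio is expected.
        return f"Expected: {ratio:.1%} on a read/write-heavy site"
    return f"Raise the object cache size: {ratio:.1%} is below {target:.0%}"


if __name__ == "__main__":
    print(object_cache_advice(9_300, 700))
    print(object_cache_advice(8_000, 2_000))
```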

Transcript

  • 1. #SPSBE: Performance Troubleshooting and Optimization (#SPSBE18), Pascal Benois
  • 2. About me
    • Microsoft Premier Field Engineer
    • Into SharePoint for ages
    • Psychobilly enthusiast
    • Eddy Merckx fanatic
  • 3. A big thanks to our sponsors
    • Platinum Sponsors
    • Gold Premium Sponsors
    • Venue Sponsor
    • Gold Sponsors
  • 4. Agenda
    • The first minutes
    • Common scenarios
    • Should I virtualize?
    • HW considerations
    • SQL Server considerations
    • Memory leaks for the admin (couldn't prevent myself)
    • Caching
  • 5. Agenda
    • So… we are not gonna make it!
  • 6. QUESTIONS TO ASK
    • Where is the bottleneck?
      • Are all pages/sites/Web applications/servers affected?
    • Any strange patterns?
      • Is the issue intermittent?
      • Does the issue occur for a subset of users?
    • Any errors or unexpected status codes in any of the logs?
    • Are there any customizations in place?
    • Have any software boundaries and limits been breached?
    • What does analysis of common performance counters show?
  • 7. COMMON SCENARIOS – SLOW PAGE LOAD
    • Issue: a single page is always slow to load; no other pages in the site are slow
    • Likely causes:
      • Poor custom code/customizations
      • Page payload is large or has multiple round-trips
      • A custom Web part is performing badly
      • Operations involving large lists (most likely throttled)
      • Caching is not working correctly for content served on the page
  • 8. COMMON SCENARIOS – SLOW PAGE LOAD
    • Recommended tools:
      • Fiddler
      • IIS logs and Log Parser
      • Usage database
      • Developer Dashboard
      • SPDisposeCheck
  • 9. COMMON SCENARIOS – SLOW PAGE LOAD
    • Issue: multiple pages are slow to load, but the issue is intermittent
    • Likely causes:
      • Poor custom code/customizations
      • Page payload is large or has multiple round-trips
      • A custom Web part is performing badly
      • Operations involving large lists (most likely throttled)
      • Caching is not working correctly
      • Load balancer device incorrectly configured, or a WFE is experiencing problems
      • Load on WFEs is too high (could be NIC, CPU, memory, etc.)
  • 10. COMMON SCENARIOS – SLOW PAGE LOAD
    • Recommended tools:
      • Fiddler
      • IIS logs and Log Parser
      • Usage database
      • Developer Dashboard
      • SPDisposeCheck
      • Performance Monitor
      • PAL
  • 11. COMMON SCENARIOS – SLOW SITE
    • Issue: a single site is consistently slow
    • Likely causes:
      • Poor custom code/customizations
      • Page payload is large or has multiple round-trips
      • A custom Web part is performing badly
      • Caching is not working correctly
  • 12. COMMON SCENARIOS – MULTIPLE SLOW SITES
    • Issue: multiple sites are consistently slow
    • Likely causes:
      • Poor custom code/customizations
      • Web application- or farm-scoped customizations
      • Caching is not working correctly
      • SQL Server blocking due to large lists/databases
      • Load balancer device incorrectly configured, or a WFE is experiencing problems
      • Load on WFEs is too high (could be NIC, CPU, memory, etc.)
  • 13. WHICH SHAREPOINT ROLE SHOULD I VIRTUALIZE?
  • 14. WEB ROLE
    • Responsible for rendering content
    • Low amount of disk activity
    • Multiple Web role servers are common for redundancy and scalability
    • Best practices:
      • Keep all components, applications, and patch levels the same
      • Network Load Balancing (NLB)
        • Hardware: offload NLB to dedicated resources
        • Software: NLB uses CPU and network on the WFE
      • For minimum availability, split your load-balanced virtual Web servers over two physical hosts
  • 15. QUERY ROLE
    • Processes search queries
    • Requires a propagated copy of the index
      • 10% to 30% of the total size of the documents indexed
    • Best practices:
      • Large indexes: prefer a dedicated physical LUN on a SAN over a dynamically expanding virtual hard disk
      • Don't put your query and index servers on the same underlying physical disk
    • Combine or split the Web and query roles?
      • It depends on your environment
      • Web and query performance requirements
  • 16. INDEX ROLE
    • Memory, CPU, disk I/O, and network intensive
    • Best practices:
      • Give it more RAM than the front ends
      • Potentially keep it as a physical machine in larger environments
      • Use the index server as a dedicated crawl server; this avoids a hop
      • Use fixed-size VHDs or a physical LUN on an iSCSI SAN for best performance
  • 17. OTHER ROLES
    • Excel Services, PerformancePoint Services, Access Services, Visio Services, etc. are good candidates for virtualization
    • Additional servers can simply be added to the farm
    • No additional hardware investment required
  • 18. DATABASE ROLE
    • SQL Server 2005/2008 virtualization fully supported
    • Memory, CPU, disk I/O, and network intensive
    • Assess first using the Microsoft Assessment and Planning Toolkit (www.microsoft.com/map)
    • SQL alias flexibility
    • Arguments for physical:
      • SQL Server is already a consolidation layer
      • Disk I/O activity
      • Performance, performance, performance!
      • Longer response times impact ALL downstream roles in a SharePoint farm
  • 19. DATABASE ROLE
    • If you decide to virtualize the database layer:
      • Assign as much RAM and CPU as possible
      • Offload the disk I/O from the virtual machines
      • Use fixed-size VHDs or a physical LUN on an iSCSI SAN
      • SQL clustering: when virtualizing, consider using guest clustering in Hyper-V
      • SQL database mirroring: fully supported in SharePoint 2010 in physical or virtual database role environments
  • 20. CPU BEST PRACTICES: PHYSICAL
    • Performance is governed by processor efficiency, power draw, and heat output
    • Faster versus efficient processor: beware the hidden power consumption cost
    • Beware of built-in processor features such as performance throttling at thermal thresholds
    • Prefer a higher number of processors and multi-core processors
    • Prefer PCI Express to limit bus contention and CPU utilization
  • 21. CPU BEST PRACTICES: VIRTUAL
    • Configure a 1-to-1 mapping of virtual CPU to physical CPU for best performance
    • Be aware of the virtual processor limit for different guest operating systems and plan accordingly
    • Beware of "CPU bound" issues: the ability of the processors to process information for virtual devices determines the maximum throughput of such a virtual device (example: virtual NICs)
  • 22. DISK BEST PRACTICES: PHYSICAL
    • Ensure you are using the fastest SAN infrastructure: attempt to provide each virtual machine with its own I/O channel to shared storage using dual- or quad-ported HBAs and Gigabit Ethernet adapters
    • Use iSCSI SANs if you are considering guest clustering
    • Ensure your disk infrastructure is as fast as it can be (RAID 10; 15,000 RPM). Slow disks cause CPU contention because disk I/O takes longer to return data
    • Put virtual hard disks on different physical disks than the one the host operating system uses
  • 23. DISK BEST PRACTICES: VIRTUAL
    • Prefer the SCSI controller to the IDE controller
    • Prefer fixed-size to dynamically expanding virtual hard disks
    • Prefer direct iSCSI SAN access for disk-bound roles
    • Beware of underlying disk read/write contention between different virtual machines and their virtual hard disks
    • Ensure the SAN is configured and optimized for virtual disk storage; understand that a number of LUNs can be provisioned on the same underlying physical disks
  • 24. NETWORK BEST PRACTICES: PHYSICAL
    • Use Gigabit Ethernet adapters and Gigabit switches
    • To increase network capacity, add more NICs to the host
  • 25. NETWORK BEST PRACTICES: VIRTUAL
    • Ensure that integration components ("enlightenments") are installed on the virtual machine
    • Use the Network Adapter instead of the Legacy Network Adapter when configuring networking for a virtual machine
    • Prefer synthetic to emulated drivers: they are more efficient, use a dedicated VMBus to communicate with the virtual NIC, and result in lower CPU usage and network latency
    • Use virtual switches and VLAN tagging for security and performance, and create an internal network between the virtual machines in your SharePoint farm; associate the SharePoint VMs with the same virtual switch
  • 26. IMPORTANT
    • Understand the impact of your virtualization vendor's feature set!
    • Don't let governance slip in your virtualized SharePoint environment
    • Snapshots are not supported
    • Beware of oversubscribing host servers
    • Do not exceed physical server RAM by more than 15% if using Hyper-V's dynamic memory
    • The host is a single point of failure
  • 27. SQL SERVER CONFIGURATION
    • Little or no configuration of SQL Server is a common cause of performance issues
    • Optimize performance by:
      • Pre-growing data files
      • Setting the growth factor to a fixed value, not a percentage
      • Optimizing storage configuration and RAID levels for databases
      • Planning the number of data files to allocate for tempdb and content databases
      • Providing a dedicated VLAN for SharePoint to SQL Server communications
      • Setting max degree of parallelism (MAXDOP) to 1
      • Providing additional SQL Server instances or servers
  • 28. SQL SERVER MAINTENANCE
    • SharePoint databases require constant maintenance; otherwise performance will degrade
    • Performance issues frequently arise due to:
      • Out-of-date statistics
      • Fragmented indexes
    • Health Analyzer rules are responsible for updating statistics and reorganizing or rebuilding indexes
      • Ensure these run frequently and are set to repair automatically
  • 29. MAXDOP
    • SQL Server can use the available processors to execute queries in parallel
    • The product group (PG) tested many settings and concluded that NOT using parallelism is the most stable and best-performing configuration
    • To suppress parallel plan generation, set max degree of parallelism to 1
  • 30. MAXDOP
  • 31. AUTO_UPDATE_STATISTICS & AUTO_CREATE_STATISTICS
    • Disabling AUTO_UPDATE_STATISTICS is recommended
    • In SharePoint 2010, both should be disabled. For SharePoint 2007, it is recommended to have both enabled
    • The product team introduced a new timer job called "Database statistics", which takes care of updating the statistics for the databases
  • 32. MEASURING PERFORMANCE
    • What is deemed acceptable?
      • Are there any agreed-upon metrics?
    • What are you trying to measure?
      • Common examples:
        • Requests per second (RPS)
        • Page load time: Time-to-Last-Byte (TTLB)
        • Measuring specific operations
        • Indexing performance
    • What are you hoping to prove?
    • Are there any agreed-upon tools for measuring performance?
  • 33. KNOW THIS ONE?
  • 34. THE DETECTION
    • Avoid Task Manager
    • Track the private bytes
    • A steady increase in the private bytes value indicates a memory leak
  • 35. WHERE IS THE $*%µ& MEMORY LEAK?
    • Type: Warning
    • Description: GdiPlus.dll is responsible for 399.54 KBytes worth of outstanding allocations. The following are the top 2 memory consuming functions: GdiPlus!GpMalloc+16: 399.54 KBytes worth of outstanding allocations.
  • 36. WHERE IS THE $*%µ& MEMORY LEAK?
    • Monitor memory (# Bytes in all Heaps!)
    • DisposeChecker again?
    • Tweak ULS logs
    • DebugDiag reports
    • WinDbg, ADPlus, and SOS.dll
  • 37. CACHING
    • A poor caching strategy, or none at all, may impact performance as usage increases
    • Caching alleviates round-trips to SQL Server, increasing performance by allowing content to be rendered quickly
    • Three types of caches:
      • BLOB cache
      • Output cache
      • Object cache
    • Simply enabling caching is not enough; settings will need tweaking based on planning and monitoring
  • 38. BLOB CACHE
    • Tools:
      • Fiddler/HttpWatch
      • Procmon
      • Perfmon
      • DecodeBlob (2007)
    • Avoid flushing the cache at all costs
      • It causes performance issues due to the write lock held during index writes
    • Limit the BLOB cache using a more restrictive regular expression such as ((?<!_gif).gif|(?<!_jpg).jpg|(?<!_png).png|.css|.js)$, which excludes files matching the patterns *_gif.gif, *_jpg.jpg, and *_png.png
    • Or use a regular expression such as [/]shared documents[/].+.(gif|jpg|png|css|js)$ to limit caching to a certain library, subsite, or site collection
  • 39. BLOB CACHE
    • ULS logs:
      • Set the Publishing Cache category to Verbose
      • SharePoint 2010 has improved logging
    • IIS logs with the time-taken field, and client-side traces with Fiddler/HttpWatch:
      • Cache-Control: public, max-age=86400
      • 304 responses to requests carrying If-None-Match, with ETag headers
      • For streaming:
        • Accept-Ranges: bytes in the response
        • A Range: bytes header in the request, answered with a Content-Range header in the response
  • 40. We need your feedback! Scan this QR code or visit http://svy.mk/sps2012be Our sponsors:
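
Slides 8 and 10 list IIS logs and Log Parser among the tools for slow pages, and slide 39 relies on the time-taken field. As a rough, hypothetical stand-in for a Log Parser query, the Python sketch below reads an IIS W3C extended log, uses the #Fields directive to locate the cs-uri-stem and time-taken columns (time-taken must be enabled in the IIS logging fields), and ranks URLs by average time taken in milliseconds. The function name, the top-N cut, and the sample lines are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def slowest_pages(lines: Iterable[str], top: int = 5) -> List[Tuple[str, float, int]]:
    """Return (cs-uri-stem, average time-taken in ms, request count), slowest first."""
    fields: List[str] = []
    total_ms: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives start with '#'; #Fields names the columns of the data lines.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if not fields or len(values) != len(fields):
            continue  # no #Fields directive seen yet, or a malformed line
        row = dict(zip(fields, values))
        if "cs-uri-stem" not in row or "time-taken" not in row:
            continue  # the log was written without these fields enabled
        url = row["cs-uri-stem"]
        total_ms[url] += int(row["time-taken"])
        hits[url] += 1
    ranked = [(url, total_ms[url] / hits[url], hits[url]) for url in total_ms]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    # Made-up sample in W3C extended log format.
    sample = """#Software: Microsoft Internet Information Services 7.5
#Fields: date time cs-method cs-uri-stem sc-status time-taken
2018-01-01 10:00:00 GET /Pages/default.aspx 200 350
2018-01-01 10:00:01 GET /Pages/slow.aspx 200 4200
2018-01-01 10:00:02 GET /Pages/slow.aspx 200 3800
"""
    for url, avg_ms, count in slowest_pages(sample.splitlines()):
        print(f"{url}: {avg_ms:.0f} ms average over {count} request(s)")
```

A page that is consistently at the top of this list matches the "single page is always slow" scenario on slide 7; a ranking that changes from log to log points more toward the intermittent scenario on slide 9.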
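
Slide 26's rule of not exceeding physical server RAM by more than 15% with Hyper-V dynamic memory is easy to check per host. The Python sketch below interprets the rule as: the sum of the memory that can be assigned to all VMs on a host (taken here as each VM's maximum dynamic memory setting) should stay within 115% of the host's physical RAM. That interpretation, the function name, and the example sizes are assumptions, not something the slide spells out.

```python
from typing import Sequence, Tuple

OVERCOMMIT_LIMIT = 0.15  # slide 26: no more than 15% above physical RAM


def memory_commitment(host_ram_gb: float, vm_max_ram_gb: Sequence[float],
                      limit: float = OVERCOMMIT_LIMIT) -> Tuple[float, float, bool]:
    """Return (total assigned GB, allowed GB, within limit?) for one Hyper-V host.

    Assumption: 'assigned' is the sum of each VM's maximum dynamic memory setting.
    """
    total = sum(vm_max_ram_gb)
    allowed = host_ram_gb * (1 + limit)
    return total, allowed, total <= allowed


if __name__ == "__main__":
    # Hypothetical host with 128 GB of RAM and five SharePoint VMs.
    total, allowed, ok = memory_commitment(128, [32, 32, 32, 32, 16])
    print(f"{total} GB assigned, {allowed:.1f} GB allowed, within limit: {ok}")
```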
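
Slide 27 recommends pre-growing data files and using a fixed growth increment rather than a percentage. The arithmetic behind that advice can be sketched in Python: with percentage growth, each increment gets larger as the file grows, so individual growth events become bigger and less predictable, while a pre-grown file needs none until it fills up. The function name and the database sizes below are hypothetical examples, not recommended values.

```python
from typing import List


def growth_increments(start_mb: float, target_mb: float, *,
                      percent: float = 0.0, fixed_mb: float = 0.0) -> List[float]:
    """Size (MB) of each autogrowth event needed to grow a file from start_mb to target_mb.

    Pass exactly one of percent (10 means 10%) or fixed_mb.
    """
    if (percent > 0) == (fixed_mb > 0):
        raise ValueError("specify exactly one of percent or fixed_mb")
    size = start_mb
    increments: List[float] = []
    while size < target_mb:
        step = size * percent / 100 if percent > 0 else fixed_mb
        increments.append(step)
        size += step
    return increments


if __name__ == "__main__":
    # Hypothetical content database: created at about 10 GB, grows to about 100 GB.
    by_percent = growth_increments(10_000, 100_000, percent=10)
    by_fixed = growth_increments(10_000, 100_000, fixed_mb=1_024)
    pre_grown = growth_increments(100_000, 100_000, fixed_mb=1_024)
    print(f"10% growth: {len(by_percent)} events, last one {by_percent[-1]:,.0f} MB")
    print(f"1 GB growth: {len(by_fixed)} events of {by_fixed[0]:,.0f} MB each")
    print(f"pre-grown: {len(pre_grown)} events")
```

With these made-up numbers, 10% growth needs 25 events whose size climbs from 1,000 MB to close to 9,850 MB, while 1,024 MB steps need 88 events of identical, predictable size; pre-growing the file to its expected size avoids growth events altogether.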
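
Slide 32 lists requests per second (RPS) and page load time (TTLB) as common metrics. The Python sketch below shows one way to summarize them, assuming you already have per-request start times in whole seconds and time-to-last-byte values in milliseconds, for example from a Fiddler session or the IIS time-taken field (which is measured on the server, so it only approximates what the user sees). Reporting peak RPS and a high percentile alongside the average is a choice made here, not something the slide prescribes; the nearest-rank percentile is one of several common definitions, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def rps(timestamps: Sequence[int]) -> Tuple[float, int]:
    """(average, peak) requests per second from request start times in whole seconds."""
    if not timestamps:
        raise ValueError("no requests")
    window = max(timestamps) - min(timestamps) + 1  # seconds, inclusive
    peak = max(Counter(timestamps).values())
    return len(timestamps) / window, peak


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values or not 0 < pct <= 100:
        raise ValueError("need at least one value and 0 < pct <= 100")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up samples: request start times (s) and TTLB (ms) for the same requests.
    starts = [0, 0, 0, 1, 1, 3]
    ttlb_ms = [120, 130, 150, 140, 900, 160]
    average, peak = rps(starts)
    print(f"average {average:.1f} RPS, peak {peak} RPS")
    print(f"median TTLB {percentile(ttlb_ms, 50)} ms, 95th {percentile(ttlb_ms, 95)} ms")
```

A single slow request barely moves the median but dominates the high percentile, which is why agreeing on the metric up front ("What is deemed acceptable?") matters.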
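
The two path expressions on slide 38 can be sanity-checked before they go into the BLOB cache configuration. The Python sketch below compiles them with the dots in front of the file extensions escaped as `\.`; as written on the slide, an unescaped `.` matches any character, so a URL ending in, for example, `xgif` would also match. Further assumptions: Python's `re` engine stands in for the .NET regular expressions SharePoint evaluates (both support fixed-width lookbehind such as `(?<!_gif)`), matching is treated as case-insensitive and as a search anywhere in the URL, and the constant and function names are hypothetical.

```python
import re

# Slide 38 pattern 1, with "\." instead of "." (an unescaped dot matches any character).
# Selects .gif/.jpg/.png/.css/.js files but skips names like thumb_gif.gif.
RESTRICTIVE = re.compile(
    r"((?<!_gif)\.gif|(?<!_jpg)\.jpg|(?<!_png)\.png|\.css|\.js)$", re.IGNORECASE)

# Slide 38 pattern 2: only files below a "shared documents" folder are selected.
SHARED_DOCUMENTS_ONLY = re.compile(
    r"[/]shared documents[/].+\.(gif|jpg|png|css|js)$", re.IGNORECASE)


def would_cache(url: str, pattern: "re.Pattern[str]") -> bool:
    """True if the pattern matches anywhere in the URL (assumed search semantics)."""
    return pattern.search(url) is not None


if __name__ == "__main__":
    for url in ["/images/logo.gif", "/images/thumb_gif.gif", "/style/site.css",
                "/sites/hr/Shared Documents/chart.png", "/sites/hr/Pages/chart.png"]:
        print(url, would_cache(url, RESTRICTIVE), would_cache(url, SHARED_DOCUMENTS_ONLY))
```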
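
Slide 39's header checks can be read against a simplified model of the HTTP rules involved. The Python sketch below is not SharePoint's implementation; it is a hypothetical function that applies standard conditional-request and range-request logic (RFC 7232 and RFC 7233) to a cached file: `max-age=86400` lets clients reuse the file for 24 hours, a matching `If-None-Match` yields 304 Not Modified with no body, and a single byte range yields 206 Partial Content with a `Content-Range` header. Header names are assumed to arrive in canonical capitalization, and suffix or multiple ranges are simply ignored, which HTTP permits.

```python
from typing import Dict, Optional, Tuple

MAX_AGE_SECONDS = 86400  # slide 39: Cache-Control max-age of 24 hours


def _opaque(tag: str) -> str:
    """Strip the weak-validator prefix; If-None-Match uses weak comparison."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def _byte_range(spec: str, length: int) -> Optional[Tuple[int, int]]:
    """Parse one 'start-end' or 'start-' range; None means 'ignore the header'."""
    start_text, sep, end_text = spec.partition("-")
    if not sep or not start_text.isdigit():
        return None  # suffix ranges such as '-500' are not handled in this sketch
    start = int(start_text)
    if end_text == "":
        return start, length - 1
    if end_text.isdigit() and int(end_text) >= start:
        return start, min(int(end_text), length - 1)
    return None  # multiple ranges or malformed input


def respond(request: Dict[str, str], etag: str,
            length: int) -> Tuple[int, Dict[str, str]]:
    """Return (status code, response headers) for a GET of a cached file."""
    headers = {
        "Cache-Control": f"public, max-age={MAX_AGE_SECONDS}",
        "ETag": etag,
        "Accept-Ranges": "bytes",
    }
    if_none_match = request.get("If-None-Match")
    if if_none_match is not None:
        tags = [_opaque(t) for t in if_none_match.split(",")]
        if "*" in tags or _opaque(etag) in tags:
            return 304, headers  # the client's copy is still valid: no body sent
    range_header = request.get("Range", "")
    if range_header.startswith("bytes="):
        parsed = _byte_range(range_header[len("bytes="):], length)
        if parsed is not None:
            start, end = parsed
            if start >= length:
                headers["Content-Range"] = f"bytes */{length}"
                return 416, headers  # range not satisfiable
            headers["Content-Range"] = f"bytes {start}-{end}/{length}"
            headers["Content-Length"] = str(end - start + 1)
            return 206, headers  # partial content, e.g. for media streaming
    headers["Content-Length"] = str(length)
    return 200, headers


if __name__ == "__main__":
    etag = '"0x8D5"'  # made-up ETag value
    print(respond({}, etag, 1_000_000))
    print(respond({"If-None-Match": etag}, etag, 1_000_000))
    print(respond({"Range": "bytes=0-65535"}, etag, 1_000_000))
```

In a Fiddler or HttpWatch trace, a BLOB that is re-downloaded with 200 responses instead of 304 responses after the first request suggests the cache headers are not doing their job.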