Follow the evidence: Troubleshooting Performance Issues


Are you hitting your governor limits? Is your system performance not up to expectations? Are you worried about your capacity to grow or merge multiple orgs? Then this session is for you. Join us as we line up the suspects, find out who's guilty, and show how you can avoid being a victim in the closest thing to a murder mystery at this year's Dreamforce. We'll walk you through real situations and, most importantly, how we solved them.

Published in: Technology, Business

  1. 1. Follow the evidence: Troubleshooting performance issues T.K. Horeis, salesforce.com, Cloud and Industry Architect @TKHoreis
  2. 2. Safe harbor Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services. The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter. 
These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site. Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
  3. 3. T.K. Horeis Cloud and Industry Architect @TKHoreis
  4. 4. Our customer, let’s call them Brand-X. Tip: Don’t Rush To Judgement
  5. 5. They Were Simply Using Too Much Capacity Tip: Don’t Rush To Judgement
  6. 6. How Did We Know? Tip: Submit a Case To Premier Support • Count • Long-running operations • Total Runtime • CPU time • Db CPU time • Buffer Gets
  7. 7. So We Recommended That They Needed To… 40% – 50% ▪ Reduce combined buffer gets and DB CPU by 20% - 25% ▪ Reduce combined runTime by 20%-25% ▪ Reduce combined App cpuTime by
  8. 8. Here’s What We Knew At The Time • Capacity usage was proportionally LARGE • The EU org will be merged into NA • EU users would use NA business processes only • Some EMEA users (Ireland) have already been moved over.
  9. 9. And this is where our investigation begins…
  10. 10. Lots Of Evidence At The Scene Of The Crime
  11. 11. Lots Of Evidence At The Scene Of The Crime
  12. 12. Lots Of Evidence At The Scene Of The Crime (REST, SOAP)
  13. 13. Visualforce Page Loads
  14. 14. So Just What Are They Loading So Many Times?
      • /apex/qbdialer__sid: 1,814,897
      • /apex/questionpage: 6,314
      • /apex/oppinformationrequest: 5,871
      • /apex/merchantaddys: 5,636
      • /apex/dnralert: 5,611
      • /apex/redemptionaddy: 5,503
      • /apex/account_geocoder: 5,080
      • /apex/pastdeals: 4,589
      • /apex/inlineoptions: 3,986
      • /apex/accountgoogleanalytics: 3,444
  15. 15. And When We Dig Further… Where the median # of calls per day is ~1500. Were those top 4 users REALLY making hundreds of thousands of calls per day?
  16. 16. Not Exactly… An old version of Power Dialer from Inside Sales was the source of this problem. • Users that liberally use tabs can magnify browser issues • Polling sidebar component created an issue • Controller ran unnecessary queries • Vendor produced a patch • Browser issues can linger if users don’t restart Bottom Line: A 64% Reduction in VF loads
  17. 17. Lessons Learned ▪ It’s easier to find the culprit when you isolate the scenario. ▪ Don’t make assumptions. Let the data guide you. • Just because it’s an AppExchange app, doesn’t mean it’s correct. ▪ Trust, but verify
  18. 18. Reporting
  19. 19. Reporting Usage Is Off The Charts! •~50% of all db gets come from reports •Daily Report Stats: •~4000 reports being run daily •45K-60K report executions daily •10B-15B gets from reports •6 users were using 9-13% of their report gets daily. So Where Would You Start?
  20. 20. Why 4000 Unique Reports? Lack of Report Governance • Anyone can create any report they want • Run it as often as they want • Many reports were the same or almost the same • Lather, Rinse, Repeat. Problem with bad reports ➢ Restrict who can create reports ➢ Restrict reporting access to some objects ➢ Establish a process to collect reporting requests ➢ Create in-demand reports, so they can be shared ➢ Use scheduled reports and dynamic dashboards ➢ Training, Training, Training Bottom Line: A work in progress…
  21. 21. Why 45k-60k Report Executions Per Day? Six users routinely executed a set of similar reports thousands of times per day. It turns out they were using a browser script to continually refresh. What Would You Do?
  22. 22. Possible Solutions • Restrict who can run reports: the customer indicated that they couldn’t do that. • Tell users to stop running the script: this was done, but wasn’t immediately effective. • Restrict login hours: the customer wasn’t willing to do that just yet. • Improve the report: done (see improvement in db gets). • Implement workflows: planned. Tip: What you see as the problem could simply be another symptom. Bottom Line: A 34% Reduction in the Execs
  23. 23. Why Are 10-15B Db Gets Coming From Reports? Problems: ❑ Hundreds of unselective reports that have > 1M gets/exec ❑ Many unselective reports that have 15M-35M gets/exec ❑ Two key report profiles contribute ~30% of overall get count. Solutions: ➢ Restrict who can create reports ➢ Your job doesn’t stop with implementing; you need to train users (cheat sheets, index lists, sample reports) ➢ Revisit the project to audit, profile, and optimize.
  24. 24. Document available indexes. List of Indexed Fields:
      Primary keys • Id • Name • OwnerId
      Foreign keys • Lookup • Master-detail • CreatedById • LastModifiedById
      Audit dates • CreatedDate • LastActivityDate • LastModifiedDate • SystemModstamp
      Custom fields • Unique • External ID
  25. 25. Create your own indexes
  26. 26. Need Additional Indexes? ▪ Salesforce support can create single and two-column custom indexes on most fields (with the exception of multi-picklists, formula fields that reference other objects, and Long Text Area/Rich Text Area) ▪ Open a case with: ➢ Sample Query ➢ Bind Variables ➢ Org ID and user ID who would run the query ➢ Suggested index – OPTIONAL ▪ If they create an index, then SAVE and DOCUMENT where the index was created
  27. 27. Understand SOQL query optimization. Cheat Sheet: Selectivity Rules
      Standard indexes • Simple predicate targets < 20% of total records or 666K • AND predicate targets < 20% of total records • OR predicate targets < 10% or 333K of total records
      Custom indexes • Simple predicate targets < 10% of total records or 333K • AND predicate targets < 666K of total records • OR predicate targets < 10% or 333K of total records
      LIKE • Tests first 100K rows for selectivity
      Predicates that can lead to full table scans • NULL • Not equal • Contains • Not In • Does not contain • Leading wildcards • Formula fields
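The selectivity rules on the cheat sheet come down to simple arithmetic. The following is a minimal sketch in Python, not Salesforce code. It assumes that "X% of total records or N" means the threshold is whichever of the two is smaller; that reading, the function names, and the 2M-row example are assumptions for illustration, and the percentages and caps are the ones quoted on the slide.

```python
def selectivity_threshold(total_records, fraction, cap):
    """Largest row count a predicate may target and still count as selective.

    Assumption: "X% of total records or N" means whichever is smaller.
    """
    return min(total_records * fraction, cap)


def is_selective(matching_records, total_records, fraction, cap):
    """True if the predicate targets fewer rows than the threshold."""
    return matching_records < selectivity_threshold(total_records, fraction, cap)


# Figures quoted on the slide for a simple predicate.
STANDARD_INDEX = {"fraction": 0.20, "cap": 666_000}
CUSTOM_INDEX = {"fraction": 0.10, "cap": 333_000}

# Hypothetical object with 2M rows and a filter that matches 250K of them.
total, matching = 2_000_000, 250_000
print(is_selective(matching, total, **STANDARD_INDEX))  # True: 250K < 400K
print(is_selective(matching, total, **CUSTOM_INDEX))    # False: 250K >= 200K
```

A check like this is only a planning aid for deciding which filters are worth indexing; the query optimizer's actual decision is made by Salesforce at run time.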
  28. 28. Potential Improvements Identified ❑ Key Opportunity Report • 9-13% of total report gets/day ❑ Custom Object Report • 0-16% of report buffer gets/day ❑ Various Account Reports ❑ Various Lead Reports ❑ Various Opportunity Reports. Can’t Index: WHERE Opportunity.Writeup_Status_del__c = 'Needs Details'; WHERE Opportunity.Writeup_Status_del__c = 'Needs DQ' • This filter field is a complex formula field. ✓ Create sister field using trigger. ✓ Index sister field. ✓ Modify dependent reports.
  29. 29. Potential Improvements Identified ❑ Key Opportunity Report • 9-13% of total report gets/day ❑ Custom Object Report • 0-16% of report buffer gets/day ❑ Various Account Reports ❑ Various Lead Reports ❑ Various Opportunity Reports WHERE Call_List_Priority__c IS [NOT] NULL OR Call_List_Priority__c < '. 00000000000000000201' • The '<' and '>' operators can’t be optimized on text fields ✓ Investigate process that populates this field ✓ Retype field as NUMBER ✓ Modify reports so < operator can be optimized ✓ Index field
  30. 30. Potential Improvements Identified ❑ Key Opportunity Report • 9-13% of total report gets/day ❑ Custom Object Report • 0-16% of report buffer gets/day ❑ Various Account Reports ❑ Various Lead Reports ❑ Various Opportunity Reports • Using non-selective queries • Customer has wide objects (e.g. Oppty) • Determine appropriate selective field • Index field • Create Skinny Tables Bottom Line: A 98% Reduction in Db Gets
  31. 31. Lessons Learned ▪ Report on reports to isolate the problem • Cost of individual reports • Number of report runs per day • Non-selective queries • Problematic queries ▪ Report Governance • Who is the author of problematic reports? • Custom Report Types models that the masses can customize ▪ Is it really a reporting problem? • Data Model • Workflow, Business process, etc.
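"Report on reports" means ranking reports by total cost and run count before tuning anything. The sketch below shows that aggregation in Python over a hypothetical list of (report name, user, buffer gets) rows. The slides do not say in what form the usage data was delivered, so the input shape, the function name, and the sample figures are all assumptions.

```python
from collections import defaultdict


def summarize_report_runs(runs):
    """Aggregate (report_name, user, gets) rows into per-report totals.

    Returns (report_name, run_count, total_gets, gets_per_run) tuples,
    sorted so the most expensive report comes first.
    """
    counts = defaultdict(int)
    gets = defaultdict(int)
    for report_name, _user, run_gets in runs:
        counts[report_name] += 1
        gets[report_name] += run_gets
    summary = [
        (name, counts[name], gets[name], gets[name] / counts[name])
        for name in counts
    ]
    return sorted(summary, key=lambda row: row[2], reverse=True)


# Hypothetical sample data, one row per report execution.
runs = [
    ("Key Opportunity Report", "user_a", 20_000_000),
    ("Key Opportunity Report", "user_a", 22_000_000),
    ("Lead Follow-up", "user_b", 500_000),
    ("Key Opportunity Report", "user_c", 18_000_000),
]
for row in summarize_report_runs(runs):
    print(row)
```

Grouping the same rows by user instead of by report name would surface cases like the six users who were refreshing reports with a browser script.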
  32. 32. Migration
  33. 33. What’s Wrong With Our Migration? The customer expressed concern because a Bulk API insert of 17k accounts took 27 minutes with workflows, validations, and triggers active. What’s wrong here? They also requested that we turn code coverage to zero. Why?
  34. 34. What We Found Tip: Follow Best Practices. They’ve been created for a reason. We suggested adjusting batch sizes on bulk loads from the default to a lower number. A value of 60 was found to be optimal. DupeBlocker was in use, and the vendor advises turning off triggers during migrations. Its triggers update a custom object record during account updates / inserts / deletes, which can cause significant contention problems that slow down bulk DML operations. The request to turn code coverage tests to 0% was related to poorly constructed tests that required triggers to be active. This is clearly at odds with Salesforce Best Practice. The following was provided as a workaround: if (custom_setting__c.getInstance().disable_all_triggers__c == true) return; Bottom Line: 66% Load Time Improvement
  35. 35. Disable actions that fire on insert: Triggers, Validation Rules, Workflow Rules
  36. 36. Defer sharing calculations OR … load with Public default sharing. Just make sure you turn it back on again!!!
  37. 37. Prep your data to avoid overhead
  38. 38. What We Found: Upon further investigation, we found: What’s the concern here? Tip: Trust, but verify.
  39. 39. Lessons Learned ▪ Trust, but verify • Initial reports of causality may be misleading • Follow the data to the cause • Non-selective queries • Problematic queries ▪ Follow Best Practices • Prep your data in advance for the best results • Understand how to structure your bulk operations • You may need to turn off sharing and other automatic functionality to improve performance.
  40. 40. Resources
  41. 41. Architect Core Resource page • Featured content for architects • Articles, papers, blog posts, events • Follow us on Twitter Updated weekly! http://developer.force.com/architect
  42. 42. Resources Apex Governor Limits http://www.salesforce.com/us/developer/docs/apexcode/Content/apex_gov_limits.htm Best Practices for Deployments with Large Data Volumes http://www.salesforce.com/us/developer/docs/ldv/salesforce_large_data_volumes_bp.pdf Loading Large Data Sets with the Force.com Bulk API http://wiki.developerforce.com/page/Loading_Large_Data_Sets_with_the_Force.com_Bulk_API Report Performance https://na1.salesforce.com/help/doc/en/salesforce_reportperformance_cheatsheet.pdf
  43. 43. Additional Resources Bulk API Developers Guide http://www.salesforce.com/us/developer/docs/api_asynch/api_bulk.pdf Bulk API Errors http://www.salesforce.com/us/developer/docs/api_asynch/Content/asynch_api_reference_errors.htm Batch Apex http://www.salesforce.com/us/developer/docs/apexcode/index_Left.htm#StartTopic=Content/apex_batch.htm Failing Safe with Apex Data Loader http://tedhusted.blogspot.com/2012/04/failing-safe-with-apex-data-loader-for.html Tools – Data Loader - http://wiki.developerforce.com/page/Data_Loader – Dell Boomi - http://www.boomi.com/ – IBM CastIron - http://ibm.co/PO0Qv8 – Informatica - http://bit.ly/OeRcCi
  44. 44. Additional Info
  45. 45. Record Lock Lifecycle ▪ Record Locks = Data Integrity ▪ Salesforce locks a record before executing a DML operation. - This is done before starting the save process. - The save process is documented on this page: Triggers and Order of Execution. ▪ Records will remain locked until commit. ▪ Salesforce will wait up to 10 seconds to lock a record before throwing an UNABLE_TO_LOCK_ROW error. - Even if no errors occur, waiting for locks can significantly slow DML operations.
  46. 46. Parent-Child Relationships ▪ Insert of Contact requires locking the parent Account. ▪ Insert or Update of Event requires locking both the parent Contact and the parent Account. ▪ Insert or Update of Task requires locking both the parent Contact and parent Account, if the Task is marked as complete. ▪ Insert of Case requires locking the parent Contact and parent Account. ▪ In objects that are part of a master/detail relationship, updating a detail record requires locking the parent if roll-up summary fields exist.
  47. 47. Multi-threaded Operations ▪ Locking should be taken into consideration for API integrations, data loads, apex future methods, etc. where requests will be run in parallel. ▪ Ideally requests that run concurrently should not require the same locks. - Different requests can’t update same records - Different requests can’t update multiple children of the same parent
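One way to keep concurrent requests from needing the same parent lock is to sort the load by parent and never split one parent's children across batches. The sketch below shows that idea in Python; it is an illustration rather than anything from the slides, and the function name, record shape, and sample IDs are hypothetical.

```python
from collections import defaultdict


def batches_by_parent(records, batch_size):
    """Pack child records into batches so each parent's children stay together.

    records: iterable of (child_id, parent_id) pairs.
    A batch exceeds batch_size only when a single parent has more
    children than batch_size.
    """
    children = defaultdict(list)
    for child_id, parent_id in records:
        children[parent_id].append(child_id)

    batches, current = [], []
    for parent_id in sorted(children):
        group = children[parent_id]
        # Start a new batch rather than splitting this parent's children.
        if current and len(current) + len(group) > batch_size:
            batches.append(current)
            current = []
        current.extend(group)
    if current:
        batches.append(current)
    return batches


# Hypothetical Contacts keyed to their parent Accounts.
contacts = [("c1", "acctA"), ("c2", "acctB"), ("c3", "acctA"),
            ("c4", "acctC"), ("c5", "acctB")]
print(batches_by_parent(contacts, batch_size=2))
# [['c1', 'c3'], ['c2', 'c5'], ['c4']]
```

Because no parent appears in more than one batch, batches processed in parallel do not compete for the same Account lock.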
  48. 48. Prioritize the Data into Tiers Tier 1 Objects SFDC API Tier 2 Objects Tier 3: On Premise SOA *
  49. 49. Tier 1 : Normal Data ▪ Normal SFDC data ▪ List views, standard reporting, and search ▪ Data set should be ~< 10 Million records ▪ Can include snapshot summary of key data to facilitate query optimization
  50. 50. Tier 2 : Storage Objects ▪ Custom Objects in a read-only approximation ▪ No standard reporting and search ▪ Use Visualforce pages to make filtered search queries and limit view to users ▪ Data set should be ~< 50 Million records
  51. 51. Tier 3 : On-Premise Objects ▪ Stored in an on-premise/data warehouse database ▪ Viewable only through mashups ▪ Integration processes move objects from SFDC to Tier 3, and from Tier 3 to other Tiers ▪ Part of a larger SOA framework ▪ Data set can be > 50 Million records
  52. 52. Denormalizing Data Increases Performance ▪ SELECT Name FROM Contact WHERE Account.SLA_Serial_Number__c IN :ListOfSerialNums ▪ Solution ▪ Copy SLA_Serial_Number to a field in Contact (don’t use a formula) and make the field an External Id ▪ OR query for the account IDs with the SLA Serial number first
  53. 53. Batch Apex ▪ Running concurrent Batch Apex jobs on the same record set can lead to contention. - Concurrent jobs will likely perform DML that requires locks on the same records. ▪ One workaround is to parallelize jobs by using an Autonumber field and a Formula field using the MOD function on the Autonumber field. - This ensures that each job will operate on an entirely different record set. - Be careful with records that require locking their parent records. ▪ For example the child in a master-detail relationship.
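The MOD-based workaround can be pictured as bucketing records by the remainder of their autonumber. The Python sketch below models only that arithmetic; in Salesforce the bucket would be a formula field and each Batch Apex job would filter on one bucket value. The function name and the choice of three jobs are assumptions for illustration.

```python
def partition_by_mod(autonumbers, job_count):
    """Assign each record to a job using autonumber % job_count."""
    partitions = {job: [] for job in range(job_count)}
    for number in autonumbers:
        partitions[number % job_count].append(number)
    return partitions


# Ten records with autonumbers 1..10 split across three hypothetical jobs.
parts = partition_by_mod(range(1, 11), job_count=3)
print(parts)
# {0: [3, 6, 9], 1: [1, 4, 7, 10], 2: [2, 5, 8]}
```

Every record lands in exactly one bucket, so the jobs never touch the same record. As the slide warns, children of the same parent can still land in different buckets, so parent locks need separate attention.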
  54. 54. Batch Apex – Avoid Errors in Bulk DML global void execute(Database.BatchableContext bc, List<Account> scope) { for(Account acc : scope) { /* do something */ } Database.SaveResult[] results = Database.update(scope, false); /* look at results */ } ▪ If an error occurs in this save operation, Salesforce will try to save it one record at a time. ▪ This performs significantly slower than a bulk save without errors.
  55. 55. The Key To Good Discovery Questions ▪ Don’t lead the witness. ▪ Confirm answers from multiple sources. ▪ Ask about median, peak, exceptional event volumes. ▪ When you think you’ve got it all, is there anything we’re forgetting? ▪ Don’t just think about the requirements for today. Ask about growth projections and make some yourself. ▪ Know your governor limits and bulk limits, etc.
  56. 56. Questions To Ask ▪ Stakeholder Expectations ▪ Data Access / Sharing – What are the performance criteria? – What is the likely growth in data volumes? – How long will migration/synchronization take? – Will the current plan support that growth? – Are there business-mandated update windows? – How can data updates be segmented? ▪ Reporting Requirements – How many rows are expected in results? – Do they need to view all results or is paging a possibility? – Understand Performance expectations. – Are Reports Segmented or Filtered? – Report Samples may drive additional questions. – If not, start planning NOW. – Dig deep on role hierarchy. How many levels? – Is there a large number of OR complex sharing rules? – How many records are owned by particular users? (max. vs. median) – Will they be using Territory Management? If so, how many levels?
  57. 57. Questions To Ask ▪ Rollback ▪ Data Integration – What happens if an integration load or synch fails? – What data will be native to SFDC vs. updated from an external master? – How long will rollback take? – Will changes be driven from SFDC or the external system? – What services are affected during a rollback? – What is the mitigation plan? – How frequently will it change? Update windows or immediately? – Do you need workflows or triggers? ▪ Purging / Archiving Data – Is some data transient? ▪ Replication – What data can be archived? – Do I need backups for compliance? – When can it be archived? – What is my backup strategy? – What are the restore/recover requirements? – When and how can I archive the different types of data in Salesforce? – What are the Compliance / legal requirements on access / availability?
  58. 58. Analyzing Performance – Developer Console ▪ Use the Developer Console to analyze server-side performance. ▪ Apex debug logs will be generated for every server request you perform while the window is open. ▪ Several tools are available in the Developer Console to find performance hotspots.
  59. 59. Leveraging Bulk API Best Practices
      ▪ Use parallel mode when possible. – See the FAQ for scenarios when you would use serial processing.
      ▪ Organize batches to minimize lock contention.
      ▪ Be aware of operations that increase lock contention: – Creating new users – Updating ownership of records with private sharing – Updating user roles – Updating territory hierarchies – SOLUTION: Create separate jobs to process data in serial mode.
      ▪ Minimize # of fields. – Foreign keys, lookup relationships, and summary fields are frequent culprits.
      ▪ Minimize or, better yet, eliminate workflow actions.
      ▪ Minimize or eliminate triggers.
      ▪ Optimize batch size. – Any batch that takes more than 10 minutes is suspended and returned to the queue for later processing. – The best course of action is to submit batches that process in less than 10 minutes. – Techniques: Start with 5000 records and adjust the batch size based on processing time. If it takes more than five minutes to process a batch, it may be beneficial to reduce the batch size. If it takes a few seconds, the batch size should be increased. If you get a timeout error when processing a batch, split your batch into smaller batches, and try again.
      ▪ Defer complex sharing rules.
      ▪ Speed of operation: Insert, then Update, then Upsert (which involves an implicit query).
      ▪ Group and sequence data to avoid parent record locking.
      ▪ Remember that database statistics are calculated overnight; wait to do performance testing.
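The batch-size guidance above amounts to a small feedback loop: start at 5000, shrink when a batch takes more than five minutes, grow when it finishes in a few seconds. The Python sketch below is one possible reading of it. The halving and doubling steps, the 10-second cutoff for "a few seconds", the 10,000-record ceiling, and the function name are all assumptions, not figures from the slides.

```python
def next_batch_size(current_size, seconds_taken,
                    slow_seconds=300, fast_seconds=10,
                    min_size=1, max_size=10_000):
    """Suggest the next batch size from the last batch's processing time.

    Assumptions: halve when slower than slow_seconds, double when faster
    than fast_seconds, otherwise keep the size; clamp to [min_size, max_size].
    """
    if seconds_taken > slow_seconds:
        proposed = current_size // 2
    elif seconds_taken < fast_seconds:
        proposed = current_size * 2
    else:
        proposed = current_size
    return max(min_size, min(max_size, proposed))


print(next_batch_size(5000, seconds_taken=400))  # 2500: over five minutes
print(next_batch_size(5000, seconds_taken=3))    # 10000: a few seconds
print(next_batch_size(5000, seconds_taken=120))  # 5000: leave as is
```

The optimum depends heavily on what fires on insert: in the migration case earlier in the deck, a batch size of 60 turned out to be best, so treat the starting value as a first guess rather than a target.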
  60. 60. Leveraging Batch Apex Best Practices ▪ Consider setting the batch size to the maximum that the execute method can support without running into governor limits. Default is 200. ▪ If you are operating on a large volume of data, limit the Query Locator size and consider running concurrent Batch Apex jobs. ▪ Chain Batch Apex jobs: • Using Apex Scheduler: in the finish method of the Batch Apex job, create an Apex Scheduler instance to run just once and schedule the next Batch Apex job. • Using Email Services: create an Apex class that implements the Messaging.InboundEmailHandler interface, configure an Inbound Email Service handler, send an email to the Inbound Email Handler from the finish method of the Batch Apex job, and submit the next Batch Apex job in the Inbound Email Handler class.
