Building Reports That Fly



In Salesforce, the foundation of reporting is the retrieval of an organization's data. If your data is not retrieved efficiently, your reports can be incredibly slow and might time out, which will frustrate your users. Join us to learn the techniques every expert architect should know in order to build reports that fly and to ensure that your data is at your users' fingertips. We'll spend the vast majority of our time covering specific performance-related techniques, such as using reporting governance, as well as what you need to know about the query optimizer to write reports that are both fast and selective. Note that this session assumes you already understand the basic principles behind building queries and reports.

  • SPEAKER: So what are we going to cover in today’s session? Well, let’s say your company uses Salesforce apps or perhaps a custom app.
  • SPEAKER: You are a developer, perhaps an architect, and you are in charge of maintaining the org.
  • SPEAKER: The org has a bunch of reports that query data from objects with 100s of millions of records, and many of the reports created by users are slow, so slow that users are losing productivity. Users also make mistakes because the data returned by the reports is not always correct.
  • SPEAKER: So your job is to make these reports faster and ensure their results can be trusted.
  • SPEAKER: And that’s what we are going to cover today. We are going to take a look at the most common problems that cause slow and inaccurate reports, and then we’ll show you how to find and fix these problems in your org. Using the skills you learn today, you’ll be able to go back to work and make old, and new, reports fly with data your users can trust.
  • SPEAKER: My name is Sean Regan, and I’m part of the Customer Centric Engineering group. I think of myself as an “architect evangelist”, someone who can help you learn more about how Salesforce works under the hood so that you can implement better solutions on our platform.
  • SPEAKER: With me today is another member of our team, John Tan, and together, we’d like to discuss what everyone should know about building reports that fly.
  • SPEAKER: Governance is the concept of ensuring users within our organization have only as much access as required to do their jobs.
  • SPEAKER: Let’s talk about governing your configuration access.
  • SPEAKER: Just as pilots need governance to ensure they can successfully fly their airplanes, Salesforce users also require governance to successfully create reports that fly. If only 100 passengers in the world flew, flights would be successful without air traffic control, but as the number of passengers and flights increases, so too does the need for air traffic control.
    Out of the box, Salesforce allows all users to create and modify reports and list views to access data, and users belonging to smaller organizations can usually find what they need quickly and easily, even if they have only a basic understanding of the Salesforce platform. However, if users belong to an organization with a large data volume and a complex sharing architecture, and they aren’t familiar with their organization’s Salesforce data model or its data distribution, they might create inefficient queries that return data sets that are inaccurate, poorly focused, or slow to deliver.
    In other words, they might need help from an architect or developer like you.
  • SPEAKER: We provide several ways to govern users’ access to data on the platform. Large customers that have 10s or 100s of millions of records should govern their end users’ ability to create reports and list views. This ensures that reports and list views:
    can be architected and tuned to return results fast.
    return the correct data users are looking for.
    You can govern your users’ ability to create and manage reports and list views at the profile level by unchecking the Create and Customize Reports checkbox.
  • SPEAKER: Governing end users’ ability to create reports and list views can have a very significant positive impact on your organization or, if you are a partner, on your customers.
    By governing end users’ ability to create reports and list views, you can:
    Increase the user adoption rate by getting the right data to users fast.
    Increase employee productivity, since users won’t be waiting for slow reports to return data.
    Decrease employee mistakes, since the data they are querying can be trusted to be accurate and complete.
    Ultimately, all of these benefits can save your organization money or, if you are a partner, save your customers money!
  • SPEAKER: Now let’s talk about governing your sharing access.
  • SPEAKER: In large organizations that have a highly restrictive sharing model, implementing sharing can let reports use indexed sharing filters to return results extremely fast. If a user has access to only a small number of an object’s records and you configure that object’s organization-wide default as Private, we will use the sharing index filter to drive any queries that user makes on that object.
    However, it is important to understand that not all sharing architectures that produce the same end result are created equal from a performance perspective. If you are interested in seeing an example of this, be sure to attend our workshop.
  • SPEAKER: What about a global company that has business units in each country, with a relatively even data distribution across those business units? In this case, the company’s data access requirements are not restrictive. Should it implement a sharing model that segregates data by country to improve report performance, or simply filter the data on the reports themselves?
    Let’s get a show of hands: How many of you think implementing the sharing model will improve the application’s performance? How many think filtering on each of the reports will yield the best performance?
    In this case, sharing should not be implemented, because the sharing filter would not be selective. Using an open sharing model eliminates the query optimizer’s need to check the sharing filter and also reduces the administrative burden of managing the sharing records.
  • SPEAKER: Some architects think that creating a sharing model is a good way to filter records, and this can certainly have a positive impact on performance when users query data. In many cases, however, the same selectivity could be achieved through indexed report filters. In those cases, sharing adds overhead to the queries and increases the administrative burden, and usually that overhead outweighs any performance gain. As a result, we generally recommend implementing your sharing model based on your organization’s business requirements.
  • SPEAKER: Now let’s talk about governing your data architecture.
  • SPEAKER: There are a lot of ways to do things. Some are certainly more efficient than others.
    Just as there are many routes that an airplane can take to get to an airport, there are many ways a report can return the exact same dataset and some are much faster than others.
  • SPEAKER: Enterprise customers can sometimes have rather complex schemas. This is another good reason to govern end users’ ability to create reports and list views in these types of environments. It is also critical for an administrator or architect to be intimately familiar with the schema to ensure reports are created in an optimal manner. For example, reports should not perform aggregation on a field when a roll-up summary field that performs the same calculation already exists.
  • SPEAKER: I know most of you have heard it before, but we encounter it time after time: if data is not required for transactional business, it should be archived or purged. There are many different strategies to keep this data out of your transactional dataset while still keeping it accessible. One of these approaches is aggregation, which John will talk about later in the presentation. If you feel you need assistance in this area, we have many great partners who can help architect a data management strategy to meet your specific requirements.
  • SPEAKER: When architecting your object model for large data volume objects, it is important to consider both how its data will be used and its query performance. When an object holds 100s of millions or billions of records and query filters cross multiple objects, the joins can be very costly. To ensure peak performance for data access, it is extremely important to consider flattening your data architecture so that those filters avoid costly joins.
  • Now we will switch gears a bit and discuss frequency, which is the part of your architecture that determines how users go about getting the data they need to do their jobs.
  • First, let’s discuss an organization’s frequency of dashboard and report refreshes.
  • SPEAKER: Reports, dashboards, and even Visualforce pages that are constantly refreshed needlessly use resources, which can hurt your reports’ performance as well as the performance of other operations in your org.
  • SPEAKER: In Customer Centric Engineering, we see many customers pulling data in a very inefficient manner by driving workflow off of reports. The platform offers quite a few tools for implementing a push-based solution, which would provide users with the same data much more efficiently. For example, you can use workflow rules, scheduled reports, and Chatter feeds right out of the box. For more complex requirements, you can build a Visualforce page with the Streaming API that automatically refreshes when the underlying data changes. In many cases, these solutions are much more efficient and return exactly the same data to users.
  • Sean covered the who, what, and when, so let’s talk about how: specifically, how to ensure your reports are as efficient as possible.
  • To ensure that your reports are efficient, we’ll be covering:
    The role of the Query Optimizer and indexes for optimal report performance
    What it means to include at least 1 selective filter in your report
    How skinny tables may help after you’ve exhausted indexes
    Making your data lightweight
  • Because of our multitenant data architecture, we developed our own query optimizer. It is the engine that sits between your reports, SOQL queries, and list views and our database.
    The query optimizer looks at the list of filters in the WHERE clause and runs inexpensive queries based on those filters to determine the best filter (the one that returns the smallest data set).
    It then chooses the option with the lowest estimated cost.
    Based on this, it determines the best leading table or index to drive the query.
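To make the selection step concrete, here is a deliberately simplified Python model of it. It is not Salesforce’s implementation: the `FilterEstimate` type, the `choose_leading_filter` function, and the rule that “lowest estimated cost” means “fewest estimated rows” are hypothetical stand-ins for the real cost model, and each filter’s threshold is passed in rather than computed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FilterEstimate:
    """One WHERE-clause filter plus its pre-query estimate (hypothetical model)."""
    field: str
    estimated_rows: int  # rows the inexpensive pre-query expects the filter to match
    indexed: bool        # whether an index exists on the field
    threshold: int       # selectivity threshold for that index


def choose_leading_filter(filters: List[FilterEstimate]) -> Optional[FilterEstimate]:
    """Return the filter that should drive the query, or None for a full scan."""
    # Only indexed filters that stay under their threshold are candidates.
    candidates = [f for f in filters if f.indexed and f.estimated_rows < f.threshold]
    if not candidates:
        return None  # nothing selective: the query falls back to a table scan
    # Simplification: treat "lowest estimated cost" as "fewest estimated rows".
    return min(candidates, key=lambda f: f.estimated_rows)


# Example: three report filters on a 3,500,000-record object.
filters = [
    FilterEstimate("CreatedDate", 120_000, True, 675_000),
    FilterEstimate("Region__c", 40_000, False, 225_000),  # hypothetical field, no index
    FilterEstimate("OwnerId", 900_000, True, 675_000),
]
best = choose_leading_filter(filters)
print(best.field if best else "full scan")  # CreatedDate
```

`Region__c` matches the fewest rows but has no index, and `OwnerId` is over its threshold, so the model picks `CreatedDate`; with no qualifying filter it reports a full scan.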
  • Because the query optimizer tries to drive each query with the most selective filter, your reports should include at least one selective filter.
    Let’s talk about how to do it.
  • In order to have at least one selective filter we’ll need to do the following:
    Add filters to your reports to reduce scope
    Ensure the operators in these filters are efficient
    Ensure that the number of records returned by the filter meets a selectivity threshold
    If the filter meets the threshold and the field isn’t indexed, create an index for it
    Let’s look into each one of these areas in more detail
  • Like overpacking for a trip, you’ll want to limit the amount of data retrieved.
  • Besides fields that are already indexed, date fields are good candidates for filters.
    Date fields are examples of fields that have a wide and even distribution of values. If the values in a field are skewed, so that a few values account for most of the records, filters on those values will have a hard time meeting the selectivity thresholds, which we will discuss shortly.
  • A common problem we see with bad filters is non-deterministic formula fields, whose values can change depending on when you access them. A formula field that uses the TODAY() function is non-deterministic.
    The most common example is a formula that references a related object. The easiest way to spot these is to look for “__r”.
    For more info on formula fields, take a look at the “SOQL Best Practices: Nulls and Formula Fields” blog post on developerforce.
  • So we’ve just talked about adding a filter to your report. But what about the operators for those filters?
    Well, there are two operators that keep your filter from being selective.
  • For NOT or != conditions, due to the underlying database implementation, the query optimizer can’t use the index to drive the query.
    This includes the ‘excludes’ operator.
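A minimal Python sketch of the slide 42 advice (query for reciprocal values instead of using a negative operator), assuming the field has a closed, known set of values, such as a restricted picklist, and is always populated. The `Status__c` field name and both helper functions are hypothetical; they only build the text of a SOQL filter and do not call any Salesforce API.

```python
from typing import List


def soql_literal(value: str) -> str:
    """Quote a string for a SOQL filter, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def reciprocal_in_filter(field: str, all_values: List[str], excluded: List[str]) -> str:
    """Rewrite `field != excluded` as `field IN (every remaining value)`."""
    excluded_set = set(excluded)
    keep = [v for v in all_values if v not in excluded_set]
    if not keep:
        raise ValueError("every value is excluded, so the filter would match nothing")
    return f"{field} IN ({', '.join(soql_literal(v) for v in keep)})"


# Hypothetical picklist using the status values from slide 42.
statuses = ["open", "in progress", "closed"]
print(reciprocal_in_filter("Status__c", statuses, ["closed"]))
# Status__c IN ('open', 'in progress')
```

The rewrite is only equivalent when every record carries one of the listed values; if the value set can grow, the positive list has to be maintained alongside it.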
  • All database query optimizers have the same issue with leading % wildcard searches.
    If you need to do leading % wildcard searches, SOSL may be a better alternative. However, if you need real-time results, an alternative is to create a custom search page that restricts leading % wildcard searches and adds governance on the search string(s).
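One way a custom search page could govern search strings, sketched in Python under stated assumptions: the three-character minimum, the `Name` field, and the `prefix_search_filter` helper are hypothetical choices, not platform rules. The helper rejects leading and embedded wildcards and only ever produces a STARTS WITH style filter (`LIKE 'term%'`), escaping backslashes and single quotes in the term.

```python
MIN_TERM_LENGTH = 3  # hypothetical governance rule for this sketch


def prefix_search_filter(field: str, term: str) -> str:
    """Build a STARTS WITH style SOQL filter, refusing wildcard searches."""
    term = term.strip()
    if term.startswith(("%", "*")):
        raise ValueError("leading wildcards are not allowed")
    if "%" in term or "_" in term:
        raise ValueError("wildcard characters are not allowed in the search term")
    if len(term) < MIN_TERM_LENGTH:
        raise ValueError(f"search terms need at least {MIN_TERM_LENGTH} characters")
    # Escape backslashes and quotes so the term stays a literal string.
    escaped = term.replace("\\", "\\\\").replace("'", "\\'")
    return f"{field} LIKE '{escaped}%'"


print(prefix_search_filter("Name", "district"))  # Name LIKE 'district%'
for bad in ("%district", "di"):
    try:
        prefix_search_filter("Name", bad)
    except ValueError as err:
        print(f"rejected {bad!r}: {err}")
```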
  • The query optimizer has set thresholds for using an index to drive the query. If your filter returns more records than the threshold, the optimizer knows that it would not be efficient to use that index.
  • For a custom index, the threshold is essentially one-third of the standard index threshold.
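The thresholds on slides 45 to 50 can be written as two small Python functions. This is a sketch that mirrors the slide arithmetic, not an official formula: the function names are hypothetical, integer rounding is an assumption, and a filter counts as selective here only when it returns strictly fewer rows than the threshold.

```python
ONE_MILLION = 1_000_000


def _split(total_records: int):
    """Split a record count into the first million and the remainder."""
    return min(total_records, ONE_MILLION), max(total_records - ONE_MILLION, 0)


def standard_index_threshold(total_records: int) -> int:
    """30% of the first million + 15% of the rest, capped at 1,000,000."""
    first, rest = _split(total_records)
    return min(first * 30 // 100 + rest * 15 // 100, ONE_MILLION)


def custom_index_threshold(total_records: int) -> int:
    """10% of the first million + 5% of the rest, capped at 333,333."""
    first, rest = _split(total_records)
    return min(first * 10 // 100 + rest * 5 // 100, 333_333)


def is_selective(rows_returned: int, total_records: int, custom_index: bool = False) -> bool:
    """A filter is selective when it returns fewer rows than the threshold."""
    limit = custom_index_threshold if custom_index else standard_index_threshold
    return rows_returned < limit(total_records)


for total in (750_000, 3_500_000, 6_000_000):
    print(total, standard_index_threshold(total), custom_index_threshold(total))
# 750000 225000 75000
# 3500000 675000 225000
# 6000000 1000000 333333
```

The first two rows reproduce the worked examples on slides 47 and 48; at 6,000,000 records both caps apply, which is why the custom threshold stays at one-third of the standard one.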
  • After you’ve determined that your filter would meet the selectivity threshold, create an index for that field if one doesn’t already exist.
  • Although a custom index is created automatically for fields marked as unique or as an external ID, we recommend working with support to create other custom indexes.
  • We’ve spoken about how the query optimizer tries to drive a query with a selective index. However, there may be times where indexes aren’t enough.
    After tuning with indexes is exhausted, skinny tables may help.
  • Now that you’ve learned how to build efficient reports, you’ll want to maintain that efficiency. One of the best ways to do this is to keep your data lean.
  • Speaker notes:
    When the query optimizer judges returned records against its thresholds, all of the records that appear in the Recycle Bin or are marked for physical delete still count against your total number of records. If you’re deleting a large number of records, work with customer support to physically delete them.
  • Allergan is a multi-specialty health care company focused on discovering, developing and commercializing innovative pharmaceuticals, biologics, medical devices and over-the-counter consumer products that enable people to live life to its greatest potential — to see more clearly, move more freely, express themselves more fully.
  • Sales Rep perspective: Inline report for individual Physician is taking a long time to load
  • Sales Rep perspective: Viewing Sales Data is not meaningful: too many rows, little relevance. Can’t find the information needed.
  • Manager Perspective: Summary Dashboards keep spinning.
    And they wait, and they wait, and they wait.
    While they wait, let’s look at specifics of the dashboards, and let’s look at the storage.
  • “I am afraid that I rather give myself away when I explain,” said he. “Results without causes are much more impressive.”
    The Stock-Broker’s Clerk, Sir Arthur Conan Doyle
  • Let’s take a look at the timing out dashboards.
  • Troubleshooting tool #1: Storage Usage
    Overall Data Storage is getting very high.
    There are only 6,000 accounts – how can 6,000 accounts eat up that much storage?
    Perhaps Daily Sales Data is too granular?
  • Most impactful: data volumes. Only load absolutely necessary data; treat as operational data store.
  • Ways to reduce:
    Reduce sampling frequency
    Summarize (in ETL layer, batch jobs, off-platform jobs, etc)
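A minimal sketch of summarizing off-platform before loading, assuming the daily sales data arrives as `(account, product, day, amount)` tuples; the tuple layout, the sample values, and `summarize_monthly` are hypothetical. Collapsing daily rows into one row per account, product, and month is one way to act on both bullets above.

```python
from collections import defaultdict
from datetime import date


def summarize_monthly(daily_rows):
    """Collapse (account, product, day, amount) rows into monthly totals."""
    totals = defaultdict(int)
    for account, product, day, amount in daily_rows:
        month = date(day.year, day.month, 1)  # first day of the row's month
        totals[(account, product, month)] += amount
    return dict(totals)


# Hypothetical daily rows for one account and product.
rows = [
    ("ACC-1", "P-9", date(2013, 11, 1), 100),
    ("ACC-1", "P-9", date(2013, 11, 2), 50),
    ("ACC-1", "P-9", date(2013, 12, 1), 70),
]
for key, total in sorted(summarize_monthly(rows).items()):
    print(key, total)
```

Here three daily rows become two monthly rows; across thousands of accounts and products, the reduction in stored records grows with the number of days per month that had activity.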
  • A common problem when moving summary calculations into the integration layer is the lack of primary keys. A common solution is to create compound keys (logical keys) that are composed of multiple components and together identify a piece of information uniquely.
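A compound key can be as simple as a normalized, delimited concatenation of its components. The Python sketch below assumes a `|` separator and uppercase normalization, both hypothetical choices, as is the `compound_key` name. The resulting string could, for example, be stored in a text field marked as an External ID so that loads can match on it, though the case study does not say how its keys were stored.

```python
from datetime import date


def compound_key(*components, sep: str = "|") -> str:
    """Join normalized components into one logical key (hypothetical format)."""
    parts = []
    for component in components:
        text = "" if component is None else str(component).strip().upper()
        if not text:
            raise ValueError("every key component must be populated")
        if sep in text:
            raise ValueError(f"component {text!r} contains the separator {sep!r}")
        parts.append(text)
    return sep.join(parts)


print(compound_key("acc-1", "p-9", date(2013, 11, 1)))  # ACC-1|P-9|2013-11-01
```

Rejecting empty components and embedded separators keeps two different combinations from ever producing the same key.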
  • Troubleshooting tool #2: Workbench
    Use to profile data: distribution, anomalies, information sparsity/density
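Workbench is the tool the case study used. The Python sketch below only illustrates the kind of profile worth computing over a column of exported field values: the null fraction (sparsity), the distinct count, and the share held by the most common value, which signals skew. The `profile_field` helper and its output keys are hypothetical.

```python
from collections import Counter


def profile_field(values):
    """Summarize sparsity and skew for one exported field."""
    total = len(values)
    populated = [v for v in values if v not in (None, "")]
    counts = Counter(populated)
    top_value, top_count = counts.most_common(1)[0] if counts else (None, 0)
    return {
        "total": total,
        "null_fraction": (total - len(populated)) / total if total else 0.0,
        "distinct": len(counts),
        "top_value": top_value,
        "top_share": top_count / total if total else 0.0,
    }


print(profile_field(["Gold", "Gold", "Gold", "Silver", None]))
# {'total': 5, 'null_fraction': 0.2, 'distinct': 2, 'top_value': 'Gold', 'top_share': 0.6}
```

A high `top_share` warns that filtering on that value is unlikely to be selective; a high `null_fraction` points to sparse data that may not be worth storing.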
  • Use to analyze total data set, including deleted records. Note that the number of records returned is much larger when including the deleted records for our use case.
  • Existing data load scripts were deleting/reloading full sets of data nightly. We helped the customer retire the full reloads, and just do updates/inserts. Note: upserts are easier to implement, but are a bit slower.
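Moving from delete-and-reload to update/insert comes down to comparing each incoming row with what is already loaded. Here is a minimal Python sketch, assuming both sides are keyed by a compound key and represented as dictionaries of field values (a hypothetical layout; `plan_incremental_load` is also hypothetical). Unchanged rows generate no work at all. Like the approach described above, it does not handle rows that disappear from the source.

```python
def plan_incremental_load(existing, incoming):
    """Split incoming rows into inserts and updates; skip unchanged rows.

    Both arguments map a compound key to a dict of field values.
    """
    inserts, updates = [], []
    for key, fields in incoming.items():
        current = existing.get(key)
        if current is None:
            inserts.append((key, fields))   # new record
        elif current != fields:
            updates.append((key, fields))   # changed record
        # identical rows generate no DML at all
    return inserts, updates


existing = {"ACC-1|P-9|2013-11": {"Amount": 150}, "ACC-1|P-9|2013-12": {"Amount": 70}}
incoming = {"ACC-1|P-9|2013-11": {"Amount": 150}, "ACC-1|P-9|2013-12": {"Amount": 95},
            "ACC-2|P-9|2013-12": {"Amount": 40}}
inserts, updates = plan_incremental_load(existing, incoming)
print(len(inserts), len(updates))  # 1 1
```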
  • Challenge: large numbers of errors in data loads; long-running operations; timeouts.
  • Error logs showed concurrency locking
  • Caused by roll-up summary fields. Fixed by moving the calculations into the Integration Layer, and removing the roll-ups.
  • After the major rework of the data structure and integration, we turned to fine-tuning.
  • People
    Defined who can create reports and views
    Ensured there is a “platform owner”
    Ensured performance testing of reports and views in Full sandboxes before promotion to Prod 
    Implemented complex sharing model, limiting visibility to just records required
  • Demo
  • And they fly!

    1. Building Reports That Fly: Sean Regan,, @SFDCSRegan; John Tan,, @johntansfdc; Irena Miziolek, NTT Centerstance; Jeannette Liu-Deza, NTT Centerstance
    2. Safe harbor
    3. Your company uses Salesforce/ …
    4. You are a developer or architect maintaining your org …
    5. Slow reports return unreliable data that hurts user productivity and leads to mistakes …
    6. You need to make these reports faster and trusted …
    7. Agenda: Building reports that fly! 1. Governance • Configuration Access • Sharing Access • Data Architecture 2. Frequency • Dashboard & Report Refreshes 3. Efficient Reports • Tuning Reports • Data Archiving/Aggregation 4. Customer Case Study
    8. Sean Regan, Architect Evangelist, @SFDCSregan
    9. John C. Tan, Architect Evangelist, @johntansfdc
    10. View our content on developerforce
    11. Irena Miziolek & Jeannette Liu-Deza, Technical & Solution Architects, NTT Centerstance
    12. How many of you have …? Run a Salesforce report … and waited forever?
    13. 1. Governance
    14. Governance: Configuration access
    15. Even pilots need help
    16. Restrict report and list view creation
    17. Report & list view governance helps you … • Increase user adoption • Increase employee productivity • Decrease employee mistakes • Save money
    18. Governance: Sharing access
    19. Architect sharing based on business requirements
    20. Global organization sharing
    21. What’s the impact of sharing access?
    22. Governance: Data architecture
    23. Is the correct end result really enough?
    24. Understand your schema
    25. When it isn’t necessary, get rid of it.
    26. Architect your data model for performance
    27. 2. Frequency
    28. Frequency: Dashboard/report refreshes
    29. More is not always better
    30. Push data to users: • Streaming API • Scheduled Reports • Visualforce Pages • Workflow Email Alerts • Chatter Feeds
    31. Key takeaways: • Governance ensures efficiency & accuracy • Know your object model • Push data to users to drive workflow • Architect your sharing model based on business requirements
    32. 3. Efficient reports
    33. Efficient reports • Tuning reports • Data archiving/aggregation
    34. Efficient reports: Tuning reports
    35. What is the query optimizer? Generates the most efficient query based on: • Statistics • Indexes / Skinny Tables • Sharing
    36. Best practice: One+ selective filter per report
    37. Selective filters: Four components: 1. Filters: add filters to reduce data 2. Operators: avoid 2 inefficient filter operators 3. Thresholds: ensure filter meets selectivity threshold 4. Index Creation: index the filter field
    38. Filters reduce the scope of reports
    39. Fields that often make good filters: • Date fields • Picklists • Fields with wide and even distribution of values
    40. Non-deterministic formula fields aren’t good filters: • Can’t index • For example: references a related object, as in CASE(MyUser__r.UserType__c,1,"Gold","Silver") • Create a separate field and use a trigger to populate it • SOQL Best Practices: Nulls and Formula Fields
    41. Use selective operators in filters
    42. Avoid negative operators: Query for reciprocal values instead ✖ NOT EQUALS (“closed”) ✔ EQUALS (“open”, “in progress”)
    43. Avoid filters with a leading % wildcard: CONTAINS ‘%searchstring%’ ✖ CONTAINS (“district”), equivalent to LIKE ‘%district%’ ✔ STARTS WITH (“district”), equivalent to LIKE ‘district%’
    44. Ensure filters meet selectivity thresholds
    45. Standard index selectivity threshold: A standard index is selective when it returns: • < 30% of the first 1 million records • < 15% of the records after the first 1 million records • No more than 1 million total records
    46. Standard index selectivity threshold: Selectivity threshold for Created Date index
    47. Standard index selectivity threshold: For 750,000 Account records: < 30% of the first 1 million records: 750,000 × 0.30 = 225,000
    48. Standard index selectivity threshold: For 3,500,000 Account records: < 30% of the first 1 million records plus < 15% of the records after the first 1 million records: (1,000,000 × 0.30) + (2,500,000 × 0.15) = 675,000
    49. Standard index selectivity threshold: From about 5,666,667 Account records upward: No more than 1 million records: 1,000,000
    50. Custom index selectivity threshold: A custom index is selective when it returns: • < 10% of the first million records • < 5% of the records after the first million records • No more than 333,333 records
    51. Create indexes on selective filter fields: Trusted traveler program
    52. Standard fields with indexes
    53. Custom indexes • Discover common filter conditions • Determine selective fields in those conditions • Request custom indexes
    54. A filter condition is selective when … … it uses an optimizable operator … it meets the selectivity threshold … selective fields have indexes
    55. Skinny tables: Last resort for non-optimizable reports
    56. Skinny table • Single object; no cross-object joins • Maximum of 100 fields • Not aggregate/summary data; 1:1 record count between source and skinny table • Skinny table updated automatically • Minimal joins • will analyze and create
    57. Tell me more about skinny tables … Webinar: Inside the Query Optimizer
    58. Efficient reports: Data archiving/aggregation
    59. Data archiving: Reduce the # of records the query optimizer needs to consider • Determine strategy during design phase • Move older records into a different object • Soft deletes still count towards record count
    60. Data aggregation • Report on historical data • Save tabular or summary report to a custom object • Use Analytic Snapshots or Batch Apex • Gists: • Batch Apex Class - • Trigger -
    61. Key takeaways • Reports should contain at least one selective filter • A filter is selective if the field is indexed, the filter does not use an inefficient operator, and the filter meets the selectivity threshold • Implement data archiving/aggregation strategies
    62. Introducing customer case
    63. Irena Miziolek & Jeannette Liu-Deza, Technical & Solution Architects, NTT Centerstance
    64. Agenda: • What’s the problem • How to troubleshoot • What’s the solution
    65. Client
    66. Challenge: • Reports & dashboards timing out • Errors in data loads • Information overload
    67. In the beginning… Note: no real data is used in this presentation
    68. Approach to troubleshooting
    69. Reports and dashboards are slow or timing out
    70. Storage usage
    71. Reduce data volume
    72. Reduce data volume: • Batch jobs • Data sampling frequency
    73. Compound keys
    74. Workbench
    75. Workbench, Deleted records
    76. Update instead of delete/reload: • Full nightly delete & reload • Update/Insert • Upsert
    77. Data loads failing
    78. Error logs
    79. Rollup summary fields
    80. A little housekeeping
    81. Report tuning
    82. Governance
    83. In the end…
    84. Session summary
    85. Building reports that fly • Govern users’ access to reports and data • Manage reporting frequency • Add at least one selective filter per report
    86. Related DevZone hands-on mini-workshop: Wednesday 1:00-1:45 PM; Thursday 1:00-1:45 PM