Building Reports That Fly

In Salesforce, the foundation of reporting is the retrieval of an organization's data. If your data is not retrieved efficiently, your reports can be incredibly slow and might time out, which will frustrate your users. Join us to learn the techniques every expert architect should know in order to build reports that fly, and to ensure that your data is at your users' fingertips. We'll spend the vast majority of our time covering specific performance-related techniques, such as using reporting governance, as well as what you need to know about the Force.com query optimizer to write reports that are both fast and selective. Note that this session assumes you already understand the basic principles behind building queries and reports.

  • SPEAKER: So what are we going to cover in today’s session? Well, let’s say your company uses Salesforce apps or perhaps a custom Force.com app.
  • SPEAKER: You are a developer, perhaps an architect, and you are in charge of maintaining the org.
  • SPEAKER: The org has a bunch of reports that query data from objects with hundreds of millions of records. And there are a bunch of reports created by users that are slow. So slow that users are losing productivity. The users also make mistakes because the data returned by the reports is not always correct.
  • SPEAKER: So your job is to make these reports faster and ensure their results can be trusted.
  • SPEAKER: And that’s what we are going to cover today. We are going to take a look at the most common problems that cause slow and inaccurate reports, and then we’ll show you how to find and fix these problems in your org. Using the skills you learn today, you’ll be able to go back to work and make old, and new, reports fly with data your users can trust.
  • SPEAKER: My name is Sean Regan, and I’m part of the Customer Centric Engineering group here at salesforce.com. I think of myself as an “architect evangelist”, someone who can help you learn more about how Salesforce works under the hood so that you can implement better solutions on our platform.
  • SPEAKER: With me today is another member of our team, John Tan, and together, we’d like to discuss what everyone should know about building reports that fly.
  • SPEAKER: Governance is the concept of ensuring users within our organization have only as much access as required to do their jobs.
  • SPEAKER: Let’s talk about governing your configuration access.
  • SPEAKER: Just as pilots need governance to ensure they can successfully fly their airplanes, Salesforce users also require governance to successfully create reports that fly. If there were only 100 passengers in the world that flew, flights would be successful without air traffic control, but as the number of passengers and flights increases, so too does the need for air traffic control.
    Out of the box, Salesforce allows all users to create and modify reports and list views to access data, and users belonging to smaller organizations can usually find what they need quickly and easily, even if they have only a basic understanding of the Salesforce platform. However, if users belong to an organization with large data volume and a complex sharing architecture, and those users aren’t familiar with their organization’s Salesforce data model or its data distribution, they might create inefficient queries that don’t return an accurate, appropriately focused, or quickly delivered data set.
    In other words, they might need help from an architect or developer like you.
  • SPEAKER: We provide several ways to govern users’ access to data on the platform. Large customers that have tens or hundreds of millions of records should govern their end users’ ability to create reports and list views. This ensures that reports and list views:
      • can be architected and tuned to return results fast.
      • return the correct data users are looking for.
    You can govern your users’ ability to create and manage reports and list views at the profile level by un-checking the Create and Customize Reports checkbox.
  • SPEAKER: Governing end users’ ability to create reports and list views can have a very significant positive impact on your organization or, if you are a partner, on your customers.
    By governing end users’ ability to create reports and list views, you can:
      • Increase the user adoption rate by getting the right data to users fast.
      • Increase employee productivity, since users won’t be waiting for slow reports to return data.
      • Decrease employee mistakes, since the data they are querying can be trusted to be accurate and complete.
    All of these benefits can ultimately save your organization money or, if you are a partner, save your customers money!
  • SPEAKER: Now let’s talk about governing your sharing access.
  • SPEAKER: In large organizations that have a highly restrictive sharing model, implementing sharing can cause reports to use indexed sharing filters to return results extremely fast. If a user only has access to a small number of an object’s records and you configure that object’s organization-wide default as Private, we will use the sharing index filter to drive any queries on that object made by that user.
    It is important to understand, however, that sharing architectures that produce the same end result are not all equal from a performance perspective. If you are interested in seeing an example of this, be sure to attend our workshop.
  • SPEAKER: What about a global company that has business units in each country, with a relatively even data distribution across those business units? In this case, the company’s data access requirements are not restrictive. Should they implement a sharing model to segregate data by country to improve report performance, or simply filter the data on the reports themselves?
    Let’s get a show of hands: How many of you think implementing the sharing model will improve performance of the application? How many think filtering on each of the reports will yield the best performance?
    In this case, sharing should not be implemented, since the sharing filter would not be selective. Using an open sharing model eliminates the query optimizer’s need to check the sharing filter and also reduces the administrative burden of managing the sharing records.
  • SPEAKER: Some architects think that creating a sharing model is a good way to filter records, and this can certainly have a positive impact on performance when users query data. However, in many cases the same selectivity could be achieved through indexed report filters. In these cases, sharing will add overhead to the queries and increase administrative burden, and in most cases that overhead will outweigh any performance gain. As a result, we generally recommend implementing your sharing model based on your organization’s requirements.
  • SPEAKER: Now let’s talk about governing your data architecture.
  • SPEAKER: There are a lot of ways to do things. Some are certainly more efficient than others.
    Just as there are many routes that an airplane can take to get to an airport, there are many ways a report can return the exact same dataset, and some are much faster than others.
  • SPEAKER: Enterprise customers can sometimes have rather complex schemas. This is another great reason why end users’ ability to create reports and list views should be governed in these types of environments. However, it is critical for an administrator or architect to be intimately familiar with the schema to ensure reports are being created in an optimal manner. For example, reports should not perform aggregation on a field when a rollup summary field already exists that performs the same calculation.
  • SPEAKER: I know most of you have heard it before, but we encounter it time after time. If data is not required for transactional business, it should be archived or purged. There are many different strategies to keep this data out of your transactional dataset but still allow the data to be accessible. John will talk about aggregation later in the presentation, which is one of these approaches. If you feel like you need assistance in this area, we have many great partners who can help architect a data management strategy to meet your specific requirements.
  • SPEAKER: When architecting your object model for large data volume objects, it is important to consider both how its data will be used and its query performance. When dealing with hundreds of millions or billions of records in an object, if query filters cross multiple objects, the joins can be very costly. It is extremely important to consider the performance impact of flattening your data architecture to ensure peak performance for data access.
  • Now we will switch gears a bit and discuss frequency, which is the part of your architecture that determines how users go about getting the data they require to perform their jobs.
  • First, let’s discuss an organization’s frequency of dashboard and report refreshes.
  • SPEAKER: Reports, dashboards and even Visualforce pages that are constantly refreshed needlessly use resources, which can impact your reports’ performance as well as the performance of other operations in your org.
  • SPEAKER: In Customer Centric Engineering, we see many of our customers pulling data in a very inefficient manner by driving workflow off of reports. The platform offers quite a few tools to implement a push-based solution, which would provide users with the same data in a much more efficient manner. For example, you can use workflow rules, scheduled reports and Chatter feeds right out of the box. For more complex requirements, a Visualforce page can be built using our Streaming API that will automatically refresh when the underlying data architecture changes. In many cases, these solutions can be much more efficient and return the exact same data to users.
  • Sean covered the who, what and when, but let’s talk about how: specifically, how to ensure your reports are as efficient as possible.
  • To ensure that your reports are efficient, we’ll be covering:
      • The role of the Force.com Query Optimizer and indexes for optimal report performance
      • What it means to include at least 1 selective filter in your report
      • How skinny tables may help after you’ve exhausted indexes
      • Making your data lightweight
  • Because of our multitenant data architecture, we developed our own query optimizer. It is the engine that sits between your reports/SOQL/list views and our database.
    The query optimizer looks at the list of filters in the WHERE clause and runs inexpensive queries based on those filters to determine the best filter (the one with the smallest data set).
    It then chooses the option with the lowest estimated cost.
    Based on this, it’ll then determine the best leading table/index to drive the query.
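As a rough illustration of the idea in the note above, the sketch below picks the filter with the smallest estimated row count as the driving filter. It is a toy model and not the Force.com query optimizer itself; the `Filter` type, the field names, and the row estimates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    field: str            # hypothetical field name used in a WHERE clause
    estimated_rows: int   # rows this filter alone is estimated to return


def pick_driving_filter(filters):
    """Return the filter with the lowest estimated row count (toy cost model)."""
    if not filters:
        raise ValueError("at least one filter is required")
    return min(filters, key=lambda f: f.estimated_rows)


# Hypothetical estimates for three filters on the same report.
candidates = [
    Filter("Status__c", 400_000),
    Filter("CreatedDate", 25_000),
    Filter("Region__c", 150_000),
]
best = pick_driving_filter(candidates)
print(best.field)
```

In this toy model, cost is simply the estimated row count; a real optimizer would weigh more inputs, such as the statistics, indexes, and sharing mentioned in the slides.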
  • Because the query optimizer tries to drive a query with the most selective filter, your reports should include at least 1 selective filter.
    Let’s talk about how to do it.
  • In order to have at least one selective filter, we’ll need to do the following:
      • Add filters to your reports to reduce scope
      • Ensure the operators in these filters are efficient
      • Ensure that the number of records returned by the filter meets a selectivity threshold
      • If the filter meets the threshold and the field is not indexed, create an index
    Let’s look into each one of these areas in more detail.
  • Just as you would avoid overpacking for a trip, you’ll want to limit the amount of data retrieved.
  • Besides fields that are already indexed, date fields are good candidates for filters.
    Date fields are examples of fields that have a wide and even distribution of values. If the data values in your field are skewed to a very small number of records, it will be hard to meet the selectivity thresholds, which we will discuss shortly.
  • A common problem we see with bad filters is non-deterministic formula fields. Essentially, the data can change based on when you access it. A formula field that uses the TODAY() function is non-deterministic.
    The most common example is cross-object references. The easiest way to spot these is to look for “__r”.
    For more info on formula fields, take a look at this blog on developerforce.
  • So we’ve just talked about adding a filter to your report. But what about the operators for those filters?
    Well, there are 2 of them that keep your filter from being selective.
  • For NOT or != conditions, due to the underlying database implementation, the query optimizer can’t use the index to drive the query.
    This includes the “excludes” operator.
  • All database query optimizers have the same issue with leading % wildcard searches.
    If you need to do leading % wildcard searches, SOSL may be a better alternative. However, if you need real-time results, an alternative is to create a custom search page which restricts leading % wildcard searches and adds governance on the search string(s).
  • The query optimizer has set thresholds for using an index to drive the query. If your filter returns more records than the threshold, the optimizer knows that it would not be efficient to use that index.
  • For a custom index, the threshold is essentially 1/3 of that for a standard index.
  • After you’ve determined that your filter would meet the selectivity threshold, create an index for that field if one doesn’t already exist.
  • Although a custom index is created when you mark a field as Unique or External ID, we recommend working with support to create custom indexes.
  • We’ve spoken about how the query optimizer tries to drive a query with a selective index. However, there may be times where indexes aren’t enough.
    After tuning with indexes is exhausted, skinny tables may help.
  • Now that you’ve learned how to build efficient reports, you’ll want to maintain that efficiency. One of the best ways to do this is to keep your data lean.
  • When the Force.com query optimizer judges returned records against its thresholds, all of the records that appear in the Recycle Bin or are marked for physical delete still count against your total number of records. If you’re deleting a large number of records, work with customer support to physically delete the records.
  • Allergan is a multi-specialty health care company focused on discovering, developing and commercializing innovative pharmaceuticals, biologics, medical devices and over-the-counter consumer products that enable people to live life to its greatest potential: to see more clearly, move more freely, express themselves more fully.
  • Sales Rep perspective: Inline report for individual Physician is taking a long time to load.
  • Sales Rep perspective: Viewing Sales Data is not meaningful: too many rows, little relevance. Can’t find the information needed.
  • Manager Perspective: Summary Dashboards keep spinning.
    And they wait, and they wait, and they wait.
    While they wait, let’s look at specifics of the dashboards, and let’s look at the storage.
  • “I am afraid that I rather give myself away when I explain,” said he. “Results without causes are much more impressive.”
    The Stock-Broker’s Clerk, Sir Arthur Conan Doyle
  • Let’s take a look at the dashboards that are timing out.
  • Troubleshooting tool #1: Storage Usage
      • Overall Data Storage is getting very high.
      • There are only 6,000 accounts. How can 6,000 accounts eat up that much storage?
      • Perhaps Daily Sales Data is too granular?
  • Most impactful: data volumes. Only load absolutely necessary data; treat as operational data store.
  • Ways to reduce:
      • Reduce sampling frequency
      • Summarize (in ETL layer, batch jobs, off-platform jobs, etc.)
      • Archive
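To make the “summarize” option above concrete, here is a minimal sketch of collapsing daily rows into one row per account and month before loading. The row layout `(account_id, day, amount)` and the sample values are assumptions made for illustration; the case study does not describe the actual ETL job.

```python
from collections import defaultdict
from datetime import date


def summarize_monthly(daily_rows):
    """Collapse (account_id, day, amount) rows into one total per account and month."""
    totals = defaultdict(int)
    for account_id, day, amount in daily_rows:
        totals[(account_id, day.year, day.month)] += amount
    return dict(totals)


# Hypothetical daily sales rows.
daily = [
    ("ACC-1", date(2013, 10, 1), 100),
    ("ACC-1", date(2013, 10, 2), 50),
    ("ACC-1", date(2013, 11, 1), 70),
    ("ACC-2", date(2013, 10, 15), 20),
]
monthly = summarize_monthly(daily)
print(len(daily), "daily rows ->", len(monthly), "monthly rows")
```

The reduction grows with the number of days per month that carry data, which is why summarizing before the load keeps record counts down.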
  • A common problem when moving summary calculations into the Integration Layer is the lack of primary keys. A common solution is to create Compound Keys (logical keys) that identify a piece of information uniquely and are composed of multiple components.
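A compound key of this kind can be sketched as a simple join of its components. The component names (account, product, period) and the `|` separator are hypothetical choices for illustration, not details from the case study.

```python
def compound_key(*components, separator="|"):
    """Join key components into one logical key; reject values that would make it ambiguous."""
    parts = []
    for component in components:
        text = str(component).strip()
        if not text:
            raise ValueError("key components must not be empty")
        if separator in text:
            raise ValueError(f"component {text!r} contains the separator")
        parts.append(text)
    return separator.join(parts)


# Hypothetical components: account, product, and reporting period.
key = compound_key("ACC-1", "PROD-42", "2013-10")
print(key)
```

Rejecting components that contain the separator keeps two different component tuples from producing the same key.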
  • Troubleshooting tool #2: Workbench
    Use to profile data: distribution, anomalies, information sparsity/density
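The kind of profiling described above can be sketched offline once field values have been exported. This is not Workbench itself; the function name and the sample values are hypothetical.

```python
from collections import Counter


def profile_field(values):
    """Return (share of each non-null value, fraction of nulls) for one field."""
    total = len(values)
    if total == 0:
        return {}, 0.0
    counts = Counter(v for v in values if v is not None)
    shares = {value: count / total for value, count in counts.items()}
    null_fraction = (total - sum(counts.values())) / total
    return shares, null_fraction


# Hypothetical export of a status field: heavily skewed toward one value.
statuses = ["Closed"] * 8 + ["Open"] + [None]
shares, null_fraction = profile_field(statuses)
print(shares, null_fraction)
```

A value that dominates the distribution, as "Closed" does here, is a sign that a filter on that value is unlikely to be selective.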
  • Use to analyze the total data set, including deleted records. Note that the number of records returned is much larger when including the deleted records for our use case.
  • Existing data load scripts were deleting and reloading full sets of data nightly. We helped the customer retire the full reloads and do only updates and inserts. Note: upserts are easier to implement, but are a bit slower.
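The move from full reloads to updates and inserts can be sketched as a comparison between what is already stored and what the source system sends. The dictionaries, keys, and field names below are hypothetical; the customer's actual load scripts are not shown in the presentation.

```python
def plan_incremental_load(existing, incoming):
    """Split incoming records into inserts and updates, skipping unchanged ones.

    existing: dict mapping a key to the record currently stored
    incoming: dict mapping a key to the record from the source system
    """
    inserts, updates = {}, {}
    for key, record in incoming.items():
        if key not in existing:
            inserts[key] = record        # new key: insert
        elif existing[key] != record:
            updates[key] = record        # known key with changed data: update
    return inserts, updates


# Hypothetical stored and incoming records.
existing = {"K1": {"amount": 100}, "K2": {"amount": 50}}
incoming = {"K1": {"amount": 100}, "K2": {"amount": 75}, "K3": {"amount": 20}}
inserts, updates = plan_incremental_load(existing, incoming)
print(inserts, updates)
```

Only the changed and new records are written, so no records are deleted and recreated each night.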
  • Challenge: large numbers of errors in data loads; long-running operations; timeouts.
  • Error logs showed concurrency locking.
  • Caused by roll-up summary fields. Fixed by moving the calculations into the Integration Layer, and removing the roll-ups.
  • After the major re-work of the data structure and integration, we turned to fine-tuning.
  • People
      • Defined who can create reports and views
      • Ensured there is a “platform owner”
    Process
      • Ensured performance testing of reports and views in Full sandboxes before promotion to Prod
      • Implemented complex sharing model, limiting visibility to just records required
  • Demo
  • And they fly!

Building Reports That Fly: Presentation Transcript

  • Building Reports That Fly
    Sean Regan, salesforce.com, @SFDCSRegan
    John Tan, salesforce.com, @johntansfdc
    Irena Miziolek, NTT Centerstance
    Jeannette Liu-Deza, NTT Centerstance
  • Safe harbor Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services. The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter. 
These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site. Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
  • Your company uses Salesforce/Force.com …
  • You are a developer or architect maintaining your org …
  • Slow reports return unreliable data that hurts user productivity and leads to mistakes …
  • You need to make these reports faster and trusted …
  • Agenda - Building reports that fly!
    1. Governance
       • Configuration Access
       • Sharing Access
       • Data Architecture
    2. Frequency
       • Dashboard & Report Refreshes
    3. Efficient Reports
       • Tuning Reports
       • Data Archiving/Aggregation
    4. Customer Case Study
  • Sean Regan Architect Evangelist @SFDCSregan
  • John C. Tan Architect Evangelist @johntansfdc
  • View our content on developerforce http://developer.force.com/architect
  • Irena Miziolek & Jeannette Liu-Deza Technical & Solution Architects NTT Centerstance
  • How many of you have …? Run a Salesforce report … and waited forever?
  • 1. Governance
  • Configuration access Governance
  • Even pilots need help
  • Restrict report and list view creation
  • Report & list view governance helps you …
      • Increase user adoption
      • Increase employee productivity
      • Decrease employee mistakes
      • Save money
  • Sharing access Governance
  • Architect sharing based on business requirements
  • Global organization sharing
  • What’s the impact of sharing access?
  • Data architecture Governance
  • Is the correct end result really enough?
  • Understand your schema
  • When it isn’t necessary, get rid of it.
  • Architect your data model for performance
  • 2. Frequency
  • Dashboard/report refreshes Frequency
  • More is not always better
  • Push data to users:
      • Force.com Streaming API
      • Scheduled Reports
      • Visualforce Pages
      • Workflow Email Alerts
      • Chatter Feeds
  • Key takeaways
      • Governance ensures efficiency & accuracy
      • Know your object model
      • Push data to users to drive workflow
      • Architect your sharing model based on business requirements
  • 3. Efficient reports
  • Efficient reports
      • Tuning reports
      • Data archiving/aggregation
  • Tuning reports Efficient reports
  • What is the Force.com query optimizer?
    Generates the most efficient query based on:
      • Statistics
      • Indexes / Skinny Tables
      • Sharing
  • Best practice: One+ selective filter per report
  • Selective filters: Four components
    1. Filters - add filters to reduce data
    2. Operators - avoid 2 inefficient filter operators
    3. Thresholds - ensure filter meets selectivity threshold
    4. Index Creation - index the filter field
  • Filters reduce the scope of reports
  • Fields that often make good filters
      • Date fields
      • Picklists
      • Fields with wide and even distribution of values
  • Non-deterministic formula fields aren’t good filters
    Can’t index. For example:
      • References related object
      • CASE(MyUser__r.UserType__c,1,"Gold","Silver")
      • Create separate field and use trigger to populate
    Force.com SOQL Best Practices: Nulls and Formula Fields
  • Use selective operators in filters
  • Avoid negative operators
    Query for reciprocal values instead
      ✖ NOT EQUALS (“closed”)
      ✔ EQUALS (“open”, “in progress”)
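One way to apply the “query for reciprocal values” advice is to generate the positive condition from the full list of picklist values. The sketch below does that under stated assumptions: the field name `Status__c` and the value list are hypothetical, and only single quotes are escaped.

```python
def reciprocal_filter(field, all_values, excluded):
    """Build a positive IN (...) condition from the values that are not excluded."""
    kept = [value for value in all_values if value not in excluded]
    if not kept:
        raise ValueError("no values left to filter on")
    # Escape single quotes with a backslash; other special characters are not handled here.
    quoted = ", ".join("'" + value.replace("'", "\\'") + "'" for value in kept)
    return f"{field} IN ({quoted})"


# Hypothetical picklist: instead of Status__c != 'closed', list the values we do want.
condition = reciprocal_filter("Status__c", ["open", "in progress", "closed"], {"closed"})
print(condition)
```

The generated condition must be rebuilt whenever the picklist gains a new value, which is the trade-off for avoiding the negative operator.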
  • Avoid filters with a leading % wildcard
    CONTAINS ‘%searchstring%’
      ✖ CONTAINS (“district”), equivalent to LIKE ‘%district%’
      ✔ STARTS WITH (“district”), equivalent to LIKE ‘district%’
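The operator-to-pattern mapping on this slide can be written down directly, along with a check that flags a leading % wildcard. Both function names are hypothetical helpers, for example for the kind of governed custom search page mentioned in the speaker notes.

```python
def like_pattern(operator, text):
    """Translate a report-style operator into the LIKE pattern given on the slide."""
    if operator == "contains":
        return f"%{text}%"
    if operator == "starts with":
        return f"{text}%"
    raise ValueError(f"unsupported operator: {operator}")


def has_leading_wildcard(pattern):
    """True when the pattern begins with the % wildcard."""
    return pattern.startswith("%")


for operator in ("contains", "starts with"):
    pattern = like_pattern(operator, "district")
    print(operator, pattern, has_leading_wildcard(pattern))
```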
  • Ensure filters meet selectivity thresholds
  • Standard index selectivity threshold
    A standard index is selective when it returns:
      • < 30% of the first 1 million records
      • < 15% of returned records after the first 1 million records
      • No more than 1 million total records
  • Standard index selectivity threshold
    Selectivity threshold for Created Date index
  • Standard index selectivity threshold
    For 750,000 Account records
      • < 30% of the first 1 million records
      • 750,000 x .30 = 225,000
  • Standard index selectivity threshold
    For 3,500,000 Account records
      • < 30% of the first 1 million records
      • < 15% of returned records after the first 1 million records
      • (1,000,000 x .30) + (2,500,000 x .15) = 675,000
  • Standard index selectivity threshold
    Over 5,600,000 Account records
      • No more than 1 million records
      • 1,000,000
  • Custom index selectivity threshold
    A custom index is selective when it returns:
      • < 10% of the first million records
      • < 5% of returned records after the first million records
      • No more than 333,333 records
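The threshold arithmetic on these slides can be captured in a few lines. The function below is a sketch that encodes only the percentages and caps stated above; the function name and its `custom_index` parameter are hypothetical, and integer arithmetic is used to avoid rounding surprises.

```python
def selectivity_threshold(total_records, custom_index=False):
    """Row-count limit for a selective filter, using the percentages and caps on the slides."""
    if total_records < 0:
        raise ValueError("total_records must not be negative")
    # (percent of first million, percent of the remainder, overall cap)
    first_pct, rest_pct, cap = (10, 5, 333_333) if custom_index else (30, 15, 1_000_000)
    first = min(total_records, 1_000_000)
    rest = max(total_records - 1_000_000, 0)
    threshold = (first * first_pct + rest * rest_pct) // 100
    return min(threshold, cap)


for total in (750_000, 3_500_000, 6_000_000):
    print(total, selectivity_threshold(total), selectivity_threshold(total, custom_index=True))
```

For 750,000 and 3,500,000 records the standard index results match the worked examples on the slides (225,000 and 675,000); at 6,000,000 records both index types hit their caps.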
  • Create indexes on selective filter fields
    Trusted traveler program
  • Standard fields with indexes
  • Custom indexes
      • Discover common filter conditions
      • Determine selective fields in those conditions
      • Request custom indexes
  • A filter condition is selective when …
      • … it uses an optimizable operator
      • … it meets the selectivity threshold
      • … selective fields have indexes
  • Skinny tables: Last resort for non-optimizable reports
  • Skinny table
      • Single object. No cross-object joins
      • Maximum of 100 fields
      • Not aggregate/summary data. 1:1 record count between source and skinny table
      • Skinny table updated automatically
      • Minimal joins
      • salesforce.com will analyze and create
  • Tell me more about skinny tables … Webinar: Inside the Force.com Query Optimizer
  • Data archiving/aggregation Efficient reports
  • Data archiving
    Reduce the # of records the query optimizer needs to consider
      • Determine strategy during design phase
      • Move older records into a different object
      • Soft deletes still count towards record count
  • Data aggregation
      • Report on historical data
      • Save tabular or summary report to a custom object
      • Use Analytic Snapshots or Batch Apex
      • Gists:
          • Batch Apex Class - https://gist.github.com/johntansfdc/7044473
          • Trigger - https://gist.github.com/johntansfdc/7044570
  • Key takeaways
      • Reports should contain at least one selective filter
      • A filter is selective if…
          • the field is indexed
          • the filter does not use an inefficient operator
          • the filter meets the selectivity threshold
      • Implement data archiving/aggregation strategies
  • Introducing customer case
  • Irena Miziolek & Jeannette Liu-Deza Technical & Solution Architects NTT Centerstance
  • Agenda
      • What’s the problem
      • How to troubleshoot
      • What’s the solution
  • Client
  • Challenge
      • Reports & dashboards timing out
      • Errors in Data Loads
      • Information Overload
  • In the beginning…. Note: no real data is used in this presentation
  • Approach to troubleshooting
  • Reports and dashboards are slow or timing out
  • Storage usage
  • Reduce data volume
  • Reduce data volume
      • Batch jobs
      • Data Sampling Frequency
  • Compound keys
  • Workbench
  • Workbench, Deleted records
  • Update instead of delete/reload
      • Full Nightly Delete & Reload
      • Update/Insert
      • Upsert
  • Data loads failing
  • Error logs
  • Rollup summary fields
  • A little housekeeping
  • Report tuning
  • Governance
  • In the end….
  • Session summary
  • Building reports that fly
      • Govern users’ access to reports and data
      • Manage reporting frequency
      • Add at least one selective filter per report
  • Related DevZone hands-on mini-workshop
      • Wednesday 1:00-1:45 PM
      • Thursday 1:00-1:45 PM