In Salesforce, the foundation of reporting is the retrieval of an organization's data. If your data is not retrieved efficiently, your reports can be incredibly slow and might time out, which will frustrate your users. Join us to learn the techniques every expert architect should know in order to build reports that fly--and to ensure that your data is at your users' fingertips. We'll spend the vast majority of our time covering specific performance-related techniques, such as using reporting governance, as well as what you need to know about the Force.com query optimizer to write reports that are both fast and selective. Note that this session assumes you already understand the basic principles behind building queries and reports.
Building Reports That Fly
1. Building Reports That Fly
Sean Regan, salesforce.com, @SFDCSRegan
John Tan, salesforce.com, @johntansfdc
Irena Miziolek, NTT centerstance
Jeannette Liu-Deza, NTT centerstance
2. Safe harbor
Safe harbor statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties
materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results
expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be
deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other
financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any
statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new
functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our
operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any
litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our
relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of
our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to
larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is
included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent
fiscal quarter. These documents and others containing important disclosures are available on the SEC Filings section of the Investor
Information section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently
available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions
based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these
forward-looking statements.
30. Push data to users: Force.com Streaming API
Scheduled Reports
Visualforce Pages
Workflow Email Alerts
Chatter Feeds
31. Key takeaways
Governance ensures efficiency & accuracy
Know your object model
Push data to users to drive workflow
Architect your sharing model based on business requirements
39. Fields that often make good filters
Date fields
Picklists
Fields with wide and even distribution of values
40. Non-deterministic formula fields aren’t good filters
Can’t index
For example:
References related object
CASE(MyUser__r.UserType__c,1,"Gold","Silver")
Create separate field and use trigger to populate
Force.com SOQL Best Practices: Nulls and Formula Fields
43. Avoid filters with a leading % wildcard
CONTAINS ‘%searchstring%’
✖ CONTAINS (“district”) – equivalent to LIKE ‘%district%’
✔ STARTS WITH (“district”) – equivalent to LIKE ‘district%’
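The operator mapping on this slide can be sketched in a few lines of Python. This is a minimal illustration only: `to_like_pattern` and `has_leading_wildcard` are hypothetical helper names, not part of any Salesforce API.

```python
# Illustrative only: these helpers are hypothetical, not a Salesforce API.
LIKE_TEMPLATES = {
    "contains": "%{}%",     # produces a leading wildcard (the case to avoid)
    "starts with": "{}%",   # no leading wildcard
}


def to_like_pattern(operator: str, value: str) -> str:
    """Return the LIKE pattern equivalent to a report filter operator."""
    return LIKE_TEMPLATES[operator.lower()].format(value)


def has_leading_wildcard(pattern: str) -> bool:
    """True when the pattern begins with %, the case the slide says to avoid."""
    return pattern.startswith("%")


for op in ("contains", "starts with"):
    pattern = to_like_pattern(op, "district")
    print(op, "->", pattern, "| leading wildcard:", has_leading_wildcard(pattern))
```

A check like this could be used when reviewing report filters by hand; it says nothing about how the platform itself evaluates them.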
45. Standard index selectivity threshold
A standard index is selective when it returns:
< 30% of the first 1 million records
< 15% of returned records after the first 1 million records
No more than 1 million total records
47. Standard index selectivity threshold
For 750,000 Account records
< 30% of the first 1 million records
750,000 x .30 = 225,000
48. Standard index selectivity threshold
For 3,500,000 Account records
< 30% of the first 1 million records
< 15% of returned records after the first 1 million records
(1,000,000 x .30) + (2,500,000 x .15) = 675,000
49. Standard index selectivity threshold
Over 5,600,000 Account records
No more than 1 million records
1,000,000
50. Custom index selectivity threshold
A custom index is selective when it returns:
< 10% of the first million records
< 5% of returned records after the first million records
No more than 333,333 records
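The threshold arithmetic on the preceding slides can be reproduced with a short Python sketch. The function name `selectivity_threshold` is hypothetical, and the percentages and caps are taken directly from the slides; this is not how the query optimizer is implemented, only a way to check the numbers.

```python
MILLION = 1_000_000


def selectivity_threshold(total_records: int, custom_index: bool = False) -> int:
    """Record count a filter must stay below to count as selective.

    Integer arithmetic is used so the results match the slide figures exactly.
    """
    # (percent of first million, percent of the remainder, overall cap)
    first_pct, rest_pct, cap = (10, 5, 333_333) if custom_index else (30, 15, MILLION)
    first = min(total_records, MILLION) * first_pct // 100
    rest = max(total_records - MILLION, 0) * rest_pct // 100
    return min(first + rest, cap)


for total in (750_000, 3_500_000, 5_700_000):
    print(total,
          selectivity_threshold(total),
          selectivity_threshold(total, custom_index=True))
```

For 750,000 and 3,500,000 Account records, the standard-index results are 225,000 and 675,000, matching the worked slides; for very large objects the caps of 1,000,000 and 333,333 take over.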
56. Skinny table
Single Object. No cross-object joins
Maximum of 100 fields
Not aggregate/summary data. 1:1 record count between source and skinny table
Skinny updated automatically
Minimal joins
salesforce.com will analyze and create
57. Tell me more about skinny tables …
Webinar: Inside the Force.com Query Optimizer
59. Data archiving
Reduce the # of records the query optimizer needs to consider
Determine strategy during design phase
Move older records into a different object
Soft deletes still count towards record count
60. Data aggregation
Report on historical data
Save tabular or summary report to a custom object
Use Analytic Snapshots or Batch Apex
Gists:
• Batch Apex Class - https://gist.github.com/johntansfdc/7044473
• Trigger - https://gist.github.com/johntansfdc/7044570
61. Key takeaways
Reports should contain at least one selective filter
A filter is selective if…
the field is indexed
the filter does not use an inefficient operator
the filter meets the selectivity threshold
Implement data archiving/aggregation strategies
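The three-part selectivity checklist in the takeaways above can be expressed as a small Python sketch. The function `is_selective` and the operator labels are hypothetical and purely illustrative; the threshold is passed in as a number, since it depends on the index type and the object's record count.

```python
# Operators the session calls out as keeping a filter from being selective.
# The labels are illustrative, not an official list.
INEFFICIENT_OPERATORS = {"not equal to", "excludes", "contains"}


def is_selective(indexed: bool, operator: str,
                 matching_records: int, threshold: int) -> bool:
    """Apply the three takeaway checks to a single report filter."""
    if not indexed:
        return False                      # the field must be indexed
    if operator.lower() in INEFFICIENT_OPERATORS:
        return False                      # the operator must be efficient
    return matching_records < threshold   # the filter must meet the threshold


print(is_selective(True, "equals", 50_000, 225_000))
print(is_selective(True, "contains", 50_000, 225_000))
print(is_selective(True, "equals", 400_000, 225_000))
```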
SPEAKER: So what are we going to cover in today’s session? Well, let’s say your company uses Salesforce apps or perhaps a custom Force.com app.
SPEAKER: You are a developer, perhaps an architect, and you are in charge of maintaining the org.
SPEAKER: The org has a bunch of reports that query data from objects with hundreds of millions of records. And there are a bunch of reports created by users that are slow. So slow that users are losing productivity. The users also make mistakes because the data returned by the reports is not always correct.
SPEAKER: So your job is to make these reports faster and ensure their results can be trusted.
SPEAKER: And that’s what we are going to cover today. We are going to take a look at the most common problems that cause slow and inaccurate reports, and then we’ll show you how to find and fix these problems in your org. Using the skills you learn today, you’ll be able to go back to work and make old, and new, reports fly with data your users can trust.
SPEAKER: My name is Sean Regan, and I’m part of the Customer Centric Engineering group here at salesforce.com. I think of myself as an “architect evangelist”, someone that can help you learn more about how Salesforce works under the hood so that you can implement better solutions on our platform.
SPEAKER: With me today is another member of our team, John Tan, and together, we’d like to discuss what everyone should know about building reports that fly.
SPEAKER: Governance is the concept of ensuring users within our organization have only as much access as required to do their jobs.
SPEAKER: Let’s talk about governing your configuration access.
SPEAKER: Just as pilots need governance to ensure they can successfully fly their airplanes, Salesforce users also require governance to successfully create reports that fly. If there were only 100 passengers in the world that flew, flights would be successful without air traffic control, but as the number of passengers and flights increases, so too does the need for air traffic control.
Out of the box, Salesforce allows all users to create and modify reports and list views to access data, and users belonging to smaller organizations can usually find what they need quickly and easily, even if they have only a basic understanding of the Salesforce platform. However, if users belong to an organization with both large data volume and a complex sharing architecture, and those users aren’t familiar with their organization’s Salesforce data model or its data distribution, they might create inefficient queries that don’t return an accurate, appropriately focused, or quickly delivered data set.
In other words, they might need help from an architect or developer like you.
SPEAKER: We provide several ways to govern users’ access to data on the platform. Large customers that have tens or hundreds of millions of records should govern their end users’ ability to create reports and list views. This ensures that those reports and list views:
can be architected and tuned to return results fast.
return the correct data users are looking for.
You can govern your users’ ability to create and manage reports and list views at the profile level by un-checking the Create and Customize Reports checkbox.
SPEAKER: Governing end users’ ability to create reports and list views can have a very significant positive impact on your organization or, if you are a partner, on your customers.
By governing end users’ ability to create reports and list views, you can:
Increase the user adoption rate by getting the right data to users fast.
Increase employee productivity since users won’t be waiting for slow reports to return data.
Decrease employee mistakes since the data they are querying can be trusted to be accurate and complete.
All of these benefits can ultimately save your organization money or, if you are a partner, save your customers money!
SPEAKER: Now let’s talk about governing your sharing access.
SPEAKER: In large organizations that have a highly restrictive sharing model, implementing sharing can cause reports to use indexed sharing filters to return results extremely fast. If a user only has access to a small number of an object’s records and you configure that object’s organization-wide default as Private, we will use the sharing index filter to drive any queries on that object made by that user.
It is important to understand, however, that not all sharing architectures that produce the same end result are created equal from a performance perspective. If you are interested in seeing an example of this, be sure to attend our workshop.
SPEAKER: What about a global company that has business units in each country with a relatively even data distribution across each of those business units? In this case, this company’s data access requirements are not restrictive. Should they implement a sharing model to segregate data by country to improve report performance or simply filter the data on the reports themselves?
Let’s get a show of hands: How many of you think implementing the sharing model will improve performance of the application? How many think filtering on each of the reports will yield the best performance?
In this case, sharing should not be implemented since the sharing filter would not be selective. Using an open sharing model eliminates the query optimizer’s need to check the sharing filter and also reduces the administrative burden of managing the sharing records.
SPEAKER: Some architects think that creating a sharing model is a good way to filter records, and this can certainly have a positive impact on performance when users query data. However, in many cases, the same selectivity could be achieved through indexed report filters. In these cases, sharing will add overhead to the queries and increase administrative burden. In most cases the overhead related to sharing will outweigh any performance gain. As a result, we generally recommend implementing your sharing model based on your organization’s requirements.
SPEAKER: Now let’s talk about governing your data architecture.
SPEAKER: There are a lot of ways to do things. Some are certainly more efficient than others.
Just as there are many routes that an airplane can take to get to an airport, there are many ways a report can return the exact same dataset and some are much faster than others.
SPEAKER: Enterprise customers can sometimes have rather complex schemas. This is another great reason to govern end users’ ability to create reports and list views in these types of environments. However, it is critical for an administrator or architect to be intimately familiar with the schema to ensure reports are being created in an optimal manner. For example, reports should not perform aggregation on a field when a rollup summary field already exists that performs the same calculation.
SPEAKER: I know most of you have heard it before but we encounter it time after time. If data is not required for transactional business, it should be archived or purged. There are many different strategies to keep this data out of your transactional dataset but still allow the data to be accessible. John will talk about aggregation later in the presentation which is one of these approaches. If you feel like you need assistance in this area, we have many great partners who can help architect a data management strategy to meet your specific requirements.
SPEAKER: When architecting your object model for large data volume objects, it is important to consider both how its data will be used and its query performance. When dealing with 100s of millions or billions of records in an object, if query filters cross multiple objects, the joins can be very costly. It is extremely important to consider the performance impact of flattening your data architecture to ensure peak performance for data access.
Now we will switch gears a bit and discuss frequency which is the part of your architecture that determines how users go about getting the data they require to perform their job.
First, let’s discuss an organization’s frequency of dashboard and report refreshes.
SPEAKER: Reports, dashboards, and even Visualforce pages that are constantly refreshed needlessly use resources, which can impact your reports’ performance as well as the performance of other operations in your org.
SPEAKER: In Customer Centric Engineering, we see many of our customers pulling data in a very inefficient manner by driving workflow off of reports. The platform offers quite a few tools to implement a push-based solution, which would provide users with the same data in a much more efficient manner. For example, you can use workflow rules, scheduled reports, and Chatter feeds right out of the box. For more complex requirements, a Visualforce page can be built using our Streaming API that will automatically refresh when the underlying data changes. In many cases, these solutions can be much more efficient and return the exact same data to users.
Sean covered the who, what, and when, so let’s talk about how: specifically, how to ensure your reports are as efficient as possible.
To ensure that your reports are efficient, we’ll be covering:
The role of the Force.com Query Optimizer and indexes for optimal report performance
What it means to include at least 1 selective filter in your report
How skinny tables may help after you’ve exhausted indexes
Making your data lightweight
Because of our multitenant data architecture, we developed our own query optimizer. It is the engine that sits between your reports/SOQL/listviews and our database.
The query optimizer looks at the list of filters in the WHERE clause and runs inexpensive queries based on those filters to determine the best filter (the one that returns the smallest data set).
It then chooses the option with the lowest estimated cost.
Based on this, it’ll then determine the best leading table/index to drive the query.
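As a rough mental model of the selection step described above, the following Python sketch picks the filter with the smallest estimated row count. It is a toy illustration under stated assumptions: the function `choose_driving_filter`, the filter strings, and the row estimates are all hypothetical, and the real query optimizer weighs more than a single row estimate.

```python
def choose_driving_filter(estimated_rows: dict) -> str:
    """Pick the filter expected to return the fewest rows (lowest cost here)."""
    if not estimated_rows:
        raise ValueError("at least one filter is required")
    return min(estimated_rows, key=estimated_rows.get)


# Hypothetical filters and row estimates for one report.
estimates = {
    "CreatedDate = LAST_N_DAYS:30": 40_000,
    "Status__c = 'Open'": 900_000,
    "Region__c = 'EMEA'": 350_000,
}
print(choose_driving_filter(estimates))
```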
Because the query optimizer tries to drive a query with the most selective filter, your reports should include at least one selective filter.
Let’s talk about how to do it.
In order to have at least one selective filter we’ll need to do the following:
Add filters to your reports to reduce scope
Ensure the operators in these filters are efficient
Ensure that the number of records returned by the filter meets a selectivity threshold
If the filter meets the threshold and the field is not indexed, create an index
Let’s look into each one of these areas in more detail
Like overpacking for a trip, you’ll want to limit the amount of data retrieved.
A good candidate for filters besides fields that are already indexed would be date fields.
Date fields are examples of fields that are wide and have an even distribution. If the data values in your field are skewed to a very small number of records, it will be hard to meet the selectivity thresholds, which we will discuss shortly.
A common problem we see with bad filters is non-deterministic formula fields. Essentially, the data can change based on when you access it. For example, a formula field that uses the TODAY() function is non-deterministic.
The most common example is cross-object references. The easiest way to spot these is to look for “__r”.
For more info on formula fields, take a look at this blog on developerforce.
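The two warning signs mentioned above (a "__r" relationship reference and the TODAY() function) can be spotted with a simple text scan. The Python sketch below is a heuristic only: `nondeterministic_markers` is a hypothetical helper, it covers just the two cases named in this session, and it is not how the platform decides whether a formula field can be indexed.

```python
import re

# Heuristic patterns for the two cases mentioned in the session.
CROSS_OBJECT = re.compile(r"\b(\w+__r)\.")
TODAY_CALL = re.compile(r"\bTODAY\s*\(", re.IGNORECASE)


def nondeterministic_markers(formula: str) -> list:
    """List the reasons a formula looks non-deterministic (text heuristic only)."""
    reasons = [f"cross-object reference via {name}"
               for name in CROSS_OBJECT.findall(formula)]
    if TODAY_CALL.search(formula):
        reasons.append("uses TODAY()")
    return reasons


print(nondeterministic_markers('CASE(MyUser__r.UserType__c, 1, "Gold", "Silver")'))
print(nondeterministic_markers("CloseDate - TODAY()"))
print(nondeterministic_markers("Amount * 0.1"))
```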
So we’ve just talked about adding a filter to your report. But what about the operators for those filters?
Well, there are 2 of them that keep your filter from being selective.
For NOT or != conditions, due to the underlying database implementation, the query optimizer can’t use the index to drive the query.
This also includes the ‘excludes’ operator.
All database query optimizers have the same issue with leading % wildcard searches.
If you need to do leading % wildcard searches, SOSL may be a better alternative. However, if you need real-time results, an alternative is to create a custom search page which restricts leading % wildcard searches and adds governance on the search string(s).
The query optimizer has set thresholds for using an index to drive the query. If your filter returns more records than the threshold, the optimizer knows that it would not be efficient to use that index.
For a custom index, the threshold is essentially 1/3 of a standard index.
After you’ve determined that your filter would meet the selectivity threshold, create an index for that field if one doesn’t already exist.
Although a custom index is created when you use unique or external id, we recommend working with support to create custom indexes.
We’ve spoken about how the query optimizer tries to drive a query with a selective index. However, there may be times where indexes aren’t enough.
After tuning with indexes is exhausted, skinny tables may help.
Now that you’ve learned how to build efficient reports, you’ll want to maintain that efficiency. One of the best ways to do this is to keep your data lean.
Speaker notes:
When the Force.com query optimizer judges returned records against its thresholds, all of the records that appear in the Recycle Bin or are marked for physical delete do still count against your total number of records. If you’re deleting a large number of records, work with customer support to physically delete the records.
Allergan is a multi-specialty health care company focused on discovering, developing and commercializing innovative pharmaceuticals, biologics, medical devices and over-the-counter consumer products that enable people to live life to its greatest potential — to see more clearly, move more freely, express themselves more fully.
Sales Rep perspective: Inline report for individual Physician is taking a long time to load
Sales Rep perspective: Viewing Sales Data is not meaningful: too many rows, little relevance. Can’t find the information needed.
Manager Perspective: Summary Dashboards keep spinning.
And they wait, and they wait, and they wait.
While they wait, let’s look at specifics of the dashboards, and let’s look at the storage.
“I am afraid that I rather give myself away when I explain,” said he. “Results without causes are much more impressive.”
The Stock-Broker’s Clerk, Sir Arthur Conan Doyle
Let’s take a look at the timing out dashboards.
Troubleshooting tool #1: Storage Usage
Overall Data Storage is getting very high.
There are only 6,000 accounts – how can 6,000 accounts eat up that much storage?
Perhaps Daily Sales Data is too granular?
Most impactful: data volumes. Only load absolutely necessary data; treat as operational data store.
Ways to reduce:
Reduce sampling frequency
Summarize (in ETL layer, batch jobs, off-platform jobs, etc)
Archive
A common problem when moving summary calculations into the Integration Layer is the lack of primary keys. A common solution is to create Compound Keys (logical keys) that identify a piece of information uniquely and are composed of multiple components.
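To illustrate the compound-key idea, here is a minimal Python sketch of a summarization step that an integration layer might run before loading. Everything in it is an assumption for illustration: the row layout (account, product, day, amount), the key format, and the helper names `compound_key` and `summarize_monthly` do not come from the customer's actual implementation.

```python
from collections import defaultdict
from datetime import date


def compound_key(account_id: str, product_code: str, day: date) -> str:
    """Join the components that together identify one monthly summary row."""
    return f"{account_id}|{product_code}|{day:%Y-%m}"


def summarize_monthly(daily_rows):
    """Roll daily (account, product, day, amount) rows up to one total per key."""
    totals = defaultdict(float)
    for account_id, product_code, day, amount in daily_rows:
        totals[compound_key(account_id, product_code, day)] += amount
    return dict(totals)


# Hypothetical daily sales rows.
daily = [
    ("ACC-1", "P-10", date(2013, 10, 1), 100.0),
    ("ACC-1", "P-10", date(2013, 10, 2), 50.0),
    ("ACC-1", "P-10", date(2013, 11, 1), 25.0),
]
print(summarize_monthly(daily))
```

Because the key is built from the same components every time, it can serve as a stable identifier for later updates to the same summary row.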
Troubleshooting tool #2: Workbench
Use to profile data: distribution, anomalies, information sparsity/density
Use to analyze total data set, including deleted records. Note that the number of records returned is much larger when including the deleted records for our use case.
Existing data load scripts were deleting/reloading full sets of data nightly. We helped the customer retire the full reloads, and just do updates/inserts. Note: upserts are easier to implement, but are a bit slower.
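The move from full reloads to updates and inserts can be pictured as a comparison step before the load. The Python sketch below is hypothetical: `plan_incremental_load` and the keyed dictionaries stand in for whatever the integration layer actually uses, and it only shows how rows might be split so that unchanged records are never touched.

```python
def plan_incremental_load(existing: dict, incoming: dict):
    """Split incoming rows into inserts and updates, skipping unchanged rows.

    Both arguments map an external key to the row's field values.
    """
    inserts, updates = {}, {}
    for key, row in incoming.items():
        if key not in existing:
            inserts[key] = row          # new record
        elif existing[key] != row:
            updates[key] = row          # changed record
    return inserts, updates


# Hypothetical current and incoming data sets.
existing = {"K1": {"amount": 100}, "K2": {"amount": 200}}
incoming = {"K1": {"amount": 100}, "K2": {"amount": 250}, "K3": {"amount": 75}}
inserts, updates = plan_incremental_load(existing, incoming)
print("insert:", inserts)
print("update:", updates)
```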
Challenge: large numbers of errors in data loads; long-running operations; timeouts.
Error logs showed concurrency locking
Caused by roll-up summary fields. Fixed by moving the calculations into the Integration Layer, and removing the roll-ups.
After the major rework of the data structure and integration, we turned to fine-tuning.
People
Defined who can create reports and views
Ensured there is a “platform owner”
Process
Ensured performance testing of reports and views in Full sandboxes before promotion to Prod
Implemented complex sharing model, limiting visibility to just records required
Demo
And they fly!