Following best practices can help ensure your success. This is especially true for Force.com applications or large Salesforce orgs that have the potential to push platform limits.
Salesforce allows you to easily scale up from small to large amounts of data. Mostly this is seamless, but as data sets get larger, the time required for certain operations may grow too. Join us to learn different ways of designing and configuring data structures and planning a deployment process to significantly reduce deployment times and achieve operational efficiency.
Watch this webinar to:
• Explore best practices for the design, implementation, and maintenance phases of your app’s lifecycle.
• Learn how seemingly unrelated components can affect one another and determine the ultimate scalability of your app.
• See live demos that illustrate innovative solutions to tough challenges, including the integration of an external data warehouse using Force.com Canvas.
• Walk away with practical tips for putting best practices into action.
Extreme Salesforce Data Volumes Webinar (with Speaker Notes)
1. == BUD ==
Hello everybody, and thanks for attending our webinar on Extreme
Salesforce Data Volumes. We’re delighted to see so much interest in
the architecture and data management practices that make customers
successful on the Force.com platform.
2. == BUD ==
This is our Safe Harbor statement. It’s here to remind you that you
should not purchase or recommend our products and services based on
any forward-looking statements. All purchasing decisions should be
based solely on the capabilities of the currently released version of the
service.
3. == BUD ==
For introductions, my name is Bud Vieira. I joined Salesforce as Product
Manager for Sharing four years ago, and recently moved into Technical
Enablement. I’ll be joined here today by Steve Bobrowski – Steve is a
major force in spreading knowledge of the Force.com platform to
development audiences. If you’ve ever attended a killer dev
presentation at Dreamforce, chances are good that he had a hand in it.
4. == BUD ==
I also want to remind you that there is a lively community of Force.com
developers out there, and you can join the conversation and find more
answers and strategies on these major networks.
5. == BUD ==
Before we get started, I’d like to tell you a little bit about the
team that Steve and I work with. Technical Enablement is a
group within salesforce.com’s Customer Centric Engineering
(CCE) team. We are a bunch of software engineers who think
of ourselves as “architect evangelists,” dedicated to helping
you better understand how things work inside our cloud
platform.
Our team is committed to providing our user community with
great content that will help you architect and maintain
awesome Salesforce cloud-based solutions. Whether you are
a CIO, CTO, architect, developer, or administrator, the content
we manage on the Architect Core Resource page is a valuable
resource that we want you to know about. Make sure to check
it out on Developer Force. It’s a dynamic page, updated at
least once a week, that’s your index to featured content for
architecture, including Articles, Papers, Blog posts and events.
You can also meet our team of Force.com Architect
evangelists by following us on Twitter.
6. == BUD ==
And speaking of help, feel free to ask us questions during today’s session. Use the
GoToWebinar Questions Pane to enter a question. Behind the scenes,
we have an expert team that reviews questions in the order they
are received and posts answers to everyone on the webinar.
Please ask your question once and give our team a few minutes to
respond. Later on, we’ll highlight some of these questions.
7. == BUD ==
We’re going to cover a lot of ground in today’s
presentation. You’ll learn about various areas of the
platform that might seem unrelated, but can in fact have
large effects on each other.
I recommend not trying to absorb every little detail during
the presentation. Instead, sit back and just try to become
aware of things you might not have known before, and
make notes on what to research afterward.
So what’s today’s presentation all about?
8. == BUD ==
Just to level set here, what are we calling a lot of data? We expect that many
of you have a preconceived notion of how much data a typical Sales Cloud or
Force.com app can handle. The truth is that as salesforce.com popularity has
skyrocketed, so too has the size of databases underlying custom and standard
app implementations on our cloud platforms. It might surprise you to learn that
our team works regularly with customers that have:
• large objects with tens or even hundreds of millions of data records
• millions of users
• thousands of roles and groups in their hierarchies
9. == BUD ==
What makes it possible for us to handle those loads for our largest
customers? The answer is: careful design and following best practices
throughout the lifecycle of a Force.com implementation.
So our task here today is to help you anticipate and plan for the key
architecture decisions you will be making over the life cycle of a large
enterprise Force.com implementation. To help organize the discussion
and make sure we don’t leave anything out, we’ll take each phase of
the implementation one at a time.
And we’ll cap it all off with an awesome demo showing you how to
combine data from an external system so your users have a seamless
experience no matter how big the data is or where it is stored.
Steve, can you start us off with the Design phase?
10. == STEVE ==
Thanks Bud.
I can’t stress enough how important it is to keep scalability front and
center from the very beginning of the project. Studies show – and I’m
sure you have all learned from painful experience – that key issues left
out of the design phase of a project are MUCH harder and more
expensive to correct in later phases.
11. == STEVE ==
Here are a few key areas to consider during this phase … scoping
requirements is clearly crucial, but we also need to anticipate how data
volumes will grow over time, and the effects that will have on critical
operations like searching and query performance.
13. == STEVE ==
Understanding requirements and then implementing solutions that meet
those requirements should be a design phase activity, no matter what
app you are building. In the context of large Salesforce implementations,
it’s very important to ask yourself a few key questions related to
requirements analysis.
With small implementations, we often implement list views and reports
and don’t stop to consider how many database records are scanned
and returned on their behalf. You won’t be able to get away with this
casual approach when working with large data sets because the sheer
volume of data has the potential to seriously detract from your
application’s usability. And it could very well hurt scalability in certain
cases, placing limits on the performance of all system components.
So it’s important to step back and consider how much operational data
is necessary for your app to be functional. Will you need to keep all of
your historical records in the production org, or can some be archived?
14. == STEVE ==
To illustrate the point, let’s take a look at an example default ALL
records list view. Notice how there’s no filter criteria here to minimize
the number of records that the view retrieves from the database.
For smaller orgs, the default ALL records view might be just fine. But
when you are working with a large object that has millions of records,
you should consider redesigning such default list views. You probably
want more filtered and targeted views that are more usable … AND
scalable.
15. == STEVE ==
Now let’s talk a little bit more about your database size, but from a
slightly different angle.
16. == STEVE ==
When you implement your app in production, on the “go live” date, the
database will have a certain volume. From that day on, there will be a
growth rate for data volume. Before then, you need to ask yourself, do
you clearly understand what that growth rate will be?
There are factors that add and subtract from your app’s data volume
growth rate, and you should consider them all so that you are prepared
for the future. Things that can increase volume are transactions and
data loads. These can be offset by archiving data that’s no longer
needed in the production app.
A couple of more subtle but important things to consider. First, always
ask yourself if everything you are recording in Salesforce meets or
exceeds your requirements. If your Salesforce data set exceeds
requirements, then trim off the fat to keep your database as lean as
possible.
17. == STEVE ==
Calculating your database’s growth rate might seem straightforward.
Focusing on the largest objects in your system, you might calculate for
each month how many records you plan to add by transaction
processing and data loads, and then subtract the records you plan to
archive. It seems so simple that you might just do a cursory calculation
and move on. But unless you think things through, you can get yourself
into hot water, especially with extreme implementations. Here’s an
example of what I mean …
Take a close look at this graph above. Notice that the growth rate isn’t a
straight line. Why not? Well, the growth rate might not be constant
throughout a given year due to business cycles. At the end of every quarter or
every year, the growth rate might be larger than what happens during a
normal month. If you are in retail, your growth rate might go off the
charts in November and December due to holiday sales.
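To put purely hypothetical numbers on that: if transactions add 500,000 records a month, data loads contribute another 200,000, and you archive 100,000, your net growth is 500,000 + 200,000 - 100,000 = 600,000 records per month, or roughly 7 million a year, and that’s before any quarter-end or holiday spikes.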
18. == STEVE ==
So what do you do with all this information about data volumes and
growth rate? If your database is going to be extraordinarily large, then
your key goal is to figure out ways to minimize data storage. Revisit your
requirements and make sure that you are not exceeding the amount of
data necessary to meet them.
19. == STEVE ==
Now let’s turn our attention to another key area in the design phase,
query building.
20. == STEVE ==
The exercises that you want to perform in this context include some
research, and perhaps some brushing up on what makes for efficient
query execution.
21. == STEVE ==
To begin, it’s a good idea to document your schema’s available
indexes. Remember, Force.com automatically indexes various fields in
every object.
22. == STEVE ==
To help you out, we’ve built a cheat sheet that reminds you what fields have
database indexes. You can get this unofficial Force.com cheat sheet from the
Architect Core Resources page.
Once you understand what indexes are available, make sure you know how to
design queries that use them and avoid performance-sapping full scans. The
cheat sheet also points out some basic selectivity rules to keep in mind when
designing queries for list views, reports, and queries within API calls and Apex
code.
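As a quick illustration of those selectivity rules, compare these two hedged sketches (Status__c is a made-up unindexed custom field):

    -- Likely to use an index: an indexed field compared
    -- against a selective value range
    SELECT Id, Name FROM Account
    WHERE CreatedDate = LAST_N_DAYS:7

    -- Likely to force a full scan: leading wildcards and
    -- negative operators defeat index use
    SELECT Id, Name FROM Account WHERE Name LIKE '%corp%'
    SELECT Id, Name FROM Account WHERE Status__c != 'Closed'

On a small object both styles feel identical; on tens of millions of rows, the difference is dramatic.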
23. == STEVE ==
To drive that point home, let’s profile a SOQL query using the Developer
Console and see how a custom index can affect the performance of that
query.
-- DEMO--
1. Start on Opportunity object edit form.
2. Edit Order Number custom field, point out that it already has a
custom index because of “external id”.
3. Open Developer Console | Execute.
4. Review SOQL query that requests specific Opportunity by Order
Number.
5. Execute 1, review Perf *WITH* index in place. Should be 8-30ms.
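If you’d like to reproduce the demo afterward, the query we profile is essentially this (the field name and order number value are illustrative):

    SELECT Id, Name, Order_Number__c
    FROM Opportunity
    WHERE Order_Number__c = 'ON-000123'

Because the external ID gives Order_Number__c a custom index, this lookup returns in milliseconds; without the index, the same query would have to scan every Opportunity row.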
24. == STEVE ==
Next up, let’s discuss design phase considerations with respect to the
platform’s full-text search engine.
One of the great things about Force.com is that it includes an integrated
full-text search engine for your apps. It automatically indexes most text
fields so that your users can build cross-object searches and quickly find
records that contain strings of interest. But when data volumes get
extreme, there are some design best practices to be aware of.
25. == STEVE ==
Similar to query design, you’ll want to get to know what fields are
indexed, learn how to take advantage of those indexes, and then design
search functionality so that it is as efficient as possible. Let’s take a look
at each item here.
26. == STEVE ==
Force.com automatically indexes “text”-oriented fields in most objects.
Again, it’s a good idea to document exactly which fields have search
indexes.
27. == STEVE ==
To help you out, that same cheat sheet covers search
indexes as well. Although it varies from object to object,
you can bet that your search indexes will include Name,
phone, text, and picklist fields. Make sure to read the
documentation for a complete list of fields in standard
objects that have search indexes.
SOSL queries are quite different from SOQL queries. So
make sure to brush up on ways that you can educate
users to search efficiently.
In the best-case scenario, users will understand how to
search efficiently. However, you should also have a plan
ready for implementing limitations on search, to protect
performance in case they don’t.
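For example, here are two hedged SOSL sketches showing the kind of guidance you might give users and developers (the search term is illustrative, and the “--” lines are annotations, not SOSL syntax):

    -- Broad and expensive: searches all indexed fields
    -- across several objects
    FIND {Acme*} IN ALL FIELDS
    RETURNING Account, Contact, Opportunity

    -- Narrower and cheaper: scope the search to name fields
    -- and cap the number of results per object
    FIND {Acme*} IN NAME FIELDS
    RETURNING Account(Id, Name LIMIT 50)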
28. == STEVE ==
It’s also useful to understand some subtle points about the multitenant
characteristics of Force.com’s search engine. As this illustration shows,
notice that a query scans an entire index for search string matches and
reduces the return set afterward by considering record sharing rules.
Why does this matter? Well, the more complex your record sharing
model is, the longer it will take to perform this reduction step. So here
we have a case of two seemingly independent features of Salesforce
that affect one another. The key takeaway: keep your sharing model as
simple as possible to facilitate faster searching.
Another thing to understand is that Search indexing can lag behind
heavy transaction processing loads and bulk data loads. Considering
this, it’s important to complete such loads at off-peak times so that
users do not unexpectedly get Search results that don’t include the
latest data.
29. == STEVE ==
So that was a lot of information for the design phase. Let’s do a quick
wrap up on the key takeaways here.
…
Bud, why don’t you take us through the data loading phase.
30. == BUD ==
Thanks Steve …
Now that you have carefully determined your data requirements and
growth, and understand what indexes are available to help speed up
queries and searches, it’s time for the next big challenge for your
implementation – loading tens of millions of records.
31. == BUD ==
Loading large volumes of data into an enterprise scale application can
present unique challenges, and requires careful preparation.
In this section we’re going to discuss a few important things you can do
to prepare for a smoother load, and to increase your throughput getting
records into the platform.
32. == BUD ==
In a small org, you can often get adequate performance by simply
inserting records without much pre-processing, and letting validation
rules and triggers take care of the object relationships and occasional
errors. But this strategy simply won’t scale when you are loading
millions of records on a tight timeframe. The overhead required by using
these features to clean up data issues in “real time” is just too large.
So one of the best ways to speed up a large data load is to take extra
care that the data is clean to begin with, including relationships between
parent and child objects.
33. == BUD ==
When you know your data is clean, you can safely disable these
processes that you would normally have in place to protect against data
entry errors, whether they come from batch loads or from users during daily operations. All
of these operations can add substantial time to inserts – complex
triggers in particular. They’re one of the first things we investigate when
we are asked to diagnose a slow load or integration.
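One common way to do this, sketched here as a minimal example (the custom setting and field names are hypothetical), is a “bypass” switch that your load process flips before inserting, so triggers become no-ops for data you’ve already verified as clean:

    trigger AccountTrigger on Account (before insert, before update) {
        // Load_Settings__c is a hypothetical hierarchy custom setting
        // with a Bypass_Triggers__c checkbox; flip it on before a
        // verified-clean bulk load and off again afterward.
        Load_Settings__c cfg = Load_Settings__c.getInstance();
        if (cfg != null && cfg.Bypass_Triggers__c == true) {
            return; // skip expensive logic during the load
        }
        // ... normal validation and enrichment logic ...
    }

Because it’s a custom setting, an administrator can toggle the bypass from Setup without deploying any code.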
34. == BUD ==
When you load data, the Force.com Bulk API is the way to go, especially
when you need to load lots of data quickly into Force.com. You can also
use it to remove large sets of data from your org, very quickly and
efficiently.
To learn all about the Bulk API, head on over to Developer Force in the
Integration section. Here you’ll find all the information you need to get
up to speed on the Bulk API with quick start tutorials, articles, and
helpful links.
35. == BUD ==
When you’re brushing up on the Bulk API, make sure you follow the link
to the Documentation page that explains the Force.com Bulk API limits.
With extreme Salesforce implementations, bulk loads can often push
these limits, so it’s very important to plan ahead and figure out how you
are going to configure and time your data loads so that you are
successful.
We STRONGLY recommend that you test large bulk loads in Sandbox
before you load in production. Not only will it help you shake out any
problems in your data, loading plan and process, but it will also give you
at least a ballpark estimate of how long your load will take. And that
estimate is crucial for establishing a maintenance window with your
business customers.
36. == BUD ==
To make your data loading experience as simple as possible, don’t
“reinvent the wheel.” Lots of others have crossed the same bridge that
you are about to. So do yourself a favor: Visit the AppExchange and
search for “data load” to get a list of utilities that can greatly improve
and ease your ETL processes. Many of these utilities are free and
leverage the Bulk API. In fact, Steve’s going to show us an example of
one such utility later on, right Steve?
== STEVE ==
Right Bud, we’re going to look at a utility called Jitterbit Data Loader
that can handle the kind of requirements you just mentioned.
37. == BUD ==
Another thing you need to anticipate when loading data is the time that
will be spent giving users access to the new data.
When you insert lots of data into an org that’s already configured, the
sharing calculations can take a considerable amount of time. In many
cases, you can reduce load times by deferring these sharing
calculations until after the load is complete. If your org has this feature
enabled, this is easy to do.
It’s important to note that after you have deferred sharing and loaded
your data, you need to resume normal sharing calculations and
recalculate sharing rules. Again, it’s a good idea to test the process in
Sandbox, to estimate the kind of maintenance window you are going to
need in production.
38. == BUD ==
Here’s a quick recap on Data Loading best practices:
- Make sure the data is clean
- Turn off operations that slow down inserts
- Find a loading utility that fits your needs
- Learn and use the Bulk API
- Defer Sharing calculations if your org is already configured
39. == BUD ==
The sharing configuration that you design for your Force.com application
doesn’t just affect data loading – it has a significant impact on
performance throughout the lifecycle of your implementation. In this
section, we will call out some best practices for keeping your data
secure AND achieving good performance with very large volumes of
data.
40. == BUD ==
So how do you design an efficient sharing model? Once you go to a
private record sharing model as an org-wide default for an object, the
internal tables in Salesforce can really grow in record count. But by
sticking to a few key rules, you can achieve success in extreme
Salesforce implementations.
41. == BUD ==
The first and most effective thing you can do to streamline your sharing
configuration is consider very carefully during the design phase which
objects truly need to be protected. For every object that has a Public
Read Only or Private sharing model, Force.com maintains a table that
records which groups and users have access to that object’s records.
These tables can become very large, sometimes even larger than the
object tables themselves. And since the platform performs access
checks for just about anything a user does, sharing can be a big
component of search, reporting, list views, and other common
operations.
42. == BUD ==
Streamlining your group nesting and your role hierarchy can also be key
to maintaining performance. When a user requests access to an Account record, for
example, Force.com needs to do more than check the sharing table for
that object to see if the user has been granted access directly. It also
needs to check whether the user has indirect access through
membership in a group. This can be a challenge when the user could
belong to any one of thousands of roles, or a group 8 levels deep in a
stack of public groups. So the leaner you can make these hierarchies,
the faster those access decisions can be made.
43. == BUD ==
Another key thing to plan for is skewed data distributions that can affect
record sharing recalculations. When a single user owns more than 10K
rows, or a single parent record has more than 10K child rows, sharing
recalculations can take a long time to complete. So it’s important to
have a plan ready to implement that distributes the ownership and
parenting of records in the schema as your data volumes grow.
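One way to spot ownership skew before it becomes a problem is a quick aggregate SOQL check like this sketch (the 10K threshold mirrors the guidance above; swap in whichever object concerns you):

    SELECT OwnerId, COUNT(Id) recordCount
    FROM Account
    GROUP BY OwnerId
    HAVING COUNT(Id) > 10000

Any owner that shows up here is a candidate for redistributing records before the next sharing recalculation.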
44. == BUD ==
In addition to increasing the time required to calculate sharing changes,
very large data volumes can also increase the risk of encountering a
lock when adjusting group hierarchies and membership, or when
updating parent records.
These locks are very brief, but as data volumes and processing times
grow, the chance of encountering a lock also grows. If you want to read
more about this, there are several papers available on our Core
Resources page.
45. == BUD ==
By understanding the consequences of decisions you make in designing
your sharing configuration, you can protect your data and provide good
performance for your users. Key considerations include:
• avoiding over-protection of your data
• keeping the role hierarchy lean
• preventing data skews on ownership and parent / child relationships
For more detail, check out the paper Designing Record Access for
Enterprise Scale, linked from the Architect Core Resources page.
With that, Steve, can you take us through the next and final phase of
best practices?
46. == STEVE ==
Thanks Bud.
So now you’ve implemented your system and you have to maintain it
going forward. You may have done everything you can to build an
efficient system, but you may still need more options to make things
perform and scale even better. In this section of the presentation, we’ll
learn about some additional tools you have at your disposal.
47. == STEVE ==
In this part of the presentation, we are going to focus on a few important
options to consider: custom indexing, skinny tables, data partitioning,
and application partitioning.
48. == STEVE ==
Back in the design phase, you identified all the fields that automatically
have indexes, and you built queries, views, and reports that leveraged
those indexes. After your data set grows, you might determine that it
makes sense to index other fields in your database to further improve
performance and scale. Let’s take a look at what options you have to
add additional indexes to your schema.
49. == STEVE ==
Just a quick reminder here that you can create your own indexes by
marking custom fields in your schema as “Unique” or as “External IDs”.
We saw an example of this earlier when I showed you the custom index
on the Order Number field of the Opportunity object.
50. == STEVE ==
Now if you want to create other indexes that can improve the
performance of queries and reports, but can’t do it using the Setup
configuration interface, create a request with Salesforce Support. We
support the creation of custom one- and two-field indexes, once you
justify how they can help improve the performance of your app and
reduce the load on our platform.
51. == STEVE ==
Now let’s take a look at a unique feature of Salesforce that we use to
solve certain long-running query issues, a feature called skinny tables.
52. == STEVE ==
Let’s say you’ve exhausted all of the other tools at your disposal to
tune the performance of a long-running report. You’ve got all the right
indexes in place. You’ve kept your database lean. And so forth …
In such extreme cases, you can work with our Support team to
determine if you can improve the performance of long-running report
queries using what’s known as a “skinny table”. As the name
indicates, a skinny table contains a subset of the columns in a large
object, thus reducing the amount of data that a report query needs to
scan at runtime. This can translate to substantial gains in performance.
Skinny tables are automatically kept in sync by Force.com once they
are in place, and do not contribute records to the recycle bin. Even
better, Force.com’s optimizer automatically takes advantage of skinny
tables when it determines doing so would help improve the performance
of a query. That means that you don’t have to modify any queries or
reports to benefit from them.
53. == STEVE ==
Data partitioning is a proven technique that database systems provide to
physically divide large logical data structures into smaller, more
manageable pieces. Partitioning can also help to improve the
performance and scalability of a large database schema. For example, a
query that targets data in a specific table partition does not need to
consider other partitions of the table or related indexes. This common
optimization is sometimes referred to as “partition pruning.”
54. == STEVE ==
If you have a large database supporting your app and you ascertain that
by partitioning your data, you can reduce query response times, open a
request with Salesforce support to enable “divisions” for your org. Then,
use the Setup configuration UI to implement divisions in your org.
Before implementing divisions in production, it’s important to thoroughly
test your plan in Sandbox. Using divisions to partition data might
improve the performance of some operations, but might detract from the
performance of others. Effective usage of divisions requires that you
understand your usage patterns, the criteria used to create the divisions,
the optimal number of divisions to create, the data distribution between
the divisions, and your data growth pattern.
55. == STEVE ==
We just spoke about data partitioning. Now let’s discuss application
partitioning. What do I mean, exactly, by application partitioning?
Every individual application platform has practical limits, and that’s
when distributed processing makes sense. By partitioning your app
across different systems and then integrating the underlying systems
together, you can scale your app to new heights.
Just as with other platforms, individual components within the Salesforce
Platform each have practical limits, which shouldn’t surprise you. But
the beauty of the platform is that it’s very easy to integrate different
components, share state, authentication, and access controls, and thus
support tremendous amounts of data for even the most demanding big
data app.
56. == STEVE ==
For example, consider this scenario. You’ve determined that you can
archive historical data from your operational Salesforce org to keep it
lean, but you want to preserve this data for compliance and analytics by
loading it into a data warehouse. You also want to make your users
happy by not requiring them to leave your Force.com app to do their
reporting. Fortunately, this is relatively painless because we have a cool
feature called Force.com Canvas. Force.com Canvas lets you embed
the UI of another app right within the Salesforce UI.
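Once the Canvas app itself is configured, the embedding side can be as small as this Visualforce sketch (the developer name is hypothetical):

    <apex:page>
        <!-- Renders the external data warehouse UI inside Salesforce -->
        <apex:canvasApp developerName="Warehouse_Reports"
                        width="100%" height="600px"/>
    </apex:page>

Canvas handles the signed request and authentication handshake, so the external app knows which Salesforce user is viewing it.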
57. == STEVE ==
The scenario we just covered gives us a great way to illustrate, by way
of a live demo, several of the best practices we’ve discussed today.
58. == STEVE ==
Before I jump into the live demo, let’s review a list of various best
practices that the demo illustrates so that you know what to look for.
See list.
Demo.
1. Finished product.
2. Heroku app, alone.
3. Canvas Preview and Quick Start.
4. Visualforce page.
5. Jitterbit.
Review best practices.
59. == STEVE ==
With that, I’d like to wrap up today’s session by reminding you to visit
the Architect Core Resources page on Developer Force for more best
practices.
61. == STEVE ==
And before we get to Q&A, I’d like to invite you to provide us with
feedback on this session. The feedback we get from you is very
important and helps us shape future webinars.
Look in your GoToWebinar chat window for the hyperlink to the survey.
Click on it, and fill it out. It only takes a few seconds and we’d really
appreciate your input.
62. == STEVE ==
Alright, let’s cover some of the great questions that we received today
during the webinar.
Bud, why don’t you start us off.