Extreme Salesforce Data Volumes Webinar (with Speaker Notes)



Following best practices can help ensure your success. This is especially true for Force.com applications or large Salesforce orgs that have the potential to push platform limits.

Salesforce allows you to easily scale up from small to large amounts of data. Mostly this is seamless, but as data sets get larger, the time required for certain operations may grow too. Join us to learn different ways of designing and configuring data structures and planning a deployment process to significantly reduce deployment times and achieve operational efficiency.

Watch this webinar to:

Explore best practices for the design, implementation, and maintenance phases of your app's lifecycle.
Learn how seemingly unrelated components can affect one another and determine the ultimate scalability of your app.
See live demos that illustrate innovative solutions to tough challenges, including the integration of an external data warehouse using Force.com Canvas.
Walk away with practical tips for putting best practices into action.


  1. == BUD == Hello everybody, and thanks for attending our webinar on Extreme Salesforce Data Volumes. We’re delighted to see so much interest in the architecture and data management practices that make customers successful on the Force.com platform.
  2. == BUD == This is our Safe Harbor statement. It’s here to remind you that you should not purchase or recommend our products and services based on any forward-looking statements. All purchasing decisions should be based solely on the capabilities of the currently released version of the service.
  3. == BUD == For introductions, my name is Bud Vieira. I joined Salesforce as Product Manager for Sharing four years ago, and recently moved into Technical Enablement. I’ll be joined here today by Steve Bobrowski. Steve is a major force in spreading knowledge of the Force.com platform to development audiences. If you’ve ever attended a killer dev presentation at Dreamforce, chances are good that he had a hand in it.
  4. == BUD == I also want to remind you that there is a lively community of Force.com developers out there, and you can join the conversation and find more answers and strategies on these major networks.
  5. == BUD == Before we get started, I’d like to tell you a little bit about the team that Steve and I work with. Technical Enablement is a group within salesforce.com’s Customer Centric Engineering (CCE) team. We are a bunch of software engineers who think of ourselves as “architect evangelists,” dedicated to helping you better understand how things work inside our cloud platform. Our team is committed to providing our user community with great content that will help you architect and maintain awesome Salesforce cloud-based solutions. Whether you are a CIO, CTO, architect, developer, or administrator, the content we manage on the Architect Core Resources page is a valuable resource that we want you to know about. Make sure to check it out on Developer Force. It’s a dynamic page, updated at least once a week, that serves as your index to featured content for architecture, including articles, papers, blog posts, and events. You can also meet our team of Force.com architect evangelists by following us on Twitter.
  6. == BUD == And speaking of help, feel free to ask us questions during today’s session. Use the GoToWebinar Questions pane to enter a question. Behind the scenes, we have an expert team that reviews each question in the order it is received and posts answers to everyone on the webinar. Please ask your question once and give our team a few minutes to respond. Later on, we’ll highlight some of these questions.
  7. == BUD == We’re going to cover a lot of ground in today’s presentation. You’ll learn about various areas of the platform that might seem unrelated, but can in fact have large effects on each other. I recommend not trying to absorb every little detail during the presentation. Instead, sit back and just try to become aware of things you might not have known before, and make notes on what to research afterward. So what’s today’s presentation all about?
  8. == BUD == Just to level set here, what are we calling a lot of data? We expect that many of you have a preconceived notion of how much data a typical Sales Cloud or Force.com app can handle. The truth is that as salesforce.com’s popularity has skyrocketed, so too has the size of the databases underlying custom and standard app implementations on our cloud platforms. It might surprise you to learn that our team works regularly with customers that have:
     •  large objects with tens or even hundreds of millions of records
     •  millions of users
     •  thousands of roles and groups in their hierarchies
  9. == BUD == What makes it possible for us to handle those loads for our largest customers? The answer is careful design and following best practices throughout the lifecycle of a Force.com implementation. So our task here today is to help you anticipate and plan for the key architecture decisions you will be making over the lifecycle of a large enterprise Force.com implementation. To help organize the discussion and make sure we don’t leave anything out, we’ll take each phase of the implementation one at a time. And we’ll cap it all off with an awesome demo showing you how to combine data from an external system so your users have a seamless experience no matter how big the data is or where it is stored. Steve, can you start us off with the Design phase?
  10. == STEVE == Thanks, Bud. I can’t stress enough how important it is to keep scalability front and center from the very beginning of the project. Studies show, and I’m sure you have all learned from painful experience, that key issues left out of the design phase of a project are MUCH harder and more expensive to correct in later phases.
  11. == STEVE == Here are a few key areas to consider during this phase. Scoping requirements is clearly crucial, but we also need to anticipate how data volumes will grow over time, and the effects that growth will have on critical operations like searching and query performance.
  12. == STEVE == So let’s take them one at a time …
  13. == STEVE == Understanding requirements and then implementing solutions that meet those requirements should be a design phase activity, no matter what app you are building. In the context of large Salesforce implementations, it’s very important to ask yourself a few key questions related to requirements analysis. With small implementations, we often implement list views and reports and don’t stop to consider how many database records are scanned and returned on their behalf. You won’t be able to get away with this casual approach when working with large data sets, because the sheer volume of data has the potential to seriously detract from your application’s usability. And in certain cases it could very well hurt scalability, placing limits on the performance of all system components. So it’s important to step back and consider how much operational data is necessary for your app to be functional. Will you need to keep all of
  14. == STEVE == To illustrate the point, let’s take a look at an example default ALL records list view. Notice that there are no filter criteria here to limit the number of records that the view retrieves from the database. For smaller orgs, the default ALL records view might be just fine. But when you are working with a large object that has millions of records, you should consider redesigning such default list views. You probably want more filtered and targeted views that are more usable … AND scalable.
  15. == STEVE == Now let’s talk a little bit more about your database size, but from a slightly different angle.
  16. == STEVE == When you implement your app in production, on the “go live” date, the database will have a certain volume. From that day on, data volume will grow at some rate. Before then, you need to ask yourself: do you clearly understand what that growth rate will be? Some factors add to your app’s data volume growth rate and others subtract from it, and you should consider them all so that you are prepared for the future. Transactions and data loads increase volume. Their effect can be offset by archiving data that’s no longer needed in the production app. A couple of subtler, but important, things to consider. First, always ask yourself whether everything you are recording in Salesforce merely meets your requirements or exceeds them. If your Salesforce data set exceeds requirements, then trim off the fat to keep your database as lean as possible.
  17. == STEVE == Calculating your database’s growth rate might seem straightforward. Focusing on the largest objects in your system, you might calculate how many records you plan to add each month through transaction processing and data loads, and then subtract the records you plan to archive. It seems so simple that you might just do a cursory calculation and move on. But unless you think things through, you can get yourself into hot water, especially with extreme implementations. Here’s an example of what I mean … Take a close look at the graph on this slide. Notice that the growth rate isn’t a straight line. Why not? Well, the growth rate might not be constant throughout a given year because of business cycles. At the end of every quarter or every year, growth might be larger than in a normal month. If you are in retail, your growth rate might go off the charts in November and December due to holiday sales.
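The month-by-month arithmetic described on this slide can be sketched in a few lines. This is a minimal Python illustration, not a Salesforce tool. The `MonthPlan` fields and all figures are hypothetical, chosen to produce a November/December spike like the one in the graph.

```python
from dataclasses import dataclass


@dataclass
class MonthPlan:
    """Hypothetical per-month plan for one large object."""
    month: str
    transactions: int  # records created by day-to-day transaction processing
    data_loads: int    # records added by bulk or integration loads
    archived: int      # records moved out of the production org


def project_volume(start_count: int, plan: list[MonthPlan]) -> list[tuple[str, int]]:
    """Return (month, projected record count at month end) for each planned month."""
    total = start_count
    projection = []
    for m in plan:
        # Net growth for the month: additions minus archiving.
        total += m.transactions + m.data_loads - m.archived
        projection.append((m.month, total))
    return projection


if __name__ == "__main__":
    # Hypothetical retail plan with a holiday spike in transaction volume.
    plan = [
        MonthPlan("Oct", 200_000, 50_000, 100_000),
        MonthPlan("Nov", 600_000, 50_000, 100_000),
        MonthPlan("Dec", 900_000, 50_000, 100_000),
    ]
    for month, count in project_volume(10_000_000, plan):
        print(month, f"{count:,}")
```

Running it prints month-end counts of 10,150,000, 10,700,000, and 11,550,000. December alone adds 850,000 records, but a flat monthly average of these three months would plan for only about 517,000, well short of the holiday jump.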
  18. == STEVE == So what do you do with all this information about data volumes and growth rate? If your database is going to be extraordinarily large, then your key goal is to figure out ways to minimize data storage. Revisit your requirements and make sure that you are not exceeding the amount of data necessary to meet them.
  19. == STEVE == Now let’s turn our attention to another key area in the design phase: query building.
  20. == STEVE == The exercises that you want to perform in this context include some research, and perhaps some brushing up on what makes for efficient query execution.
  21. == STEVE == To begin, it’s a good idea to document your schema’s available indexes. Remember, Force.com automatically indexes various fields in every object.
  22. == STEVE == To help you out, we’ve built a cheat sheet that reminds you which fields have database indexes. You can get this unofficial Force.com cheat sheet from the Architect Core Resources page. Once you understand what indexes are available, make sure you know how to design queries that use them and avoid performance-sapping full scans. The cheat sheet also points out some basic selectivity rules to keep in mind when designing queries for list views, reports, and queries within API calls and Apex code.
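To make the idea of selectivity concrete, here is a rough Python check. It assumes the thresholds Salesforce has published for its query optimizer. On a standard index, a filter counts as selective if it matches fewer than 30% of the first million records plus 15% of the rest, capped at 1,000,000 rows. On a custom index, the figures are 10% and 5%, capped at 333,333 rows. These numbers have changed across releases, so treat them as assumptions and confirm them against the cheat sheet or current documentation.

```python
ONE_MILLION = 1_000_000


def selectivity_threshold(total_records: int, custom_index: bool) -> int:
    """Maximum matching rows for an indexed filter to be considered selective.

    Percentages and caps are ASSUMED values; verify them for your release.
    """
    if custom_index:
        first_pct, rest_pct, cap = 10, 5, 333_333
    else:
        first_pct, rest_pct, cap = 30, 15, ONE_MILLION
    first = min(total_records, ONE_MILLION)     # records up to the first million
    rest = max(total_records - ONE_MILLION, 0)  # records beyond the first million
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    return min(first * first_pct // 100 + rest * rest_pct // 100, cap)


def is_selective(matching_records: int, total_records: int, custom_index: bool) -> bool:
    """True if the filter matches fewer rows than the selectivity threshold."""
    return matching_records < selectivity_threshold(total_records, custom_index)


if __name__ == "__main__":
    # A custom-indexed Order Number lookup against 20M Opportunities matches 1 row.
    print(is_selective(1, 20_000_000, custom_index=True))           # True
    # A filter matching 2M of 20M rows is too broad even for a standard index.
    print(is_selective(2_000_000, 20_000_000, custom_index=False))  # False
```

The caps explain why a filter that looks narrow in percentage terms can still fall back to a full scan on a very large object.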
  23. == STEVE == To drive that point home, let’s profile a SOQL query using the Developer Console and see how a custom index can affect the performance of that query.
     -- DEMO --
     1.  Start on the Opportunity object edit form.
     2.  Edit the Order Number custom field and point out that it already has a custom index because it is marked as an “External ID”.
     3.  Open Developer Console | Execute.
     4.  Review the SOQL query that requests a specific Opportunity by Order Number.
     5.  Execute 1, review perf *WITH* the index in place. Should be 8-30ms.
  24. == STEVE == Next up, let’s discuss design phase considerations with respect to the platform’s full-text search engine. One of the great things about Force.com is that it includes an integrated full-text search engine for your apps. It automatically indexes most text fields so that your users can build cross-object searches and quickly find records that contain strings of interest. But when data volumes get extreme, there are some design best practices to be aware of.
  25. == STEVE == As with query design, you’ll want to get to know which fields are indexed, learn how to take advantage of those indexes, and then design search functionality so that it is as efficient as possible. Let’s take a look at each item here.
  26. == STEVE == Force.com automatically indexes “text” oriented fields in most objects. Again, it’s a good idea to document exactly which fields have search indexes.
  27. == STEVE == To help you out, that same cheat sheet covers search indexes as well. Although it varies from object to object, you can bet that your search indexes will include Name, Phone, Text, and Picklist fields. Make sure to read the documentation for a complete list of search-indexed fields in standard objects. SOSL queries are quite different from SOQL queries, so make sure to brush up on ways that you can educate users for efficient searching. In the best case scenario, users will understand how to … and then search efficiently. However, you should have a plan ready for implementing limitations on search, to
  28. == STEVE == It’s also useful to understand some subtle points about the multitenant characteristics of Force.com’s search engine. As this illustration shows, a query scans an entire index for search string matches and only afterward reduces the return set by applying record sharing rules. Why does this matter? Well, the more complex your record sharing model is, the longer this reduction step takes. So here we have a case of two seemingly independent features of Salesforce that affect one another. The key takeaway: keep your sharing model as simple as possible to facilitate faster searching. Another thing to understand is that search indexing can lag behind heavy transaction processing loads and bulk data loads. For this reason, it’s important to complete such loads at off-peak times so that users do not unexpectedly get search results that don’t include the latest data.
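The order of operations on this slide (match first, apply sharing afterward) can be illustrated with a toy model. This Python sketch does not reflect how the Force.com search engine is built. The in-memory index, record IDs, and `can_access` callback are hypothetical. What the model does show is that the sharing check runs once per raw match, so its total cost grows with both the number of matches and the complexity of each access check.

```python
from typing import Callable


def search_then_filter(
    index: dict[str, list[str]],
    term: str,
    can_access: Callable[[str], bool],
) -> tuple[list[str], int]:
    """Toy model of search: match first, then filter by record access.

    Returns (visible record IDs, number of access checks performed).
    """
    # Step 1: collect every match from the index, ignoring sharing entirely.
    matches = index.get(term.lower(), [])
    # Step 2: reduce the result set with one access check per raw match.
    visible = [record_id for record_id in matches if can_access(record_id)]
    return visible, len(matches)


if __name__ == "__main__":
    index = {"acme": ["001A", "001B", "001C"]}
    shared_with_user = {"001B"}
    print(search_then_filter(index, "Acme", shared_with_user.__contains__))
    # (['001B'], 3): three access checks to return one visible record
```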
  29. == STEVE == So that was a lot of information for the design phase. Let’s do a quick wrap-up on the key takeaways here. … Bud, why don’t you take us through the data loading phase.
  30. == BUD == Thanks, Steve … Now that you have carefully determined your data requirements and growth, and understand what indexes are available to help speed up queries and searches, it’s time for the next big challenge for your implementation: loading tens of millions of records.
  31. == BUD == Loading large volumes of data into an enterprise-scale application can present unique challenges, and requires careful preparation. In this section we’re going to discuss a few important things you can do to prepare for a smoother load, and to increase your throughput getting records into the platform.
  32. == BUD == In a small org, you can often get adequate performance by simply inserting records without much pre-processing, and letting validation rules and triggers take care of the object relationships and occasional errors. But this strategy simply won’t scale when you are loading millions of records on a tight timeframe. The overhead of using these features to clean up data issues in “real time” is just too large. So one of the best ways to speed up a large data load is to take extra care that the data is clean to begin with, including relationships between parent and child objects.
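One way to check parent/child relationships before a load, instead of relying on triggers and validation rules during it, is sketched below in Python. The field names `external_id` and `parent_external_id` are hypothetical stand-ins for whatever external ID and lookup columns your load files use.

```python
from collections import Counter


def preload_problems(
    parents: list[dict],
    children: list[dict],
    parent_key: str = "external_id",        # hypothetical column name
    child_ref: str = "parent_external_id",  # hypothetical column name
) -> dict[str, list[str]]:
    """Report duplicate parent keys and child rows pointing at missing parents."""
    counts = Counter(p[parent_key] for p in parents)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    # A child is orphaned if its parent key is absent from the parent file.
    orphans = sorted({c[child_ref] for c in children if c[child_ref] not in counts})
    return {"duplicate_parent_keys": duplicates, "orphan_child_refs": orphans}


if __name__ == "__main__":
    parents = [{"external_id": "A"}, {"external_id": "B"}, {"external_id": "B"}]
    children = [{"parent_external_id": "A"}, {"parent_external_id": "C"}]
    print(preload_problems(parents, children))
    # {'duplicate_parent_keys': ['B'], 'orphan_child_refs': ['C']}
```

Fixing these problems in the source files first is what makes it safe to switch off the protective triggers discussed next.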
  33. == BUD == When you know your data is clean, you can safely disable the processes you would normally have in place to protect against data entry errors, whether in batch loads or made by users during daily operations. All of these operations can add substantial time to inserts, complex triggers in particular. They’re one of the first things we investigate when we are asked to diagnose a slow load or integration.
  34. == BUD == When you load data, the Force.com Bulk API is the way to go, especially when you need to load lots of data quickly into Force.com. You can also use it to remove large sets of data from your org very quickly and efficiently. To learn all about the Bulk API, head on over to the Integration section of Developer Force. There you’ll find all the information you need to get up to speed on the Bulk API, including quick start tutorials, articles, and helpful links.
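With the original Bulk API, a client submits records to a job as a series of batches. The Python sketch below covers only the client-side chunking step. Its 10,000-record default reflects our understanding of the per-batch limit documented for that version of the API, so check it against the current limits page. Job creation and upload are omitted because they depend on the client library or loader you choose.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batches(records: Sequence[T], batch_size: int = 10_000) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size records.

    The default of 10,000 is an ASSUMED per-batch record limit; confirm it
    against the Bulk API limits documentation for your API version.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


if __name__ == "__main__":
    rows = list(range(25_000))  # stand-in for 25,000 prepared records
    print([len(b) for b in batches(rows)])  # [10000, 10000, 5000]
```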
  35. == BUD == When you’re brushing up on the Bulk API, make sure you follow the link to the documentation page that explains the Force.com Bulk API limits. With extreme Salesforce implementations, bulk loads can often push these limits, so it’s very important to plan ahead and figure out how you are going to configure and time your data loads so that you are successful. We STRONGLY recommend that you test large bulk loads in Sandbox before you load in production. Not only will it help you shake out any problems in your data, loading plan, and process, but it will also give you at least a ballpark estimate of how long your load will take. And that estimate is crucial for establishing a maintenance window with your business customers.
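Turning a Sandbox trial into a ballpark production estimate is simple arithmetic, sketched below in Python. The sketch assumes throughput scales roughly linearly with record count. That assumption may not hold if triggers, sharing recalculation, or lock contention behave differently at production volume. The safety factor is a hypothetical padding that you would tune from experience.

```python
def estimate_load_hours(
    trial_records: int,
    trial_seconds: float,
    production_records: int,
    safety_factor: float = 1.5,  # hypothetical padding for non-linear effects
) -> float:
    """Extrapolate production load time (hours) from a Sandbox trial load."""
    if trial_records <= 0 or trial_seconds <= 0:
        raise ValueError("trial load needs a positive record count and duration")
    records_per_second = trial_records / trial_seconds
    seconds = production_records / records_per_second
    return seconds * safety_factor / 3600


if __name__ == "__main__":
    # Trial: 1M records in 1 hour; production: 20M records.
    print(round(estimate_load_hours(1_000_000, 3_600, 20_000_000), 1))  # 30.0
```

Here, a raw estimate of 20 hours becomes a request for a 30-hour maintenance window.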
  36. == BUD == To make your data loading experience as simple as possible, don’t “reinvent the wheel.” Lots of others have crossed the same bridge that you are about to. So do yourself a favor: visit the AppExchange and search for “data load” to get a list of utilities that can greatly improve and ease your ETL processes. Many of these utilities are free and leverage the Bulk API. In fact, Steve’s going to show us an example of one such utility later on, right Steve?
      == STEVE == Right, Bud. We’re going to look at a utility called Jitterbit Data Loader that can handle the kind of requirements you just mentioned.
  37. == BUD == Another thing you need to anticipate when loading data is the time that will be spent giving users access to the new data. When you insert lots of data into an org that’s already configured, the sharing calculations can take a considerable amount of time. In many cases, you can reduce load times by deferring these sharing calculations until after the load is complete. If your org has this feature enabled, this is easy to do. It’s important to note that after you have deferred sharing and loaded your data, you need to resume normal sharing calculations and recalculate sharing rules. Again, it’s a good idea to test the process in Sandbox, to estimate the kind of maintenance window you are going to need in production.
  38. == BUD == Here’s a quick recap of data loading best practices:
      -  Make sure the data is clean
      -  Turn off operations that slow down inserts
      -  Find a loading utility that fits your needs
      -  Learn and use the Bulk API
      -  Defer sharing calculations if your org is already configured
  39. == BUD == The sharing configuration that you design for your Force.com application doesn’t just affect data loading; it has a significant impact on performance throughout the lifecycle of your implementation. In this section, we will call out some best practices for keeping your data secure AND achieving good performance with very large volumes of data.
  40. == BUD == So how do you design an efficient sharing model? Once you switch an object’s org-wide default to a private record sharing model, the record counts in Salesforce’s internal sharing tables can grow dramatically. But by sticking to a few key rules, you can achieve success in extreme Salesforce implementations.
  41. == BUD == The first and most effective thing you can do to streamline your sharing configuration is to consider very carefully, during the design phase, which objects truly need to be protected. For every object that has a Public Read Only or Private sharing model, Force.com maintains a table that records which groups and users have access to that object’s records. These tables can become very large, sometimes even larger than the object tables themselves. And since the platform performs access checks for just about anything a user does, sharing can be a big component of search, reporting, list views, and other common operations.
  42. == BUD == Streamlining your group nesting and your role hierarchy can also be key to maintaining performance. When a user requests access to an Account record, for example, Force.com needs to do more than check the sharing table for that object to see whether the user has been granted access directly. It also needs to check whether the user has indirect access through membership in a group. This can be a challenge when the user could belong to any one of thousands of roles, or to a group 8 levels deep in a stack of public groups. So the leaner you can make these hierarchies, the faster those access decisions can be made.
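The cost of deep nesting can be illustrated with a toy access check that walks group membership outward from a user. This Python sketch is a conceptual model only and does not reflect how the platform stores or evaluates group membership. The group names and the `member_of` mapping are hypothetical. The function counts how many groups it examines before granting access.

```python
from collections import deque


def has_access_via_groups(
    user: str,
    granted_groups: set[str],
    member_of: dict[str, set[str]],
) -> tuple[bool, int]:
    """Breadth-first walk from a user through the groups that contain it.

    member_of maps a user or group to the groups it directly belongs to.
    Returns (access granted?, number of groups examined).
    """
    seen: set[str] = set()
    queue = deque(member_of.get(user, set()))
    examined = 0
    while queue:
        group = queue.popleft()
        if group in seen:
            continue  # a group can be reachable by more than one path
        seen.add(group)
        examined += 1
        if group in granted_groups:
            return True, examined
        queue.extend(member_of.get(group, set()))
    return False, examined


if __name__ == "__main__":
    # User u1 sits 8 levels down a stack of nested public groups.
    nested = {"u1": {"g1"}, **{f"g{i}": {f"g{i + 1}"} for i in range(1, 8)}}
    print(has_access_via_groups("u1", {"g8"}, nested))  # (True, 8)
    flat = {"u1": {"g8"}}
    print(has_access_via_groups("u1", {"g8"}, flat))    # (True, 1)
```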
  43. == BUD == Another key thing to plan for is skewed data distributions that can affect record sharing recalculations. When a single user owns more than 10K rows, or a single parent record has more than 10K child rows, sharing recalculations can take a long time to complete. So it’s important to have a plan ready to implement that distributes the ownership and parenting of records in the schema as your data volumes grow.
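A simple way to watch for these skews is to count records per owner and per parent in an export of the object. This Python sketch assumes you already have the owner or parent ID column as a list of strings. The IDs below are made up, and the default threshold of 10,000 is the figure quoted on this slide.

```python
from collections import Counter
from typing import Iterable

SKEW_THRESHOLD = 10_000  # rows per owner or per parent, as quoted on the slide


def skewed_keys(ids: Iterable[str], threshold: int = SKEW_THRESHOLD) -> dict[str, int]:
    """Return each owner or parent ID that appears more than threshold times."""
    counts = Counter(ids)
    return {key: n for key, n in counts.items() if n > threshold}


if __name__ == "__main__":
    owner_ids = ["005A"] * 10_001 + ["005B"] * 500  # hypothetical OwnerId column
    print(skewed_keys(owner_ids))  # {'005A': 10001}
```

You can run the same function on a parent lookup column to catch parent/child skew.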
  44. == BUD == In addition to increasing the time required to calculate sharing changes, very large data volumes can also increase the risk of encountering a lock when adjusting group hierarchies and membership, or when updating parent records. These locks are very brief, but as data volumes and processing times grow, the chance of encountering a lock also grows. If you want to read more about this, there are several papers available on our Core Resources page.
  45. == BUD == By understanding the consequences of the decisions you make in designing your sharing configuration, you can protect your data and provide good performance for your users. Key considerations include:
      •  avoiding over-protection of your data
      •  keeping the role hierarchy lean
      •  preventing data skews on ownership and parent/child relationships
      For more detail, check out the paper Designing Record Access for Enterprise Scale, linked from the Architect Core Resources page. With that, Steve, can you take us through the next and final phase of best practices?
  46. == STEVE == Thanks, Bud. So now you’ve implemented your system and you have to maintain it going forward. You may have done everything you can to build an efficient system, but you need some more options to make things perform and scale even further. In this section of the presentation, we’ll learn about some additional tools you have at your disposal.
  47. == STEVE == In this part of the presentation, we are going to focus on a few important options to consider: custom indexing, skinny tables, data partitioning, and application partitioning.
  48. == STEVE == Back in the design phase, you identified all the fields that automatically have indexes, and you built queries, views, and reports that leveraged those indexes. After your data set grows, you might determine that it makes sense to index other fields in your database to further improve performance and scale. Let’s take a look at the options you have for adding indexes to your schema.
  49. == STEVE == Just a quick reminder here that you can create your own indexes by designating custom fields in your schema as “Unique” or as “External IDs”. We saw an example of this earlier when I showed you the custom index on the Order Number field of the Opportunity object.
  50. == STEVE == Now if you want to create other indexes that can improve the performance of queries and reports, but can’t do it using the Setup configuration interface, create a request with Salesforce Support. We support the creation of custom one- and two-field indexes, once you justify how they can help improve the performance of your app and reduce the load on our platform.
  51. == STEVE == Now let’s take a look at a unique feature of Salesforce that we use to solve certain long-running query issues, a feature called skinny tables.
  52. == STEVE == Let’s say you’ve exhausted all of the other tools at your disposal to tune the performance of a long-running report. You’ve got all the right indexes in place. You’ve kept your database lean. And so forth … In such extreme cases, you can work with our Support team to determine whether you can improve the performance of long-running report queries using what’s known as a “skinny table”. As the name indicates, a skinny table contains a subset of the columns in a large object, thus reducing the amount of data that a report query needs to scan at runtime. This can translate to substantial gains in performance. Once they are in place, skinny tables are automatically kept in sync by Force.com, and they do not contribute records to the recycle bin. Even better, Force.com’s optimizer automatically takes advantage of skinny tables when it determines that doing so would help improve the performance of a query. That means that you don’t have to modify any queries or
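The intuition behind skinny tables (scan fewer columns per row) can be illustrated with a toy Python model. The model does not reflect how Salesforce builds or maintains skinny tables, since Support creates them and the platform keeps them in sync. The 50-field wide rows and three report columns are hypothetical. The code counts the field values a report scan would touch in each layout.

```python
def skinny_projection(rows: list[dict], columns: list[str]) -> list[dict]:
    """Copy only the listed columns from each wide row (a toy 'skinny table')."""
    return [{col: row[col] for col in columns} for row in rows]


def cells_scanned(rows: list[dict]) -> int:
    """Count the field values a full scan of these rows would read."""
    return sum(len(row) for row in rows)


if __name__ == "__main__":
    # 1,000 hypothetical records with 50 fields each.
    wide = [{f"f{i}": i for i in range(50)} for _ in range(1_000)]
    report_columns = ["f0", "f1", "f2"]
    skinny = skinny_projection(wide, report_columns)
    print(cells_scanned(wide), cells_scanned(skinny))  # 50000 3000
```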
  53. == STEVE == Data partitioning is a proven technique that database systems provide to physically divide large logical data structures into smaller, more manageable pieces. Partitioning can also help to improve the performance and scalability of a large database schema. For example, a query that targets data in a specific table partition does not need to consider other partitions of the table or related indexes. This common optimization is sometimes referred to as “partition pruning.”
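Partition pruning can be illustrated in a few lines of Python. The sketch groups rows by a partition key and then scans only the partition that a query targets. The `division` field, its values, and the amount filter are hypothetical. The code models the general database technique, not the implementation of Salesforce divisions.

```python
from collections import defaultdict
from typing import Callable


def partition(rows: list[dict], key: str) -> dict[str, list[dict]]:
    """Group rows into partitions by the value of key."""
    parts: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return dict(parts)


def query_partition(
    parts: dict[str, list[dict]],
    partition_value: str,
    predicate: Callable[[dict], bool],
) -> tuple[list[dict], int]:
    """Scan only the targeted partition; all other partitions are pruned.

    Returns (matching rows, number of rows scanned).
    """
    candidates = parts.get(partition_value, [])
    return [row for row in candidates if predicate(row)], len(candidates)


if __name__ == "__main__":
    rows = [{"division": "EMEA" if i % 4 == 0 else "NA", "amount": i} for i in range(100)]
    parts = partition(rows, "division")
    hits, scanned = query_partition(parts, "EMEA", lambda r: r["amount"] > 50)
    print(len(hits), scanned)  # 12 25, versus 100 rows for an unpartitioned scan
```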
  54. == STEVE == If you have a large database supporting your app and you determine that partitioning your data can reduce query response times, open a request with Salesforce Support to enable “divisions” for your org. Then, use the Setup configuration UI to implement divisions in your org. Using divisions to partition data might improve the performance of some operations, but might detract from the performance of others. Effective usage of divisions requires that you understand usage patterns, the criteria used to create the divisions, the optimal number of divisions to create, data distribution between the divisions, and data growth patterns. With these considerations in mind, before implementing divisions in production, it’s important to thoroughly test your plan in Sandbox.
  55. == STEVE == We just spoke about data partitioning. Now let’s discuss application partitioning. What do I mean, exactly, by application partitioning? Every individual application platform has practical limits, and that’s when distributed processing makes sense. By partitioning your app across different systems and then integrating the underlying systems, you can scale your app to new heights. It shouldn’t surprise you that, as with any platform, the individual components of the Salesforce Platform each have practical limits. But the beauty of the platform is that it’s very easy to integrate different components; share state, authentication, and access controls; and thus support tremendous amounts of data for even the most demanding big data app.
  56. == STEVE == For example, consider this scenario. You’ve determined that you can archive historical data from your operational Salesforce org to keep it lean, but you want to preserve this data for compliance and analytics by loading it into a data warehouse. You also want to make your users happy by not requiring them to leave your Force.com app to do their reporting. Fortunately, this is relatively painless because we have a cool feature called Force.com Canvas. Force.com Canvas lets you embed the UI of another app right within the Salesforce UI.
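When Salesforce loads an embedded Canvas app (such as the Heroku app in the demo), the app should confirm that the request really came from Salesforce before it trusts the user context. The Python sketch below assumes the signed-request format described in the Force.com Canvas Developer Guide. In that format, a Base64-encoded HMAC-SHA256 signature and a Base64-encoded JSON envelope are joined by a period, and the signature is computed with the connected app's consumer secret. Verify these details against the guide. The secret and envelope contents in the test are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def verify_and_decode(signed_request: str, consumer_secret: str) -> dict:
    """Verify a Canvas-style signed request and return its JSON envelope.

    ASSUMED format: "<base64 signature>.<base64 JSON envelope>", where the
    signature is HMAC-SHA256 over the encoded envelope string, keyed with the
    connected app's consumer secret.
    """
    encoded_sig, sep, encoded_envelope = signed_request.partition(".")
    if not sep:
        raise ValueError("malformed signed request: missing '.' separator")
    expected = hmac.new(
        consumer_secret.encode("utf-8"),
        encoded_envelope.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    actual = base64.b64decode(encoded_sig)
    # Constant-time comparison avoids leaking timing information.
    if not hmac.compare_digest(expected, actual):
        raise ValueError("signature mismatch: request not signed with this secret")
    return json.loads(base64.b64decode(encoded_envelope))
```

A request that fails verification should be rejected before any of the envelope's user or org context is used.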
  57. == STEVE == The scenario that we just covered is a great way to illustrate, by way of a live demo, several of the best practices that we’ve covered today.
  58. == STEVE == Before I jump into the live demo, let’s review a list of various best practices that the demo illustrates so that you know what to look for. See list.
      -- DEMO --
      1.  Finished product.
      2.  Heroku app, alone.
      3.  Canvas Preview and Quick Start.
      4.  Visualforce page.
      5.  Jitterbit.
      Review best practices.
  59. == STEVE == With that, I’d like to wrap up today’s session by reminding you to visit the Architect Core Resources page on Developer Force for more best practices.
  61. == STEVE == And before we get to Q&A, I’d like to invite you to provide us with feedback on this session. The feedback we get from you is very important and helps us shape future webinars. Look in your GoToWebinar chat window for the hyperlink to the survey. Click on it and fill it out. It only takes a few seconds, and we’d really appreciate your input.
  62. == STEVE == All right, let’s cover some of the great questions that we received today during the webinar. Bud, why don’t you start us off.