Azlan Allahwala (Speaker)
Senior Salesforce Consultant
Shams-ul-Arifen (Group Leader)
Senior Salesforce Consultant
Agenda
● What is LDV
● Skinny Tables
● Indexes
● Divisions
● Mashups
● Ownership Skew
● Parenting Skew
● Sharing considerations
● Data load strategy
● Archiving techniques
Large Data Volumes (LDV) in Salesforce
● In this session we will discuss large data volumes (LDV) and data management: typical data and sharing considerations, data load strategy in Salesforce, and how to build strategies for dealing with LDV scenarios.
Why is LDV important?
Image Credit: Apex hours
Underlying Concepts
Salesforce Platform Structure
1. Metadata Table
2. Data Table
3. Virtualization layer
4. The platform transforms the SOQL entered by the user into native SQL at run time, carrying out the required table joins to fetch the data from the back end.
Underlying Concepts
How does Search Work?
1. A record takes almost 20 minutes to be indexed after it is created in the system.
2. Salesforce searches the index to find records where possible.
Skinny Tables
What is a skinny table?
● A skinny table is a custom table in the Force.com platform that contains a subset of fields
from a standard or custom base Salesforce object.
Key Points
1. Salesforce stores standard and custom field data in separate DB tables.
2. A skinny table combines standard and custom fields together.
3. Skinny tables do not contain soft-deleted records.
4. A skinny table can contain a maximum of 100 columns.
5. A skinny table cannot contain fields from other objects.
6. Skinny tables are copied to your Full sandbox orgs.
7. They are updated immediately when the master tables are updated.
8. Usable on both standard and custom objects.
9. Salesforce Support creates them; they cannot be created by customers directly.
Indexing Principles
What is an index?
• A sorted column or column combination that uniquely identifies rows of data.
• The index contains the sorted columns as well as references to the data rows.
Example
• An index is created on the ID field.
• [SELECT * FROM Table WHERE ID < 14] (illustrative SQL; SOQL itself does not support SELECT *)
• The query uses the sorted ID (index) column to quickly identify data rows.
• The query does not need a full table scan to fetch the rows.
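The effect of an index can be sketched outside the platform: keeping the ID column sorted lets a range filter binary-search instead of scanning every row. This is an illustrative Python sketch of the idea, not how Salesforce implements indexes internally.

```python
import bisect

# Table rows keyed by ID (arbitrary storage order).
rows = {17: "rec17", 3: "rec3", 9: "rec9", 14: "rec14", 1: "rec1"}

# The "index": the ID column kept sorted, referencing back into `rows`.
index = sorted(rows)  # [1, 3, 9, 14, 17]

def ids_below(limit):
    """WHERE ID < limit via the index: binary search, no full table scan."""
    cut = bisect.bisect_left(index, limit)
    return index[:cut]

print(ids_below(14))  # [1, 3, 9]
```

A full table scan would touch all five rows; the binary search touches O(log n) index entries before slicing out the matches.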
Standard vs Custom Index
Standard Index
SF creates standard indexes on the following fields:
1. RecordTypeId
2. Division
3. CreatedDate
4. LastModifiedDate
5. Name
6. Email
7. Salesforce Record Id
8. External Id and unique fields
Custom Index
Creating custom indexes for fields used in reports or list views is a good idea. Custom indexes cannot be created for:
1. multi-select picklists
2. currency fields
3. long text fields
4. binary fields
Divisions
Divisions are a means of partitioning the data of large deployments to reduce the number of records returned by queries and reports. For example, a deployment with many customer records might create divisions called US, EMEA, and APAC to separate the customers into smaller groups that are likely to have few interrelationships.
Salesforce provides special support for partitioning data by divisions, which you can enable by contacting Salesforce Customer Support.
Mashups
One approach to reducing the amount of data in Salesforce is to maintain large data sets in a
different application, and then make that application available to Salesforce as needed.
Salesforce refers to such an arrangement as a mashup because it provides a quick, loosely
coupled integration of the two applications. Mashups use Salesforce presentation to display
Salesforce-hosted data and externally hosted data. Salesforce supports the following mashup
designs:
Mashups
• External Website: The Salesforce UI displays an external website and passes information and requests to it. With this design, you can make the website look like part of the Salesforce UI.
• Callouts: Apex code allows Salesforce to use web services to exchange information with external systems in real time. Because of their real-time restrictions, mashups are limited to short interactions and small amounts of data.
Mashups
Advantages of Using Mashups
• Data is never stale.
• No proprietary method needs to be developed to integrate the two systems.
Disadvantages of Using Mashups
• Accessing data takes more time.
• Functionality is reduced. For example, reporting and workflow do not work on the external data.
Best Practices
How should we improve performance under Large Volumes?
1. Use indexed fields in the WHERE clause of SOQL queries.
2. Avoid filtering on null values, as an index cannot be used for them.
3. Only use fields present in the skinny table.
4. Use query filters that select less than 10 percent of the data.
5. Avoid leading wildcards in queries, such as LIKE '%abc'.
6. Select only the necessary fields in the SELECT statement.
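Rule 4 can be made concrete with a small selectivity check: a filter counts as selective here when it matches fewer than 10 percent of the rows. The function and field names below are illustrative, not an official Salesforce API.

```python
def is_selective(rows, predicate, threshold=0.10):
    """Return True if the filter matches fewer than `threshold` of all rows."""
    matched = sum(1 for r in rows if predicate(r))
    return matched / len(rows) < threshold

# 1,000 sample accounts; every 50th is "Tech" (2%), the rest "Other" (98%).
accounts = [{"Industry": "Tech" if i % 50 == 0 else "Other"} for i in range(1000)]

# 2% of rows match -> selective; an index on Industry pays off.
print(is_selective(accounts, lambda r: r["Industry"] == "Tech"))   # True
# 98% of rows match -> not selective; the optimizer would full-scan anyway.
print(is_selective(accounts, lambda r: r["Industry"] == "Other"))  # False
```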
Ownership Skew
● When more than 10,000 records of a single object are owned by a single owner. This is expensive for:
1. Share table calculations.
2. Role hierarchy updates: when the owner moves up or down in the hierarchy, sharing must be recalculated for both that user and any users above them in the role hierarchy.
Ownership Skew - How to avoid this?
1. Data Migration: Collaborate with the customer to distribute the records to a large number of
actual end users.
2. Avoid making the integration user the owner of the record.
3. Make use of the Lead and Case assignment rules.
4. Assign records to a user in a role at the top of the Role Hierarchy.
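Point 1 above, distributing records across many real end users, can be sketched as a simple round-robin assignment. The field names (`Id`, `OwnerId`) mirror Salesforce conventions, but this is an illustrative sketch of the distribution step, not platform code.

```python
from itertools import cycle

def distribute(records, owners):
    """Assign records round-robin so no single owner accumulates them all."""
    rotation = cycle(owners)
    for record in records:
        record["OwnerId"] = next(rotation)
    return records

records = [{"Id": f"rec{i}"} for i in range(6)]
distribute(records, ["alice", "bob", "carol"])
print([r["OwnerId"] for r in records])
# ['alice', 'bob', 'carol', 'alice', 'bob', 'carol']
```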
Parenting Skew
● When there are 10,000 or more records for one object under the same parent record.
1. The Bulk API batch size is up to 10,000 records for data migration. Records linked to the same parent in simultaneous batches require the parent to be updated, potentially resulting in record locking.
2. Access to a parent record is driven by access to children in the case of implicit sharing. If you
lose access to a child record, Salesforce must examine every other child record to verify
whether or not you still have access to the parent.
Parenting Skew - How to avoid this?
1. Avoid having > 10,000 records of a single object linked to the same parent record.
2. When contacts that are not associated with any account must be connected to accounts, distribute them across many accounts rather than a single one.
Sharing Considerations
● Org-Wide Defaults
• When possible, set OWDs for non-confidential data to Public Read/Write or Public Read-Only.
• This reduces the need for a share table.
• To prevent adding more share tables, choose 'Controlled by Parent'.
Sharing Calculation
● Parallel Sharing Rule Re-calculation
1. Sharing rules are processed synchronously when role hierarchy updates are made.
2. Sharing rules can be processed asynchronously across multiple execution threads, so a single sharing rule calculation can run on parallel threads.
3. Request Salesforce (SFDC) to enable parallel sharing rule recalculation for long-running calculations.
Sharing Calculation
● Deferred Sharing Rule Calculation
1. Whenever a user is moved in the role hierarchy, share table updates are performed at the back end.
2. Share table calculations across objects are deferred during this window.
3. Re-enable sharing calculations once the updates are completed.
4. Contact Salesforce (SFDC) to enable this functionality.
5. Run the procedure through a sandbox first and verify the outcomes and timings.
6. Work with the customer to agree on a maintenance window, then recalculate the deferred sharing rules.
Data Load Strategy
Credit: Apex Hours
Data Load Strategy
Step 1: Configure Your Organization for Data Load.
• Enable parallel and deferred sharing rule recalculation.
• Create the role hierarchy and add users.
• Set the OWD of the object being loaded to Public Read/Write, so that no sharing table has to be maintained for it and no sharing recalculation is required during data loading.
• Disable Workflows, Triggers, Process Builder, and Validation Rules.
Data Load Strategy
Step 2: Prepare the Data Load
• Identify the data that you wish to load into the new organization (for example, data that is more than a year old, all active UK business unit accounts, and so on).
• Extract, cleanse, enrich, and transform the data before inserting it into the staging table.
• Remove duplicate data.
• Make certain that the data is clean, particularly the foreign key relationships.
• Run some preliminary batch tests in the sandbox.
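The dedupe and foreign-key checks above can be sketched as a single pre-load validation pass. The field names (`Email`, `AccountId`) and the helper are illustrative assumptions, not part of any Salesforce tooling.

```python
def prepare(records, valid_parent_ids, key="Email"):
    """Drop duplicates by `key`; flag rows whose parent reference is broken."""
    seen, clean, orphans = set(), [], []
    for rec in records:
        if rec[key] in seen:
            continue  # duplicate -> drop before loading
        seen.add(rec[key])
        if rec["AccountId"] not in valid_parent_ids:
            orphans.append(rec)  # broken foreign key -> fix before loading
        else:
            clean.append(rec)
    return clean, orphans

records = [
    {"Email": "a@x.com", "AccountId": "A1"},
    {"Email": "a@x.com", "AccountId": "A1"},  # duplicate
    {"Email": "b@x.com", "AccountId": "A9"},  # unknown parent account
]
clean, orphans = prepare(records, {"A1", "A2"})
print(len(clean), len(orphans))  # 1 1
```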
Data Load Strategy
Step 3: Execute the Data Load
• Load the parent objects first, then the children. Save the parent keys for later use.
• Use insert and update rather than upsert: Salesforce internally checks the data during the upsert operation based on the object's Id or External Id, so upsert takes somewhat longer than insert or update.
• For updates, only send fields whose values have changed.
• When using the Bulk API, group records by parent Id to avoid lock failures in simultaneous batches.
• When dealing with more than 50,000 records, use the Bulk API.
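Grouping by parent Id before batching, as recommended above, can be sketched like this. The batch size is shrunk for illustration (real Bulk API batches hold up to 10,000 records), and the field names are assumptions for the sketch.

```python
from itertools import groupby

def batches_by_parent(records, batch_size):
    """Sort by parent Id so records sharing a parent land in the same batch,
    minimizing contention when Salesforce locks the shared parent row."""
    ordered = sorted(records, key=lambda r: r["ParentId"])
    for _, group in groupby(ordered, key=lambda r: r["ParentId"]):
        chunk = list(group)
        for i in range(0, len(chunk), batch_size):
            yield chunk[i:i + batch_size]

# Four child records alternating between two parents, A and B.
records = [{"Id": i, "ParentId": p} for i, p in enumerate("ABAB")]
for batch in batches_by_parent(records, batch_size=2):
    print([r["ParentId"] for r in batch])  # ['A', 'A'] then ['B', 'B']
```

Without the sort, parents A and B would each be touched by two concurrent batches; with it, each parent's children travel together.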
Data Load Strategy
Step 4: Configure your organization for production
• Defer sharing calculations while the loads are running.
• After the load is complete, change the OWD for the object from Public Read/Write to Public Read-Only or Private, then create your sharing rules. Try these steps in the sandbox first. You can request that Salesforce enable parallel sharing rule processing.
• Configure sharing rules one at a time, allowing each to finish before moving on. Alternatively, use deferred sharing to create all the sharing rules first and then let the sharing calculation run in one pass.
• Re-enable triggers, workflows, and validation rules.
• Create roll-up summary fields.
Archiving Data
How and Why should we archive data?
• Keep only the most recent, actively used data in Salesforce.
• Archiving improves report, dashboard, and list view performance.
• It also improves SOQL query performance.
• It helps meet compliance and regulatory requirements.
• It keeps a backup of your data.
Approach 1 using Middleware
Image credit Salesforce
Approach 2 using Heroku
Image credit Salesforce
Approach 3 using Big Objects
Image credit Salesforce