This document discusses strategies for managing large data volumes (LDV) in Salesforce, including:
- Skinny tables, which combine standard and custom fields in one table to improve performance.
- Indexing principles and best practices for writing selective queries.
- Partitioning data with divisions, and maintaining large external datasets through mashups to reduce the data held in Salesforce.
- Avoiding ownership skew and parenting skew, where a single owner or parent record degrades performance.
- A multi-step data load strategy covering preparation, execution, and post-load configuration.
- Archiving techniques, such as middleware, Heroku, or Big Objects, that improve performance by limiting the data kept in Salesforce.
4. Large Data Volumes (LDV) in Salesforce
● In this session we will discuss large data volumes (LDV) and data management in Salesforce: typical data and sharing considerations, data load strategy, and how to build strategies for dealing with LDV scenarios.
6. Underlying Concepts
Salesforce Platform Structure
1. Metadata Table
2. Data Table
3. Virtualization layer
4. The platform takes the SOQL entered by the user and immediately transforms it into native SQL, which carries out the required table joins and fetches the data from the back end (see the sketch below).
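As a concrete illustration of that flow, here is a minimal anonymous Apex sketch. The developer only writes declarative SOQL; the translation into native SQL joins over the metadata and data tables happens entirely inside the virtualization layer.

```apex
// What the developer writes: a declarative SOQL query. The platform
// rewrites it into native SQL joins over its internal metadata and data
// tables; none of that machinery is visible to the caller.
List<Account> accts = [
    SELECT Id, Name, AnnualRevenue
    FROM Account
    WHERE AnnualRevenue > 1000000
];
System.debug('Fetched ' + accts.size() + ' accounts');
```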
7. Underlying Concepts
How does Search Work?
1. A record can take up to 20 minutes to be indexed for search after it is created in the system.
2. When a search runs, Salesforce uses the search index to find records where possible.
8. Skinny Tables
What is a skinny table?
● A skinny table is a custom table in the Force.com platform that contains a subset of fields
from a standard or custom base Salesforce object.
Key Points
1. Salesforce stores standard and custom field data in separate DB tables.
2. A skinny table combines standard and custom fields in a single table.
3. Skinny tables do not contain soft-deleted records.
4. A skinny table can contain a maximum of 100 columns.
5. A skinny table cannot contain fields from other objects.
6. Skinny tables are copied to your Full sandbox orgs.
7. Skinny tables are updated immediately when the source tables are updated.
8. They are usable on standard and custom objects.
9. Salesforce Support creates them on request.
9. Indexing Principles
What is an index?
• A sorted column, or column combination, that uniquely identifies rows of data.
• The index contains the sorted columns as well as references to the data rows.
Example
• An index is created on the ID field.
• [SELECT * FROM Table WHERE ID < 14]
• The query uses the sorted ID (index) column to quickly identify the matching data rows.
• The query does not need to do a full table scan to fetch the rows.
10. Standard vs Custom Index
Standard Index
Salesforce creates standard indexes on the following fields:
1. RecordTypeId
2. Division
3. CreatedDate
4. LastModifiedDate
5. Name
6. Email
7. Salesforce Record Id
8. External Id and unique fields
Custom Index
Creating custom indexes for fields used in reports or list views is a good idea. Custom indexes cannot be created for:
1. multi-select picklists
2. currency fields
3. long text fields
4. binary fields
11. Divisions
Divisions are a means of partitioning the data of large deployments to reduce the number of
records returned by queries and reports. For example, a deployment with many customer
records might create divisions called US, EMEA, and APAC to separate the customers into
smaller groups that are likely to have few interrelationships.
Salesforce provides special support for partitioning data by divisions, which you can enable
by contacting Salesforce Customer Support.
12. Mashups
One approach to reducing the amount of data in Salesforce is to maintain large data sets in a
different application, and then make that application available to Salesforce as needed.
Salesforce refers to such an arrangement as a mashup because it provides a quick, loosely
coupled integration of the two applications. Mashups use Salesforce presentation to display
Salesforce-hosted data and externally hosted data. Salesforce supports the following mashup designs:
13. Mashups
External Website: The Salesforce UI displays an external website and passes information and requests to it. With this design, you can make the website look like part of the Salesforce UI.
Callouts: Apex code allows Salesforce to use Web services to exchange information with external systems in real time. Because of their real-time restrictions, mashups are limited to short interactions and small amounts of data.
14. Mashups
Advantages of Using Mashups
• Data is never stale.
• No proprietary method needs to be developed to integrate the two systems.
Disadvantages of Using Mashups
• Accessing data takes more time.
• Functionality is reduced. For example, reporting and workflow do not work on the external data.
15. Best Practices
How should we improve performance under large volumes? (a query sketch follows this list)
1. Use indexed fields in the WHERE clause of SOQL queries.
2. Avoid nulls in query filters, because an index cannot be used to match them.
3. Where a skinny table exists, use only fields present in the skinny table.
4. Use query filters that target less than 10 percent of the data.
5. Avoid leading wildcards, such as %, in query filters; they prevent index use.
6. Select only the necessary fields in the SELECT statement.
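A hedged illustration of these guidelines in anonymous Apex. The object and filter values are examples only; the pattern is what matters: filter on an indexed field, avoid nulls and leading wildcards, and select only the fields you need.

```apex
// Selective: filters on a standard-indexed field (CreatedDate), avoids
// null comparisons and leading wildcards, and selects a narrow field list.
List<Account> selective = [
    SELECT Id, Name
    FROM Account
    WHERE CreatedDate = LAST_N_DAYS:30
];

// Non-selective: the null filter and the leading wildcard both prevent
// index use and can force a full table scan on a large object.
List<Account> nonSelective = [
    SELECT Id, Name
    FROM Account
    WHERE Industry = null
       OR Name LIKE '%corp'
];
```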
16. Ownership Skew
● Ownership skew occurs when more than 10,000 records of a single object are owned by a single owner.
1. Share table calculations become expensive.
2. When that owner moves up or down in the role hierarchy, sharing is recalculated for both that user and any users above them in the role hierarchy.
17. Ownership Skew - How to avoid this?
1. Data migration: collaborate with the customer to distribute the records across a large number of actual end users.
2. Avoid making the integration user the owner of records.
3. Make use of the Lead and Case assignment rules.
4. Assign records to a user in a role at the top of the role hierarchy.
18. Parenting Skew
● When there are 10,000 or more records for one object under the same parent record.
1. The Bulk API batch size for data migration is up to 10,000 records. Records linked to the same parent in simultaneous batches each require a lock on the parent, potentially resulting in record-locking failures.
2. Access to a parent record is driven by access to children in the case of implicit sharing. If you
lose access to a child record, Salesforce must examine every other child record to verify
whether or not you still have access to the parent.
19. Parenting Skew - How to avoid this?
1. Avoid having > 10,000 records of a single object linked to the same parent record.
2. When contacts that are not associated with any account must be connected to accounts, distribute them across many accounts rather than a single catch-all account.
20. Sharing Considerations
● Org-Wide Defaults (OWDs)
• Where possible, set the OWD for non-confidential data to Public Read/Write or Public Read-Only; this reduces the need for a share table.
• To prevent adding more share tables, choose 'Controlled by Parent'.
21. Sharing Calculation
● Parallel Sharing Rule Recalculation
1. Sharing rules are processed synchronously when role hierarchy updates are made.
2. Sharing rules can also be processed asynchronously across multiple execution threads, so a single sharing rule calculation can run on parallel threads.
3. Request Salesforce Support to enable parallel sharing rule recalculation for long-running calculations.
22. Sharing Calculation
● Deferred Sharing Rule Calculation
1. Whenever a user is updated in the role hierarchy, share table updates are performed at the back end.
2. With deferral, share table calculations across objects are postponed.
3. Re-enable sharing calculations once the updates are complete.
4. Contact Salesforce Support to enable this functionality.
5. Run the procedure in a sandbox first and verify the outcomes and timings.
6. Work out a maintenance window with the customer and then recalculate the deferred sharing rules.
24. Data Load Strategy
Step 1: Configure Your Organization for Data Load.
• Allow for parallel and deferred sharing rule recalculation.
• Create the role hierarchy and add users.
• Set the OWD of the object we wish to load to Public Read/Write, so that no sharing table has to be maintained for the object and no sharing recalculation is required during data loading.
• Disable workflows, triggers, Process Builder processes, and validation rules.
25. Data Load Strategy
Step 2: Prepare the Data Load
• Identify the data that you wish to load into the new organization (for example, data that is more than a year old, or all active UK business unit accounts).
• Extract, cleanse, enrich, and transform the data before inserting it into the staging table.
• Remove duplicate data (see the dedup sketch after this list).
• Make certain that the data is clean, particularly the foreign key relationships.
• Run some preliminary batch tests in a sandbox.
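A minimal dedup sketch in anonymous Apex, assuming a hypothetical External_Id__c text field on Account as the duplicate key; in a real load this step usually happens in the staging database before anything reaches Salesforce.

```apex
// Keep the first record seen for each external key; drop later duplicates.
// External_Id__c is a hypothetical custom field used only for illustration.
List<Account> staged = new List<Account>{
    new Account(Name = 'Acme',       External_Id__c = 'A-1'),
    new Account(Name = 'Acme (dup)', External_Id__c = 'A-1'),
    new Account(Name = 'Beta',       External_Id__c = 'B-1')
};
Map<String, Account> uniqueByKey = new Map<String, Account>();
for (Account a : staged) {
    if (!uniqueByKey.containsKey(a.External_Id__c)) {
        uniqueByKey.put(a.External_Id__c, a);
    }
}
List<Account> toLoad = uniqueByKey.values(); // 2 of the 3 records survive
```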
26. Data Load Strategy
Step 3: Execute the Data Load
• Load the parent objects first, then the children, and save the parent keys for later use.
• Use insert and update rather than upsert: during an upsert, Salesforce internally checks the data against the object's Id or External Id, so upsert takes somewhat longer than insert or update.
• For updates, send only the fields whose values have changed.
• When using the Bulk API, group records by parent Id to avoid lock failures in simultaneous batches (see the grouping sketch after this list).
• When dealing with more than 50,000 records, use the Bulk API.
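A sketch of the grouping principle only, using Contact-under-Account as the skewed relationship. In a real migration the grouping or sorting is applied to the extract file before it is handed to the Bulk API; the Apex below just shows the shape of the transformation.

```apex
// Group child records by parent Id so that rows touching the same parent
// land in the same batch rather than in simultaneous ones, avoiding
// contention on the parent record's lock.
Map<Id, List<Contact>> byParent = new Map<Id, List<Contact>>();
for (Contact c : [SELECT Id, AccountId FROM Contact
                  WHERE AccountId != null LIMIT 10000]) {
    if (!byParent.containsKey(c.AccountId)) {
        byParent.put(c.AccountId, new List<Contact>());
    }
    byParent.get(c.AccountId).add(c);
}
// One parent group per batch (or pack several whole groups per batch).
for (Id parentId : byParent.keySet()) {
    System.debug(parentId + ' -> ' + byParent.get(parentId).size() + ' children');
}
```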
27. Data Load Strategy
Step 4: Configure your organization for production
• Defer sharing calculations while loads are running.
• After the load is complete, change the OWD for the object from Public Read/Write back to Public Read-Only or Private, then create your sharing rules. Try these steps in a sandbox first; you can request that Salesforce Support enable parallel sharing rule processing.
• Configure sharing rules one at a time, allowing each to finish before starting the next. Alternatively, use deferred sharing to complete the sharing rule creation and then let the sharing rule computation run in bulk.
• Re-enable triggers, workflows, and validation rules.
• Create roll-up summary fields.
28. Archiving Data
How and why should we archive data? (a Big Object sketch follows this list)
• Salesforce should hold only the most recent data.
• Archiving improves report, dashboard, and list view performance.
• It improves SOQL query performance.
• It supports compliance and regulatory requirements.
• It preserves a backup of your data.
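As a closing illustration of the Big Objects option mentioned in the overview, here is a hedged anonymous Apex sketch. The big object Case_Archive__b and its fields are hypothetical; Database.insertImmediate is the Apex call used to write big object rows.

```apex
// Archive a closed case into a hypothetical custom big object
// (custom big objects use the __b suffix).
Case_Archive__b row = new Case_Archive__b(
    Case_Number__c = '00001234',
    Closed_On__c   = Date.today()
);
Database.insertImmediate(row); // big objects are written outside the normal DML path
```

Once the archived rows are deleted from the standard object, they no longer weigh on reports, list views, or SOQL over that object, which is exactly the performance motivation listed above.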