2. 2
The Challenges of Querying a Distributed RDBMS
This presentation examines some common
challenges that can occur when querying a
distributed RDBMS.
- Challenges
- Solution
4. 4
The Challenges of Querying a Distributed RDBMS
A distributed relational database lets your
application scale out beyond the limits of a single server.
However, a number of challenges can occur when
querying a distributed RDBMS.
1. Aggregation
2. Distinct Values
3. Joins
4. Sub-Queries
5. The “Combination”
5. 5
1 - The Aggregation Challenge
• Let’s assume that a company stores the HR data
of several departments across multiple partitions.
• When requesting the average salary of all
employees, all departments must be examined.
• If the average salary is calculated separately on
each partition and the per-partition averages are then
averaged, the final result will be inaccurate whenever
the partitions hold different numbers of employees.
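The effect can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: the partition contents and the function names `naive_average` and `merged_average` are made up for the example. The usual remedy shown here is to have each partition return a sum and a count, and to divide only once after merging.

```python
# Hypothetical data: each inner list holds the salaries stored on one partition.
hr_salary_partitions = [
    [40_000, 50_000, 60_000],  # partition A: 3 employees
    [100_000],                 # partition B: 1 employee
]


def naive_average(partitions):
    """Average the per-partition averages (the inaccurate approach)."""
    local_averages = [sum(p) / len(p) for p in partitions]
    return sum(local_averages) / len(local_averages)


def merged_average(partitions):
    """Collect a (sum, count) pair from each partition, then divide once."""
    partials = [(sum(p), len(p)) for p in partitions]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


print(naive_average(hr_salary_partitions))   # 75000.0
print(merged_average(hr_salary_partitions))  # 62500.0
```

The naive result is pulled toward partition B because its single employee is given the same weight as the three employees on partition A.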
6. 6
2 - The Distinct Values Challenge
• Data entries, such as age or salary, will often repeat
throughout the database.
• The same value can be stored on more than one partition,
so simply concatenating per-partition results leaves
duplicates and produces false query results.
• When an application requests a list of distinct values,
the data needs to be processed so that repetitions are
eliminated from the result set.
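A small Python sketch of the problem, using made-up age values and hypothetical helper names. Each partition can remove its own repetitions, but a second de-duplication pass is still needed once the partial lists are merged.

```python
# Hypothetical data: ages stored on two partitions.
age_partitions = [
    [30, 45, 30, 52],
    [45, 28, 52],
]


def naive_distinct(partitions):
    """De-duplicate on each partition only, then concatenate the results."""
    combined = []
    for rows in partitions:
        combined.extend(sorted(set(rows)))
    return combined


def global_distinct(partitions):
    """De-duplicate locally, merge, then de-duplicate the merged result."""
    merged = set()
    for rows in partitions:
        merged |= set(rows)
    return sorted(merged)


print(naive_distinct(age_partitions))   # [30, 45, 52, 28, 45, 52]
print(global_distinct(age_partitions))  # [28, 30, 45, 52]
```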
7. 7
3 - The Joins Challenge
• Ideally, records that exist in different partitions
should be joined after considering all of the query
criteria.
The Sharding Conflict - records that match each other may be
situated on different partitions, so a join that is evaluated
separately on each partition can miss matching records.
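The following Python sketch illustrates the point with a hypothetical layout in which employees and departments were distributed independently of each other. The table contents and function names are assumptions made for the example. Gathering every row in one place, as `cross_partition_join` does, is only the simplest way to get a correct answer; it is not meant as a recommended execution strategy.

```python
# Hypothetical layout: (name, dept_id) rows and (dept_id, dept_name) rows,
# each table spread over two partitions without regard to the other.
employees_by_partition = [
    [("Ann", 10), ("Bob", 20)],
    [("Cid", 10)],
]
departments_by_partition = [
    [(10, "Sales")],
    [(20, "Support")],
]


def join(employees, departments):
    """Inner join of employees to departments on dept_id."""
    dept_names = dict(departments)
    return [(name, dept_names[dept_id])
            for name, dept_id in employees
            if dept_id in dept_names]


def local_joins_only(emp_parts, dept_parts):
    """Join each partition with itself; cross-partition matches are lost."""
    rows = []
    for emps, depts in zip(emp_parts, dept_parts):
        rows.extend(join(emps, depts))
    return rows


def cross_partition_join(emp_parts, dept_parts):
    """Gather all rows first, then join once over the complete data."""
    all_emps = [row for part in emp_parts for row in part]
    all_depts = [row for part in dept_parts for row in part]
    return join(all_emps, all_depts)


print(local_joins_only(employees_by_partition, departments_by_partition))
print(cross_partition_join(employees_by_partition, departments_by_partition))
```

In this layout the per-partition join finds only Ann, because Bob's and Cid's departments are stored on the other partition.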
8. 8
4 - The Sub-Queries Challenge
• Often the result of one query is needed to complete
another query. This brings dependencies and
complexity into the system.
For instance, a query examining all employees with
above average salaries requires a sub-query to
determine the average salary, considering all partitions.
In order to yield correct results, this sub-query has to
be processed independently, and before the parent
query.
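The ordering requirement can be sketched in Python as follows. The employee rows and function names are hypothetical. The sub-query (`global_average_salary`) runs across all partitions first, and only then is the parent filter applied; the second function shows what goes wrong if each partition uses its own local average instead.

```python
# Hypothetical data: (name, salary) rows on two partitions.
hr_partitions = [
    [("Ann", 40_000), ("Bob", 50_000), ("Cid", 60_000)],
    [("Dee", 100_000)],
]


def global_average_salary(partitions):
    """The sub-query: average salary over every partition."""
    total = sum(salary for part in partitions for _, salary in part)
    count = sum(len(part) for part in partitions)
    return total / count


def above_average(partitions):
    """The parent query: runs only after the sub-query has a result."""
    threshold = global_average_salary(partitions)
    return [name for part in partitions
            for name, salary in part if salary > threshold]


def above_local_average(partitions):
    """Incorrect variant: each partition compares against its own average."""
    result = []
    for part in partitions:
        local_avg = sum(salary for _, salary in part) / len(part)
        result.extend(name for name, salary in part if salary > local_avg)
    return result


print(above_average(hr_partitions))        # ['Dee']
print(above_local_average(hr_partitions))  # ['Cid']
```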
9. 9
5 - The “Combination” Challenge
• Any combination of:
• Aggregation
• Distinct Values
• Joins
• Sub-Query
For example, trying to get an average of the distinct
values of salary.
In order to accomplish this, we first need to eliminate
repetitions across all partitions and only then aggregate.
The two steps cannot be merged into a single pass that runs
separately on each partition.
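A short Python sketch of why the order matters, with made-up salary values and hypothetical function names. The first function eliminates repetitions globally and then aggregates. The second tries to do both on each partition at once and merges partial sums and counts, which counts a salary twice when it appears on two partitions.

```python
# Hypothetical data: 60_000 is stored on both partitions.
salary_partitions = [
    [40_000, 40_000, 60_000],
    [60_000, 110_000],
]


def average_of_distinct(partitions):
    """Step 1: global de-duplication. Step 2: aggregate the distinct set."""
    distinct = set()
    for part in partitions:
        distinct |= set(part)
    return sum(distinct) / len(distinct)


def merged_local_distinct_average(partitions):
    """Incorrect shortcut: de-duplicate and aggregate per partition."""
    total = sum(sum(set(part)) for part in partitions)
    count = sum(len(set(part)) for part in partitions)
    return total / count


print(average_of_distinct(salary_partitions))            # 70000.0
print(merged_local_distinct_average(salary_partitions))  # 67500.0
```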
11. 11
Meeting the Challenges
• DBAs need to carefully consider how to arrange data
across multiple partitions in a distributed database.
• Distributing the data with
intelligence about the application,
schema and workloads will help
you avoid many conflicts.
• Place data that is used together on the same partition
• Cross-partition queries will always exist. Considering the
nature of the queries and the application is key to
creating a functional distributed database.
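As an illustration of placing related data together, the Python sketch below distributes two hypothetical tables by the same key, so that an employee row and its department row always land on the same partition. The modulo placement rule, the table contents and the function names are assumptions made for the example, not a description of any specific product's distribution policy.

```python
NUM_PARTITIONS = 2  # assumed partition count for the example


def partition_for(dept_id, num_partitions=NUM_PARTITIONS):
    """Pick a partition from the distribution key (simple modulo rule)."""
    return dept_id % num_partitions


def distribute(rows, key_index, num_partitions=NUM_PARTITIONS):
    """Place each row on the partition chosen by its distribution key."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[partition_for(row[key_index], num_partitions)].append(row)
    return parts


# Hypothetical tables: (name, dept_id) and (dept_id, dept_name).
employees = [("Ann", 10), ("Bob", 21), ("Cid", 10)]
departments = [(10, "Sales"), (21, "Support")]

# Both tables use dept_id as the distribution key.
emp_parts = distribute(employees, key_index=1)
dept_parts = distribute(departments, key_index=0)

for emps, depts in zip(emp_parts, dept_parts):
    print(emps, depts)
```

With this layout a join between employees and departments on `dept_id` can be evaluated partition by partition, because matching rows are never split apart. Queries on other keys would still cross partitions.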
12. 12
ScaleBase – Your Distributed DBMS Experts
ScaleBase provides specialized data distribution
technology that resolves a broad range of these
challenges.
1. ScaleBase Analysis Genie
• Free, SaaS data distribution policy builder
• A guided analysis of the nature of your data, data
relationships and the functional use of your data
2. ScaleBase Software
• A distributed MySQL database management system
13. 13
ScaleBase Analysis Genie, Free, SaaS
• Determines the best way to
scale out a single MySQL
instance to a distributed
relational database
• Creates the best data
distribution policy for your
specific app by analyzing
your schema and queries
• Ensures relational integrity of MySQL with the scalability of
a modern distributed database architecture
• Automated or Expert mode: gives you visibility and
control over all elements of the data distribution policy
14. 14
ScaleBase Software
ScaleBase is a distributed MySQL database management system. It is
optimized for the cloud and deploys in minutes so you can scale out to an
unlimited number of users, data and transactions.
It dynamically optimizes workloads and availability by logically distributing
data across public, private and geo-distributed clouds.
Contact Us
sales@scalebase.com
or
Download free software
ScaleBase Software
www.scalebase.com/software/