Database Scalability:
The Shard Conflict
July 2014
This presentation tackles a particularly challenging situation that often occurs when creating a distributed relational database.

In this presentation you will learn:
- What a ‘shard conflict’ is
- How to identify ‘shard conflicts’
- How to resolve ‘shard conflicts’ in a distributed database
- How ‘shard conflicts’ affect query processing

Published in: Data & Analytics
  • Database Scalability - The Shard Conflict

    1. Database Scalability: The Shard Conflict, July 2014
    2. Database Scalability: The Shard Conflict
       This presentation tackles a particularly challenging situation that often occurs when creating a distributed database. In this presentation you will learn:
       • What a ‘shard conflict’ is
       • How to identify ‘shard conflicts’
       • How to resolve ‘shard conflicts’ in a distributed database
       • How ‘shard conflicts’ affect query processing
    3. Traditional Databases vs. Distributed Databases
       Traditional Monolithic DB
       • Made up of tables of data that are related to one another
       • All of the data is located in one place and is easily accessible
       • The data relationship is stored deep in the database and can be easily analyzed and queried using conventional methods
       Modern Distributed DB
       • Data distribution is necessary for scalability
       • Information is spread across various servers (instances)
       • Related data can be distributed into different partitions, or shards, making related query requests difficult to process
    4. So, What Is a ‘Shard Conflict’?
       At ScaleBase, we have coined the term ‘shard conflict’ to describe a situation where:
       • A given statement cannot be executed as is, unchanged, on all partitions (or on a single partition) and still be relied upon to yield a correct result.
       Let’s take a look at the following examples…
    5. Identifying the Conflict
       Example #1
       Choosing ‘id’ as the shard key presents a shard conflict, because there is no guarantee that all employees are in the same shard as their corresponding departments.
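Example #1 can be reproduced in a small sketch. Everything concrete here is an assumption made for illustration: the slide's tables are not part of this transcript, so the `employee` and `department` columns, the sample rows, the two-shard cluster, and the modulo routing rule are hypothetical, and in-memory SQLite databases stand in for MySQL shards. It is not ScaleBase's implementation.

```python
import sqlite3

NUM_SHARDS = 2  # hypothetical cluster size


def make_shard():
    """Create one in-memory SQLite database standing in for a shard."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute(
        "CREATE TABLE employee "
        "(id INTEGER PRIMARY KEY, first_name TEXT, department_id INTEGER)"
    )
    return db


shards = [make_shard() for _ in range(NUM_SHARDS)]

# Hypothetical sample data.
departments = [(1, "Sales"), (2, "Engineering")]
employees = [(1, "Ann", 1), (2, "Bob", 1), (3, "Cy", 2), (4, "Dee", 2)]

# Both tables are sharded by their own 'id' column (simple modulo routing).
for dept in departments:
    shards[dept[0] % NUM_SHARDS].execute(
        "INSERT INTO department VALUES (?, ?)", dept
    )
for emp in employees:
    shards[emp[0] % NUM_SHARDS].execute(
        "INSERT INTO employee VALUES (?, ?, ?)", emp
    )

JOIN = (
    "SELECT e.first_name, d.name "
    "FROM employee e JOIN department d ON e.department_id = d.id"
)

# Run the statement unchanged on every shard and concatenate the results.
rows = sorted(row for db in shards for row in db.execute(JOIN))
print(len(rows), "of", len(employees), "employees matched")
```

With this data, only the employees who happen to land on the same shard as their department appear in the result. No shard reports an error, so the missing rows go unnoticed.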
    6. Resolving the Conflict
       Example #2
       The Method
       • Choose ‘department_id’ as the ‘Employee Table’ shard key
       The Outcome:
       • The join query was optimized as a result of all department-related data being stored in the same partition
       • No joins cross partition boundaries
       • Statements can now safely be executed on all partitions
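The resolution in Example #2 can be sketched the same way. The schema, sample rows, shard count, and modulo routing are again hypothetical stand-ins, with in-memory SQLite databases playing the role of MySQL shards. The only change from the conflicting layout is the column used to route employee rows.

```python
import sqlite3

NUM_SHARDS = 2  # hypothetical cluster size


def make_shard():
    """Create one in-memory SQLite database standing in for a shard."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute(
        "CREATE TABLE employee "
        "(id INTEGER PRIMARY KEY, first_name TEXT, department_id INTEGER)"
    )
    return db


shards = [make_shard() for _ in range(NUM_SHARDS)]

# Hypothetical sample data.
departments = [(1, "Sales"), (2, "Engineering")]
employees = [(1, "Ann", 1), (2, "Bob", 1), (3, "Cy", 2), (4, "Dee", 2)]

for dept in departments:
    shards[dept[0] % NUM_SHARDS].execute(
        "INSERT INTO department VALUES (?, ?)", dept
    )
for emp in employees:
    # Route by department_id (emp[2]) so each employee is stored
    # on the same shard as its department.
    shards[emp[2] % NUM_SHARDS].execute(
        "INSERT INTO employee VALUES (?, ?, ?)", emp
    )

JOIN = (
    "SELECT e.first_name, d.name "
    "FROM employee e JOIN department d ON e.department_id = d.id"
)

# The unchanged statement can now run on every shard independently.
rows = sorted(row for db in shards for row in db.execute(JOIN))
print(len(rows), "of", len(employees), "employees matched")
```

Because the join column and the shard key now coincide, each shard can answer its part of the join locally and the concatenated result is complete.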
    7. Wait a Minute... There’s Still a Conflict
       Query:
           select e.first_name, e.last_name, m.first_name, m.last_name
           from employee e join employee m on e.manager_id = m.id
       Joining the ‘Employee Table’ with itself to find each employee’s manager still presents a conflict: there is no guarantee that an employee and their manager are in the same shard.
       The employee table cannot be sharded by both ‘id’ and ‘manager_id’ at the same time.
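The remaining conflict can be made visible by comparing the self-join on a single database with the same statement run per shard. As before, this is only a sketch: the column list, the sample rows (including one manager who works in a different department from a report), the shard count, and the modulo routing are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3

NUM_SHARDS = 2  # hypothetical cluster size

SCHEMA = (
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, first_name TEXT, "
    "last_name TEXT, department_id INTEGER, manager_id INTEGER)"
)
# The self-join from the slide.
QUERY = (
    "SELECT e.first_name, e.last_name, m.first_name, m.last_name "
    "FROM employee e JOIN employee m ON e.manager_id = m.id"
)

# Hypothetical rows: (id, first_name, last_name, department_id, manager_id).
employees = [
    (1, "Ann", "Lee", 1, None),
    (2, "Bob", "Ray", 1, 1),
    (3, "Cy", "Fox", 2, 1),  # manager Ann works in another department
    (4, "Dee", "Kim", 2, 3),
]


def make_db():
    """Create an empty in-memory database with the employee table."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    return db


# Reference result: all rows in one monolithic database.
single = make_db()
single.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", employees)
expected = set(single.execute(QUERY))

# Sharded result: rows routed by department_id, statement run unchanged.
shards = [make_db() for _ in range(NUM_SHARDS)]
for emp in employees:
    shards[emp[3] % NUM_SHARDS].execute(
        "INSERT INTO employee VALUES (?, ?, ?, ?, ?)", emp
    )
actual = {row for db in shards for row in db.execute(QUERY)}

missing = expected - actual
print("rows lost by running the join per shard:", missing)
```

Sharding by ‘department_id’ keeps the employee-to-department join local, but the employee whose manager sits in another department drops out of the self-join result without any error being raised.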
    8. ‘Shard Conflict’ Effects on Query Processing
       • It is clear from the examples that when dealing with a foreign key and two tables, a common key can be utilized to resolve certain (but not all) conflicts
       • Distributed data can become quite complex if not handled correctly
       • It’s the kind of problem that is not always obvious, and can yield incorrect results, unnoticed
    9. ScaleBase Can Help
       ScaleBase is a modern, distributed MySQL database management system. It is optimized for the cloud and deploys in minutes to enable you to scale out to an unlimited number of users, data and transactions. It is a horizontally scalable database cluster built on MySQL that dynamically optimizes workloads and availability by logically distributing data across public, private and geo-distributed clouds.
       Use your relational DBA skills and get NoSQL capabilities.
       Contact us: sales@scalebase.com
       Or download free ScaleBase software: http://www.scalebase.com/software/
    10. Start Using ScaleBase Today
        Check out ScaleBase’s software
        • ScaleBase on Amazon
        • ScaleBase on Rackspace