DATA PARTITIONING
PREPARED BY VINOD – ARCHITECT – CRESTRON ELECTRONICS
WHY PARTITION DATA?
• The design of the data stores that an application uses can have a significant bearing on the performance, throughput, and scalability of a system

[Diagram: Traditional model, where a single application stores and retrieves data from a single data store, contrasted with large-scale systems, where Application 1 … Application N store and retrieve data across physically partitioned data stores (Data Store 1 … Data Store N)]

This is not the same as SQL Server Table Partitioning
BENEFITS OF PARTITIONING DATA
• Improve scalability: scale out almost indefinitely
• Improve performance: operations act on a smaller volume of data
• Improve availability: replicas avoid a single point of failure
• Improve security: separate sensitive and non-sensitive data into different partitions
• Provide operational flexibility: management, monitoring, backup and restore
• Match the data store to the pattern of use: each partition can be deployed on a different type of data store
Designing partitions
PARTITIONING STRATEGIES
Horizontal partitioning (often called sharding)
• All partitions have the same schema
• Each partition is known as a shard and holds a specific subset of the data
Vertical partitioning
• Each partition holds a subset of the fields for items
Functional partitioning
• Ex: Invoicing in one partition and product inventory in another
NOTE: all three strategies described here can be combined
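The three strategies can be sketched in a few lines. This is an illustrative, in-memory model only; the record fields and shard count are hypothetical examples, not part of any specific database product.

```python
# Hypothetical order records used to illustrate the three strategies.
orders = [
    {"id": 1, "tenant": "acme", "total": 120.0, "notes": "rush order"},
    {"id": 2, "tenant": "globex", "total": 75.5, "notes": ""},
    {"id": 3, "tenant": "acme", "total": 9.99, "notes": "gift wrap"},
]

# Horizontal partitioning (sharding): same schema everywhere,
# whole rows split across shards by a partition key.
NUM_SHARDS = 2
shards = {i: [] for i in range(NUM_SHARDS)}
for row in orders:
    shards[hash(row["tenant"]) % NUM_SHARDS].append(row)

# Vertical partitioning: frequently accessed fields held separately
# from rarely accessed ones, joined back by id when needed.
hot = [{"id": r["id"], "total": r["total"]} for r in orders]   # read often
cold = [{"id": r["id"], "notes": r["notes"]} for r in orders]  # read rarely

# Functional partitioning: different functional areas in different
# stores, e.g. invoicing vs. product inventory.
invoicing_store = {"orders": orders}
inventory_store = {"products": []}
```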
HORIZONTAL PARTITIONING (SHARDING)
[Diagram: rows routed to different shards based on a PartitionKey]
• It is difficult to change the key after the system is in operation
• Different shards do not have to contain similar volumes of data
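The first bullet can be made concrete with a small sketch. Assuming a simple hash-based routing scheme (one common choice, not the only one), changing the number of shards remaps most keys, which is why the key scheme is hard to change once the system is live.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Deterministically map a partition key to a shard index.

    hashlib gives a hash that is stable across processes,
    unlike Python's built-in hash() for strings.
    """
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

tenants = ("tenant-a", "tenant-b", "tenant-c")
before = {k: shard_for(k, 4) for k in tenants}  # 4 shards today
after = {k: shard_for(k, 5) for k in tenants}   # grow to 5 shards
moved = [k for k in tenants if before[k] != after[k]]  # data to migrate
```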
VERTICAL PARTITIONING
• Reduces the I/O and performance costs associated with fetching the items that are accessed most frequently
• Reduces the amount of concurrent access required to the data
FUNCTIONAL PARTITIONING
ISSUES AND CONSIDERATIONS
• Minimize cross-partition data access operations
• Consider replicating static data in all of the partitions to reduce the need for a separate lookup operation in a different partition
• There is an additional cost associated with synchronizing any changes that might occur to reference data (static data)
• Minimize requirements for referential integrity across vertical and functional partitions
• Evaluate whether strong consistency is actually a requirement
• A common approach in the cloud is to implement eventual consistency
• When using a horizontal partitioning strategy, consider periodically rebalancing the shards
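The rebalancing idea in the last bullet can be sketched as follows. This is a deliberately naive greedy sketch with hypothetical data; real rebalancing (for example, the Elastic Database Split/Merge service mentioned later) must also move the data online and update the shard map.

```python
def rebalance(shards: dict) -> dict:
    """Greedily move shardlets from the fullest to the emptiest shard
    until shardlet counts differ by at most one."""
    while True:
        big = max(shards, key=lambda s: len(shards[s]))
        small = min(shards, key=lambda s: len(shards[s]))
        if len(shards[big]) - len(shards[small]) <= 1:
            break
        shards[small].append(shards[big].pop())
    return shards

# One shard has grown much larger than the other.
shards = rebalance({"s1": ["a", "b", "c", "d"], "s2": ["e"]})
```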
Data Partitioning – Elastic Database
HORIZONTAL PARTITIONING WITH ELASTIC DATABASE
[Diagram: a single SQL database has limitations on both the volume of data and the number of concurrent connections it can support]
[Diagram: a single large SQL database split into Shard 1 … Shard N, each backed by its own data store]
SHARD
• Each shard is implemented as a SQL database
• A shard can hold more than one dataset
• A dataset is also referred to as a shardlet
• Each database maintains metadata that describes the shardlets that it contains
• A shardlet can be a single data item, or it can be a group of items that share the same shardlet key
• When sharding data in a multi-tenant application, the shardlet key could be the tenant ID, and all data for a given tenant would be held as part of the same shardlet
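The shard/shardlet relationship in the multi-tenant case can be modeled like this. A minimal in-memory sketch, assuming the tenant ID is the shardlet key; the row contents are hypothetical.

```python
from collections import defaultdict

# One shard (a SQL database in Elastic Database), modeled as a mapping
# from shardlet key (tenant ID) to the rows that belong to that shardlet.
shard = defaultdict(list)

rows = [
    {"tenant_id": "t1", "order": 101},
    {"tenant_id": "t2", "order": 102},
    {"tenant_id": "t1", "order": 103},
]
for row in rows:
    shard[row["tenant_id"]].append(row)

# This single shard now holds two shardlets: all of t1's data is one
# shardlet, all of t2's data is another.
```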
GLOBAL SHARD-MAP MANAGER
• It is a separate SQL database
• Contains a list of databases (shards) and the shardlets in each database
[Diagram: the client application consults the global shard-map manager, then connects directly to Shard 1 … Shard N]
1. Get a copy of the shard map (listing shards and shardlets)
2. Cache the shard-map data locally
3. Connect to the appropriate shard
NOTE: Replicate the global shard-map manager database to reduce latency and improve availability
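The three client steps above can be sketched as follows. This is a hypothetical in-memory stand-in, not the actual Elastic Database client API; the shard addresses are invented for illustration.

```python
# Stand-in for the global shard-map manager database:
# shardlet key -> shard (database) that holds it.
SHARD_MAP_DB = {
    "t1": "shard-1.example.net",
    "t2": "shard-2.example.net",
}

_local_cache = None  # step 2: the client's local copy of the shard map

def get_shard_map() -> dict:
    """Steps 1 and 2: fetch the shard map once, then serve it from cache."""
    global _local_cache
    if _local_cache is None:
        _local_cache = dict(SHARD_MAP_DB)  # simulates the remote fetch
    return _local_cache

def connect(shardlet_key: str) -> str:
    """Step 3: route the request to the shard that owns this shardlet."""
    return get_shard_map()[shardlet_key]
```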
SCHEMES FOR MAPPING DATA TO SHARDLETS
List Shard Map
• Association between a single key and a shardlet
• For example, in a multi-tenant system, the data for each tenant could be associated with a unique key and stored in its own shardlet
Range Shard Map
• Association between a set of contiguous key values and a shardlet
• In the multi-tenant example, you could group the data for a set of tenants (each with their own key) within the same shardlet
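The two schemes differ only in how a key is resolved to a shard. A minimal sketch with hypothetical keys and shard names:

```python
import bisect

# List shard map: each individual key maps directly to a shard.
list_map = {"tenant-1": "shard-A", "tenant-2": "shard-B"}

# Range shard map: contiguous key ranges map to shards. Each entry
# (range_lows[i], range_shards[i]) means keys in
# [range_lows[i], range_lows[i+1]) are routed to range_shards[i].
range_lows = [0, 100, 200]
range_shards = ["shard-A", "shard-B", "shard-C"]

def range_lookup(key: int) -> str:
    """Find the range containing key via binary search."""
    idx = bisect.bisect_right(range_lows, key) - 1
    return range_shards[idx]
```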
LIST SHARD MAP
RANGE SHARD MAP
HYBRID SHARDING
THINGS TO CONSIDER WHILE PARTITIONING
• Avoid operations that need to access data held in multiple shards
• Azure SQL Database does not support cross-database joins
• The data stored in shardlets that belong to the same shard map should have the same schema
• Transactional operations are only supported for data held within the same shard, and not across shards
• Place shards near to the users that access the data in those shards (geo-locate shards). This strategy will help to reduce latency.
• Currently, only a limited set of SQL data types are supported as shardlet keys: int, bigint, varbinary, and uniqueidentifier
• Elastic Database provides a separate Split/Merge service
NOTE: Although Azure SQL Database does not support cross-database joins, the Elastic Database API enables you to perform cross-shard queries that can transparently iterate through the data held in all the shardlets referenced by a shard map
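The cross-shard query in the NOTE is a fan-out: run the same query against every shard in the shard map and merge the results client-side. A simplified sketch in the spirit of that API, with shards modeled as in-memory lists of hypothetical rows:

```python
# Stand-in for the shard map: each shard's contents as an in-memory list.
shard_map = {
    "shard-1": [{"tenant": "t1", "total": 10}, {"tenant": "t2", "total": 5}],
    "shard-2": [{"tenant": "t3", "total": 7}],
}

def fan_out_query(predicate) -> list:
    """Apply the same filter to every shard and concatenate the results,
    hiding the per-shard iteration from the caller."""
    results = []
    for rows in shard_map.values():
        results.extend(r for r in rows if predicate(r))
    return results

big_orders = fan_out_query(lambda r: r["total"] >= 7)
```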
Partitioning strategies for Azure Storage
AZURE STORAGE
Table Storage
• Entities comprise a set of properties and values
• Structured data
Blob Storage
• Storage for large objects and files
• Unstructured data
Storage Queues
• Support reliable asynchronous messaging between applications
AZURE STORAGE REDUNDANCY
Locally redundant
• Maintains three copies of data within a single datacenter
• This form of redundancy protects against hardware failure but not against a disaster that encompasses the entire datacenter
Zone-redundant
• Maintains three copies of data spread across different datacenters within the same region (or across two geographically close regions)
• Can protect against disasters that occur within a single datacenter
Geo-redundant
• Maintains six copies of data: three copies in one region (your local region) and another three copies in a remote region
• This form of redundancy provides the highest level of disaster protection
PARTITIONING AZURE TABLE STORAGE
• All entities are stored in a partition
• Partitions are managed internally by Azure table storage
PartitionKey
• This is a string values that determines
in which partition Azure table storage
will place the entity
RowKey
• This is another string value that
identifies the entity within the
partition
All entities within a partition are
sorted lexically, in ascending
order, by row key
The partition key/row key
combination must be unique for
each entity and cannot exceed
1KB in length
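The addressing rules above can be modeled in a few lines. A minimal in-memory sketch of a table, with hypothetical partition and row keys; the real service enforces these rules server-side.

```python
# In-memory model of a table: (PartitionKey, RowKey) -> entity properties.
table = {}

def insert(pk: str, rk: str, props: dict) -> None:
    """Insert an entity; the PartitionKey/RowKey pair must be unique."""
    if (pk, rk) in table:
        raise KeyError("PartitionKey/RowKey combination must be unique")
    table[(pk, rk)] = props

insert("sales", "0002", {"amount": 20})
insert("sales", "0001", {"amount": 10})

# Entities within one partition, in ascending lexical RowKey order.
partition = sorted(rk for (pk, rk) in table if pk == "sales")
```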
TABLE STORAGE
Thank You
