DATA PARTITIONING
PREPARED BY VINOD – ARCHITECT – CRESTRON ELECTRONICS
WHY PARTITION DATA?
• The design of the data stores that an application uses can have a significant bearing on the performance, throughput, and scalability of a system

[Diagram: Traditional model, where a single application stores and retrieves data from a single data store, contrasted with large-scale systems, where Application 1 … Application N store and retrieve data across physically partitioned data stores (Data Store 1 … Data Store N)]

This is not the same as SQL Server Table Partitioning
BENEFITS OF PARTITIONING DATA
• Improve scalability: scale out almost indefinitely
• Improve performance: operations act on a smaller volume of data
• Improve availability: replicas avoid a single point of failure
• Improve security: separate sensitive and non-sensitive data into different partitions
• Provide operational flexibility: management, monitoring, backup and restore
• Match the data store to the pattern of use: each partition can be deployed on a different type of data store
Designing partitions
PARTITIONING STRATEGIES
Horizontal partitioning (often called sharding)
• All partitions have the same schema
• Each partition is known as a shard and holds a specific subset of the data
Vertical partitioning
• Each partition holds a subset of the fields for items
Functional partitioning
• Ex: Invoicing in one partition and product inventory in another
NOTE: all three strategies described here can be combined
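The three strategies can be sketched in a few lines. This is an illustrative, in-memory model only; the record fields and shard count are hypothetical examples, not part of any specific database product.

```python
# Hypothetical order records used to illustrate the three strategies.
orders = [
    {"id": 1, "tenant": "acme", "total": 120.0, "notes": "rush order"},
    {"id": 2, "tenant": "globex", "total": 75.5, "notes": ""},
    {"id": 3, "tenant": "acme", "total": 9.99, "notes": "gift wrap"},
]

# Horizontal partitioning (sharding): same schema everywhere,
# whole rows split across shards by a partition key.
NUM_SHARDS = 2
shards = {i: [] for i in range(NUM_SHARDS)}
for row in orders:
    shards[hash(row["tenant"]) % NUM_SHARDS].append(row)

# Vertical partitioning: frequently accessed fields held separately
# from rarely accessed ones, joined back by id when needed.
hot = [{"id": r["id"], "total": r["total"]} for r in orders]   # read often
cold = [{"id": r["id"], "notes": r["notes"]} for r in orders]  # read rarely

# Functional partitioning: different functional areas in different
# stores, e.g. invoicing vs. product inventory.
invoicing_store = {"orders": orders}
inventory_store = {"products": []}
```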
HORIZONTAL PARTITIONING (SHARDING)
[Diagram: rows routed to different shards based on a PartitionKey]
• It is difficult to change the key after the system is in operation
• Different shards do not have to contain similar volumes of data
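The first bullet can be made concrete with a small sketch. Assuming a simple hash-based routing scheme (one common choice, not the only one), changing the number of shards remaps most keys, which is why the key scheme is hard to change once the system is live.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Deterministically map a partition key to a shard index.

    hashlib gives a hash that is stable across processes,
    unlike Python's built-in hash() for strings.
    """
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

tenants = ("tenant-a", "tenant-b", "tenant-c")
before = {k: shard_for(k, 4) for k in tenants}  # 4 shards today
after = {k: shard_for(k, 5) for k in tenants}   # grow to 5 shards
moved = [k for k in tenants if before[k] != after[k]]  # data to migrate
```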
VERTICAL PARTITIONING
• Reduces the I/O and performance costs associated with fetching the items that are accessed most frequently
• Reduces the amount of concurrent access required to the data
FUNCTIONAL PARTITIONING
ISSUES AND CONSIDERATIONS
• Minimize cross-partition data access operations
• Consider replicating static data in all of the partitions to reduce the need for a separate lookup operation in a different partition
• There is an additional cost associated with synchronizing any changes that might occur to reference data (static data)
• Minimize requirements for referential integrity across vertical and functional partitions
• Evaluate whether strong consistency is actually a requirement
• A common approach in the cloud is to implement eventual consistency
• When using a horizontal partitioning strategy, consider periodically rebalancing the shards
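The rebalancing idea in the last bullet can be sketched as follows. This is a deliberately naive greedy sketch with hypothetical data; real rebalancing (for example, the Elastic Database Split/Merge service mentioned later) must also move the data online and update the shard map.

```python
def rebalance(shards: dict) -> dict:
    """Greedily move shardlets from the fullest to the emptiest shard
    until shardlet counts differ by at most one."""
    while True:
        big = max(shards, key=lambda s: len(shards[s]))
        small = min(shards, key=lambda s: len(shards[s]))
        if len(shards[big]) - len(shards[small]) <= 1:
            break
        shards[small].append(shards[big].pop())
    return shards

# One shard has grown much larger than the other.
shards = rebalance({"s1": ["a", "b", "c", "d"], "s2": ["e"]})
```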
Data Partitioning – Elastic Database
HORIZONTAL PARTITIONING WITH ELASTIC DATABASE
[Diagram: a single SQL database has limitations on both the volume of data and the number of concurrent connections it can support]
[Diagram: a single large SQL database split into Shard 1 … Shard N, each backed by its own data store]
SHARD
• Each shard is implemented as a SQL database
• A shard can hold more than one dataset
• A dataset is also referred to as a shardlet
• Each database maintains metadata that describes the shardlets that it contains
• A shardlet can be a single data item, or it can be a group of items that share the same shardlet key
• When sharding data in a multi-tenant application, the shardlet key could be the tenant ID, and all data for a given tenant would be held as part of the same shardlet
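The shard/shardlet relationship in the multi-tenant case can be modeled like this. A minimal in-memory sketch, assuming the tenant ID is the shardlet key; the row contents are hypothetical.

```python
from collections import defaultdict

# One shard (a SQL database in Elastic Database), modeled as a mapping
# from shardlet key (tenant ID) to the rows that belong to that shardlet.
shard = defaultdict(list)

rows = [
    {"tenant_id": "t1", "order": 101},
    {"tenant_id": "t2", "order": 102},
    {"tenant_id": "t1", "order": 103},
]
for row in rows:
    shard[row["tenant_id"]].append(row)

# This single shard now holds two shardlets: all of t1's data is one
# shardlet, all of t2's data is another.
```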
GLOBAL SHARD-MAP MANAGER
• It is a separate SQL database
• Contains a list of databases (shards) and the shardlets in each database
[Diagram: the client application consults the global shard-map manager, then connects directly to Shard 1 … Shard N]
1. Get a copy of the shard map (listing shards and shardlets)
2. Cache the shard-map data locally
3. Connect to the appropriate shard
NOTE: Replicate the global shard-map manager database to reduce latency and improve availability
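The three client steps above can be sketched as follows. This is a hypothetical in-memory stand-in, not the actual Elastic Database client API; the shard addresses are invented for illustration.

```python
# Stand-in for the global shard-map manager database:
# shardlet key -> shard (database) that holds it.
SHARD_MAP_DB = {
    "t1": "shard-1.example.net",
    "t2": "shard-2.example.net",
}

_local_cache = None  # step 2: the client's local copy of the shard map

def get_shard_map() -> dict:
    """Steps 1 and 2: fetch the shard map once, then serve it from cache."""
    global _local_cache
    if _local_cache is None:
        _local_cache = dict(SHARD_MAP_DB)  # simulates the remote fetch
    return _local_cache

def connect(shardlet_key: str) -> str:
    """Step 3: route the request to the shard that owns this shardlet."""
    return get_shard_map()[shardlet_key]
```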
SCHEMES FOR MAPPING DATA TO SHARDLETS
List Shard Map
• Association between a single key and a shardlet
• For example, in a multi-tenant system, the data for each tenant could be associated with a unique key and stored in its own shardlet
Range Shard Map
• Association between a set of contiguous key values and a shardlet
• In the multi-tenant example, you could group the data for a set of tenants (each with their own key) within the same shardlet
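The two schemes differ only in how a key is resolved to a shard. A minimal sketch with hypothetical keys and shard names:

```python
import bisect

# List shard map: each individual key maps directly to a shard.
list_map = {"tenant-1": "shard-A", "tenant-2": "shard-B"}

# Range shard map: contiguous key ranges map to shards. Each entry
# (range_lows[i], range_shards[i]) means keys in
# [range_lows[i], range_lows[i+1]) are routed to range_shards[i].
range_lows = [0, 100, 200]
range_shards = ["shard-A", "shard-B", "shard-C"]

def range_lookup(key: int) -> str:
    """Find the range containing key via binary search."""
    idx = bisect.bisect_right(range_lows, key) - 1
    return range_shards[idx]
```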
LIST SHARD MAP
RANGE SHARD MAP
HYBRID SHARDING
THINGS TO CONSIDER WHILE PARTITIONING
• Avoid operations that need to access data held in multiple shards
• Azure SQL Database does not support cross-database joins
• The data stored in shardlets that belong to the same shard map should have the same schema
• Transactional operations are only supported for data held within the same shard, and not across shards
• Place shards near to the users that access the data in those shards (geo-locate shards). This strategy will help to reduce latency.
• Currently, only a limited set of SQL data types are supported as shardlet keys: int, bigint, varbinary, and uniqueidentifier
• Elastic Database provides a separate Split/Merge service
NOTE: Although Azure SQL Database does not support cross-database joins, the Elastic Database API enables you to perform cross-shard queries that can transparently iterate through the data held in all the shardlets referenced by a shard map
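The cross-shard query in the NOTE is a fan-out: run the same query against every shard in the shard map and merge the results client-side. A simplified sketch in the spirit of that API, with shards modeled as in-memory lists of hypothetical rows:

```python
# Stand-in for the shard map: each shard's contents as an in-memory list.
shard_map = {
    "shard-1": [{"tenant": "t1", "total": 10}, {"tenant": "t2", "total": 5}],
    "shard-2": [{"tenant": "t3", "total": 7}],
}

def fan_out_query(predicate) -> list:
    """Apply the same filter to every shard and concatenate the results,
    hiding the per-shard iteration from the caller."""
    results = []
    for rows in shard_map.values():
        results.extend(r for r in rows if predicate(r))
    return results

big_orders = fan_out_query(lambda r: r["total"] >= 7)
```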
Partitioning strategies for Azure Storage
AZURE STORAGE
Table Storage
• Entities comprise a set of properties and values
• Structured data
Blob Storage
• Storage for large objects and files
• Unstructured data
Storage Queues
• Support reliable asynchronous messaging between applications
AZURE STORAGE REDUNDANCY
Locally redundant
• Maintains three copies of data within a single datacenter
• This form of redundancy protects against hardware failure but not against a disaster that encompasses the entire datacenter
Zone-redundant
• Maintains three copies of data spread across different datacenters within the same region (or across two geographically close regions)
• Can protect against disasters that occur within a single datacenter
Geo-redundant
• Maintains six copies of data: three copies in one region (your local region) and another three copies in a remote region
• This form of redundancy provides the highest level of disaster protection
PARTITIONING AZURE TABLE STORAGE
• All entities are stored in a partition
• Partitions are managed internally by Azure table storage
PartitionKey
• This is a string values that determines
in which partition Azure table storage
will place the entity
RowKey
• This is another string value that
identifies the entity within the
partition
All entities within a partition are
sorted lexically, in ascending
order, by row key
The partition key/row key
combination must be unique for
each entity and cannot exceed
1KB in length
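The addressing rules above can be modeled in a few lines. A minimal in-memory sketch of a table, with hypothetical partition and row keys; the real service enforces these rules server-side.

```python
# In-memory model of a table: (PartitionKey, RowKey) -> entity properties.
table = {}

def insert(pk: str, rk: str, props: dict) -> None:
    """Insert an entity; the PartitionKey/RowKey pair must be unique."""
    if (pk, rk) in table:
        raise KeyError("PartitionKey/RowKey combination must be unique")
    table[(pk, rk)] = props

insert("sales", "0002", {"amount": 20})
insert("sales", "0001", {"amount": 10})

# Entities within one partition, in ascending lexical RowKey order.
partition = sorted(rk for (pk, rk) in table if pk == "sales")
```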
TABLE STORAGE
Thank You
