2. 2
Agenda
• MySQL visual analysis
• Design considerations
• Web scale challenges
• Characteristics of a
distributed database
• ScaleBase Analysis Genie
• Demo
• Q & A
– Please enter your questions on the GTW side panel
3. 3
Vladi Vexler
Vice President,
Technology and Product Marketing
• Over 15 years of experience in software development
and product management
• Experienced in cloud, web and enterprise
• Author of patents in the fields of database innovation,
dynamic data caching and machine learning analytics
4. 4
Who Are We?
Distributed Database Management System
Architected for the Cloud
Simple. Reliable. Powerful.
6. 6
What Is Your Goal?
• Scale (mostly) reads
• Scale (mostly) writes
• Performance of reads
– Affected by joins and scans of big tables
• Performance of writes
– Affected by IO reads/writes, CPU and table indexes
(a growing overhead)
• Locks
• CPU/IO/RAM issues
• Load peaks
• Data growth
• Geo-distribution, special data distribution needs
7. 7
Database And Tables Metrics to Review
• Size
– Physical size on disk, Logical size (number of rows)
• Multiple/large indices
– Physical impact (write time) and logical impact (RAM)
• Reads vs. Writes
– Number of queries per table
– % of total MySQL traffic
– % of table’s traffic
• Logical data relations – identify and analyze
– Joins – complexity of data distribution and data access
– Logical Data Chunks – related data in multiple tables
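The read/write metrics above can be sketched in a few lines. This is a minimal illustration, assuming the captured traffic has already been reduced to hypothetical (table, kind) pairs, where kind is "read" or "write"; the function name and the sample data are made up for the example.

```python
from collections import Counter

def traffic_metrics(statements):
    """Summarize captured statements per table.

    statements: iterable of (table, kind) pairs, kind is "read" or "write".
    Returns per-table query count, share of total traffic, and write share.
    """
    per_table = Counter()
    writes = Counter()
    for table, kind in statements:
        per_table[table] += 1
        if kind == "write":
            writes[table] += 1
    total = sum(per_table.values())
    report = {}
    for table, count in per_table.items():
        report[table] = {
            "queries": count,
            "pct_of_total_traffic": 100.0 * count / total,
            "pct_writes_in_table": 100.0 * writes[table] / count,
        }
    return report

# Hypothetical sample: 10 statements over two tables.
sample = (
    [("users", "read")] * 6
    + [("users", "write")] * 2
    + [("photos", "read")]
    + [("photos", "write")]
)
report = traffic_metrics(sample)
```

Tables with a high share of total traffic and a high write share are the natural first candidates to review for distribution.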
9. 9
Scale Out Platform Considerations
DIY <> NewSQL <> NoSQL <> ScaleBase
• Short-term cost vs long-term cost
– Do-it-yourself - open source is not truly free
– Time to market
– Pareto principle – 20% of complications will take 80% of time
– High overhead cost in maintenance and future developments
• Reliability (ACID) vs. simplicity (BASE)
• Maturity and availability/reliability
• Features and limitations
• How to define a good data distribution policy?
– How to evaluate efficiency of a policy for data distribution and access?
– How to simulate different distribution policies and compare?
12. 12
Distributed Table Types
• MASTER: Data on one shard only
– Example: general settings
• GLOBAL: Data copied to all shards
– Example: lookups
• DISTRIBUTED (root):
Data on a single shard, based on a key
– Example: Users table.
• CASCADED (distributed child table): Data on a single shard;
however, distribution and access depend on the parent table
– Example: User_Photos, User_Photos_Likes – depend on Users
Note: Not all sharding platforms support Cascaded and Master table types
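As a rough sketch of how the four table types could translate into row placement: the policy dictionary, the shard count, the choice of shard 0 for MASTER tables, and the CRC32 hash below are all assumptions for illustration, not ScaleBase's actual mechanism. The sketch also assumes a cascaded child row carries its parent's key directly.

```python
import zlib

NUM_SHARDS = 4  # assumed cluster size

# Hypothetical policy: table -> (table type, distribution key column or None)
POLICY = {
    "settings": ("MASTER", None),
    "countries": ("GLOBAL", None),
    "users": ("DISTRIBUTED", "user_id"),
    "user_photos": ("CASCADED", "user_id"),  # follows users through user_id
}

def shard_for_key(key):
    # crc32 is stable across runs, unlike the built-in hash() for strings
    return zlib.crc32(str(key).encode()) % NUM_SHARDS

def shards_for_row(table, row):
    """Return the list of shard numbers that should hold this row."""
    kind, key_col = POLICY[table]
    if kind == "MASTER":
        return [0]  # assumption: master tables live on the first shard
    if kind == "GLOBAL":
        return list(range(NUM_SHARDS))  # copied to every shard
    # DISTRIBUTED and CASCADED both resolve to the shard of the root key,
    # so a user's photos land next to the user row
    return [shard_for_key(row[key_col])]

user_shards = shards_for_row("users", {"user_id": 42})
photo_shards = shards_for_row("user_photos", {"user_id": 42, "photo_id": 7})
```

Because the child row is placed by the parent's key, a join between users and user_photos for one user stays on a single shard.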
13. 13
Distributed Query Types
• ONE_DB – Single-shard execution: Global or Master tables, Distributed
& Cascaded tables, joins of Distributed and Global tables
• ALL_DB – All-shards execution, one DB-node in a shard cluster:
– SELECT and Aggregate data from many shards – Parallel execution
(“map reduce” style) on all shards, Aggregate, Order, Group-By, Limit
– DDL statements
– DML on Global tables
• FULL_DB – Session statements (USE, SET) to be sent to all database
nodes in all shard clusters
• CROSS_DB – Sharding conflict resolution, such as cross-shard joins.
Note: Not all sharding platforms support ALL_DB, FULL_DB and CROSS_DB queries.
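The ALL_DB pattern (parallel execution on every shard, then aggregate, order and limit) can be sketched as below. The shards are plain in-memory lists and the function names are hypothetical; a real platform would send SQL to each shard instead.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards, each holding some (user_id, country) rows.
SHARDS = [
    [(1, "US"), (2, "DE"), (3, "US")],
    [(4, "US"), (5, "FR")],
    [(6, "DE"), (7, "DE"), (8, "US")],
]

def count_by_country(shard_rows):
    # Per-shard step: the equivalent of
    # SELECT country, COUNT(*) FROM users GROUP BY country
    return Counter(country for _, country in shard_rows)

def all_db_top_countries(limit):
    # Run the per-shard step on all shards in parallel
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(count_by_country, SHARDS))
    # Merge step: sum the partial counts
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    # ORDER BY count DESC, country ASC, then LIMIT, applied after merging
    ordered = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:limit]

top_two = all_db_top_countries(2)
```

Ordering and limiting happen only after the merge, since a per-shard LIMIT could drop rows that matter in the global result.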
14. 14
Importance of Logical Data Chunks
• Example: A Logical Data Chunk in a Facebook app:
– All rows in tables containing information related to George, from:
Users, Photos, Comments, Likes, Posts, Friends etc…
• Goals:
1. Optimal Data Distribution: Store maximum logical data chunks in
same shards
2. Maximize ONE_DB and ALL_DB queries
3. Handle all complex cases: related data is in multiple shards
– ALL_DB, CROSS_DB, FULL_DB queries
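One way to compare candidate policies against these goals is to measure how many queries would stay on a single shard. The sketch below assumes a hypothetical query log in which each query lists the rows it touches, and it uses simple modulo placement on integer keys; none of this reflects a specific product's internals.

```python
NUM_SHARDS = 4  # assumed cluster size

def shard_of(key):
    # Simple modulo placement on integer keys, for illustration only
    return key % NUM_SHARDS

def single_shard_ratio(queries, policy):
    """Fraction of queries whose touched rows all live on one shard.

    queries: list of queries; each query is a list of (table, row) pairs.
    policy: table -> name of the column used as the distribution key.
    """
    single = 0
    for touched in queries:
        shards = {shard_of(row[policy[table]]) for table, row in touched}
        if len(shards) == 1:
            single += 1
    return single / len(queries)

# Hypothetical log: George's profile with his photos, and one photo lookup.
george_photos = [
    ("users", {"user_id": 1}),
    ("photos", {"photo_id": 10, "user_id": 1}),
    ("photos", {"photo_id": 11, "user_id": 1}),
]
one_photo = [("photos", {"photo_id": 10, "user_id": 1})]
queries = [george_photos, one_photo]

by_user = {"users": "user_id", "photos": "user_id"}    # keeps the chunk together
by_photo = {"users": "user_id", "photos": "photo_id"}  # splits the chunk

ratio_by_user = single_shard_ratio(queries, by_user)
ratio_by_photo = single_shard_ratio(queries, by_photo)
```

In this toy log, distributing photos by user_id keeps George's logical data chunk on one shard, while distributing by photo_id forces the profile query across shards.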
15. 15
Data Relationships can be Extremely Complex
Usually, scale out is applied to growing, mature apps.
How do you define an optimal data distribution policy?
17. 17
ScaleBase Analysis Genie
• A tool enabling MySQL visual analysis and building an optimal data
distribution policy; Designed for DBAs, Architects & Dev. Managers
• Two-step process:
– Analysis Assistant
– An agent captures app/DB information, including SQL traffic and
database metrics
– Obfuscates, summarizes and packages the App-DB data
– Analysis Genie
– A SaaS application that receives the Analysis Assistant (AA) package,
presents the visual analysis and details the policy configuration
(Diagram labels: Analysis Assistant, Analysis Genie)
18. 18
ScaleBase Analysis Genie
• Advanced analytics
– Your schemas, data &
queries
• Identification of best
data distribution policy
– Customized for even the
most complex apps
• Complete policy control
• Quality assurance
– Review before production
• Policy simulation
– “What-if” analysis
https://www.scalebase.com/software/
20. 20
Relationship Identification
Mapping includes:
• Schema structures
• Table & column name
matching
• Query parsing and
identification of joined
tables and columns
• Statistics on every object
size and access
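The query-parsing step can be illustrated with a deliberately simplified sketch. It assumes join conditions are written with fully qualified table names (no aliases) and uses a regular expression rather than a real SQL parser, so it is only a rough approximation of what an analysis tool would do.

```python
import re
from collections import Counter

# Matches "a.col = b.col" equality conditions; a simplification, not a SQL parser.
JOIN_COND = re.compile(r"(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)")

def join_pairs(queries):
    """Count how often each pair of table columns is joined across queries."""
    pairs = Counter()
    for sql in queries:
        for lt, lc, rt, rc in JOIN_COND.findall(sql):
            if lt != rt:
                # Sort so users.id = photos.user_id and its reverse count as one pair
                pair = tuple(sorted([f"{lt}.{lc}", f"{rt}.{rc}"]))
                pairs[pair] += 1
    return pairs

# Hypothetical captured queries.
captured = [
    "SELECT * FROM users JOIN photos ON users.id = photos.user_id",
    "SELECT * FROM photos JOIN users ON photos.user_id = users.id WHERE users.id = 7",
    "SELECT * FROM photos JOIN likes ON likes.photo_id = photos.id",
]
pairs = join_pairs(captured)
```

Column pairs that are joined most often hint at parent/child relationships, which in turn suggest which tables belong to the same logical data chunk.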
27. 27
Customer: Million+ User Online Gaming Company
Who:
• Mobile gaming company expanding globally
• Hosted on SoftLayer cloud in Hong Kong
Problem:
• Over a million downloads - peak period overload
• Needed scaling in place for expansion
Alternatives considered:
• Manual sharding/open source tools
• Other commercial solutions were too costly
Solution:
• Used visual analysis to determine optimized policy
• Up and running within a few weeks of initial download and now supports hundreds of
thousands of daily users
• Fully operational using data distribution and anticipating additional scale out within
next quarter
28. 28
Scale out to unlimited users
Continuous availability
Dynamic workload optimization
Fast and simple deployment
Easily scale out a single
MySQL instance
Optimized for the Cloud
Reduces time-to-market
No changes needed to app or database
Database usage analytics
Intelligent load balancing
Centralized data management
ScaleBase
Distributed Database Management System
29. 29
Get Instant Application/Database Insight!
Use visual analysis to plan your scale out strategy
Download the Analysis Genie here:
https://www.scalebase.com/software
The next questions to discuss and consider are about which type of platform fits my needs.
BASE = Basically Available, Soft state, Eventual consistency
Basically Available: The system guarantees availability of the data in the CAP theorem sense: there will be a response to any request. However, that response could still be a ‘failure’ to obtain the requested data, or the data may be in an inconsistent or changing state, much like waiting for a check to clear in your bank account.
Soft state: The state of the system could change over time, so even during times without input there may be changes going on due to ‘eventual consistency,’ thus the state of the system is always ‘soft.’
Eventual consistency: The system will eventually become consistent once it stops receiving input. The data will propagate to everywhere it should sooner or later, but the system will continue to receive input and does not check the consistency of every transaction before it moves on to the next one.
Here is a summary of different approaches. A more detailed description can be found on our website, under Resources -> Competitive Comparison
Explain the circles,
For example, we are the only one that provides Advanced Analytics, which is the foundation for defining an optimal distribution policy.
The ScaleBase solution is the simplest to deploy, enabling the shortest time to market and the lowest maintenance
One of the first steps is to visually analyze a complete summary of the state of your MySQL tables:
- Physical and Logical Sizes, Writes, Reads, Joins
Determine optimal distribution policy for your specific application and database
Analyze your existing schema and queries
What is the current structure of your data
How is your data accessed by the applications
What is the size and rate of writes to individual tables
Risk
Cost savings (ROI)
Time to market
Building solution takes years
Open source is limited
Not comprehensive
Lack of technical support and services
Custom built
Inefficient and hard to maintain