High Performance and Scalability
Database Design
Nguyễn Sơn Tùng
Former Head of Technology @ Tiki.vn
Former Technical Manager @ Clip.vn
facebook.com/tungns
TopDev Event
22/08/2017
Ho Chi Minh City
Agenda
I. PART 1 - Overview and Approaches
II. PART 2 – Performance and Scaling
III. PART 3 – Case Study
Scopes
What this talk will include:
○ Scalability
○ Performance
○ Approaches, Best Practices
What it will NOT include:
○ Maintenance
○ Fault Tolerance
Extra Materials (https://goo.gl/nF95Nf)
○ Design documents
○ Samples
PART 1
# Overview and Approaches
Why does the Database matter?
(Diagram: data-intensive applications face complexity, quantity growth, and business-changing velocity)
Data Intensive - Complexity
A DB Diagram of Magento 2.1.3
Data Intensive – Quantity & Changes
Data size growth of one of our applications, year-over-year
Code changes
If we don’t prepare well
> Service Unavailable > Traffic Drop
If we don’t prepare well
Panic!!!
“What doesn’t kill you makes you stronger”
Yes! It’s all serious
… when it comes to $$$
How should we deal with this?
What affects DB performance?
Quite a lot!
Approaches
#1 Pick the right tools for your demand
o No one-size-fits-all
o Some pitfalls:
• Too complex: intensive time to build, maintain, or change
• Too sloppy: too many mistakes, not ready to scale
Approaches
#1 Pick the right tools for your demand
o Answer some questions:
• How fast will it grow? (gradually vs exponentially)
• What does the traffic pattern look like?
Invest early in high performance
and high scalability
Example: Flash sales
Approaches
#1 Pick the right tools for your demand
o Answer some questions:
• How fresh (real-time) must each piece of data be?
(Freshness scale: 0s … 5 min … 24h; the longer data may stay stale, the more cacheable it is)
Approaches
#1 Pick the right tools for your demand
o Answer some questions:
• Is there a locking model?
Example: a virtual item allows unlimited purchases, but a physical item at a 30% discount with only 1 unit in stock may get 1000 purchasing attempts. This needs a transactional DB with read locking.
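A minimal sketch of that locking model in MySQL/InnoDB (table and column names are hypothetical): SELECT ... FOR UPDATE serializes the 1000 concurrent attempts so that only one buyer can claim the last unit.

  -- Hypothetical schema: products(id, stock), InnoDB (transactional engine)
  START TRANSACTION;

  -- Lock the product row; concurrent attempts block here until COMMIT
  SELECT stock FROM products WHERE id = 42 FOR UPDATE;

  -- If stock > 0, claim the unit and record the order
  UPDATE products SET stock = stock - 1 WHERE id = 42 AND stock > 0;
  INSERT INTO orders (product_id, user_id) VALUES (42, 1001);

  COMMIT;  -- releases the lock; the next waiting attempt now sees stock = 0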
Approaches
#2. Correctness (trust) is more important than scalability (readiness)
… again, $$$
Approaches
#3. Testing yourself
○ Simulate large data
Approaches
#3. Testing yourself
○ Benchmark
o Server: sysbench
o Database: mysqlslap
o Application: ab, siege, loader.io,…
Approaches
#4. Improving
○ Stay monitored: we can’t improve anything without measurement
(Screenshot: NewRelic Database Monitor, with sample queries, EXPLAIN plans & suggestions)
Approaches
#4. Improving
○ Stay alerted
(Screenshot: daily reports of top slow queries)
Approaches
#4. Improving
○ Top-down approach
○ Rule of thumb: 80/20
o >80% of traffic hits the cache, needs no slow DB transaction I/O, and requires <20% of the resource cost
o Statistics, aggregates, and analytical tasks are <20% of traffic but consume >80% of the resource cost
(Diagram: user traffic vs DB processing vs reporting)
Conclusion: Scaling Approaches
1. There’s no one-size-fits-all
2. Understand your business
3. Attack top-down, 80/20
4. Measure -> Improve
PART 2
# Performance and Scaling
A quick look: Typical Data Architecture
(Diagram: typical data architecture with an analytical system; source: Designing Data-Intensive Applications, O’Reilly)
Scaling Principles
○ [Important] Speeding up with caching / indexed data
o High read performance
○ [Important] Separating the operational DB from the reporting DB
o Different complexity
○ Other:
o Separating read/write: different I/O
o Speeding up with pre-calculated data, e.g. statistics data
o Avoiding monoliths (everything in one DB): hard to scale
Scale with Data Caching
(Chart: a traffic spike)
Cache layers, from outermost to innermost:
Full-page cache (Varnish)
Partial (template) cache
Data cache
Query cache
Pre-calculated data
Real data
Most efficient for high-traffic applications and landing pages
Data Cache and Indexed Data
○ Cache:
o Key => {Value}
o Super high read performance
o Lacks filter/query abilities
o Engines: Redis, Memcached
○ Indexed Data:
o Indexed = not the source of truth
o High read performance
o Without losing filter/query abilities
o Engine: MongoDB
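The slides name MongoDB as the indexed-data engine; as a hedged illustration of the same idea in plain MySQL, here is a hypothetical denormalized product_index table, rebuilt from the normalized source-of-truth tables, that keeps filter/query abilities without touching the source:

  -- Hypothetical read model: flat and queryable, not the source of truth
  CREATE TABLE product_index (
    product_id INT PRIMARY KEY,
    name       VARCHAR(255),
    category   VARCHAR(100),
    price      DECIMAL(10,2),
    in_stock   TINYINT,
    INDEX (category, price)   -- supports filtering + sorting without JOINs
  );

  -- Rebuild step, run by a worker whenever source data changes
  REPLACE INTO product_index
  SELECT p.id, p.name, c.name, p.price, (p.stock > 0)
  FROM products p JOIN categories c ON c.id = p.category_id;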
Data Cache and Indexed Data (cont)
○ Refreshing caches and indexed data: Passive vs Active
(Diagram) Passive: when data changes, clear() the cached target page or data and build() it again on the next request, or simply let it expire after a TTL (e.g. 5 min). Active: when data changes, a worker proactively runs buildCache() to regenerate the target page or data.
Scale Reporting (1)
○ Simple approach
Scale Reporting (2)
○ Data-warehouse approach
Example: Analytical Service, BI Tool
(Screenshots: Analytics Service consumed by Excel and the Holistics BI tool; application usage)
Common pitfalls
○ Slow queries (legendary)
○ Indexing problems
○ Locks/deadlocks
○ Putting queries inside a loop
○ Retrieving too much data (bandwidth issues)
Slow queries
○ Detection: logging
○ Investigation:
o EXPLAIN, EXPLAIN ANALYZE (PostgreSQL)
o Profiling
○ Powerful tools:
o DB monitoring (e.g. NewRelic)
o Percona pt-query-digest
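A small illustration of the EXPLAIN workflow in MySQL (table, column, and row counts are hypothetical):

  -- The suspect query: filtering on an unindexed column
  EXPLAIN SELECT * FROM orders WHERE customer_email = 'a@b.com';
  -- typical output: type=ALL, rows ~ 2,000,000  (full table scan)

  ALTER TABLE orders ADD INDEX idx_customer_email (customer_email);

  EXPLAIN SELECT * FROM orders WHERE customer_email = 'a@b.com';
  -- now: type=ref, key=idx_customer_email, rows ~ 3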
Indexing
○ The key to performance
○ Too many vs. too few
o When should we add indexes? (Should we index everything?)
o Check data cardinality
○ Fields to index:
o Sorting, searching, grouping, joining
○ Types of index: B-Tree, Hash
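A sketch of the cardinality check in MySQL (hypothetical orders table): a column with few distinct values makes a poor index, while a selective column plus the sort column makes a good composite index:

  -- Cardinality: distinct values relative to total rows
  SELECT COUNT(DISTINCT status)  AS status_card,   -- e.g. 5: low, poor index
         COUNT(DISTINCT user_id) AS user_card,     -- e.g. 900k: high, good index
         COUNT(*)                AS total_rows
  FROM orders;

  -- Composite index for a common pattern: WHERE user_id = ? ORDER BY created_at
  ALTER TABLE orders ADD INDEX idx_user_created (user_id, created_at);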
Lock and Deadlock
○ Locks and deadlocks: when do they happen?
○ How to avoid them?
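The classic answer to "when does a deadlock happen": two transactions lock the same rows in opposite order. A hedged sketch on a hypothetical accounts table, with the usual fix:

  -- Session A:
  START TRANSACTION;
  UPDATE accounts SET balance = balance - 10 WHERE id = 1;  -- locks row 1
  -- Session B runs concurrently:
  --   START TRANSACTION;
  --   UPDATE accounts SET balance = balance - 10 WHERE id = 2;  -- locks row 2
  --   UPDATE accounts SET balance = balance + 10 WHERE id = 1;  -- waits for A
  UPDATE accounts SET balance = balance + 10 WHERE id = 2;  -- waits for B: deadlock
  -- InnoDB detects the cycle and rolls one transaction back.
  -- Avoidance: always lock rows in one agreed order (e.g. ascending id)
  -- and keep transactions short.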
Pre-calculation
On-demand calculation vs Pre-calculation
o Similar to Cache (timeout) vs Indexed Data (permanent)
o Example: Calculating Cohort data (Retention)
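A hedged sketch of pre-calculation in MySQL (hypothetical tables): a permanent summary table that a scheduled job keeps current, so reports read a handful of rows instead of aggregating millions:

  -- Permanent pre-calculated data (vs a cache entry that just times out)
  CREATE TABLE daily_sales_summary (
    day          DATE PRIMARY KEY,
    orders_count INT NOT NULL,
    revenue      DECIMAL(14,2) NOT NULL
  );

  -- Scheduled job (e.g. nightly) re-aggregates yesterday's orders
  INSERT INTO daily_sales_summary (day, orders_count, revenue)
  SELECT DATE(created_at), COUNT(*), SUM(total)
  FROM orders
  WHERE created_at >= CURDATE() - INTERVAL 1 DAY AND created_at < CURDATE()
  GROUP BY DATE(created_at)
  ON DUPLICATE KEY UPDATE orders_count = VALUES(orders_count),
                          revenue      = VALUES(revenue);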
DB Normalization
○ Normalization: to do or not to do
o Data duplication avoids JOINs
o Beware of updating duplicated data
○ Foreign keys
o Increase the chance of table locks
o Beware of cascading deletes and updates
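To make the cascading caveat concrete, a hypothetical foreign key with ON DELETE CASCADE; deleting one parent row silently deletes all of its children and holds locks on both tables:

  CREATE TABLE order_items (
    id       INT PRIMARY KEY AUTO_INCREMENT,
    order_id INT NOT NULL,
    sku      VARCHAR(64),
    FOREIGN KEY (order_id) REFERENCES orders(id)
      ON DELETE CASCADE      -- deleting an order also deletes its items
      ON UPDATE CASCADE
  );

  DELETE FROM orders WHERE id = 42;
  -- Also deletes every matching row in order_items, locking rows in both tables.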
Triggers and Events
○ Be aware:
o Hidden logic
o Hard to monitor
○ Triggers increase the chance of table locks
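A hypothetical trigger illustrating the "hidden logic" concern: every insert into orders invisibly updates a counter row, adding lock contention that is easy to overlook because it lives in the database, not in application code:

  DELIMITER //
  CREATE TRIGGER orders_after_insert
  AFTER INSERT ON orders
  FOR EACH ROW
  BEGIN
    -- Hidden side effect: every order insert also locks this counter row
    UPDATE daily_counters
    SET orders_count = orders_count + 1
    WHERE day = DATE(NEW.created_at);
  END//
  DELIMITER ;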
Scaling Infrastructure
Scale Vertically
○ DB Partitioning
o Some constraints:
• Every PRIMARY KEY (and UNIQUE key) must include all columns used in the table’s partitioning function
• All parts of a PRIMARY KEY must be NOT NULL
o Partitioned by: Date/Time, ID
○ Configuration tuning (my.cnf)
○ More RAM, SSD, …
Partitioning
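A hedged sketch of range partitioning by date in MySQL (hypothetical orders table); note how the primary key includes the partitioning column, per the constraint above:

  CREATE TABLE orders (
    id         BIGINT NOT NULL AUTO_INCREMENT,
    created_at DATE   NOT NULL,
    total      DECIMAL(12,2),
    PRIMARY KEY (id, created_at)        -- PK must include the partition column
  )
  PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2015 VALUES LESS THAN (2016),
    PARTITION p2016 VALUES LESS THAN (2017),
    PARTITION p2017 VALUES LESS THAN (2018),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
  );
  -- Queries filtered on created_at only touch the matching partitions
  -- (partition pruning), and old partitions can be dropped cheaply.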
Scale Horizontally
Replication
○ Master-Slave: MySQL Replication
(Diagram: the replication model, separating reads and writes at the application level)
Scale Horizontally
○ Replication setup
(Diagram: setup steps, including the downtime involved)
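For reference, the core of a classic MySQL 5.x master-slave setup (host, credentials, and binlog coordinates are placeholders); taking the initial data snapshot is where the downtime traditionally comes from:

  -- On the master: enable binary logging in my.cnf (log-bin, server-id=1),
  -- then create a replication user
  CREATE USER 'repl'@'%' IDENTIFIED BY 'secret';
  GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

  -- On the slave (server-id=2), point at the master's binlog coordinates
  -- recorded when the snapshot was taken
  CHANGE MASTER TO
    MASTER_HOST = 'master.db.internal',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS  = 4;
  START SLAVE;
  SHOW SLAVE STATUS\G   -- check Slave_IO_Running / Slave_SQL_Running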
Scale Horizontally
Replication
○ Multi-Master: Galera Cluster, Percona XtraDB Cluster
(Diagram: the working model; commit on all nodes or commit nothing)
source: http://galeracluster.com/
PART 3
# Case-study
Before we begin
○ Conventions
○ Tools
o Modelling (MySQL Workbench)
o Faking Data (fake2db)
o Testing and analyzing queries (EXPLAIN, ANALYZE, PROFILE)
Case Study: E-commerce System
Pages: Product Detail, Order, Listing/Landing
Main Flow
Data Architecture
DB Diagram
DB Diagram - EAV
Data Update Flow
Thank you
○ Q&A
○ Share your own best practices

Editor's Notes

  • #5 Telling stories: the DB topic is not new, but even giant companies have struggled with data-intensive (vs CPU-intensive) workloads
  • #6 My feeling at first; my story after, 100
  • #8 Collapsing
  • #9 Maybe some of you think I’m a manager who does management stuff and talks about theory, but the fact is: as Head of Technology I’m responsible for everything, every failure. I used to be the first guy to jump in and resolve issues with my co-workers
  • #12 Tell me your story
  • #13 Questions & Explanation
  • #16 Visualize: contention, error margins
  • #17 Estimation (generator); load testing; environment (engineering), configuration management
  • #18 Estimation (generator); load testing; environment (engineering), configuration management
  • #19 Measure, alert; top-down, 80/20; analyzing
  • #20 Measure, alert; top-down, 80/20; analyzing
  • #21 Measure, alert; top-down, 80/20; analyzing
  • #24 Drawing
  • #26 Story of Spikes, Zalo event
  • #27 Cache vs Indexed High read
  • #28 Cache vs Indexed High read
  • #37 Vs caching (Holistics)
  • #41 + Cloud
  • #43 + Cloud
  • #45 + Cloud