High Performance and Scalability
Database Design
Nguyễn Sơn Tùng
Former Head of Technology @ Tiki.vn
Former Technical Manager @ Clip.vn
facebook.com/tungns
TopDev Event
22/08/2017
Ho Chi Minh City
Agenda
I. PART 1 - Overview and Approaches
II. PART 2 – Performance and Scaling
III. PART 3 – Case Study
Scopes
What this talk will include:
○ Scalability
○ Performance
○ Approaches, Best Practices
What it will NOT include:
○ Maintenance
○ Fault Tolerance
Extra Materials (https://goo.gl/nF95Nf)
○ Design documents
○ Samples
PART 1
# Overview and Approaches
Why does the Database matter?
(Diagram: data-intensive applications face complexity, quantity growth, and business-changing velocity)
Data Intensive - Complexity
A DB Diagram of Magento 2.1.3
Data Intensive – Quantity & Changes
Data size growth of one of our applications, year-over-year
Code changes
If we don’t prepare well
> Service Unavailable > Traffic Drop
If we don’t prepare well
Panic!!!
“What doesn’t kill you makes you stronger”
Yes! It’s all serious
… when it comes to $$$
How should we deal with this?
What affects DB performance?
Quite a lot!
Approaches
#1 Pick the right tools for your demand
o No one-size-fits-all
o Some pitfalls:
• Too complex: intensive time to build, maintain, or change
• Too sloppy: too many mistakes, not ready to scale
Approaches
#1 Pick the right tools for your demand
o Answer some questions:
• How fast will it grow? (gradually vs exponentially)
• What does the traffic pattern look like?
Invest early in high performance
and high scalability
Example: Flash sales
Approaches
#1 Pick the right tools for your demand
o Answer some questions:
• How fresh (real-time) must each piece of data be?
(Freshness scale: 0s … 5 min … 24h; the longer data may stay stale, the more cacheable it is)
Approaches
#1 Pick the right tools for your demand
o Answer some questions:
• Is there a locking model?
Example: a virtual item allows unlimited purchases, but a physical item at a 30% discount with only 1 unit in stock may get 1000 purchasing attempts. This needs a transactional DB with read locking.
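A minimal sketch of that locking model in MySQL/InnoDB (table and column names are hypothetical): SELECT ... FOR UPDATE serializes the 1000 concurrent attempts so that only one buyer can claim the last unit.

  -- Hypothetical schema: products(id, stock), InnoDB (transactional engine)
  START TRANSACTION;

  -- Lock the product row; concurrent attempts block here until COMMIT
  SELECT stock FROM products WHERE id = 42 FOR UPDATE;

  -- If stock > 0, claim the unit and record the order
  UPDATE products SET stock = stock - 1 WHERE id = 42 AND stock > 0;
  INSERT INTO orders (product_id, user_id) VALUES (42, 1001);

  COMMIT;  -- releases the lock; the next waiting attempt now sees stock = 0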
Approaches
#2. Correctness (trust) is more important than scalability (readiness)
… again, $$$
Approaches
#3. Testing yourself
○ Simulate large data
Approaches
#3. Testing yourself
○ Benchmark
o Server: sysbench
o Database: mysqlslap
o Application: ab, siege, loader.io,…
Approaches
#4. Improving
○ Stay monitored: we can’t improve anything without measurement
(Screenshot: NewRelic Database Monitor, with sample queries, EXPLAIN plans & suggestions)
Approaches
#4. Improving
○ Stay alerted
(Screenshot: daily reports of top slow queries)
Approaches
#4. Improving
○ Top-down approach
○ Rule of thumb: 80/20
o >80% of traffic hits the cache, needs no slow DB transaction I/O, and requires <20% of the resource cost
o Statistics, aggregates, and analytical tasks are <20% of traffic but consume >80% of the resource cost
(Diagram: user traffic vs DB processing vs reporting)
Conclusion: Scaling Approaches
1. There’s no one-size-fits-all
2. Understand your business
3. Attack top-down, 80/20
4. Measure -> Improve
PART 2
# Performance and Scaling
A quick look: Typical Data Architecture
(Diagram: typical data architecture with an analytical system; source: Designing Data-Intensive Applications, O’Reilly)
Scaling Principles
○ [Important] Speeding up with caching / indexed data
o High read performance
○ [Important] Separating the operational DB from the reporting DB
o Different complexity
○ Other:
o Separating read/write: different I/O
o Speeding up with pre-calculated data, e.g. statistics data
o Avoiding monoliths (everything in one DB): hard to scale
Scale with Data Caching
(Chart: a traffic spike)
Cache layers, from outermost to innermost:
Full-page cache (Varnish)
Partial (template) cache
Data cache
Query cache
Pre-calculated data
Real data
Most efficient for high-traffic applications and landing pages
Data Cache and Indexed Data
○ Cache:
o Key => {Value}
o Super high read performance
o Lacks filter/query abilities
o Engines: Redis, Memcached
○ Indexed Data:
o Indexed = not the source of truth
o High read performance
o Without losing filter/query abilities
o Engine: MongoDB
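The slides name MongoDB as the indexed-data engine; as a hedged illustration of the same idea in plain MySQL, here is a hypothetical denormalized product_index table, rebuilt from the normalized source-of-truth tables, that keeps filter/query abilities without touching the source:

  -- Hypothetical read model: flat and queryable, not the source of truth
  CREATE TABLE product_index (
    product_id INT PRIMARY KEY,
    name       VARCHAR(255),
    category   VARCHAR(100),
    price      DECIMAL(10,2),
    in_stock   TINYINT,
    INDEX (category, price)   -- supports filtering + sorting without JOINs
  );

  -- Rebuild step, run by a worker whenever source data changes
  REPLACE INTO product_index
  SELECT p.id, p.name, c.name, p.price, (p.stock > 0)
  FROM products p JOIN categories c ON c.id = p.category_id;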
Data Cache and Indexed Data (cont)
○ Refreshing caches and indexed data: Passive vs Active
(Diagram) Passive: when data changes, clear() the cached target page or data and build() it again on the next request, or simply let it expire after a TTL (e.g. 5 min). Active: when data changes, a worker proactively runs buildCache() to regenerate the target page or data.
Scale Reporting (1)
○ Simple approach
Scale Reporting (2)
○ Data-warehouse approach
Example: Analytical Service, BI Tool
(Screenshots: Analytics Service consumed by Excel and the Holistics BI tool; application usage)
Common pitfalls
○ Slow queries (legendary)
○ Indexing problems
○ Locks/deadlocks
○ Putting queries inside a loop
○ Retrieving too much data (bandwidth issues)
Slow queries
○ Detection: logging
○ Investigation:
o EXPLAIN, EXPLAIN ANALYZE (PostgreSQL)
o Profiling
○ Powerful tools:
o DB monitoring (e.g. NewRelic)
o Percona pt-query-digest
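A small illustration of the EXPLAIN workflow in MySQL (table, column, and row counts are hypothetical):

  -- The suspect query: filtering on an unindexed column
  EXPLAIN SELECT * FROM orders WHERE customer_email = 'a@b.com';
  -- typical output: type=ALL, rows ~ 2,000,000  (full table scan)

  ALTER TABLE orders ADD INDEX idx_customer_email (customer_email);

  EXPLAIN SELECT * FROM orders WHERE customer_email = 'a@b.com';
  -- now: type=ref, key=idx_customer_email, rows ~ 3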
Indexing
○ The key to performance
○ Too many vs. too few
o When should we add indexes? (Should we index everything?)
o Check data cardinality
○ Fields to index:
o Sorting, searching, grouping, joining
○ Types of index: B-Tree, Hash
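A sketch of the cardinality check in MySQL (hypothetical orders table): a column with few distinct values makes a poor index, while a selective column plus the sort column makes a good composite index:

  -- Cardinality: distinct values relative to total rows
  SELECT COUNT(DISTINCT status)  AS status_card,   -- e.g. 5: low, poor index
         COUNT(DISTINCT user_id) AS user_card,     -- e.g. 900k: high, good index
         COUNT(*)                AS total_rows
  FROM orders;

  -- Composite index for a common pattern: WHERE user_id = ? ORDER BY created_at
  ALTER TABLE orders ADD INDEX idx_user_created (user_id, created_at);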
Lock and Deadlock
○ Locks and deadlocks: when do they happen?
○ How to avoid them?
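The classic answer to "when does a deadlock happen": two transactions lock the same rows in opposite order. A hedged sketch on a hypothetical accounts table, with the usual fix:

  -- Session A:
  START TRANSACTION;
  UPDATE accounts SET balance = balance - 10 WHERE id = 1;  -- locks row 1
  -- Session B runs concurrently:
  --   START TRANSACTION;
  --   UPDATE accounts SET balance = balance - 10 WHERE id = 2;  -- locks row 2
  --   UPDATE accounts SET balance = balance + 10 WHERE id = 1;  -- waits for A
  UPDATE accounts SET balance = balance + 10 WHERE id = 2;  -- waits for B: deadlock
  -- InnoDB detects the cycle and rolls one transaction back.
  -- Avoidance: always lock rows in one agreed order (e.g. ascending id)
  -- and keep transactions short.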
Pre-calculation
On-demand calculation vs Pre-calculation
o Similar to Cache (timeout) vs Indexed Data (permanent)
o Example: Calculating Cohort data (Retention)
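A hedged sketch of pre-calculation in MySQL (hypothetical tables): a permanent summary table that a scheduled job keeps current, so reports read a handful of rows instead of aggregating millions:

  -- Permanent pre-calculated data (vs a cache entry that just times out)
  CREATE TABLE daily_sales_summary (
    day          DATE PRIMARY KEY,
    orders_count INT NOT NULL,
    revenue      DECIMAL(14,2) NOT NULL
  );

  -- Scheduled job (e.g. nightly) re-aggregates yesterday's orders
  INSERT INTO daily_sales_summary (day, orders_count, revenue)
  SELECT DATE(created_at), COUNT(*), SUM(total)
  FROM orders
  WHERE created_at >= CURDATE() - INTERVAL 1 DAY AND created_at < CURDATE()
  GROUP BY DATE(created_at)
  ON DUPLICATE KEY UPDATE orders_count = VALUES(orders_count),
                          revenue      = VALUES(revenue);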
DB Normalization
○ Normalization: to do or not to do
o Data duplication avoids JOINs
o Beware of updating duplicated data
○ Foreign keys
o Increase the chance of table locks
o Beware of cascading deletes and updates
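To make the cascading caveat concrete, a hypothetical foreign key with ON DELETE CASCADE; deleting one parent row silently deletes all of its children and holds locks on both tables:

  CREATE TABLE order_items (
    id       INT PRIMARY KEY AUTO_INCREMENT,
    order_id INT NOT NULL,
    sku      VARCHAR(64),
    FOREIGN KEY (order_id) REFERENCES orders(id)
      ON DELETE CASCADE      -- deleting an order also deletes its items
      ON UPDATE CASCADE
  );

  DELETE FROM orders WHERE id = 42;
  -- Also deletes every matching row in order_items, locking rows in both tables.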
Triggers and Events
○ Be aware:
o Hidden logic
o Hard to monitor
○ Triggers increase the chance of table locks
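A hypothetical trigger illustrating the "hidden logic" concern: every insert into orders invisibly updates a counter row, adding lock contention that is easy to overlook because it lives in the database, not in application code:

  DELIMITER //
  CREATE TRIGGER orders_after_insert
  AFTER INSERT ON orders
  FOR EACH ROW
  BEGIN
    -- Hidden side effect: every order insert also locks this counter row
    UPDATE daily_counters
    SET orders_count = orders_count + 1
    WHERE day = DATE(NEW.created_at);
  END//
  DELIMITER ;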
Scaling Infrastructure
Scale Vertically
○ DB Partitioning
o Some constraints:
• Every PRIMARY KEY (and UNIQUE key) must include all columns used in the table’s partitioning function
• All parts of a PRIMARY KEY must be NOT NULL
o Partitioned by: Date/Time, ID
○ Configuration tuning (my.cnf)
○ More RAM, SSD, …
Partitioning
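A hedged sketch of range partitioning by date in MySQL (hypothetical orders table); note how the primary key includes the partitioning column, per the constraint above:

  CREATE TABLE orders (
    id         BIGINT NOT NULL AUTO_INCREMENT,
    created_at DATE   NOT NULL,
    total      DECIMAL(12,2),
    PRIMARY KEY (id, created_at)        -- PK must include the partition column
  )
  PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2015 VALUES LESS THAN (2016),
    PARTITION p2016 VALUES LESS THAN (2017),
    PARTITION p2017 VALUES LESS THAN (2018),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
  );
  -- Queries filtered on created_at only touch the matching partitions
  -- (partition pruning), and old partitions can be dropped cheaply.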
Scale Horizontally
Replication
○ Master-Slave: MySQL Replication
(Diagram: the replication model, separating reads and writes at the application level)
Scale Horizontally
○ Replication setup
(Diagram: setup steps, including the downtime involved)
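For reference, the core of a classic MySQL 5.x master-slave setup (host, credentials, and binlog coordinates are placeholders); taking the initial data snapshot is where the downtime traditionally comes from:

  -- On the master: enable binary logging in my.cnf (log-bin, server-id=1),
  -- then create a replication user
  CREATE USER 'repl'@'%' IDENTIFIED BY 'secret';
  GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

  -- On the slave (server-id=2), point at the master's binlog coordinates
  -- recorded when the snapshot was taken
  CHANGE MASTER TO
    MASTER_HOST = 'master.db.internal',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS  = 4;
  START SLAVE;
  SHOW SLAVE STATUS\G   -- check Slave_IO_Running / Slave_SQL_Running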
Scale Horizontally
Replication
○ Multi-Master: Galera Cluster, Percona XtraDB Cluster
(Diagram: the working model; commit on all nodes or commit nothing)
source: http://galeracluster.com/
PART 3
# Case-study
Before we begin
○ Conventions
○ Tools
o Modelling (MySQL Workbench)
o Faking Data (fake2db)
o Testing and analyzing queries (EXPLAIN, ANALYZE, PROFILE)
Case Study: E-commerce System
Pages: Product Detail, Order, Listing/Landing
Main Flow
Data Architecture
DB Diagram
DB Diagram - EAV
Data Update Flow
Thank you
○ Q&A
○ Share your own best practices

Editor's Notes

  • #5 Telling stories: the DB topic is not new, but even giant companies have struggled with data-intensive (vs CPU-intensive) workloads
  • #6 My feeling at first; my story after, 100
  • #8 Collapsing
  • #9 Maybe some of you think I’m a manager who does management stuff and talks about theory, but the fact is: as Head of Technology I’m responsible for everything, every failure. I used to be the first guy to jump in and resolve issues with my co-workers
  • #12 Tell me your story
  • #13 Questions & Explanation
  • #16 Visualize: contention, error margins
  • #17 Estimation (generator); load testing; environment (engineering), configuration management
  • #18 Estimation (generator); load testing; environment (engineering), configuration management
  • #19 Measure, alert; top-down, 80/20; analyzing
  • #20 Measure, alert; top-down, 80/20; analyzing
  • #21 Measure, alert; top-down, 80/20; analyzing
  • #24 Drawing
  • #26 Story of Spikes, Zalo event
  • #27 Cache vs Indexed High read
  • #28 Cache vs Indexed High read
  • #37 Vs caching (Holistics)
  • #41 + Cloud
  • #43 + Cloud
  • #45 + Cloud