SlideShare a Scribd company logo
1 of 197
Download to read offline
Jenni Snyder
Yelp
@jcsuperstar
The Accidental DBA
Yelp’s Mission
Connecting people with great
local businesses.
9:30am: Welcome & Let's talk about the basics
11:00am: Break
11:15am: Running MySQL in the real world
12:15pm:Final wrap up
12:30pm: Lunch
Administrative
3
Part 1:
0. Brief Introduction
1. MySQL History, Philosophy, Community
2. Installation, Basic Configuration, Launching
3. Using it, Backing it up
4. Replication basics, Monitoring
5. Database Defense & fast feedback to developers
Agenda in More Detail
4
Part 2:
● Running in Production #realtalk
● Being Prepared
● Advanced Monitoring and Investigation
● Fantastic Errors and Where to Find Them
● When Things Are "Slow"
● Advanced Tuning
● Replication and Scaling Out
● Discussion
Agenda in More Detail
5
● My name is Jenni
● Engineering Manager @ Yelp
● First DBA there in 2011
Who am I Why am I Here?
6
Got Into Databases by Accident
7
8
9
Myth 1: databases are scary!
10
Myth 2: DBAs are scary!
<image of frustration?>
11
Why Databases?
12
You may already have a database.
It Was on Fire When I Got Here
13
"If someone is asking you who owns the
database, it's probably you"
- @mipsytipsy
And it's Not Clear Who Fixes it
14
● Yelp didn't have a DBA for >5 years
● We've acquired companies with great big DBs
● There are probably some easy things to fix
Don't Feel Bad!
Database Flavors:
● SQL
● NoSQL
● Geospatial
● Graph
Just in Case You Don't Have a DB Yet...
17
● MySQL
● PostgreSQL
● SQLite
SQL/Relational Databases
18
● Cassandra
● MongoDB
● Couchbase
● Redis
NoSQL
19
● PostGIS
● Neo4j
● Titan
Geospatial & Graph
20
When it's a:
● log
● queue
● cache
● photo
● disk directory
● spreadsheet
● data warehouse/cube/shed/lake/garage/whatever
Use the right tool for the job.
When Not to Use a Database
21
22
● MySQL - Wikipedia
● First released in 1995
● Pluggable storage engines
● Written in the age of spinning rust
● Transactional engine InnoDB released in 2001
● Oracle buys InnoDB in 2005
● Bought by Sun in 2008
● Oracle bought Sun in 2010
Brief History of MySQL
23
● It is meant to work best the most of the time
● It is hard impossible to optimize for all cases
● That sometimes surprises you
"Performs extremely well in the average case" - Wikipedia
MySQL - Why Does it Do What It Does
24
● The manual is awesome
● DBA StackExchange
● So many forums - "just <search engine> for it"
● Conferences
● Lots of great open source
The MySQL Community
25
● MySQL Manual - Server System Variables - 500+
● 400+ status metrics
Don't freak out. You don't have to care about these yet :)
Highly Configurable
26
27
● Source, package, #whatever
● Drop a config in /etc/my.cnf:
○ basedir, datadir
○ socket
○ pid-file
○ server-id
Installation
28
● Error log
● Slow log
● Don't log warnings for now
MySQL Files - Logs
29
● You may have lots of these:
○ Binary logs
○ Relay logs
MySQL Files - Replication Logs
30
● innodb_buffer_pool_size ~ 75% memory
● Worry about the rest of it later
MySQL Config - memory
31
● service mysql start
or
● /etc/init.d/mysql start
● Runs on port 3306
● Throw a party
● Or just connect to it
○ Use the mysql command
Let's Fire This Puppy Up
33
jsnyder@db:~$ ps auxw| fgrep mysql
root 16661 0.0 0.0 4460 1692 ? Ss Oct09 0:00
/bin/sh /usr/bin/mysqld_safe
mysql 17877 1.2 88.8 30221480 27893084 ? Sl Oct09 318:00
/usr/sbin/mysqld --basedir=/usr
--datadir=/nail/databases/mysql/data
--plugin-dir=/usr/lib/mysql/plugin --user=mysql
--log-error=/nail/databases/mysql/log/err.log
--pid-file=/nail/databases/mysql/mysqld.pid
--socket=/nail/databases/mysql/mysqld.sock --port=3306
34
jsnyder@db:~$ mysql -u root
Welcome to the MySQL monitor. Commands end with ; or g.
Your MySQL connection id is 1124438
Server version: 5.6.32-78.1-log Percona Server (GPL), Yelp Release 1
Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights
reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or 'h' for help. Type 'c' to clear the current input
statement.
mysql>
35
mysql> show databases ;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
+--------------------+
4 rows in set (0.02 sec)
36
● Find the start script
● Find that err.log
● Run mysqld_safe manually
When it Won't Start
37
●
38
● MySQL uses GRANTS
● Users are a combination of:
○ User (name)
○ Password
○ Host - % is wildcard
● localhost uses unix domain socket
● 127.0.0.1 uses TCP/IP
MySQL Grants
39
● GRANTS can be for different levels:
○ Global *.*
○ Database yelp_db.*
○ Table yelp_db.user
MySQL Grant Levels
40
mysql> CREATE USER 'yolo'@'%' IDENTIFIED BY '' ;
Query OK, 0 rows affected (0.01 sec)
mysql> GRANT ALL ON *.* TO 'yolo'@'%' WITH GRANT OPTION ;
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT SELECT, INSERT, UPDATE, DELETE ON first_db.* TO
'app'@'10.10.10.%' IDENTIFIED by 'sweetpassword' ;
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT CREATE, DROP, INDEX, TRIGGER, SELECT, INSERT,
UPDATE, DELETE ON *.* TO 'schema_manager'@'10.%' IDENTIFIED BY
'yougettheidea' ;
Query OK, 0 rows affected (0.01 sec)
41
mysql> select user, host from mysql.user ;
+----------------+------------+
| user | host |
+----------------+------------+
| yolo | % |
| schema_manager | 10.% |
| app | 10.10.10.% |
| mysql.sys | localhost |
| root | localhost |
+----------------+------------+
5 rows in set (0.00 sec)
42
mysql> show grants for 'root'@'%';
+-------------------------------------------------------------+
| Grants for root@% |
+-------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION |
| GRANT PROXY ON ''@'' TO 'root'@'%' WITH GRANT OPTION |
+-------------------------------------------------------------+
2 rows in set (0.00 sec)
mysql> show grants for current_user ;
+-------------------------------------------------------------+
| Grants for root@% |
+-------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION |
| GRANT PROXY ON ''@'' TO 'root'@'%' WITH GRANT OPTION |
+-------------------------------------------------------------+
2 rows in set (0.00 sec)
43
● Make sure MySQL is running
● On that host & port
● Match user & host in error with grants
● Read err.log
● Use skip-grant-tables to start over
Debugging Connection Issues
44
● Add skip-grant-tables to your my.cnf
● Restart MySQL
ALTER USER 'root'@'localhost' IDENTIFIED BY 'MyNewPass';
What if You Don't Know the Password?
45
● Start with broad grants ('user'@'%')
● Refine from there
Debugging MySQL Connection Errors
46
mysql -u myuser -p mypass -h myhost -P myport
jsnyder@db:~$ mysql -u myuser -h myhost
ERROR 1045 (28000): Access denied for user myuser@'10.10.10.10'
(using password: NO)
jsnyder@db:~$ mysql -u myuser -p -h myhost
Enter password:
ERROR 1045 (28000): Access denied for user
'myuser'@'10.10.10.10' (using password: YES)
jsnyder@db:~$ mysql -u myuser -p -h myhost
Enter password:
ERROR 2005 (HY000): Unknown MySQL server host 'myhost' (0)
47
48
We use:
To ORM or not to ORM?
Developing Against MySQL
49
Language Python
MySQL Client Library mysqlclient-python (there is also
Connector/Python)
Object Relational Mapping (ORM) SQLAlchemy
Schema Manager Hand-rolled system
● Have developers read up on normal form
○ It's OK to denormalize for performance (later)
● Use one table for one thing
● Try not to get too fancy right away
Reference: Database Normalization Basics
About That Schema
50
51
● Don't swallow database client warnings/errors
● Track client error rates
● Make sure devs get error feedback
Build feedback to go directly to developers.
When Developing Against MySQL
52
● You can start simple
● Reads and writes scale differently
● Use discovery, load balancer or proxy
○ Smartstack
○ ProxySQL
○ MySQL Router
Use technology you already have if possible.
Application Connections
53
54
● When you care about your database
○ Back it up to a safe place
○ Test restoring them
○ Have more than one
Back to the Administration Part
55
● Binary a.k.a. hot backups
● Logical backups
● Point in time recovery
○ tl;dr: backup your binary logs
Different Kinds of Backups
56
● "They're Hot"
● MySQL Enterprise
● xtrabackup
○ Streaming
Binary Backups
57
● Spoiler alert: it's SQL
● mysqldump
● Best run on an offline/not replicating DB
● Sometimes the only way to upgrade
Logical Backups
58
● Logs of all changes
● Can be used to "roll" a DB forward in time
● More on these later...
● Copy & compress
Sample command to look at these:
mysqlbinlog --base64-output=DECODE-ROWS -v mysql-bin.000001
Backing up the Binary Logs
59
● Backups are only as good as your ability to use them
● Test your restores
Testing Backups
60
● mysqlbinlog turns binary logs into SQL
● Use the coordinates you saved
● It's documented, but I'm not aware of a great tool
Point in Time Recovery With Binary Logs
61
jsnyder@db:~$ cat
/path/to/backup/xtrabackup_binlog_info
mysql-bin.000003 589767
jsnyder@db:~$ mysqlbinlog
/path/to/binlogs/mysql-bin.000003
/path/to/binlogs/mysql-bin.000004 
--start-position=589767 
--stop-datetime="2017-10-29 09:00:00" | mysql
<args>
62
● Try it before you need it
● We call these "war games"
○ And have failover exercises
● The best plan is one you know will work!
More on replication next.
Practice!
63
64
MySQL uses the terms MASTER and SLAVE.
It is modern and inclusive to use PRIMARY and REPLICA.
This talk uses MASTER and REPLICA.
But First, a Word About Words
65
● Basics
○ One master + multiple replicas
○ Client-initiated, client configured
○ The replication user also needs grants
GRANT REPLICATION CLIENT, REPLICATION SLAVE
ON *.* TO 'replication'@'10.%' ...
Replication
66
● Master executes statement, writes it to:
○ Binary log
● Replica's IO_THREAD reads from master, writes it to:
○ Relay log
● Replica's SQL_THREAD executes statement, writes it to:
○ Binary log
Replicas can have their own replicas!
Logs & Threads
67
● Unique server-id is a must
● Binary log formats
○ STATEMENT
○ ROW
○ MIXED
● Log-slave-updates = ON
● master_info_repository = TABLE
MySQL won't execute events from it's own server-id.
Basic Configuration
68
jsnyder@db:~$ fgrep bin /etc/my.cnf
# binary logging is required for replication
log-bin = ../master_log/mysql-bin
relay-log = ../relay_log/relay-bin
binlog_format = statement
jsnyder@db:~$ fgrep server-id /etc/my.cnf
server-id = 171739394
jsnyder@db:~$ fgrep slave /etc/my.cnf
log_slow_slave_statements = 1
log-slave-updates = 1
slave_net_timeout = 30
slave-skip-errors = 1141
69
● CHANGE MASTER TO
○ MASTER_USER =
○ , MASTER_PASSWORD =
○ , MASTER_HOST =
● If using GTID:
○ , MASTER_AUTO_POSITION = 1
● If not:
○ , MASTER_LOG_FILE =
○ , MASTER_LOG_POSITION =
Configuring Replication
70
● START SLAVE
● SHOW SLAVE STATUS
Start it up
71
mysql> SHOW SLAVE STATUS G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.10.10.44
Master_User: replication_user
Master_Port: 3306
Master_Log_File: mysql-bin.000008
Read_Master_Log_Pos: 914832470
Relay_Log_File: relay-bin.000014
Relay_Log_Pos: 914832633
Relay_Master_Log_File: mysql-bin.000008
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
...
72
mysql> SHOW SLAVE STATUS G
… (continued)
Last_Errno: 0
Last_Error:
...
Exec_Master_Log_Pos: 914832470
...
Seconds_Behind_Master: 0
...
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
73
● Replication errors in two places
● err.log
2017-10-28 17:22:06 5061 [ERROR] Slave SQL: Error 'Duplicate entry
'14' for key 'PRIMARY'' on query. Default database: 'yelp_db'.
Query: 'INSERT INTO user (id, name) VALUES ('14', 'Barbara')',
Error_code: 1062
● SHOW SLAVE STATUS:
Last_SQL_Errno: 1062
Last_SQL_Error:'Duplicate entry '14' for key 'PRIMARY'' on query.
Default database: 'yelp_db'. Query: 'INSERT INTO user (id, name)
VALUES ('14', 'Barbara')'
What Could Possibly go Wrong?
74
● Who had the problem?
[ERROR] Slave SQL: Error
● What was it trying to do?
Query: 'INSERT INTO user (id, name) VALUES ('14', 'Barbara')'
● What happened?
'Duplicate entry '14' for key 'PRIMARY'' on query.
● Where was this?
Default database: 'yelp_db
Try Breaking Errors Up
75
● Wrong binary log coordinates
● Inconsistent data
Common Causes of Errors
76
77
● If you care about it, monitor it
● Lots of open source options
○ Built-in to your monitoring solution
○ percona-toolkit
○ Cheap scripts
○ Look on github
Humans like pictures graphs!
Monitoring - Server
78
● Disk space:
○ We file Jira ticket at 85%
○ And page at 95%
● Is MySQL running, does it seem OK?
● Are it's cron jobs completing?
● Does it's config match running state?
Host Stuff
79
● Auto-increment id limits
● Binary log growth
● Killed queries
● Replication hierarchy
● read_only
● Temporary tables on disk
● Threads connected
● Threads running
● More fancy replication checks
MySQL Server Stuff
80
● Have your application:
○ Track client errors
○ Track query times
○ Bonus: query annotations
select id, name from user where id = 1 /* host ; code_path.py */
MySQL Client Stuff
81
82
Kill and Log:
● Long-running queries
● Long-running transactions
● Too many of the same queries
Send feedback directly to developers via the client.
Automated Defense
83
● MySQL client will receive errors
○ And retry
○ Or fail gracefully
● Track client errors & statistics
● Alert devs where reasonable
Let developers see immediate feedback of a change!
Handling Client Errors
84
Before devs push to production they can test for:
● Queries that will be slow
● Transactions that do too much
Send feedback directly to developers via the client.
Automated Testing
85
Seems like a lot of work!
Let's moment to reflect on humans and feelings.
Is All This Really Worth it?
86
87
88
89
If you get paged:
● And it was because of someone else's change
● You might feel #negative_emotion
● Especially if it happens again…
Think About it - Ops Perspective
90
If someone tells you they got paged:
● You do something about it, you didn't know
● But if it happens again…
● You might feel #negative_emotion
Think About It - Dev Perspective
91
Will:
● Let developers discover issues & fix on their own
● Faster iteration while developing
● Confidence when pushing
● Keep the pager for when you're really needed
<3 <3 <3
Useful Automation/Monitoring
92
93
94
Questions, comments, quemments?
We covered:
1. MySQL History, Philosophy, Community
2. Installation, Basic Configuration, Launching
3. Using it, Backing it up
4. Replication basics, Monitoring
5. Database Defense & fast feedback to developers
Let's Talk About It
95
Here is what we'll cover when we get back
● Running in Production
● Being Prepared
● Advanced Monitoring and Investigation
● Fantastic Errors and Where to Find Them
● When Things Are "Slow"
● Advanced Tuning
● Replication and Scaling Out
● Discussion
Before we Break
96
97
Part 2a:
● Running in Production
● Being Prepared
● Advanced Monitoring and Investigation
● Fantastic Errors and Where to Find Them
And we're back
98
Part 2b:
● When Things Are "Slow"
● Advanced Tuning
● Replication and Scaling Out
● Discussion
And we're back
99
100
Some things to think about:
● Expectations
● SLOs - uptime and query time
● Caching where you can
● Degrading gracefully
Be prepared, or prepare to be surprised!
Welcome to Production!
101
Nothing is up 100%
● Get PM's and devs involved
● Put features behind toggles
● Maintenance mode(s)
● Read-only mode
Because We Measure Uptime in 9's
102
103
Embrace your fears:
● Think about what can go wrong
● Plan a response - write Runbooks!
● Practice
I don't always test in production...
Preparation For the Worst
104
Sometimes you have to do this anyway:
● Upgrades
● New hardware
● New features mean new tables & queries
Try to use the same procedures every time.
It's Not All Bad
105
● Host failures
● Network failures
● Provider failures
● Overwhelming usage
What Could go Wrong?
106
107
Normalize maintenance & disaster recovery
● Replacing a replica
○ Same as building a new one
● Replacing a master
○ Same as promoting a new one
● Cloning a database
○ Same as restoring a backup
Best Practices - Server
108
● Give developers insight into DB health during:
○ Pushing code
○ Traffic surges
Best Practices - Application
109
110
● Blameless Post-mortems
● Retrospectives
Learn From Mistakes
111
112
● Monitoring & alerting is pretty much
○ Collecting data
○ Using it
Advanced Monitoring and Investigation
113
Examples of things to monitor on a DB server:
● Is MySQL running?
● Load % CPU Idle
● Disk usage
● Critical processes
● Critical jobs
Monitoring the Server
114
● Auto-increment id limits
● Binary log growth
● Killed queries
● Replication delay, hierarchy
● read_only
● Temporary tables on disk
● Threads connected
● Threads running
● Fancy replication checks
Monitoring MySQL
115
SHOW GLOBAL STATUS has counters & gauges
● Queries per second
● Threads running
● Threads connected
● Slow queries
● Open transactions
Monitoring MySQL Usage
116
You can monitor these through our client:
● Error rate
○ Rates per type
● Log all warnings & errors
Monitoring MySQL Client
117
118https://www.flickr.com/photos/wocintechchat/25392405123/
Ways your monitors can send you info:
● Email
● Chat
● Tickets
● Pages
Chatbots are so hot right now.
Alerting
119
Send Good Alerts
120
Make sure the alert:
● Goes to the right team!
● Contains
○ A link to the runbook/guide
○ The first step to take
Make your Alerts Actionable
121
Nothing is perfect the first time around
● Tune thresholds
● Keep track of alerts and response
● Handover to the next person down the line
If it happens consistently, automate the response!
Iterate and Pay it Forward
122
123
124
● Start with the read some logs/part
● Iterate, refine, shorten
Have a Runbook/Guide
125
Cheapa$$ cron jobs capture this every minute:
● ps
● mysql -e "SHOW PROCESSLIST"
● mysql -e "SHOW ENGINE INNODB STATUS"
And this once a day:
● mysql -e "SHOW TABLE STATUS"
● mysqldump --no-data
What happened just before I was paged?
Tracking More Than Metrics
126
jsnyder@db:~$ cat ps-log.sh
#!/bin/bash
[ -d /var/log/ps ] || mkdir /var/log/ps
(
date -R
/bin/ps auxf
free
echo
) | bzip2 -c >> /var/log/ps/ps-`/bin/date +%Y%m%d-%H`.bz2
127
Collect forensic data about MySQL when problems occur.
● Can run as a daemon
● Highly configurable
● Waits for a trigger condition
● Collects a ton of information
pt-stalk
128
129
● Connection Errors
● Client Errors
● Server Errors
● Replication Errors
2013, 'Lost connection to MySQL server during query'
Fantastic Errors...
130
● Your application's log
● MySQL's error_log
● SHOW SLAVE STATUS
● SHOW ENGINE INNODB STATUS
Just <insert search engine here> for it...
And Where to Find Them
131
Traceback (most recent call last):
File "/blahblah/blah/blah.py", line 357, in run
return self.execute()
File "/blah/blah.py", line 359, in execute
return action(**match)
File "/blah/blaah.py", line 172, in blaah_html
return self.xtmpl('blaah.blaah',
env=self.blaah_env(blaah_form_obj))
File "/lib/blah-blah/orm/session.py", line 470, in __exit__
self.rollback()
File "/lib/blah-blah/bleh.py", line 238, in commit
self._delegate.commit()
OperationalError: (2013, 'Lost connection to MySQL
server during query') None None
132
● Try running the query itself
○ Does it return a huge number of rows? Sometimes?
○ Return a large amount of data?
○ Increase max_allowed_packet
● Does it go away if you increase net_read_timeout?
● Is the application server overloaded?
● It might be the network
Debugging Lost Connections
133
● Sometimes MySQL goes away…
OperationalError: (2006, 'MySQL server has gone away')
MySQL Goes Away...
134
135
A sensitive subject:
● You may store sensitive information in your DB
○ Email address
○ First & last names
○ Payment data
● You probably don't want it in your logs
● pt-fingerprint
When an Error Contains a Query
136
You'll find these in the error_log
● log_error_verbosity = 1
● start/stop messages
● replication information
● crashes
Server Errors
137
InnoDB: Warning: a long semaphore wait:
--Thread 140375092094720 has waited at row0purge.cc line 770
for 241.00 seconds the semaphore:
S-lock on RW-latch at 0x1346820 '&dict_operation_lock'
a writer (thread id 140375041206016) has reserved it in mode
exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 770
Last time write locked in file
/dist/work/percona-server/storage/innobase/row/row0mysql.cc
line 3827
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print
diagnostic info:
InnoDB: Pending preads 5, pwrites 0
138
● MySQL will dump what info it can to the log
● mysqld_safe will attempt to restart MySQL
● Crashes are not normal
MySQL Crashes
139
16:22:00 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that
this binary or one of the libraries it was linked against blah
blahblah blah blah blah blah blah.
key_buffer_size=268435456
read_buffer_size=131072
max_used_connections=8
max_threads=41
thread_count=2
connection_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size +
sort_buffer_size)*max_threads = 309930 K bytes of memory
Hope that's ok; if not, decrease some variables in the
equation.
140
● OOM Killer
● Corrupted Data
● Hardware Failure
● Bug
Common Crash Causes
141
142
Lots of things can cause this:
● Individual statements can be slow
● Lots of queries
● Lots of slow-ish queries
Find out what resource is in contention.
When Things Are "Slow"
143
Examples here are from our open dataset.
Not queries we use in real-life :)
Tuning an Individual Query or Table
144
# Time: 2017-10-27T19:38:53.671918Z
# User@Host: root[root] @ localhost [] Id: 13
# Query_time: 4.947666 Lock_time: 0.000416 Rows_sent: 5
Rows_examined: 610939
SET timestamp=1509305933;
select count(*), r.stars
from review r
join business b on r.business_id = b.id
join category c on b.id = c.business_id
where c.category = 'Soul Food'
group by r.stars ;
145
mysql> select count(*), r.stars from review r join
business b on r.business_id = b.id join category c on
b.id = c.business_id where c.category = 'Soul Food'
group by r.stars ;
+----------+-------+
| count(*) | stars |
+----------+-------+
| 2052 | 1 |
| 1785 | 2 |
| 2356 | 3 |
| 5432 | 4 |
| 8779 | 5 |
+----------+-------+
5 rows in set (4.95 sec)
146
mysql> EXPLAIN select count(*), r.stars from review r join
business b on r.business_id = b.id join /*snip*/ ... G
*********************** 1. row ***********************
id: 1
select_type: SIMPLE
table: category
partitions: NULL
type: ALL
possible_keys: fk_categories_business1_idx
key: NULL
key_len: NULL
ref: NULL
rows: 587969
filtered: 10.00
Extra: Using where; Using temporary; Using
filesort 147
*********************** 2. row ***********************
id: 1
select_type: SIMPLE
table: business
partitions: NULL
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 68
ref: yelp_db.c.business_id
rows: 1
filtered: 100.00
Extra: Using index
148
*********************** 3. row ***********************
id: 1
select_type: SIMPLE
table: review
partitions: NULL
type: ref
possible_keys: fk_reviews_business1_idx
key: fk_reviews_business1_idx
key_len: 68
ref: yelp_db.c.business_id
rows: 25
filtered: 100.00
Extra: NULL
3 rows in set, 1 warning (0.00 sec)
149
mysql> pager fgrep rows
PAGER set to 'fgrep rows'
mysql> EXPLAIN select count(*), r.stars from review r join
business b on r.business_id = b.id join category c on b.id =
c.business_id where c.category = 'Soul Food' group by r.stars
G
rows: 587969
rows: 1
rows: 25
3 rows in set, 1 warning (0.00 sec)
150
mysql> nopager
PAGER set to stdout
mysql> select 587969 * 1 * 25 ;
+-----------------+
| 587969 * 1 * 25 |
+-----------------+
| 14699225 |
+-----------------+
1 row in set (0.00 sec)
# From our slow query log, the numbers can be pretty off!
# Query_time: 4.947666 Lock_time: 0.000416 Rows_sent: 5
Rows_examined: 610939
151
mysql> alter table category add key (category) ;
Query OK, 0 rows affected (1.53 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> alter table review drop key fk_reviews_business1_idx ,
add key( business_id, stars) ;
Query OK, 0 rows affected (38.23 sec)
Records: 0 Duplicates: 0 Warnings: 0
152
mysql> alter table category add key (category) ;
Query OK, 0 rows affected (1.53 sec)
Records: 0 Duplicates: 0 Warnings: 0
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: c
partitions: NULL
type: ref
possible_keys: fk_categories_business1_idx,category
key: category
key_len: 768
ref: const
rows: 235
filtered: 100.00
Extra: Using temporary; Using filesort
153
mysql> select count(*), r.stars from review r join business b
on r.business_id = b.id join category c on b.id = c.business_id
where c.category = 'Soul Food' group by r.stars ;
+----------+-------+
| count(*) | stars |
+----------+-------+
| 2052 | 1 |
| 1785 | 2 |
| 2356 | 3 |
| 5432 | 4 |
| 8779 | 5 |
+----------+-------+
5 rows in set (0.03 sec)
154
● Long ALTER TABLE's can hurt performance
● pt-online-schema-change
● gh-ost
Changing the Schema IRL
155
There's a a great manual page How MySQL Uses Indexes
PERFORMANCE_SCHEMA can give you stats on index usage.
Keep in mind:
● Adding indexes slows down writes a bit
● Cardinality
● Leftmost-prefixing
And Just a Bit More on Indexes
156
157
158
Find them using SHOW PROCESSLIST
Or SELECT … FROM information_schema.processlist
You can use cheap bash commands to sort & find culprits.
Lots of Queries at Once
159
mysql -uroot -e "show full processlist" | awk '{print $5}' |
sort | uniq -c | sort -gr | head
jsnyder@db:~$ mysql -uroot -e "show full processlist" | awk
'{print $5}' | cut -c 1-70 | sort | uniq -c | sort -gr | head
521 Sleep
109 select count(*), r.stars from review r j
3 Query
2 Connect
1 Command
# kill them with pt-kill. Use --print to test first!
pt-kill --config /etc/mysql/emergency-kill.conf --kill
--match-info 'your query' --daemonize --run-time=60
160
You can:
● Some slow queries run infrequently
● Some slowish queries run all the time
● Visualization tools like Anemometer, VividCortex
Take a tcpdump & analyze with pt-query-digest:
pt-query-digest --type tcpdump tcpdump.txt
Finding the Next Thing To Optimize
161
162
● I/O
● Memory
● Tables/Transactions
Tuning MySQL/InnoDB
163
InnoDB has a ton of knobs and dials
● Innodb_io_capacity
● Optimize your filesystem
● Add more memory, larger/more buffer pools
● Tinker with innodb_flush_method
There is a whole manual page dedicated to optimizing
InnoDB disk I/O.
I/O is Always the Slowest
164
Think of the InnoDB buffer pool as InnoDB's cache
● Increase the amount of memory to reduce I/O
Rough math:
innodb_buffer_pool_instances * innodb_log_buffer_size
+ (read_buffer_size + sort_buffer_size) * max_threads
MySQL Memory Calculator looks out of date, but can give
you a pretty good idea...
How InnoDB Uses Memory
165
To view running transactions:
● SHOW ENGINE INNODB STATUS
● SELECT * FROM information_schema.INNODB_TRX G
On Transactions
166
mysql> SHOW ENGINE INNODB STATUS ;
...
------------
TRANSACTIONS
------------
Trx id counter 50806393604
Purge done for trx's n:o < 50806389839 undo n:o < 0 state:
running but idle
History list length 527
LIST OF TRANSACTIONS FOR EACH SESSION:
...
167
...
---TRANSACTION 50806392298, ACTIVE 12 sec starting index read
mysql tables in use 1, locked 1
314 lock struct(s), heap size 46632, 358 row lock(s), undo log
entries 353
MySQL thread id 23649033, OS thread handle 0x7f6bd31c7700,
query id 32707038971 10.10.10.40 application_user updating
UPDATE business SET ...
Trx read view will not see trx with id >= 50806392299, sees <
50806389830
168
Reduce lock contention by:
● Decreasing the number of locks your transactions use
● The total time of your transactions
● Making writes faster
See InnoDB Locking Explained with Stick Figures
Improving Transaction Efficiency
169
● Performance_schema
○ table_lock_waits_summary_by_table
Other Ways to Look at Transactions
170
Let's Hit Pause For a Moment
171
172
Ways to use replication:
● Take load off the master
● Send reads to the replicas
● Having a separate reporting database
● Delayed replica with pt-slave-delay
Replication and Scaling Out
173
● Master writes binary logs
● Each replica has 2 threads:
○ IO_THREAD: reads from master, writes relay logs
○ SQL_THREAD: runs SQL from relay logs
● Serial per database with parallel replication
How MySQL Replication Works Again
174
● master-master
○ One master "live", or
○ auto_increment_increment, offset
● master, intermediate master
● master(s) with many replicas
Some Replication Strategies
175
Master-Master - Active-Standby
176
read-only=0 read-only=1
Master-Master - Active-Active
177
read-only=0 read-only=0
One Master, Many Replicas
178
master, intermediate master
179
masters with their own replicas
180
Think About What You Want to Do
181
● Enable GTID - easy reparenting
● orchestrator - GUI and tooling
● pt-table-checksum - check consistency
● ProxySQL - can help you split your read-write traffic
Tools For Replication
182
183
● Don't write to sketchy databases
● Don't let your application write to replicas
● use read_only=1 to prevent writes to replicas
● SHOW MASTER STATUS, SHOW SLAVE STATUS
Remember, a host that goes away may wander back...
When in Doubt, Throw it Out
184
1. Set old master to read_only=1
2. CHANGE MASTER TO if necessary
3. Configure your application to write to new master
4. Set new master to read_only=0
Never allow writes to a replica unless you know that's what you
want!
Sample Master Promotion Plan
185
186
Common causes:
● IO_Thread can't keep up
○ Local load or slow/lossy network
● SQL_Thread can't keep up
○ Local load, long statements or transactions
○ mysqlbinlog to look at transactions
Speaking of Lag...
187
You can always phone a friend:
● MySQL has a big, friendly online community
● Oracle MySQL, Percona, MariaDB offer support
Get the most out of your interactions with experts by
providing lots of context and what you've already tried.
When You're Really Stuck
188
189
We covered:
● Running in Production
● Being Prepared
● Advanced Monitoring and Investigation
● Fantastic Errors and Where to Find Them
● When Things Are "Slow"
● Advanced Tuning
● Replication and Scaling Out
Let's Talk About It
190
www.yelp.com/careers/
We're Hiring!
@YelpEngineering
fb.com/YelpEngineers
engineeringblog.yelp.com
github.com/yelp
● MySQL Manual
● MySQL Utilities
● Percona Toolkit
● Orchestrator: MySQL Replication Topology Manager
● ProxySQL: High Performance MySQL Proxy
● mysql-sys - tools for using performance_schema
● Anemometer - Slow Query Monitor
Appendix - Links to Resources
193
● Database Reliability Engineering by Laine Campbell,
Charity Majors
● High Performance MySQL by Baron Schwartz, Peter
Zaitsev & Vadim Tkachenko
Appendix - Books
194
● Database Defense in Depth by @geodbz
● GrossQueryChecker: Don't get surprised by query
performance in production! by @jcsuperstar
● InnoDB Locking Explained with Stick Figures by @billkarwin
● Blameless PostMortems and a Just Culture by @allspaw
● Literally any Percona Blog Post by @svetsmirnova
Appendix - Blog Posts
195
● Pexels - Images licensed under CC0
● Icons from The Noun Project
● #WOCInTechChat
● The Jopwell Collection
Appendix - Image Sources
196
● Yelp Dataset
Other datasets:
● MySQL's Employees Sample Database
● Percona's list of Sample datasets
Appendix - Yelp Dataset
197

More Related Content

What's hot

Plmce 14 be a_hero_16x9_final
Plmce 14 be a_hero_16x9_finalPlmce 14 be a_hero_16x9_final
Plmce 14 be a_hero_16x9_finalMarco Tusa
 
MySQL Indexierung CeBIT 2014
MySQL Indexierung CeBIT 2014MySQL Indexierung CeBIT 2014
MySQL Indexierung CeBIT 2014FromDual GmbH
 
Hotsos 2011: Mining the AWR repository for Capacity Planning, Visualization, ...
Hotsos 2011: Mining the AWR repository for Capacity Planning, Visualization, ...Hotsos 2011: Mining the AWR repository for Capacity Planning, Visualization, ...
Hotsos 2011: Mining the AWR repository for Capacity Planning, Visualization, ...Kristofferson A
 
OpenWorld 2014 - Schema Management: versioning and automation with Puppet and...
OpenWorld 2014 - Schema Management: versioning and automation with Puppet and...OpenWorld 2014 - Schema Management: versioning and automation with Puppet and...
OpenWorld 2014 - Schema Management: versioning and automation with Puppet and...Frederic Descamps
 
Ansible is Our Wishbone(Automate DBA Tasks With Ansible)
Ansible is Our Wishbone(Automate DBA Tasks With Ansible)Ansible is Our Wishbone(Automate DBA Tasks With Ansible)
Ansible is Our Wishbone(Automate DBA Tasks With Ansible)M Malai
 
Scaling with sync_replication using Galera and EC2
Scaling with sync_replication using Galera and EC2Scaling with sync_replication using Galera and EC2
Scaling with sync_replication using Galera and EC2Marco Tusa
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingSveta Smirnova
 
How to migrate from MySQL to MariaDB without tears
How to migrate from MySQL to MariaDB without tearsHow to migrate from MySQL to MariaDB without tears
How to migrate from MySQL to MariaDB without tearsSveta Smirnova
 
MariaDB Cassandra Interoperability
MariaDB Cassandra InteroperabilityMariaDB Cassandra Interoperability
MariaDB Cassandra InteroperabilityColin Charles
 
Galera Cluster - Node Recovery - Webinar slides
Galera Cluster - Node Recovery - Webinar slidesGalera Cluster - Node Recovery - Webinar slides
Galera Cluster - Node Recovery - Webinar slidesSeveralnines
 
The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialJean-François Gagné
 
Proper Care and Feeding of a MySQL Database for Busy Linux Administrators
Proper Care and Feeding of a MySQL Database for Busy Linux AdministratorsProper Care and Feeding of a MySQL Database for Busy Linux Administrators
Proper Care and Feeding of a MySQL Database for Busy Linux AdministratorsDave Stokes
 
Cassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesCassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesPatrick McFadin
 
MariaDB 10.5 binary install (바이너리 설치)
MariaDB 10.5 binary install (바이너리 설치)MariaDB 10.5 binary install (바이너리 설치)
MariaDB 10.5 binary install (바이너리 설치)NeoClova
 
The Proper Care and Feeding of a MySQL Database for Busy Linux Admins -- SCaL...
The Proper Care and Feeding of a MySQL Database for Busy Linux Admins -- SCaL...The Proper Care and Feeding of a MySQL Database for Busy Linux Admins -- SCaL...
The Proper Care and Feeding of a MySQL Database for Busy Linux Admins -- SCaL...Dave Stokes
 
MySQL Performance Best Practices
MySQL Performance Best PracticesMySQL Performance Best Practices
MySQL Performance Best PracticesOlivier DASINI
 
MySQL Multi-Source Replication for PL2016
MySQL Multi-Source Replication for PL2016MySQL Multi-Source Replication for PL2016
MySQL Multi-Source Replication for PL2016Wagner Bianchi
 

What's hot (20)

Plmce 14 be a_hero_16x9_final
Plmce 14 be a_hero_16x9_finalPlmce 14 be a_hero_16x9_final
Plmce 14 be a_hero_16x9_final
 
MySQL Indexierung CeBIT 2014
MySQL Indexierung CeBIT 2014MySQL Indexierung CeBIT 2014
MySQL Indexierung CeBIT 2014
 
Hotsos 2011: Mining the AWR repository for Capacity Planning, Visualization, ...
Hotsos 2011: Mining the AWR repository for Capacity Planning, Visualization, ...Hotsos 2011: Mining the AWR repository for Capacity Planning, Visualization, ...
Hotsos 2011: Mining the AWR repository for Capacity Planning, Visualization, ...
 
OpenWorld 2014 - Schema Management: versioning and automation with Puppet and...
OpenWorld 2014 - Schema Management: versioning and automation with Puppet and...OpenWorld 2014 - Schema Management: versioning and automation with Puppet and...
OpenWorld 2014 - Schema Management: versioning and automation with Puppet and...
 
Ansible is Our Wishbone(Automate DBA Tasks With Ansible)
Ansible is Our Wishbone(Automate DBA Tasks With Ansible)Ansible is Our Wishbone(Automate DBA Tasks With Ansible)
Ansible is Our Wishbone(Automate DBA Tasks With Ansible)
 
Using galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanUsing galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wan
 
Scaling with sync_replication using Galera and EC2
Scaling with sync_replication using Galera and EC2Scaling with sync_replication using Galera and EC2
Scaling with sync_replication using Galera and EC2
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL Troubleshooting
 
How to migrate from MySQL to MariaDB without tears
How to migrate from MySQL to MariaDB without tearsHow to migrate from MySQL to MariaDB without tears
How to migrate from MySQL to MariaDB without tears
 
Introduction to Galera Cluster
Introduction to Galera ClusterIntroduction to Galera Cluster
Introduction to Galera Cluster
 
MariaDB Cassandra Interoperability
MariaDB Cassandra InteroperabilityMariaDB Cassandra Interoperability
MariaDB Cassandra Interoperability
 
Galera Cluster - Node Recovery - Webinar slides
Galera Cluster - Node Recovery - Webinar slidesGalera Cluster - Node Recovery - Webinar slides
Galera Cluster - Node Recovery - Webinar slides
 
The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication Tutorial
 
Proper Care and Feeding of a MySQL Database for Busy Linux Administrators
Proper Care and Feeding of a MySQL Database for Busy Linux AdministratorsProper Care and Feeding of a MySQL Database for Busy Linux Administrators
Proper Care and Feeding of a MySQL Database for Busy Linux Administrators
 
Cassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesCassandra 2.0 and timeseries
Cassandra 2.0 and timeseries
 
MariaDB 10.5 binary install (바이너리 설치)
MariaDB 10.5 binary install (바이너리 설치)MariaDB 10.5 binary install (바이너리 설치)
MariaDB 10.5 binary install (바이너리 설치)
 
Introducing Galera 3.0
Introducing Galera 3.0Introducing Galera 3.0
Introducing Galera 3.0
 
The Proper Care and Feeding of a MySQL Database for Busy Linux Admins -- SCaL...
The Proper Care and Feeding of a MySQL Database for Busy Linux Admins -- SCaL...The Proper Care and Feeding of a MySQL Database for Busy Linux Admins -- SCaL...
The Proper Care and Feeding of a MySQL Database for Busy Linux Admins -- SCaL...
 
MySQL Performance Best Practices
MySQL Performance Best PracticesMySQL Performance Best Practices
MySQL Performance Best Practices
 
MySQL Multi-Source Replication for PL2016
MySQL Multi-Source Replication for PL2016MySQL Multi-Source Replication for PL2016
MySQL Multi-Source Replication for PL2016
 

Similar to Percona Live '18 Tutorial: The Accidental DBA

The Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL DatabasesThe Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL DatabasesDave Stokes
 
Linuxfest Northwest Proper Care and Feeding Of a MySQL for Busy Linux Admins
Linuxfest Northwest Proper Care and Feeding Of a MySQL for Busy Linux AdminsLinuxfest Northwest Proper Care and Feeding Of a MySQL for Busy Linux Admins
Linuxfest Northwest Proper Care and Feeding Of a MySQL for Busy Linux AdminsDave Stokes
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQLlefredbe
 
The Peoper Care and Feeding of a MySQL Server for Busy Linux Admin
The Peoper Care and Feeding of a MySQL Server for Busy Linux AdminThe Peoper Care and Feeding of a MySQL Server for Busy Linux Admin
The Peoper Care and Feeding of a MySQL Server for Busy Linux AdminDave Stokes
 
Highload Perf Tuning
Highload Perf TuningHighload Perf Tuning
Highload Perf TuningHighLoad2009
 
Training Slides: 203 - Backup & Recovery
Training Slides: 203 - Backup & RecoveryTraining Slides: 203 - Backup & Recovery
Training Slides: 203 - Backup & RecoveryContinuent
 
MySQL Utilities -- PyTexas 2015
MySQL Utilities -- PyTexas 2015MySQL Utilities -- PyTexas 2015
MySQL Utilities -- PyTexas 2015Dave Stokes
 
MySQL for Oracle DBAs
MySQL for Oracle DBAsMySQL for Oracle DBAs
MySQL for Oracle DBAsFromDual GmbH
 
MySQL Workbench for DFW Unix Users Group
MySQL Workbench for DFW Unix Users GroupMySQL Workbench for DFW Unix Users Group
MySQL Workbench for DFW Unix Users GroupDave Stokes
 
A Backup Today Saves Tomorrow
A Backup Today Saves TomorrowA Backup Today Saves Tomorrow
A Backup Today Saves TomorrowAndrew Moore
 
My sql 5.7-upcoming-changes-v2
My sql 5.7-upcoming-changes-v2My sql 5.7-upcoming-changes-v2
My sql 5.7-upcoming-changes-v2Morgan Tocker
 
Breaking the Monolith - Microservice Extraction at SoundCloud
Breaking the Monolith - Microservice Extraction at SoundCloudBreaking the Monolith - Microservice Extraction at SoundCloud
Breaking the Monolith - Microservice Extraction at SoundCloudJan Kischkel
 
Upgrade to MySQL 5.6 without downtime
Upgrade to MySQL 5.6 without downtimeUpgrade to MySQL 5.6 without downtime
Upgrade to MySQL 5.6 without downtimeOlivier DASINI
 
MariaDB 10.4 New Features
MariaDB 10.4 New FeaturesMariaDB 10.4 New Features
MariaDB 10.4 New FeaturesFromDual GmbH
 
NoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
NoCOUG_201411_Patel_Managing_a_Large_OLTP_DatabaseNoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
NoCOUG_201411_Patel_Managing_a_Large_OLTP_DatabaseParesh Patel
 
Performance schema and sys schema
Performance schema and sys schemaPerformance schema and sys schema
Performance schema and sys schemaMark Leith
 
Mysql 57-upcoming-changes
Mysql 57-upcoming-changesMysql 57-upcoming-changes
Mysql 57-upcoming-changesMorgan Tocker
 

Similar to Percona Live '18 Tutorial: The Accidental DBA (20)

The Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL DatabasesThe Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL Databases
 
Linuxfest Northwest Proper Care and Feeding Of a MySQL for Busy Linux Admins
Linuxfest Northwest Proper Care and Feeding Of a MySQL for Busy Linux AdminsLinuxfest Northwest Proper Care and Feeding Of a MySQL for Busy Linux Admins
Linuxfest Northwest Proper Care and Feeding Of a MySQL for Busy Linux Admins
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQL
 
The Peoper Care and Feeding of a MySQL Server for Busy Linux Admin
The Peoper Care and Feeding of a MySQL Server for Busy Linux AdminThe Peoper Care and Feeding of a MySQL Server for Busy Linux Admin
The Peoper Care and Feeding of a MySQL Server for Busy Linux Admin
 
Highload Perf Tuning
Highload Perf TuningHighload Perf Tuning
Highload Perf Tuning
 
Training Slides: 203 - Backup & Recovery
Training Slides: 203 - Backup & RecoveryTraining Slides: 203 - Backup & Recovery
Training Slides: 203 - Backup & Recovery
 
My sql 5.6&MySQL Cluster 7.3
My sql 5.6&MySQL Cluster 7.3My sql 5.6&MySQL Cluster 7.3
My sql 5.6&MySQL Cluster 7.3
 
MySQL Utilities -- PyTexas 2015
MySQL Utilities -- PyTexas 2015MySQL Utilities -- PyTexas 2015
MySQL Utilities -- PyTexas 2015
 
MySQL for Oracle DBAs
MySQL for Oracle DBAsMySQL for Oracle DBAs
MySQL for Oracle DBAs
 
MySQL Workbench for DFW Unix Users Group
MySQL Workbench for DFW Unix Users GroupMySQL Workbench for DFW Unix Users Group
MySQL Workbench for DFW Unix Users Group
 
A Backup Today Saves Tomorrow
A Backup Today Saves TomorrowA Backup Today Saves Tomorrow
A Backup Today Saves Tomorrow
 
Perf Tuning Short
Perf Tuning ShortPerf Tuning Short
Perf Tuning Short
 
My sql 5.7-upcoming-changes-v2
My sql 5.7-upcoming-changes-v2My sql 5.7-upcoming-changes-v2
My sql 5.7-upcoming-changes-v2
 
Breaking the Monolith - Microservice Extraction at SoundCloud
Breaking the Monolith - Microservice Extraction at SoundCloudBreaking the Monolith - Microservice Extraction at SoundCloud
Breaking the Monolith - Microservice Extraction at SoundCloud
 
Upgrade to MySQL 5.6 without downtime
Upgrade to MySQL 5.6 without downtimeUpgrade to MySQL 5.6 without downtime
Upgrade to MySQL 5.6 without downtime
 
Curso de MySQL 5.7
Curso de MySQL 5.7Curso de MySQL 5.7
Curso de MySQL 5.7
 
MariaDB 10.4 New Features
MariaDB 10.4 New FeaturesMariaDB 10.4 New Features
MariaDB 10.4 New Features
 
NoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
NoCOUG_201411_Patel_Managing_a_Large_OLTP_DatabaseNoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
NoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
 
Performance schema and sys schema
Performance schema and sys schemaPerformance schema and sys schema
Performance schema and sys schema
 
Mysql 57-upcoming-changes
Mysql 57-upcoming-changesMysql 57-upcoming-changes
Mysql 57-upcoming-changes
 

Recently uploaded

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 

Recently uploaded (20)

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Percona Live '18 Tutorial: The Accidental DBA

  • 2. Yelp’s Mission Connecting people with great local businesses.
  • 3. 9:30am: Welcome & Let's talk about the basics 11:00am: Break 11:15am: Running MySQL in the real world 12:15pm:Final wrap up 12:30pm: Lunch Administrative 3
  • 4. Part 1: 0. Brief Introduction 1. MySQL History, Philosophy, Community 2. Installation, Basic Configuration, Launching 3. Using it, Backing it up 4. Replication basics, Monitoring 5. Database Defense & fast feedback to developers Agenda in More Detail 4
  • 5. Part 2: ● Running in Production #realtalk ● Being Prepared ● Advanced Monitoring and Investigation ● Fantastic Errors and Where to Find Them ● When Things Are "Slow" ● Advanced Tuning ● Replication and Scaling Out ● Discussion Agenda in More Detail 5
  • 6. ● My name is Jenni ● Engineering Manager @ Yelp ● First DBA there in 2011 Who am I Why am I Here? 6
  • 7. Got Into Databases by Accident 7
  • 8. 8
  • 9. 9 Myth 1: databases are scary!
  • 10. 10 Myth 2: DBAs are scary!
  • 13. You may already have a database. It Was on Fire When I Got Here 13
  • 14. "If someone is asking you who owns the database, it's probably you" - @mipsytipsy And it's Not Clear Who Fixes it 14
  • 15. ● Yelp didn't have a DBA for >5 years ● We've acquired companies with great big DBs ● There are probably some easy things to fix Don't Feel Bad!
  • 16.
  • 17. Database Flavors: ● SQL ● NoSQL ● Geospatial ● Graph Just in Case You Don't Have a DB Yet... 17
  • 18. ● MySQL ● PostgreSQL ● SQLite SQL/Relational Databases 18
  • 19. ● Cassandra ● MongoDB ● Couchbase ● Redis NoSQL 19
  • 20. ● PostGIS ● Neo4j ● Titan Geospatial & Graph 20
  • 21. When it's a: ● log ● queue ● cache ● photo ● disk directory ● spreadsheet ● data warehouse/cube/shed/lake/garage/whatever Use the right tool for the job. When Not to Use a Database 21
  • 22. 22
  • 23. ● MySQL - Wikipedia ● First released in 1995 ● Pluggable storage engines ● Written in the age of spinning rust ● Transactional engine InnoDB released in 2001 ● Oracle buys InnoDB in 2005 ● Bought by Sun in 2008 ● Oracle bought Sun in 2010 Brief History of MySQL 23
  • 24. ● It is meant to work best the most of the time ● It is hard impossible to optimize for all cases ● That sometimes surprises you "Performs extremely well in the average case" - Wikipedia MySQL - Why Does it Do What It Does 24
  • 25. ● The manual is awesome ● DBA StackExchange ● So many forums - "just <search engine> for it" ● Conferences ● Lots of great open source The MySQL Community 25
  • 26. ● MySQL Manual - Server System Variables - 500+ ● 400+ status metrics Don't freak out. You don't have to care about these yet :) Highly Configurable 26
  • 27. 27
  • 28. ● Source, package, #whatever ● Drop a config in /etc/my.cnf: ○ basedir, datadir ○ socket ○ pid-file ○ server-id Installation 28
  • 29. ● Error log ● Slow log ● Don't log warnings for now MySQL Files - Logs 29
  • 30. ● You may have lots of these: ○ Binary logs ○ Relay logs MySQL Files - Replication Logs 30
  • 31. ● innodb_buffer_pool_size ~ 75% memory ● Worry about the rest of it later MySQL Config - memory 31
  • 32.
  • 33. ● service mysql start or ● /etc/init.d/mysql start ● Runs on port 3306 ● Throw a party ● Or just connect to it ○ Use the mysql command Let's Fire This Puppy Up 33
  • 34. jsnyder@db:~$ ps auxw| fgrep mysql root 16661 0.0 0.0 4460 1692 ? Ss Oct09 0:00 /bin/sh /usr/bin/mysqld_safe mysql 17877 1.2 88.8 30221480 27893084 ? Sl Oct09 318:00 /usr/sbin/mysqld --basedir=/usr --datadir=/nail/databases/mysql/data --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/nail/databases/mysql/log/err.log --pid-file=/nail/databases/mysql/mysqld.pid --socket=/nail/databases/mysql/mysqld.sock --port=3306 34
  • 35. jsnyder@db:~$ mysql -u root Welcome to the MySQL monitor. Commands end with ; or g. Your MySQL connection id is 1124438 Server version: 5.6.32-78.1-log Percona Server (GPL), Yelp Release 1 Copyright (c) 2009-2017 Percona LLC and/or its affiliates Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or 'h' for help. Type 'c' to clear the current input statement. mysql> 35
  • 36. mysql> show databases ; +--------------------+ | Database | +--------------------+ | information_schema | | mysql | | performance_schema | | sys | +--------------------+ 4 rows in set (0.02 sec) 36
  • 37. ● Find the start script ● Find that err.log ● Run mysqld_safe manually When it Won't Start 37
  • 39. ● MySQL uses GRANTS ● Users are a combination of: ○ User (name) ○ Password ○ Host - % is wildcard ● localhost uses unix domain socket ● 127.0.0.1 uses TCP/IP MySQL Grants 39
  • 40. ● GRANTS can be for different levels: ○ Global *.* ○ Database yelp_db.* ○ Table yelp_db.user MySQL Grant Levels 40
  • 41. mysql> CREATE USER 'yolo'@'%' IDENTIFIED BY '' ; Query OK, 0 rows affected (0.01 sec) mysql> GRANT ALL ON *.* TO 'yolo'@'%' WITH GRANT OPTION ; Query OK, 0 rows affected (0.00 sec) mysql> GRANT SELECT, INSERT, UPDATE, DELETE ON first_db.* TO 'app'@'10.10.10.%' IDENTIFIED by 'sweetpassword' ; Query OK, 0 rows affected (0.00 sec) mysql> GRANT CREATE, DROP, INDEX, TRIGGER, SELECT, INSERT, UPDATE, DELETE ON *.* TO 'schema_manager'@'10.%' IDENTIFIED BY 'yougettheidea' ; Query OK, 0 rows affected (0.01 sec) 41
  • 42. mysql> select user, host from mysql.user ; +----------------+------------+ | user | host | +----------------+------------+ | yolo | % | | schema_manager | 10.% | | app | 10.10.10.% | | mysql.sys | localhost | | root | localhost | +----------------+------------+ 5 rows in set (0.00 sec) 42
  • 43. mysql> show grants for 'root'@'%'; +-------------------------------------------------------------+ | Grants for root@% | +-------------------------------------------------------------+ | GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION | | GRANT PROXY ON ''@'' TO 'root'@'%' WITH GRANT OPTION | +-------------------------------------------------------------+ 2 rows in set (0.00 sec) mysql> show grants for current_user ; +-------------------------------------------------------------+ | Grants for root@% | +-------------------------------------------------------------+ | GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION | | GRANT PROXY ON ''@'' TO 'root'@'%' WITH GRANT OPTION | +-------------------------------------------------------------+ 2 rows in set (0.00 sec) 43
  • 44. ● Make sure MySQL is running ● On that host & port ● Match user & host in error with grants ● Read err.log ● Use skip-grant-tables to start over Debugging Connection Issues 44
  • 45. ● Add skip-grant-tables to your my.cnf ● Restart MySQL ALTER USER 'root'@'localhost' IDENTIFIED BY 'MyNewPass'; What if You Don't Know the Password? 45
  • 46. ● Start with broad grants ('user'@'%') ● Refine from there Debugging MySQL Connection Errors 46
  • 47. mysql -u myuser -p mypass -h myhost -P myport jsnyder@db:~$ mysql -u myuser -h myhost ERROR 1045 (28000): Access denied for user myuser@'10.10.10.10' (using password: NO) jsnyder@db:~$ mysql -u myuser -p -h myhost Enter password: ERROR 1045 (28000): Access denied for user 'myuser'@'10.10.10.10' (using password: YES) jsnyder@db:~$ mysql -u myuser -p -h myhost Enter password: ERROR 2005 (HY000): Unknown MySQL server host 'myhost' (0) 47
  • 48. 48
  • 49. We use: To ORM or not to ORM? Developing Against MySQL 49 Language Python MySQL Client Library mysqlclient-python (there is also Connector/Python) Object Relational Mapping (ORM) SQLAlchemy Schema Manager Hand-rolled system
  • 50. ● Have developers read up on normal form ○ It's OK to denormalize for performance (later) ● Use one table for one thing ● Try not to get too fancy right away Reference: Database Normalization Basics About That Schema 50
  • 51. 51
  • 52. ● Don't swallow database client warnings/errors ● Track client error rates ● Make sure devs get error feedback Build feedback to go directly to developers. When Developing Against MySQL 52
  • 53. ● You can start simple ● Reads and writes scale differently ● Use discovery, load balancer or proxy ○ Smartstack ○ ProxySQL ○ MySQL Router Use technology you already have if possible. Application Connections 53
  • 54. 54
  • 55. ● When you care about your database ○ Back it up to a safe place ○ Test restoring them ○ Have more than one Back to the Administration Part 55
  • 56. ● Binary a.k.a. hot backups ● Logical backups ● Point in time recovery ○ tl;dr: backup your binary logs Different Kinds of Backups 56
  • 57. ● "They're Hot" ● MySQL Enterprise ● xtrabackup ○ Streaming Binary Backups 57
  • 58. ● Spoiler alert: it's SQL ● mysqldump ● Best run on an offline/not replicating DB ● Sometimes the only way to upgrade Logical Backups 58
  • 59. ● Logs of all changes ● Can be used to "roll" a DB forward in time ● More on these later... ● Copy & compress Sample command to look at these: mysqlbinlog --base64-output=DECODE-ROWS -v mysql-bin.000001 Backing up the Binary Logs 59
  • 60. ● Backups are only as good as your ability to use them ● Test your restores Testing Backups 60
  • 61. ● mysqlbinlog turns binary logs into SQL ● Use the coordinates you saved ● It's documented, but I'm not aware of a great tool Point in Time Recovery With Binary Logs 61
  • 62. jsnyder@db:~$ cat /path/to/backup/xtrabackup_binlog_info mysql-bin.000003 589767 jsnyder@db:~$ mysqlbinlog /path/to/binlogs/mysql-bin.000003 /path/to/binlogs/mysql-bin.000004 --start-position=589767 --stop-datetime="2017-10-29 09:00:00" | mysql <args> 62
  • 63. ● Try it before you need it ● We call these "war games" ○ And have failover exercises ● The best plan is one you know will work! More on replication next. Practice! 63
  • 64. 64
  • 65. MySQL uses the terms MASTER and SLAVE. It is modern and inclusive to use PRIMARY and REPLICA. This talk uses MASTER and REPLICA. But First, a Word About Words 65
  • 66. ● Basics ○ One master + multiple replicas ○ Client-initiated, client configured ○ The replication user also needs grants GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'replication'@'10.%' ... Replication 66
  • 67. ● Master executes statement, writes it to: ○ Binary log ● Replica's IO_THREAD reads from master, writes it to: ○ Relay log ● Replica's SQL_THREAD executes statement, writes it to: ○ Binary log Replicas can have their own replicas! Logs & Threads 67
  • 68. ● Unique server-id is a must ● Binary log formats ○ STATEMENT ○ ROW ○ MIXED ● Log-slave-updates = ON ● master_info_repository = TABLE MySQL won't execute events from it's own server-id. Basic Configuration 68
  • 69. jsnyder@db:~$ fgrep bin /etc/my.cnf # binary logging is required for replication log-bin = ../master_log/mysql-bin relay-log = ../relay_log/relay-bin binlog_format = statement jsnyder@db:~$ fgrep server-id /etc/my.cnf server-id = 171739394 jsnyder@db:~$ fgrep slave /etc/my.cnf log_slow_slave_statements = 1 log-slave-updates = 1 slave_net_timeout = 30 slave-skip-errors = 1141 69
  • 70. ● CHANGE MASTER TO ○ MASTER_USER = ○ , MASTER_PASSWORD = ○ , MASTER_HOST = ● If using GTID: ○ , MASTER_AUTO_POSITION = 1 ● If not: ○ , MASTER_LOG_FILE = ○ , MASTER_LOG_POSITION = Configuring Replication 70
  • 71. ● START SLAVE ● SHOW SLAVE STATUS Start it up 71
  • 72. mysql> SHOW SLAVE STATUS G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 10.10.10.44 Master_User: replication_user Master_Port: 3306 Master_Log_File: mysql-bin.000008 Read_Master_Log_Pos: 914832470 Relay_Log_File: relay-bin.000014 Relay_Log_Pos: 914832633 Relay_Master_Log_File: mysql-bin.000008 Slave_IO_Running: Yes Slave_SQL_Running: Yes ... 72
  • 73. mysql> SHOW SLAVE STATUS G … (continued) Last_Errno: 0 Last_Error: ... Exec_Master_Log_Pos: 914832470 ... Seconds_Behind_Master: 0 ... Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: 73
  • 74. ● Replication errors in two places ● err.log 2017-10-28 17:22:06 5061 [ERROR] Slave SQL: Error 'Duplicate entry '14' for key 'PRIMARY'' on query. Default database: 'yelp_db'. Query: 'INSERT INTO user (id, name) VALUES ('14', 'Barbara')', Error_code: 1062 ● SHOW SLAVE STATUS: Last_SQL_Errno: 1062 Last_SQL_Error:'Duplicate entry '14' for key 'PRIMARY'' on query. Default database: 'yelp_db'. Query: 'INSERT INTO user (id, name) VALUES ('14', 'Barbara')' What Could Possibly go Wrong? 74
  • 75. ● Who had the problem? [ERROR] Slave SQL: Error ● What was it trying to do? Query: 'INSERT INTO user (id, name) VALUES ('14', 'Barbara')' ● What happened? 'Duplicate entry '14' for key 'PRIMARY'' on query. ● Where was this? Default database: 'yelp_db Try Breaking Errors Up 75
  • 76. ● Wrong binary log coordinates ● Inconsistent data Common Causes of Errors 76
  • 77. 77
  • 78. ● If you care about it, monitor it ● Lots of open source options ○ Built-in to your monitoring solution ○ percona-toolkit ○ Cheap scripts ○ Look on github Humans like pictures graphs! Monitoring - Server 78
  • 79. ● Disk space: ○ We file Jira ticket at 85% ○ And page at 95% ● Is MySQL running, does it seem OK? ● Are it's cron jobs completing? ● Does it's config match running state? Host Stuff 79
  • 80. ● Auto-increment id limits ● Binary log growth ● Killed queries ● Replication hierarchy ● read_only ● Temporary tables on disk ● Threads connected ● Threads running ● More fancy replication checks MySQL Server Stuff 80
  • 81. ● Have your application: ○ Track client errors ○ Track query times ○ Bonus: query annotations select id, name from user where id = 1 /* host ; code_path.py */ MySQL Client Stuff 81
  • 82. 82
  • 83. Kill and Log: ● Long-running queries ● Long-running transactions ● Too many of the same queries Send feedback directly to developers via the client. Automated Defense 83
  • 84. ● MySQL client will receive errors ○ And retry ○ Or fail gracefully ● Track client errors & statistics ● Alert devs where reasonable Let developers see immediate feedback of a change! Handling Client Errors 84
  • 85. Before devs push to production they can test for: ● Queries that will be slow ● Transactions that do too much Send feedback directly to developers via the client. Automated Testing 85
  • 86. Seems like a lot of work! Let's moment to reflect on humans and feelings. Is All This Really Worth it? 86
  • 87. 87
  • 88. 88
  • 89. 89
  • 90. If you get paged: ● And it was because of someone else's change ● You might feel #negative_emotion ● Especially if it happens again… Think About it - Ops Perspective 90
  • 91. If someone tells you they got paged: ● You do something about it, you didn't know ● But if it happens again… ● You might feel #negative_emotion Think About It - Dev Perspective 91
  • 92. Will: ● Let developers discover issues & fix on their own ● Faster iteration while developing ● Confidence when pushing ● Keep the pager for when you're really needed <3 <3 <3 Useful Automation/Monitoring 92
  • 93. 93
  • 94. 94
  • 95. Questions, comments, quemments? We covered: 1. MySQL History, Philosophy, Community 2. Installation, Basic Configuration, Launching 3. Using it, Backing it up 4. Replication basics, Monitoring 5. Database Defense & fast feedback to developers Let's Talk About It 95
  • 96. Here is what we'll cover when we get back ● Running in Production ● Being Prepared ● Advanced Monitoring and Investigation ● Fantastic Errors and Where to Find Them ● When Things Are "Slow" ● Advanced Tuning ● Replication and Scaling Out ● Discussion Before we Break 96
  • 97. 97
  • 98. Part 2a: ● Running in Production ● Being Prepared ● Advanced Monitoring and Investigation ● Fantastic Errors and Where to Find Them And we're back 98
  • 99. Part 2b: ● When Things Are "Slow" ● Advanced Tuning ● Replication and Scaling Out ● Discussion And we're back 99
  • 100. 100
  • 101. Some things to think about: ● Expectations ● SLOs - uptime and query time ● Caching where you can ● Degrading gracefully Be prepared, or prepare to be surprised! Welcome to Production! 101
  • 102. Nothing is up 100% ● Get PM's and devs involved ● Put features behind toggles ● Maintenance mode(s) ● Read-only mode Because We Measure Uptime in 9's 102
  • 103. 103
  • 104. Embrace your fears: ● Think about what can go wrong ● Plan a response - write Runbooks! ● Practice I don't always test in production... Preparation For the Worst 104
  • 105. Sometimes you have to do this anyway: ● Upgrades ● New hardware ● New features mean new tables & queries Try to use the same procedures every time. It's Not All Bad 105
  • 106. ● Host failures ● Network failures ● Provider failures ● Overwhelming usage What Could go Wrong? 106
  • 107. 107
  • 108. Normalize maintenance & disaster recovery ● Replacing a replica ○ Same as building a new one ● Replacing a master ○ Same as promoting a new one ● Cloning a database ○ Same as restoring a backup Best Practices - Server 108
  • 109. ● Give developers insight into DB health during: ○ Pushing code ○ Traffic surges Best Practices - Application 109
  • 110. 110
  • 111. ● Blameless Post-mortems ● Retrospectives Learn From Mistakes 111
  • 112. 112
  • 113. ● Monitoring & alerting is pretty much ○ Collecting data ○ Using it Advanced Monitoring and Investigation 113
  • 114. Examples of things to monitor on a DB server: ● Is MySQL running? ● Load % CPU Idle ● Disk usage ● Critical processes ● Critical jobs Monitoring the Server 114
  • 115. ● Auto-increment id limits ● Binary log growth ● Killed queries ● Replication delay, hierarchy ● read_only ● Temporary tables on disk ● Threads connected ● Threads running ● Fancy replication checks Monitoring MySQL 115
  • 116. SHOW GLOBAL STATUS has counters & gauges ● Queries per second ● Threads running ● Threads connected ● Slow queries ● Open transactions Monitoring MySQL Usage 116
  • 117. You can monitor these through our client: ● Error rate ○ Rates per type ● Log all warnings & errors Monitoring MySQL Client 117
  • 119. Ways your monitors can send you info: ● Email ● Chat ● Tickets ● Pages Chatbots are so hot right now. Alerting 119
  • 121. Make sure the alert: ● Goes to the right team! ● Contains ○ A link to the runbook/guide ○ The first step to take Make your Alerts Actionable 121
  • 122. Nothing is perfect the first time around ● Tune thresholds ● Keep track of alerts and response ● Handover to the next person down the line If it happens consistently, automate the response! Iterate and Pay it Forward 122
  • 123. 123
  • 124. 124
  • 125. ● Start with the read some logs/part ● Iterate, refine, shorten Have a Runbook/Guide 125
  • 126. Cheapa$$ cron jobs capture this every minute: ● ps ● mysql -e "SHOW PROCESSLIST" ● mysql -e "SHOW ENGINE INNODB STATUS" And this once a day: ● mysql -e "SHOW TABLE STATUS" ● mysqldump --no-data What happened just before I was paged? Tracking More Than Metrics 126
  • 127. jsnyder@db:~$ cat ps-log.sh #!/bin/bash [ -d /var/log/ps ] || mkdir /var/log/ps ( date -R /bin/ps auxf free echo ) | bzip2 -c >> /var/log/ps/ps-`/bin/date +%Y%m%d-%H`.bz2 127
  • 128. Collect forensic data about MySQL when problems occur. ● Can run as a daemon ● Highly configurable ● Waits for a trigger condition ● Collects a ton of information pt-stalk 128
  • 129. 129
  • 130. ● Connection Errors ● Client Errors ● Server Errors ● Replication Errors 2013, 'Lost connection to MySQL server during query' Fantastic Errors... 130
  • 131. ● Your application's log ● MySQL's error_log ● SHOW SLAVE STATUS ● SHOW ENGINE INNODB STATUS Just <insert search engine here> for it... And Where to Find Them 131
  • 132. Traceback (most recent call last): File "/blahblah/blah/blah.py", line 357, in run return self.execute() File "/blah/blah.py", line 359, in execute return action(**match) File "/blah/blaah.py", line 172, in blaah_html return self.xtmpl('blaah.blaah', env=self.blaah_env(blaah_form_obj)) File "/lib/blah-blah/orm/session.py", line 470, in __exit__ self.rollback() File "/lib/blah-blah/bleh.py", line 238, in commit self._delegate.commit() OperationalError: (2013, 'Lost connection to MySQL server during query') None None 132
  • 133. ● Try running the query itself ○ Does it return a huge number of rows? Sometimes? ○ Return a large amount of data? ○ Increase max_allowed_packet ● Does it go away if you increase net_read_timeout? ● Is the application server overloaded? ● It might be the network Debugging Lost Connections 133
  • 134. ● Sometimes MySQL goes away… OperationalError: (2006, 'MySQL server has gone away') MySQL Goes Away... 134
  • 135. 135
  • 136. A sensitive subject: ● You may store sensitive information in your DB ○ Email address ○ First & last names ○ Payment data ● You probably don't want it in your logs ● pt-fingerprint When an Error Contains a Query 136
  • 137. You'll find these in the error_log ● log_error_verbosity = 1 ● start/stop messages ● replication information ● crashes Server Errors 137
  • 138. InnoDB: Warning: a long semaphore wait: --Thread 140375092094720 has waited at row0purge.cc line 770 for 241.00 seconds the semaphore: S-lock on RW-latch at 0x1346820 '&dict_operation_lock' a writer (thread id 140375041206016) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file row0purge.cc line 770 Last time write locked in file /dist/work/percona-server/storage/innobase/row/row0mysql.cc line 3827 InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info: InnoDB: Pending preads 5, pwrites 0 138
  • 139. ● MySQL will dump what info it can to the log ● mysqld_safe will attempt to restart MySQL ● Crashes are not normal MySQL Crashes 139
  • 140. 16:22:00 UTC - mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against blah blahblah blah blah blah blah blah. key_buffer_size=268435456 read_buffer_size=131072 max_used_connections=8 max_threads=41 thread_count=2 connection_count=2 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 309930 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. 140
  • 141. ● OOM Killer ● Corrupted Data ● Hardware Failure ● Bug Common Crash Causes 141
  • 142. 142
  • 143. Lots of things can cause this: ● Individual statements can be slow ● Lots of queries ● Lots of slow-ish queries Find out what resource is in contention. When Things Are "Slow" 143
  • 144. Examples here are from our open dataset. Not queries we use in real-life :) Tuning an Individual Query or Table 144
  • 145. # Time: 2017-10-27T19:38:53.671918Z # User@Host: root[root] @ localhost [] Id: 13 # Query_time: 4.947666 Lock_time: 0.000416 Rows_sent: 5 Rows_examined: 610939 SET timestamp=1509305933; select count(*), r.stars from review r join business b on r.business_id = b.id join category c on b.id = c.business_id where c.category = 'Soul Food' group by r.stars ; 145
  • 146. mysql> select count(*), r.stars from review r join business b on r.business_id = b.id join category c on b.id = c.business_id where c.category = 'Soul Food' group by r.stars ; +----------+-------+ | count(*) | stars | +----------+-------+ | 2052 | 1 | | 1785 | 2 | | 2356 | 3 | | 5432 | 4 | | 8779 | 5 | +----------+-------+ 5 rows in set (4.95 sec) 146
  • 147. mysql> EXPLAIN select count(*), r.stars from review r join business b on r.business_id = b.id join /*snip*/ ... G *********************** 1. row *********************** id: 1 select_type: SIMPLE table: category partitions: NULL type: ALL possible_keys: fk_categories_business1_idx key: NULL key_len: NULL ref: NULL rows: 587969 filtered: 10.00 Extra: Using where; Using temporary; Using filesort 147
  • 148. *********************** 2. row *********************** id: 1 select_type: SIMPLE table: business partitions: NULL type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 68 ref: yelp_db.c.business_id rows: 1 filtered: 100.00 Extra: Using index 148
  • 149. *********************** 3. row *********************** id: 1 select_type: SIMPLE table: review partitions: NULL type: ref possible_keys: fk_reviews_business1_idx key: fk_reviews_business1_idx key_len: 68 ref: yelp_db.c.business_id rows: 25 filtered: 100.00 Extra: NULL 3 rows in set, 1 warning (0.00 sec) 149
  • 150. mysql> pager fgrep rows PAGER set to 'fgrep rows' mysql> EXPLAIN select count(*), r.stars from review r join business b on r.business_id = b.id join category c on b.id = c.business_id where c.category = 'Soul Food' group by r.stars G rows: 587969 rows: 1 rows: 25 3 rows in set, 1 warning (0.00 sec) 150
  • 151. mysql> nopager PAGER set to stdout mysql> select 587969 * 1 * 25 ; +-----------------+ | 587969 * 1 * 25 | +-----------------+ | 14699225 | +-----------------+ 1 row in set (0.00 sec) # From our slow query log, the numbers can be pretty off! # Query_time: 4.947666 Lock_time: 0.000416 Rows_sent: 5 Rows_examined: 610939 151
  • 152. mysql> alter table category add key (category) ; Query OK, 0 rows affected (1.53 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> alter table review drop key fk_reviews_business1_idx , add key( business_id, stars) ; Query OK, 0 rows affected (38.23 sec) Records: 0 Duplicates: 0 Warnings: 0 152
  • 153. mysql> alter table category add key (category) ; Query OK, 0 rows affected (1.53 sec) Records: 0 Duplicates: 0 Warnings: 0 *************************** 1. row *************************** id: 1 select_type: SIMPLE table: c partitions: NULL type: ref possible_keys: fk_categories_business1_idx,category key: category key_len: 768 ref: const rows: 235 filtered: 100.00 Extra: Using temporary; Using filesort 153
  • 154. mysql> select count(*), r.stars from review r join business b on r.business_id = b.id join category c on b.id = c.business_id where c.category = 'Soul Food' group by r.stars ; +----------+-------+ | count(*) | stars | +----------+-------+ | 2052 | 1 | | 1785 | 2 | | 2356 | 3 | | 5432 | 4 | | 8779 | 5 | +----------+-------+ 5 rows in set (0.03 sec) 154
  • 155. ● Long ALTER TABLE's can hurt performance ● pt-online-schema-change ● gh-ost Changing the Schema IRL 155
  • 156. There's a a great manual page How MySQL Uses Indexes PERFORMANCE_SCHEMA can give you stats on index usage. Keep in mind: ● Adding indexes slows down writes a bit ● Cardinality ● Leftmost-prefixing And Just a Bit More on Indexes 156
  • 157. 157
  • 158. 158
  • 159. Find them using SHOW PROCESSLIST Or SELECT … FROM information_schema.processlist You can use cheap bash commands to sort & find culprits. Lots of Queries at Once 159
  • 160. mysql -uroot -e "show full processlist" | awk '{print $5}' | sort | uniq -c | sort -gr | head jsnyder@db:~$ mysql -uroot -e "show full processlist" | awk '{print $5}' | cut -c 1-70 | sort | uniq -c | sort -gr | head 521 Sleep 109 select count(*), r.stars from review r j 3 Query 2 Connect 1 Command # kill them with pt-kill. Use --print to test first! pt-kill --config /etc/mysql/emergency-kill.conf --kill --match-info 'your query' --daemonize --run-time=60 160
  • 161. You can: ● Some slow queries run infrequently ● Some slowish queries run all the time ● Visualization tools like Anemometer, VividCortex Take a tcpdump & analyze with pt-query-digest: pt-query-digest --type tcpdump tcpdump.txt Finding the Next Thing To Optimize 161
  • 162. 162
  • 163. ● I/O ● Memory ● Tables/Transactions Tuning MySQL/InnoDB 163
  • 164. InnoDB has a ton of knobs and dials ● Innodb_io_capacity ● Optimize your filesystem ● Add more memory, larger/more buffer pools ● Tinker with innodb_flush_method There is a whole manual page dedicated to optimizing InnoDB disk I/O. I/O is Always the Slowest 164
  • 165. Think of the InnoDB buffer pool as InnoDB's cache ● Increase the amount of memory to reduce I/O Rough math: innodb_buffer_pool_instances * innodb_log_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads MySQL Memory Calculator looks out of date, but can give you a pretty good idea... How InnoDB Uses Memory 165
  • 166. To view running transactions: ● SHOW ENGINE INNODB STATUS ● SELECT * FROM information_schema.INNODB_TRX G On Transactions 166
  • 167. mysql> SHOW ENGINE INNODB STATUS ; ... ------------ TRANSACTIONS ------------ Trx id counter 50806393604 Purge done for trx's n:o < 50806389839 undo n:o < 0 state: running but idle History list length 527 LIST OF TRANSACTIONS FOR EACH SESSION: ... 167
  • 168. ... ---TRANSACTION 50806392298, ACTIVE 12 sec starting index read mysql tables in use 1, locked 1 314 lock struct(s), heap size 46632, 358 row lock(s), undo log entries 353 MySQL thread id 23649033, OS thread handle 0x7f6bd31c7700, query id 32707038971 10.10.10.40 application_user updating UPDATE business SET ... Trx read view will not see trx with id >= 50806392299, sees < 50806389830 168
  • 169. Reduce lock contention by: ● Decreasing the number of locks your transactions use ● The total time of your transactions ● Making writes faster See InnoDB Locking Explained with Stick Figures Improving Transaction Efficiency 169
  • 171. Let's Hit Pause For a Moment 171
  • 172. 172
  • 173. Ways to use replication: ● Take load off the master ● Send reads to the replicas ● Having a separate reporting database ● Delayed replica with pt-slave-delay Replication and Scaling Out 173
  • 174. ● Master writes binary logs ● Each replica has 2 threads: ○ IO_THREAD: reads from master, writes relay logs ○ SQL_THREAD: runs SQL from relay logs ● Serial per database with parallel replication How MySQL Replication Works Again 174
  • 175. ● master-master ○ One master "live", or ○ auto_increment_increment, offset ● master, intermediate master ● master(s) with many replicas Some Replication Strategies 175
  • 178. One Master, Many Replicas 178
  • 180. masters with their own replicas 180
  • 181. Think About What You Want to Do 181
  • 182. ● Enable GTID - easy reparenting ● orchestrator - GUI and tooling ● pt-table-checksum - check consistency ● ProxySQL - can help you split your read-write traffic Tools For Replication 182
  • 183. 183
  • 184. ● Don't write to sketchy databases ● Don't let your application write to replicas ● use read_only=1 to prevent writes to replicas ● SHOW MASTER STATUS, SHOW SLAVE STATUS Remember, a host that goes away may wander back... When in Doubt, Throw it Out 184
  • 185. 1. Set old master to read_only=1 2. CHANGE MASTER TO if necessary 3. Configure your application to write to new master 4. Set new master to read_only=0 Never allow writes to a replica unless you know that's what you want! Sample Master Promotion Plan 185
  • 186. 186
  • 187. Common causes: ● IO_Thread can't keep up ○ Local load or slow/lossy network ● SQL_Thread can't keep up ○ Local load, long statements or transactions ○ mysqlbinlog to look at transactions Speaking of Lag... 187
  • 188. You can always phone a friend: ● MySQL has a big, friendly online community ● Oracle MySQL, Percona, MariaDB offer support Get the most out of your interactions with experts by providing lots of context and what you've already tried. When You're Really Stuck 188
  • 189. 189
  • 190. We covered: ● Running in Production ● Being Prepared ● Advanced Monitoring and Investigation ● Fantastic Errors and Where to Find Them ● When Things Are "Slow" ● Advanced Tuning ● Replication and Scaling Out Let's Talk About It 190
  • 193. ● MySQL Manual ● MySQL Utilities ● Percona Toolkit ● Orchestrator: MySQL Replication Topology Manager ● ProxySQL: High Performance MySQL Proxy ● mysql-sys - tools for using performance_schema ● Anemometer - Slow Query Monitor Appendix - Links to Resources 193
  • 194. ● Database Reliability Engineering by Laine Campbell, Charity Majors ● High Performance MySQL by Baron Schwartz, Peter Zaitsev & Vadim Tkachenko Appendix - Books 194
  • 195. ● Database Defense in Depth by @geodbz ● GrossQueryChecker: Don't get surprised by query performance in production! by @jcsuperstar ● InnoDB Locking Explained with Stick Figures by @billkarwin ● Blameless PostMortems and a Just Culture by @allspaw ● Literally any Percona Blog Post by @svetsmirnova Appendix - Blog Posts 195
  • 196. ● Pexels - Images licensed under CC0 ● Icons from The Noun Project ● #WOCInTechChat ● The Jopwell Collection Appendix - Image Sources 196
  • 197. ● Yelp Dataset Other datasets: ● MySQL's Employees Sample Database ● Percona's list of Sample datasets Appendix - Yelp Dataset 197