4. Why you are here
• You want to understand what PS is
• You know what it is, but want to know a bit more
• You’d like to grill the speaker with some nasty
questions about it (wait for the end!)
5. Agenda
• General definition
• Implementation
• Filtering
• Performance impact
• Tracing an ALTER TABLE
• Get help from Sys schema
• How to use it and dig
• Memory utilization
• Replication
• Command reference and Steps
7. Assumptions, notes & facts
• Based on MySQL 5.7.6
• Many differences from previous versions (parameter names and not
only)
• Full presentation is > 100 slides (no worries, those are for your
reference).
• Tests done using multiple Application nodes
– Write (Primary tables 372 bytes, secondary tables 46K)
– Reads on Primary (int,Date) & Secondary (int or Varchar)
– Some reads on purpose doing table scan
– Deletes by range on int (PK)
8. Performance Schema - what is it?
The PERFORMANCE_SCHEMA is a way to introspect the
internal execution of the server at runtime.
The performance schema focuses primarily on performance
data, as opposed to the INFORMATION_SCHEMA
whose purpose is to inspect metadata.
9. Performance Schema – User point of view
From a user point of view, the performance schema
consists of:
–a dedicated database schema, named
PERFORMANCE_SCHEMA,
–SQL tables, used to query the server internal state or
change configuration settings
10. PSchema – Internal implementation
From an implementation point of view, the performance
schema is a dedicated Storage Engine which exposes
data collected by 'Instrumentation Points' placed in the
server code.
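Because it is exposed as a storage engine, its presence can be verified with ordinary SQL. A minimal sketch, assuming a MySQL 5.7 server:

```sql
-- Confirm the PERFORMANCE_SCHEMA storage engine is available
SELECT ENGINE, SUPPORT, COMMENT
FROM information_schema.ENGINES
WHERE ENGINE = 'PERFORMANCE_SCHEMA';

-- List the tables it exposes
SHOW TABLES FROM performance_schema;
```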
11. PSchema interfaces
The performance schema exposes many different
interfaces, for different components, and for different
purposes:
•Instrument interface (a coding interface provided by implementors)
•Compiling interface
•Server startup interface
•Server bootstrap interface
•Runtime configuration interface
•Internal audit interface
•Query interface
12. PSchema design principles
The primary goal of the performance schema is to
measure (instrument) the execution of the server.
• The parser is unchanged.
• Instrumentation points return "void"; no error is returned
• No dynamic memory allocation; everything is allocated at start-up
• An instrumentation point should not cause thread
scheduling
13. PSchema design principles cont.
Low or zero impact while collecting information
• No performance hit (priority is to collect fast, pushing complexity to
data retrieval)
• Non-intrusive instrumentation (easy for developers to
implement)
• Easy deployment (support multiple versions of the instrumentation
interface, and ensure binary compatibility with each version)
14. PS Runtime Configuration Interface
This is the place where we decide what can run
• Use standard SQL to define what to use
• Tables used for setup:
– setup_actors (identify the threads to monitor by user & host)
– setup_consumers (receiving tables in PS; query by SQL)
– setup_instruments (detailed list of instruments)
– setup_objects (used to include/exclude schemas/objects from
monitoring)
– setup_timers (define which timer is used for events)
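The setup tables above are plain SQL tables, so configuration is just SELECT/UPDATE. A short sketch (assuming default 5.7 settings):

```sql
-- Inspect the current configuration
SELECT * FROM performance_schema.setup_actors;
SELECT * FROM performance_schema.setup_timers;

-- Example: turn off all synchronization wait instruments
UPDATE performance_schema.setup_instruments
SET ENABLED = 'NO', TIMED = 'NO'
WHERE NAME LIKE 'wait/synch/%';
```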
15. Consumer VS Instrument
• Consumers are, in short, where the information will be stored
– events_statements_current
(tables you can query in PS)
• Instruments are the collectors
– statement/sql/select
select * from setup_instruments where name like
'statement/sql/sele%';
16. Pre-filtering VS Post-filtering
• Pre-filtering refers to the ability to perform selective
monitoring before storing the information
• Post-filtering is performed after data collection, excluding
the undesired data, mainly using SQL statements:
SELECT THREAD_ID, NUMBER_OF_BYTES FROM events_waits_history WHERE
EVENT_NAME LIKE 'wait/io/file/%' AND NUMBER_OF_BYTES IS NOT NULL;
• Pre-filtering is more efficient, but you need to know what
you are looking for in advance.
17. Pre-filtering
• Pre-filtering
– Choose only the instruments you really need
– Choose only the objects you are interested in
– Filter by user(s)
– Filter by Consumer (events_statements_current)
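The pre-filtering steps above can be sketched as SQL against the setup tables (the user and schema names here are examples, not from the test environment):

```sql
-- Only the instruments you need
UPDATE performance_schema.setup_instruments
SET ENABLED = 'NO' WHERE NAME NOT LIKE 'statement/%';

-- Only the objects you are interested in (schema name is an example)
UPDATE performance_schema.setup_objects
SET ENABLED = 'NO' WHERE OBJECT_SCHEMA <> 'test';

-- Only one user (name is an example; column set varies by 5.7 point release)
UPDATE performance_schema.setup_actors SET ENABLED = 'NO';
INSERT INTO performance_schema.setup_actors (HOST, USER, ROLE, ENABLED)
VALUES ('%', 'app_user', '%', 'YES');

-- Only the consumers you need
UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES' WHERE NAME = 'events_statements_current';
```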
18. Post-filtering
• Post-filtering
– SQL query
– You can create queries using joins across
performance_schema tables
– Information_schema tables can help you identify the
thread/user
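A minimal post-filtering sketch, joining performance_schema with information_schema to map users to collected statements (the latency cutoff is an arbitrary example):

```sql
-- Map processlist users to PS threads, then filter collected statements
SELECT pl.USER, pl.HOST, esh.EVENT_NAME, esh.SQL_TEXT
FROM information_schema.PROCESSLIST pl
JOIN performance_schema.threads th
  ON pl.ID = th.PROCESSLIST_ID
JOIN performance_schema.events_statements_history esh
  ON th.THREAD_ID = esh.THREAD_ID
WHERE esh.TIMER_WAIT > 1000000000; -- keep only slower events (picoseconds)
```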
19. PS Performance impact
• In MySQL 5.6 it was between 10-30% (reported by
others)
• In MySQL 5.7 I saw between 0.5-4% (tested by me)
– A couple of spikes at 18%, considered an anomaly
• With a hot buffer pool, less impact
• Write operations: less impact
• Wait instruments: higher impact
20. Real example
Monitor activities from one application user coming from
different application servers.
• Set the actor
• Monitor first SQL CRUD
• Choose the history level (Consumers)
• Then dig more
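The steps above can be condensed into a short SQL sketch ('stress' is the example user from these slides):

```sql
-- 1. Set the actor
INSERT INTO performance_schema.setup_actors (HOST, USER, ROLE, ENABLED)
VALUES ('%', 'stress', '%', 'YES');

-- 2. Monitor SQL CRUD statements only
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES'
WHERE NAME IN ('statement/sql/select', 'statement/sql/update',
               'statement/sql/insert', 'statement/sql/delete');

-- 3. Choose the history level (consumers)
UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME IN ('events_statements_current', 'events_statements_history');
```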
21. PS Monitor a specific user
• Add the user
• insert into setup_actors values('%','stress','%','YES');
• [performance_schema]>select * from setup_actors;
+------+--------+------+---------+
| HOST | USER | ROLE | ENABLED |
+------+--------+------+---------+
| % | % | % | NO |
| % | stress | % | YES |
+------+--------+------+---------+
* Bug 76428 (Oracle says it is not a bug; I do not agree)
22. PS Monitor only SQL CRUD
• Set the instruments
update setup_instruments set ENABLED='YES' where NAME IN
('statement/sql/select','statement/sql/update','statement/sql/insert','statement/sql/delete');
select * from setup_instruments where ENABLED='YES';
+----------------------+---------+-------+
| NAME | ENABLED | TIMED |
+----------------------+---------+-------+
| statement/sql/select | YES | YES |
| statement/sql/update | YES | YES |
| statement/sql/insert | YES | YES |
| statement/sql/delete | YES | YES |
+----------------------+---------+-------+
23. Check what is going on
SELECT TH.PROCESSLIST_USER,ISPL.HOST,ISPL.DB,ISPL.User,
TH.PROCESSLIST_ID, ESH.*
from information_schema.processlist ISPL
JOIN threads TH on ISPL.ID=TH.PROCESSLIST_ID
JOIN events_statements_history ESH ON
TH.THREAD_ID=ESH.THREAD_ID
Where ISPL.User='stress'
AND (EVENT_NAME like '%select%' OR EVENT_NAME like
'%insert%' OR EVENT_NAME like '%update%' OR EVENT_NAME
like '%delete%')
24. Results
PROCESSLIST_USER: stress
HOST: 10.0.0.151:53644
DB: test
User: stress
PROCESSLIST_ID: 54
THREAD_ID: 85
EVENT_ID: 20245
END_EVENT_ID: 20245
EVENT_NAME:
statement/sql/delete
SOURCE:
socket_connection.cc:98
TIMER_START:
7939401943217000
TIMER_END:
7939751199462000
TIMER_WAIT: 349256245000
LOCK_TIME: 63000000
SQL_TEXT: DELETE FROM tbtest4 where a
between 2082189 and 2083189
DIGEST:
ece618f657b2f637038cb7f72b1cfbf1
DIGEST_TEXT: DELETE FROM
tbtest4 WHERE a BETWEEN ? AND ?
CURRENT_SCHEMA: test
OBJECT_TYPE: NULL
OBJECT_SCHEMA: NULL
OBJECT_NAME: NULL
OBJECT_INSTANCE_BEGIN: NULL
MYSQL_ERRNO: 0
RETURNED_SQLSTATE: 00000
MESSAGE_TEXT: NULL
ERRORS: 0
WARNINGS: 0
ROWS_AFFECTED: 23
ROWS_SENT: 0
ROWS_EXAMINED: 23
CREATED_TMP_DISK_TABLES: 0
CREATED_TMP_TABLES: 0
SELECT_FULL_JOIN: 0
SELECT_FULL_RANGE_JOIN: 0
SELECT_RANGE: 0
SELECT_RANGE_CHECK: 0
SELECT_SCAN: 0
SORT_MERGE_PASSES: 0
SORT_RANGE: 0
SORT_ROWS: 0
SORT_SCAN: 0
NO_INDEX_USED: 0
NO_GOOD_INDEX_USED: 0
NESTING_EVENT_ID: NULL
NESTING_EVENT_TYPE: NULL
NESTING_EVENT_LEVEL: 0
25. I want to know more!
Activate Stage tracing
• Select the consumers
– update setup_consumers set ENABLED='YES' where name like
'events_stages_%';
• Select Instruments (all)
– update setup_instruments set ENABLED='YES' , TIMED='YES'
where name like 'stage/sql/%';
26. Query the Stage Status
• Current
– select ISPL.HOST,ISPL.DB,ISPL.User,ISPL.ID,ESH.*, SUBSTR(ISPL.Info,1,20) SQLT from
information_schema.processlist ISPL JOIN threads TH on ISPL.ID=TH.PROCESSLIST_ID JOIN
events_stages_current ESH ON TH.THREAD_ID=ESH.THREAD_ID Where ISPL.User='stress' and
info is not null order by TIMER_START
• History
– select ISPL.HOST,ISPL.DB,ISPL.User,ISPL.ID,ESH.*, SUBSTR(ISPL.Info,1,20) SQLT from
information_schema.processlist ISPL JOIN threads TH on ISPL.ID=TH.PROCESSLIST_ID JOIN
events_stages_history ESH ON TH.THREAD_ID=ESH.THREAD_ID Where ISPL.User='stress' AND
ID=52 and info is not null order by TIMER_START DESC
• History Long
– select ISPL.HOST,ISPL.DB,ISPL.User,ISPL.ID,ESH.*, SUBSTR(ISPL.Info,1,20) SQLT from
information_schema.processlist ISPL JOIN threads TH on ISPL.ID=TH.PROCESSLIST_ID JOIN
events_stages_history_long ESH ON TH.THREAD_ID=ESH.THREAD_ID Where ISPL.User='stress'
AND ID=52 and info is not null order by TIMER_START DESC LIMIT 50;
27. Stage Status - Current
• I KNOW you cannot read it on the screen!
• There is a session with ID 90, and I will filter on it
28. Stage Status - History
• Same thing, I know it is small
• I can see all the operations done during the execution
and the time taken
29. Stage Status – History Long
• I can compare different executions
• Something to investigate: WHY does event ID 16142 take so
long?
• BTW the query cache was disabled!?!? (size 0; mode on!!!)
30. PS Summary tables
• Summary tables are automatically generated by PS
• You can reset the values with TRUNCATE
• Query them with SELECT and GROUP BY
• Organized by categories:
– Event Wait, Stage, Statement, Transaction, Object Wait, File I/O,
Table I/O and Lock Wait, Connection, Socket, Memory
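As a sketch, the global statement summary can be read, and reset, like this (assuming a MySQL 5.7 server):

```sql
-- Top statement types by total latency (timers are in picoseconds)
SELECT EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT
FROM performance_schema.events_statements_summary_global_by_event_name
WHERE COUNT_STAR > 0
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;

-- Reset the counters in this summary table
TRUNCATE TABLE performance_schema.events_statements_summary_global_by_event_name;
```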
31. How to trace an Alter on a table?
• Enable the Instruments to monitor
InnoDB Alter
– update setup_instruments set
ENABLED='YES', TIMED='YES' where
name like 'stage/innodb/alter%';
• Use WORK_COMPLETED /
WORK_ESTIMATED
– Status of the operation
HOST: localhost
DB: test
User: stress
ID: 4081
THREAD_ID: 4111
EVENT_ID: 1731
END_EVENT_ID: NULL
EVENT_NAME: stage/innodb/alter
table (read PK and internal sort)
SOURCE: ut0stage.h:241
TIMER_START: 851457100596885000
TIMER_END: NULL
TIMER_WAIT: NULL
WORK_COMPLETED: 88749
WORK_ESTIMATED: 2214165
NESTING_EVENT_ID: 1594
NESTING_EVENT_TYPE: STATEMENT
SQLT: alter table tbtest1
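The ALTER's progress can then be read from the stage tables; a sketch computing a completion percentage from the two counters above:

```sql
-- Progress of a running online ALTER (percentage is approximate)
SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED,
       ROUND(100 * WORK_COMPLETED / WORK_ESTIMATED, 2) AS pct_done
FROM performance_schema.events_stages_current
WHERE EVENT_NAME LIKE 'stage/innodb/alter%';
```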
32. Using SYS schema to query PS
SYS schema is a convenient way to manage and query the
Performance Schema.
• Views
– Two versions (pretty-printed, command-line usable)
• Stored procedures
– Manage the instruments/consumers
– Manage configuration
– Show histograms
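The stored procedures can drive the setup tables for you; a short sketch using procedures shipped with the 5.7 sys schema:

```sql
-- Show what is currently enabled (instruments and threads)
CALL sys.ps_setup_show_enabled(TRUE, TRUE);

-- Enable/disable instruments by pattern
CALL sys.ps_setup_enable_instrument('stage/innodb/alter%');
CALL sys.ps_setup_disable_instrument('wait/synch/%');

-- Latency histogram across normalized statements
CALL sys.ps_statement_avg_latency_histogram();
```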
33. SYS schema setup 1
Clean start.
• Reset all to defaults
– call sys.ps_setup_reset_to_default(1)*
• Set actors
– insert into setup_actors values('%','root','%','NO');
– insert into setup_actors values('%','stress','%','YES');
* Given a difference between 5.6 and 5.7, the procedure needs a fix to accommodate the additional
attribute “ENABLED” in 5.7 (line 20 of the SP)
37. How sys schema organizes data
• Initial review/Overview
• User review
• Statement
• InnoDB Buffer Pool (from Information Schema)
• I/O
• Waits
• Memory usage
38. Example of how to dig
• Which is the most expensive operation on my server?
– Who is doing what
– From where
– Where it is costing more
– How much memory it is taking
39. Who is doing what?
Using:
select * from sys.x$host_summary_by_statement_latency where max_latency > 0;
| host | total | total_latency | max_latency | lock_latency | rows_sent | rows_examined | rows_affected | full_scans |
+--------------+----------+----------------------+--------------------+------------------+-----------+---------------+---------------+------------+
| 10.0.0.153 | 3540524 | 27658534864398068384 | 7574715779257000 | 567306099000000 | 113508806 | 283506393702 | 26396594 | 510331 |
| 10.0.0.151 | 32963074 | 13892931298629665000 | 5326814750390000 | 1322101732000000 | 162953871 | 147355803 | 57977303 | 410706 |
| 10.0.0.152 | 1141135 | 9414853354124562000 | 6580206516327000 | 183363968000000 | 140876442 | 34664668021 | 3971379 | 484817 |
| 10.0.0.13 | 1 | 439140856048176000 | 439140856048176000 | 0 | 0 | 0 | 0 | 0 |
| localhost | 681 | 1305286713760000 | 1305225252156000 | 25516000000 | 805 | 794 | 0 | 22 |
| 192.168.51.1 | 635 | 80742106000 | 9207681000 | 14236000000 | 250 | 250 | 0 | 39 |
•All have full table scans and are not using indexes properly
•10.0.0.153 is heavy on reads, but is also doing modifications
•10.0.0.151 is mainly inserting
•10.0.0.152 is mainly doing reads
40. From where?
Using:
select *, total_latency/total as latency_by_operation from sys.x$host_summary_by_stages
where host like '10%' order by 6 desc limit 10;
| host | event_name | total | total_latency | avg_latency | latency_by_operation |
+------------+------------------------------------------+---------+----------------------+----------------+----------------------+
| 10.0.0.153 | stage/sql/updating | 373841 | 9417613923098048000 | 25191495644000 | 25191495644132.2594 |
| 10.0.0.152 | stage/sql/updating | 100076 | 2141666904448883000 | 21400404736000 | 21400404736888.7945 |
| 10.0.0.153 | stage/sql/update | 1545766 | 15976559090187292000 | 10335690583000 | 10335690583301.2836 |
| 10.0.0.152 | stage/sql/Sending data | 486611 | 4684747633406580000 | 9627294971000 | 9627294971561.6375 |
| 10.0.0.152 | stage/sql/update | 270936 | 2387151366604427000 | 8810757398000 | 8810757398811.6271 |
| 10.0.0.153 | stage/sql/Sending data | 510331 | 1661917815254503384 | 39403175368000 | 3256548818814.6583 |
| 10.0.0.151 | stage/sql/update | 3612146 | 7780662705903419000 | 2154027745000 | 2154027745806.3486 |
| 10.0.0.12 | stage/sql/Finished reading one binlog; | 69 | 126037850052000 | 1826635508000 | 1826635508000.0000 |
| 10.0.0.151 | stage/sql/updating | 6583069 | 5268439570669058000 | 800301435000 | 800301435495.9758 |
| 10.0.0.151 | stage/sql/Waiting for table metadata lock| 4 | 2829687324000 | 707421831000 | 707421831000.0000 |
•153 has a high single-operation cost while updating & the highest total for update
•152 has a high cost updating, but its highest total operation cost is in sending data
•151 has a high total cost for I/O update & updating, and also had some waits for metadata locks
Thread states page: https://dev.mysql.com/doc/refman/5.7/en/general-thread-states.html
42. Add a view
I want to analyze by statement type
and host.
CREATE
ALGORITHM = MERGE
DEFINER = `root`@`localhost`
SQL SECURITY INVOKER
VIEW `sys`.`x$statement_analysis_by_host` AS
select
EVENT_NAME as event ,
COUNT_STAR as exec_count,
SUM_TIMER_WAIT as total_latency ,
AVG_TIMER_WAIT as avg_latency ,
SUM_LOCK_TIME as lock_latency ,
SUM_ROWS_AFFECTED as changed_rows ,
SUM_ROWS_SENT as sent_rows ,
SUM_ROWS_EXAMINED as examined_rows,
SUM_CREATED_TMP_DISK_TABLES as tmp_table_on_disk,
SUM_CREATED_TMP_TABLES as tmp_table,
SUM_SELECT_FULL_JOIN as join_scan,
SUM_SELECT_FULL_RANGE_JOIN as join_range,
SUM_SELECT_RANGE_CHECK as join_select_check,
SUM_SELECT_SCAN as join_full_scan,
SUM_SORT_MERGE_PASSES as sort_passes,
SUM_SORT_SCAN as sort_scan,
SUM_NO_INDEX_USED as no_index_used,
SUM_NO_GOOD_INDEX_USED as no_good_index
from
events_statements_summary_by_host_by_event_name
where COUNT_STAR > 0
order by avg_latency;
44. Dig more (queries)
We will focus on what is going on on 10.0.0.153
select * from sys.x$statement_analysis order by total_latency desc limit 10\G
********* 1. row ******
query: DELETE FROM tbtest1
WHERE a BETWEEN ? AND ?
db: test
full_scan:
exec_count: 1476041
total_latency: 7618840631251163000
max_latency: 2141828209362000
avg_latency: 5161672766000
lock_latency: 78640125000000
rows_sent: 0
rows_examined: 26761
rows_affected: 21495
tmp_tables: 0
tmp_disk_tables: 0
rows_sorted: 0
sort_merge_passes: 0
********* 2. row ******
query: SELECT tbtest3 …- ? - ? LIMIT ?
db: test
full_scan: *
exec_count: 71828
total_latency: 6219726371992349000
max_latency: 3200860074610000
avg_latency: 86591947040000
lock_latency: 19742168000000
rows_sent: 0
rows_sent_avg: 0
rows_examined: 79995071105
rows_examined_avg: 1113703
rows_affected: 0
rows_affected_avg: 0
tmp_tables: 0
tmp_disk_tables: 0
rows_sorted: 0
sort_merge_passes: 0
********* 6. row ******
query: INSERT INTO tbtest1 ....
db: test
full_scan:
exec_count: 342374
total_latency: 4722320670153881000
max_latency: 1458059844761000
avg_latency: 13792871743000
lock_latency: 341116537000000
rows_sent: 0
rows_sent_avg: 0
rows_examined: 0
rows_examined_avg: 0
rows_affected: 17457402
rows_affected_avg: 51
tmp_tables: 0
tmp_disk_tables: 0
rows_sorted: 0
sort_merge_passes: 0
45. Find waits
I want to find waits, for the threads running from 10.0.0.153
and doing a DELETE operation.
•Use PS tables
–events_waits_history
–events_stages_history_long
–events_statements_history_long
–Threads
•Connect the events using NESTING_EVENT_ID
46. Find waits (query)
SELECT PROCESSLIST_ID,ewh.THREAD_ID,ewh.EVENT_NAME,ewh.SOURCE,
sum(ewh.TIMER_WAIT)/1000000000 as Wait_ms, ewh.OBJECT_SCHEMA,
ewh.OBJECT_NAME, ewh.OBJECT_TYPE, ewh.OPERATION
from events_waits_history ewh
Join events_stages_history_long esth ON ewh.NESTING_EVENT_ID =
esth.EVENT_ID
Join events_statements_history_long esh ON esth.NESTING_EVENT_ID
=esh.EVENT_ID
Join threads th ON ewh.THREAD_ID = th.THREAD_ID
where PROCESSLIST_HOST like '10.%'
group by EVENT_NAME,PROCESSLIST_ID,ewh.THREAD_ID,ewh.EVENT_NAME,ewh.SOURCE,
ewh.OBJECT_SCHEMA, ewh.OBJECT_NAME, ewh.OBJECT_TYPE, ewh.OPERATION
order by ewh.TIMER_WAIT desc limit 50;
61. Other Lock information from SYS
• Meta locks
select * from metadata_locks\G
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: performance_schema
OBJECT_NAME: metadata_locks
OBJECT_INSTANCE_BEGIN: 140045498780240
LOCK_TYPE: SHARED_READ
LOCK_DURATION: TRANSACTION
LOCK_STATUS: GRANTED
SOURCE: sql_parse.cc:5585
OWNER_THREAD_ID: 6141
OWNER_EVENT_ID: 9
• InnoDB lock waits
select * from x$innodb_lock_waits\G
wait_started: 2015-04-06 10:26:21
wait_age: 00:00:47
locked_table: `test`.`tbtest1`
locked_index: PRIMARY
locked_type: RECORD
waiting_trx_id: 111008059
waiting_trx_started: 2015-04-06 10:26:21
waiting_trx_age: 00:00:47
waiting_trx_rows_locked: 1
waiting_trx_rows_modified: 0
waiting_pid: 6104
waiting_query: update test.tbtest1
set b ='aaaaaddaaaasa' where a <> 1582347553
waiting_lock_id: 111008059:284:6:2
waiting_lock_mode: X
blocking_trx_id: 111008056
blocking_pid: 6078
blocking_query: NULL
blocking_lock_id: 111008056:284:6:2
blocking_lock_mode: X
blocking_trx_started: 2015-04-06 09:27:08
blocking_trx_age: 01:00:00
blocking_trx_rows_locked: 18048041
blocking_trx_rows_modified: 16450211
62. InnoDB Lock (example)
select * from INNODB_LOCK_WAITS;
| requesting_trx_id | requested_lock_id | blocking_trx_id | blocking_lock_id |
| 91459149 | 91459149:183:54275:439 | 91323715 | 91323715:183:54275:439 |
select * from INNODB_LOCKS;
| lock_id | lock_trx_id | lock_mode | lock_type | lock_table | lock_index
| 91465549:183:48232:264 | 91465549 | X,GAP | RECORD | `test`.`tbtest1` | IDX_a
select * from information_schema.INNODB_TRX where trx_id=91323715\G
trx_id: 91323715
trx_mysql_thread_id: 269
select * from processlist where id=269;
| ID | USER | HOST | DB | COMMAND | TIME | STATE | INFO
| 269 | root | localhost | test | Sleep | 397 | | NULL
63. PS Replication
• Useful with multi-threaded slave operation
• Group replication
• Reports by applier / coordinator
• Not really useful with single-threaded replication
• Just for fun, run all of them:
– Show slave status\G
– select * from replication_applier_configuration\G
– select * from replication_applier_status\G
– select * from replication_applier_status_by_coordinator\G
– select * from replication_applier_status_by_worker\G
– select * from replication_connection_configuration\G
– select * from replication_connection_status\G
– select * from replication_group_member_stats\G
– select * from replication_group_members\G
66. What you can do with PS
Source: LeFred https://github.com/lefred/pfs2elastic
67. Commands reference slide 1/3
Initial review
select * from sys.x$host_summary;
call sys.ps_statement_avg_latency_histogram()\G
select * from sys.host_summary_by_file_io; <-- summary IO by connections (ip sum not single thread)
select * from sys.host_summary_by_file_io_type; <-- summary by IO with max latency and main consumer (by ip)
select * from sys.x$host_summary_by_stages order by 4 desc; <-- summary latency per stage per ip
select * from sys.x$host_summary_by_statement_latency; <-- summary by statement latency by IP with row examined affected and full
scans
select * from sys.x$host_summary_by_statement_type order by lock_latency desc; <-- summary locking latency by statement and IP
User review
select * from sys.user_summary; <-- summary to know what a user is doing including memory utilization
select * from sys.user_summary_by_file_io; <--- user io and latency
select * from sys.x$user_summary_by_file_io_type; <--- summary by user about io consumer and latency
select * from sys.x$user_summary_by_stages where user like 'stress%' order by avg_latency desc; <--- avg latency by stage event and
user
select * from sys.x$user_summary_by_statement_latency ; <--- total statement latency by user
select * from sys.x$user_summary_by_statement_type where user like 'stress%' order by lock_latency desc; <--- summary per user of
statement with latency
Statement
select * from sys.x$statement_analysis where db='test' order by avg_latency desc limit 2\G <--- slow queries
select * from sys.x$statements_with_errors_or_warnings order by error_pct desc limit 4\G <--- statements generating errors
select * from sys.x$statements_with_full_table_scans order by total_latency desc limit 2\G <--- full table scans (and queries not
using an index)
select * from sys.x$statements_with_runtimes_in_95th_percentile order by max_latency desc\G <--- slow queries
select * from sys.x$statements_with_temp_tables where disk_tmp_tables > 0 order by exec_count,tmp_tables_to_disk_pct desc limit 5\G
<--- Queries with temp tables
68. Commands reference slide 2/3
InnoDB Buffer Pool (from Information Schema) <--- heavy
select * from sys.innodb_buffer_stats_by_schema;
select * from sys.innodb_buffer_stats_by_table;
select * from sys.innodb_lock_waits\G
I/O
select * from sys.x$io_by_thread_by_latency order by max_latency desc limit 10; <----- io latency by thread
DIG MORE???
select * from sys.processlist where conn_id=1659\G <---- which thread from processlist
select * from sys.x$io_global_by_file_by_bytes limit 50; <---- I/O latency/bytes by file
select * from sys.x$io_global_by_file_by_latency limit 50; <---- I/O count of operation /latency per file
select * from sys.io_global_by_wait_by_bytes; <--- I/O global main consumers by wait and bytes
select * from sys.io_global_by_wait_by_latency; <--- I/O global main consumers by latency per
operation read/write
select * from sys.latest_file_io;
More:
select * from sys.x$latest_file_io where thread like '%thread:21'; <-- list specific activity for the thread
select file , count(file), thread, operation from sys.x$latest_file_io group by thread; <-- specific count of the threads
69. Commands reference slide 3/3
Wait
select * from `sys`.`x$waits_global_by_latency`; <-- Lists the top wait events by their total latency, ignoring idle
select * from `sys`.x$wait_classes_global_by_avg_latency; <-- Lists the top wait classes by average latency, ignoring idle
select * from `sys`.x$wait_classes_global_by_latency; <-- Same as above, but totals instead of averages
select * from `sys`.x$waits_by_host_by_latency order by avg_latency desc limit 40; <-- Lists the top wait events per host by their
total latency, ignoring idle
select * from `sys`.x$waits_by_host_by_latency where host='10.0.0.151' order by avg_latency desc limit 40;
select * from events_waits_summary_by_host_by_event_name where host = '10.0.0.151'and count_star >0; <-- similar to the one above
select * from `sys`.x$waits_by_user_by_latency order by avg_latency desc limit 40; <-- Lists the top wait events per user by their
total latency, ignoring idle
select * from `sys`.x$waits_by_user_by_latency where user='stress1' order by avg_latency desc limit 40;
Memory check
select * from `sys`.memory_global_total;
select * from `sys`.x$memory_by_host_by_current_bytes order by current_allocated desc; <-- Summarizes memory use by host.
select * from memory_summary_by_thread_by_event_name where COUNT_ALLOC > 0 and THREAD_ID IN (select THREAD_ID from threads where
PROCESSLIST_HOST='10.0.0.151') order by thread_id, SUM_NUMBER_OF_BYTES_ALLOC desc; <-- Detail memory by host
select * from `sys`.x$memory_by_thread_by_current_bytes order by current_allocated desc limit 50; <-- Summarizes memory use by
thread
select * from memory_summary_by_thread_by_event_name where thread_id in (select THREAD_ID from performance_schema.threads where
PROCESSLIST_HOST='10.0.0.151') and COUNT_ALLOC > 0 order by SUM_NUMBER_OF_BYTES_ALLOC desc; <--- Details for a specific host
select * from `sys`.x$memory_by_user_by_current_bytes order by current_allocated desc limit 50; <---- Summarizes memory use by
user
select * from `sys`.x$memory_global_by_current_allocated order by current_alloc desc limit 50; <---- memory allocated by consumer
select * from `sys`.x$memory_global_by_current_bytes order by current_alloc desc limit 50; <---- Shows the current memory
usage within the server globally broken down by allocation type.
To know the memory used by the Performance Schema
SELECT distinct (sum(CURRENT_NUMBER_OF_BYTES_USED)/1024)/1024 as MB FROM memory_summary_global_by_event_name WHERE EVENT_NAME LIKE
'memory/performance_schema/%';
70. Steps & Commands 1/3
Initial review by SQL
select ss.*,(ss.full_scans/ss.total) * 100 as `FScan_%` from sys.x$host_summary_by_statement_latency as ss where max_latency > 0;
Slower tables
select object_schema, object_name, count_star, sum_timer_wait from performance_schema.table_io_waits_summary_by_table order by 4
desc limit 10;
Find the most expensive stages
select *, total_latency/total as latency_by_operation from sys.x$host_summary_by_stages where host like '10%' order by 6 desc limit 10;
Identify the files wait
select *, (total_latency/total) as latency_by_operation from sys.x$host_summary_by_file_io_type where host like '10%' order by
latency_by_operation desc;
Identify the most expensive by type and host with details
Select
host,event,exec_count,total_latency,lock_latency,changed_rows,sent_rows,examined_rows,tmp_table_on_disk,tmp_table,join_scan,join_range,j
oin_select_check,join_full_scan,sort_passes,no_index_used,no_good_index from sys.x$statement_analysis_by_host where host like '10.%'
order by total_latency desc limit 50;
71. Steps & Commands 2/3
Identify which wait event are the expensive ones
SELECT PROCESSLIST_ID,ewh.THREAD_ID,ewh.EVENT_NAME,ewh.SOURCE, sum(ewh.TIMER_WAIT)/1000000000 as Wait_ms,
ewh.OBJECT_SCHEMA, ewh.OBJECT_NAME, ewh.OBJECT_TYPE, ewh.OPERATION from events_waits_history ewh Join
events_stages_history_long esth ON ewh.NESTING_EVENT_ID = esth.EVENT_ID Join events_statements_history_long esh ON
esth.NESTING_EVENT_ID =esh.EVENT_ID Join threads th ON ewh.THREAD_ID = th.THREAD_ID where PROCESSLIST_HOST like '10.%'
group by EVENT_NAME,PROCESSLIST_ID,ewh.THREAD_ID,ewh.EVENT_NAME,ewh.SOURCE, ewh.OBJECT_SCHEMA, ewh.OBJECT_NAME,
ewh.OBJECT_TYPE, ewh.OPERATION order by ewh.TIMER_WAIT desc limit 50;
Compare with what is GLOBALLY impacting
select * from sys.x$waits_global_by_latency limit 50;
For each Event dig what is happening
select ewh.EVENT_NAME,ewh.SOURCE,sum(ewh.TIMER_WAIT)/1000000000 as
TWait_ms,ewh.OBJECT_NAME,ewh.OBJECT_TYPE,ewh.OPERATION from events_waits_history ewh Join threads th on ewh.THREAD_ID =
th.THREAD_ID where EVENT_NAME like '%handler%' group by
EVENT_NAME,ewh.SOURCE,ewh.OBJECT_NAME,ewh.OBJECT_TYPE,ewh.OPERATION order by TWait_ms desc limit 10;
72. Steps & Commands 3/3
Time to see data files as well
select ewh.EVENT_NAME,ewh.SOURCE,sum(ewh.TIMER_WAIT)/1000000000 as
TWait_ms,ewh.OBJECT_NAME,ewh.OBJECT_TYPE,ewh.OPERATION from events_waits_history ewh Join threads th on ewh.THREAD_ID =
th.THREAD_ID where EVENT_NAME like '%innodb_data_file%' group by
EVENT_NAME,ewh.SOURCE,ewh.OBJECT_NAME,ewh.OBJECT_TYPE,ewh.OPERATION order by TWait_ms desc limit 10;
And compare with GLOBAL wait for table
select table_name,count_read,sum_timer_read,count_write,sum_timer_write,sum_timer_misc from sys.x$ps_schema_table_statistics_io where
count_read > 0 order by sum_timer_misc desc limit 10;
Check Memory
• Account: select thread_id,user,current_count_used,current_allocated,current_avg_alloc,total_allocated from
x$memory_by_thread_by_current_bytes order by current_allocated desc limit 10;
• Event: select event_name,current_count,current_alloc,high_count,high_alloc from x$memory_global_by_current_allocated
where event_name not like '%performance_schema%' order by current_alloc desc limit 10;
• PS: select event_name,current_count,current_alloc,high_count,high_alloc from x$memory_global_by_current_allocated where
event_name like '%performance_schema%' order by current_alloc desc limit 10;
73. Information schema - what is it?
• Focuses on MySQL metadata information
• It is an abstraction; no real tables or directory
• Can be queried with SQL statements (SELECT ..)
• Most information can also be retrieved using SHOW
• Queries may impact server performance
74. Information schema – Access Mode
When writing a SQL statement to access IS, take into
account the access modes:
•SKIP_OPEN_TABLE: Table files do not need to be opened. The information
has already become available within the query by scanning the database
directory.
•OPEN_FRM_ONLY: Only the table's .frm file need be opened.
•OPEN_TRIGGER_ONLY: Only the table's .TRG file need be opened.
•OPEN_FULL_TABLE: The unoptimized information lookup. The .frm and table files
must be opened.
75. Information schema – Use Explain
A simple query:
SELECT TABLE_SCHEMA, ENGINE, COUNT(1) as 'TABLES', sum(TABLE_ROWS) as 'ROWS',
TRUNCATE(sum(DATA_LENGTH)/pow(1024,2),2) as 'DATA (M)',
TRUNCATE(sum(INDEX_LENGTH)/pow(1024,2),2) as 'INDEX (M)',
TRUNCATE((sum(DATA_LENGTH)+sum(INDEX_LENGTH))/pow(1024,2),2) AS 'TOTAL(M)'
FROM information_schema.tables
WHERE TABLE_SCHEMA <> 'information_schema'
AND TABLE_SCHEMA <> 'mysql'
AND TABLE_SCHEMA <> 'performance_schema'
AND TABLE_TYPE = 'BASE TABLE'
GROUP BY TABLE_SCHEMA, ENGINE WITH ROLLUP;
Extra
Using where; Open_full_table; Scanned all databases; Using filesort
76. Information schema - the main tables for the catalog
• Privileges (General/Schema/Table/Column)
• Structure information
(Table/Column/Views/Routines/Triggers/Files/Tablespaces)
• Server Information (Character_sets/Collations/Plugins/Engines)
• Server behaviour (Processlist/Status)
• InnoDB
– Catalog
– Behaviour
78. Processlist – recipe
• Count users by origin and whether they are active
select count(USER) N,USER,SUBSTRING_INDEX(HOST,':',1) HH ,STATE from
processlist WHERE COMMAND !='Sleep' GROUP BY USER,HH;
• List long-running queries
select USER,SUBSTRING_INDEX(HOST,':',1) HH ,STATE,COMMAND,INFO,TIME
from PROCESSLIST where TIME > 4 Order by Time DESC\G
• List threads by state
select COUNT(USER) N, USER,SUBSTRING_INDEX(HOST,':',1) HH
,STATE,COMMAND from PROCESSLIST Group By STATE Order by STATE\G
79. Partitions – recipe
• Crazy query reporting partition information and data
distribution
SELECT TB.TABLE_SCHEMA,TB.TABLE_NAME, TB.TABLE_TYPE,TB.ENGINE,TB.ROW_FORMAT,IBS.SPACE,
IBS.FILE_FORMAT,PARTITION_NAME,PARTITION_ORDINAL_POSITION,TB.TABLE_ROWS,
TB.DATA_LENGTH,TB.INDEX_LENGTH,(TP.TABLE_ROWS/TB.TABLE_ROWS)*100 as PCT_PARTITION_ROWS,
(TP.DATA_LENGTH/TB.DATA_LENGTH)*100 PCT_PARTITION_DATA
from information_schema.TABLES as TB JOIN information_schema.PARTITIONS as TP on
TB.TABLE_SCHEMA = TP.TABLE_SCHEMA AND TB.TABLE_NAME = TP.TABLE_NAME LEFT OUTER JOIN
information_schema.INNODB_SYS_TABLES AS IBS on
CONCAT(TB.TABLE_SCHEMA,'/',CONCAT(TB.TABLE_NAME,CONCAT('#P#',PARTITION_NAME)))=IBS.NAME
where TP.PARTITION_NAME IS NOT NULL;
Extra
Using where; Open_full_table; Scanned all databases
Using where; Open_full_table; Scanned all databases; Using join buffer
(Block Nested Loop)Using where; Using join buffer (Block Nested Loop)
80. Partitions – recipe 2
• Query reporting partition information and data
distribution
SELECT TB.TABLE_SCHEMA,TB.TABLE_NAME, count(TB.TABLE_NAME),
TB.TABLE_TYPE,TB.ENGINE,TB.ROW_FORMAT,
TB.TABLE_ROWS,TB.DATA_LENGTH,TB.INDEX_LENGTH from TABLES as TB JOIN
PARTITIONS as TP on TB.TABLE_SCHEMA = TP.TABLE_SCHEMA AND TB.TABLE_NAME
= TP.TABLE_NAME GROUP BY TB.TABLE_SCHEMA,TB.TABLE_NAME ORDER By
1,2,3,5;
Extra
Open_full_table; Scanned all databases; Using
temporary; Using filesort Using where;
Open_full_table; Scanned all databases; Using join buffer (Block Nested
Loop)
81. Statistics – recipe 1
• Query reporting partition information and data
distribution
SELECT TB.TABLE_SCHEMA,TB.TABLE_NAME, count(TB.TABLE_NAME) AS NPARTS,
TB.TABLE_TYPE,TB.ENGINE,TB.ROW_FORMAT,
TB.TABLE_ROWS,TB.DATA_LENGTH,TB.INDEX_LENGTH from TABLES as TB JOIN
PARTITIONS as TP on TB.TABLE_SCHEMA = TP.TABLE_SCHEMA AND TB.TABLE_NAME
= TP.TABLE_NAME GROUP BY TB.TABLE_SCHEMA,TB.TABLE_NAME ORDER By
1,2,3,5;
Extra
Open_full_table; Scanned all databases; Using temporary; Using filesort
Using where; Open_full_table; Scanned all databases; Using join buffer (Block Nested Loop)
83. Statistics – recipe 2
• Query reporting tables that are candidates for ANALYZE TABLE
select ST.TABLE_NAME,INDEX_NAME, SEQ_IN_INDEX, COLUMN_NAME,
CARDINALITY,TB.TABLE_ROWS, (CARDINALITY/TB.TABLE_ROWS)*100 AS 'Card_%'
from STATISTICS ST JOIN TABLES TB on ST.TABLE_SCHEMA=TB.TABLE_SCHEMA AND
ST.TABLE_NAME=TB.TABLE_NAME where ST.TABLE_SCHEMA='test' ORDER BY
ST.TABLE_NAME,INDEX_NAME,SEQ_IN_INDEX;
Extra
Using where; Open_full_table; Scanned 1 database; Using temporary; Using filesort
Using where; Open_full_table; Scanned all databases; Using join buffer (Block Nested Loop)
84. Statistics – result 2
TABLE_NAME | INDEX_NAME | SEQ_IN_INDEX | COLUMN_NAME | CARDINALITY | TABLE_ROWS | Card_%
tbtest_child3 | bb | 1 | bb | 5511 | 5884 | 93.6608
tbtest_child3 | PRIMARY | 1 | a | 4958 | 5884 | 84.2624
tbtest_child3 | PRIMARY | 2 | bb | 5882 | 5884 | 99.9660
+--------------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+-------------------+-----------------------------+
| a | int(11) | NO | PRI | NULL | |
| bb | int(11) | NO | PRI | NULL | auto_increment |
| date | date | NO | | NULL | |
| partitionid | int(11) | NO | | 0 | |
| stroperation | varchar(254) | YES | | NULL | |
| time | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+--------------+--------------+------+-----+-------------------+-----------------------------+
85. FILES - TABLESPACES
• The FILES table provides information about the files in
which MySQL tablespace data is stored.
• The TABLESPACES table provides information about
active tablespaces.
• Quite useful to identify multiple files per tablespace.
Unfortunately, a bug in 5.7.6-m16 means neither of them is
working (Bug #76182)
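Once the bug is fixed (or on versions where these tables are populated), a sketch of the kind of query one might use; column names are assumed from the INFORMATION_SCHEMA.FILES documentation:

```sql
-- Hypothetical usage once Bug #76182 is fixed:
-- list every data file belonging to each tablespace
SELECT TABLESPACE_NAME, FILE_NAME, FILE_TYPE, ENGINE
FROM INFORMATION_SCHEMA.FILES
ORDER BY TABLESPACE_NAME, FILE_NAME;
```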
86. OPTIMIZER_TRACE
• Present since 5.6.3
• Activate on demand (SET optimizer_trace="enabled=on";)
• Terrific tool when debugging queries
• Given its memory utilization, it can be disabled (for everyone):
--maximum-optimizer-trace-max-mem-size=0
--optimizer-trace-max-mem-size=0
87. OPTIMIZER_TRACE example
SET optimizer_trace_offset=-2, optimizer_trace_limit=2;
SET optimizer_trace="enabled=on";
select tbtest2.a from tbtest2 join tbtest_child2 on
tbtest2.a=tbtest_child2.a;
select tbtest1.a from tbtest1 join tbtest_child1 on
tbtest1.a=tbtest_child1.a;
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
SET optimizer_trace="enabled=off";
90. InnoDB – New in SYS_INDEXES
• MERGE_THRESHOLD
– When the “page-full” percentage for an index page
falls below 50%, which is the default
MERGE_THRESHOLD setting, InnoDB attempts to
merge the index page with a neighboring page.
CREATE TABLE t1 (
id INT,
KEY id_index (id)
) COMMENT='MERGE_THRESHOLD=45';
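To verify the per-index setting, something like the following should work on 5.7; the MERGE_THRESHOLD column in INNODB_SYS_INDEXES is assumed from the 5.7 documentation:

```sql
-- Check the MERGE_THRESHOLD applied to each index of a table
SELECT T.NAME AS TABLE_NAME, I.NAME AS INDEX_NAME, I.MERGE_THRESHOLD
FROM INFORMATION_SCHEMA.INNODB_SYS_INDEXES I
JOIN INFORMATION_SCHEMA.INNODB_SYS_TABLES T ON I.TABLE_ID = T.TABLE_ID
WHERE T.NAME = 'test/t1';
```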
91. InnoDB TRX & Lock
• Detailed information about TRX
• Unique source of information to identify locking threads
– INNODB_TRX Table
– INNODB_LOCKS Table
– INNODB_LOCK_WAITS Table
93. InnoDB Lock (example)
select * from INNODB_LOCK_WAITS;
| requesting_trx_id | requested_lock_id | blocking_trx_id | blocking_lock_id |
| 91459149 | 91459149:183:54275:439 | 91323715 | 91323715:183:54275:439 |
select * from INNODB_LOCKS;
| lock_id | lock_trx_id | lock_mode | lock_type | lock_table | lock_index
| 91465549:183:48232:264 | 91465549 | X,GAP | RECORD | `test`.`tbtest1` | IDX_a
select * from information_schema.INNODB_TRX where trx_id=91323715\G
trx_id: 91323715
trx_mysql_thread_id: 269
select * from processlist where id=269;
| ID | USER | HOST | DB | COMMAND | TIME | STATE | INFO
| 269 | root | localhost | test | Sleep | 397 | | NULL
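The three tables can also be joined in one shot to map each waiting transaction to its blocker; a sketch along the lines of the classic lock-wait query:

```sql
-- Who is waiting on whom, with the blocker's thread id and query
SELECT r.trx_id              AS waiting_trx,
       r.trx_mysql_thread_id AS waiting_thread,
       b.trx_id              AS blocking_trx,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id;
```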
94. InnoDB Buffer Pool info
• The INNODB_BUFFER_PAGE table holds information about each
page in the InnoDB buffer pool.
• The INNODB_BUFFER_PAGE_LRU table holds information about
how the pages are ordered in the LRU list, which determines which
pages to evict from the buffer pool when it becomes full
(NOTE: DO NOT USE IN PROD)
• The INNODB_BUFFER_POOL_STATS table provides much of the
same buffer pool information provided in SHOW ENGINE INNODB
STATUS
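A minimal sketch of pulling the headline numbers from INNODB_BUFFER_POOL_STATS; column names are assumed from the 5.7 documentation:

```sql
-- One row per buffer pool instance: size, free pages, data pages, hit rate
SELECT POOL_ID, POOL_SIZE, FREE_BUFFERS, DATABASE_PAGES,
       MODIFIED_DATABASE_PAGES, HIT_RATE
FROM information_schema.INNODB_BUFFER_POOL_STATS;
```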
96. InnoDB Buffer Pool Metrics
• By default only 65 of 234 are enabled
• Enable ONLY what you need & disable them afterwards
• Enable/reset/disable is simple:
– set global innodb_monitor_enable='index_%';
– set global innodb_monitor_disable='index_%';
– set global innodb_monitor_reset='index_%';
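Putting it together, a sketch of an enable/inspect/disable cycle; the INNODB_METRICS columns are assumed from the documentation:

```sql
SET GLOBAL innodb_monitor_enable = 'index_%';
-- ... run the workload you want to observe ...
SELECT NAME, SUBSYSTEM, COUNT, STATUS
FROM information_schema.INNODB_METRICS
WHERE NAME LIKE 'index_%';
SET GLOBAL innodb_monitor_disable = 'index_%';
```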
97. InnoDB Buffer Pool Metrics
NAME                                     | COUNT  | AVG | COMMENT
adaptive_hash_searches                   | 470    | 2.3 | successful searches using Adaptive Hash Index
adaptive_hash_searches_btree             | 626974 | 311 | searches using B-tree on an index search
adaptive_hash_pages_added                | 71     | 0.3 | index pages on which the Adaptive Hash Index is built
adaptive_hash_pages_removed              | 2      | 0.0 | index pages whose corresponding Adaptive Hash Index entries were removed
adaptive_hash_rows_added                 | 95028  | 472 | Adaptive Hash Index rows added
adaptive_hash_rows_removed               | 389    | 1.9 | Adaptive Hash Index rows removed
adaptive_hash_rows_deleted_no_hash_entry | 0      | 0   | rows deleted that did not have corresponding Adaptive Hash Index entries
adaptive_hash_rows_updated               | 0      | 0   | Adaptive Hash Index rows updated
Good insight on Adaptive Hash usage!
98. Thanks to…
• Mark Leith (for SYS schema)
• Sveta Smirnova
• Valerii Kravchuk
• Todd Farmer
• Dimitri Kravtchuk
• Fred Descamps (LeFred)
99. What Next?
• Improve SYS schema
• Develop scripts for data analysis
• Create/improve graphic interface
In short: make Performance Schema usable
(take a look at VividCortex)
100. Other presentations coming from me
• Group replication VS Galera
• Galera: new features, old problems?
• Sharding with MySQL, once more.
• MariaDB and Oracle MySQL: where are they going?
• MySQL/Galera pushed beyond the limits
• PS how to get even more but easier
http://www.tusacentral.net
102. Thank you
To contact us
sales@pythian.com
1-877-PYTHIAN
To follow us
http://www.pythian.com/blog
http://www.facebook.com/pages/The-Pythian-Group/163902527671
@pythian
http://www.linkedin.com/co
To contact Me
tusa@pythian.com
marcotusa@tusacentral.net
To follow me
http://www.tusacentral.net/
https://www.facebook.com/marco.tusa.94
@marcotusa
Editor's Notes
Ask who is using
Ask who use 5.6/5.7
Focus on monitoring and understanding large sets of servers, where thousands is more probable than hundreds
Environment split by clusters: groups of boxes that must behave as a single entity, which can fail or get “sick” in one part without the whole being affected, or be affected to different degrees
PS should be seen as a tool not just for the DBA to understand what is happening in a given line of code (although you can do that), but rather to do real-time checks of the parts that compose the body.
So while this presentation talks about PS, the tables, and the commands, the main idea is that you should learn how to apply this to achieve the most important target: understand what is going on as a whole, and be able to measure the “temperature”, i.e. the level of wellness or sickness.
Multiple interfaces
The performance schema exposes many different interfaces, for different components, and for different purposes.
Instrumenting interface
-------------------------------------
All the data representing the server internal state exposed in the performance schema must be first collected: this is the role of the instrumenting interface. The instrumenting interface is a coding interface provided by implementors (of the performance schema) to implementors (of the server or server components).
This interface is available to:
C implementations
C++ implementations
the core SQL layer (/sql)
the mysys library (/mysys)
MySQL plugins, including storage engines,
third party plugins, including third party storage engines.
For details, see the instrumentation interface page.
Compiling interface
-------------------------------------
The implementation of the performance schema can be enabled or disabled at build time, when building MySQL from the source code.
When building with the performance schema code, some compilation flags are available to change the default values used in the code, if required.
For more details, see:
./configure --help
To compile with the performance schema:
./configure --with-perfschema
The implementation of all the compiling options is located in
./storage/perfschema/plug.in
Server startup interface
-------------------------------------
The server startup interface consists of the "./mysqld ..." command line used to start the server. When the performance schema is compiled in the server binary, extra command line options are available.
These extra start options allow the DBA to:
enable or disable the performance schema
specify some sizing parameters.
To see help for the performance schema startup options, see:
./sql/mysqld --verbose --help
The implementation of all the startup options is located in
./sql/mysqld.cc, my_long_options[]
Server bootstrap interface
The bootstrap interface is a private interface exposed by the performance schema, and used by the SQL layer. Its role is to advertise all the SQL tables natively supported by the performance schema to the SQL server. The code consists of creating MySQL tables for the performance schema itself, and is used in './mysql --bootstrap' mode when a server is installed.
The implementation of the database creation script is located in
./scripts/mysql_system_tables.sql
Runtime configuration interface
-------------------------------------
When the performance schema is used at runtime, various configuration parameters can be used to specify what kind of data is collected, what kind of aggregations are computed, what kind of timers are used, what events are timed, etc.
For all these capabilities, not a single statement or special syntax was introduced in the parser. Instead of new SQL statements, the interface consists of DML (SELECT, INSERT, UPDATE, DELETE) against special "SETUP" tables.
For example:
mysql> update performance_schema.SETUP_INSTRUMENTS
set ENABLED='YES', TIMED='YES';
Query OK, 234 rows affected (0.00 sec)
Rows matched: 234 Changed: 234 Warnings: 0
Internal audit interface
The internal audit interface is provided to the DBA to inspect if the performance schema code itself is functioning properly. This interface is necessary because a failure caused while instrumenting code in the server should not cause failures in the MySQL server itself, so that the performance schema implementation never raises errors during runtime execution.
This auditing interface consists of:
-------------------------------------
SHOW ENGINE PERFORMANCE_SCHEMA STATUS;
It displays data related to the memory usage of the performance schema, as well as statistics about lost events, if any.
The SHOW STATUS command is implemented in
./storage/perfschema/pfs_engine_table.cc
Query interface
-------------------------------------
The query interface is used to query the internal state of a running server. It is provided as SQL tables.
For example:
mysql> select * from performance_schema.EVENTS_WAITS_CURRENT;
Design principles
========================
No behavior changes
========================
The primary goal of the performance schema is to measure (instrument) the execution of the server. A good measure should not cause any change in behavior.
To achieve this, the overall design of the performance schema complies with the following very severe design constraints:
The parser is unchanged. There are no new keywords, no new statements. This guarantees that existing applications will run the same way with or without the performance schema.
All the instrumentation points return "void", there are no error codes. Even if the performance schema internally fails, execution of the server code will proceed.
None of the instrumentation points allocate memory. All the memory used by the performance schema is pre-allocated at startup, and is considered "static" during the server life time.
None of the instrumentation points use any pthread_mutex, pthread_rwlock, or pthread_cond (or platform equivalents). Executing the instrumentation point should not cause thread scheduling to change in the server.
In other words, the implementation of the instrumentation points, including all the code called by the instrumentation points, is:
malloc free
mutex free
rwlock free
TODO: All the code located in storage/perfschema is malloc free, but unfortunately the usage of LF_HASH introduces some memory allocation. This should be revised if possible, to use a lock-free, malloc-free hash code table.
No performance hit
========================
The instrumentation of the server should be as fast as possible. In cases when there are choices between:
doing some processing when recording the performance data in the instrumentation,
doing some processing when retrieving the performance data,
priority is given in the design to make the instrumentation faster, pushing some complexity to data retrieval.
As a result, some parts of the design, related to:
the setup code path,
the query code path,
might appear to be sub-optimal.
The criterion used here is to optimize primarily the critical path (data collection), possibly at the expense of non-critical code paths.
Unintrusive instrumentation
========================
For the performance schema in general to be successful, the barrier of entry for a developer should be low, so it's easy to instrument code.
In particular, the instrumentation interface:
is available for C and C++ code (so it's a C interface),
does not require parameters that the calling code can't easily provide,
supports partial instrumentation (for example, instrumenting mutexes does not require that every mutex is instrumented)
Extendable instrumentation
As the content of the performance schema improves, with more tables exposed and more data collected, the instrumentation interface will also be augmented to support instrumenting new concepts. Existing instrumentations should not be affected when additional instrumentation is made available, and making a new instrumentation available should not require existing instrumented code to support it.
Versioned instrumentation
========================
Given that the instrumentation offered by the performance schema will be augmented with time, when more features are implemented, the interface itself should be versioned, to keep compatibility with previous instrumented code.
For example, after both plugin-A and plugin-B have been instrumented for mutexes, read write locks and conditions, using the instrumentation interface, we can anticipate that the instrumentation interface is expanded to support file based operations.
Plugin-A, a file based storage engine, will most likely use the expanded interface and instrument its file usage, using the version 2 interface, while Plugin-B, a network based storage engine, will not change its code and not release a new binary.
When later the instrumentation interface is expanded to support network based operations (which will define interface version 3), the Plugin-B code can then be changed to make use of it.
Note, this is just an example to illustrate the design concept here. Both mutexes and file instrumentation are already available since version 1 of the instrumentation interface.
Easy deployment
========================
Internally, we might want every plugin implementation to upgrade the instrumented code to the latest available, but this will cause additional work and this is not practical if the code change is monolithic.
Externally, for third party plugin implementors, asking implementors to always stay aligned to the latest instrumentation and make new releases, even when the change does not provide new functionality for them, is a bad idea.
For example, requiring a network based engine to re-release because the instrumentation interface changed for file based operations, will create too many deployment issues.
So, the performance schema implementation must support concurrently, in the same deployment, multiple versions of the instrumentation interface, and ensure binary compatibility with each version.
In addition to this, the performance schema can be included or excluded from the server binary, using build time configuration options.
Regardless, the following types of deployment are valid:
a server supporting the performance schema + a storage engine that is not instrumented
a server not supporting the performance schema + a storage engine that is instrumented
update
The thread is getting ready to start updating the table.
Updating
The thread is searching for rows to update and is updating them.
updating main table
Select_scan
The number of joins that did a full scan of the first table.
/**
Read [part of] row via [part of] index.
@param[out] buf buffer where store the data
@param key Key to search for
@param keypart_map Which part of key to use
@param find_flag Direction/condition on key usage
@returns Operation status
@retval 0 Success (found a record, and function has
set table->status to 0)
@retval HA_ERR_END_OF_FILE Row not found (function has set table->status
to STATUS_NOT_FOUND). End of index passed.
@retval HA_ERR_KEY_NOT_FOUND Row not found (function has set table->status
to STATUS_NOT_FOUND). Index cursor positioned.
@retval != 0 Error
@note Positions an index cursor to the index specified in the handle.
Fetches the row if available. If the key value is null,
begin at the first key of the index.
ha_index_read_map can be restarted without calling index_end on the previous
index scan and without calling ha_index_init. In this case the
ha_index_read_map is on the same index as the previous ha_index_scan.
This is particularly used in conjunction with multi read ranges.
*/
int handler::ha_index_read_map(uchar *buf, const uchar *key,
/**
Reserves an interval of auto_increment values from the handler.
@param offset offset (modulus increment)
@param increment increment between calls
@param nb_desired_values how many values we want
@param[out] first_value the first value reserved by the handler
@param[out] nb_reserved_values how many values the handler reserved
offset and increment means that we want values to be of the form
offset + N * increment, where N>=0 is integer.
If the function sets *first_value to ULLONG_MAX it means an error.
If the function sets *nb_reserved_values to ULLONG_MAX it means it has
reserved to "positive infinite".
*/
void handler::get_auto_increment(ulonglong offset, ulonglong increment,
ulonglong nb_desired_values,
ulonglong *first_value,
ulonglong *nb_reserved_values)
{
ulonglong nr;
int error;
DBUG_ENTER("handler::get_auto_increment");
(void) extra(HA_EXTRA_KEYREAD);
table->mark_columns_used_by_index_no_reset(table->s->next_number_index,
table->read_set);
column_bitmaps_signal();
if (ha_index_init(table->s->next_number_index, 1))
{
/* This should never happen, assert in debug, and fail in release build */
DBUG_ASSERT(0);
*first_value= ULLONG_MAX;
DBUG_VOID_RETURN;
}
if (table->s->next_number_keypart == 0)
{// Autoincrement at key-start
error= ha_index_last(table->record[1]);
/*
MySQL implicitely assumes such method does locking (as MySQL decides to
use nr+increment without checking again with the handler, in
handler::update_auto_increment()), so reserves to infinite.
*/
*nb_reserved_values= ULLONG_MAX;
}
else
{
uchar key[MAX_KEY_LENGTH];
key_copy(key, table->record[0],
table->key_info + table->s->next_number_index,
table->s->next_number_key_offset);
error= ha_index_read_map(table->record[1], key,
make_prev_keypart_map(table->s->next_number_keypart),
HA_READ_PREFIX_LAST);
/*
MySQL needs to call us for next row: assume we are inserting ("a",null)
here, we return 3, and next this statement will want to insert
("b",null): there is no reason why ("b",3+1) would be the good row to
insert: maybe it already exists, maybe 3+1 is too large...
*/
*nb_reserved_values= 1;
}
if (error)
{
if (error == HA_ERR_END_OF_FILE || error == HA_ERR_KEY_NOT_FOUND)
{
/* No entry found, start with 1. */
nr= 1;
}
else
{
DBUG_ASSERT(0);
nr= ULLONG_MAX;
}
}
else
nr= ((ulonglong) table->next_number_field->
val_int_offset(table->s->rec_buff_length)+1);
ha_index_end();
(void) extra(HA_EXTRA_NO_KEYREAD);
*first_value= nr;
DBUG_VOID_RETURN;
}
void handler::ha_release_auto_increment()
{
DBUG_ASSERT(table_share->tmp_table != NO_TMP_TABLE ||
m_lock_type != F_UNLCK ||
(!next_insert_id && !insert_id_for_cur_row));
DEBUG_SYNC(ha_thd(), "release_auto_increment");
release_auto_increment();
insert_id_for_cur_row= 0;
auto_inc_interval_for_cur_row.replace(0, 0, 0);
auto_inc_intervals_count= 0;
if (next_insert_id > 0)
{
next_insert_id= 0;
/*
this statement used forced auto_increment values if there were some,
wipe them away for other statements.
*/
table->in_use->auto_inc_intervals_forced.empty();
}
}
/**
Reads the last row via index.
int handler::ha_index_last(uchar * buf)
*/
/**
Read next row via random scan.
/* SQL HANDLER call locks/unlock while scanning (RND/INDEX). */
/*
Whether this is lock or unlock, this should be true, and is to verify that
if get_auto_increment() was called (thus may have reserved intervals or
taken a table lock), ha_release_auto_increment() was too.
*/
Calculates new statistics for a given index and saves them to the index
members stat_n_diff_key_vals[], stat_n_sample_sizes[], stat_index_size and
stat_n_leaf_pages. This function could be slow. */
static
void
dict_stats_analyze_index(
btr_cur_search_to_nth_level
Searches an index tree and positions a tree cursor on a given level.
NOTE: n_fields_cmp in tuple must be set so that it cannot be compared
to node pointer page number fields on the upper levels of the tree!
Note that if mode is PAGE_CUR_LE, which is used in inserts, then
cursor-&gt;up_match and cursor-&gt;low_match both will have sensible values.
If mode is PAGE_CUR_GE, then up_match will have a sensible value.
If mode is PAGE_CUR_LE , cursor is left at the place where an insert of the
search tuple should be performed in the B-tree. InnoDB does an insert
immediately after the cursor. Thus, the cursor may end up on a user record,
or on a page infimum record. */
st_select_lex::add_table_to_list
/**
Add a table to list of used tables.