MySQL for Developers

How to get the most out of MySQL for developers

  • To get the most from MySQL, you need to understand its design. MySQL's architecture is very different from that of other database servers, which makes it useful for a wide range of purposes. Understanding the database is especially important for Java developers, who typically use an abstraction or ORM layer such as Hibernate, which hides the SQL implementation (and often the schema itself). ORMs tend to obscure the database schema from the developer, which leads to poorly performing index and schema strategies, one-engine designs that are not optimal, and queries that use inefficient SQL constructs such as correlated subqueries.
  • Query parsing, analysis, optimization, caching, all the built-in functions, stored procedures, triggers, and views are provided across storage engines. A storage engine is responsible for storing and retrieving all the data. Storage engines differ in functionality, capabilities, and performance characteristics. A key difference between MySQL and other database platforms is MySQL's pluggable storage engine architecture, which allows you to select a specialized storage engine for a particular application need such as data warehousing, transaction processing, or high availability; in many applications, choosing the right storage engine can greatly improve performance. Important: there is no single best storage engine. Each one is good for specific data and application characteristics. The query cache is a MySQL-specific result-set cache that can be excellent for read-intensive applications but must be guarded against for mixed read/write applications.
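To see which engines a server supports and to pick one per table, you can use statements like the following sketch (the table and column names are illustrative):

```sql
-- List the storage engines compiled into this server
SHOW ENGINES;

-- Choose an engine explicitly when creating a table
CREATE TABLE audit_log (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    message VARCHAR(255) NOT NULL
) ENGINE=MyISAM;

-- Inspect (or change) the engine of an existing table
SHOW TABLE STATUS LIKE 'audit_log';
ALTER TABLE audit_log ENGINE=InnoDB;
```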
  • Each pluggable storage engine is designed to offer a selective set of benefits for a particular application. Some of the key differentiators include:
    Concurrency - some applications have more granular lock requirements (such as row-level locks) than others. Choosing the right locking strategy can reduce overhead and help overall performance. This area also includes support for capabilities like multi-version concurrency control or 'snapshot' reads.
    Transaction support - not every application needs transactions, but for those that do, there are well-defined requirements such as ACID compliance.
    Referential integrity - the need to have the server enforce relational referential integrity through DDL-defined foreign keys.
    Physical storage - everything from the overall page size for tables and indexes to the format used for storing data on physical disk.
    Index support - different application scenarios benefit from different index strategies, so each storage engine generally has its own indexing methods, although some (like B-tree indexes) are common to nearly all engines.
    Memory caches - different applications respond better to some memory caching strategies than others; some caches are common to all storage engines (like those used for user connections or MySQL's high-speed Query Cache), while others are defined only when a particular storage engine is in play.
    Performance aids - multiple I/O threads for parallel operations, thread concurrency, database checkpointing, bulk insert handling, and more.
    Miscellaneous target features - support for geospatial operations, security restrictions for certain data manipulation operations, and other similar items.
  • The MySQL storage engines give flexibility to database designers and also allow the server to take advantage of different types of storage media. Database designers can choose the appropriate storage engine based on their application's needs; each one comes with a distinct set of benefits and drawbacks. As we discuss each of the available storage engines in depth, keep in mind the following questions:
    · What type of data will you eventually be storing in your MySQL databases?
    · Is the data constantly changing?
    · Is the data mostly logs (INSERTs)?
    · Are your end users constantly making requests for aggregated data and other reports?
    · For mission-critical data, will there be a need for foreign key constraints or multiple-statement transaction control?
    The answers to these questions will affect the storage engine and data types most appropriate for your particular application.
  • MyISAM excels at high-speed operations that don't require the integrity guarantees (and associated overhead) of transactions. MyISAM locks entire tables, not rows: readers obtain shared (read) locks on all tables they need to read, and writers obtain exclusive (write) locks. However, you can insert new rows into a table while SELECT queries are running against it (concurrent inserts), which is a very important and useful feature. Read-only or read-mostly tables that contain data used to construct a catalog or listing of some sort (jobs, auctions, real estate, etc.) are usually read from far more often than they are written to, which makes them good candidates for MyISAM. It is also a great engine for data warehouses, because of that environment's high read-to-write ratio and the need to fit large amounts of data in a small amount of space. MyISAM doesn't support transactions or row-level locks, so it is not a good general-purpose storage engine for any application that has: a) high concurrency, or b) lots of UPDATEs or DELETEs (INSERTs and SELECTs are fine).
  • InnoDB supports ACID transactions, multi-versioning, row-level locking, foreign key constraints, crash recovery, and good query performance depending on indexes. InnoDB uses row-level locking with multiversion concurrency control (MVCC), which can reduce row locking by keeping snapshots of data. Depending on the isolation level, InnoDB does not require any locking for a SELECT. This makes high concurrency possible, with some trade-offs: InnoDB requires more disk space than MyISAM, and for the best performance it needs plenty of memory for the InnoDB buffer pool. InnoDB was designed for transaction processing, and it is a good choice for any order processing application or any application where transactions are required; its performance and automatic crash recovery make it popular for non-transactional storage needs too. When you deal with any sort of order processing, transactions are all but required. Another important consideration is whether the engine needs to support foreign key constraints.
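A minimal sketch of the kind of multi-statement transaction InnoDB enables (the accounts table and amounts are hypothetical):

```sql
CREATE TABLE accounts (
    id      INT UNSIGNED  NOT NULL PRIMARY KEY,
    balance DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB;

START TRANSACTION;
UPDATE accounts SET balance = balance - 100.00 WHERE id = 1;
UPDATE accounts SET balance = balance + 100.00 WHERE id = 2;
COMMIT;  -- both updates become visible atomically; ROLLBACK would undo both
```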
  • Memory stores all data in RAM for extremely fast access. Memory tables are useful when you need fast access to data that either never changes or doesn't need to persist after a restart. All of their data is stored in memory, so queries don't have to wait for disk I/O; the table structure of a Memory table persists across a server restart, but no data survives. Good uses for Memory tables: "lookup" or "mapping" tables, such as a table that maps postal codes to state names; caching the results of periodically aggregated data; and intermediate results when analyzing data. Memory tables support HASH indexes, which are very fast for lookup queries. However, they use table-level locking, which gives low write concurrency; they do not support TEXT or BLOB column types; and they support only fixed-size rows, so they store VARCHARs as CHARs, which can waste memory.
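A sketch of the postal-code lookup table mentioned above (names are illustrative):

```sql
-- An in-memory lookup table; data is lost on restart, structure survives
CREATE TABLE zip_to_state (
    zip   CHAR(5) NOT NULL,
    state CHAR(2) NOT NULL,
    PRIMARY KEY (zip) USING HASH  -- HASH index: very fast equality lookups
) ENGINE=MEMORY;

SELECT state FROM zip_to_state WHERE zip = '94103';
```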
  • Archive provides for storing and retrieving large amounts of seldom-referenced historical, archived, or security audit information. Archive tables are ideal for logging and data acquisition, where analysis tends to scan an entire table, or where you want fast INSERT queries on a replication master. More specialized engines:
    Federated - similar to "linked tables" in MS SQL Server or MS Access: allows a remote server's tables to be used as if they were local. Not good performance, but can be useful at times.
    NDB Cluster - highly available clustered storage engine. Very specialized and much harder to administer than regular MySQL storage engines.
    CSV - stores data in comma-separated values format. Useful for large bulk imports or exports.
    Blackhole - the /dev/null storage engine. Useful for benchmarking and some replication scenarios.
    Merge - allows you to logically group together a series of identical MyISAM tables and reference them as one object. Good for very large databases, like data warehousing.
  • You can use multiple storage engines in a single application; you are not limited to one storage engine per database, so you can easily mix and match engines for the given need. This is often the best way to achieve optimal performance for truly demanding applications: use the right storage engine for the right job. Mixing engines is particularly useful in a replication setup, where a master copy of a database on one server supplies copies, called slaves, to other servers. A table's storage engine on a slave can differ from its engine on the master, so you can take advantage of each engine's abilities. For instance, in a master-with-two-slaves environment, we can have InnoDB tables on the master for referential integrity and transactional safety. One slave can also use InnoDB or the Archive engine in order to take backups in a consistent state. Another can use MyISAM and Memory tables to take advantage of FULLTEXT (MyISAM) or HASH-based (Memory) indexing.
  • In a normalized database, each fact is represented once and only once. Conversely, in a denormalized database, information is duplicated or stored in multiple places. People who ask for help with performance issues are frequently advised to normalize their schemas, especially if the workload is write-heavy. This is often good advice, for the following reasons: normalized updates are usually faster than denormalized updates; when the data is well normalized, there's little or no duplicated data, so there's less data to change; normalized tables are usually smaller, so they fit better in memory and perform better; and the lack of redundant data means there's less need for DISTINCT or GROUP BY queries when retrieving lists of values. For example, it's impossible to get a distinct list of departments from a denormalized schema without DISTINCT or GROUP BY, but if DEPARTMENT is a separate table, it's a trivial query. The drawbacks of a normalized schema usually have to do with retrieval: any nontrivial query on a well-normalized schema will probably require at least one join, and perhaps several. This is not only expensive, but it can make some indexing strategies impossible; for example, normalizing may place columns in different tables that would benefit from belonging to the same index.
  • Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both are worthy goals, as they reduce the amount of space a database consumes and ensure that data is logically stored. Database normalization minimizes duplication of information, which makes updates simpler and faster because the same information doesn't have to be updated in multiple tables. With a normalized database:
    * updates are usually faster;
    * there's less data to change;
    * tables are usually smaller and use less memory, which can give better performance;
    * DISTINCT or GROUP BY queries perform better.
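A small illustration of the idea, using hypothetical employee/department tables: the denormalized version repeats the department name on every row, while the normalized version stores it once.

```sql
-- Denormalized: department name repeated on every employee row
CREATE TABLE employee_denorm (
    id         INT UNSIGNED NOT NULL PRIMARY KEY,
    name       VARCHAR(100) NOT NULL,
    department VARCHAR(100) NOT NULL
);

-- Normalized: each department stored once, referenced by key
CREATE TABLE department (
    id   INT UNSIGNED NOT NULL PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);
CREATE TABLE employee (
    id            INT UNSIGNED NOT NULL PRIMARY KEY,
    name          VARCHAR(100) NOT NULL,
    department_id INT UNSIGNED NOT NULL,
    FOREIGN KEY (department_id) REFERENCES department (id)
);

-- Distinct list of departments: trivial when normalized...
SELECT name FROM department;
-- ...but needs deduplication when denormalized
SELECT DISTINCT department FROM employee_denorm;
```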
  • In a denormalized database, information is duplicated or stored in multiple places. The disadvantages of a normalized schema are that queries typically involve more tables and require more joins, which can reduce performance; normalizing may also place columns in different tables that would benefit from belonging to the same index, which can further reduce query performance. More normalized schemas are better for applications involving many transactions; less normalized schemas are better for reporting types of applications. You should normalize your schema first, then denormalize later. A denormalized schema works well because everything is in the same table, which avoids joins. If you don't need to join tables, the worst case for most queries, even the ones that don't use indexes, is a full table scan; this can be much faster than a join when the data doesn't fit in memory, because it avoids random I/O. A single table can also allow more efficient indexing strategies. In the real world, you often need to mix the approaches, possibly using a partially normalized schema, cache tables, and other techniques. The most common way to denormalize data is to duplicate, or cache, selected columns from one table in another table.
  • In general, try to use the smallest data type that you can. Small and simple data types usually give better performance because it means fewer disk accesses (less I/O), more data in memory, and less CPU to process operations.
  • If you're storing whole numbers, use one of the integer types: TINYINT, SMALLINT, MEDIUMINT, INT, or BIGINT. These require 8, 16, 24, 32, and 64 bits of storage space, respectively, and can store values from -2^(N-1) to 2^(N-1)-1, where N is the number of bits of storage space they use. FLOAT and DOUBLE support approximate calculations with standard floating-point math. Use DECIMAL when you need exact results; always use it for monetary/currency fields. Floating-point types typically use less space than DECIMAL to store the same range of values, so use DECIMAL only when you need exact results for fractional numbers. BIT stores 0/1 values.
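A quick sketch of why DECIMAL matters for money (the table and columns are hypothetical): FLOAT accumulates binary rounding error, while DECIMAL keeps exact decimal values.

```sql
CREATE TABLE prices (
    approx FLOAT,          -- approximate: fine for scientific data
    exact  DECIMAL(10,2)   -- exact: required for currency
);

INSERT INTO prices VALUES (0.1, 0.10), (0.2, 0.20), (0.3, 0.30);

-- SUM(approx) is typically not exactly 0.60 due to binary rounding,
-- while SUM(exact) is exactly 0.60
SELECT SUM(approx), SUM(exact) FROM prices;
```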
  • INT(1) does not mean 1 digit! The number in parentheses is only a display width, which some tools (and the ZEROFILL attribute) use to decide how many characters to reserve for display purposes. For storage and computational purposes, INT(1) is identical to INT(20). Integer data types work best for primary keys. Use UNSIGNED when you don't need negative numbers: the storage size stays the same, but the upper limit of the range doubles. BIGINT is not needed for AUTO_INCREMENT; INT UNSIGNED stores 4.3 billion values! Always use DECIMAL for monetary/currency fields, never FLOAT or DOUBLE!
  • The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters to store. VARCHAR(n) stores variable-length character strings and uses only as much space as it needs, which helps performance because it saves disk space. However, because the rows are variable-length, they can grow when you update them, which can cause extra work. Use VARCHAR when the maximum column length is much larger than the average length, and when updates to the field are rare, so fragmentation is not a problem. CHAR(n) is fixed-length: MySQL allocates enough space for the specified number of characters. CHAR is useful for very short strings and when all the values are nearly the same length, and it is better than VARCHAR for data that's changed frequently. Changing an ENUM or SET field's definition requires an entire rebuild of the table. When is VARCHAR bad? Declaring VARCHAR(255) everywhere is poor design that shows no understanding of the underlying data: disk usage may be efficient, but MySQL's internal memory usage is not.
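An illustrative choice between the two types (the table and columns are hypothetical):

```sql
CREATE TABLE users (
    country_code CHAR(2)     NOT NULL,  -- always exactly 2 chars: CHAR fits
    md5_hash     CHAR(32)    NOT NULL,  -- fixed length, rewritten frequently
    email        VARCHAR(80) NOT NULL,  -- variable length, rarely updated
    bio          VARCHAR(500)           -- max length far above the average
);
```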
  • Define fields as NOT NULL whenever you can, unless you actually want or expect NULL values. It's harder for MySQL to optimize queries that refer to nullable columns, because they make indexes, index statistics, and value comparisons more complicated. If you're planning to index columns, avoid making them nullable if possible. NOT NULL saves up to a byte per column per row of data, with a double benefit for indexed columns. Note, however, that NOT NULL DEFAULT '' is bad design.
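For instance (a hypothetical table): declare columns NOT NULL where a missing value is meaningless, and keep genuinely optional data nullable.

```sql
CREATE TABLE orders (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    customer_id INT UNSIGNED NOT NULL,  -- will be indexed: keep it NOT NULL
    created_at  DATETIME     NOT NULL,
    shipped_at  DATETIME,               -- legitimately unknown until shipped
    KEY idx_customer (customer_id)
);
```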
  • Smaller is usually better: in general, try to use the smallest data type that can correctly store and represent your data. Simple is good: fewer CPU cycles are typically required to process operations on simpler data types. Disk = memory = performance; every single byte counts, meaning fewer disk accesses and more data in memory.
  • Indexes are data structures that help retrieve rows with specific column values faster. Indexes can especially improve performance for larger databases, but they do have some downsides: index information needs to be updated every time changes are made to the table, so if you are constantly updating, inserting, and removing entries in your table, this can have a negative impact on performance. You can add an index to a table with CREATE INDEX.
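A minimal sketch (the table, column, and index names are illustrative):

```sql
-- Single-column index to speed up lookups by last name
CREATE INDEX idx_last_name ON employees (last_name);

-- Composite index: supports WHERE last_name = ? AND first_name = ?
CREATE INDEX idx_name ON employees (last_name, first_name);

-- Indexes can also be declared inline at table creation time
CREATE TABLE employees2 (
    id        INT UNSIGNED NOT NULL PRIMARY KEY,
    last_name VARCHAR(50)  NOT NULL,
    KEY idx_last (last_name)
);
```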
  • Most MySQL storage engines support B-tree indexes. A B-tree is a tree data structure that keeps data values sorted; tree nodes define the upper and lower bounds of the values in their child nodes, and B-trees are kept balanced by requiring that all leaf nodes are at the same depth. In MyISAM, leaf nodes hold pointers to the row data corresponding to the index key.
  • In a clustered layout, the leaf nodes contain all the data for the record (not just the index key, as in the non-clustered layout), so when looking up a record by primary key, the extra lookup operation of a non-clustered layout (following the pointer from the leaf node to the data file) is not needed. InnoDB's clustered indexes store the row data in the leaf nodes; the layout is called clustered because rows with close primary key values are stored close to each other. Secondary indexes in InnoDB refer to rows by their primary key values. Clustering can make retrieving indexed data fast, since the data is in the index, but it can be slower for updates, for secondary indexes, and for full table scans.
  • Covering indexes are indexes that contain all the data values needed for a query; such queries can perform better because the full row does not have to be read. When MySQL can locate every field needed for a specific table within an index (as opposed to the full table records), the index is known as a covering index. Covering indexes are critically important for the performance of certain queries and joins. When a covering index is located and used by the optimizer, you will see "Using index" in the Extra column of the EXPLAIN output.
  • You need to understand the SQL queries your application makes and evaluate their performance. To know how your queries are executed by MySQL, you can harness the MySQL slow query log and use EXPLAIN. Basically, you want to make your queries access less data: is your application retrieving more data than it needs, are queries accessing too many rows or columns, is MySQL analyzing more rows than it needs? Indexes are a good way to reduce data access. Developers should run EXPLAIN on all SELECT statements that their code executes against the database. This ensures that missing indexes are picked up early in the development process and gives developers insight into how the MySQL optimizer has chosen to execute the query.
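A minimal sketch (the rental table is from the sakila sample database used elsewhere in these notes):

```sql
-- Prefix any SELECT with EXPLAIN to see the execution plan
EXPLAIN SELECT rental_id, rental_date
FROM   rental
WHERE  rental_date BETWEEN '2005-06-14' AND '2005-06-16';

-- Key columns to inspect in the output:
--   type          access strategy (const, ref, range, index, ALL, ...)
--   possible_keys indexes the optimizer could use
--   key           index actually chosen
--   rows          estimated rows examined
--   Extra         e.g. "Using index", "Using filesort", "Using temporary"
```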
  • The MySQL Query Analyzer is designed to save time and effort in finding and fixing problem queries. It gives DBAs a convenient window, with instant updates and easy-to-read graphics. The analyzer can do simple things such as tell you how long a recent query took and how the optimizer handled it (the results of EXPLAIN statements), but it can also give historical information, such as how the current runs of a query compare to earlier runs. Most of all, the analyzer speeds up development and deployment, because sites can use it in conjunction with performance testing and the emulation of user activity to find the choke points in the application and how it can be expected to perform after deployment. It provides: an aggregated view into query execution counts, run time, and result sets across all MySQL servers, with no dependence on MySQL logs or SHOW PROCESSLIST; sortable views by all monitored statistics; searchable and sortable queries by query type, content, server, database, date/time, interval range, and "when first seen"; historical and real-time analysis of all queries across all servers; and drill-downs into sampled query execution statistics, fully qualified with variable substitutions and EXPLAIN results. The Query Analyzer was added to the MySQL Enterprise Monitor, and it packs a lot of punch for those wanting to ensure their systems are free of badly running SQL. Two things stand out from a DBA perspective: 1. It's global: if you have a number of servers, you'll love what Query Analyzer does for you. Even Oracle and other database vendors only provide single-server views of bad SQL; Query Analyzer bubbles the worst SQL across all your servers to the top, which is a much more efficient way to work. No more wondering which servers you need to spend your time on or which have the worst code. 2. It's smart: believe it or not, sometimes it's not slow-running SQL that kills your system but SQL that executes far more often than you think it does. You really couldn't see this well before Query Analyzer, but now you can; one customer shaved double digits off their response time by finding queries that were running much more often than they should have been. And that's just one area Query Analyzer looks at; there's more intelligence there too, along with other stats you can't get from the general server utilities.
  • When you precede a SELECT statement with the keyword EXPLAIN, MySQL displays information from the optimizer about the query execution plan: that is, MySQL explains how it would process the SELECT, including information about how tables are joined and in which order. With the help of EXPLAIN, you can see where you should add indexes to tables so that a SELECT finds rows faster using indexes, and you can check whether the optimizer joins the tables in an optimal order. EXPLAIN returns a row of information for each "table" used in the SELECT statement, showing each part and the order of the execution plan; the "table" can be a real schema table, a derived or temporary table, a subquery, or a union result. Developers should run EXPLAIN on all SELECT statements that their code executes against the database, so that missing indexes are picked up early in the development process.
  • EXPLAIN returns a row of information for each "table" used in the SELECT statement, which shows each part and the order of the execution plan. The "table" can be a real schema table, a derived or temporary table, a subquery, or a union result. The key output columns are:
    type - the "access strategy" used to grab the data in this set
    possible_keys - keys available to the optimizer
    key - the key chosen by the optimizer
    rows - an estimate of the number of rows MySQL must examine to execute the query
    Extra - additional information about how MySQL resolves the query
    Watch out for Extra values of "Using filesort" and "Using temporary". "Using index" means information is retrieved using only the index tree, without an additional seek to read the actual row; this strategy can be used when the query uses only columns that are part of a single index (a covering index).
  • How do you know if a scan is used? In the EXPLAIN output, the "type" for the table/set will be "ALL" or "index": "ALL" means a full scan of the table's data records is performed, while "index" means a full scan of the index records. Avoid them by ensuring indexes exist on the columns used in WHERE, ON, and GROUP BY clauses.
  • system or const: very fast, because the table has at most one matching row (for example, a primary key lookup in the WHERE clause). The const access strategy is just about as good as you can get from the optimizer. It means the SELECT's WHERE clause used an equality operator, on a field indexed with a unique non-nullable key, compared against a constant value. The system strategy is related to const and refers to a table with only a single row being referenced in the SELECT.
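A sketch of a query that yields the const strategy, using sakila's rental table (rental_id is its primary key; the id value is arbitrary):

```sql
-- Equality on a unique, non-nullable key against a constant => type: const
EXPLAIN SELECT * FROM rental WHERE rental_id = 13534;
```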
  • Let's assume we need to find all rentals that were made between the 14th and 16th of June, 2005. We'll change our original SELECT statement to use a BETWEEN operator: SELECT * FROM rental WHERE rental_date BETWEEN '2005-06-14' AND '2005-06-16'\G As you can see, the access strategy chosen by the optimizer is the range type. This makes perfect sense, since we are using a BETWEEN operator in the WHERE clause: the BETWEEN operator deals with ranges, as do <, <=, >, and >=. The MySQL optimizer is highly tuned to deal with range optimizations. Generally, range operations are very quick, but here are some things you may not be aware of regarding the range access strategy: an index must be available on the field operated on by the range operator; if too many records are estimated to be returned by the condition, the range strategy won't be used, and an index or full table scan will be preferred instead; and the field must not be operated on by a function call.
  • To demonstrate this scan-versus-seek choice, the range query has been modified to include a larger range of rental dates. The optimizer is no longer using the range access strategy, because the number of rows estimated to match the range condition exceeds a certain percentage of the total rows in the table, which the optimizer uses to decide whether to perform a single scan or a seek operation for each matched record. In this case, the optimizer chose to perform a full table scan, which corresponds to the ALL access strategy you see in the type column of the EXPLAIN output.
  • The scan vs. seek dilemma: behind the scenes, the MySQL optimizer has to decide what access strategy to use in order to retrieve information from the storage engine. One of the decisions it must make is whether to do a seek operation or a scan operation. A seek operation, generally speaking, jumps into a random place, either on disk or in memory, to fetch the data needed; the operation is repeated for each piece of data needed. A scan operation, on the other hand, jumps to the start of a chunk of data and reads sequentially, either from disk or from memory, until the end of the chunk. With large amounts of data, sequentially scanning through contiguous data on disk or in memory is faster than performing many random seek operations. MySQL keeps statistics about the uniqueness of values in an index in order to estimate the rows returned (the rows column in the EXPLAIN output); if the estimated number of matched rows is greater than a certain percentage of the total rows in the table, MySQL will do a scan.
  • The ALL access strategy (full table scan): the full table scan (ALL in the type column) is definitely something you want to watch out for, particularly if you are not running a data-warehouse scenario, you are supplying a WHERE clause to the SELECT, or you have very large data sets. Sometimes full table scans cannot be avoided, and sometimes they perform better than other access strategies, but generally they are a sign that your schema lacks proper indexing; without an appropriate index, no range optimization is possible.
  • Covering indexes: when MySQL can locate every field a query needs for a given table within an index (as opposed to the full table records), the index is known as a covering index. Such queries can be much faster because the table row never has to be read. Covering indexes are critically important for the performance of certain queries and joins. When a covering index is located and used by the optimizer, you will see "Using index" show up in the Extra column of the EXPLAIN output.
  • Remember that “index” in the type column means a full index scan. “ Using index” in the Extra column means a covering index is being used. The benefit of a covering index is that MySQL can grab the data directly from the index records and does not need to do a lookup operation into the data file or memory to get additional fields from the main table records. One of the reasons that using SELECT * is not a recommended practice is because by specifying columns instead of *, you have a better chance of hitting a covering index.
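A minimal sketch of the covering-index effect, again using sqlite3 as a stand-in for MySQL (SQLite reports "USING COVERING INDEX" in EXPLAIN QUERY PLAN where MySQL would show "Using index" in Extra; the schema is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE rental (
    rental_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    rental_date TEXT,
    staff_notes TEXT)""")
conn.execute("CREATE INDEX idx_cust_date ON rental (customer_id, rental_date)")

# Every column the query touches is in the index, so the engine never
# needs to fetch the full table row.
covering = conn.execute(
    "EXPLAIN QUERY PLAN SELECT customer_id, rental_date "
    "FROM rental WHERE customer_id = 5").fetchall()[0][-1]

# SELECT * drags in staff_notes, forcing a lookup of the full row,
# so the same index no longer covers the query.
not_covering = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM rental "
    "WHERE customer_id = 5").fetchall()[0][-1]

print(covering)      # ... USING COVERING INDEX idx_cust_date ...
print(not_covering)  # ... USING INDEX idx_cust_date ...
```

This is exactly why naming your columns instead of writing SELECT * gives you a better chance of hitting a covering index.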
  • With the help of EXPLAIN, you can see where you should add indexes to tables so that a SELECT finds rows faster. You can also use EXPLAIN to check whether the optimizer joins the tables in an optimal order. EXPLAIN returns a row of information for each "table" used in the SELECT statement, showing each part of the execution plan and its order; the "table" can be a real schema table, a derived or temporary table, a subquery, or a union result. The key columns are: type, the access strategy used to grab the data in this set; possible_keys, the keys available to the optimizer; key, the key chosen by the optimizer; rows, the number of rows MySQL estimates it must examine to execute the query; and Extra, additional information about how MySQL resolves the query. Watch out for Extra values of Using filesort and Using temporary. Using index means information is retrieved using only the index tree, without an additional seek to read the actual row; this strategy applies when the query uses only columns that are part of a single index (a covering index).
  • Indexes can quickly find the rows that match a WHERE clause; however, this works only if the indexed column is NOT wrapped in a function or expression in the WHERE clause.
  • In the 1st example a fast range access strategy is chosen by the optimizer, and the index on title is used to winnow the query results down. In the 2nd example a slow full table scan (the ALL access strategy) is used because a function (LEFT) is operating on the title column. Operating on an indexed column with a function means the optimizer cannot use the index to satisfy the query. Typically, you can rewrite such queries so that they do not operate on an indexed column with a function.
  • The main goal of partitioning is to reduce the amount of data read for particular SQL operations so that overall response time is reduced. Vertical partitioning: this scheme is traditionally used to reduce the width of a target table by splitting it vertically, so that only certain columns are included in a particular dataset, with each partition including all rows. An example might be a table containing a number of very wide text or BLOB columns that aren't addressed often being broken into two tables, with the most-referenced columns in one table and the seldom-referenced text or BLOB data in another. Horizontal partitioning: this form segments table rows so that distinct groups of physical row-based datasets are formed that can be addressed individually (one partition) or collectively (one to all partitions). All columns defined for the table are found in each partition, so no table attributes are missing. An example might be a table containing historical data partitioned by date.
  • Guidelines for vertical partitioning: limit the number of columns per table, and split large, infrequently used columns into a separate one-to-one table. By removing the VARCHAR column from the design, you actually get a reduction in query response time. Beyond partitioning, this speaks to the effect wide tables can have on queries and why you should always ensure that every column defined for a table is actually needed.
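A minimal sketch of that one-to-one split, using sqlite3 and invented product tables (in MySQL the DDL would express the same idea):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hot, narrow table: the columns almost every query needs.
conn.execute("""CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name TEXT,
    price REAL)""")
# Cold table: the wide, rarely read column, linked one-to-one by primary key.
conn.execute("""CREATE TABLE product_detail (
    product_id INTEGER PRIMARY KEY REFERENCES product(product_id),
    long_description TEXT)""")

conn.execute("INSERT INTO product VALUES (1, 'widget', 9.99)")
conn.execute("INSERT INTO product_detail VALUES (1, 'A very long block of text')")

# The common query never touches the wide column:
row = conn.execute("SELECT name, price FROM product WHERE product_id = 1").fetchone()
print(row)  # ('widget', 9.99)

# Only the occasional detail page pays for the join:
full = conn.execute(
    "SELECT p.name, d.long_description FROM product p "
    "JOIN product_detail d ON d.product_id = p.product_id "
    "WHERE p.product_id = 1").fetchone()
print(full)
```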
  • Here is an example of improving a query. Original: SELECT * FROM Orders WHERE TO_DAYS(CURRENT_DATE()) - TO_DAYS(order_created) <= 7; Improved: SELECT * FROM Orders WHERE order_created >= CURRENT_DATE() - INTERVAL 7 DAY; The rewrite moves the calculation off the indexed order_created column, so the index can be used.
  • Although we rewrote the WHERE expression to remove the function on the index, we still have a non-deterministic function CURRENT_DATE() in the statement, which eliminates this query from being placed in the query cache. Any time a non-deterministic function is used in a SELECT statement, the query cache ignores the query. In read-intensive applications, this can be a significant performance problem. – let's fix that: SELECT * FROM Orders WHERE order_created >= '2008-01-11' - INTERVAL 7 DAY; We replaced the function with a constant (probably using our application programming language). However, we are specifying SELECT * instead of the actual fields we need from the table. What if there is a TEXT field in Orders called order_memo that we don't need to see? Well, having it included in the result means a larger result set which may not fit into the query cache and may force a disk-based temporary table. – let's fix that: SELECT order_id, customer_id, order_total, order_created FROM Orders WHERE order_created >= '2008-01-11' - INTERVAL 7 DAY;
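The "constant computed in the application" step might look like this in Python (a hedged sketch; Orders and its columns follow the slide's example, and in practice your driver's parameter binding would normally carry the value):

```python
from datetime import date

# Compute "today" once in application code so the SQL text itself is
# deterministic, exactly like the '2008-01-11' literal on the slide.
cutoff = date.today().isoformat()

sql = ("SELECT order_id, customer_id, order_total, order_created "
       "FROM Orders WHERE order_created >= '%s' - INTERVAL 7 DAY" % cutoff)
print(sql)
```

Until midnight rolls over, every request generates the identical statement text, so the query cache can serve repeated runs without re-executing the query.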
  • An important new 5.1 feature is horizontal partitioning. Increased performance: during scan operations, the MySQL optimizer knows which partitions contain the data that will satisfy a particular query and accesses only those partitions during execution. Partitioning is best suited for VLDBs with a lot of query activity targeting specific portions or ranges of one or more tables, though other situations lend themselves to partitioning as well (e.g. data archiving). It is good for data warehousing and not designed for OLTP environments.
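MySQL 5.1 does this pruning inside the server; purely as an illustration of the idea, the same routing can be hand-rolled at the application level (table names and the per-year scheme are invented for the sketch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One physical table per year: a hand-rolled RANGE partition on order date.
for year in (2006, 2007, 2008):
    conn.execute("CREATE TABLE orders_%d (order_id INTEGER, order_created TEXT)" % year)

def insert_order(order_id, order_created):
    year = int(order_created[:4])  # route the row to its partition
    conn.execute("INSERT INTO orders_%d VALUES (?, ?)" % year,
                 (order_id, order_created))

insert_order(1, "2006-05-01")
insert_order(2, "2007-03-10")
insert_order(3, "2008-01-09")

def orders_between(start, end):
    # "Partition pruning": scan only the tables whose year range overlaps
    # the query range; the other partitions are never touched.
    rows = []
    for year in range(int(start[:4]), int(end[:4]) + 1):
        rows += conn.execute(
            "SELECT order_id FROM orders_%d "
            "WHERE order_created BETWEEN ? AND ?" % year,
            (start, end)).fetchall()
    return [r[0] for r in rows]

print(orders_between("2007-01-01", "2008-12-31"))  # [2, 3]
```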
  • Lazy loading and JPA: with JPA, one-to-many and many-to-many relationships lazy load by default, meaning they will be loaded when the entity in the relationship is accessed. Lazy loading is usually good, but if you need to access all of the "many" objects in a relationship, it will cause n+1 selects, where n is the number of "many" objects. You can change the relationship to be loaded eagerly as follows: public class Employee { @OneToMany(mappedBy = "employee", fetch = FetchType.EAGER) private Collection addresses; ... } However, you should be careful with eager loading, which can cause SELECT statements that fetch too much data, and can produce a Cartesian product if you eagerly load entities with several related collections. If you want to temporarily override the LAZY fetch type, you can use a fetch join. For example, this query would eagerly load the employee addresses: @NamedQueries({ @NamedQuery(name="getItEarly", query="SELECT e FROM Employee e JOIN FETCH e.addresses") }) public class Employee { ... }
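The n+1 problem itself is not JPA-specific; it can be sketched in plain Python/sqlite3 by counting the statements issued (the Employee/Address tables mirror the slide's entities, and the counting wrapper exists only for the demonstration):

```python
import sqlite3

queries = []

class CountingConn:
    # Thin wrapper that records every SELECT sent to the database.
    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
    def execute(self, sql, params=()):
        if sql.lstrip().upper().startswith("SELECT"):
            queries.append(sql)
        return self._conn.execute(sql, params)

conn = CountingConn()
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE address (id INTEGER PRIMARY KEY, employee_id INTEGER, city TEXT)")
for i in range(1, 4):
    conn.execute("INSERT INTO employee VALUES (?, ?)", (i, "emp%d" % i))
    conn.execute("INSERT INTO address VALUES (?, ?, ?)", (i, i, "city%d" % i))

# Lazy style: 1 query for the employees + n queries for their addresses.
employees = conn.execute("SELECT id, name FROM employee").fetchall()
for emp_id, _name in employees:
    conn.execute("SELECT city FROM address WHERE employee_id = ?", (emp_id,)).fetchall()
lazy_count = len(queries)

# Fetch-join style: one query loads everything at once.
queries.clear()
conn.execute("SELECT e.name, a.city FROM employee e "
             "JOIN address a ON a.employee_id = e.id").fetchall()
eager_count = len(queries)

print(lazy_count, eager_count)  # 4 1
```

With 3 employees the lazy pattern issues 4 statements against the join's 1, which is exactly the trade-off JOIN FETCH resolves.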
  • Facebook is an excellent example of a company that started using MySQL in its infancy and has scaled MySQL to become one of the top 10 most-trafficked web sites in the world. Facebook deploys hundreds of MySQL servers with replication in multiple data centers to manage 175M active users and 26 billion photos, and to serve 250,000 photos every second. Facebook is also a heavy user of Memcached, an open source caching layer used to improve performance and scalability: Memcached handles 50,000-100,000 requests/second, alleviating the database burden. MySQL also helps Facebook manage its 20,000 Facebook applications, which are helping other web properties grow exponentially; iLike (music sharing) added 20,000 users/hour after launching its Facebook application.
  • eBay is a heavy Oracle user, but Oracle was becoming too expensive and it was cost-prohibitive to deploy new applications. MySQL is used to run eBay's Personalization Platform, which serves advertisements based on user interest: - a business-critical system running on MySQL Enterprise for one of the largest-scale websites in the world - a highly scalable, low-cost system that handles all of eBay's personalization and session data needs - the ability to handle 4 billion requests per day of 50/50 read/write operations for approximately 40KB of data per user/session - approx. 25 Sun 4100s running 100% of eBay's personalization and session data service (2-CPU dual-core Opteron, 16 GB RAM, Solaris 10 x86) - a highly manageable system across the entire operational life cycle - leveraging MySQL Enterprise Dashboard as a critical tool for insight into system performance, trending, and identifying issues - adding new applications to the ebay.com domain that previously would have been in a different domain because of cookie constraints - creating several new business opportunities that would not have been possible without this new low-cost personalization platform - leveraging the MySQL Memory engine for other types of caching tiers that are enabling new business opportunities.
  • Zappos is one of the world's largest online retailers, with over $1 billion in annual sales. They focus on selling shoes, handbags, and eyewear as well as other apparel, but their primary focus is delivering superior customer service, which they believe is key to a successful online shopping experience. MySQL plays a critical role in delivering that customer service by providing Zappos with: high performance and scalability, enabling millions of customers to shop on Zappos.com every day; 99.99% database availability, so that Zappos' customers don't experience service interruptions that impact revenue; and a cost-effective solution saving Zappos over $1 million per year, allowing them to spend more on customer service and less on technical infrastructure. Since Zappos was founded in 1999 they have used MySQL as their primary database to power their web site, internal tools, and reporting tasks. In the early days, Zappos could not afford a proprietary enterprise database; as the company has grown to $1 billion in sales, MySQL has scaled with the business, making it a perfect solution even at their current sales volume. Compared to proprietary enterprise systems, Zappos estimates they save about $1 million per year in licensing fees and in salaries of dedicated DBAs who can only manage individual systems; over the lifetime of the company, they estimate they have saved millions of dollars using MySQL.
