Copyright © 2017 Oracle and/or its affiliates. All rights reserved.
MySQL 8.0 & Unicode
Why, what & how
Bernt Marius Johnsen
Senior Software QA Engineer
Agenda
Why Unicode
What is character set/collation etc.
How to migrate and some issues to consider
Why Unicode?
● The whole world is moving towards Unicode as digital devices are used by more and more people across all cultures around the globe.
– Approximate number of users (in billions) of the most widely used writing systems:
Latin¹: ~5, Chinese: ~1.5, Arabic: ~0.7, Devanagari: ~0.5, Cyrillic: ~0.25, Bengali: ~0.22, Kana: ~0.12
● One driving force is emojis
– Smileys, hearts, roses etc., and all the other stuff people send each other when communicating these days.
– “Useful” example: Unicode character 0x1F574, MAN IN BUSINESS SUIT LEVITATING: 🕴
¹ This is way more letters than just ASCII!
Why Unicode in a database?
● You may use one character set for all your data, for all purposes.
– E.g. if your application uses utf8mb4 for a table of names, it can be used by Russians, Chinese, Japanese, etc.
– Even esoteric extinct writing systems are covered, e.g. the Phaistos disc (look it up...)
– But not Klingon, nor Tengwar
What is Unicode?
● Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. (Wikipedia)
● Kept in sync with ISO/IEC 10646
● Unicode covers most existing and extinct writing systems known to man in one standard.
● The code space is divided into 17 planes (0x0000-0x10FFFF); blocks of characters are allocated within the planes
● The three main planes in use so far:
– 0x0000-0xFFFF: Basic Multilingual Plane (BMP)
– 0x10000-0x1FFFF: Supplementary Multilingual Plane (SMP)
– 0x20000-0x2FFFF: Supplementary Ideographic Plane (SIP)
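As a rough illustration that is not part of the original slides, a character's plane is its code point divided by 0x10000, which is the same as the bits above the lowest 16. This is a minimal Python sketch. `plane_of` and `PLANE_NAMES` are hypothetical names chosen for this example:

```python
def plane_of(ch: str) -> int:
    """Return the Unicode plane (0-16) that a single character belongs to."""
    return ord(ch) >> 16  # each plane holds 0x10000 code points

# Only the three planes listed on the slide are named here.
PLANE_NAMES = {0: "BMP", 1: "SMP", 2: "SIP"}

for ch in ["A", "Ä", "人", "\U0001F574", "\U00020000"]:
    p = plane_of(ch)
    print(f"U+{ord(ch):04X} -> plane {p} ({PLANE_NAMES.get(p, 'other')})")
```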
What is a CHARACTER SET?
● A character set is defined by:
– A repertoire of characters/graphemes
– A value given to each character/grapheme (codepoint)
– An encoding which defines the binary representation of the values
What is Encoding?
● The binary representation of a character. Unicode defines 3 encodings:
– UTF-8 (1-4 bytes per character)
– UTF-16 (2 or 4 bytes per character)
– UTF-32 (4 bytes per character)
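A small Python sketch shows the three encodings side by side, using only standard codecs. It prints each character's bytes in hex. The big-endian codec variants are an assumption made so that Python does not prepend a byte order mark. The byte counts match the 1-4, 2-or-4 and 4 bytes listed on the slide:

```python
def encoded_hex(ch: str) -> dict:
    """Hex representation of one character in the three Unicode encodings."""
    return {
        # big-endian variants avoid a byte order mark (BOM) in the output
        "UTF-8": ch.encode("utf-8").hex().upper(),
        "UTF-16": ch.encode("utf-16-be").hex().upper(),
        "UTF-32": ch.encode("utf-32-be").hex().upper(),
    }

for ch in ["A", "Ä", "人", "\U0001F574"]:
    print(f"U+{ord(ch):04X}", encoded_hex(ch))
```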
Character set examples
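+-----------+----------------------+----------+----------+------------+
| Character | Character set        | Value    | Encoding | Encoded as |
+-----------+----------------------+----------+----------+------------+
| A         | ASCII                | 41       | 1:1      | 41         |
|           | ISO-8859-1 (Latin-1) | 41       | 1:1      | 41         |
|           | Unicode              | 0041     | UTF-8    | 41         |
|           |                      |          | UTF-16   | 0041       |
| Ä         | ISO-8859-1 (Latin-1) | C4       | 1:1      | C4         |
|           | Unicode              | 00C4     | UTF-8    | C384       |
|           |                      |          | UTF-16   | 00C4       |
| д         | KOI8-R               | C4       | 1:1      | C4         |
|           | ISO-8859-5           | D4       | 1:1      | D4         |
|           | Unicode              | 0434     | UTF-8    | D0B4       |
|           |                      |          | UTF-16   | 0434       |
| 人        | GB-18030             | C8CB     | 1:1      | C8CB       |
|           | Unicode              | 4EBA     | UTF-8    | E4BABA     |
|           |                      |          | UTF-16   | 4EBA       |
|           | Big5                 | A448     | 1:1      | A448       |
|           | JIS X 0208 (SJIS)    | 906C     | 1:1      | 906C       |
| 🕴         | Unicode              | 1F574    | UTF-8    | F09F95B4   |
|           |                      |          | UTF-16   | D83DDD74   |
|           | GB-18030             | 9439EE36 | 1:1      | 9439EE36   |
+-----------+----------------------+----------+----------+------------+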
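To double-check the character set examples, Python's standard codecs can reproduce each "Encoded as" value. This is a quick sketch. The codec names are Python's, not MySQL's. For example, Python's `latin-1` is strict ISO-8859-1, whereas MySQL's latin1 is based on cp1252. UTF-16 is shown big-endian without a byte order mark:

```python
# (character, Python codec name, expected hex from the character set examples)
ROWS = [
    ("A", "ascii", "41"),
    ("A", "latin-1", "41"),
    ("A", "utf-8", "41"),
    ("A", "utf-16-be", "0041"),
    ("Ä", "latin-1", "C4"),
    ("Ä", "utf-8", "C384"),
    ("Ä", "utf-16-be", "00C4"),
    ("д", "koi8_r", "C4"),
    ("д", "iso8859_5", "D4"),
    ("д", "utf-8", "D0B4"),
    ("д", "utf-16-be", "0434"),
    ("人", "gb18030", "C8CB"),
    ("人", "utf-8", "E4BABA"),
    ("人", "utf-16-be", "4EBA"),
    ("人", "big5", "A448"),
    ("人", "shift_jis", "906C"),
    ("\U0001F574", "utf-8", "F09F95B4"),
    ("\U0001F574", "utf-16-be", "D83DDD74"),
    ("\U0001F574", "gb18030", "9439EE36"),
]

for ch, codec, expected in ROWS:
    actual = ch.encode(codec).hex().upper()
    status = "ok" if actual == expected else f"MISMATCH (got {actual})"
    print(f"U+{ord(ch):04X} {codec:10} {expected:9} {status}")
```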
What is collation?
● Collation is the assembly of written information into a standard order (Wikipedia)
● Collation may consider
– Case (e.g. 'A' vs. 'a')
– Accents (e.g. 'E' vs. 'É')
– Locale-specific rules (e.g. 'A' vs. 'Å' vs. 'AA' in Danish and Norwegian)
– Numeric characters (e.g. '2' vs. 'ⅱ')
– Punctuation (e.g. 'blackbird' vs. 'black-bird')
– Etc.
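To make the case and accent dimensions concrete, here is a deliberately crude Python approximation of an accent- and case-insensitive ("ai_ci") comparison. It decomposes characters, drops combining marks and case-folds the result. `rough_ai_ci_key` is a hypothetical helper. It does not implement the Unicode Collation Algorithm and ignores locale rules. For instance, it folds 'å' to 'a', whereas a Swedish collation treats 'å' as a separate letter:

```python
import unicodedata

def rough_ai_ci_key(s: str) -> str:
    """Crude accent- and case-insensitive key (NOT the Unicode Collation Algorithm)."""
    decomposed = unicodedata.normalize("NFD", s)  # 'É' -> 'E' + combining acute
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return no_accents.casefold()

words = ["Éclair", "eclair", "apple", "Zebra", "åke"]
print(sorted(words))                        # code point order, like a *_bin collation
print(sorted(words, key=rough_ai_ci_key))   # rough accent/case-insensitive order
print(rough_ai_ci_key("E") == rough_ai_ci_key("É"))  # True
```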
What is a COLLATION in (My)SQL?
● In MySQL, a COLLATION is a set of rules for a given character set which defines an order and affects:
– ORDER BY
– LIKE
– Primary keys and indexes
– Unique constraints
– Comparison operators
– Some string functions
● All strings in MySQL have a character set and a collation
Character sets in MySQL
+----------+---------------------------------+---------------------+--------+
| Charset  | Description                     | Default collation   | Maxlen |
+----------+---------------------------------+---------------------+--------+
| ascii    | US ASCII                        | ascii_general_ci    |      1 |
| latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
| utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |
| utf8mb4  | UTF-8 Unicode                   | utf8mb4_0900_ai_ci  |      4 |
+----------+---------------------------------+---------------------+--------+
Get all by typing:
mysql> show character set;
The rest of them are:
armscii8, big5, binary, cp1250, cp1251, cp1256, cp1257, cp850, cp852, cp866, cp932, dec8, eucjpms,
euckr, gb18030, gb2312, gbk, geostd8, greek, hebrew, hp8, keybcs2, koi8r, koi8u, latin2, latin5, latin7,
macce, macroman, sjis, swe7, tis620, ucs2, ujis, utf16, utf16le, utf32
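The Maxlen column is the maximum number of bytes per character. MySQL's utf8 (an alias for utf8mb3) stops at 3 bytes, so it cannot store characters outside the BMP, such as emojis, while utf8mb4 can. Here is a minimal Python sketch for checking whether existing strings would fit into a 3-byte utf8 column. Both helper names are hypothetical:

```python
def fits_utf8mb3(s: str) -> bool:
    """True if every character lies in the BMP (U+0000-U+FFFF),
    i.e. needs at most 3 bytes in UTF-8."""
    return all(ord(c) <= 0xFFFF for c in s)

def max_utf8_bytes_per_char(s: str) -> int:
    """Longest UTF-8 encoding of any single character in s (0 for an empty string)."""
    return max((len(c.encode("utf-8")) for c in s), default=0)

for text in ["Grüße", "Привет", "人", "hi \U0001F574"]:
    print(text, fits_utf8mb3(text), max_utf8_bytes_per_char(text))
```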
What's new in MySQL 8.0
● utf8mb4 will be the default character set for 8.0
● utf8mb4_0900_ai_ci is the default collation of utf8mb4
● A lot of new collations based on Unicode 9.0.0
– UCA (Unicode Collation Algorithm)
– DUCET (Default Unicode Collation Element Table)
– CLDR v30 (Common Locale Data Repository)
MySQL 8.0 collation name scheme
● <charset>[_<language>[_<variant>]]_<unicodeversion>(_<attribute>)+
– <charset> = utf8mb4
– <language>, an ISO 639-1 language code (or ISO 639-2 if needed)
– <variant>, a variant of the standard collation for the language. Currently: utf8mb4_de_pb_0900_* and utf8mb4_es_trad_0900_*.
– <unicodeversion> = 0900
– <attribute>: accent sensitivity (ai, as), case sensitivity (ci, cs) and possible future ones.
● Special collations:
– Default collation (not language specific): utf8mb4_0900_ai_ci
● may be used for German dictionary order, English, French¹, Irish Gaelic, Indonesian, Italian, Luxembourgish, Malay, Dutch, Portuguese, Swahili and Zulu
– Codepoint order: utf8mb4_bin
¹ Canadian French should not use the utf8mb4_0900_as_cs collation, because its accent ordering differs from the standard order.
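The naming scheme is regular enough to parse mechanically. This is a Python sketch under stated assumptions. The regular expression is only a reading of the pattern given on the slide, not an official MySQL grammar. It assumes the charset is utf8mb4 and that every attribute is two letters. Special names such as utf8mb4_bin do not match it. `COLLATION_RE` and `parse_collation` are hypothetical names:

```python
import re

# Assumed pattern: <charset>[_<language>[_<variant>]]_<unicodeversion>(_<attribute>)+
COLLATION_RE = re.compile(
    r"^(?P<charset>utf8mb4)"
    r"(?:_(?P<language>[a-z]{2,3})(?:_(?P<variant>[a-z]+))?)?"
    r"_(?P<version>\d{4})"
    r"(?P<attributes>(?:_[a-z]{2})+)$"
)

def parse_collation(name: str):
    """Split a collation name into its parts, or return None if it does not match."""
    m = COLLATION_RE.match(name)
    if m is None:
        return None
    parts = m.groupdict()
    parts["attributes"] = parts["attributes"].lstrip("_").split("_")
    return parts

print(parse_collation("utf8mb4_de_pb_0900_ai_ci"))
print(parse_collation("utf8mb4_0900_as_cs"))
print(parse_collation("utf8mb4_bin"))  # None: special collation
```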
Why not ...
● Fix utf8mb4_general_ci instead of introducing utf8mb4_0900_ai_ci, or fix utf8mb4_german2_ci instead of introducing utf8mb4_de_pb_0900_ai_ci?
– Because that might break existing applications using the old collations (the most serious issue for large databases: indexes would have to be rebuilt).
Policy: Collations don't change!
● Have a simpler name scheme?
– Because we prepare for
● More languages
● New Unicode versions (Unicode 10.0.0 was released in June 2017)
– ISO 639-1/ISO 639-2 language codes are well defined
How to migrate?
● When migrating from 5.7 tables:
– Just convert the table:
ALTER TABLE foo CONVERT TO CHARACTER SET utf8mb4;
● This changes the default character set of the table (so that future new columns get utf8mb4) and the character set of all applicable columns.
● In principle, all character data in MySQL may be converted to utf8mb4 without loss of data.
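The "without loss of data" claim can be illustrated for one single-byte character set. The following Python sketch decodes every possible byte as ISO-8859-1, re-encodes the text as UTF-8 and converts it back. This is an assumption-laden stand-in. Python's `latin-1` codec is strict ISO-8859-1, not MySQL's cp1252-based latin1, and this does not exercise MySQL's own conversion code. `roundtrips_via_utf8` is a hypothetical helper:

```python
def roundtrips_via_utf8(raw: bytes, codec: str = "latin-1") -> bool:
    """Decode bytes from a single-byte charset, re-encode them as UTF-8, and
    check that converting back yields exactly the original bytes."""
    text = raw.decode(codec)
    utf8 = text.encode("utf-8")
    return utf8.decode("utf-8").encode(codec) == raw

all_bytes = bytes(range(256))
print(roundtrips_via_utf8(all_bytes))  # True: every ISO-8859-1 byte survives
```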
That was easy... is that all there is to it?
… not quite … column by column
● If you have more complex tables with different character sets:
– Change the default character set of the table:
ALTER TABLE foo DEFAULT CHARACTER SET utf8mb4;
– Modify all relevant columns:
ALTER TABLE foo MODIFY bar VARCHAR(100) CHARACTER SET utf8mb4;
Generally we recommend doing it column by column.
– ALTER TABLE … CONVERT … will, e.g., change TEXT to MEDIUMTEXT when converting from latin1 to utf8mb4 (so that the column can still hold as many characters as before), which is not necessarily what you want.
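The column-by-column procedure can be scripted. The Python helper below is hypothetical. It only builds the SQL text and does not connect to MySQL. The column list is assumed to come from your own schema, for example from SHOW CREATE TABLE, and the collation default is an assumption to adjust. One detail matters here: MODIFY replaces the whole column definition. Attributes such as NOT NULL, DEFAULT or COMMENT must therefore be restated, or they are lost. That is why each column carries its data type and its remaining attributes separately:

```python
def convert_statements(table, columns, collation="utf8mb4_0900_ai_ci"):
    """Build column-by-column utf8mb4 conversion statements for one table.

    `columns` maps column name -> (data_type, other_attributes), e.g.
    {"bar": ("VARCHAR(100)", "NOT NULL")}. The character set clause must
    follow the data type, so it is inserted between the two parts.
    """
    stmts = [f"ALTER TABLE `{table}` DEFAULT CHARACTER SET utf8mb4 COLLATE {collation};"]
    for name, (data_type, attributes) in columns.items():
        definition = f"{data_type} CHARACTER SET utf8mb4 COLLATE {collation}"
        if attributes:
            definition += f" {attributes}"
        stmts.append(f"ALTER TABLE `{table}` MODIFY `{name}` {definition};")
    return stmts

for stmt in convert_statements("foo", {"bar": ("VARCHAR(100)", "NOT NULL"),
                                       "notes": ("TEXT", "")}):
    print(stmt)
```

Unlike CONVERT TO, this keeps TEXT as TEXT. You choose each target type yourself, so check that the converted data still fits.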
… not quite … the schema too
● A schema (a.k.a. database) in MySQL has a default character set, which becomes the default character set of new tables created in the schema
– mysql> show create schema bar;
+----------+----------------------------------------------------------------+
| Database | Create Database                                                |
+----------+----------------------------------------------------------------+
| bar      | CREATE DATABASE `bar` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+----------+----------------------------------------------------------------+
1 row in set (0.00 sec)
● Change the default character set of the schema (database):
ALTER SCHEMA bar DEFAULT CHARACTER SET utf8mb4;
This only affects tables created afterwards; existing tables keep their own defaults.
… not quite … collation differences
Collations are not equal, so converting from one collation to another may break UNIQUE constraints (e.g. PRIMARY KEY).
● Default collation:
– latin1_swedish_ci vs. utf8mb4_0900_ai_ci
E.g. 'a'='å' is false in the first, but true in the second.
– Possible solution: stick to a Swedish, Danish or Norwegian collation, depending on your application:
ALTER TABLE foo CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_sv_0900_ai_ci;
– Generally, if you don't rely on case insensitivity (you just got it by default), utf8mb4_0900_as_cs should be safe.
● There's a huge number of possibilities depending on your data and the collations used, partly because pre-MySQL 8.0 collations were not complete (and in some cases not correct).
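Before converting, it can pay to look for values that are distinct today but would compare equal under the new collation. This is a rough pre-check in Python. It rests on a loud assumption: stripping accents and case-folding only approximates utf8mb4_0900_ai_ci and is not the Unicode Collation Algorithm. Treat any hits as candidates and confirm them in MySQL itself. `rough_ai_ci_key` and `find_collisions` are hypothetical helpers:

```python
import unicodedata
from collections import defaultdict

def rough_ai_ci_key(s):
    """Approximate accent- and case-insensitive key (an assumption, not UCA)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

def find_collisions(values):
    """Group values whose approximate keys coincide; keep only groups of 2+."""
    groups = defaultdict(list)
    for v in values:
        groups[rough_ai_ci_key(v)].append(v)
    return {k: vs for k, vs in groups.items() if len(vs) > 1}

# Distinct under latin1_swedish_ci, but candidates for duplicates under an ai_ci collation
existing_keys = ["a", "å", "Åse", "ase", "Ole"]
print(find_collisions(existing_keys))
```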
… not quite … index and key issues
● If you change the collation of a column, indexes on that column will be rebuilt.
– This takes time for large tables, and the table is locked during that time.
– And the conversion may fail due to changed space consumption.
● Max key length is 3072 bytes¹, which implies that the max length of a utf8mb4 varchar column that is also a key is 768 characters (worst case: 4 bytes per character).
– mysql> create table foo (v varchar(1000) character set latin1 primary key);
Query OK, 0 rows affected (0.01 sec)
mysql> alter table foo modify v varchar(1000) character set utf8mb4;
ERROR 1071 (42000): Specified key was too long; max key length is 3072 bytes
¹ For the default InnoDB row format and default innodb_page_size in MySQL 8.0. See the documentation for details.
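The arithmetic behind the error is simple worst-case multiplication. This Python sketch mirrors it and takes the 3072-byte limit from the slide. It ignores any other per-key overhead, so read it as the slide's back-of-the-envelope check, not as InnoDB's exact rule. The helper names are hypothetical:

```python
MAX_KEY_BYTES = 3072  # limit quoted on the slide (default row format and page size)

def max_key_chars(bytes_per_char: int, max_key_bytes: int = MAX_KEY_BYTES) -> int:
    """Longest VARCHAR, in characters, that fits in a key in the worst case."""
    return max_key_bytes // bytes_per_char

def fits_in_key(varchar_len: int, bytes_per_char: int,
                max_key_bytes: int = MAX_KEY_BYTES) -> bool:
    """Worst-case check: every character uses the maximum number of bytes."""
    return varchar_len * bytes_per_char <= max_key_bytes

print(max_key_chars(4))       # 768 for utf8mb4
print(fits_in_key(1000, 1))   # True: latin1, 1000 bytes
print(fits_in_key(1000, 4))   # False: utf8mb4, up to 4000 bytes
```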
Space consumption
● utf8mb4 uses
– 1 byte for ASCII characters (0x00-0x7F),
– 2 bytes for most alphabets/abjads (0x80-0x7FF),
– 3 bytes for Indic scripts, Hangul, Kana and the most used CJK ideographs (0x800-0xFFFF),
– 4 bytes for the rest: archaic scripts, emojis, rarely used CJK extensions etc. (0x10000-0x10FFFF)
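The four ranges translate directly into a byte-count function. This Python sketch cross-checks it against Python's own UTF-8 encoder, which produces the same byte sequences that utf8mb4 stores. `utf8mb4_bytes` and `storage_bytes` are hypothetical names:

```python
def utf8mb4_bytes(cp: int) -> int:
    """Number of bytes utf8mb4 (standard UTF-8) uses for one code point."""
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4

def storage_bytes(text: str) -> int:
    """Total utf8mb4 bytes for a string."""
    return sum(utf8mb4_bytes(ord(c)) for c in text)

samples = {"ASCII": "A", "Cyrillic": "д", "Hangul": "한", "CJK": "人",
           "Emoji": "\U0001F574"}
for script, ch in samples.items():
    # cross-check the ranges against Python's UTF-8 encoder
    assert utf8mb4_bytes(ord(ch)) == len(ch.encode("utf-8"))
    print(f"{script:8} U+{ord(ch):04X} {utf8mb4_bytes(ord(ch))} byte(s)")
```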
Speed issues
● Operations on multibyte character sets are inherently slower than on single-byte character sets (e.g. latin1 vs. utf8mb4)
● We are working on a lot of code improvements.
– Expect a performance degradation on the order of 10-20% for sorting when you migrate from e.g. latin1 to utf8mb4, depending on your data of course.
⚠ 文字化け (Mojibake)
… or: what you see is not what you get...
mysql> create table foo(v varchar(10) character set latin1);
mysql> insert into foo values('å');
mysql> set names latin1;
mysql> insert into foo values('å');
mysql> set names utf8;
mysql> select * from foo;
+------+
| v    |
+------+
| å    |
| Ã¥   |
+------+
2 rows in set (0.00 sec)
mysql> select hex(v) from foo;
+--------+
| hex(v) |
+--------+
| E5     |
| C3A5   |
+--------+
2 rows in set (0.00 sec)
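The same double interpretation can be reproduced outside MySQL. This Python sketch assumes the terminal sends UTF-8, which the correctly stored first row implies. Python's strict `latin-1` codec stands in for MySQL's cp1252-based latin1; for the two bytes involved here (C3 and A5) both map to the same characters:

```python
original = "å"

# The terminal sends UTF-8 bytes...
sent = original.encode("utf-8")            # b'\xc3\xa5'
# ...but after SET NAMES latin1 the server reads them as two latin1 characters
misread = sent.decode("latin-1")           # 'Ã¥'
stored = misread.encode("latin-1")         # stored as C3 A5 in the latin1 column
print(misread, stored.hex().upper())       # Ã¥ C3A5

# The correctly inserted row holds the single latin1 byte E5
print(original.encode("latin-1").hex().upper())  # E5

# Reinterpreting the stored bytes as UTF-8 recovers the intended character
print(stored.decode("utf-8"))              # å
```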
Truly usable for global purposes.....
Q&A
😴 U+1F634
Safe Harbor Statement
The preceding is intended to outline our general product direction. It is
intended for information purposes only, and may not be incorporated
into any contract. It is not a commitment to deliver any material, code,
or functionality, and should not be relied upon in making purchasing
decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole
discretion of Oracle.
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfkalichargn70th171
 
Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxRTS corp
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxAS Design & AST.
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
Key Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapKey Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapIshara Amarasekera
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 

Recently uploaded (20)

Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
 
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
Effort Estimation Techniques used in Software Projects
Effort Estimation Techniques used in Software ProjectsEffort Estimation Techniques used in Software Projects
Effort Estimation Techniques used in Software Projects
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdf
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
 
Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptx
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptx
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
Key Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapKey Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery Roadmap
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 

MySQL 8.0 & Unicode: Why, what & how

  • 1. MySQL 8.0 & Unicode: Why, what & how
    Bernt Marius Johnsen, Senior Software QA Engineer
    Copyright © 2017 Oracle and/or its affiliates. All rights reserved.
  • 2. Agenda
    1. Why Unicode
    2. What is a character set, a collation, etc.
    3. How to migrate, and some issues to consider
  • 3. Why Unicode?
    ● The whole world is moving towards Unicode as digital devices are used by more and more people across all cultures around the globe.
      – Approximate number of users (in billions) of the seven most used writing systems: Latin¹: ~5, Chinese: ~1.5, Arabic: ~0.7, Devanagari: ~0.5, Cyrillic: ~0.25, Bengali: ~0.22, Kana: ~0.12
    ● One driving force is emojis
      – Smileys, hearts, roses etc., and all the other symbols people send each other when communicating these days.
      – A "useful" example: Unicode character U+1F574, MAN IN BUSINESS SUIT LEVITATING: 🕴
    ¹ This covers way more letters than just ASCII!
  • 4. Why Unicode in a database?
    ● You can use one character set for all your data, for all purposes.
      – E.g. if your application uses utf8mb4 for a table of names, that table can hold names written by Russians, Chinese, Japanese, etc.
      – Even esoteric extinct writing systems are covered, e.g. the Phaistos Disc (look it up...)
      – But not Klingon, nor Tengwar
  • 5. What is Unicode?
    ● "Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems." (Wikipedia)
    ● Its character repertoire is kept synchronized with ISO/IEC 10646.
    ● Unicode covers most existing and extinct writing systems known to man in one standard.
    ● The code space is divided into 17 planes of 65,536 (0x10000) code points each; blocks of characters are allocated within the planes.
    ● The three planes that hold most of the assigned characters:
      ● 0x0000-0xFFFF: Basic Multilingual Plane (BMP)
      ● 0x10000-0x1FFFF: Supplementary Multilingual Plane (SMP)
      ● 0x20000-0x2FFFF: Supplementary Ideographic Plane (SIP)
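The plane arithmetic on this slide can be sketched in Python (our choice of language; the slides themselves contain no code). `plane_of` and `PLANE_NAMES` are hypothetical helpers, not part of any MySQL API. Because each plane spans 0x10000 code points, the plane number is simply the code point shifted right by 16 bits.

```python
import unicodedata

# Names of the three planes listed on the slide; other planes are not named here.
PLANE_NAMES = {
    0: "Basic Multilingual Plane (BMP)",
    1: "Supplementary Multilingual Plane (SMP)",
    2: "Supplementary Ideographic Plane (SIP)",
}


def plane_of(ch: str) -> int:
    """Return the Unicode plane (0-16) of a single character."""
    # Each plane holds 0x10000 code points, so dropping the low 16 bits
    # of the code point leaves the plane number.
    return ord(ch) >> 16


if __name__ == "__main__":
    for ch in ["A", "д", "人", "\U0001F574", "\U00020000"]:
        p = plane_of(ch)
        print(f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}: "
              f"plane {p}, {PLANE_NAMES.get(p, 'other')}")
```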
  • 6. What is a CHARACTER SET?
    ● A character set is defined by:
      – A repertoire of characters/graphemes
      – A value (code point) assigned to each character/grapheme
      – An encoding, which defines the binary representation of the values
  • 7. What is an encoding?
    ● The binary representation of a character. Unicode defines 3 encodings:
      – UTF-8 (1-4 bytes per character)
      – UTF-16 (2 or 4 bytes per character)
      – UTF-32 (4 bytes per character)
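A minimal sketch of how many bytes one character takes in each of the three encodings, using only Python's built-in codecs. `encoded_lengths` is a hypothetical helper; the big-endian `utf-16-be` and `utf-32-be` codecs are chosen so that Python does not add a byte order mark to the count.

```python
def encoded_lengths(ch: str) -> dict:
    """Bytes needed for one character in each Unicode encoding form."""
    # The -be variants produce no byte order mark, so len() counts only
    # the bytes of the character itself.
    return {
        "UTF-8": len(ch.encode("utf-8")),
        "UTF-16": len(ch.encode("utf-16-be")),
        "UTF-32": len(ch.encode("utf-32-be")),
    }


if __name__ == "__main__":
    for ch in ["A", "Ä", "人", "\U0001F574"]:
        print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

UTF-8 grows from 1 to 4 bytes with the code point, UTF-16 jumps from 2 to 4 bytes only outside the BMP, and UTF-32 is always 4.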
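  • 8. Character set examples
    +-----------+----------------------+----------+----------+------------+
    | Character | Character set        | Value    | Encoding | Encoded as |
    +-----------+----------------------+----------+----------+------------+
    | A         | ASCII                | 41       | 1:1      | 41         |
    |           | ISO-8859-1 (Latin-1) | 41       | 1:1      | 41         |
    |           | Unicode              | 0041     | UTF-8    | 41         |
    |           |                      |          | UTF-16   | 0041       |
    | Ä         | ISO-8859-1 (Latin-1) | C4       | 1:1      | C4         |
    |           | Unicode              | 00C4     | UTF-8    | C384       |
    |           |                      |          | UTF-16   | 00C4       |
    | д         | KOI8-R               | C4       | 1:1      | C4         |
    |           | ISO-8859-5           | D4       | 1:1      | D4         |
    |           | Unicode              | 0434     | UTF-8    | D0B4       |
    |           |                      |          | UTF-16   | 0434       |
    | 人       | GB-18030             | C8CB     | 1:1      | C8CB       |
    |           | Unicode              | 4EBA     | UTF-8    | E4BABA     |
    |           |                      |          | UTF-16   | 4EBA       |
    |           | Big5                 | A448     | 1:1      | A448       |
    |           | JIS X 0208 (SJIS)    | 906C     | 1:1      | 906C       |
    | 🕴       | Unicode              | 1F574    | UTF-8    | F09F95B4   |
    |           |                      |          | UTF-16   | D83DDD74   |
    |           | GB-18030             | 9439EE36 | 1:1      | 9439EE36   |
    +-----------+----------------------+----------+----------+------------+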
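The values on this slide can be cross-checked with Python's standard codecs. This sketch assumes that Python's `latin-1`, `koi8-r`, `iso8859-5`, `gb18030`, `big5` and `shift_jis` codecs correspond to the character sets named on the slide (Shift_JIS standing in for JIS X 0208), and it uses `utf-16-be` to avoid a byte order mark. `EXAMPLES` and `encode_hex` are hypothetical names.

```python
# (character, Python codec name, expected encoded bytes in hex)
EXAMPLES = [
    ("A", "ascii", "41"),
    ("A", "latin-1", "41"),
    ("A", "utf-8", "41"),
    ("A", "utf-16-be", "0041"),
    ("Ä", "latin-1", "C4"),
    ("Ä", "utf-8", "C384"),
    ("Ä", "utf-16-be", "00C4"),
    ("д", "koi8-r", "C4"),
    ("д", "iso8859-5", "D4"),
    ("д", "utf-8", "D0B4"),
    ("д", "utf-16-be", "0434"),
    ("人", "gb18030", "C8CB"),
    ("人", "utf-8", "E4BABA"),
    ("人", "utf-16-be", "4EBA"),
    ("人", "big5", "A448"),
    ("人", "shift_jis", "906C"),
    ("\U0001F574", "utf-8", "F09F95B4"),
    ("\U0001F574", "utf-16-be", "D83DDD74"),  # surrogate pair
    ("\U0001F574", "gb18030", "9439EE36"),   # GB-18030 four-byte form
]


def encode_hex(ch: str, codec: str) -> str:
    """Encode one character and return its bytes as uppercase hex."""
    return ch.encode(codec).hex().upper()


if __name__ == "__main__":
    for ch, codec, expected in EXAMPLES:
        actual = encode_hex(ch, codec)
        status = "ok" if actual == expected else "MISMATCH"
        print(f"U+{ord(ch):04X} {codec:<10} {actual:<10} {status}")
```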
  • 9. What is collation?
    ● "Collation is the assembly of written information into a standard order." (Wikipedia)
    ● Collation may consider:
      – Case (e.g. 'A' vs. 'a')
      – Accents (e.g. 'E' vs. 'É')
      – Locale-specific rules (e.g. 'A' vs. 'Å' vs. 'AA' in Danish and Norwegian)
      – Numeric characters (e.g. '2' vs. 'ⅱ')
      – Punctuation (e.g. 'blackbird' vs. 'black-bird')
      – Etc.
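The case and accent dimensions can be approximated in Python with Unicode normalization. This is only a rough model: the key functions below are hypothetical, and MySQL's collations use UCA weight tables rather than NFD decomposition, so locale rules (e.g. 'Å' vs. 'AA') and the numeric and punctuation cases are not modelled at all.

```python
import unicodedata


def strip_accents(s: str) -> str:
    """Decompose to NFD and drop combining marks, so 'É' becomes 'E'."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Hypothetical comparison keys, loosely mimicking the as/ai and cs/ci
# attributes of MySQL collation names. Two strings "compare equal" under
# a given strength when their keys are equal.
def key_as_cs(s: str) -> str:
    return unicodedata.normalize("NFC", s)             # accents and case matter


def key_as_ci(s: str) -> str:
    return unicodedata.normalize("NFC", s).casefold()  # only accents matter


def key_ai_ci(s: str) -> str:
    return strip_accents(s).casefold()                 # neither matters


if __name__ == "__main__":
    for a, b in [("A", "a"), ("E", "É"), ("a", "å")]:
        print(a, b,
              "as_cs:", key_as_cs(a) == key_as_cs(b),
              "as_ci:", key_as_ci(a) == key_as_ci(b),
              "ai_ci:", key_ai_ci(a) == key_ai_ci(b))
```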
  • 10. What is a COLLATION in (My)SQL?
    ● In MySQL, a COLLATION is a set of rules for a given character set which defines an order and affects:
      – ORDER BY
      – LIKE
      – Primary keys and indexes
      – Unique constraints
      – Comparison operators
      – Some string functions
    ● All strings in MySQL have a character set and a collation.
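How a collation changes ORDER BY and equality can be sketched by contrasting a binary order with a case-insensitive key. Here plain `sorted()` stands in for utf8mb4_bin (code point order) and `str.casefold` for a case-insensitive collation; this is an approximation for illustration, not MySQL's algorithm.

```python
words = ["banana", "Cherry", "apple", "Apple"]

# utf8mb4_bin-like: plain code point order. All uppercase ASCII letters
# (0x41-0x5A) sort before all lowercase ones (0x61-0x7A).
binary_order = sorted(words)

# Case-insensitive approximation: compare casefolded strings. Python's sort
# is stable, so 'apple' and 'Apple' keep their input order; a _ci collation
# treats them as equal and promises no particular order between them.
ci_order = sorted(words, key=str.casefold)

# The same idea applies to '=', and therefore to UNIQUE and PRIMARY KEY.
ci_equal = "apple".casefold() == "Apple".casefold()

if __name__ == "__main__":
    print(binary_order)  # ['Apple', 'Cherry', 'apple', 'banana']
    print(ci_order)      # ['apple', 'Apple', 'banana', 'Cherry']
    print(ci_equal)      # True
```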
  • 11. Character sets in MySQL
    +----------+---------------------------------+---------------------+--------+
    | Charset  | Description                     | Default collation   | Maxlen |
    +----------+---------------------------------+---------------------+--------+
    | ascii    | US ASCII                        | ascii_general_ci    |      1 |
    | latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
    | utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |
    | utf8mb4  | UTF-8 Unicode                   | utf8mb4_0900_ai_ci  |      4 |
    +----------+---------------------------------+---------------------+--------+
    Get all of them by typing:
        mysql> show character set;
    The rest of them are: armscii8, big5, binary, cp1250, cp1251, cp1256, cp1257, cp850, cp852, cp866, cp932, dec8, eucjpms, euckr, gb18030, gb2312, gbk, geostd8, greek, hebrew, hp8, keybcs2, koi8r, koi8u, latin2, latin5, latin7, macce, macroman, sjis, swe7, tis620, ucs2, ujis, utf16, utf16le, utf32
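Because utf8 has a Maxlen of 3, it cannot store characters that need 4 bytes in UTF-8, i.e. anything outside the BMP (such as most emojis). A quick check for such characters, assuming the data is available as Python strings, might look like this; the helper names are hypothetical.

```python
def non_bmp_chars(s: str) -> list:
    """Characters above U+FFFF, which need 4 bytes in UTF-8."""
    return [c for c in s if ord(c) > 0xFFFF]


def fits_utf8mb3(s: str) -> bool:
    # utf8 (3-byte UTF-8) allows at most 3 bytes per character, which
    # covers only the Basic Multilingual Plane, U+0000-U+FFFF.
    return not non_bmp_chars(s)


if __name__ == "__main__":
    for text in ["Ärger", "人", "meeting \U0001F574"]:
        print(repr(text), fits_utf8mb3(text), non_bmp_chars(text))
```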
  • 12. What's new in MySQL 8.0
    ● utf8mb4 is the default character set in 8.0
    ● utf8mb4_0900_ai_ci is the default collation of utf8mb4
    ● Many new collations based on Unicode 9.0.0:
      – UCA (Unicode Collation Algorithm)
      – DUCET (Default Unicode Collation Element Table)
      – CLDR v30 (Common Locale Data Repository)
  • 13. MySQL 8.0 collation name scheme
    ● <charset>[_<language>[_<variant>]]_<unicodeversion>(_<attribute>)+
      – <charset> = utf8mb4
      – <language>: an ISO 639-1 language code (or ISO 639-2 if needed)
      – <variant>: a variant of the standard collation for the language. Currently: utf8mb4_de_pb_0900_* and utf8mb4_es_trad_0900_*.
      – <unicodeversion> = 0900
      – <attribute>: accent sensitivity (ai, as), case sensitivity (ci, cs), and possibly others in the future.
    ● Special collations:
      – Default collation (not language specific): utf8mb4_0900_ai_ci
        ● may be used for German dictionary order, English, French¹, Irish Gaelic, Indonesian, Italian, Luxembourgish, Malay, Dutch, Portuguese, Swahili and Zulu
      – Code point order: utf8mb4_bin
    ¹ Canadian French should not use the utf8mb4_0900_as_cs collation, because its accent ordering differs from the standard one.
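The naming scheme is regular enough to parse mechanically. The regular expression below is our own reading of the pattern on this slide: `parse_collation` is hypothetical, and only `utf8mb4` is accepted as the charset. Special names such as utf8mb4_bin do not follow the scheme, so they return None.

```python
import re

# <charset>[_<language>[_<variant>]]_<unicodeversion>(_<attribute>)+
COLLATION_RE = re.compile(
    r"^(?P<charset>utf8mb4)"
    r"(?:_(?P<language>[a-z]{2,3})(?:_(?P<variant>[a-z]+))?)?"
    r"_(?P<version>\d{4})"
    r"(?P<attributes>(?:_[a-z]{2})+)$"
)


def parse_collation(name: str):
    """Split a collation name into its parts, or return None if it doesn't fit."""
    m = COLLATION_RE.match(name)
    if m is None:
        return None
    parts = m.groupdict()  # unmatched optional groups come back as None
    parts["attributes"] = parts["attributes"].lstrip("_").split("_")
    return parts


if __name__ == "__main__":
    for name in ["utf8mb4_0900_ai_ci", "utf8mb4_sv_0900_ai_ci",
                 "utf8mb4_de_pb_0900_ai_ci", "utf8mb4_es_trad_0900_as_cs",
                 "utf8mb4_bin"]:
        print(name, parse_collation(name))
```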
  • 14. Why not ...
    ● ... fix utf8mb4_general_ci instead of introducing utf8mb4_0900_ai_ci, or fix utf8mb4_german2_ci instead of introducing utf8mb4_de_pb_0900_ai_ci?
      – Because that might break existing applications using the old collations (the most serious issue for large databases: indexes would have to be rebuilt). Policy: collations don't change!
    ● ... have a simpler name scheme?
      – Because we are preparing for:
        ● More languages
        ● New Unicode versions (Unicode 10.0.0 is expected in 2018)
      – ISO 639-1/ISO 639-2 language codes are well defined
  • 15. How to migrate?
    ● When migrating from 5.7 tables:
      – Just convert the table:
          ALTER TABLE foo CONVERT TO CHARACTER SET utf8mb4;
        ● This changes the default character set of the table (so that future new columns get utf8mb4) and the character set of all applicable columns.
        ● In principle, all character data in MySQL can be converted to utf8mb4 without loss of data.
    That was easy... is that all there is to it?
  • 16. … not quite … column by column
    ● If you have more complex tables with different character sets:
      – Change the default character set of the table:
          ALTER TABLE foo DEFAULT CHARACTER SET utf8mb4;
      – Modify all relevant columns:
          ALTER TABLE foo MODIFY bar VARCHAR(100) CHARACTER SET utf8mb4;
    Generally we recommend doing it column by column.
      – ALTER TABLE … CONVERT … will e.g. change TEXT to MEDIUMTEXT when you convert from latin1 to utf8mb4 (so that the column can still hold as many characters as before), and that won't necessarily be what you want.
  • 17. … not quite … the schema too
    ● A schema (aka database) in MySQL has a default character set, which becomes the default character set of new tables in the schema:
        mysql> show create schema bar;
        +----------+----------------------------------------------------------------+
        | Database | Create Database                                                |
        +----------+----------------------------------------------------------------+
        | bar      | CREATE DATABASE `bar` /*!40100 DEFAULT CHARACTER SET latin1 */ |
        +----------+----------------------------------------------------------------+
        1 row in set (0.00 sec)
    ● Change the default character set of the schema (database):
        ALTER SCHEMA bar DEFAULT CHARACTER SET utf8mb4;
  • 18. … not quite … collation differences
    Collations differ, so converting from one collation to another may break UNIQUE constraints (e.g. PRIMARY KEY).
    ● Default collation:
      – latin1_swedish_ci vs. utf8mb4_0900_ai_ci: e.g. 'a' = 'å' is false in the first, but true in the second.
      – Possible solution: stick to a Swedish, Danish or Norwegian collation, depending on your application:
          ALTER TABLE foo CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_sv_0900_ai_ci;
      – Generally, if you don't actually need case insensitivity (you just got it by default), utf8mb4_0900_as_cs should be safe.
    ● There is a huge number of possible issues, depending on your data and the collations used, partly because pre-MySQL 8.0 collations were not complete (and in some cases not correct).
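Before converting, it can pay to check whether values that are distinct today would become duplicates under the new collation. The sketch below approximates an accent- and case-insensitive collation with NFD decomposition plus `casefold`; it is a heuristic, not the UCA, so it may miss some collisions or report false ones. `ai_ci_key` and `find_collisions` are hypothetical helpers.

```python
import unicodedata
from collections import defaultdict


def ai_ci_key(s: str) -> str:
    # Rough stand-in for an accent- and case-insensitive collation:
    # decompose, drop combining marks, then casefold.
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold()


def find_collisions(values, key=ai_ci_key) -> dict:
    """Group values that are distinct now but would compare equal under `key`."""
    groups = defaultdict(list)
    for v in values:
        groups[key(v)].append(v)
    return {k: vs for k, vs in groups.items() if len(vs) > 1}


if __name__ == "__main__":
    # Under latin1_swedish_ci these are all distinct (å is its own letter).
    names = ["a", "å", "Anders", "Åsa", "asa"]
    for k, vs in find_collisions(names).items():
        print(f"{k!r}: {vs}")
```

Any group it reports is a candidate for a duplicate-key error when the UNIQUE index is rebuilt.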
  • 19. … not quite … index and key issues
    ● If you change the collation of a column, the indexes on that column will be rebuilt.
      – This takes time for large data, and the table is locked during that time.
      – And the conversion may fail due to changed space consumption.
    ● The maximum key length is 3072 bytes¹, which implies that the maximum length of a utf8mb4 VARCHAR column that is also a key is 768 characters (worst case: 4 bytes per character).
        mysql> create table foo (v varchar(1000) character set latin1 primary key);
        Query OK, 0 rows affected (0.01 sec)
        mysql> alter table foo modify v varchar(1000) character set utf8mb4;
        ERROR 1071 (42000): Specified key was too long; max key length is 3072 bytes
    ¹ For the default InnoDB row format and default innodb_page_size in MySQL 8.0. See the documentation for details.
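The key-length arithmetic can be written out as a small sketch. The 3072-byte limit and the per-character maximums (the Maxlen column) are taken from the slides; the helper names are hypothetical.

```python
# Numbers from the slides: InnoDB's key limit with the default row format
# and page size in MySQL 8.0, and the worst-case bytes per character.
MAX_KEY_BYTES = 3072
MAXLEN = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def max_key_chars(charset: str, max_key_bytes: int = MAX_KEY_BYTES) -> int:
    """Longest VARCHAR(n) usable as a key, assuming worst-case character width."""
    return max_key_bytes // MAXLEN[charset]


def key_fits(varchar_len: int, charset: str,
             max_key_bytes: int = MAX_KEY_BYTES) -> bool:
    # The check uses the worst case (every character at Maxlen bytes),
    # regardless of what the stored data actually contains.
    return varchar_len * MAXLEN[charset] <= max_key_bytes


if __name__ == "__main__":
    print(max_key_chars("utf8mb4"))   # 768
    print(key_fits(1000, "latin1"))   # True: 1000 bytes
    print(key_fits(1000, "utf8mb4"))  # False: 4000 > 3072, cf. ERROR 1071
```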
  • 20. Space consumption
    ● utf8mb4 uses:
      – 1 byte for ASCII characters (0x00-0x7F)
      – 2 bytes for most alphabets/abjads (0x80-0x7FF)
      – 3 bytes for Indic scripts, Hangul, Kana and the most used CJK ideographs (0x800-0xFFFF)
      – 4 bytes for the rest: archaic scripts, emojis, rarely used CJK extensions, etc. (0x10000 and above)
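The ranges on this slide translate directly into a byte-count function. This sketch (hypothetical helper names) cross-checks itself against Python's own UTF-8 encoder.

```python
def utf8_len(cp: int) -> int:
    """Bytes utf8mb4 needs for one code point, per the ranges on the slide."""
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4


def utf8_size(text: str) -> int:
    """Total utf8mb4 storage for a string, in bytes."""
    return sum(utf8_len(ord(c)) for c in text)


if __name__ == "__main__":
    # Cross-check against Python's UTF-8 encoder: Latin, Cyrillic,
    # Devanagari, Hangul, CJK and an emoji.
    for ch in ["A", "Ä", "д", "क", "가", "人", "\U0001F574"]:
        assert utf8_len(ord(ch)) == len(ch.encode("utf-8"))
    print(utf8_size("Ärger"))  # 6
```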
  • 21. Speed issues
    ● Operations on multibyte character sets are inherently slower than on single-byte character sets (e.g. utf8mb4 vs. latin1).
    ● We are working on a lot of code improvements.
      – Expect a performance degradation on the order of 10-20% for sorting when you migrate from e.g. latin1 to utf8mb4, depending on your data, of course.
  • 22. ⚠ 文字化け (Mojibake) … or: what you see is not what you get...
        mysql> create table foo(v varchar(10) character set latin1);
        mysql> insert into foo values('å');
        mysql> set names latin1;
        mysql> insert into foo values('å');
        mysql> set names utf8;
        mysql> select * from foo;
        +------+
        | v    |
        +------+
        | å    |
        | Ã¥   |
        +------+
        2 rows in set (0.00 sec)
        mysql> select hex(v) from foo;
        +--------+
        | hex(v) |
        +--------+
        | E5     |
        | C3A5   |
        +--------+
        2 rows in set (0.00 sec)
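A plausible reading of this session, assuming the connection initially used utf8 and the terminal sends UTF-8: the first 'å' is correctly converted to the latin1 byte E5, but after `set names latin1` its two UTF-8 bytes C3 A5 are taken as two latin1 characters, 'Ã' and '¥', and stored as such. The sketch below reproduces that double encoding and its reversal in Python; `mojibake` and `repair` are hypothetical helpers. MySQL's latin1 is really cp1252; Python's `latin-1` codec is used here because it maps every byte, and the two agree for C3 and A5.

```python
def mojibake(text: str) -> str:
    # The client sends UTF-8 bytes, but the connection claims latin1, so
    # every byte is decoded as a separate latin1 character.
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    # Undo it: re-encode as latin1 to recover the original bytes, then
    # decode them as UTF-8. Raises a UnicodeError if the value was probably
    # not double-encoded in the first place.
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    stored = mojibake("å")
    print(stored)                                  # Ã¥
    print(stored.encode("latin-1").hex().upper())  # C3A5, as hex(v) shows
    print("å".encode("latin-1").hex().upper())     # E5
    print(repair(stored))                          # å
```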
  • 23. Truly usable for global purposes...
  • 24. Q&A 😴 (U+1F634)
  • 25. Safe Harbor Statement
    The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.