
MySQL 5.7 NF – Using the JSON Datatype

  1. MySQL 5.7 JSON datatype (2015.11.29, 정지원)
  2. Index 1. Why JSON 2. About JSON datatype 3. DDL & DML with JSON 4. Indexing JSON data 5. Data performance 6. Use cases 7. ROADMAP
  3. 1. Why JSON • A convenient notation for listing objects • JSON data needs to be handled efficiently • Unifies RDB and schemaless data • Strengthens an existing database's support for new applications • Reference: http://www.w3schools.com/json/
  4. 2. About JSON data type • Supported from MySQL 5.7 • Binary format • Parse and validation on insert only • Dictionary • Sorted objects' keys • Fast access to array cells by index • Supported types • All JSON types are supported • number, string, boolean • object, array • Extended • date, time, datetime, timestamp, etc. Ex1> ["12:18:29.000000", "2015-07-29", "2015-07-29 12:18:29.000000"] Ex2> SELECT JSON_ARRAY('a', 1, NOW()); +----------------------------------------+ | JSON_ARRAY('a', 1, NOW()) | +----------------------------------------+ | ["a", 1, "2015-07-27 09:43:47.000000"] | +----------------------------------------+
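The sorted-keys detail is what makes lookups in the binary format cheap. A toy sketch in plain Python (an illustration of the idea, not MySQL internals):

```python
import bisect
import json

# Mimic the binary JSON layout: object keys are stored sorted, so a key
# can be found with binary search instead of scanning the whole document.
def to_sorted_entries(obj):
    return sorted(obj.items())

def lookup(entries, key):
    keys = [k for k, _ in entries]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return entries[i][1]
    return None  # key absent

doc = json.loads('{"series": 7, "name": "a", "id": 3}')
entries = to_sorted_entries(doc)
```

This is why a path lookup on a JSON column does not require reparsing the text: the stored form already supports direct, sorted access.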
  5. 2. About JSON data type • max_allowed_packet limits the maximum length of a JSON column
  6. 2. About JSON data type • Function List: https://dev.mysql.com/doc/refman/5.7/en/json-functions.html
  7. 3. DDL & DML with JSON • CREATE & INSERT create table t1 ( data JSON -- JSON data type ); insert into t1(data) values ('{"series":1}') ,('{"series":7}') ,('{"series":3}') ,(JSON_QUOTE('some, might be formatted,{text} with "quotes"')) ; select * from t1; +---------------------------------------------------+ | data | +---------------------------------------------------+ | {"series": 1} | | {"series": 7} | | {"series": 3} | | "some, might be formatted,{text} with \"quotes\"" | +---------------------------------------------------+ 4 rows in set (0.00 sec)
  8. 3. DDL & DML with JSON • SELECT select * from t1 where json_extract(data,"$.series") >= 3; +----------------+ | data | +----------------+ | {"series": 3} | | {"series": 7} | +----------------+ select * from t1 where data -> "$.series" >= 3; -- [5.7.9~] inlined json path +----------------------------------+------+ | data | id | +----------------------------------+------+ | {"series": 3} | 7 | | {"series": 7} | 3 | +----------------------------------+------+ select * from t1 where data >= json_object("series",3); +----------------------------------+------+ | data | id | +----------------------------------+------+ | {"series": 3} | 7 | | {"series": 7} | 3 | | {"a": "valid", "json": ["text"]} | NULL | -- ?? +----------------------------------+------+
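For readers new to JSON path syntax, the filter above is easy to mimic outside MySQL. A minimal stand-in for JSON_EXTRACT (a hypothetical helper supporting only `$.key` and `$.key[n]` paths, not the full path grammar):

```python
import json
import re

# Minimal JSON_EXTRACT look-alike: walks '$.key' and '$.key[n]' steps.
def json_extract(doc, path):
    value = json.loads(doc)
    for key, idx in re.findall(r'\.(\w+)(?:\[(\d+)\])?', path):
        value = value[key]
        if idx:                      # optional array subscript
            value = value[int(idx)]
    return value

# Same filter as: select * from t1 where json_extract(data,"$.series") >= 3;
rows = ['{"series": 1}', '{"series": 7}', '{"series": 3}']
hits = [r for r in rows if json_extract(r, "$.series") >= 3]
```

The `->` operator in the slide is shorthand for the same extraction.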
  9. 3. DDL & DML with JSON • UPDATE create table gm_friends ( uid bigint primary key ,friend_uid json -- friend list ); set @friend := '[113]'; -- add a friend insert into gm_friends values (111 , @friend) on duplicate key update friend_uid = json_merge(friend_uid,@friend); select * from gm_friends where uid=111; +-----+------------+ | uid | friend_uid | +-----+------------+ | 111 | [112, 113] | -- user 111's friend list +-----+------------+ 1 row in set (0.00 sec)
  10. 3. DDL & DML with JSON • CTAS create table friend_list as select 100 user_id, 200 friend_id union all select 100 user_id, 300 friend_id union all select 200 user_id, 100 friend_id union all select 200 user_id, 300 friend_id union all select 200 user_id, 400 friend_id; select * from friend_list; +---------+-----------+ | user_id | friend_id | +---------+-----------+ | 100 | 200 | | 100 | 300 | | 200 | 100 | | 200 | 300 | | 200 | 400 | +---------+-----------+ create table t2 as select user_id , json_object('lst' ,json_array(group_concat(friend_id))) as friend_lst from friend_list group by user_id; select * from t2; +---------+--------------------------+ | user_id | friend_lst | +---------+--------------------------+ | 100 | {"lst": ["200,300"]} | | 200 | {"lst": ["100,300,400"]} | +---------+--------------------------+ select JSON_SEARCH(friend_lst, 'all', '200,300') from t2 where user_id = 100; +-------------------------------------------+ | JSON_SEARCH(friend_lst, 'all', '200,300') | +-------------------------------------------+ | "$.lst[0]" | +-------------------------------------------+ select user_id , friend_lst , JSON_EXTRACT(friend_lst, "$.lst") as s1 , JSON_EXTRACT(friend_lst, "$.lst[0]") as s2 , JSON_UNQUOTE(JSON_EXTRACT(friend_lst, "$.lst[0]")) as s3 from t2 where user_id = 100; +---------+----------------------+-------------+-----------+---------+ | user_id | friend_lst | s1 | s2 | s3 | +---------+----------------------+-------------+-----------+---------+ | 100 | {"lst": ["200,300"]} | ["200,300"] | "200,300" | 200,300 | +---------+----------------------+-------------+-----------+---------+
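The difference between s2 and s3 above is worth spelling out: JSON_EXTRACT returns a JSON value, so a string keeps its surrounding quotes, while JSON_UNQUOTE converts it back to a plain SQL string. The same distinction, sketched with Python's json module:

```python
import json

obj = json.loads('{"lst": ["200,300"]}')

# Like JSON_EXTRACT(friend_lst, "$.lst[0]"): result is JSON text,
# so the string keeps its quotes.
extracted = json.dumps(obj["lst"][0])

# Like JSON_UNQUOTE(...): back to a plain string without quotes.
unquoted = json.loads(extracted)
```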
  11. 3. DDL & DML with JSON • JOIN create table t2 ( data JSON ); insert into t2(data) values ('{"series":[11, 1, 100]}') ,('{"series":[22, 7 ]}') ,('{"series":[33, 3, 200]}'); select * from t2; +--------------------------+ | data | +--------------------------+ | {"series": [11, 1, 100]} | | {"series": [22, 7]} | | {"series": [33, 3, 200]} | +--------------------------+ select * from t1, t2 where t1.data -> "$.series" = t2.data -> "$.series[1]"; +---------------+--------------------------+ | data | data | +---------------+--------------------------+ | {"series": 1} | {"series": [11, 1, 100]} | | {"series": 7} | {"series": [22, 7]} | | {"series": 3} | {"series": [33, 3, 200]} | +---------------+--------------------------+
  12. 4. Indexing JSON data JSON columns cannot be indexed. You can work around this restriction by creating an index on a generated column that extracts a scalar value from the JSON column. See Secondary Indexes and Virtual Generated Columns for a detailed example. • Generated Column (=Virtual Column) MySQL supports indexes on generated columns. For example: CREATE TABLE t1 ( f1 INT , gc INT AS (f1 + 1) STORED , INDEX (gc) ); The generated column, gc, is defined as the expression f1 + 1. The column is also indexed, and the optimizer can take that index into account during execution plan construction.
  13. 4. Indexing JSON data • GENERATED COLUMN: VIRTUAL vs STORED • VIRTUAL - the generated value is not actually stored => fast insert/update - recomputed every time a SELECT reads the column - indexes: only secondary indexes can be created; B-tree only - adding the column does not require a table rebuild • STORED - the generated value is physically stored - indexes: both primary and secondary indexes are possible; B-tree, FTS, and GIS are supported - adding the column requires a table rebuild
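The trade-off can be modeled in a few lines. A toy Row class (an illustration, not MySQL code): the STORED value is materialized once at write time, while the VIRTUAL value is recomputed on every read:

```python
import json

class Row:
    """Toy model of STORED vs VIRTUAL generated columns."""
    def __init__(self, data):
        self.data = data
        # STORED: computed once at INSERT, kept in the row
        self.id_stored = json.loads(data)["id"]

    @property
    def id_virtual(self):
        # VIRTUAL: computed again on every SELECT
        return json.loads(self.data)["id"]

r = Row('{"id": 3, "series": 8}')
```

Both expose the same value; they differ only in where the cost is paid (write time and storage for STORED, read time for VIRTUAL), which matches the benchmark numbers later in the deck.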
  14. 4. Indexing JSON data • Creating an index on a GENERATED COLUMN create table `t1` ( `data` json, `id` int(11) AS (JSON_EXTRACT(data,"$.id")) STORED, `id2` int(11) AS (JSON_EXTRACT(data,"$.series")) VIRTUAL ) ENGINE=InnoDB DEFAULT CHARSET=utf8; alter table t1 add primary key (id); create index id_idx on t1(id2); show create table t1\G *************************** 1. row *************************** Table: t1 Create Table: CREATE TABLE `t1` ( `data` json DEFAULT NULL, `id` int(11) GENERATED ALWAYS AS (JSON_EXTRACT(data,"$.id")) STORED NOT NULL, `id2` int(11) GENERATED ALWAYS AS (JSON_EXTRACT(data,"$.series")) VIRTUAL, PRIMARY KEY (`id`), KEY `id_idx` (`id2`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8
  15. 4. Indexing JSON data • Index on a GENERATED COLUMN - execution plans explain select data from t1 where JSON_EXTRACT(data,"$.series") between 3 and 5; +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+ | 1 | SIMPLE | t1 | NULL | ALL | id_idx | NULL | NULL | NULL | 10 | 11.11 | Using where | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+ explain select data from t1 where id between 3 and 5; +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+ | 1 | SIMPLE | t1 | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 3 | 100.00 | Using where | +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+ desc t1; +-------+---------+------+-----+---------+-------------------+ | Field | Type | Null | Key | Default | Extra | +-------+---------+------+-----+---------+-------------------+ | data | json | YES | | NULL | | | id | int(11) | NO | PRI | NULL | STORED GENERATED | | id2 | int(11) | YES | MUL | NULL | VIRTUAL GENERATED | +-------+---------+------+-----+---------+-------------------+ select * from t1; +-------------------------+----+------+ | data | id | id2 | +-------------------------+----+------+ | {"id": 0, "series": 11} | 0 | 11 | | {"id": 1, "series": 10} | 1 | 10 | | {"id": 3, "series": 8} | 3 | 8 | | {"id": 4, "series": 7} | 4 | 7 | +-------------------------+----+------+
  16. 5. Data performance Plain (column) table desc log_col; +----------+---------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +----------+---------------+------+-----+---------+----------------+ | log_idx | bigint(20) | NO | PRI | NULL | auto_increment | | user_id | bigint(20) | NO | MUL | NULL | | | world_id | tinyint(4) | NO | | NULL | | | log_date | datetime | NO | | NULL | | | col1 | bigint(20) | YES | | NULL | | | col2 | bigint(20) | YES | | NULL | | | col3 | bigint(20) | YES | | NULL | | | col4 | bigint(20) | YES | | NULL | | | col5 | bigint(20) | YES | | NULL | | | str1 | varchar(50) | YES | | NULL | | | str2 | varchar(50) | YES | | NULL | | | str3 | varchar(100) | YES | | NULL | | | str4 | varchar(100) | YES | | NULL | | | str5 | varchar(1000) | YES | | NULL | | +----------+---------------+------+-----+---------+----------------+ 14 rows in set (0.04 sec) JSON table desc log_json; +----------+------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +----------+------------+------+-----+---------+----------------+ | log_idx | bigint(20) | NO | PRI | NULL | auto_increment | | user_id | bigint(20) | NO | MUL | NULL | | | world_id | tinyint(4) | NO | | NULL | | | log_date | datetime | NO | | NULL | | | jdata | JSON | YES | | NULL | | +----------+------------+------+-----+---------+----------------+ 5 rows in set (0.00 sec) Table size +--------------+------------------+------------+---------------+ | table_schema | table_name | table_rows | DB Size in MB | +--------------+------------------+------------+---------------+ | test | log_col | 994788 | 111.2 | # plain table | test | log_json | 992943 | 163.3 | # JSON table (40% larger) +--------------+------------------+------------+---------------+
  17. 5. Data performance • INSERT Table / Time: Plain 4 min 6.55 sec; JSON 4 min 14.62 sec • SELECT Table / Time: Plain select count(col1) from log_col where col1 between 3336 and 5990; 0.24 sec; JSON select count(json_extract(jdata,"$.col1")) from log_json where json_extract(jdata,"$.col1") >= 3336 and json_extract(jdata,"$.col1") <= 5990; 2.13 sec • With an index: create index idx01 on log_col(col1); -- 1.07 sec Table / Time: Plain select count(col1) from log_col where col1 between 3336 and 5990; 0.2 sec; JSON: an index cannot be created directly on the JSON column
  18. 5. Data performance STORED table desc log_json_store; +----------+------------+------+-----+---------+------------------+ | Field | Type | Null | Key | Default | Extra | +----------+------------+------+-----+---------+------------------+ | log_idx | bigint(20) | NO | PRI | NULL | auto_increment | | user_id | bigint(20) | NO | MUL | NULL | | | world_id | tinyint(4) | NO | | NULL | | | log_date | datetime | NO | | NULL | | | id | bigint(20) | YES | | NULL | STORED GENERATED | | jdata | json | YES | | NULL | | +----------+------------+------+-----+---------+------------------+ 6 rows in set (0.01 sec) VIRTUAL table desc log_json_virtual; +----------+------------+------+-----+---------+-------------------+ | Field | Type | Null | Key | Default | Extra | +----------+------------+------+-----+---------+-------------------+ | log_idx | bigint(20) | NO | PRI | NULL | auto_increment | | user_id | bigint(20) | NO | MUL | NULL | | | world_id | tinyint(4) | NO | | NULL | | | log_date | datetime | NO | | NULL | | | id | bigint(20) | YES | | NULL | VIRTUAL GENERATED | | jdata | json | YES | | NULL | | +----------+------------+------+-----+---------+-------------------+ 6 rows in set (0.00 sec) Table size +--------------+------------------+------------+---------------+ | table_schema | table_name | table_rows | DB Size in MB | +--------------+------------------+------------+---------------+ | test | log_json | 992943 | 163.3 | | test | log_json_store | 991134 | 197.8 | # STORED table | test | log_json_virtual | 989866 | 168.8 | # VIRTUAL table +--------------+------------------+------------+---------------+
  19. 5. Data performance • INSERT (1 million rows) Table / Time: STORED 4 min 27.99 sec; VIRTUAL 4 min 12.83 sec • SELECT Table / Time: STORED select count(id) from log_json_store where id between 3336 and 5990; 0.21 sec; VIRTUAL select count(id) from log_json_virtual where id between 3336 and 5990; 1.93 sec • With indexes: create index idx01 on log_json_store(id); -- 0.81 sec create index idx01 on log_json_virtual(id); -- 1.38 sec Table / Time: STORED select count(id) from log_json_store where id between 3336 and 5990; 0.0 sec; VIRTUAL select count(id) from log_json_virtual where id between 3336 and 5990; 0.0 sec
  20. 5. Data performance • WHY JSON RATHER THAN TEXT/VARCHAR? • VIRTUAL tables, query: select sum(id) from log_text_stored; Table / Time: JSON STORED 0.54 sec; JSON VIRTUAL 2.43 sec; TEXT STORED 0.66 sec; TEXT VIRTUAL 8.02 sec desc log_text_virtual; and desc log_json_virtual; show identical schemas (log_idx, user_id, world_id, log_date, id VIRTUAL GENERATED, jdata), except that jdata is text in log_text_virtual and json in log_json_virtual. • TEXT/VARCHAR • No positional information is kept for the object keys / array items inside the document => on SELECT, their positions within each row must be searched for again
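The reparsing cost described above is easy to demonstrate outside MySQL. A sketch (illustration only) contrasting parse-per-access, as with a TEXT column, against parse-once, as with the JSON type's binary format:

```python
import json

raw = '{"id": 42, "series": [1, 2, 3]}'

def text_lookup(doc, key):
    # TEXT column: the document must be reparsed on every access
    return json.loads(doc)[key]

parsed = json.loads(raw)   # JSON column: parsed once, at INSERT time

def json_lookup(key):
    # subsequent lookups hit the already-parsed structure directly
    return parsed[key]
```

Both return the same values; the TEXT path simply repeats the parse work on every row and every access, which is where the 8.02 sec vs 2.43 sec gap comes from.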
  21. 6. Use cases Column-based table
  22. 6. Use cases Using the JSON type * Items to exclude from the JSON document: 1) predictable columns 2) columns that may be important for lookups 3) columns that serve as dimensions in analysis "HYBRID TABLE"
  23. 6. Use cases • JSON • Column-based table
  24. 6. Use cases • Provided as a View for query convenience • Guideline: do not store JSON data in nested structures [arrays]
  25. 6. Use cases • JSON SELECT is more than 7x slower ( likely due to disk IO load + JSON internal search overhead ) [charts: Column based vs JSON based]
  26. 6. Use cases • JSON WRITE Speed: within 20-30% slower than the column table ( likely due to disk IO from longer rows ) Size: the JSON-based table takes 30% more space ( per-row object keys + internal object-key index ) [charts: Column based vs JSON based]
  27. 6. Use cases "We need extensibility for adding columns! (minimize downtime)" "Write performance is passable?" "Isn't read performance far too low?" COLUMN? or JSON? Your Choice!!!
  28. 7. ROADMAP • Partial streaming for JSON/BLOB replication • FULL-text / GIS indexes also for GENERATED COLUMN-VIRTUAL • In-place update for JSON/BLOB (so that on update, the affected rows on the same page are not moved and the rowid does not change) • Performance improvements via Condition Pushdown