Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

NoSQL Best Practices for PostgreSQL / Дмитрий Долгов (Mindojo)

332 views

Published on

HighLoad++ 2017

Зал «Сингапур», 7 ноября, 18:00

Тезисы:
http://www.highload.ru/2017/abstracts/2980.html

Каждый специалист в области баз данных уже знаком с Jsonb - одной из самых привлекательный фич PostgreSQL, позволяющей эффективно использовать документо-ориентированный подход без необходимости жертвовать консистентностью и возможностью использования проверенных временем подходов реляционных баз данных. Но как именно устроен этот тип данных, какие он имеет ограничения, и какие опасности (a.k.a грабли) можно незаметно для себя получить при работе с ним?

В докладе мы обсудим все эти вопросы, преимущества и недостатки использования Jsonb в различных ситуациях в сравнении с другими существующими решениями. Поговорим также о важных best practices (как писать компактные
запросы и как избежать распространенных и не очень проблем).

Published in: Engineering
  • Be the first to comment

  • Be the first to like this

NoSQL Best Practices for PostgreSQL / Дмитрий Долгов (Mindojo)

  1. 1. NOSQL FOR POSTGRESQL BEST PRACTICES DMITRY DOLGOV 11-07-2017
  2. 2. 1
  3. 3. 2
  4. 4. Jsonb internals and performance-related factors 2
  5. 5. Jsonb internals and performance-related factors Tricky queries 2
  6. 6. Jsonb internals and performance-related factors Tricky queries Benchmarks 2
  7. 7. Jsonb internals and performance-related factors Tricky queries Benchmarks How to shoot yourself in the foot 2
  8. 8. 3
  9. 9. AWS EC2 m4.large instance separate instance (database and generator) 16GB memory, 4 core 2.3GHz Ubuntu 16.04 Same VPC and placement group AMI that supports HVM virtualization type at least 4 rounds of benchmark 4
  10. 10. PostgreSQL 9.6.3/10 MySQL 5.7.9/8.0 MongoDB 3.4.4 YCSB 0.13 106 rows and operations AWS EC2 5
  11. 11. Configuration shared_buffers effective_cache_size max_wal_size innodb_buffer_pool_size innodb_log_file_size write concern level (journaled or transaction_sync) checkpoint eviction 6
  12. 12. Document types “simple” document 10 key/value pairs (100 characters) “large” document 100 key/value pairs (200 characters) “complex” document 100 keys, 3 nesting levels (100 characters) 7
  13. 13. Performance-related factors 8
  14. 14. Performance-related factors On-disk representation 8
  15. 15. Performance-related factors On-disk representation In-memory representation 8
  16. 16. Performance-related factors On-disk representation In-memory representation Indexing support 8
  17. 17. Indexing support Postgresql – single path, multiple paths, entire doc- ument MongoDB – single path, multiple paths MySQL – virtual columns, single path, multiple paths 9
  18. 18. PG indexing details jsonb_path jsonb_path_ops 10
  19. 19. Select, GIN ”simple” document jsonb_path_ops 11
  20. 20. 12
  21. 21. 13
  22. 22. 14
  23. 23. 15
  24. 24. Select, BTree ”simple” document btree 16
  25. 25. 17
  26. 26. Internals
  27. 27. 19
  28. 28. Jsonb document size node node JEntry content ... 20
  29. 29. Jsonb document size node node JEntry content ... 21
  30. 30. Jsonb Header type number of items JEntry length or offset? value type value length or offset 22
  31. 31. JB_OFFSET_STRIDE JEntry may contains a value lenght or offset Offset = access speed Length = compressibility Every JB_OFFSET_STRIDE’th JEntry contains an offset Rest of them contain length 23
  32. 32. Bson document size node node Header Content ... 24
  33. 33. Bson document size node node Header Content ... 25
  34. 34. Bson Header Value type Key name Value size 26
  35. 35. MySQL Json node node Type Value ... 27
  36. 36. MySQL Json node node Type Value ... 28
  37. 37. MySQL Json Object Count of elements Size Pointers to keys Pointers to values Keys Values 29
  38. 38. Bson Key Value Key Value Key Value ... Jsonb/MySQL Json Key Key Value Key Value Value ... 30
  39. 39. {”a”: 3, ”b”: ”xyz”} 31
  40. 40. select pg_relation_filepath(oid), relpages from pg_class where relname = ’table_name’; pg_relation_filepath | relpages ———————-+———- base/40960/325477 | 0 (1 row) 32
  41. 41. bson.dumps({”a”: 3, ”b”: u”xyz”}) 33
  42. 42. $ hexdump -C database/table.ibd 34
  43. 43. Alignment Variable-length portion is aligned to a 4-byte insert into test values(’{”a”: ”aa”, ”b”: 1}’); insert into test values(’{”a”: 1, ”b”: ”aa”}’); 35
  44. 44. Select, BTree ”complex” document btree 36
  45. 45. 37
  46. 46. TOAST_TUPLE_THRESHOLD ”simple” document 40 threads different document size select 38
  47. 47. 39
  48. 48. TOAST Jsonb Compression Chunks Toast table TOAST_TUPLE_THRESHOLD bytes (normally 2 kB) PostgreSQL and MySQL use LZ variation MongoDB uses snappy block compression 40
  49. 49. In-memory representation Tree-like representation (JsonbValue, Document, Json_dom) Little bit more expensive but more convenient to work with Mostly in use to modify data (except MySQL) Most of the read operations use on-disk representation 41
  50. 50. Insert ”simple” document journaled 42
  51. 51. 43
  52. 52. 44
  53. 53. 45
  54. 54. 46
  55. 55. 47
  56. 56. Update one field of a document DETOAST of a document (select, constraints, procedures etc.) Reindex of an entire document 48
  57. 57. Index update ”simple” document Update one field 49
  58. 58. 50
  59. 59. Update 50%, Select 50% ”simple” document Update one field journaled max wal size 5GB 51
  60. 60. 52
  61. 61. Update 50%, Select 50% ”large” document Update one field 53
  62. 62. 54
  63. 63. Document slice ”large” document One field from a document 55
  64. 64. select data->’key1’->’key2’ from table; select data->’key1’, data->’key2’ from table; 56
  65. 65. select data->‘key1‘->‘key2‘ from table; select data->’key1’, data->’key2’ from table; 56
  66. 66. select data->’key1’->’key2’ from table; select data->‘key1‘, data->‘key2‘ from table; 56
  67. 67. 57
  68. 68. Document slice ”large” document 10 fields from a document 58
  69. 69. 59
  70. 70. Solutions? set storage external different query 60
  71. 71. Queries
  72. 72. Pitfalls No Json path out of the box (jsquery, SQL/JSON) Queries with an array somewhere in the middle Iterating through document Update inside document 62
  73. 73. Document slice create type test as (”a” text, ”b” text); insert into test_jsonb values(’{”a”: 1, ”b”: 2, ”c”: 3}’); select q.* from test_jsonb, jsonb_populate_record(NULL::test, data) as q; a | b —+— 1 | 2 (1 row) 63
  74. 74. Document slice create type test as (”a” text, ”b” text); insert into test_jsonb values(’{”a”: 1, ”b”: 2, ”c”: 3}’); select q.* from test_jsonb, jsonb_populate_record (NULL::test, data) as q; a | b —+— 1 | 2 (1 row) 63
  75. 75. [{ ”items”: [ {”id”: 1, ”value”: ”aaa”}, {”id”: 2, ”value”: ”bbb”} ] }, { ”items”: [ {”id”: 3, ”value”: ”aaa”}, {”id”: 4, ”value”: ”bbb”} ] }] 64
  76. 76. WITH items AS ( SELECT jsonb_array_elements(data->’items’) AS item FROM test ) SELECT * FROM items WHERE item->>’value’ = ’aaa’; item ————————— {”id”: 1, ”value”: ”aaa”} {”id”: 3, ”value”: ”aaa”} (2 rows) 65
  77. 77. WITH items AS ( SELECT jsonb_array_elements (data->’items’) AS item FROM test ) SELECT * FROM items WHERE item->>’value’ = ’aaa’; item ————————— {”id”: 1, ”value”: ”aaa”} {”id”: 3, ”value”: ”aaa”} (2 rows) 65
  78. 78. { ”items”: { ”item1”: {”status”: true}, ”item2”: {”status”: true}, ”item3”: {”status”: false} } } 66
  79. 79. WITH items AS ( SELECT jsonb_each(data->’items’) AS item FROM test ) SELECT (item).key FROM items WHERE (item).value->>’status’ = ’true’; key ——- item1 item2 (2 rows) 67
  80. 80. WITH items AS ( SELECT jsonb_each (data->’items’) AS item FROM test ) SELECT (item).key FROM items WHERE (item).value->>’status’ = ’true’; key ——- item1 item2 (2 rows) 67
  81. 81. 68
  82. 82. SQL vs JSONB ”simple” document btree insert 69
  83. 83. 70
  84. 84. SQL vs JSONB ”simple” document btree select 71
  85. 85. 72
  86. 86. JSON vs JSONB ”simple” document btree insert 73
  87. 87. 74
  88. 88. JSON vs JSONB ”simple” document btree select 75
  89. 89. 76
  90. 90. Jsonb is more that good for many use cases 77
  91. 91. Jsonb is more that good for many use cases Reasons for performance difference is mostly database itself 77
  92. 92. Jsonb is more that good for many use cases Reasons for performance difference is mostly database itself Benchmarks above are only ”hints” 77
  93. 93. Jsonb is more that good for many use cases Reasons for performance difference is mostly database itself Benchmarks above are only ”hints” You need your own tests 77
  94. 94. Questions?  github.com/erthalion  @erthalion  9erthalion6 at gmail dot com 78

×