Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
MySQL Cluster page management
Frazer Clement
MySQL Cluster Technical lead
frazer.clement@oracle.com
messagepassing.blogspot.com
November 2014
Copyright © 2014, Oracle and/or its affiliates. All rights reserved.
Memory types
● MySQL Cluster data nodes allocate all memory at initialisation.
● The bulk of allocated memory is commonly DataMemory (DM) and IndexMemory (IM), which are separate pools for historic reasons.
● Both are managed as pages: 8kB pages for IndexMemory and 32kB pages for DataMemory.
● Confusingly, IndexMemory pages are only used for the built-in primary key hash index of each fragment replica.
● This is literally only a hash table; the keys are stored externally (in DataMemory pages).
● DataMemory pages are used to store primary keys and other columns, as well as ordered index T-tree nodes.
● Occasionally we 'borrow' DM pages for other reasons.
Secondary unique indices use both DM and IM, as they are implemented as tables.
Page based allocation
● Index and Data Memory pages are allocated to fragments as necessary to handle growth due to Inserts and Updates.
● Both are freed back to the shared pools when fragments no longer need them (Deletes).
● Fragments use DM pages for storing either Fixed-size data or Var-sized data.
● In both cases there is a 128-byte per-page header, leaving 32768 - 128 = 32640 bytes usable / page (~0.4% overhead).
● Most storage is handled in terms of 32-bit words, so there are 32640 / 4 = 8160 words usable / page.
● The usable space within Fixed-size and Var-sized pages is handled differently.
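The page arithmetic on this slide can be checked with a few lines of Python. This is only a restatement of the numbers given here (32kB page, 128-byte header, 32-bit words); the constant names are made up for the illustration.

```python
PAGE_BYTES = 32 * 1024   # DataMemory page size (32kB)
HEADER_BYTES = 128       # per-page header
WORD_BYTES = 4           # most storage is handled in 32-bit words

usable_bytes = PAGE_BYTES - HEADER_BYTES        # bytes left for row data
usable_words = usable_bytes // WORD_BYTES       # the same space in words
overhead_pct = 100 * HEADER_BYTES / PAGE_BYTES  # header cost as a percentage

print(usable_bytes, usable_words, round(overhead_pct, 2))
```

The 8160-word figure is the one that matters for the Fixed-size page layout, since element sizes are expressed in words.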
Page management
● Every row has a fixed-size part containing the tuple header and any fixed-size columns.
● For tables with var-sized columns (VARCHAR, VARBINARY, BLOB, TEXT, dynamic columns), every row has a var-sized part containing the var-sized columns.
● Each fragment replica has:
  ● A Logical to Physical page map (L2Pmap), mapping per-fragment logical Fixed-size page ids to physical 32kB pages
  ● A Fixed-size-pages-with-free-space freelist
  ● Five size-binned var-size-pages-with-free-space freelists
● Allocation involves 1) finding/allocating a page to allocate from, 2) finding a space on the page to use.
Pages
[Diagram: a Fragment holding a Fixed freelist, Var freelists 1-4 and an L2Pmap (logical page id → physical page: 0 → 1234, 1 → 1235, 2 → 600, 3 → -, 4 → 983, 5 → 786, ...), pointing at physical pages containing fixed-size parts and physical pages containing var-sized parts.]
Rows are externally located via RowId (Table:Fragment:Page:Index). Every row has a fixed-size part and an optional var-sized part. Fixed-size parts refer to var-sized parts. Page allocation is managed by the fragment, and each page manages its own free space.
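As a rough illustration of how a RowId might be resolved through the logical-to-physical page map, here is a minimal Python sketch. The map values are the ones shown in the diagram; the `RowId` class, the `locate` function and the table/fragment numbers in the usage lines are hypothetical names and values, not the data node's actual structures.

```python
from typing import NamedTuple, Optional, Tuple

class RowId(NamedTuple):
    table: int      # table id (hypothetical value in the example below)
    fragment: int   # fragment id (hypothetical value in the example below)
    page: int       # logical Fixed-size page id within the fragment
    index: int      # position of the fixed-size part within that page

# Logical page id -> physical page, as in the diagram; logical page 3 has no page.
L2P_MAP = {0: 1234, 1: 1235, 2: 600, 3: None, 4: 983, 5: 786}

def locate(row_id: RowId, l2p_map: dict) -> Optional[Tuple[int, int]]:
    """Return (physical_page, index) for a RowId, or None if the logical page is unmapped."""
    physical = l2p_map.get(row_id.page)
    if physical is None:
        return None
    return (physical, row_id.index)

print(locate(RowId(table=1, fragment=0, page=2, index=40), L2P_MAP))
print(locate(RowId(table=1, fragment=0, page=3, index=0), L2P_MAP))
```

Because only the logical page id appears in the RowId, each replica is free to place that logical page on a different physical page.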
Fixed size pages
● For Fixed-size elements, the usable space is treated as storage for an array of the fixed-size elements. Therefore up to element_size - 1 words can be permanently wasted at the end of the page.
● The free elements within a Fixed-size page are linked together into a per-page free list.
● Fixed-size pages with 1 or more free elements are linked together in a per-fragment-replica 'pages with space' list.
● Elements have an index within the page, which is their word offset.
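The Fixed-size page layout described above can be modelled with a small toy class. This is a sketch under stated assumptions: the real free list is threaded through the page itself, whereas here it is a plain Python list, and the class and method names (`FixedPage`, `alloc`, `release`, `has_space`) are invented for the illustration.

```python
USABLE_WORDS = 8160  # words usable per 32kB page after the 128-byte header

class FixedPage:
    """Toy model: an array of fixed-size slots plus a per-page free list."""

    def __init__(self, element_words: int):
        self.element_words = element_words
        self.capacity = USABLE_WORDS // element_words
        # Words left over at the end of the array: at most element_words - 1.
        self.wasted_words = USABLE_WORDS % element_words
        # Free slots, identified by index; an index is the slot's word offset.
        self.free = [i * element_words for i in range(self.capacity)]

    def alloc(self) -> int:
        """Take a free slot and return its index (word offset)."""
        return self.free.pop(0)

    def release(self, index: int) -> None:
        """Return a slot to the page's free list."""
        self.free.append(index)

    def has_space(self) -> bool:
        """True while the page belongs on the fragment's 'pages with space' list."""
        return bool(self.free)

page = FixedPage(element_words=25)   # hypothetical 25-word fixed part
first = page.alloc()
second = page.alloc()
print(page.capacity, page.wasted_words, first, second)
```

With a 25-word element the page holds 326 slots and 10 words at the end can never be used, which is the permanent waste the first bullet refers to.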
Variable size pages
● For Variable-size elements, the usable space in each page is split into an index (at the end of the page) which refers to variable-length parts, which grow up from the start of the page.
● The index can grow and shrink as the number of elements changes.
● New inserts are made from the insert position (append only).
● The last inserted element can grow efficiently.
● The index entries are on a freelist, similar to the Fixed-page slots.
Variable size pages
● If a non-last element wants to grow, or there is not enough space after the insert position for a new element, the page is re-organised automatically.
● Re-organisation compacts the in-use parts together, making the free space contiguous in the 'middle' of the page.
● The index entries stay in the same positions, so external references to a stored var-part are unchanged.
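The append-only insert and the re-organisation step can be sketched with a toy model. This is not the data node's implementation: it assumes one word per index entry, tracks only offsets and lengths rather than real data, and does not model in-place growth of an element. All names (`VarPage`, `insert`, `delete`, `reorganise`) are hypothetical.

```python
class VarPage:
    """Toy var-sized page: parts grow up from the start, the index sits at the end."""

    def __init__(self, usable_words: int = 8160):
        self.usable_words = usable_words
        self.insert_pos = 0   # first word after the last appended part
        self.index = []       # index[i] = [offset, length], or None if the entry is free

    def insert(self, length: int) -> int:
        """Append a part of `length` words and return its index entry number."""
        # Reuse a free index entry if there is one, otherwise grow the index.
        slot = self.index.index(None) if None in self.index else len(self.index)
        index_words = max(len(self.index), slot + 1)  # assumption: one word per entry
        limit = self.usable_words - index_words
        if self.insert_pos + length > limit:
            self.reorganise()                         # make the free space contiguous
        if self.insert_pos + length > limit:
            raise MemoryError("not enough free space in this page")
        if slot == len(self.index):
            self.index.append(None)
        self.index[slot] = [self.insert_pos, length]
        self.insert_pos += length
        return slot

    def delete(self, slot: int) -> None:
        """Free a part; its index entry becomes reusable."""
        self.index[slot] = None

    def reorganise(self) -> None:
        """Compact in-use parts toward the start; entry numbers never change."""
        pos = 0
        live = [entry for entry in self.index if entry is not None]
        for entry in sorted(live, key=lambda e: e[0]):
            entry[0] = pos
            pos += entry[1]
        self.insert_pos = pos

page = VarPage(usable_words=20)      # tiny page so the example is easy to follow
a = page.insert(6)
b = page.insert(6)
c = page.insert(4)
page.delete(a)                       # leaves a hole at the start of the page
d = page.insert(5)                   # does not fit after insert_pos, forces a reorganise
print(b, c, d, page.index)
```

After the reorganise, parts `b` and `c` have moved to new offsets but keep index entries 1 and 2, so anything holding those entry numbers still finds them.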
Row allocation details and constraints
● Goal: efficiency in the common case.
● RowId (Logical page, Fixed-page index) is the same in all replicas of a fragment (on different nodes). This is required for optimised node recovery, where only rows changed since a node failed are copied across.
● Pages can only be freed when they are entirely empty.
● Pages are freed to the global pool (can be used by other tables etc.).
● Var-sized page content is reorganised regularly (within a page), preserving external references via an index.
(De)Fragmentation
Fixed sized pages
● Potential permanent waste at end of fixed-size array
  Not possible to avoid currently, but it can maybe be made use of to store extra data per row for 'free', or a small reduction in (fixed) row length can gain more capacity than expected.
● Unused 'slots' in pages due to rows deleted and no new rows to take the space
  Currently can only be solved by dumping and restoring data. All fragment replicas must change atomically, as the RowId must be the same across them. Feature development is required to implement an online defragmentation here.
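A worked example of why a small reduction in fixed row length can gain more capacity than expected: rows per page is an integer division, so crossing a divisor of 8160 changes the count in a step. The two row sizes below are hypothetical and chosen to sit on either side of such a step.

```python
USABLE_WORDS = 8160  # words usable per Fixed-size page

def rows_per_page(row_words: int) -> int:
    """Number of fixed-size row parts that fit in one page."""
    return USABLE_WORDS // row_words

def wasted_words(row_words: int) -> int:
    """Words left unusable at the end of the fixed-size array."""
    return USABLE_WORDS % row_words

for row_words in (2041, 2040):   # hypothetical fixed-part sizes, in words
    print(row_words, rows_per_page(row_words), wasted_words(row_words))
```

Shrinking the fixed part by one word, from 2041 to 2040, takes the page from 3 rows with 2037 wasted words to 4 rows with none.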
(De)Fragmentation
Var sized pages
● Index with lots of free space
  Some index shrinking applied already. Waste % not high.
● Free space fragmentation within each page
  Handled automatically as needed.
● Fragmentation across pages
  Lots of free var-sized space, but not enough in any one page. OPTIMIZE TABLE solves this. Also solved by: rolling node restart, Backup + Restore etc.
  OPTIMIZE TABLE attempts to move every var-part not in a full var-sized page into a 'better fitting' different page. Goal is to fill some pages and free others.
Var-sized pages can be defragmented online using OPTIMIZE TABLE.
Optimize
[Diagram: a Fragment with its Fixed freelist, Var freelists 1-4 and L2Pmap (0 → 1234, 1 → 1235, 2 → 600, 3 → -, 4 → 983, 5 → 786, ...), with physical pages containing fixed-size parts and physical pages containing var-sized parts.]
BEFORE: Fragmentation of Fixed and Var sized tables.
Optimize
[Diagram: a Fragment with its Fixed freelist, Var freelists 1-4 and L2Pmap (0 → 1234, 1 → 1235, 2 → 600, 3 → -, 4 → 983, 5 → 786, ...), with physical pages containing fixed-size parts and physical pages containing var-sized parts.]
AFTER: Var part moved, filling an existing page, so that page is no longer on a freelist. The source page is now empty, so it is returned to the pool. Var page internal fragmentation is not necessarily affected. Fixed page fragmentation remains.
Monitoring usage
Prior to 7.4:
● ndb_mgm> ALL REPORT MEMORY
  Total IM and DM in use in each data node
● shell> ndb_desc -d <database> <table> -p -n
  Total Fixed and Var-sized DM pages allocated per fragment (Primary only)
● mysql> SELECT AVG_ROW_LENGTH FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME='<my_tab>';
  Ndb currently reports the size of the Fixed part of rows (in bytes) as the AVG_ROW_LENGTH. It can therefore be used to determine the # of rows per page (32640 / AVG_ROW_LENGTH), which can then be used to determine the level of Fixed-size page fragmentation.
Difficult to determine var-sized fragmentation without scanning the whole table using LENGTH() and summing.
Balance information not available.
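The rows-per-page calculation can be turned into a rough fill estimate for Fixed-size pages. This is a sketch only: it assumes you have already obtained AVG_ROW_LENGTH, a row count and the number of allocated Fixed-size pages by the means listed above, and the function names and sample figures are hypothetical.

```python
import math

USABLE_BYTES_PER_PAGE = 32640  # 32kB page minus the 128-byte header

def minimum_pages(avg_row_length: int, row_count: int) -> int:
    """Fewest Fixed-size pages that could hold row_count rows if every slot were used."""
    rows_per_page = USABLE_BYTES_PER_PAGE // avg_row_length
    return math.ceil(row_count / rows_per_page)

def fixed_page_fill(avg_row_length: int, row_count: int, pages_allocated: int) -> float:
    """Fraction of allocated Fixed-size slots that hold a row (1.0 means no free slots)."""
    rows_per_page = USABLE_BYTES_PER_PAGE // avg_row_length
    return row_count / (rows_per_page * pages_allocated)

# Hypothetical figures: 100-byte fixed part, 100000 rows, 400 Fixed-size pages allocated.
print(minimum_pages(100, 100000), round(fixed_page_fill(100, 100000, 400), 3))
```

A fill fraction well below 1.0 on a table that is no longer growing suggests deleted-row slots that nothing is reusing.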
Monitoring usage
From 7.4:
mysql> SELECT * FROM ndbinfo.memory_per_fragment;
● Per-fragment replica DM and IM usage
● Correlated to Node, LDM instance – good for checking balance
● Explicit info on Fixed and Var size free space – triggers for online reorg or other action
● Can use normal SQL to compare across replicas, group by table, group by node or LDM or nodegroup etc.
● Can sample periodically to spot trends, track rates of change etc.
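One way to use a periodic sample is to pull the rows into a script and aggregate them per node. The sketch below works on rows already fetched as dictionaries; the column names (`node_id`, `fixed_elem_alloc_bytes`, `fixed_elem_free_bytes`, `var_elem_alloc_bytes`, `var_elem_free_bytes`) are my understanding of the ndbinfo.memory_per_fragment columns and should be checked against your version, and the sample rows are invented.

```python
from collections import defaultdict

def free_space_by_node(rows):
    """Sum allocated and free Fixed/Var bytes per data node."""
    totals = defaultdict(lambda: {"fixed_alloc": 0, "fixed_free": 0,
                                  "var_alloc": 0, "var_free": 0})
    for row in rows:
        t = totals[row["node_id"]]
        t["fixed_alloc"] += row["fixed_elem_alloc_bytes"]
        t["fixed_free"] += row["fixed_elem_free_bytes"]
        t["var_alloc"] += row["var_elem_alloc_bytes"]
        t["var_free"] += row["var_elem_free_bytes"]
    return dict(totals)

# Invented sample: two fragment replicas on node 1, one on node 2.
sample_rows = [
    {"node_id": 1, "fixed_elem_alloc_bytes": 65536, "fixed_elem_free_bytes": 8192,
     "var_elem_alloc_bytes": 32768, "var_elem_free_bytes": 20000},
    {"node_id": 1, "fixed_elem_alloc_bytes": 32768, "fixed_elem_free_bytes": 1024,
     "var_elem_alloc_bytes": 32768, "var_elem_free_bytes": 4000},
    {"node_id": 2, "fixed_elem_alloc_bytes": 98304, "fixed_elem_free_bytes": 9000,
     "var_elem_alloc_bytes": 65536, "var_elem_free_bytes": 25000},
]

by_node = free_space_by_node(sample_rows)
for node_id, totals in sorted(by_node.items()):
    var_free_pct = 100 * totals["var_free"] / totals["var_alloc"]
    print(node_id, totals, round(var_free_pct, 1))
```

The same grouping could be done in SQL with GROUP BY; doing it client-side is just one option when samples are being stored and compared over time.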
Monitoring usage
Notes on NdbInfo tables
● Implemented in data nodes, analogous to the Linux /proc/ filesystem – contents are generated for each view.
● Currently no indexing / filtering. Any query will retrieve the full content of the table (full table scan). We hope to improve this in future.
● All existing tables are relatively lightweight in terms of CPU cost to build and send content.
● But they are not cached in mysqld, and they are not 'free'. Beware sampling at a high frequency.
