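9. Architecture
use mysql;
show tables like 'spider%';
spider_link_failed_log : table used for HA; records a log entry when a node configured in Spider raises an error
spider_link_mon_servers : table used for HA; manages the accounts of the servers that monitor Spider HA
spider_tables : lists the tables configured with the Spider engine
spider_xa : manages distributed transactions across different nodes
spider_xa_member : stores information about the servers that remotely manage distributed transactions across different nodes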
10. Agenda
What is Spider Engine?
Architecture
How to Create Spider Engine
Limitation of a Spider Engine
Guideline
Q&A
11. How to Create Spider Engine
■ Step by Step
1. Prepare the servers
※ Prepare five DB servers
① Spider node (MariaDB 10.0.9)
② Data nodes
⑴ MySQL 5.6.24
⑵ MySQL 5.5.30
⑶ MySQL 5.7.9
⑷ MariaDB 10.0.4
(Diagram: Spider node, Node 1, Node 2, Node 3, Node 4)
12. How to Create Spider Engine
■ Step by Step
2-1. Download the Spider engine installation files from the link below and install them
https://mariadb.com/kb/en/mariadb/spider-storage-engine-overview/
2-2. Apply the install script required to use the Spider engine
mysql -uroot -p < ../mariadb/share/install_spider.sql
2-3. Verify that the Spider engine installation completed
SELECT engine, support, transactions, xa
FROM information_schema.engines;
+--------------------+---------+--------------+------+
| engine             | support | transactions | xa   |
+--------------------+---------+--------------+------+
| SPIDER             | YES     | YES          | YES  |
| CSV                | YES     | NO           | NO   |
| MyISAM             | YES     | NO           | NO   |
| BLACKHOLE          | YES     | NO           | NO   |
| FEDERATED          | YES     | YES          | NO   |
| MRG_MyISAM         | YES     | NO           | NO   |
| ARCHIVE            | YES     | NO           | NO   |
| MEMORY             | YES     | NO           | NO   |
| PERFORMANCE_SCHEMA | YES     | NO           | NO   |
| Aria               | YES     | NO           | NO   |
| InnoDB             | DEFAULT | YES          | YES  |
+--------------------+---------+--------------+------+
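The check in step 2-3 can also be scripted. The following is a minimal sketch in Python, assuming the query result has been captured as text in the ASCII table layout shown above. The helper names `parse_engines` and `spider_installed` and the shortened `SAMPLE` table are illustrative assumptions, not part of the original deck.

```python
def parse_engines(table_text):
    """Parse an ASCII result table into a list of dicts keyed by lower-cased column name."""
    rows = []
    header = None
    for line in table_text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = [cell.lower() for cell in cells]
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def spider_installed(rows):
    """Return True if a SPIDER row exists and its support column is YES or DEFAULT."""
    for row in rows:
        if row["engine"].upper() == "SPIDER":
            return row["support"] in ("YES", "DEFAULT")
    return False


# Shortened sample in the same layout as the slide output (assumption: captured as text).
SAMPLE = """
+--------+---------+--------------+------+
| engine | support | transactions | xa   |
+--------+---------+--------------+------+
| SPIDER | YES     | YES          | YES  |
| InnoDB | DEFAULT | YES          | YES  |
+--------+---------+--------------+------+
"""

print(spider_installed(parse_engines(SAMPLE)))
```

Because the header is lower-cased, the same parser should also accept `show storage engines` output, where the column is spelled `Engine`.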
13. How to Create Spider Engine
■ Step by Step II
1. Prepare the servers
Spider Engine : 192.168.124.134
Node 1 (MySQL 5.7.9) : 192.168.124.137
Node 2 (MySQL 5.6.24) : 192.168.124.138
Node 3 (MySQL 5.5.30) : 192.168.124.139
Node 4 (MariaDB 10.0.4) : 192.168.124.140
Local server : Spider table, *.frm and *.par files
Remote server : Spider table, *.frm file, data (*.ibd)
Schema info (O), Partition info (O), Index data (X)
14. How to Create Spider Engine
■ Step by Step II
1. Register the remote servers
shard_node1 : 192.168.124.137
shard_node2 : 192.168.124.138
shard_node3 : 192.168.124.139
shard_node4 : 192.168.124.140
CREATE SERVER shard_node1 FOREIGN DATA WRAPPER mysql
OPTIONS(
  HOST '192.168.124.137',
  DATABASE 'log_db',
  USER 'root',
  PASSWORD '123qwe!@#',
  PORT 3306
);
...
CREATE SERVER shard_node4 FOREIGN DATA WRAPPER mysql
OPTIONS(
  HOST '192.168.124.140',
  DATABASE 'log_db',
  USER 'root',
  PASSWORD '123qwe!@#',
  PORT 3306
);
☞ Reference
https://dev.mysql.com/doc/refman/5.7/en/create-server.html
(Diagram: Spider Engine, Spider node, Node 1, Node 2, Node 3, Node 4)
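The four registrations differ only in server name and host, so the statements can be rendered from the name-to-address list on this slide. The following is a minimal sketch in Python. The helper `create_server_sql` and the `CHANGE_ME` password placeholder are assumptions for illustration, and the sketch does no escaping of the values it interpolates.

```python
# Name-to-host mapping as listed on the slide.
SHARD_HOSTS = {
    "shard_node1": "192.168.124.137",
    "shard_node2": "192.168.124.138",
    "shard_node3": "192.168.124.139",
    "shard_node4": "192.168.124.140",
}

# Statement layout copied from the slide's CREATE SERVER example.
TEMPLATE = (
    "CREATE SERVER {name} FOREIGN DATA WRAPPER mysql\n"
    "OPTIONS(\n"
    "  HOST '{host}',\n"
    "  DATABASE '{database}',\n"
    "  USER '{user}',\n"
    "  PASSWORD '{password}',\n"
    "  PORT {port}\n"
    ");"
)


def create_server_sql(name, host, database="log_db", user="root",
                      password="CHANGE_ME", port=3306):
    """Render one CREATE SERVER statement (hypothetical helper, values are not escaped)."""
    return TEMPLATE.format(name=name, host=host, database=database,
                           user=user, password=password, port=port)


# Print one statement per data node, ready to paste into the Spider node's client.
for shard_name, shard_host in SHARD_HOSTS.items():
    print(create_server_sql(shard_name, shard_host))
    print()
```

Generating the statements from one mapping keeps the server names in step with the `srv "shard_nodeN"` partition comments used when the Spider table is created.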
15. How to Create Spider Engine
■ Step by Step II
2. Create the shard table
1) In the COMMENT clauses, specify the name of the actual table that will hold the data and the remote server that stores each partition, chosen by the partition key value
CREATE TABLE log_db.shardTest
(
  id int unsigned NOT NULL AUTO_INCREMENT
, name char(120) NOT NULL DEFAULT ''
, PRIMARY KEY (id)
)
ENGINE=spider
COMMENT='wrapper "mysql", table "shardTest"'
PARTITION BY KEY (id)
(
  PARTITION shard1 COMMENT = 'srv "shard_node1"'
, PARTITION shard2 COMMENT = 'srv "shard_node2"'
, PARTITION shard3 COMMENT = 'srv "shard_node3"'
, PARTITION shard4 COMMENT = 'srv "shard_node4"'
) ;
(Diagram: Spider Engine, Spider node, Node 1, Node 2, Node 3, Node 4)
16. How to Create Spider Engine
■ Step by Step II
3. Create the data table on each node
Node 1, Node 2, Node 3, Node 4:
CREATE TABLE log_db.shardTest
(
  id int unsigned NOT NULL AUTO_INCREMENT
, name char(120) NOT NULL DEFAULT ''
, PRIMARY KEY (id)
)
ENGINE=innodb;
(Diagram: Spider node, Node 1, Node 2, Node 3, Node 4)
17. How to Create Spider Engine
■ Step by Step II
4. Insert and verify data on the Spider node
insert into log_db.shardTest(name) values ('spider_01');
insert into log_db.shardTest(name) values ('spider_01');
insert into log_db.shardTest(name) values ('spider_01');
insert into log_db.shardTest(name) values ('spider_01');
select * from shardtest;
+----+-----------+
| id | name      |
+----+-----------+
|  1 | spider_01 |
|  4 | spider_01 |
|  3 | spider_01 |
|  2 | spider_01 |
+----+-----------+
(Diagram: Spider Engine, Spider node, Node 1, Node 2, Node 3, Node 4)
18. How to Create Spider Engine
■ Step by Step II
5. Verify the data on each data node
1) Check that the data was distributed properly
Node 1, Node 2, Node 3, Node 4 (one result per node):
select * from shardtest;
+----+-----------+
| id | name      |
+----+-----------+
|  1 | spider_01 |
+----+-----------+
select * from shardtest;
+----+-----------+
| id | name      |
+----+-----------+
|  3 | spider_01 |
+----+-----------+
select * from shardtest;
+----+-----------+
| id | name      |
+----+-----------+
|  4 | spider_01 |
+----+-----------+
select * from shardtest;
+----+-----------+
| id | name      |
+----+-----------+
|  2 | spider_01 |
+----+-----------+
(Diagram: Spider node, Node 1, Node 2, Node 3, Node 4)
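The distribution check in step 5 amounts to confirming that every row seen through the Spider table lives on exactly one data node. The following is a minimal sketch in Python, assuming the id columns have already been fetched from the Spider node and from each data node. The function `check_distribution` is a hypothetical helper, and the id lists reuse the values shown on the slides in the order they appear, without assigning them to specific nodes.

```python
from collections import Counter


def check_distribution(spider_ids, per_node_ids):
    """Compare ids seen through the Spider table with ids stored on each data node."""
    counts = Counter(row_id for ids in per_node_ids for row_id in ids)
    return {
        # ids stored on more than one node
        "duplicated": sorted(i for i, n in counts.items() if n > 1),
        # ids the Spider table returned but no node reported
        "missing": sorted(set(spider_ids) - set(counts)),
        # ids a node reported but the Spider table did not return
        "unexpected": sorted(set(counts) - set(spider_ids)),
        "rows_per_node": [len(ids) for ids in per_node_ids],
    }


# ids from the Spider node query, then one list per data node, as shown on the slides.
report = check_distribution([1, 4, 3, 2], [[1], [3], [4], [2]])
print(report)
```

An even split such as one row per node is what the slide shows for four inserts. With larger data sets the per-node counts will usually differ somewhat, so the three consistency lists are the more meaningful part of the report.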
19. How to Create Spider Engine
■ Step by Step II
0. Set up FEDERATED tables for testing
CREATE TABLE shardTest_node1 (id int unsigned NOT NULL AUTO_INCREMENT, name char(120) NOT NULL DEFAULT '', PRIMARY KEY (id))
ENGINE=FEDERATED CONNECTION='shard_node1/shardTest';
CREATE TABLE shardTest_node2 (id int unsigned NOT NULL AUTO_INCREMENT, name char(120) NOT NULL DEFAULT '', PRIMARY KEY (id))
ENGINE=FEDERATED CONNECTION='shard_node2/shardTest';
CREATE TABLE shardTest_node3 (id int unsigned NOT NULL AUTO_INCREMENT, name char(120) NOT NULL DEFAULT '', PRIMARY KEY (id))
ENGINE=FEDERATED CONNECTION='shard_node3/shardTest';
CREATE TABLE shardTest_node4 (id int unsigned NOT NULL AUTO_INCREMENT, name char(120) NOT NULL DEFAULT '', PRIMARY KEY (id))
ENGINE=FEDERATED CONNECTION='shard_node4/shardTest';
(Diagram: Spider Engine, Spider node, Node 1, Node 2, Node 3, Node 4)
20. [localhost] ((none)) 06:31> show storage engines;
+--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+
| Engine | Support | Comment | Transactions | XA | Savepoints |
+--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+
| SPIDER | YES | Spider storage engine | YES | YES | NO |
| MRG_MyISAM | YES | Collection of identical MyISAM tables | NO | NO | NO |
| MEMORY | YES | Hash based, stored in memory, useful for temporary tables | NO | NO | NO |
| BLACKHOLE | YES | /dev/null storage engine (anything you write to it disappears) | NO | NO | NO |
| MyISAM | YES | MyISAM storage engine | NO | NO | NO |
| InnoDB | DEFAULT | Percona-XtraDB, Supports transactions, row-level locking, and foreign keys | YES | YES | YES |
| ARCHIVE | YES | Archive storage engine | NO | NO | NO |
| FEDERATED | NO | FederatedX pluggable storage engine | NULL | NULL | NULL |
| PERFORMANCE_SCHEMA | YES | Performance Schema | NO | NO | NO |
| Aria | YES | Crash-safe tables with MyISAM heritage | NO | NO | NO |
| CSV | YES | CSV storage engine | NO | NO | NO |
+--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+
CREATE TABLE log_db.shardTest
(
id int(10) unsigned NOT NULL AUTO_INCREMENT
, name char(120) NOT NULL DEFAULT ''
, PRIMARY KEY (id)
)
ENGINE=spider
COMMENT='wrapper "mysql", table "shardTest"'
PARTITION BY KEY (id)
(
PARTITION shard1 COMMENT = 'srv "shard_node1"'
, PARTITION shard2 COMMENT = 'srv "shard_node2"'
, PARTITION shard3 COMMENT = 'srv "shard_node3"'
, PARTITION shard4 COMMENT = 'srv "shard_node4"'
) ;
SELECT * FROM log_db.shardtest;
+----+-----------+
| id | name |
+----+-----------+
| 1 | SPIDER_01 |
| 5 | key1 |
| 4 | SPIDER_01 |
…
| 6 | key1 |
+----+-----------+
(Remote data retrieved successfully)
Conclusion:
No dependency between the
FEDERATED and SPIDER engines!
How to Create Spider Engine
Q. Could the Spider Engine depend on the Federated Engine? (FEDERATED storage engine OFF?)
21. How to Create Spider Engine
■ spider_internal_sql_log_off
Queries that the Spider engine sends to the sharded servers
are recorded in the general log.
Statements
(show table status from `log_db` like 'v_shardtest_group')
(show index from `log_db`.`v_shardtest_group`)
■ spider_remote_sql_log_off
Sets sql_log_off on the sharded servers.
Statement
(set session sql_log_off = 1; )
22. What is Spider Engine?
Architecture
How to Create Spider Engine
Limitation of a Spider Engine
Guideline
Q&A
Agenda
23. Spider Engine
insert into shardtest(name) values ('spider_01');
Node 1
root@192.168.124.137 on using TCP/IP
set session transaction isolation level read committed;
set session autocommit = 1;
start transaction
SET NAMES utf8
log_db
select `id`,`name` from `log_db`.`shardTest` order by `id` desc
limit 1 for update
insert into `log_db`.`shardTest`(`id`,`name`)values(5,'spider_01')
commit
Node 2 Node 3 Node 4
select `id`,`name` from `log_db`.`shardTest` order by `id` desc
limit 1 for update
UPDATE Lock?
UPDATE Lock?
Limitation of a Spider Engine
1. INSERT (Auto Increment value generation)
24. Spider Engine Node 1 Node 2 Node 3 Node 4
Limitation of a Spider Engine
1. INSERT (Auto Increment value generation)
Auto_increment = 1
insert into shardtest(name) values ('spider_01');
select `id`,`name` from `log_db`.`shardTest` order by
`id` desc limit 1 for update
Check the max Auto_increment on each node
Auto_increment = 2
Select a node
insert into shardtest(id,name) values (1,'spider_01');
insert into shardtest(id,name) values
(1,'spider_01');
Auto_increment = 2
insert into shardtest(name) values ('spider_02');
Node 1
select `id`,`name` from `log_db`.`shardTest` order by
`id` desc limit 1 for update
Check the max Auto_increment on each node
Auto_increment = 3
Select a node
insert into shardtest(id,name) values (2,'spider_01');
Node 3
insert into shardtest(id,name) values
(2,'spider_01');
Node 1 Node 2 Node 3 Node 4
25. Limitation of a Spider Engine
■ spider_auto_increment_mode
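-1 : Use the table parameter.
0 : Normal mode
The auto-increment value uses a counter obtained from the remote servers.
1 : Quick mode
The auto-increment value uses the Spider node's internal counter.
2 : Set Zero Value
The auto-increment value uses the data node's internal counter.
3 : When NULL is inserted, the auto-increment value is generated on the remote server;
when 0 is inserted, it is generated on the local server.
1. INSERT (Auto Increment value generation)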
☞ Reference
https://mariadb.com/kb/en/mariadb/spider-server-system-variables/
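As a rough illustration of the difference between normal mode (0) and quick mode (1) listed for spider_auto_increment_mode, the Python sketch below simulates both. The DataNode and SpiderNode classes, their methods, and the modulo routing are hypothetical stand-ins chosen for this example, not Spider APIs, and the real engine handles more cases than shown here.

```python
class DataNode:
    """Hypothetical stand-in for one shard; stores rows keyed by id."""

    def __init__(self):
        self.rows = {}

    def max_id(self):
        # Models: select `id` ... order by `id` desc limit 1 for update
        return max(self.rows, default=0)


class SpiderNode:
    """Hypothetical coordinator that hands out auto-increment ids."""

    def __init__(self, nodes, mode):
        self.nodes = nodes
        self.mode = mode          # 0 = normal, 1 = quick
        self.counter = 0          # internal counter used in quick mode
        self.remote_queries = 0   # number of max-id probes sent to shards

    def next_id(self):
        if self.mode == 0:
            # Normal mode: ask every data node for its current maximum.
            self.remote_queries += len(self.nodes)
            return max(n.max_id() for n in self.nodes) + 1
        # Quick mode: use the coordinator's own counter, no remote probe.
        self.counter += 1
        return self.counter

    def insert(self, name):
        new_id = self.next_id()
        # Assumption: simple modulo routing, for illustration only.
        self.nodes[new_id % len(self.nodes)].rows[new_id] = name
        return new_id
```

In this model every insert in normal mode costs one probe per data node, while quick mode sends none, which mirrors the general-log difference shown on the surrounding slides.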
26. Node 1 Node 2 Node 3 Node 4
Limitation of a Spider Engine
Spider Engine
set spider_auto_increment_mode = 1;
insert into shardtest(name) values ('spider_01');
root@192.168.124.137 on using TCP/IP
set session transaction isolation level read committed;
set session autocommit = 1;
start transaction
SET NAMES utf8
log_db
insert into `log_db`.`shardTest`(`id`,`name`)values(6,'spider_01')
commit
1. INSERT (Auto Increment value generation)
28. Limitation of a Spider Engine
2) Problem with spider_auto_increment_mode = 0
1. INSERT (Auto Increment value generation)
No response
Spider Engine
?
29. Limitation of a Spider Engine
2) Problem with spider_auto_increment_mode = 0
1. INSERT (Auto Increment value generation)
Node 1 Node 2 Node 3 Node 4
No commit
30. Limitation of a Spider Engine
2) Problem with spider_auto_increment_mode = 0
1. INSERT (Auto Increment value generation)
Node 1 Node 2 Node 3 Node 4
general_log Commit
( X )
31. Spider Engine
Node 1
update shardtest set id = 9 where id = 8;
(select 0,`id`,`name` from `log_db`.`shardTest` where `id` = 8 for update)
delete from `log_db`.`shardTest` where `id` = 8 and `name` = 'spider_01' limit 1
insert high_priority into `log_db`.`shardTest`(`id`,`name`)values(9,'spider_01')
Data Move
Data Move
Node 1 Node 4
Data move
CREATE TABLE `shardtest` (
...
) ENGINE=SPIDER AUTO_INCREMENT=10 DEFAULT CHARSET=utf8 COMMENT='wrapper
"mysql", table "shardTest" '
Limitation of a Spider Engine
2. Update (Shard Key Update)
Change
Auto
increment
Node 4
32. Spider Engine
update shardtest set id = 9 where id = 8;
(select 0,`id`,`name` from `log_db`.`shardTest` where `id` = 8 for update)
delete from `log_db`.`shardTest` where `id` = 8 and `name` = 'spider_01' limit 1
insert high_priority into `log_db`.`shardTest`(`id`,`name`)values(9,'spider_01')
update shardtest set id = 17 where id = 9; (select 0,`id`,`name` from `log_db`.`shardTest` where `id` = 9 for update)
update `shardTest` set `id` = 17 where `id` = 9 and `name` = 'spider_01' limit 1
Node 1 Spider Engine
CREATE TABLE `shardtest` (
...
) ENGINE=SPIDER AUTO_INCREMENT=18 DEFAULT CHARSET=utf8 COMMENT='wrapper
"mysql", table "shardTest" '
Node 1 Data move
CREATE TABLE `shardtest` (
...
) ENGINE=SPIDER AUTO_INCREMENT=10 DEFAULT CHARSET=utf8 COMMENT='wrapper
"mysql", table "shardTest" '
Change
Auto
increment
Limitation of a Spider Engine
2. Update (Shard Key Update)
Node 1 Node 4
Data move
Change
Auto
increment
Node 1
Node 4
Data Move
Data Move
33. Node 1
update shardtest set name = 'spider_02' where id = 1;
update shardtest set name = 'spider_03' where name = 'spider_01';
Spider Engine
(select 0,`id`,`name` from `log_db`.`shardTest` where `id` = 1 for update)
update `log_db`.`shardTest` set `name` = 'spider_02' where `id` = 1 limit 1
update `log_db`.`shardTest` set `name` = 'spider_03' where `id` = 2 and
`name` = 'spider_01' limit 1
(select 0,`id`,`name` from `log_db`.`shardTest` where `name` = 'spider_01'
for update)
Node 2 Node 3 Node 4
Node 2 Node 3 Node 4
Shard key (X)
Node 1
Limitation of a Spider Engine
2. Update (Update of a non-shard-key column)
Spider Engine
Node 3
34. update shardtest set name = 'spider_01' where name = 'spider_02';
Spider Engine
update `log_db`.`shardTest` set `name` = 'spider_01' where `id` = 1 and
`name` = 'spider_02' limit 1
update `log_db`.`shardTest` set `name` = 'spider_01' where `id` = 13 and
`name` = 'spider_02' limit 1
(select 0,`id`,`name` from `log_db`.`shardTest` where `name` = 'spider_02'
for update)
Shard key (X)
Single-row update
Limitation of a Spider Engine
2. Update (Update of a non-shard-key column)
update shardtest set name = 'spider_03' where name = 'spider_01';
Node 1
Spider Engine
update `log_db`.`shardTest` set `name` = 'spider_03' where `id` = 1 and
`name` = 'spider_01' limit 1
(select 0,`id`,`name` from `log_db`.`shardTest` where `name` = 'spider_01'
for update)
Node 2 Node 3 Node 4
Shard key (X)
Node 1
Multi-row update
Node 1
Node 1 Node 2 Node 3 Node 4
Processed one row at a time
⇒ binlog size may surge in production!
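A minimal sketch of the pattern visible in the general log for a non-shard-key update: a locking select fetches the matching rows, then one update statement is issued per row. The function name and the (id, name) row shape are assumptions made for this example; the statement text simply imitates the log excerpts.

```python
def plan_non_key_update(rows, old_name, new_name):
    """Return the statements a coordinator would send, one update per row.

    rows: list of (id, name) tuples held by a single data node (assumed shape).
    """
    statements = [
        "select 0,`id`,`name` from `log_db`.`shardTest` "
        f"where `name` = '{old_name}' for update"
    ]
    for row_id, name in rows:
        if name == old_name:
            # One statement per matching row, keyed by primary key.
            statements.append(
                f"update `log_db`.`shardTest` set `name` = '{new_name}' "
                f"where `id` = {row_id} and `name` = '{old_name}' limit 1"
            )
    return statements
```

The statement count grows linearly with the number of matched rows, which is the reason the slide warns about binlog growth.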
36. Alter table shardTest ADD COLUMN TEST_Col1 int;
Alter table shardTest DROP COLUMN TEST_Col1;
CREATE INDEX idx_shardtest ON shardtest(name);
DROP INDEX idx_shardtest ON shardtest;
Shard schema unchanged
Limitation of a Spider Engine
4. Alter
Spider Engine Node 1 Node 2 Node 3 Node 4
37. Shard schema unchanged
Alter table shardtest ADD COLUMN (add_col1 int);
SELECT * FROM shardtest;
(ERROR 1054 (42S22): Unknown column 'add_col1' in 'field list')
SELECT id, name FROM shardtest;
+----+-----------+
| id | name |
+----+-----------+
| 1 | SPIDER_01 |
…
| 2 | SPIDER_01 |
+----+-----------+
4 rows in set (0.26 sec)
select `id`,`name` from `log_db`.`shardTest` order by `id`,`name`
General_log Query
Limitation of a Spider Engine
4. Alter
Spider Engine Node 1 Node 2 Node 3 Node 4
38. Shard schema unchanged
Alter table shardtest ADD COLUMN (add_col1 int);
Limitation of a Spider Engine
4. Alter
Spider Engine Node 1 Node 2 Node 3 Node 4
INSERT INTO shardtest(name) VALUES('key2');
Query OK, 1 row affected (0.45 sec)
select id, name from shardtest;
+----+-----------+
| id | name |
+----+-----------+
| 1 | SPIDER_01 |
| 2 | SPIDER_01 |
| … |
| 5 | key2 |
+----+-----------+
UPDATE shardtest
set name = 'key3'
where id = 5;
ERROR 1054 (42S22): Unknown column 'add_col1' in 'field list'
insert into `log_db`.`shardTest`(`id`,`name`) values (5,'key2')
(select 0,`id`,`name`,`add_col1` from `shardTest` where `id` = 5 for update)
rollback
General_log Query
39. Alter table shardtest DROP COLUMN add_col1;
Limitation of a Spider Engine
4. Alter
Spider Engine Node 1 Node 2 Node 3 Node 4
General_log Query
select * from shardtest;
+----+-----------+
| id | name |
+----+-----------+
| 1 | SPIDER_01 |
| 2 | SPIDER_01 |
| 3 | SPIDER_01 |
| 4 | SPIDER_01 |
| 5 | key1 |
| 6 | key1 |
| 7 | SPIDER_01 |
| 8 | SPIDER_01 |
+----+-----------+
Select OK
Alter table shardtest ADD COLUMN (add_col1 int);
(add the column on each data node)
select `id`,`name` from `log_db`.`shardTest` order by `id`,`name`
※ Mind the order of schema changes
1) Modify the data nodes
2) Modify the Spider node
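The ordering rule can be checked with a small model: the Spider node requests every column it knows about, so a remote query only succeeds if the data nodes already have those columns. The helper names below are hypothetical and the model ignores everything except column lists.

```python
def remote_select_ok(spider_columns, data_node_columns):
    """True if every column the Spider node would request exists remotely."""
    return set(spider_columns) <= set(data_node_columns)


def apply_in_order(order):
    """Add `add_col1` to each side in the given order; record query health."""
    spider = ["id", "name"]
    data = ["id", "name"]
    results = []
    for target in order:
        (spider if target == "spider" else data).append("add_col1")
        results.append(remote_select_ok(spider, data))
    return results
```

With this model, apply_in_order(["data", "spider"]) never passes through a failing state, while apply_in_order(["spider", "data"]) fails after its first step, matching the "Unknown column 'add_col1'" error shown earlier.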
40. select `id`,`name` from `log_db`.`shardTest`
select * from shardtest;
+----+-----------+
| id | name |
+----+-----------+
| 1 | spider_01 | -- shard1
| 4 | spider_01 | -- shard2
| 3 | spider_01 | -- shard3
| 2 | spider_01 | -- shard4
+----+-----------+
select `id`,`name` from `log_db`.`shardTest`
select * from shardtest order by id;
+----+-----------+
| id | name |
+----+-----------+
| 1 | spider_01 | -- shard1
| 2 | spider_01 | -- shard4
| 3 | spider_01 | -- shard3
| 4 | spider_01 | -- shard2
+----+-----------+
Limitation of a Spider Engine
5. SELECT (order by)
Spider Engine Node 1 Node 2 Node 3 Node 4
Not in PK order ( X )
Sort occurs
on the Spider node
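A small sketch of why the sort lands on the Spider node: each shard returns its own rows, the coordinator concatenates them in shard order, and any global ordering has to be applied afterwards. The shard contents reuse the four-row result shown on this slide; the collect function is a hypothetical model, not Spider code.

```python
shard_results = [
    [(1, "spider_01")],  # shard1
    [(4, "spider_01")],  # shard2
    [(3, "spider_01")],  # shard3
    [(2, "spider_01")],  # shard4
]


def collect(shards, order_by_id=False):
    """Concatenate shard results; optionally sort on the coordinator."""
    rows = [row for shard in shards for row in shard]
    if order_by_id:
        rows.sort(key=lambda r: r[0])  # sort performed on the coordinator
    return rows
```

Without order_by_id the ids come back as 1, 4, 3, 2, and with it as 1, 2, 3, 4, matching the two result sets on the slide.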
41. Performance test on the Spider node
by "sort_buffer_size" setting
Test with about 1 million rows
+-----+--------------+-------------+
| int | varchar(120) | varchar(10) |
+-----+--------------+-------------+
| id | name | node_num |
+-----+--------------+-------------+
| 1 | aaa | node1 |
+-----+--------------+-------------+
TEST QUERY
select * from shardtest order by id;
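+------------------------------+-----------------+
| Variables (sort_buffer_size) | avg return time |
+------------------------------+-----------------+
| 256K                         | 16.612          |
| 512K                         | 12.76           |
| 1M                           | 10.562          |
| 2M                           | 9.988           |
| 4M                           | 7.812           |
| 8M                           | 7.93            |
| 16M                          | 7.508           |
+------------------------------+-----------------+
[Chart: avg return time (y-axis, 0 to 18) by sort_buffer_size (x-axis, 256K to 16M)]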
Limitation of a Spider Engine
5. SELECT (order by)
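42. File Sort Algorithm
1. Single Pass
All selected columns are loaded into the buffer,
sorted, and returned directly.
2. Two Pass
Sorts on the sort column plus the KEY, then
returns the KEY values in that sort order.
※ Two Pass uses a temp file for sorting.
Sort_merge_passes counts the number
of merge passes.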
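The two strategies can be contrasted with a toy in-memory model. This is a simplified sketch, not the server's filesort implementation: it ignores buffer sizes, temp files, and merge passes, and the function names are made up for this example.

```python
def single_pass_sort(table, sort_col):
    """Whole rows go into the sort buffer, so the sorted output is the result."""
    return sorted(table, key=lambda row: row[sort_col])


def two_pass_sort(table, sort_col):
    """Sort only (sort value, row position) pairs, then re-read the rows."""
    # Pass 1: build and sort the small key/position pairs.
    keys = sorted((row[sort_col], pos) for pos, row in enumerate(table))
    # Pass 2: fetch each full row by its position, in sorted order.
    return [table[pos] for _, pos in keys]
```

Both functions return the same rows in the same order; the difference is how much data sits in the sort buffer and whether rows are read a second time.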
Limitation of a Spider Engine
5. SELECT (order by)
Check status: show status like '%sort%'
43. INSERT INTO shardtest(name) values('SPIDER_01');
…
INSERT INTO shardtest(name) values('SPIDER_04');
INSERT INTO shardtest(name) values('key1');
select `id`,`name` from `log_db`.`shardTest`
SELECT * FROM shardtest where name = 'key1';
+----+------+
| id | name |
+----+------+
| 5 | key1 |
+----+------+
Spider Engine
Filtering
Shard DB Filter ? ?
Limitation of a Spider Engine
5. SELECT (where)
Spider Engine Node 1 Node 2 Node 3 Node 4
CREATE TABLE `shardtest` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` char(120) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=SPIDER DEFAULT CHARSET=utf8 COMMENT='wrapper "mysql", table
"shardTest" '
/*!50100 PARTITION BY KEY (`id`)
(PARTITION shard1 COMMENT = 'srv "shard_node1" ' ENGINE = SPIDER,
…
PARTITION shard4 COMMENT = 'srv "shard_node4" ' ENGINE = SPIDER) */
44. select `id`,`name` from `log_db`.`shardTest` where `name` = 'key1'
Shard DB Filtering !!!
CREATE INDEX idx_shardtest_01 ON shardtest(name);
SELECT * FROM shardtest where name = 'key1';
+----+------+
| id | name |
+----+------+
| 5 | key1 |
+----+------+
CREATE TABLE `shardtest` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` char(120) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `idx_shardtest_01` (`name`)
) ENGINE=SPIDER DEFAULT CHARSET=utf8 COMMENT='wrapper "mysql", table
"shardTest" '
/*!50100 PARTITION BY KEY (`id`)
(PARTITION shard1 COMMENT = 'srv "shard_node1" ' ENGINE = SPIDER,
PARTITION shard2 COMMENT = 'srv "shard_node2" ' ENGINE = SPIDER,
PARTITION shard3 COMMENT = 'srv "shard_node3" ' ENGINE = SPIDER,
PARTITION shard4 COMMENT = 'srv "shard_node4" ' ENGINE = SPIDER)
*/
Limitation of a Spider Engine
5. SELECT (where)
Spider Engine Node 1 Node 2 Node 3 Node 4
45. select name from shardtest group by name;
With KEY `idx_shardtest_01` (`name`) present
select count(*) from `log_db`.`shardTest`
select `name` from `log_db`.`shardTest` order by `name` desc
select `name` from `log_db`.`shardTest` order by `name`
select `name` from `log_db`.`shardTest` where `name` > 'key1' order by `name`
select `name` from `log_db`.`shardTest` where `name` > 'SPIDER_01' order by `name`
+-----------+
| name |
+-----------+
| key1 |
| SPIDER_01 |
+-----------+
If KEY `idx_shardtest_01` (`name`) is removed…
DROP INDEX idx_shardtest_01 ON shardtest;
select name from shardtest group by name;
+-----------+
| name |
+-----------+
| key1 |
| SPIDER_01 |
+-----------+
select `name` from `log_db`.`shardTest`
Shard DB Grouping??
Limitation of a Spider Engine
Spider Engine Node 1 Node 2 Node 3 Node 4
5. SELECT (Group By)
Shard DB Full Scan
Shard DB Grouping
46. select id from shardtest group by name;
+----+
| id |
+----+
| 5 |
| 1 |
+----+
Shard DB Sorting
select `id`,`name` from `log_db`.`shardTest` order by `name`
Shard DB Grouping
select name, count(*) from shardtest group by name;
+-----------+----------+
| name | count(*) |
+-----------+----------+
| key1 | 2 |
| SPIDER_01 | 6 |
+-----------+----------+
select `name` from `log_db`.`shardTest` order by `name`
select count(*) from `log_db`.`shardTest`
select count(*) from `log_db`.`shardTest` ?
Limitation of a Spider Engine
5. SELECT (Group By)
Spider Engine Node 1 Node 2 Node 3 Node 4
47. CREATE TABLE `shardtest` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` char(120) NOT NULL DEFAULT '',
`node_num` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`)
)
ENGINE=InnoDB AUTO_INCREMENT=1048576 DEFAULT CHARSET=utf8
Limitation of a Spider Engine
5. SELECT (Group By MIN, MAX, COUNT(*) {INDEX (O)})
Spider Engine Node 1 Node 2 Node 3 Node 4
select max(id) from shardtest;
select min(id) from shardtest;
select count(id) from shardtest;
select count(*) from shardtest;
select name, min(id) from shardtest;
select name, max(id) from shardtest;
select name, count(*) from shardtest;
select name, count(id) from shardtest;
select `id` from `log_db`.`shardTest` order by `id` desc limit 1;
select `id` from `log_db`.`shardTest` order by `id` limit 1;
select count(*) from `log_db`.`shardTest`;
select count(*) from `log_db`.`shardTest`;
select `id`,`name` from `log_db`.`shardTest` order by `id`,`name`;
select `id`,`name` from `log_db`.`shardTest` order by `id`,`name`;
select `name` from `log_db`.`shardTest` order by `name`;
select `id`,`name` from `log_db`.`shardTest` order by `id`,`name`;
“Name”
index (X)
CREATE TABLE `shardtest` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` char(120) NOT NULL DEFAULT '',
`node_num` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_shardtest_01` (`name`),
KEY `idx_shardtest_02` (`id`,`name`)
) …
48. Limitation of a Spider Engine
5. SELECT (Group By MIN, MAX, COUNT(*) {INDEX (X)})
Spider Engine Node 1 Node 2 Node 3 Node 4
select max(name) from shardtest;
select min(name) from shardtest;
select count(name) from shardtest;
select count(*) from shardtest;
select name, min(id) from shardtest;
select name, max(id) from shardtest;
select name, count(*) from shardtest;
select name, count(id) from shardtest;
select `name` from `log_db`.`shardTest`;
select `name` from `log_db`.`shardTest`;
select count(*) from `log_db`.`shardTest`;
select count(*) from `log_db`.`shardTest`;
select `id`,`name` from `log_db`.`shardTest`;
select `id`,`name` from `log_db`.`shardTest`;
select `id`,`name` from `log_db`.`shardTest`;
select `id`,`name` from `log_db`.`shardTest`;
CREATE TABLE `shardtest` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` char(120) NOT NULL DEFAULT '',
`node_num` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`)
)
ENGINE=InnoDB AUTO_INCREMENT=1048576 DEFAULT CHARSET=utf8
CREATE TABLE `shardtest` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` char(120) NOT NULL DEFAULT '',
`node_num` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`)
) …
Full Scan
49. CREATE VIEW v_shardtest_group
AS
select name, max(id) as id, count(*) as cnt
from shardtest
group by name;
CREATE TABLE `v_shardtest_group` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`name` CHAR(120) NOT NULL DEFAULT '',
`cnt` INT NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_shardtest_01` (`name`)
)
ENGINE=SPIDER
DEFAULT CHARSET=utf8
COMMENT='wrapper "mysql", table "v_shardtest_group" '
/*!50100 PARTITION BY KEY (`id`)
(PARTITION shard1 COMMENT = 'srv "shard_node1"' ENGINE = SPIDER,
PARTITION shard2 COMMENT = 'srv "shard_node2"' ENGINE = SPIDER,
PARTITION shard3 COMMENT = 'srv "shard_node3"' ENGINE = SPIDER,
PARTITION shard4 COMMENT = 'srv "shard_node4"' ENGINE = SPIDER) */ ;
select name, sum(cnt) from v_shardtest_group group by name;
select `name`,`cnt` from `log_db`.`v_shardtest_group` order by `name`
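The idea behind the per-shard view is partial aggregation: each data node pre-groups its own rows, and the Spider node only has to add up the partial counts. The sketch below models that combine step; the per-shard distribution used in the usage note is an assumption inferred from the sample data, and combine_partial_counts is a hypothetical helper.

```python
from collections import Counter


def combine_partial_counts(per_shard_rows):
    """Sum (name, cnt) pairs that each shard has already grouped locally."""
    totals = Counter()
    for rows in per_shard_rows:
        for name, cnt in rows:
            totals[name] += cnt
    return dict(totals)
```

For example, with shards holding [("SPIDER_01", 1), ("key1", 1)], [("SPIDER_01", 2)], [("SPIDER_01", 2)] and [("SPIDER_01", 1), ("key1", 1)], the combined totals are 6 for SPIDER_01 and 2 for key1, the same counts the plain group-by query returned.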
Limitation of a Spider Engine
5. SELECT (Group By)
Spider Engine Node 1 Node 2 Node 3 Node 4
50. select * from shardtest order by id;
select * from shardtest order by name;
select id from shardtest order by id;
select name from shardtest order by id;
select id from shardtest order by name;
select name from shardtest order by name;
Limitation of a Spider Engine
5. SELECT (Order By)
Spider Engine Node 1 Node 2 Node 3 Node 4
select `id`,`name` from `log_db`.`shardTest`
select `id`,`name` from `log_db`.`shardTest`
select count(*) from `log_db`.`shardTest`
select `id` from `log_db`.`shardTest` order by `id`
select `id`,`name` from `log_db`.`shardTest`
select `id`,`name` from `log_db`.`shardTest`
select count(*) from `log_db`.`shardTest`
select `name` from `log_db`.`shardTest` order by `name`
Shard DB Data
Spider DB Sorting
52. insert into join_tb(ct_code, ct_cnt) values ('A001',1);
insert into join_tb(ct_code, ct_cnt) values ('A002',2);
insert into join_tb(ct_code, ct_cnt) values ('A003',3);
…
insert into join_tb(ct_code, ct_cnt) values ('A009',49);
insert into join_tb(ct_code, ct_cnt) values ('A010',50);
insert into common_tb(ct_code, ct_name) values ('A001','TEST1');
insert into common_tb(ct_code, ct_name) values ('A002','TEST2');
insert into common_tb(ct_code, ct_name) values ('A003','TEST3');
…
insert into common_tb(ct_code, ct_name) values ('A011','TEST11');
insert into common_tb(ct_code, ct_name) values ('A012','TEST12');
insert into common_tb(ct_code, ct_name) values ('A013','TEST13');
select a.ct_cnt, a.ct_code, b.ct_name
from join_tb a
, common_tb b
where a.ct_code = b.ct_code;
select `ct_code`,`ct_cnt` from `log_db`.`join_tb`
select `ct_code`,`ct_name` from `log_db`.`common_tb` where `ct_code` = 'A010'
select `ct_code`,`ct_name` from `log_db`.`common_tb` where `ct_code` = 'A006'
select `ct_code`,`ct_name` from `log_db`.`common_tb` where `ct_code` = 'A012'
…
select `ct_code`,`ct_name` from `log_db`.`common_tb` where `ct_code` = 'A007'
select `ct_code`,`ct_name` from `log_db`.`common_tb` where `ct_code` = 'A013'
select `ct_code`,`ct_name` from `log_db`.`common_tb` where `ct_code` = 'A009'
…
Limitation of a Spider Engine
5. SELECT (Join)
Spider Engine Node 1 Node 2 Node 3 Node 4
join_tb
common_tb
join_tb join_tb join_tb
Step 1: scan the join_tb data
Step 2: look up common_tb data for the
rows read in step 1
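The two steps amount to a nested lookup: every row scanned from join_tb triggers its own lookup against common_tb. The sketch below counts those lookups; the function name and the tuple shapes are assumptions made for this example.

```python
def nested_lookup_join(join_rows, common_rows):
    """Join join_tb rows to common_tb rows, counting per-row lookups.

    join_rows:   list of (ct_code, ct_cnt)
    common_rows: list of (ct_code, ct_name)
    """
    lookups = 0
    result = []
    for ct_code, ct_cnt in join_rows:      # step 1: scan join_tb
        lookups += 1                       # step 2: one lookup per scanned row
        for code, ct_name in common_rows:
            if code == ct_code:
                result.append((ct_cnt, ct_code, ct_name))
    return result, lookups
```

The lookup count equals the number of join_tb rows, which matches the long run of single-key `where ct_code = ...` queries in the general log.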
55. alter table common_tb
COMMENT='wrapper "mysql", table "common_tb" '
/*!50100 PARTITION BY KEY (ct_no)
(PARTITION shard1
COMMENT = 'srv "shard_node2 shard_node3 shard_node4 shard_node1"'
ENGINE = SPIDER)
*/
TRUNCATE TABLE common_tb;
insert into common_tb(ct_code, ct_name) values ('A001','TEST1');
insert into common_tb(ct_code, ct_name) values ('A002','TEST2');
insert into common_tb(ct_code, ct_name) values ('A003','TEST3');
…
insert into common_tb(ct_code, ct_name) values ('A011','TEST11');
insert into common_tb(ct_code, ct_name) values ('A012','TEST12');
insert into common_tb(ct_code, ct_name) values ('A013','TEST13');
insert into `common_tb`(`ct_no`,`ct_code`,`ct_name`)values(1,'A001','TEST01')
insert into `common_tb`(`ct_no`,`ct_code`,`ct_name`)values(2,'A002','TEST02')
insert into `common_tb`(`ct_no`,`ct_code`,`ct_name`)values(3,'A003','TEST03')
…
insert into `common_tb`(`ct_no`,`ct_code`,`ct_name`)values(11,'A011','TEST11')
insert into `common_tb`(`ct_no`,`ct_code`,`ct_name`)values(12,'A012','TEST12')
insert into `common_tb`(`ct_no`,`ct_code`,`ct_name`)values(13,'A013','TEST13')
Limitation of a Spider Engine
5. SELECT (Join)
Spider Engine Node 1 Node 2 Node 3 Node 4
56. CREATE ALGORITHM = MERGE VIEW v_join_tb
as
select a.jt_no, a.ct_cnt, a.ct_code a_ct_code, b.ct_code b_ct_code, b.ct_name
from join_tb a
, common_tb b
where a.ct_code = b.ct_code;
CREATE TABLE v_join_tb(
jt_no int
, ct_cnt int
, a_ct_code char(4)
, b_ct_code char(4)
, ct_name varchar(30)
, primary key (jt_no)
, key (a_ct_code, b_ct_code)
)
ENGINE=SPIDER DEFAULT CHARSET=utf8
COMMENT='wrapper "mysql", table "v_join_tb" '
/*!50100 PARTITION BY KEY (`jt_no`)
(PARTITION shard1 COMMENT = 'srv "shard_node1"' ENGINE = SPIDER,
..
PARTITION shard4 COMMENT = 'srv "shard_node4"' ENGINE = SPIDER) */;
select * from v_join_tb
where a_ct_code = 'A001' and b_ct_code = 'A001';
select `jt_no`,`ct_cnt`,`a_ct_code`,`b_ct_code`,`ct_name`
from `log_db`.`v_join_tb`
where `a_ct_code` = 'A001' and `b_ct_code` = 'A001'
Key Point !!
Limitation of a Spider Engine
5. SELECT (Join)
Spider Engine Node 1 Node 2 Node 3 Node 4
CREATE VIEW ALGORITHM
1) MERGE
2) TEMPTABLE
57. select * from v_join_tb
where a_ct_code = 'A001' and b_ct_code = 'A001';
select `jt_no`,`ct_cnt`,`a_ct_code`,`b_ct_code`,`ct_name`
from `log_db`.`v_join_tb`
where `a_ct_code` = 'A001' and `b_ct_code` = 'A001'
Limitation of a Spider Engine
5. SELECT (Join)
Spider Engine Node 1 Node 2 Node 3 Node 4
INDEX Access OK !
58. SELECT *
FROM shardtest
WHERE name IN ('key1','SPIDER_01');
+----+-----------+
| id | name |
+----+-----------+
| 1 | SPIDER_01 |
| 5 | key1 |
| 4 | SPIDER_01 |
| 8 | SPIDER_01 |
| 3 | SPIDER_01 |
| 7 | SPIDER_01 |
| 2 | SPIDER_01 |
| 6 | key1 |
+----+-----------+
Temporary Table ← IN KEY
?
Spider Engine Node 1 Node 2 Node 3 Node 4
Limitation of a Spider Engine
5. SELECT ( IN( ) )
drop temporary table if exists log_db.tmp_spider_bka_0x7f5ba7f23420_shardTest;
create temporary table log_db.tmp_spider_bka_0x7f5ba7f23420_shardTest(
id bigint
, c0 char(120) collate utf8_general_ci
)
engine=memory default charset=utf8 collate utf8_general_ci;
insert into log_db.tmp_spider_bka_0x7f5ba7f23420_shardTest(id,c0)values(0,'key1'),(1,'SPIDER_01');
select a.id,b.`id`,b.`name`
from log_db.tmp_spider_bka_0x7f5ba7f23420_shardTest a,`log_db`.`shardTest` b
where a.c0 <=> b.`name`;
drop temporary table if exists log_db.tmp_spider_bka_0x7f5ba7f23420_shardTest;
OR equal
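The general log shows the IN list being loaded into a temporary table on the data node and then joined to the shard table with `<=>`. A rough Python model of that lookup follows; bka_lookup is a hypothetical name, and plain equality stands in for `<=>` because the sample values contain no NULLs.

```python
def bka_lookup(in_values, shard_rows):
    """Model the temp-table join: (temp id, row id, name) for each match.

    in_values:  values from the IN list, numbered from 0 like the temp table
    shard_rows: list of (id, name) rows on one data node
    """
    temp = list(enumerate(in_values))  # (id, c0) rows of the temporary table
    return [
        (temp_id, row_id, name)
        for temp_id, c0 in temp
        for row_id, name in shard_rows
        if c0 == name
    ]
```

Each data node pays for a drop, create, insert, join, and drop sequence per query, which is one reason the guideline later recommends keeping IN clauses to a minimum.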
59. Spider Engine Node 1 Node 2 Node 3 Node 4
Limitation of a Spider Engine
5. SELECT ( IN( ) )
SELECT * FROM shardtest WHERE name = 'key1'
UNION ALL
SELECT * FROM shardtest WHERE name = 'SPIDER_01';
+----+-----------+
| id | name |
+----+-----------+
| 5 | key1 |
| 6 | key1 |
| 1 | SPIDER_01 |
| 4 | SPIDER_01 |
| 8 | SPIDER_01 |
| 3 | SPIDER_01 |
| 7 | SPIDER_01 |
| 2 | SPIDER_01 |
+----+-----------+
select `id`,`name` from `log_db`.`shardTest` where `name` = 'SPIDER_01'
select `id`,`name` from `log_db`.`shardTest` where `name` = 'key1'
60. create table shardtest_key (
shard_key int
, shard_no int NOT NULL AUTO_INCREMENT
, val varchar(100)
, PRIMARY KEY (shard_no, shard_key)
)
ENGINE=SPIDER DEFAULT CHARSET=utf8
COMMENT='wrapper "mysql", table "shardtest_key" '
/*!50100 PARTITION BY LIST (`shard_key`)
(PARTITION shard1 VALUES IN (1) COMMENT = 'srv "shard_node1"' ENGINE = SPIDER,
PARTITION shard3 VALUES IN (3) COMMENT = 'srv "shard_node3"' ENGINE = SPIDER,
PARTITION shard4 VALUES IN (4) COMMENT = 'srv "shard_node4"' ENGINE = SPIDER,
PARTITION shard2 VALUES IN (2) COMMENT = 'srv "shard_node2"' ENGINE = SPIDER) */
create table shardtest_key (
shard_key int
, shard_no int NOT NULL AUTO_INCREMENT
, val varchar(100)
, PRIMARY KEY (shard_no)
);
insert into shardtest_key(shard_key, val) values (1,'node1');
insert into shardtest_key(shard_key, val) values (2,'node2');
insert into shardtest_key(shard_key, val) values (3,'node3');
insert into shardtest_key(shard_key, val) values (4,'node4');
…
insert into shardtest_key(shard_key, val) values (2,'node2');
insert into shardtest_key(shard_key, val) values (3,'node3');
insert into shardtest_key(shard_key, val) values (4,'node4');
Spider Engine Node 1 Node 2 Node 3 Node 4
Limitation of a Spider Engine
5. SELECT ( IN( ) + shardkey)
61. SELECT *
FROM shardtest_key
WHERE shard_key in (1,2)
AND shard_no in (1,5,9,2,26);
+-----------+----------+-------+
| shard_key | shard_no | val |
+-----------+----------+-------+
| 1 | 1 | node1 |
| 1 | 5 | node1 |
| 1 | 9 | node1 |
| 2 | 2 | node2 |
| 2 | 26 | node2 |
+-----------+----------+-------+
Spider Engine Node 2
Node 3 Node 4
Node 1
Limitation of a Spider Engine
5. SELECT ( IN( ) + shardkey )
drop temporary table if exists log_db.tmp_spider_bka_0x7f0c403d1820_shardtest_key;
create temporary table log_db.tmp_spider_bka_0x7f0c403d1820_shardtest_key(
id bigint
, c0 int(11)
, c1 int(11)
)
engine=memory default charset=utf8 collate utf8_general_ci;
insert into log_db.tmp_spider_bka_0x7f0c403d1820_shardtest_key(id,c0,c1)
values(0,1,1),(1,2,1),(2,5,1),(3,9,1),(4,26,1)
select a.id,b.`shard_key`,b.`shard_no`,b.`val`
from log_db.tmp_spider_bka_0x7f0c403d1820_shardtest_key a
,`log_db`.`shardtest_key` b
where a.c0 <=> b.`shard_no` and a.c1 <=> b.`shard_key`
drop temporary table if exists log_db.tmp_spider_bka_0x7f0c403d1820_shardtest_key
values(0,1,2),(1,2,2),(2,5,2),(3,9,2),(4,26,2)
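Because shardtest_key is partitioned by LIST (shard_key), a condition on shard_key appears to let the Spider node contact only the matching data nodes; the log shows one value set with shard_key 1 and another with shard_key 2. The sketch below models that routing using the partition-to-server mapping from the CREATE TABLE statement; nodes_for is a hypothetical helper.

```python
# Mapping taken from the LIST partition definition of shardtest_key.
PARTITIONS = {
    1: "shard_node1",
    2: "shard_node2",
    3: "shard_node3",
    4: "shard_node4",
}


def nodes_for(shard_keys):
    """Return the servers whose LIST partition matches any given shard_key."""
    return sorted({PARTITIONS[k] for k in shard_keys if k in PARTITIONS})
```

For `shard_key in (1,2)` this yields shard_node1 and shard_node2 only, so shard_node3 and shard_node4 receive no query in this model.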
62. id >= 1 and id <= 6
equal
Spider Engine Node 1 Node 2 Node 3 Node 4
Limitation of a Spider Engine
5. SELECT ( BETWEEN )
SELECT * FROM shardtest where id between 1 and 6;
+----+-----------+
| id | name |
+----+-----------+
| 1 | SPIDER_01 |
| 5 | key1 |
| 4 | SPIDER_01 |
| 3 | SPIDER_01 |
| 2 | SPIDER_01 |
| 6 | key1 |
+----+-----------+
(select 0,`id`,`name`
from `log_db`.`shardTest`
where `id` >= 1 and `id` <= 6
order by `id`)
order by `id`
63. explain SELECT * FROM shardtest where name = 'key1' and id >= 1 and id <= 6;
+------+-------------+-----------+-------+--------------------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-----------+-------+--------------------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | shardtest | range | PRIMARY,idx_shardtest_01 | PRIMARY | 4 | NULL | 8 | Using where |
+------+-------------+-----------+-------+--------------------------+---------+---------+------+------+-------------+
Spider Engine Node 1 Node 2 Node 3 Node 4
Limitation of a Spider Engine
5. SELECT ( BETWEEN )
SELECT *
FROM shardtest
WHERE name = 'key1'
AND id >= 1 and id <= 6;
+----+------+
| id | name |
+----+------+
| 5 | key1 |
| 6 | key1 |
+----+------+
(select 0,`id`,`name`
from `log_db`.`shardTest`
where `id` >= 1 and `id` <= 6
order by `id`)
order by `id`
Spider Engine
“name” filter
Shard DB Data
Index PRIMARY KEY
64. explain SELECT * FROM shardtest use index(idx_shardtest_01) where name = 'key1' and id >= 1 and id <= 6;
+------+-------------+-----------+------+------------------+------------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-----------+------+------------------+------------------+---------+-------+------+-------------+
| 1 | SIMPLE | shardtest | ref | idx_shardtest_01 | idx_shardtest_01 | 360 | const | 9687 | Using where |
+------+-------------+-----------+------+------------------+------------------+---------+-------+------+-------------+
idx_shardtest_01 : name
Spider Engine Node 1 Node 2 Node 3 Node 4
Limitation of a Spider Engine
5. SELECT ( BETWEEN )
SELECT *
FROM shardtest use index(idx_shardtest_01)
WHERE name = 'key1'
AND id >= 1 and id <= 6;
+----+------+
| id | name |
+----+------+
| 5 | key1 |
| 6 | key1 |
+----+------+
select `id`,`name` from `log_db`.`shardTest` where `name` = 'key1'
Spider Engine
“id” filter
Shard DB Data
65. Spider Engine Node 1 Node 2 Node 3 Node 4
Limitation of a Spider Engine
5. SELECT ( BETWEEN )
create index idx_shardtest_03 on shardtest(name, id);
explain SELECT * FROM shardtest use index(idx_shardtest_03) where name = 'key1' and id between 1 and 6;
+------+-------------+-----------+-------+------------------+------------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-----------+-------+------------------+------------------+---------+------+------+--------------------------+
| 1 | SIMPLE | shardtest | range | idx_shardtest_03 | idx_shardtest_03 | 364 | NULL | 8 | Using where; Using index |
+------+-------------+-----------+-------+------------------+------------------+---------+------+------+--------------------------+
SELECT *
FROM shardtest use index(idx_shardtest_03)
WHERE name = 'key1'
AND id between 1 and 6;
+----+------+
| id | name |
+----+------+
| 5 | key1 |
| 6 | key1 |
+----+------+
(select 0,`id`,`name`
from `log_db`.`shardTest`
where `id` >= 1 and `id` <= 6
and `name` = 'key1'
order by `id`,`name`
)
order by `id`,`name`
67. Schema
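• Build the schema per shard (physically in the same location)
• Configure auto-increment values to be handled by the Spider Engine
• For a global schema, use the HA feature to keep a replica on each server
• Indexes: on the data nodes, create the indexes that would traditionally be needed;
on the Spider node, create indexes on every column that needs one
Performance
• The slowest server among the nodes determines overall performance.
(Unfortunately, queries are not sent to the nodes in parallel.)
Query
• Minimize IN clauses and ORDER BY clauses
• Confirm exactly where the filter in the WHERE clause is applied
• Consider these points up front in the design and development phase (e.g. develop with a shard key condition added to every query)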
Guideline
69. Q & A
■ Spider_internal_limit
Applies a record LIMIT on the shard nodes.
■ Spider_internal_limit = 100
Query : SELECT * FROM shardtest limit 101
Node 2
Node 3
Node 4
Node 1
70. Q & A
■ Spider_internal_limit = 100
Query : SELECT * FROM shardtest order by id limit 101
Node 1 Node 2 Node 3 Node 4
■ Spider_internal_limit = -1
Query : SELECT * FROM shardtest order by id limit 100
Node 1 Node 2 Node 3 Node 4
71. Q & A
■ Spider_internal_limit = 100
Query : SELECT * FROM shardtest limit 1000
Node 1 Node 2 Node 3 Node 4
■ Result
72. Q & A
■ Is a schema change independent of the state of each node?
Node 4
Shutdown
Spider Engine