How we switched to columnar at SpendHQ
Allen Herrera
https://www.spendhq.com/
Drivers for Change
• Massive growth in the last couple of years
• Legacy application architecture not built to scale
• Need to improve query performance
• Need to modernize
Why Leave Our Old Database?
• Old DB
  • Modernization
  • Based on MySQL 5.1.x
• Performance
  • Slow
  • Single-threaded
  • Couldn't scale vertically anymore
  • Not clusterable
• What we were looking for
  • Ease of transition
  • Scalability
  • Lower cost, if possible
  • Community support
The Journey
Timeline: Dec '17, Mar, Aug, Nov, Dec; phases: Analyze, Prepare, Refine
• Identify Options
• Quantify Targets
• Overcome Challenges
• Set Up Cluster
• Professional Services
• Define Migration Process
• Automate Cluster Creation
• Fail Deploying
• Refactor ETLs
• Actually Deploy
Identify Options
Identifying Alternative Databases
A consultant identified 7 open source database technologies:

Database              Released   Notes
Calpont InfiniDB      2010       C/C++, MySQL front end
ClickHouse            2014       C/C++
CrateDB               2013       Java-based
Greenplum Database    2005       PostgreSQL-based
MariaDB ColumnStore   2016       MySQL/InfiniDB branch
MapD Technologies     2016       C/C++
MonetDB               2004       C
Why MariaDB ColumnStore?
We chose MariaDB ColumnStore for its syntax similarity to our prior DB, and because it offered:
• ANSI SQL
• Open source
• Enterprise support
• Professional services
• Scalability
• Performance
Quantify Targets
• Goals
  • 71% reduction in query time by switching databases
  • 95% reduction if we denormalize our schemas
[Chart: Query Performance Chart, execution time in seconds for queries 1 to 21; series: InfoBright joins, MCS (MariaDB ColumnStore) joins, MCS flat queries]
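The targets above are percentages of query time. The deck does not say whether they were measured per query or over the whole benchmark suite, so this minimal sketch computes both. The function names and sample timings are hypothetical, not SpendHQ's measurements.

```python
from typing import Sequence


def percent_reduction(baseline_s: float, new_s: float) -> float:
    """Percentage of the baseline time removed by the new run."""
    if baseline_s <= 0:
        raise ValueError("baseline time must be positive")
    return (baseline_s - new_s) / baseline_s * 100.0


def suite_reduction(baseline: Sequence[float], new: Sequence[float]) -> float:
    """Percent reduction in total time across a suite of benchmark queries."""
    if len(baseline) != len(new):
        raise ValueError("both runs must time the same queries")
    return percent_reduction(sum(baseline), sum(new))


# Hypothetical timings: a 10 s query that drops to 2.9 s meets the 71% target,
# and one that drops to 0.5 s meets the 95% target.
print(round(percent_reduction(10.0, 2.9), 1))  # 71.0
print(round(percent_reduction(10.0, 0.5), 1))  # 95.0
```

Summing before dividing (suite_reduction) weights long queries more heavily than averaging per-query percentages would, which matters when a few slow reports dominate page load times.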
Overcome Challenges
Setting up our first ColumnStore DB
Really easy!
https://github.com/toddstoffel/columnstore_easy_setup
Many my.cnf optimizations come out of the box; we had to adjust only a few, including:
» interactive_timeout
» wait_timeout
» max_length_for_sort_data
» innodb_buffer_pool_size
Connecting the first Columnstore database
1st Challenge
Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [min_date] => 2015-10-01
                )
            [Company] => Array
                (
                    [lft] => 731
                )
        )
)
Array
(
    [0] => Array
        (
            [$vtable_723] => Array
                (
                    [max_date] => 2013-05-01
                    [lft] => 29
                )
        )
)
Root cause: the CakePHP ORM's use of mysqli_fetch_field_direct()
Overcoming legacy framework limitations
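The two dumps show the mismatch: CakePHP nests each column under the table name that mysqli_fetch_field_direct() reports in the column metadata, and in the second dump the values arrive under a virtual table name ($vtable_723) instead of the keys the application expects (0 and Company). The deck does not say how this was fixed. One possible workaround, sketched here in Python for illustration (the real fix would live in PHP inside the ORM layer), is to flatten each row and re-nest columns with an explicit column-to-model map. flatten_row, regroup, and the map are hypothetical.

```python
from typing import Any, Dict, Hashable, Mapping

# A result row as the ORM nests it: {table_or_model_key: {column: value}}
NestedRow = Mapping[Hashable, Mapping[str, Any]]


def flatten_row(row: NestedRow) -> Dict[str, Any]:
    """Merge all per-table groups into one {column: value} dict.

    The table keys (0, "Company", "$vtable_723", ...) are dropped, so the
    result no longer depends on which table name the driver reported.
    """
    flat: Dict[str, Any] = {}
    for columns in row.values():
        for column, value in columns.items():
            if column in flat:
                raise ValueError(f"column {column!r} reported under two table keys")
            flat[column] = value
    return flat


def regroup(flat: Mapping[str, Any],
            column_to_model: Mapping[str, Hashable],
            default_key: Hashable = 0) -> Dict[Hashable, Dict[str, Any]]:
    """Re-nest columns under the keys the application code expects.

    Columns missing from column_to_model (e.g. computed MIN/MAX values) go
    under default_key, mirroring the numeric [0] key in the expected dump.
    """
    nested: Dict[Hashable, Dict[str, Any]] = {}
    for column, value in flat.items():
        key = column_to_model.get(column, default_key)
        nested.setdefault(key, {})[column] = value
    return nested


columnstore_row = {"$vtable_723": {"max_date": "2013-05-01", "lft": 29}}
fixed = regroup(flatten_row(columnstore_row), {"lft": "Company"})
print(fixed)  # {0: {'max_date': '2013-05-01'}, 'Company': {'lft': 29}}
```

Raising on duplicate column names is deliberate: if two tables contribute a column with the same name, flattening would silently drop one, so such queries need explicit aliases first.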
2nd Challenge
Bad SQL:
SELECT uuid, `vendor_name`, SUM(amount) FROM table GROUP BY `vendor_name`;
Proper SQL:
SELECT MIN(uuid), `vendor_name`, SUM(amount) FROM table GROUP BY `vendor_name`;
Overcoming legacy code
Internal error: IDB-2021: 'table.uuid' is not in GROUP BY clause. All non-aggregate columns in the SELECT and ORDER BY clause must be included in the GROUP BY clause.
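The fix is to make every selected column either an aggregate or part of the GROUP BY. This sketch runs the compliant query against an in-memory SQLite database from Python's standard library, with a hypothetical spend table standing in for `table`. SQLite, like the old MySQL 5.1-based DB, accepts a bare uuid column in an aggregate query, so it cannot reproduce the IDB-2021 error; it only shows that the rewritten query returns one deterministic row per vendor.

```python
import sqlite3
from typing import List, Tuple


def vendor_totals(conn: sqlite3.Connection) -> List[Tuple[str, str, float]]:
    # uuid is wrapped in MIN() and vendor_name is in GROUP BY, so every
    # selected column is either aggregated or grouped.
    return conn.execute(
        "SELECT MIN(uuid), vendor_name, SUM(amount) "
        "FROM spend GROUP BY vendor_name ORDER BY vendor_name"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend (uuid TEXT, vendor_name TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO spend VALUES (?, ?, ?)",
    [("b-2", "Acme", 100.0), ("a-1", "Acme", 50.0), ("c-3", "Globex", 75.0)],
)
print(vendor_totals(conn))  # [('a-1', 'Acme', 150.0), ('c-3', 'Globex', 75.0)]
```

MIN() only picks a deterministic representative per group; if the application actually relies on a specific uuid, that choice of aggregate should be made deliberately rather than mechanically.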
3rd Challenge
Overcoming case-sensitive GROUP BYs
id name
1 allen
2 Allen
SELECT COUNT(id), `name` FROM test_table GROUP BY `name`;
[Figure: Results, MariaDB vs. Old DB]
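The slide's result screenshots did not survive, but the behaviour is easy to reproduce. In MySQL-family engines, whether 'allen' and 'Allen' land in the same group generally follows the column's collation (case-insensitive *_ci collations fold them together). This sketch uses SQLite, whose default BINARY collation is case-sensitive, to show both outcomes, and groups on LOWER(name) as one possible portable fix; that fix is an assumption, not necessarily what SpendHQ shipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO test_table VALUES (?, ?)", [(1, "allen"), (2, "Allen")])

# SQLite's default BINARY collation compares bytes, so each spelling is its own group.
case_sensitive = conn.execute(
    "SELECT COUNT(id), name FROM test_table GROUP BY name ORDER BY name"
).fetchall()

# Grouping on a normalized key folds both spellings into one group.
case_folded = conn.execute(
    "SELECT COUNT(id), LOWER(name) FROM test_table GROUP BY LOWER(name)"
).fetchall()

print(case_sensitive)  # [(1, 'Allen'), (1, 'allen')]
print(case_folded)     # [(2, 'allen')]
```

Normalizing in the query keeps the behaviour explicit regardless of the engine's collation defaults, at the cost of returning the lowercased spelling rather than the original one.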
Professional Services
Reviewing progress with professional services
Analyzing performance
1. Hard drives
  • fio testing: https://github.com/axboe/fio.git
    - /usr/local/bin/fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
    - We measured mixed IOPS of ~2,000
    - After switching to SSDs: ~13,000
2. Query configuration
  • Adjusted innodb_buffer_pool_size
  • Adjusted columnstore.xml
    • PmMaxMemorySmallSide: memory size for the small side of table joins
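To compare runs like the ~2,000 vs. ~13,000 figures above without reading fio's text report by eye, one option is to add --output-format=json to the same fio command and sum read and write IOPS. This sketch assumes fio's JSON layout: a top-level "jobs" list whose entries carry "read" and "write" sections, each with an "iops" field; check it against your fio version. The sample report is synthetic.

```python
import json
from typing import Any, Mapping


def mixed_iops(report: Mapping[str, Any]) -> float:
    """Total read + write IOPS across all jobs in a parsed fio JSON report."""
    return sum(job["read"]["iops"] + job["write"]["iops"] for job in report["jobs"])


def speedup(before: float, after: float) -> float:
    """How many times higher the new figure is than the old one."""
    return after / before


# Synthetic report shaped like fio's JSON output; numbers are made up to match
# a ~75/25 read/write mix (--rwmixread=75) at ~2,000 total IOPS.
sample = json.loads('{"jobs": [{"read": {"iops": 1500.0}, "write": {"iops": 500.0}}]}')
print(mixed_iops(sample))       # 2000.0
print(speedup(2_000, 13_000))   # 6.5
```

In practice the report would come from `json.load(open("fio.json"))` after running fio with `--output-format=json --output=fio.json`; the move to SSDs described above works out to roughly a 6.5x IOPS increase.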
Reviewing progress with professional services
Analyzing performance
» Queries
» Page loads
• Confirmed that improved query performance translated into improved uncached page load times in our app
Automate Cluster Creation
Automating Cluster Creation
Based on: https://github.com/toddstoffel/columnstore_easy_setup
Define Migration Process
Defining our data transfer process
• 64 minutes: insert into {columnstore} select * from {innodb}
• 46 minutes: load from outfile
• 26 minutes: cpimport
• For InnoDB: 5 hours vs. 15 hours by splitting the large CSV
181 million records from InnoDB to ColumnStore
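Normalizing the three timings to rows per second makes them easier to compare and to carry over to other tables. This sketch assumes all three timings refer to the same 181-million-row transfer, which the slide implies but does not state; the method labels are paraphrased from the slide.

```python
ROWS = 181_000_000  # records moved from InnoDB to ColumnStore

# Wall-clock minutes per load method, as listed on the slide.
TIMINGS_MIN = {
    "INSERT ... SELECT": 64,
    "load from outfile": 46,
    "cpimport": 26,
}


def rows_per_second(rows: int, minutes: float) -> float:
    """Average load throughput over the whole run."""
    return rows / (minutes * 60)


for method, minutes in TIMINGS_MIN.items():
    print(f"{method:<20} {rows_per_second(ROWS, minutes):>10,.0f} rows/s")
# Roughly 47,135 / 65,580 / 116,026 rows/s respectively.

# cpimport relative to INSERT ... SELECT
print(round(TIMINGS_MIN["INSERT ... SELECT"] / TIMINGS_MIN["cpimport"], 2))  # 2.46
```

On these numbers, cpimport moves data about 2.5 times faster than a cross-engine INSERT ... SELECT.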
Fail Deploying
First Deployment Fail
Problems
1. Storage drives: 16 TB wasn't enough!
2. iSCSI volumes in fstab: a no-no
Solution
1. Attach more storage: doubled to 32 TB
2. Use /etc/rc.local to connect to the iSCSI target and remount automatically
Refactor ETLs
Refactoring data processes for Columnstore
Write operations were not plug-and-play
[Chart: write-operation changes per data process: 40%, 44%, 1040%, 100+%, 1200%, 100%]
Refactoring data processes for Columnstore
• ETL (7x): uses a new multi-process architecture to take advantage of InnoDB row-level locking
• Client shard rebuilds: export to CSV and import from outfile
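For the CSV route used in shard rebuilds (and the "split large CSV" trick from the migration timings), rows can be written in fixed-size chunks so each file is loaded separately. This Python sketch assumes the files are loaded with LOAD DATA INFILE using FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"', where an unquoted \N is read as NULL and backslash is the default escape character. The chunk size, file name, table name (client_spend), and function names are all hypothetical.

```python
import csv
import io
from typing import Any, Iterable, Iterator, Sequence

NULL_MARKER = "\\N"  # LOAD DATA INFILE reads an unquoted \N as NULL


def _encode(value: Any) -> Any:
    if value is None:
        return NULL_MARKER
    if isinstance(value, str):
        # Double backslashes so LOAD DATA's default ESCAPED BY '\\' keeps them literal.
        return value.replace("\\", "\\\\")
    return value


def csv_chunks(rows: Iterable[Sequence[Any]], chunk_size: int) -> Iterator[str]:
    """Yield CSV text for consecutive chunks of at most chunk_size rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    buf, count = io.StringIO(), 0
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow([_encode(v) for v in row])
        count += 1
        if count == chunk_size:
            yield buf.getvalue()
            buf, count = io.StringIO(), 0
            writer = csv.writer(buf, lineterminator="\n")
    if count:
        yield buf.getvalue()


# Each chunk would be written to its own file and loaded with something like:
#   LOAD DATA INFILE 'chunk_0001.csv' INTO TABLE client_spend
#   FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
rows = [(1, "Acme", None), (2, "Globex, Inc.", 9.5), (3, "Initech", 1.0)]
for i, chunk in enumerate(csv_chunks(rows, chunk_size=2), start=1):
    print(f"chunk {i}:\n{chunk}", end="")
```

Because csv_chunks is a generator, it streams rows from the source cursor without holding the whole export in memory, and smaller files let a failed load be retried per chunk instead of restarting the full import.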
Refactoring data processes for Columnstore
Where we ended up
Actually Deploy
Releasing!
Problems
• Storage networking on our UM (User Module)
  • Latency
  • Bandwidth
  • Write speeds
Solution
• Multipath
  • yum install device-mapper-multipath
What's Next?
Where we are going next
• Refactor legacy performance-critical areas as needed
• Build a new version of our app
• Address the data schema
  • Use fewer joins
  • Separate application data (transactional/state-based) from client data (columnar)
• Test GPU databases
  • Brytlyt
  • OmniSci
Biggest Wins and Biggest Losses
• Read time: ~78%
• Write time: ~10%
• Storage: 10 times more
• Modifying the application: time-consuming
• ETL: 25x
• Concurrency: about the same
Questions?
@allenherrera
aherrera@spendhq.com

How to install and activate eGrabber JobGrabber
 
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
 
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdf
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdfImplementing KPIs and Right Metrics for Agile Delivery Teams.pdf
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdf
 
CompTIA Security+ (Study Notes) for cs.pdf
CompTIA Security+ (Study Notes) for cs.pdfCompTIA Security+ (Study Notes) for cs.pdf
CompTIA Security+ (Study Notes) for cs.pdf
 
iGaming Platform & Lottery Solutions by Skilrock
iGaming Platform & Lottery Solutions by SkilrockiGaming Platform & Lottery Solutions by Skilrock
iGaming Platform & Lottery Solutions by Skilrock
 
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
 
Crafting the Perfect Measurement Sheet with PLM Integration
Crafting the Perfect Measurement Sheet with PLM IntegrationCrafting the Perfect Measurement Sheet with PLM Integration
Crafting the Perfect Measurement Sheet with PLM Integration
 
IT Software Development Resume, Vaibhav jha 2024
IT Software Development Resume, Vaibhav jha 2024IT Software Development Resume, Vaibhav jha 2024
IT Software Development Resume, Vaibhav jha 2024
 
Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
 
INGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by DesignINGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by Design
 

How we switched to columnar at SpendHQ

  • 1. How we switched to columnar w/ SpendHQ Allen Herrera
  • 3. Drivers For Change? • Massive growth in the last couple of years • Legacy application architecture not built to scale • Need to improve query performance • Need to modernize
  • 4. Why Leave our old database? • Old DB: modernization (based off MySQL 5.1.x); performance (slow, single threaded, couldn't scale vertically anymore, not clusterable) • What we were looking for: ease of transition, scalability, lower cost if possible, community support
  • 5. The Journey: a timeline from Dec ’17 through Mar, Aug, Nov and Dec, in three phases (Analyze, Prepare, Refine) • Identify Options • Quantify Targets • Overcome Challenges • Set up cluster • Professional Services • Define Migration Process • Automate Cluster Creation • Fail Deploying • Refactor ETLs • Actually Deploy
  • 6. The Journey: Identify Options
  • 7. Identifying Alternative Databases • Consultant identified 7 open source database technologies (name, release year, notes): Calpont InfiniDB, 2010, C/C++ with MySQL front end; ClickHouse, 2014, C/C++; CrateDB, 2013, Java based; Greenplum Database, 2005, Postgres based; MariaDB ColumnStore, 2016, MySQL/InfiniDB branch; MapD Technologies, 2016, C/C++; MonetDB, 2004, C • Chose MariaDB Columnstore for its syntax similarity to our prior DB
  • 8. Why MariaDB Columnstore! • ANSI SQL • Open Source • Enterprise Support • Professional Services • Scalable • Performant
  • 9. The Journey: Quantify Targets
  • 10. Quantify Targets • Goals • 71% reduction in query time by switching databases • 95% reduction if we de-normalize our schemas • [Chart: Query Performance Chart, seconds per query for queries 1 to 21; series: InfoBright Joins, MCS Joins, MCS Flat queries (MCS = MariaDB ColumnStore)]
  • 11. The Journey: Overcome Challenges
  • 12. Setting up our first Columnstore DB: really easy! https://github.com/toddstoffel/columnstore_easy_setup • Lots of my.cnf optimizations out of the box; the very few we had to adjust included » interactive_timeout » wait_timeout » max_length_for_sort_data » innodb_buffer_pool_size
  • 13. Connecting the first Columnstore database
  • 14. 1st Challenge: Overcoming legacy framework limitations • Expected shape: Array ( [0] => Array ( [0] => Array ( [min_date] => 2015-10-01 ) [Company] => Array ( [lft] => 731 ) ) ) • On Columnstore: Array ( [0] => Array ( [$vtable_723] => Array ( [max_date] => 2013-05-01 [lft] => 29 ) ) ) • Root cause: CakePHP ORM use of mysqli_fetch_field_direct()
  • 15. 2nd Challenge: Overcoming legacy code • Bad SQL: SELECT uuid, `vendor_name`, SUM(amount) FROM `table` GROUP BY `vendor_name`; • Internal error: IDB-2021: 'table.uuid' is not in GROUP BY clause. All non-aggregate columns in the SELECT and ORDER BY clause must be included in the GROUP BY clause. • Proper SQL: SELECT MIN(uuid), `vendor_name`, SUM(amount) FROM `table` GROUP BY `vendor_name`;
  • 16. 3rd Challenge: Overcoming case-sensitive GROUP BYs • test_table: id 1, name 'allen'; id 2, name 'Allen' • SELECT COUNT(id), `name` FROM test_table GROUP BY `name`; • Results: MariaDB vs. Old DB
  • 17. The Journey: Professional Services
  • 18. Reviewing progress with professional services: Analyzing performance 1. Hard drives • fio testing (https://github.com/axboe/fio.git) ˗ /usr/local/bin/fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75 ˗ We saw mixed IOPS of ~2,000 ˗ After switching to SSDs, ~13,000 2. Query configuration • Adjusted InnoDB buffer size • Adjusted columnstore.xml » PmMaxMemorySmallSide: memory size for the small-side table in joins
  • 19. Reviewing progress with professional services: Analyzing performance » Queries » Page loads • Confirmed that improved query performance translated into improved uncached page load times in our app
  • 20. The Journey: Automate Cluster Creation
  • 21. Automating Cluster Creation • Based on https://github.com/toddstoffel/columnstore_easy_setup
  • 22. The Journey: Define Migration Process
  • 23. Defining our data transfer process: 181 million records from InnoDB to Columnstore • 64 minutes: insert into {columnstore} select * from {innodb} • 46 minutes: load from outfile • 26 minutes: cpimport • For InnoDB: 5 hours vs. 15 hours (split large CSVs)
  • 24. The Journey: Fail Deploying
  • 25. First deployment: Fail • Problems: 1. Storage drives: 16 TB wasn’t enough! 2. iSCSI volumes in fstab: a no-no • Solution: 1. Attach more storage, doubled to 32 TB 2. Use /etc/rc.local to connect to the iSCSI target and remount automatically
  • 26. The Journey: Refactor ETLs
  • 27. Refactoring data processes for Columnstore • Write operations were not plug and play • [Chart: 6 core data processes, values 40%, 44%, 1040%, 100+%, 1200%, 100%]
  • 28. Refactoring data processes for Columnstore • ETL, 7x: use a new multi-process architecture to take advantage of InnoDB row-level locking • Client shard rebuilds: export to CSV and import from outfile
  • 29. Refactoring data processes for Columnstore: Where we ended up
  • 30. The Journey: Actually Deploy
  • 31. Releasing! • Problems: storage networking on our UM (latency, bandwidth, write speeds) • Solution: multipath » yum install device-mapper-multipath
  • 32. What Next!
  • 33. Where we are going next • Refactor legacy critical performance areas as needed • Build a new version of our app » Address the data schema so it doesn't use as many joins » Separate application data (transactional/state based) from client data (columnar) • Test GPU databases » Brytlyt » OmniSci
  • 34. Biggest wins » Read time: ~78% » Write time: ~10% » ETL: 25x • Biggest losses » Storage: 10 times more » Modify application: time consuming • Concurrency: about the same
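Slide 14 and note 14 describe wrapping the ORM in custom logic that ties values back to the tables they were queried from. The Python sketch below illustrates only that idea and is not SpendHQ's CakePHP code. It assumes each result row arrives as a nested dict shaped like the PHP arrays on the slide. It also assumes a hypothetical `field_tables` map, built from the query's own field list, that says which key each field belongs under.

```python
from typing import Any, Dict

# ORM-style result row: {table_key: {field: value}}, as in the slide 14 arrays.
Row = Dict[Any, Dict[str, Any]]


def regroup_vtable_fields(row: Row, field_tables: Dict[str, Any]) -> Row:
    """Re-home fields that were filed under a ColumnStore virtual table key.

    Hypothetical helper, not the CakePHP fix itself. `field_tables` maps a
    field name to the key the ORM should have used (for example 0 for a
    computed column, "Company" for a model field). Fields under ordinary
    table keys, and vtable fields missing from the map, stay where they are.
    """
    fixed: Row = {}
    for table_key, fields in row.items():
        is_vtable = isinstance(table_key, str) and table_key.startswith("$vtable_")
        for field, value in fields.items():
            target = field_tables.get(field, table_key) if is_vtable else table_key
            fixed.setdefault(target, {})[field] = value
    return fixed


# Shape of the second array on slide 14, plus a hypothetical field map.
columnstore_row = {"$vtable_723": {"max_date": "2013-05-01", "lft": 29}}
print(regroup_vtable_fields(columnstore_row, {"max_date": 0, "lft": "Company"}))
# {0: {'max_date': '2013-05-01'}, 'Company': {'lft': 29}}
```

The map-based approach keeps the fix outside the database driver. The trade-off is that the caller must know, per query, where each field should land.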
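The GROUP BY rule from slide 15 can be exercised with Python's standard-library `sqlite3` module as a stand-in database. Like the old MySQL 5.1-based DB, SQLite accepts a bare `uuid` column without complaint. This sketch therefore only demonstrates the portable form of the query. The `spend` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory stand-in; the spend table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend (uuid TEXT, vendor_name TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO spend VALUES (?, ?, ?)",
    [("a1", "Acme", 10.0), ("a2", "Acme", 5.0), ("b1", "Globex", 7.5)],
)

# Portable form from slide 15: the only non-aggregated column (vendor_name)
# is in GROUP BY, and uuid is collapsed with MIN() instead of left bare.
rows = conn.execute(
    "SELECT MIN(uuid), vendor_name, SUM(amount) "
    "FROM spend GROUP BY vendor_name ORDER BY vendor_name"
).fetchall()
print(rows)  # [('a1', 'Acme', 15.0), ('b1', 'Globex', 7.5)]
conn.close()
```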

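Slide 16's case-sensitivity difference can be reproduced in miniature, again using `sqlite3` as a stand-in because its default BINARY collation groups case-sensitively. Grouping on `LOWER(name)` is one portable way to restore case-insensitive grouping. The talk does not say SpendHQ fixed it this way, so treat that as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO test_table VALUES (?, ?)", [(1, "allen"), (2, "Allen")])

# Case-sensitive grouping: 'allen' and 'Allen' land in separate groups.
case_sensitive = conn.execute(
    "SELECT COUNT(id), name FROM test_table GROUP BY name ORDER BY name"
).fetchall()

# Case-insensitive grouping by normalizing the grouping key.
case_insensitive = conn.execute(
    "SELECT COUNT(id), LOWER(name) FROM test_table GROUP BY LOWER(name)"
).fetchall()

print(case_sensitive)    # [(1, 'Allen'), (1, 'allen')]
print(case_insensitive)  # [(2, 'allen')]
conn.close()
```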
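Note 23 credits splitting the CSVs for InnoDB tables with much of the migration speedup. A minimal Python sketch of the splitting step follows, assuming a headerless CSV export. The chunk size and `chunk_NNNN.csv` naming are hypothetical. Loading each chunk in its own `LOAD DATA INFILE` session is one plausible way to use the output, not a detail confirmed by the talk.

```python
import csv
import os
import tempfile
from typing import List


def split_csv(src_path: str, out_dir: str, rows_per_chunk: int) -> List[str]:
    """Split a headerless CSV into chunk files of at most rows_per_chunk rows.

    Returns the chunk paths in order. The chunk_0000.csv naming is arbitrary.
    """
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    os.makedirs(out_dir, exist_ok=True)
    paths: List[str] = []
    out = None
    writer = None
    try:
        with open(src_path, newline="") as src:
            for i, row in enumerate(csv.reader(src)):
                if i % rows_per_chunk == 0:
                    # Close the previous chunk and start the next one.
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, f"chunk_{len(paths):04d}.csv")
                    out = open(path, "w", newline="")
                    # "\n" matches LOAD DATA's default LINES TERMINATED BY.
                    writer = csv.writer(out, lineterminator="\n")
                    paths.append(path)
                writer.writerow(row)
    finally:
        if out is not None:
            out.close()
    return paths


# Usage: five rows split into chunks of two gives three files.
with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "vendors.csv")
    with open(src_file, "w", newline="") as fh:
        csv.writer(fh, lineterminator="\n").writerows([[n, f"vendor {n}"] for n in range(5)])
    chunks = split_csv(src_file, os.path.join(tmp, "chunks"), rows_per_chunk=2)
    print([os.path.basename(p) for p in chunks])
    # ['chunk_0000.csv', 'chunk_0001.csv', 'chunk_0002.csv']
```

Streaming row by row keeps memory flat regardless of export size. That matters at the scale of the 181 million records mentioned on slide 23.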
Editor's Notes

  1. Thanks for coming! Hi, I’m Allen Herrera. I’m an engineer with SpendHQ, most recently in charge of the migration from our prior database to MariaDB Columnstore in the latter half of 2018. I’m excited and nervous to be speaking here and sharing the results of our journey, as this is a first for me. Here’s how we’ll do this: I’ll start with a business-level summary and justification, then walk through a timeline of our process, challenges and results of switching to MariaDB Columnstore.
  2. So let’s start with some high-level background on SpendHQ! We are in the business of cleaning up client data and helping clients identify savings opportunities in their data. We do this through our sister consulting company, ISG, and our data analytics/visualization web application. It all starts with the client sending us raw data in whatever format they have, including Excel files, CSV files and more. We then consolidate this data and normalize it into a single schema; part of this process is normalizing vendor/company names. Next, the data is categorized against a custom taxonomy defined by the client. Our internal experts then review the results with the client in case further data processing is necessary. All this results in clean data being uploaded into our production web application, where clients browse their data and drive conversations around potential savings opportunities. This final part, step 7, is where we’ve migrated from our old columnar database, Infobright, to MariaDB Columnstore.
  3. So why change? At SpendHQ, we’ve been going through some massive growth that’s exposed scalability flaws in our legacy architecture. One of those was our database. Over the last two-plus years we’ve more than doubled in size as a company, but the data we receive is 10 to 20 times greater than before. Naturally, as the data grew, we realized we needed to address query performance and modernization.
  4. So why, specifically, was our old database flawed? Simple: our prior DB was old. It was based on MySQL 5.1 (similar to InfiniDB, actually, which Columnstore was created out of, but our old database stopped receiving updates). This older version translates to slower performance compared to modern databases. Infobright’s columnar DB was single threaded, which didn’t help performance. We couldn’t cluster it. We didn’t have access to InnoDB tables for transactions, so we were left with MyISAM. Furthermore, we couldn’t scale vertically anymore to marginally improve performance like in years past. All that said, when defining what to move to, these were our top priorities: ease of transition, scalability, lower cost and support.
  5. With the stage set, let’s go back to December 2017, when we began considering other databases. We didn’t plan on three phases going in, but looking back, that’s how I see it: analyze, prepare and refine.
  6. Step one of analyzing was to identify options.
  7. To do this, we engaged Pythian as consultants to identify and recommend a database that fit our needs. Taking into consideration our wants from a couple of slides ago (ease of transition, scalability, cost and support) and our business model, they identified 7 column-oriented databases and recommended one: MariaDB Columnstore. By the way, thank you John Shults for your work on this.
  8. MariaDB Columnstore met all our need-to-haves. It’s ANSI SQL, apart from some special Columnstore commands and intricacies. It’s open source, which helps keep costs down and means a big community for support. There’s enterprise support for those who want it, which we at SpendHQ definitely take advantage of. There are professional services to ramp up team education and to be a partner on any project. Plus, MariaDB Columnstore is scalable and performant.
  9. Next was to quantify what we aimed to accomplish by switching to MariaDB, so we could show the business folks that this was a good decision and so we could measure our success against clear criteria.
  10. Our data team, Robert Little and Dan Mackey, identified roughly 25 problematic queries that we wanted to see improved. In their final report, Pythian estimated we could achieve a 71% reduction in query time by simply switching databases, without significantly refactoring the queries or the schema. Management was blown away: a 71% reduction in reads means a 19-second query takes only about 5.5 seconds. Furthermore, if we were to refactor the schema to de-normalize the tables, we could achieve a 95% reduction. (Blue is our old DB, red is MariaDB Columnstore, and green is de-normalized tables in Columnstore.) So these became our goals.
  11. It took us until March/April to begin actual preparation work for the migration. We hired professional services to come out and set up our first Columnstore instance so we could connect to it and do some minor performance tuning of the database.
  12. Our consultant from MariaDB was Todd Stoffel. Thank you, Todd, a great and knowledgeable guy. He has a Git repo that we used to easily install Columnstore using Ansible. It auto-tunes the configs to the hardware better than the Columnstore defaults, so it was a great place to start. However, the challenges we faced were NOT caused by MariaDB, but by our legacy app.
  13. Fail, fail, fail when making queries to the database. Let me quickly summarize the 3 challenges we had to overcome to keep moving forward.
  14. Our framework’s ORM picked up Columnstore’s use of vtables and modified the returned data objects to include them. The root cause was a specific PHP function, mysqli_fetch_field_direct(), returning the vtable name within our framework’s ORM; we had to wrap it in custom logic that ties values back to the tables they were queried from.
  15. The next challenge was improper SQL. Somehow our prior database let queries like the one above (selecting a non-aggregated uuid column that isn’t in the GROUP BY) execute.
  16. The third challenge was minor: it revolved around GROUP BY case sensitivity differing from our old database.
  17. Once we got past our application-level challenges, we were ready to move from a standalone Columnstore instance to a cluster and actually benchmark performance. We brought Todd back in a second time to look at the cluster we had set up on our own and help drive optimizations.
  18. The first thing we did was test our hard drives with fio. We found that our storage was HDD-based, with low IOPS, so Todd recommended faster drives. After switching to SSDs, we saw better concurrency from the cluster as well as better individual query performance. Next, we looked at the configuration files. The two key changes that yielded the best results were increasing both the InnoDB buffer pool size in my.cnf and PmMaxMemorySmallSide in columnstore.xml.
  19. After other minor adjustments, we went back to the original 25-ish queries and re-benchmarked them all. The results were great!
  20. Next, we moved on to making cluster creation faster with some automation.
  21. [Start playing video, then speak] Todd built the original version, which we then modified for subsequent deployments. Here is a short video of a 1 UM, 3 PM setup (1 User Module, 3 Performance Modules).
  22. Next, we worked on the process to move the actual data from our old database to MariaDB Columnstore. At a high level, we chose to export all our data as CSVs onto a hard drive, move it to MariaDB and import it with cpimport.
  23. To optimize this, we adjusted key SQL variables and split the CSVs for InnoDB tables. This took our migration time down from 35 hours to 8 hours, since the InnoDB tables were the slow ones to insert. [Talk about slide numbers]
  24. With the migration process in place, we were ready to test a production deployment.
  25. And it failed. The first reason had to do with our hard drives: we didn’t have enough storage, which was shocking to us. The other was a silly mistake: having iSCSI volumes in fstab without first authenticating with the target. The solutions were simple: more storage, and some logic in rc.local.
  26. Furthermore, once we felt ready to deploy again, we noticed performance issues in writes.
  27. We then benchmarked 6 core data-changing processes and found their write performance wasn’t good. So I set out to refactor some of the critical write-performance areas, including the ETL.
  28. We used a new multi-process architecture to take advantage of InnoDB row-level locking. This made our ETL 7 times faster on the same hardware as Infobright, and also opened the door to more vertical scaling, resulting in 25 times faster ETL uploads.
  29. With that and 2 more refactors, we made a huge leap in performance.
  30. Now we could deploy.
  31. The main issues we had when releasing were concurrency and our hard drives not being fast enough. When we asked our storage provider, they recommended multipath to open additional sessions from the server to the storage, adding bandwidth.
  32. So what do we do next?
  33. Next up is de-normalizing our schema, but instead of trying to refactor our existing app, we’ll be starting a new one. We also want to pilot a GPU database to see the results of the Brytlyt partnership with MariaDB. Overall, we are happy with MariaDB Columnstore. Performance is great, and the only issues we’ve come across are really our own.
  34. So, to summarize: where we stand is great compared to where we were. Faster writes, significantly faster reads, significantly faster ETL. Storage needs took a hit, and our application needed quite a bit of work given its age. Concurrency isn’t any better for us at the moment, but it can be solved if we use MaxScale with additional UMs. Before concluding, I want to give a special shout-out for all the support we’ve received from MariaDB, specifically Todd Stoffel, Geoff Montee, David Hill and more.
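The figures quoted in slide 10 (71% and 95% reductions) and slide 23 (181 million records in 64, 46 and 26 minutes) are easy to sanity-check. This Python sketch only does arithmetic on the talk's own numbers, and the helper names are illustrative.

```python
def remaining_time(baseline_s: float, reduction: float) -> float:
    """Query time left after a fractional reduction (0.71 means 71% less time)."""
    return baseline_s * (1.0 - reduction)


def rows_per_second(rows: int, minutes: float) -> float:
    """Average load throughput for a bulk transfer."""
    return rows / (minutes * 60.0)


# Slide 10 targets applied to a 19-second query (note 10).
for label, cut in [("switch databases", 0.71), ("de-normalize", 0.95)]:
    print(f"{label}: {remaining_time(19, cut):.2f} s")

# Slide 23: 181 million records moved with three different methods.
for method, minutes in [("INSERT ... SELECT", 64), ("load from outfile", 46), ("cpimport", 26)]:
    print(f"{method}: {rows_per_second(181_000_000, minutes):,.0f} rows/s")
```

For a 19-second query, the two targets work out to about 5.5 s and 0.95 s. On slide 23, cpimport averages roughly 116,000 rows per second, versus about 47,000 for INSERT ... SELECT.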