Galaxy Semiconductor Intelligence
Case Study: Big Data with MariaDB 10
Bernard Garros, Sandrine Chirokoff, Stéphane Varoqui
Galaxy confidential
Galaxy Big Data scalability Menu
• About Galaxy Semiconductor (BG)
• The big data challenge (BG)
• Scalable, fail-safe architecture for big data (BG)
• MariaDB challenges: compression (SV)
• MariaDB challenges: sharding (SC)
• Results (BG)
• Next Steps (BG)
• Q&A
About Galaxy Semiconductor
• A software company dedicated to semiconductor:
 Quality improvement
 Yield enhancement
 NPI acceleration
 Test cell OEE optimization
• Founded in 1988
• Track record of building products that offer the best
user experience + premier customer support
• Products used by 3500+ users and all major ATE
companies
Worldwide Presence
• Galaxy Teo, Ireland: HQ, G&A
• Galaxy East: Sales, Marketing, Apps
• Galaxy France: R&D, QA, & Apps
• Partner, Taiwan: Sales & Apps
• Partner, Israel: Sales
• Partner, Singapore: Sales & Apps
• Galaxy West: Sales, Apps
• Partner, Japan: Sales & Apps
• Partner, China: Sales & Apps
Test Data production / consumption
[Diagram labels: ATE; Test Data Files; ETL, Data Cleansing; Yield-Man; Data Cube(s); ETL; Galaxy TDR; Examinator-Pro; Browser-based dashboards; Custom Agents; Data Mining; OEE Alarms; PAT; Automated Agents; SYA]
Growing volumes
[Diagram, three stages of data volume:
MB: STDF files; GEX
GB/TB: STDF files; YM; TDR; GEX, Dashboard, Monitoring
TB/PB: STDF files; YM; TDR; GEX, Dashboard, Monitoring]
Big Data, Big Problem
• More data can produce more knowledge and higher profits
• Modern systems make it easy to generate more data
• The problem is how to create a hardware and software platform
that can make full and effective use of all this data as it
continues to grow
• Galaxy has the expertise to guide you to a solution for this big
data problem that includes:
– Real-time data streams
– High data insertion rates
– Database scalable to extreme data volumes
– Automatic compensation for server failures
– Use of inexpensive, commodity servers
– Load balancing
First-level solutions
• Partitioning
– SUMMARY data
• High level reports
• 10% of the volume
• Must be persistent for a long period (years)
– RAW data
• Detailed data inspection
• 90% of the volume
• Must be persistent for a short period (months)
• PURGE
– Partitioning per date (e.g. daily) on RAW data
tables
– Instant purge by dropping partitions
• Parallel insertion (diagram: three Yield-Man instances)
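The purge step listed above (daily partitions on RAW tables, purged by dropping partitions) can be sketched as follows. This is a minimal illustration, not Galaxy's implementation: the table name `raw_results`, the `pYYYYMMDD` partition naming and the 90-day retention are assumptions made for the example. It only builds the `ALTER TABLE ... DROP PARTITION` statement; running it against a server is left out.

```python
from datetime import date, timedelta

def partitions_to_drop(partition_names, today, retention_days):
    """Return daily partitions (assumed to be named pYYYYMMDD) older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in partition_names:
        day = date(int(name[1:5]), int(name[5:7]), int(name[7:9]))
        if day < cutoff:
            expired.append(name)
    return sorted(expired)

def drop_partition_sql(table, names):
    """Build one ALTER TABLE statement that drops all expired partitions at once."""
    if not names:
        return None
    return "ALTER TABLE {} DROP PARTITION {};".format(table, ", ".join(names))

# Hypothetical partition list and retention period.
existing = ["p20140301", "p20140302", "p20140615", "p20140616"]
expired = partitions_to_drop(existing, today=date(2014, 6, 17), retention_days=90)
print(drop_partition_sql("raw_results", expired))
```

Dropping a partition is a metadata operation on that partition's files, which is why the purge is close to instant compared with a row-by-row DELETE.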
New customer use case
• Solution needs to be easy to set up
• Solution needs to handle large (~50TB+) data volumes
• Needs to handle an insertion rate of approximately 2 MB/sec
Solutions
• Solution 1: Single scale-up node (lots of RAM, lots of CPU,
expensive high-speed SSD storage, single point of failure, not
scalable, heavy for replication)
• Solution 2: Cluster of commodity nodes (see later)
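To put the two requirements side by side, the arithmetic below converts the 2 MB/sec target into a daily volume and a time to fill 50 TB. It is a rough sketch that assumes decimal units (1 GB = 1000 MB), a sustained rate, and no compression or purge; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def daily_volume_gb(rate_mb_per_s):
    """Data inserted per day, in GB, at a sustained rate given in MB/s."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1000.0

def days_to_fill(capacity_tb, rate_mb_per_s):
    """Days needed to reach a capacity in TB at a sustained rate, without purge."""
    return capacity_tb * 1000.0 / daily_volume_gb(rate_mb_per_s)

print(daily_volume_gb(2))           # 172.8 GB per day
print(round(days_to_fill(50, 2)))   # about 289 days
```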
Cluster of Nodes
[Architecture diagram labels:
Test Floor Data Sources: STDF Data Files; Other Test Data Files; Event Data Stream; ATE config & maintenance events; Real-time Tester Status; Test Data Stream
Interfaces: Real-Time Interface; RESTful API
Other customer applications and systems: MES; Test Hardware Management System
Galaxy Cluster of Commodity Servers: Head Node; Dashboard Node; 2 Compute Nodes; 4 DB Nodes; Yield-Man; PAT-Man]
Easy Scalability
[Same architecture diagram as "Cluster of Nodes", with two DB Nodes and one Compute Node added to the Galaxy Cluster of Commodity Servers.]
MariaDB challenges
❏ From a single box to elastic architecture
❏ Reducing the TCO
❏ OEM solution
❏ Minimizing the impact on existing code
❏ Reach 200B records
A classic case
[Diagram: five SENSOR sources, one STORE, five QUERY clients]
❏ Millions of records/s sorted by timeline
❏ Data is queried in a different order
❏ Indexes don’t fit into main memory
❏ Disk IOPS become the bottleneck
B-tree gotcha
2 ms disk or network latency, 100 head
seeks/s, 2 options:
❏ Increase concurrency
❏ Increase packet size
Both were increased long ago using
innodb_write_io_threads, innodb_io_capacity, and bulk load
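The two options above follow from simple arithmetic: with a fixed round-trip latency, throughput only grows with the number of requests in flight or with the size of each request. The sketch below is an idealized model under stated assumptions: it ignores the device's own seek limit and any queueing, and the 16 KB and 64 KB request sizes are illustrative values, not figures from the slides.

```python
def throughput_mb_per_s(latency_ms, concurrency, request_kb):
    """Idealized throughput: each in-flight request completes once per latency period."""
    requests_per_s = concurrency * 1000.0 / latency_ms
    return requests_per_s * request_kb / 1000.0

# One synchronous stream of 16 KB requests at 2 ms latency.
print(throughput_mb_per_s(2, 1, 16))
# Option 1: increase concurrency (8 requests in flight).
print(throughput_mb_per_s(2, 8, 16))
# Option 2: increase packet size (64 KB requests).
print(throughput_mb_per_s(2, 1, 64))
```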
B-tree gotcha
With a billion records, a single-partition B-tree no longer fits in
main memory, so a single write produces read IOPS to traverse the
tree:
❏ Use partitioning
❏ Insert in primary key order
❏ Big redo log and smaller amount of dirty pages
❏ Covering index
The next step is to radically change the IO pattern
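A rough way to see why partitioning helps is to count tree levels. The sketch below assumes a fanout of 100 keys per page and that the top three levels stay cached in memory; both numbers are hypothetical and chosen only to make the effect visible.

```python
def btree_levels(num_records, fanout):
    """Smallest number of levels such that fanout**levels >= num_records."""
    levels = 1
    capacity = fanout
    while capacity < num_records:
        capacity *= fanout
        levels += 1
    return levels

def uncached_reads_per_insert(num_records, fanout, cached_levels):
    """Levels that must be read from disk before a single write can land."""
    return max(0, btree_levels(num_records, fanout) - cached_levels)

# One billion records in a single tree versus the same data in 1000 partitions.
print(uncached_reads_per_insert(10**9, 100, cached_levels=3))  # 2
print(uncached_reads_per_insert(10**6, 100, cached_levels=3))  # 0
```

Under these assumptions each partition's tree is shallow enough to stay in memory, so inserts into the current partition no longer trigger read IOPS.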
Data Structuring modeling
Storage families compared, left to right:
• INDEXES MAINTENANCE, TTREE: NDB
• INDEXES MAINTENANCE, BTREE: InnoDB, MyISAM, ZFS
• INDEXES MAINTENANCE, FRACTAL TREE: TokuDB, LevelDB
• NO INDEXES (STORE): Cassandra, HBase
• COLUMN STORE: InfiniDB, Vertica
MEMORY
WRITE: +++++ ++++ +++ +++++ +++++
READ 99%: ++ + ++++ ++++++
READ 1%: +++++ ++++ +++ ------- ------
DISK
WRITE: BTREE - +++ ++++ +++++
READ 99%: - + ++++ +++++
READ 1%: + +++ ----- -
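Data Structure for big data
• TTREE (NDB): average compression rate NA; IO size NA; READ disk access model NA; WRITE disk access model NA
• BTREE (InnoDB, MyISAM, ZFS): average compression rate 1/2; IO size 4K to 64K; READ disk access model O(Log(N)/Log(B)); WRITE disk access model O(Log(N)/Log(B))
• FRACTAL TREE (TokuDB, LevelDB): average compression rate 1/6; IO size variable, based on compression & depth; READ disk access model ~O(Log(N)/Log(B)); WRITE disk access model ~O(Log(N)/B)
• NO INDEXES (Cassandra, HBase): average compression rate 1/3; IO size 64M; READ disk access model O(N/B); WRITE disk access model O(1/B)
• COLUMN STORE (InfiniDB): average compression rate 1/12; IO size 8M to 64M; READ disk access model O(N/B - B Elimination); WRITE disk access model O(1/B)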
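The write-side access models for B-trees and fractal trees can be compared numerically. The sketch below plugs illustrative values into the two formulas from the comparison, Log(N)/Log(B) and Log(N)/B. It assumes natural logarithms, drops all constant factors, and uses a hypothetical block size of B = 1000 entries, so only the order of magnitude is meaningful.

```python
import math

def btree_write_cost(n, b):
    """Disk accesses per write in the B-tree model: Log(N) / Log(B)."""
    return math.log(n) / math.log(b)

def fractal_write_cost(n, b):
    """Amortized disk accesses per write in the fractal-tree model: Log(N) / B."""
    return math.log(n) / b

n, b = 10**9, 1000
print(btree_write_cost(n, b))
print(fractal_write_cost(n, b))
print(btree_write_cost(n, b) / fractal_write_cost(n, b))
```

In this model the gap between the two grows with the block size, which matches the motivation for changing the IO pattern rather than tuning the B-tree further.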
Top 10 Alexa’s petabyte store is InnoDB
Top Alexa: InnoDB
Galaxy: TokuDB
❏ DBAs to set up insert buffer + dirty pages
❏ Admins to monitor IO
❏ Admins to increase # nodes
❏ Use flash & hybrid storage
❏ DBAs to partition and shard
❏ DBAs to organize maintenance
❏ DBAs to set covering and clustering
indexes
❏ Zipf read distribution
❏ Concurrent by design
❏ Remove fragmentation
❏ Constant insert rate regardless of
memory/disk ratio
❏ High compression rate
❏ No control over client architecture
❏ All indexes can be clustered
Key point for 200 Billion records
• 1/5 compression on 6 billion rows
• 2 times slower insert time vs. InnoDB
• 2.5 times faster insert vs. InnoDB compressed
Galaxy takeaway for 200 Billion records
❏ Disk IOPS on InnoDB was the bottleneck,
despite partitioning
❏ Moving to TokuDB moved the bottleneck to
CPU for compression
❏ So how to increase performance further?
Sharding!!
Sharding to fix CPU Bottleneck
• TTREE (NDB): CLUSTERING Native; # OF NODES +++++
• BTREE (InnoDB, MyISAM, ZFS): CLUSTERING Manual, Spider, Vitess, Fabric, Shard-Query; # OF NODES +++
• FRACTAL TREE (TokuDB, LevelDB): CLUSTERING Manual, Spider, Vitess, Fabric, Shard-Query; # OF NODES ++
• NO INDEXES (Cassandra, HBase): CLUSTERING Native; # OF NODES +++++
• COLUMN STORE (InfiniDB, Vertica): CLUSTERING Native; # OF NODES +
Spider… it’s a MED (Management of External Data) storage engine
NO DATA IS STORED IN SPIDER NODES
Spider - A Sharding + HA solution
• Preserve data consistency between shards
• Allow shard replicas
• Enable joining between shards
• ha_spider.cc SEMI TRX
Implemented architecture
[Architecture diagram labels:
HEAD NODE, COMPUTE NODE #1, COMPUTE NODE #2, …: SPIDER; NO DATA; MONITORING
DATA NODE #1, DATA NODE #2, DATA NODE #3, DATA NODE #4: TOKUDB; COMPRESSED DATA; PARTITIONS
SUMMARY: universal tables
RAW: sharded tables, each data node marked "1/4 OR 1/2"
Annotations: Delay current insertion; Replay insertion with new shard key]
Re-sharding without data copy
[Diagram, BEFORE and AFTER, three levels:
Spider table L2 (CURRENT): shard by date range
Spider table L1.1: shard by node modulo over Node 01, Node 02
Spider table L1.2: shard by node modulo over Node 01, Node 02, Node 03, Node 04
Toku tables: partition by date (e.g. daily); two Toku tables with partitions P#Week 01, P#Week 02; four Toku tables with partitions P#Week 03, P#Week 04]
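The three-level layout in the re-sharding slide (date range, then node modulo, then date partitions) can be sketched as a routing function. This is an illustration of the idea only, not Spider's API: the date ranges, node names and the integer shard key are hypothetical. The point it shows is that rows in an older date range keep resolving to the original two nodes, so adding nodes for new ranges requires no copy of existing data.

```python
from datetime import date

# Hypothetical layout: each date range maps to a level-1 table that
# shards by modulo over its own list of nodes.
LEVEL1_TABLES = [
    {"name": "L1.1", "start": date(2014, 1, 1), "end": date(2014, 1, 14),
     "nodes": ["node01", "node02"]},
    {"name": "L1.2", "start": date(2014, 1, 15), "end": date(2014, 1, 28),
     "nodes": ["node01", "node02", "node03", "node04"]},
]

def route(shard_key, day):
    """Pick the level-1 table by date range, then the node by key modulo."""
    for table in LEVEL1_TABLES:
        if table["start"] <= day <= table["end"]:
            nodes = table["nodes"]
            return table["name"], nodes[shard_key % len(nodes)]
    raise ValueError("no level-1 table covers {}".format(day))

print(route(7, date(2014, 1, 3)))    # old range: still one of two nodes
print(route(7, date(2014, 1, 20)))   # new range: spread over four nodes
```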
Proven Performance
Galaxy has deployed its big data solution at a major test subcontractor in Asia
with the following performance:
• Peak data insertion rate : 2 TB of STDF data per day
• Data compression of raw data : 60-80 %
• DB retention of raw data : 3 months
• DB retention of summary data : 1 year
• Archiving of test data : Automatic
• Target was 2 MB/sec; we get about 10 MB/sec
• Since 17th June, steady production :
– Constant insertion speed
– 1400 files/day, 120 GB/day
– ft_ptest_results: 92 billion rows / 1.5 TB across 4 nodes
– ft_mptest_results: 14 billion rows / 266 GB across 4 nodes
– wt_ptest_results: 9 billion rows / 153 GB across 4 nodes
– 50TB available volume, total DB size is 8TB across all 4 nodes
• 7 servers (22k$) + SAN ($$$) OR DAS (15k$)
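The reported figures can be cross-checked with a little arithmetic. The sketch below derives stored bytes per row for the three result tables and the average rate implied by 120 GB/day. It assumes decimal units (1 TB = 10^12 bytes) and takes the row counts and sizes exactly as listed above; the derived values are not figures from the presentation.

```python
# (rows, stored bytes) as reported for each table across the 4 nodes.
TABLES = {
    "ft_ptest_results": (92e9, 1.5e12),
    "ft_mptest_results": (14e9, 266e9),
    "wt_ptest_results": (9e9, 153e9),
}

def bytes_per_row(rows, size_bytes):
    """Average stored size of one row."""
    return size_bytes / rows

def average_rate_mb_per_s(gb_per_day):
    """Average insertion rate implied by a daily volume."""
    return gb_per_day * 1000.0 / 86400

for name, (rows, size_bytes) in TABLES.items():
    print(name, round(bytes_per_row(rows, size_bytes), 1), "bytes/row")

print(round(average_rate_mb_per_s(120), 2), "MB/s average at 120 GB/day")
```

Under these assumptions each row occupies roughly 16 to 19 bytes on disk.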
File count inserted per day
• Integration issues up to May 7
• Raw & Summary-only data insertion up to May 18
• Raw & Summary data insertion, Problem solving, fine tuning up to June 16
• Steady production insertion of Raw & Summary data since June 17
File count and data size per day
• Up to 2TB inserted per day
• Up to 20k files per day
Raw data insertion duration over file size
(each colored series is 1 day)
Consistent insertion performance
What’s next?
• Make Yield-Man more SPIDER-aware:
– Integrated scale-out (add compute/data nodes)
– Native database schema upgrade on compute/data nodes
• Add more monitoring capability to monitor SPIDER events (node
failure, table desynchronization across nodes…)
• Automate recovery after failures/issues. Today:
– Manual script to detect de-synchronization
– pt-table-sync from Percona Toolkit to manually re-sync
– Manual script to reintroduce table nodes in the cluster
IN SPIDER 2014 ROADMAP
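The de-synchronization check mentioned above could look something like the sketch below. It is not the script used by Galaxy: it assumes that a checksum per table and per node has already been collected (for example with MariaDB's `CHECKSUM TABLE` statement), and the table and node names are hypothetical. It flags the nodes whose copy differs from the most common checksum.

```python
from collections import Counter

def find_desynchronized(checksums_by_table):
    """Map each table to the nodes whose checksum differs from the most common one.

    checksums_by_table: {table_name: {node_name: checksum}}.
    With a tie, the first-seen checksum value is treated as the reference.
    """
    report = {}
    for table, per_node in checksums_by_table.items():
        reference, _ = Counter(per_node.values()).most_common(1)[0]
        stale = sorted(node for node, value in per_node.items() if value != reference)
        if stale:
            report[table] = stale
    return report

# Hypothetical checksums collected from three nodes.
collected = {
    "summary_lot": {"node01": "a1", "node02": "a1", "node03": "ff"},
    "summary_wafer": {"node01": "c7", "node02": "c7", "node03": "c7"},
}
print(find_desynchronized(collected))
```

The nodes reported here would then be candidates for the manual re-sync and reintroduction steps.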
Thank you!!
Q&A?
Skalierbarkeit mit MariaDB und MaxScale - MariaDB Roadshow Summer 2014 Hambur...Skalierbarkeit mit MariaDB und MaxScale - MariaDB Roadshow Summer 2014 Hambur...
Skalierbarkeit mit MariaDB und MaxScale - MariaDB Roadshow Summer 2014 Hambur...
 
Hochverfügbarkeit mit MariaDB Enterprise - MariaDB Roadshow Summer 2014 Hambu...
Hochverfügbarkeit mit MariaDB Enterprise - MariaDB Roadshow Summer 2014 Hambu...Hochverfügbarkeit mit MariaDB Enterprise - MariaDB Roadshow Summer 2014 Hambu...
Hochverfügbarkeit mit MariaDB Enterprise - MariaDB Roadshow Summer 2014 Hambu...
 
Automatisierung & Verwaltung von Datenbank - Clustern mit Severalnines - Mari...
Automatisierung & Verwaltung von Datenbank - Clustern mit Severalnines - Mari...Automatisierung & Verwaltung von Datenbank - Clustern mit Severalnines - Mari...
Automatisierung & Verwaltung von Datenbank - Clustern mit Severalnines - Mari...
 
The New MariaDB Offering: MariaDB 10, MaxScale and More
The New MariaDB Offering: MariaDB 10, MaxScale and MoreThe New MariaDB Offering: MariaDB 10, MaxScale and More
The New MariaDB Offering: MariaDB 10, MaxScale and More
 
MaxScale - The Pluggibale Router MariaDB Roadshow 2014 Paris
MaxScale - The Pluggibale Router MariaDB Roadshow 2014 ParisMaxScale - The Pluggibale Router MariaDB Roadshow 2014 Paris
MaxScale - The Pluggibale Router MariaDB Roadshow 2014 Paris
 
MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014 F...
MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014 F...MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014 F...
MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014 F...
 
High Availability with MariaDB Enterprise
High Availability with MariaDB EnterpriseHigh Availability with MariaDB Enterprise
High Availability with MariaDB Enterprise
 
MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014
MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014
MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014
 
Automatisation et Gestion de Cluster de Bases de Données MariaDB Roadshow
Automatisation et Gestion de Cluster de Bases de Données MariaDB RoadshowAutomatisation et Gestion de Cluster de Bases de Données MariaDB Roadshow
Automatisation et Gestion de Cluster de Bases de Données MariaDB Roadshow
 
Automation and Management of Database Clusters MariaDB Roadshow 2014
Automation and Management of Database Clusters MariaDB Roadshow 2014Automation and Management of Database Clusters MariaDB Roadshow 2014
Automation and Management of Database Clusters MariaDB Roadshow 2014
 
Automation and Management of Database Clusters
Automation and Management of Database ClustersAutomation and Management of Database Clusters
Automation and Management of Database Clusters
 
The New MariaDB Offering - MariaDB 10, MaxScale and more
The New MariaDB Offering - MariaDB 10, MaxScale and moreThe New MariaDB Offering - MariaDB 10, MaxScale and more
The New MariaDB Offering - MariaDB 10, MaxScale and more
 
MaxScale - The Pluggable Router
MaxScale - The Pluggable RouterMaxScale - The Pluggable Router
MaxScale - The Pluggable Router
 
High Availability with MariaDB Enterprise
High Availability with MariaDB EnterpriseHigh Availability with MariaDB Enterprise
High Availability with MariaDB Enterprise
 
MariaDB 10 and Beyond
MariaDB 10 and BeyondMariaDB 10 and Beyond
MariaDB 10 and Beyond
 
MaxScale - the pluggable router
MaxScale - the pluggable routerMaxScale - the pluggable router
MaxScale - the pluggable router
 
Galera cluster - SkySQL Paris Meetup 17.12.2013
Galera cluster - SkySQL Paris Meetup 17.12.2013Galera cluster - SkySQL Paris Meetup 17.12.2013
Galera cluster - SkySQL Paris Meetup 17.12.2013
 

Recently uploaded

Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
SQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure MalaysiaSQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure Malaysia
GohKiangHock
 
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative AnalysisOdoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Envertis Software Solutions
 
UI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design SystemUI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design System
Peter Muessig
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
Unveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdfUnveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdf
brainerhub1
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
Deuglo Infosystem Pvt Ltd
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
mz5nrf0n
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
Peter Muessig
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
Patrick Weigel
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
Sven Peters
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
XfilesPro
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
Philip Schwarz
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 

Recently uploaded (20)

Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
SQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure MalaysiaSQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure Malaysia
 
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative AnalysisOdoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
 
UI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design SystemUI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design System
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Unveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdfUnveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdf
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 

Galaxy Big Data with MariaDB

  • 1. Galaxy Semiconductor Intelligence. Case Study: Big Data with MariaDB 10. Bernard Garros, Sandrine Chirokoff, Stéphane Varoqui
  • 2. Galaxy Big Data scalability: Menu
      • About Galaxy Semiconductor (BG)
      • The big data challenge (BG)
      • Scalable, fail-safe architecture for big data (BG)
      • MariaDB challenges: compression (SV)
      • MariaDB challenges: sharding (SC)
      • Results (BG)
      • Next Steps (BG)
      • Q&A
  • 3. About Galaxy Semiconductor
      • A software company dedicated to semiconductor:
        – Quality improvement
        – Yield enhancement
        – NPI acceleration
        – Test cell OEE optimization
      • Founded in 1988
      • Track record of building products that offer the best user experience + premier customer support
      • Products used by 3500+ users and all major ATE companies
  • 4. Worldwide Presence (map)
      • Galaxy Teo, Ireland: HQ, G&A
      • Galaxy East: Sales, Marketing, Apps
      • Galaxy West: Sales, Apps
      • Galaxy France: R&D, QA, & Apps
      • Partners: Taiwan (Sales & Apps), Israel (Sales), Singapore (Sales & Apps), Japan (Sales & Apps), China (Sales & Apps)
  • 5. Test Data production / consumption (diagram)
      Labels: ATE; Test Data Files; ETL, Data Cleansing; Yield-Man; Data Cube(s); ETL; Galaxy TDR; Examinator-Pro; Browser-based dashboards; Custom Agents; Data Mining; OEE Alarms; PAT; Automated Agents; SYA
  • 6. Growing volumes (diagram)
      • MB: GEX; STDF files
      • GB/TB: GEX, Dashboard, Monitoring; TDR; YM; STDF files
      • TB/PB: GEX, Dashboard, Monitoring; TDR; YM; STDF files
  • 7. Big Data, Big Problem
      • More data can produce more knowledge and higher profits
      • Modern systems make it easy to generate more data
      • The problem is how to create a hardware and software platform that can make full and effective use of all this data as it continues to grow
      • Galaxy has the expertise to guide you to a solution for this big data problem that includes:
        – Real-time data streams
        – High data insertion rates
        – Scalable database to extreme data volumes
        – Automatic compensation for server failures
        – Use of inexpensive, commodity servers
        – Load balancing
  • 8. First-level solutions
      • Partitioning
        – SUMMARY data: high-level reports; 10% of the volume; must be persistent for a long period (years)
        – RAW data: detailed data inspection; 90% of the volume; must be persistent for a short period (months)
      • PURGE
        – Partitioning per date (e.g. daily) on RAW data tables
        – Instant purge by dropping partitions
      • Parallel insertion (diagram labels: Yield-Man ×3)
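The purge-by-partition idea on this slide can be sketched as follows. This is a minimal illustration, not Galaxy's implementation: the table name `raw_results`, the `pYYYYMMDD` partition naming scheme, and the retention value are assumptions made for the example. Only the `ALTER TABLE ... DROP PARTITION` statement itself is standard MariaDB syntax.

```python
from datetime import date, timedelta
from typing import List, Optional


def partition_name(day: date) -> str:
    """Hypothetical naming scheme: one partition per day, e.g. p20140617."""
    return f"p{day:%Y%m%d}"


def purge_statement(table: str, existing: List[date], today: date,
                    retention_days: int) -> Optional[str]:
    """Build one DROP PARTITION statement for all daily partitions
    older than the retention window, or None if nothing has expired."""
    cutoff = today - timedelta(days=retention_days)
    expired = sorted(d for d in existing if d < cutoff)
    if not expired:
        return None
    names = ", ".join(partition_name(d) for d in expired)
    return f"ALTER TABLE {table} DROP PARTITION {names}"


# Example: five daily partitions, keep the last two days.
days = [date(2014, 6, 1) + timedelta(days=i) for i in range(5)]
print(purge_statement("raw_results", days, date(2014, 6, 5), 2))
```

Dropping a partition removes its data files in one metadata operation, which is why the slide calls the purge "instant" compared with deleting rows one by one.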
  • 9. New customer use case
      • Solution needs to be easy to set up
      • Solution needs to handle large data volumes (~50 TB+)
      • Need to handle a high insertion rate of approximately 2 MB/sec
      Solutions
      • Solution 1: Single scale-up node (lots of RAM, lots of CPU, expensive high-speed SSD storage, single point of failure, not scalable, heavy for replication)
      • Solution 2: Cluster of commodity nodes (see later)
  • 10. Cluster of Nodes (diagram)
      Labels: Other customer applications and systems; Other Test Data Files; Event Data Stream; ATE config & maintenance events; Real-time Tester Status; Test Floor Data Sources; STDF Data Files; RESTful API (×2); Test Hardware Management System; MES; Galaxy Cluster of Commodity Servers (DB Node ×4, Compute Node ×2, Head Node, Dashboard Node); Yield-Man and PAT-Man (×2); Real-Time Interface; Test Data Stream
  • 11. Easy Scalability (diagram)
      Labels: same as the Cluster of Nodes diagram, with two additional DB Nodes and one additional Compute Node in the Galaxy Cluster of Commodity Servers
  • 12. MariaDB challenges
      ❏ From a single box to elastic architecture
      ❏ Reducing the TCO
      ❏ OEM solution
      ❏ Minimizing the impact on existing code
      ❏ Reach 200B records
  • 13. A classic case (diagram labels: SENSOR ×5, STORE, QUERY ×5)
      ❏ Millions of records/s sorted by timeline
      ❏ Data is queried in a different order
      ❏ Indexes don’t fit into main memory
      ❏ Disk IOps become the bottleneck
  • 14. B-tree gotcha
      With 2 ms disk or network latency and 100 head seeks/s, there are 2 options:
      ❏ Increase concurrency
      ❏ Increase packet size
      Both were increased a long time ago using innodb_write_io_threads, innodb_io_capacity, and bulk load
  • 15. B-tree gotcha
      With a billion records, a single-partition B-tree no longer stays in main memory, and a single write produces read IOps to traverse the tree:
      ❏ Use partitioning
      ❏ Insert in primary key order
      ❏ Big redo log and smaller amount of dirty pages
      ❏ Covering index
      The next step is to radically change the IO pattern
  • 16. Data Structuring modeling (table)
      Columns: TTREE | BTREE | FRACTAL TREE (these three under INDEXES MAINTENANCE) | NO INDEXES | COLUMN STORE
      STORE: NDB | InnoDB, MyISAM, ZFS | TokuDB, LevelDB | Cassandra, HBase | InfiniDB, Vertica
      MEMORY
        WRITE: +++++ ++++ +++ +++++ +++++
        READ 99%: ++ + ++++ ++++++
        READ 1%: +++++ ++++ +++ ------- ------
      DISK
        WRITE: BTREE - +++ ++++ +++++
        READ 99%: - + ++++ +++++
        READ 1%: + +++ ----- -
  • 17. Data Structure for big data (table)
      Columns: TTREE (NDB) | BTREE (InnoDB, MyISAM, ZFS) | FRACTAL TREE (TokuDB, LevelDB) | NO INDEXES (Cassandra, HBase) | COLUMN STORE (InfiniDB)
      Average Compression Rate: NA | 1/2 | 1/6 | 1/3 | 1/12
      IO Size: NA | 4K to 64K | Variable, based on compression & depth | 64M | 8M to 64M
      READ Disk Access Model: NA | O(Log(N)/Log(B)) | ~O(Log(N)/Log(B)) | O(N/B) | O(N/B - B Elimination)
      WRITE Disk Access Model: NA | O(Log(N)/Log(B)) | ~O(Log(N)/B) | O(1/B) | O(1/B)
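To get a feel for the write models in this table, the sketch below plugs numbers into the three asymptotic expressions. It is only an order-of-magnitude illustration: the block size `B = 1024` records is an assumed value, constant factors are ignored, and base-2 logarithms are an arbitrary choice. `N` is the 200 billion record target mentioned in the slides.

```python
import math


def btree_write_ios(n: float, b: float) -> float:
    """B-tree write cost per record: O(log N / log B)."""
    return math.log2(n) / math.log2(b)


def fractal_write_ios(n: float, b: float) -> float:
    """Fractal-tree write cost per record (amortized): ~O(log N / B)."""
    return math.log2(n) / b


def append_write_ios(b: float) -> float:
    """Append-only / column-store write cost per record: O(1 / B)."""
    return 1.0 / b


N = 200e9   # records (target from the slides)
B = 1024    # records per block (assumption for illustration)

for label, cost in [
    ("B-tree", btree_write_ios(N, B)),
    ("Fractal tree", fractal_write_ios(N, B)),
    ("Append-only", append_write_ios(B)),
]:
    print(f"{label:>12}: ~{cost:.4f} disk IOs per inserted record")
```

Under these assumptions the fractal tree amortizes each insert over a whole block of buffered writes, which is the change of IO pattern the slides refer to.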
  • 18. Top 10 Alexa’s petabyte store is InnoDB (two columns: Top Alexa, InnoDB; Galaxy, TokuDB)
      ❏ DBA to set up insert buffer + dirty pages
      ❏ Admins to monitor IO
      ❏ Admins to increase # nodes
      ❏ Use flash & hybrid storage
      ❏ DBAs to partition and shard
      ❏ DBAs to organize maintenance
      ❏ DBAs to set covering and clustering indexes
      ❏ Zipf read distribution
      ❏ Concurrent by design
      ❏ Remove fragmentation
      ❏ Constant insert rate regardless of memory/disk ratio
      ❏ High compression rate
      ❏ No control over client architecture
      ❏ All indexes can be clustered
  • 19. Key point for 200 Billion records: 1/5 compression on 6 billion rows
  • 20. Key point for 200 Billion records: insert time 2 times slower vs. InnoDB; insert 2.5 times faster vs. InnoDB compressed
  • 21. Galaxy take away for 200 Billion records
      ❏ Disk IOps on InnoDB was the bottleneck, despite partitioning
      ❏ Moving to TokuDB moves the bottleneck to the CPU, for compression
      ❏ So how to increase performance more? Sharding!!
  • 22. Sharding to fix CPU Bottleneck (table)
      Columns: TTREE (NDB) | BTREE (InnoDB, MyISAM, ZFS) | FRACTAL TREE (TokuDB, LevelDB) | NO INDEXES (Cassandra, HBase) | COLUMN STORE (InfiniDB, Vertica)
      CLUSTERING: Native | Manual, Spider, Vitess, Fabric, Shardquery | Manual, Spider, Vitess, Fabric, Shardquery | Native | Native
      # OF NODES: +++++ | +++ | ++ | +++++ | +
  • 23. Spider… it’s a MED storage engine: NO DATA IS STORED IN SPIDER NODES
  • 24. ha_spider.cc SEMI TRX
      • Preserve data consistency between shards
      • Allow shard replica
      • Enable joining between shards
  • 25. Spider - A Sharding + HA solution
  • 26. Implemented architecture (diagram)
      • Tables: SUMMARY universal tables; RAW sharded tables
      • Nodes: HEAD NODE; COMPUTE NODE #1, COMPUTE NODE #2, …; DATA NODE #1, #2, #3, #4
      • Node labels: SPIDER, NO DATA, MONITORING; TOKUDB, COMPRESSED DATA, PARTITIONS
      • Each data node holds 1/4 OR 1/2
      • Delay current insertion; replay insertion with new shard key
  • 27. Re-sharding without data copy (diagram)
      • Spider table L1.1: Node 01, Node 02
      • Spider table L1.2: Node 01, Node 02, Node 03, Node 04
      • Spider table L2 (BEFORE, AFTER, CURRENT): Toku tables with partitions P#Week 01, P#Week 02 and P#Week 03, P#Week 04
      • Partition by date (e.g. daily)
      • Shard by node modulo
      • Shard by date range
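One way to read this diagram is as two-level routing: a date range selects a shard layout, and a modulo on the shard key selects a node within that layout. The sketch below illustrates that reading only. It is not Spider's configuration syntax or Galaxy's code, and the node names, cutover dates, and `route` function are all hypothetical.

```python
from datetime import date
from typing import List, NamedTuple


class ShardLayout(NamedTuple):
    start: date        # first day served by this layout (inclusive)
    nodes: List[str]   # data nodes used for rows dated from `start` on


# Hypothetical history: two nodes at first, four nodes after re-sharding.
LAYOUTS = [
    ShardLayout(date(2014, 1, 1), ["node01", "node02"]),
    ShardLayout(date(2014, 1, 15), ["node01", "node02", "node03", "node04"]),
]


def route(shard_key: int, day: date, layouts: List[ShardLayout] = LAYOUTS) -> str:
    """Pick the layout by date range, then the node by modulo."""
    active = [layout for layout in layouts if layout.start <= day]
    if not active:
        raise ValueError(f"no shard layout covers {day}")
    layout = max(active, key=lambda item: item.start)
    return layout.nodes[shard_key % len(layout.nodes)]


print(route(7, date(2014, 1, 5)))    # old date range: modulo over 2 nodes
print(route(7, date(2014, 1, 20)))   # new date range: modulo over 4 nodes
```

Because rows dated before the cutover keep resolving to their original nodes, adding nodes only affects where new date ranges are written, which is consistent with the slide's "without data copy" claim.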
  • 28. Proven Performance
      Galaxy has deployed its big data solution at a major test subcontractor in Asia with the following performance:
      • Peak data insertion rate: 2 TB of STDF data per day
      • Data compression of raw data: 60-80%
      • DB retention of raw data: 3 months
      • DB retention of summary data: 1 year
      • Archiving of test data: automatic
      • Target was 2 MB/sec; we get about 10 MB/sec
      • Since 17th June, steady production:
        – Constant insertion speed
        – 1400 files/day, 120 GB/day
        – ft_ptest_results: 92 billion rows / 1.5 TB across 4 nodes
        – ft_mptest_results: 14 billion rows / 266 GB across 4 nodes
        – wt_ptest_results: 9 billion rows / 153 GB across 4 nodes
        – 50 TB available volume; total DB size is 8 TB across all 4 nodes
      • 7 servers (22k$) + SAN ($$$) OR DAS (15k$)
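The figures on this slide mix per-day and per-second units, so a small conversion helper makes them easier to compare. This is a sketch that assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes). The slide does not say whether each figure refers to STDF input or to data written to the database, so the code only converts units and does not try to reconcile them.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def mb_per_sec(bytes_per_day: float) -> float:
    """Convert a daily volume in bytes to an average rate in MB/s."""
    return bytes_per_day / SECONDS_PER_DAY / 1e6


def gb_per_day(mb_per_second: float) -> float:
    """Convert a sustained rate in MB/s to a daily volume in GB."""
    return mb_per_second * 1e6 * SECONDS_PER_DAY / 1e9


# Daily volumes quoted on the slide, as average rates.
print(f"120 GB/day = {mb_per_sec(120e9):.2f} MB/s on average")
print(f"2 TB/day   = {mb_per_sec(2e12):.2f} MB/s on average")

# Rates quoted on the slide, as daily volumes.
print(f"2 MB/s  sustained = {gb_per_day(2):.1f} GB/day")
print(f"10 MB/s sustained = {gb_per_day(10):.1f} GB/day")

# Row counts of the three tables listed on the slide, in billions.
print(f"rows listed: {92 + 14 + 9} billion")
```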
  • 29. File count inserted per day (chart)
      • Integration issues up to May 7
      • Raw & Summary-only data insertion up to May 18
      • Raw & Summary data insertion, problem solving, fine tuning up to June 16
      • Steady production insertion of Raw & Summary data since June 17
  • 30. File count and data size per day (chart)
      • Up to 2 TB inserted per day
      • Up to 20k files per day
  • 31. Raw data insertion duration over file size (chart; each colored series is 1 day)
      • Consistent insertion performance
  • 32. What’s next?
      • Make Yield-Man more SPIDER-aware:
        – Integrated scale-out (add compute/data nodes)
        – Native database schema upgrade on compute/data nodes
      • Add more monitoring capability to monitor SPIDER events (node failure, table desynchronization across nodes…)
      • Automate recovery after failures/issues; today:
        – Manual script to detect de-synchronization
        – pt-table-sync from Percona to manually re-sync
        – Manual script to reintroduce table nodes in the cluster
      IN SPIDER 2014 ROADMAP