A corrected database comparison by Tyler Weatherby, Spring 2017. The benchmark compares the MySQL MyISAM engine, the MySQL Memory engine, and MonetDB on TPC-H data. In this project, we added the indexes and keys to the important tables.
Overview
• History
• The Difference Between Tables
• Engine Background
• Goals of This Project
• TPC-H Background
• Installing TPC-H
• Main Project Issue
• Issue Resolved
• Expansion of the Original Data
• Creating Tables and Loading Data
• Query Scripts
• Graphical Interpretation of Results
• Numeric Interpretation of Results
• Breakdown and Comparison
• Challenges Encountered
• Interpretation
• Possibilities to Further Expand
• Conclusion
History
• MySQL: Developed 1994
• MySQL acquired in 2008 by Sun Microsystems then by Oracle in 2010
• MySQL is dual-licensed: open source (GPL) and a proprietary commercial license
• MySQL is a row store database
• MonetDB: Developed around 1996
• MonetDB is open source and cross-platform
• R and Python support (2014-Present)
• MonetDB is a column store database
The Difference Between Tables
• Row Store
• Stores data by rows like a typical table
• Uses Primary and Foreign Keys
• Primary Key: Unique identifier
• Foreign Key: References a Primary Key in another table
• Column Store
• Stores data by column instead of by row
• Only the columns a query references need to be read
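The difference between the two layouts can be sketched in a few lines of Python. This is a toy illustration only: the records and the helper names are hypothetical, and it does not reflect how MySQL or MonetDB store data internally.

```python
# Hypothetical toy records (order key, quantity, price); not TPC-H data.
rows = [
    (1, 17, 21168.23),
    (2, 36, 45983.16),
    (3, 8, 13309.60),
]

def sum_quantity_row_store(records):
    # Row store: each record is kept together, so summing one field
    # still walks every whole record.
    return sum(record[1] for record in records)

# Column store: each field is kept in its own array.
columns = {
    "orderkey": [r[0] for r in rows],
    "quantity": [r[1] for r in rows],
    "price": [r[2] for r in rows],
}

def sum_quantity_column_store(cols):
    # Only the one column the query needs is read.
    return sum(cols["quantity"])

print(sum_quantity_row_store(rows), sum_quantity_column_store(columns))
```

Both functions return the same total; the column version simply never touches the order key or price data.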
Engine Background
• MySQL: MyISAM: Each table is stored on disk in three files
• Row store; the default engine before MySQL 5.5
• MySQL: Memory: Contents loaded into memory
• Row store
• Vulnerable to crashes, hardware issues, and power loss
• MonetDB: Uses main memory for processing
• Column store
• Does not require all data be active in physical memory at once
Goals of This Project
• This project was intended to provide benchmark comparisons between
MySQL engines and MonetDB
• Expand upon current benchmarked data
• Provide fairness for an accurate interpretation
• Use TPC-H to achieve this goal and benchmark 1 GB of data
TPC-H Background
• Decision support benchmark
• Useful tools to quickly generate data
• Can handle large volumes of data
• Can produce queries with great complexity
• Generates 8 tables
• Some tables hold millions of records
Installing TPC-H: Step 1
• It is recommended that you make a directory to store the TPC-H files
• mkdir tpch
• Download the TPC-H files with the following command (the URL is quoted so the shell does not interpret the & characters)
• wget "http://www.tpc.org/TPC_Documents_Current_Versions/download_programs/tools-download-request.asp?bm_type=TPC-H&bm_vers=2.17.2&mode=CURRENT-ONLY"
• Extract the downloaded files from the compressed archive into that directory
• unzip TPCH_FileName.zip -d tpch
Installing TPC-H: Step 2
• Create the makefile before installing; this sets some parameters we need
• CC = gcc
• DATABASE = ORACLE
• MACHINE = LINUX
• WORKLOAD = TPCH
• After setting the proper parameters for the machine, we can
build TPC-H by running the following command
• make
• TPC-H should now be installed
Main Project Issue: Running Time Analysis
• Claim: MonetDB was as much as 141,000 times faster than the MySQL
engines (InnoDB & MyISAM)
• On the previous data, MySQL MyISAM queries took anywhere
from ten to thirty minutes
• The original theory attributed this speed difference to the memory hierarchy
Issue Resolved: Not Memory Hierarchy
• Examination of the original data showed that the
benchmark's proper table schema had not been followed
• Turns out that keys are useful in a database
• The old benchmarks are therefore invalid because they failed to
provide fairness
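The effect of a missing key can be seen in a small sketch. It uses Python's built-in SQLite purely as a stand-in (an assumption for illustration; it is not MySQL or MonetDB, and the two table names are hypothetical) to show how the same lookup is planned with and without a primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two copies of a supplier-like table: one without a key, one with a primary key.
conn.execute("CREATE TABLE supplier_nokey (s_suppkey INTEGER, s_name TEXT)")
conn.execute("CREATE TABLE supplier_keyed (s_suppkey INTEGER PRIMARY KEY, s_name TEXT)")

def plan(table):
    """Return SQLite's plan description for a lookup by s_suppkey."""
    sql = f"EXPLAIN QUERY PLAN SELECT s_name FROM {table} WHERE s_suppkey = 42"
    # The human-readable detail is the fourth column of the plan row.
    return conn.execute(sql).fetchone()[3]

print(plan("supplier_nokey"))  # expected to begin with SCAN (reads every row)
print(plan("supplier_keyed"))  # expected to begin with SEARCH (uses the key)
```

Without the key, every lookup has to scan the whole table, and in a multi-table join that cost multiplies, which is consistent with the very slow times seen in the old benchmarks.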
Expansion of the Original Data
• Generated 1 GB of data using the TPC-H benchmarking tools
• -s sets the scale factor in gigabytes, so -s 0.1 would be 100 MB and -s 1 would be 1 GB
• ./dbgen -s 1
• Generated queries using the TPC-H benchmarking tools
• ./qgen (random seed)
• After you've generated the data and queries, you can begin to focus
on the database side of things
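As a small sketch of the scale-factor arithmetic, the helpers below turn a -s value into an approximate data size and into the dbgen argument list. Both function names are hypothetical, and the sizes are only the rough figures quoted above.

```python
def approx_size_mb(scale_factor):
    """Approximate dbgen output size: -s 1 is about 1 GB, -s 0.1 about 100 MB."""
    return scale_factor * 1000

def dbgen_command(scale_factor):
    """Argument list for the dbgen call, assuming it is run from the dbgen directory."""
    return ["./dbgen", "-s", str(scale_factor)]

for sf in (0.1, 1):
    print(dbgen_command(sf), "is roughly", approx_size_mb(sf), "MB")
```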
Creating Tables and Loading Data
• The 8 tables are defined in the TPC-H documentation
• Loading the data into a MySQL table: the MySQL client must be started from the
directory that contains the *.tbl files
• LOAD DATA LOCAL INFILE 'supplier.tbl' INTO TABLE supplier FIELDS TERMINATED BY '|';
• Loading the data into MonetDB tables was a bit trickier
• copy into customer from '/home/teweatherby/tpch_2_17_0/dbgen/1g/customer.tbl';
• In MonetDB you have to give the full path of the file to load data into the table!
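Since the same load statement is repeated for every table, the statements can be generated instead of typed. The sketch below builds both forms shown above for the eight TPC-H tables; the helper names and the `/full/path/to/dbgen` directory are hypothetical placeholders.

```python
# The eight TPC-H tables; dbgen writes each one as <table>.tbl.
TPCH_TABLES = ["part", "supplier", "partsupp", "customer",
               "orders", "lineitem", "nation", "region"]

def mysql_load(table):
    # Relative path: resolved against the directory the MySQL client was started in.
    return (f"LOAD DATA LOCAL INFILE '{table}.tbl' INTO TABLE {table} "
            "FIELDS TERMINATED BY '|';")

def monetdb_load(table, data_dir):
    # MonetDB needs the full path; data_dir is a hypothetical placeholder.
    return f"copy into {table} from '{data_dir}/{table}.tbl';"

for name in TPCH_TABLES:
    print(mysql_load(name))
    print(monetdb_load(name, "/full/path/to/dbgen"))
```

The printed statements can be pasted into the respective client once the tables have been created.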
Query 1: 2.sql
select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey
and p_size = 4 and p_type like '%STEEL' and s_nationkey = n_nationkey
and n_regionkey = r_regionkey and r_name = 'MIDDLE EAST'
and ps_supplycost = (select min(ps_supplycost)
from partsupp, supplier, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey
and s_nationkey = n_nationkey and n_regionkey = r_regionkey
and r_name = 'MIDDLE EAST') order by s_acctbal desc, n_name, s_name, p_partkey;
Note: Spacing reduced to preserve readability
Query 2: 3.sql
select l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue,
o_orderdate, o_shippriority from customer, orders, lineitem
where c_mktsegment = 'AUTOMOBILE' and c_custkey = o_custkey
and l_orderkey = o_orderkey and o_orderdate < date '1995-03-27' and
l_shipdate > date '1995-03-27' group by l_orderkey, o_orderdate,
o_shippriority order by revenue desc, o_orderdate;
Note: Spacing reduced to preserve readability
Query 3: 18.sql
select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice,
sum(l_quantity) from customer, orders, lineitem
where o_orderkey in (select l_orderkey from lineitem
group by l_orderkey having sum(l_quantity) > 315)
and c_custkey = o_custkey and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice order by o_totalprice desc, o_orderdate;
Note: Spacing reduced to preserve readability
Results: Query 1: Three Trials Each
(Bar chart "Query 1 Results": time in milliseconds for MyISAM (MySQL), Memory (MySQL), and MonetDB, with one bar each for Trial 1, Trial 2, and Trial 3)
Results: Query 2: Three Trials Each
(Bar chart "Query 2 Results": time in milliseconds for MyISAM (MySQL), Memory (MySQL), and MonetDB, with one bar each for Trial 1, Trial 2, and Trial 3)
Results: Query 3: Three Trials Each
(Bar chart "Query 3 Results": time in milliseconds for MyISAM (MySQL), Memory (MySQL), and MonetDB, with one bar each for Trial 1, Trial 2, and Trial 3)
Total Results: Query 1
Time in milliseconds | MySQL (MyISAM) | MySQL (Memory) | MonetDB
Trial 1 | 4930 | 1060 | 48
Trial 2 | 4950 | 1090 | 51
Trial 3 | 4970 | 1080 | 62
Average Running Time | 4950 | 1077 | 54
Total Results: Query 2
Time in milliseconds | MySQL (MyISAM) | MySQL (Memory) | MonetDB
Trial 1 | 6040 | 1810 | 138
Trial 2 | 6070 | 1820 | 146
Trial 3 | 6020 | 1800 | 136
Average Running Time | 6043 | 1810 | 140
Total Results: Query 3
Time in milliseconds | MySQL (MyISAM) | MySQL (Memory) | MonetDB
Trial 1 | 8440 | 5600 | 231
Trial 2 | 8410 | 5530 | 209
Trial 3 | 8420 | 5540 | 205
Average Running Time | 8423 | 5557 | 215
Breakdown and Comparison
Query | MySQL (MyISAM) | MySQL (Memory) | MonetDB | MyISAM/Memory | MyISAM/MonetDB | Memory/MonetDB
Query 1 | 4950 | 1077 | 54 | 4.60 times faster than MyISAM | 91.7 times faster than MyISAM | 19.94 times faster than Memory
Query 2 | 6043 | 1810 | 140 | 3.34 times faster than MyISAM | 43.16 times faster than MyISAM | 12.93 times faster than Memory
Query 3 | 8423 | 5557 | 215 | 1.52 times faster than MyISAM | 39.18 times faster than MyISAM | 25.85 times faster than Memory
Note: In "… times faster than Memory", Memory refers to the MySQL Memory engine
Time is listed in milliseconds
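The averages and speedup factors can be recomputed from the trial times. The sketch below assumes the factors were derived from the averages rounded to whole milliseconds, which matches the figures reported; the function names are hypothetical.

```python
from statistics import mean

# Trial times in milliseconds, copied from the result tables above.
trials = {
    "Query 1": {"MyISAM": [4930, 4950, 4970], "Memory": [1060, 1090, 1080], "MonetDB": [48, 51, 62]},
    "Query 2": {"MyISAM": [6040, 6070, 6020], "Memory": [1810, 1820, 1800], "MonetDB": [138, 146, 136]},
    "Query 3": {"MyISAM": [8440, 8410, 8420], "Memory": [5600, 5530, 5540], "MonetDB": [231, 209, 205]},
}

def averages(query):
    """Average running time per engine, rounded to whole milliseconds."""
    return {engine: round(mean(times)) for engine, times in trials[query].items()}

def speedup(query, slower, faster):
    """How many times faster `faster` is than `slower`, from the rounded averages."""
    avg = averages(query)
    return round(avg[slower] / avg[faster], 2)

for q in trials:
    print(q, averages(q),
          speedup(q, "MyISAM", "Memory"),
          speedup(q, "MyISAM", "MonetDB"),
          speedup(q, "Memory", "MonetDB"))
```

Each factor is simply the slower engine's average divided by the faster engine's average for the same query.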
Challenges Encountered
• Learning Ubuntu command lines and proficiently manipulating the
environment
• Had to raise MySQL's in-memory table size limits to hold 1 GB of
data in memory; otherwise loading fails with a "table is full" error
• SET GLOBAL tmp_table_size = 1024 * 1024 * 1024 * 2; SET GLOBAL
max_heap_table_size = 1024 * 1024 * 1024 * 2; (2 GiB each)
• MonetDB administrative structure
Interpretation
• Certainly, MySQL Memory is faster than MySQL MyISAM
• MonetDB is clearly faster than the MySQL MyISAM engine
• MonetDB seems to be faster than the MySQL Memory engine
• Keys are useful for databases!!!
• Is MonetDB better?
Possibilities to Further Expand
• We only compared querying; how would the engines perform for
modification?
• Is MonetDB simpler? Easier to understand?
• System resource limitation (memory)
• Other databases (Cassandra)
Conclusion
• Keys in a database matter
• MonetDB seems to have an edge on MySQL's Memory engine
• MonetDB certainly has an advantage over MySQL's MyISAM engine
• There are opportunities to further expand on this examination
Editor's Notes
Aikin's original data showed MonetDB benchmarking anywhere from 10,000 to 141,000 times faster than the MySQL engines he tested.
I speculated that the benchmarking software might just be doing something odd, because it is a benchmark. However, MySQL databases have been around for a while and are used in enterprise systems, so a gap this large could not be a real benchmark result.
Aikin had forgotten the primary and foreign keys, invalidating most of his project.
Mention that I did not think the original 100 MB of data was sufficient for benchmarking again.
Note the huge difference from Aikin's original figures of 141,000 times and 32,000 times faster in the MyISAM category.