1. MySQL vs. MonetDB
A benchmark comparison between in-memory and out-of-memory databases
Derek Aikins
Advisor: Dr. Feng Yu
2. Overview
• History of MySQL
• What is a relational database
• History of MonetDB
• What is a column-store database
• Relational vs. column-store database operations
• Goal of Project
• TPC-H
• Installing and compiling TPC-H
• Generating data using TPC-H
• Generating queries using TPC-H
• Queries
• Results of Tests
• Problems Encountered
• Conclusion
3. History of MySQL
• The world’s most popular open-source relational database.
• Leading database choice for web-based applications; used by high-profile web properties including Facebook, Twitter, and YouTube.
• Created by a Swedish company, MySQL AB; originally developed by David Axmark and Michael Widenius in 1994.
• First version released on May 23, 1995.
• MySQL AB was acquired by Sun Microsystems in 2008.
• Oracle acquired Sun Microsystems on January 27, 2010.
4. What is a Relational Database?
• MySQL is a relational database.
• A relational database is a digital database that organizes data into one or more tables of columns and rows.
• Tables are known as relations.
• Each table represents one “entity type,” such as customer or product.
• Rows (records) represent instances of that entity type, such as “Lee” or “chair.”
• Columns represent attribute values belonging to that instance, such as address or price.
5. Examples of a database
Region Table
R_REGIONKEY | R_NAME      | R_COMMENT
0           | AFRICA      | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are
1           | AMERICA     | hs use ironic, even requests. s
2           | ASIA        | ges. thinly even pinto beans ca
3           | EUROPE      | ly final courts cajole furiously final excuse
4           | MIDDLE EAST | uickly special accounts cajole carefully blithely close requests. carefully final asymptotes

Nation Table
N_NATIONKEY | N_NAME    | N_REGIONKEY | N_COMMENT
0           | ALGERIA   | 0           | haggle. carefully final
1           | ARGENTINA | 1           | al foxes promise slyly
2           | BRAZIL    | 1           | y alongside of the pending
3           | CANADA    | 1           | eas hang ironic,
4           | EGYPT     | 4           | y above the carefully

(N_REGIONKEY in the Nation table is a foreign key referencing R_REGIONKEY in the Region table.)
6. History of MonetDB
• An open-source column-store database
• Developed in the Netherlands at the Centrum Wiskunde & Informatica (CWI)
• A data-mining project in the 1990s required improved database support, which resulted in a CWI spin-off called Data Distilleries that used early MonetDB implementations in its analytical suite
• Data Distilleries became a subsidiary of SPSS in 2003, which was in turn acquired by IBM in 2009
• MonetDB in its current form was first created by Peter A. Boncz and Martin L. Kersten at the University of Amsterdam
• For more details, see Dr. Boncz’s thesis: Monet: A Next-Generation DBMS Kernel for Query-Intensive Applications
• The first version was released on September 30, 2004
7. What is a Column-Store Database?
• Column-store databases store data as columns rather than rows
• By storing data in columns, the database can read only the columns a query actually needs, instead of scanning whole rows and discarding the unwanted fields
• Query performance is often much better as a result, particularly on very large data sets
• Column stores such as MonetDB are also designed to operate on data in main memory (RAM) wherever possible, whereas traditional row stores like MySQL are built around disk-resident storage (see the toy sketch below)
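To make the access pattern concrete, here is a toy shell illustration (not from the deck) using two hypothetical files: customers_rows.tbl, a pipe-delimited row file with income as a plain number in field 10, and income.col, a file holding only the income column:

    # row layout: summing one attribute still forces a scan over every full row
    awk -F'|' '{ s += $10 } END { print s }' customers_rows.tbl

    # column layout: the income column is stored by itself, so only it is read
    awk '{ s += $1 } END { print s }' income.col

Both commands compute the same sum; the column file simply contains far fewer bytes to scan, which is the core of the column-store advantage described above.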
8. Relational vs. column-store database operation
Cust_ID | Name | Address | City | State | Zip Code | Area Code | Phone # | Rent/Own | Annual Income
1 | Jack | 12 A St. | Howland | OH | 44481 | 330 | 369-3597 | Rent | 74,000
2 | Brian | 13 B St. | Howland | OH | 44481 | 330 | 856-1534 | Rent | 58,000
3 | Mike | 8 K St. | Warren | OH | 44483 | 330 | 373-1215 | Own | 92,000
4 | Anna | 62 Main St. | Sharon | PA | 16101 | 724 | 654-0893 | Own | 110,000
5 | Tasha | 546 1st St. | Stow | OH | 44752 | 216 | 849-5775 | Rent | 52,000
6 | Sidney | 84 Third St. | Gilbert | AZ | 76534 | 480 | 758-6549 | Own | 90,000
7 | Tyler | 846 Wick Rd. | Las Vegas | NV | 65487 | 231 | 654-5473 | Own | 60,000
8 | Aaron | 213 Maple St. | Daytona | FL | 32547 | 519 | 159-3425 | Rent | 66,000
9 | Beth | 8749 Trump St. | Detroit | MI | 87945 | 375 | 325-1849 | Own | 50,000
9. Goal of this Project
• Take a standard dataset and a standard set of queries and run the same tests on two different databases, MySQL and MonetDB
• By doing so, I intend to demonstrate the efficiency and speed that a column-store database offers over a traditional relational database
• To do this I will be using TPC-H, a database benchmark whose data generator can also generate the queries for the data
• Then I will load all the data generated by TPC-H into both databases and run each query multiple times to get average times on both databases (a sketch of the timing loop follows this list)
• After all runs are complete, I will gather the results and compare how the two databases performed
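A minimal sketch of the timing loop this implies, assuming each generated query was saved to a file such as q1.sql (a hypothetical name) and timed with the shell’s time builtin:

    # run one query three times against the tpch database, keeping wall-clock time;
    # averages were then taken over the three runs
    for i in 1 2 3; do
        ( time mysql -u root tpch < q1.sql > /dev/null ) 2>&1 | grep real
    done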
10. TPC-H
• A decision support benchmark
• Consists of a suite of business-oriented ad hoc queries and concurrent data modifications
• The queries and the data populating the database have been chosen to have broad industry-wide relevance
• This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions (a sketch of generating these queries with qgen follows)
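The overview lists generating queries with TPC-H, though the deck does not show that step. A hedged sketch of how it is typically done: qgen is built alongside dbgen by the make step on the next slide, and DSS_QUERY points at the kit’s query template directory (the paths here are assumptions):

    cd ~/Downloads/tpch_2_16_0/tpch_2_15_0/dbgen
    # emit query template 2 with generated parameter values substituted
    DSS_QUERY=queries ./qgen 2 > q2.sql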
11. Installing and Compiling TPC-H
• The program used to generate the TPC-H data is called dbgen
• To install dbgen, I first downloaded the kit from the TPC-H site and changed into its dbgen directory: cd Downloads/tpch_2_16_0/tpch_2_15_0/dbgen/
• Then I had to create a makefile and change its lines to read: CC = gcc, DATABASE = SQLSERVER, MACHINE = LINUX, WORKLOAD = TPCH
• Next, in the dbgen folder I had to find the tpcd.h file and edit these lines: #define START_TRAN "BEGIN WORK;" and #define END_TRAN "COMMIT WORK;"
• Then I ran the make command (a consolidated sketch of these steps follows)
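A consolidated sketch of the build steps above; copying makefile.suite to makefile is an assumption, since the slide only says a makefile was created:

    cd ~/Downloads/tpch_2_16_0/tpch_2_15_0/dbgen
    cp makefile.suite makefile   # then edit: CC=gcc, DATABASE=SQLSERVER, MACHINE=LINUX, WORKLOAD=TPCH
    # in tpcd.h, make sure the transaction markers read:
    #   #define START_TRAN "BEGIN WORK;"
    #   #define END_TRAN   "COMMIT WORK;"
    make                         # builds dbgen (and qgen)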
12. Generating the data (100 MB)
• After installing and setting up TPC-H, I generated the data using dbgen
• I used the command ./dbgen -s 0.1, where the 0.1 is the scale factor that dictates how much data is generated; in this case roughly 100 MB
• Once the data was generated, I created the database in MySQL with the CREATE DATABASE tpch; command and then selected that database to load the data into its tables
• I then created each table with the CREATE TABLE command and set the definitions for each column
• Once the tables were created, I loaded the data into each table using LOAD DATA LOCAL INFILE 'customer.tbl' INTO TABLE CUSTOMER FIELDS TERMINATED BY '|'; changing the file and table name for each table (a consolidated sketch of these steps follows)
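A hedged sketch of the generate-and-load flow above. The deck shows only the MySQL side; the mclient line is an assumption based on MonetDB’s COPY INTO syntax, and the user name and paths are hypothetical (LOAD DATA LOCAL also requires local_infile to be enabled on the server):

    ./dbgen -s 0.1                        # scale factor 0.1: roughly 100 MB of .tbl files
    mysql -u root -e "CREATE DATABASE tpch;"
    # one load per generated file (CUSTOMER shown); tables assumed already created
    mysql -u root --local-infile=1 tpch \
        -e "LOAD DATA LOCAL INFILE 'customer.tbl' INTO TABLE CUSTOMER FIELDS TERMINATED BY '|';"
    # MonetDB equivalent (assumed): COPY INTO needs an absolute path
    mclient -d tpch -s "COPY INTO customer FROM '/full/path/customer.tbl' USING DELIMITERS '|';"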
13. Query 1
select
    s_acctbal,
    s_name,
    n_name,
    p_partkey,
    p_mfgr,
    s_address,
    s_phone,
    s_comment
from
    part,
    supplier,
    partsupp,
    nation,
    region
where
    p_partkey = ps_partkey
    and s_suppkey = ps_suppkey
    and p_size = 19
    and p_type like 'PROMO ANODIZED BRASS'
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'ASIA'
    and ps_supplycost = (
        select
            min(ps_supplycost)
        from
            partsupp,
            supplier,
            nation,
            region
        where
            p_partkey = ps_partkey
            and s_suppkey = ps_suppkey
            and s_nationkey = n_nationkey
            and n_regionkey = r_regionkey
            and r_name = 'ASIA'
    )
order by
    s_acctbal desc,
    n_name,
    s_name,
    p_partkey;
14. Query 2
select
    l_orderkey,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    o_orderdate,
    o_shippriority
from
    customer,
    orders,
    lineitem
where
    c_mktsegment = 'AUTOMOBILE'
    and c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate < date '1995-08-15'
    and l_shipdate > date '1995-08-27'
group by
    l_orderkey,
    o_orderdate,
    o_shippriority
order by
    revenue desc,
    o_orderdate;
15. Query 3
select
    o_orderpriority,
    count(*) as order_count
from
    orders
where
    o_orderdate >= date '1996-10-29'
    and o_orderdate < date '1996-10-29' + interval '3' month
    and exists (
        select
            *
        from
            lineitem
        where
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate
    )
group by
    o_orderpriority
order by
    o_orderpriority;
16. Query 4
select
sum(l_extendedprice * l_discount) as revenue
from
lineitem
where
l_shipdate >= date '1993-03-05'
and l_shipdate < date '1993-03-05' + interval '1' year
and l_discount between .03 and .06
and l_quantity < 3;
17. Query 5
select
sum(l_extendedprice) / 7.0 as avg_yearly
from
lineitem,
part
where
p_partkey = l_partkey
and p_brand = 'Brand#11'
and p_container = 'MED JAR'
and l_quantity < (
select
0.2 * avg(l_quantity)
from
lineitem
where
l_partkey = p_partkey
);
18. Results for Queries 1 & 2
[Bar charts: average query time in seconds per engine]
Query 1: MySQL (InnoDB) 26.157 s; MySQL (MyISAM) 25.977 s; MonetDB 0.0121 s
Query 2: MySQL (InnoDB) 645.317 s; MySQL (MyISAM) 652.297 s; MonetDB 0.0202 s
19. Results for Queries 3 & 4
[Bar charts: average query time in seconds per engine]
Query 3: MySQL (InnoDB) 853.25 s; MySQL (MyISAM) 2086.01 s; MonetDB 0.0147 s
Query 4: MySQL (InnoDB) 0.483 s; MySQL (MyISAM) 0.293 s; MonetDB 0.0113 s
20. Results for Query 5
[Bar chart: average query time in seconds per engine]
Query 5: MySQL (InnoDB) 137.153 s; MySQL (MyISAM) 337.277 s; MonetDB 0.0188 s
21. Total Numerical Results
Query 1 (times in seconds) | MySQL (InnoDB) | MySQL (MyISAM) | MonetDB
run 1 | 26.14 | 25.83 | 0.012854 (12.854 ms)
run 2 | 26.20 | 26.06 | 0.012141 (12.141 ms)
run 3 | 26.13 | 26.04 | 0.011199 (11.199 ms)
Average | 26.15666667 | 25.977 | 0.012064667
22. Total Numerical Results
Query 2 (times in seconds) | MySQL (InnoDB) | MySQL (MyISAM) | MonetDB
run 1 | 642.03 (10 min 42.03 sec) | 652.36 (10 min 52.36 sec) | 0.023548 (23.548 ms)
run 2 | 646.84 (10 min 46.84 sec) | 652.56 (10 min 52.56 sec) | 0.016765 (16.765 ms)
run 3 | 647.08 (10 min 47.08 sec) | 651.97 (10 min 51.97 sec) | 0.024834 (24.834 ms)
Average | 645.3166667 | 652.2966667 | 0.0201565

Query 3 (times in seconds) | MySQL (InnoDB) | MySQL (MyISAM) | MonetDB
run 1 | 840.65 (14 min 0.65 sec) | 2153.17 (35 min 53.17 sec) | 0.015332 (15.332 ms)
run 2 | 852.67 (14 min 12.67 sec) | 2051.52 (34 min 11.52 sec) | 0.015168 (15.168 ms)
run 3 | 866.43 (14 min 26.43 sec) | 2053.34 (34 min 13.34 sec) | 0.013698 (13.698 ms)
Average | 853.25 | 2086.01 | 0.014732667
23. Total Numerical Results
Query 4 (times in seconds) | MySQL (InnoDB) | MySQL (MyISAM) | MonetDB
run 1 | 0.48 | 0.29 | 0.012992 (12.992 ms)
run 2 | 0.49 | 0.30 | 0.010641 (10.641 ms)
run 3 | 0.48 | 0.29 | 0.010308 (10.308 ms)
Average | 0.483333333 | 0.293333333 | 0.011313667

Query 5 (times in seconds) | MySQL (InnoDB) | MySQL (MyISAM) | MonetDB
run 1 | 138.08 (2 min 18.08 sec) | 336.59 (5 min 36.59 sec) | 0.018776 (18.776 ms)
run 2 | 139.34 (2 min 19.34 sec) | 339.10 (5 min 39.10 sec) | 0.021746 (21.746 ms)
run 3 | 134.04 (2 min 14.04 sec) | 336.14 (5 min 36.14 sec) | 0.015775 (15.775 ms)
Average | 137.1533333 | 337.2766667 | 0.018765667
24. Comparison Results (average times in seconds)
Query | MySQL (InnoDB) | MySQL (MyISAM) | MonetDB | MyISAM vs. InnoDB | MonetDB vs. InnoDB | MonetDB vs. MyISAM
Query 1 | 26.157 | 25.977 | 0.0121 | 0.69% faster | 2,168× faster | 2,153× faster
Query 2 | 645.317 | 652.297 | 0.0201 | 1.08% slower | 32,015× faster | 32,361× faster
Query 3 | 853.25 | 2086.01 | 0.0147 | 2.445× slower | 57,915× faster | 141,591× faster
Query 4 | 0.483 | 0.293 | 0.0113 | 1.648× faster | 43× faster | 26× faster
Query 5 | 137.153 | 337.277 | 0.0188 | 2.459× slower | 7,309× faster | 17,973× faster
25. Challenges Encountered
• Throughout this project I encountered several challenges:
• The first difficulty was installing the several programs used for this project
• Once all programs were installed, the next challenge was loading the data into the databases
• After all data was loaded into the database tables, one of the largest challenges was examining each query and filling in the placeholders that needed exact values from the tables before the query would even run
• The largest challenge I faced through this entire project was learning to use the command line for everything, as I had not had much experience with it
26. Summary
• A relational database is a digital database that organizes data into one
or more tables of columns and rows.
• Column-store databases store data as columns rather than rows
• TPC-H is a decision support benchmark that examines large volumes of data, executes queries with a high degree of complexity, and gives answers to critical business questions
• As the data from the tests conducted on the two databases shows, column-store databases such as MonetDB are considerably faster in run time than traditional relational databases such as MySQL
27. All results and test queries can be found at:
https://github.com/Djaikins/MySQL-vs-Monetdb
or by searching djaikins on GitHub