Hybrid Transactional & Analytical Processing (HTAP) is a new breed of database workload offered by NewSQL engines. This talk covers the key engine capabilities required to offer HTAP and the state of the art in PostgreSQL 11 that aligns with an HTAP vision in terms of sharding, fault tolerance, high availability, and replication.
Reactive Programming: A New Asynchronous Database Access API, a possible new Java standard for accessing SQL databases in which user threads never block! - presented by Kuassi Mensah
Datomic – A Modern Database - StampedeCon 2014 - StampedeCon
At StampedeCon 2014, Alex Miller (Cognitect) presented "Datomic – A Modern Database."
Datomic is a distributed database designed to run on next-generation cloud architectures. Datomic stores facts and retractions using a flexible schema, consistent transactions, and a logic-based query language. The focus on facts over time gives you the ability to look at the state of the database at any point in time and traverse your transactional data in many ways.
We’ll take a tour of the Datomic data model, transactions, query language, and architecture to highlight some of the unique attributes of Datomic and why it is an ideal modern database.
TechEvent 2019: Status of the partnership Trivadis and EDB - Comparing PostgreSQL to Oracle, the best kept secrets; Konrad Häfeli, Jan Karremans - Trivadis
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013) - Gabriele Bartolini
Migrating an Oracle database to Postgres is never an automated operation, and it rarely (never?) involves just the database. Experience has led us to develop an agile methodology for the migration process, covering schema migration, data import, migration of procedures and queries, and even the generation of unit tests for QA.
Pitfalls, technologies and the main migration opportunities will be outlined, focusing on reducing the total cost of ownership and management of a database solution over the medium to long term (without compromising quality and business continuity requirements).
50 Shades of Data – how, when and why Big, Relational, NoSQL, Elastic, Graph, Even... - Lucas Jellema
Data has been and will be the key ingredient of enterprise IT. What is changing is the nature, scope and volume of data and the place of data in the IT architecture. Big Data, unstructured data and non-relational data stored on Hadoop, in NoSQL databases, and held in Elasticsearch, caches and message queues complement the data in the enterprise RDBMS. Trends such as microservices that contain their own data, BASE, CQRS and Event Sourcing have changed the way we store, share and govern data. This session introduces patterns, technologies and hypes around storing, processing and retrieving data using products such as Oracle Database, Cassandra, MySQL, Neo4J, Kafka, Redis, Elasticsearch and Hadoop/Spark, locally, in containers and on the cloud. Key takeaway: what an application architect and a developer should know about the various types of data in enterprise IT and how to store, manage, query and manipulate them; what products and technologies are at your disposal; and how you can make these work together for a consistent (enough) overall data presentation.
Migrating ETL Workflow to Apache Spark at Scale in Pinterest - Databricks
Pinterest is moving all batch processing to Apache Spark, which includes a large amount of legacy ETL workflows written in Cascading/Scalding. In this talk, we will share the challenges and solutions we experienced during this migration: the motivation for it, how to bridge the semantic gap between different engines, the difficulty of dealing with Thrift objects widely used at Pinterest, how we improved Spark accumulators, how to tune Spark performance after migration using our innovative Spark profiler, and the performance improvements and cost savings we achieved after the migration.
Alongside all its other features, SQL Server 2016 now natively supports JSON – one of the most common formats for data exchange. SQL Server 2016 has built-in capabilities to query, analyze, exchange and transform JSON data.
The JSON functionality is quite similar to SQL Server's XML support, but despite this being one of the most requested additions to SQL Server 2016, there is a sense of something missing – a native JSON data type.
In this session we will discuss the JSON support features, their limitations, and some tricks to overcome them.
Big Challenges in Data Modeling: NoSQL and Data Modeling - DATAVERSITY
Big Data and NoSQL have led to big changes in the data environment, but are they all in the best interest of data? Are they technologies that "free us from the harsh limitations of relational databases?"
In this month's webinar, we will be answering questions like these, plus:
Have we managed to free organizations from having to do Data Modeling?
Is there a need for a Data Modeler on NoSQL projects?
If we build Data Models, which types will work?
If we build Data Models, how will they be used?
If we build Data Models, when will they be used?
Who will use Data Models?
Where does Data Quality happen?
Finally, we will wrap with 10 tips for data modelers in organizations incorporating NoSQL in their modern Data Architectures.
The slides give an overview of how Spark can be used to tackle Machine learning tasks, such as classification, regression, clustering, etc., at a Big Data scale.
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix - Databricks
Autonomy and ownership are core to working at Stitch Fix, particularly on the Algorithms team. We enable data scientists to deploy and operate their models independently, with minimal need for handoffs or gatekeeping. By writing a simple function and calling out to an intuitive API, data scientists can harness a suite of platform-provided tooling meant to make ML operations easy. In this talk, we will dive into the abstractions the Data Platform team has built to enable this. We will go over the interface data scientists use to specify a model and what that hooks into, including online deployment, batch execution on Spark, and metrics tracking and visualization.
Modularity and Domain Driven Design; a killer Combination? - Tom de Wolf & St... - NLJUG
Applying domain driven design in a modular fashion has implications on how your data is structured and retrieved. A modular domain consists of multiple loosely coupled sub-domains, each having its own modular schema in the database. How can we migrate and evolve the database schemas separately with each new sub-domain version? And how do we match this with reporting and cross-domain use cases, where aggregation of data from multiple sub-domains is essential? A case study concerning an OSGi-based business platform for automotive services has driven us to solve these challenges without sacrificing the hard-worked-on modularity and loose coupling. In this presentation you will learn how we used Modular Domain Driven Design with OSGi. Liquibase is elevated to a first-class citizen in OSGi by extending multiple sub-domains with automatic database migration capabilities. On the other hand, Elasticsearch is integrated in OSGi as a separate search module coordinating cross-domain use cases. This unique combination enabled us to satisfy two important customer requirements. Functionally, the software should not be limited by module boundaries when answering business questions. Non-functionally, a future-proof platform is required in which the impact of change is contained and encapsulated in loosely coupled modules.
QuerySurge Slide Deck for Big Data Testing Webinar - RTTS
This is a slide deck from QuerySurge's Big Data Testing webinar.
Learn why testing is pivotal to the success of your Big Data strategy.
Learn more at www.querysurge.com
The growing variety of new data sources is pushing organizations to look for streamlined ways to manage complexities and get the most out of their data-related investments. The companies that do this correctly are realizing the power of big data for business expansion and growth.
Learn why testing your enterprise's data is pivotal for success with big data, Hadoop and NoSQL. Learn how to increase your testing speed, boost your testing coverage (up to 100%), and improve the level of quality within your data warehouse - all with one ETL testing tool.
This information is geared towards:
- Big Data & Data Warehouse Architects,
- ETL Developers
- ETL Testers, Big Data Testers
- Data Analysts
- Operations teams
- Business Intelligence (BI) Architects
- Data Management Officers & Directors
You will learn how to:
- Improve your Data Quality
- Accelerate your data testing cycles
- Reduce your costs & risks
- Provide a huge ROI (as high as 1,300%)
DocumentDB is a powerful NoSQL solution. It provides elastic scale, high performance, global distribution, a flexible data model, and is fully managed. If you are looking for a scaled OLTP solution that is too much for SQL Server to handle (e.g., millions of transactions per second) and/or will be using JSON documents, DocumentDB is the answer.
Introduction to QuerySurge Webinar
Wednesday, April 29th 2020 @11am ET
Eric Smyth, Director of Alliances
Bill Hayduk, CEO
Matt Moss, Product Manager
This is the slide deck for our webinar. Learn how QuerySurge automates the data validation and testing of Big Data, Data Warehouses, Business Intelligence Reports and Enterprise Applications with full DevOps functionality for continuous testing.
---------------------------------------------------------------------------------
Objective
During this webinar, we demonstrate how QuerySurge solves the following challenges:
- Your need for data quality at speed
- How to automate your ETL testing process
- Your ability to test across your different data platforms
- How to integrate ETL testing into your DataOps pipeline
- How to analyze your data and pinpoint anomalies quickly
-------------------------------------------------------------------------------------
Who should view this?
- ETL Developers /Testers
- Data Architects / Analysts
- DBAs
- BI Developers / Analysts
- IT Architects
- Managers of Data, BI & Analytics groups: CTOs, Directors, Vice Presidents, Project Leads
And anyone else in the Data & Analytics space interested in an automation solution for data validation & testing that also improves data quality.
Introduction to Designing and Building Big Data Applications - Cloudera, Inc.
Learn what the course covers, from capturing data to building a search interface; the spectrum of processing engines, Apache projects, and ecosystem tools available for converged analytics; who is best suited to attend the course and what prior knowledge you should have; and the benefits of building applications with an enterprise data hub.
Microsoft Azure is changing, and its database component (Windows Azure SQL Database) is changing even faster. In this session I would like to show those who have not seen it, and remind those who already know a bit, what WASD is about, what changes have taken place, and what we can expect from this database. For the brave, there will be an opportunity to connect to a cloud account and test these solutions for yourself.
Introduction to SQL Server Analysis services 2008Tobias Koprowski
This is my presentation from the 17th Polish SQL Server User Group meeting in Wroclaw. It is the first part of the Quadrology Business Intelligence for IT Pros cycle.
Webinar - QuerySurge and Azure DevOps in the Azure Cloud - RTTS
Session Overview
------------------------------------------------
During this webinar, we covered the following topics while demonstrating our plug-in for Azure DevOps:
- Installing the QuerySurge Azure DevOps Extension
- Key features of Azure DevOps
- Azure DevOps Pipeline creation
- QuerySurge offerings in the Azure Marketplace
- Virtual machine options in the Azure Cloud
- Azure Cloud versus on-prem deployment options for QuerySurge
And we answered the following questions:
- Is QuerySurge in the Azure Cloud the right solution for my team?
- Where does QuerySurge fit into the Azure DevOps platform?
- What are QuerySurge’s various offerings in the Azure Cloud?
- If QuerySurge in the cloud is not the right choice, what is my best deployment option?
To see a recording of the webinar, go to:
https://www.youtube.com/watch?v=Cd7P_nJOejE
Access Data from XPages with the Relational ControlsTeamstudio
Did you know that Domino and XPages allow for easy access to relational data? These exciting capabilities in the Extension Library can greatly enhance the capabilities of your applications and allow access to information beyond Domino. Howard and Paul will discuss what you need to get started, which controls allow access to relational data, and the new @Functions available to incorporate relational data in your Server Side JavaScript programming.
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services - Amazon Web Services
In this session, we discuss the benefits of NoSQL databases and take a tour of the main NoSQL services offered by AWS—Amazon DynamoDB and Amazon ElastiCache. Then, we hear from two leading customers, Expedia and Mapbox, about their use cases and architectural challenges, and how they addressed them using AWS NoSQL services, including design patterns and best practices. You will walk out of this session having a better understanding of NoSQL and its powerful capabilities, ready to tackle your database challenges with confidence.
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service, that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
MAIA Intelligence was invited to give a technical session on MS-SQL at the Microsoft Dreamspark Yatra 2012 event, in which around 300 budding techies learned about emerging technologies.
Modern Android Application Architecture - Archetype / Stepan Goncharov (90 ... - Ontico
RIT++ 2017, AppsConf
Casablanca Hall, June 6, 11:00
Abstract:
http://appsconf.ru/2017/abstracts/2698.html
Clean Architecture combined with MVP is the most common approach to Android application architecture. But will it suit everyone? Most likely not.
This talk examines an alternative approach called Archetype, based on reactive extensions and several other universal patterns that make it possible to implement technical and business requirements quickly and flexibly.
Low power architecture of logic gates using adiabatic techniques - nooriasukmaningtyas
The growing significance of portable systems to limit power consumption in ultra-large-scale-integration chips of very high density has recently led to rapid and inventive progress in low-power design. The most effective technique in energy-efficient hardware is adiabatic logic circuit design. This paper presents two adiabatic approaches for the design of low-power circuits: modified positive feedback adiabatic logic (modified PFAL) and direct current diode based positive feedback adiabatic logic (DC-DB PFAL). Logic gates are the preliminary components of any digital circuit design; by improving the performance of the basic gates, one can improve the performance of the whole system. In this paper the proposed low-power circuit designs of OR/NOR, AND/NAND, and XOR/XNOR gates are presented using the said approaches, and their results are analyzed for power dissipation, delay, power-delay product and rise time, and compared with other adiabatic techniques along with the conventional complementary metal oxide semiconductor (CMOS) designs reported in the literature. It has been found that the designs with the DC-DB PFAL technique outperform the modified PFAL technique at 10 MHz, with percentage improvements of 65% for the NOR gate, 7% for the NAND gate, and 34% for the XNOR gate.
A review on techniques and modelling methodologies used for checking electrom... - nooriasukmaningtyas
The proper functioning of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from discrete devices to today's integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry, and smart vehicles in particular, are confronting design issues such as proneness to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI, and sensors give misleading values, which can prove fatal in the case of automotives. In this paper, the authors have tried to review, non-exhaustively, research work concerned with the investigation of EMI in ICs and the prediction of this EMI using various modelling methodologies and measurement setups.
Hierarchical Digital Twin of a Naval Power System - Kerry Sado
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
Literature Review Basics and Understanding Reference Management.pptx - Dr Ramhari Poudyal
A three-day training on academic research, focusing on analytical tools, at United Technical College, supported by the University Grants Commission, Nepal. 24-26 May 2024.
6th International Conference on Machine Learning & Applications (CMLA 2024) - ClaraZara1
The 6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in the theory, methodology and applications of Machine Learning.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions - Victor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has been the subject of fewer comprehensive studies and sustainability assessments.
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
Planning of procurement of different goods and services
Strudel: Framework for Transaction Performance Analyses on SQL/NoSQL Systems
1. Strudel: Framework for Transaction Performance Analyses on SQL/NoSQL Systems
Junichi Tatemura, Oliver Po, Zheng Li, Hakan Hacigumus
NEC Labs America, Cupertino, CA, USA
EDBT 2016 @ Bordeaux, France
3. Motivation: The "SQL or NoSQL" Problem (OLTP)
• NoSQL has evolved with so many varieties
• There are also additional components (transaction servers, indexing add-ons, query language layers…)
• What is my best choice? Is SQL still good?
4. Motivation: Vendors and Researchers
• Vendors: "How can we tell our new product is better than others?"
• Researchers: "How can we tell our new transaction management technique is really effective?"
5. Existing Benchmarks
SQL:
• Varieties of application-level benchmarks
• Standard: TPC-C, TPC-W
• OLTP-Bench covers a lot more OLTP use cases
→ not directly applicable to NoSQL systems
NoSQL:
• YCSB is the most popular benchmark
→ it only covers micro-benchmarking without transactions
A common benchmarking platform is desirable, both for micro-level and application-level benchmarks.
6. Strudel Framework: History
We have developed and used the framework for our research and development of transactional key-value subsystems of a product:
• Partiqle: SQL over KVS [SIGMOD 2012 Demo]
• A product version (IERS)
We needed to study and improve the performance of key-value store architectures for transactions. A framework of performance evaluation tools was developed and used, and has been released as open-source software to be used in wider contexts.
8. Strudel’s Approach
Wrap with abstraction layers:
• EntityDB: a data access API covering common features of SQL/NoSQL systems
• SessionWorkload: a framework to separate application logic from data access logic
9. Entity DB: Cover Common Data Access Features
• SQL systems already have a standard Java API (the Java Persistence API, JPA)
• Employ a subset of it and tailor it to fit NoSQL as well
(diagram: the Entity DB API sits on top of both SQL, via JPA, and NoSQL)
10. In Case It Can’t Cover…
Provide an application-level framework to decouple data access logic from application logic:
• The benchmark app talks to the Entity DB API for common data access
• SQL-specific and NoSQL-specific features are pluggable behind the SessionWorkload Framework
12. Architecture
(layer diagram, top to bottom)
• Experiments layer: Performance Experiments and Analyses; Configuration Description Language
• Application layer: [A] benchmark application, with [A] benchmark-application data access components (Entity DB), [A] data access via JPA, and [A, D] data access via NoSQL, plus [D] native implementations
• Data management layer: SessionWorkload Framework; Entity DB API; Java Persistence API (JPA) implementations; Transactional KVS API and [D] TKVS implementations
• Back ends: NoSQL (HBase, MongoDB, …) and SQL (MySQL, DB-X, …)
13. Architecture
Components that are provided by the framework: the SessionWorkload Framework, the Entity DB API, the Configuration Description Language, the JPA implementations, and the Transactional KVS API and implementation.
14. Architecture
Components that should be implemented for each NoSQL system [D]: TKVS implementations, native implementations, and NoSQL-specific data access components.
15. Architecture
Components that should be implemented by each benchmark [A]: the benchmark application itself and its data access components (Entity DB, JPA, and NoSQL-specific).
16. Architecture
Components that should be implemented by each pair of NoSQL system and benchmark [A, D]: NoSQL-specific data access components.
Our goal: minimize the need for such components!
18. JPA vs. EntityDB
JPA (Java Persistence API): an object-relational mapping API.
EntityDB: limitations in DML and transactions.
• JPA – DDL: object-relational mapping annotations; single-entity DML: CRUD; multi-entity DML: one-to-many relationships, etc.; query language: JPQL (Java Persistence QL); transactions: full ACID.
• EntityDB – DDL: subset of JPA + entity group annotations; single-entity DML: CRUD; multi-entity DML: secondary key access; query language: N/A; transactions: entity group transactions.
19. Entity Group Transaction
One way to represent NoSQL’s limited transaction support:
• Entities are divided into disjoint sets (entity groups)
• Transactions within a single group are efficiently supported
• Transactions across multiple groups are expensive or unsupported
(Diagram: transactions T1, T2, T3, each scoped to a single entity group consisting of an item and its bids)
E.g., Google Megastore, Google Cloud Datastore
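The entity-group model above can be sketched in plain Java (class and method names are hypothetical, not Strudel’s actual API): a transaction is a batch of writes that is accepted only when every write targets the same group.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: entities live in disjoint groups, and a
 *  transaction may touch entities of a single group. */
class EntityGroupStore {
    // groupKey -> (primaryKey -> entity value)
    private final Map<String, Map<String, String>> groups = new HashMap<>();

    /** Writes are keyed "groupKey/primaryKey". The whole batch is
     *  rejected if it spans more than one group (cross-group
     *  transactions are treated as unsupported here). */
    public synchronized boolean commit(Map<String, String> writes) {
        String group = null;
        for (String k : writes.keySet()) {
            String g = k.substring(0, k.indexOf('/'));
            if (group == null) group = g;
            else if (!group.equals(g)) return false; // spans two groups
        }
        if (group == null) return false; // empty batch
        Map<String, String> rows = groups.computeIfAbsent(group, x -> new HashMap<>());
        for (Map.Entry<String, String> e : writes.entrySet())
            rows.put(e.getKey().substring(e.getKey().indexOf('/') + 1), e.getValue());
        return true;
    }

    public synchronized String get(String groupKey, String primaryKey) {
        Map<String, String> rows = groups.get(groupKey);
        return rows == null ? null : rows.get(primaryKey);
    }
}
```

A batch of bids on one item commits in a single step, while a batch touching two items is rejected, matching the "efficient within a group, unsupported across groups" behavior above.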
24. Secondary Indices
• JPA: part of the physical design, transparent to the application
• Entity DB: logically required for the application to access entities by secondary keys
26. Implementations
• JPA: trivial implementation
• HBase: open-source version of Bigtable
• Omid: transaction server on HBase
• MongoDB: document-oriented NoSQL
• TokuMX: MongoDB enhancement with multi-statement transactions
27. HBase Implementation
• Use HBase’s check-and-put operation (atomic compare-and-swap) to update a single row in an atomic manner
• Map each group into a single row
– Row ID = group key
– Column = primary key
– Cell = entity
(Diagram: each row holds one entity group, e.g., an item and its bids, with one column per entity)
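A minimal sketch of this approach, with an in-memory map standing in for an HBase table (names are hypothetical, not the actual Strudel code): each group is one row, and an update installs a new row version with a compare-and-swap, mimicking what HBase’s check-and-put provides.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch: group = row, column = primary key, cell = entity.
 *  A group update copies the row, modifies the copy, and swaps it in
 *  only if the row is unchanged (the check-and-put analogue). */
class GroupRowStore {
    private final ConcurrentHashMap<String, Map<String, String>> rows =
            new ConcurrentHashMap<>();

    /** Atomically update one entity within its group, retrying on conflict. */
    public void update(String groupKey, String primaryKey, String entity) {
        while (true) {
            Map<String, String> old = rows.get(groupKey);
            Map<String, String> updated =
                    old == null ? new HashMap<>() : new HashMap<>(old);
            updated.put(primaryKey, entity);
            boolean swapped = (old == null)
                    ? rows.putIfAbsent(groupKey, updated) == null
                    : rows.replace(groupKey, old, updated); // compare-and-swap
            if (swapped) return; // another writer got in first: loop and retry
        }
    }

    public String get(String groupKey, String primaryKey) {
        Map<String, String> row = rows.get(groupKey);
        return row == null ? null : row.get(primaryKey);
    }
}
```

Because the whole group lives in one row, the swap makes a multi-entity update within a group atomic, but it also serializes writers in the same group, a point revisited in Demo 2.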
28. Omid Implementation
• Omid enables optimistic concurrency control over multiple rows in HBase tables using multi-versioning (timestamps)
• The Omid server manages timestamps and transaction states (kept for recovery); the Omid client issues put/get operations to HBase and commit requests to the server
29. MongoDB Implementation
• Similar to HBase: use an atomic query-and-update operation on a single document
• Entity group = one document
30. TokuMX Implementation
• TokuMX enables pessimistic concurrency control (i.e., lock-based) on multiple documents in MongoDB
• Limitation: it only supports a single node
• Application-level sharding with group routing: records in the same group are placed on the same node (no elasticity…)
31. Missing Pieces to Implement
• Mapping entity classes to NoSQL data structures
• Implementing secondary indices
• Auto key generation
Strudel provides a generic implementation (Transactional KVS)
32. Transactional KVS API
• Mapping entities to byte-array key-value objects
• Mapping secondary indices to byte-array key-value objects
• Auto key generation
(Diagram: the generic Transactional KVS implementation sits between the Entity DB API and the per-store TKVS implementations, e.g., HBase. It handles type mapping, auto key generation, and index implementation, translating entity operations (start/commit, create/get/update/delete, get-by-index) into byte[] group/key/value operations (start/commit, put/get/delete). A native implementation can instead target the NoSQL system directly.)
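One way such a mapping could look (the encoding below is an assumption for illustration, not Strudel’s actual one): group key, primary key, and index fields are concatenated into byte-array keys with a separator byte, so a group’s entities and an index’s entries share a common prefix.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch of mapping entities and secondary indices to
 *  byte-array keys for a transactional KVS. The layout is hypothetical. */
class KvMapping {
    /** key = groupKey 0x00 primaryKey: one group's entities share a prefix. */
    public static byte[] entityKey(String groupKey, String primaryKey) {
        return join(groupKey, primaryKey);
    }

    /** An index entry maps (indexName, secondaryKey, primaryKey) to an empty
     *  value, so a get-by-index becomes a prefix scan on (indexName, secondaryKey). */
    public static byte[] indexKey(String indexName, String secondaryKey, String primaryKey) {
        return join(indexName, secondaryKey, primaryKey);
    }

    private static byte[] join(String... parts) {
        int len = parts.length - 1; // one separator byte between parts
        for (String p : parts) len += p.getBytes(StandardCharsets.UTF_8).length;
        byte[] out = new byte[len];
        int pos = 0;
        for (int i = 0; i < parts.length; i++) {
            byte[] b = parts[i].getBytes(StandardCharsets.UTF_8);
            System.arraycopy(b, 0, out, pos, b.length);
            pos += b.length;
            if (i < parts.length - 1) out[pos++] = 0x00; // separator
        }
        return out;
    }
}
```

The entity value would similarly be a serialized byte array; the Entity DB layer hides all of this behind typed create/get/update/delete and get-by-index calls.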
34. SessionWorkload Framework
• A session = interaction with one user
• A state transition model (in an XML document) defines user actions (interactions), e.g., (home), sell item, view items, view bids, store bid
• Each interaction is implemented as a Java class that performs parameter generation, data access, and manipulation of the user’s state parameters
35. User Interaction Implementation
• A base class implements the logic that is not specific to data stores (state manipulation, parameter generation)
• For each data access API (JPA, EntityDB), implement a class that extends the base class with the data access part
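The split can be sketched as follows (class and method names are hypothetical, not Strudel’s API): the base class owns parameter generation and state manipulation, while a subclass supplies only the data access part for its API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative "store bid" interaction: store-agnostic logic in the
 *  base class, data access deferred to one subclass per API. */
abstract class StoreBid {
    /** Data access part: one subclass per API (JPA, EntityDB, ...). */
    protected abstract void insertBid(String itemId, String bidder, int amount);

    /** Store-independent logic: parameter generation + state manipulation. */
    public String execute(Map<String, Object> userState) {
        String itemId = (String) userState.get("currentItem");
        String bidder = (String) userState.get("userId");
        int amount = (Integer) userState.getOrDefault("lastBid", 0) + 10;
        insertBid(itemId, bidder, amount);
        userState.put("lastBid", amount); // update session state parameters
        return "home"; // next state in the transition model
    }
}

/** EntityDB-flavored subclass; here it just records the call where a
 *  real implementation would invoke an EntityDB create(...). */
class StoreBidOnEntityDb extends StoreBid {
    final List<String> log = new ArrayList<>();
    protected void insertBid(String itemId, String bidder, int amount) {
        log.add(itemId + ":" + bidder + ":" + amount);
    }
}
```

The payoff is in the code-reuse numbers later in the talk: only the small `insertBid`-style methods are written per data access API, once per interaction.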
37. Example Benchmarks
Micro-benchmark
• Item types based on user access pattern
– personal, shared, public, and message items
• A set of data access interactions
Application-level benchmark
• Auction benchmark
• Similar to existing SQL benchmarks
– AuctionMark (OLTP-Bench)
– RUBiS
• Customized for entity group transactions
Two data access implementations: EntityDB and JPA
39. XML-based Configuration Description Language
• Lets a document extend (inherit) other template documents (of components) to compose a complex system
• Enhances reproducibility of experiments
• Released separately: https://github.com/tatemura/congenio
(Diagram: an experiment-set document extends XML templates for servers, data stores (HBase, Omid, MongoDB), state transitions, and workload mix, and generates concrete documents for experiment #0, #1, #2, …)
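A hypothetical sketch of such document inheritance (the element names and syntax here are illustrative; see the congenio repository above for the actual language):

```xml
<!-- An experiment document extends component templates and
     overrides only the values that vary per experiment. -->
<experiment extends="templates/base-experiment.xml">
  <dataStore extends="templates/hbase.xml">
    <servers>5</servers>
  </dataStore>
  <workload extends="templates/auction-mix.xml">
    <sessionConcurrency>1600</sessionConcurrency>
  </workload>
</experiment>
```

Because each concrete experiment is generated from versioned templates, the same document set can be re-run later to reproduce results.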
41. Code Reuse:
For Each NoSQL System
Lines of code (LOC) and classes:
• TKVS (common part): 3130 LOC, 36 classes
• HBase: 796 LOC, 6 classes
• Omid: 454 LOC, 4 classes
• MongoDB: 680 LOC, 4 classes
• TokuMX: 507 LOC, 4 classes
The common part is ~3000 LOC; each NoSQL-specific part is only 500–800 LOC.
42. Code Reuse:
For Each Benchmark
LOC (classes):
• Auction: entities 943 (9), parameters 202 (3), base interactions 1346 (17), EntityDB data access 1090 (18), JPA data access 1043 (17)
• Micro: entities 681 (8), parameters 212 (4), base interactions 1004 (19), EntityDB data access 931 (19), JPA data access 985 (19)
Plus XML configuration documents to define the state transitions.
Separation of concerns: implement only the data access part as required, as many small classes as there are interactions.
44. Demo Scenarios
1. Scale-out comparison with simple workloads
2. HBase vs. Omid (transaction server or not)
3. MongoDB vs. TokuMX (concurrency control)
4. SQL vs. NoSQL with application-level workloads
45. Demo 1: Scalability on Simple Workloads
• Transactions without conflict
• Max throughput on different systems with different numbers of servers
– Micro-benchmark: update 4 personal items in the same group (= same user) x 1600 session concurrency
– # servers: NoSQL: 3, 5, 10 / MySQL: 1
47. SQL vs. NoSQL
(Chart: throughput of 1-node MySQL vs. 3-node HBase)
An RDBMS seems efficient even for simple (transactional) put/get workloads.
The winner will depend on other application needs (max throughput, elasticity, availability, budget…)
48. HBase vs. Omid:
Transaction Server or Not
Omid is scalable, but its overhead is significant for simple workloads.
49. Demo 2:
When to Use a Transaction Server?
• [Obvious] when transactions cannot be divided by groups
• [In general] when group granularity is large
Consider a transaction that updates 1 item: the HBase implementation (check-and-update) can only apply updates sequentially within one group.
50. Demo 2:
When to Use a Transaction Server?
• Micro-benchmark: update 1 shared item x 3200 concurrent sessions
• 80K items divided into 200, 2K, or 20K groups
52. Demo 2: Implications
• HBase or Omid depends on application needs
– A combined approach may be ideal, but using these two approaches on the same data is not trivial
• Suggested approach
– Configure the micro-benchmark to mimic the application’s access pattern
– Develop an application-level benchmark for further insights
53. Demo 3: Optimistic vs. Pessimistic
Concurrency Control
• Optimistic CC with MongoDB vs. pessimistic CC with TokuMX
• Micro-benchmark: update 4 items in a (randomly chosen) group (out of 3200 groups) x 3200 concurrent sessions
• [A] No conflict: 400 personal items per group
• [B] Mild conflict: 400 shared items per group
• [C] Heavy conflict: 40 shared items per group
Well-known rule of thumb: “use pessimistic CC when conflict is frequent”
55. What Is Going On?
• The TokuMX version suffers from deadlock
• Deadlock causes failures on conflicting transactions, so there is no progress
– Retrying with back-off is required to proceed
• A simple check-and-update approach (on MongoDB) lets one conflicting transaction succeed, so there is progress
– A transaction can retry more aggressively
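The difference can be illustrated with a small sketch (not benchmark code): under optimistic check-and-update, exactly one conflicting writer succeeds per round, so immediate retries always make progress, with no deadlock to break.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch of an optimistic check-and-update retry loop:
 *  on conflict, some other transaction has succeeded, so this one
 *  simply re-reads and tries again without backing off. */
class OptimisticRetry {
    /** Increment a shared counter via compare-and-set.
     *  Returns the number of attempts this writer needed. */
    public static int increment(AtomicInteger counter) {
        int attempts = 0;
        while (true) {
            attempts++;
            int seen = counter.get();                   // read current value
            if (counter.compareAndSet(seen, seen + 1))  // check-and-update
                return attempts;                        // this writer won the round
            // lost the race: another transaction made progress; retry at once
        }
    }
}
```

A lock-based scheme, by contrast, can deadlock when two transactions acquire locks in opposite orders, and then neither makes progress until one aborts and retries with back-off.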
56. Demo 3: Implications
• A common practice in a loosely coupled distributed environment is to use optimistic CC (non-blocking)
– It seems to hold for our NoSQL transaction case as well
• Pessimistic CC should be used carefully, as a last resort
– In SQL, the RDBMS has more control over how multi-record reads/writes are executed, and it uses more sophisticated lock management; in NoSQL, this is often the application’s responsibility
59. Closer Look at Response Time
• Measure interaction response times when the server is not overloaded (200 concurrent sessions)
• 2 read-write transactions
– sell-auction-item, store-bid
• 3 read-only transactions
– view-auction-items-by-seller, view-bids-by-bidder, view-winning-bids-by-bidder
61. Execution Costs
Costs per interaction for HBase EntityDB (key-value gets) / MySQL EntityDB (single-table SELECT) / MySQL JPA (JOIN):
• Sell-auction-item: 1 row update / 1 row insertion / 1 row insertion
• Store-bid: 3 row updates (secondary index, key generation) / 1 row insertion / 1 row insertion
• View-auction-items-by-seller: get index + get item x N / select items by seller ID / select items by seller ID
• View-bids-by-bidder: get index + get bid x N + get item x N / select bids by bidder ID + get item x N / 2-table join (item and bid)
• View-winning-bids-by-bidder: get index + get bid x N + get item x N / select bids by bidder ID + get item x N / 2-table join with selection
62. Demo 4: Implications
• Distribution does not come for free…
• Applications may need more efficient secondary-key entity retrieval
– Parallelize get operations (generic implementation)
– Explore index implementations specific to a particular NoSQL system (use its specific features)
• The Strudel framework should be useful for testing various solutions
64. Future Extensions: Entity DB API
• Multi-group transactions
• JPA one-to-many relationships
– Retrieve parent-child entities together
– Opportunity for the underlying NoSQL system to map parent-child entities into nested data for better performance
65. Future Extensions:
Implementations
• EntityDB implementation toolkit beyond the generic Transactional KVS
– Various indexing solutions
– Various data mappings (e.g., nesting)
• Native implementations (e.g., HBase)
– EntityDB for HBase
– Auction benchmark for HBase
66. Conclusion
• The SQL-or-NoSQL decision involves various trade-offs specific to an application’s needs
• Performance experiments should be tailored to such specific needs
• Strudel provides a framework to develop, reuse, and share performance experiments