Strudel: Framework for
Transaction Performance
Analyses on
SQL/NoSQL Systems
Junichi Tatemura Oliver Po
Zheng Li Hakan Hacigumus
NEC Labs America
Cupertino, CA, USA
EDBT 2016 @ Bordeaux, France
Outline
We developed a framework of performance
analyses on transactions over SQL/NoSQL
systems
• Motivation
• Architecture
• Implementations
• Demo (use-cases)
© https://www.flickr.com/photos/omefrans/
Now released as OSS called
“Strudel”
https://github.com/tatemura/strudel
Motivation
“SQL or NoSQL” Problem (OLTP)
• NoSQL has evolved with so many varieties
• There are also additional components (transaction
servers, indexing add-ons, query language layers…)
• What is my best choice? Is SQL still good?
OLTP applications
Motivation
Vendors and Researchers
• Vendors: “How can we tell our new product is better
than others?”
• Researchers: “How can we tell our new transaction
management technique is really effective?”
MyNoSQL
“Novel techniques
on….”
Existing Benchmarks
SQL
• Varieties of application-level benchmarks
• Standard: TPC-C, TPC-W
• OLTP-Bench covers a lot more OLTP use cases
→ not directly applicable to NoSQL systems
NoSQL
• YCSB is the most popular benchmark
→ it only covers micro-benchmarking w/o transactions
A common benchmarking platform is desirable for both micro-level and application-level benchmarking
Strudel Framework: History
We have developed and used the framework for our
research and development of transactional key value
subsystems of a product
[Figure: SQL nodes on top of a Key-Value Store]
• Partiqle: SQL over KVS [SIGMOD 2012 Demo] → a product version (IERS)
• We needed to study/improve the performance of a key-value store architecture for transactions
• A framework of performance evaluation tools has been developed and used
• Released as open-source software to be used in wider contexts
Strudel’s Approach
wrap with abstraction layers
© https://www.flickr.com/photos/iaiaross/
→ apples-to-apples comparison
Strudel’s Approach
wrap with abstraction layers
EntityDB: Data access
API to cover common
features of SQL/NoSQL
systems
SessionWorkload:
Framework to separate
application logic and
data access logic
Entity DB: Cover Common Data
Access Features
• SQL systems already have a standard Java API (Java Persistence API)
• Employ its subset and tailor it to fit NoSQL as well
[Figure labels: SQL, NoSQL, Entity DB API, Java Persistence API (JPA)]
In Case It Can’t Cover…
Provide an application-level framework to
decouple data access logic from application logic
[Figure labels: Benchmark app; Data access; Entity DB API; SQL specific features; NoSQL specific features; SessionWorkload Framework (pluggable)]
Architecture
(layers of abstractions)
© https://www.flickr.com/photos/70253321@N00/
Architecture
[Figure: layered architecture diagram; the [A] and [D] tags are as in the original]
• Layers: experiments layer, application layer, data management layer
• Performance Experiments and Analyses
• Configuration Description Language
• [A] Benchmark application
• SessionWorkload Framework
• [A] Benchmark application data access components (Entity DB)
• [A] data access (JPA)
• [A, D] data access (NoSQL)
• Entity DB API
• Transactional KVS Implementation, JPA Implementation
• Transactional KVS API, Java Persistence API (JPA)
• [D] TKVS Implementations, [D] Native Impl.
• NoSQL (HBase, MongoDB,…), SQL (MySQL, DB-X,…)
Architecture
Components that are provided by the framework
[Figure: architecture diagram. Labels: Configuration Description Language; SessionWorkload Framework; Entity DB API; Transactional KVS Implementation; Transactional KVS API; JPA Implementation; Java Persistence API (JPA); NoSQL (HBase, MongoDB,…); SQL (MySQL, DB-X,…); experiments layer, application layer, data management layer]
Architecture
Components that should be implemented for each NoSQL system
[Figure: architecture diagram. Labels: [D] TKVS Implementations; [D] Native Impl.; [A, D] data access (NoSQL); Transactional KVS API; Entity DB API; SessionWorkload Framework; Configuration Description Language; NoSQL (HBase, MongoDB,…); SQL (MySQL, DB-X,…); experiments layer, application layer, data management layer]
Architecture
Components that should be implemented by each benchmark
[Figure: architecture diagram. Labels: [A] Benchmark application; [A] Benchmark application data access components (Entity DB); [A] data access (JPA); [A, D] data access (NoSQL); Entity DB API; SessionWorkload Framework; Configuration Description Language; Java Persistence API (JPA); NoSQL (HBase, MongoDB,…); SQL (MySQL, DB-X,…); experiments layer, application layer, data management layer]
Architecture
Components that should be implemented by each pair of NoSQL system and benchmark
[Figure: architecture diagram. Labels: [A, D] data access (NoSQL); SessionWorkload Framework; Configuration Description Language; NoSQL (HBase, MongoDB,…); SQL (MySQL, DB-X,…); experiments layer, application layer, data management layer]
Our Goal: minimize the need for such components!
EntityDB
© https://www.flickr.com/photos/70253321@N00/
[Figure: architecture diagram excerpt. Labels: Entity DB API; Transactional KVS Implementation; JPA Implementation; Transactional KVS API; Java Persistence API (JPA); [D] TKVS Implementations; [D] Native Impl.; NoSQL; SQL]
JPA vs. EntityDB
| | DDL | DML: Single entity | DML: Multi-entity | DML: Query Language | Transaction |
| JPA | Object-Relational Mapping Annotations | CRUD | One-to-many relationship, etc. | JPQL (Java Persistence QL) | Full ACID transaction |
| Entity DB | Subset of JPA + Entity Group annotations | CRUD | Secondary key access | N/A | Entity Group transaction |
JPA (Java Persistence API): Object-Relational Mapping API
EntityDB: limitation in DML and Transaction
Entity Group Transaction
One way to represent NoSQL’s limited transaction support
• Entities are divided into disjoint sets (entity groups)
• Transactions within a single group are efficiently supported
• Transactions across multiple groups are expensive or unsupported
[Figure: Item 1, Item 2, and Item 3, each with its bids, form separate entity groups; transactions T1, T2, T3]
E.g., Google Megastore, Google Cloud Datastore
OR-Mapping Annotations
JPA Standard: @Entity, @Id, @IdClass,…
→ Used in EntityDB as well
EntityDB Annotations
Group Key Definition
Primary Key @Id
(sellerId, itemNo, bidNo)
Group Key @GroupId
(sellerId, itemNo)
Grouping Relations
AuctionItem
Bid
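The relationship between the primary key and the group key can be sketched as follows. This is a Python illustration only, not the Java annotations that Strudel uses; the `AuctionItemKey` layout and the helper `same_group` are assumptions made for the example, while the Bid keys follow the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuctionItemKey:
    # Assumed key layout for AuctionItem; the slide only spells out the Bid keys.
    seller_id: int
    item_no: int

    def group_key(self):
        return (self.seller_id, self.item_no)


@dataclass(frozen=True)
class BidKey:
    # Primary key (sellerId, itemNo, bidNo), as on the slide.
    seller_id: int
    item_no: int
    bid_no: int

    def group_key(self):
        # Group key (sellerId, itemNo): a prefix of the primary key.
        return (self.seller_id, self.item_no)


def same_group(*keys):
    """Return True when every key maps to the same entity group."""
    return len({key.group_key() for key in keys}) == 1


item = AuctionItemKey(seller_id=7, item_no=1)
bid_a = BidKey(seller_id=7, item_no=1, bid_no=1)
bid_b = BidKey(seller_id=7, item_no=1, bid_no=2)
other_bid = BidKey(seller_id=7, item_no=2, bid_no=1)

print(same_group(item, bid_a, bid_b))  # True
print(same_group(bid_a, other_bid))    # False
```

An item and its bids share one group key, so a transaction touching them stays inside a single entity group; bids on different items do not.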
Secondary Indices
JPA: Physical design – transparent to the application
Entity DB: logically required for the application to access entities by secondary keys
Implementations
© https://www.flickr.com/photos/dinksi/
Implementations
• JPA → trivial implementation
• HBase: Open-source version of Bigtable
• Omid: Transaction Server on HBase
• MongoDB: Document-oriented NoSQL
• TokuMX: MongoDB enhancement with multi-statement transactions
HBase Implementation
• Use HBase’s check-and-put operation (atomic compare-and-swap)
to update a single row in an atomic manner
• Map each group into a single row
– Row ID = Group Key
– Column = Primary Key
– Cell = Entity
[Figure: HBase table with rows ROW1 to ROW3 and columns COL1 to COL8; each row holds one entity group (an item and its bids)]
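The group-per-row mapping with an atomic compare-and-swap can be sketched in memory as below. This is not the HBase client API and not Strudel's code: `InMemoryTable`, the per-row version counter, and the method names are assumptions chosen to show the idea, and the class is a single-threaded stand-in for an operation that the real store performs atomically.

```python
class Row:
    def __init__(self):
        self.version = 0   # hypothetical counter used as the compare-and-swap guard
        self.cells = {}    # column (primary key) -> cell (entity)


class InMemoryTable:
    """Single-threaded stand-in for a table whose row ID is the group key."""

    def __init__(self):
        self.rows = {}

    def read(self, row_id):
        row = self.rows.setdefault(row_id, Row())
        return row.version, dict(row.cells)

    def check_and_put(self, row_id, expected_version, updates):
        # Apply the update only if nobody changed the row since it was read.
        row = self.rows.setdefault(row_id, Row())
        if row.version != expected_version:
            return False
        row.cells.update(updates)
        row.version += 1
        return True


table = InMemoryTable()
group = ("seller7", 1)                      # row ID = group key

seen_by_t1, _ = table.read(group)           # two transactions read the same state
seen_by_t2, _ = table.read(group)

ok1 = table.check_and_put(group, seen_by_t1, {("seller7", 1, 1): {"amount": 10}})
ok2 = table.check_and_put(group, seen_by_t2, {("seller7", 1, 2): {"amount": 12}})
print(ok1, ok2)  # True False
```

The second writer fails because its view of the row is stale; it has to re-read the group and retry.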
Omid Implementation
• Omid enables optimistic concurrency control over
multiple rows in HBase tables using multi-versioning
(timestamp)
[Figure: the Omid client issues put/get on HBase rows (ROW1 to ROW3, holding items and bids) and sends commit to the Omid server, which manages timestamps and transaction states; states are kept for recovery]
MongoDB Implementation
• Similar to HBase: use an atomic query-and-
update operation on a single document
[Figure: documents DOC1 to DOC3, each holding an item and its bids; entity group = one document]
TokuMX Implementation
• TokuMX enables pessimistic (i.e., lock-based) concurrency control on multiple documents in MongoDB
• Limitation: it only supports a single node
→ application-level sharding: records in the same group are placed on the same node (no elasticity…)
[Figure: TokuMX clients use group routing to reach three TokuMX servers, holding DOC1 to DOC3, DOC4 to DOC6, and DOC7 to DOC9]
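Application-level sharding by group can be sketched as a deterministic mapping from group key to node. The slides do not say how routing is computed, so the hash-modulo scheme, the function `route`, and the node names below are assumptions for illustration.

```python
import zlib


def route(group_key, nodes):
    """Pick a node for a group; every record of the group lands on that node."""
    # crc32 is used because Python's built-in hash() of strings varies per process.
    digest = zlib.crc32(repr(group_key).encode("utf-8"))
    return nodes[digest % len(nodes)]


nodes = ["tokumx-0", "tokumx-1", "tokumx-2"]   # hypothetical server names
first = route(("seller7", 1), nodes)
again = route(("seller7", 1), nodes)
print(first == again)  # True
```

With a fixed modulo like this, changing the number of nodes remaps most groups, which is one way to read the "no elasticity" remark.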
Missing Pieces to Implement
• Mapping entity class to NoSQL data structure
• Implementing secondary index
• Auto key generation
Strudel provides a generic implementation
(Transactional KVS)
Transactional KVS API
• Mapping entity to byte-array key-
value objects
• Mapping secondary index to byte-
array key-value objects
• Auto key generation
[Figure: layers from the Entity DB API down to NoSQL (HBase, MongoDB,…)]
• Entity DB API: entity; start/commit; create / get / update / delete; get-by-index
• Transactional KVS Implementation: type mapping, auto-key generation, index implementation
• Transactional KVS API: byte[] group, key, value; start/commit; put/get/delete
• TKVS Implementations: e.g., HBase Implementation
• Alternative path: native data access (NoSQL) with a Native Impl.
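How entities and a secondary index might both be stored as byte-array key-value objects can be sketched as follows. This is an illustration under assumptions, not Strudel's Transactional KVS code: `TinyKVS`, the JSON encoding, the `bid:` key prefix, and the `bidderId` field are all hypothetical.

```python
import json


class TinyKVS:
    """Dict-backed stand-in for a (group, key) -> value store of byte arrays."""

    def __init__(self):
        self.data = {}

    def put(self, group, key, value):
        self.data[(group, key)] = value

    def get(self, group, key):
        return self.data.get((group, key))


def encode(obj):
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def decode(raw):
    return json.loads(raw.decode("utf-8"))


def create_bid(kvs, bid):
    pk = [bid["sellerId"], bid["itemNo"], bid["bidNo"]]
    group = encode(pk[:2])                      # group key = (sellerId, itemNo)
    kvs.put(group, b"bid:" + encode(pk), encode(bid))
    # Index entry: bidder ID -> list of primary keys, kept in its own group.
    idx_group = encode(["bid_by_bidder", bid["bidderId"]])
    raw = kvs.get(idx_group, b"index")
    keys = decode(raw) if raw else []
    keys.append(pk)
    kvs.put(idx_group, b"index", encode(keys))


def bids_by_bidder(kvs, bidder_id):
    # One index read, then one entity read per primary key.
    raw = kvs.get(encode(["bid_by_bidder", bidder_id]), b"index")
    result = []
    for pk in (decode(raw) if raw else []):
        result.append(decode(kvs.get(encode(pk[:2]), b"bid:" + encode(pk))))
    return result


kvs = TinyKVS()
create_bid(kvs, {"sellerId": 1, "itemNo": 1, "bidNo": 1, "bidderId": 5, "amount": 10})
create_bid(kvs, {"sellerId": 1, "itemNo": 2, "bidNo": 1, "bidderId": 6, "amount": 20})
create_bid(kvs, {"sellerId": 2, "itemNo": 1, "bidNo": 1, "bidderId": 5, "amount": 30})
print([b["amount"] for b in bids_by_bidder(kvs, 5)])  # [10, 30]
```

In this sketch the index entry lives in a different group than the entity, so the two writes are not covered by one group transaction; a real implementation has to decide how to handle that.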
SessionWorkload
Framework
© https://www.flickr.com/photos/70253321@N00/
[Figure: architecture diagram excerpt. Labels: SessionWorkload Framework; Benchmark application logic; Data access components (Entity DB); Data access (JPA); Native data access (NoSQL); Entity DB API; JPA; Configuration Description Language; NoSQL (HBase, MongoDB,…); SQL (MySQL, DB-X,…)]
SessionWorkload Framework
• A session = interaction with one user
• State transition model (in XML) to define user actions
(interactions)
• Each interaction is implemented as a Java class
[Figure: a state transition model (XML document) with states (home), Sell item, View items, View bids, and Store bid; each user interaction (Java class) covers parameter generation, data access, and state manipulation on the user's state parameters]
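A session driven by a state transition model can be sketched as a weighted random walk. Strudel defines the model in XML and the interactions as Java classes; the Python dictionary, the edges, and the probabilities below are made up for illustration, and only the state names come from the slide.

```python
import random

# Hypothetical transition table: state -> [(next state, probability)].
TRANSITIONS = {
    "home": [("view_items", 0.5), ("sell_item", 0.3), ("view_bids", 0.2)],
    "view_items": [("store_bid", 0.6), ("home", 0.4)],
    "sell_item": [("home", 1.0)],
    "view_bids": [("home", 1.0)],
    "store_bid": [("view_items", 0.5), ("home", 0.5)],
}


def next_state(state, rng):
    names, weights = zip(*TRANSITIONS[state])
    return rng.choices(names, weights=weights, k=1)[0]


def run_session(steps, seed):
    """Simulate one user session and return the sequence of visited states."""
    rng = random.Random(seed)
    state = "home"
    trace = [state]
    for _ in range(steps):
        state = next_state(state, rng)
        trace.append(state)
    return trace


print(run_session(steps=8, seed=1))
```

Seeding the generator per session keeps a run repeatable, which fits the reproducibility goal stated later in the deck.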
User Interaction Implementation
• A base class that implements logic not specific to
data stores
• For each data access API, implement a class that
extends the base class
[Figure: the Store bid interaction. The base class handles user state parameters and state manipulation; data access is supplied by subclasses for JPA and EntityDB, operating on entities]
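The split between a store-agnostic base class and per-API subclasses can be sketched as below. The class names, the session fields, and the dict-backed "database" are hypothetical; in Strudel the subclasses would target the EntityDB and JPA APIs in Java.

```python
from abc import ABC, abstractmethod


class StoreBidBase(ABC):
    """Store-agnostic part of a hypothetical 'store bid' interaction."""

    def prepare(self, session):
        # Parameter generation from the user's state parameters.
        return {
            "bidder_id": session["user_id"],
            "item": session["current_item"],
            "amount": session["last_amount"] + 1,
        }

    def run(self, session, db):
        params = self.prepare(session)
        self.store(db, params)                      # data access, from a subclass
        session["last_amount"] = params["amount"]   # state manipulation
        return params

    @abstractmethod
    def store(self, db, params):
        """Write the bid through one specific data access API."""


class StoreBidDict(StoreBidBase):
    """Data access against a plain dict, standing in for one data access API."""

    def store(self, db, params):
        db.setdefault(params["item"], []).append((params["bidder_id"], params["amount"]))


session = {"user_id": 42, "current_item": "item-1", "last_amount": 10}
db = {}
StoreBidDict().run(session, db)
print(db)  # {'item-1': [(42, 11)]}
```

Supporting another data store then means adding one more small subclass that overrides `store`, while parameter generation and state handling stay shared.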
Example Benchmarks
© https://www.flickr.com/photos/comunicati/
Example Benchmarks
Micro-benchmark
• Item types based on user
access pattern
– personal, shared, public,
message items
• Set of data access
interactions
Application-level benchmark
• Auction benchmark
• Similar to existing SQL
benchmarks
– AuctionMark (OLTP-Bench)
– RUBiS
• Customized for entity
group transactions
Two data access implementations: EntityDB, JPA
Configuration Description
Language
© https://www.flickr.com/photos/70253321@N00/
NoSQL (HBase, MongoDB,…)
Performance Experiments and Analyses
SQL (MySQL, DB-X,…)
Configuration Description Language
various components
XML-based Configuration
Description Language
• Lets a document extend (inherit) other template documents
(of components) to compose a complex system
• Enhances reproducibility of experiments
• Released separately: https://github.com/tatemura/congenio
[Figure: XML documents for data stores (HBase, Omid, MongoDB), servers, state transitions, and workload mix are extended by an experiment set document, which generates Experiment #0, Experiment #1, Experiment #2]
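The idea of extending template documents to produce a set of experiments can be sketched with nested dictionaries. This does not reproduce congenio's XML syntax or semantics; the `extend` helper and every field name are assumptions for illustration.

```python
import copy


def extend(base, overrides):
    """Return a copy of base with overrides merged in, recursing into nested dicts."""
    result = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = extend(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical template shared by all experiments.
template = {
    "store": "hbase",
    "cluster": {"servers": 3},
    "workload": {"sessions": 1600, "mix": "micro"},
}

# One experiment per server count, each extending the same template.
experiments = [extend(template, {"cluster": {"servers": n}}) for n in (3, 5, 10)]
print([e["cluster"]["servers"] for e in experiments])  # [3, 5, 10]
```

Each experiment records only its difference from the template, so the full setup of a run can be reconstructed from a few small documents.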
Ease of Development
(Code Reuse)
© https://www.flickr.com/photos/stijnnieuwendijk/
Code Reuse: For Each NoSQL System
| | TKVS | HBase | Omid | MongoDB | TokuMX |
| LOC | 3130 | 796 | 454 | 680 | 507 |
| Classes | 36 | 6 | 4 | 4 | 4 |
Lines of code (LOC)
• Common part: ~3000
• NoSQL-specific part: 500~800
[Figure: the Transactional KVS Implementation is shared; the TKVS Implementations and Native Impl. are per NoSQL system]
Code Reuse: For Each Benchmark
| LOC (Class) | Entities | Parameters | Base Interactions | EntityDB Data Access | JPA Data Access |
| Auction | 943 (9) | 202 (3) | 1346 (17) | 1090 (18) | 1043 (17) |
| Micro | 681 (8) | 212 (4) | 1004 (19) | 931 (19) | 985 (19) |
+ XML configuration documents to define state transitions
Separation of concerns: implement only the data access part as required
Small classes, as many as there are interactions
[Figure: Benchmark application logic on the SessionWorkload Framework, with data access (Entity DB), data access (JPA), and data access (NoSQL) over the Entity DB API, JPA, NoSQL (HBase, MongoDB,…), and SQL (MySQL, DB-X,…)]
Demo
(taste of use cases…)
© https://www.flickr.com/photos/26838346@N03/
Demo Scenarios
1. Scale-out comparison with simple workloads
2. HBase vs. Omid (transaction server or not)
3. MongoDB vs. TokuMX (concurrency control)
4. SQL vs. NoSQL with application-level
workloads
Demo 1: Scalability on simple
workloads
• Transactions without conflict
• Max throughput on different systems with
different number of servers
– Micro-benchmark: update 4 personal items in the
same group (= same user) x 1600 session
concurrency
– # servers: NoSQL: 3,5,10 / MySQL: 1
Scalability Results (throughput)
SQL vs. NoSQL
1 Node MySQL vs. 3 Node HBase
RDBMS seems efficient even for simple
(transactional) put/get workloads
Winner will depend on other application needs (max throughput,
elasticity, availability, budget…)
HBase vs. Omid
Transaction Server or not
Omid is scalable but overhead is significant for simple workloads
Demo 2:
When to use a Transaction Server?
• [obvious] when transactions cannot be divided by groups
• [in general] when group granularity is large
[Figure: multiple transactions (TXN) targeting groups]
Consider: a transaction that updates 1 item
The HBase implementation (check-and-update) allows only sequential updates within one group
Demo 2:
When to use a Transaction Server?
• Micro-benchmark: update 1 shared item x 3200 concurrent sessions
• 80K items divided into 200, 2K, 20K groups
[Figure: multiple transactions (TXN) targeting groups]
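The group sizes implied by this setup can be worked out directly. The sketch below reads the item count as 80K and assumes that items and sessions spread evenly over the groups, which the slides do not state; `group_stats` is a hypothetical helper.

```python
ITEMS = 80_000      # total shared items
SESSIONS = 3_200    # concurrent sessions


def group_stats(groups):
    """Items per group and average concurrent sessions per group (even spread assumed)."""
    return ITEMS // groups, SESSIONS / groups


for groups in (200, 2_000, 20_000):
    items_per_group, sessions_per_group = group_stats(groups)
    print(groups, items_per_group, sessions_per_group)
```

Under these assumptions, 200 groups means 400 items and about 16 concurrent sessions per group, while 20K groups means 4 items and well under one session per group, so coarse groups concentrate far more concurrent updates on each group key.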
Group Granularity Results
(Throughput)
HBase: larger concurrency per group key → throughput goes down
Demo 2: Implications
• HBase or Omid depends on application needs
– Combined approach may be ideal – but using
these two approaches on the same data is not
trivial
• Suggested approach
– Configure micro-benchmark to mimic the
application’s access pattern
– Develop application-level benchmark for further
insights
Demo 3: Optimistic vs. Pessimistic
Concurrency Control
• Optimistic CC with MongoDB vs. Pessimistic CC
with TokuMX
• Micro-benchmark: update 4 items in a (randomly
chosen) group (out of 3200 groups) x 3200
concurrent sessions
• [A] no-conflict: 400 personal items per group
• [B] mild-conflict: 400 shared items per group
• [C] heavy-conflict: 40 shared items per group
Well-known rule-of-thumb: “use pessimistic CC when conflict is
frequent”
Transaction Conflict Result
(Throughput)
Light contention → Pessimistic CC wins
Heavy contention → Optimistic CC wins (!)
What is going on?
• TokuMX version suffers from deadlock
• Deadlock causes failure on conflicting transactions → no progress
– It requires retrying with back-off to proceed
• A simple check-and-update approach (on MongoDB) lets one conflicting transaction be successful → progress
– A transaction can retry more aggressively
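The progress argument for check-and-update can be illustrated with a small deterministic simulation. It models only the optimistic side: in every round all pending transactions read the same version, one compare-and-swap succeeds, and the rest retry immediately. The function `run_rounds` and the round-based model are assumptions for illustration and say nothing about how TokuMX handles locks.

```python
def run_rounds(num_txns):
    """Count the rounds needed for num_txns conflicting updates on one group."""
    version = 0
    pending = list(range(num_txns))
    rounds = 0
    while pending:
        rounds += 1
        snapshot = version            # every pending transaction reads this version
        retry = []
        for txn in pending:
            if version == snapshot:   # compare-and-swap succeeds for the first arrival
                version += 1
            else:                     # stale read: retry in the next round
                retry.append(txn)
        pending = retry
    return rounds, version


print(run_rounds(4))  # (4, 4)
```

Exactly one transaction commits per round, so the system always moves forward even when every transaction conflicts, at the cost of serializing updates within the group.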
Demo 3: Implications
• A common practice in a loosely-coupled distributed environment is to use optimistic CC (non-blocking)
– It seems true for our NoSQL transaction case
• Pessimistic CC should be used carefully, as a last resort
– In SQL, the RDBMS has more control over how to execute multi-record read/write. It also uses more sophisticated lock management. In NoSQL, it is often the application’s responsibility
Demo 4: Application-level
performance: SQL or NoSQL
• Auction Benchmark
• Concurrent sessions: 200 → 400 … 3200
– Data scale (#user): 10K → 20K → ... 160K
• # servers: HBase 10, MySQL 1
• 2 MySQL versions: Entity DB, JPA
Auction Benchmark Result
(Throughput)
10 node HBase vs. 1 node MySQL
HBase version is scalable but not very efficient…
Closer Look at Response Time
• Measure interaction response time when a server is not overloaded (200 concurrent sessions)
• 2 read-write transactions
– sell-auction-item, store-bid
• 3 read-only transactions
– view-auction-items-by-seller, view-bids-by-bidder, view-winning-bids-by-bidder
Response Time Result
Big differences in read-only transactions
Execution Costs
| Interaction | HBase EDB | MySQL EDB | MySQL JPA |
| Sell-auction-item | 1 row update | 1 row insertion | 1 row insertion |
| Store-bid | 3 row updates (secondary index, key-generation) | 1 row insertion | 1 row insertion |
| View-auction-items-by-seller | Get index + get item x N | Select item by seller ID | Select item by seller ID |
| View-bids-by-bidder | Get index + get bid x N + get item x N | Select bids by bidder ID + get item x N | 2 table join (item and bid) |
| View-winning-bids-by-bidder | Get index + get bid x N + get item x N | Select bids by bidder ID + get item x N | 2 table join with selection |
Read-only access patterns: HBase EntityDB (key-value gets), MySQL EntityDB (single table SELECT), MySQL JPA (JOIN)
Demo 4: Implications
• Distribution does not come for free…
• Applications may need more efficient
secondary-key entity retrieval
– Parallelize get operations (generic
implementation)
– Explore index implementation specific to a
particular NoSQL system (use its specific feature)
• The Strudel framework should be useful to
test various solutions
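Parallelizing the per-entity get operations behind a secondary-key lookup could look like the sketch below. It uses a thread pool over a plain dictionary lookup as a stand-in for a remote key-value get; `get_all` and the sample data are assumptions, not part of Strudel.

```python
from concurrent.futures import ThreadPoolExecutor


def get_all(get, keys, max_workers=8):
    """Issue get(key) for every key concurrently and return results in key order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(get, keys))


# Hypothetical data: primary keys returned by an index lookup, then fetched in parallel.
store = {("seller1", 1, 1): {"amount": 10}, ("seller2", 4, 2): {"amount": 30}}
keys_from_index = [("seller1", 1, 1), ("seller2", 4, 2)]

print(get_all(store.get, keys_from_index))
```

The "get bid x N + get item x N" steps in the execution cost table are independent reads, so issuing them concurrently reduces the wall-clock cost from N sequential round trips toward that of the slowest single read, provided the store can serve them in parallel.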
Future Extensions
© https://www.flickr.com/photos/fofie57/
Future Extensions: Entity DB API
• Multi-group transactions
• JPA one-to-many relationship
– Retrieve parent-child entities together
– Opportunity for the underlying NoSQL to map
parent-child entities into nested data for better
performance
Future Extensions:
Implementations
• EntityDB Implementation toolkit beyond the
generic Transactional KVS
– Various indexing solutions
– Various data mappings (e.g. nesting)
• Native implementations (e.g., HBase)
– EntityDB for HBase
– Auction benchmark for HBase
Conclusion
• The SQL or NoSQL decision involves various
trade-offs specific to applications’ needs
• Performance experiments should be tailored
for such specific needs
• Strudel provides a framework to develop,
reuse, and share performance experiments
Thank you
© https://www.flickr.com/photos/94110441@N06/
Strudel is open source software:
https://github.com/tatemura/strudel

More Related Content

What's hot

Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Bob Pusateri
 
TechEvent 2019: Status of the partnership Trivadis and EDB - Comparing Postgr...
TechEvent 2019: Status of the partnership Trivadis and EDB - Comparing Postgr...TechEvent 2019: Status of the partnership Trivadis and EDB - Comparing Postgr...
TechEvent 2019: Status of the partnership Trivadis and EDB - Comparing Postgr...
Trivadis
 
Hibernate jj
Hibernate jjHibernate jj
Hibernate jjJoe Jacob
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop Story
Michael Rys
 
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)
Gabriele Bartolini
 
50 Shades of Data – how, when and why Big,Relational,NoSQL,Elastic,Graph,Even...
50 Shades of Data – how, when and why Big,Relational,NoSQL,Elastic,Graph,Even...50 Shades of Data – how, when and why Big,Relational,NoSQL,Elastic,Graph,Even...
50 Shades of Data – how, when and why Big,Relational,NoSQL,Elastic,Graph,Even...
Lucas Jellema
 
Migrating ETL Workflow to Apache Spark at Scale in Pinterest
Migrating ETL Workflow to Apache Spark at Scale in PinterestMigrating ETL Workflow to Apache Spark at Scale in Pinterest
Migrating ETL Workflow to Apache Spark at Scale in Pinterest
Databricks
 
Native JSON Support in SQL2016
Native JSON Support in SQL2016Native JSON Support in SQL2016
Native JSON Support in SQL2016
Ivo Andreev
 
SQL Server 2008 Data Mining
SQL Server 2008 Data MiningSQL Server 2008 Data Mining
SQL Server 2008 Data Mining
llangit
 
Big Challenges in Data Modeling: NoSQL and Data Modeling
Big Challenges in Data Modeling: NoSQL and Data ModelingBig Challenges in Data Modeling: NoSQL and Data Modeling
Big Challenges in Data Modeling: NoSQL and Data Modeling
DATAVERSITY
 
Machine learning with Spark
Machine learning with SparkMachine learning with Spark
Machine learning with Spark
Khalid Salama
 
NewSQL Database Overview
NewSQL Database OverviewNewSQL Database Overview
NewSQL Database OverviewSteve Min
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Nosql data models
Nosql data modelsNosql data models
Nosql data models
Viet-Trung TRAN
 
Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019
Iulian Pintoiu
 
U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)
Michael Rys
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQL
Tony Tam
 
Brk2045 upgrade sql server 2017 (on prem, iaa-s and paas)
Brk2045 upgrade sql server 2017 (on prem, iaa-s and paas)Brk2045 upgrade sql server 2017 (on prem, iaa-s and paas)
Brk2045 upgrade sql server 2017 (on prem, iaa-s and paas)
Bob Ward
 
Modularity and Domain Driven Design; a killer Combination? - Tom de Wolf & St...
Modularity and Domain Driven Design; a killer Combination? - Tom de Wolf & St...Modularity and Domain Driven Design; a killer Combination? - Tom de Wolf & St...
Modularity and Domain Driven Design; a killer Combination? - Tom de Wolf & St...
NLJUG
 

What's hot (20)

Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
 
TechEvent 2019: Status of the partnership Trivadis and EDB - Comparing Postgr...
TechEvent 2019: Status of the partnership Trivadis and EDB - Comparing Postgr...TechEvent 2019: Status of the partnership Trivadis and EDB - Comparing Postgr...
TechEvent 2019: Status of the partnership Trivadis and EDB - Comparing Postgr...
 
Hibernate jj
Hibernate jjHibernate jj
Hibernate jj
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop Story
 
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)
 
50 Shades of Data – how, when and why Big,Relational,NoSQL,Elastic,Graph,Even...
50 Shades of Data – how, when and why Big,Relational,NoSQL,Elastic,Graph,Even...50 Shades of Data – how, when and why Big,Relational,NoSQL,Elastic,Graph,Even...
50 Shades of Data – how, when and why Big,Relational,NoSQL,Elastic,Graph,Even...
 
Migrating ETL Workflow to Apache Spark at Scale in Pinterest
Migrating ETL Workflow to Apache Spark at Scale in PinterestMigrating ETL Workflow to Apache Spark at Scale in Pinterest
Migrating ETL Workflow to Apache Spark at Scale in Pinterest
 
Native JSON Support in SQL2016
Native JSON Support in SQL2016Native JSON Support in SQL2016
Native JSON Support in SQL2016
 
SQL Server 2008 Data Mining
SQL Server 2008 Data MiningSQL Server 2008 Data Mining
SQL Server 2008 Data Mining
 
Big Challenges in Data Modeling: NoSQL and Data Modeling
Big Challenges in Data Modeling: NoSQL and Data ModelingBig Challenges in Data Modeling: NoSQL and Data Modeling
Big Challenges in Data Modeling: NoSQL and Data Modeling
 
Machine learning with Spark
Machine learning with SparkMachine learning with Spark
Machine learning with Spark
 
NewSQL Database Overview
NewSQL Database OverviewNewSQL Database Overview
NewSQL Database Overview
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Nosql data models
Nosql data modelsNosql data models
Nosql data models
 
Oracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_databaseOracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_database
 
Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019
 
U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQL
 
Brk2045 upgrade sql server 2017 (on prem, iaa-s and paas)
Brk2045 upgrade sql server 2017 (on prem, iaa-s and paas)Brk2045 upgrade sql server 2017 (on prem, iaa-s and paas)
Brk2045 upgrade sql server 2017 (on prem, iaa-s and paas)
 
Modularity and Domain Driven Design; a killer Combination? - Tom de Wolf & St...
Modularity and Domain Driven Design; a killer Combination? - Tom de Wolf & St...Modularity and Domain Driven Design; a killer Combination? - Tom de Wolf & St...
Modularity and Domain Driven Design; a killer Combination? - Tom de Wolf & St...
 

Similar to Strudel: Framework for Transaction Performance Analyses on SQL/NoSQL Systems

QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
RTTS
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
James Serra
 
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and SparkVital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital.AI
 
An introduction to QuerySurge webinar
An introduction to QuerySurge webinarAn introduction to QuerySurge webinar
An introduction to QuerySurge webinar
RTTS
 
Microsoft Entity Framework
Microsoft Entity FrameworkMicrosoft Entity Framework
Microsoft Entity Framework
Mahmoud Tolba
 
Introduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data ApplicationsIntroduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data Applications
Cloudera, Inc.
 
44spotkaniePLSSUGWRO_CoNowegowKrainieChmur
44spotkaniePLSSUGWRO_CoNowegowKrainieChmur44spotkaniePLSSUGWRO_CoNowegowKrainieChmur
44spotkaniePLSSUGWRO_CoNowegowKrainieChmur
Tobias Koprowski
 
Introduction to SQL Server Analysis services 2008
Introduction to SQL Server Analysis services 2008Introduction to SQL Server Analysis services 2008
Introduction to SQL Server Analysis services 2008
Tobias Koprowski
 
70487.pdf
70487.pdf70487.pdf
70487.pdf
Karen Benoit
 
Webinar - QuerySurge and Azure DevOps in the Azure Cloud
 Webinar - QuerySurge and Azure DevOps in the Azure Cloud Webinar - QuerySurge and Azure DevOps in the Azure Cloud
Webinar - QuerySurge and Azure DevOps in the Azure Cloud
RTTS
 
What's New in .Net 4.5
What's New in .Net 4.5What's New in .Net 4.5
What's New in .Net 4.5
Malam Team
 
Access Data from XPages with the Relational Controls
Access Data from XPages with the Relational ControlsAccess Data from XPages with the Relational Controls
Access Data from XPages with the Relational Controls
Teamstudio
 
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
Amazon Web Services
 
Entity framework introduction sesion-1
Entity framework introduction   sesion-1Entity framework introduction   sesion-1
Entity framework introduction sesion-1
Usama Nada
 
Nosql why and how on Microsoft Azure
Nosql why and how on Microsoft AzureNosql why and how on Microsoft Azure
Nosql why and how on Microsoft Azure
Vito Flavio Lorusso
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
James Serra
 
Microsoft SQL Server 2012
Microsoft SQL Server 2012 Microsoft SQL Server 2012
Microsoft SQL Server 2012
Dhiren Gala
 
Современная архитектура Android-приложений - Archetype / Степан Гончаров (90 ...
Современная архитектура Android-приложений - Archetype / Степан Гончаров (90 ...Современная архитектура Android-приложений - Archetype / Степан Гончаров (90 ...
Современная архитектура Android-приложений - Archetype / Степан Гончаров (90 ...
Ontico
 
Sterling for Windows Phone 7
Sterling for Windows Phone 7Sterling for Windows Phone 7
Sterling for Windows Phone 7
Jeremy Likness
 

Similar to Strudel: Framework for Transaction Performance Analyses on SQL/NoSQL Systems (20)

QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
 
RavenDB overview
RavenDB overviewRavenDB overview
RavenDB overview
 
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and SparkVital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
 
An introduction to QuerySurge webinar
An introduction to QuerySurge webinarAn introduction to QuerySurge webinar
An introduction to QuerySurge webinar
 
Microsoft Entity Framework
Microsoft Entity FrameworkMicrosoft Entity Framework
Microsoft Entity Framework
 
Introduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data ApplicationsIntroduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data Applications
 
44spotkaniePLSSUGWRO_CoNowegowKrainieChmur
44spotkaniePLSSUGWRO_CoNowegowKrainieChmur44spotkaniePLSSUGWRO_CoNowegowKrainieChmur
44spotkaniePLSSUGWRO_CoNowegowKrainieChmur
 
Introduction to SQL Server Analysis services 2008
Introduction to SQL Server Analysis services 2008Introduction to SQL Server Analysis services 2008
Introduction to SQL Server Analysis services 2008
 
70487.pdf
70487.pdf70487.pdf
70487.pdf
 
Webinar - QuerySurge and Azure DevOps in the Azure Cloud
 Webinar - QuerySurge and Azure DevOps in the Azure Cloud Webinar - QuerySurge and Azure DevOps in the Azure Cloud
Webinar - QuerySurge and Azure DevOps in the Azure Cloud
 
What's New in .Net 4.5
What's New in .Net 4.5What's New in .Net 4.5
What's New in .Net 4.5
 
Access Data from XPages with the Relational Controls
Access Data from XPages with the Relational ControlsAccess Data from XPages with the Relational Controls
Access Data from XPages with the Relational Controls
 
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
 
Entity framework introduction sesion-1
Entity framework introduction   sesion-1Entity framework introduction   sesion-1
Entity framework introduction sesion-1
 
Nosql why and how on Microsoft Azure
Nosql why and how on Microsoft AzureNosql why and how on Microsoft Azure
Nosql why and how on Microsoft Azure
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Microsoft SQL Server 2012
Microsoft SQL Server 2012 Microsoft SQL Server 2012
Microsoft SQL Server 2012
 
Современная архитектура Android-приложений - Archetype / Степан Гончаров (90 ...
Современная архитектура Android-приложений - Archetype / Степан Гончаров (90 ...Современная архитектура Android-приложений - Archetype / Степан Гончаров (90 ...
Современная архитектура Android-приложений - Archetype / Степан Гончаров (90 ...
 
Sterling for Windows Phone 7
Sterling for Windows Phone 7Sterling for Windows Phone 7
Sterling for Windows Phone 7
 

Strudel: Framework for Transaction Performance Analyses on SQL/NoSQL Systems

  • 1. Strudel: Framework for Transaction Performance Analyses on SQL/NoSQL Systems. Junichi Tatemura, Oliver Po, Zheng Li, Hakan Hacigumus. NEC Labs America, Cupertino, CA, USA. EDBT 2016 @ Bordeaux, France
  • 2. Outline: We developed a framework for performance analysis of transactions over SQL/NoSQL systems • Motivation • Architecture • Implementations • Demo (use cases). © https://www.flickr.com/photos/omefrans/ Now released as open-source software called “Strudel”: https://github.com/tatemura/strudel
  • 3. Motivation: the “SQL or NoSQL” Problem (OLTP) • NoSQL has evolved into many varieties • There are also additional components (transaction servers, indexing add-ons, query language layers…) • What is my best choice? Is SQL still good? Diagram label: OLTP applications.
  • 4. Motivation Vendors and Researchers • Vendors: “How can we tell our new product is better than others?” • Researchers: “How can we tell our new transaction management technique is really effective?” MyNoSQL “Novel techniques on….”
  • 5. Existing Benchmarks. SQL: • Varieties of application-level benchmarks • Standard: TPC-C, TPC-W • OLTP-Bench covers many more OLTP use cases → not directly applicable to NoSQL systems. NoSQL: • YCSB is the most popular benchmark → it only covers micro-benchmarking without transactions. A common benchmarking platform is desirable for both micro-level and application-level benchmarking.
  • 6. Strudel Framework: History. We have developed and used the framework for our research and development of transactional key-value subsystems of a product. Partiqle: SQL over KVS [SIGMOD 2012 Demo] → a product version (IERS). We needed to study and improve the performance of the key-value store architecture for transactions. A framework of performance evaluation tools has been developed and used. Released as open-source software to be used in wider contexts.
  • 7. Strudel’s Approach: wrap with abstraction layers → apples-to-apples comparison © https://www.flickr.com/photos/iaiaross/
  • 8. Strudel’s Approach wrap with abstraction layers EntityDB: Data access API to cover common features of SQL/NoSQL systems SessionWorkload: Framework to separate application logic and data access logic
  • 9. Entity DB: Cover Common Data Access Features • SQL systems already have a standard Java API (Java Persistence API) • Employ its subset and tailor it to fit NoSQL as well. Diagram labels: SQL; NoSQL; Entity DB API; Java Persistence API (JPA).
  • 10. In Case It Can’t Cover… Provide an application-level framework to decouple data access logic from application logic. Diagram labels: benchmark app; data access (pluggable); Entity DB API; SQL-specific features; NoSQL-specific features; SessionWorkload Framework.
  • 11. Architecture (layers of abstractions) © https://www.flickr.com/photos/70253321@N00/
  • 12. Architecture. Diagram labels: Transactional KVS Implementation; JPA Implementation; [D] TKVS Implementations; NoSQL (HBase, MongoDB, …); Performance Experiments and Analyses; [A, D] data access (NoSQL); [A] Benchmark application data access components (Entity DB); [A] data access (JPA); [D] Native Impl.; [A] Benchmark application; SQL (MySQL, DB-X, …); Entity DB API; SessionWorkload Framework; Configuration Description Language; Transactional KVS API; Java Persistence API (JPA). Layers: experiments layer, application layer, data management layer.
  • 13. Architecture: components that are provided by the framework. Diagram labels: Transactional KVS Implementation; JPA Implementation; NoSQL (HBase, MongoDB, …); SQL (MySQL, DB-X, …); Entity DB API; SessionWorkload Framework; Configuration Description Language; Transactional KVS API; Java Persistence API (JPA). Layers: experiments layer, application layer, data management layer.
  • 14. Architecture: components that should be implemented for each NoSQL system. Diagram labels: NoSQL (HBase, MongoDB, …); SQL (MySQL, DB-X, …); Entity DB API; SessionWorkload Framework; Configuration Description Language; Transactional KVS API; [D] TKVS Implementations; [A, D] data access (NoSQL); [D] Native Impl. Layers: experiments layer, application layer, data management layer.
  • 15. Architecture: components that should be implemented by each benchmark. Diagram labels: NoSQL (HBase, MongoDB, …); SQL (MySQL, DB-X, …); Entity DB API; SessionWorkload Framework; Configuration Description Language; Java Persistence API (JPA); [A, D] data access (NoSQL); [A] Benchmark application data access components (Entity DB); [A] data access (JPA); [A] Benchmark application. Layers: experiments layer, application layer, data management layer.
  • 16. Architecture: components that should be implemented by each pair of NoSQL system and benchmark. Diagram labels: NoSQL (HBase, MongoDB, …); SQL (MySQL, DB-X, …); SessionWorkload Framework; Configuration Description Language; [A, D] data access (NoSQL). Layers: experiments layer, application layer, data management layer. Our goal: minimize the need for such components!
  • 18. JPA vs. EntityDB. JPA (Java Persistence API) is an object-relational mapping API; EntityDB has limitations in DML and transactions. JPA: DDL = object-relational mapping annotations; DML, single entity = CRUD; DML, multi-entity = one-to-many relationships, etc.; DML, query language = JPQL (Java Persistence QL); transaction = full ACID transaction. Entity DB: DDL = subset of JPA + entity group annotations; DML, single entity = CRUD; DML, multi-entity = secondary key access; DML, query language = N/A; transaction = entity group transaction.
  • 19. Entity Group Transaction: one way to represent NoSQL’s limited transaction support • Entities are divided into disjoint sets (entity groups) • Transactions within a single group are efficiently supported • Transactions across multiple groups are expensive or unsupported. E.g., Google Megastore, Google Cloud Datastore. (Diagram: Item 1, Item 2, and Item 3 with their bids as entity groups; transactions T1, T2, T3.)
  • 20. OR-Mapping Annotations. JPA standard: @Entity, @Id, @IdClass, … → used in EntityDB as well
  • 22. Group Key Definition. Primary key: @Id (sellerId, itemNo, bidNo). Group key: @GroupId (sellerId, itemNo)
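The relationship between the primary key and the group key in the bid example can be sketched in plain Java. This is only an illustration under assumptions: the `BidKey` class and its `groupKey()` and `sameGroupAs()` methods are hypothetical names, not taken from the Strudel source, which declares the same information with the `@Id` and `@GroupId` annotations.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical composite primary key for a bid: (sellerId, itemNo, bidNo).
final class BidKey {
    final int sellerId;
    final int itemNo;
    final int bidNo;

    BidKey(int sellerId, int itemNo, int bidNo) {
        this.sellerId = sellerId;
        this.itemNo = itemNo;
        this.bidNo = bidNo;
    }

    // The group key is the (sellerId, itemNo) prefix of the primary key,
    // so an item and all of its bids fall into the same entity group.
    List<Integer> groupKey() {
        return Arrays.asList(sellerId, itemNo);
    }

    // Two entities can take part in one single-group transaction
    // only when their group keys are equal.
    boolean sameGroupAs(BidKey other) {
        return groupKey().equals(other.groupKey());
    }
}
```

With this layout, bids 1 and 2 on the same item share a group, while a bid on another item of the same seller does not.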
  • 24. Secondary Indices. JPA: a physical design choice, transparent to the application. Entity DB: logically required for the application to access entities by secondary keys
  • 26. Implementations • JPA → trivial implementation • HBase: open-source version of Bigtable • Omid: transaction server on HBase • MongoDB: document-oriented NoSQL • TokuMX: MongoDB enhancement with multi-statement transactions
  • 27. HBase Implementation • Use HBase’s check-and-put operation (atomic compare-and-swap) to update a single row atomically • Map each group into a single row – Row ID = Group Key – Column = Primary Key – Cell = Entity. (Diagram: rows ROW1 to ROW3 with columns COL1 to COL8; a row holds the item and bids of one entity group.)
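The check-and-put idea can be illustrated without an HBase cluster. The sketch below is an in-memory simulation, not the Strudel implementation and not the HBase client API: `GroupRowStore` and `updateGroup` are hypothetical names, and `ConcurrentHashMap.replace(key, old, new)` stands in for the store's atomic compare-and-swap on one row.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// In-memory stand-in for a table whose rows are updated by compare-and-swap.
final class GroupRowStore {
    // Row ID = group key; each row maps a primary key (column) to an entity (cell).
    private final ConcurrentHashMap<String, Map<String, String>> rows = new ConcurrentHashMap<>();

    Map<String, String> read(String groupKey) {
        return rows.getOrDefault(groupKey, Collections.<String, String>emptyMap());
    }

    // Applies `update` to one group atomically: read the row, build a new row,
    // and swap it in only if the row still has the content that was read.
    // Returns the number of attempts that were needed.
    int updateGroup(String groupKey, UnaryOperator<Map<String, String>> update) {
        int attempts = 0;
        while (true) {
            attempts++;
            Map<String, String> before = rows.get(groupKey);
            Map<String, String> copy = before == null
                    ? new HashMap<String, String>()
                    : new HashMap<String, String>(before);
            Map<String, String> after = Collections.unmodifiableMap(update.apply(copy));
            boolean swapped = before == null
                    ? rows.putIfAbsent(groupKey, after) == null
                    : rows.replace(groupKey, before, after);
            if (swapped) {
                return attempts;
            }
            // Another writer changed the row first: loop and retry on fresh data.
        }
    }
}
```

Because every update to a group goes through one swap on one row, concurrent writers to the same group are serialized, which is the behaviour the group granularity experiment later in the deck probes.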
  • 28. Omid Implementation • Omid enables optimistic concurrency control over multiple rows in HBase tables using multi-versioning (timestamps). Diagram labels: rows ROW1 to ROW3 (item, bid); Omid Client (put/get, commit); Omid Server (manages timestamps and transaction states); states for recovery.
  • 29. MongoDB Implementation • Similar to HBase: use an atomic query-and-update operation on a single document. Entity group = one document. (Diagram: documents DOC1 to DOC3 holding items and bids.)
  • 30. TokuMX Implementation • TokuMX enables pessimistic (i.e., lock-based) concurrency control on multiple documents in MongoDB • Limitation: it only supports a single node → application-level sharding: records in the same group are placed on the same node (no elasticity…). (Diagram: TokuMX clients use group routing to reach three TokuMX servers holding DOC1 to DOC3, DOC4 to DOC6, and DOC7 to DOC9.)
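Application-level sharding by group can be sketched as a small router. This is a minimal illustration assuming a fixed node list and hash-based placement; `GroupRouter` and `nodeFor` are hypothetical names, and the slides do not say which placement function Strudel actually uses.

```java
import java.util.List;

// Hypothetical application-level router: every key in the same group is sent
// to the same node, so a multi-document transaction stays on one server.
final class GroupRouter {
    private final List<String> nodes;

    GroupRouter(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = nodes;
    }

    String nodeFor(String groupKey) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(groupKey.hashCode(), nodes.size());
        return nodes.get(index);
    }
}
```

A static mapping like this is simple, but changing the node list moves groups between servers, which is one way to read the "no elasticity" remark on the slide.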
  • 31. Missing Pieces to Implement • Mapping entity classes to NoSQL data structures • Implementing secondary indices • Auto key generation → Strudel provides a generic implementation (Transactional KVS)
  • 32. Transactional KVS API • Mapping entities to byte-array key-value objects • Mapping secondary indices to byte-array key-value objects • Auto key generation. Entity DB API operations on entities: start/commit; create/get/update/delete; get-by-index. Transactional KVS API operations on byte[] group, key, value: start/commit; put/get/delete. The Transactional KVS Implementation provides type mapping, auto-key generation, and index implementation. Diagram labels: NoSQL (HBase, MongoDB, …); native data access (NoSQL); Native Impl.; TKVS Implementations; HBase Implementation.
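The operations listed for the Transactional KVS API (start/commit and put/get/delete over `byte[]` group, key, value) suggest an interface along the following lines. This is a sketch under assumptions: `KvTransaction` and `InMemoryTkvs` are hypothetical names that are not copied from the Strudel code base, and the toy store is single-threaded with no conflict detection or isolation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical shape of a transactional key-value API over byte arrays.
interface KvTransaction {
    byte[] get(byte[] key);
    void put(byte[] key, byte[] value);
    void delete(byte[] key);
    void commit();
}

// Toy single-threaded store: one map per group, writes buffered until commit.
final class InMemoryTkvs {
    // ByteBuffer wrappers give byte[] keys content-based equals/hashCode.
    private final Map<ByteBuffer, Map<ByteBuffer, byte[]>> groups = new HashMap<>();

    // A transaction is scoped to exactly one group, as in entity group transactions.
    KvTransaction start(byte[] group) {
        final Map<ByteBuffer, byte[]> data =
                groups.computeIfAbsent(ByteBuffer.wrap(group.clone()), g -> new HashMap<>());
        final Map<ByteBuffer, byte[]> writes = new HashMap<>(); // null value marks a delete
        return new KvTransaction() {
            public byte[] get(byte[] key) {
                ByteBuffer k = ByteBuffer.wrap(key);
                return writes.containsKey(k) ? writes.get(k) : data.get(k);
            }
            public void put(byte[] key, byte[] value) {
                writes.put(ByteBuffer.wrap(key.clone()), value.clone());
            }
            public void delete(byte[] key) {
                writes.put(ByteBuffer.wrap(key.clone()), null);
            }
            public void commit() {
                for (Map.Entry<ByteBuffer, byte[]> e : writes.entrySet()) {
                    if (e.getValue() == null) {
                        data.remove(e.getKey());
                    } else {
                        data.put(e.getKey(), e.getValue());
                    }
                }
                writes.clear();
            }
        };
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

An entity layer could sit on top of such an interface by serializing entities and index entries into byte arrays, which is the role the slide assigns to the generic Transactional KVS implementation.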
  • 33. SessionWorkload Framework © https://www.flickr.com/photos/70253321@N00/ Diagram labels: NoSQL (HBase, MongoDB, …); SQL (MySQL, DB-X, …); Entity DB API; Configuration Description Language; JPA; native data access (NoSQL); data access components (Entity DB); data access (JPA); benchmark application logic; SessionWorkload Framework.
  • 34. SessionWorkload Framework • A session = interaction with one user • State transition model (in XML) to define user actions (interactions) • Each interaction is implemented as a Java class. Example states (XML document): (home), Sell item, View bids, Store bid, View items. Parts of a user interaction (Java classes): user (state parameters), parameter generation, state manipulation, data access.
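A state transition model of this kind can be sketched in a few lines of Java. Everything here is an assumption made for illustration: the `SessionModel` class, the use of per-transition probabilities, and building the model in code rather than loading it from XML are not taken from the slides.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical state transition model for one user session.
final class SessionModel {
    // state -> (next state -> probability); probabilities per state should sum to 1.
    private final Map<String, Map<String, Double>> transitions = new LinkedHashMap<>();

    SessionModel transition(String from, String to, double probability) {
        transitions.computeIfAbsent(from, s -> new LinkedHashMap<>()).put(to, probability);
        return this;
    }

    // Picks the next interaction by walking the cumulative distribution.
    String next(String current, Random random) {
        Map<String, Double> choices = transitions.get(current);
        if (choices == null || choices.isEmpty()) {
            throw new IllegalStateException("no transition defined from " + current);
        }
        double roll = random.nextDouble();
        double cumulative = 0.0;
        String last = null;
        for (Map.Entry<String, Double> e : choices.entrySet()) {
            cumulative += e.getValue();
            last = e.getKey();
            if (roll < cumulative) {
                return last;
            }
        }
        return last; // guards against rounding when probabilities sum to just under 1
    }
}
```

A workload driver would call `next` once per step of a session and dispatch to the Java class that implements the chosen interaction.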
  • 35. User Interaction Implementation • A base class that implements logic not specific to data stores • For each data access API, implement a class that extends the base class. Diagram labels (Store bid example): user (state parameters); state manipulation; data access; Data access (JPA); Data access (EntityDB); entities; base class.
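The base-class pattern can be sketched as follows. The class and method names (`StoreBidBase`, `storeBid`, `InMemoryStoreBid`) and the session-state keys are hypothetical; an in-memory list stands in for a real JPA or EntityDB data access class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Store-independent part of a hypothetical "store bid" interaction:
// parameter handling and state manipulation live here.
abstract class StoreBidBase {
    final String execute(Map<String, Object> sessionState) {
        int itemNo = (Integer) sessionState.get("itemNo");
        int amount = (Integer) sessionState.get("bidAmount");
        int bidNo = storeBid(itemNo, amount);   // data access, supplied by a subclass
        sessionState.put("lastBidNo", bidNo);   // state manipulation
        return "stored bid " + bidNo + " on item " + itemNo;
    }

    // The only method that depends on the data access API.
    protected abstract int storeBid(int itemNo, int amount);
}

// One subclass per data access API; this one uses an in-memory list as a stand-in.
final class InMemoryStoreBid extends StoreBidBase {
    final List<int[]> bids = new ArrayList<>();

    @Override
    protected int storeBid(int itemNo, int amount) {
        bids.add(new int[] {itemNo, amount});
        return bids.size();
    }
}
```

Supporting another data store then means adding one more subclass per interaction, leaving the base class untouched.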
  • 37. Example Benchmarks Micro-benchmark • Item types based on user access pattern – personal, shared, public, message items • Set of data access interactions Application-level benchmark • Auction benchmark • Similar to existing SQL benchmarks – AuctionMark (OLTP-Bench) – RUBiS • Customized for entity group transactions Two data access implementations: EntityDB, JPA
  • 38. Configuration Description Language © https://www.flickr.com/photos/70253321@N00/ NoSQL (HBase, MongoDB,…) Performance Experiments and Analyses SQL (MySQL, DB-X,…) Configuration Description Language various components
  • 39. XML-based Configuration Description Language • Lets a document extend (inherit) other template documents (of components) to compose a complex system • Enhances reproducibility of experiments • Released separately: https://github.com/tatemura/congenio Diagram labels: XML documents for data stores (HBase, Omid, MongoDB), servers, state transitions, workload mix, and an experiment set; “extend” links between documents; the experiment set generates Experiment #0, Experiment #1, Experiment #2.
  • 40. Ease of Development (Code Reuse) © https://www.flickr.com/photos/stijnnieuwendijk/
  • 41. Code Reuse: For Each NoSQL System. Lines of code (LOC) and classes per component: TKVS: 3130 LOC, 36 classes; HBase: 796 LOC, 6 classes; Omid: 454 LOC, 4 classes; MongoDB: 680 LOC, 4 classes; TokuMX: 507 LOC, 4 classes. Common part: ~3000 LOC; NoSQL-specific part: 500~800 LOC. Diagram labels: Transactional KVS Implementation; NoSQL (HBase, MongoDB, …); native data access (NoSQL); Native Impl.; Entity DB API; Transactional KVS API; TKVS Implementations.
  • 42. Code Reuse: For Each Benchmark. LOC (classes) per part. Auction: Entities 943 (9); Parameters 202 (3); Base Interactions 1346 (17); EntityDB Data Access 1090 (18); JPA Data Access 1043 (17). Micro: Entities 681 (8); Parameters 212 (4); Base Interactions 1004 (19); EntityDB Data Access 931 (19); JPA Data Access 985 (19). Plus XML configuration documents to define state transitions. Separation of concerns: implement only the data access part as required. Small classes, as many as there are interactions. Diagram labels: NoSQL (HBase, MongoDB, …); data access (NoSQL); SQL (MySQL, DB-X, …); Entity DB API; SessionWorkload Framework; JPA; data access (Entity DB); data access (JPA); benchmark application logic.
  • 43. Demo (taste of use cases…) © https://www.flickr.com/photos/26838346@N03/
  • 44. Demo Scenarios 1. Scale-out comparison with simple workloads 2. HBase vs. Omid (transaction server or not) 3. MongoDB vs. TokuMX (concurrency control) 4. SQL vs. NoSQL with application-level workloads
  • 45. Demo 1: Scalability on simple workloads • Transactions without conflict • Max throughput on different systems with different number of servers – Micro-benchmark: update 4 personal items in the same group (= same user) x 1600 session concurrency – # servers: NoSQL: 3,5,10 / MySQL: 1
  • 47. SQL vs. NoSQL (chart labels: 1-node MySQL, 3-node HBase). An RDBMS seems efficient even for simple (transactional) put/get workloads. The winner will depend on other application needs (max throughput, elasticity, availability, budget…)
  • 48. HBase vs. Omid: transaction server or not. Omid is scalable, but its overhead is significant for simple workloads
  • 49. Demo 2: When to use a Transaction Server? • [obvious] when transactions cannot be divided by groups • [in general] when group granularity is large. Consider a transaction that updates 1 item: the HBase implementation (check-and-update) only allows sequential updates within one group
  • 50. Demo 2: When to use a Transaction Server? • Micro-benchmark: update 1 shared item x 3200 concurrent sessions • 80K items divided into 200, 2K, 20K groups
  • 51. Group Granularity Results (Throughput). HBase: larger concurrency per group key → throughput goes down
  • 52. Demo 2: Implications • HBase or Omid depends on application needs – A combined approach may be ideal – but using these two approaches on the same data is not trivial • Suggested approach – Configure the micro-benchmark to mimic the application’s access pattern – Develop an application-level benchmark for further insights
  • 53. Demo 3: Optimistic vs. Pessimistic Concurrency Control • Optimistic CC with MongoDB vs. pessimistic CC with TokuMX • Micro-benchmark: update 4 items in a (randomly chosen) group (out of 3200 groups) x 3200 concurrent sessions • [A] no-conflict: 400 personal items per group • [B] mild-conflict: 400 shared items per group • [C] heavy-conflict: 40 shared items per group. Well-known rule of thumb: “use pessimistic CC when conflict is frequent”
  • 54. Transaction Conflict Result (Throughput). Light contention → pessimistic CC wins. Heavy contention → optimistic CC wins (!)
  • 55. What is going on? • The TokuMX version suffers from deadlock • Deadlock causes failure of conflicting transactions → no progress – It requires retrying with back-off to proceed • A simple check-and-update approach (on MongoDB) lets one conflicting transaction succeed → progress – A transaction can retry more aggressively
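The two retry styles mentioned above differ only in how long a failed transaction waits before trying again. The helper below is a hedged sketch: `Retry`, `withBackoff`, and `ConflictException` are hypothetical names, exponential doubling is an assumed back-off policy, and nothing here reflects how Strudel itself retries.

```java
import java.util.function.Supplier;

// Hypothetical retry helper for transactions that fail on conflict or deadlock.
final class Retry {
    // Signals that the attempt lost a conflict and may be retried.
    static final class ConflictException extends RuntimeException {
        ConflictException(String message) {
            super(message);
        }
    }

    // Runs `txn` up to maxAttempts times. After each conflict it sleeps for the
    // current delay and then doubles it; pass baseDelayMillis = 0 to retry
    // immediately (the "aggressive" style).
    static <T> T withBackoff(Supplier<T> txn, int maxAttempts, long baseDelayMillis) {
        long delay = baseDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return txn.get();
            } catch (ConflictException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the last conflict
                }
                sleep(delay);
                delay *= 2;
            }
        }
    }

    private static void sleep(long millis) {
        if (millis <= 0) {
            return;
        }
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", e);
        }
    }
}
```

With check-and-update, one of the conflicting writers always commits, so immediate retries still make progress; with deadlock-induced failures, every participant may fail, which is why a back-off delay is needed there.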
  • 56. Demo 3: Implications • A common practice in a loosely coupled distributed environment is to use optimistic CC (non-blocking) – It seems true for our NoSQL transaction case • Pessimistic CC should be used carefully, as a last resort – In SQL, the RDBMS has more control over how to execute multi-record reads/writes. It also uses more sophisticated lock management. In NoSQL, it is often the application’s responsibility
  • 57. Demo 4: Application-level performance: SQL or NoSQL • Auction Benchmark • Concurrent sessions: 200 → 400 … 3200 – Data scale (#users): 10K → 20K → ... 160K • # servers: HBase 10, MySQL 1 • 2 MySQL versions: Entity DB, JPA
  • 58. Auction Benchmark Result (Throughput) (chart labels: 10-node HBase, 1-node MySQL). The HBase version is scalable but not very efficient…
  • 59. Closer Look at Response Time • Measure interaction response time when a server is not overloaded (200 concurrent sessions) • 2 read-write transactions – sell-auction-item, store-bid • 3 read-only transactions – view-auction-items-by-seller, view-bids-by-bidder, view-winning-bids-by-bidder
  • 60. Response Time Result: big differences in read-only transactions
  • 61. Execution Costs (HBase EDB / MySQL EDB / MySQL JPA). Sell-auction-item: 1 row update / 1 row insertion / 1 row insertion. Store-bid: 3 row updates (secondary index, key generation) / 1 row insertion / 1 row insertion. View-auction-items-by-seller: get index + get item x N / select item by seller ID / select item by seller ID. View-bids-by-bidder: get index + get bid x N + get item x N / select bids by bidder ID + get item x N / 2-table join (item and bid). View-winning-bids-by-bidder: get index + get bid x N + get item x N / select bids by bidder ID + get item x N / 2-table join with selection. Chart labels: MySQL EntityDB (single-table SELECT); MySQL JPA (JOIN); HBase EntityDB (key-value gets).
  • 62. Demo 4: Implications • Distribution does not come for free… • Applications may need more efficient secondary-key entity retrieval – Parallelize get operations (generic implementation) – Explore index implementation specific to a particular NoSQL system (use its specific feature) • The Strudel framework should be useful to test various solutions
  • 64. Future Extensions: Entity DB API • Multi-group transactions • JPA one-to-many relationship – Retrieve parent-child entities together – Opportunity for the underlying NoSQL to map parent-child entities into nested data for better performance
  • 65. Future Extensions: Implementations • EntityDB implementation toolkit beyond the generic Transactional KVS – Various indexing solutions – Various data mappings (e.g. nesting) • Native implementations (e.g., HBase) – EntityDB for HBase – Auction benchmark for HBase
  • 66. Conclusion • The SQL or NoSQL decision involves various trade-offs specific to applications’ needs • Performance experiments should be tailored to such specific needs • Strudel provides a framework to develop, reuse, and share performance experiments
  • 67. Thank you © https://www.flickr.com/photos/94110441@N06/ Strudel is open source software: https://github.com/tatemura/strudel