Hybrid Transactional & Analytical Processing (HTAP) is a new breed of database workload offered by NewSQL engines. This talk covers the key engine capabilities required to offer HTAP and the state of the art in PostgreSQL 11 that aligns with an HTAP vision in terms of sharding, fault tolerance, high availability, and replication.
Reactive Programming: A New Asynchronous Database Access API, a possible new Java standard for accessing SQL databases in which user threads never block! - presented by Kuassi Mensah
Datomic – A Modern Database - StampedeCon 2014 - StampedeCon
At StampedeCon 2014, Alex Miller (Cognitect) presented "Datomic – A Modern Database."
Datomic is a distributed database designed to run on next-generation cloud architectures. Datomic stores facts and retractions using a flexible schema, consistent transactions, and a logic-based query language. The focus on facts over time gives you the ability to look at the state of the database at any point in time and traverse your transactional data in many ways.
We’ll take a tour of the Datomic data model, transactions, query language, and architecture to highlight some of the unique attributes of Datomic and why it is an ideal modern database.
TechEvent 2019: Status of the partnership Trivadis and EDB - Comparing PostgreSQL to Oracle, the best kept secrets; Konrad Häfeli, Jan Karremans - Trivadis
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013) - Gabriele Bartolini
Migrating an Oracle database to Postgres is never an automated operation, and it rarely (never?) involves just the database. Experience has led us to develop an agile methodology for the migration process, covering schema migration, data import, migration of procedures and queries, and even the generation of unit tests for QA.
Pitfalls, technologies and the main migration opportunities will be outlined, focusing on reducing the total cost of ownership and management of a database solution over the medium to long term (without compromising quality and business continuity requirements).
50 Shades of Data – how, when and why Big, Relational, NoSQL, Elastic, Graph, Even... - Lucas Jellema
Data has been and will be the key ingredient of enterprise IT. What is changing is the nature, scope and volume of data and the place of data in the IT architecture. Big Data, unstructured data and non-relational data stored on Hadoop, in NoSQL databases, and held in Elasticsearch, caches and message queues complement the data in the enterprise RDBMS. Trends such as microservices that contain their own data, BASE, CQRS and Event Sourcing have changed the way we store, share and govern data. This session introduces patterns, technologies and hypes around storing, processing and retrieving data using products such as Oracle Database, Cassandra, MySQL, Neo4J, Kafka, Redis, Elasticsearch and Hadoop/Spark, locally, in containers and on the cloud. Key takeaway: what an application architect and a developer should know about the various types of data in enterprise IT and how to store, manage, query and manipulate them; what products and technologies are at your disposal; and how you can make these work together for a consistent (enough) overall data presentation.
Migrating ETL Workflow to Apache Spark at Scale in Pinterest - Databricks
Pinterest is moving all batch processing to Apache Spark, which includes a large amount of legacy ETL workflows written in Cascading/Scalding. In this talk, we will share the challenges and solutions we experienced during this migration: the motivation for it, how to bridge the semantic gap between different engines, the difficulty of dealing with Thrift objects widely used at Pinterest, how we improved Spark accumulators, how to tune Spark performance after migration using our innovative Spark profiler, and the performance improvements and cost savings we achieved after the migration.
Alongside all its other features, SQL Server 2016 now natively supports JSON – one of the most common formats for data exchange. SQL Server 2016 has built-in capabilities to query, analyze, exchange and transform JSON data.
The JSON functionality is quite similar to SQL Server's XML support, but despite this being one of the most requested additions to SQL Server 2016, there is a sense of something missing – a native JSON data type.
In this session we will discuss the JSON support features, their limitations, and some tricks to overcome them.
Big Challenges in Data Modeling: NoSQL and Data Modeling - DATAVERSITY
Big Data and NoSQL have led to big changes in the data environment, but are they all in the best interest of data? Are they technologies that "free us from the harsh limitations of relational databases?"
In this month's webinar, we will be answering questions like these, plus:
Have we managed to free organizations from having to do Data Modeling?
Is there a need for a Data Modeler on NoSQL projects?
If we build Data Models, which types will work?
If we build Data Models, how will they be used?
If we build Data Models, when will they be used?
Who will use Data Models?
Where does Data Quality happen?
Finally, we will wrap with 10 tips for data modelers in organizations incorporating NoSQL in their modern Data Architectures.
The slides give an overview of how Spark can be used to tackle Machine learning tasks, such as classification, regression, clustering, etc., at a Big Data scale.
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix - Databricks
Autonomy and ownership are core to working at Stitch Fix, particularly on the Algorithms team. We enable data scientists to deploy and operate their models independently, with minimal need for handoffs or gatekeeping. By writing a simple function and calling out to an intuitive API, data scientists can harness a suite of platform-provided tooling meant to make ML operations easy. In this talk, we will dive into the abstractions the Data Platform team has built to enable this. We will go over the interface data scientists use to specify a model and what that hooks into, including online deployment, batch execution on Spark, and metrics tracking and visualization.
Modularity and Domain Driven Design; a killer Combination? - Tom de Wolf & St... - NLJUG
Applying domain driven design in a modular fashion has implications on how your data is structured and retrieved. A modular domain consists of multiple loosely coupled sub-domains, each having its own modular schema in the database. How can we migrate and evolve the database schemas separately with each new sub-domain version? And how do we match this with reporting and cross-domain use cases, where aggregation of data from multiple sub-domains is essential? A case study concerning an OSGi-based business platform for automotive services has driven us to solve these challenges without sacrificing the hard-worked-on modularity and loose coupling. In this presentation you will learn how we used Modular Domain Driven Design with OSGi. Liquibase is elevated to a first-class citizen in OSGi by extending multiple sub-domains with automatic database migration capabilities. On the other hand, Elasticsearch is integrated in OSGi as a separate search module coordinating cross-domain use cases. This unique combination enabled us to satisfy two important customer requirements. Functionally, the software should not be limited by module boundaries when answering business questions. Non-functionally, a future-proof platform is required in which the impact of change is contained and encapsulated in loosely coupled modules.
QuerySurge Slide Deck for Big Data Testing Webinar - RTTS
This is a slide deck from QuerySurge's Big Data Testing webinar.
Learn why testing is pivotal to the success of your Big Data strategy.
Learn more at www.querysurge.com
The growing variety of new data sources is pushing organizations to look for streamlined ways to manage complexities and get the most out of their data-related investments. The companies that do this correctly are realizing the power of big data for business expansion and growth.
Learn why testing your enterprise's data is pivotal for success with big data, Hadoop and NoSQL. Learn how to increase your testing speed, boost your testing coverage (up to 100%), and improve the level of quality within your data warehouse - all with one ETL testing tool.
This information is geared towards:
- Big Data & Data Warehouse Architects,
- ETL Developers
- ETL Testers, Big Data Testers
- Data Analysts
- Operations teams
- Business Intelligence (BI) Architects
- Data Management Officers & Directors
You will learn how to:
- Improve your Data Quality
- Accelerate your data testing cycles
- Reduce your costs & risks
- Provide a huge ROI (as high as 1,300%)
DocumentDB is a powerful NoSQL solution. It provides elastic scale, high performance, global distribution, a flexible data model, and is fully managed. If you are looking for a scaled OLTP solution that is too much for SQL Server to handle (e.g., millions of transactions per second) and/or will be using JSON documents, DocumentDB is the answer.
Introduction to QuerySurge Webinar
Wednesday, April 29th 2020 @11am ET
Eric Smyth, Director of Alliances
Bill Hayduk, CEO
Matt Moss, Product Manager
This is the slide deck for our webinar. Learn how QuerySurge automates the data validation and testing of Big Data, Data Warehouses, Business Intelligence Reports and Enterprise Applications with full DevOps functionality for continuous testing.
---------------------------------------------------------------------------------
Objective
During this webinar, we demonstrate how QuerySurge solves the following challenges:
- Your need for data quality at speed
- How to automate your ETL testing process
- Your ability to test across your different data platforms
- How to integrate ETL testing into your DataOps pipeline
- How to analyze your data and pinpoint anomalies quickly
-------------------------------------------------------------------------------------
Who should view this?
- ETL Developers /Testers
- Data Architects / Analysts
- DBAs
- BI Developers / Analysts
- IT Architects
- Managers of Data, BI & Analytics groups: CTOs, Directors, Vice Presidents, Project Leads
And anyone else in the Data & Analytics space interested in an automation solution for data validation & testing that also improves data quality.
Introduction to Designing and Building Big Data Applications - Cloudera, Inc.
Learn what the course covers, from capturing data to building a search interface; the spectrum of processing engines, Apache projects, and ecosystem tools available for converged analytics; who is best suited to attend the course and what prior knowledge you should have; and the benefits of building applications with an enterprise data hub.
Microsoft Azure is changing, and its database component (Windows Azure SQL Database) is changing even faster. In this session I would like to show those who have not seen it, and remind those who already know a bit, what WASD is about, what changes have taken place, and what we can expect from this database. For the brave, there will be an opportunity to connect to a cloud account and test these solutions for yourself.
Introduction to SQL Server Analysis services 2008Tobias Koprowski
This is my presentation from the 17th Polish SQL Server User Group meeting in Wroclaw. It is the first part of the Quadrology Business Intelligence for IT Pros cycle.
Webinar - QuerySurge and Azure DevOps in the Azure Cloud - RTTS
Session Overview
------------------------------------------------
During this webinar, we covered the following topics while demonstrating our plug-in for Azure DevOps:
- Installing the QuerySurge Azure DevOps Extension
- Key features of Azure DevOps
- Azure DevOps Pipeline creation
- QuerySurge offerings in the Azure Marketplace
- Virtual machine options in the Azure Cloud
- Azure Cloud versus on-prem deployment options for QuerySurge
And we answered the following questions:
- Is QuerySurge in the Azure Cloud the right solution for my team?
- Where does QuerySurge fit into the Azure DevOps platform?
- What are QuerySurge’s various offerings in the Azure Cloud?
- If QuerySurge in the cloud is not the right choice, what is my best deployment option?
To see a recording of the webinar, go to:
https://www.youtube.com/watch?v=Cd7P_nJOejE
Access Data from XPages with the Relational ControlsTeamstudio
Did you know that Domino and XPages allow for easy access to relational data? These exciting capabilities in the Extension Library can greatly enhance the capabilities of your applications and allow access to information beyond Domino. Howard and Paul will discuss what you need to get started, which controls allow access to relational data, and the new @Functions available to incorporate relational data in your Server Side JavaScript programming.
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services - Amazon Web Services
In this session, we discuss the benefits of NoSQL databases and take a tour of the main NoSQL services offered by AWS—Amazon DynamoDB and Amazon ElastiCache. Then, we hear from two leading customers, Expedia and Mapbox, about their use cases and architectural challenges, and how they addressed them using AWS NoSQL services, including design patterns and best practices. You will walk out of this session having a better understanding of NoSQL and its powerful capabilities, ready to tackle your database challenges with confidence.
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service, that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
MAIA Intelligence was invited to give a technical session on MS-SQL at the Microsoft Dreamspark Yatra 2012 event, in which around 300 budding techies learned about emerging technologies.
Modern Android Application Architecture - Archetype / Stepan Goncharov (90 ... - Ontico
RIT++ 2017, AppsConf
Casablanca Hall, June 6, 11:00
Abstract:
http://appsconf.ru/2017/abstracts/2698.html
Clean Architecture combined with MVP is the most common approach to Android application architecture. But will it suit everyone? Most likely not.
This talk examines an alternative approach called Archetype, based on reactive extensions and several other universal patterns that make it possible to implement technical and business requirements quickly and flexibly.
Low power architecture of logic gates using adiabatic techniques - nooriasukmaningtyas
The growing significance of portable systems to limit power consumption in ultra-large-scale-integration chips of very high density has recently led to rapid and inventive progress in low-power design. The most effective technique in energy-efficient hardware is adiabatic logic circuit design. This paper presents two adiabatic approaches for the design of low-power circuits: modified positive feedback adiabatic logic (modified PFAL) and direct current diode based positive feedback adiabatic logic (DC-DB PFAL). Logic gates are the preliminary components of any digital circuit design; by improving the performance of the basic gates, one can improve the performance of the whole system. In this paper the proposed low-power circuit designs of OR/NOR, AND/NAND, and XOR/XNOR gates are presented using the said approaches, and their results are analyzed for power dissipation, delay, power-delay product and rise time, and compared with other adiabatic techniques along with the conventional complementary metal oxide semiconductor (CMOS) designs reported in the literature. It has been found that the designs with the DC-DB PFAL technique outperform the modified PFAL technique at 10 MHz, with percentage improvements of 65% for the NOR gate, 7% for the NAND gate, and 34% for the XNOR gate.
A review on techniques and modelling methodologies used for checking electrom... - nooriasukmaningtyas
The proper functioning of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from discrete devices to today's integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry, and smart vehicles in particular, are confronting design issues such as proneness to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI, and sensors give misleading values, which can prove fatal in the case of automotives. In this paper, the authors have tried to review, non-exhaustively, research work concerned with the investigation of EMI in ICs and the prediction of this EMI using various modelling methodologies and measurement setups.
Hierarchical Digital Twin of a Naval Power System - Kerry Sado
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
Literature Review Basics and Understanding Reference Management.pptx - Dr Ramhari Poudyal
A three-day training on academic research, focusing on analytical tools, at United Technical College, supported by the University Grants Commission, Nepal. 24-26 May 2024.
6th International Conference on Machine Learning & Applications (CMLA 2024) - ClaraZara1
The 6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in the theory, methodology and applications of Machine Learning.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions - Victor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has been the subject of fewer comprehensive studies and sustainability assessments.
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
Planning of procurement of different goods and services
Strudel: Framework for Transaction Performance Analyses on SQL/NoSQL Systems
1. Strudel: Framework for Transaction Performance Analyses on SQL/NoSQL Systems
Junichi Tatemura, Oliver Po, Zheng Li, Hakan Hacigumus
NEC Labs America, Cupertino, CA, USA
EDBT 2016 @ Bordeaux, France
3. Motivation: The "SQL or NoSQL" Problem (OLTP)
• NoSQL has evolved with so many varieties
• There are also additional components (transaction servers, indexing add-ons, query language layers…)
• What is my best choice? Is SQL still good?
4. Motivation: Vendors and Researchers
• Vendors: "How can we tell our new product is better than others?"
• Researchers: "How can we tell our new transaction management technique is really effective?"
5. Existing Benchmarks
SQL:
• Varieties of application-level benchmarks
• Standard: TPC-C, TPC-W
• OLTP-Bench covers a lot more OLTP use cases
→ not directly applicable to NoSQL systems
NoSQL:
• YCSB is the most popular benchmark
→ it only covers micro-benchmarking without transactions
A common benchmarking platform is desirable, both for micro-level and application-level benchmarks.
6. Strudel Framework: History
We have developed and used the framework for our research and development of transactional key-value subsystems of a product:
• Partiqle: SQL over KVS [SIGMOD 2012 Demo]
• A product version (IERS)
We needed to study and improve the performance of key-value store architectures for transactions. A framework of performance evaluation tools was developed and used, and has been released as open-source software to be used in wider contexts.
8. Strudel’s Approach
Wrap with abstraction layers:
• EntityDB: a data access API covering common features of SQL/NoSQL systems
• SessionWorkload: a framework to separate application logic from data access logic
9. Entity DB: Cover Common Data Access Features
• SQL systems already have a standard Java API (the Java Persistence API, JPA)
• Employ a subset of it and tailor it to fit NoSQL as well
(diagram: the Entity DB API sits on top of both SQL, via JPA, and NoSQL)
10. In Case It Can’t Cover…
Provide an application-level framework to decouple data access logic from application logic:
• The benchmark app talks to the Entity DB API for common data access
• SQL-specific and NoSQL-specific features are pluggable behind the SessionWorkload Framework
12. Architecture
(layer diagram, top to bottom)
• Experiments layer: Performance Experiments and Analyses; Configuration Description Language
• Application layer: [A] benchmark application, with [A] benchmark-application data access components (Entity DB), [A] data access via JPA, and [A, D] data access via NoSQL, plus [D] native implementations
• Data management layer: SessionWorkload Framework; Entity DB API; Java Persistence API (JPA) implementations; Transactional KVS API and [D] TKVS implementations
• Back ends: NoSQL (HBase, MongoDB, …) and SQL (MySQL, DB-X, …)
13. Architecture
Components that are provided by the framework: the SessionWorkload Framework, the Entity DB API, the Configuration Description Language, the JPA implementations, and the Transactional KVS API and implementation.
14. Architecture
Components that should be implemented for each NoSQL system [D]: TKVS implementations, native implementations, and NoSQL-specific data access components.
15. Architecture
Components that should be implemented by each benchmark [A]: the benchmark application itself and its data access components (Entity DB, JPA, and NoSQL-specific).
16. Architecture
Components that should be implemented by each pair of NoSQL system and benchmark [A, D]: NoSQL-specific data access components.
Our goal: minimize the need for such components!
18. JPA vs. EntityDB
JPA (Java Persistence API): an object-relational mapping API.
EntityDB: limitations in DML and transactions.
• JPA – DDL: object-relational mapping annotations; single-entity DML: CRUD; multi-entity DML: one-to-many relationships, etc.; query language: JPQL (Java Persistence QL); transactions: full ACID.
• EntityDB – DDL: subset of JPA + entity group annotations; single-entity DML: CRUD; multi-entity DML: secondary key access; query language: N/A; transactions: entity group transactions.
19. Entity Group Transaction
One way to represent NoSQL’s limited transaction support:
• Entities are divided into disjoint sets (entity groups)
• Transactions within a single group are efficiently supported
• Transactions across multiple groups are expensive or unsupported
(Diagram: transactions T1, T2, T3, each scoped to a single entity group consisting of an item and its bids)
E.g., Google Megastore, Google Cloud Datastore
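The entity-group model above can be sketched in plain Java (class and method names are hypothetical, not Strudel’s actual API): a transaction is a batch of writes that is accepted only when every write targets the same group.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: entities live in disjoint groups, and a
 *  transaction may touch entities of a single group. */
class EntityGroupStore {
    // groupKey -> (primaryKey -> entity value)
    private final Map<String, Map<String, String>> groups = new HashMap<>();

    /** Writes are keyed "groupKey/primaryKey". The whole batch is
     *  rejected if it spans more than one group (cross-group
     *  transactions are treated as unsupported here). */
    public synchronized boolean commit(Map<String, String> writes) {
        String group = null;
        for (String k : writes.keySet()) {
            String g = k.substring(0, k.indexOf('/'));
            if (group == null) group = g;
            else if (!group.equals(g)) return false; // spans two groups
        }
        if (group == null) return false; // empty batch
        Map<String, String> rows = groups.computeIfAbsent(group, x -> new HashMap<>());
        for (Map.Entry<String, String> e : writes.entrySet())
            rows.put(e.getKey().substring(e.getKey().indexOf('/') + 1), e.getValue());
        return true;
    }

    public synchronized String get(String groupKey, String primaryKey) {
        Map<String, String> rows = groups.get(groupKey);
        return rows == null ? null : rows.get(primaryKey);
    }
}
```

A batch of bids on one item commits in a single step, while a batch touching two items is rejected, matching the "efficient within a group, unsupported across groups" behavior above.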
24. Secondary Indices
• JPA: part of the physical design, transparent to the application
• Entity DB: logically required for the application to access entities by secondary keys
26. Implementations
• JPA: trivial implementation
• HBase: open-source version of Bigtable
• Omid: transaction server on HBase
• MongoDB: document-oriented NoSQL
• TokuMX: MongoDB enhancement with multi-statement transactions
27. HBase Implementation
• Use HBase’s check-and-put operation (atomic compare-and-swap) to update a single row in an atomic manner
• Map each group into a single row
– Row ID = group key
– Column = primary key
– Cell = entity
(Diagram: each row holds one entity group, e.g., an item and its bids, with one column per entity)
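A minimal sketch of this approach, with an in-memory map standing in for an HBase table (names are hypothetical, not the actual Strudel code): each group is one row, and an update installs a new row version with a compare-and-swap, mimicking what HBase’s check-and-put provides.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch: group = row, column = primary key, cell = entity.
 *  A group update copies the row, modifies the copy, and swaps it in
 *  only if the row is unchanged (the check-and-put analogue). */
class GroupRowStore {
    private final ConcurrentHashMap<String, Map<String, String>> rows =
            new ConcurrentHashMap<>();

    /** Atomically update one entity within its group, retrying on conflict. */
    public void update(String groupKey, String primaryKey, String entity) {
        while (true) {
            Map<String, String> old = rows.get(groupKey);
            Map<String, String> updated =
                    old == null ? new HashMap<>() : new HashMap<>(old);
            updated.put(primaryKey, entity);
            boolean swapped = (old == null)
                    ? rows.putIfAbsent(groupKey, updated) == null
                    : rows.replace(groupKey, old, updated); // compare-and-swap
            if (swapped) return; // another writer got in first: loop and retry
        }
    }

    public String get(String groupKey, String primaryKey) {
        Map<String, String> row = rows.get(groupKey);
        return row == null ? null : row.get(primaryKey);
    }
}
```

Because the whole group lives in one row, the swap makes a multi-entity update within a group atomic, but it also serializes writers in the same group, a point revisited in Demo 2.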
28. Omid Implementation
• Omid enables optimistic concurrency control over multiple rows in HBase tables using multi-versioning (timestamps)
• The Omid server manages timestamps and transaction states (kept for recovery); the Omid client issues put/get operations to HBase and commit requests to the server
29. MongoDB Implementation
• Similar to HBase: use an atomic query-and-update operation on a single document
• Entity group = one document
30. TokuMX Implementation
• TokuMX enables pessimistic concurrency control (i.e., lock-based) on multiple documents in MongoDB
• Limitation: it only supports a single node
• Application-level sharding with group routing: records in the same group are placed on the same node (no elasticity…)
31. Missing Pieces to Implement
• Mapping entity classes to NoSQL data structures
• Implementing secondary indices
• Auto key generation
Strudel provides a generic implementation (Transactional KVS)
32. Transactional KVS API
• Mapping entities to byte-array key-value objects
• Mapping secondary indices to byte-array key-value objects
• Auto key generation
(Diagram: the generic Transactional KVS implementation sits between the Entity DB API and the per-store TKVS implementations, e.g., HBase. It handles type mapping, auto key generation, and index implementation, translating entity operations (start/commit, create/get/update/delete, get-by-index) into byte[] group/key/value operations (start/commit, put/get/delete). A native implementation can instead target the NoSQL system directly.)
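One way such a mapping could look (the encoding below is an assumption for illustration, not Strudel’s actual one): group key, primary key, and index fields are concatenated into byte-array keys with a separator byte, so a group’s entities and an index’s entries share a common prefix.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch of mapping entities and secondary indices to
 *  byte-array keys for a transactional KVS. The layout is hypothetical. */
class KvMapping {
    /** key = groupKey 0x00 primaryKey: one group's entities share a prefix. */
    public static byte[] entityKey(String groupKey, String primaryKey) {
        return join(groupKey, primaryKey);
    }

    /** An index entry maps (indexName, secondaryKey, primaryKey) to an empty
     *  value, so a get-by-index becomes a prefix scan on (indexName, secondaryKey). */
    public static byte[] indexKey(String indexName, String secondaryKey, String primaryKey) {
        return join(indexName, secondaryKey, primaryKey);
    }

    private static byte[] join(String... parts) {
        int len = parts.length - 1; // one separator byte between parts
        for (String p : parts) len += p.getBytes(StandardCharsets.UTF_8).length;
        byte[] out = new byte[len];
        int pos = 0;
        for (int i = 0; i < parts.length; i++) {
            byte[] b = parts[i].getBytes(StandardCharsets.UTF_8);
            System.arraycopy(b, 0, out, pos, b.length);
            pos += b.length;
            if (i < parts.length - 1) out[pos++] = 0x00; // separator
        }
        return out;
    }
}
```

The entity value would similarly be a serialized byte array; the Entity DB layer hides all of this behind typed create/get/update/delete and get-by-index calls.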
34. SessionWorkload Framework
• A session = interaction with one user
• A state transition model (in an XML document) defines user actions (interactions), e.g., (home), sell item, view items, view bids, store bid
• Each interaction is implemented as a Java class that performs parameter generation, data access, and manipulation of the user’s state parameters
35. User Interaction Implementation
• A base class implements the logic that is not specific to data stores (state manipulation, parameter generation)
• For each data access API (JPA, EntityDB), implement a class that extends the base class with the data access part
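The split can be sketched as follows (class and method names are hypothetical, not Strudel’s API): the base class owns parameter generation and state manipulation, while a subclass supplies only the data access part for its API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative "store bid" interaction: store-agnostic logic in the
 *  base class, data access deferred to one subclass per API. */
abstract class StoreBid {
    /** Data access part: one subclass per API (JPA, EntityDB, ...). */
    protected abstract void insertBid(String itemId, String bidder, int amount);

    /** Store-independent logic: parameter generation + state manipulation. */
    public String execute(Map<String, Object> userState) {
        String itemId = (String) userState.get("currentItem");
        String bidder = (String) userState.get("userId");
        int amount = (Integer) userState.getOrDefault("lastBid", 0) + 10;
        insertBid(itemId, bidder, amount);
        userState.put("lastBid", amount); // update session state parameters
        return "home"; // next state in the transition model
    }
}

/** EntityDB-flavored subclass; here it just records the call where a
 *  real implementation would invoke an EntityDB create(...). */
class StoreBidOnEntityDb extends StoreBid {
    final List<String> log = new ArrayList<>();
    protected void insertBid(String itemId, String bidder, int amount) {
        log.add(itemId + ":" + bidder + ":" + amount);
    }
}
```

The payoff is in the code-reuse numbers later in the talk: only the small `insertBid`-style methods are written per data access API, once per interaction.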
37. Example Benchmarks
Micro-benchmark
• Item types based on user access pattern
– personal, shared, public, and message items
• A set of data access interactions
Application-level benchmark
• Auction benchmark
• Similar to existing SQL benchmarks
– AuctionMark (OLTP-Bench)
– RUBiS
• Customized for entity group transactions
Two data access implementations: EntityDB and JPA
39. XML-based Configuration Description Language
• Lets a document extend (inherit) other template documents (of components) to compose a complex system
• Enhances reproducibility of experiments
• Released separately: https://github.com/tatemura/congenio
(Diagram: an experiment-set document extends XML templates for servers, data stores (HBase, Omid, MongoDB), state transitions, and workload mix, and generates concrete documents for experiment #0, #1, #2, …)
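A hypothetical sketch of such document inheritance (the element names and syntax here are illustrative; see the congenio repository above for the actual language):

```xml
<!-- An experiment document extends component templates and
     overrides only the values that vary per experiment. -->
<experiment extends="templates/base-experiment.xml">
  <dataStore extends="templates/hbase.xml">
    <servers>5</servers>
  </dataStore>
  <workload extends="templates/auction-mix.xml">
    <sessionConcurrency>1600</sessionConcurrency>
  </workload>
</experiment>
```

Because each concrete experiment is generated from versioned templates, the same document set can be re-run later to reproduce results.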
41. Code Reuse:
For Each NoSQL System
Lines of code (LOC) and classes:
• TKVS (common part): 3130 LOC, 36 classes
• HBase: 796 LOC, 6 classes
• Omid: 454 LOC, 4 classes
• MongoDB: 680 LOC, 4 classes
• TokuMX: 507 LOC, 4 classes
The common part is ~3000 LOC; each NoSQL-specific part is only 500–800 LOC.
42. Code Reuse:
For Each Benchmark
LOC (classes):
• Auction: entities 943 (9), parameters 202 (3), base interactions 1346 (17), EntityDB data access 1090 (18), JPA data access 1043 (17)
• Micro: entities 681 (8), parameters 212 (4), base interactions 1004 (19), EntityDB data access 931 (19), JPA data access 985 (19)
Plus XML configuration documents to define the state transitions.
Separation of concerns: implement only the data access part as required, as many small classes as there are interactions.
44. Demo Scenarios
1. Scale-out comparison with simple workloads
2. HBase vs. Omid (transaction server or not)
3. MongoDB vs. TokuMX (concurrency control)
4. SQL vs. NoSQL with application-level workloads
45. Demo 1: Scalability on Simple Workloads
• Transactions without conflict
• Max throughput on different systems with different numbers of servers
– Micro-benchmark: update 4 personal items in the same group (= same user) x 1600 session concurrency
– # servers: NoSQL: 3, 5, 10 / MySQL: 1
47. SQL vs. NoSQL
(Chart: throughput of 1-node MySQL vs. 3-node HBase)
An RDBMS seems efficient even for simple (transactional) put/get workloads.
The winner will depend on other application needs (max throughput, elasticity, availability, budget…)
48. HBase vs. Omid:
Transaction Server or Not
Omid is scalable, but its overhead is significant for simple workloads.
49. Demo 2:
When to Use a Transaction Server?
• [Obvious] when transactions cannot be divided by groups
• [In general] when group granularity is large
Consider a transaction that updates 1 item: the HBase implementation (check-and-update) can only apply updates sequentially within one group.
50. Demo 2:
When to Use a Transaction Server?
• Micro-benchmark: update 1 shared item x 3200 concurrent sessions
• 80K items divided into 200, 2K, or 20K groups
52. Demo 2: Implications
• HBase or Omid depends on application needs
– A combined approach may be ideal, but using these two approaches on the same data is not trivial
• Suggested approach
– Configure the micro-benchmark to mimic the application’s access pattern
– Develop an application-level benchmark for further insights
53. Demo 3: Optimistic vs. Pessimistic
Concurrency Control
• Optimistic CC with MongoDB vs. pessimistic CC with TokuMX
• Micro-benchmark: update 4 items in a (randomly chosen) group (out of 3200 groups) x 3200 concurrent sessions
• [A] No conflict: 400 personal items per group
• [B] Mild conflict: 400 shared items per group
• [C] Heavy conflict: 40 shared items per group
Well-known rule of thumb: “use pessimistic CC when conflict is frequent”
55. What Is Going On?
• The TokuMX version suffers from deadlock
• Deadlock causes failures on conflicting transactions, so there is no progress
– Retrying with back-off is required to proceed
• A simple check-and-update approach (on MongoDB) lets one conflicting transaction succeed, so there is progress
– A transaction can retry more aggressively
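The difference can be illustrated with a small sketch (not benchmark code): under optimistic check-and-update, exactly one conflicting writer succeeds per round, so immediate retries always make progress, with no deadlock to break.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch of an optimistic check-and-update retry loop:
 *  on conflict, some other transaction has succeeded, so this one
 *  simply re-reads and tries again without backing off. */
class OptimisticRetry {
    /** Increment a shared counter via compare-and-set.
     *  Returns the number of attempts this writer needed. */
    public static int increment(AtomicInteger counter) {
        int attempts = 0;
        while (true) {
            attempts++;
            int seen = counter.get();                   // read current value
            if (counter.compareAndSet(seen, seen + 1))  // check-and-update
                return attempts;                        // this writer won the round
            // lost the race: another transaction made progress; retry at once
        }
    }
}
```

A lock-based scheme, by contrast, can deadlock when two transactions acquire locks in opposite orders, and then neither makes progress until one aborts and retries with back-off.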
56. Demo 3: Implications
• A common practice in a loosely coupled distributed environment is to use optimistic CC (non-blocking)
– It seems to hold for our NoSQL transaction case as well
• Pessimistic CC should be used carefully, as a last resort
– In SQL, the RDBMS has more control over how multi-record reads/writes are executed, and it uses more sophisticated lock management; in NoSQL, this is often the application’s responsibility
59. Closer Look at Response Time
• Measure interaction response times when the server is not overloaded (200 concurrent sessions)
• 2 read-write transactions
– sell-auction-item, store-bid
• 3 read-only transactions
– view-auction-items-by-seller, view-bids-by-bidder, view-winning-bids-by-bidder
61. Execution Costs
Costs per interaction for HBase EntityDB (key-value gets) / MySQL EntityDB (single-table SELECT) / MySQL JPA (JOIN):
• Sell-auction-item: 1 row update / 1 row insertion / 1 row insertion
• Store-bid: 3 row updates (secondary index, key generation) / 1 row insertion / 1 row insertion
• View-auction-items-by-seller: get index + get item x N / select items by seller ID / select items by seller ID
• View-bids-by-bidder: get index + get bid x N + get item x N / select bids by bidder ID + get item x N / 2-table join (item and bid)
• View-winning-bids-by-bidder: get index + get bid x N + get item x N / select bids by bidder ID + get item x N / 2-table join with selection
62. Demo 4: Implications
• Distribution does not come for free…
• Applications may need more efficient secondary-key entity retrieval
– Parallelize get operations (generic implementation)
– Explore index implementations specific to a particular NoSQL system (use its specific features)
• The Strudel framework should be useful for testing various solutions
64. Future Extensions: Entity DB API
• Multi-group transactions
• JPA one-to-many relationships
– Retrieve parent-child entities together
– Opportunity for the underlying NoSQL system to map parent-child entities into nested data for better performance
65. Future Extensions:
Implementations
• EntityDB implementation toolkit beyond the generic Transactional KVS
– Various indexing solutions
– Various data mappings (e.g., nesting)
• Native implementations (e.g., HBase)
– EntityDB for HBase
– Auction benchmark for HBase
66. Conclusion
• The SQL-or-NoSQL decision involves various trade-offs specific to an application’s needs
• Performance experiments should be tailored to such specific needs
• Strudel provides a framework to develop, reuse, and share performance experiments