CAN THE ELEPHANTS HANDLE
THE NO-SQL ONSLAUGHT?


AUNG THU RHA HEIN
G5537871
OUTLINE
 Introduction
 Background
 Evaluation
    Traditional DSS Workload: Hive vs PDW
    Modern OLTP Workload: MongoDB vs SQL Server
 Discussion & Conclusion
INTRODUCTION

 Motivation
How do the performance and scalability of RDBMS solutions compare
to those of NoSQL systems?
 Proposition
Compare MongoDB (AS/CS) with SQL Server, and Hive with SQL Server PDW,
and analyze performance and scalability on two workloads
(decision support analysis and interactive data-serving).
 Use the TPC-H DSS and YCSB benchmarks respectively
BACKGROUND
 Parallel Data Warehouse (PDW)
    shared-nothing parallel database system built on top of SQL
     Server
    multiple compute nodes, a single control node and other
     administrative service nodes.
 Hive
    an open-source data warehouse built on top of Hadoop
    a structured data model for data that is stored in the Hadoop
     Distributed Filesystem (HDFS), and a SQL-like declarative query
     language called HiveQL
BACKGROUND(CONT.)
 MongoDB
  Features

   a document-oriented storage layer, indexing in the form of B-
    trees, auto-sharding, asynchronous replication of data between
    servers.
   Data stored in collections which contain documents
   Each document is serialized using BSON

  For the evaluation, two types of MongoDB servers were set up:
             MongoDB-CS (with client-side sharding )
             MongoDB-AS (Auto-Sharding)
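The difference between the two configurations can be illustrated from the client's point of view: with client-side sharding the benchmark client itself decides which server holds a key, while with auto-sharding MongoDB routes the request. Below is a minimal Python sketch of the client-side case, assuming hash-based placement. The server names and the `shard_for` helper are hypothetical and are not taken from the slides.

```python
import hashlib

# Hypothetical server list (assumption); the evaluation uses 8 server nodes for YCSB.
SERVERS = [f"server-{i}" for i in range(8)]

def shard_for(key: str, servers=SERVERS) -> str:
    """Pick the server for a key on the client side (assumed hash placement)."""
    # A stable hash keeps the key-to-server mapping identical across clients.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

if __name__ == "__main__":
    for key in ("user1", "user2", "user3"):
        print(key, "->", shard_for(key))
```

Every client computes the same mapping, so no routing or balancer process is involved in this variant.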
EVALUATION
 Hardware and software are configured identically for all four systems
 For PDW and Hive, use 8 disks to store the data
 For YCSB benchmark, 8 nodes are used as servers and another 8 for
  client-benchmarks
Hive and Hadoop
 Use RCFile format to store data
 All TPC-H tables are stored in Gzip-compressed RCFile format
TRADITIONAL DSS WORKLOAD:
HIVE VS PDW
Workload Description
 Use TPC-H at 4 scale factors (250, 1000, 4000 and 16000 GB)
 The TPC-H generator doesn’t produce correct results at the 16000 scale factor
 All 22 TPC-H queries are executed
 The 2 TPC-H refresh functions are left out
TRADITIONAL DSS WORKLOAD:
HIVE VS PDW
Data Layout in
Hive and PDW
TRADITIONAL DSS WORKLOAD:
HIVE VS PDW
Data Preparation and Load Times
Hive
 Generated dataset across 16 nodes
 Create one Hive table for each TPC-H table
 Data is loaded in 2 phases:
    data files are loaded onto each node
    data is converted from text to RCFile format
PDW
 Load data through the landing node
 Create necessary tables
TRADITIONAL DSS WORKLOAD:
HIVE VS PDW
Performance Analysis
TRADITIONAL DSS WORKLOAD:
HIVE VS PDW
Performance Analysis(cont.)
 PDW is faster than Hive for all TPC-H queries
 The average speedup of PDW over Hive is greater for small datasets
     Hive has high overheads for small datasets.
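The speedup figures above are ratios of query response times. A minimal Python sketch of how a per-query speedup and its averages can be computed is shown here; the timings are hypothetical placeholders, not measurements from the evaluation, and the `speedups` helper is an assumed name.

```python
from statistics import geometric_mean, mean

def speedups(hive_secs, pdw_secs):
    """Return the speedup of PDW over Hive for each query (Hive time / PDW time)."""
    return {query: hive_secs[query] / pdw_secs[query] for query in hive_secs}

# Hypothetical response times in seconds (illustration only).
hive = {"Q1": 200.0, "Q5": 900.0, "Q19": 400.0}
pdw = {"Q1": 50.0, "Q5": 100.0, "Q19": 40.0}

per_query = speedups(hive, pdw)
arithmetic_avg = mean(per_query.values())
geometric_avg = geometric_mean(per_query.values())

if __name__ == "__main__":
    print(per_query)
    print("arithmetic mean:", arithmetic_avg)
    print("geometric mean:", geometric_avg)
```

The geometric mean is less sensitive to a single query with a very large ratio, which is why both averages are worth reporting.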

Scalability Analysis
 Hive scales better than PDW
 Hive scales well as the dataset size increases.
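One way to read "scales better" is to compare how much the response time grows with how much the data grows between two scale factors. The Python sketch below does that under stated assumptions: the timings are hypothetical, and `scaling_steps` is an illustrative helper rather than a method taken from the slides.

```python
def scaling_steps(times_by_scale):
    """For each pair of consecutive scale factors, return
    (small, large, data_growth, time_growth)."""
    scales = sorted(times_by_scale)
    steps = []
    for small, large in zip(scales, scales[1:]):
        data_growth = large / small
        time_growth = times_by_scale[large] / times_by_scale[small]
        steps.append((small, large, data_growth, time_growth))
    return steps

# Hypothetical total response times in seconds, keyed by scale factor in GB.
example_times = {250: 100.0, 1000: 250.0, 4000: 900.0}

if __name__ == "__main__":
    for small, large, data_growth, time_growth in scaling_steps(example_times):
        print(f"{small} -> {large} GB: data x{data_growth}, time x{time_growth}")
```

A time growth below the data growth means fixed overheads are being amortized over more data, which matches the observation that Hive has high overheads on small datasets.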
MODERN OLTP WORKLOAD:
MONGODB VS SQL SERVER
Workload description




YCSB is extended in 2 ways:
 added support for multiple instances on many database servers
 added support for stored procedures in the YCSB JDBC driver
The YCSB benchmark is run on a database that consists of 640 million records
MODERN OLTP WORKLOAD:
MONGODB VS SQL SERVER
Data Preparation
 Mongo-AS can automatically manage the shards by using a
  “balancer” process
 The loading times for SQL-CS and Mongo-CS were 146 and 45
   minutes respectively
 The SQL load takes longer because a bulk insert method was not
  used
MODERN OLTP WORKLOAD:
MONGODB VS SQL SERVER
Experimental Evaluation


“Read-Only” workload
MODERN OLTP WORKLOAD:
MONGODB VS SQL SERVER

95% Read
5% Update Workload
MODERN OLTP WORKLOAD:
MONGODB VS SQL SERVER

50% Read &
50% Update workload
MODERN OLTP WORKLOAD:
MONGODB VS SQL SERVER

95% Read
5% Append Workload
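The four workloads above differ only in their mix of operations. The Python sketch below shows how a client could draw operations according to such a mix. It is an illustration, not YCSB code: the workload names, the `WORKLOADS` table and the `next_operations` helper are assumptions made for this example.

```python
import random

# Operation mixes from the four workloads above; the dictionary keys are hypothetical names.
WORKLOADS = {
    "read_only": {"read": 1.0},
    "read_mostly": {"read": 0.95, "update": 0.05},
    "read_update": {"read": 0.50, "update": 0.50},
    "read_append": {"read": 0.95, "append": 0.05},
}

def next_operations(mix, count, seed=0):
    """Draw `count` operation names at random, weighted by the given mix."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=count)

if __name__ == "__main__":
    for name, mix in WORKLOADS.items():
        ops = next_operations(mix, 1000)
        print(name, {op: ops.count(op) for op in mix})
```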
DISCUSSION & CONCLUSION
 This evaluation shows that NoSQL systems are still behind RDBMS in
  performance.
 PDW is also 9 times faster than Hive running TPC-H at 16TB scale
   SQL-CS was able to achieve higher throughput than MongoDB
AUTHORS
 Avrilia Floratou
University of Wisconsin-Madison
 Nikhil Teletia
Microsoft Jim Gray Systems Lab
 David J. DeWitt
Microsoft Jim Gray Systems Lab
 Jignesh M. Patel
University of Wisconsin-Madison
 Donghui Zhang
Paradigm4

More Related Content

What's hot

HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
HBaseCon
 
HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ Flipboard
Matthew Blair
 
Data- How Does It Work-
Data- How Does It Work-Data- How Does It Work-
Data- How Does It Work-Boyang Niu
 
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC timeHBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
Michael Stack
 
RubiX
RubiXRubiX
Alluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudAlluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the Cloud
Shubham Tagra
 
MySQL Live Migration - Common Scenarios
MySQL Live Migration - Common ScenariosMySQL Live Migration - Common Scenarios
MySQL Live Migration - Common Scenarios
Mydbops
 
Hadoop Architecture in Depth
Hadoop Architecture in DepthHadoop Architecture in Depth
Hadoop Architecture in Depth
Syed Hadoop
 
Effectively deploying hadoop to the cloud
Effectively  deploying hadoop to the cloudEffectively  deploying hadoop to the cloud
Effectively deploying hadoop to the cloud
Avinash Ramineni
 
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public CloudUsing S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Databricks
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
kbajda
 
Introduction to NoSql
Introduction to NoSqlIntroduction to NoSql
Introduction to NoSql
Omid Vahdaty
 
Improve Presto Architectural Decisions with Shadow Cache
 Improve Presto Architectural Decisions with Shadow Cache Improve Presto Architectural Decisions with Shadow Cache
Improve Presto Architectural Decisions with Shadow Cache
Alluxio, Inc.
 
Hybrid collaborative tiered storage with alluxio
Hybrid collaborative tiered storage with alluxioHybrid collaborative tiered storage with alluxio
Hybrid collaborative tiered storage with alluxio
Thai Bui
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
TO THE NEW | Technology
 
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio, Inc.
 
The Do’s and Don’ts of Benchmarking Databases
The Do’s and Don’ts of Benchmarking DatabasesThe Do’s and Don’ts of Benchmarking Databases
The Do’s and Don’ts of Benchmarking Databases
ScyllaDB
 
Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with Alluxio
Alluxio, Inc.
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case
Kai Sasaki
 
Mongo presentation conf
Mongo presentation confMongo presentation conf
Mongo presentation confShridhar Joshi
 

What's hot (20)

HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
 
HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ Flipboard
 
Data- How Does It Work-
Data- How Does It Work-Data- How Does It Work-
Data- How Does It Work-
 
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC timeHBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
 
RubiX
RubiXRubiX
RubiX
 
Alluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudAlluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the Cloud
 
MySQL Live Migration - Common Scenarios
MySQL Live Migration - Common ScenariosMySQL Live Migration - Common Scenarios
MySQL Live Migration - Common Scenarios
 
Hadoop Architecture in Depth
Hadoop Architecture in DepthHadoop Architecture in Depth
Hadoop Architecture in Depth
 
Effectively deploying hadoop to the cloud
Effectively  deploying hadoop to the cloudEffectively  deploying hadoop to the cloud
Effectively deploying hadoop to the cloud
 
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public CloudUsing S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
 
Introduction to NoSql
Introduction to NoSqlIntroduction to NoSql
Introduction to NoSql
 
Improve Presto Architectural Decisions with Shadow Cache
 Improve Presto Architectural Decisions with Shadow Cache Improve Presto Architectural Decisions with Shadow Cache
Improve Presto Architectural Decisions with Shadow Cache
 
Hybrid collaborative tiered storage with alluxio
Hybrid collaborative tiered storage with alluxioHybrid collaborative tiered storage with alluxio
Hybrid collaborative tiered storage with alluxio
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
 
The Do’s and Don’ts of Benchmarking Databases
The Do’s and Don’ts of Benchmarking DatabasesThe Do’s and Don’ts of Benchmarking Databases
The Do’s and Don’ts of Benchmarking Databases
 
Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with Alluxio
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case
 
Mongo presentation conf
Mongo presentation confMongo presentation conf
Mongo presentation conf
 

Similar to Can the elephants handle the no sql onslaught

It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?Srihari Srinivasan
 
Why no sql ? Why Couchbase ?
Why no sql ? Why Couchbase ?Why no sql ? Why Couchbase ?
Why no sql ? Why Couchbase ?
Ahmed Rashwan
 
Hive
HiveHive
Apache drill
Apache drillApache drill
Apache drill
Jakub Pieprzyk
 
Benchmarking Hadoop and Big Data
Benchmarking Hadoop and Big DataBenchmarking Hadoop and Big Data
Benchmarking Hadoop and Big Data
Nicolas Poggi
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft Platform
Andrew Brust
 
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stackBig Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Andrew Brust
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017
Vinoth Chandar
 
Hive with HDInsight
Hive with HDInsightHive with HDInsight
Hive with HDInsight
Khalid Salama
 
Introductive to Hive
Introductive to Hive Introductive to Hive
Introductive to Hive
Rupak Roy
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight Service
Neil Mackenzie
 
מיכאל
מיכאלמיכאל
מיכאל
sqlserver.co.il
 
Deploying Apache Spark and testing big data applications on servers powered b...
Deploying Apache Spark and testing big data applications on servers powered b...Deploying Apache Spark and testing big data applications on servers powered b...
Deploying Apache Spark and testing big data applications on servers powered b...
Principled Technologies
 
Hive_Pig.pptx
Hive_Pig.pptxHive_Pig.pptx
Hive_Pig.pptx
PAVANKUMARNOOKALA
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
MLconf
 
Nosql Introduction, Basics
Nosql Introduction, BasicsNosql Introduction, Basics
Nosql Introduction, Basics
Camellia Ghoroghi
 
Real-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to HadoopReal-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to Hadoop
Continuent
 
OSDC 2015: John Spray | The Ceph Storage System
OSDC 2015: John Spray | The Ceph Storage SystemOSDC 2015: John Spray | The Ceph Storage System
OSDC 2015: John Spray | The Ceph Storage System
NETWAYS
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
datastack
 

Similar to Can the elephants handle the no sql onslaught (20)

It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?
 
Why no sql ? Why Couchbase ?
Why no sql ? Why Couchbase ?Why no sql ? Why Couchbase ?
Why no sql ? Why Couchbase ?
 
Hive
HiveHive
Hive
 
Apache drill
Apache drillApache drill
Apache drill
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
Benchmarking Hadoop and Big Data
Benchmarking Hadoop and Big DataBenchmarking Hadoop and Big Data
Benchmarking Hadoop and Big Data
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft Platform
 
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stackBig Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017
 
Hive with HDInsight
Hive with HDInsightHive with HDInsight
Hive with HDInsight
 
Introductive to Hive
Introductive to Hive Introductive to Hive
Introductive to Hive
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight Service
 
מיכאל
מיכאלמיכאל
מיכאל
 
Deploying Apache Spark and testing big data applications on servers powered b...
Deploying Apache Spark and testing big data applications on servers powered b...Deploying Apache Spark and testing big data applications on servers powered b...
Deploying Apache Spark and testing big data applications on servers powered b...
 
Hive_Pig.pptx
Hive_Pig.pptxHive_Pig.pptx
Hive_Pig.pptx
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
 
Nosql Introduction, Basics
Nosql Introduction, BasicsNosql Introduction, Basics
Nosql Introduction, Basics
 
Real-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to HadoopReal-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to Hadoop
 
OSDC 2015: John Spray | The Ceph Storage System
OSDC 2015: John Spray | The Ceph Storage SystemOSDC 2015: John Spray | The Ceph Storage System
OSDC 2015: John Spray | The Ceph Storage System
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
 

More from Aung Thu Rha Hein

Writing with ease
Writing with easeWriting with ease
Writing with ease
Aung Thu Rha Hein
 
Bioinformatics for Computer Scientists
Bioinformatics for Computer Scientists Bioinformatics for Computer Scientists
Bioinformatics for Computer Scientists
Aung Thu Rha Hein
 
Analysis of hybrid image with FFT (Fast Fourier Transform)
Analysis of hybrid image with FFT (Fast Fourier Transform)Analysis of hybrid image with FFT (Fast Fourier Transform)
Analysis of hybrid image with FFT (Fast Fourier Transform)
Aung Thu Rha Hein
 
Introduction to Common Weakness Enumeration (CWE)
Introduction to Common Weakness Enumeration (CWE)Introduction to Common Weakness Enumeration (CWE)
Introduction to Common Weakness Enumeration (CWE)
Aung Thu Rha Hein
 
Private Browsing: A Window of Forensic Opportunity
Private Browsing: A Window of Forensic OpportunityPrivate Browsing: A Window of Forensic Opportunity
Private Browsing: A Window of Forensic Opportunity
Aung Thu Rha Hein
 
Network switching
Network switchingNetwork switching
Network switching
Aung Thu Rha Hein
 
Digital Forensic: Brief Intro & Research Challenge
Digital Forensic: Brief Intro & Research ChallengeDigital Forensic: Brief Intro & Research Challenge
Digital Forensic: Brief Intro & Research Challenge
Aung Thu Rha Hein
 
Survey & Review of Digital Forensic
Survey & Review of Digital ForensicSurvey & Review of Digital Forensic
Survey & Review of Digital Forensic
Aung Thu Rha Hein
 
Partitioned Based Regression Verification
Partitioned Based Regression VerificationPartitioned Based Regression Verification
Partitioned Based Regression Verification
Aung Thu Rha Hein
 
CRAXweb: Automatic Exploit Generation for Web Applications
CRAXweb: Automatic Exploit Generation for Web ApplicationsCRAXweb: Automatic Exploit Generation for Web Applications
CRAXweb: Automatic Exploit Generation for Web Applications
Aung Thu Rha Hein
 
Botnets 101
Botnets 101Botnets 101
Botnets 101
Aung Thu Rha Hein
 
Session initiation protocol
Session initiation protocolSession initiation protocol
Session initiation protocol
Aung Thu Rha Hein
 
TPC-H in MongoDB
TPC-H in MongoDBTPC-H in MongoDB
TPC-H in MongoDB
Aung Thu Rha Hein
 
Web application security: Threats & Countermeasures
Web application security: Threats & CountermeasuresWeb application security: Threats & Countermeasures
Web application security: Threats & CountermeasuresAung Thu Rha Hein
 
Cloud computing security
Cloud computing securityCloud computing security
Cloud computing security
Aung Thu Rha Hein
 
Fuzzy logic based students’ learning assessment
Fuzzy logic based students’ learning assessmentFuzzy logic based students’ learning assessment
Fuzzy logic based students’ learning assessment
Aung Thu Rha Hein
 
Link state routing protocol
Link state routing protocolLink state routing protocol
Link state routing protocol
Aung Thu Rha Hein
 
Chat bot analysis
Chat bot analysisChat bot analysis
Chat bot analysis
Aung Thu Rha Hein
 
Data mining & column stores
Data mining & column storesData mining & column stores
Data mining & column stores
Aung Thu Rha Hein
 

More from Aung Thu Rha Hein (19)

Writing with ease
Writing with easeWriting with ease
Writing with ease
 
Bioinformatics for Computer Scientists
Bioinformatics for Computer Scientists Bioinformatics for Computer Scientists
Bioinformatics for Computer Scientists
 
Analysis of hybrid image with FFT (Fast Fourier Transform)
Analysis of hybrid image with FFT (Fast Fourier Transform)Analysis of hybrid image with FFT (Fast Fourier Transform)
Analysis of hybrid image with FFT (Fast Fourier Transform)
 
Introduction to Common Weakness Enumeration (CWE)
Introduction to Common Weakness Enumeration (CWE)Introduction to Common Weakness Enumeration (CWE)
Introduction to Common Weakness Enumeration (CWE)
 
Private Browsing: A Window of Forensic Opportunity
Private Browsing: A Window of Forensic OpportunityPrivate Browsing: A Window of Forensic Opportunity
Private Browsing: A Window of Forensic Opportunity
 
Network switching
Network switchingNetwork switching
Network switching
 
Digital Forensic: Brief Intro & Research Challenge
Digital Forensic: Brief Intro & Research ChallengeDigital Forensic: Brief Intro & Research Challenge
Digital Forensic: Brief Intro & Research Challenge
 
Survey & Review of Digital Forensic
Survey & Review of Digital ForensicSurvey & Review of Digital Forensic
Survey & Review of Digital Forensic
 
Partitioned Based Regression Verification
Partitioned Based Regression VerificationPartitioned Based Regression Verification
Partitioned Based Regression Verification
 
CRAXweb: Automatic Exploit Generation for Web Applications
CRAXweb: Automatic Exploit Generation for Web ApplicationsCRAXweb: Automatic Exploit Generation for Web Applications
CRAXweb: Automatic Exploit Generation for Web Applications
 
Botnets 101
Botnets 101Botnets 101
Botnets 101
 
Session initiation protocol
Session initiation protocolSession initiation protocol
Session initiation protocol
 
TPC-H in MongoDB
TPC-H in MongoDBTPC-H in MongoDB
TPC-H in MongoDB
 
Web application security: Threats & Countermeasures
Web application security: Threats & CountermeasuresWeb application security: Threats & Countermeasures
Web application security: Threats & Countermeasures
 
Cloud computing security
Cloud computing securityCloud computing security
Cloud computing security
 
Fuzzy logic based students’ learning assessment
Fuzzy logic based students’ learning assessmentFuzzy logic based students’ learning assessment
Fuzzy logic based students’ learning assessment
 
Link state routing protocol
Link state routing protocolLink state routing protocol
Link state routing protocol
 
Chat bot analysis
Chat bot analysisChat bot analysis
Chat bot analysis
 
Data mining & column stores
Data mining & column storesData mining & column stores
Data mining & column stores
 

Recently uploaded

Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 

Recently uploaded (20)

Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
20240607 QFM018 Elixir Reading List May 2024

Can the elephants handle the no sql onslaught

  • 1. CAN THE ELEPHANTS HANDLE THE NO-SQL ONSLAUGHT? AUNG THU RHA HEIN, G5537871
  • 2. OUTLINE
      - Introduction
      - Background
      - Evaluation
          - Traditional DSS Workload: Hive vs PDW
          - Modern OLTP Workload: MongoDB vs SQL Server
      - Discussion & Conclusion
  • 3. INTRODUCTION
      - Motivation: How do the performance and scalability of RDBMS solutions compare to those of NoSQL systems?
      - Proposition: Compare MongoDB (AS/CS) with SQL Server, and Hive with SQL Server PDW, and analyze performance and scalability on two workloads (decision support analysis and interactive data serving).
      - Use YCSB for the data-serving workload and the TPC-H DSS benchmark for the decision support workload
  • 4. BACKGROUND
      - Parallel Data Warehouse (PDW)
          - a shared-nothing parallel database system built on top of SQL Server
          - multiple compute nodes, a single control node, and other administrative service nodes
      - Hive
          - an open-source data warehouse built on top of Hadoop
          - a structured data model for data stored in the Hadoop Distributed Filesystem (HDFS), and a SQL-like declarative query language called HiveQL
  • 5. BACKGROUND (CONT.)
      - MongoDB features
          - a document-oriented storage layer, indexing in the form of B-trees, auto-sharding, and asynchronous replication of data between servers
          - Data is stored in collections, which contain documents
          - Each document is serialized using BSON
      - Two MongoDB configurations are used in the implementation:
          - MongoDB-CS (with client-side sharding)
          - MongoDB-AS (auto-sharding)
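The difference between the two MongoDB configurations is where routing happens: with client-side sharding, the client itself decides which server holds a given key. A minimal sketch of that idea follows. The hash-based routing rule, the server names, and the function name `shard_for` are assumptions made for illustration; the slides do not say how MongoDB-CS maps keys to servers.

```python
import hashlib

# Hypothetical server names; the slides only state that 8 nodes act as servers.
SERVERS = [f"server-{i}" for i in range(8)]


def shard_for(key: str, servers=SERVERS) -> str:
    """Return the server responsible for `key`.

    Assumed routing rule: hash the key and take it modulo the number of
    servers, so every client computes the same placement independently.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    for key in ("user1", "user2", "user3"):
        print(key, "->", shard_for(key))
```

With auto-sharding, by contrast, the client sends requests to the database and the placement decision is made on the server side.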
  • 6. EVALUATION
      - Hardware and software are configured for all four systems
      - For PDW and Hive, 8 disks are used to store the data
      - For the YCSB benchmark, 8 nodes are used as servers and another 8 as benchmark clients
      - Hive and Hadoop
          - Use the RCFile format to store data
          - All TPC-H tables are stored in Gzip-compressed RCFile format
  • 7. TRADITIONAL DSS WORKLOAD: HIVE VS PDW. Workload Description
      - Use TPC-H at 4 scale factors (250, 500, 1000, 4000, 16000 GB)
      - The TPC-H generator doesn't produce a correct result at the 16000 scale
      - All 22 TPC-H queries were executed
      - The 2 TPC-H refresh functions were left out
  • 8. TRADITIONAL DSS WORKLOAD: HIVE VS PDW. Data Layout in Hive and PDW
  • 9. TRADITIONAL DSS WORKLOAD: HIVE VS PDW. Data Preparation and Load Times
      - Hive
          - The dataset is generated across 16 nodes
          - One Hive table is created for each TPC-H table
          - Data is loaded in 2 phases:
              - data files are loaded onto each node
              - data is converted from text to RCFile format
      - PDW
          - Data is loaded onto the landing node
          - The necessary tables are created
  • 10. TRADITIONAL DSS WORKLOAD: HIVE VS PDW. Performance Analysis
  • 11. TRADITIONAL DSS WORKLOAD: HIVE VS PDW. Performance Analysis (cont.)
      - PDW is faster than Hive for all TPC-H queries
      - The average speedup of PDW over Hive is greater for small datasets
      - Hive has high overheads for small datasets
      - Scalability Analysis
          - Hive scales better than PDW
          - Hive scales well as the dataset size increases
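The "average speedup" and "scales better" statements rest on two simple ratios, which can be sketched as follows. The timings below are placeholders chosen for easy arithmetic, not results from the study, and the helper names are hypothetical. The slides do not say which kind of average is used, so both the arithmetic and the geometric mean are shown.

```python
from statistics import geometric_mean, mean


def speedups(hive_secs, pdw_secs):
    """Per-query speedup of PDW over Hive: Hive time divided by PDW time."""
    return [h / p for h, p in zip(hive_secs, pdw_secs)]


def scaling_ratio(time_small, time_large):
    """How much the runtime grows when the dataset grows.

    If the data grows by a factor k and this ratio stays below k,
    the system scales better than linearly over that range.
    """
    return time_large / time_small


if __name__ == "__main__":
    # Placeholder timings in seconds (NOT measurements from the study).
    hive = [100.0, 400.0, 90.0]
    pdw = [10.0, 100.0, 30.0]

    s = speedups(hive, pdw)
    print("per-query speedups:", s)
    print("arithmetic mean:", mean(s))
    print("geometric mean:", geometric_mean(s))

    # Placeholder: the same query at a 4x larger scale factor.
    print("runtime growth for 4x data:", scaling_ratio(100.0, 300.0))
```

The geometric mean is less sensitive to a single query with a very large speedup, which is why the two averages can differ noticeably on the same data.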
  • 12. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER. Workload Description
      - YCSB is extended in 2 ways:
          - added support for multiple instances on many database servers
          - added support for stored procedures in the YCSB JDBC driver
      - The YCSB benchmark is run on a database that consists of 640 million records
  • 13. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER. Data Preparation
      - Mongo-AS can automatically manage the shards by using a "balancer" process
      - The loading times for SQL-CS and Mongo-CS were 146 and 45 minutes respectively
      - The SQL load took longer because a bulk insert method was not used
  • 14. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER. Experimental Evaluation: "Read-Only" Workload
  • 15. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER. 95% Read, 5% Update Workload
  • 16. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER. 50% Read, 50% Update Workload
  • 17. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER. 95% Read, 5% Append Workload
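The four workloads above differ only in their operation mix. A rough sketch of how a benchmark client could draw operations according to such a mix is given below. This is not YCSB's own code: the workload labels, the dictionary layout, and the helper names are hypothetical, and only the percentages come from the slides.

```python
import random

# Operation mixes from the slides; the dictionary keys are hypothetical labels.
WORKLOADS = {
    "read_only": {"read": 1.0},
    "read_95_update_5": {"read": 0.95, "update": 0.05},
    "read_50_update_50": {"read": 0.50, "update": 0.50},
    "read_95_append_5": {"read": 0.95, "append": 0.05},
}


def next_operation(mix, rng):
    """Draw one operation name with probability proportional to its weight."""
    ops = list(mix)
    weights = [mix[op] for op in ops]
    return rng.choices(ops, weights=weights, k=1)[0]


def sample_counts(mix, n, seed=0):
    """Draw n operations and count how often each one was chosen."""
    rng = random.Random(seed)
    counts = {op: 0 for op in mix}
    for _ in range(n):
        counts[next_operation(mix, rng)] += 1
    return counts


if __name__ == "__main__":
    for name, mix in WORKLOADS.items():
        print(name, sample_counts(mix, 10_000))
```

A fixed seed keeps runs repeatable, which is useful when comparing systems under the same sequence of requests.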
  • 18. DISCUSSION & CONCLUSION
      - This evaluation shows that the NoSQL systems are still behind the RDBMS systems in performance
      - PDW is 9 times faster than Hive when running TPC-H at the 16 TB scale
      - SQL-CS was able to achieve higher throughput than MongoDB
  • 19. AUTHORS
      - Avrilia Floratou, University of Wisconsin-Madison
      - Nikhil Teletia, Microsoft Jim Gray Systems Lab
      - David J. DeWitt, Microsoft Jim Gray Systems Lab
      - Jignesh M. Patel, University of Wisconsin-Madison
      - Donghui Zhang, Paradigm4