Can the Elephants Handle the NoSQL Onslaught?

Presentation of a decent paper from the Microsoft Jim Gray Systems Lab

  1. CAN THE ELEPHANTS HANDLE THE NO-SQL ONSLAUGHT?
     AUNG THU RHA HEIN, G5537871
  2. OUTLINE
     • Introduction
     • Background
     • Evaluation
       • Traditional DSS Workload: Hive vs PDW
       • Modern OLTP Workload: MongoDB vs SQL Server
     • Discussion & Conclusion
  3. INTRODUCTION
     • Motivation: How do the performance and scalability of RDBMS solutions compare to those of NoSQL systems?
     • Proposition: Compare MongoDB (AS/CS) with SQL Server, and Hive with SQL Server PDW, and analyze performance and scalability on two workloads: interactive data serving and decision support analysis.
     • Use the YCSB benchmark for the data-serving workload and the TPC-H DSS benchmark for the decision support workload.
  4. BACKGROUND
     • Parallel Data Warehouse (PDW)
       • A shared-nothing parallel database system built on top of SQL Server
       • Multiple compute nodes, a single control node, and other administrative service nodes
     • Hive
       • An open-source data warehouse built on top of Hadoop
       • Provides a structured data model for data stored in the Hadoop Distributed File System (HDFS), and a SQL-like declarative query language called HiveQL
  5. BACKGROUND (CONT.)
     • MongoDB features
       • A document-oriented storage layer, B-tree indexes, auto-sharding, and asynchronous replication of data between servers
       • Data is stored in collections, which contain documents
       • Each document is serialized using BSON
     • For the evaluation, two types of MongoDB deployment were set up:
       • MongoDB-CS (client-side sharding)
       • MongoDB-AS (auto-sharding)
  6. EVALUATION
     • Set up the hardware and software configuration for all four systems
     • For PDW and Hive, 8 disks are used to store the data
     • For the YCSB benchmark, 8 nodes are used as servers and another 8 as benchmark clients
     • Hive and Hadoop
       • Use the RCFile format to store data
       • All TPC-H tables are stored in Gzip-compressed RCFile format
  7. TRADITIONAL DSS WORKLOAD: HIVE VS PDW
     Workload Description
     • Uses TPC-H at 4 scale factors (250, 1000, 4000, and 16000 GB)
     • The TPC-H generator does not produce correct results at the 16000 scale factor
     • All 22 TPC-H queries were executed
     • The 2 TPC-H refresh functions were left out
  8. TRADITIONAL DSS WORKLOAD: HIVE VS PDW
     Data Layout in Hive and PDW
  9. TRADITIONAL DSS WORKLOAD: HIVE VS PDW
     Data Preparation and Load Times
     • Hive
       • The dataset is generated across 16 nodes
       • One Hive table is created for each TPC-H table
       • Data is loaded in 2 phases:
         • Data files are loaded onto each node
         • Data is converted from text to RCFile format
     • PDW
       • Data is loaded onto the landing node
       • The necessary tables are created
  10. TRADITIONAL DSS WORKLOAD: HIVE VS PDW
      Performance Analysis
  11. TRADITIONAL DSS WORKLOAD: HIVE VS PDW
      Performance Analysis (cont.)
      • PDW is faster than Hive for all TPC-H queries
      • The average speedup of PDW over Hive is greater for small datasets
        • Hive has high overheads for small datasets
      Scalability Analysis
      • Hive scales better than PDW
      • Hive scales well as the dataset size increases
  12. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER
      Workload Description
      • Extends YCSB in 2 ways:
        • Added support for multiple instances on many database servers
        • Added support for stored procedures in the YCSB JDBC driver
      • The YCSB benchmark was run on a database of 640 million records
  13. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER
      Data Preparation
      • Mongo-AS automatically manages the shards by using a “balancer” process
      • The loading times for SQL-CS and Mongo-CS were 146 and 45 minutes, respectively
      • The SQL-CS load took longer because a bulk insert method was not used
  14. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER
      Experimental Evaluation: “Read-Only” Workload
  15. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER
      95% Read, 5% Update Workload
  16. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER
      50% Read, 50% Update Workload
  17. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER
      95% Read, 5% Append Workload
  18. DISCUSSION & CONCLUSION
      • This evaluation shows that the NoSQL systems tested still trail the RDBMSs in performance
      • PDW is 9 times faster than Hive when running TPC-H at the 16 TB scale
      • SQL-CS achieved higher throughput than MongoDB
  19. AUTHORS
      • Avrilia Floratou, University of Wisconsin-Madison
      • Nikhil Teletia, Microsoft Jim Gray Systems Lab
      • David J. DeWitt, Microsoft Jim Gray Systems Lab
      • Jignesh M. Patel, University of Wisconsin-Madison
      • Donghui Zhang, Paradigm4
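
Slide 5 distinguishes client-side sharding (MongoDB-CS) from auto-sharding (MongoDB-AS). The client-side idea can be sketched in a few lines of Python. This is only an illustration under assumptions: it assumes hash partitioning on the record key across the 8 server nodes mentioned on slide 6, and the function name `shard_for_key` and the key strings are hypothetical, not taken from the paper or from any MongoDB API.

```python
import hashlib

NUM_SERVERS = 8  # slide 6: 8 nodes act as servers in the YCSB runs


def shard_for_key(key: str, num_servers: int = NUM_SERVERS) -> int:
    """Map a record key to a server index (hypothetical hash partitioning)."""
    # A stable hash makes every client route the same key to the same server.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


if __name__ == "__main__":
    # Hypothetical keys, used only to show the routing decision.
    for key in ("user1", "user2", "user3"):
        print(key, "->", shard_for_key(key))
```

With client-side sharding the routing decision lives in the benchmark client, as above; with auto-sharding it is left to the database itself, which slide 13 says relies on a “balancer” process.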

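The four workloads on slides 14 to 17 differ only in their operation mix. The Python sketch below shows roughly how a benchmark client might draw operations for each mix. The proportions come from the slide titles; the dictionary `WORKLOADS`, the function `next_operations`, and the fixed seed are illustrative assumptions and not YCSB's actual code.

```python
import random

# Operation proportions per workload, taken from the titles of slides 14 to 17.
WORKLOADS = {
    "read-only": {"read": 1.0},
    "95% read / 5% update": {"read": 0.95, "update": 0.05},
    "50% read / 50% update": {"read": 0.50, "update": 0.50},
    "95% read / 5% append": {"read": 0.95, "append": 0.05},
}


def next_operations(mix, count, seed=0):
    """Draw `count` operation names according to the proportions in `mix`."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same draw
    ops = list(mix)
    weights = [mix[op] for op in ops]
    return rng.choices(ops, weights=weights, k=count)


if __name__ == "__main__":
    for name, mix in WORKLOADS.items():
        sample = next_operations(mix, 10_000)
        print(name, {op: sample.count(op) for op in mix})
```

The counts printed for each workload should land close to the stated proportions, which is a quick way to check that a mix is configured as intended.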