Can the Elephants Handle the NoSQL Onslaught?

Presentation of a decent paper from the Microsoft Jim Gray Systems Lab


Published in: Technology

Transcript

  • 1. CAN THE ELEPHANTS HANDLE THE NO-SQL ONSLAUGHT?
      • AUNG THU RHA HEIN, G5537871
  • 2. OUTLINE
      • Introduction
      • Background
      • Evaluation
          • Traditional DSS Workload: Hive vs PDW
          • Modern OLTP Workload: MongoDB vs SQL Server
      • Discussion & Conclusion
  • 3. INTRODUCTION
      • Motivation: How do the performance and scalability of RDBMS solutions compare to those of NoSQL systems?
      • Proposition: Compare MongoDB (AS/CS) with SQL Server, and Hive with SQL Server PDW, and analyze performance and scalability on two workloads: decision support analysis and interactive data serving.
      • Use the TPC-H DSS benchmark for the decision support workload and YCSB for the data-serving workload.
  • 4. BACKGROUND
      • Parallel Data Warehouse (PDW)
          • a shared-nothing parallel database system built on top of SQL Server
          • multiple compute nodes, a single control node, and other administrative service nodes
      • Hive
          • an open-source data warehouse built on top of Hadoop
          • a structured data model for data stored in the Hadoop Distributed Filesystem (HDFS), and a SQL-like declarative query language called HiveQL
  • 5. BACKGROUND (CONT.)
      • MongoDB features
          • a document-oriented storage layer, indexing in the form of B-trees, auto-sharding, and asynchronous replication of data between servers
          • Data is stored in collections, which contain documents
          • Each document is serialized using BSON
      • For the evaluation, two types of MongoDB deployment were set up:
          • MongoDB-CS (client-side sharding)
          • MongoDB-AS (auto-sharding)
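
Slide 5 distinguishes client-side sharding from auto-sharding. As a rough illustration of the client-side variant, the sketch below hashes each record key on the client to pick one of a fixed set of servers. The server names, the choice of MD5, and the modulo placement are assumptions made for illustration; the slides do not say how MongoDB-CS mapped keys to servers.

```python
import hashlib

# Hypothetical server names; the slides only say that 8 nodes acted as servers.
SERVERS = [f"server-{i}" for i in range(8)]


def shard_for(key: str, servers=SERVERS) -> str:
    """Pick the server responsible for a key by hashing it on the client."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and map it to a server.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    for key in ("user1", "user2", "user3"):
        print(key, "->", shard_for(key))
```

Because the mapping is computed by every client independently, no central router is needed, but adding or removing a server changes the placement of most keys. With auto-sharding, the database manages that placement itself.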
  • 6. EVALUATION
      • Hardware and software are configured for all four systems
      • For PDW and Hive, 8 disks are used to store the data
      • For the YCSB benchmark, 8 nodes are used as servers and another 8 as benchmark clients
      • Hive and Hadoop
          • Use the RCFile format to store data
          • All TPC-H tables are stored in gzip-compressed RCFile format
  • 7. TRADITIONAL DSS WORKLOAD: HIVE VS PDW
      • Workload Description
          • Used TPC-H at 4 scale factors (250, 1000, 4000, and 16000 GB)
          • The TPC-H generator doesn’t produce correct results at the 16000 scale factor
          • Executed all 22 TPC-H queries
          • Left out the 2 TPC-H refresh functions
  • 8. TRADITIONAL DSS WORKLOAD: HIVE VS PDW
      • Data Layout in Hive and PDW
  • 9. TRADITIONAL DSS WORKLOAD: HIVE VS PDW
      • Data Preparation and Load Times
      • Hive
          • Generated the dataset across 16 nodes
          • Created one Hive table for each TPC-H table
          • Data is loaded in 2 phases:
              • data files are loaded onto each node
              • data is converted from text to RCFile format
      • PDW
          • Loaded the data onto the landing node
          • Created the necessary tables
  • 10. TRADITIONAL DSS WORKLOAD: HIVE VS PDW
      • Performance Analysis
  • 11. TRADITIONAL DSS WORKLOAD: HIVE VS PDW
      • Performance Analysis (cont.)
          • PDW is faster than Hive for all TPC-H queries
          • The average speedup of PDW over Hive is greater for small datasets
              • Hive has high overheads for small datasets
      • Scalability Analysis
          • Hive scales better than PDW
          • Hive scales well as the dataset size increases
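
The two comparisons on slide 11 can be read as simple ratios: speedup is Hive's query time divided by PDW's at the same scale factor, and a scaling factor is how much one system's own runtime grows when the dataset grows. The sketch below uses invented timings purely to show the arithmetic; none of the numbers are measurements from the paper, and the function names are hypothetical.

```python
def speedup(hive_seconds: float, pdw_seconds: float) -> float:
    """How many times faster PDW is than Hive on the same query and dataset."""
    return hive_seconds / pdw_seconds


def scaling_factor(time_small: float, time_large: float) -> float:
    """How much a system's runtime grows between two dataset sizes."""
    return time_large / time_small


# Hypothetical timings in seconds, keyed by scale factor in GB.
# These are NOT results from the paper.
hive = {250: 400.0, 1000: 1200.0}
pdw = {250: 20.0, 1000: 100.0}

if __name__ == "__main__":
    for sf in (250, 1000):
        print(f"SF {sf}: PDW speedup over Hive = {speedup(hive[sf], pdw[sf]):.1f}x")
    # The dataset grows 4x; a factor below 4 means runtime grows sub-linearly.
    print("Hive scaling 250 -> 1000:", scaling_factor(hive[250], hive[1000]))
    print("PDW scaling 250 -> 1000:", scaling_factor(pdw[250], pdw[1000]))
```

With these invented numbers, PDW's speedup shrinks as the data grows while Hive's runtime grows by less than the 4x increase in data. That is the pattern the slide describes: a system can be slower in absolute terms and still scale better.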
  • 12. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER
      • Workload description
      • Extended YCSB in 2 ways:
          • added support for multiple instances on many database servers
          • added support for stored procedures in the YCSB JDBC driver
      • Ran the YCSB benchmark on a database that consists of 640 million records
  • 13. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER
      • Data Preparation
          • Mongo-AS automatically manages the shards by using a “balancer” process
          • The loading times for SQL-CS and Mongo-CS were 146 and 45 minutes, respectively
          • The SQL-CS load took longer because a bulk insert method was not used
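
To put the load times in perspective, the figures from the slides can be turned into a per-server record count and an average load rate. This is a back-of-the-envelope sketch: it assumes the quoted load times cover the full 640-million-record database and that records are spread evenly over the 8 server nodes, neither of which the slides state explicitly.

```python
TOTAL_RECORDS = 640_000_000  # database size from the workload description
SERVER_NODES = 8             # server nodes used for the YCSB benchmark
SQL_CS_LOAD_MIN = 146        # SQL-CS load time in minutes
MONGO_CS_LOAD_MIN = 45       # Mongo-CS load time in minutes

# Assumption: records are distributed evenly across the servers.
records_per_server = TOTAL_RECORDS // SERVER_NODES


def load_rate(records: int, minutes: float) -> float:
    """Average number of records loaded per second across the cluster."""
    return records / (minutes * 60)


if __name__ == "__main__":
    print("records per server:", records_per_server)
    print("SQL-CS / Mongo-CS load time ratio:",
          round(SQL_CS_LOAD_MIN / MONGO_CS_LOAD_MIN, 2))
    print("SQL-CS records/s:", round(load_rate(TOTAL_RECORDS, SQL_CS_LOAD_MIN)))
    print("Mongo-CS records/s:", round(load_rate(TOTAL_RECORDS, MONGO_CS_LOAD_MIN)))
```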
  • 14. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER
      • Experimental Evaluation
      • “Read-Only” workload
  • 15. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER
      • 95% Read, 5% Update workload
  • 16. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER
      • 50% Read, 50% Update workload
  • 17. MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER
      • 95% Read, 5% Append workload
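
Slides 14 to 17 name four operation mixes. A benchmark client can realize such a mix by drawing each request's operation type at random with the stated proportions. The sketch below illustrates that idea only; it is not YCSB's actual implementation, and the function names and the fixed seed are assumptions.

```python
import random

# Operation mixes named on the slides; each maps an operation to its share.
WORKLOADS = {
    "read-only": {"read": 1.0},
    "95% read / 5% update": {"read": 0.95, "update": 0.05},
    "50% read / 50% update": {"read": 0.50, "update": 0.50},
    "95% read / 5% append": {"read": 0.95, "append": 0.05},
}


def next_operation(mix: dict, rng: random.Random) -> str:
    """Draw one operation type according to the mix's proportions."""
    ops = list(mix)
    weights = [mix[op] for op in ops]
    return rng.choices(ops, weights=weights, k=1)[0]


def sample_counts(mix: dict, n: int, seed: int = 0) -> dict:
    """Count how often each operation is drawn over n simulated requests."""
    rng = random.Random(seed)
    counts = {op: 0 for op in mix}
    for _ in range(n):
        counts[next_operation(mix, rng)] += 1
    return counts


if __name__ == "__main__":
    for name, mix in WORKLOADS.items():
        print(name, sample_counts(mix, 10_000))
```

Over many requests the observed counts approach the configured shares, so the same client code can drive all four workloads by swapping the mix.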
  • 18. DISCUSSION & CONCLUSION
      • This evaluation shows that the NoSQL systems still trail the RDBMS systems in performance
      • PDW is 9 times faster than Hive when running TPC-H at the 16 TB scale
      • SQL-CS achieved higher throughput than MongoDB
  • 19. AUTHORS
      • Avrilia Floratou, University of Wisconsin-Madison
      • Nikhil Teletia, Microsoft Jim Gray Systems Lab
      • David J. DeWitt, Microsoft Jim Gray Systems Lab
      • Jignesh M. Patel, University of Wisconsin-Madison
      • Donghui Zhang, Paradigm4
