Can the Elephants Handle the NoSQL Onslaught?

Presentation of a decent paper from the Microsoft Jim Gray Systems Lab.

Presentation Transcript

  • CAN THE ELEPHANTS HANDLE THE NO-SQL ONSLAUGHT?
    AUNG THU RHA HEIN, G5537871
  • OUTLINE Introduction Background Evaluation  Traditional DSS Workload: Hive vs PDW  Modern OLTP Workload: MongoDB vs SQL Server Discussion & Conclusion
  • INTRODUCTION MotivationHow does the performance and scalability of RDBMs solutions compareto the NoSQL systems? Propositioncompare MongoDB(AS/CS) with SQL Server and Hive with SQL PWD,and analyze the performance and scalability aspects on two workloads(decision support analysis and interactive data-serving). Use YCSB and TPC-H DSS benchmarks respectively
  • BACKGROUND Parallel Data Warehouse (PDW)  shared-nothing parallel database system built on top of SQL Server  multiple compute nodes, a single control node and other administrative service nodes. Hive  an open-source data warehouse built on top of Hadoop  a structured data model for data that is stored in the Hadoop Distributed Filesystem (HDFS), and a SQL-like declarative query language called HiveQL
  • BACKGROUND(CONT.) MongoDB Features  a document-oriented storage layer, indexing in the form of B- trees, auto-sharding, asynchronous replication of data between servers.  Data stored in collections which contain documents  Each document is serialized using BSON For implementation, it is created two types of MongoDB servers:  MongoDB-CS (with client-side sharding )  MongoDB-AS (Auto-Sharding)
  • EVALUATION Make hardware and software configuration for all four systems For PDW and Hive, use 8 disks to store the data For YCSB benchmark, 8 nodes are used as servers and another 8 for client-benchmarksHive and Hadoop Use RCFile format to store data All TPC-H tables are stored in Gzip RcCile format
  • TRADITIONAL DSS WORKLOAD: HIVE VS PDW
    Workload Description
      Used TPC-H at four scale factors (250, 1000, 4000, and 16000 GB)
      The TPC-H generator does not produce correct results at the 16000 GB scale factor
      Executed all 22 TPC-H queries, but omitted the 2 TPC-H refresh functions
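To give a sense of what these scale factors mean, TPC-H table sizes grow linearly with the scale factor (SF is roughly the raw data size in GB), except NATION and REGION, which are fixed. The sketch below estimates row counts from the per-SF base cardinalities in the TPC-H specification; LINEITEM's count is approximate in the spec, and the helper name is hypothetical.

```python
from typing import Dict

# Rows per unit of scale factor, per the TPC-H specification
# (LINEITEM is approximate: about 6,000,000 rows per SF).
BASE_ROWS: Dict[str, int] = {
    "lineitem": 6_000_000,
    "orders": 1_500_000,
    "partsupp": 800_000,
    "part": 200_000,
    "customer": 150_000,
    "supplier": 10_000,
}
# These two tables do not grow with the scale factor.
FIXED_ROWS: Dict[str, int] = {"nation": 25, "region": 5}


def estimated_rows(scale_factor: int) -> Dict[str, int]:
    """Approximate row count of every TPC-H table at a given scale factor."""
    counts = {table: n * scale_factor for table, n in BASE_ROWS.items()}
    counts.update(FIXED_ROWS)
    return counts


for sf in (250, 1000, 4000, 16000):
    print(f"SF {sf:>5}: ~{estimated_rows(sf)['lineitem']:,} LINEITEM rows")
```

At the largest scale factor, LINEITEM alone is on the order of 96 billion rows, which is why the loading phase is a significant part of the evaluation.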
  • TRADITIONAL DSS WORKLOAD: HIVE VS PDW
    Data Layout in Hive and PDW
  • TRADITIONAL DSS WORKLOAD: HIVE VS PDW
    Data Preparation and Load Times
    Hive
      Generated the dataset across 16 nodes
      Created one Hive table for each TPC-H table
      Data is loaded in two phases:
        the data files are loaded onto each node
        the data is converted from text to RCFile format
    PDW
      Loaded the data onto the landing node
      Created the necessary tables
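The two-phase Hive load can be sketched as follows. The TPC-H generator writes text files whose fields are separated by `|`, with a trailing `|` on each line. Phase 1 places the generated files on nodes (round-robin placement is an assumption here), and phase 2 parses the text and pivots it into columns, which is the step a columnar format such as RCFile needs. All function names are hypothetical.

```python
from typing import Dict, List


def parse_tbl_line(line: str) -> List[str]:
    """Split one generated line: '|'-separated fields plus one trailing '|'."""
    line = line.rstrip("\n")
    if line.endswith("|"):
        line = line[:-1]  # drop only the single trailing delimiter
    return line.split("|")


def assign_files_to_nodes(files: List[str], num_nodes: int) -> Dict[int, List[str]]:
    """Phase 1 (assumed round-robin): decide which node stores each text file."""
    placement: Dict[int, List[str]] = {n: [] for n in range(num_nodes)}
    for i, name in enumerate(files):
        placement[i % num_nodes].append(name)
    return placement


def text_to_columns(lines: List[str]) -> List[List[str]]:
    """Phase 2: parse text rows and pivot them into per-column lists."""
    rows = [parse_tbl_line(l) for l in lines]
    return [list(col) for col in zip(*rows)]
```

Stripping only one trailing `|` (rather than all of them) keeps an empty last field intact.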
  • TRADITIONAL DSS WORKLOAD: HIVE VS PDW
    Performance Analysis
  • TRADITIONAL DSS WORKLOAD: HIVE VS PDW
    Performance Analysis (cont.)
      PDW is faster than Hive for all TPC-H queries
      The average speedup of PDW over Hive is greater for smaller datasets, because Hive has high overheads for small datasets
    Scalability Analysis
      Hive scales better than PDW as the dataset size increases
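Both observations on this slide follow from a fixed per-query overhead. The sketch below uses a hypothetical linear cost model (startup overhead plus a per-GB cost); the numbers are made up for illustration and are NOT measurements from the paper. With a large fixed overhead, Hive's time is inflated most at small sizes, so PDW's speedup shrinks as data grows, and Hive's time grows less than proportionally, which looks like better scaling.

```python
from typing import Tuple

# (fixed overhead in seconds, seconds per GB): made-up illustrative values,
# NOT measurements from the paper.
Cost = Tuple[float, float]
HIVE: Cost = (60.0, 2.0)
PDW: Cost = (1.0, 0.5)


def query_time(cost: Cost, size_gb: float) -> float:
    """Linear cost model: fixed per-query startup overhead plus scan cost."""
    overhead_s, per_gb_s = cost
    return overhead_s + per_gb_s * size_gb


def speedup(size_gb: float) -> float:
    """How many times faster PDW is than Hive under the model."""
    return query_time(HIVE, size_gb) / query_time(PDW, size_gb)


def growth(cost: Cost, small_gb: float, large_gb: float) -> float:
    """Time increase when the data grows; smaller means better scaling."""
    return query_time(cost, large_gb) / query_time(cost, small_gb)


for sf in (250, 1000, 4000, 16000):
    print(f"{sf:>5} GB: PDW speedup {speedup(sf):.2f}x")
```

Under this model, quadrupling the data from 250 GB to 1000 GB increases Hive's time by a smaller factor than PDW's, even though PDW remains faster at every size.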
  • MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER
    Workload Description
    Extended YCSB in two ways:
      added support for multiple instances on many database servers
      added support for stored procedures in the YCSB JDBC driver
    Ran the YCSB benchmark on a database that consists of 640 million records
  • MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER
    Data Preparation
      Mongo-AS automatically manages the shards using a “balancer” process
      The loading times for SQL-CS and Mongo-CS were 146 and 45 minutes, respectively
      SQL-CS took longer to load because a bulk insert method was not used
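The balancer's job is to even out the number of chunks per shard by migrating chunks. The sketch below is a simplified model, assuming a single migration threshold; MongoDB's real balancer uses version-dependent thresholds and migrates data in the background, so the threshold value and function name here are hypothetical.

```python
from typing import Dict, List, Tuple


def balance(chunks_per_shard: Dict[str, int], threshold: int = 2) -> List[Tuple[str, str]]:
    """Simplified balancer: repeatedly move one chunk from the most-loaded shard
    to the least-loaded shard until their chunk counts differ by < threshold.
    Returns the (from_shard, to_shard) migrations performed, in order."""
    if threshold < 2:
        # With threshold 1, an odd imbalance would bounce a chunk back and forth forever.
        raise ValueError("threshold must be at least 2")
    counts = dict(chunks_per_shard)  # do not mutate the caller's dict
    moves: List[Tuple[str, str]] = []
    while True:
        donor = max(counts, key=counts.get)
        recipient = min(counts, key=counts.get)
        if counts[donor] - counts[recipient] < threshold:
            return moves
        counts[donor] -= 1
        counts[recipient] += 1
        moves.append((donor, recipient))
```

For example, a freshly loaded collection whose 8 chunks all sit on `"s0"` needs four migrations to `"s1"` before the two shards are balanced.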
  • MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER
    Experimental Evaluation: “Read-Only” Workload
  • MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER
    95% Read, 5% Update Workload
  • MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER
    50% Read, 50% Update Workload
  • MODERN OLTP WORKLOAD: MONGODB VS SQL SERVER
    95% Read, 5% Append Workload
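The four workloads above differ only in their operation mix. The sketch below generates a YCSB-style stream of (operation, key) pairs for each mix. It assumes uniformly chosen keys for reads and updates for simplicity (YCSB also supports skewed request distributions such as Zipfian), and treats an append as inserting a new key after the current last record. The dictionary keys and function name are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Operation mixes for the four workloads on the slides.
MIXES: Dict[str, Dict[str, float]] = {
    "read_only": {"read": 1.0},
    "read95_update5": {"read": 0.95, "update": 0.05},
    "read50_update50": {"read": 0.50, "update": 0.50},
    "read95_append5": {"read": 0.95, "append": 0.05},
}


def generate_ops(mix: Dict[str, float], n: int, record_count: int,
                 seed: int = 0) -> List[Tuple[str, int]]:
    """Produce n (operation, key) pairs. Reads and updates pick an existing key
    uniformly at random; appends insert a brand-new key after the last record."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    ops = list(mix)
    weights = [mix[o] for o in ops]
    next_key = record_count
    out: List[Tuple[str, int]] = []
    for _ in range(n):
        op = rng.choices(ops, weights=weights)[0]
        if op == "append":
            out.append((op, next_key))
            next_key += 1
        else:
            out.append((op, rng.randrange(next_key)))
    return out
```

Usage: `generate_ops(MIXES["read95_append5"], 10_000, 640_000_000)` yields mostly reads plus a small share of appends whose keys continue past the 640 million loaded records.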
  • DISCUSSION & CONCLUSION This evaluation shows that NoSQL systems are still behind RDBMS in performance. PDW is also 9 times faster than Hive running TPC-H at 16TB scale SQL-CS was able to achieve higher throughput than MongoDB
  • AUTHORS Avrilia FloratouUniversity of Wisconsin-Madison Nikhil TeletiaMicrosoft Jim Gray Systems Lab David J. DeWittMicrosoft Jim Gray Systems Lab Jignesh M. PatelUniversity of Wisconsin-Madison Donghui ZhangParadigm4