JackHare: a framework for SQL to NoSQL translation using MapReduce
Paper presentation, 2013-10-22

Presentation Transcript

  • JackHare: a framework for SQL to NoSQL translation using MapReduce
    Wu-Chun Chung · Hung-Pin Lin · Shih-Chang Chen · Mon-Fong Jiang · Yeh-Ching Chung
    Received: 15 December 2012 / Accepted: 6 September 2013
    © Springer Science+Business Media New York 2013
    Presented by 康志強, 2013.10.22
  • Outline
      • Introduction
      • Related work
      • The JackHare framework architecture
      • Unstructured data processing in HBase
      • Experimental results
      • Conclusions
  • Introduction
      • Big Data problems (massive data)
          – Data access speed
          – Data merging: keeping data timely and correct under parallel processing
      • Hadoop MapReduce
          – processes massive data in parallel
      • Hadoop Distributed File System
          – makes frequent data updates difficult
  • Introduction
      • HBase
          – places the data on a scale-out storage system
          – manipulates changeable data in a transparent way
          – the HBase interface is not user-friendly
      • JackHare
          – an API compliant with the ANSI-SQL and JDBC 4.0 specifications, used to operate Apache HBase
          – uses the MapReduce framework to process the unstructured data in HBase
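  • Introduction
      • Data access speed
          – 1990: a hard drive stored 1,370 MB, with a transfer speed of 4.4 MB/s
          – Today: 1 TB, with a transfer speed of 100 MB/s
          – Reading and writing data in parallel speeds things up
      • Hadoop Distributed File System
          – frequent data updates are difficult in such a file system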
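The figures on this slide can be turned into a quick back-of-the-envelope calculation of how long a full read of one drive takes. The sketch below is only illustrative: it assumes 1 TB = 1,000,000 MB and perfectly even striping over the drives, and the function name `full_read_seconds` is made up for this example.

```python
def full_read_seconds(capacity_mb: float, rate_mb_per_s: float, drives: int = 1) -> float:
    """Time to read the whole dataset when it is striped evenly over `drives` disks."""
    return capacity_mb / (rate_mb_per_s * drives)


# 1990 drive from the slide: 1,370 MB at 4.4 MB/s (roughly five minutes).
drive_1990 = full_read_seconds(1_370, 4.4)

# Present-day drive from the slide: 1 TB at 100 MB/s (assuming 1 TB = 1,000,000 MB).
drive_now = full_read_seconds(1_000_000, 100)

# Same data spread over 100 drives read in parallel (idealized, no overhead).
parallel_100 = full_read_seconds(1_000_000, 100, drives=100)

print(f"1990 drive: {drive_1990 / 60:.1f} min")
print(f"1 TB drive: {drive_now / 3600:.1f} h")
print(f"100 drives: {parallel_100:.0f} s")
```

Capacity grew far faster than transfer speed, which is why the slide points to parallel reads and writes.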
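  • Introduction
      • Data merging problem
          – Correctness
      • MapReduce
          – A distributed programming framework
          – Map splits one job across multiple nodes
          – Reduce recombines the results from each node into the final result
          – Data locality
          – Can be driven by high-level query languages (Pig, Hive)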
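The Map and Reduce roles described on this slide can be sketched in a few lines of single-process Python. This is not Hadoop code: the `map_reduce` helper, the word-count mapper, and the sample splits are all hypothetical, and the per-split loop merely stands in for work that would run on separate nodes.

```python
from collections import defaultdict


def map_reduce(splits, mapper, reducer):
    """Run mapper over every record of every split, group by key, then reduce."""
    groups = defaultdict(list)
    for split in splits:              # on a cluster, each split goes to its own node
        for record in split:
            for key, value in mapper(record):
                groups[key].append(value)   # "shuffle": collect values per key
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    # Map: emit (word, 1) for every word in the line.
    for word in line.split():
        yield word.lower(), 1


def count_reducer(key, values):
    # Reduce: combine the partial counts for one word.
    return sum(values)


splits = [["HBase stores data", "MapReduce reads data"], ["HBase scales out"]]
counts = map_reduce(splits, word_mapper, count_reducer)
print(counts)
```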
  • Introduction
      • MapReduce
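  • Introduction
      • HBase
          – A distributed database built on top of HDFS
          – Data values are accessed using the row and the column as indexes
          – Every value carries a timestamp, so the same cell can hold multiple values from different times (versions)
          – An HBase table consists of many rows and several column families
          – Can serve as a data source or a storage target for MapReduce programs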
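The data model on this slide (row, column family, qualifier, timestamped versions) can be illustrated with a small in-memory stand-in. This is a toy sketch, not the HBase client API: the class `TinyTable`, its methods, and the sample row key are all invented for this example.

```python
import time
from collections import defaultdict


class TinyTable:
    """Toy model of an HBase table: cells keyed by (row, family, qualifier)."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)
        self.max_versions = max_versions
        # (row, family, qualifier) -> [(timestamp, value), ...], newest first
        self._cells = defaultdict(list)

    def put(self, row, family, qualifier, value, ts=None):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = time.time_ns() if ts is None else ts
        versions = self._cells[(row, family, qualifier)]
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)
        del versions[self.max_versions:]      # keep only the newest versions

    def get(self, row, family, qualifier, versions=1):
        return self._cells.get((row, family, qualifier), [])[:versions]


table = TinyTable(["info"])
table.put("lot-001", "info", "status", "queued", ts=1)
table.put("lot-001", "info", "status", "running", ts=2)

latest = table.get("lot-001", "info", "status")
history = table.get("lot-001", "info", "status", versions=2)
print(latest, history)
```

Writing to the same cell twice does not overwrite the old value; it adds a newer version, which is the behavior the timestamp bullet describes.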
  • Introduction
      • HBase
  • Introduction
      • NoSQL databases
      • http://www.ithome.com.tw/itadm/article.php?c=63360&s=5
  • Introduction
      • JackHare
          – allows users to manipulate large-scale data with ANSI-SQL queries
          – an API compliant with the ANSI-SQL and JDBC 4.0 specifications, used to operate Apache HBase
          – uses the MapReduce framework to process the unstructured data in HBase
  • Related work
      • Pig
          – Runs in an HDFS and MapReduce cluster environment
          – Pig Latin: a simpler procedural language
          – http://pig.apache.org/docs/r0.12.0/basic.html#nestedblock
      • Hive
          – Provides an SQL-like query language (HiveQL) for querying data
          – Can manage data stored in HDFS
          – https://cwiki.apache.org/confluence/display/Hive/Tutorial
  • Related work
      • YSmart
          – An SQL-to-MapReduce translator
          – http://ysmart.cse.ohio-state.edu/
      • S2MART
          – Smart SQL to Map-Reduce Translators
  • Related work
      • HadoopDB
          – An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical
          – HadoopDB provides SQL queries via a translation called SQL-MR-SQL (SMS), based on Hive
          – http://db.cs.yale.edu/hadoopdb/hadoopdb.html
      • Clydesdale
          – structured data processing on MapReduce
          – focuses on processing data that fits a star schema
  • Related work
      • Translating SQL queries into MapReduce
      • HBase
          – Supports frequent data updates
          – Retains the scalability and reliability of a NoSQL database
  • The JackHare framework architecture
  • The JackHare framework architecture
      • The user submits an ANSI-SQL query through an SQL client application.
      • The compiler scans and parses the ANSI-SQL query.
      • It looks up the related HBase table name, column families, and column qualifiers.
      • It generates MapReduce code according to the query commands and the metadata.
  • The JackHare framework architecture
      • The framework accesses HBase and executes the MapReduce job.
      • The results are wrapped and returned from the back end.
      • The returned results are shown in the SQL client application according to the RDB schema.
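The flow on these two slides (parse the query, look up HBase metadata, generate a map function, run it, return rows in relational form) can be illustrated with a deliberately tiny sketch. It is not JackHare's compiler: the regular-expression parser handles only `SELECT ... FROM ... WHERE col = 'value'`, and the `METADATA` mapping, column names, and sample rows are hypothetical.

```python
import re

# Hypothetical metadata: relational column -> HBase "family:qualifier".
METADATA = {
    "wafer": {"wafer_id": "info:wafer_id", "lot_id": "info:lot_id", "yield": "test:yield"},
}

QUERY_RE = re.compile(
    r"SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<col>\w+)\s*=\s*'(?P<val>[^']*)')?\s*;?\s*$",
    re.IGNORECASE,
)


def parse(sql):
    """Scan and parse a very small SQL subset into (table, columns, where)."""
    m = QUERY_RE.match(sql.strip())
    if m is None:
        raise ValueError(f"unsupported query: {sql!r}")
    cols = [c.strip().lower() for c in m["cols"].split(",")]
    where = (m["col"].lower(), m["val"]) if m["col"] else None
    return m["table"].lower(), cols, where


def compile_mapper(sql):
    """Look up the metadata and build a map function for the query."""
    table, cols, where = parse(sql)
    lookup = METADATA[table]

    def mapper(row_key, cells):
        # WHERE: skip rows whose cell does not match the predicate.
        if where is not None and cells.get(lookup[where[0]]) != where[1]:
            return
        # SELECT: project the requested columns back to relational names.
        yield row_key, {c: cells.get(lookup[c]) for c in cols}

    return table, mapper


hbase_rows = {
    "w1": {"info:wafer_id": "w1", "info:lot_id": "L1", "test:yield": "0.93"},
    "w2": {"info:wafer_id": "w2", "info:lot_id": "L2", "test:yield": "0.88"},
}
table, mapper = compile_mapper("SELECT wafer_id, yield FROM WAFER WHERE lot_id = 'L1'")
results = [out for key, cells in hbase_rows.items() for out in mapper(key, cells)]
print(table, results)
```

A simple selection like this needs only a map step; clauses that combine rows, such as GROUP BY, would also need a reduce step.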
  • The JackHare framework architecture
      • SQuirreL
  • Unstructured data processing in HBase
      • Remap the data in the relational database to HBase
  • Unstructured data processing in HBase
      • Remap the data in the relational database to HBase
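One plausible way to picture the remapping is shown below: the primary key becomes the HBase row key and every other column becomes a `family:qualifier` cell. The transcript does not give the paper's exact mapping rules, so this scheme, the function `remap_row`, the default family `cf`, and the sample DIE row are assumptions made for illustration.

```python
def remap_row(row, primary_key, family_of, default_family="cf"):
    """Turn one relational row (a dict) into an HBase-style (row_key, cells) pair."""
    row_key = str(row[primary_key])
    cells = {}
    for column, value in row.items():
        if column == primary_key or value is None:
            continue  # the key lives in the row key; NULLs are simply not stored
        family = family_of.get(column, default_family)
        cells[f"{family}:{column}"] = str(value)  # values are stored as strings here
    return row_key, cells


# Hypothetical row from a DIE table; "ref" groups the foreign-key column.
die_row = {"die_id": 7, "wafer_id": "w1", "bin": 3, "note": None}
row_key, cells = remap_row(die_row, "die_id", {"wafer_id": "ref"})
print(row_key, cells)
```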
  • Unstructured data processing in HBase
      • Analysis of SQL clauses
          – SELECT, FROM and WHERE clauses
          – Extended clauses
              • GROUP BY
              • HAVING
              • ORDER BY
              • JOIN
              • Aggregate functions
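As an illustration of how some of the extended clauses fit the map and reduce phases, the sketch below evaluates the equivalent of `SELECT lot_id, COUNT(*) FROM wafer GROUP BY lot_id HAVING COUNT(*) >= 2 ORDER BY COUNT(*) DESC`. It is a single-process approximation under assumed names (`group_by_count`, the sample rows), not the code JackHare generates.

```python
from collections import defaultdict


def group_by_count(rows, group_col, having_min=0):
    """GROUP BY group_col with COUNT(*), a HAVING threshold, and ORDER BY count."""
    # Map phase: emit (group key, 1) for every row.
    emitted = [(row[group_col], 1) for row in rows]

    # Shuffle: collect the values emitted for each key.
    groups = defaultdict(list)
    for key, value in emitted:
        groups[key].append(value)

    # Reduce phase: the aggregate function (COUNT) runs once per group.
    counted = {key: sum(values) for key, values in groups.items()}

    # HAVING filters on the aggregated value, after the reduce.
    kept = {key: n for key, n in counted.items() if n >= having_min}

    # ORDER BY count DESC, then key ASC for a stable result.
    return sorted(kept.items(), key=lambda kv: (-kv[1], kv[0]))


wafers = [
    {"wafer_id": "w1", "lot_id": "L1"},
    {"wafer_id": "w2", "lot_id": "L2"},
    {"wafer_id": "w3", "lot_id": "L1"},
    {"wafer_id": "w4", "lot_id": "L2"},
    {"wafer_id": "w5", "lot_id": "L1"},
    {"wafer_id": "w6", "lot_id": "L3"},
]
per_lot = group_by_count(wafers, "lot_id", having_min=2)
print(per_lot)
```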
  • Experimental results
      • Experimental environment
          – Two Intel Xeon L5640 CPUs, 24 GB RAM, and a 3 TB hard disk
          – 16-node virtual machine cluster on four physical machines
          – Hadoop 0.20.203 (as of 15 October 2013, release 2.2.0 is available)
          – HBase 0.92.0 (2013-09-20: version 0.97.0-SNAPSHOT)
          – Hive 0.9.0
          – Java 1.6.0, maximum heap size 512 MB
  • Experimental results
      • Experimental environment
          – Node: two cores at 2 GHz with 4 GB RAM and 400 GB of storage space
          – MySQL: two cores at 2 GHz, 4 GB RAM, and an 800 GB hard disk
          – Three tables: LOT, WAFER, and DIE
  • Experimental results
      • Results (charts on four slides; no text captured in the transcript)
  • Conclusions
  • End of presentation.