JackHare: a framework for SQL to NoSQL translation using MapReduce
Paper presentation, 2013-10-22


Published in: Technology

Transcript

  • 1. JackHare: a framework for SQL to NoSQL translation using MapReduce
      Wu-Chun Chung · Hung-Pin Lin · Shih-Chang Chen · Mon-Fong Jiang · Yeh-Ching Chung
      Received: 15 December 2012 / Accepted: 6 September 2013
      © Springer Science+Business Media New York 2013
      Presented by 康志強, 2013.10.22
  • 2. Outline
      • Introduction
      • Related work
      • The JackHare framework architecture
      • Unstructured data processing in HBase
      • Experimental results
      • Conclusions
  • 3. Introduction
      • Big Data problems (massive data)
          – Data access speed
          – Data merging: keeping data timely and correct under parallel processing
      • Hadoop MapReduce
          – processes massive data in parallel
      • Hadoop Distributed File System
          – makes frequent data updates difficult
  • 4. Introduction
      • HBase
          – places data on a scale-out storage system
          – manipulates changeable data in a transparent way
          – the HBase interface is not user-friendly
      • JackHare
          – an API that complies with the ANSI-SQL and JDBC 4.0 specifications and is used to operate Apache HBase
          – uses the MapReduce framework to process unstructured data in HBase
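  • 5. Introduction
      • Data access speed
          – 1990: a hard disk stored 1,370 MB with a transfer speed of 4.4 MB/s
          – Today: 1 TB with a transfer speed of 100 MB/s
          – Reading and writing data in parallel speeds things up
      • Hadoop Distributed File System
          – frequent data updates are difficult in such a file system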
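To make the drive figures above concrete, here is a small back-of-the-envelope calculation of how long a full read of each disk takes. It is only a sketch: decimal units (1 TB = 1,000,000 MB) and the 100-drive parallel case are assumptions chosen for illustration, not numbers from the slides.

```python
# Rough full-read times for the drive figures quoted on the slide.
# Assumptions (not from the slides): decimal units and a 100-drive parallel setup.
MB_PER_TB = 1_000_000


def full_read_seconds(capacity_mb, rate_mb_per_s, drives=1):
    """Seconds to read capacity_mb when it is spread evenly over `drives` disks read in parallel."""
    return capacity_mb / (rate_mb_per_s * drives)


drive_1990 = full_read_seconds(1_370, 4.4)                         # one 1990 drive
drive_today = full_read_seconds(1 * MB_PER_TB, 100)                # one 1 TB drive
parallel_100 = full_read_seconds(1 * MB_PER_TB, 100, drives=100)   # hypothetical 100 drives

print(f"1990 drive:      {drive_1990 / 60:.1f} minutes")
print(f"1 TB drive:      {drive_today / 3600:.1f} hours")
print(f"100 drives (1TB): {parallel_100:.0f} seconds")
```

Capacity has grown far faster than transfer speed, so a full scan of a single modern disk takes hours, which is the motivation for reading many disks in parallel.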
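  • 6. Introduction
      • The data merging problem
          – correctness
      • MapReduce
          – a distributed programming framework
          – Map splits one job across multiple nodes
          – Reduce combines the results from each node into the final result
          – data locality
          – can be used through high-level query languages (Pig, Hive)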
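The Map and Reduce roles described above can be sketched in plain Python. This is a single-process toy, not Hadoop code: `run_mapreduce`, `count_words`, and `sum_counts` are hypothetical names, and each inner list in `splits` merely stands in for the data block a separate node would process.

```python
from collections import defaultdict


def run_mapreduce(splits, mapper, reducer):
    """Toy single-process MapReduce: map every record, group by key, then reduce."""
    grouped = defaultdict(list)
    for split in splits:                       # each split stands in for one node's block
        for record in split:
            for key, value in mapper(record):  # map phase emits (key, value) pairs
                grouped[key].append(value)     # shuffle: collect values per key
    # reduce phase: combine the per-key values into the final result
    return {key: reducer(key, values) for key, values in grouped.items()}


def count_words(line):
    for word in line.split():
        yield word.lower(), 1


def sum_counts(key, values):
    return sum(values)


splits = [["HBase stores rows", "Hive queries rows"], ["MapReduce scans rows"]]
result = run_mapreduce(splits, count_words, sum_counts)
print(result["rows"])
```

In a real cluster the map calls run on the nodes that already hold the data (data locality), and only the grouped intermediate pairs travel over the network.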
  • 7. Introduction
      • MapReduce
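  • 8. Introduction
      • HBase
          – a distributed database built on top of HDFS
          – data values are accessed by row and column indexes
          – every value carries a timestamp, so the same cell can hold several values from different times (versions)
          – an HBase table consists of many rows and several column families
          – can serve as a data source or a storage target for MapReduce programs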
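The data model above (row, column family, timestamped versions) can be pictured with a small in-memory stand-in. This is a sketch only: `ToyTable` is a hypothetical class and not the HBase client API, and the row key and column names are made up for illustration.

```python
class ToyTable:
    """In-memory stand-in for an HBase table (illustrative, not the HBase API).

    Layout: row key -> "family:qualifier" -> {timestamp: value}
    """

    def __init__(self, families):
        self.families = set(families)  # column families are declared up front
        self.rows = {}

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row, {}).setdefault(column, {})[ts] = value

    def get(self, row, column, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        cells = self.rows.get(row, {}).get(column, {})
        return sorted(cells.items(), reverse=True)[:versions]


table = ToyTable(families=["info"])
table.put("L001", "info:status", "queued", ts=1)
table.put("L001", "info:status", "running", ts=2)

latest = table.get("L001", "info:status")               # newest version only
history = table.get("L001", "info:status", versions=2)  # both versions
print(latest)
print(history)
```

Writing to the same cell twice does not overwrite the old value; it adds a newer version, and a read returns the newest one unless more versions are requested.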
  • 9. Introduction
      • HBase
  • 10. Introduction
      • NoSQL databases
      • http://www.ithome.com.tw/itadm/article.php?c=63360&s=5
  • 11. Introduction
      • JackHare
          – allows users to manipulate large-scale data with ANSI-SQL queries
          – an API that complies with the ANSI-SQL and JDBC 4.0 specifications and is used to operate Apache HBase
          – uses the MapReduce framework to process unstructured data in HBase
  • 12. Related work
      • Pig
          – runs in an HDFS and MapReduce cluster environment
          – Pig Latin: a simpler procedural language
          – http://pig.apache.org/docs/r0.12.0/basic.html#nestedblock
      • Hive
          – provides a SQL-like query language (HiveQL) for querying data
          – can manage data stored in HDFS
          – https://cwiki.apache.org/confluence/display/Hive/Tutorial
  • 13. Related work
      • YSmart
          – an SQL-to-MapReduce translator
          – http://ysmart.cse.ohio-state.edu/
      • S2MART
          – Smart SQL to Map-Reduce Translators
  • 14. Related work
      • HadoopDB
          – An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical
          – provides SQL queries through a translation layer called SQL-MR-SQL (SMS), based on Hive
          – http://db.cs.yale.edu/hadoopdb/hadoopdb.html
      • Clydesdale
          – structured data processing on MapReduce
          – focuses on processing data that fits a star schema
  • 15. Related work
      • Translating SQL queries into MapReduce
      • HBase
          – supports frequent data updates
          – retains the scalability and reliability of a NoSQL database
  • 16. The JackHare framework architecture
  • 17. The JackHare framework architecture
      • The user submits an ANSI-SQL query through an SQL client application.
      • The compiler scans and parses the ANSI-SQL query.
      • The related table name, column families and column qualifiers in HBase are looked up.
      • MapReduce code is generated according to the query commands and the metadata.
  • 18. The JackHare framework architecture
      • HBase is accessed and the MapReduce job is executed.
      • The results are wrapped and returned from the back end.
      • The returned results are shown in the SQL client application according to the RDB schema.
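The query flow on the two slides above (parse, look up metadata, generate a map function, execute, wrap the results) can be sketched end to end in a few lines. This is a toy under loud assumptions and not JackHare's compiler: the regex handles only `SELECT ... FROM ... [WHERE col = 'value']`, and `METADATA`, `HBASE`, the `lot` columns, and all function names are hypothetical.

```python
import re

# Hypothetical metadata: relational column -> HBase "family:qualifier".
METADATA = {"lot": {"lot_id": "info:lot_id", "status": "info:status", "fab": "loc:fab"}}

# Hypothetical table contents: table -> row key -> {"family:qualifier": value}.
HBASE = {"lot": {
    "L001": {"info:lot_id": "L001", "info:status": "done", "loc:fab": "F1"},
    "L002": {"info:lot_id": "L002", "info:status": "hold", "loc:fab": "F2"},
}}

QUERY_RE = re.compile(
    r"SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<col>\w+)\s*=\s*'(?P<val>[^']*)')?\s*;?\s*$",
    re.IGNORECASE,
)


def parse(sql):
    """Scan and parse a very small SQL subset."""
    m = QUERY_RE.match(sql.strip())
    if m is None:
        raise ValueError("unsupported query")
    cols = [c.strip().lower() for c in m["cols"].split(",")]
    where = (m["col"].lower(), m["val"]) if m["col"] else None
    return m["table"].lower(), cols, where


def build_mapper(table, cols, where):
    """Look up the metadata and generate a map function for the query."""
    mapping = METADATA[table]
    selected = [mapping[c] for c in cols]
    predicate = (mapping[where[0]], where[1]) if where else None

    def mapper(row_key, cells):
        if predicate and cells.get(predicate[0]) != predicate[1]:
            return  # row filtered out by the WHERE clause
        yield row_key, tuple(cells.get(q) for q in selected)

    return mapper


def run_query(sql, store):
    table, cols, where = parse(sql)
    mapper = build_mapper(table, cols, where)
    results = []
    for row_key, cells in store[table].items():       # map-only "job" over the table
        for _, values in mapper(row_key, cells):
            results.append(dict(zip(cols, values)))   # wrap back into relational columns
    return results


rows = run_query("SELECT lot_id, fab FROM lot WHERE status = 'done'", HBASE)
print(rows)
```

A plain SELECT with a WHERE filter needs no reduce step, which is why the sketch runs a map-only pass; grouping and aggregation are where a reduce phase comes in.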
  • 19. The JackHare framework architecture
      • SQuirreL
  • 20. Unstructured data processing in HBase
      • remap the data in a relational database to HBase
  • 21. Unstructured data processing in HBase
      • remap the data in a relational database to HBase
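One plausible way to picture the remapping is sketched below. The slides do not spell out the mapping rules, so everything here is an assumption: the primary key becomes the row key, each relational column becomes a qualifier inside a chosen column family, and NULL values are simply not stored. `relational_to_cells`, `family_of`, and the WAFER column names are hypothetical.

```python
def relational_to_cells(record, primary_key, family_of):
    """Turn one relational record into (row_key, {"family:qualifier": value}).

    Assumed mapping (not taken from the slides): primary key -> row key,
    column name -> qualifier, family chosen per column via `family_of`.
    """
    row_key = str(record[primary_key])
    cells = {}
    for column, value in record.items():
        if value is None:
            continue  # a NULL is not stored at all; HBase rows are sparse
        cells[f"{family_of[column]}:{column}"] = str(value)
    return row_key, cells


wafer = {"wafer_id": 7, "lot_id": "L001", "thickness": None}
family_of = {"wafer_id": "info", "lot_id": "info", "thickness": "meas"}

row_key, cells = relational_to_cells(wafer, "wafer_id", family_of)
print(row_key, cells)
```

Values are stored as strings here for simplicity; HBase itself stores uninterpreted bytes, so a real mapping would also need to record each column's type in the metadata.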
  • 22. Unstructured data processing in HBase
      • Analysis of SQL clauses
          – SELECT, FROM and WHERE clauses
          – Extended clauses
              • GROUP BY
              • HAVING
              • ORDER BY
              • JOIN
              • AGGREGATE FUNCTIONs
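As an illustration of how the extended clauses line up with MapReduce phases, the sketch below evaluates a query of the form `SELECT lot_id, COUNT(*) FROM die GROUP BY lot_id HAVING COUNT(*) >= 2 ORDER BY COUNT(*) DESC, lot_id` in plain Python. It is a simplified illustration and not JackHare's generated code; the function name and the DIE rows are hypothetical.

```python
from collections import defaultdict


def group_by_count(rows, group_col, having_min=0):
    """GROUP BY group_col with COUNT(*), a HAVING threshold, and ORDER BY count DESC."""
    # Map: emit (group key, 1) for every row.
    pairs = ((row[group_col], 1) for row in rows)

    # Shuffle: collect the emitted values per group key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    # Reduce: the aggregate function (COUNT) runs once per group.
    counts = {key: sum(values) for key, values in groups.items()}

    # HAVING filters groups after aggregation; ORDER BY sorts the final output.
    kept = {key: n for key, n in counts.items() if n >= having_min}
    return sorted(kept.items(), key=lambda kv: (-kv[1], kv[0]))


dies = [{"lot_id": "L001"}, {"lot_id": "L002"}, {"lot_id": "L001"},
        {"lot_id": "L003"}, {"lot_id": "L002"}, {"lot_id": "L001"}]

summary = group_by_count(dies, "lot_id", having_min=2)
print(summary)
```

The GROUP BY column becomes the map output key, so the shuffle does the grouping for free and the aggregate only has to be computed inside the reducer.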
  • 23. Experimental results
      • Experimental environment
          – two Intel Xeon L5640 CPUs, 24 GB RAM and a 3 TB hard disk
          – 16-node virtual machine cluster on four physical machines
          – Hadoop 0.20.203 (15 October 2013: release 2.2.0 available)
          – HBase 0.92.0 (2013-09-20 | Version: 0.97.0-SNAPSHOT)
          – Hive 0.9.0
          – Java 1.6.0, maximum heap size 512 MB
  • 24. Experimental results
      • Experimental environment
          – Node: two cores at 2 GHz, 4 GB RAM and 400 GB of storage space
          – MySQL: two cores at 2 GHz, 4 GB RAM and an 800 GB hard disk
          – Three tables: LOT, WAFER and DIE
  • 25. Experimental results
      • Results
  • 26. Experimental results
  • 27. Experimental results
  • 28. Experimental results
  • 29. Conclusions
  • 30. End of presentation