hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and Applications: The Case of Distributed Storage Service for Semiconductor Wafer Fabrication Foundries

Huan-Ping Su (蘇桓平), Yi-Sheng Lien (連奕盛), National Cheng Kung University
Track 2: Ecology and Solutions
https://open.mi.com/conference/hbasecon-asia-2019
THE COMMUNITY EVENT FOR APACHE HBASE™
July 20th, 2019 - Sheraton Hotel, Beijing, China
https://hbase.apache.org/hbaseconasia-2019/

  1. Bridging the Gap between Big Data System Software Stack and Applications: The Case of Distributed Storage Service for Semiconductor Wafer Fabrication Foundries
     Huan-Ping Su (蘇桓平), Yi-Sheng Lien (連奕盛), National Cheng Kung University
  2. Agenda
     • Introduction
     • Background
     • Goal
     • Design
     • Performance
     • Summary
  3. Intro
     In the semiconductor manufacturing industry, data volume grows exponentially during the manufacturing process, and this data is essential for monitoring and improving production quality.
  4. Background
     • Heterogeneous storages (such as FTP, SQL Server, HDFS, HBase, ...)
     • Data transfer between storages (moving, copying, ETL, ...)
     • Steep learning curve for sophisticated storage systems
  5. Goal
     • Easy administration across storages
     • Compatibility with the underlying storages (communicate with different storages through one protocol)
     • Intuitive operation
  6. Design
  7. Design • HDFS Interface
  8. Design • HttpServer (Usage Pattern)
     http://<hds_host>/access?from=smb://user/a.data&to=ftp:///dir/a.data
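     A minimal client sketch of this usage pattern, assuming the HDS HttpServer accepts a plain GET on /access with the from/to query parameters shown above; the host name, port, and response format are illustrative assumptions, not part of the presented API.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class HdsAccessClient {
    public static void main(String[] args) throws Exception {
        // Hypothetical HDS HttpServer address; substitute the real <hds_host>.
        String hdsHost = "hds.example.com:8080";

        // Ask HDS to transfer a.data from an SMB share to an FTP directory,
        // following the from/to query-parameter pattern on the slide.
        URI uri = URI.create("http://" + hdsHost
                + "/access?from=smb://user/a.data&to=ftp:///dir/a.data");

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();

        // Assume the server answers with a plain-text status for the transfer.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```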
  9. Design • Transparency by Mixing HDFS and HBase
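     The slide itself carries no code, but the mixing idea can be sketched as a write path that keeps small files in an HBase table (sidestepping the HDFS small-file problem noted in the Summary) while large files go to HDFS. The table name, column family, size threshold, and paths below are illustrative assumptions, not the presenters' implementation.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Illustrative write path: small files become single HBase cells,
 * large files become regular HDFS files. Threshold and names are assumed.
 */
public class HybridWriter {
    private static final long SMALL_FILE_THRESHOLD = 1L << 20; // 1 MB, assumed

    public static void write(String name, byte[] data) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        if (data.length <= SMALL_FILE_THRESHOLD) {
            // Small file: store the bytes in one HBase cell to avoid HDFS small-file overhead.
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("hds_small_files"))) {
                Put put = new Put(Bytes.toBytes(name));
                put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("data"), data);
                table.put(put);
            }
        } else {
            // Large file: write it as a normal HDFS file.
            try (FileSystem fs = FileSystem.get(conf);
                 FSDataOutputStream out = fs.create(new Path("/hds/large/" + name))) {
                out.write(data);
            }
        }
    }
}
```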
  10. Design • Compliance with HDFS Interfaces, and thus the Hadoop Ecosystem
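     Because HDS complies with the HDFS interfaces, Hive, Spark, and other Hadoop-ecosystem projects can talk to it through the standard org.apache.hadoop.fs.FileSystem API. A minimal sketch, assuming HDS registers a URI scheme; the hds:// scheme, host, and file path here are hypothetical.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdsFileSystemExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "hds://namenode:9000" is a hypothetical service URI; the real scheme
        // and address depend on how HDS is deployed.
        try (FileSystem fs = FileSystem.get(URI.create("hds://namenode:9000"), conf);
             BufferedReader reader = new BufferedReader(new InputStreamReader(
                     fs.open(new Path("/wafer/lot42/metrology.csv")), StandardCharsets.UTF_8))) {
            // The same open/create/listStatus calls that work on hdfs:// work here,
            // which is why Hive and Spark can run on top of HDS unchanged.
            System.out.println(reader.readLine());
        }
    }
}
```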
  11. Design • Load Balancing
  12. Design • Load Balancing (cont.)
  13. Experimental Setup
     • Server Spec
       CPU: Intel Xeon E7-8850 @ 2 GHz (80 cores)
       Mem: 512 GB
       Disk: 750 GB * 16
     • Virtual Machine Spec (16 nodes)
       CPU: 80 * 2 GHz
       Mem: 32 GB
       Disk: 750 GB
  14. Experimental Setup
     • Cluster Settings
       Hadoop 2.6.0-cdh5.10.0
       HBase 1.2.0-cdh5.10.0
       ZooKeeper 3.4.5-cdh5.10.0
       Yarn 2.6.0-cdh5.10.0
       Hive 1.1.0-cdh5.10.0
  15. Performance Results (Transparency)
     Hive and Spark are evaluated on top of our HDS:
     • Hive (r,s): read small files with Hive (SELECT queries over 30,000 files, each 0.001 MBytes)
     • Hive (r,l): read large files with Hive (SELECT queries over 12 files, each 16,000 MBytes)
     • Hive (w): write with Hive (the query generates a single 32-GByte file)
  16. Performance Results (Transparency)
  17. Performance Results (Load Balancing)
  18. Performance Results (Overheads)
     Overhead introduced by HDS compared with native HDFS. The workloads are denoted (1,10000), (100,100), (1000,10), and (10000,1), where (x,y) means replicating y files of x MBytes each from an FTP server to the HDS cluster.
  19. Performance Results (Overheads)
  20. Summary
     • Solves the small-file problem in HDFS
     • Transparent access to different storages
     • Compatible with Hadoop-ecosystem projects
     • Improves the yield rate of semiconductor manufacturing by 1%
  21. Thanks!
