Performance Evaluation of Traditional Caching Policies on a Large System with Petabytes of Data
Caching is widely known to be an effective method for improving I/O performance by storing frequently used data on higher-speed storage components. However, most existing caching studies evaluate fairly small files populating a relatively small cache, and few reports detail the performance of traditional cache replacement policies on extremely large caches. Do such traditional caching policies still work effectively when applied to systems with petabytes of data? In this paper, we comprehensively evaluate the performance of several cache policies, including First-In-First-Out (FIFO), Least Recently Used (LRU), and Least Frequently Used (LFU), on the global satellite imagery distribution application maintained by the U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center. Evidence is presented suggesting that traditional caching policies can provide performance gains on large data sets just as they do on smaller ones. Our evaluation is based on approximately three million real-world satellite image download requests representing global user download behavior since October 2008.

Notes
  • Remote sensing data set; financial engineering.
  • The EROS Data Center is home to the U.S. National Satellite Land Remote Sensing Data Archive. The EROS Data Center is managed by the United States Geological Survey. It provides one of the largest satellite image distribution services in the world, and released its image repository to the public domain in October 2008.
  • When a user requests an image over the web, the download starts immediately if the requested image is available on the FTP server. Otherwise, the user is sent an e-mail with a link to the requested image once the processing is complete.
  • Detailed description of the log file. From left to right, the attributes in the log file are as follows: scene ID (further broken down into sub-attributes), user ID, request date.
  • A clear power-law distribution is visible: a few users request many images, and many users request few images.
  • An exceptionally large number of duplicate requests (the same user requesting the same image again) were made. We were told by USGS that this was a result of the current user interface.
  • Normally, LRU and LFU improve significantly on FIFO. Is this going to be the case on a global system with petabytes of data?
  • Cache sizes in the simulations are smaller than the actual FTP server size.
  • Instantaneous image processing is assumed to simplify the simulation. With this simplification, we cannot measure the wait time for users; however, we can still calculate the cache hit rates.
  • No images are removed from the FTP server until the first clean-up date; therefore, all three cache policies behave exactly the same before the first clean-up. During clean-up, a different set of images is removed from the server, depending on the adopted cache policy.
  • Number of clean-ups: the monthly hit rate of each of the three cache replacement policies is shown individually. Hollow bars represent months in which a “clean-up” occurred. Highlighting the months in which “clean-ups” occur, we find that all replacement policies examined execute a “clean-up”, on average, once every 1.8 months. Several brief periods were observed wherein FIFO and the pair of LRU and LFU exhibit differences in the length of time between “clean-ups” being initiated; however, the differences in “rest times” were marginally small (two months at maximum). In addition, we confirm that LRU and LFU behave extremely similarly, with “clean-up” taking place in the same month for the two policies across all resultant data sets.
  • When requests from top-9 aggressive users are included, LRU seems to be the best performer.
  • When requests from the top-9 aggressive users are removed, LFU outperforms LRU. It may be the case that LFU is better when requests are evenly distributed across users, whereas LRU is better under a power-law distribution.
  • In September 2008, EROS announced its new policy of placing all of its satellite imagery in the public domain for free download. Since then, the number of downloads has increased significantly (already reaching 2M) and is expected to grow steadily. No existing visualization tools are available to utilize these data for system optimization, user download pattern analysis, data mining, etc. A well-designed visualization tool will be helpful to: monitor the historical and current global user download behaviors; show the current global download “hot spots”; demonstrate the importance and actual usage of EROS data; optimize the storage system performance; and improve the satellite image distribution service provided by EROS.
  • Transcript

    • 1. Performance Evaluation of Traditional Caching Policies on a Large System with Petabytes of Data. Ribel Fares1, Brian Romoser1, Ziliang Zong1, Mais Nijim2 and Xiao Qin3. Texas State University, TX, USA1; Texas A&M University-Kingsville2; Auburn University, AL, USA3. Presented at the 7th IEEE International Conference on Networking, Architecture, and Storage (NAS 2012), 6/28/2012.
    • 2. Motivation • Large-scale data processing • High-performance storage systems
    • 3. High-Performance Clusters • The architecture of a cluster: Client → Internet → Head Node → Network Switch → Computing Nodes → Storage Subsystems (or Storage Area Network)
    • 4. Techniques for High-Performance Storage Systems • Caching • Prefetching • Active Storage • Parallel Processing
    • 5. Earth Resources Observation and Science (EROS) Center • Over 4 petabytes of satellite imagery available. • More than 3 million image requests since 2008. • Do traditional caching policies still work effectively?
    • 6. EROS Data Center - System Workflow [diagram of data flow]
    • 7. USGS / EROS Storage System

      # | Hardware   | Model  | Capacity | Type | Bus Interface
      1 | Sun/Oracle | F5100  | 100 TB   | SSD  | SAS/FC
      2 | IBM        | DS3400 | 1 PB     | HDD  | SATA
      3 | Sun/Oracle | T10K   | 10 PB    | Tape | Infiniband

      The FTP server from which users download images is of type 1.
    • 8. The USGS / EROS Distribution System • Each cache miss costs 20–30 minutes of processing time.
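A quick sanity check on what those misses cost: if each miss triggers roughly 20–30 minutes of processing, the expected per-request delay is simply the miss rate times the miss cost. A minimal sketch (the 25-minute midpoint and the function name are assumptions of this illustration, and queueing effects are ignored):

```python
def expected_wait_minutes(hit_rate: float, miss_cost_minutes: float = 25.0) -> float:
    """Expected per-request processing delay for a given cache hit rate.

    Hits are served immediately from the FTP server; each miss costs
    roughly 20-30 minutes of processing (25 is assumed as a midpoint).
    """
    return (1.0 - hit_rate) * miss_cost_minutes

# With a hit rate around 0.458 (the 60 TB LRU result reported later),
# the average request still waits on the order of 13-14 minutes.
avg_wait = expected_wait_minutes(0.457727)
```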
    • 9. USGS / EROS Log File – scene ID attributes:
      L (Landsat), E (ETM+ sensor), 7 (satellite designation), 004 (WRS path), 063 (WRS row), 2006 (acquisition year), 247 (acquisition day of year), ASN (capture station), 00 (version).
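The fixed-width layout above parses mechanically. A minimal sketch (field offsets are inferred from the attribute list on this slide; the example ID is simply the sample values concatenated, not a string taken from the paper):

```python
def parse_scene_id(scene_id: str) -> dict:
    """Split a scene ID into the fields listed on the slide,
    assuming the fixed-width layout shown (21 characters total)."""
    return {
        "mission": scene_id[0],          # 'L' -> Landsat
        "sensor": scene_id[1],           # 'E' -> ETM+ sensor
        "satellite": scene_id[2],        # '7' -> satellite designation
        "wrs_path": scene_id[3:6],       # '004'
        "wrs_row": scene_id[6:9],        # '063'
        "year": scene_id[9:13],          # '2006'
        "day_of_year": scene_id[13:16],  # '247'
        "station": scene_id[16:19],      # 'ASN' -> capture station
        "version": scene_id[19:21],      # '00'
    }

fields = parse_scene_id("LE70040632006247ASN00")  # sample values concatenated
```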
    • 10. Observation 1 • Top 9 aggressive users account for 18% of all requests. • A second log file was created by removing requests made by the top 9 aggressive users.
    • 11. Observation 2 • [Chart: duplicate request percentage (0–35%) vs. time window (0–16 days)] • Duplicate images within 7 days were removed from the log file.
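The duplicate-removal step described above can be sketched as a single pass over the time-ordered log (a hedged illustration: the tuple layout, the function name, and the choice to measure the window from the last kept request are assumptions, not the paper's code):

```python
from datetime import datetime, timedelta

def remove_duplicates(requests, window_days=7):
    """Drop repeat requests: a (user, scene) pair seen again within
    `window_days` of its last kept occurrence is treated as a duplicate.
    `requests` is a time-ordered list of (user_id, scene_id, datetime)."""
    window = timedelta(days=window_days)
    last_kept = {}   # (user, scene) -> timestamp of last kept request
    kept = []
    for user, scene, when in requests:
        key = (user, scene)
        prev = last_kept.get(key)
        if prev is None or when - prev > window:
            kept.append((user, scene, when))
            last_kept[key] = when
    return kept

# The second request falls within 7 days of the first and is dropped;
# the third is 19 days later and is kept.
log = [
    ("u1", "LE70040632006247ASN00", datetime(2009, 1, 1)),
    ("u1", "LE70040632006247ASN00", datetime(2009, 1, 4)),
    ("u1", "LE70040632006247ASN00", datetime(2009, 1, 20)),
]
deduped = remove_duplicates(log)  # keeps requests 1 and 3
```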
    • 12. Caching Algorithms • FIFO: the first entry in the cache is removed first. • LRU: the least recently requested image is removed first. • LFU: the least popular image is removed first.
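The three policies above can be sketched in a few lines (a minimal illustration that counts entries rather than terabytes; the paper's simulator additionally models image sizes and the clean-up thresholds on the next slide):

```python
from collections import Counter, OrderedDict

class FIFOCache:
    """First-In-First-Out: evict the entry that entered the cache first."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def request(self, key):
        if key in self.entries:
            self.touch(key)
            return True                       # hit
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the oldest insertion
        self.entries[key] = True
        return False                          # miss

    def touch(self, key):
        pass  # FIFO ignores accesses after insertion

class LRUCache(FIFOCache):
    """Least Recently Used: a hit moves the entry to the back, so the
    front of the queue is always the least recently requested image."""
    def touch(self, key):
        self.entries.move_to_end(key)

class LFUCache:
    """Least Frequently Used: evict the least popular resident image."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = Counter()   # request counts, kept even after eviction
        self.resident = set()

    def request(self, key):
        self.counts[key] += 1
        if key in self.resident:
            return True
        if len(self.resident) >= self.capacity:
            victim = min(self.resident, key=lambda k: self.counts[k])
            self.resident.remove(victim)
        self.resident.add(key)
        return False
```

One design choice to flag: whether LFU remembers the counts of already-evicted images varies between implementations; this sketch keeps them.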
    • 13. Case Studies

      Simulation | Cache Policy | Cache Size (TB) | Top 9 Aggressive Users
      1          | FIFO         | 30              | Included
      2          | FIFO         | 30              | Not Included
      3          | FIFO         | 60              | Included
      4          | FIFO         | 60              | Not Included
      5          | LRU          | 30              | Included
      6          | LRU          | 30              | Not Included
      7          | LRU          | 60              | Included
      8          | LRU          | 60              | Not Included
      9          | LFU          | 30              | Included
      10         | LFU          | 30              | Not Included
      11         | LFU          | 60              | Included
      12         | LFU          | 60              | Not Included
    • 14. Simulation Assumptions/Restrictions • When the cache server reaches 90% capacity, images are removed according to the adopted cache policy until the server load is reduced to 45%. • Images are assumed to be processed instantaneously. • A requested image cannot be removed from the server within 7 days of the request.
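The clean-up rule on this slide can be sketched as follows (assumed data layout for illustration only: a dict mapping scene ID to (size in TB, last request time); `policy_rank` stands in for the FIFO/LRU/LFU eviction ordering):

```python
from datetime import datetime, timedelta

def clean_up(cache, capacity_tb, now, policy_rank, recent_days=7):
    """When the server reaches 90% of capacity, remove images in policy
    order until the load drops to 45%, never removing an image that was
    requested within the last `recent_days` days."""
    used = sum(size for size, _ in cache.values())
    if used < 0.90 * capacity_tb:
        return cache                          # below the clean-up trigger
    target = 0.45 * capacity_tb
    for scene in policy_rank(cache):          # most evictable first
        if used <= target:
            break
        size, last_request = cache[scene]
        if (now - last_request).days < recent_days:
            continue                          # protected: requested recently
        del cache[scene]
        used -= size
    return cache

# An LRU-style ordering for illustration: oldest last request first.
lru_rank = lambda c: sorted(c, key=lambda k: c[k][1])

# Example: ten 10 TB images on a 100 TB server, none requested recently.
now = datetime(2010, 1, 1)
cache = {f"scene{i}": (10.0, now - timedelta(days=30 + i)) for i in range(10)}
clean_up(cache, 100.0, now, lru_rank)  # evicts down to 40 TB used
```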
    • 15. Results – Hit Rates of Differing Cache Replacement Policies • [Chart: LRU, LFU, and FIFO hit rates (0.15–0.65) over time for the 60 TB cache with aggressive users included; the first clean-up is marked]
    • 16. Monthly hit ratios: aggressive users excluded
    • 17. Monthly hit ratios: aggressive users excluded
    • 18. Results – Impact of Inclusion of Aggressive Users (hit rates with aggressive users included)

      Cache size | FIFO     | LRU      | LFU
      30 TB      | 0.32661  | 0.345919 | 0.339515
      60 TB      | 0.438536 | 0.457727 | 0.454811
    • 19. Results – Impact of Exclusion of Aggressive Users (hit rates with aggressive users excluded)

      Cache size | FIFO     | LRU      | LFU
      30 TB      | 0.319171 | 0.332741 | 0.345208
      60 TB      | 0.430349 | 0.449621 | 0.45871
    • 20. Conclusion & Future Work • LRU and LFU initiate cache clean-up at similar points. • Aggressive users destabilize monthly hit rates. • LFU was least affected by the inclusion of aggressive users.
    • 21. Conclusion & Future Work cont’d. • LRU and LFU improve on FIFO as expected. • However, the improvements are on the weaker side. • Global user behaviors should be further investigated to design more complex caching and/or prefetching strategies.
    • 22. Summary • Data-Intensive Processing – EROS (Earth Resources Observation and Science) Data Center – visEROS • Improving I/O Performance – Prefetching – Active Storage – Parallel Processing
    • 23. The VisEROS Project – Motivation • 2M downloads from the EROS data center. • No existing visualization tools are available to utilize these data. • Need a tool to: – Monitor user download behaviors – Show the current global download “hot spots” – Demonstrate the actual usage of EROS data – Optimize the storage system performance – Improve the satellite image distribution service
    • 24. The VisEROS Prototype
    • 25. This project is supported by the U.S. National Science Foundation, No. 0917137
    • 26. Download the presentation slides http://www.slideshare.net/xqin74 Google: slideshare Xiao Qin
    • 27. Many Thanks! 6/28/2012
