Masters of Science presentation: Bringing The Grid Home


Masters of Science presentation of my work on G-ICING

Published in: Technology, Education
  • Today I’ll be talking about G-ICING, an Installable File System for Windows that interfaces with a Grid backend. As this photo shows, G-ICING is intended to be a layer on top of the Grid that makes the Grid tastier ;-)
  • For example, Biomedical researchers want to integrate patient history, demographics and trial records to better study diseases and treatments.
  • The explosion of information systems has led to vast amounts of data stored in widely varying formats and locations, subject to numerous access control and privacy policies. Numerous users, projects, and organizations spanning research, industry, and government have recognized the enormous potential of integrating this data. However, the complexity of integrating data from various sources presents a large obstacle to this goal. Data exists across organizational boundaries and in varying formats. Finding and naming these resources presents another problem. In essence, the energy barrier is too high.
  • Grids have had little uptake. They tend to be hard to use, so users must get past a steep learning curve. When we talk about difficulty, we also talk about transparency to the user and to the application. Grids tend to have neither: users must learn something new, and applications either have to be modified and recompiled or written specifically for the Grid. Most Grids have one security model. Organizations must completely switch to that model or they cannot use the Grid securely. Grids don’t play well with others. Ian Foster pushed for standards, and many people believe standards are essential to Grids, but there are various such “standards” in play rather than one set that all follow. THE MAJORITY OF USERS WHO COULD BENEFIT FROM GRID COMPUTING HAVE NEVER USED OR EVEN HEARD OF IT.
  • A solution that uses Grid computing should meet these four criteria. First, it should be simple and familiar … it seems that most Grids have ignored this or treated it as a secondary goal. Existing solutions are either user transparent or application transparent, but not both. Some are neither.
  • Talk briefly about each point. The numbers differ depending on your source, but the majority of people still use Windows. At the same time, however, few solutions on Windows have attempted to meet the four criteria. It is clear that if we want more uptake of the Grid and want to solve this problem for a lot of people, we really need a solution on Windows.
  • G-ICING is a filesystem for Windows that maps a Grid platform into the Windows file namespace. You can access G-ICING as you would any network drive … e.g. you could mount it to G: and access the Grid namespace from there.
  • Here is the overall design of G-ICING. You can read this diagram from left to right. G-ICING is divided into three main components, the Kernel Management Service, the User Forwarding Service and the Grid Interface Service. I will discuss each of these modules in turn and how G-ICING communicates to the Grid backend in the following slides. For clarity, I will be highlighting in yellow each aspect before I talk about them. I’m going to try to gloss over some of these details so please feel free to stop me for any clarifications you may want.
  • What are these I/O requests and where do they come from?
  • By far the most difficult aspect of G-ICING to design, and the main reason why most people haven’t built filesystems for Windows. A particularly difficult issue in writing a kernel driver was communicating with user processes. Why do I need to do this? Because we use XML-based standards, and you’re very limited in the libraries you can use in the kernel. Therefore, in terms of development time, it’s much easier to build this in user mode.
  • Inverted Call Model. Takes advantage of the I/O mechanisms in WinNT. A user-level program makes a special I/O request (“Hello, I’m waiting for an operation”) and the kernel-mode driver stores it. The kernel forwards actual I/O requests to user mode by completing that stored request with the forwarded I/O call. Problems with limited buffer size between processes.
  • Security within the host: G-ICING relies upon Windows security to protect information between users of the same machine. Host-Grid security is complex. G-ICING acts as a proxy for users. GIS prompts the user for credentials, and the user is given a choice of using one of their certificates. GIS gets a signed delegated credential. Uses the WS-Security* family of specifications and profiles.
  • People are familiar with Windows – because filesystem paradigm is a familiar one (this proves our point)
  • 3 – double clicking on an application etc
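
The inverted call model described in the notes above can be sketched in user space. This is only an illustration of the pattern, not Windows driver code and not G-ICING's implementation: the class `KernelSide`, the functions `user_mode_worker`, `fake_grid_handler`, and `run_demo`, and the 16-byte `MAX_BUFFER` limit are all hypothetical. A worker thread plays the user-mode service that parks a "waiting for an operation" call, the kernel stand-in answers that call with a real request, and the reply is sent back in chunks to mimic the limited buffer between the two sides.

```python
import queue
import threading

MAX_BUFFER = 16  # assumed per-call buffer limit in bytes (hypothetical value)


class KernelSide:
    """Stands in for the kernel driver; purely a user-space simulation."""

    def __init__(self):
        self._pending = queue.Queue()  # real I/O requests awaiting a worker
        self._replies = {}             # request id -> queue of reply chunks
        self._next_id = 0
        self._lock = threading.Lock()

    def wait_for_operation(self):
        """The worker's 'I'm waiting for an operation' call; blocks until work arrives."""
        return self._pending.get()

    def complete(self, request_id, chunk):
        """Worker hands back one reply chunk; None marks the end of the reply."""
        self._replies[request_id].put(chunk)

    def submit(self, operation, path):
        """An application's I/O request: forward it, then collect the chunked reply."""
        with self._lock:
            request_id = self._next_id
            self._next_id += 1
            reply_queue = queue.Queue()
            self._replies[request_id] = reply_queue
        self._pending.put((request_id, operation, path))
        chunks = []
        while True:
            chunk = reply_queue.get()
            if chunk is None:
                break
            chunks.append(chunk)
        del self._replies[request_id]
        return b"".join(chunks)

    def shutdown(self):
        """Release a waiting worker so it can exit."""
        self._pending.put(None)


def user_mode_worker(kernel, handler):
    """Loop: wait for a forwarded request, handle it, reply in small chunks."""
    while True:
        request = kernel.wait_for_operation()
        if request is None:
            break
        request_id, operation, path = request
        result = handler(operation, path)
        for start in range(0, len(result), MAX_BUFFER):
            kernel.complete(request_id, result[start:start + MAX_BUFFER])
        kernel.complete(request_id, None)


def fake_grid_handler(operation, path):
    """Placeholder for the real work a user-mode service would do."""
    return f"{operation}:{path}".encode() * 3


def run_demo():
    kernel = KernelSide()
    worker = threading.Thread(target=user_mode_worker, args=(kernel, fake_grid_handler))
    worker.start()
    result = kernel.submit("read", "/home/alice/data.txt")
    kernel.shutdown()
    worker.join()
    return result
```

Calling `run_demo()` pushes one read request through the loop. The reply is longer than `MAX_BUFFER`, so it crosses the boundary in several pieces and is reassembled on the requesting side.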

    1. 1. Bringing the Grid Home Master’s Thesis Presentation for Chris Sosa University of Virginia April 28, 2009
    2. 2. Overview <ul><li>Motivation </li></ul><ul><li>G-ICING Design </li></ul><ul><li>Prototype </li></ul><ul><li>Evaluation </li></ul><ul><li>Demo </li></ul><ul><li>Conclusion </li></ul>
    3. 3. Motivating Example: Biomedical Researcher
    4. 4. More Motivating Examples <ul><li>Many examples </li></ul><ul><ul><li>Medical clinicians want patient records that are complete and up-to-date </li></ul></ul><ul><ul><li>Researchers want access to data provided at other institutions </li></ul></ul><ul><ul><li>Industry wants access to integrated customer and supply management data </li></ul></ul><ul><li>Commonalities </li></ul><ul><ul><li>Lots of data to integrate – data is stronger together than separate </li></ul></ul><ul><ul><li>Stored in various locations, with different access control and security policies </li></ul></ul>
    5. 5. Current Solutions OR
    6. 6. A Better Solution: Data Grids <ul><li>Grid computing is a form of distributed computing with </li></ul><ul><ul><li>Loosely coupled machines </li></ul></ul><ul><ul><li>Machines cover multiple organizations </li></ul></ul><ul><li>A Data Grid is a type of Grid Computing system that deals with controlled sharing and management of large loads of data </li></ul>
    7. 7. Why don’t more people use Data Grids? Hard to Use Inflexible Security Doesn’t play well with others
    8. 8. Solution Criteria
    9. 9. Simple and Familiar: More difficult than it seems <ul><li>Often overlooked or treated as a secondary goal </li></ul><ul><li>Two aspects </li></ul><ul><ul><li>User Transparency </li></ul></ul><ul><ul><li>Application Transparency </li></ul></ul><ul><li>Solutions </li></ul><ul><ul><li>Shell Extensions </li></ul></ul><ul><ul><li>Shells </li></ul></ul><ul><ul><li>Special Libraries </li></ul></ul><ul><ul><li>Filesystems </li></ul></ul>
    10. 10. Related Work <ul><li>OpenAFS creates a modified Samba server but is stuck with the Samba/CIFS security model </li></ul><ul><li>LUFS and FUSE are filesystem-in-user-space technologies for UNIX / Mac </li></ul><ul><ul><li>Lack support for Windows </li></ul></ul><ul><ul><li>Tied to UNIX security semantics </li></ul></ul><ul><li>Gfarm uses FUSE + syscall hook library </li></ul><ul><ul><li>Same problems as with FUSE alone </li></ul></ul><ul><ul><li>Overly complex for Windows, requires set up of a separate Linux box to forward messages through </li></ul></ul><ul><li>gLite provides a POSIX-like interface that is neither user nor application transparent </li></ul>
    11. 11. Bring in G-ICING <ul><li>Real filesystem for Windows </li></ul><ul><ul><li>User transparency </li></ul></ul><ul><ul><li>Application transparency </li></ul></ul><ul><li>Full filesystem stack so not tied to Windows security model </li></ul>
    12. 12. G-ICING Design
    13. 13. G-ICING Design
    14. 14. IFS Development in Windows
    15. 15. G-ICING Design
    16. 16. Kernel Management Service (KMS) <ul><li>Installable File System Driver </li></ul><ul><ul><li>Network Redirector </li></ul></ul><ul><ul><li>Kernel driver that interacts with other Kernel components </li></ul></ul><ul><li>Communicates to User-mode UFS with Inverted Call Model </li></ul>
    17. 17. User to Kernel Communication
    18. 18. G-ICING Design
    19. 19. User Forwarding Service (UFS) <ul><li>Uses JNI to communicate and forwards requests to GIS </li></ul><ul><li>Prompts user for credentials and obtains a delegated credential for use </li></ul>Flexible Security through Delegation
    20. 20. G-ICING Design
    21. 21. Grid Interface Service (GIS) <ul><li>Converts FS requests into ByteIO/RNS calls </li></ul><ul><ul><li>Resource Naming Service (RNS) </li></ul></ul><ul><ul><ul><li>Basic directory services </li></ul></ul></ul><ul><ul><li>ByteIO - files </li></ul></ul><ul><li>Caches meta-information </li></ul><ul><li>ByteIO buffering </li></ul><ul><li>In Java </li></ul><ul><ul><li>Easy xml serialization/deserialization </li></ul></ul><ul><ul><li>Problems with garbage collection </li></ul></ul>
    22. 22. Prototype Implementation <ul><li>Genesis II as Grid-backend </li></ul><ul><ul><li>Open-source </li></ul></ul><ul><ul><li>Standards-based </li></ul></ul><ul><ul><li>Developed at UVA </li></ul></ul><ul><li>Semantics </li></ul><ul><ul><li>Time-out cache semantics – 45 seconds </li></ul></ul><ul><ul><li>Write-through cache </li></ul></ul>
    23. 23. Evaluation <ul><li>Performance: Do we perform well enough? </li></ul><ul><li>Usability </li></ul><ul><ul><li>Compare to alternatives </li></ul></ul><ul><ul><li>Usability Study </li></ul></ul>
    24. 24. Performance: Evaluation Setup <ul><li>Client </li></ul><ul><ul><li>Single-core 2.34 GHz desktop machine with 1GB memory running WinXP </li></ul></ul><ul><ul><li>100 Mbps connection </li></ul></ul><ul><li>Grid-Backend </li></ul><ul><ul><li>Genesis II running on seven 8-core Xeon processors running at 2.33 GHz with 16 GB memory </li></ul></ul><ul><ul><li>1 Gbps connection </li></ul></ul>
    25. 25. Performance: Test Plan <ul><li>Performance tests using Iozone </li></ul><ul><li>Compare against Samba Share </li></ul><ul><ul><li>Samba commonly used in organizations with shared filesystems </li></ul></ul><ul><ul><li>Compare G-ICING’s Iozone results with Samba results </li></ul></ul>
    26. 26.
    27. 27.
    28. 28. Usability Evaluation <ul><li>Alternatives? </li></ul><ul><ul><li>Shell Extension (not app transparent) </li></ul></ul><ul><ul><li>POSIX-like libraries (neither user nor app transparent) </li></ul></ul><ul><ul><li>Shell-like interfaces (not user transparent) </li></ul></ul><ul><ul><li>Web Portals (not app transparent) </li></ul></ul><ul><li>Usability Study follows </li></ul>
    29. 29. Usability Study - Overview <ul><li>The Usability Study: Is the filesystem paradigm really simple and familiar? </li></ul><ul><li>10 participants </li></ul><ul><ul><li>6 non-engineering students in their first/second </li></ul></ul><ul><ul><li>3 graduate students with shell experience </li></ul></ul><ul><ul><li>1 user with knowledge of Genesis II </li></ul></ul>
    30. 30. Usability Study – A Look Inside <ul><li>Two tests run </li></ul><ul><ul><li>Edit a MS Word document </li></ul></ul><ul><ul><li>Run a “Grid-job” by copying a job description file (JSDL) appropriately </li></ul></ul><ul><ul><li>Each run either using G-ICING or the Genesis II shell </li></ul></ul><ul><li>Questionnaire </li></ul><ul><ul><li>Background </li></ul></ul><ul><ul><li>How long each task took </li></ul></ul><ul><ul><li>Measure how difficult each method was </li></ul></ul><ul><ul><li>Give an overall preference </li></ul></ul>
    31. 31. Usability Study - Results <ul><li>9/10 users preferred G-ICING either strongly or moderately </li></ul><ul><ul><li>Sole user who did not was also previous Genesis II user </li></ul></ul><ul><ul><li>Concerned with performance </li></ul></ul><ul><li>6/9 felt strongly for G-ICING </li></ul>
    32. 32. Usability Study - Quantitative Results <ul><li>Average difficulty levels on a scale of 1-5 </li></ul><ul><ul><li>For Genesis II Shell – 3.6 </li></ul></ul><ul><ul><li>For G-ICING – 1.3 </li></ul></ul><ul><li>Results below for duration of tasks </li></ul>
        <table>
        <tr><th></th><th>Shell Edit</th><th>Shell Run</th><th>G-ICING Edit</th><th>G-ICING Run</th></tr>
        <tr><td>Overall: Avg. Duration (mins)</td><td>10</td><td>5.889</td><td>1.6</td><td>1.9</td></tr>
        <tr><td>Shell Users: Avg. Duration (mins)</td><td>8</td><td>3</td><td>1.4</td><td>1.8</td></tr>
        </table>
    33. 33. Demo
    34. 34. Conclusions <ul><li>The Grid is useless without users </li></ul><ul><li>By providing a simple and familiar interface, G-ICING has the potential to bring more users </li></ul><ul><ul><li>9/10 prefer G-ICING over shell interface </li></ul></ul><ul><ul><li>Takes five to ten times less time to perform common data operations </li></ul></ul><ul><li>Provides both user and application transparency without sacrificing flexible security, usage of standards and performance </li></ul>
    35. 35. Future Work <ul><li>Stretching filesystem paradigm to perform more Grid services </li></ul><ul><li>FUSE for Windows </li></ul>
    36. 36. Questions <ul><li>? </li></ul>
    37. 37. Usability Study - Observations <ul><li>Graduate Students and Grid user had higher expectations of Word </li></ul><ul><li>Performance issues not discussed in undergraduate session </li></ul><ul><li>Most common complaint, “help” not helpful </li></ul>
    38. 38. Prototype in Action
    39. 39. Prototype in Action (Continued)
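
The prototype slide lists "Time-out cache semantics – 45 seconds" and "Write-through cache". A minimal sketch of how those two semantics might combine is given below. It assumes the 45 seconds is a per-entry time-to-live, which the slides do not state. The class `TimeoutWriteThroughCache`, its methods, and the dict-like `backend` standing in for the Grid are all hypothetical and are not taken from G-ICING or Genesis II.

```python
import time


class TimeoutWriteThroughCache:
    """Entries expire after ttl_seconds; writes go to the backend immediately."""

    def __init__(self, backend, ttl_seconds=45.0, clock=time.monotonic):
        self._backend = backend   # any dict-like store; stands in for the Grid
        self._ttl = ttl_seconds   # assumed meaning of the 45-second time-out
        self._clock = clock       # injectable so the behaviour can be tested
        self._entries = {}        # key -> (value, time the entry was stored)

    def read(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]       # still fresh: serve from the cache
        value = self._backend[key]  # missing or expired: refetch from the backend
        self._entries[key] = (value, now)
        return value

    def write(self, key, value):
        self._backend[key] = value  # write-through: the backend is updated first
        self._entries[key] = (value, self._clock())
```

With these semantics a change made elsewhere on the backend can stay invisible to a client for up to the time-out, while the client's own writes reach the backend right away. That trade-off is one plausible reading of the slide, not a confirmed description of the prototype.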