    1. 1. The System Formerly Known As P.R.O.S.E. Partitioned Reliable Operating System Environment (now known as Libra) <ul><ul><li>Eric Van Hensbergen </li></ul></ul><ul><ul><li>(bergevan@us.ibm.com) </li></ul></ul>
    2. 2. Background <ul><li>Motivation: Push the mainstream heavy-weight operating systems out of the way. </li></ul><ul><li>Why: </li></ul><ul><ul><li>Finer-grained control over system services: scheduling, memory allocation, interrupt handling (or lack thereof) </li></ul></ul><ul><ul><li>Reliability: application-specific kernels are likely to be smaller and may even be verifiable using formal methods </li></ul></ul><ul><ul><li>Hardware support: enable use of hardware-specific features which may not be well matched to a generalized mainstream operating system. </li></ul></ul>
    3. 3. Virtualization Kernel <-> Hypervisor Interface Hardware Platform Hypervisor Logical Partition Logical Partition Logical Partition Logical Partition Hardware <-> Hypervisor Interface
    4. 4. PROSE Approach <ul><li>Run applications in a stand-alone partition </li></ul><ul><li>Enable an execution environment that makes starting a partition as easy as starting an application </li></ul><ul><li>Provide a development environment that makes creating a specialized kernel as easy as developing an application (library OS) </li></ul><ul><li>Share resources between library-OS partitions and traditional partitions, keeping library-OS kernels simple and reliable </li></ul><ul><li>Extensions to allow bridging resource sharing and management across the entire cluster. </li></ul><ul><li>Unified communication protocol for resource sharing and control with built-in failure detection and recovery. </li></ul>Kernel <-> Hypervisor Interface Logical Partition Hardware Platform Hypervisor Logical Partition Logical Partition Logical Partition Hardware <-> Hypervisor Interface DB2 lib OS lib OS J9 9P 9P Controller Controller App 9p
    5. 5. Resource Sharing Requirements <ul><li>Simple client or server implementation requiring minimal transport </li></ul><ul><li>Single protocol for all resources (application interfaces, system services, devices) </li></ul><ul><li>Intuitive interface (high productivity) </li></ul><ul><li>Reliability (Failure detection, recovery, & failover) </li></ul><ul><li>Scalability </li></ul>
    6. 6. Re-introducing 9P2000 (from Plan 9) <ul><li>Plan 9 is a research operating system developed at Bell Labs during the 1990s, designed to address the deficiencies of UNIX in the presence of a network. </li></ul><ul><li>Three major design elements: </li></ul><ul><ul><li>All resources are accessed through synthetic file systems. </li></ul></ul><ul><ul><li>These file systems are organized in a per-process stackable private name space. </li></ul></ul><ul><ul><li>Local and remote resources are accessed using a single simple unified protocol. </li></ul></ul>
    7. 7. 9P2000 Characteristics <ul><li>Simple architecture-independent asynchronous RPC-driven resource sharing protocol with built-in support for mutual authentication and encryption </li></ul><ul><li>Only requires an underlying reliable, in-order transport </li></ul><ul><li>Based on 11 primitive operations, most of which are common file operations </li></ul><ul><li>Integrated support for hierarchical namespaces </li></ul>
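As a hint of how light the protocol is, here is a minimal sketch of encoding the initial Tversion handshake message, following the published 9P2000 wire format (little-endian integers, strings carried as a 2-byte length plus bytes); the function name is my own:

```python
import struct

TVERSION = 100   # message type code for the version handshake (9P2000 spec)
NOTAG = 0xFFFF   # distinguished tag used only by Tversion/Rversion

def pack_tversion(msize, version="9P2000"):
    """Encode size[4] type[1] tag[2] msize[4] version[s].

    All integers are little-endian; a string is a 2-byte length
    followed by its bytes.  Every 9P message is framed this same way,
    so a client or server needs only one framing routine.
    """
    v = version.encode()
    body = struct.pack("<BHI", TVERSION, NOTAG, msize)
    body += struct.pack("<H", len(v)) + v
    return struct.pack("<I", 4 + len(body)) + body

msg = pack_tversion(8192)
# 4 (size) + 1 (type) + 2 (tag) + 4 (msize) + 2 + len("9P2000") = 19 bytes
```

The same pattern (a fixed header plus a handful of typed fields) covers all of the protocol's primitive operations, which is why minimal clients and servers stay small.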
    8. 8. PROSE/Libra Architecture Xen Storage Network File System Linux Control Partition (Dom0) User Partition (DomU) in channel out channel Libra 9p Client Inferno (9p Server) Memory mgmt. Threads Libra API J9 (JVM) J9 port layer File Ops Sockets Sys Svc Application Proxy Process Console Authority User Environment
    9. 9. Hardware Devices System Services Application Services Disk Network TCP/IP Stack Database GUI /dev/eth0 /dev/tap0 /dev/tap1 /net /arp /udp /tcp /clone /stats /0 /1 /ctl /data /listen /local /remote /status File System /mnt/9p_root /mnt/common_fs /mnt/remote_nfs /sql /clone /0 /query /result /1 /win /clone /0 /1 /ctl /data /opengl /refresh /2 Resources Sharing via File Name Space /dev/hda1 /dev/hda2
    10. 10. rHype: IBM's Research Hypervisor for Power <ul><li>Small (~30k lines of code for both x86 & PowerPC) </li></ul><ul><li>Developed as a validation test for Cell virtualization features and as a research platform for LPAR research </li></ul><ul><li>Uses same system interfaces as IBM's commercial Power virtualization engine </li></ul><ul><li>Open Sourced: http://www.research.ibm.com/hypervisor </li></ul>
    11. 11. arlx112 arlx113 Both <ul><li>IBM JS20 Blade </li></ul><ul><li>SLOF Firmware </li></ul><ul><li>4 GB DRAM Memory </li></ul><ul><li>Single 1.66 GHz PowerPC 970 </li></ul><ul><li>Linux 2.6.10 </li></ul><ul><li>Controller Partition </li></ul><ul><li>Linux 2.6.10 </li></ul><ul><li>64 MB of memory </li></ul><ul><li>PROSE Partition </li></ul><ul><li>Application + lib-os </li></ul><ul><li>1 GB of memory </li></ul><ul><li>Console & Time over 9P </li></ul>Performance Experimental Setup
    12. 12. Sparse Memory Benchmark Performance
    13. 13. Noise Control w/PROSE & Hypervisors <ul><li>Allow strict control of the percentage of CPU devoted to the application versus system daemons and I/O requests </li></ul><ul><li>Can eliminate jitter associated with interrupt service routines </li></ul><ul><li>Provides a higher degree of determinism than vanilla Linux, but does so at a performance cost </li></ul>
    14. 14. Noise Comparison Linux Idle Linux Loaded PROSE Idle PROSE Loaded
    15. 15. Second Iteration: Scale Out Incarnation <ul><li>Run J9 JVM on top of PROSE libOS </li></ul><ul><li>Politics required a port from research hypervisor to Xen-PPC </li></ul><ul><li>Single front-end system with multiple partitioned back-end systems </li></ul><ul><li>Resources shared over network as well as between partitions </li></ul><ul><ul><li>All virtual nodes used a single TCP/IP stack shared from the front-end machine </li></ul></ul>
    16. 16. Inferno Xen Channel Device <ul><li>Simple clone file system based on the Plan 9 TCP/IP stack interface </li></ul><ul><li>Pass the Xen partition name as the target of a “socket” connect, with optional channel information </li></ul><ul><li>Then export the file system over the “socket” </li></ul><ul><li>Detect hangups through a badc0ffee magic value so that connections can be cleaned up and recycled. </li></ul><ul><li>Started with simple sleep-based polling; moved to sched_yield to reduce latency from 10 ms to 50 µs. </li></ul>
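The hangup-detection and polling strategy above can be sketched as follows. This is an illustrative model rather than the Inferno driver itself; the sentinel value and function names are my own stand-ins:

```python
import os

HANGUP_MAGIC = 0xBADC0FFE  # illustrative stand-in for the badc0ffee sentinel

def poll_channel(read_word, max_spins=1000):
    """Probe the head word of a shared-memory channel.

    read_word is any callable returning the channel's current head
    word.  Returns the word when data arrives, None if the peer hung
    up (wrote the magic value), or 0 if nothing showed up in time.
    Yielding between probes instead of sleeping is the change that
    took latency from ~10 ms down to ~50 us in the slide above.
    """
    for _ in range(max_spins):
        word = read_word()
        if word == HANGUP_MAGIC:
            return None        # peer gone: clean up and recycle the channel
        if word:
            return word        # data available
        if hasattr(os, "sched_yield"):
            os.sched_yield()   # give up the CPU without a timer sleep
    return 0
```

The trade-off is classic: sleep-based polling bounds CPU waste but adds a full timer tick of latency, while yielding rechecks the channel as soon as the scheduler comes back around.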
    17. 17. Generalized Shared Memory Network Driver <ul><li>Make the interface look more like devip </li></ul><ul><li>Different protocol directories for different media (UNIX shared memory, Xen shared channels, rHype /dev/hcall shared channels, etc.) </li></ul><ul><li>Connect, Announce, and Hangup -> ctl file </li></ul><ul><li>Listen doesn't make sense – at least not right now. </li></ul>
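The clone-and-ctl idiom borrowed from devip can be modeled in a few lines. This is a toy in-memory model of the control flow a client sees; the verbs mirror the slide, while the class and path names are invented for illustration:

```python
class ProtoDir:
    """Toy model of a Plan 9 style protocol directory (think /net/tcp,
    or a hypothetical /net/xen).  Reading 'clone' allocates a numbered
    connection directory; writing its ctl file drives the state machine.
    """
    def __init__(self):
        self.conns = {}
        self.next_id = 0

    def clone(self):
        n = self.next_id
        self.next_id += 1
        self.conns[n] = {"state": "new", "remote": None}
        return n                      # a real driver hands this back as text

    def ctl(self, n, cmd):
        verb, _, arg = cmd.partition(" ")
        if verb == "connect":         # dial a peer partition
            self.conns[n].update(state="connected", remote=arg)
        elif verb == "announce":      # offer an endpoint for peers to dial
            self.conns[n].update(state="announced", remote=arg)
        elif verb == "hangup":        # tear the connection down
            self.conns[n].update(state="closed")
        else:
            raise ValueError("unknown ctl message: " + verb)

net = ProtoDir()
c = net.clone()                       # like: read /net/xen/clone -> "0"
net.ctl(c, "connect domU1")           # like: echo connect domU1 > /net/xen/0/ctl
```

Note that, matching the slide, the model simply rejects a "listen" message; only the three supported verbs drive the state machine.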
    18. 18. 9P File System I/O <ul><li>Performance of forwarding standard read/write </li></ul><ul><li>128 MB file with varying buffer size </li></ul><ul><li>Data present on Linux cache </li></ul>
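A minimal version of that experiment, timing a sequential read while sweeping the buffer size, might look like this. It is a local-file sketch of the access pattern only; the slide measured the same pattern forwarded over 9P, and the 128 MB file is scaled down here to keep the sketch quick:

```python
import os
import tempfile
import time

def timed_read(path, bufsize):
    """Sequentially read `path` with a fixed buffer size; return (bytes, seconds)."""
    total = 0
    t0 = time.perf_counter()
    with open(path, "rb", buffering=0) as f:   # unbuffered: each read() is a syscall
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - t0

# The slide used a 128 MB file already warm in the Linux page cache;
# 4 MB keeps this sketch fast while showing the same buffer-size sweep.
with tempfile.NamedTemporaryFile(delete=False) as tf:
    tf.write(os.urandom(4 * 1024 * 1024))
    path = tf.name

results = {buf: timed_read(path, buf) for buf in (512, 4096, 65536)}
os.remove(path)
```

With the data cached, per-operation overhead dominates, so throughput should climb with buffer size; over a forwarded 9P transport the per-message cost makes that effect sharper still.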
    19. 20. Status <ul><li>Code reorganization and cleanup </li></ul><ul><li>Name change to Libra </li></ul><ul><li>Support for x86 (32 bit or 64 bit) platforms using Xen </li></ul><ul><li>Preliminary transport support for other hypervisors (KVM) and other shared memory interfaces </li></ul><ul><li>Ongoing evaluation of other application domains </li></ul>
    20. 21. Future Work <ul><li>Move server components into Linux kernel. </li></ul><ul><li>Generalized accommodation of existing applications (via system call reflection or dynamic relinking). </li></ul><ul><li>Shared read-only file caches and text segments. </li></ul><ul><li>Checkpoints and “freeze-dried” partition images allowing rapid application execution and initialization. </li></ul>
    21. 22. Acknowledgments <ul><li>The original work would not have been possible without the contributions of Jimi Xenidis, Michal Ostrowski, and Orran Krieger. The Libra work has been a collaborative effort also including Glenn Ammons, Jonathan Appavoo, Maria Butrico, Dilma da Silva, David Grove, Kiyokuni Kawachiya, Bryan Rosenburg, and Robert Wisniewski. </li></ul><ul><li>This work was supported in part by the Defense Advanced Research Projects Agency under contract no. NBCH30390004. </li></ul>
    22. 23. Additional Information <ul><li>http://www.research.ibm.com/prose </li></ul><ul><li>http://www.research.ibm.com/hypervisor </li></ul><ul><li>http://www.research.ibm.com/systemsim </li></ul><ul><li>Libra: A Library Operating System for a JVM in a Virtualized Execution Environment; Glenn Ammons, et al. In Proceedings of the Third International ACM SIGPLAN/SIGOPS Conference on Virtual Execution Environments. San Diego, CA. June 2007. </li></ul>
    23. 24. BACKUP SLIDES
    24. 25. rHype scheduler explanation <ul><li>Simple fixed-slot round-robin scheduler. </li></ul><ul><li>The quantum is determined by a special HDEC counter (default quantum = 20 ms). </li></ul><ul><li>Partitions can be given a greater share of the CPU by being assigned multiple slots. </li></ul>
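The slot mechanism is simple enough to sketch. This is my own illustrative model of the fixed-slot table described above, not rHype code:

```python
def build_schedule(slot_table, rounds=1):
    """Expand a fixed-slot table into the run order the hypervisor
    cycles through.  slot_table maps slot index -> partition; assigning
    one partition several slots gives it a proportionally larger CPU
    share.  Each slot runs for one HDEC quantum (20 ms by default).
    """
    order = [slot_table[i] for i in sorted(slot_table)]
    return order * rounds

# Partition A owns two of four slots, so it receives 50% of the CPU.
slots = {0: "A", 1: "B", 2: "A", 3: "C"}
schedule = build_schedule(slots, rounds=2)
# schedule == ["A", "B", "A", "C", "A", "B", "A", "C"]
```

Because the table is fixed, a partition's slots come around at rigid 20 ms intervals whether or not it has work to do, which is exactly the property the next slide's noise discussion turns on.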
    25. 26. Phase Scheduling Noise <ul><li>FWQ iterations aren't aligned to scheduler quanta </li></ul><ul><li>Noise is exacerbated by fixed-length scheduling slots. </li></ul><ul><li>Fixed noise ratio based on HDEC length </li></ul>...
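FWQ here refers to a Fixed Work Quantum style noise benchmark: time many identical chunks of work and treat any spread above the minimum as interference. A minimal sketch, with arbitrary illustrative parameters:

```python
import time

def fwq(iterations=200, work=20000):
    """Fixed Work Quantum: repeatedly time an identical chunk of work.

    On a noiseless machine every sample would match the minimum;
    anything above it is OS or hypervisor interference.  When the
    scheduler uses fixed-length slots the work doesn't align to,
    whole quanta of delay land in the tail of the distribution,
    which is the phase effect the slide describes.
    """
    samples = []
    for _ in range(iterations):
        t0 = time.perf_counter()
        acc = 0
        for i in range(work):        # the fixed work quantum
            acc += i
        samples.append(time.perf_counter() - t0)
    return samples

samples = fwq()
noise = max(samples) - min(samples)  # crude jitter estimate
```

A histogram of `samples` makes the effect visible: an aligned workload clusters at the minimum, while a misaligned one shows a second mode one scheduler quantum higher.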
    26. 27. Transparent Application Development Process Original Application PROSE Application Custom OS Library
    27. 28. PROSE Reliability in channel out channel Shared Memory open read write close tcp/ip Ethernet Disk Partition File System in channel out channel Shared Memory Ethernet Disk Partition File System Private namespace Network Private namespace