SlideShare a Scribd company logo
Detecting and
           Correcting Transient
            Hardware Errors
            John Byrne (john.l.byrne@hp.com), Norman P. Jouppi,
            Laura Ramirez, Parthasarathy Ranganathan, Bruce J. Walker
            HP Labs

            Nidhi Aggarwal, Kewal K. Saluja, James E. Smith
            University of Wisconsin – Madison




© 2009 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without notice
Availability/Reliability Spectrum
    High Availability;    Fault Tolerance; No lost
                                                     Fault Correction; keep
    Restart/Failover;     work; keep running,
                                                     Running with correct results
    Lose ongoing work     even with bad results




2      24 February 2009
Availability/Reliability Spectrum
    High Availability;             Fault Tolerance; No lost
                                                                Fault Correction; keep
    Restart VM;                    work; keep running,
                                                                Running with correct results
    Lose ongoing work              even with bad results

                      Ongoing FT Work:
                          Remus
                                                Checkpoint and restart on complete node failure
                          Kemari
                                              Output comparison; pick one if they don’t match
                          Marathon
       Lockstep
                          VMware
       Execution                               Backup takes over on complete node failure




3      24 February 2009
Availability/Reliability Spectrum
    High Availability;             Fault Tolerance; No lost
                                                                Fault Correction; keep
    Restart VM;                    work; keep running,
                                                                Running with correct results
    Lose ongoing work              even with bad results

                      Ongoing FT Work:
                          Remus
                                                Checkpoint and restart on complete node failure
                          Kemari
                                              Output comparison; pick one if they don’t match
                          Marathon
       Lockstep
                          VMware
       Execution                               Backup takes over on complete node failure

     Theme: Deal with Complete Node Failure


    No one is detecting or correcting transient processor failures



4      24 February 2009
Transient Hardware Errors
    International Technology Roadmap for Semiconductors
•
    has predicted significant reliability problems
    Intel study in 2005 indicated 100-fold increase in
•
    transient faults in scaling from 180nm to 16nm

Errors can:
a. Crash the OS;
b. Corrupt data;
c. Cause execution to take a different path;


Goal: Detect and Correct Transient Errors


5   24 February 2009
Detect and Correct Transient Errors
      Lockstep VMs with ongoing checkpoints.
1.
      Tee input to both VMs
2.
      Compare output from VMs and re-execute on
3.
      miscompare
      Compare checkpoints and re-execute on
4.
      miscompare
      Log input, interrupts and non-deterministic
5.
      instructions to allow completely accurate re-
      execution;



6    24 February 2009
Lockstep VMs
• Create 2 identical images at VM start time;
• Ensure response to each VMexit is identical;
    − Force them to be identical if necessary (rdtsc)
• Deliver              interrupt at the identical instruction
    − At each VMexit, deliver interrupts if they are pending;
    − Use the count-down PMU counter to force a
      synchronization point if no VMexits happen and deliver
      interrupts at that point.
• Log VMexit return values as necessary (for replay);
• Log when interrupts are delivered (for replay)


7   24 February 2009
I/O
• Input        is sent to both VMs (network and disk);
    − Input is logged as being part of a specific checkpoint so
      on replay inputs can come from the log;
• Output               is compared and if equal, is sent out;
    − If not equal, re-execute from the last good checkpoint
    − Blktap driver modified to allow disk output to be
      compared
    − Network backend driver modified to allow network
      output to be compared;
    − Output is counted so on replay the correct number of re-
      created outputs can be discarded and not output twice.

8   24 February 2009
Checkpointing and Comparing
    Incremental, periodic checkpoints of each VM
•
    − At exactly the same instruction;
    − Utilize COW or copy at checkpoint time;
    − Mark the checkpoint event in the input and output streams
    − Immediate continue execution after checkpoint done;
    After checkpoint X is done, compare the incremental
•
    checkpoints
    − If equal, delete one of the checkpoint
    − If equal, delete checkpoint X-1 + any logs for x-1;
    − If not equal, then do a replay from checkpoint X-1;
  At checkpoint event, tell input to start a new log;
•
• At checkpoint event, tell output to record the output count
  and start a new count for the next checkpoint


9    24 February 2009
Replay from Checkpoint X
• Restore                the registers and memory image
• Tell  input i/o to replay input from log for
     checkpoint X
• Tell  output that we are replaying from checkpoint X
     and it then throws away as many i/o’s as were
     done since checkpoint X.




10    24 February 2009
Initial Limitations
• Uniprocessor            VMs
• HVM            guests
• Single           Node implementation




11   24 February 2009
Performance Data to Date
• Implementation      to date includes lockstep VMs and
     disk i/o input funneling and output checking
     (network i/o is implemented but not yet tested)
• Bonnie   i/o benchmark didn’t show any
     degradation;
• SpecCPU     benchmark suite showed 2-5%
     degradation.




12    24 February 2009
Plans
• Implement             the checkpoint and replay code
• Work            on performance of lockstep and checkpoint
• Investigate           UP, HVM and single-site restrictions.




13   24 February 2009
XS Oracle 2009 Error Detection

More Related Content

What's hot

XS Japan 2008 Services English
XS Japan 2008 Services EnglishXS Japan 2008 Services English
XS Japan 2008 Services English
The Linux Foundation
 
XS Boston 2008 Project Status
XS Boston 2008 Project StatusXS Boston 2008 Project Status
XS Boston 2008 Project Status
The Linux Foundation
 
XS Boston 2008 Self IO Emulation
XS Boston 2008 Self IO EmulationXS Boston 2008 Self IO Emulation
XS Boston 2008 Self IO Emulation
The Linux Foundation
 
Ian Prattlinuxworld Xen Aug2008
Ian Prattlinuxworld Xen Aug2008Ian Prattlinuxworld Xen Aug2008
Ian Prattlinuxworld Xen Aug2008
The Linux Foundation
 
XS Oracle 2009 PVOps
XS Oracle 2009 PVOpsXS Oracle 2009 PVOps
XS Oracle 2009 PVOps
The Linux Foundation
 
XS 2008 Boston VTPM
XS 2008 Boston VTPMXS 2008 Boston VTPM
XS 2008 Boston VTPM
The Linux Foundation
 
Nakajima hvm-be final
Nakajima hvm-be finalNakajima hvm-be final
Nakajima hvm-be final
The Linux Foundation
 
XS Boston 2008 OpenSolaris
XS Boston 2008 OpenSolarisXS Boston 2008 OpenSolaris
XS Boston 2008 OpenSolaris
The Linux Foundation
 
XS Japan 2008 Xen Mgmt English
XS Japan 2008 Xen Mgmt EnglishXS Japan 2008 Xen Mgmt English
XS Japan 2008 Xen Mgmt English
The Linux Foundation
 
XS Boston 2008 Network Topology
XS Boston 2008 Network TopologyXS Boston 2008 Network Topology
XS Boston 2008 Network Topology
The Linux Foundation
 
How to Optimize Microsoft Hyper-V Failover Cluster and Double Performance
How to Optimize Microsoft Hyper-V Failover Cluster and Double PerformanceHow to Optimize Microsoft Hyper-V Failover Cluster and Double Performance
How to Optimize Microsoft Hyper-V Failover Cluster and Double Performance
StarWind Software
 
Ina Pratt Fosdem Feb2008
Ina Pratt Fosdem Feb2008Ina Pratt Fosdem Feb2008
Ina Pratt Fosdem Feb2008
The Linux Foundation
 
Linux On V Mware ESXi
Linux On V Mware ESXiLinux On V Mware ESXi
Linux On V Mware ESXi
Masafumi Ohta
 
XS Oracle 2009 Just Run It
XS Oracle 2009 Just Run ItXS Oracle 2009 Just Run It
XS Oracle 2009 Just Run It
The Linux Foundation
 
Hyper V And Scvmm Best Practis
Hyper V And Scvmm Best PractisHyper V And Scvmm Best Practis
Hyper V And Scvmm Best Practis
Blauge
 
Hyper-V Best Practices & Tips and Tricks
Hyper-V Best Practices & Tips and TricksHyper-V Best Practices & Tips and Tricks
Hyper-V Best Practices & Tips and Tricks
Amit Gatenyo
 
Презентация RDS & App-V, VDI
Презентация RDS & App-V, VDIПрезентация RDS & App-V, VDI
Презентация RDS & App-V, VDI
Виталий Стародубцев
 
Windows Server "10": что нового в виртуализации
Windows Server "10": что нового в виртуализацииWindows Server "10": что нового в виртуализации
Windows Server "10": что нового в виртуализации
Виталий Стародубцев
 
i//:squared Business Continuity Event
i//:squared Business Continuity Eventi//:squared Business Continuity Event
i//:squared Business Continuity Event
Jonathan Allmayer
 
Building vSphere Perf Monitoring Tools
Building vSphere Perf Monitoring ToolsBuilding vSphere Perf Monitoring Tools
Building vSphere Perf Monitoring Tools
Pablo Roesch
 

What's hot (20)

XS Japan 2008 Services English
XS Japan 2008 Services EnglishXS Japan 2008 Services English
XS Japan 2008 Services English
 
XS Boston 2008 Project Status
XS Boston 2008 Project StatusXS Boston 2008 Project Status
XS Boston 2008 Project Status
 
XS Boston 2008 Self IO Emulation
XS Boston 2008 Self IO EmulationXS Boston 2008 Self IO Emulation
XS Boston 2008 Self IO Emulation
 
Ian Prattlinuxworld Xen Aug2008
Ian Prattlinuxworld Xen Aug2008Ian Prattlinuxworld Xen Aug2008
Ian Prattlinuxworld Xen Aug2008
 
XS Oracle 2009 PVOps
XS Oracle 2009 PVOpsXS Oracle 2009 PVOps
XS Oracle 2009 PVOps
 
XS 2008 Boston VTPM
XS 2008 Boston VTPMXS 2008 Boston VTPM
XS 2008 Boston VTPM
 
Nakajima hvm-be final
Nakajima hvm-be finalNakajima hvm-be final
Nakajima hvm-be final
 
XS Boston 2008 OpenSolaris
XS Boston 2008 OpenSolarisXS Boston 2008 OpenSolaris
XS Boston 2008 OpenSolaris
 
XS Japan 2008 Xen Mgmt English
XS Japan 2008 Xen Mgmt EnglishXS Japan 2008 Xen Mgmt English
XS Japan 2008 Xen Mgmt English
 
XS Boston 2008 Network Topology
XS Boston 2008 Network TopologyXS Boston 2008 Network Topology
XS Boston 2008 Network Topology
 
How to Optimize Microsoft Hyper-V Failover Cluster and Double Performance
How to Optimize Microsoft Hyper-V Failover Cluster and Double PerformanceHow to Optimize Microsoft Hyper-V Failover Cluster and Double Performance
How to Optimize Microsoft Hyper-V Failover Cluster and Double Performance
 
Ina Pratt Fosdem Feb2008
Ina Pratt Fosdem Feb2008Ina Pratt Fosdem Feb2008
Ina Pratt Fosdem Feb2008
 
Linux On V Mware ESXi
Linux On V Mware ESXiLinux On V Mware ESXi
Linux On V Mware ESXi
 
XS Oracle 2009 Just Run It
XS Oracle 2009 Just Run ItXS Oracle 2009 Just Run It
XS Oracle 2009 Just Run It
 
Hyper V And Scvmm Best Practis
Hyper V And Scvmm Best PractisHyper V And Scvmm Best Practis
Hyper V And Scvmm Best Practis
 
Hyper-V Best Practices & Tips and Tricks
Hyper-V Best Practices & Tips and TricksHyper-V Best Practices & Tips and Tricks
Hyper-V Best Practices & Tips and Tricks
 
Презентация RDS & App-V, VDI
Презентация RDS & App-V, VDIПрезентация RDS & App-V, VDI
Презентация RDS & App-V, VDI
 
Windows Server "10": что нового в виртуализации
Windows Server "10": что нового в виртуализацииWindows Server "10": что нового в виртуализации
Windows Server "10": что нового в виртуализации
 
i//:squared Business Continuity Event
i//:squared Business Continuity Eventi//:squared Business Continuity Event
i//:squared Business Continuity Event
 
Building vSphere Perf Monitoring Tools
Building vSphere Perf Monitoring ToolsBuilding vSphere Perf Monitoring Tools
Building vSphere Perf Monitoring Tools
 

Viewers also liked

XS Oracle 2009 Networking 10gig
XS Oracle 2009 Networking 10gigXS Oracle 2009 Networking 10gig
XS Oracle 2009 Networking 10gig
The Linux Foundation
 
Xen Directions HXEN Slides
Xen Directions HXEN SlidesXen Directions HXEN Slides
Xen Directions HXEN Slides
The Linux Foundation
 
Ese Neno Da Rua
Ese Neno Da RuaEse Neno Da Rua
Ese Neno Da Rua
mpernascruz
 
Apollotheme presentation
Apollotheme presentationApollotheme presentation
Apollotheme presentation
Tien Nguyen
 
XS Boston 2008 Guest Spinning
XS Boston 2008 Guest SpinningXS Boston 2008 Guest Spinning
XS Boston 2008 Guest Spinning
The Linux Foundation
 
Javaiostream
JavaiostreamJavaiostream
Javaiostream
Tien Nguyen
 
XS Boston 2008 Paravirt Ops in Linux IA64
XS Boston 2008 Paravirt Ops in Linux IA64XS Boston 2008 Paravirt Ops in Linux IA64
XS Boston 2008 Paravirt Ops in Linux IA64
The Linux Foundation
 

Viewers also liked (7)

XS Oracle 2009 Networking 10gig
XS Oracle 2009 Networking 10gigXS Oracle 2009 Networking 10gig
XS Oracle 2009 Networking 10gig
 
Xen Directions HXEN Slides
Xen Directions HXEN SlidesXen Directions HXEN Slides
Xen Directions HXEN Slides
 
Ese Neno Da Rua
Ese Neno Da RuaEse Neno Da Rua
Ese Neno Da Rua
 
Apollotheme presentation
Apollotheme presentationApollotheme presentation
Apollotheme presentation
 
XS Boston 2008 Guest Spinning
XS Boston 2008 Guest SpinningXS Boston 2008 Guest Spinning
XS Boston 2008 Guest Spinning
 
Javaiostream
JavaiostreamJavaiostream
Javaiostream
 
XS Boston 2008 Paravirt Ops in Linux IA64
XS Boston 2008 Paravirt Ops in Linux IA64XS Boston 2008 Paravirt Ops in Linux IA64
XS Boston 2008 Paravirt Ops in Linux IA64
 

Similar to XS Oracle 2009 Error Detection

Network Connectivity Test Script
Network Connectivity Test ScriptNetwork Connectivity Test Script
Network Connectivity Test Script
Mersedeh Arvaneh
 
A05
A05A05
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized EnvironmentsBest Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
Jignesh Shah
 
The Verification Methodology Landscape
The Verification Methodology LandscapeThe Verification Methodology Landscape
The Verification Methodology Landscape
DVClub
 
Everything you ever wanted to know about deployment but were afraid to ask
Everything you ever wanted to know about deployment but were afraid to askEverything you ever wanted to know about deployment but were afraid to ask
Everything you ever wanted to know about deployment but were afraid to ask
lauraxthomson
 
SPA 2009 - Acceptance Testing AJAX Web Applications through the GUI
SPA 2009 - Acceptance Testing AJAX Web Applications through the GUISPA 2009 - Acceptance Testing AJAX Web Applications through the GUI
SPA 2009 - Acceptance Testing AJAX Web Applications through the GUI
andrew.macleod
 
Ruby и TestComplete
Ruby и TestCompleteRuby и TestComplete
Ruby и TestComplete
automated-testing.info
 
Designing and Deploying Internet-Scale Services
Designing and Deploying Internet-Scale ServicesDesigning and Deploying Internet-Scale Services
Designing and Deploying Internet-Scale Services
bigqiang zou
 
20080501 software verification_sharygina_lecture01
20080501 software verification_sharygina_lecture0120080501 software verification_sharygina_lecture01
20080501 software verification_sharygina_lecture01
Computer Science Club
 
When Web Services Go Bad
When Web Services Go BadWhen Web Services Go Bad
When Web Services Go Bad
Steve Loughran
 
Technical meeting automated testing with vs2010
Technical meeting automated testing with vs2010Technical meeting automated testing with vs2010
Technical meeting automated testing with vs2010
Clemens Reijnen
 
Verify Your Kubernetes Clusters with Upstream e2e tests
Verify Your Kubernetes Clusters with Upstream e2e testsVerify Your Kubernetes Clusters with Upstream e2e tests
Verify Your Kubernetes Clusters with Upstream e2e tests
Ken'ichi Ohmichi
 
How Not To Be Caught Flat-footed With Unpredictable FME Results
How Not To Be Caught Flat-footed With Unpredictable FME ResultsHow Not To Be Caught Flat-footed With Unpredictable FME Results
How Not To Be Caught Flat-footed With Unpredictable FME Results
Safe Software
 
Lessons Learned in Software Development: QA Infrastructure – Maintaining Rob...
Lessons Learned in Software Development: QA Infrastructure – Maintaining Rob...Lessons Learned in Software Development: QA Infrastructure – Maintaining Rob...
Lessons Learned in Software Development: QA Infrastructure – Maintaining Rob...
Cωνσtantίnoς Giannoulis
 
Wndows Phone 7 Marketplace testing
Wndows Phone 7 Marketplace testingWndows Phone 7 Marketplace testing
Wndows Phone 7 Marketplace testing
Транслируем.бел
 
Ug. marketplace testing
Ug. marketplace testingUg. marketplace testing
Ug. marketplace testing
Транслируем.бел
 
Viavi_TeraVM Core Emulator.pptx
Viavi_TeraVM Core Emulator.pptxViavi_TeraVM Core Emulator.pptx
Viavi_TeraVM Core Emulator.pptx
mani723
 
Reliability Patterns for Large-Scale Automated Tests
Reliability Patterns for Large-Scale Automated TestsReliability Patterns for Large-Scale Automated Tests
Reliability Patterns for Large-Scale Automated Tests
Waseem Hamshawi
 
Continuous delivery continuous integration 0.3
Continuous delivery continuous integration 0.3Continuous delivery continuous integration 0.3
Continuous delivery continuous integration 0.3
Alex Tregubov
 
Yield improvement of an eeprom for automotive applications
Yield improvement of an eeprom for automotive applicationsYield improvement of an eeprom for automotive applications
Yield improvement of an eeprom for automotive applications
Pete Sarson, PH.D
 

Similar to XS Oracle 2009 Error Detection (20)

Network Connectivity Test Script
Network Connectivity Test ScriptNetwork Connectivity Test Script
Network Connectivity Test Script
 
A05
A05A05
A05
 
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized EnvironmentsBest Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
 
The Verification Methodology Landscape
The Verification Methodology LandscapeThe Verification Methodology Landscape
The Verification Methodology Landscape
 
Everything you ever wanted to know about deployment but were afraid to ask
Everything you ever wanted to know about deployment but were afraid to askEverything you ever wanted to know about deployment but were afraid to ask
Everything you ever wanted to know about deployment but were afraid to ask
 
SPA 2009 - Acceptance Testing AJAX Web Applications through the GUI
SPA 2009 - Acceptance Testing AJAX Web Applications through the GUISPA 2009 - Acceptance Testing AJAX Web Applications through the GUI
SPA 2009 - Acceptance Testing AJAX Web Applications through the GUI
 
Ruby и TestComplete
Ruby и TestCompleteRuby и TestComplete
Ruby и TestComplete
 
Designing and Deploying Internet-Scale Services
Designing and Deploying Internet-Scale ServicesDesigning and Deploying Internet-Scale Services
Designing and Deploying Internet-Scale Services
 
20080501 software verification_sharygina_lecture01
20080501 software verification_sharygina_lecture0120080501 software verification_sharygina_lecture01
20080501 software verification_sharygina_lecture01
 
When Web Services Go Bad
When Web Services Go BadWhen Web Services Go Bad
When Web Services Go Bad
 
Technical meeting automated testing with vs2010
Technical meeting automated testing with vs2010Technical meeting automated testing with vs2010
Technical meeting automated testing with vs2010
 
Verify Your Kubernetes Clusters with Upstream e2e tests
Verify Your Kubernetes Clusters with Upstream e2e testsVerify Your Kubernetes Clusters with Upstream e2e tests
Verify Your Kubernetes Clusters with Upstream e2e tests
 
How Not To Be Caught Flat-footed With Unpredictable FME Results
How Not To Be Caught Flat-footed With Unpredictable FME ResultsHow Not To Be Caught Flat-footed With Unpredictable FME Results
How Not To Be Caught Flat-footed With Unpredictable FME Results
 
Lessons Learned in Software Development: QA Infrastructure – Maintaining Rob...
Lessons Learned in Software Development: QA Infrastructure – Maintaining Rob...Lessons Learned in Software Development: QA Infrastructure – Maintaining Rob...
Lessons Learned in Software Development: QA Infrastructure – Maintaining Rob...
 
Wndows Phone 7 Marketplace testing
Wndows Phone 7 Marketplace testingWndows Phone 7 Marketplace testing
Wndows Phone 7 Marketplace testing
 
Ug. marketplace testing
Ug. marketplace testingUg. marketplace testing
Ug. marketplace testing
 
Viavi_TeraVM Core Emulator.pptx
Viavi_TeraVM Core Emulator.pptxViavi_TeraVM Core Emulator.pptx
Viavi_TeraVM Core Emulator.pptx
 
Reliability Patterns for Large-Scale Automated Tests
Reliability Patterns for Large-Scale Automated TestsReliability Patterns for Large-Scale Automated Tests
Reliability Patterns for Large-Scale Automated Tests
 
Continuous delivery continuous integration 0.3
Continuous delivery continuous integration 0.3Continuous delivery continuous integration 0.3
Continuous delivery continuous integration 0.3
 
Yield improvement of an eeprom for automotive applications
Yield improvement of an eeprom for automotive applicationsYield improvement of an eeprom for automotive applications
Yield improvement of an eeprom for automotive applications
 

More from The Linux Foundation

ELC2019: Static Partitioning Made Simple
ELC2019: Static Partitioning Made SimpleELC2019: Static Partitioning Made Simple
ELC2019: Static Partitioning Made Simple
The Linux Foundation
 
XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...
XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...
XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...
The Linux Foundation
 
XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...
XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...
XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...
The Linux Foundation
 
XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...
XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...
XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...
The Linux Foundation
 
XPDDS19 Keynote: Unikraft Weather Report
XPDDS19 Keynote:  Unikraft Weather ReportXPDDS19 Keynote:  Unikraft Weather Report
XPDDS19 Keynote: Unikraft Weather Report
The Linux Foundation
 
XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...
XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...
XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...
The Linux Foundation
 
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, XilinxXPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
The Linux Foundation
 
XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...
XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...
XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...
The Linux Foundation
 
XPDDS19: Memories of a VM Funk - Mihai Donțu, Bitdefender
XPDDS19: Memories of a VM Funk - Mihai Donțu, BitdefenderXPDDS19: Memories of a VM Funk - Mihai Donțu, Bitdefender
XPDDS19: Memories of a VM Funk - Mihai Donțu, Bitdefender
The Linux Foundation
 
OSSJP/ALS19: The Road to Safety Certification: Overcoming Community Challeng...
OSSJP/ALS19:  The Road to Safety Certification: Overcoming Community Challeng...OSSJP/ALS19:  The Road to Safety Certification: Overcoming Community Challeng...
OSSJP/ALS19: The Road to Safety Certification: Overcoming Community Challeng...
The Linux Foundation
 
OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making...
 OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making... OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making...
OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making...
The Linux Foundation
 
XPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, Citrix
XPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, CitrixXPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, Citrix
XPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, Citrix
The Linux Foundation
 
XPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltd
XPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltdXPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltd
XPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltd
The Linux Foundation
 
XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...
XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...
XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...
The Linux Foundation
 
XPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&D
XPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&DXPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&D
XPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&D
The Linux Foundation
 
XPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM Systems
XPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM SystemsXPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM Systems
XPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM Systems
The Linux Foundation
 
XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...
XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...
XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...
The Linux Foundation
 
XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...
XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...
XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...
The Linux Foundation
 
XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...
XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...
XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...
The Linux Foundation
 
XPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSE
XPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSEXPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSE
XPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSE
The Linux Foundation
 

More from The Linux Foundation (20)

ELC2019: Static Partitioning Made Simple
ELC2019: Static Partitioning Made SimpleELC2019: Static Partitioning Made Simple
ELC2019: Static Partitioning Made Simple
 
XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...
XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...
XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...
 
XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...
XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...
XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...
 
XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...
XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...
XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...
 
XPDDS19 Keynote: Unikraft Weather Report
XPDDS19 Keynote:  Unikraft Weather ReportXPDDS19 Keynote:  Unikraft Weather Report
XPDDS19 Keynote: Unikraft Weather Report
 
XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...
XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...
XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...
 
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, XilinxXPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
 
XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...
XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...
XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...
 
XPDDS19: Memories of a VM Funk - Mihai Donțu, Bitdefender
XPDDS19: Memories of a VM Funk - Mihai Donțu, BitdefenderXPDDS19: Memories of a VM Funk - Mihai Donțu, Bitdefender
XPDDS19: Memories of a VM Funk - Mihai Donțu, Bitdefender
 
OSSJP/ALS19: The Road to Safety Certification: Overcoming Community Challeng...
OSSJP/ALS19:  The Road to Safety Certification: Overcoming Community Challeng...OSSJP/ALS19:  The Road to Safety Certification: Overcoming Community Challeng...
OSSJP/ALS19: The Road to Safety Certification: Overcoming Community Challeng...
 
OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making...
 OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making... OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making...
OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making...
 
XPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, Citrix
XPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, CitrixXPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, Citrix
XPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, Citrix
 
XPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltd
XPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltdXPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltd
XPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltd
 
XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...
XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...
XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...
 
XPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&D
XPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&DXPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&D
XPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&D
 
XPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM Systems
XPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM SystemsXPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM Systems
XPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM Systems
 
XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...
XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...
XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...
 
XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...
XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...
XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...
 
XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...
XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...
XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...
 
XPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSE
XPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSEXPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSE
XPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSE
 

Recently uploaded

Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 

Recently uploaded (20)

Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 

XS Oracle 2009 Error Detection

  • 1. Detecting and Correcting Transient Hardware Errors John Byrne (john.l.byrne@hp.com), Norman P. Jouppi, Laura Ramirez, Parthasarathy Ranganathan, Bruce J. Walker HP Labs Nidhi Aggarwal, Kewal K. Saluja, James E. Smith University of Wisconsin – Madison © 2009 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
  • 2. Availability/Reliability Spectrum High Availability; Fault Tolerance; No lost Fault Correction; keep Restart/Failover; work; keep running, Running with correct results Lose ongoing work even with bad results 2 24 February 2009
  • 3. Availability/Reliability Spectrum High Availability; Fault Tolerance; No lost Fault Correction; keep Restart VM; work; keep running, Running with correct results Lose ongoing work even with bad results Ongoing FT Work: Remus Checkpoint and restart on complete node failure Kemari Output comparison; pick one if they don’t match Marathon Lockstep VMware Execution Backup takes over on complete node failure 3 24 February 2009
  • 4. Availability/Reliability Spectrum High Availability; Fault Tolerance; No lost Fault Correction; keep Restart VM; work; keep running, Running with correct results Lose ongoing work even with bad results Ongoing FT Work: Remus Checkpoint and restart on complete node failure Kemari Output comparison; pick one if they don’t match Marathon Lockstep VMware Execution Backup takes over on complete node failure Theme: Deal with Complete Node Failure No one is detecting or correcting transient processor failures 4 24 February 2009
  • 5. Transient Hardware Errors International Technology Roadmap for Semiconductors • has predicted significant reliability problems Intel study in 2005 indicated 100-fold increase in • transient faults in scaling from 180nm to 16nm Errors can: a. Crash the OS; b. Corrupt data; c. Cause execution to take a different path; Goal: Detect and Correct Transient Errors 5 24 February 2009
  • 6. Detect and Correct Transient Errors Lockstep VMs with ongoing checkpoints. 1. Tee input to both VMs 2. Compare output from VMs and re-execute on 3. miscompare Compare checkpoints and re-execute on 4. miscompare Log input, interrupts and non-deterministic 5. instructions to allow completely accurate re- execution; 6 24 February 2009
  • 7. Lockstep VMs • Create 2 identical images at VM start time; • Ensure response to each VMexit is identical; − Force them to be identical if necessary (rdtsc) • Deliver interrupt at the identical instruction − At each VMexit, deliver interrupts if they are pending; − Use the count-down PMU counter to force a synchronization point if no VMexits happen and deliver interrupts at that point. • Log VMexit return values as necessary (for replay); • Log when interrupts are delivered (for replay) 7 24 February 2009
  • 8. I/O • Input is sent to both VMs (network and disk); − Input is logged as being part of a specific checkpoint so on replay inputs can come from the log; • Output is compared and if equal, is sent out; − If not equal, re-execute from the last good checkpoint − Blktap driver modified to allow disk output to be compared − Network backend driver modified to allow network output to be compared; − Output is counted so on replay the correct number of re- created outputs can be discarded and not output twice. 8 24 February 2009
  • 9. Checkpointing and Comparing Incremental, periodic checkpoints of each VM • − At exactly the same instruction; − Utilize COW or copy at checkpoint time; − Mark the checkpoint event in the input and output streams − Immediate continue execution after checkpoint done; After checkpoint X is done, compare the incremental • checkpoints − If equal, delete one of the checkpoint − If equal, delete checkpoint X-1 + any logs for x-1; − If not equal, then do a replay from checkpoint X-1; At checkpoint event, tell input to start a new log; • • At checkpoint event, tell output to record the output count and start a new count for the next checkpoint 9 24 February 2009
  • 10. Replay from Checkpoint X • Restore the registers and memory image • Tell input i/o to replay input from log for checkpoint X • Tell output that we are replaying from checkpoint X and it then throws away as many i/o’s as were done since checkpoint X. 10 24 February 2009
  • 11. Initial Limitations • Uniprocessor VMs • HVM guests • Single Node implementation 11 24 February 2009
  • 12. Performance Data to Date • Implementation to date includes lockstep VMs and disk i/o input funneling and output checking (network i/o is implemented but not yet tested) • Bonnie i/o benchmark didn’t show any degradation; • SpecCPU benchmark suite showed 2-5% degradation. 12 24 February 2009
  • 13. Plans • Implement the checkpoint and replay code • Work on performance of lockstep and checkpoint • Investigate UP, HVM and single-site restrictions. 13 24 February 2009