OSSJP/ALS19: The Road to Safety Certification: Overcoming Community Challenges to Institutionalise Changes Required for Safety Certification - Lars Kurth, The Xen Project

Safety certification is one of the essential requirements for software to be used in highly regulated industries. Besides technical and compliance issues (such as ISO 26262 vs IEC 61508), transitioning an existing project to become more easily safety certifiable requires significant changes to development practices within an open source project.

In this session, we will lay out some challenges of making safety certification achievable in open source and the Xen Project. We will outline the process the Xen Project has followed thus far and highlight lessons learned along the way. The talk will primarily focus on necessary process, tooling changes and community challenges that can prevent progress. We will be offering an in-depth review of how Xen Project is approaching this challenging goal and try to derive lessons for other projects and contributors.

  1. 1. Lars Kurth: Community Manager, Xen Project; Chairman, Xen Project Advisory Board; lars_kurth
  2. 2. Consolidation: reduce cost, size, weight and power consumption; reduce development costs through platform independence. Security and Safety: support mixed-criticality compositions (apps with differing safety, security & real-time requirements); safety certification of the hypervisor. Embedded Requirements: minimal IRQ latency; low or zero scheduling overhead; drivers for special I/O devices; flexible architecture.
  3. 3. Defense Applications: OpenXT, SecureView (desktop, laptops, tablets); Xenon Hypervisor family, Magrana Server, … (first time formal methods were applied on a Xen fork). Cloud Computing: Amazon Web Services, Tencent, Alibaba Cloud, IBM SoftLayer, Rackspace, … Server Virtualization: Linux Distros, Citrix Hypervisor, Huawei UVP, XCP-ng. Embedded Defense / Security Applications: ARLX/Virtuosity OA, Bromium uXen, Crucible Hypervisor (various safety standards). Embedded/Automotive: Virtuosity, XILINX Xen Zynq, Perseus, GlobalLogic Nautilus, EPAM Fusion. General purpose desktop and mobile Virtualization: XenClient, NxTop, Neosphere, Samsung, Qubes OS.
  4. 4. 2012: Xenon Separation VMM family, CC EAL5+. Fork of a cut-down version of Xen used by the US military. Certified to CC EAL5+ (Semiformally Designed and Tested, which has some similarity to safety standards). Tracks upstream and is maintained with an effort of 1.5 man years per year.
  5. 5. 2012: Xenon Separation VMM family, CC EAL5+. 2012: DornerWorks ARLX: DO-178 Level A packages, IEC 62304, ISO 26262, MILS EAL, ARINC 653; support for commercial and FOSS guest OSes. OpenGroup FACE certified: Virtuosity OA. Future Airborne Capability Environment (FACE™) defines the software computing environment and interfaces designed to support the development of portable components across the general-purpose, safety, and security profiles. FACE uses industry standards for distributed communications, programming languages, graphics, operating systems, and other areas as appropriate.
  6. 6. 2012: Xenon Separation VMM family, CC EAL5+. 2012: DornerWorks ARLX: DO-178 with some Level A packages, IEC 62304, ISO 26262, MILS EAL, ARINC 653. OpenGroup FACE certified: Virtuosity OA. 2016: Star Lab Crucible: secure embedded virtualization platform for security-critical operational environments, including aerospace & defense, industrial, transportation, and telecommunications.
  7. 7. 2012: Xenon Separation VMM family, CC EAL5+. 2012: DornerWorks ARLX: DO-178 Level A packages, IEC 62304, ISO 26262, MILS EAL, ARINC 653. OpenGroup FACE certified: Virtuosity OA. 2016: Star Lab Crucible. 2015: Xilinx: Petalinux with Xen: 1st Xen distro for embedded with additional functionality. Currently NO safety certification support.
  8. 8. 2012: DornerWorks ARLX: DO-178 Level A packages, IEC 62304, ISO 26262, MILS EAL, ARINC 653. OpenGroup FACE certified: Virtuosity OA. 2016: Star Lab Crucible. 2015: Xilinx: Petalinux with Xen. 2015: GlobalLogic: 1st Xen based stack for automotive. No safety certification. 2017: EPAM: 2nd generation Xen based stack for automotive. No safety certification, but working with community and industry on progressing safety.
  9. 9. 2016: EPAM and Renesas funded a study by HORIBA MIRA to assess whether it is possible to safety certify a subset of the Xen Project. Answer: possible. From 2015 to today: close functional gaps, add real-time capability, reduce code size and create reference implementations (EPAM, XILINX). Answer: suitable platform for some use-cases. A number of gaps to becoming a general purpose platform are still being worked on. All is open source, but not all is upstreamed in Xen.
  10. 10. Schedulers: ARINC, RTDS, Null and other real-time support. Laid the foundation for embedded use-cases and use of Xen as a partitioning HV. Low latency and real-time support. A minimal Xen on Arm configuration: < 50 KSLOC of code for a specific HW environment. PV drivers (and in future virtio drivers) and GPU mediation for rich I/O: available in various upstreams. OP-TEE virtualization support: both in Xen and in OP-TEE. Dom0less Xen: for now, allows booting VMs without interaction with Dom0, but Dom0 still exists. 2020: an architecture without a Dom0 and/or an RTOS as Dom0.
  11. 11. Schedulers: ARINC, RTDS, Null and other real-time support. Laid the foundation for embedded use-cases and use of Xen as a partitioning HV. Low latency and real-time support. A minimal Xen on Arm configuration: < 50 KSLOC of code for a specific HW environment. PV drivers (and in future virtio drivers) and GPU mediation for rich I/O: available in various upstreams. OP-TEE virtualization support: both in Xen and in OP-TEE. Dom0less Xen: for now, allows booting VMs without interaction with Dom0, but Dom0 still exists. 2020: an architecture without a Dom0 and/or an RTOS as Dom0. Key Point: Xen on Arm turned out to be a great open source hypervisor for embedded and mixed-criticality use-cases, despite having been designed for servers!
  12. 12. FreeRTOS / SafeRTOS: FreeRTOS-compatible alternatives from Wittenstein. SafeRTOS: proprietary FreeRTOS rewrite complying with IEC 61508. SIL2LinuxMP: Can Linux be safety certified? Obstacles, tools and processes. LF Projects with an ambition to become "easy to certify": ACRN; AGL (virtualization may make achieving key AGL UCs easier); ELISA Project (develop tools and processes); Xen Project; Zephyr. Each with different history, cultures and problems that have to be overcome.
  13. 13. Community Challenges. Funding. Can FOSS SW be used for Functional Safety? Yes, but there are many barriers: requires major changes to the software; requires tools, infrastructure and expertise; requires changes in how FOSS projects work. Until recently, the assumption was that the two worlds cannot work together.
  14. 14. Level | Requirements | Application | Cost with Experience
      DAL E | The software must exist | Infotainment: failure is a minor inconvenience | 0.11 hour / SLOC
      DAL D | High-Level Docs/Tests | Instruments: failure can be mitigated by operator | 0.13 hour / SLOC
      DAL C | Low-Level Docs/Unit Tests, Statement Coverage, and Code/Data Coupling Analysis | - | 0.20 hour / SLOC
      DAL B | Branch Coverage | Engine Control: failure could kill someone without warning | 0.40 hour / SLOC
      DAL A | Source to Object Analysis and MC/DC Coverage | - | 0.67 hour / SLOC
      Credit/Source: Dornerworks / XPDS14 - Xen and the Art of Certification.pdf
  15. 15. Level | Requirements | Application | Cost with Experience
      DAL E | The software must exist | Infotainment: failure is a minor inconvenience | 0.11 hour / SLOC
      DAL D | High-Level Docs/Tests | Instruments: failure can be mitigated by operator | 0.13 hour / SLOC
      DAL C | Low-Level Docs/Unit Tests, Statement Coverage, and Code/Data Coupling Analysis | - | 0.20 hour / SLOC
      DAL B | Branch Coverage | Engine Control: failure could kill someone without warning | 0.40 hour / SLOC
      DAL A | Source to Object Analysis and MC/DC Coverage | - | 0.67 hour / SLOC
      Credit/Source: Dornerworks / XPDS14 - Xen and the Art of Certification.pdf
      3-4 times as much without experience
  16. 16. [Chart: cost in man years (axis 0 to 70) for code sizes of 30, 50, 100 and 200 KSLOC at DAL C, DAL B and DAL A] Already an investment in the order of 20-30 man years on functionality. An investment of 10-15 man years for safety is not outlandish.
  17. 17. Examples of Xen based embedded products: with some support for safety standards in proprietary spin-offs. Expertise in the ecosystem that covers Xen and safety: primarily for hire, too small to fund speculatively. Reference implementations with safety in mind: EPAM stack (automotive), XILINX stack; another similar effort in progress elsewhere (generic safety case). Some limited adoption in niche use-cases today: in a non-safety context; in safety contexts where safety can be isolated (in progress).
  18. 18. Want to be in a position where upstream and vendors interested in safety certification collaborate with the goal of making Xen more cheaply safety certifiable, with buy-in and support from multiple vendors. Don’t want to be at the bleeding edge of this, but just behind, such that we can benefit from ELISA and other projects such as Zephyr.
  19. 19. [Diagram: Xen Hypervisor (≤ 50 KSLOC) running Dom0 and VM 1, VM 2, VM 3 on CPUs] Dom0less: VMs are loaded by U-Boot and booted by Xen (not Dom0), pinned to a CPU via the Null scheduler, with I/O handled by device assignment. Dom0 completes boot after VM 1 and VM 2. Static set-up. [Diagram: Xen Hypervisor running VM 1 and VM 2 on CPUs] Ongoing work to fully implement true Dom0less for small systems: shared memory and interrupts for VM-to-VM communications; PV frontend/backend drivers for Dom0less VMs. Dom0less: initial safety certification scope.
  20. 20. 25 Mix Safety Digital Cockpit In-Vehicle Computer
  21. 21. Picked MISRA C as an example, because … it is representative of the type of community problems that you should expect if you look at safety certification
  22. 22. Subset required by most safety standards. 10 Mandatory, 111 Required and 38 Advisory rules. Required rules depend on certification level and can be deviated from. Justifications of deviations would have to be signed off by an assessor. Partnership with Perforce: access to QA Verify, giving selected community members access to results on Xen snapshots. Goal: experiment and learn.
  23. 23. Picked the hardest and most controversial rules to see what would happen! We did not expect to succeed!
  24. 24. MISRA C spec is proprietary: rule text cannot be copied into a posted patch series ➜ lack of clarity, lack of rationale, leading to unnecessary debate. CI set-up does not allow upfront verification of fixes (primarily a consequence of what we were offered for free). Either: commit without knowing whether a fix worked. Or: the developer would have to buy the tool. Interactions with compilers, HW and assembly code are problematic. Ended up with 11 iterations and man weeks of review effort.
  25. 25. Some rules will create a flame-war if there is a single argumentative maintainer. E.g. MISRA C:2012, Rule 15.7: "if ... else if" constructs should end with an "else" clause: if (x == 0) { doSomething(); } else if (x == 1) { doSomethingElse(); } else { error(); /* or justification why no action is taken */ }
  26. 26. Possibility of MISRA C deviations encourages arguments. Deviations: justification of a class or instance of non-compliance. Deviation Permits: previously approved deviations for a use-case. It’s all a bit like “legal precedent” in common law legal systems: an expert (assessor) is needed to advise the project on a case-by-case basis. Community scalability: the code review process encourages too much discussion if there is no up-front plan on how to approach a disruptive set of changes. Fix: an a priori agreed strategy and plan on how to approach this.
  27. 27. 2-day workshop in March 2019 with 25 attendees (keep it small). Community Reps and Support. Project leadership team (except for 2). Kate Stewart as observer / advisor. Vendors with investment in Xen. Vendors with product interest. Safety Assessors.
  28. 28. Create an understanding between the community and industry: terminology, concepts, etc. How safety certification works: look at different standards, routes, requirements. Explain assets and processes. Establish community “red lines”: principles the community can agree to or would object to; what level of change would be acceptable. Identify potential obstacles. Establish whether Xen Project is safety certifiable. If so, create a candidate set of feasible certification routes. Establish a rough action plan on how to progress.
  29. 29. Split development model with an open and a closed part. Everything that is valuable to the wider community ideally in the open part, e.g. documentation, some tests, traceability, automation and infrastructure, … Everything that would create code churn if it wasn’t open should be in the open part as much as possible, e.g. coding standards (MISRA). Changes to the development workflow have to be kept minimal. There must be a benefit to the community (including for common code), otherwise the community won’t carry it. There are long-term implications for the community: make-up, scalability, decision making, conflicts; these need to be managed. No new barriers for contributors can be introduced.
  30. 30. Yes, but this assumes lightweight processes and automation in the community. Similar to the challenges of using Agile in a safety context.
  31. 31. Picture by Lars Kurth
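
The per-SLOC figures on slides 14 to 16 can be turned into a rough effort estimate. The following is a minimal sketch, not the method used by the study behind the slides. The function and type names are hypothetical, and the figure of 2000 working hours per man year is an assumption that the slides do not state.

```c
#include <stddef.h>
#include <stdio.h>

/* ASSUMPTION: 2000 working hours per man year (not stated on the slides). */
#define HOURS_PER_MAN_YEAR 2000.0

/* Cost with experience, in hours per SLOC, as listed on slides 14 and 15. */
struct dal_cost {
    const char *level;
    double hours_per_sloc;
};

static const struct dal_cost DAL_COSTS[] = {
    { "DAL E", 0.11 },
    { "DAL D", 0.13 },
    { "DAL C", 0.20 },
    { "DAL B", 0.40 },
    { "DAL A", 0.67 },
};

#define DAL_COUNT (sizeof(DAL_COSTS) / sizeof(DAL_COSTS[0]))

/*
 * Estimated one-off certification effort in man years.
 * experience_factor is 1.0 for an experienced team; slide 15 suggests
 * 3.0 to 4.0 for a team without experience.
 */
double certification_man_years(double sloc, double hours_per_sloc,
                               double experience_factor)
{
    return sloc * hours_per_sloc * experience_factor / HOURS_PER_MAN_YEAR;
}

/* Print the experienced-team estimate for every DAL at a given code size. */
void print_estimates(double sloc)
{
    for (size_t i = 0; i < DAL_COUNT; i++) {
        printf("%s: %.1f man years\n", DAL_COSTS[i].level,
               certification_man_years(sloc, DAL_COSTS[i].hours_per_sloc, 1.0));
    }
}
```

Calling `print_estimates(50000.0)` from a `main` function prints one line per level. Under the stated assumption, 50 KSLOC at DAL B works out to 10 man years, which is in line with the "10-15 man years" on slide 16.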

Editor's Notes

  • Disclosures:
    I am not a safety expert
    Also, I work for Citrix which has no stake in safety and embedded at all, and I am working with others on the safety side of Xen with the goal of supporting the community

    I am giving a high level overview today, and more details at a second talk on Friday.
    So some of the detail in this talk will be lacking.
  • So the first question to ask is why virtualize embedded systems at all

    And there we will look at Consolidation, Security and Safety and Special Requirements for Embedded Systems
  • 1: Consolidation is all about reducing cost – both from a HW development and SW development perspective
    On the SW side, virtualization allows you to develop SW against an abstraction which reduces porting effort and makes you less dependent on particular hardware vendors.

    In many cases it is also about reducing size, weight and power consumption

    Security is also a key issue and virtualization provides mechanisms to sandbox different functions of your system in different VMs.
    And for many market segments safety certification is critical

    In addition, there are also a lot of extra requirements for some embedded use-cases,
    which I will cover in the talk.
  • Radar / Satellite pic
  • Aircraft carriers, radars, etc.
  • Xenon Family: developed and used by the US military. They were able to create a cut-down version of Xen certified to CC EAL 5 (Semiformally Designed and Tested, which has some similarity to safety standards), and were able to track upstream and maintain it with an effort of 1.5 man years per year
  • Came out of a number of research grants that were funded by the US government and also by vendors such as XILINX who saw potential for virtualization in embedded
  • 15mins
  • Skim through this quickly!
  • 16 mins
  • IMPORTANT:
    Figures based on a study on top of Xen funded via “US Navy Small Business Innovative Research (SBIR)“ grant
  • The key point is that these figures assume a one-off certification of a Xen Project based branch
    It is also important to note that automotive certification is similar enough to DO-178B to use these figures as a baseline
  • 20 mins

    DAL B / DAL C is equivalent to ASIL B, i.e. the certification level we are looking at, for example, for instrument clusters
  • (aka Xen is not or not fully safety certified)
    I know 3 examples
  • Much faster startup times
    total ~= xen + domU
    Enable true Dom0-less configurations
    Excellent for small systems
    Easier to certify
    Lower Complexity
    No need for the Xen tools
    Does not require Yocto, just cross-build Xen
    No need for Xen support in Dom0-less VMs, no need for CONFIG_XEN
    Cons:
    No monitoring and restarting DomUs without Dom0
    No PV frontends/backends without Dom0
  • Common theme
    Several ECUs in the car

    Left:
    One acting as a gateway / one as an application/cloud server: evolution of a Telematics Control Unit [Fleet management, User behavior Insurance, …]

    Right:
    Digital cockpit: Cluster + IVI … possibly ADAS, etc.
  • 28 mins
  • 28 mins
  • Another problem which surfaced is CODE CHURN and how it affects the project’s capability to backport security fixes. So there is a VERY GOOD and logical case for minimizing churn
  • Let’s for example say that 1000 MISRA issues have to be fixed
    Assume that it takes on average 2 hours to create a fix and 4 hours to perform a review

    That would mean that a contributor would spend 1 MAN YEAR creating fixes, and established community members would spend 2 MAN YEARS reviewing the code
    Those are 2 MAN YEARS not spent on other things, which may be equally important for the whole community

    So the question then becomes how the code review burden can be minimized without affecting quality, and whether this burden can be shifted to newcomers within the community
  • 30 minutes
  • Code churn is difficult: it poses a fundamental, unresolvable conflict which requires making a case-by-case priority call
    a) NOT upstreaming creates a burden for vendors who want to safety certify
    b) Upstreaming increases the cost for upstream of maintaining supported releases and security fixes
    What is interesting, though, is that in the last 2 years we as a community had to deal with a similar set of trade-offs when looking at mitigations for side-channel attacks
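
The review-burden arithmetic in the note about 1000 MISRA issues can be written out as a small calculation. This is only a sketch with hypothetical names. The 2000 hours per man year is not stated anywhere in the notes; it is the value implied by their own figures (1000 fixes at 2 hours each being counted as 1 man year).

```c
/* Effort split for a batch of coding-standard fixes, in man years. */
struct churn_effort {
    double fix_man_years;    /* time spent by the contributor writing fixes */
    double review_man_years; /* time spent by established reviewers */
};

/*
 * Hypothetical helper: scales per-issue hours up to man years.
 * hours_per_man_year is an assumption supplied by the caller.
 */
struct churn_effort estimate_churn_effort(unsigned int issues,
                                          double fix_hours_per_issue,
                                          double review_hours_per_issue,
                                          double hours_per_man_year)
{
    struct churn_effort effort;

    effort.fix_man_years =
        issues * fix_hours_per_issue / hours_per_man_year;
    effort.review_man_years =
        issues * review_hours_per_issue / hours_per_man_year;
    return effort;
}
```

With `estimate_churn_effort(1000, 2.0, 4.0, 2000.0)` the result is 1 man year of fixing and 2 man years of reviewing, matching the figures in the notes. Because review time is the larger term, lowering the review hours per issue (for example through an agreed up-front strategy or automation) has twice the effect of speeding up the fixes themselves.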
