Challenges and Solutions in Autonomous Vehicle Validation
Prof. Philip Koopman
© 2017 Edge Case Research LLC
Presented at AV17, Detroit
1. Challenges and Solutions in Autonomous Vehicle Validation
   Prof. Philip Koopman
2. How Do You Validate Autonomous Vehicles?
    Self-driving cars are so cool!
    But also kind of scary
    Need many skills:
      Good technical skills
      Good process skills
      Safety & security
      Management who “gets it”
    AND:
     – Safety approach that is more than testing
     – Approaches to validate Machine Learning
   https://en.wikipedia.org/wiki/Autonomous_car
3. NREC: 30+ Years Of Cool Robots
   Carnegie Mellon University faculty, staff, and students; off-campus Robotics Institute facility
   [Timeline, 1985–2010: DARPA SC-ALV, ARPA Demo II, DARPA Grand Challenge, DARPA LAGR, DARPA PerceptOR, DARPA UPI, Urban Challenge, NASA Lunar Rover, NASA Dante II, Mars Rovers, Army FCS, Auto Excavator, Auto Harvesting, Auto Forklift, Auto Haulage, Auto Spraying, Laser Paint Removal, Software Safety]
4. Lesson Learned: Validating Robots Is Difficult
    Even the “easy” (well known) cases are challenging:
   extreme contrast, no markings, poor visibility, unusual obstacles, construction, water (appears flat!)
   [Wagner 2014]
5. The Tough Cases Are Legion
   [Photos of unusual road scenes]
   http://piximus.net/fun/funny-and-odd-things-spotted-on-the-road
   http://edtech2.boisestate.edu/robertsona/506/images/buffalo.jpg
   https://www.flickr.com/photos/hawaii-mcgraths/4458907270/in/photolist-Y59LC-7TNzAv-5WSEds-7N24My
   https://pixabay.com/en/bunde-germany-landscape-sheep-92931/
6. Are Self-Driving Cars Safe Enough?
    6/2016: Tesla “AutoPilot” crash
    Claimed first crash after 130M miles of operation
    But, that does not assure 130M miles between crashes!
7. Billions of Miles Of Testing?
    Need to test for at least ~3x the crash rate to validate safety
    Hypothetical deployment: NYC Medallion Taxi Fleet
      13,437 vehicles @ 70,000 miles/yr = 941M miles/year
      7 critical crashes in 2015 → 134M miles/critical crash (death or serious injury)
    How much testing to validate the critical crash rate?
      Answer: 3x to ~10x the mean crash rate
       – 3x is without any crash
       – If you get a crash, you need to test longer
    Design changes reset the testing clock
    Assumes random independent arrivals
      Exponential inter-arrival spacing
      Is this a good assumption?

   Testing Miles   Confidence if NO critical crash seen
   122.8M          60%
   308.5M          90%
   401.4M          95%
   617.1M          99%

   [2014 NYC Taxi Fact Book] [Fatal and Critical Injury data / Local Law 31 of 2014]
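The confidence table above follows directly from the exponential-arrival assumption: if the true mean is m miles per critical crash, the probability of observing no crash in d test miles is exp(-d/m). A minimal sketch reproducing the table (the 134M-mile figure comes from the slide's NYC example; the function name is illustrative):

```python
import math

def miles_to_demonstrate(mtbf_miles, confidence):
    """Fleet miles with zero crashes needed to show the true mean
    miles-between-crashes is at least mtbf_miles, assuming crashes
    arrive as a Poisson process (exponential inter-arrivals).
    Solve exp(-d / mtbf) = 1 - confidence for d."""
    return -math.log(1.0 - confidence) * mtbf_miles

MTBF = 134e6  # NYC taxi fleet: ~134M miles per critical crash
for conf in (0.60, 0.90, 0.95, 0.99):
    print(f"{conf:.0%}: {miles_to_demonstrate(MTBF, conf) / 1e6:.1f}M miles")
```

Running this reproduces the 122.8M / 308.5M / 401.4M / 617.1M figures, and makes the "design changes reset the clock" point concrete: any change invalidates the zero-crash exposure that d was accumulated under.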
8. Heavy Tail Inter-Arrival Rates
    Mean time between road hazards is too loose a metric
    How many miles to see all hazards that arrive, on average, every 1M miles?
     – Case 1: 10 unique hazards at 10 million miles/hazard → ~100 million miles of testing
     – Case 2: 100,000 unique hazards at 100 billion miles/hazard → ~1 trillion miles of testing
   [Charts comparing a random independent (exponential) arrival rate with a power-law (80/20 rule) arrival rate: the total area is the same, but the curves cross over at 4,000 hrs, leaving many infrequent scenarios in the tail]
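Case 2 can be sanity-checked analytically: if each of n hazard types arrives independently with the same mean interval, the first sighting of each type is exponentially distributed, and the expected maximum of n such first-arrival times is the mean interval times the n-th harmonic number. A back-of-the-envelope sketch (equal arrival rates are a simplifying assumption; a genuinely heavy-tailed mix is worse):

```python
def expected_miles_to_see_all(n_types, miles_per_type):
    """Expected miles until every hazard type has appeared at least
    once, when each of n_types arrives independently with mean
    interval miles_per_type. Expected max of n i.i.d. exponentials
    with mean m is m * H_n (the n-th harmonic number)."""
    harmonic = sum(1.0 / k for k in range(1, n_types + 1))
    return miles_per_type * harmonic

# Case 2 from the slide: 100,000 hazard types at 100B miles/type
print(f"{expected_miles_to_see_all(100_000, 100e9):.3g} miles")
```

This lands at roughly 1.2 trillion miles, consistent with the slide's "~1 trillion" estimate, and shows why the number of distinct hazard types, not the average hazard rate, dominates the testing burden.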
9. Is Trillions of Miles of Testing Enough?
    At best, each road hazard type arrives independently
    But, that doesn’t tell us how often each event arrives
    No surprise if the distribution of scenarios is heavy-tailed
     – E.g., exponential arrivals, but power law distribution of scenarios
    Beyond that, conditions are not random/independent
      Correlations: geographic region, weather, holiday, commute
    Betting against a heavy tail edge case distribution is risky
    Need to think more deeply than “drive a lot of miles”
      Need some assurance your system will work beyond accumulating miles
      That’s what software safety approaches are for!
     – OK, so what happens if you try to apply ISO 26262? …
   https://goo.gl/VmfM22 (Haleakalā National Park)
   https://goo.gl/PUACcJ
10. Deep Learning Can Be Brittle & Inscrutable
    [Adversarial examples: QuocNet labels a car image “not a car,” and AlexNet labels a bus “not a bus,” after a perturbation visible only as a magnified difference]
    https://goo.gl/5sKnZV
    Szegedy, Christian, et al. “Intriguing properties of neural networks.” arXiv preprint arXiv:1312.6199 (2013).
11. How Do We Trace To Safety Standard “V”?
     Machine Learning doesn’t fit the V
       ML “learns” how to behave via training data
        – ML software itself is just a framework
       The training data forms de facto requirements
        – The “magic” of ML is being able to skip requirements
        – But, how do we trace that to acceptance tests?
          » We’ll need to argue training data or test data is “complete”
     Is training data “complete” and correct?
       Training data is safety critical
       What if a moderately rare case isn’t trained?
        – It might not behave as you expect
        – People’s perception of “almost the same” does not necessarily predict ML responses!
    [V-model diagram: requirements, system, subsystem/component, program, and module specifications down to source code; unit, program, subsystem/component, system integration, and acceptance tests back up; verification, validation & traceability links with reviews at each step; “cluster analysis” marked with question marks]
12. Validation: Requirements vs. Implementation
     Validating everything at once is infeasible
       Might need 1 billion to 1 trillion miles of road test
       Try breaking this up into two steps: requirements and implementation
     Validate requirements: what is the correct behavior?
       Taxonomy of driving scenarios tracing data collection to test scenarios
       Taxonomy of edge cases tracing data collection to test scenarios
       Challenge: an edge case for an ML system might differ from an edge case for a human driver
     Validate implementation: does the vehicle behave correctly?
       Test scenarios are only a start – what if the system is brittle?
        – Does it correctly identify which scenario it is experiencing?
        – Robustness testing can help identify problems
       It’s not all ML; use traditional safety for the rest (ISO 26262)
        – Might use non-ML software to ensure ML safety
    DISTRIBUTION A – NREC case number STAA-2013-10-02
    https://goo.gl/g3t3P9
13. Safety Envelope Approach to ML Deployment
     Safety Envelope:
       Specify unsafe regions for safety
       Specify safe regions for functionality
        – Deal with the complex boundary via:
          » Under-approximate the safe region
          » Over-approximate the unsafe region
       Trigger the system safety response upon transition to an unsafe region
     Partition the requirements:
       Operation: functional requirements
       Failsafe: safety requirements (safety functions)
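As a concrete illustration of under-approximating the safe region, here is a minimal sketch built on a longitudinal stopping-distance envelope; the deceleration, reaction-time, and margin numbers are illustrative assumptions, not figures from the talk:

```python
def stopping_distance_m(speed_mps, decel_mps2=4.0, reaction_s=0.5):
    """Worst-case distance to stop: reaction roll-out plus braking."""
    return speed_mps * reaction_s + speed_mps ** 2 / (2.0 * decel_mps2)

def in_safe_region(speed_mps, gap_to_obstacle_m, margin_m=5.0):
    """Under-approximate the safe region: declare 'safe' only when the
    gap exceeds worst-case stopping distance plus a fixed margin.
    Everything else is treated as unsafe (an over-approximation of the
    unsafe region), which triggers the safety response."""
    return gap_to_obstacle_m > stopping_distance_m(speed_mps) + margin_m

# Transition into the unsafe region -> trigger the failsafe (brake)
print(in_safe_region(10.0, 30.0))   # 10 m/s with a 30 m gap: safe
print(in_safe_region(20.0, 30.0))   # 20 m/s with a 30 m gap: unsafe
```

The conservative margin is the point: the boundary is deliberately drawn inside the truly safe set, so an imprecise check errs toward triggering the failsafe rather than missing a hazard.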
14. Architecting A Safety Envelope System
     “Doer” subsystem
       Implements normal functionality
       Allocate functional requirements to the Doer
        – Put all the Machine Learning content here!
     “Checker” subsystem – traditional software
       Implements failsafes (safety functions)
       Allocate safety requirements to the Checker
        – Use traditional software safety techniques
     Checker is entirely responsible for safety
       Doer can be at a low Safety Integrity Level (SIL)
        – Failure is lack of availability, not a mishap
       Checker must be at a high SIL (failure is unsafe), e.g., per ISO 26262
        – Often, the Checker can be much simpler than the Doer
    [Diagram: Doer/Checker pair – a low-SIL ML Doer wrapped by a simple, high-SIL safety envelope Checker]
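The Doer/Checker split can be sketched in a few lines; the speed-limit envelope and the function names are illustrative stand-ins for the ML planner and the high-SIL safety gate:

```python
SPEED_LIMIT_MPS = 25.0  # illustrative envelope bound

def doer_propose_speed(sensor_speed_mps):
    """Low-SIL 'Doer': stands in for the ML planner. It may be wrong;
    here it deliberately over-commands by 20%."""
    return sensor_speed_mps * 1.2

def checker_gate(proposed_speed_mps):
    """High-SIL 'Checker': simple, traditionally engineered logic that
    either passes the Doer's command or substitutes a failsafe. The
    Checker alone carries the safety requirement."""
    if 0.0 <= proposed_speed_mps <= SPEED_LIMIT_MPS:
        return proposed_speed_mps   # inside the envelope: pass through
    return 0.0                      # outside: failsafe, command a stop

print(checker_gate(doer_propose_speed(10.0)))  # within envelope
print(checker_gate(doer_propose_speed(30.0)))  # gated to failsafe
```

Because a Doer fault can only cost availability (the vehicle stops), the expensive safety argument is confined to the few lines of Checker logic.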
15. APD (Autonomous Platform Demonstrator)
    How did we make this scenario safe?
    TARGET GVW: 8,500 kg
    TARGET SPEED: 80 km/hr
    Approved for Public Release. TACOM Case #20247 Date: 07 OCT 2009
16. The Autonomous Platform Demonstrator (APD) was the first UGV to use a Safety Monitor as part of its safety case. As a result, the U.S. Army approved APD for demonstrations involving soldier participation. The U.S. Army cites the high quality of the APD safety case and has turned to NREC to improve the safety of unmanned vehicles.
    [Diagrams: fail-safe approach vs. fail-operational approach]
    Approved for Public Release – Distribution is Unlimited (NREC case #: STAA-2012-10-17)
17. Robustness Testing
     ASTAA: Automated Stress Testing of Autonomy Architectures
       Key idea: combination of exceptional & normal inputs to an interface
       Builds on 20 years of research experience
        – Ballista OS robustness testing in the 1990s … to …
        – Autonomous vehicle testing starting in 2010
     Example: ground vehicle CAN interception
       Test injector
        – Selectively modifies CAN messages on the fly
        – Modification based on data type information
       Invariant monitor
        – Reads messages for invariant evaluation
        – “Checker” invariant monitor detects failures
     Commercial tool build-out:
       Edge Case Research Switchboard (software & hardware interface testing)
    DISTRIBUTION A – NREC case number STAA-2013-10-02
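A toy version of the test-injector idea — decode a well-formed binary message, replace one field with an exceptional value chosen by its data type, and re-encode — might look like the sketch below. The message layout and exception lists are illustrative assumptions, not details of the ASTAA tooling:

```python
import random
import struct

# Exceptional values chosen per data type, mixed in with normal traffic
FLOAT_EXCEPTIONS = [float("inf"), float("-inf"), float("nan"), 1e308, 0.0]
INT16_EXCEPTIONS = [0, 1, -1, 32767, -32768]

def inject(payload, fmt, rng):
    """Unpack a message payload, swap one field for a type-appropriate
    exceptional value, and re-pack it for transmission."""
    fields = list(struct.unpack(fmt, payload))
    i = rng.randrange(len(fields))
    if isinstance(fields[i], float):
        fields[i] = rng.choice(FLOAT_EXCEPTIONS)
    else:
        fields[i] = rng.choice(INT16_EXCEPTIONS)
    return struct.pack(fmt, *fields)

rng = random.Random(42)
# Hypothetical layout: speed (double) followed by heading (int16)
msg = struct.pack("<dh", 13.5, 250)
mutated = inject(msg, "<dh", rng)
print(struct.unpack("<dh", mutated))
```

An invariant monitor on the receiving side (the "Checker" role) then watches whether the system under test rejects or mishandles the mutated traffic.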
18. Robustness Testing Finds Problems
     Improper handling of floating-point numbers:
       Inf, NaN, limited precision
     Array indexing and allocation:
       Images, point clouds, etc.
       Segmentation faults due to arrays that are too small
       Many forms of buffer overflow with complex data types
       Large arrays and memory exhaustion
     Time:
       Time flowing backwards, jumps
       Not rejecting stale data
     Problems handling dynamic state:
       For example, lists of perceived objects or command trajectories
       Race conditions permit improper insertion or removal of items
       Garbage collection causes crashes or hangs
    DISTRIBUTION A – NREC case number STAA-2013-10-02
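Several of these failure classes can be caught at the interface with defensive validation before data reaches the planner. A minimal sketch — the limits and the function name are illustrative assumptions:

```python
import math

MAX_AGE_S = 0.2       # reject sensor data older than this (assumed)
MAX_RANGE_M = 300.0   # plausible sensor range bound (assumed)

def validate_range_reading(value_m, timestamp_s, now_s, last_ts_s):
    """Accept only a finite, in-bounds, fresh, monotonically
    timestamped reading; reject everything else up front."""
    if not math.isfinite(value_m):            # catches Inf and NaN
        return False
    if not (0.0 <= value_m <= MAX_RANGE_M):   # implausible magnitude
        return False
    if timestamp_s <= last_ts_s:              # time flowed backwards
        return False
    if now_s - timestamp_s > MAX_AGE_S:       # stale data
        return False
    return True

print(validate_range_reading(42.0, 10.00, 10.05, 9.95))        # valid
print(validate_range_reading(float("nan"), 10.00, 10.05, 9.95))  # NaN
print(validate_range_reading(42.0, 9.90, 10.05, 9.95))  # backwards
```

Note the NaN subtlety that makes `math.isfinite` necessary: every ordering comparison against NaN is false, so a naive range check would silently pass it through.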
19. Comments On NHTSA Policy
     Good job of advocating for progress
       E.g., recommends the ISO 26262 safety standard
     But, some areas need work, such as:
       Machine Learning validation is immature
       Need independent safety assessment
       Re-certification for all updates to critical components
       Which long-tail exceptional situations are “reasonable” to mitigate
       Crash data recorders must have forensic validity
     Open response to NHTSA at: https://goo.gl/ZumJss
       https://betterembsw.blogspot.com/2016/10/draft-response-to-dot-policy-on-highly.html
20. Conclusions
     Multi-prong approach required for validation
       Need rigorous engineering, not just vehicle testing
       Account for the heavy-tail distribution of scenarios
        – Too many to just learn them all?
        – How does the system recognize it doesn’t know a scenario?
     Unique Machine Learning validation challenges
       How do we create “requirements” for an ML system?
       How do we ensure that testing traces to the ML training data?
       How do we ensure adequate requirements and testing coverage for the real world?
     Two techniques that we’re putting into practice
       Safety monitor: let ML optimize behavior while guarding against the unexpected
       Robustness testing: inject faults into system building blocks to uncover faults
    [General Motors]
21. Specialists in Making Robust Autonomy Software
     Embedded software quality assessment and improvement
     Embedded software skill evaluation and training
     Lightweight software processes for mission critical systems
     Lightweight automation for process & technical area improvement
    info@edge-case-research.com
    http://www.edge-case-research.com/
