Raising the Bar on Robotics Code Quality

  1. Raising the Bar on Robotics Code Quality: Tooling and Methodology for Robotics Software Teams Building Critical ROS 2 Applications
     Thomas Moulard (tmoulard@amazon.com), 08/01/2019
  2. Table of contents
     • Raising the bar on open-source code quality?
     • Code Instrumentation: ASAN/TSAN
     • Clang Thread Safety Extensions
     • Fuzzing ROS 2
  3. What is AWS RoboMaker?
     • AWS Cloud9 simplifies ROS development.
     • Cloud Simulation accelerates robot validation.
     • Fleet Management provides over-the-air update capabilities to a robotic fleet.
     • Cloud Extensions easily interface ROS with AWS services such as Amazon Lex, Amazon Polly, Amazon Kinesis Video Streams, Amazon Rekognition, and Amazon CloudWatch.
     Sample applications: Hello world, Navigation and person recognition, Voice commands, Robot monitoring.
     aws.amazon.com/robomaker
  4. Testing Robots is hard
     • Errors are critical: a single bug can break a robot.
     • Software input is uncontrolled.
     • Experimenting with hardware is slow.
     • Software is tightly coupled to hardware.
     • System behavior depends on a large number of parameters which need to be tuned.
     Finding bugs in a robotic system is time-consuming, and bugs have a high impact.
     Robot: one robot serves a few users; deploying software is hard.
     (Any) server: one server serves lots of users; deploying software is easier.
  5. Raising the Bar on Open-Source Code Quality
     Ensuring code quality for OSS is challenging:
     • Shared ownership
     • Decision making is slower/harder
     • Stakeholders are hard to identify
     • End-to-end testing?
     Which strategy for your robotics team?
     1. Fork?
     2. Contribute back?
     3. Both?
     Are you facing difficulties running ROS 1/2 in production? → Talk to us!
  6. Solution: better developer infrastructure!
     1. We cannot review all PRs,
     2. We cannot maintain all the packages
     …but we can build tooling!
     Automatic code analysis, run automatically in CI, is crucial to code quality.
     Enable the community to work together on eliminating defects:
     • Memory issues
     • Concurrency issues
     • Performance
     AWS CodeBuild
  7. Compiler Instrumentation
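  8. Automating C++ Code Defect Discovery
     |            | ASAN/MSAN                 | Valgrind                 | Dr. Memory     | Mudflap        | Guard Page | gperftools     |
     |------------|---------------------------|--------------------------|----------------|----------------|------------|----------------|
     | Technology | CTI                       | DBI                      | DBI            | CTI            | Library    | Library        |
     | ARCH       | x86, ARM, PPC             | x86, ARM, PPC, MIPS, …   | x86            | All (?)        | All (?)    | All (?)        |
     | OS         | Linux, OS X, Windows, …   | Linux, OS X, Solaris, …  | Windows, Linux | Linux, Mac (?) | All (?)    | Linux, Windows |
     | Slowdown   | 2x                        | 20x                      | 10x            | 2x-40x         | ?          | ?              |
     | Heap OOB   | yes                       | yes                      | yes            | yes            | some       | some           |
     | Stack OOB  | yes                       | no                       | no             | some           | no         | no             |
     | Global OOB | yes                       | no                       | no             | ?              | no         | no             |
     | UAF        | yes                       | yes                      | yes            | yes            | yes        | yes            |
     | UAR        | yes                       | no                       | no             | no             | no         | no             |
     | UMR        | yes (MSAN)                | yes                      | yes            | ?              | no         | no             |
     | Leaks      | yes                       | yes                      | yes            | ?              | no         | yes            |
     Source: https://github.com/google/sanitizers/wiki/AddressSanitizerComparisonOfMemoryTools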
  9. AddressSanitizer (ASan) Overview
     Detects a large variety of memory defects:
     • Out-of-bounds accesses to heap, stack and globals
     • Use-after-free
     • Use-after-return
     • Use-after-scope
     • Double-free, invalid free
     Integrated with recent versions of Clang and GCC: -fsanitize=address
     Only finds bugs in executed code paths.
     New! On ARM64, HWASAN is even more efficient.
     Source: https://android-developers.googleblog.com/2017/08/android-bug-swatting-with-sanitizers.html
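As an illustration of the out-of-bounds bullet, here is a minimal sketch of an off-by-one heap read. The function names `sum_buggy` and `sum_fixed` are hypothetical and are not taken from ROS 2. Assuming `data` points to a heap array of exactly `n` elements, a build with `-fsanitize=address` should report the extra read in `sum_buggy` as a heap-buffer-overflow, while a regular build may silently return a wrong sum.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper with a classic off-by-one: the loop condition uses <=,
// so the last iteration reads data[n], one element past the end of the array.
// Deliberately wrong; shown only to illustrate what ASan is meant to catch.
int sum_buggy(const int* data, std::size_t n) {
  int total = 0;
  for (std::size_t i = 0; i <= n; ++i) {  // BUG: should be i < n
    total += data[i];
  }
  return total;
}

// Fixed version: every access stays within [0, n).
int sum_fixed(const int* data, std::size_t n) {
  assert(data != nullptr || n == 0);  // precondition: a null array must be empty
  int total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    total += data[i];
  }
  return total;
}
```

Only `sum_fixed` is safe to call. The report for `sum_buggy` appears only when a test actually executes that function, which is the "executed code paths" limitation noted on the slide.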
  10. ThreadSanitizer (TSan) Overview
      Detects concurrency-related defects:
      • Potential deadlocks
      • Race conditions
      • Unsafe signal callbacks (see man 7 signal-safety)
      Integrated with recent versions of Clang and GCC: -fsanitize=thread
      void signal_handler() {
        // Will fail and set errno to ABCD
        my_function_modifying_errno();
        if (errno == ABCD) { /* do something */ }
      }
      int main() {
        install_signal_handler(&signal_handler);
        // Will fail and set errno to EFGH:
        my_other_function_modifying_errno();
        // A signal is received!
        // signal_handler() gets executed here.
        // This gets executed:
        if (errno == ABCD) { /* do something */ }
        // ...but this should have been executed:
        else if (errno == EFGH) { /* do something else */ }
      }
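One common remedy for the errno problem shown on this slide is to save errno on entry to the handler and restore it before returning, as signal-safety(7) recommends. The sketch below is an assumption about how that could look, not code from the talk: `work_that_clobbers_errno` is a hypothetical stand-in for the slide's `my_function_modifying_errno()`, and `SIGTERM` is chosen arbitrarily.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>

// Set by the handler so the main flow can react to the signal later.
volatile std::sig_atomic_t g_signal_seen = 0;

// Hypothetical stand-in for a call inside the handler that overwrites errno.
static void work_that_clobbers_errno() { errno = EINTR; }

// Handler that preserves the errno value of the interrupted code.
extern "C" void on_signal(int /*signum*/) {
  const int saved_errno = errno;  // remember what the main flow had set
  work_that_clobbers_errno();
  g_signal_seen = 1;
  errno = saved_errno;            // restore before returning
}

// Installs the handler, simulates a failed call, then delivers a signal.
// Returns the errno value the main flow observes afterwards.
int errno_after_signal() {
  auto previous = std::signal(SIGTERM, on_signal);
  assert(previous != SIG_ERR);
  (void)previous;
  errno = ENOENT;       // pretend a library call just failed with ENOENT
  std::raise(SIGTERM);  // the handler runs before raise() returns
  return errno;
}
```

With the save and restore in place, `errno_after_signal()` returns `ENOENT`, so the main flow takes the branch that matches its own failure rather than the handler's.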
  11. Compiling ROS 2 with ASAN / TSAN
      # Initial setup
      sudo apt-get install python3-colcon-mixin
      colcon mixin add default https://raw.githubusercontent.com/colcon/colcon-mixin-repository/master/index.yaml
      colcon mixin update default
      # Workspace compilation (ASAN)
      cd ~/ros2_asan_ws
      colcon build --build-base=build-asan --install-base=install-asan \
        --cmake-args -DOSRF_TESTING_TOOLS_CPP_DISABLE_MEMORY_TOOLS=ON \
          -DINSTALL_EXAMPLES=OFF -DSECURITY=ON --no-warn-unused-cli \
          -DCMAKE_BUILD_TYPE=Debug \
        --mixin asan-gcc \
        --packages-up-to test_communication \
        --symlink-install
      # Workspace compilation (TSAN)
      cd ~/ros2_tsan_ws
      colcon build --build-base=build-tsan --install-base=install-tsan \
        --cmake-args -DOSRF_TESTING_TOOLS_CPP_DISABLE_MEMORY_TOOLS=ON \
          -DINSTALL_EXAMPLES=OFF -DSECURITY=ON --no-warn-unused-cli \
          -DCMAKE_BUILD_TYPE=Debug \
        --mixin tsan \
        --packages-up-to test_communication \
        --symlink-install
  12. ROS 2 CI Integration
      ci.ros2.org > Nightly > *_sanitizer
      Catch regressions early!
      The jobs currently run only the rcpputils and rcutils unit tests. We will expand the scope of those jobs as more and more packages get fixed!
      We are looking for volunteers to help us fix those bugs!
  13. Thread Safety Annotations
  14. Thread Safety Annotation
      • Clang + libclangcxx required.
      • Detects concurrency issues at compile time.
      • Class attributes and functions need to be annotated.
      • But does not require full instrumentation (code can be migrated progressively!)
      • Requires a specific flag: -Wthread-safety
      Race conditions are hard to find during code reviews. It can take a very long time before the bug is triggered on a production platform.
      Start annotating your code today!
      Real-life ROS 2 example: rmw_fastrtps_shared_cpp/topic_cache.hpp
      #include "mutex.h"
      class BankAccount {
      private:
        Mutex mu;
        int balance GUARDED_BY(mu);
        void depositImpl(int amount) {
          balance += amount;       // WARNING! Cannot write balance without locking mu.
        }
        void withdrawImpl(int amount) REQUIRES(mu) {
          balance -= amount;       // OK. Caller must have locked mu.
        }
      public:
        void withdraw(int amount) {
          mu.Lock();
          withdrawImpl(amount);    // OK. We've locked mu.
        }                          // WARNING! Failed to unlock mu.
        void transferFrom(BankAccount& b, int amount) {
          mu.Lock();
          b.withdrawImpl(amount);  // WARNING! Calling withdrawImpl() requires locking b.mu.
          depositImpl(amount);     // OK. depositImpl() has no requirements.
          mu.Unlock();
        }
      };
      Source: https://clang.llvm.org/docs/ThreadSafetyAnalysis.html
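The slide's example depends on a "mutex.h" header that is not shown. The sketch below is one possible self-contained version, assuming the macro layout suggested in the Clang Thread Safety Analysis documentation: the macros expand to Clang attributes when compiling with Clang and to nothing elsewhere. The `Mutex` and `MutexLocker` wrappers and the simplified `BankAccount` are illustrative, not the ROS 2 code mentioned on the slide.

```cpp
#include <cassert>
#include <mutex>

// The attributes are Clang-specific; other compilers see empty macros.
#if defined(__clang__)
#define THREAD_ANNOTATION(x) __attribute__((x))
#else
#define THREAD_ANNOTATION(x)
#endif

#define CAPABILITY(x) THREAD_ANNOTATION(capability(x))
#define SCOPED_CAPABILITY THREAD_ANNOTATION(scoped_lockable)
#define GUARDED_BY(x) THREAD_ANNOTATION(guarded_by(x))
#define REQUIRES(...) THREAD_ANNOTATION(requires_capability(__VA_ARGS__))
#define ACQUIRE(...) THREAD_ANNOTATION(acquire_capability(__VA_ARGS__))
#define RELEASE(...) THREAD_ANNOTATION(release_capability(__VA_ARGS__))

// Annotated wrapper around std::mutex (stands in for the slide's "mutex.h").
class CAPABILITY("mutex") Mutex {
 public:
  void Lock() ACQUIRE() { mu_.lock(); }
  void Unlock() RELEASE() { mu_.unlock(); }

 private:
  std::mutex mu_;
};

// RAII guard: locks in the constructor, unlocks in the destructor.
class SCOPED_CAPABILITY MutexLocker {
 public:
  explicit MutexLocker(Mutex& mu) ACQUIRE(mu) : mu_(mu) { mu_.Lock(); }
  ~MutexLocker() RELEASE() { mu_.Unlock(); }

 private:
  Mutex& mu_;
};

class BankAccount {
 private:
  Mutex mu_;
  int balance_ GUARDED_BY(mu_) = 0;

  // Callers must already hold mu_.
  void withdrawImpl(int amount) REQUIRES(mu_) { balance_ -= amount; }

 public:
  void deposit(int amount) {
    assert(amount >= 0);
    MutexLocker lock(mu_);  // balance_ is only touched while mu_ is held
    balance_ += amount;
  }
  void withdraw(int amount) {
    assert(amount >= 0);
    MutexLocker lock(mu_);  // released automatically at scope exit
    withdrawImpl(amount);
  }
  int balance() {
    MutexLocker lock(mu_);
    return balance_;
  }
};
```

Using a scoped guard avoids the "failed to unlock" warning from the slide by construction, since every exit path releases the mutex.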
  15. Fuzzing ROS 2
  16. ROS 2 Fuzzing
      ROS 2 writes and loads lots of data:
      • Config files: YAML, XML
      • ROS bags
      • URDFs
      • Messages (serialization/deserialization)
      • Etc.
      Fuzzing is essential (and easy!).
      This naive script, which relies on radamsa to generate ROS 2 messages, was able to crash the ros2 CLI!
      #!/usr/bin/env bash
      i=0
      for word in $(aspell -d en dump master | aspell -l en expand | head -n 5); do
        echo "{data: \"${word}\"}" > "/tmp/sample-${i}"
        i=$((i+1))
      done
      pgrep listener || exit 0
      while true; do
        STR=$($HOME/radamsa/bin/radamsa /tmp/sample-*)
        echo "$STR"
        (ros2 topic pub --once /chatter std_msgs/String "${STR}" 2>&1) > /dev/null
        test $? -gt 127 && break # break on segfaults
        pgrep listener || break
      done
      echo "SEGV"
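The script above fuzzes through the command line with radamsa. A different technique, not used in the talk, is an in-process harness for libFuzzer, which calls a fixed entry point with generated bytes. The sketch below assumes a hypothetical parser, `parse_string_msg`, for payloads shaped like the script's samples; it is not the parser used by the ros2 CLI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical parser for payloads shaped like {data: "text"}.
// Returns true and fills `out` on success; rejects anything else.
bool parse_string_msg(const std::string& input, std::string& out) {
  const std::string prefix = "{data: \"";
  const std::string suffix = "\"}";
  if (input.size() < prefix.size() + suffix.size()) return false;
  if (input.compare(0, prefix.size(), prefix) != 0) return false;
  if (input.compare(input.size() - suffix.size(), suffix.size(), suffix) != 0) {
    return false;
  }
  out = input.substr(prefix.size(),
                     input.size() - prefix.size() - suffix.size());
  return true;
}

// libFuzzer entry point: called repeatedly with mutated inputs.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data,
                                      std::size_t size) {
  // Guard against a null pointer when the fuzzer passes an empty input.
  const std::string input(
      size != 0 ? reinterpret_cast<const char*>(data) : "", size);
  std::string parsed;
  if (parse_string_msg(input, parsed)) {
    // Invariant: the extracted payload is strictly shorter than the input.
    assert(parsed.size() < input.size());
  }
  return 0;  // libFuzzer expects 0 from the entry point
}
```

Such a harness is typically built with Clang using `-fsanitize=fuzzer,address`, so that memory errors found by the fuzzer are reported by ASan instead of depending on a visible crash.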
  17. What’s next?
      UndefinedBehaviorSanitizer (UBSan) integration:
      • bool
      • integer-divide-by-zero
      • return
      • returns-nonnull-attribute
      • shift-exponent
      • unreachable
      • vla-bound
      Integrate Clang Control-Flow Integrity?
      Annotate ROS 2 code with the Thread Safety Annotations.
      Need to fix the ROS 2 Linux Clang build with libclangcxx!
      Expand testing to more than core packages!
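To give a feel for two of the UBSan checks listed above, the sketch below shows guarded versions of the operations that `integer-divide-by-zero` and `shift-exponent` are meant to flag. The helper names `checked_div` and `checked_shl` are hypothetical; they only illustrate which inputs are undefined and would be reported in a build with `-fsanitize=undefined`.

```cpp
#include <cassert>
#include <climits>

// Division guarded against the inputs that are undefined for int:
// a zero divisor (integer-divide-by-zero) and INT_MIN / -1 (signed overflow).
bool checked_div(int numerator, int denominator, int& result) {
  if (denominator == 0) return false;
  if (numerator == INT_MIN && denominator == -1) return false;
  result = numerator / denominator;
  return true;
}

// Left shift guarded against an out-of-range exponent (shift-exponent):
// a negative amount, or one not smaller than the bit width, is undefined.
bool checked_shl(unsigned int value, int exponent, unsigned int& result) {
  const int bits = static_cast<int>(sizeof(value) * CHAR_BIT);
  if (exponent < 0 || exponent >= bits) return false;
  result = value << exponent;
  return true;
}

// Usage sketch exercising the accepted and rejected cases.
void ubsan_examples() {
  int quotient = 0;
  assert(checked_div(7, 2, quotient) && quotient == 3);
  assert(!checked_div(1, 0, quotient));

  unsigned int shifted = 0;
  assert(checked_shl(1u, 4, shifted) && shifted == 16u);
  assert(!checked_shl(1u, -1, shifted));
}
```

In an instrumented build, the unguarded forms (`1 / 0` with a runtime zero, or `1u << -1`) would produce a runtime diagnostic at the offending line, which is what makes these checks useful in CI.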
  18. Thank you!

Editor's Notes

  • Talk about AWS RoboMaker and its main features (dev / simulation / fleet management)
    Those features integrate and extend open-source software
  • DBI: dynamic binary instrumentation
    CTI: compile-time instrumentation
    UMR: uninitialized memory reads
    UAF: use-after-free (aka dangling pointer)
    UAR: use-after-return
    OOB: out-of-bounds
    x86: includes 32- and 64-bit.
    Mudflap was removed in GCC 4.9, as it has been superseded by AddressSanitizer.
    Guard Page: a family of memory error detectors (Electric Fence or DUMA on Linux, Page Heap on Windows, libgmalloc on OS X).
    gperftools: various performance tools/error detectors bundled with TCMalloc. The heap checker (leak detector) is only available on Linux. The debug allocator provides both guard pages and canary values for more precise detection of OOB writes, so it's better than guard page-only detectors.
