Interview with Dmitriy Vyukov - the author of Relacy Race Detector (RRD)


This is an interview with Dmitriy Vyukov, the author of the Relacy Race Detector (RRD) tool for verifying parallel applications. In this article you will learn about the history of RRD's creation, its basic capabilities, and some other similar tools and how they differ from RRD.

Published in: Technology
Author: Andrey Karpov
Date: 06.04.2009

Introduction

We draw your attention to an interview with the author of the Relacy Race Detector (RRD) verifier for testing multi-thread algorithms. The article discusses the prospects of using RRD and other tools for testing parallel applications, along with related topics.

The questions are asked by:

Andrey Nikolaevich Karpov. One of the founders of the "Program Verification Systems" company, engaged in developing static code analysis tools. He participates in developing the Viva64 and VivaMP tools for testing 64-bit and parallel applications and supports the open library VivaCore intended for parsing C/C++ code.

The questions are answered by:

Dmitriy Sergeevich Vyukov. A developer of high-performance C/C++ software in the sphere of client/server systems and network servers. In his spare time he develops innovative synchronization algorithms, programming models for multi-core processors and systems for multi-thread code verification. The author of the Relacy Race Detector (RRD) tool.

The interview text

Hello, Dmitriy. Please tell us a few words about yourself. In what sphere are you working and in what projects are you participating?

To the best of my ability I'm involved in everything relating to multi-threading and parallelism: scalable synchronization algorithms, programming models for multi-core processors, multi-thread code verification and so on. I publish my developments concerning synchronization algorithms in the group Scalable Synchronization Algorithms. I have also developed and now support Relacy Race Detector (RRD), a tool for verifying multi-thread code.

What encouraged you to create the Relacy Race Detector verifier?

RRD appeared rather spontaneously. There were three preconditions for its creation.

The first one: I develop synchronization algorithms, and testing them and localizing errors in them is a very serious problem. Errors occur very seldom or don't occur at all on some computers (for example, on computers with fewer than 4 processors or with a certain OS version). And even if an error does occur regularly, it is often very hard to understand its cause (i.e. at what moment and what exactly goes wrong). This led to the idea that it would be good to have some "tools" for solving the problem.

The second precondition: while working on synchronization algorithms I accumulated a set of methods which I used for testing and locating errors. One of the main methods is inserting a large number of lines like the one shown below into the program code:

    if ((rand() % 1000) == 0) Sleep (rand() % 10);

and then stress-testing the program. This method lets you exercise a far greater variety of thread interleavings. This is actually the basic principle of RRD's operation.

The third precondition appeared when I finally understood how I could assemble all my methods into an automatic testing tool, how I could perform the necessary instrumentation of a program in a simple way and how I could make the tool highly effective. The rest was easy - the first operational prototype (which really did find one specially introduced error) was ready by that night. Although, of course, bringing RRD up to a more or less acceptable tool took much more time.

Please tell us about RRD in more detail. On what principles and algorithms is it based? In what spheres can it be used most effectively?

RRD is a tool for dynamic verification without storing states. It is intended, first of all, for testing multi-thread algorithms (synchronization algorithms, multi-thread data structures and so on).
For a user, working with RRD looks like this: first, the algorithm being tested is implemented. The implementation can be expressed through the synchronization primitives of C++09, POSIX threads (pthread), Win32 API, C#/.NET or Java. You should not use the listed APIs directly, however, but through the "wrappers" provided by RRD; the syntax is nearly the same but there are some differences. Once the tested algorithm is implemented, you need to write one or several unit tests for it. After that you can launch them, and RRD will see to the effective execution of the tests, that is, it will check as many different thread interleavings as possible. During the execution of each interleaving RRD performs many different checks of the algorithm's correctness, including both the user's asserts and invariants and the basic built-in checks: data races, accesses to freed memory, double frees, memory leaks, deadlocks, livelocks, incorrect use of an API (for example, recursive acquisition of a non-recursive mutex) and so on. When it detects an error, RRD shows a detailed history of the execution that led to the error. With such a history you can easily locate the error (the history contains such details as deviations from the sequentially consistent order, instances of the ABA problem, spurious wakeups on condition variables, etc.).

The large number of built-in checks and the thoroughness with which RRD carries them out allow you in most cases to avoid writing any checks of your own in the code. For example, if we're testing a reader-writer mutex, it is enough just to create several threads which acquire the mutex for writing and change the same variable. If the mutex algorithm doesn't provide mutual exclusion, the race on the protected variable will be detected automatically; if the algorithm is subject to a deadlock or a livelock, RRD will find this out automatically as well. But if we're testing a producer-consumer queue and the queue must provide FIFO order of messages, we'll have to program this check manually.

Now some words about the inner structure of RRD and the algorithms used in it. RRD instruments all accesses to variables, synchronization primitives and API calls. This makes it possible to build all the necessary checks into them and also to fully control thread switching. RRD contains 3 thread schedulers (you choose the scheduler when launching a test).

The simplest scheduler is the so-called random scheduler. After each primitive action performed by the program (an access to a variable, a synchronization primitive or an API call) the scheduler chooses a thread at random and switches control to it. This scheduler is good for preliminary testing of an algorithm: it doesn't provide a full check but works very quickly.

The second scheduler performs a full search of the possible thread interleavings (full search scheduler), but its disadvantage is a very long verification process. In practice it can be used only for small tests.

The third and last scheduler is the most interesting and useful one: the so-called context bound scheduler. It performs a systematic search of thread interleavings but checks only those interleavings in which the total number of involuntary (preemptive) switches doesn't exceed some defined number. Because of this it provides a good compromise between the quality of the check and the running time. I should also mention that all the schedulers are fair - this allows you to test formally non-terminating algorithms, i.e. algorithms containing loops which can potentially repeat infinitely.

Under what terms is RRD distributed?

RRD can be used free of charge for non-commercial open-source development, for educational purposes, for academic work with non-patented results and also for personal non-commercial use. For all other uses RRD must be paid for.
There can be special cases, though; for example, I took part in preliminary negotiations about providing special licenses for the development of the Linux kernel (there are some tricky points concerning patented algorithms and commercialization) and for the development of Intel Threading Building Blocks (which is distributed under a dual license, one of which is commercial).

Can you advise some additional resources relating to RRD? Where can one download RRD?

The main resource devoted to RRD is the Relacy Race Detector site (see the References). You can download the latest version of the library there, find some materials on RRD and ask questions as well. The RRD distribution kit includes some examples which can help you master RRD.

Perhaps you are familiar with many other verifiers of parallel applications. Does none of them really implement the diagnostics that RRD offers? In what way do they differ from RRD?

Of course, before creating RRD I studied many verification tools (Intel Thread Checker, Chord, Zing, Spin, RacerX, CheckFence, Sober, Coverity Thread Analyzer, CHESS, KISS, PreFast, Prefix, FxCop) hoping to find what I needed for my purposes. But most tools are intended for, so to say, developers of end applications and not for developers of synchronization algorithms and parallelism support libraries. None of the tools modeled relaxed memory order [*] with the level of detail and accuracy I needed. Figuratively speaking, if the mentioned tools can verify a program which uses OpenMP, RRD can verify the implementation of OpenMP itself.

[*] Note. Relaxed Memory Order (RMO) is a method of working with memory in which the processor uses all the means of caching and dynamic reordering of instructions and doesn't try to provide any guarantees about the order of accesses or about when operands are saved to main memory. Sometimes this mode is called a "relaxed memory model".

You have mentioned a lot of different tools. Could you tell us about them briefly? Perhaps many readers haven't even heard of most of these tools.

I should say that I haven't really become acquainted with most of them (installing them, running samples, using them in my own projects). I studied them only briefly, because I could understand from the general descriptions that they were not what I wanted and it was pointless to study them further. That's why I can hardly tell end users anything interesting, but still...

I can tell you about the Spin tool, which comes close to RRD in some properties; I know that it has been used for verifying some synchronization algorithms for the Linux kernel and for Threading Building Blocks. Spin is perhaps the oldest and most thorough tool of this kind: its roots go back to the early 1980s, several books have been written about it, and I'm very pleased that it is still being developed. Spin includes a lot of check variants - dynamic checking with and without storing states, full and partial (for very large programs) checks of the program model and so on; it's just impossible to list them all. The compiler for Promela (the language used by Spin) and the verifier (Protocol ANalyser, pan in Spin's terms) have many options controlling different aspects of operation (test mode, level of output detail, memory limit, etc.), and there are also some GUI front ends. In a word, if you need something special you are likely to find it in Spin.

The process of working with Spin is in itself similar to working with RRD: a test is described in the special language Promela (a PRocess MEta LAnguage), then you compile it and receive a C source file as output, which must be compiled by a C compiler to get a verifier. Then you launch the verifier, and when an error is detected it creates a file with a thorough description of the error and the execution history. From this file you can then generate a PostScript file for further browsing, or use it for "playback" of the execution history. As you can see, the process of working with Spin is a bit more complicated than with RRD... well, such is its status :).

There is a logical question: why wasn't I content with Spin? Firstly, there is the special language Promela for describing tests; on the one hand it's not such a fundamental issue, but on the other hand I sometimes catch myself being too lazy to carry out even the minimal code instrumentation which RRD requires. And when rewriting a program manually into another language we still risk testing an absolutely different thing. Secondly, there is the sequentially consistent memory model; here nothing can be said in defense of Spin - support for relaxed memory access (a "relaxed memory model") is simply a necessity for a verifier of synchronization algorithms. Thirdly, there is the absence of built-in support for such specific things as the Win32 API calls WaitForMultipleObjects() or SignalObjectAndWait(), spurious wakeups on a POSIX condition variable, waits with time-outs and so on. The sum of all these factors made me turn my back on Spin.

However, I will emphasize once more that the tool is very worthy. The main site of the project is listed in the References (Spin - Formal Verification).

Could you give examples of code to make the principles of RRD's operation clearer and to show how it differs from other tools?

Here is a simple example in which mutual exclusion is implemented with a spin-mutex (I will give the first example in C++09 syntax and the second in RRD syntax to show the difference):

    std::atomic<int> mutex;
    int data;

    void thread1()
    {
        // simple spin-mutex
        while (mutex.exchange(1, std::memory_order_acquire))
            std::this_thread::yield();
        data = 1;
        mutex.store(0, std::memory_order_release);
    }

    void thread2()
    {
        // simple spin-mutex
        while (mutex.exchange(1, std::memory_order_acquire))
            std::this_thread::yield();
        data = 2;
        mutex.store(0, std::memory_order_relaxed);
    }

This example contains a so-called data race of type 2. It is characteristic of type 2 data races that the conflicting accesses to the problem variable are not adjacent in any thread interleaving; they nevertheless conflict with each other because memory accesses may be reordered under the relaxed memory model. RRD will detect this race and show in the resulting history exactly which reorderings took place.

Here is a more complex example - a lock-free stack (written in RRD syntax; the main namespace used by RRD is "rl"; also pay attention to the required instrumentation of the code in the form of "($)"):

    struct node
    {
        rl::atomic<node*> next;
        rl::var<void*> data;
    };

    struct stack
    {
        rl::atomic<node*> head;
    };

    void push(stack* s, void* data)
    {
        node* n = RL_NEW(node);
        n->data($) = data;
        node* next = s->head($).load(rl::memory_order_relaxed);
        for (;;)
        {
            n->next($).store(next, rl::memory_order_relaxed);
            if (s->head($).compare_exchange_weak(
                    next, n, rl::memory_order_release))
                break;
        }
    }

    void* pop(stack* s)
    {
        node* n = s->head($).load(rl::memory_order_relaxed);
        for (;;)
        {
            if (0 == n)
                return 0;
            node* next = n->next($).load(rl::memory_order_relaxed);
            if (s->head($).compare_exchange_weak(
                    n, next, rl::memory_order_acquire))
                break;
        }
        void* data = n->data($);
        RL_DELETE(n);
        return data;
    }

And this is a unit test for RRD:

    // template parameter "2" defines the number of threads in the test
    struct test : rl::test_suite<test, 2>
    {
        stack s;

        // is executed in one thread
        // before execution of the main function of threads
        void before()
        {
            s.head($) = 0;
        }

        // the main function of threads
        void thread(unsigned /*thread_index*/)
        {
            push(&s, (void*)1);
            void* data = pop(&s);
            RL_ASSERT(data == (void*)1);
        }
    };

    int main()
    {
        rl::simulate<test>();
    }

If we launch the program we'll see the following result (I've removed the per-thread execution histories; the first number in each line is the global sequence number of the operation, used to correlate with the per-thread histories, and the second number is the thread number):

    struct test
    ACCESS TO FREED MEMORY (access to freed memory)
    iteration: 2

    execution history:
    [0] 1: [BEFORE BEGIN]
    [1] 1: <0023DEA0> atomic store, value=00000000, (prev value=00000000), order=seq_cst, in test::before, main.cpp(70)
    [2] 1: [BEFORE END]
    [3] 1: memory allocation: addr=0023CB78, size=52, in push, main.cpp(34)
    [4] 1: <0023CB9C> store, value=00000001, in push, main.cpp(35)
    [5] 1: <0023DEA0> atomic load, value=00000000, order=relaxed, in push, main.cpp(36)
    [6] 0: memory allocation: addr=0023CE80, size=52, in push, main.cpp(34)
    [7] 0: <0023CEA4> store, value=00000001, in push, main.cpp(35)
    [8] 1: <0023CB78> atomic store, value=00000000, (prev value=00000000), order=relaxed, in push, main.cpp(39)
    [9] 0: <0023DEA0> atomic load, value=00000000, order=relaxed, in push, main.cpp(36)
    [10] 0: <0023CE80> atomic store, value=00000000, (prev value=00000000), order=relaxed, in push, main.cpp(39)
    [11] 1: <0023DEA0> CAS fail [SPURIOUSLY] orig=00000000, cmp=00000000, xchg=0023CB78, order=release, in push, main.cpp(40)
    [12] 0: <0023DEA0> CAS succ orig=00000000, cmp=00000000, xchg=0023CE80, order=release, in push, main.cpp(40)
    [13] 1: <0023CB78> atomic store, value=00000000, (prev value=00000000), order=relaxed, in push, main.cpp(39)
    [14] 0: <0023DEA0> atomic load, value=0023CE80, order=relaxed, in pop, main.cpp(47)
    [15] 1: <0023DEA0> CAS fail orig=0023CE80, cmp=00000000, xchg=0023CB78, order=release, in push, main.cpp(40)
    [16] 1: <0023CB78> atomic store, value=0023CE80, (prev value=00000000), order=relaxed, in push, main.cpp(39)
    [17] 0: <0023CE80> atomic load, value=00000000, order=relaxed, in pop, main.cpp(52)
    [18] 1: <0023DEA0> CAS succ orig=0023CE80, cmp=0023CE80, xchg=0023CB78, order=release, in push, main.cpp(40)
    [19] 1: <0023DEA0> atomic load, value=0023CB78, order=relaxed, in pop, main.cpp(47)
    [20] 0: <0023DEA0> CAS fail orig=0023CB78, cmp=0023CE80, xchg=00000000, order=acquire, in pop, main.cpp(53)
    [21] 1: <0023CB78> atomic load, value=0023CE80, order=relaxed, in pop, main.cpp(52)
    [22] 1: <0023DEA0> CAS succ orig=0023CB78, cmp=0023CB78, xchg=0023CE80, order=acquire, in pop, main.cpp(53)
    [23] 1: <0023CB9C> load, value=00000001, in pop, main.cpp(56)
    [24] 1: memory deallocation: addr=0023CB78, in pop, main.cpp(57)
    [25] 0: ACCESS TO FREED MEMORY (access to freed memory), in pop, main.cpp(52)

From this summary we see that RRD detected an access to freed memory while checking the second thread interleaving. From the analysis of the history we can understand that thread 1 takes an element off the stack and frees it, and after that thread 0 accesses this element.

What can you say about the new tool VivaMP? Do you consider it timely, given that OpenMP is used by only a small number of developers nowadays?

I think you are not being quite sincere when you say that OpenMP is used by a small number of developers. Of course, everything is relative, but I think I'm very near the truth in saying that OpenMP is the most widespread parallelism support library in production code. Firstly, it is a relatively old and proven technology supported by most commercial and non-commercial organizations, with a lot of independent implementations. Secondly, it is rather simple and solves its task well.

And of course, being the developer of my own tool for verifying multi-thread code, I find such tools very relevant and necessary, especially now that everyone has a computer with a multi-core processor on their desk. Proceeding from these two points I can say that VivaMP is an indispensable tool for developers who are only beginners in parallel programming. But VivaMP will be useful for more experienced developers as well, because no one is safe from either "stupid" mistakes (inattention, copy-paste) or "clever" mistakes. And VivaMP will always "cover your back" with its impartiality and computational power. I know a lot of examples where multi-thread code developed by experts and examined by many people had been working for years, and then serious errors were detected in it which had caused hangs and crashes. Most of these errors had been or could have been detected by verification tools such as VivaMP.

As far as the technical aspect is concerned, VivaMP is a static verification tool. What I like about static verification is that you don't have to write unit tests; the tool checks the target code by itself. The point is not so much the need to write additional code as the fact that this again brings in the human factor.
A developer must decide which tests are necessary, how exactly they should work and so on, and the quality of the check will directly depend on the quality of the unit tests. With VivaMP there is no such problem: you have only the code being checked and the tool. I think it is a rather powerful tool.

You showed interest in the open code analysis library VivaCore created by our company OOO "Program Verification Systems". What is the reason for this, and can the library help in improving RRD?

The idea was to avoid the need for manual instrumentation of code. That is, to write a custom code preprocessor on the basis of the VivaCore library so that it could insert all those notorious "($)" in the right places and the user could test his "real" code directly. But preliminary investigations showed that this would demand a lot of resources, and unfortunately we had to give up this idea.

How are you planning to improve RRD?

Well, I always have a lot of plans :). On the RRD site you can see the TODO/Feature List in which I state my plans and ideas concerning the further development of RRD. The most essential and interesting improvements are support for thread-local storage (TSS/TLS) with wrappers for POSIX and Win32, support for UNIX signals and different types of hardware interrupts, optimization of the partial-order reduction algorithm and parallelization of the library's operation, periodic saving at checkpoints, detection of "dead" (untested) code, and modeling of program characteristics concerning performance and scaling. But at this moment the library's development is, so to say, demand-driven, that is, driven by the needs of users. That's why I will be glad to get some responses and ideas from the readers concerning this issue.

What would you like to say to those of our readers who are only beginning to master parallel technologies?

About parallel technologies I can say the same thing as about any other new technology: experiment more, try to solve simple tasks and watch what you get, and if you don't succeed, put forward hypotheses and check them, create new ones and check them, and so on. Only practice and feedback can make you a professional. And of course don't be "squeamish" about automatic code verification tools - they are like an expert standing behind you and watching you. Of course you can do without these tools, but they will help you save a lot of time.

Thank you for the interview and for the interesting and detailed answers.

Thank you. I wish you and our readers every success in your developments.

Conclusion

We'd like to thank Dmitriy once again for the interesting conversation and his account of tools for verifying parallel applications. In the reference section at the end of the article you can find the list of resources devoted to RRD and some other similar tools.

References

  1. Anthony Williams. Peterson's lock with C++0x atomics.
  2. Q&A with a TBB Junkie.
  3. Relacy Race Detector.
  4. Scalable Synchronization Algorithms.
  5. Spin - Formal Verification.
  6. Evgeniy Ryzhkov. VivaMP - a tool for OpenMP.
  7. Andrey Karpov. Testing parallel programs.
  8. Open library VivaCore for parsing and analyzing C/C++ code.