Learning from 6,000 Projects: Mining Specifications in the Large

Models (abstract and simple descriptions of some artifact) are the backbone of all software engineering activities. While writing models is hard, existing code can serve as a source for abstract descriptions of how software behaves. To infer correct usage, though, code analysis needs usage examples: the more, the better.
We have built a lightweight parser that efficiently extracts API usage models from source code; these models can then be used to detect anomalies. Applied to the 200 million lines of code of the Gentoo Linux distribution, it extracted more than 15 million API constraints. On the web site checkmycode.org, anyone can check their code against the “wisdom of Linux”.

  • You talk to these people, and you immediately realize they’re smart. They’re really smart – Michael got an MSc in maths and CS at the age of 21, got his PhD at 24, and became a professor at the age of 27. Today, he’s the best-paid professor in Germany.
  • They chose to do these other things, not because they are easy, but because they are hard. Hard to verify, that is. Many things are of that kind. However… notice that all these problems can be stated in very simple terms.
  • What do I mean by “easy to specify”? Here’s something that’s hard to specify – sorting.
  • Tell story of first NORA talk
    \forall i \in \{0, \dots, |x'|\} : x'[i] < x'[i + 1] \\
    |x| = |x'| \\
    \forall i \in \{0, \dots, |x|\} : \iota i' \in \{0, \dots, |x'|\} : x[i] = x'[i'] \\
    \forall i' \in \{0, \dots, |x'|\} : \iota i \in \{0, \dots, |x|\} : x'[i'] = x[i]

  • We can introduce a vocabulary, and do things incrementally, but the burden remains.
    \text{is-sorted}(x') \land \text{is-permutation}(x, x')
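The two-predicate form of the sorting specification can be written down as an executable check. This is a minimal sketch, not code from the talk: the function names are hypothetical, and it uses `<=` where the formula above writes `<`, on the assumption that duplicate elements should be allowed.

```python
from collections import Counter


def is_sorted(xs):
    """True if every element is <= its successor (assumption: duplicates allowed)."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def is_permutation(xs, ys):
    """True if xs and ys contain the same elements with the same multiplicities."""
    return Counter(xs) == Counter(ys)


def satisfies_sort_spec(x, x_prime):
    """is-sorted(x') and is-permutation(x, x'), for input x and output x'."""
    return is_sorted(x_prime) and is_permutation(x, x_prime)
```

Checking a given output against this specification takes a few lines; the effort lies in writing the predicates down in the first place, which is the burden the note refers to.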

  • It’s nice to know that MS Word won’t dereference null pointers, but will it print my text?
    Full of functional properties
  • Why is it that things are hard to specify?
    ⇒ New language, ⇒ Effort duplicated, ⇒ Can’t abstract from details


  • and leverage the knowledge of 50 years of programming!
    This is what my talk today is about. In fact, it’s about mining specifications from 6,000 projects – the largest such attempt ever.
  • Dynamic invariants – mined from executions
    Work by Michael Ernst – my big inspiration
    Describe what should hold – but not how to get there
  • API usage – as mined from executions
    Describe what holds – and how to achieve it!

  • This would be a pattern, if it were not for the missing element
  • We can detect such gaps by looking at overlapping patterns (concepts)
  • Produced in 8 minutes on this machine
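The idea of a pattern with a "missing element" can be illustrated with a small sketch. The notes mention overlapping patterns (concepts); the code below does not implement concept analysis. It swaps in a simpler pairwise rule: if property `p` usually comes together with property `q`, a method that has `p` but lacks `q` is flagged. The function name, the thresholds, and the sample data are all hypothetical.

```python
def find_anomalies(methods, min_support=3, min_confidence=0.75):
    """Flag methods that lack a property which usually accompanies another.

    methods maps a method name to its set of temporal properties.
    Returns tuples (method, present_property, missing_property, support, confidence).
    """
    properties = sorted(set().union(*methods.values()))
    anomalies = []
    for p in properties:
        for q in properties:
            if p == q:
                continue
            # Methods supporting the pattern {p, q}, and methods that break it.
            both = [m for m, props in methods.items() if p in props and q in props]
            missing = [m for m, props in methods.items() if p in props and q not in props]
            if not missing or len(both) < min_support:
                continue
            confidence = len(both) / (len(both) + len(missing))
            if confidence >= min_confidence:
                for m in missing:
                    anomalies.append((m, p, q, len(both), confidence))
    return anomalies


# Hypothetical data in the style of the "Methods vs. Properties" slide.
SAMPLE_METHODS = {
    "open()": {"start < stop", "lock < unlock"},
    "hello()": {"start < stop", "lock < unlock", "eof < close"},
    "parse()": {"start < stop", "lock < unlock"},
    "get()": {"start < stop"},
}
```

With the sample data, three methods support the pair `start < stop` and `lock < unlock`, and `get()` is the only method that has the first property without the second, so it is the one reported.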


  • On encountering a wrong typecode,
    <visitNEWARRAY()> should report the typecode to the user. However,
    it fails to do so, as it uses <'+t+'> instead of <"+t+"> when
    constructing the second parameter to the <constraintViolated()>
    method, causing the string <'+t+'> to be interpreted verbatim: the
    message contains <'+t+'> rather than the typecode in <t>.
    OP-Miner reports this as an OP violation: the second parameter of
    <constraintViolated()> should be the result of a
    <StringBuffer.toString()> method call, i.e. a constructed string
    rather than a constant string. The rationale for using a constructed
    string is to include some information about the violation.
  • In 48 cases: argument comes from String() constructor;
    only in 3 cases: from array
  • Code smell → does not result in errors, but may cause maintainability problems
    Defects → reported & verified
  • 44% holds for AspectJ; same for other projects
    Lots of subtle defects in production code
    Unclear whether these would be found by other means
  • and leverage the knowledge of 50 years of programming!
    This is what my talk today is about
  • The opening story tells of Francis Galton’s surprise that visitors to a livestock exhibition, taking part in a prize contest, estimated the slaughter weight of an ox accurately when the mean of all estimates was taken as the group’s estimate. (The group’s estimate was even better than that of any individual participant, some butchers among them.)
  • First thing we needed was a lightweight parser
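To make the idea of lightweight parsing concrete, here is a rough sketch that recovers the temporal properties shown on the slides for the `open`/`read`/`write`/`close` example. It is not the parser from the talk. All names are hypothetical, and it assumes one statement per line, an opening brace on the same line as a `while` or `for` header, and no nested calls. Calls that sit in the same loop are treated as able to precede each other, including themselves.

```python
import re
from collections import defaultdict

STRING = re.compile(r'"[^"]*"')                                      # string literals
CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\(([^()]*)\)")                # f(args), no nesting
ASSIGN = re.compile(r"\b([A-Za-z_]\w*)\s*=\s*([A-Za-z_]\w*)\s*\(")   # v = f(
IDENT = re.compile(r"[A-Za-z_]\w*")
LOOP = re.compile(r"^\s*(?:while|for)\b")
KEYWORDS = {"while", "for", "if", "switch", "return"}


def mine_events(source):
    """Map each variable to the calls it takes part in, as (function, loop id)."""
    events = defaultdict(list)
    blocks = []      # open blocks: a loop id for loops, None for other blocks
    loop_count = 0
    for line in source.splitlines():
        code = STRING.sub('""', line)   # constants carry no variable names
        if LOOP.match(code):
            loop_count += 1
            blocks.append(loop_count)   # assumes "{" is on the header line
        else:
            blocks.extend([None] * code.count("{"))
        loop = next((b for b in reversed(blocks) if b is not None), None)
        targets = {m.start(2): m.group(1) for m in ASSIGN.finditer(code)}
        for m in CALL.finditer(code):
            func = m.group(1)
            if func in KEYWORDS:
                continue
            variables = set(IDENT.findall(m.group(2)))
            if m.start(1) in targets:   # the call's result is assigned to a variable
                variables.add(targets[m.start(1)])
            for var in variables:
                events[var].append((func, loop))
        for _ in range(code.count("}")):
            if blocks:
                blocks.pop()
    return events


def temporal_properties(events):
    """Derive 'f() < g()' for calls on the same variable where f may precede g."""
    props = defaultdict(set)
    for var, calls in events.items():
        for i, (f, loop_f) in enumerate(calls):
            for j, (g, loop_g) in enumerate(calls):
                same_loop = loop_f is not None and loop_f == loop_g
                if i < j or same_loop:
                    props[var].add(f"{f}() < {g}()")
    return props


SAMPLE = '''
int j;
int fA;
int fB = open("newFile");
fA = open("myFile");
j = 7;
while (j > 3) {
    read(fA);
    write(fB, "Hello");
    j--;
}
close(fA);
close(fB);
'''
```

Calling `temporal_properties(mine_events(SAMPLE))` yields four properties each for `fA` and `fB`, matching the ones listed on the slides. The regular expressions rely only on syntax that C, C++, Java, PHP and JavaScript share, which is the point of the lightweight approach; anything beyond the stated assumptions would need a real tokenizer.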


  • We therefore need to be able to analyze large amounts of code, ideally source code.
  • Next thing we needed was thousands of projects
  • We have 6,097 projects in our reference database. Their size ranges from 7 SLOC (for openssl-blacklist_0.4.2 and openvpn-blacklist_0.3) to 5,491,951 SLOC (for linux-2.6.29), as counted by David A. Wheeler's 'SLOCCount' on .c files only. Some other statistics (in SLOC):
     [first quartile]:                1,093
     [median]:                        4,162
     [third quartile]:                16,160
     [mean]:                          33,020
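The figures in the notes and on the slides can be cross-checked with a little arithmetic. The sketch below uses only numbers quoted in the deck (6,097 projects, 201,321,237 lines of code, 18 hours of single-core analysis). It assumes that the slide's line count uses the same SLOC measure as the statistics above; the variable names are illustrative.

```python
projects = 6_097
total_lines = 201_321_237
analysis_hours = 18

# Average project size, to compare with the reported mean of 33,020 SLOC.
mean_sloc = total_lines / projects

# Throughput, to compare with "11 million lines of code per hour".
lines_per_hour = total_lines / analysis_hours

# Time per project, to compare with "11 seconds per project".
seconds_per_project = analysis_hours * 3600 / projects
```

Rounded, the three values agree with the reported mean, throughput, and per-project time. The mean is roughly eight times the median of 4,162 SLOC, which indicates a strongly right-skewed size distribution: a few very large projects such as the Linux kernel dominate the total.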


  • Defect in Conspire 0.20
  • Defect in cksfv-1.3.13



  • As a special treat to SCAM attendees, we’re making all of our database available – today!
  • Coming back to the beginning of my talk – are we facing a specification crisis? Yes.
  • But we can alleviate it
  • by reusing and abstracting from all the code that’s around.

  • But still, we just scratch the surface of the knowledge that’s in there. Plenty of work lies ahead of us.
  • But with these future challenges, let’s not forget past challenges.
    My students faced these challenges not because they were easy, but because they were hard. And I am very grateful for the wonderful results they achieved.
  • Learning from 6,000 projects mining specifications in the large

    1. Learning from 6,000 Projects: Mining Models in the Large. Andreas Zeller, Saarland University
    2. Saarbrücken
    8. Saarbrücken ® Visual Computing Institute
    15. Some numbers • ~70 PhD advisors in computer science • ≥ 300 PhD students in computer science • ~60 new PhD graduates per year • ~60 new MSc graduates per year • 800–1400 € per month as a PhD stipend (+ laptop & office • starting right after BSc • all courses in English)
    16. Two Graduates: Michael Backes (TR35 in 2009), Andrej Rybalchenko (TR35 in 2010)
    17. Michael Backes Andrej Rybalchenko
    18. secure protocols Andrej Rybalchenko
    19. secure protocols • loop termination
    26. buffer overflow • resource leaks • information flow • liveness • secure protocols • loop termination: easy to specify, hard to verify
    28. sorting: hard to specify
    30. sorting: easy to verify, hard to specify
            \forall i \in \{0, \dots, |x'|\} : x'[i] < x'[i + 1] \\
            |x| = |x'| \\
            \forall i \in \{0, \dots, |x|\} : \iota i' \in \{0, \dots, |x'|\} : x[i] = x'[i'] \\
            \forall i' \in \{0, \dots, |x'|\} : \iota i \in \{0, \dots, |x|\} : x'[i'] = x[i]
    31. \text{is-sorted}(x') \land \text{is-permutation}(x, x'): still hard to specify
    39. Microsoft Word • mobile phones • travel booking • operating systems • airplane control • banking systems: easy to verify, hard to specify
    41. hard to specify: new language • duplicate effort • can’t abstract from details
    42. specification crisis
    45. mine specifications from 6,000 projects
    46. Specifications: pre- and postconditions
            \forall i \in \{0, \dots, |x'|\} : x'[i] < x'[i + 1] \\
            |x| = |x'| \\
            \forall i \in \{0, \dots, |x|\} : \iota i' \in \{0, \dots, |x'|\} : x[i] = x'[i'] \\
            \forall i' \in \{0, \dots, |x'|\} : \iota i \in \{0, \dots, |x|\} : x'[i'] = x[i]
    47. Specifications: finite state models. States: (socket: null, state: NOT_CON), (socket: ¬null, state: PLAIN), (socket: ¬null, state: AUTH). Transitions: <init>(), openPort(), auth(), auth()!, quit()
    54. OP-Miner: Program → Usage Models → Temporal Properties → Patterns → Anomalies. Usage model: iter.hasNext(), iter.next(). Temporal properties: hasNext ≺ next, hasNext ≺ hasNext, next ≺ hasNext, next ≺ next. Patterns: hasNext ≺ next, hasNext ≺ hasNext. Anomalies: ✓ hasNext ≺ next, hasNext ≺ hasNext; ✗ hasNext ≺ hasNext
    55. public Stack createStack() {
            Random r = new Random();
            int n = r.nextInt();
            Stack s = new Stack();
            int i = 0;
            while (i < n) {
                s.push(rand(r));
                i++;
            }
            s.push(-1);
            return s;
        }
    62. Control-flow nodes: Random r = new Random(); | int n = r.nextInt(); | Stack s = new Stack(); | int i = 0; | i < n | i < n | i++; | s.push(-1); | s.push(rand(r));
    63. Statements involving s: Stack s = new Stack(); | s.push(-1); | s.push(rand(r));
    64. Usage model for s: s.<init>() | s.push(_) | s.push(_)
    66. Statements involving r: Random r = new Random(); | int n = r.nextInt(); | s.push(rand(r));
    67. Usage model for r: r.<init>() | r.nextInt() | Utils.rand(r)
    78. Methods vs. Properties. Temporal properties (columns): start ≺ stop, lock ≺ unlock, eof ≺ close. Methods (rows): get(), open(), hello(), parse(). Highlighted: Pattern, Support
    80. Discovering Anomalies. Temporal properties (columns): start ≺ stop, lock ≺ unlock, eof ≺ close. Methods (rows): get(), open(), hello(), parse(). Anomaly: ✘ next to get()
    81. AspectJ
    83. for (Iterator iter = itdFields.iterator(); iter.hasNext();) {
            ...
            for (Iterator iter2 = worthRetrying.iterator(); iter.hasNext();) {   // should be iter2
                ...
            }
        }
    85. public void visitNEWARRAY(NEWARRAY o) {
            byte t = o.getTypecode();
            if (!((t == Constants.T_BOOLEAN) ||
                  (t == Constants.T_CHAR) ||
                  ...
                  (t == Constants.T_LONG))) {
                constraintViolated(o, "(...) '+t+' (...)");   // should be double quotes
            }
        }
    87. Name internalNewName(String[] identifiers)
            ...
            for (int i = 1; i < count; i++) {
                SimpleName name = new SimpleName(this);
                name.internalSetIdentifier(identifiers[i]);
                ...
            }
            ...
        }
        Annotation: should stay as is
    89. public String getRetentionPolicy() {
            ...
            for (Iterator it = ...; it.hasNext();) {
                ... = it.next();
                ...
                return retentionPolicy;
            }
            ...
        }
        Annotation: should be fixed
    90. 44% of violations are defects or code smells
    92. mine specifications across thousands of projects
    94. Wisdom of the crowds • Francis Galton • “No, not on the left either”
    95. lightweight parsing
    98. Target Languages: Java, C++, C, PHP, JavaScript. Similar syntax: {...} ; foo(). Similar keywords: while, if, switch, return
    100. Lightweight Parser: Source Code → Abstract Representation → Temporal Properties (language-independent, lightweight parsing)
    102. Source Code → Abstract Representation → Temporal Properties. Source code:
             int j;
             int fA;
             int fB = open("newFile");
             fA = open("myFile");
             j = 7;
             while (j > 3) {
                 read(fA);
                 write(fB, "Hello");
                 j--;
             }
             close(fA);
             close(fB);
    104. Abstract representation: fB: open(CONST) | fA: open(CONST) | Loop: read(fA), write(fB, CONST) | close(fA) | close(fB)
    106. Per-variable sequences: fA: open(CONST), read(fA), close(fA) | fB: open(CONST), write(fB, CONST), close(fB)
    111. Temporal properties: fA: open() < read(), open() < close(), read() < read(), read() < close() | fB: open() < write(), open() < close(), write() < write(), write() < close()
    112. thousands of projects
    116. [Bar chart: 6,097 C projects; 201,321,237 lines of code]
    117. 6,097 C projects
    118. 201,321,237 lines of code
    119. 5,985,193 functions
    120. 15,803,766 properties (“f < g”)
    121. 6 GB database
    122. 18 hours analysis time single core
    123. 11 million lines of code per hour
    124. 11 seconds per project
    126. static int dcc_listen_init (…) {
             dcc->sok = socket(…);
             if (…) {
                 while (…) {
                     … = bind(dcc->sok, …);
                 }
                 /* with a small port range, reUseAddr is needed */
                 setsockopt(dcc->sok, …, SO_REUSEADDR, …);   /* should be called before bind() */
             }
             listen(dcc->sok, …);
         }
    128. static int find_file (…) {
             DIR *dirp;
             struct dirent *dirinfo;
             …
             dirp = opendir(".");
             if (dirp == NULL) { … }
             while ((dirinfo = readdir(dirp)) != NULL) { … }
             rewinddir(dirp);   /* should call closedir() instead */
             return 1;
         }
    129. Platform
    131. Check my Code • Check your code against the wisdom of Linux • Builds on millions of mined specifications • Detects problems no other tool can detect • Database available for download • www.checkmycode.org
    133. specification crisis
    135. Microsoft Word • mobile phones • travel booking • operating systems • airplane control • banking systems: easy to mine
    140. Challenges • Mining complete specifications • Finding relevant abstractions • Producing readable specifications • Integrating specification mining and programming
    141. Andrzej Wasylkowski, Christian Lindig, Natalie Gruska
    142. Summary
