A Lightweight Approach to Uncover Technical Information in Unstructured Data

Talk given at the 2011 International Conference on Program Comprehension in Kingston, ON, Canada.

  1. A Lightweight Approach to Uncover Technical Information in Unstructured Data. Nicolas Bettenburg, Bram Adams, Ahmed E. Hassan (Queen’s University, Kingston, ON); Michel Smidt (University of Bremen, Germany)
  3. The code after "if (callback.isAcceleratorInUse(SWT.ALT | character))" inside Eclipse's MenuManager.java removes the mnemonic, but it seems like Eclipse should be checking isAcceleratorInUse.
  8. “Our developers have a severe problem.” NLP: ([ Our_PRP$ developers_NNS ])<: have_VBP :>([ a_DT severe_JJ problem_NN ]) ._. (PRP$ = pronoun, possessive; NNS = noun, common, plural; VBP = verb, present tense; DT = determiner; JJ = adjective, ordinal; NN = noun, common, singular)
  9. Structured Text
  10. NLP? Can’t deal with the source code parts. PARSERS? Can’t deal with the natural language parts.
  11. Past Solutions: infoZilla (Bettenburg et al., MSR’08) • Based on heuristics • Stack traces • Code snippets at block level • Patches
  12. Past Solutions: Miler (Bacchelli et al., ICSE’10, ICPC’10) • Based on heuristics • Classifies lines as source code • Classifies documents as containing code • Finds class names
  13. Past Solutions: infoZilla (Bettenburg et al., MSR’08) and Miler (Bacchelli et al., ICPC’10) • Only specific kinds of technical information! • New heuristics needed to extend (complex, error prone) • For some kinds of technical information, infeasible
  16. Challenge!
      More information:
      The code after "if (callback.isAcceleratorInUse(SWT.ALT | character))" inside
      Eclipse's MenuManager.java removes the mnemonic, but it seems like Eclipse
      should be checking "isAcceleratorInUse" only for top level menumanagers like
      File, Edit, ..., Help, etc.:
      /* (non-Javadoc)
       * @see org.eclipse.jface.action.IContributionItem#update(java.lang.String)
       */
      public void update(String property) {
          IContributionItem items[] = getItems();
          for (int i = 0; i < items.length; i++) {
              items[i].update(property);
          }
          [...]
      }
      Any status on this bug?
      I'd consider any contributions for M6 (API) or M7 (non-API) [...]
      A 3.5 fix would be to make that behaviour optional in MenuManager with API and
      off by default early in 3.5, and to have the WorkbenchActionBuilder contributed
      MenuManagers and actionSets/editorActions contributed MenuManagers turn it on
      (if I can find MenuManagers in the correct place).
  17. Spelling and Grammar Checkers... are really good at finding “what’s not right” in natural language text!
  20. A 3.5 fix would be to make taht behaviour optional in MenuManager with API and off by default early in 3.5, and to have the WorkbenchActionBuilder contributed MenuManagers and actionSets/editorActions contributed MenuManagers turn it on (if I can find MenuManagers in the corect place). Actual Spelling Mistakes!
  24. Add Heuristics • H1: Camel Case (camelCase, CamelCase, CamelCASE, ...) • H2: Programming Language Keywords (printf, fork, fi, ...) • H3: Special Characters (tree();)
  25. Evaluation • Manually annotated 20 complete bug reports and discussions from ECLIPSE. • Manually annotated 20 complete email discussions from the POSTGRESQL developers mailing list.
  26. Annotation GUI
  27. Precision / Recall: $\mathrm{Precision}(S_i) = \frac{TP_{S_i}}{TP_{S_i} + FP_{S_i}}$ and $\mathrm{Recall}(S_i) = \frac{TP_{S_i}}{TP_{S_i} + FN_{S_i}}$, where TP = we annotated and the tool annotated; FP = the tool annotated, we did not; FN = we annotated, the tool did not.
  30. Results
      Spellchecker   Precision   Recall
      JOrtho         88.01%      64.31%
      Jazzy          84.16%      68.30%
      Hunspell       86.40%      68.34%
      Hunspell is used by OpenOffice and the Mozilla Suite.
  32. Comparison: Line-wise classification of source code
       1 Launch the plugin as part of Eclipse IDE 3. Press Alt+H to
       2 bring down the Help menu (to go along with our example in #1)
       3
       4 BUG: Notice "Software Updates" is missing its mnemonic.
       5
       6 public void update(String property) {
       7 IContributionItem items[] = getItems();
       8 for (int i = 0; i < items.length; i++) {
       9 items[i].update(property);
      10 }
      11 }
      12
      13 Any status on this bug?
  34. Comparison: Line-wise classification of source code
                         Precision   Recall
      Our approach       89.27%      86.46%
      State-of-the-Art   66.13%      69.37%
  39. Summary
  40. masayuki reed cjcypoi02 dietrich steve.england corevette steffen.wilberg davemgarrett mmortal03 timeless mano fittysix matspal longsonr zurtex matti edilee mconnor cwwmozilla beltzner dveditz adelfino zeniko kliu alice0775 sziadeh mark.finkle robert.bugzilla philringnalda sgautherie.bz kev faaborg johnath martijn.martijn jmjeffery jo.hermans nrthomas gavin.sharp polidobj m-wada jbecerra jdarmochwal john.p.baker jruderman mak77 ria.klaassen VYV03354 cbook bomfog dao elmar.ludwig sdaugherty vseerror nightstalkerz l10n highmind63 twalker mh+mozilla klaas1988 ehsan stephen.donner me.at.work phiw hskupin ctalbert tchung tomer marcia timwi rotis uliss sylvain.pasche bugzilla marco.zehe cl-bugs-new2 tonglebeak deletesoftware abillings info anselm.meyer eddy_nigg matt RainerStroebel samuel.sidler+old alex hasham8888 aarobertxtr manujsabarwal johnjbarton myles7897 paulc shaver smichaud mozilla zhangchunlin dtownsend jdaggett kbrosnan bzbarsky sdwilsh
  41. The same names, overlaid with the labels Internet Explorer, XML Parser, JavaScript Engine, and UI.
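
The idea on the spell-checking and "Add Heuristics" slides can be sketched roughly as follows. This is only an illustration under several assumptions that the slides do not confirm: a tiny hand-written word set (`KNOWN_WORDS`) stands in for a real spell checker such as Hunspell, the regular expressions for H1 and H3 and the keyword list for H2 are guesses, tokens the dictionary rejects are treated as technical only when a heuristic matches, and the 0.5 line threshold in `is_code_line` is hypothetical.

```python
import re

# Assumption: a tiny word set standing in for a real spell checker dictionary.
KNOWN_WORDS = {
    "a", "fix", "would", "be", "to", "make", "that", "behaviour", "optional",
    "in", "with", "api", "and", "off", "by", "default", "early", "have",
    "the", "contributed", "turn", "it", "on", "if", "i", "can", "find",
    "correct", "place", "any", "status", "this", "bug", "public", "for",
    "property",
}

# H2: illustrative keyword list (the slides name printf, fork, fi).
KEYWORDS = {"printf", "fork", "fi", "int", "void"}

# H1: a lowercase letter directly followed by an uppercase letter.
CAMEL = re.compile(r"[a-z][A-Z]")

# H3: characters that are common in code but rare inside English words.
SPECIAL = re.compile(r"[(){}\[\];=<>|/#]")


def clean(token):
    """Strip sentence punctuation and unbalanced parentheses from a token."""
    token = token.strip("\"'.,!?:")
    if token.startswith("(") and ")" not in token:
        token = token[1:]
    if token.endswith(")") and "(" not in token:
        token = token[:-1]
    return token.strip("\"'.,!?:")


def classify_token(raw):
    """Return 'text', 'technical', or 'misspelled' for one token."""
    token = clean(raw)
    if not token or token.lower() in KNOWN_WORDS:
        return "text"
    if CAMEL.search(token) or token in KEYWORDS or SPECIAL.search(token):
        return "technical"
    if not any(ch.isalpha() for ch in token):
        return "text"  # plain numbers such as "3.5"
    return "misspelled"


def analyze(text):
    """Group the cleaned tokens of a text by their label."""
    result = {"text": [], "technical": [], "misspelled": []}
    for raw in text.split():
        result[classify_token(raw)].append(clean(raw))
    return result


def is_code_line(line, threshold=0.5):
    """Hypothetical rule: a line is code if enough tokens are technical."""
    tokens = line.split()
    if not tokens:
        return False
    technical = sum(classify_token(t) == "technical" for t in tokens)
    return technical / len(tokens) >= threshold


if __name__ == "__main__":
    sample = "make taht behaviour optional in MenuManager (in the corect place)."
    print(analyze(sample))
    print(is_code_line("public void update(String property) {"))
```

In this sketch, identifiers such as MenuManager are rejected by the dictionary and then caught by H1, while "taht" and "corect" match no heuristic and are left as ordinary spelling mistakes.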

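
The precision and recall definitions on the "Precision / Recall" slide can be computed as in the minimal sketch below. The function name `precision_recall`, the use of line numbers as the unit of annotation, the example sets, and the convention of returning 0.0 for an empty denominator are all assumptions made for illustration; they are not taken from the talk.

```python
def precision_recall(annotated, predicted):
    """Compare human annotations with tool output.

    TP: annotated by us and by the tool.
    FP: annotated by the tool only.
    FN: annotated by us only.
    """
    annotated, predicted = set(annotated), set(predicted)
    tp = len(annotated & predicted)
    fp = len(predicted - annotated)
    fn = len(annotated - predicted)
    # Assumption: report 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical example: lines 6-11 were annotated as code by hand,
    # while a tool flagged lines 4, 6, 7, 8 and 9.
    human = {6, 7, 8, 9, 10, 11}
    tool = {4, 6, 7, 8, 9}
    p, r = precision_recall(human, tool)
    print(f"precision={p:.2%} recall={r:.2%}")
```

Here the tool gets four lines right, flags one line that was not annotated, and misses two annotated lines, so precision is 4/5 and recall is 4/6.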