
FaCoY – A Code-to-Code Search Engine


2018 ACM/IEEE 40th International Conference on Software Engineering (ICSE ’18), May 27-June 3, 2018, Gothenburg, Sweden

Published in: Software


  1. 1. FaCoY – A code-to-code search engine. Kisub Kim¹, Dongsun Kim¹, Tegawendé F. Bissyandé¹, Eunjong Choi², Li Li³, Jacques Klein¹, and Yves Le Traon¹. ¹SnT, University of Luxembourg, Luxembourg; ²Nara Institute of Science and Technology (NAIST), Japan; ³Faculty of Information Technology, Monash University, Australia. 01.06.2018. SerVal, Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg.
  2. 2. I want to find a loop in a singly linked list: function boolean hasLoop(Node startNode){ Node currentNode = startNode; while (currentNode = currentNode.next()); return false; }
  3. 3. I want to find a loop in a singly linked list: function boolean hasLoop(Node startNode){ Node currentNode = startNode; while (currentNode = currentNode.next()); return false; } Is this correct? Any better implementation? How can I improve my code?
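One possible answer to the questions on this slide, offered as a minimal sketch and not as something shown in the deck. Read as Java, the slide's snippet would not compile (`function` is not a Java keyword, and an assignment of type `Node` cannot serve as a `while` condition), and it returns `false` on every path. The sketch below uses Floyd's two-pointer cycle check instead. The `Node` class with a public `next` field is an assumption, since the slide only hints at a `next()` accessor.

```java
// Illustrative sketch: Floyd's "tortoise and hare" cycle check for a singly linked list.
final class LinkedListLoopCheck {

    // Assumed minimal node shape; the slide does not show the real Node class.
    static final class Node {
        Node next;
    }

    // Returns true if following next-pointers from startNode eventually revisits a node.
    static boolean hasLoop(Node startNode) {
        Node slow = startNode;
        Node fast = startNode;
        while (fast != null && fast.next != null) {
            slow = slow.next;        // advances one step
            fast = fast.next.next;   // advances two steps
            if (slow == fast) {
                return true;         // the two pointers can only meet inside a cycle
            }
        }
        return false;                // fast ran off the end, so the list is acyclic
    }

    public static void main(String[] args) {
        Node a = new Node();
        Node b = new Node();
        Node c = new Node();
        a.next = b;
        b.next = c;
        System.out.println(hasLoop(a)); // false: a -> b -> c -> null
        c.next = a;
        System.out.println(hasLoop(a)); // true: c points back to a
    }
}
```

A code-to-code search engine is meant to surface alternatives of this kind from a query such as the slide's snippet, without the user having to name the algorithm.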
  4. 4. Code-to-Code Search: given an input code fragment, find semantically similar code fragments.
  5. 5. Online Tools: searchcode.com, krugle.com
  6. 6. State-of-the-art. Static approaches: - Code clone detection techniques. - Mostly focus on textually, structurally or syntactically similar code. - Leverage various intermediate representations to compute code similarity: • Token-based [CCFinder @TSE02] • AST-based [Deckard @ICSE07] • Graph-based [GPLAG @KDD06]. Static approaches tend to miss fragments that have similar behaviour. Dynamic approaches: - Identify programs that produce similar results for the same inputs. - Generate random inputs, rely on symbolic or concolic execution, check abstract memory states. - Compare instruction-level execution traces [DyCLINK @FSE16]. Dynamic approaches do not scale to large repositories and are not always operational.
  7. 7. Key Challenge: finding semantically similar code rather than syntactically similar code. “converToHex” is similar to “encode”.
  8. 8. Intuition
  9. 9. Conceptual Steps. Code fragment; What functionality is implemented?; What are related implementations?; What are other representative tokens for the functionality?; Q&A posts (questions + code snippets); Which code fragments best match these tokens?; Target code base; Similar code fragments.
  14. 14. FaCoY: Find another Code other than Yours. http://code-search.uni.lu/facoy
  15. 15. How it works (Visually). Input code: function boolean hasLoop(Node startNode){ Node currentNode = startNode; while (currentNode = currentNode.next()); return false; }
  16. 16. How it works (Visually). Input code, with its index terms: used_classes:Node, used_classes:hasLoop, methods_called:next, class_instance_creation:currentNode
  17. 17. How it works (Visually). Input code → Syntactically similar code
  18. 18. How it works (Visually). Input code → Syntactically similar code → Natural language description of input code
  21. 21. Similar descriptions
  22. 22. Natural language description of input code → Similar descriptions → More code fragments for similar functionality
  24. 24. Alternate queries
     • typed_method_call:myLinkedList.insertFirstLink typed_method_call:CircularSet.add typed_method_call:System.exit extends:Runnable used_classes:hasLoop used_classes:Node used_classes:Integer used_classes:HashMap used_classes:BufferedReader used_classes:InputStreamReader used_classes:System class_instance_creation:fast class_instance_creation:nod class_instance_creation:head methods_called:next
     • typed_method_call:myLinkedList.insertFirstLink typed_method_call:System.exit extends:Linked used_classes:hasLoop used_classes:Node used_classes:Integer used_classes:HashMap used_classes:BufferedReader used_classes:InputStreamReader used_classes:System class_instance_creation:fast class_instance_creation:nod class_instance_creation:head methods_called:next methods_called:data
  25. 25. Alternate queries: the two queries of the previous slide, plus an alternate query with fewer terms:
     • typed_method_call:myLinkedList.insertFirstLink typed_method_call:System.exit extends:Linked used_classes:hasLoop used_classes:Node used_classes:HashMap used_classes:System class_instance_creation:fast class_instance_creation:nod class_instance_creation:head methods_called:next methods_called:data …
  26. 26. Alternate queries → Search results: more code fragments for similar functionality
  27. 27. Step 0: Dataset Indexing • Parse the code snippets from answer posts and the code files from project repositories into Abstract Syntax Trees (ASTs). • Preprocess the natural language text of the posts. • Store both as inverted indices. Data flow: Answer posts → Post Analyzer → Snippet Index (code snippets with metadata); Question posts → Post Analyzer → Question Index (full post information); Project repository code → Code Analyzer → Project Code Index.
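To make the inverted-index idea concrete, here is a minimal in-memory sketch. It assumes index terms are plain strings of the form `type:token` (as in `used_classes:Node` on the earlier slides) and ranks documents by how many query terms they contain. The class name, the method names, and the overlap-count ranking are all assumptions for illustration; the slides do not state how FaCoY scores matches.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory inverted index over "type:token" terms.
final class InvertedIndexSketch {

    // term -> ids of the documents (snippets or code files) that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Registers every term of one document under the given id.
    void add(int docId, Collection<String> terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns ids of documents sharing at least one term with the query,
    // ordered by number of matching terms (ties broken by id).
    List<Integer> search(Collection<String> queryTerms) {
        Map<Integer, Integer> matches = new HashMap<>();
        for (String term : new HashSet<>(queryTerms)) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.emptySet());
            for (int docId : docs) {
                matches.merge(docId, 1, Integer::sum);
            }
        }
        List<Integer> ranked = new ArrayList<>(matches.keySet());
        ranked.sort((x, y) -> {
            int byMatches = Integer.compare(matches.get(y), matches.get(x));
            return byMatches != 0 ? byMatches : Integer.compare(x, y);
        });
        return ranked;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, List.of("used_classes:Node", "methods_called:next"));
        index.add(2, List.of("used_classes:Node", "used_classes:HashMap"));
        System.out.println(index.search(List.of("used_classes:Node", "methods_called:next")));
    }
}
```

Looking a term up costs one map access regardless of corpus size, which is the property that makes this kind of index attractive for large repositories.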
  28. 28. Step 1: Query Structuring. User input (code fragment) → (1) Generating code query → Code query.
  29. 29. Step 2: Syntactic Search in Q&A Answers. Code query → (2) Searching for similar code snippets among Stack Overflow answer snippets → Answer snippets, each attached to a question.
  30. 30. Step 3: Collection of Descriptive Natural Language Terms. Answer snippets and their questions → (3) Searching for similar questions on Stack Overflow → Question posts.
  31. 31. Step 4: Query Alternation. Question posts and their answer snippets → (4) Generating alternate code queries → Code queries.
  32. 32. Step 5: Retrieving Results from the Codebase. Code queries → (5) Searching for code examples in the GitHub codebase → Search results.
  33. 33. Overview of the Approach. User input (code fragment) → (1) Generating code query → (2) Searching for similar code snippets (Snippet Index) → (3) Searching for similar questions (Question Index) → (4) Generating alternate code queries → (5) Searching for code examples (Code Index) → Search results.
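The five steps can be sketched as a small pipeline skeleton. Everything here is an assumption made for illustration: the `Stages` interface, its method names, and the use of plain strings for questions, snippets, and results are hypothetical, and the real stages (AST-based query structuring, index lookups) are left to whoever implements the interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical skeleton of the five-step flow; not FaCoY's actual API.
final class FacoyPipelineSketch {

    // One method per box in the overview diagram (names are assumptions).
    interface Stages {
        Set<String> structureQuery(String code);                        // steps 1 and 4: code -> index terms
        List<String> questionsWithSimilarSnippets(Set<String> query);   // step 2: ids of questions whose answers match
        List<String> similarQuestions(String questionId);               // step 3: ids of similarly worded questions
        List<String> answerSnippets(String questionId);                 // snippets found in a question's answers
        List<String> searchCodebase(Set<String> query);                 // step 5: matching code fragments
    }

    // Runs the steps in order and returns de-duplicated results in first-seen order.
    static List<String> search(Stages stages, String inputCode) {
        Set<String> codeQuery = stages.structureQuery(inputCode);                       // (1)
        Set<String> results = new LinkedHashSet<>();
        for (String question : stages.questionsWithSimilarSnippets(codeQuery)) {        // (2)
            for (String similar : stages.similarQuestions(question)) {                  // (3)
                for (String snippet : stages.answerSnippets(similar)) {
                    Set<String> alternateQuery = stages.structureQuery(snippet);        // (4)
                    results.addAll(stages.searchCodebase(alternateQuery));              // (5)
                }
            }
        }
        return new ArrayList<>(results);
    }
}
```

The design point this skeleton tries to capture is that the codebase is never searched with natural language: question text is used only to hop from one set of snippets to another, and the final lookup is again code against code.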
  34. 34. Evaluation
  35. 35. Front-end: http://code-search.uni.lu/facoy
  36. 36. Recommendation Results
  38. 38. Research Questions. RQ1: How relevant are the code examples found by FaCoY compared to those of other code-to-code search engines? RQ2: How effective is FaCoY at finding semantic clones, based on a code clone benchmark? RQ3: Do the semantically similar code fragments yielded by FaCoY exhibit similar runtime behavior? RQ4: Could FaCoY recommend correct code as an alternative to buggy code?
  39. 39. Code Clone Type Definitions
     • Type-1: identical code fragments, except for white-space, layout, and comments.
     • Type-2: identical code fragments, except for identifier names and literal values, in addition to Type-1 differences.
     • Type-3: syntactically similar code fragments in which statements are added, modified and/or removed with respect to each other, in addition to Type-1 and Type-2 differences.
     • Type-4: syntactically dissimilar code fragments that implement the same functionality (semantic clones).
     References:
     • Stefan Bellon, Rainer Koschke, Giuliano Antoniol, Jens Krinke, and Ettore Merlo. 2007. Comparison and evaluation of clone detection tools. IEEE Transactions on Software Engineering 33, 9 (2007), 577-591.
     • Chanchal K. Roy, James R. Cordy, and Rainer Koschke. 2009. Comparison and evaluation of code clone detection techniques and tools: A qualitative approach. Science of Computer Programming 74, 7 (2009), 470-495.
     • Fang-Hsiang Su, Jonathan Bell, Kenneth Harvey, Simha Sethumadhavan, Gail Kaiser, and Tony Jebara. 2016. Code Relatives: Detecting Similarly Behaving Software. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016). ACM, 702-714.
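The four definitions are easier to tell apart on a concrete case. The sketch below is an invented example, not taken from the slides or from any benchmark: one reference method and one variant per clone type, all summing an `int` array.

```java
import java.util.stream.IntStream;

// Invented illustration of the four clone types relative to sum().
final class CloneTypeExamples {

    // Reference fragment.
    static int sum(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    // Type-1: identical to sum() apart from layout and comments.
    static int sumType1(int[] values) {
        int total = 0; // running total
        for (int i = 0; i < values.length; i++) { total += values[i]; }
        return total;
    }

    // Type-2: same structure, identifiers renamed.
    static int sumType2(int[] data) {
        int acc = 0;
        for (int k = 0; k < data.length; k++) {
            acc += data[k];
        }
        return acc;
    }

    // Type-3: a statement added (early return) and one modified (explicit addition).
    static int sumType3(int[] data) {
        if (data.length == 0) {
            return 0;
        }
        int acc = 0;
        for (int k = 0; k < data.length; k++) {
            acc = acc + data[k];
        }
        return acc;
    }

    // Type-4: no shared syntax with sum(), same functionality.
    static int sumType4(int[] values) {
        return IntStream.of(values).sum();
    }

    public static void main(String[] args) {
        int[] sample = {1, 2, 3};
        System.out.println(sum(sample) == sumType4(sample)); // true
    }
}
```

A token-based or AST-based comparison would match the first three variants against `sum()` with decreasing ease, while `sumType4` shares no tokens or tree structure with it and is the kind of pair a semantic search has to recover.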
  40. 40. RQ 1. Comparison with Code-to-Code Search Engines • Query the tools with the top 10 Stack Overflow snippets • Manual checking • Does FaCoY find similar code fragments? • Are they syntactically or semantically similar?
  41. 41. RQ 2. Benchmark Assessment • IJaDataset 2.0: 25,000 projects (> 3 M files) • BigCloneBench annotates 8,345,104 clone pairs across 43 functionalities • Can FaCoY find more semantic clones (MT3 or T4) than the state-of-the-art? Configurations shown: no initial user query; no query alternation; no query alternation and no query structuring.
  42. 42. Double Checking FaCoY’s False Positives. Clone types sampled: T1, T2, VST3, ST3, MT3, WT3/T4, with 10 samples each (60 in total). Clones that BigCloneBench failed to include (per-type counts as listed on the slide): 4, 1, 2, 25 (32 in total). • For the remaining 28 samples, FaCoY points to the correct file in 26 cases but fails to locate the exact fragment within the file. • In only 2 cases does FaCoY fail completely. FaCoY can detect clones that BigCloneBench missed.
  43. 43. FaCoY’s Limitation. Functionalities requiring external APIs and libraries: F18: Play Sound; F19: Take Screenshot to File; F21: XMPP Send Message. Pure computation tasks: F7: Bubble Sort Array; F14: Binary Search; F41: Transpose a Matrix. FaCoY performs much better on code that requires external APIs and libraries.
  44. 44. RQ 3. Validating Semantic Clones • Use DyCLINK, a dynamic approach that computes the similarity of execution traces to detect code relatives. • Index the DyCLINK benchmark: 411 pairs of code relatives from Google Code Jam (2011: Irregular Cake; 2012: Perfect Game; 2013: Cheaters; 2014: Magical Tour). • Results: FaCoY’s hit ratio is 68% (278 out of 411 code fragments); FaCoY’s Mean Reciprocal Rank is 0.18 (relevant fragments tend to be retrieved at lower ranks).
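The two reported metrics can be computed as follows. This is a sketch under stated assumptions: each query is summarised by the 1-based rank of its first relevant result, with `0` meaning nothing relevant was retrieved, and misses contribute zero to the reciprocal-rank sum. The slide does not say whether misses were included in the 0.18 average, so that convention is an assumption.

```java
// Sketch of hit ratio and Mean Reciprocal Rank over per-query ranks.
final class RankingMetricsSketch {

    // Fraction of queries for which a relevant result was retrieved at any rank.
    static double hitRatio(int[] firstRelevantRanks) {
        if (firstRelevantRanks.length == 0) {
            return 0.0;
        }
        int hits = 0;
        for (int rank : firstRelevantRanks) {
            if (rank > 0) {
                hits++;
            }
        }
        return (double) hits / firstRelevantRanks.length;
    }

    // Mean of 1/rank over all queries; a miss (rank 0) adds nothing to the sum.
    static double meanReciprocalRank(int[] firstRelevantRanks) {
        if (firstRelevantRanks.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (int rank : firstRelevantRanks) {
            if (rank > 0) {
                sum += 1.0 / rank;
            }
        }
        return sum / firstRelevantRanks.length;
    }

    public static void main(String[] args) {
        int[] ranks = {1, 2, 0, 4};                      // third query found nothing relevant
        System.out.println(hitRatio(ranks));             // 0.75
        System.out.println(meanReciprocalRank(ranks));   // 0.4375
    }
}
```

Applied to the slide's numbers, 278 hits out of 411 gives 278 / 411 ≈ 0.676, which rounds to the reported 68%. For scale, a single query whose first relevant result sits between ranks 5 and 6 would score a reciprocal rank close to 0.18.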
  45. 45. RQ 4. Recommending code for patches. Buggy code • Bug information: Project: Apache commons-lang; File path: projects/Lang/14/org/apache/commons/lang3/StringUtils.java. As maintainers, can we quickly find alternative implementations?
  46. 46. RQ 4. Recommending code for patches. Buggy code. Additionally, we consider more cases: 1. Are cs1 and cs2 null? 2. Do cs1 and cs2 have the same values? 3. Are cs1 and cs2 the same object? 4. Are cs1 and cs2 String objects? 5. If they are not pure String objects, check equality character by character.
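The five checks listed above can be sketched as one method. This is an illustration written from the checklist, not the actual Apache commons-lang patch and not a FaCoY search result. The class and method names are hypothetical, and the parameter names `cs1` and `cs2` follow the slide. The character-by-character fallback matters because the `CharSequence` interface does not define what `equals()` means across different implementations.

```java
// Hypothetical sketch of a null-safe, implementation-independent CharSequence comparison.
final class CharSequenceEqualsSketch {

    static boolean sequenceEquals(CharSequence cs1, CharSequence cs2) {
        if (cs1 == cs2) {
            return true;                         // same object, or both null
        }
        if (cs1 == null || cs2 == null) {
            return false;                        // exactly one of them is null
        }
        if (cs1 instanceof String && cs2 instanceof String) {
            return cs1.equals(cs2);              // String.equals is well defined
        }
        if (cs1.length() != cs2.length()) {
            return false;
        }
        for (int i = 0; i < cs1.length(); i++) { // mixed types: compare contents directly
            if (cs1.charAt(i) != cs2.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sequenceEquals("abc", new StringBuilder("abc"))); // true
        System.out.println("abc".equals(new StringBuilder("abc")));          // false
    }
}
```

The second line printed by `main` shows the failure mode the checklist guards against: calling `equals()` on a `String` with a non-`String` argument reports inequality even when the characters match.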
  47. 47. RQ 4. Recommending code for patches. Buggy code → Fix • Commit: cf7211f9 by Matthew Jason Benson, 01/23/2012 06:47 PM • parent: c8afaa3e • git-svn-id: https://svn.apache.org/repos/asf/commons/proper/lang/trunk@1234915 • Log: [LANG-786] StringUtils equals() relies on undefined behavior; thanks to Daniel Trebbien
  48. 48. RQ 4. Recommending code for patches. Buggy code → Fix • 395 bugs in the Defects4J repair benchmark [@ISSTA14] • 21 FaCoY recommendations are correct (manual assessment)
  49. 49. Summary
  50. 50. Questions: kisub.kim@uni.lu. GitHub page: https://github.com/facoy/facoy. Web code search engine: http://code-search.uni.lu/facoy
