Plagiarism introduction

  1. Guide: Ms Sangeetha Jamal, Dept of Computer Science
     Presented by: Merin Paul, MTech CS-IS S1
     9/25/2012
  2. Contents
     • Introduction
     • Types of Source-Code Plagiarism
         - Textual Similarity
         - Functional Similarity
     • Source Code Detection Algorithms
     • Detecting Techniques
     • Tools Used for Code-Based Plagiarism Detection
     • Conclusion
  3. Introduction
     • Plagiarism in source-code files occurs when source code is copied and edited without proper acknowledgment of the original author.
     • Plagiarism techniques fall into two groups: lexical changes and structural changes.
     • Lexical changes: changes that can be made to the source code without affecting how the program is parsed, such as editing comments, layout, or identifier names.
  4. Introduction
     • Structural changes: changes that affect how the code is parsed, such as reordering statements or replacing one loop form with another. They usually require debugging the modified program.
     • Reasons for code copying:
         - Code reuse
         - Programmer limitations
         - Coincidentally implementing the same logic
  5. Types of Source-Code Plagiarism
     • Textual similarity
     • Functional similarity
  6. Textual Similarity
     • Two source-code files look similar based on their textual content.
     • Textual content means the words, letters, variable names, etc.
     • There are three types: Type I, Type II, and Type III.
  7. Type I
     • The copied code fragment is identical to the original except for changes in whitespace, comments, and line layout.
     • Original fragment:

         int a; // counter
         // count five times
         for(a = 0; a < 5; a++) {
             printf("a = %d", a); // print value of a
         }
         return 0;
  8. Type I
     • Copied fragment:

         int a;
         /* Loop: increment a and print its value */
         for(a = 0; a < 5; a++){ printf("a = %d", a); }
         return 0;
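The two Type I fragments differ only in comments and layout, so a detector can catch them by removing those before comparing. Below is a minimal C sketch that assumes only C-style comments and double-quoted string literals. `strip_layout` is a hypothetical helper, not part of any tool named in these slides. It drops every whitespace character outside string literals, so keywords get glued to names (`int a` becomes `inta`). That is harmless as long as both sides are normalized the same way.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: copy src into out, dropping comments and all
   whitespace outside string literals. out needs strlen(src) + 1 bytes. */
void strip_layout(const char *src, char *out)
{
    size_t o = 0;
    while (*src) {
        if (src[0] == '/' && src[1] == '/') {          /* line comment */
            while (*src && *src != '\n')
                src++;
        } else if (src[0] == '/' && src[1] == '*') {   /* block comment */
            src += 2;
            while (*src && !(src[0] == '*' && src[1] == '/'))
                src++;
            if (*src)
                src += 2;                              /* skip closing */
        } else if (*src == '"') {                      /* keep literals verbatim */
            out[o++] = *src++;
            while (*src && *src != '"') {
                if (*src == '\\' && src[1])
                    out[o++] = *src++;                 /* escape prefix */
                out[o++] = *src++;
            }
            if (*src)
                out[o++] = *src++;                     /* closing quote */
        } else if (isspace((unsigned char)*src)) {
            src++;                                     /* drop layout */
        } else {
            out[o++] = *src++;
        }
    }
    out[o] = '\0';
}
```

Normalizing both Type I fragments with `strip_layout` and comparing the results with `strcmp` reports them as equal. A real detector would compare tokens rather than glued text, which avoids accidental collisions such as `int a` versus `inta`.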
  9. Type II
     • Same as Type I, but user-defined identifiers (variable names, function names, etc.) and literal values are also changed.
     • Original fragment:

         if(a > b)
         {
             a = a - 1;
             b = b * a; // comment 1
         }
         else
         {
             b = a; // comment 2
             a = 0;
         }
  10. Type II
      • Copied fragment (identifiers renamed, literal 1 changed to 5, comments and layout edited):

          if(m > n)
          {m=m - 5;
           n=n*m; //my comment 1
          }
          else
          {n=m; //my comment 2
           m=0;
          }
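Renaming identifiers and changing literals defeats a layout-only comparison. A Type II check therefore usually replaces each user-defined name and number with a placeholder. This is a minimal C sketch that assumes a small hard-coded keyword list; a real tool would need the language's full list. `normalize_ids` and the placeholders `ID` and `NUM` are illustrative choices.

```c
#include <ctype.h>
#include <string.h>

/* Reserved words kept as-is; a real tool would list the whole language. */
static const char *const KEYWORDS[] = {"if", "else", "for", "while", "int", "return", NULL};

static int is_keyword(const char *w, size_t n)
{
    for (int i = 0; KEYWORDS[i]; i++)
        if (strlen(KEYWORDS[i]) == n && strncmp(KEYWORDS[i], w, n) == 0)
            return 1;
    return 0;
}

/* Hypothetical helper: rewrite C-like code as space-separated tokens,
   replacing user-defined identifiers with ID and numbers with NUM and
   dropping comments. out needs 4 * strlen(s) + 1 bytes. */
void normalize_ids(const char *s, char *out)
{
    size_t o = 0;
    while (*s) {
        if (s[0] == '/' && s[1] == '/') {              /* line comment */
            while (*s && *s != '\n') s++;
            continue;
        }
        if (s[0] == '/' && s[1] == '*') {              /* block comment */
            s += 2;
            while (*s && !(s[0] == '*' && s[1] == '/')) s++;
            if (*s) s += 2;
            continue;
        }
        if (isspace((unsigned char)*s)) { s++; continue; }

        const char *start = s, *repl = NULL;
        if (isalpha((unsigned char)*s) || *s == '_') {
            while (isalnum((unsigned char)*s) || *s == '_') s++;
            if (!is_keyword(start, (size_t)(s - start))) repl = "ID";
        } else if (isdigit((unsigned char)*s)) {
            while (isalnum((unsigned char)*s) || *s == '.') s++;
            repl = "NUM";
        } else {
            s++;                                       /* one-char operator */
        }
        if (o > 0) out[o++] = ' ';
        if (repl) {
            memcpy(out + o, repl, strlen(repl));
            o += strlen(repl);
        } else {
            memcpy(out + o, start, (size_t)(s - start));
            o += (size_t)(s - start);
        }
    }
    out[o] = '\0';
}
```

Both Type II fragments normalize to the same string: `if ( ID > ID ) { ID = ID - NUM ; ID = ID * ID ; } else { ID = ID ; ID = NUM ; }`.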
  11. Type III
      • The copied code fragment is further modified by inserting or removing unnecessary statements.
      • Original fragment:

          if(a > b)
          {
              a = a - 1;
              b = b * a;
          }
          else
          {
              b = a;
              a = 0;
          }
  12. Type III
      • Copied fragment with an inserted statement:

          if(a > b)
          {
              a = a - 1;
              c = 0; // this statement is added
              b = b * a;
          }
          else
          {
              b = a;
              a = 0;
          }
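An inserted statement breaks exact matching, but most of the statement sequence survives. One way to score this is the longest common subsequence (LCS) of the two statement lists, scaled to 2*LCS/(n+m). The sketch assumes statements have already been normalized into strings. `lcs_similarity` is a hypothetical helper, and LCS is a general sequence-alignment technique, not a method taken from the slides.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: similarity in [0, 1] between two statement
   sequences, computed as 2 * LCS / (n + m) with a dynamic-programming
   table. Returns -1.0 if the table cannot be allocated. */
double lcs_similarity(const char *const *a, int n, const char *const *b, int m)
{
    if (n + m == 0) return 1.0;
    int *dp = calloc((size_t)(n + 1) * (m + 1), sizeof *dp);
    if (!dp) return -1.0;
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++) {
            int *cell = &dp[i * (m + 1) + j];
            if (strcmp(a[i - 1], b[j - 1]) == 0) {
                *cell = dp[(i - 1) * (m + 1) + j - 1] + 1;   /* extend match */
            } else {
                int up = dp[(i - 1) * (m + 1) + j];
                int left = dp[i * (m + 1) + j - 1];
                *cell = up > left ? up : left;               /* skip one side */
            }
        }
    int lcs = dp[n * (m + 1) + m];
    free(dp);
    return 2.0 * lcs / (n + m);
}
```

The Type III pair splits into seven original statements and eight copied ones, including `c=0;`. Their LCS has length 7, so the similarity is 14/15, about 0.93.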
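  13. Functional Similarity
      • Functional similarity refers to code fragments that have the same semantics or functionality even though their text differs. Both fragments below compute a factorial: fragment 1 iteratively (j ends up holding VALUE!), and fragment 2 recursively.
      • Fragment 1:

          int i, j = 1;
          for(i = 1; i <= VALUE; i++)
              j = j * i;

      • Fragment 2:

          int factorial(int n)
          {
              if(n == 0) return 1;
              else return factorial(n - 1) * n;
          }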
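Functional similarity cannot be detected by comparing text, and deciding whether two programs are equivalent is undecidable in general. A cheap heuristic is to run both fragments on the same inputs and compare the outputs. This can expose differences but never proves equivalence. The sketch wraps fragment 1 in a hypothetical function `fact_iter`, turning `VALUE` into a parameter, and compares it with fragment 2. `agree_on_range` is likewise an illustrative name.

```c
/* Fragment 1 wrapped in a hypothetical function; VALUE becomes a parameter. */
long fact_iter(int value)
{
    long j = 1;
    for (int i = 1; i <= value; i++)
        j = j * i;
    return j;
}

/* Fragment 2, with the return type widened to long (expects n >= 0). */
long factorial(int n)
{
    if (n == 0) return 1;
    else return factorial(n - 1) * n;
}

/* Return 1 if f and g give the same result for every input in [lo, hi],
   0 as soon as one input disagrees. Agreement is evidence, not proof. */
int agree_on_range(long (*f)(int), long (*g)(int), int lo, int hi)
{
    for (int x = lo; x <= hi; x++)
        if (f(x) != g(x))
            return 0;
    return 1;
}
```

`agree_on_range(fact_iter, factorial, 0, 12)` returns 1. The range stops at 12 because 13! no longer fits in a 32-bit `long`. It starts at 0 because fragment 2 never terminates for negative input.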
  14. Source Code Detection Algorithms
      • Text-based
      • Token-based
      • Parse-tree-based
      • PDG-based
      • Metrics-based
      • Hybrid approaches
  15. Contd.
      • Text-based: find textual matches between two source files. Simple and fast.
      • Token-based: use a lexer to convert each program into tokens, then find matches in the token sequences. More robust to simple text replacements, such as renamed identifiers.
  16. Contd.
      • Parse trees: build and compare parse trees. A parse tree contains the complete syntactic information about the source code, and tree comparison can normalize conditional statements.
      • Program Dependency Graphs (PDGs): capture the control and data dependencies in a program. They allow higher-level equivalences to be located but are more complex to build and compare.
  17. Contd.
      • Metrics: capture scores of code segments according to certain criteria. Metrics are simple to calculate but can lead to false positives.
      • Hybrid: a combination of two or more of the previous techniques.
  18. Detecting Techniques: Detection via Lexical Similarities
      • Lexical analysis takes source code and converts it into a stream of lexical tokens.
      • The source code undergoes a series of transformations.
      • Identifying reserved words, identifiers, and numbers is beneficial for plagiarism detection.
  19. Contd.
      • Fragment A:

          int[] A = {1,2,3,4};
          for(int i = 0; i < A.length; i++) {
              A[i] = A[i] + 1;
          }

      • Fragment B:

          int[] B = {1, 2, 3, 4};
          for(int j = 0; j < B.length; j++) {
              B[j] = B[j] + 1;
          }
  20. Contd.
      • Token stream shared by both fragments (loop body abbreviated on the slide):

          LITERAL_int LBRACK RBRACK IDENT ASSIGN LCURLY NUM_INT COMMA NUM_INT
          COMMA NUM_INT COMMA NUM_INT RCURLY SEMI LITERAL_for LPAREN LITERAL_int
          IDENT ASSIGN NUM_INT SEMI IDENT LT IDENT DOT IDENT SEMI IDENT INC
          RPAREN LCURLY NUM_INT SEMI RCURLY
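The mapping from source text to token classes can be sketched as a tiny hand-written lexer. This C sketch handles only the lexemes that occur in fragments A and B. The token names follow the slide, and `tokenize` is a hypothetical function, not the lexer of any particular tool.

```c
#include <ctype.h>
#include <string.h>

/* Append one token-class name to out, space-separated. */
static void emit(char *out, const char *name)
{
    if (out[0] != '\0') strcat(out, " ");
    strcat(out, name);
}

/* Hypothetical mini-lexer: maps each lexeme to a token-class name in the
   style of the slide. Only the lexemes needed here are handled; anything
   else becomes UNKNOWN. out needs 8 * strlen(s) + 1 bytes. */
void tokenize(const char *s, char *out)
{
    out[0] = '\0';
    while (*s) {
        if (isspace((unsigned char)*s)) {
            s++;
        } else if (isalpha((unsigned char)*s) || *s == '_') {
            const char *start = s;
            while (isalnum((unsigned char)*s) || *s == '_') s++;
            size_t n = (size_t)(s - start);
            if (n == 3 && strncmp(start, "int", 3) == 0)      emit(out, "LITERAL_int");
            else if (n == 3 && strncmp(start, "for", 3) == 0) emit(out, "LITERAL_for");
            else                                              emit(out, "IDENT");
        } else if (isdigit((unsigned char)*s)) {
            while (isdigit((unsigned char)*s)) s++;
            emit(out, "NUM_INT");
        } else if (s[0] == '+' && s[1] == '+') {
            s += 2;
            emit(out, "INC");
        } else {
            const char *name;
            switch (*s) {
            case '[': name = "LBRACK"; break;
            case ']': name = "RBRACK"; break;
            case '{': name = "LCURLY"; break;
            case '}': name = "RCURLY"; break;
            case '(': name = "LPAREN"; break;
            case ')': name = "RPAREN"; break;
            case '=': name = "ASSIGN"; break;
            case ',': name = "COMMA";  break;
            case ';': name = "SEMI";   break;
            case '<': name = "LT";     break;
            case '.': name = "DOT";    break;
            case '+': name = "PLUS";   break;
            default:  name = "UNKNOWN"; break;
            }
            emit(out, name);
            s++;
        }
    }
}
```

Fragments A and B produce identical token strings. The lexer discards the identifier names (`A`/`B`, `i`/`j`) and the spacing inside the array initializer, so nothing distinguishes them.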
  21. Detection via Parse Tree Similarities
  22. Detection via Metrics
      • Calculate and compare attribute counts.
      • Programs with similar attribute counts are potentially similar programs.
      • Counts of operators and operands are typically used to construct the attribute counts.
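Below is a minimal sketch of attribute counting. It assumes comment-free input and uses a simplified operator/operand split in the spirit of Halstead's measures: keywords and operator symbols count as operators, and identifiers and numbers count as operands. `count_attributes` and `similar_counts` are hypothetical names, and the tolerance is a free parameter.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical attribute counts in the spirit of Halstead's metrics. */
struct counts {
    int operators;   /* keywords plus operator symbols */
    int operands;    /* identifiers and numeric literals */
};

static int is_reserved(const char *w, size_t n)
{
    static const char *const kw[] = {"if", "else", "for", "while", "return", "int", NULL};
    for (int i = 0; kw[i]; i++)
        if (strlen(kw[i]) == n && strncmp(kw[i], w, n) == 0)
            return 1;
    return 0;
}

/* Count operators and operands in comment-free C-like code. Each symbol
   character is counted separately, so "<=" counts as two operators. */
struct counts count_attributes(const char *s)
{
    struct counts c = {0, 0};
    while (*s) {
        if (isalpha((unsigned char)*s) || *s == '_') {
            const char *start = s;
            while (isalnum((unsigned char)*s) || *s == '_') s++;
            if (is_reserved(start, (size_t)(s - start))) c.operators++;
            else c.operands++;
        } else if (isdigit((unsigned char)*s)) {
            while (isalnum((unsigned char)*s) || *s == '.') s++;
            c.operands++;
        } else {
            if (strchr("+-*/%=<>!&|", *s) != NULL) c.operators++;
            s++;
        }
    }
    return c;
}

/* Flag two programs as potentially similar when both counts differ by
   at most tol. */
int similar_counts(struct counts a, struct counts b, int tol)
{
    return abs(a.operators - b.operators) <= tol
        && abs(a.operands - b.operands) <= tol;
}
```

Both Type II fragments give 9 operators and 12 operands. So does the unrelated `if(a < b) { a = b + 1; b = a / b; } else { a = b; b = 1; }`. This illustrates the false positives of metrics-based detection.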
  23. Tools Used for Code-Based Plagiarism Detection: JPlag
      • Finds similarities among multiple sets of source-code files.
      • JPlag operates in two phases:
          - First phase: all programs to be compared are parsed and converted into token strings.
          - Second phase: the token strings are compared in pairs to determine the similarity of each pair.
      • Because it compares token strings rather than raw text, it is more robust to layout and identifier changes.
      • It supports Java, C#, C, C++, and natural-language text.
  24. 24. CONTD..MOSS (Measure Of Software Similarity) Measure Of Software Similarity was developed in 1994 by Alex Aiken. It analyzes code written in languages like C, C++, Python, Visual Basic, Javascript, FORTRAN, Lisp, Ada etc. Provided as an internet service and given a list of source files.9/25/2012 24
  25. Contd.: YAP (Yet Another Plague)
      • Token-based system.
      • YAP works in two phases:
          - The first phase generates a token file for each submission.
          - The second phase compares pairs of token files using the Running-Karp-Rabin Greedy-String-Tiling (RKR-GST) token-matching algorithm.
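The tiling step can be sketched as plain Greedy String Tiling over integer-coded token sequences. This C sketch omits the Running Karp-Rabin hashing that RKR-GST adds to speed up the search, so its worst-case running time is roughly cubic. `gst_coverage` is a hypothetical name, and the minimum match length `mml` is a tunable parameter. JPlag's comparison phase is also built on Greedy String Tiling.

```c
#include <stdlib.h>

struct match { int p, t, len; };

/* Plain Greedy String Tiling over token sequences a and b (token classes
   encoded as ints). Repeatedly finds the longest common runs of unmarked
   tokens, at least mml (>= 1) long, and marks them as tiles. Returns the
   number of tokens of a covered by tiles, or -1 on allocation failure. */
int gst_coverage(const int *a, int n, const int *b, int m, int mml)
{
    char *ma = calloc((size_t)n + 1, 1);       /* marked tokens in a */
    char *mb = calloc((size_t)m + 1, 1);       /* marked tokens in b */
    struct match *found = malloc(((size_t)n * (size_t)m + 1) * sizeof *found);
    int covered = 0, maxmatch;

    if (!ma || !mb || !found) {
        free(ma); free(mb); free(found);
        return -1;
    }
    do {
        int nfound = 0;
        maxmatch = mml;
        for (int p = 0; p < n; p++) {
            if (ma[p]) continue;
            for (int t = 0; t < m; t++) {
                if (mb[t]) continue;
                int j = 0;
                while (p + j < n && t + j < m && a[p + j] == b[t + j]
                       && !ma[p + j] && !mb[t + j])
                    j++;
                if (j > maxmatch) {            /* new longest run: restart list */
                    nfound = 0;
                    maxmatch = j;
                }
                if (j == maxmatch)
                    found[nfound++] = (struct match){p, t, j};
            }
        }
        for (int k = 0; k < nfound; k++) {     /* tile runs not occluded */
            int occluded = 0;
            for (int i = 0; i < found[k].len && !occluded; i++)
                occluded = ma[found[k].p + i] || mb[found[k].t + i];
            if (occluded) continue;
            for (int i = 0; i < found[k].len; i++) {
                ma[found[k].p + i] = 1;
                mb[found[k].t + i] = 1;
            }
            covered += found[k].len;
        }
    } while (maxmatch > mml);

    free(ma); free(mb); free(found);
    return covered;
}
```

A common way to turn coverage into a score is 2 * covered / (n + m). Take `a = {1,2,3,4,5,6,7,8}`, `b = {9,1,2,3,4,0,5,6,7,8}` and `mml = 3`. Both four-token runs become tiles, so 8 tokens are covered and the score is 16/18. The inserted `0` behaves like the extra statement in the Type III example.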
  26. Conclusion
      • Plagiarism in programming assignments is an inevitable issue for most academics who teach programming.
      • Plagiarism detection systems support only a few languages.
      • Most detection software checks submissions against a repository maintained within an organization.
      • As the number of digital copies grows, the repository grows with it, and plagiarism detection software must be able to handle that size.
  27. Conclusion
      • Most popular plagiarism detection algorithms convert programs into token-string representations and apply string matching to them.
      • The token strings of each document are compared pairwise to find similar source-code segments between files.
      • String-matching systems are language-dependent: they support only the programming languages their parsers can handle.
  30. Thank you!