Guide: Ms Sangeetha Jamal, Dept of Computer Science
Presented by: Merin Paul, MTech CS-IS S1


9/25/2012
Contents
  Introduction
  Types of Source-code Plagiarism
            Textual Similarity
            Functional Similarity
  Source Code Detection Algorithms
  Detecting Techniques
  Tools used for code-based plagiarism detection
  Conclusion


Introduction
 Plagiarism in source-code files occurs when source code
     is copied and edited without proper acknowledgment of
     the original author.

 Copied code is usually disguised with two kinds of changes:
     lexical and structural.

 Lexical changes: edits that do not affect how the program
     is parsed, such as rewording comments or renaming
     identifiers.


Introduction
 Structural changes: edits that change how the code is
     parsed, so the copier has to understand and debug the
     program.

 Reasons for code copying:
            Code reuse
            Programmer limitations
            Coincidental use of the same logic


TYPES OF SOURCE CODE PLAGIARISM
  Textual Similarity


  Functional Similarity




Textual Similarity
  Two source files are textually similar when their
     textual content looks alike.

  Textual content means the words, characters, variable
     names, etc.

  Three types: Type I, Type II, Type III.




Type I
  The copied code fragment is identical to the original
     except for changes to whitespace, comments and
     line layout.
        int a; // counter
        // count five times
        for(a = 0; a < 5; a++)
        {
            printf("a = %d", a); // print value of a
        }
        return 0;

Type I
 int a;
 /* Loop incrementing a and printing its value */
 for(a = 0; a < 5; a++){
 printf("a = %d", a);
 }
 return 0;
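
A minimal sketch of how a Type I copy could be caught, assuming it is enough to delete C-style comments and all whitespace before comparing. The function name `normalize_type1` is hypothetical, and the regular expressions do not handle comment markers inside string literals.

```python
import re


def normalize_type1(source):
    """Strip C-style comments and all whitespace so layout changes vanish."""
    # Remove /* ... */ block comments, including multi-line ones.
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    # Remove // comments up to the end of the line.
    source = re.sub(r"//[^\n]*", "", source)
    # Drop every remaining whitespace character.
    return re.sub(r"\s+", "", source)


original = '''
int a; // counter
// count five times
for(a = 0; a < 5; a++)
{
    printf("a = %d", a); // print value of a
}
return 0;
'''

copy = '''
int a;
/* Loop incrementing a and printing its value */
for(a = 0; a < 5; a++){
printf("a = %d", a);
}
return 0;
'''

# The two fragments differ only in comments and layout.
print(normalize_type1(original) == normalize_type1(copy))
```

After normalization the two fragments are the same string, so a plain equality test is sufficient for this type.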




Type II
  Same as Type I, but with additional changes to variable
     names, function names and other user-defined identifiers.

      if(a > b)
      {
              a = a - 1;
              b = b * a; // comment 1
      }
      else
      {
              b = a; // comment 2
              a = 0;
      }
Type II
 if(m > n)
 {m=m - 5;
 n=n*m; //my comment 1
 }
 else
 {n=m; //my comment 2
 m=0;
 }
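
A rough sketch of Type II detection, assuming that every user-defined identifier can be replaced by one placeholder and every number by another before comparing. The keyword set, the `ID` and `NUM` placeholders and the function name are all assumptions made for illustration. Numbers are normalized too because the copied fragment above also changes a constant.

```python
import re

# Small, illustrative subset of C keywords (an assumption, not a full list).
C_KEYWORDS = {"if", "else", "for", "while", "return", "int"}

# Identifier, integer literal, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]")


def normalize_type2(source):
    """Return a token list with identifiers and numbers replaced by placeholders."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", "", source)
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok[0].isalpha() or tok[0] == "_":
            tokens.append(tok if tok in C_KEYWORDS else "ID")
        elif tok[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


original = '''
if(a > b)
{
    a = a - 1;
    b = b * a; // comment 1
}
else
{
    b = a; // comment 2
    a = 0;
}
'''

copy = '''
if(m > n)
{m=m - 5;
n=n*m; //my comment 1
}
else
{n=m; //my comment 2
m=0;
}
'''

print(normalize_type2(original) == normalize_type2(copy))
```

Mapping every identifier to the same placeholder is deliberately coarse: it ignores whether the renaming is consistent, which makes the check simple but more prone to false matches.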


Type III
  The copied code fragment is further modified by inserting
   or removing statements.
            if(a > b)
            {
                a = a - 1;
                b = b * a;
            }
            else
            {
                b = a;
                a = 0;
            }
Type III
 if(a > b)
 {
     a = a - 1;
     c = 0; // this statement is added
     b = b * a;
 }
 else
 {
     b = a;
     a = 0;
 }
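
Because a Type III copy is no longer identical, an exact comparison fails and a similarity score is needed instead. The sketch below assumes a line-by-line comparison with `difflib.SequenceMatcher` from the Python standard library; the function name `line_similarity` and the choice of lines as the unit of comparison are illustrative, not something the slides prescribe.

```python
from difflib import SequenceMatcher


def line_similarity(a, b):
    """Ratio in [0, 1] of matching non-blank lines between two fragments."""
    lines_a = [ln.strip() for ln in a.splitlines() if ln.strip()]
    lines_b = [ln.strip() for ln in b.splitlines() if ln.strip()]
    return SequenceMatcher(None, lines_a, lines_b).ratio()


original = '''
if(a > b)
{
    a = a - 1;
    b = b * a;
}
else
{
    b = a;
    a = 0;
}
'''

modified = '''
if(a > b)
{
    a = a - 1;
    c = 0; // this statement is added
    b = b * a;
}
else
{
    b = a;
    a = 0;
}
'''

score = line_similarity(original, modified)
# One inserted line keeps the score high but below 1.0.
print(round(score, 3))
```

A checker would flag pairs whose score exceeds some threshold; where to set that threshold is a tuning decision this sketch does not address.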
Functional similarity
  It refers to code fragments that have the same semantics or
  functionality, even when their text differs.

fragment 1:
int i, j = 1;
for(i = 1; i <= VALUE; i++)
    j = j * i;

fragment 2:
int factorial(int n)
{
    if(n == 0) return 1;
    else return factorial(n - 1) * n;
}
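
One inexpensive way to gather evidence of functional similarity is to run both fragments on the same inputs and compare the results. The sketch below is a Python transcription of the two fragments (the transcription and the helper `same_behaviour` are assumptions of this example). Agreement on sampled inputs suggests, but does not prove, equivalence.

```python
def factorial_iterative(value):
    """Fragment 1: loop that multiplies 1..value into j."""
    j = 1
    for i in range(1, value + 1):
        j = j * i
    return j


def factorial_recursive(n):
    """Fragment 2: recursive definition."""
    if n == 0:
        return 1
    return factorial_recursive(n - 1) * n


def same_behaviour(f, g, inputs):
    """True when f and g return equal results on every sampled input."""
    return all(f(x) == g(x) for x in inputs)


print(same_behaviour(factorial_iterative, factorial_recursive, range(10)))
```

The two functions share almost no text, which is why the textual techniques for Types I to III would miss this pair.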


Source Code Detection Algorithms
  Text-based
  Token-based
  Parse tree-based
  PDG-based
  Metrics-based
  Hybrid Approaches




CONTD...
  Text-based
            Find textual matches between two source files.
            Simple and fast.

  Token-based
            Use a lexer to convert the program into tokens.
            Find matches between token sequences.
            More robust to simple text replacements.



CONTD...
  Parse trees
            Build and compare parse trees.
            A parse tree contains the complete information about
             the source code.
            Tree comparison can normalize conditional
             statements.

  Program Dependence Graphs (PDGs)
            Capture the control and data dependencies in a program.
            Allow higher-level equivalences to be located.
            More complex to build and compare.
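
To make the parse-tree idea concrete, the sketch below compares the shape of two syntax trees after replacing every variable name with a placeholder. It uses Python's standard `ast` module on Python snippets purely because that parser ships with the language; checking C or Java would need a parser for that language. The class and function names are hypothetical.

```python
import ast


class _Anonymise(ast.NodeTransformer):
    """Replace every variable name with the same placeholder."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)


def tree_shape(source):
    """Textual dump of the syntax tree with identifiers anonymised."""
    tree = _Anonymise().visit(ast.parse(source))
    # ast.dump omits line and column numbers by default.
    return ast.dump(tree)


first = "total = 0\nfor i in range(5):\n    total = total + i\n"
renamed = "s = 0\nfor k in range(5):\n        s = s + k  # renamed copy\n"
different = "s = 0\nfor k in range(5):\n    s = s * k\n"

print(tree_shape(first) == tree_shape(renamed))
print(tree_shape(first) == tree_shape(different))
```

Renaming, re-indenting and commenting leave the tree unchanged, while replacing `+` with `*` produces a different tree. Whole-tree equality is the crudest possible comparison; practical tools compare subtrees.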
CONTD...
  Metrics
           Capture 'scores' of code segments according to
            certain criteria.
           Metrics are simple to calculate.
           Can lead to false positives.
  Hybrid
           Combination of two or more of the previous
            techniques.



Detecting Techniques
 Detection via Lexical Similarities


            Lexical analysis takes source code and
             converts it into a stream of lexical tokens.
            The source code undergoes a series of transformations.
            Identifying reserved words, identifiers and
             numbers is beneficial for plagiarism detection.




CONTD...
   int[] A = {1,2,3,4};
   for(int i = 0; i < A.length; i++) {
       A[i] = A[i] + 1;
   }

   int[] B = {1, 2, 3, 4};
   for(int j = 0; j < B.length; j++) {
       B[j] = B[j] + 1;
   }




CONTD...

    Both fragments produce the same token stream:

    LITERAL_int LBRACK RBRACK IDENT ASSIGN
    LCURLY NUM_INT COMMA NUM_INT
    COMMA NUM_INT COMMA NUM_INT RCURLY SEMI
    LITERAL_for LPAREN LITERAL_int IDENT ASSIGN
    NUM_INT SEMI IDENT LT
    IDENT DOT IDENT SEMI IDENT INC RPAREN LCURLY
    IDENT LBRACK IDENT RBRACK ASSIGN
    IDENT LBRACK IDENT RBRACK PLUS NUM_INT SEMI
    RCURLY
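
The conversion from source text to a token stream can be sketched with a small regular-expression lexer. The token names follow the stream shown on the slide; the regular expressions, the name `PLUS` for the `+` operator and the function name `token_stream` are assumptions of this sketch. It covers only the symbols that occur in the two Java fragments and silently skips anything else.

```python
import re

# Order matters: longer operators must be tried before their prefixes.
TOKEN_SPEC = [
    ("NUM_INT", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("INC", r"\+\+"),
    ("PLUS", r"\+"),
    ("LBRACK", r"\["),
    ("RBRACK", r"\]"),
    ("LCURLY", r"\{"),
    ("RCURLY", r"\}"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("ASSIGN", r"="),
    ("LT", r"<"),
    ("COMMA", r","),
    ("SEMI", r";"),
    ("DOT", r"\."),
    ("SKIP", r"\s+"),
]
KEYWORDS = {"int": "LITERAL_int", "for": "LITERAL_for"}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def token_stream(source):
    """Return the list of token kinds, discarding names, values and spacing."""
    tokens = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "IDENT":
            kind = KEYWORDS.get(match.group(), "IDENT")
        tokens.append(kind)
    return tokens


first = "int[] A = {1,2,3,4}; for(int i = 0; i < A.length; i++) { A[i] = A[i] + 1; }"
second = "int[] B = {1, 2, 3, 4}; for(int j = 0; j < B.length; j++) { B[j] = B[j] + 1; }"

print(" ".join(token_stream(first)))
print(token_stream(first) == token_stream(second))
```

Because identifier names and spacing are discarded during lexing, the renamed and re-spaced copy yields the same token list as the original.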




Detection via Parse Tree Similarities




Detection via Metrics
  Calculate and compare attribute counts.


  Programs with similar attribute counts are potentially
     similar programs.

  Counts of operators and operands are typically used to
     construct attribute counts.
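
As an illustration of attribute counting, the sketch below tallies total and unique operators and operands and compares two programs by the distance between their counts. The particular set of operators, the decision to count keywords as operands, and the function names are simplifying assumptions rather than a standard definition.

```python
import re

# Two-character operators are listed first so they are not split.
OPERATOR_RE = re.compile(r"\+\+|--|<=|>=|==|!=|[-+*/%=<>]")
# Identifiers, keywords and integer literals are all treated as operands here.
OPERAND_RE = re.compile(r"[A-Za-z_]\w*|\d+")


def attribute_counts(source):
    """Count total and unique operators and operands in a code fragment."""
    operators = OPERATOR_RE.findall(source)
    operands = OPERAND_RE.findall(source)
    return {
        "total_operators": len(operators),
        "unique_operators": len(set(operators)),
        "total_operands": len(operands),
        "unique_operands": len(set(operands)),
    }


def count_distance(a, b):
    """Sum of absolute differences between two count dictionaries."""
    return sum(abs(a[key] - b[key]) for key in a)


p = "if(a > b) { a = a - 1; b = b * a; } else { b = a; a = 0; }"
q = "if(m > n) { m = m - 5; n = n * m; } else { n = m; m = 0; }"
r = "x = x + 1;"

print(attribute_counts(p))
print(count_distance(attribute_counts(p), attribute_counts(q)))
print(count_distance(attribute_counts(p), attribute_counts(r)))
```

A distance of zero marks the renamed pair as potentially similar. Unrelated programs can share the same counts as well, so a match is only a prompt for closer inspection.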




Tools used for code-based plagiarism detection
 JPlag

  Finds similarities among multiple sets of source code files.
  JPlag operates in two phases.
  First phase: all programs to be compared are parsed and
   converted into token strings.
  Second phase: token strings are compared in pairs to
   determine the similarity of each pair.
  Because it compares tokens rather than raw text, it is more
   robust to simple edits. It supports Java, C#, C, C++ and
   natural language text.
CONTD...
MOSS (Measure Of Software Similarity)

 MOSS was developed in 1994 by Alex Aiken.
 It analyzes code written in languages such as C, C++, Python,
  Visual Basic, JavaScript, FORTRAN, Lisp and Ada.
 It is provided as an internet service: the user submits a
  list of source files to be compared.
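
MOSS is generally described as building on winnowing, the document fingerprinting scheme published by Schleimer, Wilkerson and Aiken. The sketch below illustrates the idea only and is not MOSS's actual implementation: the k-gram length `k`, the window size `w`, the use of `zlib.crc32` as the hash, and the Jaccard-style overlap score are all assumptions chosen for brevity.

```python
import zlib


def kgram_hashes(text, k):
    """Hash every run of k consecutive characters, ignoring whitespace and case."""
    cleaned = "".join(text.split()).lower()
    return [zlib.crc32(cleaned[i:i + k].encode()) for i in range(len(cleaned) - k + 1)]


def winnow(hashes, w):
    """Keep the minimum hash of each window of w hashes, with its position."""
    if not hashes:
        return set()
    fingerprints = set()
    for start in range(max(1, len(hashes) - w + 1)):
        window = hashes[start:start + w]
        smallest = min(window)
        # On ties, take the rightmost occurrence of the minimum.
        offset = len(window) - 1 - window[::-1].index(smallest)
        fingerprints.add((smallest, start + offset))
    return fingerprints


def fingerprint_overlap(a, b, k=5, w=4):
    """Share of fingerprint hashes that two texts have in common (0.0 to 1.0)."""
    fa = {h for h, _ in winnow(kgram_hashes(a, k), w)}
    fb = {h for h, _ in winnow(kgram_hashes(b, k), w)}
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


loop = "for(a = 0; a < 5; a++) { printf(\"a = %d\", a); }"
respaced = "for(a=0;a<5;a++){\n  printf(\"a = %d\", a);\n}"
other = "while(true) { run(); }"

print(fingerprint_overlap(loop, respaced))
print(fingerprint_overlap(loop, other))
```

Only a subset of the hashes is stored per file, which keeps the comparison cheap for large collections while still catching any sufficiently long shared passage.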

CONTD...
  YAP (Yet Another Plague)

  Token-based system.
  YAP works in two phases.
  The first phase generates a token file for each submission.
  The second phase compares pairs of token files using the
     Running-Karp-Rabin Greedy-String-Tiling (RKR-GST)
     token matching algorithm.
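
The core of the second phase can be sketched as plain Greedy String Tiling: repeatedly find the longest common runs of unmarked tokens and mark them as tiles. This simplified version omits the Karp-Rabin hashing that RKR-GST adds for speed, so it is slow on long inputs. The parameter name `min_match`, its default and the similarity formula are assumptions of this sketch.

```python
def greedy_string_tiling(a, b, min_match=3):
    """Return tiles (start_a, start_b, length) covering common token runs."""
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        max_len = min_match
        matches = []
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                # Extend the match while tokens agree and are still unmarked.
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k == max_len:
                    matches.append((i, j, k))
                elif k > max_len:
                    matches = [(i, j, k)]
                    max_len = k
        if not matches:
            break
        for i, j, k in matches:
            # Skip matches overlapped by a tile marked earlier in this round.
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue
            for n in range(k):
                marked_a[i + n] = True
                marked_b[j + n] = True
            tiles.append((i, j, k))
    return tiles


def tiling_similarity(a, b, min_match=3):
    """Fraction of tokens in both sequences that are covered by tiles."""
    if not a and not b:
        return 0.0
    covered = sum(length for _, _, length in greedy_string_tiling(a, b, min_match))
    return 2 * covered / (len(a) + len(b))


original = list("abcdefgh")
reordered = list("efghabcd")
print(greedy_string_tiling(original, reordered))
print(tiling_similarity(original, reordered))
```

Because each tile is matched independently of its position, swapping two blocks of code still gives full coverage, which a single longest-common-substring comparison would not.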



Conclusion
  Plagiarism in programming assignments is an inevitable
   issue for most academics teaching programming.
  Plagiarism detection systems are built to support only a
   few languages.
  Most detection software checks submissions against a
   repository maintained within an organization.
  As the number of digital copies grows, the repository must
   grow with it, and the plagiarism detection software must
   be able to handle that size.


  Most popular plagiarism detection algorithms convert
   programs into token strings and compare them with
   string matching.
  The tokens of each document are compared on a pair-wise
   basis to determine similar source-code segments between
   the files.
  String-matching systems are language-dependent: they
   handle only the programming languages that their parsers
   support.

THANK YOU!

More Related Content

What's hot

Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual PropertyTools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual PropertyDan Sullivan, Ph.D.
 
A novel approach based on topic
A novel approach based on topicA novel approach based on topic
A novel approach based on topiccsandit
 
A Novel Approach for Code Clone Detection Using Hybrid Technique
A Novel Approach for Code Clone Detection Using Hybrid TechniqueA Novel Approach for Code Clone Detection Using Hybrid Technique
A Novel Approach for Code Clone Detection Using Hybrid TechniqueINFOGAIN PUBLICATION
 
A hybrid model to detect malicious executables
A hybrid model to detect malicious executablesA hybrid model to detect malicious executables
A hybrid model to detect malicious executablesUltraUploader
 
IRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine LearningIRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine LearningIRJET Journal
 
Automatic reverse engineering of malware emulators
Automatic reverse engineering of malware emulatorsAutomatic reverse engineering of malware emulators
Automatic reverse engineering of malware emulatorsUltraUploader
 
Using Clone Detection to Identify Bugs in Concurrent Software
Using Clone Detection to Identify Bugs in Concurrent SoftwareUsing Clone Detection to Identify Bugs in Concurrent Software
Using Clone Detection to Identify Bugs in Concurrent SoftwareICSM 2010
 
Tutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and SystemsTutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and SystemsAdrian Paschke
 
Survey of universal authentication protocol for mobile communication
Survey of universal authentication protocol for mobile communicationSurvey of universal authentication protocol for mobile communication
Survey of universal authentication protocol for mobile communicationAhmad Sharifi
 
Semi-Supervised Keyphrase Extraction on Scientific Article using Fact-based S...
Semi-Supervised Keyphrase Extraction on Scientific Article using Fact-based S...Semi-Supervised Keyphrase Extraction on Scientific Article using Fact-based S...
Semi-Supervised Keyphrase Extraction on Scientific Article using Fact-based S...TELKOMNIKA JOURNAL
 
Using IR methods for labeling source code artifacts: Is it worthwhile?
Using IR methods for labeling source code artifacts: Is it worthwhile?Using IR methods for labeling source code artifacts: Is it worthwhile?
Using IR methods for labeling source code artifacts: Is it worthwhile?Sebastiano Panichella
 
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSTEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSijdms
 
Named Entity Recognition For Hindi-English code-mixed Twitter Text
Named Entity Recognition For Hindi-English code-mixed Twitter Text Named Entity Recognition For Hindi-English code-mixed Twitter Text
Named Entity Recognition For Hindi-English code-mixed Twitter Text Amogh Kawle
 
Myanmar Named Entity Recognition with Hidden Markov Model
Myanmar Named Entity Recognition with Hidden Markov ModelMyanmar Named Entity Recognition with Hidden Markov Model
Myanmar Named Entity Recognition with Hidden Markov Modelijtsrd
 
Email Data Cleaning
Email Data CleaningEmail Data Cleaning
Email Data Cleaningfeiwin
 
Extracting Archival-Quality Information from Software-Related Chats
Extracting Archival-Quality Information from Software-Related ChatsExtracting Archival-Quality Information from Software-Related Chats
Extracting Archival-Quality Information from Software-Related ChatsPreetha Chatterjee
 

What's hot (20)

Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual PropertyTools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property
 
A novel approach based on topic
A novel approach based on topicA novel approach based on topic
A novel approach based on topic
 
A Novel Approach for Code Clone Detection Using Hybrid Technique
A Novel Approach for Code Clone Detection Using Hybrid TechniqueA Novel Approach for Code Clone Detection Using Hybrid Technique
A Novel Approach for Code Clone Detection Using Hybrid Technique
 
A hybrid model to detect malicious executables
A hybrid model to detect malicious executablesA hybrid model to detect malicious executables
A hybrid model to detect malicious executables
 
IRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine LearningIRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine Learning
 
Automatic reverse engineering of malware emulators
Automatic reverse engineering of malware emulatorsAutomatic reverse engineering of malware emulators
Automatic reverse engineering of malware emulators
 
Using Clone Detection to Identify Bugs in Concurrent Software
Using Clone Detection to Identify Bugs in Concurrent SoftwareUsing Clone Detection to Identify Bugs in Concurrent Software
Using Clone Detection to Identify Bugs in Concurrent Software
 
H017445260
H017445260H017445260
H017445260
 
Tutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and SystemsTutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and Systems
 
Icsme16.ppt
Icsme16.pptIcsme16.ppt
Icsme16.ppt
 
Survey of universal authentication protocol for mobile communication
Survey of universal authentication protocol for mobile communicationSurvey of universal authentication protocol for mobile communication
Survey of universal authentication protocol for mobile communication
 
Oop
OopOop
Oop
 
C++ programing lanuage
C++ programing lanuageC++ programing lanuage
C++ programing lanuage
 
Semi-Supervised Keyphrase Extraction on Scientific Article using Fact-based S...
Semi-Supervised Keyphrase Extraction on Scientific Article using Fact-based S...Semi-Supervised Keyphrase Extraction on Scientific Article using Fact-based S...
Semi-Supervised Keyphrase Extraction on Scientific Article using Fact-based S...
 
Using IR methods for labeling source code artifacts: Is it worthwhile?
Using IR methods for labeling source code artifacts: Is it worthwhile?Using IR methods for labeling source code artifacts: Is it worthwhile?
Using IR methods for labeling source code artifacts: Is it worthwhile?
 
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSTEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
 
Named Entity Recognition For Hindi-English code-mixed Twitter Text
Named Entity Recognition For Hindi-English code-mixed Twitter Text Named Entity Recognition For Hindi-English code-mixed Twitter Text
Named Entity Recognition For Hindi-English code-mixed Twitter Text
 
Myanmar Named Entity Recognition with Hidden Markov Model
Myanmar Named Entity Recognition with Hidden Markov ModelMyanmar Named Entity Recognition with Hidden Markov Model
Myanmar Named Entity Recognition with Hidden Markov Model
 
Email Data Cleaning
Email Data CleaningEmail Data Cleaning
Email Data Cleaning
 
Extracting Archival-Quality Information from Software-Related Chats
Extracting Archival-Quality Information from Software-Related ChatsExtracting Archival-Quality Information from Software-Related Chats
Extracting Archival-Quality Information from Software-Related Chats
 

Similar to Plagiarism introduction

A Platform for Application Risk Intelligence
A Platform for Application Risk IntelligenceA Platform for Application Risk Intelligence
A Platform for Application Risk IntelligenceCheckmarx
 
Software engineering principles in system software design
Software engineering principles in system software designSoftware engineering principles in system software design
Software engineering principles in system software designTech_MX
 
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...IRJET Journal
 
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...IRJET Journal
 
Basics of c# by sabir
Basics of c# by sabirBasics of c# by sabir
Basics of c# by sabirSabir Ali
 
Compiler gate question key
Compiler gate question keyCompiler gate question key
Compiler gate question keyArthyR3
 
Aspect Oriented Programming Through C#.NET
Aspect Oriented Programming Through C#.NETAspect Oriented Programming Through C#.NET
Aspect Oriented Programming Through C#.NETWaqas Tariq
 
distributing computing
distributing computingdistributing computing
distributing computingnibiganesh
 
A JCR View of the World - adaptTo() 2012 Berlin
A JCR View of the World - adaptTo() 2012 BerlinA JCR View of the World - adaptTo() 2012 Berlin
A JCR View of the World - adaptTo() 2012 BerlinAlexander Klimetschek
 
STATICMOCK : A Mock Object Framework for Compiled Languages
STATICMOCK : A Mock Object Framework for Compiled Languages STATICMOCK : A Mock Object Framework for Compiled Languages
STATICMOCK : A Mock Object Framework for Compiled Languages ijseajournal
 
GENERIC CODE CLONING METHOD FOR DETECTION OF CLONE CODE IN SOFTWARE DEVELOPMENT
GENERIC CODE CLONING METHOD FOR DETECTION OF CLONE CODE IN SOFTWARE DEVELOPMENT GENERIC CODE CLONING METHOD FOR DETECTION OF CLONE CODE IN SOFTWARE DEVELOPMENT
GENERIC CODE CLONING METHOD FOR DETECTION OF CLONE CODE IN SOFTWARE DEVELOPMENT IAEME Publication
 
Put Your Hands in the Mud: What Technique, Why, and How
Put Your Hands in the Mud: What Technique, Why, and HowPut Your Hands in the Mud: What Technique, Why, and How
Put Your Hands in the Mud: What Technique, Why, and HowMassimiliano Di Penta
 
Euro python 2015 writing quality code
Euro python 2015   writing quality codeEuro python 2015   writing quality code
Euro python 2015 writing quality coderadek_j
 
Project_Report (BARC-Jerin)_final
Project_Report (BARC-Jerin)_finalProject_Report (BARC-Jerin)_final
Project_Report (BARC-Jerin)_finalJerin John
 
Tag Clouds for Object-Oriented Source Code Visualization
Tag Clouds for Object-Oriented Source Code VisualizationTag Clouds for Object-Oriented Source Code Visualization
Tag Clouds for Object-Oriented Source Code VisualizationRa'Fat Al-Msie'deen
 
A Tool to Detect Plagiarism in Java Source Code.pdf
A Tool to Detect Plagiarism in Java Source Code.pdfA Tool to Detect Plagiarism in Java Source Code.pdf
A Tool to Detect Plagiarism in Java Source Code.pdfKayla Smith
 

Similar to Plagiarism introduction (20)

A Platform for Application Risk Intelligence
A Platform for Application Risk IntelligenceA Platform for Application Risk Intelligence
A Platform for Application Risk Intelligence
 
Software engineering principles in system software design
Software engineering principles in system software designSoftware engineering principles in system software design
Software engineering principles in system software design
 
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
 
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
 
Basics of c# by sabir
Basics of c# by sabirBasics of c# by sabir
Basics of c# by sabir
 
7068458.ppt
7068458.ppt7068458.ppt
7068458.ppt
 
Compiler gate question key
Compiler gate question keyCompiler gate question key
Compiler gate question key
 
Aspect Oriented Programming Through C#.NET
Aspect Oriented Programming Through C#.NETAspect Oriented Programming Through C#.NET
Aspect Oriented Programming Through C#.NET
 
C sharp
C sharpC sharp
C sharp
 
distributing computing
distributing computingdistributing computing
distributing computing
 
A JCR View of the World - adaptTo() 2012 Berlin
A JCR View of the World - adaptTo() 2012 BerlinA JCR View of the World - adaptTo() 2012 Berlin
A JCR View of the World - adaptTo() 2012 Berlin
 
STATICMOCK : A Mock Object Framework for Compiled Languages
STATICMOCK : A Mock Object Framework for Compiled Languages STATICMOCK : A Mock Object Framework for Compiled Languages
STATICMOCK : A Mock Object Framework for Compiled Languages
 
Learning activity 3
Learning activity 3Learning activity 3
Learning activity 3
 
GENERIC CODE CLONING METHOD FOR DETECTION OF CLONE CODE IN SOFTWARE DEVELOPMENT
GENERIC CODE CLONING METHOD FOR DETECTION OF CLONE CODE IN SOFTWARE DEVELOPMENT GENERIC CODE CLONING METHOD FOR DETECTION OF CLONE CODE IN SOFTWARE DEVELOPMENT
GENERIC CODE CLONING METHOD FOR DETECTION OF CLONE CODE IN SOFTWARE DEVELOPMENT
 
Put Your Hands in the Mud: What Technique, Why, and How
Put Your Hands in the Mud: What Technique, Why, and HowPut Your Hands in the Mud: What Technique, Why, and How
Put Your Hands in the Mud: What Technique, Why, and How
 
Euro python 2015 writing quality code
Euro python 2015   writing quality codeEuro python 2015   writing quality code
Euro python 2015 writing quality code
 
Project_Report (BARC-Jerin)_final
Project_Report (BARC-Jerin)_finalProject_Report (BARC-Jerin)_final
Project_Report (BARC-Jerin)_final
 
Tag Clouds for Object-Oriented Source Code Visualization
Tag Clouds for Object-Oriented Source Code VisualizationTag Clouds for Object-Oriented Source Code Visualization
Tag Clouds for Object-Oriented Source Code Visualization
 
Objective-C
Objective-CObjective-C
Objective-C
 
A Tool to Detect Plagiarism in Java Source Code.pdf
A Tool to Detect Plagiarism in Java Source Code.pdfA Tool to Detect Plagiarism in Java Source Code.pdf
A Tool to Detect Plagiarism in Java Source Code.pdf
 

Recently uploaded

Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxnelietumpap1
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 

Recently uploaded (20)

Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptx
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Plagiarism introduction

  • 1. Guide: Ms Sangeetha Jamal, Dept of Computer Science. Presented by: Merin Paul, MTech CS-IS S1. 9/25/2012
  • 2. Contents: Introduction; Types of Source-code Plagiarism (Textual Similarity, Functional Similarity); Source Code Detection Algorithms; Detecting Techniques; Tools used for code-based plagiarism; Conclusion
  • 3. Introduction: Plagiarism in source-code files occurs when source code is copied and edited without proper acknowledgment of the original author. Techniques for plagiarism: lexical changes and structural changes. Lexical changes: changes that can be made to the source code without affecting the parsing of the program.
  • 4. Introduction (contd.): Structural changes: changes made to the source code that affect the parsing of the code and involve program debugging. Reasons for code copying: code reuse; programmer limitations; coincidental implementation of the same logic.
  • 5. Types of Source Code Plagiarism: Textual Similarity; Functional Similarity
  • 6. Textual Similarity: Two source codes look similar based on their textual content. Textual content means the words, letters, variable names, etc. Subtypes: Type I, Type II, Type III.
  • 7. Type I: The copied code fragment is the same as the original, with no modifications other than whitespace, comments and line layout. Original fragment:
        int a; // counter
        // count five times
        for(a = 0; a < 5; a++)
        {
            printf("a = %d", a); // print value of a
        }
        return 0;
  • 8. Type I (contd.): Copied fragment:
        int a;
        /* Loop increasing of a and print a value of it */
        for(a = 0; a < 5; a++){
            printf("a = %d", a);
        }
        return 0;
  • 9. Type II: Same as Type I, but also with modifications to variable names, function names and other user-defined identifiers. Original fragment:
        if(a > b) {
            a = a - 1;
            b = b * a; // comment 1
        }
        else {
            b = a; // comment 2
            a = 0;
        }
  • 10. Type II (contd.): Copied fragment:
        if(m > n)
        {m=m - 5;
        n=n*m; //my comment 1
        }
        else
        {n=m; //my comment 2
        m=0;
        }
  • 11. Type III: The copied code fragment is altered by inserting or removing unnecessary statements. Original fragment:
        if(a > b) {
            a = a - 1;
            b = b * a;
        }
        else {
            b = a;
            a = 0;
        }
  • 12. Type III (contd.): Copied fragment:
        if(a > b) {
            a = a - 1;
            c = 0; // this statement is added
            b = b * a;
        }
        else {
            b = a;
            a = 0;
        }
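The Type III pair on slides 11 and 12 can be scored with a line-level comparison that tolerates inserted statements. The following is only a minimal sketch: the helper names `normalize` and `line_similarity` are hypothetical, and using Python's standard `difflib.SequenceMatcher` is an assumption of this example, not a method named in the slides.

```python
import difflib
import re

ORIGINAL = """
if(a > b) {
    a = a - 1;
    b = b * a;
}
else {
    b = a;
    a = 0;
}
"""

COPIED = """
if(a > b) {
    a = a - 1;
    c = 0; // this statement is added
    b = b * a;
}
else {
    b = a;
    a = 0;
}
"""

def normalize(source):
    """Drop comments and all whitespace, keeping one entry per non-empty line."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)  # block comments
    source = re.sub(r"//[^\n]*", "", source)                    # line comments
    lines = (re.sub(r"\s+", "", line) for line in source.splitlines())
    return [line for line in lines if line]

def line_similarity(a, b):
    """Ratio in [0, 1]: 2 * matching lines / total lines in both fragments."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()

print(round(line_similarity(ORIGINAL, COPIED), 3))
```

Because comments and whitespace are removed before comparing, the same function scores a pure Type I copy as identical, while an inserted statement only lowers the ratio slightly.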
  • 13. Functional Similarity: Refers to code fragments that have the same semantics or functionality.
        Fragment 1:
        int i, j = 1;
        for(i = 1; i <= VALUE; i++)
            j = j * i;
        Fragment 2:
        int factorial(int n)
        {
            if(n == 0) return 1;
            else return factorial(n - 1)*n;
        }
  • 14. Source Code Detection Algorithms: Text-based; Token-based; Parse tree-based; PDG-based; Metrics-based; Hybrid approaches
  • 15. Source Code Detection Algorithms (contd.): Text-based: finds textual matches between two source codes; simple and fast. Token-based: uses a lexer to convert the program into tokens and then finds matches in the token sequences; more robust to simple text replacements.
  • 16. Source Code Detection Algorithms (contd.): Parse trees: build and compare parse trees; a parse tree contains the complete information about the source code; tree comparison can normalize conditional statements. Program Dependence Graphs (PDGs): capture the control and data dependencies in a program; allow higher-level equivalences to be located; more complex.
  • 17. Source Code Detection Algorithms (contd.): Metrics: capture 'scores' of code segments according to certain criteria; simple to calculate; can lead to false positives. Hybrid: a combination of two or more of the previous techniques.
  • 18. Detecting Techniques: Detection via Lexical Similarities. Lexical analysis takes source code and converts it into a stream of lexical tokens. The source code undergoes a series of transformations. Identifying reserved words, identifiers and numbers is beneficial for plagiarism detection.
  • 19. Detection via Lexical Similarities (contd.):
        Fragment 1:
        int[] A = {1,2,3,4};
        for(int i = 0; i < A.length; i++) {
            A[i] = A[i] + 1;
        }
        Fragment 2:
        int[] B = {1, 2, 3, 4};
        for(int j = 0; j < B.length; j++) {
            B[j] = B[j] + 1;
        }
  • 20. Detection via Lexical Similarities (contd.): Token stream:
        LITERAL_int LBRACK RBRACK IDENT ASSIGN LCURLY NUM_INT COMMA NUM_INT COMMA NUM_INT COMMA NUM_INT RCURLY SEMI
        LITERAL_for LPAREN LITERAL_int IDENT ASSIGN NUM_INT SEMI IDENT LT IDENT DOT IDENT SEMI IDENT INC RPAREN LCURLY
        NUM_INT SEMI RCURLY
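The lexical step on slides 18 to 20 can be sketched with a small regex-based lexer. The token class names follow slide 20, but the regular expressions, the `tokenize` helper and the tiny keyword set are assumptions made for this illustration; a real tool would use a full lexer for the language.

```python
import re

# Token classes named as on slide 20; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("WS", r"\s+"),
    ("COMMENT", r"//[^\n]*|/\*.*?\*/"),
    ("NUM_INT", r"\d+"),
    ("WORD", r"[A-Za-z_]\w*"),
    ("INC", r"\+\+"),
    ("PLUS", r"\+"),
    ("LT", r"<"),
    ("ASSIGN", r"="),
    ("LBRACK", r"\["), ("RBRACK", r"\]"),
    ("LCURLY", r"\{"), ("RCURLY", r"\}"),
    ("LPAREN", r"\("), ("RPAREN", r"\)"),
    ("COMMA", r","), ("SEMI", r";"), ("DOT", r"\."),
]
KEYWORDS = {"int", "for"}  # only the reserved words needed for this example
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC), re.DOTALL)

def tokenize(source):
    """Convert source text into a list of token class names."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("WS", "COMMENT"):
            continue  # layout and comments carry no token
        if kind == "WORD":
            # Reserved words keep their spelling; every identifier becomes IDENT.
            kind = "LITERAL_" + text if text in KEYWORDS else "IDENT"
        tokens.append(kind)
    return tokens

FRAGMENT_1 = "int[] A = {1,2,3,4};\nfor(int i = 0; i < A.length; i++) {\n  A[i] = A[i] + 1;\n}"
FRAGMENT_2 = "int[] B = {1, 2, 3, 4};\nfor(int j = 0; j < B.length; j++) {\n  B[j] = B[j] + 1;\n}"

print(tokenize(FRAGMENT_1) == tokenize(FRAGMENT_2))
```

Since identifiers collapse to `IDENT` and layout is discarded, the two fragments from slide 19 produce the same token sequence even though their variable names and spacing differ.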
  • 21. Detection via Parse Tree Similarities
  • 22. Detection via Metrics: Calculate and compare attribute counts. Programs with similar attribute counts are potentially similar programs. Counts of operators and operands are typically used to construct attribute counts.
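The attribute counts described on slide 22 can be sketched as follows. This is a rough illustration under stated assumptions: the choice of four counts (total and distinct operators, total and distinct operands), the regular expressions, and the names `attribute_counts` and `count_distance` are all hypothetical, and string literals are not handled.

```python
import re

OPERAND = re.compile(r"[A-Za-z_]\w*|\d+")
OPERATOR = re.compile(r"\+\+|--|==|<=|>=|!=|[-+*/%=<>]")
KEYWORDS = {"int", "for", "if", "else", "return"}  # not counted as operands

def strip_comments(source):
    """Remove // and /* */ comments so they do not affect the counts."""
    return re.sub(r"//[^\n]*|/\*.*?\*/", " ", source, flags=re.DOTALL)

def attribute_counts(source):
    """Return (operators, distinct operators, operands, distinct operands)."""
    code = strip_comments(source)
    operands = [w for w in OPERAND.findall(code) if w not in KEYWORDS]
    operators = OPERATOR.findall(code)
    return (len(operators), len(set(operators)),
            len(operands), len(set(operands)))

def count_distance(a, b):
    """Sum of absolute differences between two attribute-count vectors."""
    return sum(abs(x - y) for x, y in zip(attribute_counts(a), attribute_counts(b)))

ORIGINAL = "if(a > b) { a = a - 1; b = b * a; } else { b = a; a = 0; }"
RENAMED = "if(m > n) { m = m - 5; n = n * m; } else { n = m; m = 0; }"

print(attribute_counts(ORIGINAL), count_distance(ORIGINAL, RENAMED))
```

A distance of zero marks the pair only as potentially similar: unrelated programs can share the same counts, which is the false-positive risk noted on slide 17.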
  • 23. Tools used for code-based plagiarism: JPlag. Finds similarities among multiple sets of source code files. JPlag operates in two phases. First phase: all programs to be compared are parsed and converted into token strings. Second phase: the token strings are compared in pairs to determine the similarity of each pair. It is more robust. It supports Java, C#, C, C++ and natural language text.
  • 24. Tools used for code-based plagiarism (contd.): MOSS (Measure Of Software Similarity). Developed in 1994 by Alex Aiken. It analyzes code written in languages such as C, C++, Python, Visual Basic, JavaScript, FORTRAN, Lisp and Ada. It is provided as an internet service that takes a list of source files as input.
  • 25. Tools used for code-based plagiarism (contd.): YAP (Yet Another Plague). Token-based system. YAP works in two phases. The first phase generates a token file for each submission. The second phase compares pairs of token files using a token matching algorithm, the Running-Karp-Rabin Greedy-String-Tiling algorithm (RKR-GST).
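The pairwise token comparison used in the second phase can be illustrated with plain Greedy String Tiling. Note the simplification: this sketch omits the Running-Karp-Rabin hashing that RKR-GST uses to speed up the search, so it is a slower stand-in and not the algorithm as implemented in YAP. The function names, the `min_match` default and the similarity normalisation are assumptions of this example.

```python
def greedy_string_tiling(a, b, min_match=3):
    """Return tiles (start_a, start_b, length) of non-overlapping common runs."""
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        max_match = min_match
        matches = []
        # Find all maximal-length matches among the still unmarked tokens.
        for i in range(len(a)):
            if marked_a[i]:
                continue
            for j in range(len(b)):
                if marked_b[j]:
                    continue
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k == max_match:
                    matches.append((i, j, k))
                elif k > max_match:
                    matches = [(i, j, k)]
                    max_match = k
        if not matches:
            break
        # Turn matches into tiles, skipping any that overlap a tile just placed.
        for i, j, k in matches:
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue
            for d in range(k):
                marked_a[i + d] = True
                marked_b[j + d] = True
            tiles.append((i, j, k))
    return tiles

def similarity(a, b, min_match=3):
    """Fraction of tokens covered by tiles: 2 * covered / (len(a) + len(b))."""
    if not a and not b:
        return 0.0
    covered = sum(length for _, _, length in greedy_string_tiling(a, b, min_match))
    return 2.0 * covered / (len(a) + len(b))

TOKENS_1 = "A B C D E F".split()
TOKENS_2 = "X A B C Y D E F".split()
print(greedy_string_tiling(TOKENS_1, TOKENS_2), similarity(TOKENS_1, TOKENS_2))
```

Each round tiles the longest remaining matches first, so a long copied block is found even when the copier has inserted or reordered statements around it.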
  • 26. Conclusion: Plagiarism in programming assignments is an inevitable issue for most academics teaching programming. Plagiarism detection systems are built for only a few languages. Most detection software checks submissions against a repository maintained within an organization. As the number of digital copies goes up, the repository must grow, and the plagiarism detection software should be able to handle its size.
  • 27. Conclusion (contd.): Plagiarism in programming assignments is an inevitable issue for most academics teaching programming. Most popular plagiarism detection algorithms convert programs into token string representations and compare them by string matching. The tokens of each document are compared on a pair-wise basis to determine similar source-code segments between the files. String-matching systems are language-dependent: they can handle only the programming languages supported by their parsers.
  • 28. References
        1) G. Cosma and M. Joy, "An Approach to Source-Code Plagiarism Detection and Investigation Using Latent Semantic Analysis," IEEE Trans. Computers, vol. 61, no. 3, pp. 379-391, March 2012.
        2) Georgina Cosma, Mike Joy, Daniel White and Jane Yau, 9th August 2007, ICS, University of Ulster. http://www.ics.heacademy.ac.uk/resources/assessment/plagiarism/
        3) Okiemute Omuta, "Electronic Source Code Plagiarism Detection," Computer Engineering Department, European University of Lefke, North Cyprus.
        4) S. Schleimer, D. Wilkerson, and A. Aiken, "Winnowing: Local Algorithms for Document Fingerprinting," Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 76-85, 2003.
  • 29. References (contd.)
        5) M.J. Wise, "YAP3: Improved Detection of Similarities in Computer Program and Other Texts," Proc. 27th SIGCSE Technical Symp., pp. 130-134, 1996.