Translated and
Paraphrased Plagiarism
    The cat and mouse game continues...

  One man’s rigor is another man’s mortis.
          - CF Bohren and DR Huffman, 1983
Overview

The ever changing counter-detection
landscape
Paraphrasing versus textual entailment
Ways to paraphrase
Tools of the trade
Many Roads to Plagiarism
The ‘old fashioned’ way
Many Roads to Plagiarism
Translated plagiarism.
Many Roads to Plagiarism
    Paraphrased plagiarism
                    Back-translation: the latest form of plagiarism
                                 Michael Jones University of Wollongong, Australia

               4th Asia Pacific Conference on Educational Integrity (4APCEI) 28–30 September 2009


Paraphrased plagiarism is not new either. However, there are new
tools to aid in automatically paraphrasing text which are accelerating
this form of detection avoidance.

Paraphrase plagiat n'est pas nouveau non plus. Toutefois, il existe
de nouveaux outils pour l'aide dans le texte paraphrase
automatiquement qui sont l'accélération de cette forme d'évasion de
détection.

Paraphrase plagiarism is not new either. However, there are
new tools to help in paraphrasing the text automatically, which are
accelerating this form of escape detection.
Paraphrasing vs Textual
     Entailment
Two sentences are paraphrased if they
“mean the same thing”:
 1) Similarity: they share a substantial
 amount of information
 2) Dissimilarities are extraneous: if
 extra information in the sentences
 exists, the effect of its removal is not
 significant.
Paraphrasing vs Textual
     Entailment
A paraphrase is a special case of textual
entailment. A paraphrase is symmetric
(each sentence entails the other), whereas
textual entailment only indicates that two
sentences overlap to a degree, with one
sentence being subsumed by the other.
Ways to Paraphrase
    Lexical substitution/synonymy
•     Hypo/Syno/Hyper-nym replacement: article,
      paper or red, crimson
•     Abbreviation/acronym replacement: Mr., mister
•     Contractions: do not, don’t
•     Compounding/decompounding: ballgame, ball
      game
•     Numeric/Alphabetic numbers: 11, eleven;
      12/1/2010, December first two-thousand-ten
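
The lexical substitutions listed above can be undone before comparison by mapping each variant to one canonical form. The following is a minimal sketch, not the presenters' implementation: the lookup tables hold only the slide's own examples, the choice of canonical form (for instance "red" for "crimson") is arbitrary, and the function name `normalize` is hypothetical.

```python
# Tiny illustrative tables built from the slide's examples (assumptions, not a real lexicon).
CONTRACTIONS = {"don't": "do not"}
ABBREVIATIONS = {"mr.": "mister"}
NUMBER_WORDS = {"11": "eleven"}
COMPOUNDS = {"ball game": "ballgame"}
SYNONYMS = {"crimson": "red", "paper": "article"}


def normalize(text: str) -> list:
    """Lower-case the text and map known lexical variants to one canonical token."""
    text = text.lower().replace("\u2019", "'")  # treat curly and straight apostrophes alike
    for spaced, joined in COMPOUNDS.items():
        text = text.replace(spaced, joined)
    tokens = []
    for raw in text.split():
        # Look up abbreviations before stripping punctuation, since "mr." needs its period.
        tok = ABBREVIATIONS.get(raw, raw.strip(".,;:!?"))
        tok = CONTRACTIONS.get(tok, tok)
        tok = NUMBER_WORDS.get(tok, tok)
        tok = SYNONYMS.get(tok, tok)
        tokens.extend(tok.split())  # an expanded contraction yields two tokens
    return tokens


a = normalize("We don't play the ball game at 11, Mr. Smith.")
b = normalize("We do not play the ballgame at eleven, mister Smith.")
print(a == b)
```

With real text the tables would have to come from a resource such as WordNet, and synonym replacement would need word-sense handling that this sketch ignores.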
Ways to Paraphrase

•    Active and passive exchange
          The gangster killed 3 innocent people.
          vs Three innocent people were killed by
          the gangster.
•    Re-ordering of sentence components
          Tuesday they met vs They met Tuesday
    Zhou, Ming and Niu, Cheng. “Principled Approach to Paraphrasing.” U.S. Patent 11,934,010. 1 Nov 2007.
Ways to Paraphrase
 Realization in different syntactic
 components
      Palestinian leader Arafat vs
      Arafat, Palestinian leader
 Prepositional phrase attachment
      The Alabama plant vs
      A plant in Alabama
Ways to Paraphrase
      Change into different sentence types
           Who drew this picture? vs
           Tell me who drew this picture.
      Morphological derivation
           He is a good teacher. vs
           He teaches well. vs
           He is good at teaching.
Ways to Paraphrase

      Light verb construction
           The film impressed him. vs
           The film made an impression on him.
        Comparatives vs. superlatives
           He is smarter than everyone else. vs
           He is the smartest one.
Ways to Paraphrase

      Converse word substitution
           John is Mary's husband. vs
           Mary is John's wife.
      Verb nominalization
           He wrote the book. vs
           He was the author of the book.
Ways to Paraphrase
      Substitution using words with
      overlapping meanings
           Bob excels at mathematics. vs
           Bob studies mathematics well.
      Inference
           He died of cancer. vs
           Cancer killed him.
Ways to Paraphrase
      Different semantic role realization
           He enjoyed the game. vs
           The game pleased him.
      Subordinate clauses vs separate
      sentences linked by anaphoric pronouns.
           The tree healed its wounds by growing
           new bark. vs
           The tree healed its wounds. It grew
           new bark.
Tools of the Trade
Microsoft paraphrase corpus
 Used to test algorithms
WordNet: English only :(
 Synonyms, hypernyms, hyponyms,
 and antonyms.
Algorithms: Finite State Transducers
(FSTs) and/or iterative Longest Common
Subsequence (LCS) on sets.
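
As an illustration of the LCS idea named above, here is a minimal sketch of the standard dynamic-programming longest common subsequence over word tokens. It is the textbook algorithm, not the iterative set-based variant the slide alludes to, and the function name `lcs_length` is hypothetical.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row of the DP table
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)  # extend the match ending before both tokens
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a token from a or from b
        prev = curr
    return prev[-1]


active = "the gangster killed three innocent people".split()
passive = "three innocent people were killed by the gangster".split()
print(lcs_length(active, passive))
```

Because a subsequence must keep word order, the active/passive pair above shares all six content words but only a short in-order run, which is one reason a single LCS pass is not enough for reordered paraphrases.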
Tools of the Trade


Stemming or lemmatization
 am, are, is → be
 car, cars, car's, cars' → car
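
The two mappings on this slide can be reproduced with a very small rule-based sketch. This is an assumption-laden toy (one irregular-verb table plus naive possessive and plural stripping), not a real stemmer or lemmatizer, and the name `lemma` is hypothetical.

```python
# Irregular forms need a lookup; only the slide's example verb is included here.
IRREGULAR = {"am": "be", "are": "be", "is": "be"}


def lemma(word: str) -> str:
    """Reduce a word to a base form using a lookup table and two naive suffix rules."""
    w = word.lower().replace("\u2019", "'")  # treat curly and straight apostrophes alike
    if w in IRREGULAR:
        return IRREGULAR[w]
    # Strip possessive markers: car's -> car, cars' -> cars
    if w.endswith("'s"):
        w = w[:-2]
    elif w.endswith("'"):
        w = w[:-1]
    # Naive plural rule: drop a final "s" on longer words, but keep "glass", "bus"
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        w = w[:-1]
    return w


print([lemma(w) for w in ["am", "are", "is"]])
print([lemma(w) for w in ["car", "cars", "car's", "cars'"]])
```

A production system would use a proper stemmer or a dictionary-backed lemmatizer; the suffix rules here will mangle many ordinary words.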
Word Alignment Examples
       According to the MS paraphrase corpus:
    This is a paraphrase




 12/14 = 86%
 12/16 = 75%
Not Paraphrased (However, the first sentence is textually entailed by the second.
                    Turnitin would currently match this.)



 18/19 = 95%
 18/26 = 69%
Slippery Slope
When does textual entailment become arbitrary/noise?




    14/48 = 29%
    14/34 = 41%
Slippery Slope
When does textual entailment become arbitrary/noise?




13/24 = 54%
13/21 = 62%
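
The percentages on the word alignment and slippery slope slides can be checked with a few lines of arithmetic. This sketch assumes each fraction is the number of aligned words divided by the word count of one sentence in the pair; that reading is an interpretation of the slides, and the names `coverage` and `EXAMPLES` are hypothetical.

```python
def coverage(aligned: int, length: int) -> int:
    """Share of a sentence's words that were aligned, as a whole percent."""
    return round(100 * aligned / length)


# (aligned words, words in one sentence, words in the other), as read off the slides
EXAMPLES = {
    "paraphrase": (12, 14, 16),
    "entailed, not paraphrased": (18, 19, 26),
    "slippery slope, first pair": (14, 48, 34),
    "slippery slope, second pair": (13, 24, 21),
}

for label, (aligned, len_a, len_b) in EXAMPLES.items():
    print(f"{label}: {coverage(aligned, len_a)}% / {coverage(aligned, len_b)}%")
```

Seen this way, a paraphrase has high coverage in both directions, entailment has high coverage in one direction only, and the slippery slope pairs have middling coverage in both, which is where any fixed threshold becomes a judgment call.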
Translated Plagiarism

Non-English markets, in particular, are
concerned about their students, for whom
English is a second language, submitting
English documents that have been
translated into their native language.
Translated Plagiarism
Initial approach:
 Non-English documents searched as
 they are now
 Additional search performed:
 Translate document to English, search
 English documents, and then display
 English matches with translations (or
 vice versa)
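
The initial approach above can be sketched as a two-pass search. This is only an outline under stated assumptions: `search` and `translate` are hypothetical callables standing in for the document index and the machine translation service, the toy index and dictionary exist purely to make the example runnable, and only the non-English to English direction is shown.

```python
from typing import Callable, List, Tuple

Search = Callable[[str, str], List[str]]      # (text, language) -> matching source passages
Translate = Callable[[str, str, str], str]    # (text, source language, target language) -> text


def translated_search(doc: str, lang: str, search: Search,
                      translate: Translate) -> List[Tuple[str, str]]:
    """Return (matched passage, passage as displayed to the reader) pairs."""
    # Pass 1: search the document in its own language, as is done now.
    results = [(hit, hit) for hit in search(doc, lang)]
    # Pass 2: translate to English, search English sources, show matches with a translation.
    if lang != "en":
        english = translate(doc, lang, "en")
        for hit in search(english, "en"):
            results.append((hit, translate(hit, "en", lang)))
    return results


# Toy stand-ins (assumptions) so the sketch runs without any external service.
TOY_INDEX = {"en": ["the spirit of the age"], "de": []}
TOY_DICTIONARY = {
    ("de", "en", "der geist der zeit"): "the spirit of the age",
    ("en", "de", "the spirit of the age"): "der geist der zeit",
}


def toy_search(text: str, lang: str) -> List[str]:
    return [passage for passage in TOY_INDEX.get(lang, []) if passage in text]


def toy_translate(text: str, src: str, dst: str) -> str:
    return TOY_DICTIONARY.get((src, dst, text), text)


print(translated_search("der geist der zeit", "de", toy_search, toy_translate))
```

In practice the second pass would rely on paraphrase-tolerant matching rather than exact substring search, since a machine translation rarely reproduces the original English wording.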
Translated Plagiarism
Our new strategic partner:




On demand SaaS statistical machine translation
Translated Plagiarism: Need
    for Paraphrasing?
  Machines and humans translate text in
  many different ways.
  Paraphrase detection allows us to
  match the variations.
   Google translate: The zeitgeist is thinking and feeling one age. The term describes
   the characteristics of a particular period, or an attempt to remind us it. The German
   word Zeitgeist is transferred through English as a loanword into numerous other
   languages been.

   Bing translate: Zeitgeist is thinking and feeling how an age. Is the nature of a
   particular era or trying to understand them. The German word Zeitgeist is taken from
   English as a loanword in many other languages.

     http://de.wikipedia.org/wiki/Zeitgeist
Possible Turnitin UI
Finis


Thank you for listening!
Questions?


Editor's Notes

  1. How many people know who Sergey Brin and Larry Page are? For those of you who didn’t raise your hands, they are the founders of Google. Did you know that some of Sergey’s original research was on a plagiarism detection system he wrote with his collaborators? This so-called ‘cat and mouse game’ is commonplace. Rules are meant to be broken. For instance, people who like to drive fast will buy radar detectors instead of abiding by the speed limit. At Turnitin.com we find the same to be true with plagiarism detection: students will find or develop new ways to circumvent the system. This talk explores some of the new counter-detection methods being used and what we are doing to counter them. The details are very technical. I’m going to stay away from these details so as to not bore you to death.
  2. First I would like to give a quick survey of the different methods being employed today to avoid detection. None of these are new per se. However, new digital tools are accelerating their use just as the digital authoring of documents, email and the many other modes of document sharing, and, most importantly, the internet made plagiarism a pandemic. Then I’ll switch gears and get a little more technical on you to discuss the finer points of paraphrasing and its comparison to textual entailment.
  3. I believe the most common method of plagiarizing remains copying text from one or more sources, where the ‘author’ edits the text to sew the pieces into their paper. Along these lines, you might not be surprised that Wikipedia is the number one source of internet matches we find in the Turnitin service, whereas peer collusion accounts for the largest number of matches overall.
  4. Translated plagiarism, or the repurposing of content translated from a foreign language, isn’t a new phenomenon.
  5. However, growing anecdotal evidence suggests that students and researchers are using this method of plagiarizing to avoid the current detection technologies. In the US, the growing population of foreign students and the wide availability of machine translation technologies, e.g., Google Translate, are thought to be contributing to the rise of this phenomenon.
  6. Tools that paraphrase for you have grown in abundance and sophistication. A simple search for ‘article spinner’ turns up 1.3 million results from Google and a large number of ads for companies promoting their services. This sort of service also goes by other names, such as synonymizers.
  7. Typically these services are aimed at online marketers looking to produce many versions of a document that search engines will treat as unique, artificially promoting their site by increasing the number of backlinks to it. However, they are equally effective in rewriting a student paper.
  8. Having computers understand how similar two sentences are to one another is a rich area of academic and corporate research. The utility of this technology is widespread: it covers everything from a question-and-answer system like Wolfram Alpha being able to respond to a query the same way despite the multitude of ways you can phrase your question, to Google being able to understand the relatedness of web pages at the phrase or sentence level instead of just at a ‘bag of words’ level.
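To make the ‘bag of words’ limitation concrete, here is a minimal Python sketch. It is an illustration only and not the method used by Google, Wolfram Alpha, or Turnitin; the function names and the example sentences are assumptions made for this note. It compares two sentences once by word counts alone and once by adjacent word pairs, which is one simple way of taking phrase-level order into account.

```python
import math
from collections import Counter


def tokens(sentence):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return sentence.lower().split()


def bow_cosine(a, b):
    """Cosine similarity of word-count vectors; word order is ignored."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def bigram_jaccard(a, b):
    """Jaccard overlap of adjacent word pairs; sensitive to word order."""
    ta, tb = tokens(a), tokens(b)
    ba, bb = set(zip(ta, ta[1:])), set(zip(tb, tb[1:]))
    union = ba | bb
    return len(ba & bb) / len(union) if union else 0.0
```

For the pair "the dog bit the man" and "the man bit the dog", the word-count comparison sees two identical bags of words, while the word-pair comparison registers that the phrases differ, even though the two sentences clearly do not mean the same thing.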
  9. Now I would like to outline the myriad ways that an author can paraphrase. The details of each method aren’t so important; on the whole they demonstrate how rich languages are and, to a lot of people’s surprise, how unique writing is. There are many, many ways to deliver the same information through writing. Initial research is focused on English but the algorithmic framework is being generalized to work with all languages.
  10. Hopefully, by now you can see how hard a natural language processing problem detecting paraphrases is.
  11. I won’t go into detail regarding the algorithms and methods we are exploring, but I will highlight some of the tools of the trade. I would also like to point out how different creating production-quality code that can deal with enormous scale is from prototyping a solution. Most solutions simply can’t scale to processing hundreds of thousands of documents against a collection of tens of billions of documents. Microsoft was gracious enough to produce a paraphrase test corpus consisting of 5,800 sentence pairs, of which 3,900 were considered “semantically equivalent” by human raters. What is interesting about this is that 83% of the sentence pairs were judged the same way by two raters, but the remaining 17% required a third rater to break the stalemate. This highlights another issue with paraphrase detection: there is a certain level of subjectivity in ascertaining whether two sentences are equivalent or not. The same situation holds in plagiarism detection. When it comes to small matches, one person’s plagiarism is another person’s noise.
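The corpus figures quoted in this note can be put into numbers with a short Python sketch. The constants come from the note itself; the helper names and the two-raters-plus-tie-breaker procedure as coded here are assumptions for illustration, not a description of how Microsoft built the corpus.

```python
TOTAL_PAIRS = 5800          # sentence pairs in the corpus, per the note
EQUIVALENT_PAIRS = 3900     # pairs judged "semantically equivalent"
THIRD_RATER_SHARE = 0.17    # share of pairs where two raters disagreed

# Roughly two thirds of the pairs were labeled as paraphrases.
equivalent_share = EQUIVALENT_PAIRS / TOTAL_PAIRS

# Approximate number of pairs that needed a third opinion.
pairs_needing_third_rater = round(THIRD_RATER_SHARE * TOTAL_PAIRS)


def observed_agreement(rater_a, rater_b):
    """Fraction of items on which two raters gave the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label lists must be non-empty and of equal length")
    same = sum(1 for x, y in zip(rater_a, rater_b) if x == y)
    return same / len(rater_a)


def resolve_with_third_rater(rater_a, rater_b, rater_c):
    """Keep the label when A and B agree; otherwise use C as tie-breaker."""
    return [x if x == y else z for x, y, z in zip(rater_a, rater_b, rater_c)]
```

With binary labels (1 for "equivalent", 0 for "not equivalent"), `observed_agreement` gives the kind of 83% figure mentioned in the note, and `resolve_with_third_rater` shows how a disputed pair still ends up with a single label.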
  12. The textbook definition of lemmatization is the process of grouping together the different inflected forms of a word so they can be analyzed as a single item. The difference between a lemmatizer and a stemmer is that a stemmer operates on a single word without knowledge of the context, and therefore cannot discriminate between words which have different meanings depending on part of speech. http://en.wikipedia.org/wiki/Lemmatisation
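The stemmer versus lemmatizer distinction can be shown with a toy Python sketch. Both functions below are deliberately simplistic stand-ins written for this note: the suffix list and the small lemma table are assumptions, and a real system would use a full morphological dictionary and a part-of-speech tagger.

```python
# Toy suffix list for the stemmer (an assumption, far from a real stemmer).
SUFFIXES = ("ing", "ed", "s")

# Toy lemma table keyed by (word, part of speech); entries are illustrative.
LEMMAS = {
    ("meeting", "VERB"): "meet",
    ("meeting", "NOUN"): "meeting",
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("better", "ADJ"): "good",
}


def naive_stem(word):
    """Strip the first matching suffix, looking only at the word itself."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def lemmatize(word, pos):
    """Look up the dictionary form using the word and its part of speech."""
    w = word.lower()
    return LEMMAS.get((w, pos), w)
```

The stemmer maps "meeting" to "meet" whether it is a noun ("in our last meeting") or a verb ("we are meeting tomorrow"), while the lemmatizer keeps the noun intact and reduces only the verb. The stemmer also cannot relate "saw" to "see" or "better" to "good", because those links depend on a dictionary and on context.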
  13. So Turnitin currently does a certain level of paraphrase and textual entailment matching. To that end we’ve spent a lot of time adjusting the algorithms so that they are effective at matching text while not producing ‘noisy’ or spurious matches. This is one of the hard problems that we are trying to solve. If done incorrectly, paraphrased or textually entailed matches, which allow for a much higher degree of change, could swamp a report in spurious matches.
  14. One way to visualize this is to compare the sentences of a document against itself. The document typically discusses a particular topic and has a consistent voice. For this presentation, I took a news article about Google’s new Buzz messaging service and did a document-wide comparison of the sentences. In this example you can see that the sentences are somewhat semantically related but wouldn’t be deemed paraphrases of each other.
  15. So what if I changed each sentence so that they became more similar? At what point do they become semantically equivalent enough for a match to go from a false positive to a true positive? Although maybe not the best example, I think it illustrates the pitfalls of developing such a system. I believe the answer has to do with context, parts of speech, and the importance of the entailed words. Generalizing this type of AI model is very difficult because it is easy to overfit the model against the training data.
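The sentence-against-sentence comparison and the threshold question raised in these two notes can be sketched in Python as follows. This is a rough illustration under stated assumptions: word-set Jaccard overlap stands in for whatever similarity measure a real system would use, the function names are made up for this note, and the sentences in the usage remark are invented rather than taken from the news article shown in the talk.

```python
def jaccard(a, b):
    """Word-set overlap between two sentences, from 0.0 to 1.0."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def similarity_matrix(sentences):
    """Compare every sentence of a document against every other sentence."""
    return [[jaccard(s, t) for t in sentences] for s in sentences]


def pairs_above(matrix, threshold):
    """Index pairs (i, j), i < j, whose similarity meets the threshold."""
    n = len(matrix)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if matrix[i][j] >= threshold
    ]
```

Taking three invented sentences on one topic, such as "Google launched a new messaging service", "The service is built into Gmail", and "Critics raised privacy concerns about the service", every pair shares a word or two. A low threshold therefore flags all of them as matches even though none is a paraphrase of another, while a high threshold flags none. Where to put the cut-off between those extremes is exactly the false positive versus true positive trade-off described above.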
  16. Translated plagiarism is a particular type of cross-lingual information retrieval aimed at finding similar documents across languages.
  17. The first step in offering a translated plagiarism detection service is to find a partner that offers machine translation. Language Weaver currently offers 37 language pairs. Although Google and others offer free translation services, their use is contractually limited and not bound by service level agreements. Furthermore, we feel the quality of Language Weaver’s technology is at the moment superior to the ‘free’ services.
  18. Inter- and intra-language matches will be displayed together. We are considering offering a confidence level of the inter-language match.
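One way to picture a translate-then-match flow that reports inter- and intra-language matches together is the Python sketch below. It is a sketch under loud assumptions, not the Turnitin pipeline: `translate` is a hypothetical callable standing in for a machine translation service (it is not the Language Weaver or Google API), word overlap stands in for real matching, and the score is only a placeholder for the confidence level mentioned in the last note.

```python
from typing import Callable, Dict, List, Tuple


def word_overlap(a: str, b: str) -> float:
    """Word-set Jaccard overlap, a crude stand-in for real text matching."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def find_matches(
    submission: str,
    corpus: Dict[str, str],
    translate: Callable[[str], str],  # hypothetical MT hook, supplied by caller
    threshold: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Match the submission as written and after machine translation."""
    candidates = [
        ("intra-language", submission),
        ("inter-language", translate(submission)),
    ]
    matches = []
    for kind, text in candidates:
        for doc_id, source in corpus.items():
            score = word_overlap(text, source)
            if score >= threshold:
                matches.append((doc_id, kind, score))
    # Highest-scoring matches first; both kinds share one result list.
    return sorted(matches, key=lambda m: m[2], reverse=True)
```

Passing the translator in as a parameter keeps the matching logic independent of any particular vendor, which fits the point above that the translation partner is a separate decision from the detection service itself.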