Machine Learning and Affect Analysis Against Cyber-Bullying

Michal Ptaszynski, Pawel Dybala, Tatsuaki Matsuba, Fumito Masui, Rafal Rzepka, Kenji Araki
Presentation Outline
• Introduction
• What is Cyber-Bullying?
• Machine Learning Method for Cyber-bullying Detection
• Affect Analysis of Cyber-Bullying Data
• Conclusions & Future Work
Introduction
• A dialogue agent-companion needs to be aware of undesirable activities in or around the user
• Application: Web security
  • Could these be stopped with Web-mining?
    • Bus-jack case, 2000, Japan
    • 9.11
  • Demi Moore Saves A Teen Through Twitter (03.19)
• We need an artificial Demi Moore!
New Threat: Cyber-Bullying
• Cyber-bullying (or cyber-harassment, cyber-stalking)
  • Cyber-bullying happens “when the Internet, cell phones or other devices are used to send or post text or images intended to hurt or embarrass another person.”
    – The National Crime Prevention Council in America

  • Cyber-bullying “involves the use of information and communication technologies to support deliberate, repeated, and hostile behavior by an individual or group, that is intended to harm others.”
    – B. Belsey. Cyberbullying: An Emerging Threat for the “Always On” Generation, http://www.cyberbullying.ca/pdf/Cyberbullying Presentation Description.pdf
New Threat: Cyber-Bullying
• In Japan:
  – Several suicides of cyber-bullying victims
  – The Ministry of Education officially considers cyber-bullying a problem and has produced a manual for spotting and handling cyber-bullying cases.
    • Ministry of Education, Culture, Sports, Science and Technology, 2008:
      – 'Netto jou no ijime' ni kansuru taiou manyuaru jirei shuu (gakkou, kyouin muke)
      – ["Bullying on the net": Manual for handling and collection of cases (directed to school teachers)] (in Japanese).
New Threat: Cyber-Bullying
• In Japan:
  – Volunteers (teachers, PTA members) started Online Patrol (OP) to spot CB cases…
  – But there is too much of it! (impossible to deal with all of it manually)
• Need to help OP automatically spot cyber-bullying
Cyber-bullying Detection Method
• Construction of a lexicon of words distinctive for cyber-bullying
• Estimation of word similarity (to handle slang modifications of words)
• Classification of entries into harmful/non-harmful
• Ranking according to harmfulness
Lexicon Construction
• Words distinctive for cyber-bullying = vulgarities
  – In English: f**ck, b*tch, sh*t, c*nt, etc.
  – In Japanese: uzai (freaking annoying), kimoi (freaking ugly), etc.
  ↓
• Usually not recognized by parsers
Lexicon Construction
• Obtained cyber-bullying data (from Online Patrol of Japanese secondary school sites)*
• Read the data and manually specified 216 distinctive vulgar words
• Added the words to the parser dictionary

*) From Human Rights Research Institute Against All Forms for Discrimination and Racism-MIE, Japan
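Because Japanese text has no spaces, a vulgarity the parser does not know may be split into meaningless fragments; adding lexicon entries to the dictionary lets such words surface as single tokens. The sketch below illustrates that idea with a greedy longest-match scan over unsegmented text. It is a hypothetical stand-in, not the parser used in the study, and the romanized lexicon entries are placeholders for the 216-word lexicon.

```python
def find_lexicon_words(text: str, lexicon: set) -> list:
    """Scan unsegmented text left to right and return the lexicon entries found,
    preferring the longest entry that starts at each position (greedy match).
    Illustrative stand-in for a parser dictionary, not a real parser."""
    max_len = max((len(w) for w in lexicon), default=0)
    found = []
    i = 0
    while i < len(text):
        match = None
        # Try longer candidates first so a longer entry beats its own prefix.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                match = candidate
                break
        if match is not None:
            found.append(match)
            i += len(match)  # skip past the matched word
        else:
            i += 1           # no entry starts here; move on one character
    return found


# Hypothetical romanized entries standing in for the real lexicon.
VULGAR_LEXICON = {"uza", "uzai", "kimoi"}

print(find_lexicon_words("omaeuzaikimoi", VULGAR_LEXICON))  # → ['uzai', 'kimoi']
```

Note that "uzai" wins over its prefix "uza" because longer candidates are tried first.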
Similarity Estimation
• Jargonization (online slang)
  – English: “CU” (see you [later]), “brah” (bro[ther], friend)
  – Japanese: [examples shown as images on the original slide]
• Problem: the same words will either not be recognized or will be recognized as separate words.
Similarity Estimation
• Use Levenshtein distance
  – “The Levenshtein Distance between two strings is calculated as the minimum number of operations required to transform one string into another, where the available operations are only deletion, insertion or substitution of a single character.”

V. I. Levenshtein. Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Doklady Akademii Nauk SSSR, Vol. 163, No. 4, pp. 845-848 (1965).
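The definition above translates directly into the standard dynamic-programming computation. The following Python sketch is a generic textbook implementation, not the code used in the study; it keeps only two rows of the DP table to save memory.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character deletions, insertions or
    substitutions that turn `a` into `b` (dynamic programming, two rows)."""
    # prev[j] = distance between the previous prefix of `a` and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (free if equal)
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # → 3
print(levenshtein("brah", "bro"))        # → 2
```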
Similarity Estimation
• Add heuristic rules for optimization
• With the Levenshtein distance threshold set to 2, precision was 58.9% before applying the rules and improved to 85.0% after.
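One way to apply a distance threshold of 2 is to map each unknown (possibly slang) word to the closest lexicon entry within two edits. The sketch below shows only that thresholded lookup: the heuristic rules from the slide are not reproduced, and the function name `match_to_lexicon` and the romanized entries are hypothetical.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost deletion, insertion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def match_to_lexicon(word: str, lexicon: Iterable[str], threshold: int = 2) -> Optional[str]:
    """Return the lexicon entry closest to `word` if its distance is <= threshold,
    otherwise None. Entries are scanned in sorted order, so ties go to the
    alphabetically first entry."""
    best, best_dist = None, threshold + 1
    for entry in sorted(lexicon):
        d = levenshtein(word, entry)
        if d < best_dist:
            best, best_dist = entry, d
    return best


LEXICON = {"uzai", "kimoi"}  # hypothetical romanized entries
print(match_to_lexicon("uzee", LEXICON))   # → uzai (two substitutions)
print(match_to_lexicon("hello", LEXICON))  # → None
```

For short words, a threshold of 2 admits many unrelated strings, so some extra filtering is usually needed in practice.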
SVM Classification
• Support Vector Machines (SVM) are a supervised machine learning method for data classification, developed by Vapnik [14].
• Training data: 966 entries (750 harmful, 216 non-harmful)
• Evaluate results with Precision, Recall and balanced F-score
• Perform 10-fold cross validation on all data
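To make the classification and ranking steps concrete, here is a minimal linear SVM trained with Pegasos (stochastic sub-gradient descent on the L2-regularised hinge loss). This is a swapped-in, self-contained technique for illustration: the slide does not say which SVM implementation, kernel or features were used, and the bag-of-words features, toy data and function names are hypothetical. The decision value doubles as a harmfulness score for ranking.

```python
import random
from collections import Counter


def bag_of_words(text):
    """Sparse term-frequency features from whitespace tokens.
    (Assumption: Japanese input would first be segmented by a parser.)"""
    return dict(Counter(text.split()))


def score(w, x):
    """Linear decision value w.x; larger means 'more harmful'."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_pegasos(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM via Pegasos. X: list of sparse feature dicts, y: labels +1/-1.
    No bias term, as in the basic Pegasos formulation."""
    rng = random.Random(seed)
    w, t = {}, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)           # Pegasos step size
            margin = y[i] * score(w, X[i])  # margin under the current weights
            for f in w:                     # shrink: gradient of (lam/2)*||w||^2
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:                # hinge loss is active for this example
                for f, v in X[i].items():
                    w[f] = w.get(f, 0.0) + eta * y[i] * v
    return w


def predict(w, text):
    """+1 = harmful, -1 = non-harmful."""
    return 1 if score(w, bag_of_words(text)) > 0 else -1


def rank_by_harmfulness(w, texts):
    """Sort entries from most to least harmful by decision value."""
    return sorted(texts, key=lambda s: score(w, bag_of_words(s)), reverse=True)


# Hypothetical toy data (romanized placeholders); +1 = harmful
texts = ["uzai kimoi", "uzai", "kimoi", "arigatou", "tanoshii", "arigatou tanoshii"]
labels = [1, 1, 1, -1, -1, -1]
w = train_pegasos([bag_of_words(s) for s in texts], labels)
print([predict(w, s) for s in texts])                             # → [1, 1, 1, -1, -1, -1]
print(rank_by_harmfulness(w, ["arigatou", "uzai kimoi", "uzai"]))  # → ['uzai kimoi', 'uzai', 'arigatou']
```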
SVM Classification
• 10-fold cross validation on all data
  – Divide the data into 10 parts
  – Use 9 parts for training and 1 for testing
  – Repeat 10 times (each part serves once as the test set) and average the results

  Precision = 79.9%, Recall = 98.3%, F = 88.2%
Affect Analysis of Cyber-Bullying Data
• The Affect Analysis system used:
  – ML-Ask:
    1. Determines emotiveness
    2. Determines the types of emotions expressed

M. Ptaszynski, P. Dybala, R. Rzepka and K. Araki. “Affecting Corpora: Experiments with Automatic Affect Annotation System - A Case Study of the 2channel Forum”, In Proceedings of The Conference of the Pacific Association for Computational Linguistics 2009 (PACLING-09), pp. 223-228 (2009).
Affect Analysis of Cyber-Bullying Data
• The Affect Analysis system used:
  – ML-Ask:
    1. Emotiveness:
       1. Determine whether an utterance is emotive (0/1)
       2. Calculate the emotive value of an utterance (0-5)
       3. Count the emotive utterances in a conversation
       4. Average the emotive value over all utterances
       5. Count the emotiveness features:
          • Interjections
          • Exclamations
          • Vulgarities
          • Mimetic expressions
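The emotiveness measures above can be illustrated with a toy feature counter. This is a simplified, hypothetical sketch, not ML-Ask itself: the lexicons are tiny romanized placeholders, and the rule that the emotive value is the total feature count capped at 5 is an assumption made only to fit the 0-5 range on the slide.

```python
import re

# Hypothetical toy lexicons (romanized Japanese); ML-Ask's real databases
# are much larger and are not reproduced here.
FEATURE_LEXICONS = {
    "interjections": {"aa", "ee", "waa"},
    "vulgarities": {"uzai", "kimoi"},
    "mimetic expressions": {"iraira", "dokidoki"},
}


def emotive_features(utterance):
    """Count each kind of emotive feature in one utterance."""
    tokens = re.findall(r"[a-z]+", utterance.lower())
    counts = {name: sum(tok in lex for tok in tokens)
              for name, lex in FEATURE_LEXICONS.items()}
    counts["exclamations"] = utterance.count("!")
    return counts


def emotive_value(utterance):
    """Assumption: emotive value = total feature count, capped at 5."""
    return min(5, sum(emotive_features(utterance).values()))


def is_emotive(utterance):
    """An utterance counts as emotive (1) if it has at least one feature."""
    return emotive_value(utterance) > 0


def analyze_conversation(utterances):
    """Return (number of emotive utterances, average emotive value)."""
    values = [emotive_value(u) for u in utterances]
    n_emotive = sum(v > 0 for v in values)
    average = sum(values) / len(values) if values else 0.0
    return n_emotive, average


print(emotive_features("waa, uzai!! iraira"))
print(analyze_conversation(["uzai!", "arigatou"]))  # → (1, 1.0)
```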
Affect Analysis of Cyber-Bullying Data
• The Affect Analysis system used:
  – ML-Ask:
    2. Determines the types of emotions expressed: one of the 10 emotion types said to be the most appropriate for the Japanese language:
       ki/yorokobi (joy, delight), do/ikari (anger), ai/aware (sadness, gloom), fu/kowagari (fear), chi/haji (shame, shyness), ko/suki (liking, fondness), en/iya (dislike), ko/takaburi (excitement), an/yasuragi (relief) and kyo/odoroki (surprise, amazement)
    • Based on an emotive expression database

A. Nakamura. Kanjo hyogen jiten [Dictionary of Emotive Expressions] (in Japanese), Tokyodo Publishing, Tokyo, 1993.
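Detecting emotion types against an expression database amounts to a dictionary lookup. The sketch below maps matched expressions to the 10 types listed above. The handful of romanized entries are illustrative assumptions, not entries taken from the cited dictionary, and real matching in Japanese would operate on parsed text rather than whitespace-separated romaji.

```python
import re
from collections import Counter

# The 10 emotion types listed on the slide.
EMOTION_TYPES = [
    "ki/yorokobi", "do/ikari", "ai/aware", "fu/kowagari", "chi/haji",
    "ko/suki", "en/iya", "ko/takaburi", "an/yasuragi", "kyo/odoroki",
]

# Hypothetical, illustrative entries (romanized); not from Nakamura's dictionary.
EXPRESSION_DB = {
    "ureshii": "ki/yorokobi",   # happy
    "mukatsuku": "do/ikari",    # irritated, angry
    "kanashii": "ai/aware",     # sad
    "kowai": "fu/kowagari",     # scary, afraid
    "hazukashii": "chi/haji",   # embarrassed
    "daisuki": "ko/suki",       # really like
    "iya": "en/iya",            # unpleasant, disliked
    "wakuwaku": "ko/takaburi",  # excited
    "hotto": "an/yasuragi",     # relieved
    "bikkuri": "kyo/odoroki",   # surprised
}


def emotion_types(utterance):
    """Count emotion types whose expressions appear as tokens in the utterance."""
    tokens = re.findall(r"[a-z]+", utterance.lower())
    return Counter(EXPRESSION_DB[t] for t in tokens if t in EXPRESSION_DB)


print(emotion_types("Daisuki! bikkuri shita"))  # one ko/suki, one kyo/odoroki
```

Because this lookup ignores context, an ironic "daisuki" is still counted as fondness.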
Affect Analysis of Cyber-Bullying Data
• Results
  1. Emotiveness:
     1. Determine whether an utterance is emotive
     2. Calculate the emotive value of an utterance
     3. Count the emotive utterances in a conversation
     4. Average the emotive value over all utterances
     5. Count the emotiveness features:
        • Interjections
        • Exclamations
        • Vulgarities
        • Mimetic expressions
  – Many moderate indications: harmful data is less emotive
  – There are two distinctive features (vulgarities and mimetic expressions)
Affect Analysis of Cyber-Bullying Data
• Results
  2. Emotion types
  – More positive emotions in non-harmful data
  – Slightly more negative emotions in harmful data
  – Detailed analysis: expressions of fondness are often used ironically
Conclusions
• New problem: Cyber-Bullying
• Prototype Machine Learning Method for Cyber-bullying Detection
  – Results not ideal, but somewhat encouraging
• Affect Analysis of Cyber-Bullying Data
  – Cyber-bullying is less “emotive” (cold irony)
  – Distinctive features of CB: vulgarities, mimetic expressions
  – Expressions of emotions usually considered positive are often used ironically
Future Work
• New vulgarities are created every day
  – Create a method for extraction of vulgarities
  – Find a syntactic model of vulgar expressions
• Implement the method in a web crawler that automatically performs online patrol (e.g. for school web sites)
Thank you for your attention.

Sample pptx for embedding into website for demo
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 

Machine Learning Detects Cyber-Bullying

  • 1. Machine Learning and Affect Analysis Against Cyber-Bullying Michal Ptaszynski, Pawel Dybala, Tatsuaki Matsuba, Fumito Masui, Rafal Rzepka, Kenji Araki
  • 2. Presentation Outline • Introduction • What is Cyber-Bullying? • Machine Learning Method for Cyber-bullying Detection • Affect Analysis of Cyber-Bullying Data • Conclusions & Future Work
  • 3. Introduction • Dialogue agent–companion needs to be aware of undesirable activities (in or around the user) • Application: Web security • Could these be stopped with Web-mining? • Bus-jack case, 2000, Japan • 9.11
  • 4. Introduction • Dialogue agent–companion needs to be aware of undesirable activities (in or around the user) • Application: Web security • Could these be stopped with Web-mining? • Bus-jack case, 2000, Japan • 9.11 • Demi Moore Saves A Teen Through Twitter (03.19)
  • 5. Introduction • Dialogue agent–companion needs to be aware of undesirable activities (in or around the user) • Application: Web security • Could these be stopped with Web-mining? • Bus-jack case, 2000, Japan • 9.11 • Demi Moore Saves A Teen Through Twitter (03.19) • We need an artificial Demi Moore!
  • 6. New Threat: Cyber-Bullying • cyber-bullying (or cyber-harassment, cyber-stalking) • Cyber-bullying happens “when the Internet, cell phones or other devices are used to send or post text or images intended to hurt or embarrass another person.” – The National Crime Prevention Council in America • cyber-bullying “involves the use of information and communication technologies to support deliberate, repeated, and hostile behavior by an individual or group, that is intended to harm others.” – B. Belsey. Cyberbullying: An Emerging Threat for the “Always On” Generation, http://www.cyberbullying.ca/pdf/Cyberbullying Presentation Description.pdf
  • 7. New Threat: Cyber-Bullying • In Japan: – several suicide cases of cyber-bullying victims – the Ministry of Education officially considers cyber-bullying a problem and has produced a manual for spotting and handling cyber-bullying cases. • Ministry of Education, Culture, Sports, Science and Technology, 2008: – 'Netto jou no ijime' ni kansuru taiou manyuaru jirei shuu (gakkou, kyouin muke) – ["Bullying on the net" Manual for handling and the collection of cases (directed to school teachers)] (in Japanese).
  • 8. New Threat: Cyber-Bullying • In Japan: – Volunteers (teachers, PTA members) started Online Patrol (OP) to spot cyber-bullying (CB) cases… – But there is too much of it! (impossible to deal with all of it manually)
  • 9. New Threat: Cyber-Bullying • In Japan: – Volunteers (teachers, PTA members) started Online Patrol (OP) to spot cyber-bullying (CB) cases… – But there is too much of it! (impossible to deal with all of it manually) • Need to help OP automatically spot cyber-bullying
  • 10. Cyber-bullying Detection Method • Construction of a lexicon of words distinctive for cyber-bullying • Estimation of word similarity (to handle slang modifications of words) • Classification of entries into harmful/non-harmful • Ranking according to harmfulness
  • 11. Lexicon Construction • Words distinctive for cyber-bullying = vulgarities – In English: f**ck, b*tch, sh*t, c*nt, etc. – In Japanese: uzai (freaking annoying), kimoi (freaking ugly), etc. ↓ • These words are usually not recognized by parsers
  • 12. Lexicon Construction • Obtained cyber-bullying data (from Online Patrol of Japanese secondary school sites)* • Read the data and manually specified 216 distinctive vulgar words. Example: • Added to parser dictionary: *) From Human Rights Research Institute Against All Forms for Discrimination and Racism-MIE, Japan
  • 13. Similarity Estimation • Jargonization (online slang) – English: “CU” (see you [later]), “brah” (bro[ther], friend) – Japanese: • Problem: variants of the same word are either not recognized at all or are recognized as separate words.
  • 14. Similarity Estimation • Use Levenshtein Distance – “The Levenshtein Distance between two strings is calculated as the minimum number of operations required to transform one string into another, where the available operations are only deletion, insertion or substitution of a single character.” V. I. Levenshtein. Binary Code Capable of Correcting Deletions, Insertions and Reversals. Doklady Akademii Nauk SSSR, Vol. 163, No. 4, pp. 845-848 (1965).
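The distance defined on slide 14 can be computed with the standard dynamic-programming recurrence. The following is a minimal Python sketch; the function name `levenshtein` is our own, and the slides do not show the authors' implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


print(levenshtein("brah", "bro"))  # 2: substitute a->o, delete h
```

Because Python strings are sequences of Unicode characters, the same function also works on Japanese kana or kanji strings, one character per edit.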
  • 15. Similarity Estimation • Add heuristic rules for optimization • With the Levenshtein distance threshold set to 2, Precision was 58.9% before applying the rules and improved to 85.0% after.
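Here is a minimal sketch of the threshold step from slide 15. It assumes that a word is mapped to its nearest lexicon entry when the distance is at most 2. The toy lexicon uses the Japanese vulgarities from slide 11, and `uzee` is an illustrative slang variant. The helper `match_to_lexicon` is hypothetical. The authors' heuristic rules are not reproduced because the slides do not describe them.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions of one character)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def match_to_lexicon(word, lexicon, threshold=2):
    """Hypothetical helper: return the closest lexicon entry whose distance
    to `word` is at most `threshold`, or None if no entry is close enough.
    Ties are resolved in favour of the entry listed first."""
    best, best_dist = None, threshold + 1
    for entry in lexicon:
        d = levenshtein(word, entry)
        if d < best_dist:
            best, best_dist = entry, d
    return best


LEXICON = ["uzai", "kimoi"]  # toy lexicon, romanized
print(match_to_lexicon("uzee", LEXICON))  # uzai
```

A raw threshold of 2 can over-match short words. For example, `ai` is within two insertions of `uzai`, so it would also be mapped to `uzai`.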
  • 16. SVM Classification • Support Vector Machines (SVM) are a supervised machine learning method developed by Vapnik [14] and used for data classification. • Training data: 966 entries (750 harmful, 216 non-harmful) • Results calculated as balanced F-score (harmonic mean of Precision and Recall) • Perform 10-fold cross validation on all data
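The balanced F-score is the harmonic mean of Precision and Recall, F = 2PR / (P + R). The short sketch below (the function name `f_score` is our own) checks the figures reported on slide 17: P = 79.9% and R = 98.3% give F ≈ 88.2%.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the balanced F-score (harmonic mean)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(round(f_score(0.799, 0.983) * 100, 1))  # 88.2
```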
  • 17. SVM Classification • 10-fold cross validation on all data – Divide the data into 10 parts – Use 9 for training and 1 for testing – Repeat 10 times and average the results. Precision=79.9%, Recall=98.3% → F=88.2%
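The 10-fold procedure on slide 17 can be sketched with the standard library alone. The scorer `majority_baseline` is a hypothetical placeholder, not the authors' SVM. In practice it could be replaced by an SVM such as scikit-learn's `sklearn.svm.SVC`, which is left out here so the sketch needs no third-party packages. The `samples` list stands in for the real entries, and only the class counts (750 harmful, 216 non-harmful) come from slide 16.

```python
import random
from collections import Counter
from statistics import mean


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(samples, labels, train_and_score, k=10, seed=0):
    """Train on k-1 folds, score on the held-out fold, repeat k times, average."""
    scores = []
    for test_idx in k_fold_indices(len(samples), k, seed):
        held_out = set(test_idx)
        train = [(samples[i], labels[i]) for i in range(len(samples))
                 if i not in held_out]
        test = [(samples[i], labels[i]) for i in test_idx]
        scores.append(train_and_score(train, test))
    return mean(scores)


def majority_baseline(train, test):
    """Placeholder scorer: predict the most frequent training label and
    return accuracy on the test fold."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return sum(label == majority for _, label in test) / len(test)


labels = ["harmful"] * 750 + ["non-harmful"] * 216
samples = list(range(len(labels)))  # stand-ins for the real entries
print(cross_validate(samples, labels, majority_baseline))
```

Passing the scorer in as a function keeps the fold logic independent of the classifier, so the same loop can report Precision, Recall or F-score instead of accuracy.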
  • 18. Affect Analysis of Cyber-Bullying Data • The Affect Analysis system used: – ML-Ask: 1. Determines Emotiveness 2. Determines the types of emotions expressed M. Ptaszynski, P. Dybala, R. Rzepka and K. Araki. Affecting Corpora: Experiments with Automatic Affect Annotation System - A Case Study of the 2channel Forum -. In Proceedings of The Conference of the Pacific Association for Computational Linguistics 2009 (PACLING-09), pp. 223-228 (2009).
  • 19. Affect Analysis of Cyber-Bullying Data • The Affect Analysis system used: – ML-Ask: 1. Emotiveness: 1. Determine whether an utterance is emotive (0/1) 2. Calculate the emotive value of an utterance (0-5) 3. Count the emotive utterances in a conversation 4. Approximate the emotive value over all utterances 5. Count the emotiveness features: • Interjections • Exclamations • Vulgarities • Mimetic expressions
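The five emotiveness steps on slide 19 can be illustrated with a toy sketch. This is not ML-Ask: the lexicons in `FEATURES` are hypothetical romanized examples, and so is the scoring rule (emotive value = total feature occurrences, capped at 5). The slides do not give ML-Ask's actual databases or formula. The sketch also assumes space-separated, romanized input, whereas real Japanese text would first need morphological analysis.

```python
import re
from statistics import mean

# Hypothetical toy lexicons; ML-Ask uses its own, much larger databases.
FEATURES = {
    "interjections": {"aa", "waa", "oh", "ugh"},
    "vulgarities": {"uzai", "kimoi"},
    "mimetic": {"dokidoki", "wakuwaku", "iraira"},
}


def emotive_features(utterance):
    """Step 5: count occurrences of each emotiveness feature."""
    tokens = re.findall(r"[a-z]+", utterance.lower())
    counts = {name: sum(t in lex for t in tokens) for name, lex in FEATURES.items()}
    counts["exclamations"] = utterance.count("!")
    return counts


def emotive_value(utterance):
    """Step 2 (assumed scoring): total feature occurrences, capped at 5."""
    return min(5, sum(emotive_features(utterance).values()))


def analyse_conversation(utterances):
    values = [emotive_value(u) for u in utterances]
    return {
        "emotive_flags": [int(v > 0) for v in values],           # step 1 (0/1)
        "emotive_values": values,                               # step 2 (0-5)
        "n_emotive": sum(v > 0 for v in values),                # step 3
        "mean_emotive_value": mean(values) if values else 0.0,  # step 4
    }


print(analyse_conversation(["uzai!!", "see you tomorrow", "waa dokidoki"]))
```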
  • 20. Affect Analysis of Cyber-Bullying Data • The Affect Analysis system used: – ML-Ask: 2. Determines the types of emotions expressed: One of 10 emotion types said to be the most appropriate for the Japanese language: ki/yorokobi (joy, delight), do/ikari (anger), ai/aware (sadness, gloom), fu/kowagari (fear), chi/haji (shame, shyness), ko/suki (liking, fondness), en/iya (dislike), ko/takaburi (excitement), an/yasuragi (relief) and kyo/odoroki (surprise, amazement) Based on an emotive expression database A. Nakamura. Kanjo hyogen jiten [Dictionary of Emotive Expressions] (in Japanese), Tokyodo Publishing, Tokyo, 1993.
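Slide 20 says emotion types are determined from an emotive expression database. A dictionary lookup of that kind might look like the sketch below. The entries in `EMOTIVE_EXPRESSIONS` are a hypothetical handful of romanized words mapped to the ten types listed on the slide, not Nakamura's dictionary. The input is again assumed to be romanized and space-separated.

```python
import re
from collections import Counter

# The ten emotion types listed on the slide (romanized names).
EMOTION_TYPES = ("yorokobi", "ikari", "aware", "kowagari", "haji",
                 "suki", "iya", "takaburi", "yasuragi", "odoroki")

# Hypothetical toy entries; the real system relies on a database
# based on Nakamura's Dictionary of Emotive Expressions.
EMOTIVE_EXPRESSIONS = {
    "ureshii": "yorokobi",    # happy
    "kanashii": "aware",      # sad
    "kowai": "kowagari",      # scary
    "hazukashii": "haji",     # embarrassing
    "suki": "suki",           # like
    "iya": "iya",             # dislike
    "bikkuri": "odoroki",     # surprised
}


def emotion_types(utterance):
    """Count the emotion types whose expressions appear in the utterance."""
    tokens = re.findall(r"[a-z]+", utterance.lower())
    return Counter(EMOTIVE_EXPRESSIONS[t] for t in tokens
                   if t in EMOTIVE_EXPRESSIONS)


print(emotion_types("Suki suki, bikkuri!"))
```

A word-level lookup like this cannot tell when an expression of fondness such as `suki` is meant ironically, which the slides report as common in harmful entries.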
  • 21. Affect Analysis of Cyber-Bullying Data • Results
  • 22. Affect Analysis of Cyber-Bullying Data • Results: many moderate indications that harmful data is less emotive 1. Emotiveness: 1. Determine whether an utterance is emotive 2. Calculate the emotive value of an utterance 3. Count the emotive utterances in a conversation 4. Approximate the emotive value over all utterances 5. Count the emotiveness features: • Interjections • Exclamations • Vulgarities • Mimetic expressions • Two of these features are distinctive: vulgarities and mimetic expressions
  • 23. Affect Analysis of Cyber-Bullying Data • Results 2. Emotion types – More positive emotions in non-harmful data – Slightly more negative emotions in harmful data – Detailed analysis: fondness is often used in irony
  • 24. Conclusions • New problem: Cyber-Bullying • Prototype Machine Learning Method for Cyber-bullying Detection – Results not ideal, but somewhat encouraging • Affect Analysis of Cyber-Bullying Data – Cyber-bullying is less “emotive” (cold irony) – Distinctive features of CB: vulgarities, mimetic expressions – Expressions of emotions considered positive are often used ironically
  • 25. Future Work • New vulgarities are created every day – Create a method for extraction of vulgarities – Find a syntactic model of vulgar expressions • Implement in a web crawler that automatically performs online patrol (e.g. for school web sites)
  • 26. Thank you for your attention.