The document describes Negobot, a conversational agent that uses game theory to detect paedophile behavior. Negobot poses as a child online to engage suspects in conversation while strategizing how to obtain information without scaring them off. It incorporates natural language processing, multiple chatbots, and an evaluation function to determine the conversation level and child-like responses. The goal is to identify paedophiles through analysis of conversations using this game theory approach.
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...ijnlc
In this paper, we proposed a XAI-based Language Learning Chatbot (namely XAI Language Tutor) by using ontology and transfer learning techniques. To facilitate three levels of language learning, XAI Language Tutor consists of three levels for systematically English learning, which includes: 1) phonetics level for speech recognition and pronunciation correction; 2) semantic level for specific domain conversation, and 3) simulation of “free-style conversation” in English - the highest level of language chatbot communication as “free-style conversation agent”. In terms of academic contribution, we implement the ontology graph to explain the performance of free-style conversation, following the concept of XAI (Explainable Artificial Intelligence) to visualize the connections of neural network in bionics, and explain the output sentence from language model. From implementation perspective, our XAI Language Tutor agent integrated the mini-program in WeChat as front-end, and fine-tuned GPT-2 model of transfer learning as back-end to interpret the responses by ontology graph.
All of our source codes have uploaded to GitHub: https://github.com/p930203110/EnglishLanguageRobot
This document presents research on using machine learning and affect analysis to detect cyber-bullying. It introduces cyber-bullying and describes a machine learning method used, including constructing a lexicon of harmful words, estimating word similarities, and classifying entries using SVM. Affect analysis of the data found cyber-bullying language is less emotive and uses vulgarities and mimetic expressions. Future work aims to improve detection of new vulgarities and implement automated online patrol.
𝐓𝐚𝐤𝐞 𝐚 𝐭𝐨𝐮𝐫: 𝐎𝐮𝐫 𝐥𝐚𝐭𝐞𝐬𝐭 𝐁𝐥𝐨𝐠 𝐢𝐬 𝐏𝐮𝐛𝐥𝐢𝐬𝐡𝐞𝐝 𝐧𝐨𝐰👉 The Powerful Landscape of Natural Language Processing.
Click: https://bit.ly/2UUeftt
NLP has changed the way we interact with machine and computers. 𝐖𝐡𝐚𝐭 𝐬𝐭𝐚𝐫𝐭𝐞𝐝 𝐚𝐬 𝐜𝐨𝐦𝐩𝐥𝐢𝐜𝐚𝐭𝐞𝐝, 𝐡𝐚𝐧𝐝𝐰𝐫𝐢𝐭𝐭𝐞𝐧 𝐟𝐨𝐫𝐦𝐮𝐥𝐚𝐬 is now a streamlined set of algorithms powered by AI.
𝐍𝐋𝐏 𝐭𝐞𝐜𝐡𝐧𝐨𝐥𝐨𝐠𝐢𝐞𝐬 will be the underlying force for transformation from data driven to intelligence driven endeavors, as they shape and improve communication technology in the years to come.
Hoax analyzer for Indonesian news using RNNs with fasttext and glove embeddingsjournalBEEI
Misinformation has become an innocuous yet potentially harmful problem ever since the development of internet. Numbers of efforts are done to prevent the consumption of misinformation, including the use of artificial intelligence (AI), mainly natural language processing (NLP). Unfortunately, most of natural language processing use English as its linguistic approach since English is a high resource language. On the contrary, Indonesia language is considered a low resource language thus the amount of effort to diminish consumption of misinformation is low compared to English-based natural language processing. This experiment is intended to compare fastText and GloVe embeddings for four deep neural networks (DNN) models: long short-term memory (LSTM), bidirectional long short-term memory (BI-LSTM), gated recurrent unit (GRU) and bidirectional gated recurrent unit (BI-GRU) in terms of metrics score when classifying news between three classes: fake, valid, and satire. The latter results show that fastText embedding is better than GloVe embedding in supervised text classification, along with BI-GRU + fastText yielding the best result.
In this lecture we explore how big datasets can be used with the Weka workbench and what other issues are currently under discussion in the real world, for ex: big data applications, predictive linguistic analysis, new platforms and new programming languages.
This document describes improvements made to an existing natural language processing algorithm for detecting phishing emails. The original algorithm analyzed sentence structures and word relationships but had low recall. The author evaluated its performance on 500 legitimate and 438 phishing emails, finding a precision of 100% but recall of only 12%. Improvements included optimizing performance, adapting the verb-noun pair blacklist to focus on phishing emails, and combining it with a link analysis program. This increased precision to 99% and recall to 73.5%. The combined approach analyzes both email content and links to better identify sophisticated phishing emails that contain no malicious links.
“C’mon – You Should Read This”: Automatic Identification of Tone from Languag...Waqas Tariq
Information extraction researchers have recently recognized that more subtle information beyond the basic semantic content of a message can be communicated via linguistic features in text, such as sentiments, emotions, perspectives, and intentions. One way to describe this information is that it represents something about the generator’s mental state, which is often interpreted as the tone of the message. A current technical barrier to developing a general-purpose tone identification system is the lack of reliable training data, with messages annotated with the message tone. We first describe a method for creating the necessary annotated data using human-based computation, based on interactive games between humans trying to generate and interpret messages conveying different tones. This draws on the use of game with a purpose methods from computer science and wisdom of the crowds methods from cognitive science. We then demonstrate the utility of this kind of database and the advantage of human-based computation by examining the performance of two machine learning classifiers trained on the database, each of which uses only shallow linguistic features. Though we already find near-human levels of performance with one classifier, we also suggest more sophisticated linguistic features and alternate implementations for the database that may improve tone identification results further.
Nautral Langauge Processing - Basics / Non Technical Dhruv Gohil
This document provides an overview of natural language processing (NLP) and discusses several NLP applications. It introduces NLP and how it helps computers understand human language through examples like Apple's Siri and Google Now. It then summarizes popular NLP toolkits and describes applications including text summarization, information extraction, sentiment analysis, and dialog systems. The document concludes by discussing NLP system development, testing, and evaluation.
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...ijnlc
In this paper, we proposed a XAI-based Language Learning Chatbot (namely XAI Language Tutor) by using ontology and transfer learning techniques. To facilitate three levels of language learning, XAI Language Tutor consists of three levels for systematically English learning, which includes: 1) phonetics level for speech recognition and pronunciation correction; 2) semantic level for specific domain conversation, and 3) simulation of “free-style conversation” in English - the highest level of language chatbot communication as “free-style conversation agent”. In terms of academic contribution, we implement the ontology graph to explain the performance of free-style conversation, following the concept of XAI (Explainable Artificial Intelligence) to visualize the connections of neural network in bionics, and explain the output sentence from language model. From implementation perspective, our XAI Language Tutor agent integrated the mini-program in WeChat as front-end, and fine-tuned GPT-2 model of transfer learning as back-end to interpret the responses by ontology graph.
All of our source codes have uploaded to GitHub: https://github.com/p930203110/EnglishLanguageRobot
This document presents research on using machine learning and affect analysis to detect cyber-bullying. It introduces cyber-bullying and describes a machine learning method used, including constructing a lexicon of harmful words, estimating word similarities, and classifying entries using SVM. Affect analysis of the data found cyber-bullying language is less emotive and uses vulgarities and mimetic expressions. Future work aims to improve detection of new vulgarities and implement automated online patrol.
𝐓𝐚𝐤𝐞 𝐚 𝐭𝐨𝐮𝐫: 𝐎𝐮𝐫 𝐥𝐚𝐭𝐞𝐬𝐭 𝐁𝐥𝐨𝐠 𝐢𝐬 𝐏𝐮𝐛𝐥𝐢𝐬𝐡𝐞𝐝 𝐧𝐨𝐰👉 The Powerful Landscape of Natural Language Processing.
Click: https://bit.ly/2UUeftt
NLP has changed the way we interact with machine and computers. 𝐖𝐡𝐚𝐭 𝐬𝐭𝐚𝐫𝐭𝐞𝐝 𝐚𝐬 𝐜𝐨𝐦𝐩𝐥𝐢𝐜𝐚𝐭𝐞𝐝, 𝐡𝐚𝐧𝐝𝐰𝐫𝐢𝐭𝐭𝐞𝐧 𝐟𝐨𝐫𝐦𝐮𝐥𝐚𝐬 is now a streamlined set of algorithms powered by AI.
𝐍𝐋𝐏 𝐭𝐞𝐜𝐡𝐧𝐨𝐥𝐨𝐠𝐢𝐞𝐬 will be the underlying force for transformation from data driven to intelligence driven endeavors, as they shape and improve communication technology in the years to come.
Hoax analyzer for Indonesian news using RNNs with fasttext and glove embeddingsjournalBEEI
Misinformation has become an innocuous yet potentially harmful problem ever since the development of internet. Numbers of efforts are done to prevent the consumption of misinformation, including the use of artificial intelligence (AI), mainly natural language processing (NLP). Unfortunately, most of natural language processing use English as its linguistic approach since English is a high resource language. On the contrary, Indonesia language is considered a low resource language thus the amount of effort to diminish consumption of misinformation is low compared to English-based natural language processing. This experiment is intended to compare fastText and GloVe embeddings for four deep neural networks (DNN) models: long short-term memory (LSTM), bidirectional long short-term memory (BI-LSTM), gated recurrent unit (GRU) and bidirectional gated recurrent unit (BI-GRU) in terms of metrics score when classifying news between three classes: fake, valid, and satire. The latter results show that fastText embedding is better than GloVe embedding in supervised text classification, along with BI-GRU + fastText yielding the best result.
In this lecture we explore how big datasets can be used with the Weka workbench and what other issues are currently under discussion in the real world, for ex: big data applications, predictive linguistic analysis, new platforms and new programming languages.
This document describes improvements made to an existing natural language processing algorithm for detecting phishing emails. The original algorithm analyzed sentence structures and word relationships but had low recall. The author evaluated its performance on 500 legitimate and 438 phishing emails, finding a precision of 100% but recall of only 12%. Improvements included optimizing performance, adapting the verb-noun pair blacklist to focus on phishing emails, and combining it with a link analysis program. This increased precision to 99% and recall to 73.5%. The combined approach analyzes both email content and links to better identify sophisticated phishing emails that contain no malicious links.
“C’mon – You Should Read This”: Automatic Identification of Tone from Languag...Waqas Tariq
Information extraction researchers have recently recognized that more subtle information beyond the basic semantic content of a message can be communicated via linguistic features in text, such as sentiments, emotions, perspectives, and intentions. One way to describe this information is that it represents something about the generator’s mental state, which is often interpreted as the tone of the message. A current technical barrier to developing a general-purpose tone identification system is the lack of reliable training data, with messages annotated with the message tone. We first describe a method for creating the necessary annotated data using human-based computation, based on interactive games between humans trying to generate and interpret messages conveying different tones. This draws on the use of game with a purpose methods from computer science and wisdom of the crowds methods from cognitive science. We then demonstrate the utility of this kind of database and the advantage of human-based computation by examining the performance of two machine learning classifiers trained on the database, each of which uses only shallow linguistic features. Though we already find near-human levels of performance with one classifier, we also suggest more sophisticated linguistic features and alternate implementations for the database that may improve tone identification results further.
Nautral Langauge Processing - Basics / Non Technical Dhruv Gohil
This document provides an overview of natural language processing (NLP) and discusses several NLP applications. It introduces NLP and how it helps computers understand human language through examples like Apple's Siri and Google Now. It then summarizes popular NLP toolkits and describes applications including text summarization, information extraction, sentiment analysis, and dialog systems. The document concludes by discussing NLP system development, testing, and evaluation.
Natural language processing using pythonPrakash Anand
Natural language processing (NLP) is concerned with interactions between computers and human languages. NLP analyzes text to handle tasks like summarization, translation, sentiment analysis, and topic segmentation. The Natural Language Toolkit (NLTK) is a Python library that provides tools for NLP tasks like tokenization, stemming, tagging, parsing, and classification. Tokenization is the process of splitting text into tokens or chunks. Bag-of-words is an algorithm that encodes text as numeric vectors, representing word presence or absence to allow machine learning on text data. NLP has applications in areas like spam filtering, chatbots, and sentiment analysis.
Mining Opinion Features in Customer ReviewsIJCERT JOURNAL
Now days, E-commerce systems have become extremely important. Large numbers of customers are choosing online shopping because of its convenience, reliability, and cost. Client generated information and especially item reviews are significant sources of data for consumers to make informed buy choices and for makers to keep track of customer’s opinions. It is difficult for customers to make purchasing decisions based on only pictures and short product descriptions. On the other hand, mining product reviews has become a hot research topic and prior researches are mostly based on pre-specified product features to analyse the opinions. Natural Language Processing (NLP) techniques such as NLTK for Python can be applied to raw customer reviews and keywords can be extracted. This paper presents a survey on the techniques used for designing software to mine opinion features in reviews. Elven IEEE papers are selected and a comparison is made between them. These papers are representative of the significant improvements in opinion mining in the past decade.
Detecting the presence of cyberbullying using computer softwareAshish Arora
This document discusses methods for detecting cyberbullying using machine learning techniques. It proposes using a machine learning online patrol crawler that utilizes support vector machines to classify social media posts as containing cyberbullying or not. The crawler is trained on manually identified examples of cyberbullying comments involving negativity, profanity, and attacks on attributes. It also discusses using sentiment analysis and social graphs to identify bullies and victims on Twitter by analyzing tweets containing gender-based terms. The goal is to accurately classify sentiment and identify instances of bullying to increase visibility of the problem.
1) The document presents a framework called Lexical Syntactic Feature (LSF) architecture to detect offensive content and identify potentially offensive users in social media. It uses word lists, syntactic rules, and user profile features to analyze content and predict offensiveness at both the sentence and user level.
2) Experiments show the LSF framework performs significantly better than baseline methods in detecting offensive content and identifying offensive users, with over 70% recall and high precision.
3) The framework incorporates features like word strength, punctuation, capitalization, and message/user histories to analyze content in context and predict user behaviors, achieving better results than methods using only individual messages or basic machine learning.
This document introduces transfer learning and its importance for natural language processing (NLP). It discusses how transfer learning allows knowledge gained from one task or domain to be applied to another, similar to how humans learn. Large companies can train complex neural networks on vast datasets, but this requires massive resources. Transfer learning addresses this by enabling models pretrained on huge datasets to be fine-tuned and applied to new problems with far fewer resources. This paradigm has been key to advancing and democratizing NLP techniques.
Predicting cyber bullying on t witter using machine learningMirXahid1
The document discusses a project that aims to predict cyberbullying on Twitter using machine learning. The objectives are to use natural language processing techniques like sentiment analysis and part-of-speech tagging on tweets to extract features and train a machine learning model using these features to predict if a tweet contains cyberbullying or not. The proposed system collects tweets, converts emojis and audio to text, preprocesses the data by removing stop words and punctuation, and trains a recurrent neural network model for prediction. The current status is that the team has collected datasets and is working with them. Hardware and software requirements for the project are also outlined.
A MODEL BASED ON SENTIMENTS ANALYSIS FOR STOCK EXCHANGE PREDICTION - CASE STU...csandit
Predicting the behavior of shares in the stock market is a complex problem, that involves variables not always known and can undergo various influences, from the collective emotion to high-profile news. Such volatility, can represent considerable financial losses for investors. In order to anticipate such changes in the market, it has been proposed various mechanisms to try to predict the behavior of an asset in the stock market, based on previously existing information.
Such mechanisms include statistical data only, without considering the collective feeling. This article, is going to use natural language processing algorithms (LPN) to determine the collective mood on assets and later with the help of the SVM algorithm to extract patterns in an attempt to predict the active behavior. Nevertheless it is important to note that such approach is not intended to be the main factor in the decision making process, but rather an aid tool, which combined with other information, can provide higher accuracy for the solution of this problem
The document provides an overview of sentiment analysis and summarizes the current approaches used. It discusses how machine learning classifiers like Naive Bayes can be used for sentiment classification of texts, treating it as a two-class text classification problem. It also mentions the use of natural language processing techniques. The current system discussed will use machine learning and NLP for sentiment analysis of tweets, training classifiers on labeled tweet data to classify the polarity of new tweets.
Fairy Tale Writing Unit By Miss Teacher Tess TeachJody Sullivan
The document provides instructions for students to request assistance writing assignments from the HelpWriting.net website. It outlines a 5-step process: 1) Create an account; 2) Complete an order form with instructions and deadline; 3) Review bids from writers and select one; 4) Review the completed paper and authorize payment; 5) Request revisions until satisfied. The document emphasizes that original, high-quality work is guaranteed or a full refund will be provided.
ChatGPT is a natural language processing model created by OpenAI that can generate human-like responses to text-based conversations. It uses deep learning and was pre-trained on vast amounts of text to understand language. Performance is evaluated using metrics like perplexity, accuracy, fluency and human evaluation. While very powerful, ChatGPT and other large language models raise legal and ethical concerns regarding copyright, privacy, bias and how training data was obtained. The future potential includes using conversational AI to streamline operations in areas like data entry, scheduling and customer service.
The Auto response system for legal consultation will provide the knowledge of cyber laws. The objective is to implement the legal consultant system service by using chat bot Technology. It was implemented based on the information of the offence, previous records of cyber crimes and under sections of INDIAN IT ACT 2008 and their penalty all the records are gathered. User will input the offence then the chat bot will take the keyword from that offence and search for the law for that particular offence or crime and it will show the penalties, sections, and imprisonment for that offence or crime. Nikita Bhanushali | Shruti Habibkar | Sagar Shah | Amol Dhumal | Radhika Fulzele "Auto Response System for Legal Consultation" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-4 , June 2020, URL: https://www.ijtsrd.com/papers/ijtsrd31045.pdf Paper Url :https://www.ijtsrd.com/engineering/computer-engineering/31045/auto-response-system-for-legal-consultation/nikita-bhanushali
Using Generative AI in the Classroom .pptxJonathanDietz3
Here are some key ethical issues to consider when using generative AI like ChatGPT in the classroom:
1. Accuracy and reliability of information. Students may take generative AI outputs as fact without verifying the information. Teachers need to emphasize to students that AI systems can be wrong or generate implausible responses.
2. Bias and unfair treatment. As the systems are trained on human-created data, they risk perpetuating biases in that data if not developed carefully. Teachers should be aware of potential biases.
3. Privacy and consent. Student data used to improve systems raises privacy issues. Systems should not collect private student data without permission.
4. Authorship and ownership. It may not be clear
Predicting Discussions on the Social Semantic WebMatthew Rowe
This document discusses predicting discussions on social media platforms. It first notes the large amount of social data being published and then discusses how analysis of this data is currently limited. It proposes predicting which posts will start discussions ("seed posts") and how active those discussions will be. The document describes experiments using semantic features and ontologies to identify seed posts and predict discussion volume. Key findings include that user reputation, broadcast reach, and connections influence discussion likelihood and levels. The approach accurately predicts which posts will spark replies and how active discussions will be.
The document discusses developing an open domain chatbot using sequence modeling and machine translation techniques. It provides background on early rule-based chatbots and modern data-driven approaches. The proposed methodology collects data, performs word embeddings, uses an encoder-decoder model with attention to generate responses, and evaluates the model using metrics like F1 score.
This document summarizes a survey paper on chatbots. It discusses how chatbots can be used to relieve stress in adolescents through continuous dialogue that provides positive information and guidance. The proposed "HappySoul" chatbot system would act as a virtual friend to help stressed adolescents express their negative feelings and release stress. The technology at the core of such chatbots is natural language processing, recurrent neural networks, and a client-server architecture with an Android GUI. Key applications of chatbots discussed include assisting dementia patients, helping insomniacs, allowing marginalized communities to provide feedback, and making medical diagnoses faster.
This document summarizes a survey paper on chatbots. It discusses how chatbots use natural language processing to understand user queries and generate responses in a conversational manner. The document outlines the methodology of the proposed chatbot, which uses a client-server architecture with an Android application as the front-end and a recurrent neural network on the server to process inputs and generate outputs. The chatbot is intended to help relieve stress and anxiety in adolescents through positive conversations.
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...Paul Brebner
Closing talk for the Performance Engineering track at Community Over Code EU (Bratislava, Slovakia, June 5 2024) https://eu.communityovercode.org/sessions/2024/why-apache-kafka-clusters-are-like-galaxies-and-other-cosmic-kafka-quandaries-explored/ Instaclustr (now part of NetApp) manages 100s of Apache Kafka clusters of many different sizes, for a variety of use cases and customers. For the last 7 years I’ve been focused outwardly on exploring Kafka application development challenges, but recently I decided to look inward and see what I could discover about the performance, scalability and resource characteristics of the Kafka clusters themselves. Using a suite of Performance Engineering techniques, I will reveal some surprising discoveries about cosmic Kafka mysteries in our data centres, related to: cluster sizes and distribution (using Zipf’s Law), horizontal vs. vertical scalability, and predicting Kafka performance using metrics, modelling and regression techniques. These insights are relevant to Kafka developers and operators.
Manyata Tech Park Bangalore_ Infrastructure, Facilities and Morenarinav14
Located in the bustling city of Bangalore, Manyata Tech Park stands as one of India’s largest and most prominent tech parks, playing a pivotal role in shaping the city’s reputation as the Silicon Valley of India. Established to cater to the burgeoning IT and technology sectors
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...The Third Creative Media
"Navigating Invideo: A Comprehensive Guide" is an essential resource for anyone looking to master Invideo, an AI-powered video creation tool. This guide provides step-by-step instructions, helpful tips, and comparisons with other AI video creators. Whether you're a beginner or an experienced video editor, you'll find valuable insights to enhance your video projects and bring your creative ideas to life.
Natural language processing using pythonPrakash Anand
Natural language processing (NLP) is concerned with interactions between computers and human languages. NLP analyzes text to handle tasks like summarization, translation, sentiment analysis, and topic segmentation. The Natural Language Toolkit (NLTK) is a Python library that provides tools for NLP tasks like tokenization, stemming, tagging, parsing, and classification. Tokenization is the process of splitting text into tokens or chunks. Bag-of-words is an algorithm that encodes text as numeric vectors, representing word presence or absence to allow machine learning on text data. NLP has applications in areas like spam filtering, chatbots, and sentiment analysis.
Mining Opinion Features in Customer ReviewsIJCERT JOURNAL
Now days, E-commerce systems have become extremely important. Large numbers of customers are choosing online shopping because of its convenience, reliability, and cost. Client generated information and especially item reviews are significant sources of data for consumers to make informed buy choices and for makers to keep track of customer’s opinions. It is difficult for customers to make purchasing decisions based on only pictures and short product descriptions. On the other hand, mining product reviews has become a hot research topic and prior researches are mostly based on pre-specified product features to analyse the opinions. Natural Language Processing (NLP) techniques such as NLTK for Python can be applied to raw customer reviews and keywords can be extracted. This paper presents a survey on the techniques used for designing software to mine opinion features in reviews. Elven IEEE papers are selected and a comparison is made between them. These papers are representative of the significant improvements in opinion mining in the past decade.
Detecting the presence of cyberbullying using computer softwareAshish Arora
This document discusses methods for detecting cyberbullying using machine learning techniques. It proposes using a machine learning online patrol crawler that utilizes support vector machines to classify social media posts as containing cyberbullying or not. The crawler is trained on manually identified examples of cyberbullying comments involving negativity, profanity, and attacks on attributes. It also discusses using sentiment analysis and social graphs to identify bullies and victims on Twitter by analyzing tweets containing gender-based terms. The goal is to accurately classify sentiment and identify instances of bullying to increase visibility of the problem.
1) The document presents a framework called Lexical Syntactic Feature (LSF) architecture to detect offensive content and identify potentially offensive users in social media. It uses word lists, syntactic rules, and user profile features to analyze content and predict offensiveness at both the sentence and user level.
2) Experiments show the LSF framework performs significantly better than baseline methods in detecting offensive content and identifying offensive users, with over 70% recall and high precision.
3) The framework incorporates features like word strength, punctuation, capitalization, and message/user histories to analyze content in context and predict user behaviors, achieving better results than methods using only individual messages or basic machine learning.
This document introduces transfer learning and its importance for natural language processing (NLP). It discusses how transfer learning allows knowledge gained from one task or domain to be applied to another, similar to how humans learn. Large companies can train complex neural networks on vast datasets, but this requires massive resources. Transfer learning addresses this by enabling models pretrained on huge datasets to be fine-tuned and applied to new problems with far fewer resources. This paradigm has been key to advancing and democratizing NLP techniques.
Predicting cyber bullying on t witter using machine learningMirXahid1
The document discusses a project that aims to predict cyberbullying on Twitter using machine learning. The objectives are to use natural language processing techniques like sentiment analysis and part-of-speech tagging on tweets to extract features and train a machine learning model using these features to predict if a tweet contains cyberbullying or not. The proposed system collects tweets, converts emojis and audio to text, preprocesses the data by removing stop words and punctuation, and trains a recurrent neural network model for prediction. The current status is that the team has collected datasets and is working with them. Hardware and software requirements for the project are also outlined.
A MODEL BASED ON SENTIMENTS ANALYSIS FOR STOCK EXCHANGE PREDICTION - CASE STU...csandit
Predicting the behavior of shares in the stock market is a complex problem, that involves variables not always known and can undergo various influences, from the collective emotion to high-profile news. Such volatility, can represent considerable financial losses for investors. In order to anticipate such changes in the market, it has been proposed various mechanisms to try to predict the behavior of an asset in the stock market, based on previously existing information.
Such mechanisms include statistical data only, without considering the collective feeling. This article, is going to use natural language processing algorithms (LPN) to determine the collective mood on assets and later with the help of the SVM algorithm to extract patterns in an attempt to predict the active behavior. Nevertheless it is important to note that such approach is not intended to be the main factor in the decision making process, but rather an aid tool, which combined with other information, can provide higher accuracy for the solution of this problem
The document provides an overview of sentiment analysis and summarizes the current approaches used. It discusses how machine learning classifiers like Naive Bayes can be used for sentiment classification of texts, treating it as a two-class text classification problem. It also mentions the use of natural language processing techniques. The current system discussed will use machine learning and NLP for sentiment analysis of tweets, training classifiers on labeled tweet data to classify the polarity of new tweets.
Fairy Tale Writing Unit By Miss Teacher Tess TeachJody Sullivan
The document provides instructions for students to request assistance writing assignments from the HelpWriting.net website. It outlines a 5-step process: 1) Create an account; 2) Complete an order form with instructions and deadline; 3) Review bids from writers and select one; 4) Review the completed paper and authorize payment; 5) Request revisions until satisfied. The document emphasizes that original, high-quality work is guaranteed or a full refund will be provided.
ChatGPT is a natural language processing model created by OpenAI that can generate human-like responses to text-based conversations. It uses deep learning and was pre-trained on vast amounts of text to understand language. Performance is evaluated using metrics like perplexity, accuracy, fluency and human evaluation. While very powerful, ChatGPT and other large language models raise legal and ethical concerns regarding copyright, privacy, bias and how training data was obtained. The future potential includes using conversational AI to streamline operations in areas like data entry, scheduling and customer service.
The Auto response system for legal consultation will provide the knowledge of cyber laws. The objective is to implement the legal consultant system service by using chat bot Technology. It was implemented based on the information of the offence, previous records of cyber crimes and under sections of INDIAN IT ACT 2008 and their penalty all the records are gathered. User will input the offence then the chat bot will take the keyword from that offence and search for the law for that particular offence or crime and it will show the penalties, sections, and imprisonment for that offence or crime. Nikita Bhanushali | Shruti Habibkar | Sagar Shah | Amol Dhumal | Radhika Fulzele "Auto Response System for Legal Consultation" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-4 , June 2020, URL: https://www.ijtsrd.com/papers/ijtsrd31045.pdf Paper Url :https://www.ijtsrd.com/engineering/computer-engineering/31045/auto-response-system-for-legal-consultation/nikita-bhanushali
Using Generative AI in the Classroom .pptxJonathanDietz3
Here are some key ethical issues to consider when using generative AI like ChatGPT in the classroom:
1. Accuracy and reliability of information. Students may take generative AI outputs as fact without verifying the information. Teachers need to emphasize to students that AI systems can be wrong or generate implausible responses.
2. Bias and unfair treatment. As the systems are trained on human-created data, they risk perpetuating biases in that data if not developed carefully. Teachers should be aware of potential biases.
3. Privacy and consent. Student data used to improve systems raises privacy issues. Systems should not collect private student data without permission.
4. Authorship and ownership. It may not be clear
Predicting Discussions on the Social Semantic WebMatthew Rowe
This document discusses predicting discussions on social media platforms. It first notes the large amount of social data being published and then discusses how analysis of this data is currently limited. It proposes predicting which posts will start discussions ("seed posts") and how active those discussions will be. The document describes experiments using semantic features and ontologies to identify seed posts and predict discussion volume. Key findings include that user reputation, broadcast reach, and connections influence discussion likelihood and levels. The approach accurately predicts which posts will spark replies and how active discussions will be.
The document discusses developing an open domain chatbot using sequence modeling and machine translation techniques. It provides background on early rule-based chatbots and modern data-driven approaches. The proposed methodology collects data, performs word embeddings, uses an encoder-decoder model with attention to generate responses, and evaluates the model using metrics like F1 score.
This document summarizes a survey paper on chatbots. It discusses how chatbots can be used to relieve stress in adolescents through continuous dialogue that provides positive information and guidance. The proposed "HappySoul" chatbot system would act as a virtual friend to help stressed adolescents express their negative feelings and release stress. The technology at the core of such chatbots is natural language processing, recurrent neural networks, and a client-server architecture with an Android GUI. Key applications of chatbots discussed include assisting dementia patients, helping insomniacs, allowing marginalized communities to provide feedback, and making medical diagnoses faster.
This document summarizes a survey paper on chatbots. It discusses how chatbots use natural language processing to understand user queries and generate responses in a conversational manner. The document outlines the methodology of the proposed chatbot, which uses a client-server architecture with an Android application as the front-end and a recurrent neural network on the server to process inputs and generate outputs. The chatbot is intended to help relieve stress and anxiety in adolescents through positive conversations.
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...Paul Brebner
Closing talk for the Performance Engineering track at Community Over Code EU (Bratislava, Slovakia, June 5 2024) https://eu.communityovercode.org/sessions/2024/why-apache-kafka-clusters-are-like-galaxies-and-other-cosmic-kafka-quandaries-explored/ Instaclustr (now part of NetApp) manages 100s of Apache Kafka clusters of many different sizes, for a variety of use cases and customers. For the last 7 years I’ve been focused outwardly on exploring Kafka application development challenges, but recently I decided to look inward and see what I could discover about the performance, scalability and resource characteristics of the Kafka clusters themselves. Using a suite of Performance Engineering techniques, I will reveal some surprising discoveries about cosmic Kafka mysteries in our data centres, related to: cluster sizes and distribution (using Zipf’s Law), horizontal vs. vertical scalability, and predicting Kafka performance using metrics, modelling and regression techniques. These insights are relevant to Kafka developers and operators.
Manyata Tech Park Bangalore_ Infrastructure, Facilities and Morenarinav14
Located in the bustling city of Bangalore, Manyata Tech Park stands as one of India’s largest and most prominent tech parks, playing a pivotal role in shaping the city’s reputation as the Silicon Valley of India. Established to cater to the burgeoning IT and technology sectors
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...The Third Creative Media
"Navigating Invideo: A Comprehensive Guide" is an essential resource for anyone looking to master Invideo, an AI-powered video creation tool. This guide provides step-by-step instructions, helpful tips, and comparisons with other AI video creators. Whether you're a beginner or an experienced video editor, you'll find valuable insights to enhance your video projects and bring your creative ideas to life.
WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...Luigi Fugaro
Vector databases are transforming how we handle data, allowing us to search through text, images, and audio by converting them into vectors. Today, we'll dive into the basics of this exciting technology and discuss its potential to revolutionize our next-generation AI applications. We'll examine typical uses for these databases and the essential tools
developers need. Plus, we'll zoom in on the advanced capabilities of vector search and semantic caching in Java, showcasing these through a live demo with Redis libraries. Get ready to see how these powerful tools can change the game!
Boost Your Savings with These Money Management AppsJhone kinadey
A money management app can transform your financial life by tracking expenses, creating budgets, and setting financial goals. These apps offer features like real-time expense tracking, bill reminders, and personalized insights to help you save and manage money effectively. With a user-friendly interface, they simplify financial planning, making it easier to stay on top of your finances and achieve long-term financial stability.
8 Best Automated Android App Testing Tool and Framework in 2024.pdfkalichargn70th171
Regarding mobile operating systems, two major players dominate our thoughts: Android and iPhone. With Android leading the market, software development companies are focused on delivering apps compatible with this OS. Ensuring an app's functionality across various Android devices, OS versions, and hardware specifications is critical, making Android app testing essential.
Transforming Product Development using OnePlan To Boost Efficiency and Innova...OnePlan Solutions
Ready to overcome challenges and drive innovation in your organization? Join us in our upcoming webinar where we discuss how to combat resource limitations, scope creep, and the difficulties of aligning your projects with strategic goals. Discover how OnePlan can revolutionize your product development processes, helping your team to innovate faster, manage resources more effectively, and deliver exceptional results.
What to do when you have a perfect model for your software but you are constrained by an imperfect business model?
This talk explores the challenges of bringing modelling rigour to the business and strategy levels, and talking to your non-technical counterparts in the process.
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...kalichargn70th171
In today's fiercely competitive mobile app market, the role of the QA team is pivotal for continuous improvement and sustained success. Effective testing strategies are essential to navigate the challenges confidently and precisely. Ensuring the perfection of mobile apps before they reach end-users requires thoughtful decisions in the testing plan.
Photoshop Tutorial for Beginners (2024 Edition)alowpalsadig
Photoshop Tutorial for Beginners (2024 Edition)
Explore the evolution of programming and software development and design in 2024. Discover emerging trends shaping the future of coding in our insightful analysis."
Here's an overview:Introduction: The Evolution of Programming and Software DevelopmentThe Rise of Artificial Intelligence and Machine Learning in CodingAdopting Low-Code and No-Code PlatformsQuantum Computing: Entering the Software Development MainstreamIntegration of DevOps with Machine Learning: MLOpsAdvancements in Cybersecurity PracticesThe Growth of Edge ComputingEmerging Programming Languages and FrameworksSoftware Development Ethics and AI RegulationSustainability in Software EngineeringThe Future Workforce: Remote and Distributed TeamsConclusion: Adapting to the Changing Software Development LandscapeIntroduction: The Evolution of Programming and Software Development
Photoshop Tutorial for Beginners (2024 Edition)Explore the evolution of programming and software development and design in 2024. Discover emerging trends shaping the future of coding in our insightful analysis."Here's an overview:Introduction: The Evolution of Programming and Software DevelopmentThe Rise of Artificial Intelligence and Machine Learning in CodingAdopting Low-Code and No-Code PlatformsQuantum Computing: Entering the Software Development MainstreamIntegration of DevOps with Machine Learning: MLOpsAdvancements in Cybersecurity PracticesThe Growth of Edge ComputingEmerging Programming Languages and FrameworksSoftware Development Ethics and AI RegulationSustainability in Software EngineeringThe Future Workforce: Remote and Distributed TeamsConclusion: Adapting to the Changing Software Development LandscapeIntroduction: The Evolution of Programming and Software Development
The importance of developing and designing programming in 2024
Programming design and development represents a vital step in keeping pace with technological advancements and meeting ever-changing market needs. This course is intended for anyone who wants to understand the fundamental importance of software development and design, whether you are a beginner or a professional seeking to update your knowledge.
Course objectives:
1. **Learn about the basics of software development:
- Understanding software development processes and tools.
- Identify the role of programmers and designers in software projects.
2. Understanding the software design process:
- Learn about the principles of good software design.
- Discussing common design patterns such as Object-Oriented Design.
3. The importance of user experience (UX) in modern software:
- Explore how user experience can improve software acceptance and usability.
- Tools and techniques to analyze and improve user experience.
4. Increase efficiency and productivity through modern development tools:
- Access to the latest programming tools and languages used in the industry.
- Study live examples of applications
DevOps Consulting Company | Hire DevOps Servicesseospiralmantra
Spiral Mantra excels in providing comprehensive DevOps services, including Azure and AWS DevOps solutions. As a top DevOps consulting company, we offer controlled services, cloud DevOps, and expert consulting nationwide, including Houston and New York. Our skilled DevOps engineers ensure seamless integration and optimized operations for your business. Choose Spiral Mantra for superior DevOps services.
https://www.spiralmantra.com/devops/
Malibou Pitch Deck For Its €3M Seed Roundsjcobrien
French start-up Malibou raised a €3 million Seed Round to develop its payroll and human resources
management platform for VSEs and SMEs. The financing round was led by investors Breega, Y Combinator, and FCVC.
1. See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/256194568
Negobot: A Conversational Agent Based on Game Theory for the Detection of
Paedophile Behaviour
Chapter · January 2013
DOI: 10.1007/978-3-642-33018-6_27
CITATIONS
6
READS
85
6 authors, including:
Some of the authors of this publication are also working on these related projects:
DANTE- Detecting and analysing terrorist-related online contents and financing activities View project
Supervised Machine Learning for the Detection of Troll Profiles in Twitter Social Network: Application to a Real Case of Cyberbullying View project
Carlos Laorden
University of Deusto
47 PUBLICATIONS 560 CITATIONS
SEE PROFILE
Patxi Galán-García
University of Deusto
12 PUBLICATIONS 97 CITATIONS
SEE PROFILE
Igor Santos
University of Deusto
110 PUBLICATIONS 1,184 CITATIONS
SEE PROFILE
Borja Sanz
University of Deusto
42 PUBLICATIONS 500 CITATIONS
SEE PROFILE
All content following this page was uploaded by Patxi Galán-García on 30 March 2016.
The user has requested enhancement of the downloaded file.
2. Negobot: A conversational agent based on game
theory for the detection of paedophile behaviour
Carlos Laorden1
, Patxi Gal´an-Garc´ıa1
, Igor Santos1
, Borja Sanz1
, Jose Mar´ıa
G´omez Hidalgo2
and Pablo G. Bringas1
1
S3
Lab - DeustoTech Computing, University of Deusto
Avenida de las Universidades 24, 48007 Bilbao, Spain
{claorden,patxigg,isantos,borja.sanz,pablo.garcia.bringas}@deusto.es
2
Optenet
Madrid, Spain
jgomez@optenet.com
Abstract. Children have been increasingly becoming active users of
the Internet and, although any segment of the population is suscepti-
ble to falling victim to the existing risks, they in particular are one of
the most vulnerable. Thus, some of the major scourges of this cyber-
society are paedophile behaviours on the Internet, child pornography or
sexual exploitation of children. In light of this background, Negobot is
a conversational agent posing as a child, in chats, social networks and
other channels suffering from paedophile behaviour. As a conversational
agent, Negobot, has a strong technical base of Natural Language Pro-
cessing and information retrieval, as well as Artificial Intelligence and
Machine Learning. However, the most innovative proposal of Negobot
is to consider the conversation itself as a game, applying game theory.
In this context, Negobot proposes, first, a competitive game in which
the system identifies the best strategies for achieving its goal, to obtain
information that leads us to infer if the subject involved in a conversa-
tion with the agent has paedophile tendencies, while our actions do not
bring the alleged offender to leave the conversation due to a suspicious
behaviour of the agent.
Keywords: conversational agent, game theory, natural language pro-
cessing
1 Introduction
Children have been turning into active users of the Internet and, despite every
segment of the population is susceptible to become a victim of the existing risks,
they in particular are the most vulnerable. In fact, one of the major scourges on
this cyber-society is paedophile behaviour, with examples such as child pornog-
raphy or sexual exploitation.
Some researchers have tried to approach this problem with automatic pae-
dophile behaviour identifiers able to analyse children and adult conversations us-
3. ing automatic classifiers either encoding manual rules [1] or by Machine Learning
[2]. These kind of conversation analysers are also present in commercial systems1
.
In this context, “Negobot: A conversational agent based on game theory for
the detection of paedophile behaviour” seeks to identify these types of actions
against children. With this particular goal in mind, our system is a chatter bot
that poses as a kid in chats, social networks and similar services on the Internet.
Because Negobot is a chatter bot, it uses natural language processing (NLP), in-
formation retrieval (IR) and Automatic Learning. However, the most innovative
proposal of Negobot is to apply game theory by considering the conversation
a game. Our system observes the conversation as a competition where our ob-
jective is to obtain as much information as possible in order to decide whether
the subject who is talking with Negobot has paedophile tendencies or not. At
the same time, our actions (e.g., phrases and questions) may lead to the alleged
aggressor to leave the conversation or even to act discretely.
As a major difference with others work, Negobot involves both a bot resem-
bling the behaviour of a child and acting as a “hook” to Internet predators, and
a game theory-based strategy for identifying suspicious speakers. Summarising,
the main contributions of the Negobot system are:
– A structure of seven chatter-bots with different behaviours, reflecting the
different ways of acting depending on the conversation state.
– A system to identify and adapt patterns within the conversations to maintain
conversation flows closer to conversation flows between real persons.
– An evaluation function to classify the current conversation, in real time,
to provide the chatter bot with specific information to follow an adequate
conversation flow.
2 Negobot architecture
Negobot includes the use of different NLP techniques, chatter-bot technologies
and game theory for the strategical decision making. Finally, the glue that binds
them all is an evaluation function, which in fact determines how the child emu-
lated by the conversational agent behaves.
When a new subject starts a conversation with Negobot the system is acti-
vated, and starts monitoring the input from the user. Besides, Negobot registers
the conversations maintained with every user for future references, and to keep
a record that could be sent to the authorities in case of determining that the
subject is a paedophile. Fig. 1 offers a functional flow of Negobot.
2.1 Processing the conversation
Our AI system’s knowledge came from the website Perverted Justice2
. This
website offers an extensive database of paedophile conversations with victims,
1
E.g. Crisp Thinking provides such a service to online communities: http://www.
crispthinking.com/
2
http://www.perverted-justice.com
4. Fig. 1. Functional flow of Negobot.
used in other works [2, 3, 1]. A total of 377 real conversations were chosen to
populate our database (henceforth they will be called assimilated conversations).
Besides, Perverted Justice users provide an evaluation of each conversation’s
seriousness by selecting a level of “slimyness” (e.g., dungy, revolting thing). Note
that this evaluation is given by the website’s visitors, so it may not be accurate,
but we consider that it is a proper baseline in order to compare future conver-
sations of the chatter-bot.
2.2 Chat-bot conversation process
Every time the user introduces some text our system gathers the input data,
from now on “Conversational Unit” (CU), and character and word repetitions
are removed. Then, “emoticons” are replaced and misspelled words are corrected
through Levenshtein distance [6]. To translate SMS-like words, we use diccionar-
ioSMS3
, a website supported by MSN, Movistar, Vodafone and Lleida.net. Ne-
gobot also recognises named entities relying on a database of personal nouns4
.
3
www.diccionariosms.com
4
www.planetamama.com.ar
5. Next, the CU is translated into English if the input language is different.
This translation is performed with Google’s Translation Service5
. The decision
of translating resides on the intention of normalising Negobot’s knowledge base
to English, to be able to scale the system to other languages.
In the next step the system queries Lucene6
, a high-performance Information
Retrieval tool, to stablish how similar is the CU to the assimilated conversations,
returning a similarity rank. Based on that query, we retrieve the conversations
surpassing an empirically-defined similarity threshold. The final result is then
used to calculate the B value of the evaluation function (see Section 2.4).
Question-answering patterns Negobot uses the Artificial Intelligence Markup
Language (AIML) to provide the bot with the capacity of giving consistent an-
swers and, also, the ability to be an active part in the conversation and to start
new topics or discussions about the subject’s answers.
Although the AIML structure is based on the Galaia project7
, which has
successfully implanted derived projects in social networks and chat systems [4,
5, 7], we edited their AIML files to adequate them to our needs. Those files can
be found at the authors’ website8
.
2.3 Applying game theory
Applied game theory intends to provide a solution for a large number of different
problems through a deep analysis of the situation. Modelling these problems as
a game implies that there exist two or more teams or players and that the result
of the actions of one player depends on the actions taken by the other teams. It
is important to notice that game theory itself does not teach to play the game
but shows general strategies for competitive situations. The Negobot system
assumes that there are two players in a conversation: the alleged paedophile and
the system itself.
Goal The main goal of the Negobot system is to gather, through the conver-
sation, the maximum amount of information in order to evaluate it afterwards.
This evaluation will determine whether the subject is a paedophile or not, to
communicate, in the latter case, to pertinent authorities.
Conversation level The conversation level depends on the input data from the
subject and determines which action should be taken by the system. There are
seven levels consisting of seven different stages. In each stage, the subject talking
with the bot is evaluated. The result of the evaluation will determine whether
the subject that is talking is an alleged paedophile or not, depending on the
content of the conversation and changing or not the level of the conversation.
5
http://translate.google.com
6
http://lucene.apache.org/
7
http://papa.det.uvigo.es/ galaia/ES/
8
http://paginaspersonales.deusto.es/patxigg/recursos/NegobotAIML.zip
6. – Initial state (Start level or Level 0). In this level, the conversation has
started recently or it is within the fixed limits. The user can stay indefinitely
in this level if the conversation does not contain disturbing content. The
topics of conversation are trivial and the provided information about the
bot is brief: only the name, age, gender and home-town. The bot does not
provide more personal information until higher levels.
– Possibly not (Level -1). In this level, the subject talking to the bot, does
not want to continue the conversation. Since this is the first negative level,
the bot will try to reactivate the conversation. To this end, the bot will ask
for help about family issues, bullying or other types of adolescent problems.
– Probably not (Level -2). In this level, the user is too tired about the
conversation and his language and ways to leave it are less polite than before.
The conversation is almost lost. The strategy in this stage is to act as a victim
to which nobody pays any attention, looking for affection from somebody.
– Is not a paedophile (Level -3). In this level, the subject has stopped
talking to the bot. The strategy in this stage is to look for affection in
exchange for sex. We decided this strategy because a lot of paedophiles try
to hide themselves to not get caught.
– Possibly yes (Level +1). In this level, the subject shows interest in
the conversation and asks about personal topics. The topics of the bot are
favourite films, music, personal style, clothing, drugs and alcohol consump-
tion and family issues. The bot is not too explicit in this stage.
– Probably yes (Level +2). In this level, the subject continues interested
in the conversation and the topics become more private. Sex situations and
experiences appear in the conversation and the bot does not avoid talking
about them. The information is more detailed and private than before be-
cause we have to make the subject believe that he/she owns a lot of personal
information for blackmailing. After reaching this level, it cannot decrease
again.
– Allegedly paedophile (Level +3). In this level, the system determines
that the user is an actual paedophile. The conversations about sex becomes
more explicit. Now, the objective is to keep the conversation active to gather
as much information as possible. The information in this level is mostly
sexual. The strategy in this stage is to give all the private information of
the child simulated by the bot. After reaching this level, it cannot decrease
again.
Actions The Negobot system has three sensors, three actuators and three dif-
ferent actions to take to obtain its objective.
1. Sensors: (i) Knowledge of the current level, (ii) knowledge of the complete
current conversation, (iii) assimilated conversations by the AI system.
2. Actuators: (i) The accusation level goes up, (ii) the accusation level goes
down, (iii) the accusation level is maintained.
3. Actions: (i) Level goes up, this action increases the accusation level of
the subject to modify the type of conversation by adding more personal
7. information; (ii) level goes down, this action decreases the accusation level
of the subject to change the type of conversation by adding more tentative
information; (iii) maintain level, this action maintains the accusation level
of the user to leave the current conversation type.
The consequences after each action are unknown beforehand because the sys-
tem does not know how the other player will response to each answer, question,
affirmation or negation.
Strategy The aim of the system is to reach a final state while extracting the
highest amount of information. The state of the system is determined both by the
conversation and the conversation topic. The environment is partially observable
since the chat-bot does not know what will the subject answer or question.
Besides, the environment is stochastic because the conversations are composed
only of questions, answers, affirmations and negations and when the user is
writing one sentence, the environment does not change.
Furthermore, the system utilises an evaluation function (refer to Section 2.4)
to analyse the answers. This analysis starts by evaluating the conversation level.
Then, it generates the answer and, finally, communicates with the subject. This
task had to be performed within a coherent and variable time so that it resembled
the writing way of a child. To this end, the system calculates a waiting time for
the response based on the number of words in the answer and an estimated
child’s writing speed.
2.4 Evaluation function
The evaluation function determines the bot’s behaviour according to the evalua-
tion of the sentences that the subject introduces, and determines, in real time, its
level of paedophilia. This function is also responsible for evaluating the evolution
of the conversation (i.e., if the level maintains, goes up or down).
We defined the function as,
f(x) = α + β + γ (1)
α represents the historic values of the conversations with this user. This
variable is defined as the sum of the levels of the previous conversations, divided
by the number of total conversations. It is defined as,
n=x−1
n=1 f(x)/n, where n
is the number of conversations the current user has maintained with Negobot.
β represents the “slimyness” of the current CU. We calculated the average
value of the result of multiplying the similarity score of each of the retrieved
paedophiles’ conversations — the ones that surpass a fixed value of similarity —
by its “slimyness” value,
n=x
n=1 score ∗ slimyness/|ACl|, where ACl is the total
number of assimilated conversations.
γ evaluates the temporality of the subject’s conversations (i.e., how frequent
the user talks with Negobot). There are two possible values for γ, when β = 0
and when β > 0:
8. 1. Value for γ if β = 0
– The subtraction between the “slimyness” of the CU and the total CUs of
the subject. This result is divided by the subtraction, in hours, between the
last maintained CU and the first maintained CU: Ncu − Tcu/Tlast − Tfirst
2. Value for γ if β > 0
– The number of CUs that surpass the “slimyness” threshold divided by
the subtraction, in hours, between the last maintained CU and the first
maintained CU: Ncu> threshold/Tlast − Tfirst
3 Examples of conversations
This section presents some sample conversations with Negobot. It must be noted
that the system was developed to analyse extensive conversations because the
most dangerous paedophiles are too cautious. In our experiments we show some
tests in a controlled environment and with a short extension (see Figure 2 and
Figure 39
). Also note that the evaluation function does not depend only on the
CU, historical conversation values and time are also taken into consideration.
For these experiments we performed two conversations: one aggressive con-
versation and one passive conversation.
– Aggressive conversation. The subject who maintained this conversation
was asked to add explicit questions about sex related topics with the objec-
tive to obtain sexual information. Despite usually paedophiles use a more
cautious approach, we needed to provide a short conversation talking about
sex to show how the level of suspicion raises. In this conversation the bot is
asked about sensitive unknown information, such as virginity or porn.
– Passive conversation. In this conversation the subject starts a conversa-
tion and when the age of Negobot is known he/she tries to end it. Negobot
then asks the reason why the subject is leaving and after 10 minutes of not
having a response tries to restart the conversation.
The conversations are in Spanish since its the language the system was de-
signed for, but variables have been translated and visual guidance is provided to
understand the results.
4 Discussion and future work
More and more children are connected nowadays to the Internet. Although this
medium provides a lot of important advantages, it also provides the anonymity
that paedophiles use to harass children. A rapid identification of this type of users
on the Internet is crucial. Systems able to identify those menaces are going to
be important protecting this less prepared population segment on the Internet.
9
In these examples, ottoproject is the Negobot system and Patxi is a user that emu-
lates the different behaviours
9. Fig. 2. Aggressive conversation
Therefore, we consider that the Negobot system and its future updates are
going to be useful to the authorities to detect and identify paedophile behaviours.
The Negobot project and existing initiatives like PROTEGELES10
can achieve
a safer Internet.
However, the proposed system has several limitations. First, despite current
translation systems are good, they are far to be perfect. Therefore, the lan-
guage is one of the most important issues. To solve it, we should obtain already
classified conversations in other languages, in this particular case Spanish con-
versations. Besides, the subsystem that adapts the way of speaking (i.e., child
10
http://www.protegeles.com/
10. Fig. 3. Passive conversation
behaviour) should be improved. To this end, we will perform a further analysis
of how young people speak on the Internet. Finally, there are some limitations
11. regarding how the system determines the change of a topic. They are intrinsic
to the language, and its solution is not simple.
The future work of the Negobot system is oriented in three main directions.
First, we will generate a net of collaborative agents able to achieve a common
goal in a collaborative way, in which agent will seek an optimal common equi-
librium (Nash equilibrium). Second, we will add more NLP techniques like word
sense disambiguation or opinion mining to improve the understanding of the
bot. Also, we will try to upgrade the question-answering patterns system. Fi-
nally, we will adapt the Negobot system for social networks, chat rooms, and
similar environments.
Acknowledgments
We would like to acknowledge the Department of Education, Universities and
Research of the Basque Government for their support in this research project
through the Plan +Euskadi 09 and more specifically for the financing of the
project “Negobot: Un agente conversacional basado en teora de juegos para la
deteccin de conductas pedfilas”. UNESCO Cod 120304. Area Cod Tecnol´ogica
3304. Science Area Code330499.
References
1. Kontostathis, A., Edwards, L., Bayzick, J., McGhee, I., Leatherman, A., Moore, K.:
Comparison of Rule-based to Human Analysis of Chat Logs. communication theory
8, 2 (2009)
2. Kontostathis, A., Edwards, L., Leatherman, A.: ChatCoder: Toward the tracking
and categorization of internet predators. In: Proceedings of the 7th Text Mining
Workshop (2009)
3. Kontostathis, A., Edwards, L., Leatherman, A.: Text Mining and Cybercrime (2010)
4. Mikic, F., Burguillo, J., Llamas, M.: TQ-Bot: An AIML-based Tutor and Evalu-
ator Bot. Journal of Universal Computer Science (JUCS), Verlag der Technischen
Universit¨at Graz, Austria 15(7) (2010)
5. Mikic, F., Burguillo, J., Llamas, M., Rodr´ıguez, D., Rodr´ıguez, E.: CHARLIE: An
AIML-based Chatterbot which Works as an Interface among INES and Humans.
In: EAEEIE Annual Conference, 2009. pp. 1–6. IEEE (2009)
6. Okuda, T., Tanaka, E., Kasai, T.: A method for the correction of garbled words
based on the Levenshtein metric. Computers, IEEE Transactions on 100(2), 172–
178 (1976)
7. Rodriguez, E., Burguillo, J., Rodriguez, D., Mikic, F., Gonzalez-Moreno, J., Novegil,
V.: Developing virtual teaching assistants for open e-learning platforms. In: EAEEIE
Annual Conference, 2008 19th. pp. 193–198. IEEE (2008)
View publication statsView publication stats