1) The document presents a framework called Lexical Syntactic Feature (LSF) architecture to detect offensive content and identify potentially offensive users in social media. It uses word lists, syntactic rules, and user profile features to analyze content and predict offensiveness at both the sentence and user level.
2) Experiments show the LSF framework performs significantly better than baseline methods in detecting offensive content and identifying offensive users, with over 70% recall and high precision.
3) The framework incorporates features like word strength, punctuation, capitalization, and message/user histories to analyze content in context and predict user behaviors, achieving better results than methods using only individual messages or basic machine learning.
A presentation on Image Recognition, the basic definition and working of Image Recognition, Edge Detection, Neural Networks, use of Convolutional Neural Network in Image Recognition, Applications, Future Scope and Conclusion
Cyber Security.
Watch my videos on snack here: --> --> http://sck.io/x-B1f0Iy
@ Kindly Follow my Instagram Page to discuss about your mental health problems-
-----> https://instagram.com/mentality_streak?utm_medium=copy_link
@ Appreciate my work:
-----> behance.net/burhanahmed1
Thank-you !
A presentation on Image Recognition, the basic definition and working of Image Recognition, Edge Detection, Neural Networks, use of Convolutional Neural Network in Image Recognition, Applications, Future Scope and Conclusion
Cyber Security.
Watch my videos on snack here: --> --> http://sck.io/x-B1f0Iy
@ Kindly Follow my Instagram Page to discuss about your mental health problems-
-----> https://instagram.com/mentality_streak?utm_medium=copy_link
@ Appreciate my work:
-----> behance.net/burhanahmed1
Thank-you !
I created this slide show for Middle and High school students to help educate them about cyberbullying and how it can start out so innocently, and become so very hurtful. I hope you will be able to use parts or all of this presentation with your students.
Make a query regarding a topic of interest and come to know the sentiment for the day in pie-chart or for the week in form of line-chart for the tweets gathered from twitter.com
Smart detection of offensive words in social media using the soundex algorith...IJECEIAES
Offensive posts in the social media that are inappropriate for a specific age, level of maturity, or impression are quite often destined more to unadult than adult participants. Nowadays, the growth in the number of the masked offensive words in the social media is one of the ethically challenging problems. Thus, there has been growing interest in development of methods that can automatically detect posts with such words. This study aimed at developing a method that can detect the masked offensive words in which partial alteration of the word may trick the conventional monitoring systems when being posted on social media. The proposed method progresses in a series of phases that can be broken down into a pre-processing phase, which includes filtering, tokenization, and stemming; offensive word extraction phase, which relies on using the soundex algorithm and permuterm index; and a post-processing phase that classifies the users’ posts in order to highlight the offensive content. Accordingly, the method detects the masked offensive words in the written text, thus forbidding certain types of offensive words from being published. Results of evaluation of performance of the proposed method indicate a 99% accuracy of detection of offensive words.
Insult detection using a partitional CNN-LSTM modelCSITiaesprime
Recently, deep learning has been coupled with notice- able advances in natural language processing (NLP) related research. In this work, we propose a general framework to detect verbal offense in social networks comments. We introduce a partitional CNN-LSTM architecture in order to automatically recognize ver- bal offense patterns in social network comments. Specifically, we use a partitional CNN along with a LSTM model to map the social network comments into two predefined classes. In particular, rather than considering a whole document/comments as input as performed using typical CNN, we partition the comments into parts in order to capture and weight the locally relevant information in each partition. The resulting local information is then sequentially exploited across partitions using LSTM for verbal offense detection. The combination of the partitional CNN and LSTM yields the integration of the local within comments information and the long distance correlation across comments. The proposed approach was assessed using real dataset, and the obtained results proved that our solution outperforms existing relevant solutions.
Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...dbpublications
As a side effect of increasingly popular social media, cyberbullying has emerged as a serious problem afflicting children, adolescents and young adults. Machine learning techniques make automatic detection of bullying messages in social media possible, and this could help to construct a healthy and safe social media environment. In this meaningful research area, one critical issue is robust and discriminative numerical representation learning of text messages. In this paper, we propose a new representation learning method to tackle this problem. Our method named Semantic-Enhanced Marginalized Denoising Auto-Encoder (smSDA) is developed via semantic extension of the popular deep learning model stacked denoising autoencoder. The semantic extension consists of semantic dropout noise and sparsity constraints, where the semantic dropout noise is designed based on domain knowledge and the word embedding technique. Our proposed method is able to exploit the hidden feature structure of bullying information and learn a robust and discriminative representation of text. Comprehensive experiments on two public cyberbullying corpora (Twitter and MySpace) are conducted, and the results show that our proposed approaches outperform other baseline text representation learning methods..
Hate speech has been an ongoing problem on the Internet for many years. Besides, social media, especially Facebook, and Twitter have given it a global stage where those hate speeches can spread far more rapidly. Every social media platform needs to implement an effective hate speech detection system to remove offensive content in real-time. There are various approaches to identify hate speech, such as Rule-Based, Machine Learning based, deep learning based and Hybrid approach. Since this is a review paper, we explained the valuable works of various authors who have invested their valuable time in studying to identifying hate speech using various approaches.
Due to the fast growth of World Wide Web the online communication has increased. In recent times the communication focus has shifted to social networking. In order to enhance the text methods of communication such as tweets, blogs and chats, it is necessary to examine the emotion of user by studying the input text. Online reviews are posted by customers for the products and services on offer at a website portal. This has provided impetus to substantial growth of online purchasing making opinion analysis a vital factor for business development. To analyze such text and reviews sentiment analysis is used. Sentiment analysis is a sub domain of Natural Language Processing which acquires writer’s feelings about several products which are placed on the internet through various comments or posts. It is used to find the opinion or response of the user. Opinion may be positive, negative or neutral. In this paper a review on sentiment analysis is done and the challenges and issues involved in the process are discussed. The approaches to sentiment analysis using dictionaries such as SenticNet, SentiFul, SentiWordNet, and WordNet are studied. Dictionary-based approaches are efficient over a domain of study. Although a generalized dictionary like WordNet may be used, the accuracy of the classifier get affected due to issues like negation, synonyms, sarcasm, etc.
w
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Dialectal Arabic sentiment analysis based on tree-based pipeline optimizatio...IJECEIAES
The heavy involvement of the Arabic internet users resulted in spreading data written in the Arabic language and creating a vast research area regarding natural language processing (NLP). Sentiment analysis is a growing field of research that is of great importance to everyone considering the high added potential for decision-making and predicting upcoming actions using the texts produced in social networks. Arabic used in microblogging websites, especially Twitter, is highly informal. It is not compliant with neither standards nor spelling regulations making it quite challenging for automatic machine-learning techniques. In this paper’s scope, we propose a new approach based on AutoML methods to improve the efficiency of the sentiment classification process for dialectal Arabic. This approach was validated through benchmarks testing on three different datasets that represent three vernacular forms of Arabic. The obtained results show that the presented framework has significantly increased accuracy than similar works in the literature.
There are many malicious programs disbursing on Face book every single day. Within the recent occasions, online hackers have thought about recognition within the third-party application platform additionally to deployment of malicious programs. Programs that present appropriate method of online hackers to spread malicious content on Face book however, little is known concerning highlights of malicious programs and just how they function. Our goal ought to be to create a comprehensive application evaluator of face book the very first tool that will depend on recognition of malicious programs on Face book. To develop rigorous application evaluator of face book we utilize information that's collected by way of observation of posting conduct of Face book apps that are seen across numerous face book clients. This can be frequently possibly initial comprehensive study which has dedicated to malicious Face book programs that concentrate on quantifying additionally to knowledge of malicious programs making these particulars in to a effective recognition method. For structuring of rigorous application evaluator of face book, we utilize data within the security application within Facebook that examines profiles of Facebook clients.
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet IJECEIAES
Arabic Sentiment analysis research field has been progressing in a slow pace compared to English and other languages. In addition to that most of the contributions are based on using supervised machine learning algorithms while comparing the performance of different classifiers with different selected stylistic and syntactic features. In this paper, we presented a novel framework for using the Concept-level sentiment analysis approach which classifies text based on their semantics rather than syntactic features. Moreover, we provided a lexicon dataset of around 69 k unique concepts that covers multi-domain reviews collected from the internet. We also tested the lexicon on a test sample from the dataset it was collected from and obtained an accuracy of 70%. The lexicon has been made publicly available for scientific purposes.
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docxResearchWap
Language is a distinctive quality unique to man. It is what enables man to express him/herself and communicate with his/her fellow man, and it is acquired naturally. According to Fromkin et al “…language is the source of human life and power” (3). They also state that “we use language to convey information to others…, ask questions…, give command…, and express wishes” (173). There are two specific media of using language: oral – which is by words of mouth; and written – which is a graphic representation of words on paper.
It is in the use of language that style comes in. Style shows the difference between one piece of writing and the other. According to Adejare, “style is an ambiguous term…” (1). He further states that the term style means different things to different professions. Some examples are: to a psychologist, a style is a form of behaviour, to the critic, style is individuality and to the linguist, it is the formal structures in function (1).
Stylistics is the study of oral and written texts. It is the description of the linguistic characteristics (which means features of linguistics) of all situationally restricted uses of language. Linguistics is the scientific study of language or of a particular language. Linguistics is scientific because it applies the method of objective observation, collection, classification and application of facts to the study of language.
Stylistics focuses on texts and gives much attention to the devices, parts of speech and figures of speech. It goes further to look into the effects of the use of the devices on the reader.
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docxResearchWap
Language is one of the most complex of all human-specific phenomena. Its convolutions of parts and meanings. It goes beyond its semiotic possibility of conveying information at a communicative level to have an art form that exists by it alone which is known as the literary art.
At the communicative level, it involves other tools to aid interlocution namely voice modulation and pitch, gesticulations which for the sake of this study include facial expressions and feedback from the other person for the clarification of meanings and understanding. At the interpersonal level, language is always based on contextual sense-making as the complexity of language always bears upon every utterance.
Remove the verbal and personal arrangement of this semiotic speech act and all the other tools for sense-making to go with it. So that one runs the risk of being misunderstood which defeats the aim of conversations at all levels. However, with the advancement of technology especially in the telecommunications sector, people now rely much on texting and instant messaging platforms are becoming more and more popular across social classes and with this popularity comes the need for its acceptance by formal and informal purposes.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
1. Detecting Offensive Language in Social Media to Protect
Adolescent Online Safety
Written by
ying chen, et al.
2012 ASE /IEEE International Conference on Social Computing and 2012 ASE/IEEE
International Conference on Privacy, Security,
Risk and Trust
Presented by
ziar khan
2. Abstract
Since the textual contents on online social media are highly unstructured, informal, and
often misspelled, existing research on message level offensive language detection cannot
accurately detect offensive contents.
In this paper, they designed a framework called Lexical Syntactic Feature (LSF)
architecture to detect offensive contents and identify potential offensive users in social
media.
They distinguished the contribution of pejoratives/profanities and obscenities in
determining offensive content, and introduce hand authoring syntactic rules in identifying
name calling harassments.
In particular, they incorporated a user’s writing style, structure and specific cyberbullying
content as features to predict the users capability to send out offensive content.
Results from experiments showed that their LSF framework performed significantly better
than existing methods in offensive content detection.
3. Introduction
With the rapid growth of social media, users especially adolescents are spending significant
amount of time on various social networking sites to connect with others, to share
information, and to pursue common interests.
It has been found that 70% of teens use social media sites on a daily basis and nearly one
in four teens hit their favorite social media sites 10 or more times a day.
19% of teens report that someone has written or posted mean or embarrassing things about
them on social networking sites.
As adolescents are more likely to be negatively affected by biased and harmful contents
than adults, detecting online offensive contents to protect adolescents online safety
becomes an urgent task.
To address concerns on children access to offensive contents over Internet, administrators
of social media often manually review online contents to detect and delete offensive
materials, however, manually reviewing are labour intensive and time consuming.
4. Introduction
Some automatic contents filtering software packages, such as Appen, eblaster,
iambigbrother, Internet Security Suite etc, have been developed to detect and filter online
offensive contents. Most of them simply blocked web pages and paragraphs that contained
dirty words.
These word based approaches not only affect the readability and usability of web sites, but
also fail to identify subtle offensive messages.
To address these limitations, in this paper, they proposed Lexical Syntactic Feature based
(LSF) language model to effectively detect offensive language in social media and to
protect adolescents.
LSF not only examines messages, but also the person who posts the messages and his/her
patterns of posting.
LSF can be implemented as a client side application for individuals and groups who are
concerned about adolescents online safety.
Users are also allowed to adjust the threshold of acceptable level of offensive contents.
5. Related Work
A. Offensiveness Content Filtering Methods in Social Media : Popular online social
networking sites apply several mechanisms to screen offensive contents.
For example, Youtube’s safety mode, once activated, can hide all comments containing
offensive languages from users. But pre-screened content will still appear. The
pejoratives replaced by asterisks, if users simply click "Text Comments."
On Facebook, users can add comma separated keywords to the "Moderation Blacklist."
When people include blacklisted keywords in a post and/or a comment on a page, the
content will be automatically identified as spam and thus be screened.
Twitter client, “Tweetie 1.3,” was rejected by Apple Company for allowing foul
languages to appear in users’ tweets.
In general, the majority of popular social media use simple lexicon based approach to
filter offensive contents. Their lexicons are either predefined (such as Youtube) or
composed by the users themselves (such as Facebook).
For adolescents who often lack cognitive awareness of risks, these approaches are
hardly effective to prevent them from being exposed to offensive contents. Therefore,
parents need more sophisticate software and techniques to efficiently detect offensive
contents to protect their adolescents from potential exposure to vulgar, pornographic
and hateful languages.
6. Related Work
B. Using Text Mining Techniques to Detect Online Offensive Contents : Offensive
language identification in social media is a difficult task because the textual contents in
such environment is often unstructured, informal, and even misspelled.
Researchers have studied intelligent ways to identify offensive contents using text
mining approach. Implementing text mining techniques to analyze online data requires
the following phases:
1) Data acquisition and preprocess,
2) Feature extraction, and
3) Classification.
The major challenges of using text mining to detect offensive contents lie on the feature
selection phase, which will be elaborated in the following sections.
a) Message level Feature Extraction: Most offensive content detection research extracts
two kinds of features: lexical and syntactic features.
Lexical features treat each word and phrase as an entity. Word patterns such as
appearance of certain keywords and their frequencies are often used to represent the
language model. For example Bag of Words(BoW) and N-grams.
7. Related Work
Syntactic features: Although lexical features perform well in detecting offensive
entities, without considering the syntactical structure of the whole sentence, they fail to
distinguish sentences offensiveness which contain same words but in different orders.
Therefore, to consider syntactical features in sentences, natural language parsers are
introduced to parse sentences on grammatical structures before feature selection.
b) User level Offensiveness Detection: Most contemporary research on detecting online
offensive languages only focus on sentence level and message level constructs. Since no
detection technique is 100% accurate. However, user level detection is a more
challenging task. Some efforts at user level are:
Kontostathis et al. propose a rule based communication model to track and categorize
online predators.
Pendar, uses lexical features with machine learning classifiers to differentiate victims
from predators in online chatting environment.
Pazienza and Tudorache, propose utilizing user profiling features such as online
presence and conversations to detect aggressive discussions.
However, in the above research user information such as users’writing styles or posting
trends or reputations have not been included to improve the detection rate.
8. Research Questions
They identify the following research questions to prevent adolescents from exposing to
offensive textual content:
• How to design an effective framework that incorporates both message level and user
level features to detect and prevent offensive content in social media?
• What strategy is effective in detecting and evaluating level of offensiveness in a
message? Will advanced linguistic analysis improve the accuracy and reduce false
positives in detecting message level offensiveness?
• What strategy is effective in detecting and predicting user level offensiveness? Besides
using information from message-level offensiveness, could user profile information
further improve the performance?
• Is the proposed framework efficient enough to be deployed on real time social media?
9. Design Framework
In order to tackle these challenges, they proposed a Lexical Syntactic Feature (LSF)
based framework to detect offensive contents and identify offensive users in social
media. Their Framework include two phases of offensiveness detection.
Phase 1 aims to detect the offensiveness on the sentence level.
Phase 2 derives offensiveness on the user level.
10. Design Framework
The system consists of pre-processing and two major components: sentence
offensiveness prediction and user offensiveness estimation.
During the pre-processing stage, users conversation history is chunked into posts, and
then into sentences.
During sentence offensiveness prediction, each sentence’s offensiveness can be derived
from two features: its words’ offensiveness using lexical features and its context
through syntactical features. i.e. grammatically parse sentences into dependency sets to
capture all dependency types between a word and other words in the same sentence, and
mark some of its related words as intensifiers.
During user offensiveness estimation stage, sentence offensiveness and users language
patterns are helped to predict users’ likelihood of being offensive.
11. Sentence Offensiveness Calculation
They proposed a new method of sentence level analysis based on offensive word
lexicons and sentence syntactic structures. Firstly, they construct two offensive word
dictionaries based on different strengths of offensiveness. Secondly, the concept of
syntactic intensifier is introduced to adjust words’ offensiveness levels based on their
context. Lastly, for each sentence, an offensiveness value is generated by aggregating its
words’ offensiveness. Mathematically
For each offensive word, w, in sentence, s, its offensiveness
a1 if w is a strongly offensive word
Ow = a2 if w is a weakly offensive word
0 otherwise
The value of intensifier Iw , for offensive word, w, can be calculated as ∑k
j=1 dj
Consequently, the offensiveness value of sentence, s, becomes a determined linear
combination of words’ offensiveness Os = ∑ Ow Iw
b1 if user identifier
dj = b2 if is an offensive word
0 otherwise
12. User offensiveness estimation
Given a user, u, they retrieve his/her conversation history which contains several posts
{p1, …, pm} and each post contains sentences {s1, …. , sn}. Sentence offensiveness
values are denoted as {Os1, …. , Osn}. The original offensiveness value of post p,
Op = ∑Os . The offensiveness value of modified posts can be presented as, Op
s . So the final post offensiveness Op
` of post p can be calculated as
Op
` = max(Op, Ops). Hence, the offensiveness value of Ou, can be presented
as Ou = 1/m∑ Op
`
Additional Features Extracted From Users’ Language Profiles
they also considered additional features such as:
Sentence styles. Users may use punctuation and words with all uppercase letters to
indicate feelings or speaking volume. Punctuation, such as exclamation marks, can
emphasize offensiveness of posts. (i.e. Both “He is stupid!” and “He is STUPID.” are
stronger than “He is stupid.”).
13. Experiment 1: Sentence Offensiveness Calculation
According to the above Fig, none of the baseline approaches provides recall rate higher
than 70%, because many of the offensive sentences are imperatives, which omit all user
identifiers. Among the baseline approaches, the BoW approach has the highest recall
rate 66%. However, BoW generates a high false positive rate.
14. Experiment 2: User Offensiveness Estimation-with presence
of strongly offensive words
Machine learning techniques—Naïve Bayes (NB) and SVM—are used to perform the
classification, and 10-fold cross validation was conducted in this experiment. To fully
evaluate the effectiveness of users’ sentence offensiveness value (LSF), style features,
structure features and content specific features for user offensiveness estimation, data
fed sequentially into the classifiers, and get the result in Fig.
According to Fig., offensive words and user language features are not compensating to
each other to improve the detection rate, which means they are not independent. In
contrast, incorporating with user language features, the classifiers have better detection
rate than just adopting LSF.
15. Conclusion
In this study, they investigate existing text mining methods in detecting offensive
contents for protecting adolescent online safety. Proposed Lexical Syntactical Feature
(LSF) approach to identify offensive contents in social media, and predict a user’s
potentiality to send out offensive contents.
Their research has several contributions. First, practically conceptualize the notion of
online offensive contents, and further distinguish the contribution of pejoratives/
profanities and obscenities in determining offensive contents, and introduce hand
authoring syntactic rules in identifying name calling harassment.
Second, improved the traditional machine learning methods by not only using lexical
features to detect offensive languages, but also incorporating style features, structure
features and context-specific features to better predict a user’s potentiality to send out
offensive content in social media.
Experimental result shows that the LSF sentence offensiveness prediction and user
offensiveness estimate algorithms outperform traditional learning based approaches in
terms of precision, recall and f-score. It also achieves high processing speed for
effective deployment in social media.