As a side effect of increasingly popular social media, cyberbullying has emerged as a serious problem afflicting children, adolescents and young adults. Machine learning techniques make automatic detection of bullying messages in social media possible, and this could help to construct a healthy and safe social media environment. In this meaningful research area, one critical issue is robust and discriminative numerical representation learning of text messages. In this paper, we propose a new representation learning method to tackle this problem. Our method named Semantic-Enhanced Marginalized Denoising Auto-Encoder (smSDA) is developed via semantic extension of the popular deep learning model stacked denoising autoencoder. The semantic extension consists of semantic dropout noise and sparsity constraints, where the semantic dropout noise is designed based on domain knowledge and the word embedding technique. Our proposed method is able to exploit the hidden feature structure of bullying information and learn a robust and discriminative representation of text. Comprehensive experiments on two public cyberbullying corpora (Twitter and MySpace) are conducted, and the results show that our proposed approaches outperform other baseline text representation learning methods..
Categorize balanced dataset for troll detectionvivatechijri
As we know cyber bullying is increasing day by day and Cyber troll is one of the cyber-aggressive actions that is not much different from cyberbullying in online abuse so that the victims feel uncomfortable. One of the most used social media platforms in which cyber trolling frequently happens is Twitter. Basically, it is found that during an investigation of cyberbullying cases a lot of information gathered is false which aims to give discomfort, hatred and waste lots of time. So, it is necessary to classify between cyberbullying tweets and normal tweets on twitter. There has already been research on classification of cyberbullying tweets and normal tweets using the Support vector machine (SVM) algorithm. But the drawback of the system is that it only gives 63.83% of accuracy. Firstly, we can improve the accuracy of the system by using the Recurrent Neural Network (RNN) And Secondly, for balancing the dataset we will be using Synthetic Minority Over-sampling Technique (SMOTE). We believe that using these techniques we will be able to increase the accuracy of the previous proposed.
Presentation on the draft manuscript 'A systematic literature review of academic cyberbullying- notable research absences in Higher Education contexts' given to the Design Research Activities Workgroup at CPUT
Psychoanalysis of Online Behavior and Cyber Conduct of Chatters in Chat Rooms...Eswar Publications
With ease of access of internet connectivity and owing to ability of maintaining anonymity, online chatting has become very common. Based on an empirical study comprising of more than 700 chatting sessions spread over a period of 15 months with nearly 2500 online chatters, this paper aims to present a psychological study and analysis of the behavior of chatters in online chatting environments. It has been found that the chatting environments are dominated by male gender and explicit sexual expression is common. The paper also laments
the ability of chatting environments to be exploited as breeding ground for cyber crimes by using ‘social engineering’. On the sidelines, the paper also lists the motivations driving the people to chat as well as the various rewards and drawbacks that chatting poses to the chatter in specific and society in general.
An electronic copy of a handout that is used with the presentation "A Parent and Teacher Training Program for Cyberbullying Detection and Intervention", Andy Jeter's presentation on his action research proposal. The handout includes a list of web resources and cyberbullying prevention tips for teachers and parents. The PowerPoint for the presentation can be found at - http://www.slideshare.net/andymanj/a-parent-and-teacher-training-program-for-cyberbullying-detection-and-intervention
Semantic Massage Addressing based on Social Cloud Actor's InterestsCSCJournals
Wireless communication with Mobile Terminals has become popular tools for collecting and sending information and data. With mobile communication comes the Short Message Service (SMS) technology which is an ideal way to stay connected with anyone, anywhere anytime to help maintain business relationships with customers. Sending individual SMS messages to long list of mobile numbers can be very time consuming, and face problems of wireless communications such as variable and asymmetric bandwidth, geographical mobility and high usage costs and face the rigidity of lists. This paper proposes a technique that assures sending the message to semantically specified group of recipients. A recipient group is automatically identified based on personal information (interests, work place, publications, social relationships, etc.) and behavior based on a populated ontology created by integrating the publicly available FOAF (Friend-of-a-Friend) documents. We demonstrate that our simple technique can first, ensure extracting groups effectively according to the descriptive attributes and second send SMS effectively and can help combat unintentional spam and preserve the privacy of mobile numbers and even individual identities. The technique provides fast, effective, and dynamic solution to save time in constructing lists and sending group messages which can be applied both on personal level or in business.
Smart detection of offensive words in social media using the soundex algorith...IJECEIAES
Offensive posts in the social media that are inappropriate for a specific age, level of maturity, or impression are quite often destined more to unadult than adult participants. Nowadays, the growth in the number of the masked offensive words in the social media is one of the ethically challenging problems. Thus, there has been growing interest in development of methods that can automatically detect posts with such words. This study aimed at developing a method that can detect the masked offensive words in which partial alteration of the word may trick the conventional monitoring systems when being posted on social media. The proposed method progresses in a series of phases that can be broken down into a pre-processing phase, which includes filtering, tokenization, and stemming; offensive word extraction phase, which relies on using the soundex algorithm and permuterm index; and a post-processing phase that classifies the users’ posts in order to highlight the offensive content. Accordingly, the method detects the masked offensive words in the written text, thus forbidding certain types of offensive words from being published. Results of evaluation of performance of the proposed method indicate a 99% accuracy of detection of offensive words.
Categorize balanced dataset for troll detectionvivatechijri
As we know cyber bullying is increasing day by day and Cyber troll is one of the cyber-aggressive actions that is not much different from cyberbullying in online abuse so that the victims feel uncomfortable. One of the most used social media platforms in which cyber trolling frequently happens is Twitter. Basically, it is found that during an investigation of cyberbullying cases a lot of information gathered is false which aims to give discomfort, hatred and waste lots of time. So, it is necessary to classify between cyberbullying tweets and normal tweets on twitter. There has already been research on classification of cyberbullying tweets and normal tweets using the Support vector machine (SVM) algorithm. But the drawback of the system is that it only gives 63.83% of accuracy. Firstly, we can improve the accuracy of the system by using the Recurrent Neural Network (RNN) And Secondly, for balancing the dataset we will be using Synthetic Minority Over-sampling Technique (SMOTE). We believe that using these techniques we will be able to increase the accuracy of the previous proposed.
Presentation on the draft manuscript 'A systematic literature review of academic cyberbullying- notable research absences in Higher Education contexts' given to the Design Research Activities Workgroup at CPUT
Psychoanalysis of Online Behavior and Cyber Conduct of Chatters in Chat Rooms...Eswar Publications
With ease of access of internet connectivity and owing to ability of maintaining anonymity, online chatting has become very common. Based on an empirical study comprising of more than 700 chatting sessions spread over a period of 15 months with nearly 2500 online chatters, this paper aims to present a psychological study and analysis of the behavior of chatters in online chatting environments. It has been found that the chatting environments are dominated by male gender and explicit sexual expression is common. The paper also laments
the ability of chatting environments to be exploited as breeding ground for cyber crimes by using ‘social engineering’. On the sidelines, the paper also lists the motivations driving the people to chat as well as the various rewards and drawbacks that chatting poses to the chatter in specific and society in general.
An electronic copy of a handout that is used with the presentation "A Parent and Teacher Training Program for Cyberbullying Detection and Intervention", Andy Jeter's presentation on his action research proposal. The handout includes a list of web resources and cyberbullying prevention tips for teachers and parents. The PowerPoint for the presentation can be found at - http://www.slideshare.net/andymanj/a-parent-and-teacher-training-program-for-cyberbullying-detection-and-intervention
Semantic Massage Addressing based on Social Cloud Actor's InterestsCSCJournals
Wireless communication with Mobile Terminals has become popular tools for collecting and sending information and data. With mobile communication comes the Short Message Service (SMS) technology which is an ideal way to stay connected with anyone, anywhere anytime to help maintain business relationships with customers. Sending individual SMS messages to long list of mobile numbers can be very time consuming, and face problems of wireless communications such as variable and asymmetric bandwidth, geographical mobility and high usage costs and face the rigidity of lists. This paper proposes a technique that assures sending the message to semantically specified group of recipients. A recipient group is automatically identified based on personal information (interests, work place, publications, social relationships, etc.) and behavior based on a populated ontology created by integrating the publicly available FOAF (Friend-of-a-Friend) documents. We demonstrate that our simple technique can first, ensure extracting groups effectively according to the descriptive attributes and second send SMS effectively and can help combat unintentional spam and preserve the privacy of mobile numbers and even individual identities. The technique provides fast, effective, and dynamic solution to save time in constructing lists and sending group messages which can be applied both on personal level or in business.
Smart detection of offensive words in social media using the soundex algorith...IJECEIAES
Offensive posts in the social media that are inappropriate for a specific age, level of maturity, or impression are quite often destined more to unadult than adult participants. Nowadays, the growth in the number of the masked offensive words in the social media is one of the ethically challenging problems. Thus, there has been growing interest in development of methods that can automatically detect posts with such words. This study aimed at developing a method that can detect the masked offensive words in which partial alteration of the word may trick the conventional monitoring systems when being posted on social media. The proposed method progresses in a series of phases that can be broken down into a pre-processing phase, which includes filtering, tokenization, and stemming; offensive word extraction phase, which relies on using the soundex algorithm and permuterm index; and a post-processing phase that classifies the users’ posts in order to highlight the offensive content. Accordingly, the method detects the masked offensive words in the written text, thus forbidding certain types of offensive words from being published. Results of evaluation of performance of the proposed method indicate a 99% accuracy of detection of offensive words.
Data mining applied about polygamy using sentiment analysis on Twitters in In...journalBEEI
Polygamy remains one of the interesting key topics in various society, especially Indonesia. Polygamy is the act of marrying multiple spouses that means having more one wife at the same time, is common case in worldwide. In this problem, the most of Muslim in Indonesia adopt Islamic law that let men to do the polygamy with certain requirements. In worldwide communities, this has been a prominent feature and has become the subject of numerous books, heated debates, journal articles, discussion papers, and so on. Furthermore, in Indonesia the polygamous marriage has recently become a heated topic. Despite controversy, Indonesia’s law, of recent, allows people for doing polygamy in certain conditions. Because the polygamy is often debated, this study focuses on assessment of Indonesian’s perceptions through sentiment analysis and would determine people’s perception about polygamy issue from 500 tweets on Twitter. To conclude, it elucidates that the polygamy in Indonesia is a normal thing and few assume the case is as negative thing.
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
Trends of machine learning in 2020 - International Journal of Artificial Inte...gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Artificial Intelligence & Applications (IJAIA). It is an international journal intended for professionals and researchers in all fields of AI for researchers, programmers, and software and hardware manufacturers. The journal also aims to publish new attempts in the form of special issues on emerging areas in Artificial Intelligence and applications.
Ethical Implications of Student Plagiarism in Myanmarijtsrd
This study presents efforts to establish evidence for the construct validity of scores on the ethical issue related to student plagiarism in Myanmar universities. Student plagiarism in colleges and universities has become a controversial issue in recent years. The case considered as the most commonly used immoral and unethical activities, are selected for evaluation, and the participants select these activities according questionnaire. Recognizing the difficulty in defining plagiarism while still acknowledging the practical importance of doing so, this system finds the common element about student plagiarism to be the lack of appropriate attribution to the original source. Chaw Chaw Su "Ethical Implications of Student Plagiarism in Myanmar" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd27832.pdfPaper URL: https://www.ijtsrd.com/computer-science/other/27832/ethical-implications-of-student-plagiarism-in-myanmar/chaw-chaw-su
These slides were part of the kickoff for the Social Computing Collaborative group at the University of Minnesota - Jan. 2011. Each participant presented a single slide as part of their introduction of themselves and their social computing research interest areas.
Comparative Study of Cyberbullying Detection using Different Machine Learning...ijtsrd
The advancement of social media plays an important role in increasing the population of youngsters on the web. And it has become the biggest medium of expressing one’s thoughts and emotions. Recent studies report that cyberbullying constitutes a growing problem among youngsters on the web. These kinds of attacks have a major influence on the current generation’s personal and social life because youngsters are ready to adopt online life instead of a real one, which leads them into an imaginary world. So, we are proposing a system for early detection of cyberbullying on the web and comparing different machine learning Algorithms to obtain the optimal result. We are comparing four different algorithms which can be effectively used for the detection of cyberbullying, with the implementation of the bag of words algorithm with different n gram methods. Comparatively naïve Bayes algorithm has the highest accuracy of 79 with trigram implementation of the bag of words algorithm. Rohini K R | Sreehari T Anil | Sreejith P M | Yedumohan P M "Comparative Study of Cyberbullying Detection using Different Machine Learning Algorithms" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-3 , April 2020, URL: https://www.ijtsrd.com/papers/ijtsrd30765.pdf Paper Url :https://www.ijtsrd.com/computer-science/artificial-intelligence/30765/comparative-study-of-cyberbullying-detection-using-different-machine-learning-algorithms/rohini-k-r
Filter unwanted messages from walls and blocking nonlegitimate user in osnIJSRD
Today’s life is totally based on Internet. Now a days people cannot imagine life without Internet. Information and communication technology plays vital role in today’s online networked society. In today’s life, we are very close to the online social networks. Online social networks are used for posting and sharing information across various social networking sites. But user’s privacy is not maintained by online social networks. For maintaining users sensitive information’s privacy online social networks provides little or no support. For filtering unwanted messages we propose a system using machine learning (ML). Using machine learning in soft classifier content based filtering performed. In proposed system filtering rules (FR’s) are provided for content independent filtering.. Blacklists are used for more flexibility by which filtering choices are increased. Proposed system provides security to the Online Social Networks.
Cyberbullying includes the repeated and intentional use of digital technology to target another person with
threats, harassment, or public humiliation. One of the techniques of Cyberbullying is sharing bullying
memes on social media, which has increased enormously in recent years. Memes are images and texts
overlapped and sometimes together they present concepts that become dubious if one of them is absent.
Here, we propose a unified deep neural model for detecting bullying memes on social media. In the
proposed unified architecture, VGG16-BiLSTM, consists of a VGG16 convolutional neural network for
predicting the visual bullying content and a BiLSTM with one-dimensional convolution for predicting the
textual bullying content. The meme is discretized by extracting the text from the image using OCR. The
perceptron-based feature-level strategy for multimodal learning is used to dynamically combine the features
of discrete modalities and output the final category as bullying or nonbullying type. We also create a
“bullying Bengali memes dataset” for experimental evaluation. Our proposed model attained an accuracy
of 87% with an F1 score of 88%. The proposed model demonstrates the capability to detect instances of
Cyberbullying involving Bengali memes on various social media platforms. Consequently, it can be utilized
to implement effective filtering mechanisms aimed at mitigating the prevalence of Cyberbullying.
Cyberbullying includes the repeated and intentional use of digital technology to target another person with
threats, harassment, or public humiliation. One of the techniques of Cyberbullying is sharing bullying
memes on social media, which has increased enormously in recent years. Memes are images and texts
overlapped and sometimes together they present concepts that become dubious if one of them is absent.
Here, we propose a unified deep neural model for detecting bullying memes on social media. In the
proposed unified architecture, VGG16-BiLSTM, consists of a VGG16 convolutional neural network for
predicting the visual bullying content and a BiLSTM with one-dimensional convolution for predicting the
textual bullying content. The meme is discretized by extracting the text from the image using OCR. The
perceptron-based feature-level strategy for multimodal learning is used to dynamically combine the features
of discrete modalities and output the final category as bullying or nonbullying type. We also create a
“bullying Bengali memes dataset” for experimental evaluation. Our proposed model attained an accuracy
of 87% with an F1 score of 88%. The proposed model demonstrates the capability to detect instances of
Cyberbullying involving Bengali memes on various social media platforms. Consequently, it can be utilized
to implement effective filtering mechanisms aimed at mitigating the prevalence of Cyberbullying.
A MACHINE LEARNING ENSEMBLE MODEL FOR THE DETECTION OF CYBERBULLYINGijaia
The pervasive use of social media platforms, such as Facebook, Instagram, and X, has significantly amplified
our electronic interconnectedness. Moreover, these platforms are now easily accessible from any location at
any given time. However, the increased popularity of social media has also led to cyberbullying.It is imperative
to address the need for finding, monitoring, and mitigating cyberbullying posts on social media platforms.
Motivated by this necessity, we present this paper to contribute to developing an automated system for
detecting binary labels of aggressive tweets.Our study has demonstrated remarkable performance compared to
previous experiments on the same dataset. We employed the stacking ensemble machine learning method,
utilizing four various feature extraction techniques to optimize performance within the stacking ensemble
learning framework. Combining five machine learning algorithms,Decision Trees, Random Forest, Linear
Support Vector Classification, Logistic Regression, and K-Nearest Neighbors into an ensemble method, we
achieved superior results compared to traditional machine learning classifier models. The stacking classifier
achieved a high accuracy rate of 94.00%, outperforming traditional machine learning models and surpassing
the results of prior experiments that utilized the same dataset. The outcomes of our experiments showcased an
accuracy rate of 0.94% in detection tweets as aggressive or non-aggressive.
Data mining applied about polygamy using sentiment analysis on Twitters in In...journalBEEI
Polygamy remains one of the interesting key topics in various society, especially Indonesia. Polygamy is the act of marrying multiple spouses that means having more one wife at the same time, is common case in worldwide. In this problem, the most of Muslim in Indonesia adopt Islamic law that let men to do the polygamy with certain requirements. In worldwide communities, this has been a prominent feature and has become the subject of numerous books, heated debates, journal articles, discussion papers, and so on. Furthermore, in Indonesia the polygamous marriage has recently become a heated topic. Despite controversy, Indonesia’s law, of recent, allows people for doing polygamy in certain conditions. Because the polygamy is often debated, this study focuses on assessment of Indonesian’s perceptions through sentiment analysis and would determine people’s perception about polygamy issue from 500 tweets on Twitter. To conclude, it elucidates that the polygamy in Indonesia is a normal thing and few assume the case is as negative thing.
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
Trends of machine learning in 2020 - International Journal of Artificial Inte...gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Artificial Intelligence & Applications (IJAIA). It is an international journal intended for professionals and researchers in all fields of AI for researchers, programmers, and software and hardware manufacturers. The journal also aims to publish new attempts in the form of special issues on emerging areas in Artificial Intelligence and applications.
Ethical Implications of Student Plagiarism in Myanmarijtsrd
This study presents efforts to establish evidence for the construct validity of scores on the ethical issue related to student plagiarism in Myanmar universities. Student plagiarism in colleges and universities has become a controversial issue in recent years. The case considered as the most commonly used immoral and unethical activities, are selected for evaluation, and the participants select these activities according questionnaire. Recognizing the difficulty in defining plagiarism while still acknowledging the practical importance of doing so, this system finds the common element about student plagiarism to be the lack of appropriate attribution to the original source. Chaw Chaw Su "Ethical Implications of Student Plagiarism in Myanmar" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd27832.pdfPaper URL: https://www.ijtsrd.com/computer-science/other/27832/ethical-implications-of-student-plagiarism-in-myanmar/chaw-chaw-su
These slides were part of the kickoff for the Social Computing Collaborative group at the University of Minnesota - Jan. 2011. Each participant presented a single slide as part of their introduction of themselves and their social computing research interest areas.
Comparative Study of Cyberbullying Detection using Different Machine Learning...ijtsrd
The advancement of social media plays an important role in increasing the population of youngsters on the web. And it has become the biggest medium of expressing one’s thoughts and emotions. Recent studies report that cyberbullying constitutes a growing problem among youngsters on the web. These kinds of attacks have a major influence on the current generation’s personal and social life because youngsters are ready to adopt online life instead of a real one, which leads them into an imaginary world. So, we are proposing a system for early detection of cyberbullying on the web and comparing different machine learning Algorithms to obtain the optimal result. We are comparing four different algorithms which can be effectively used for the detection of cyberbullying, with the implementation of the bag of words algorithm with different n gram methods. Comparatively naïve Bayes algorithm has the highest accuracy of 79 with trigram implementation of the bag of words algorithm. Rohini K R | Sreehari T Anil | Sreejith P M | Yedumohan P M "Comparative Study of Cyberbullying Detection using Different Machine Learning Algorithms" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-3 , April 2020, URL: https://www.ijtsrd.com/papers/ijtsrd30765.pdf Paper Url :https://www.ijtsrd.com/computer-science/artificial-intelligence/30765/comparative-study-of-cyberbullying-detection-using-different-machine-learning-algorithms/rohini-k-r
Filter unwanted messages from walls and blocking nonlegitimate user in osnIJSRD
Today’s life is totally based on Internet. Now a days people cannot imagine life without Internet. Information and communication technology plays vital role in today’s online networked society. In today’s life, we are very close to the online social networks. Online social networks are used for posting and sharing information across various social networking sites. But user’s privacy is not maintained by online social networks. For maintaining users sensitive information’s privacy online social networks provides little or no support. For filtering unwanted messages we propose a system using machine learning (ML). Using machine learning in soft classifier content based filtering performed. In proposed system filtering rules (FR’s) are provided for content independent filtering.. Blacklists are used for more flexibility by which filtering choices are increased. Proposed system provides security to the Online Social Networks.
Cyberbullying includes the repeated and intentional use of digital technology to target another person with
threats, harassment, or public humiliation. One of the techniques of Cyberbullying is sharing bullying
memes on social media, which has increased enormously in recent years. Memes are images and texts
overlapped and sometimes together they present concepts that become dubious if one of them is absent.
Here, we propose a unified deep neural model for detecting bullying memes on social media. In the
proposed unified architecture, VGG16-BiLSTM, consists of a VGG16 convolutional neural network for
predicting the visual bullying content and a BiLSTM with one-dimensional convolution for predicting the
textual bullying content. The meme is discretized by extracting the text from the image using OCR. The
perceptron-based feature-level strategy for multimodal learning is used to dynamically combine the features
of discrete modalities and output the final category as bullying or nonbullying type. We also create a
“bullying Bengali memes dataset” for experimental evaluation. Our proposed model attained an accuracy
of 87% with an F1 score of 88%. The proposed model demonstrates the capability to detect instances of
Cyberbullying involving Bengali memes on various social media platforms. Consequently, it can be utilized
to implement effective filtering mechanisms aimed at mitigating the prevalence of Cyberbullying.
Cyberbullying includes the repeated and intentional use of digital technology to target another person with
threats, harassment, or public humiliation. One of the techniques of Cyberbullying is sharing bullying
memes on social media, which has increased enormously in recent years. Memes are images and texts
overlapped and sometimes together they present concepts that become dubious if one of them is absent.
Here, we propose a unified deep neural model for detecting bullying memes on social media. In the
proposed unified architecture, VGG16-BiLSTM, consists of a VGG16 convolutional neural network for
predicting the visual bullying content and a BiLSTM with one-dimensional convolution for predicting the
textual bullying content. The meme is discretized by extracting the text from the image using OCR. The
perceptron-based feature-level strategy for multimodal learning is used to dynamically combine the features
of discrete modalities and output the final category as bullying or nonbullying type. We also create a
“bullying Bengali memes dataset” for experimental evaluation. Our proposed model attained an accuracy
of 87% with an F1 score of 88%. The proposed model demonstrates the capability to detect instances of
Cyberbullying involving Bengali memes on various social media platforms. Consequently, it can be utilized
to implement effective filtering mechanisms aimed at mitigating the prevalence of Cyberbullying.
A MACHINE LEARNING ENSEMBLE MODEL FOR THE DETECTION OF CYBERBULLYINGijaia
The pervasive use of social media platforms, such as Facebook, Instagram, and X, has significantly amplified
our electronic interconnectedness. Moreover, these platforms are now easily accessible from any location at
any given time. However, the increased popularity of social media has also led to cyberbullying.It is imperative
to address the need for finding, monitoring, and mitigating cyberbullying posts on social media platforms.
Motivated by this necessity, we present this paper to contribute to developing an automated system for
detecting binary labels of aggressive tweets.Our study has demonstrated remarkable performance compared to
previous experiments on the same dataset. We employed the stacking ensemble machine learning method,
utilizing four various feature extraction techniques to optimize performance within the stacking ensemble
learning framework. Combining five machine learning algorithms,Decision Trees, Random Forest, Linear
Support Vector Classification, Logistic Regression, and K-Nearest Neighbors into an ensemble method, we
achieved superior results compared to traditional machine learning classifier models. The stacking classifier
achieved a high accuracy rate of 94.00%, outperforming traditional machine learning models and surpassing
the results of prior experiments that utilized the same dataset. The outcomes of our experiments showcased an
accuracy rate of 0.94% in detection tweets as aggressive or non-aggressive.
A Machine Learning Ensemble Model for the Detection of Cyberbullyinggerogepatton
The pervasive use of social media platforms, such as Facebook, Instagram, and X, has significantly amplified
our electronic interconnectedness. Moreover, these platforms are now easily accessible from any location at
any given time. However, the increased popularity of social media has also led to cyberbullying.It is imperative
to address the need for finding, monitoring, and mitigating cyberbullying posts on social media platforms.
Motivated by this necessity, we present this paper to contribute to developing an automated system for
detecting binary labels of aggressive tweets.Our study has demonstrated remarkable performance compared to
previous experiments on the same dataset. We employed the stacking ensemble machine learning method,
utilizing four various feature extraction techniques to optimize performance within the stacking ensemble
learning framework. Combining five machine learning algorithms,Decision Trees, Random Forest, Linear
Support Vector Classification, Logistic Regression, and K-Nearest Neighbors into an ensemble method, we
achieved superior results compared to traditional machine learning classifier models. The stacking classifier
achieved a high accuracy rate of 94.00%, outperforming traditional machine learning models and surpassing
the results of prior experiments that utilized the same dataset. The outcomes of our experiments showcased an
accuracy rate of 0.94% in detection tweets as aggressive or non-aggressive.
A Machine Learning Ensemble Model for the Detection of Cyberbullyinggerogepatton
The pervasive use of social media platforms, such as Facebook, Instagram, and X, has significantly amplified
our electronic interconnectedness. Moreover, these platforms are now easily accessible from any location at
any given time. However, the increased popularity of social media has also led to cyberbullying.It is imperative
to address the need for finding, monitoring, and mitigating cyberbullying posts on social media platforms.
Motivated by this necessity, we present this paper to contribute to developing an automated system for
detecting binary labels of aggressive tweets.Our study has demonstrated remarkable performance compared to
previous experiments on the same dataset. We employed the stacking ensemble machine learning method,
utilizing four various feature extraction techniques to optimize performance within the stacking ensemble
learning framework. Combining five machine learning algorithms,Decision Trees, Random Forest, Linear
Support Vector Classification, Logistic Regression, and K-Nearest Neighbors into an ensemble method, we
achieved superior results compared to traditional machine learning classifier models. The stacking classifier
achieved a high accuracy rate of 94.00%, outperforming traditional machine learning models and surpassing
the results of prior experiments that utilized the same dataset. The outcomes of our experiments showcased an
accuracy rate of 0.94% in detection tweets as aggressive or non-aggressive.
Scraping and Clustering Techniques for the Characterization of Linkedin Profilescsandit
The socialization of the web has undertaken a new dimension after the emergence of the Online
Social Networks (OSN) concept. The fact that each Internet user becomes a potential content
creator entails managing a big amount of data. This paper explores the most popular
professional OSN: LinkedIn. A scraping technique was implemented to get around 5 Million
public profiles. The application of natural language processing techniques (NLP) to classify the
educational background and to cluster the professional background of the collected profiles led
us to provide some insights about this OSN’s users and to evaluate the relationships between
educational degrees and professional careers.
The socialization of the web has undertaken a new dimension after the emergence of the Online
Social Networks (OSN) concept. The fact that each Internet user becomes a potential content
creator entails managing a big amount of data. This paper explores the most popular
professional OSN: LinkedIn. A scraping technique was implemented to get around 5 Million
public profiles. The application of natural language processing techniques (NLP) to classify the
educational background and to cluster the professional background of the collected profiles led
us to provide some insights about this OSN’s users and to evaluate the relationships between
educational degrees and professional careers.
Hate speech has been an ongoing problem on the Internet for many years. Besides, social media, especially Facebook, and Twitter have given it a global stage where those hate speeches can spread far more rapidly. Every social media platform needs to implement an effective hate speech detection system to remove offensive content in real-time. There are various approaches to identify hate speech, such as Rule-Based, Machine Learning based, deep learning based and Hybrid approach. Since this is a review paper, we explained the valuable works of various authors who have invested their valuable time in studying to identifying hate speech using various approaches.
A DEVELOPMENT FRAMEWORK FOR A CONVERSATIONAL AGENT TO EXPLORE MACHINE LEARNIN...mlaij
This study aims to introduce a discussion platform and curriculum designed to help people understand how
machines learn. Research shows how to train an agent through dialogue and understand how information
is represented using visualization. This paper starts by providing a comprehensive definition of AI literacy
based on existing research and integrates a wide range of different subject documents into a set of key AI
literacy skills to develop a user-centered AI. This functionality and structural considerations are organized
into a conceptual framework based on the literature. Contributions to this paper can be used to initiate
discussion and guide future research on AI learning within the computer science community.
Dialectal Arabic sentiment analysis based on tree-based pipeline optimizatio...IJECEIAES
The heavy involvement of the Arabic internet users resulted in spreading data written in the Arabic language and creating a vast research area regarding natural language processing (NLP). Sentiment analysis is a growing field of research that is of great importance to everyone considering the high added potential for decision-making and predicting upcoming actions using the texts produced in social networks. Arabic used in microblogging websites, especially Twitter, is highly informal. It is not compliant with neither standards nor spelling regulations making it quite challenging for automatic machine-learning techniques. In this paper’s scope, we propose a new approach based on AutoML methods to improve the efficiency of the sentiment classification process for dialectal Arabic. This approach was validated through benchmarks testing on three different datasets that represent three vernacular forms of Arabic. The obtained results show that the presented framework has significantly increased accuracy than similar works in the literature.
Online Data Preprocessing: A Case Study ApproachIJECEIAES
Besides the Internet search facility and e-mails, social networking is now one of the three best uses of the Internet. A tremendous number of volunteers every day write articles, share photos, videos and links at a scope and scale never imagined before. However, because social network data are huge and come from heterogeneous sources, the data are highly susceptible to inconsistency, redundancy, noise, and loss. For data scientists, preparing the data and getting it into a standard format is critical because the quality of data is going to directly affect the performance of mining algorithms that are going to be applied next. Low-quality data will certainly limit the analysis and lower the quality of mining results. To this end, the goal of this study is to provide an overview of the different phases involved in data preprocessing, with a focus on social network data. As a case study, we will show how we applied preprocessing to the data that we collected for the Malaysian Flight MH370 that disappeared in 2014.
A hybrid approach based on personality traits for hate speech detection in Ar...IJECEIAES
In recent years, as social media has grown in popularity, people have gained the ability to freely share their views. However, this may lead to users' conflict and hostility, resulting in unattractive online environments. Hate speech relates to using expressions or phrases that are violent, offensive, or insulting to a minority of people. The number of Arab social media users is quickly rising, and this is being followed by an increase in the frequency of cyber hate speech in the area. Therefore, the automated detection of Arabic hate speech has become a major concern for many stakeholders. The intersection of personality learning and hate speech detection is a relatively less studied niche. We suggest a novel approach that is focused on extracting personality trait features and using these features to detect Arabic hate speech. The experimental results show that the proposed approach is superior in terms of the macro-F1 score by achieving 82.3% compared to previous work reported in the literature.
Mental Disorder Prevention on Social Network with Supervised Learning Based A...ijtsrd
Informal community clients guess the interpersonal organizations that they use to preserve their protection. Be that as it may, in online interpersonal organizations, protection ruptures are not really. In this proposed, first classifies to secure the buyer that occur in online informal communities. Our proposed methodology depends on specialist based portrayal of an informal organization, where the operators handle clients seclusion prerequisites by making duties with the framework. The prevailing limit through exchange learning and highlight Convolution Neural Network CNN have expected developing significance inside the PC vision network, so creation a progression of noteworthy leaps forward in basic leadership. In like manner it is a significant procedure with the end goal of how to be pertinent CNN to basic leadership for better execution. Or maybe, by and by prescribe an AI framework, to be express, Social Network Mental Disorder Detection SNMDD , that misuses features isolated from natural association data log record to exactly perceive potential cases of SNMDs. We furthermore misuse multi source learning in SNMDD and propose another Supervised Learning with Amoeba Optimization SLAO to improve the precision. To extend the adaptability of SMM, we further improve the efficiency with execution guarantee. Swati Dubey | Dr. Rajesh Kumar Shukla ""Mental Disorder Prevention on Social Network with Supervised Learning Based Amoeba Optimization"" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-4 , June 2019, URL: https://www.ijtsrd.com/papers/ijtsrd25107.pdf
Paper URL: https://www.ijtsrd.com/computer-science/world-wide-web/25107/mental-disorder-prevention-on-social-network-with-supervised-learning-based-amoeba-optimization/swati-dubey
An evolutionary approach to comparative analysis of detecting Bangla abusive ...journalBEEI
The use of Bangla abusive texts has been accelerated with the progressive use of social media. Through this platform, one can spread the hatred or negativity in a viral form. Plenty of research has been done on detecting abusive text in the English language. Bangla abusive text detection has not been done to a great extent. In this experimental study, we have applied three distinct approaches to a comprehensive dataset to obtain a better outcome. In the first study, a large dataset collected from Facebook and YouTube has been utilized to detect abusive texts. After extensive pre-processing and feature extraction, a set of consciously selected supervised machine learning classifiers i.e. multinomial Naïve Bayes (MNB), multi layer perceptron (MLP), support vector machine (SVM), decision tree, random forrest, stochastic gradient descent (SGD), ridge, perceptron and k-nearest neighbors (k-NN) has been applied to determine the best result. The second experiment is conducted by constructing a balanced dataset by random under sampling the majority class and finally, a Bengali stemmer is employed on the dataset and then the final experiment is conducted. In all three experiments, SVM with the full dataset obtained the highest accuracy of 88%.
Social networking sites are a significant source of information to know the behavior of users and to know
what is occupying society of all ages and accordingly helpful information can be provided to specialists
and decision-makers. According to official sources, 98.43% of Saudi youth use social networking sites. The
study and analysis of social media data are done to provide the necessary information to increase
investment opportunities within the Kingdom of Saudi Arabia, by studying and analyzing what people
occupy on the communication sites through their tweets about the labor market and investment. Given the
huge volume of data and also its randomness, a survey of the data will be done and collected from through
keywords, the priority of arranging the data, and recording it as (positive - negative - mixed). The study
analysis and conclusion will be based on data-mining and its techniques of analysis and deduction
.
INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY...ijcsit
Social networking sites are a significant source of information to know the behavior of users and to know
what is occupying society of all ages and accordingly helpful information can be provided to specialists
and decision-makers. According to official sources, 98.43% of Saudi youth use social networking sites. The
study and analysis of social media data are done to provide the necessary information to increase
investment opportunities within the Kingdom of Saudi Arabia, by studying and analyzing what people
occupy on the communication sites through their tweets about the labor market and investment. Given the
huge volume of data and also its randomness, a survey of the data will be done and collected from through
keywords, the priority of arranging the data, and recording it as (positive - negative - mixed). The study
analysis and conclusion will be based on data-mining and its techniques of analysis and deduction.
Similar to Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Auto-Encoder (20)
Hierarchical Digital Twin of a Naval Power SystemKerry Sado
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
ML for identifying fraud using open blockchain data.pptx
Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Auto-Encoder
1. IDL - International Digital Library Of
Technology & Research
Volume 1, Issue 5, May 2017 Available at: www.dbpublications.org
International e-Journal For Technology And Research-2017
IDL - International Digital Library 1 | P a g e Copyright@IDL-2017
Cyber bullying Detection based on Semantic-
Enhanced Marginalized Denoising Auto-
Encoder
ARPITHA.R1
, KAVYA.M2
, PRAJNA R BHAT3
, PRIYA SHIVAKUMAR4
MR.
HARISHA.D S5
DR. VRINDA SHETTY6
DR. RAMESH BABU.HS7
1,2,3,4. BE, Student – Sai Vidya institute of technology, Bengaluru, India 5. Guide, Assistant
Professor – Sai Vidya institute of technology, Bengaluru, India 6. HOD – Sai Vidya institute of
technology, Bengaluru, India 7. Principal – Sai Vidya institute of technology, Bengaluru, India
Abstract:
As a side effect of increasingly popular social media,
cyberbullying has emerged as a serious problem
afflicting children, adolescents and young adults.
Machine learning techniques make automatic
detection of bullying messages in social media
possible, and this could help to construct a healthy and
safe social media environment. In this meaningful
research area, one critical issue is robust and
discriminative numerical representation learning of
text messages. In this paper, we propose a new
representation learning method to tackle this problem.
Our method named Semantic-Enhanced Marginalized
Denoising Auto-Encoder (smSDA) is developed via
semantic extension of the popular deep learning model
stacked denoising autoencoder. The semantic
extension consists of semantic dropout noise and
sparsity constraints, where the semantic dropout noise
is designed based on domain knowledge and the word
embedding technique. Our proposed method is able to
exploit the hidden feature structure of bullying
information and learn a robust and discriminative
representation of text. Comprehensive experiments on
two public cyberbullying corpora (Twitter and
MySpace) are conducted, and the results show that our
proposed approaches outperform other baseline text
representation learning methods..
Keywords: Cyberbullying Detection, Text Mining,
Representation Learning, Stacked Denoising
Autoencoders, Word Embedding
INTRODUCTION
Media, as defined in [1], is „‟a group of Internet-
based applications that build on the ideological and
technological foundations of Web 2.0, and that allow
the creation and exchange of user-generated content.„‟
Via social media, people can enjoy enormous
information, convenient communication experience
and so on. However, social media may have some side
effects such as cyberbullying, which may have
negative impacts on the life of people, especially
children and teenagers. Cyberbullying can be defined
as aggressive, intentional actions performed by an
individual or a group of people via digital
communication methods such as sending messages
and posting comments against a victim. Different from
traditional bullying that usually occurs at school
during face- to-face communication, cyberbullying on
social media can take place anywhere at any time. For
bullies, they are free to hurt their peers‟ feelings
because they do not need to face someone and can
hide behind the Internet. For victims, they are easily
exposed to harassment since all of us, especially
2. IDL - International Digital Library Of
Technology & Research
Volume 1, Issue 5, May 2017 Available at: www.dbpublications.org
International e-Journal For Technology And Research-2017
IDL - International Digital Library 2 | P a g e Copyright@IDL-2017
youth, are constantly connected to Internet or social
media. As reported in [2], cyberbullying victimization
rate ranges from 10% to 40%. In the United States,
approximately 43% of teenagers were ever bullied on
social media [3]. The same as traditional bullying,
cyberbullying has negative, insidious and sweeping
impacts on children [4], [5], [6]. The outcomes for
victims under cyberbullying may even be tragic such
as the occurrence of self-injurious behavior or
suicides. One way to address the cyberbullying
problem is to automatically detect and promptly report
bullying messages so that proper measures can be
taken to prevent possible tragedies. Previous works on
computational studies of bullying have shown that
natural language processing and machine learning are
powerful tools to study bullying [7], [8].
Cyberbullying detection can be formulated as a
supervised learning problem. A classifier is first
trained on a cyberbullying corpus labeled by humans,
and the learned classifier is then used to recognize a
bullying message. Three kinds of information
including text, user demography, and social network
features are often used in cyberbullying detection [9].
Since the text content is the most reliable, our work
here focuses on text-based cyberbullying detection. In
the text-based cyberbullying detection, the first and
also critical step is the numerical representation
learning for text messages. In fact, representation
learning of text is extensively studied in text mining,
information retrieval and natural language processing
(NLP). Bag-of-words (BoW) model is one commonly
used model that each dimension corresponds to a term.
Latent Semantic Analysis (LSA) and topic models are
another popular text representation models, which are
both based on BoW models. By mapping text units
into fixed-length vectors, the learned represen- tation
can be further processed for numerous language
processing tasks. Therefore, the useful representation
should discover the meaning behind text units. In
cyberbullying detection, the numerical representation
for Internet messages should be robust and
discriminative. Since messages on social media are
often very short and contain a lot of informal language
and misspellings, robust representations for these
messages are required to reduce their ambiguty. Even
worse, the lack of sufficient high-quality training data,
i.e., data sparsity make the issue more challenging.
Firstly, labeling data is labor intensive and time
consuming. Secondly, cyberbullying is hard to
describe and judge from a third view due to its
intrinsic ambiguities. Thirdly, due to protection of
Internet users and privacy issues, only a small portion
of messages are left on the Internet, and most bullying
posts are deleted. As a result, the trained classifier may
not generalize well on testing messages that contain
nonactivated but discriminative features. The goal of
this present study is to develop methods that can learn
robust and discriminative representations to tackle the
above problems in cyberbullying detection. Some
approaches have been proposed to tackle these
problems by incorporating expert knowledge into
feature learning. Yin et.al proposed to combine BoW
features, senti- ment features and contextual features
to train a support vec- tor machine for online
harassment detection [10]. Dinakar et.al utilized label
specific features to extend the general features, where
the label specific features are learned by Linear
Discriminative Analysis [11]. In addition, common
sense knowledge was also applied. Nahar et.al
presented a weighted TF-IDF scheme via scaling
bullying-like features by a factor of two [12]. Besides
content-based information, Maral et.al proposed to
apply users‟ information, such as gender and history
messages, and context information as extra features
[13], [14]. But a major limitation of these approaches
is that the learned feature space still relies on the BoW
assumption and may not be robust. In addition, the
performance of these approaches rely on the quality of
hand-crafted features, which require extensive domain
knowledge. In this paper, we investigate one deep
learning method named stacked denoising autoencoder
(SDA) [15]. SDA stacks several denoising
autoencoders and concatenates the output of each layer
as the learned representation. Each denoising
autoencoder in SDA is trained to recover the input
data from a corrupted version of it. The input is
corrupted by randomly setting some of the input to
zero, which is called dropout noise. This denoising
3. IDL - International Digital Library Of
Technology & Research
Volume 1, Issue 5, May 2017 Available at: www.dbpublications.org
International e-Journal For Technology And Research-2017
IDL - International Digital Library 3 | P a g e Copyright@IDL-2017
process helps the autoencoders to learn robust
representation. In addition, each autoencoder layer is
intended to learn an increasingly abstract
representation of the input [16]. In this paper, we
develop a new text representation model based on a
variant of SDA: marginalized stacked denoising
autoencoders (mS- DA) [17], which adopts linear
instead of nonlinear projection to accelerate training
and marginalizes infinite noise distri- bution in order
to learn more robust representations. We utilize
semantic information to expand mSDA and develop
Semantic-enhanced Marginalized Stacked Denoising
Autoencoders (smSDA). The semantic information
consists of bullying words. An automatic extraction of
bullying words based on word embeddings is proposed
so that the involved human labor can be reduced.
During training of smSDA, we attempt to reconstruct
bullying features from other normal words by
discovering the latent structure, i.e. correlation,
between bullying and normal words. The intuition
behind this idea is that some bullying messages do not
contain bullying words. The correlation information
discovered by smSDA helps to reconstruct bullying
features from normal words, and this in turn facilitates
detection of bullying messages without containing
bullying words. For example, there is a strong
correlation between bullying word fuck and normal
word off since they often occur together. If bullying
messages do not contain such obvious bullying
features, such as fuck is often misspelled as fck, the
correlation may help to reconstruct the bullying
features from normal ones so that the bullying
message can be detected. It should be noted that
introducing dropout noise has the effects of enlarging
the size of the dataset, including training data size,
which helps alleviate the data sparsity problem. In
addition, L1 regularization of the projection matrix is
added to the objective function of each autoencoder
layer in our model to enforce the sparstiy of projection
matrix, and this in turn facilitates the discovery of the
most relevant terms for reconstructing bullying terms.
The main contributions of our work can be
summarized as follows:
Our proposed Semantic-enhanced Marginalized S-
tacked Denoising Autoencoder is able to learn ro- bust
features from BoW representation in an effi cient and
effective way. These robust features are
learned by reconstructing original input from cor
rupted (i.e., missing) ones. The new feature space can
improve the performance of cyberbullying detection
even with a small labeled training corpus. Semantic
information is incorporated into the re-construction
process via the designing of semantic dropout noises
and imposing sparsity constraints on mapping matrix.
In our framework, high-quality semantic information,
i.e., bullying words, can be extracted automatically
through word embeddings. Finally, these specialized
modifications make the new feature space more
discriminative and this in turn facilitates bullying
detection. Comprehensive experiments on real-data
sets have verified the performance of our proposed
model.
This paper is organized as follows. In Section 2,
some re- lated work is introduced. The proposed
Semantic-enhanced Marginalized Stacked Denoising
Auto-encoder for cyber- bullying detection is
presented in Section 3. In Section 4, experimental
results on several collections of cyberbullying data are
illustrated. Finally, concluding remarks are provid- ed
in Section 5.
RELATED WORK
This work aims to learn a robust and discriminative
text representation for cyberbullying detection. Text
representation and automatic cyberbullying detection
are both related to our work. In the following, we
briefly review the previous work in these two areas.
Text Representation Learning
In text mining, information retrieval and natural
language processing, effective numerical
representation of linguistic units is a key issue. The
Bag-of-words (BoW) model is the most classical text
representation and the cornerstone of some states-of-
arts models including Latent Semantic Analysis (LSA)
[18] and topic models [19], [20]. BoW model
4. IDL - International Digital Library Of
Technology & Research
Volume 1, Issue 5, May 2017 Available at: www.dbpublications.org
International e-Journal For Technology And Research-2017
IDL - International Digital Library 4 | P a g e Copyright@IDL-2017
represents a document in a textual corpus using a
vector of real numbers indicating the occurrence of
words in the document. Although BoW model has
proven to be efficient and effective, the representation
is often very sparse. To address this problem, LSA
applies Singular Value Decomposition (SVD) on the
word-document matrix for BoW model to derive a
low-rank approximation. Each new feature is a linear
combination of all original features to alleviate the
sparsity problem. Topic models, including
Probabilistic La tent Semantic Analysis [21] and
Latent Dirichlet Allocation [20], are also proposed.
The basic idea behind topic models is that word choice
in a document will be influenced by the topic of the
document probabilistically. Topic models try to define
the generation process of each word occurred in a
document. Similar to the approaches aforementioned,
our proposed approach takes the BoW representation
as the input. How- ever, our approach has some
distinct merits. Firstly, the multi-layers and non-
linearity of our model can ensure a deep learning
architecture for text representation, which has been
proven to be effective for learning high-level features.
Second, the applied dropout noise can make the
learned representation more robust. Third, specific to
cyberbullying detection, our method employs the
semantic information, including bullying words and
sparsity constraint imposed on mapping matrix in each
layer and this will in turn pro duce more
discriminative representation.
Cyberbullying Detection
Although the incorporation of knowledge base can
achieve a performance improvement, the construction
of a complete and general one is labor-consuming.
Nahar et.al proposed to scale bullying words by a
factor of two in the original BoW features [12]. The
motivation behind this work is quit similar to that of
our model to enhance bullying features. However, the
scaling operation in [12] is quite arbitrary. Ptaszynski
et.al searched sophisticated patterns in a brute-force
way [26]. The weights for each extracted pattern need
to be calculated based on annotated training corpus,
and thus the performance may not be guaranteed if the
training corpus has a limited size. Besides content-
based information, Maral et.al also employ users‟
information, such as gender and history messages, and
context information as extra features [13], [14]. Huang
et.al also considered social network features to learn
the features for cyberbullying detection [9]. The
shared deficiency among these forementioned
approaches is constructed text features are still from
BoW representation, which has been criticized for its
inherent over-sparsity and failure to capture semantic
structure [18], [19], [20]. Different from these
approaches, our proposed model can learn robust
features by reconstructing the original data from
corrupted data and introduce semantic corruption noise
and sparsity mapping matrix to explore the feature
structure which are predictive of the existence of
bullying so that the learned representation can be
discriminative. With the increasing popularity of
social media in recent years, cyberbullying has
emerged as a serious problem afflicting children and
young adults. Previous studies of cyberbullying
focused on extensive surveys and its psy- chological
effects on victims, and were mainly conducted by
social scientists and psychologists [6]. Although these
efforts facilitate our understanding for cyberbullying,
the psychological science approach based on personal
surveys is very time-consuming and may not be
suitable for automatic detection of cyberbullying.
Since machine learning is gaining increased popularity
in recent years, the computational study of
cyberbullying has at- tracted the interest of
researchers. Several research areas including topic
detection and affective analysis are closely related to
cyberbullying detection. Owing to their efforts,
automatic cyberbullying detection is becoming
possible. In machine learning-based cyberbullying
detection, there are two issues: 1) text representation
learning to transform each post/message into a
numerical vector and 2) classifier train- ing. Xu et.al
presented several off-the-shelf NLP solutions
including BoW models, LSA and LDA for
representation learning to capture bullying signals in
social media [8]. As an introductory work, they did not
5. IDL - International Digital Library Of
Technology & Research
Volume 1, Issue 5, May 2017 Available at: www.dbpublications.org
International e-Journal For Technology And Research-2017
IDL - International Digital Library 5 | P a g e Copyright@IDL-2017
develop specialized models for cyberbullying
detection. Yin et.al proposed to combine BoW
features, sentiment feature and contextual features to
train a classifier for detecting possible harassing posts
[10]. The introduction of the sentiment and contex-
tual features has been proven to be effective. Dinakar
et.al used Linear Discriminative Analysis to learn label
specific features and combine them with BoW features
to train a classifier [11]. The performance of label-
specific features largely depends on the size of training
corpus. In addition, they need to construct a bullyspace
knowledge base to boost the performance of natural
language processing methods.
SEMANTIC-ENHANCED MARGINALIZED
STACKED DENOISING AUTO-ENCODER
We first introduce notations used in our paper. Let D =
{w1 , . . . , wd } be the dictionary covering all the
words existing in the text corpus. We represent each
message using a BoW vector x ∈ Rd . Then, the whole
corpus can be denoted as a matrix: X = [x1 , . . . , xn ]
∈ Rd×n , where n is the number of available posts. We
next briefly review the marginalized stacked denoising
auto-encoder and present our proposed Semantic
enhanced Marginalized Stacked Denoising Auto-
Encoder. 3.1 Marginalized Stacked Denoising Auto-
encoder Chen et.al proposed a modified version of
Stacked Denois- ing Auto-encoder that employs a
linear instead of a non- linear projection so as to obtain
a closed-form solution [17]. The basic idea behind
denoising auto-encoder is to recon- struct the original
input from a corrupted one x1 , . . . , xn ith the goal of
obtaining robust representation.
As analyzed above, the bullying features play an
important role and should be chosen properly. In the
following, the steps for constructing bullying feature
set Zb are given, in which the first layer and the other
layers are addressed separately. For the first layer,
expert knowledge and word embeddings are used. For
the other layers, discriminative feature selection is
conducted. Layer One: firstly, we build a list of words
with negative affective, including swear words and
dirty words. Then, we compare the word list with the
BoW features of our own corpus, and regard the
intersections as bullying features. However, it is
possible that expert knowledge is limited and does not
reflect the current usage and style of cyber language.
Therefore, we expand the list of pre-defined insulting
words, i.e. insulting seeds, based on word embeddings
as follows: Word embeddings use real-valued and
low-dimension- al vectors to represent semantics of
words. The well-trained word embeddings lie in a
vector space where similar words are placed close to
6. IDL - International Digital Library Of
Technology & Research
Volume 1, Issue 5, May 2017 Available at: www.dbpublications.org
International e-Journal For Technology And Research-2017
IDL - International Digital Library 6 | P a g e Copyright@IDL-2017
each other. In addition, the cosine similarity between
word embeddings is able to quantify the semantic
similarity between words. Considering the Interent
messages are our interested corpus, we utilize a well-
trained word2vec model on a large-scale twitter corpus
containing 400 million tweets. A visualization of some
word embeddings after dimensionality reduction
(PCA) is shown in Figure 2. It is observed that curse
words form distinct clusters, which are also far away
from normal words. Even insulting words are located
at different regions due to different word usages and
insulting expressions. In addition, since the word
embeddings adopted here are trained in a large scale
corpus from Twitter, the similarity captured by word
embeddings can represent the specific language
pattern. For example, the embedding of the misspelled
word fck is close to the embedding of fuck so that the
word fck can be automatically extracted based on
word embeddings.
“Fig. 2. Two dimensional visualization of our used
word embeddings via PCA. Displayed terms include
both bullying ones and normal ones. It shows that
similar words are nearby vectors.
EXPERIMENTS
In this section, we evaluate our proposed
semanticenhanced marginalized stacked denoising
auto-encoder(smSDA) with two public real-world
cyberbullying corpora.We start by describing the
adopted corpora and experimental setup. Experimental
results are then compared with otherbaseline methods
to test the performance of our approach.At last, we
provide a detailed analysis to explain the
goodperformance of our method. 4.1 Descriptions of
Datasets Some important merits of our proposed
approach are sum marized as follows: Most
cyberbullying detection methods rely on the BoW
model. Due to the sparsity problems of both Two
datasets are used here. One is from Twitter and
another is from MySpace groups. The details of these
two datasets are described below: Twitter Dataset:
Twitter is „‟a real-time information network that
connects you to the latest stories, ideas, opinions and
news about what you find interesting„‟ (https:
//about.twitter.com/). Registered users can read and
post tweets, which are defined as the messages posted
on Twitter with a maximum length of 140 characters.
The Twitter dataset is composed of tweets crawled by
the public Twitter stream API through two steps. In
Step 1, keywords starting with ”bull” including
”bully”, ”bullied” and ”bullying” are used as queries
in Twitter to preselect some tweets that potentially
contain bullying contents. Re- tweets are removed by
excluding tweets containing the acronym „‟RT„‟. In
Step 2, the selected tweets are manually labeled as
bullying trace or non-bullying trace based on the
contents of the tweets. 7321 tweets are randomly
sampled from the whole tweets collections from
August 6, 201.
The MySpace dataset is crawled from MySpace
groups. Each group consists of several posts by
different users, which can be regarded as a
conversation about one topic. Due to the interactive
nature behind cyberbullying, each data sample is
defined as a window of 10 consecutive posts and the
windows are moved one post by one post so that we
got multiple windows. Then, three people labeled the
data for the existence of bullying content
independently. To be objective, an instance is labeled
as cyberbullying only if at least 2 out of 3 coders
identify bullying content in the windows of posts. The
raw text for these data, as XML files, have been kindly
provided by Kontostathis et.al3 . The XML files
contain information about the posts, such as post text,
post data, and users‟ information, which are put into
11 packets. Some posts in MySpace are shown in
Figure 4. Here, we focus on content-based mining, and
hence, we only extract and preprocess the posts‟ text.
7. IDL - International Digital Library Of
Technology & Research
Volume 1, Issue 5, May 2017 Available at: www.dbpublications.org
International e-Journal For Technology And Research-2017
IDL - International Digital Library 7 | P a g e Copyright@IDL-2017
The preprocessing steps of the MySpace raw text
include tokenization, deletion of punctuation and
special characters. The unigrams and bigrams features
are adopted here. The threshold for negligible low-
frequency terms is set to 20, considering one post
occurred in a long conversation will occur in at least
ten windows. The details of this dataset is shown in
Table 1.
Fig. 3. Some Examples from Twitter Datasets. Three
of them are non bullying traces. And the other three
are bullying traces testing.
The process is repeated ten times to generate ten
subdatasets constructed from MySpace data. Finally,
we have twenty sub-datasets, in which ten datasets are
from Twitter corpus and another ten datasets are from
MySpace corpus. 4.2 Experimental Setup Here, we
experimentally evaluate our smSDA on two cyber
bullying detection corpora. The following methods
will be compared.
Fig. 4. Some Examples from MySpace Datasets. Two
Conversions are Displayed and each one includes a
normal post (P ) and a bullying post(B P ). BWM:
Bullying word matching. If the message contains at
least one of our defined bullying words, it will be
classified as bullying. BoW Model: the raw BoW
features are directly fed into the classifier. Semantic-
enhanced BoW Model: This approach is referred in
[12]. Following the original setting, we scale the
bullying features by a factor of 2. LSA: Latent
Semantic Analysis [18]. LDA: Latent Dirchilet
Allocation [20]. Our implementation of LDA is based
on Gensim4. mSDA: marginalized stacked denoising
autoen coder [17]. smSDA and smSDAu : semantic-
enhanced marginalized denoising autoencoder that
utilizes semantic dropout noise and unbiased one,
respectively.
Fig. 5. Word Cloud Visualization of the List of Words with
Negative Affective
8. IDL - International Digital Library Of
Technology & Research
Volume 1, Issue 5, May 2017 Available at: www.dbpublications.org
International e-Journal For Technology And Research-2017
IDL - International Digital Library 8 | P a g e Copyright@IDL-2017
Fig. 6. Word Cloud Visualization of the Bullying
Features in Twitter Datasets.
Fig. 7. Word Cloud Visualization of the Bullying
Features in MySpace Datasets
Experimental Results
In this section, we show a comparison of our proposed
smSDA method with six benchmark approaches on
Twitter and MySpace datasets. The average results, for
these two datasets, on classification accuracy and F1
score are shown in Table 2. Figures 8 and 9 show the
results of seven compared approaches on all sub-
datasets constructed from Twitter and MySpace
datasets, respectively. Since BWM does not require
training documents, its results over the whole corpus
are reported in Table 2. It is clear that our approaches
outperform the other approaches in these two Twitter
and MySpace corpora.
the correlated features by reconstructing masking
feature values from uncorrupted feature values.
Further, the stacking structure and the nonlinearity
contribute to mSDA‟s ability for discovering complex
factors behind data. Based on mSDA, our proposed
smSDA utilizes semantic dropout noise and sparsity
constraints on mapping matrix, in which the efficiency
of training can be kept. This extension leads to a stable
performance improvement on cyberbullying detection
and the detailed analysis has been provided in the
following section.
TABLE 3 Term Reconstruction on Twitter datasets.
Each Row Shows Specific Bullying Word, along with
Top-4 Reconstructed Words (ranked with their
frequency values from top to bottom) via mSDA (left
column) and smSDA (right column). each output
dimension under such a setting. It is shown that these
reconstructed words discovered by smSDA are more
correlated to bullying words than those by mSDA. For
example, fucking is reconstructed by because, friend,
off, gets in mSDA. Except off, the other three words
seem to be unreasonable. However, in smSDA,
fucking is reconstructed by off, pissed, shit and of.
9. IDL - International Digital Library Of
Technology & Research
Volume 1, Issue 5, May 2017 Available at: www.dbpublications.org
International e-Journal For Technology And Research-2017
IDL - International Digital Library 9 | P a g e Copyright@IDL-2017
The occurrence of the term of may be due to the
frequent misspelling in Internet writing. It is obvious
that the correlation discovered by smSDA is more
meaningful. This indicates that smSDA can learn the
words‟ correlations which may be the signs of
bullying semantics, and therefore the learned robust
features boost the performance on cyberbullying
detection.
CONCLUSION– This paper addresses the text-based
cyberbullying detection problem, where robust and
discriminative representations of messages are critical
for an effective detection system. By designing
semantic dropout noise and enforcing sparsity, we
have developed semantic-enhanced marginalized
denoising autoencoder as a specialized representation
learning model for cyberbullying detection. In
addition, word embeddings have been used to
automatically expand and refine bullying word lists
that is initialized by domain knowledge. The
performance of our approaches has been
experimentally verified through two cyberbullying
corpora from social medias: Twitter and MySpace. As
a next step we are planning to further improve the
robustness of the learned representation by considering
word order in messages.
REFERENCES
[1] A. M. Kaplan and M. Haenlein, “Users of the
world, unite! The challenges and opportunities of
social media,” Business horizons, vol. 53, no. 1, pp.
59–68, 2010.
[2] R. M. Kowalski, G. W. Giumetti, A. N. Schroeder,
and M. R. Lattanner, “Bullying in the digital age: A
critical review and metaanalysis of cyberbullying
research among youth.” 2014.
[3] M. Ybarra, “Trends in technology-based sexual
and non-sexual aggression over time and linkages to
nontechnology aggression,” National Summit on
Interpersonal Violence and Abuse Across the
Lifespan: Forging a Shared Agenda, 2010.
[4] B. K. Biggs, J. M. Nelson, and M. L. Sampilo,
“Peer relations in the anxiety–depression link: Test of
a mediation model,” Anxiety, Stress, & Coping, vol.
23, no. 4, pp. 431–447, 2010.
[5] S. R. Jimerson, S. M. Swearer, and D. L. Espelage,
Handbook of bullying in schools: An international
perspective. Routledge/Taylor & Francis Group, 2010.
[6] G. Gini and T. Pozzoli, “Association between
bullying and psychosomatic problems: A meta-
analysis,” Pediatrics, vol. 123, no. 3, pp. 1059–1065,
2009.
[7] A. Kontostathis, L. Edwards, and A. Leatherman,
“Text mining and cybercrime,” Text Mining:
Applications and Theory. John Wiley & Sons, Ltd,
Chichester, UK, 2010.
[8] J.-M. Xu, K.-S. Jun, X. Zhu, and A. Bellmore,
“Learning from bullying traces in social media,” in
Proceedings of the 2012 conference of the North
American chapter of the association for computational
linguistics: Human language technologies. Association
for Computational Linguistics, 2012, pp. 656–666.
[9] Q. Huang, V. K. Singh, and P. K. Atrey, “Cyber
bullying detection using social and textual analysis,”
in Proceedings of the 3rd International Workshop on
Socially-Aware Multimedia. ACM, 2014, pp. 3–6.
[10] D. Yin, Z. Xue, L. Hong, B. D. Davison, A.
Kontostathis, and L. Edwards, “Detection of
harassment on web 2.0,” Proceedings of the Content
Analysis in the WEB, vol. 2, pp. 1–7, 2009.
[11] K. Dinakar, R. Reichart, and H. Lieberman,
“Modeling the detection of textual cyberbullying.” in
The Social Mobile Web, 2011.
[12] V. Nahar, X. Li, and C. Pang, “An effective
approach for cyberbullying detection,”
Communications in Information Science and
Management Engineering, 2012.
[13] M. Dadvar, F. de Jong, R. Ordelman, and R.
Trieschnigg, “Improved cyberbullying detection using
gender information,” in Proceedings of the 12th -
Dutch-Belgian Information Retrieval Workshop
(DIR2012). Ghent, Belgium: ACM, 2012.
[14] M. Dadvar, D. Trieschnigg, R. Ordelman, and F.
de Jong, “Improving cyberbullying detection with user
context,” in Advances in Information Retrieval.
Springer, 2013, pp. 693–696.
10. IDL - International Digital Library Of
Technology & Research
Volume 1, Issue 5, May 2017 Available at: www.dbpublications.org
International e-Journal For Technology And Research-2017
IDL - International Digital Library 10 | P a g e Copyright@IDL-
2017