With the advent of the Internet and social media, many people have benefitted from the vast sources of information available, but there has also been an enormous rise in cyber-crime, particularly targeting women. According to a 2019 report in the Economic Times [4], India witnessed a 457% rise in cybercrime in the five-year span between 2011 and 2016. Many attribute this to the impact of social media platforms such as Facebook, Instagram and Twitter on our daily lives. While these platforms certainly help in creating a sound social network, creating a user account on them usually requires just an email ID. A real-life person can create multiple fake IDs, so impostors are easy to produce. Unlike the real world, where rules and regulations are imposed to identify a person uniquely (for example, when issuing a passport or driver's license), admission to the virtual world of social media requires no such checks. In this paper, we study Instagram accounts in particular and try to assess whether an account is fake or real using machine learning techniques, namely Logistic Regression and the Random Forest algorithm.
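As a rough illustration of the setup the abstract describes, the sketch below trains the two named classifiers on synthetic profile features. The feature set and labelling rule are invented stand-ins, not the paper's actual Instagram dataset.

```python
# Sketch: classifying accounts as fake/real with the two algorithms
# named in the abstract. Features and labels below are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Hypothetical profile features: followers, following, posts, has_profile_pic
followers = rng.integers(0, 5000, n)
following = rng.integers(0, 5000, n)
posts     = rng.integers(0, 500, n)
has_pic   = rng.integers(0, 2, n)
X = np.column_stack([followers, following, posts, has_pic])
# Toy label rule (purely synthetic): accounts that follow far more
# users than follow them back and have no profile picture are "fake".
y = ((following > 3 * (followers + 1)) & (has_pic == 0)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(random_state=0)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, round(model.score(X_te, y_te), 2))
```

On real data the same two-model comparison would run over features engineered from actual account metadata.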
20 Latest Computer Science Seminar Topics on Emerging Technologies (Seminar Links)
A list of the top 20 technical seminar topics for computer science engineering (CSE) to choose from for seminars and presentations in 2019. The list also covers seminar topics on emerging technologies in computer science, IT, networking, and software. Check the links to download PDF and PPT seminar reports.
A presentation on predicting house prices. It also covers the basics of machine learning and the regression technique used to predict those prices.
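The regression idea behind that presentation can be sketched in a few lines: fit price = a * area + b by least squares and predict the price of a new house. The data points are invented for illustration.

```python
# Minimal house-price regression sketch: ordinary least squares on
# synthetic (area, price) pairs.
import numpy as np

area = np.array([50, 70, 90, 110, 130], dtype=float)      # m^2
price = np.array([150, 200, 250, 300, 350], dtype=float)  # in $1000s

# Fit price = a * area + b via least squares.
A = np.column_stack([area, np.ones_like(area)])
(a, b), *_ = np.linalg.lstsq(A, price, rcond=None)
print(round(a * 100 + b, 1))  # predicted price for a 100 m^2 house
```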
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING (ijcsit)
Extending the current Internet and providing connection, communication, and inter-networking between devices and physical objects, or "Things," is a growing trend that is often referred to as the Internet of Things.
“The technologies and solutions that enable integration of real world data and services into the current information networking technologies are often described under the umbrella term of the Internet of Things (IoT)”
Blue Gene
Introduction
The word "supercomputer" entered the mainstream lexicon in 1996 and 1997 when IBM's Deep Blue supercomputer challenged the world chess champion in two tournaments broadcast around the world.
Since then, IBM has been busy improving its supercomputer technology and tackling much deeper problems.
Their latest project, code named Blue Gene, is poised to shatter all records for computer and network performance.
What is a Super Computer
A supercomputer is a computer that is at the frontline of current processing capacity, particularly speed of calculation.
Today, supercomputers are typically one-of-a-kind custom designs produced by "traditional" companies such as Cray, IBM and Hewlett-Packard, which purchased many of the 1980s supercomputer companies to gain their experience.
Why we need Super Computers
Supercomputers are very useful in highly calculation-intensive tasks such as
Problems involving quantum physics,
Weather forecasting,
Climate research,
Molecular modeling (computing the structures and properties of chemical compounds, biological macromolecules, polymers, and crystals),
Physical simulations (such as simulation of airplanes in wind tunnels, simulation of the detonation of nuclear weapons, and research into nuclear fusion).
They are also useful for a particular class of problems, known as Grand Challenge problems, whose full solution requires semi-infinite computing resources.
NASA's Linux-based Super Computer
Why Supercomputers are Fast
Several elements of a supercomputer contribute to its high level of performance:
Numerous high-performance processors (CPUs) for parallel processing
Specially-designed high-speed internal networks
Specially-designed or tuned operating systems
What is Blue Gene
Blue Gene is a computer architecture project designed to produce several supercomputers intended to reach operating speeds in the PFLOPS range (1 petaFLOPS = 10^15 FLOPS); the machines currently reach sustained speeds of nearly 500 TFLOPS (1 teraFLOPS = 10^12 FLOPS).
It is a cooperative project among IBM (particularly IBM Rochester and the Thomas J. Watson Research Center), the Lawrence Livermore National Laboratory, the United States Department of Energy (which is partially funding the project), and academia.
Why Blue Gene
Blue Gene is an IBM Research project dedicated to exploring the frontiers in supercomputing:
in computer architecture,
in the software required to program and control massively parallel systems, and
in the use of computation to advance the understanding of important biological processes such as protein folding.
Learning more about biomolecular mechanisms is expected to give medical researchers better understanding of diseases, as well as potential cures.
Why the name Blue Gene
Blue - The corporate color of IBM
Gene - The intended use of the Blue Gene clusters was for Computational biology.
Blue Gene Projects
The proposed system overcomes the above-mentioned issue in an efficient way. It aims at analyzing the number of fraud transactions present in the dataset.
Building a multi-headed model that's capable of detecting different types of toxicity like threats, obscenity, insults and identity-based hate. Discussing things you care about can be difficult. The threat of abuse and harassment online means that many people stop expressing themselves and give up on seeking different opinions. Platforms struggle to efficiently facilitate conversations, leading many communities to limit or completely shut down user comments. So far we have a range of publicly available models served through the Perspective APIs, including toxicity. But the current models still make errors, and they don't allow users to select which type of toxicity they're interested in finding. Pallam Ravi | Hari Narayana Batta | Greeshma S | Shaik Yaseen, "Toxic Comment Classification", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-4, June 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23464.pdf
Paper URL: https://www.ijtsrd.com/computer-science/other/23464/toxic-comment-classification/pallam-ravi
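One common way to build such a "multi-headed" model is one binary classifier head per toxicity label over shared text features; the sketch below assumes TF-IDF features with logistic-regression heads, and the tiny corpus and labels are invented.

```python
# Multi-label toxicity sketch: one binary head per label
# (here: threat, insult) over shared TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

texts = ["I will hurt you", "you are stupid", "have a nice day",
         "what an idiot", "thanks for sharing", "I will find you"]
# Label columns: [threat, insult]
labels = [[1, 0], [0, 1], [0, 0], [0, 1], [0, 0], [1, 0]]

X = TfidfVectorizer().fit_transform(texts)
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, labels)
print(clf.predict(X[:1]))  # one prediction per label head
```

Because each label has its own head, a user can read off only the toxicity types they care about, which is exactly the selectivity the abstract says current models lack.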
Driver drowsiness monitoring system using visual behavior and Machine Learning (AasimAhmedKhanJawaad)
Drowsy driving is one of the major causes of road accidents and death. Hence, detection of a driver's fatigue and its indication is an active research area. Most of the conventional methods are either vehicle-based, behavioral-based or physiological-based. Some methods are intrusive and distract the driver, while others require expensive sensors and data handling. Therefore, in this study, a low-cost, real-time driver drowsiness detection system is developed with acceptable accuracy. In the developed system, a webcam records the video and the driver's face is detected in each frame employing image processing techniques. Facial landmarks on the detected face are located, and subsequently the eye aspect ratio, mouth opening ratio and nose length ratio are computed; depending on their values, drowsiness is detected based on developed adaptive thresholding. Machine learning algorithms have been implemented as well, in an offline manner. A sensitivity of 95.58% and specificity of 100% have been achieved in Support Vector Machine based classification.
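The eye aspect ratio mentioned above is commonly computed from six eye landmarks as EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|); the sketch below uses that common formulation with invented landmark coordinates, and the paper's exact thresholds may differ.

```python
# Eye aspect ratio (EAR) sketch: a low EAR sustained over consecutive
# frames suggests the eyes are closing.
import math

def ear(pts):
    p1, p2, p3, p4, p5, p6 = pts
    return ((math.dist(p2, p6) + math.dist(p3, p5))
            / (2 * math.dist(p1, p4)))

open_eye   = [(0, 0), (1, 2), (2, 2), (3, 0), (2, -2), (1, -2)]
closed_eye = [(0, 0), (1, 0.3), (2, 0.3), (3, 0), (2, -0.3), (1, -0.3)]
print(round(ear(open_eye), 2), round(ear(closed_eye), 2))
```

In a real pipeline the landmark coordinates would come from a face-landmark detector running on each webcam frame.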
Fake accounts detection on social media using stack ensemble system (IJECE, IAES)
In today's world, social media has spread widely, and people's social lives have become deeply associated with social media use. They use it to communicate with each other, share events and news, and even run businesses. The huge growth in social media and the massive number of users have lured attackers into distributing harmful content through fake accounts, leading to a large number of people falling victim to those accounts. In this work, we propose a mechanism for identifying fake accounts on the social media site Twitter, using two methods to preprocess the data and extract the most effective features: the Spearman correlation coefficient and the chi-square test. For classification, we used supervised machine learning algorithms based on an ensemble system (stacking), with random forest, support vector machine, and naive Bayes algorithms at the first level of the stack and the logistic regression algorithm as a meta-classifier. The stack ensemble system was shown to be effective, achieving the best results when compared to the algorithms used within it, with accuracy reaching 99%.
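The described stack maps directly onto scikit-learn's StackingClassifier; a sketch on synthetic data (not the paper's Twitter features) with the same three level-one models and meta-classifier:

```python
# Stack ensemble sketch: RF, SVM and naive Bayes at level one,
# logistic regression as the meta-classifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0)),
                ("nb", GaussianNB())],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X, y)
print(round(stack.score(X, y), 2))
```

StackingClassifier trains the meta-classifier on cross-validated predictions of the base models, which is the standard way to avoid the level-one models leaking their training labels upward.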
Empirical analysis of ensemble methods for the classification of robocalls in... (IJECE, IAES)
With the advent of technology, there has been excessive use of cellular phones. Cellular phones have made life convenient in our society. However, individuals and groups have subverted telecommunication devices to deceive unwary victims. Robocalls are quite prevalent these days, and they can be either legal or used by scammers to trick people out of their money. The methodology proposed in the paper is to experiment with two ensemble models on a dataset acquired from the Federal Trade Commission (DNC dataset). It is imperative to analyze the call records so that, based on the patterns, a call can be classified as a robocall or not. Two algorithms, Random Forest and XGBoost, are combined in two ways and compared in the paper in terms of accuracy, sensitivity and time taken.
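Combining two tree ensembles "in two ways" can be sketched as soft voting and stacking. Here scikit-learn's GradientBoostingClassifier stands in for XGBoost so the sketch needs no extra dependency, and the call-record features are synthetic rather than the DNC data.

```python
# Two ways to combine two tree ensembles: soft voting and stacking.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, random_state=1)
base = [("rf", RandomForestClassifier(random_state=1)),
        ("gb", GradientBoostingClassifier(random_state=1))]

vote  = VotingClassifier(base, voting="soft").fit(X, y)
stack = StackingClassifier(base,
                           final_estimator=LogisticRegression()).fit(X, y)
print(round(vote.score(X, y), 2), round(stack.score(X, y), 2))
```

A fair comparison, as the abstract suggests, would measure accuracy, sensitivity and wall-clock time for each combination on held-out data.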
Classification of Instagram fake users using supervised machine learning algo... (IJECE, IAES)
On Instagram, the number of followers is a common success indicator. Hence, follower-selling services have become a huge part of the market. Influencers become bombarded with fake followers, and this causes business owners to pay more than they should for a brand endorsement. Identifying fake followers becomes important to determine the authenticity of an influencer. This research aims to identify fake users' behavior and proposes supervised machine learning models to classify authentic and fake users. The dataset contains fake users bought from various sources, as well as authentic users. There are 17 features used, drawn from these sources: 6 metadata, 3 media info, 2 engagement, 2 media tags, and 4 media similarity. Five machine learning algorithms are tested. Three different classification approaches are proposed, i.e. classification into 2 classes, classification into 4 classes, and classification with metadata. The random forest algorithm produces the highest accuracy for both the 2-class (authentic, fake) and 4-class (authentic, active fake user, inactive fake user, spammer) classification, with accuracy up to 91.76%. The results also show that the five metadata variables, i.e. number of posts, followers, biography length, following, and link availability, are the biggest predictors of the user's class. Additionally, descriptive statistics reveal noticeable differences between fake and authentic users.
Abstract: Fake news is a major vehicle for misleading people, and its detection based on deep learning techniques is an important problem. For the experiments, several types of datasets, models, and methodologies have been used to detect fake news. Most of the datasets contain text IDs, tweet IDs, and user-based IDs and features. To get proper results and accuracy, various models like CNN (convolutional neural network), deep CNN, and LSTM (long short-term memory) are used.
CLASSIFIER SELECTION MODELS FOR INTRUSION DETECTION SYSTEM (IDS) (ieijjournal1)
Any abnormal activity can be assumed to be an anomalous intrusion. In the literature, several techniques and algorithms have been discussed for anomaly detection. In most cases, true positive and false positive parameters have been used to compare their performance. However, depending upon the application, a wrong true positive or wrong false positive may have severe detrimental effects. This necessitates the inclusion of cost-sensitive parameters in the performance evaluation. Moreover, the most common testing dataset, KDD-CUP-99, contains a huge amount of data, which in turn requires a certain amount of pre-processing. Our work in this paper starts by enumerating the necessity of cost-sensitive analysis with some real-life examples. After discussing KDD-CUP-99, an approach is proposed for feature elimination and then feature selection, to directly reduce the number of relevant features and indirectly reduce the size of KDD-CUP-99. From the reported literature, general methods for anomaly detection are selected which perform best for different types of attacks. These different classifiers are clubbed together to form an ensemble. A cost-opportunistic technique is suggested to allocate relative weights to the classifier ensemble for generating the final result. The cost sensitivity of true positive and false positive results is analyzed, and a method is proposed to select the elements of the cost-sensitivity metrics to further improve the results and achieve better overall performance. The impact on the performance trade-off due to incorporating cost sensitivity is discussed.
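The weighted-ensemble and cost-sensitivity ideas can be sketched as weighted soft voting followed by a minimum-expected-cost decision, where a missed attack (false negative) is penalised far more than a false alarm. The weights, probabilities and cost values below are illustrative, not the paper's.

```python
# Cost-sensitive weighted ensemble sketch for an IDS decision.
# preds: P(attack) from three base classifiers; weights: their
# relative (cost-opportunistic) weights in the ensemble.
preds = [0.2, 0.7, 0.6]
weights = [0.5, 0.3, 0.2]
p_attack = sum(w * p for w, p in zip(weights, preds))

COST_FN, COST_FP = 10.0, 1.0        # missing an attack costs 10x a false alarm
cost_if_allow = p_attack * COST_FN  # expected cost of labelling "normal"
cost_if_alert = (1 - p_attack) * COST_FP
decision = "alert" if cost_if_alert < cost_if_allow else "allow"
print(round(p_attack, 2), decision)
```

With a symmetric cost matrix this reduces to a plain 0.5 threshold; the asymmetric costs are what shift the decision boundary toward alerting.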
Machine learning is a sub-field of artificial intelligence (AI) that focuses on creating statistical models and algorithms that allow computers to learn and become more proficient at performing particular tasks. Machine learning algorithms create a mathematical model with the help of historical sample data, or “training data,” that assists in making predictions or judgments without being explicitly programmed.
How to build machine learning apps.pdf (JamieDornan2)
Machine learning is a sub-field of artificial intelligence (AI) that focuses on creating statistical models and algorithms that allow computers to learn and become more proficient at performing particular tasks.
In the era of data-driven warfare, the integration of big data and machine learning (ML) techniques has become paramount for enhancing defence capabilities. This research report delves into the applications of big data and ML in the defence sector, exploring their potential to revolutionize intelligence gathering, strategic decision-making, and operational efficiency. By leveraging vast amounts of data and advanced algorithms, these technologies offer unprecedented opportunities for threat detection, predictive analysis, and optimized resource allocation. However, their adoption also raises critical concerns regarding data privacy, ethical implications, and the potential for misuse. This report aims to provide a comprehensive understanding of the current state of big data and ML in defence, while examining the challenges and ethical considerations that must be addressed to ensure responsible and effective implementation.
Cloud Computing, being one of the most recent innovative developments of the IT world, has been instrumental not just to the success of SMEs but, through their productivity and innovative contribution to the economy, has even made a remarkable contribution to the economic growth of the United States. To this end, the study focuses on how cloud computing technology has impacted economic growth through SMEs in the United States. Relevant literature connected to the variables of interest in this study was reviewed, and secondary data was generated and utilized in the analysis section of this paper. The findings of this paper revealed that there have been meaningful contributions that the usage of virtualization has made in the commercial dealings of small firms in the United States, and this has also been reflected in the economic growth of the country. This paper further revealed that as important as cloud-based software is, some SMEs are still skeptical about how it can help improve their business and increase their bottom line and hence have failed to adopt it. Apart from the SMEs, some notable large firms in different industries, including information and educational services, have adopted cloud computing technology and hence contributed to the economic growth of the United States. Lastly, findings from our inferential statistics revealed that no discernible change has occurred in innovation between small and big businesses in the adoption of cloud computing. Both categories of businesses adopt cloud computing in the same way, and their contribution to the American economy shows no significant difference in the usage of virtualization.
Energy-constrained Wireless Sensor Networks (WSNs) have garnered significant research interest in recent years. Multiple-Input Multiple-Output (MIMO), or Cooperative MIMO, represents a specialized application of MIMO technology within WSNs. This approach operates effectively, especially in challenging and resource-constrained environments. By facilitating collaboration among sensor nodes, Cooperative MIMO enhances reliability, coverage, and energy efficiency in WSN deployments. Consequently, MIMO finds application in diverse WSN scenarios, spanning environmental monitoring, industrial automation, and healthcare applications.
The AIRCC's International Journal of Computer Science and Information Technology (IJCSIT) is devoted to the fields of Computer Science and Information Systems. The IJCSIT is an open access peer-reviewed scientific journal published in electronic as well as print form. The mission of this journal is to publish original contributions in its field in order to propagate knowledge amongst its readers and to be a reference publication. IJCSIT publishes original research papers and review papers, as well as auxiliary material such as case studies and technical reports.
Demand for car parking increases with the number of car users. With the increased use of smartphones and their applications, users prefer mobile phone-based solutions. This paper proposes the Smart Parking Management System (SPMS), which depends on Arduino parts and Android applications and is based on IoT. This gives the client the ability to check available parking spaces and reserve a parking spot. IR sensors are utilized to detect whether a parking space is occupied. The slot data are transmitted using the Wi-Fi module to the server and are retrieved by the mobile application, which offers many options attractively and at no cost to users, and lets the user check reservation details. With IoT technology, the smart parking system can be connected wirelessly to easily track available locations.
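A sketch of the server-side bookkeeping such a system implies: IR sensor updates mark slots occupied or free, and the app lists and reserves free slots. The data model is an assumption for illustration, not the paper's implementation.

```python
# Minimal parking-slot state tracker for an SPMS-style server.
class ParkingLot:
    def __init__(self, n):
        self.slots = {i: "free" for i in range(n)}

    def sensor_update(self, slot, occupied):
        # Called when an IR sensor reading arrives over Wi-Fi; a
        # reserved slot stays reserved until a car actually occupies it.
        if self.slots[slot] != "reserved" or occupied:
            self.slots[slot] = "occupied" if occupied else "free"

    def available(self):
        return [s for s, state in self.slots.items() if state == "free"]

    def reserve(self):
        free = self.available()
        if free:
            self.slots[free[0]] = "reserved"
            return free[0]
        return None

lot = ParkingLot(4)
lot.sensor_update(0, True)            # IR sensor reports slot 0 occupied
print(lot.available(), lot.reserve(), lot.available())
```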
Welcome to AIRCC's International Journal of Computer Science and Information Technology (IJCSIT), your gateway to the latest advancements in the dynamic fields of Computer Science and Information Systems.
Computer-Assisted Language Learning (CALL) systems are computer-based tutoring systems that deal with linguistic skills. Adding intelligence to such systems is mainly based on using Natural Language Processing (NLP) tools to diagnose student errors, especially in grammar. However, most such systems do not consider modeling student competence in linguistic skills, especially for the Arabic language. In this paper, we deal with basic grammar concepts of the Arabic language taught in the fourth grade of elementary school in Egypt, through the Arabic Grammar Trainer (AGTrainer), an intelligent CALL system. The implemented system (AGTrainer) trains students through different questions that deal with different concepts and have different difficulty levels. The constraint-based student modeling (CBSM) technique is used as a short-term student model. CBSM defines, at a fine-grained level, the different grammar skills through the defined skill structures. The main contribution of this paper is the hierarchical representation of the system's basic grammar skills as domain knowledge. That representation is used as a mechanism for efficiently checking constraints to model the student's knowledge, diagnose errors, and identify their causes. In addition, constraint satisfaction, the number of trials the student takes to answer each question, and a fuzzy-logic decision system are used to determine the student's learning level for each lesson as a long-term model. The evaluation results showed the system's effectiveness for learning, in addition to the satisfaction of students and teachers with its features and abilities.
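Constraint-based student modeling is often expressed as (relevance, satisfaction) condition pairs: a constraint that is relevant to the student's answer but not satisfied flags a diagnosable error. The single, simplified grammar constraint below is invented for illustration and is not taken from AGTrainer.

```python
# CBSM sketch: each constraint has a relevance condition and a
# satisfaction condition over the student's answer.
constraints = [
    # (name, relevant?, satisfied?)
    ("a verbal sentence must start with a verb",
     lambda ans: ans["sentence_type"] == "verbal",
     lambda ans: ans["first_word_pos"] == "verb"),
]

def diagnose(answer):
    # Return the names of relevant-but-violated constraints.
    return [name for name, rel, sat in constraints
            if rel(answer) and not sat(answer)]

student_answer = {"sentence_type": "verbal", "first_word_pos": "noun"}
print(diagnose(student_answer))
```

In a full system, each violated constraint maps to a targeted feedback message, and counts of violations over time feed the long-term learner model.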
In the realm of computer security, the importance of efficient and reliable user authentication methods has become increasingly critical. This paper examines the potential of mouse movement dynamics as a consistent metric for continuous authentication. By analysing user mouse movement patterns in two contrasting gaming scenarios, "Team Fortress" and "Poly Bridge," we investigate the distinctive behavioral patterns inherent in high-intensity and low-intensity UI interactions. The study extends beyond conventional methodologies by employing a range of machine learning models. These models are carefully selected to assess their effectiveness in capturing and interpreting the subtleties of user behavior as reflected in their mouse movements. This multifaceted approach allows for a more nuanced and comprehensive understanding of user interaction patterns. Our findings reveal that mouse movement dynamics can serve as a reliable indicator for continuous user authentication. The diverse machine learning models employed in this study demonstrate competent performance in user verification, marking an improvement over previous methods used in this field. This research contributes to the ongoing efforts to enhance computer security and highlights the potential of leveraging user behavior, specifically mouse dynamics, in developing robust authentication systems.
The AIRCC's International Journal of Computer Science and Information Technology (IJCSIT) is devoted to the fields of Computer Science and Information Systems. The IJCSIT is an open-access, peer-reviewed scientific journal published in both electronic and print form. The mission of this journal is to publish original contributions in its field in order to propagate knowledge amongst its readers and to be a reference publication.
International Journal of Computer Science & Information Technology (IJCSIT) Vol 11, No 5, October 2019
DOI: 10.5121/ijcsit.2019.11507
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING
MACHINE LEARNING
Ananya Dey¹, Hamsashree Reddy², Manjistha Dey³ and Niharika Sinha⁴
¹National Institute of Technology, Tiruchirappalli, India
²PES University, Bangalore, India
³RV College of Engineering, Bangalore, India
⁴Manipal Institute of Technology, Karnataka, India
ABSTRACT
With the advent of the Internet and social media, while people have benefitted greatly from the vast sources of
information available, there has been an enormous rise in cyber-crime, particularly targeted towards women.
According to a 2019 report in The Economic Times [4], India witnessed a 457% rise in cybercrime in the
five-year span between 2011 and 2016. Most speculate that this is due to the impact of social media such as
Facebook, Instagram and Twitter on our daily lives. While these platforms definitely help in creating a sound
social network, creating a user account on these sites usually requires just an email id. A single real-life
person can create multiple fake IDs, and hence impostors can easily be made. Unlike the real world, where
multiple rules and regulations are imposed to identify oneself uniquely (for example while issuing one's
passport or driver's licence), admission to the virtual world of social media does not require any such
checks. In this paper, we study Instagram accounts in particular and try to classify an account as fake or
real using machine learning techniques, namely Logistic Regression and the Random Forest algorithm.
KEYWORDS
Logistic Regression, Random Forest Algorithm, Median Imputation, Maximum Likelihood Estimation, K-fold Cross-Validation, Overfitting, Out-of-Bag Data, Recall, Identity Theft, Angler Phishing.
1. INTRODUCTION
Instagram is an online photo and video sharing social networking platform that has been available on both
Android and iOS since 2012. As of May 2019, there are over a billion users registered on Instagram.
In recent years, third-party apps called bots have been found operating on Instagram. While these can
impersonate a user and tarnish their reputation, leading to 'identity theft', there have also been growing
instances of malicious promotion of a company's brand image through so-called "influencer marketing".
These days a number of businesses use social media to attend to their customers' needs, which has led to
yet another malpractice called Angler phishing. All these malpractices have made it vital to implement
strong fraud-detection techniques, and hence we propose our solution.
2. LITERATURE SURVEY
Previously, a lot of work has been done on other platforms like Facebook and Twitter, but not much has
been done for Instagram. Each of these social media platforms differs in terms of the features that have to be
considered, the strategies used, etc. Past work includes: [1] Worth its Weight in Likes: Towards Detecting
Fake Likes on Instagram: this paper concentrates on analysing likes and identifying the genuine ones to
reduce the effect of fake likes on the Instagram influencer market; the authors used a simple feed-forward
neural network, a Multi-Layer Perceptron (MLP), which obtained a precision of about 83%. [2] Identifying
Fake Profiles in LinkedIn: a number of features were considered to train the dataset using neural networks,
SVMs, and principal component analysis; the precision rate achieved was 84%. [3] Detection of Fake
Profiles in Social Media: a literature review of approaches to detecting fake social media accounts,
focusing on those that analyse individual accounts. Our proposed method is thus a novel approach in terms
of the platform chosen and the algorithms used, such as Random Forest for classification.
Figure 1: Proposed System
3. MATERIALS AND METHODS
In this section we present the materials and methods used for the research work.
Data set information:
The dataset has been taken from https://www.kaggle.com/free4ever1/instagram-fake-spammer-genuine-accounts.
It consists of two CSV files: train.csv (19 KB) and test.csv (4 KB). The dependent variable, which
indicates whether an account is fake or not, is categorical and takes two values: 0 (not fake) and 1
(fake). The distribution of the training dataset is such that 50% of the profiles are fake and the remaining
50% are legitimate. Below is a table denoting the parameters considered (the column Profile feature),
their range of values, their mean values, and what each feature denotes.
Table 1: Dataset Features
Figure 2: Snapshot of Training Dataset
Exploratory Data Analysis: This is a critical initial investigation of the data, done to discover
patterns in the dataset and spot anomalies with the help of summary statistics and graphical
representations. The sub-processes performed are described below.
a. Missing Value Treatment
The given dataset had no missing values. Missing values occur in a dataset mostly due to real-world
data-collection problems and can be treated either through deletion or imputation. The presence of missing
values reduces the data available for analysis, compromising the statistical power of the study and
eventually the reliability of its results.
b. Outlier Detection
Outliers are extreme values that deviate from the usual data values in the dataset. If outliers are present
in the dataset, accuracy is reduced significantly, as the model learns from this noise in the data and may
over-fit. After careful analysis using graphs, we concluded that the following features had outliers:
nums/length username, full name words, description length, #posts, #followers and #follows. To deal with
these outliers, we used median imputation: we calculated the median of each of these sets of values,
excluding the outliers from the median calculation, and then replaced all the outliers with the calculated
median value.
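The median-imputation step described above can be sketched in a few lines of Python (the paper's analysis was done in R, and it does not state its outlier-detection criterion, so the Tukey IQR rule used here is an assumption):

```python
import statistics

def impute_outliers_with_median(values, k=1.5):
    """Replace outliers with the median of the non-outlier values.

    Assumption: a value counts as an outlier when it lies more than
    k * IQR outside the interquartile range (Tukey's rule); the paper
    does not specify which criterion it used.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    # Median computed over inliers only, as the text describes.
    med = statistics.median(v for v in values if lo <= v <= hi)
    return [v if lo <= v <= hi else med for v in values]

# e.g. a hypothetical #followers column with one extreme account:
print(impute_outliers_with_median([10, 12, 11, 13, 9, 10000]))
# → [10, 12, 11, 13, 9, 11]
```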
c. Bivariate Analysis
This is done to understand the relationship between two variables and the strength of the association
between them. We calculated the correlation matrix and concluded that there is no high multicollinearity
between the variables; this is one of the assumptions to verify before building a logistic regression model.
Once data pre-processing is done, we can safely move on to the algorithms. We have been provided with a
labelled training dataset and can therefore apply supervised learning algorithms that map the input to the
output. For the scope of this paper we have considered two commonly used classification algorithms, i.e.,
Logistic Regression and the Random Forest algorithm. Each of them is explained in depth below.
1. Logistic Regression
The assumptions of this model include the absence of outliers in the dataset and the absence of high
correlations between the predictors, both of which have been taken care of in the preceding steps. In
logistic regression, the probabilities are predicted using the logit function. Values greater than or equal
to the decision boundary belong to one class, while values lower than it belong to the other.
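This decision rule can be illustrated with a minimal Python sketch (the paper's model was fitted in R; the coefficients below are made-up placeholders, not the fitted values, and the 0.5 threshold is the conventional default):

```python
import math

def predict_fake(features, coefs, intercept, threshold=0.5):
    """Classify a profile with a fitted logistic-regression model.

    The inverse of the logit (the sigmoid) maps the linear score to a
    probability, which is compared against the decision boundary:
    1 = fake, 0 = genuine.
    """
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    p = 1.0 / (1.0 + math.exp(-z))
    return 1 if p >= threshold else 0
```

For example, with placeholder coefficients [0.8, 0.3] and intercept -0.5, a feature vector [1.0, 2.0] gives a score z = 0.9 and probability ≈ 0.71, so the profile would be classified as fake.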
We first run the GLM function in R to perform the regression and find the beta coefficients and p-values
for each of the features. The beta coefficient, which is calculated based on maximum likelihood
estimation, indicates how strongly the predictor variable predicts the dependent variable. Based on the
p-values, we remove those variables whose p-values are greater than 0.05 and re-run the model.
Finally, we performed K-fold cross-validation to check for overfitting.
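The K-fold partitioning behind this validation step can be sketched as follows (a generic Python illustration; the actual experiments were run in R):

```python
def k_fold_splits(n, k):
    """Partition indices 0..n-1 into k folds; each fold serves once as
    the validation set while the other k-1 folds form the training set."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [
        (sorted(i for f in range(k) if f != held_out for i in folds[f]),
         folds[held_out])
        for held_out in range(k)
    ]

# With 6 records and k=3 we get 3 (train, validation) pairs,
# and every record is held out exactly once.
splits = k_fold_splits(6, 3)
```

The overfitting check then compares the average validation accuracy across the k rounds against the training accuracy.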
Figure 3: Model Summary
2. Random Forest Algorithm
Random Forest is an ensemble model that performs well on classification problems. Since very few
assumptions are attached to it, data preparation is less challenging. We used the randomForest function in
R and set the ntree parameter (denoting the number of trees) to 500 and the number of variables tried at
each split to 3. Random record selection is the first task: each tree is trained on roughly two thirds of the
total training data, and some variables (say m) are selected at random out of all the variables; these m
variables are used to split the nodes. For each tree, the misclassification rate is calculated using the
leftover (36.8%) data, which gives us the Out-Of-Bag (OOB) error rate. The forest chooses the classification
having the most votes over all the trees in the forest. This is the RF score, and the percentage of YES votes
received is the predicted probability.
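The "leftover (36.8%)" share is not arbitrary: randomForest trains each tree on a bootstrap sample (n records drawn with replacement from n), so the probability that a given record is never drawn is (1 - 1/n)^n, which tends to 1/e ≈ 0.368 as n grows. This can be checked directly:

```python
import math

def oob_fraction(n):
    """Expected fraction of records left out of a bootstrap sample of
    size n drawn with replacement from n records (the out-of-bag set)."""
    return (1.0 - 1.0 / n) ** n

# For any reasonably large training set this is ~36.8%, matching the
# leftover share quoted in the text; the complement (~63.2%) is the
# "two thirds" each tree effectively trains on.
```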
4. RESULTS AND ANALYSIS
In this section we present the results of the two classification models.
After creating the models using the training dataset, we apply them to unseen data, i.e., the test
dataset. We create the confusion matrix based on these predictions and calculate various performance
metrics, as discussed below.
1. Logistic Regression
Figure 4: Confusion matrix for Logistic Regression (T = True, F=False, P=Positive, N=Negative)
We calculate the different model metrics as follows:
Precision: the total number of correctly classified positive examples divided by the total
number of predicted positive examples.
Precision = TP / (TP + FP) = 57 / (57 + 8) = 87.6 %
Recall: the ratio of the total number of correctly classified positive examples to the total
number of positive examples.
Recall = TP / (TP + FN) = 57 / (57 + 3) = 95 %
F1 score: the harmonic mean of recall and precision.
F1 score = (2 * Precision * Recall) / (Precision + Recall) = 91.15 %
Accuracy: calculated using the following formula:
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (57 + 52) / (57 + 52 + 8 + 3) = 90.8 %
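These figures can be reproduced directly from the confusion-matrix counts; a quick Python check using the Logistic Regression counts from Figure 4 (TP=57, FP=8, FN=3, TN=52):

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

# Logistic Regression counts from Figure 4:
p, r, f1, acc = classification_metrics(tp=57, fp=8, fn=3, tn=52)
# p ~ 0.877, r = 0.95, f1 ~ 0.912, acc ~ 0.908
```

The same function applied to the Random Forest counts reproduces the second set of metrics reported later.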
Based on this curve, we infer that using k=3 will give optimal results.
Figure 5: ROC Curve for Logistic Regression
2. Random Forest Algorithm
Figure 6: Confusion matrix for Random Forest (T = True, F = False, P = Positive, N = Negative)
On a similar note, we calculate the various model metrics for the Random Forest algorithm.
Precision = TP / (TP + FP) = 55 / (55 + 4) = 93.2 %
Recall = TP / (TP + FN) = 55 / (55 + 5) = 91.6 %
F1 score = (2 * Precision * Recall) / (Precision + Recall) = 92.42 %
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (55 + 56) / (55 + 56 + 4 + 5) = 92.5 %
Figure 7: Variable Importance Graph
5. CONCLUSIONS
While going through previous research on the detection of fake profiles on social media platforms, we
realized that not much has been done on Instagram as a platform in particular, and hence we targeted our
approach at it. In this paper, we introduced a novel approach for detecting fake user profiles on Instagram
based on certain features, using concepts of machine learning. We used two models for this, Logistic
Regression and the Random Forest algorithm, achieving accuracies of 90.8% and 92.5% respectively. Such
high accuracies had not been attained in previous work conducted for other social media platforms (the
highest accuracy achieved before this was 86%).
REFERENCES
[1] Indira Sen, Anupama Aggarwal, Shiven Mian. 2018. "Worth its Weight in Likes: Towards Detecting Fake Likes
on Instagram". In ACM International Conference on Information and Knowledge Management.
[2] Shalinda Adikari, Kaushik Dutta. 2014. "Identifying Fake Profiles in LinkedIn". In Pacific Asia Conference
on Information Systems.
[3] Aleksei Romanov, Alexander Semenov, Oleksiy Mazhelis and Jari Veijalainen. 2017. "Detection of Fake
Profiles in Social Media". In 13th International Conference on Web Information Systems and Technologies.
[4] https://telecom.economictimes.indiatimes.com/news/india-saw-457-rise-in-cybercrime-in-fiveyears-study/67455224
[5] Todor Mihaylov, Preslav Nakov. 2016. "Hunting for Troll Comments in News Community Forums". In
Association for Computational Linguistics.
[6] Ml-cheatsheet.readthedocs.io. (2019). Logistic Regression — ML Cheatsheet documentation. [Online]
Available at: https://ml-cheatsheet.readthedocs.io/en/latest/logistic_regression.html#binarylogistic-regression
[Accessed 10 Jun. 2019].
[7] Schoonjans, F. (2019). ROC curve analysis with MedCalc. [Online] MedCalc. Available at:
https://www.medcalc.org/manual/roc-curves.php [Accessed 10 Jun. 2019].
[8] Kietzmann, J.H., Hermkens, K., McCarthy, I.P., Silvestre, B.S., 2011. Social media? Get serious!
Understanding the functional building blocks of social media. Bus. Horiz., SPECIAL ISSUE: SOCIAL
MEDIA 54, 241–251. doi:10.1016/j.bushor.2011.01.005.
[9] Krombholz, K., Hobel, H., Huber, M., Weippl, E., 2015. Advanced Social Engineering Attacks. J Inf
Secur Appl 22, 113–122. doi:10.1016/j.jisa.2014.09.005.