The rapid growth of the Internet has revolutionized online news reporting, and many users now rely on online news websites to obtain their news. In Sri Lanka there are numerous news websites that readers visit on a daily basis. With this growing number of outlets, Sri Lankan media authorities lack a proper methodology or tool capable of tracking and regulating the publications made by different disseminators of news. This paper proposes a News Agent toolbox that periodically extracts news articles and their associated comments with the aid of a concept called Mapping Rules, and classifies them into Personalized Categories defined in terms of keyword-based Category Profiles. The proposed tool also analyzes reader comments using simple statistical techniques to discover the most popular news articles and fluctuations in the popularity of news stories.
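As a rough illustration of the keyword-based Category Profiles described above, the following Python sketch assigns an article to the profile sharing the most keywords; the profile contents, function names, and tie-breaking rule are illustrative assumptions, not the paper's actual Mapping Rules.

```python
# Minimal sketch of keyword-based category profiles, assuming a profile is a
# plain set of keywords; names here are illustrative, not the paper's API.
CATEGORY_PROFILES = {
    "politics": {"parliament", "election", "minister"},
    "sports":   {"cricket", "match", "tournament"},
}

def classify(article_text: str) -> str:
    """Assign the article to the profile with the most keyword hits."""
    words = set(article_text.lower().split())
    scores = {cat: len(words & kw) for cat, kw in CATEGORY_PROFILES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"

print(classify("Parliament debates the election budget"))
```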
Ins and Outs of News Twitter as a Real-Time News Analysis Service - Arjumand Younus
Paper accepted to the Workshop on Visual Interfaces to the Social and Semantic Web (VISSW 2011), held in conjunction with the International Conference on Intelligent User Interfaces (IUI 2011) at Stanford University, Palo Alto.
IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M... - IRJET Journal
1. The document proposes a framework called SociRank to identify and rank prevalent news topics using social media factors.
2. SociRank identifies topics prevalent in both social media and news media, and then ranks them based on their media focus in news, user attention in social media, and user interaction regarding the topic.
3. The experiments show that SociRank improves the quality and variety of automatically identified news topics compared to other topic identification and ranking methods.
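SociRank's exact scoring is not reproduced above, so the sketch below only illustrates the general idea of ranking topics by combining media focus (MF), user attention (UA), and user interaction (UI); the equal weights and data layout are assumptions.

```python
# Illustrative ranking step: combine media focus (mf), user attention (ua),
# and user interaction (ui) per topic. Equal weights are an assumption, not
# SociRank's tuned values.
def rank_topics(topics):
    """topics: list of dicts with 'name', 'mf', 'ua', 'ui' scores in [0, 1]."""
    w = 1 / 3
    scored = [(w * t["mf"] + w * t["ua"] + w * t["ui"], t["name"])
              for t in topics]
    return [name for _, name in sorted(scored, reverse=True)]

print(rank_topics([{"name": "budget", "mf": 0.9, "ua": 0.4, "ui": 0.6},
                   {"name": "final",  "mf": 0.5, "ua": 0.8, "ui": 0.7}]))
```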
Web search engines help users find useful information on the WWW. However, when different users submit the same query, typical search engines return the same results regardless of who submitted the query, even though each user generally has different information needs. Search results should therefore be adapted to users with different information needs, ideally without any extra user effort. Such adaptive search systems can be achieved by constructing user profiles through modified collaborative filtering combined with detailed analysis of each user's browsing history. Three types of web search system can provide personalized information: (1) systems using relevance feedback, (2) systems in which users register their interests, and (3) systems that recommend information based on the user's history. The first approach requires users to judge results as relevant or irrelevant, which is time-consuming; the second requires users to register static interests, which demands extra effort. The third approach is therefore preferable: users need not give explicit ratings, relevance is tracked automatically from user behavior and usage history, no registration of interests is required, and changing interests are captured dynamically. The results show that exploiting browsing history lets each user perform a more fine-grained search by capturing changes in their preferences without any user effort, and users need less time to find the relevant snippet in personalized search results than in the original results.
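A minimal sketch of this third, implicit approach, assuming the profile is a simple term-frequency counter built from pages in the browsing history; the actual system's modified collaborative filtering is not reproduced here.

```python
from collections import Counter

# Build an implicit profile from browsing history, then re-rank results by
# overlap with it. The data layout (raw strings) is an illustrative assumption.
def build_profile(history_pages):
    profile = Counter()
    for page_text in history_pages:
        profile.update(page_text.lower().split())
    return profile

def rerank(results, profile):
    """results: list of (title, snippet); favour snippets sharing profile terms."""
    def score(item):
        return sum(profile[w] for w in item[1].lower().split())
    return sorted(results, key=score, reverse=True)

profile = build_profile(["cricket match report colombo", "cricket scores today"])
print(rerank([("a", "stock market news"), ("b", "cricket match highlights")],
             profile))
```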
Use of tf idf for classification and distribution of e-news paper based on t... - IAEME Publication
The document describes an e-newspaper personalization system that classifies news articles and recommends personalized news to users based on their profiles. It does this in two steps:
1. It first classifies news articles from a database into categories like sports, business, etc. using a Vector Space Model classifier. Each article is represented as a word vector and assigned to the category with the most matching words.
2. It then retrieves and displays news articles to the user based on their profile preferences. The system presents a personalized newspaper containing only articles from the categories the user is interested in. This allows users to quickly access relevant news without searching through various categories.
The document evaluates the system using the 20 Newsgroups dataset.
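A small sketch of the Vector Space Model step as summarized above: each category is represented by a term vector built from its training text, and an article is assigned to the category with the most matching words. The training snippets and vectorizer choice are illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Toy category texts standing in for the training articles per category.
train = {"sports": "match wicket innings cricket score",
         "business": "market shares profit stock revenue"}

vec = CountVectorizer()
cat_matrix = (vec.fit_transform(list(train.values())).toarray() > 0).astype(int)
cats = list(train)

def classify(article: str) -> str:
    present = (vec.transform([article]).toarray()[0] > 0).astype(int)
    matches = cat_matrix @ present  # matching-word count per category
    return cats[int(np.argmax(matches))]

print(classify("stock market closed with record profit"))
```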
Abstract— Relationships exist between objects in Wikipedia, whose pages can be regarded as separate objects; the emphasis here is on determining relationships between pairs of such objects. Two classes of relationship exist in Wikipedia: an explicit relationship is represented by a single link between the two objects' pages, while an implicit relationship is represented by a link structure containing the two pages. Some previously proposed techniques for determining relationships are cohesion-based; these underestimate objects with high degree values, even though such objects can be significant in constituting relationships in Wikipedia. Other techniques are inadequate for determining implicit relationships because they use only one or two of three important factors: distance, connectivity, and co-citation.
IRJET - Political Orientation Prediction using Social Media Activity - IRJET Journal
This document discusses research into predicting the political orientation of Twitter users based on their social media activity. The researchers aim to incorporate factors like tweets, retweets, followers, followees, and network connections to better understand how political views are expressed and shaped on Twitter. Prior studies that have analyzed political bias in media outlets and the spread of information across partisan networks on social media are reviewed. The researchers describe collecting and analyzing Twitter data including tweets, retweets, mentions, followers and who a user follows to predict individual users' political leanings.
This document summarizes a research paper that proposes using a logistic regression classifier trained with stochastic gradient descent to predict Twitter users' personalities from their tweets. It begins with an abstract of the paper and an introduction to personality prediction from social media. It then details the anatomy of the research, defining personality prediction from Twitter, its applications, and the general process of using machine learning for the task. Next, it reviews several previous studies on personality prediction from Twitter and social networks, noting their approaches, findings, and limitations. It identifies remaining research gaps, such as the need for improved linguistic analysis of tweets and more robust, scalable predictive models. Finally, it proposes a logistic regression classifier as the personality prediction model to address these gaps.
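A compact sketch of that model family, using scikit-learn's SGDClassifier with logistic loss over TF-IDF features; the toy tweets and trait labels are placeholders, and the paper's actual feature engineering is not reproduced.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Toy stand-ins for labeled tweets; real training data would be much larger.
tweets = ["love meeting new people at events!",
          "prefer quiet nights with a good book"]
labels = ["extravert", "introvert"]

# loss="log_loss" makes SGDClassifier a logistic regression fitted by SGD
# (the loss was named "log" in older scikit-learn releases).
model = make_pipeline(TfidfVectorizer(),
                      SGDClassifier(loss="log_loss", max_iter=1000))
model.fit(tweets, labels)
print(model.predict(["big party this weekend, everyone is invited"]))
```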
SOCIAL MEDIA ANALYSIS ON SUPPLY CHAIN MANAGEMENT IN FOOD INDUSTRY - Kaustubh Nale
This paper highlights the importance of social media analysis for supply chain management in the food industry. In this analysis, the social media platform Twitter is used to obtain information, and two software tools (NodeXL and NVivo) are used to conduct data mining and text analysis. The outcome of this analysis will help researchers make decisions based on customer feedback.
IRJET- Sentiment Analysis using Machine Learning - IRJET Journal
This document discusses sentiment analysis of social media posts using machine learning. The authors aim to classify social media posts as having either a political or non-political sentiment. A dictionary of keywords and their sentiments is created to analyze posts. Users can make posts that are then classified, and an admin can hide or delete posts containing harmful keywords. Graphs are generated to analyze the classification of posts as political versus non-political and by sentiment. The accuracy of classification depends on the training data and dictionary.
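A minimal sketch of the dictionary-driven classification described above; the keyword lists and scoring rule are illustrative placeholders, not the authors' dictionary.

```python
# Toy keyword dictionaries; real systems would load a curated dictionary.
POLITICAL = {"election", "party", "government", "vote"}
SENTIMENT = {"good": 1, "great": 1, "bad": -1, "corrupt": -1}

def classify_post(text: str):
    words = text.lower().split()
    topic = "political" if POLITICAL & set(words) else "non-political"
    polarity = sum(SENTIMENT.get(w, 0) for w in words)
    mood = ("positive" if polarity > 0
            else "negative" if polarity < 0 else "neutral")
    return topic, mood

print(classify_post("the government did a great job with the vote"))
```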
Social media-based systems: an emerging area of information systems research ... - Nurhazman Abdul Aziz
This article presents a review of social media-based systems, an emerging area of information systems research, design, and practice shaped by the social media phenomenon. A social media-based system (SMS) is the application of a wide range of social software and social media in organizational and non-organizational contexts to facilitate everyday interactions. To characterize SMS, a total of 274 articles (published during 2003–2011) were analyzed; these were classified as computer science/information systems related in the Web of Science database and had at least one social-media-related keyword: social media; social network analysis; social network; social network site; or social network system. Four main research streams were found in SMS research, dealing with: (1) the organizational aspect of SMS, (2) the non-organizational aspect of SMS, (3) the technical aspect of SMS, and (4) social media as a tool. The results indicate that SMS research is fragmented and has not yet found its way into the core IS journals, although it is diverse and interdisciplinary in nature. The authors also propose that, unlike conventional and socio-technical IS, where information is bureaucratic, formal, bounded within the intranet, and tightly controlled by organizations, in the SMS context information is social, informal, and boundary-less (i.e. the boundary is the internet), with less control, and more sharing of information may lead to higher value and impact.
IJRET: International Journal of Research in Engineering and Technology - Improv... - eSAT Publishing House
This document summarizes techniques for improving web search results through web personalization. It discusses how web usage mining can be used to optimize information by monitoring user interaction histories and profiles. The proposed system aims to reduce manual user feedback by implicitly gathering preferences from behaviors like click-through rates and dwell times. It introduces an algorithm that calculates new ranking values for websites based on keyword matches and time spent on pages, and swaps ranks accordingly. This system provides personalized search results by continuously updating rankings based on implicit user interactions.
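A toy sketch of that implicit re-ranking idea: each site's ranking value is recomputed from keyword matches and dwell time, and results are re-ordered (ranks swapped) accordingly. The weighting scheme is an assumption, not the paper's formula.

```python
# Recompute a page's rank value from keyword matches and dwell time, then
# re-order results. The weights w_kw and w_dwell are illustrative assumptions.
def new_rank_value(keyword_matches: int, dwell_seconds: float,
                   w_kw: float = 1.0, w_dwell: float = 0.1) -> float:
    return w_kw * keyword_matches + w_dwell * dwell_seconds

def rerank(pages):
    """pages: list of (url, keyword_matches, dwell_seconds); best first."""
    return sorted(pages, key=lambda p: new_rank_value(p[1], p[2]),
                  reverse=True)

print(rerank([("a.com", 3, 5.0), ("b.com", 1, 120.0)]))
```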
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL - ijcsa
Search engines today retrieve more than a few thousand web pages for a single query, most of which are irrelevant. Listing results according to user needs is, therefore, a very real necessity. The challenge lies in ordering retrieved pages and presenting them to users in line with their interests. Search engines therefore utilize page rank algorithms to analyze and re-rank search results according to the relevance of the user's query by estimating (over the web) the importance of a web page. The proposed work investigates web page ranking methods and recently developed improvements in web page ranking. Further, a new content-based web page rank technique is also proposed for implementation. The proposed technique determines how important a particular web page is by evaluating the data a user has clicked on, as well as the contents available on these web pages. The results demonstrate the effectiveness and efficiency of the proposed page ranking technique.
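A rough sketch of the combined signal this technique relies on, mixing content match against the query with click evidence; the scoring formula and the alpha weight are illustrative assumptions rather than the paper's definition.

```python
# Mix a content-match score with a click-share score. alpha is an assumption.
def page_importance(query_terms, page_text, clicks, total_clicks, alpha=0.5):
    words = page_text.lower().split()
    content = sum(words.count(t) for t in query_terms) / max(len(words), 1)
    click_share = clicks / max(total_clicks, 1)
    return alpha * content + (1 - alpha) * click_share

print(page_importance(["rank", "web"],
                      "web page rank methods for web search",
                      clicks=42, total_clicks=100))
```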
Classification of instagram fake users using supervised machine learning algo... - IJECEIAES
On Instagram, the number of followers is a common success indicator, so follower-selling services have become a huge part of the market. Influencers are bombarded with fake followers, which causes business owners to pay more than they should for a brand endorsement, so identifying fake followers becomes important for determining the authenticity of an influencer. This research aims to identify fake users' behavior and proposes supervised machine learning models to classify authentic and fake users. The dataset contains fake users bought from various sources alongside authentic users. Seventeen features are used, drawn from these sources: 6 metadata, 3 media info, 2 engagement, 2 media tags, and 4 media similarity. Five machine learning algorithms are tested, and three classification approaches are proposed: classification into 2 classes, classification into 4 classes, and classification with metadata. The random forest algorithm produces the highest accuracy for both the 2-class (authentic, fake) and 4-class (authentic, active fake user, inactive fake user, spammer) classifications, with accuracy up to 91.76%. The results also show that the five metadata variables, i.e. number of posts, followers, biography length, following, and link availability, are the biggest predictors of user class. Additionally, descriptive statistics reveal noticeable differences between fake and authentic users.
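A minimal sketch of the 2-class random forest setup using the five metadata predictors named above; the feature rows are toy values for illustration, not the paper's dataset.

```python
from sklearn.ensemble import RandomForestClassifier

# Feature order: [posts, followers, biography length, following, link availability]
X = [[500, 10000, 120,  300, 1],   # authentic-looking profile (toy values)
     [  2,    50,   0, 4000, 0]]   # fake-looking profile (toy values)
y = ["authentic", "fake"]

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([[3, 80, 5, 3500, 0]]))  # expected to look "fake"
```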
E-PROCUREMENT ADOPTION AND PROCUREMENT PROCESS IN RWANDA: A CASE OF MUHIMA, SH... - AkashSharma618775
This paper assessed e-procurement adoption and the procurement process in District hospitals in Rwanda. It was guided by three specific objectives: to establish the extent of ICT applications in the District hospitals' procurement processes; to determine the challenges of e-procurement adoption for the procurement processes in District hospitals in Rwanda; and to determine the relationship between individual factors and the procurement process in District hospitals in Rwanda. The target population of the study comprised 54 employees of District hospitals in Rwanda. The study used a purposive sampling technique, a type of non-probability sampling in which the researcher uses his or her own judgment about which respondents to choose, picking only those who meet the purpose of the study. Primary data were collected through structured questionnaires, and secondary data were collected from the relevant literature, business magazines, journals, the internet, and other relevant materials. The study revealed that for 87.5% of respondents there was no relationship between procurement officers and government suppliers, and that 75% of respondents agreed to have received products of good quality at lower costs. The study found that the majority of respondents acknowledged that ICT applications such as e-procurement in use in the District hospitals had influenced the procurement processes. Regarding the relationship between individual factors and the procurement process in District hospitals in Rwanda, a positive relationship was revealed between human resource competency in e-procurement and the performance of contractors, since r = .220 with a p-value of 0.001, which is less than the 0.01 significance level. The major challenge found in the study was the high introduction cost of new solutions, such as access to bandwidth and enterprise resource management systems, that are key to implementing e-business. Organizations willing to implement e-procurement should therefore invest in the structures and processes necessary for e-procurement implementation. Finally, the research found that for ICT to be easily implemented in procurement processes, information systems have to be set up by all players in the procurement procedures, structures will have to be invested in, and processes standardized.
Abstract: Privacy is one of the friction points that emerge when communications are mediated in Online Social Networks (OSNs). Different communities of computer science researchers have framed the 'OSN privacy problem' as one of surveillance, institutional privacy, or social privacy. This article first provides an introduction to the surveillance and social privacy perspectives, emphasizing the narratives that inform them as well as their assumptions and goals. The paper mainly addresses visitor events (the population of visitors) on a user's account and updates the account holder's log information. The evolutionary aspects of surveillance are thus reflected in the user's log, which calls for the implementation of a Genetic Algorithm and requires a bridge module mediating every interaction between the user and the social network server. The paper implements mutation through the Genetic Algorithm by differentiating users into Guests and Friends, and identifies crossover issues when a guest clicks a friend of a friend.
CAS and SDI are types of current awareness services that aim to keep users informed of new developments in their fields. CAS disseminates information to all users on a topic, while SDI provides personalized, targeted information to individuals based on their specific interests. SDI involves creating user profiles that are matched to document profiles to select only the most relevant new information for each user. Both services rely on scanning current literature sources, but SDI uses computers to automate the selection and notification process, providing a more precise service than general CAS updates. The goal of both is to save users' time by bringing new relevant information to their attention in a timely manner.
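A minimal sketch of SDI's profile-to-document matching as described above, assuming both user profiles and document profiles reduce to keyword sets and a fixed overlap threshold triggers a notification; real SDI services use richer profiles.

```python
# Match document keywords against user interest profiles; the overlap
# threshold min_overlap is an illustrative assumption.
def sdi_matches(user_profiles, doc_keywords, min_overlap=2):
    """Return the users who should be notified about this document."""
    return [user for user, interests in user_profiles.items()
            if len(interests & doc_keywords) >= min_overlap]

profiles = {"alice": {"machine", "learning", "nlp"},
            "bob": {"procurement", "logistics"}}
print(sdi_matches(profiles, {"nlp", "learning", "corpora"}))  # -> ['alice']
```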
This document discusses methods for collecting reliable data on user behavior from online social networks. It reviews previous research that has collected social network data through OSN APIs/crawling, surveys/questionnaires, and custom application deployment. The authors argue that these methods can produce biased data as they often only collect publicly shared content or rely on self-reported information. The authors propose combining existing methods, specifically the Experience Sampling Method, to collect more reliable data by capturing both shared and non-shared user behaviors and contexts through in situ experiments.
Current awareness service: a contemporary issue in digital era - Anil Mishra
Current Awareness Services (CAS) provide important information to keep professionals informed in their fields. Traditionally, CAS involved selecting and disseminating newly available documents. With digital technologies, CAS delivery has shifted to be more personalized and timely. Effective CAS know the topics, users, information sources, and deliver the right information to the right user in the right format in a reliable and cost-effective manner. Common forms of CAS discussed include current awareness lists, selective dissemination of information, press clippings, research in progress announcements, and electronic methods like newsletters, blogs, RSS feeds, and mobile alerts.
This document discusses information behaviors in the U.S. intelligence community. It notes that intelligence analysts experience information overload due to the vast amounts of data they must process each day from numerous sources. This overload can compromise their efficiency and ability to identify threats in a timely manner. The document also examines issues between different levels of government in the intelligence community, such as a lack of consistent training and information sharing between federal, state, and local agencies. It proposes applying theories of information behavior from library and information science, such as minimizing effort, to help analysts better manage information overload.
A Literature Survey on Ranking Tagged Web Documents in Social Bookmarking Sys... - IOSR Journals
This document provides a literature survey of approaches for ranking tagged web documents in social bookmarking systems. It begins with background on social information systems, folksonomy, and social bookmarking systems. The main body of the document reviews six categories of approaches for ranking documents: personomy based techniques, frequency and similarity based techniques, structure-based techniques, semantics-based techniques, cluster based techniques, and probability based techniques. Each category discusses several specific approaches that have been proposed, outlining the techniques and algorithms used. The document concludes by stating that while many techniques have been proposed, open questions still remain around improving search effectiveness in social bookmarking systems.
Seminar on current awareness service and selective dissemination - DILEEP_DS
SDI is a type of current awareness service meant to keep users abreast with the latest developments in their fields of interest. It is a personalized service that provides pinpointed information to individual users or groups based on their predefined information needs. SDI involves reviewing documents, selecting relevant information, and notifying users of matching items so they can stay well-informed on topics important to their work.
Analysis on Recommended System for Web Information Retrieval Using HMM - IJERA Editor
The web is a rich domain of data and knowledge, spread over the world in an unstructured manner, and the number of users accessing information over the internet grows continuously. Web mining is an application of data mining in which web-related data are extracted and manipulated to extract knowledge; it is divided into three major domains: web usage mining, web content mining, and web structure mining. The proposed work is concerned with web usage mining: the aim is to improve user feedback and user navigation pattern discovery for a CRM system. Finally, an HMM-based algorithm is used for finding patterns in the data, a method which promises to provide much more accurate recommendations.
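The paper's pattern model is an HMM; as a simplified, fully observable stand-in, the sketch below estimates first-order transition probabilities between visited pages, which is the basic quantity such a navigation model learns.

```python
from collections import Counter, defaultdict

# Estimate P(next page | current page) from page-visit sessions. This is a
# plain Markov chain, a simplification of the HMM the paper actually uses.
def transition_probs(sessions):
    counts = defaultdict(Counter)
    for pages in sessions:
        for a, b in zip(pages, pages[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

sessions = [["home", "products", "cart"], ["home", "products", "reviews"]]
print(transition_probs(sessions)["products"])  # {'cart': 0.5, 'reviews': 0.5}
```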
IRJET- Identification of Prevalent News from Twitter and Traditional Media us... - IRJET Journal
This document describes a study that uses community detection models to identify prevalent news topics discussed on both Twitter and traditional media like BBC. It collects tweets and news articles about sports over a one-month period. Keywords are extracted from the data and a graph is constructed to represent relationships between words. Three community detection models - Girvan-Newman clustering, CLIQUE, and Louvain - are used to cluster similar content and detect communities of keywords representing news topics. The number of unique Twitter users engaged with each topic is also calculated to rank topics by user attention. The goal is to analyze how information is distributed between social and traditional media and identify emerging topics with low coverage in traditional sources.
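For the Girvan-Newman step specifically, networkx provides a ready implementation; the keyword co-occurrence edges below are illustrative stand-ins for the graph built from tweets and articles.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Toy keyword co-occurrence graph standing in for the one built from data.
G = nx.Graph([("goal", "match"), ("match", "league"), ("goal", "league"),
              ("injury", "transfer"), ("transfer", "club"), ("league", "club")])

# First split: communities after iteratively removing the most central edges.
first_split = next(girvan_newman(G))
print([sorted(c) for c in first_split])
```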
Pestle based event detection and classification - eSAT Journals
Abstract: Organizations use PESTLE classification as a tool for tracking the environment in which they function and for planning the launch of a new product or service. It helps give a true view of the environment from different aspects. These aspects are essential for any business the organization may be in, as they give a clear picture of what one wishes to check and observe while contemplating a certain idea or plan. The PESTLE framework helps in understanding market dynamics and is one of the pillars of the strategic management of an enterprise, driving its goals and strategy. The PESTLE-based event detection approach proposed in this paper supports PESTLE analysis for any organization: it puts all relevant factors, in terms of detected events, in one place and classifies them into separate buckets while taking the current market situation into consideration. This is accomplished by applying a clustering technique and later training a classifier to classify the events in PESTLE format. Keywords: Event Detection, PESTLE Analysis, Twitter
IRJET- Big Data Driven Information Diffusion Analytics and Control on Social ... - IRJET Journal
This document discusses controlling the spread of fake or misleading information on social media. It proposes a system to analyze information diffusion on social networks, identify diffused data, and control the spread of fake diffused data. The system would extract data from social media, perform sentiment analysis to determine the veracity of information, and discard fake or untrustworthy information from the database to prevent further propagation. A variety of machine learning techniques could be used for the sentiment analysis, including naive Bayes classification, linear regression, and gradient boosted trees. The goal is to curb the spread of misinformation while still allowing the diffusion of real or truthful information.
Social Media Influence Analysis using Data Science Techniques - Muhammad Bilal
The major purpose of this literature search report is to demonstrate the usage of different data science tactics to investigate the impact of social media, considering the interaction between influencers and their followers.
IRJET- An Improved Machine Learning for Twitter Breaking News Extraction ... - IRJET Journal
This document discusses an improved machine learning approach for extracting breaking news from Twitter based on trending topics. The approach aims to filter tweets to remove irrelevant information, cluster similar tweets that relate to real-world events to identify breaking news stories, and dynamically rank the identified news stories over time for tracking. The approach is evaluated using different supervised text classification algorithms to classify tweets as news or not and a density-based clustering algorithm to group related tweets.
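A compact sketch of the clustering stage using DBSCAN over TF-IDF vectors; the paper's density-based algorithm may differ in details, and eps and min_samples here are illustrative settings.

```python
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer

# Group near-duplicate event tweets by density-based clustering.
tweets = ["earthquake hits the coast", "strong earthquake near the coast",
          "new phone released today", "flagship phone launch today"]

X = TfidfVectorizer().fit_transform(tweets)
labels = DBSCAN(eps=0.9, metric="cosine", min_samples=2).fit_predict(X)
print(labels)  # tweets sharing a label form one candidate news story
```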
IRJET- Event Detection and Text Summary by Disaster Warning - IRJET Journal
The document proposes two models: 1) The Hot Event Evolution (HEE) model which uses short text data and user interests to detect and track the evolution of hot events on social media. 2) The IncreSTS (Incremental Short Text Summarization) algorithm which can incrementally cluster and summarize comment streams on social networks. The HEE model improves on existing event detection methods by considering how user interests change during event evolution. The IncreSTS algorithm aims to help users understand comment streams in real-time without reading all comments. Both models were found to achieve high efficiency, accuracy and scalability.
In this contribution, we develop an accurate and effective event detection method to detect events from a Twitter stream, using visual and textual information to improve the performance of the mining process. The method monitors a Twitter stream to pick up tweets that have both text and images and stores them in a database, then applies a mining algorithm to detect an event. The procedure starts by detecting events based on text only, using bag-of-words features calculated with the term frequency-inverse document frequency (TF-IDF) method. It then detects the event based on the image only, using visual features including histogram of oriented gradients (HOG) descriptors, the grey-level co-occurrence matrix (GLCM), and a color histogram; k-nearest neighbours (kNN) classification is used in the detection. The final decision of the event detection is made based on the reliabilities of the text-only detection and the image-only detection. The experimental results showed that the proposed method achieved a high accuracy of 0.94, compared with 0.89 using text only and 0.86 using images only.
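The fusion rule itself is not spelled out above, so the sketch below shows one plausible reading: weight each detector's confidence by its measured reliability and threshold the average. The weighted-average form and the 0.5 threshold are assumptions; the reliability defaults reuse the per-modality accuracies reported above.

```python
# Reliability-weighted late fusion of the two detectors' confidences.
# Fusion rule and threshold are assumptions, not the paper's exact method.
def fuse(p_text: float, p_image: float,
         rel_text: float = 0.89, rel_image: float = 0.86) -> bool:
    score = (rel_text * p_text + rel_image * p_image) / (rel_text + rel_image)
    return score >= 0.5

print(fuse(p_text=0.7, p_image=0.4))  # detector confidences for one candidate
```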
This document provides a review of techniques, tools, and platforms for analyzing social media data. It discusses the types of social media data and formats available, as well as tools for accessing, cleaning, analyzing, and visualizing social media data. Some key challenges of social media research are the restricted access to comprehensive data sources, lack of tools for in-depth analysis without programming, and need for large data storage and computing facilities to support research at scale. The document provides a methodology and critique of current approaches and outlines requirements to better support social media research.
A Review: Text Classification on Social Media Data - IOSR Journals
This document provides a review of different classifiers used for text classification on social media data. It discusses how social media data is often unstructured and contains users' opinions and sentiments. Various machine learning algorithms can be used to classify this social media text data, extracting meaningful information. The document focuses on describing Naive Bayes classifiers, which are commonly used for text classification tasks. It explains how Naive Bayes classifiers work by calculating the posterior probability that a document belongs to a certain class, based on applying Bayes' theorem with an independence assumption between features.
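A minimal working example of the Naive Bayes text classifier described above, using scikit-learn's MultinomialNB over word counts (posterior probabilities via Bayes' theorem with the independence assumption); the posts and labels are toy stand-ins.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labeled posts standing in for a real social media training set.
posts = ["this product is amazing", "terrible service, very disappointed",
         "really happy with the quality", "worst purchase ever"]
labels = ["positive", "negative", "positive", "negative"]

model = make_pipeline(CountVectorizer(), MultinomialNB()).fit(posts, labels)
print(model.predict(["amazing quality"]))
```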
Detection of Fake News Using Machine Learning - IRJET Journal
This document summarizes a literature review on detecting fake news using machine learning. It discusses how machine learning classifiers can be trained to automatically identify fake news. Specifically, it addresses three research questions: 1) Why machine learning is needed to detect fake news, 2) Which machine learning classifiers can be used, and 3) How the classifiers are trained. It finds that common classifiers like support vector machines, naive Bayes, logistic regression, random forests, and recurrent neural networks have been effectively used to detect fake news by analyzing content. Accuracy of the classifiers depends on how well they are trained on labeled datasets.
The document describes a proposed system for authenticating and summarizing news articles. The system prioritizes authenticating the news over summarization. If the news is determined to be at least 55% authentic based on analysis of keywords and content from authorized news sites, then it will generate a summary by breaking the news down into shorter, meaningful snippets. The system uses machine learning and artificial intelligence algorithms like Syntaxnet to extract phrases and analyze the news. It is implemented as a user-friendly web application using technologies like Python, Django, HTML, CSS and Bootstrap.
Terrorism Analysis through Social Media using Data MiningIRJET Journal
This document presents a study that uses deep learning models like Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN) to analyze terrorism through detecting toxicity in social media text data. The study aims to classify text data into categories like toxicity, severe toxicity, obscenity, threat, insult or identity hate. It provides an overview of DNN and CNN models for text classification and compares their methodology, architecture and performance. The models are trained on preprocessed social media data related to terrorist activities and aim to accurately predict the toxicity level and classify tweets for concerned authorities to make informed decisions.
Fake News Detection Using Machine LearningIRJET Journal
This document proposes a machine learning approach for detecting fake news using support vector machines. It discusses preprocessing news data using techniques like TF-IDF, extracting features related to text, date, source and author, and training a support vector machine classifier on the preprocessed data. The proposed system architecture involves preprocessing, training a model on the training data, validating it on test data, adjusting parameters to improve accuracy, and then using the model to classify new unlabeled news. Prior research that used techniques like n-gram analysis, naive Bayes classifiers and linear support vector machines for fake news detection are also reviewed. The conclusion is that the proposed approach using support vector machines can help identify fake news effectively.
A Literature Survey on Recommendation Systems for Scientific Articles.pdfAmber Ford
This document summarizes a literature survey on recommendation systems for scientific articles. It begins by outlining problems faced by researchers, including information overload from searching large amounts of non-structured data. It then reviews different types of recommender systems, including content-based, collaborative, knowledge-based, semantic-based, and hybrid approaches. The objective of the survey is to develop a framework for a semantic-based recommender system that integrates ontologies to help researchers more efficiently find relevant scientific articles.
INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY...ijcsit
Social networking sites are a significant source of information for understanding the behavior of users and what occupies society across all ages, so that helpful information can be provided to specialists and decision-makers. According to official sources, 98.43% of Saudi youth use social networking sites. The study and analysis of social media data are done to provide the necessary information to increase investment opportunities within the Kingdom of Saudi Arabia, by studying and analyzing what people discuss on social sites through their tweets about the labor market and investment. Given the huge volume of data and its randomness, the data is surveyed and collected through keywords, prioritized, and recorded as positive, negative or mixed. The study's analysis and conclusions are based on data mining and its techniques of analysis and deduction.
Quantum Criticism: an Analysis of Political News Reportingmlaij
In this project, we continuously collect data from the RSS feeds of traditional news sources. We apply
several pre-trained implementations of named entity recognition (NER) tools, quantifying the success of
each implementation. We also perform sentiment analysis of each news article at the document, paragraph
and sentence level, with the goal of creating a corpus of tagged news articles that is made available to the
public through a web interface. We show how the data in this corpus could be used to identify bias in news
reporting, and also establish different quantifiable publishing patterns of left-leaning and right-leaning
news organisations.
IRJET - Election Result Prediction using Sentiment AnalysisIRJET Journal
This document proposes a method to predict election results using sentiment analysis of social media data. It involves collecting data from Twitter, Facebook, and Instagram using their APIs. The data will then be preprocessed by removing special characters and URLs. Popular machine learning algorithms like Naive Bayes and SVM will be trained on the preprocessed data to classify tweets as positive, negative, or neutral sentiment toward political parties. The classified tweets will then be analyzed to predict the outcome of elections.
INTELLIGENT AGENT FOR PUBLICATION AND SUBSCRIPTION PATTERN ANALYSIS OF NEWS WEBSITES
International Journal of Computer-Aided Technologies (IJCAx) Vol. 2, No. 1, January 2015
INTELLIGENT AGENT FOR PUBLICATION AND
SUBSCRIPTION PATTERN ANALYSIS OF NEWS
WEBSITES
W.D.R. Wijedasa and Chathura De Silva
Department of Computer Science and Engineering, Faculty of Engineering, University of Moratuwa, Sri Lanka
ABSTRACT
The rapid growth of the Internet has revolutionized online news reporting, and many users now turn to online news websites for news information. In Sri Lanka, there are numerous news websites, which are subscribed to on a daily basis. With the rise in the number of news websites, the Sri Lankan media authorities lack a proper methodology or tool capable of tracking and regulating the publications made by different disseminators of news.
This paper proposes a News Agent toolbox which periodically extracts news articles and associated comments with the aid of a concept called Mapping Rules, and classifies them into Personalized Categories defined in terms of keyword-based Category Profiles. The proposed tool also analyzes comments made by readers, using simple statistical techniques, to discover the most popular news articles and fluctuations in the popularity of news stories.
KEYWORDS
News Articles; Mapping Rules; Personalized Classification; Category Profiles
1. INTRODUCTION
The rapid growth of the Internet and the increased confidence in digitized information have revolutionized online news reporting. These developments provide efficient channels, such as news websites and news blogs, for disseminating news information much more quickly than ever before. The main objective of online news reporting is to provide up-to-date news as well as to increase news consumption and usage.
In Sri Lanka there is a large number of news websites and news blogs in all three languages: Sinhala, English and Tamil. These websites and blogs publish news related to local issues, politics, events, celebrations, people, business, weather and entertainment, and are subscribed to on a daily basis for up-to-date information. With the growth in the number of news websites and blogs, the Sri Lankan media authorities lack a proper methodology or tool capable of the following: identifying frequently published news topics, finding topics related to a given set of keywords, identifying topics whose popularity increases over time, and identifying the news subscription patterns of users, in terms of spotting regular users, the topics they subscribe to most, and whether they actively express their ideas via comments.
The lack of a proper methodology or tool with the previously mentioned capabilities has caused the responsible media parties to face several issues. Among these, the difficulty of tracking and regulating publications made by different disseminators of news takes a major place.
This major issue has led to several sub-issues, such as difficulty in identifying incidents which require more attention and difficulty in maintaining the fairness, accuracy and balance of coverage within each publication. Moreover, the difficulty of regulating publications made by different parties in a way that meets the demands of readers can be considered another major issue.
A media system directly influences a country's society in various ways and plays an important role in its economy. It has been argued that a person's views are influenced more by the media system within the country than by his or her personal experiences. Public opinion and awareness are likewise affected by the means by which news is reported [1].
Within the Sri Lankan media system, online news reporting plays a role equivalent to that of traditional print media. This significant role highlights that online news websites and blogs need to balance their reporting while providing equal coverage of each and every view. Fairness and accuracy of the covered topics are equally important qualities. Such qualities can be maintained by solving the previously mentioned issues faced by the Sri Lankan media authorities.
This paper proposes an Intelligent Agent for the publication and subscription pattern analysis of news websites and associated reader comments, using Personalized Classification [2] and simple statistical techniques.
2. PREVIOUS WORK
Media analysis has been a vast research domain in the social sciences for several decades. Traditional media analysis has been based on a process known as "coding", which involves the manual analysis and annotation of small sets of news articles [1][3].
As a result of the exponential growth of the World Wide Web and the Internet, online news reporting has been modernized with highly efficient ways of reporting news. Users and media authorities therefore now experience overwhelming quantities of readily available news content, growing day by day. This has made it very difficult for the responsible media parties to analyze news content via coding.
A variety of research efforts have been carried out to provide an opportunity to monitor a vast number of news outlets, constantly and in an automated way [3][4][5][6]. The required automation of analysis has been realized by utilizing Artificial Intelligence (AI) techniques from the fields of Natural Language Processing (NLP), text mining, text analytics and Machine Learning (ML) [3][4].
Some existing automated analysis systems are News Outlets Analysis and Monitoring (NOAM) [3], Lynda [4], the News Analysis System (NAS) [5], NewsBlaster [6] and Google News. NOAM [3] is a data management system which gathers and monitors multi-lingual news content from 21 countries. Lynda [4] is a multi-purpose system focused mainly on the US media system; it was developed to detect spatial and temporal distributions of Named Entities. NAS [5] is a prototype news analysis system which classifies and indexes news stories in real time, while NewsBlaster [6] is a robust news tracking and summarization system from Columbia University.
The extraction of news articles from various semi-structured and heterogeneous online sources is a major activity and a huge challenge for the above-mentioned automated systems. Various research efforts have taken place to come up with algorithms and approaches for extracting news articles from online sources.
Among such approaches, the work in [7] proposed a system called HTML2RSS, which extracts content from HTML web pages based on the Document Object Model (DOM) structure in order to generate RSS files. The authors introduced a concept called Mapping Rules [7] for extraction purposes. A Mapping Rule is defined as an XML file containing XPath information, which specifies from which node of the DOM tree the content should be extracted. These Mapping Rules are generated with manual assistance: a GUI is provided for users to mark HTML elements such as text and links, and the XPath information of the selected elements is then extracted automatically by the system.
Most of the automated systems, such as NOAM [3], NAS [5], NewsBlaster [6] and Google News, utilize clustering or classification for analyzing the extracted news articles. Grouping news articles into stories describing a similar event is a major activity in such systems, and various research efforts have taken place to identify the benefits of existing algorithms and to propose new grouping approaches. Among such efforts, a novel classification approach named Personalized Classification was presented in [2][8] for classifying news documents. Personalized Classification enables users to define their own categories of interest on the fly, with the aim of automating the assignment of documents to those categories. The authors used a set of keywords to define each Personalized Category and utilized Support Vector Machines (SVM) to perform the classification. The training documents for the Personalized Categories were obtained from a pool of training documents using a search engine and the set of keywords defining each category. A classifier was then created and trained per Personalized Category for grouping the articles into stories.
3. METHODOLOGY
This section describes the approach taken in this research to address the issues identified in Section 1, in the form of a News Agent Toolbox. The proposed tool was fundamentally developed to periodically analyze patterns and visualize the results for surveillance purposes.
The news content analysis process of the tool follows a three-step methodology, namely news content acquisition, pre-processing and analysis, as in existing systems such as NOAM. A probe was developed for the periodic acquisition of news content, associated reader comments and news article sharing information from social networks, for a set of pre-defined Sri Lankan news websites. The pre-processing step involves tokenizing, stop-word removal, stemming and representing the news content in a suitable form. The content analysis step involves major activities such as classifying news articles into Personalized Categories [2][8] and analyzing the reader comments and sharing information associated with the articles. The high-level architecture of the proposed tool is depicted in Figure 1.
Figure 1. High level architecture of the proposed system
The tool has been designed using a modular architecture in which each module is specialized for a particular task. Following the three-step methodology, the tool comprises four major modules, namely the Content Acquisitioner, Pre-processor, News Analyzer and Comments Analyzer. Moreover, a user-friendly GUI and a visualization tool have been attached to the system to support user interaction and the visualization of results.
3.1. Content Acquisitioner
The Content Acquisitioner module was developed to extract news articles, associated user comments and sharing information on social networks from a set of predefined English news websites and news blogs.
Several challenges were encountered while developing the extraction functionality of the Content Acquisitioner module. The non-uniform structures of news websites and variations in the organization of news articles take a major place among these challenges: each news website has its own structure and its own way of organizing articles, which makes it difficult to devise an extraction methodology that suits all news websites. The existence of news websites which do not follow specifications, and of websites with malformed HTML content, is also challenging. Moreover, many news websites place additional content such as videos, advertisements and navigational elements in between the news content, making the extraction of the full news text a complicated task.
In order to minimize the impact of the above challenges and to arrive at an extraction methodology which suits many sites with different arrangements, Mapping Rules, as proposed in [7], were used for each news website to extract the navigational links that point to the full stories of news articles.
3.1.1. News Content Extraction
A four-step methodology was used to extract news articles from a set of user-provided news websites.
Step 1: XPath-based Mapping Rule generation for each website
As the first step of the extraction process, Mapping Rules [7] are generated per news website with manual assistance. The desired news website is loaded via the tool, and the user is provided with functionality to select sample links pointing to news articles within the web pages from which articles are to be extracted. These selected sample links are used to distinguish links pointing to news articles from links pointing to additional content such as videos and advertisements. Subsequently, the XPath of the user-selected sample links is determined by the tool, and the identified XPath is processed so that it can aid the extraction of all news links appearing in similar paths within a given web page.
Sample XPath expression identified for a selected news link:
/html/body/div[3]/div[10]/div[2]/div[4]/div[2]/h5/a
The same expression processed to aid the extraction of news links appearing in similar paths:
/html/body/div/div/div/div/div/h5/a
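The index-stripping step can be illustrated with a short sketch. This is only an illustration of the idea, not the paper's implementation (the tool itself is built on .NET components such as HtmlAgilityPack); the function name generalize_xpath is hypothetical.

import re

def generalize_xpath(xpath: str) -> str:
    # Drop positional predicates such as [3] so the expression
    # matches every node appearing in a structurally similar path.
    return re.sub(r"\[\d+\]", "", xpath)

# The sample expression from the text:
specific = "/html/body/div[3]/div[10]/div[2]/div[4]/div[2]/h5/a"
print(generalize_xpath(specific))
# -> /html/body/div/div/div/div/div/h5/a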
Step 2: Creation of DOM trees for web pages containing news article links
A DOM tree structure is generated for each web page from which articles are to be extracted, using the HtmlAgilityPack [9] parser.
Step 3: Extraction of links pointing to news articles based on Mapping Rules
Links pointing to news articles which fall under the given XPath-based Mapping Rules are extracted by traversing the DOM trees of the web pages.
Step 4: Extraction of full-text news content from news articles
The NBoilerpipe [10] library, a .NET port of Christian Kohlschütter's boilerpipe, was used to extract the textual news content from the articles pointed to by the links extracted in the previous step.
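Steps 2 to 4 can be sketched as follows. The paper's tool uses the .NET HtmlAgilityPack and NBoilerpipe libraries; this sketch substitutes Python's lxml for DOM construction and XPath matching, so the function name extract_article_links and the surrounding glue are assumptions, not the paper's code.

import urllib.request
from lxml import html  # stands in for HtmlAgilityPack

def extract_article_links(page_url: str, mapping_rule_xpath: str) -> list:
    # Step 2: build a DOM tree for the web page.
    with urllib.request.urlopen(page_url) as resp:
        tree = html.fromstring(resp.read())
    tree.make_links_absolute(page_url)
    # Step 3: collect the target of every link node matched by the
    # generalized Mapping Rule XPath.
    return [a.get("href") for a in tree.xpath(mapping_rule_xpath) if a.get("href")]

# Step 4 would then hand each link to a boilerpipe-style extractor
# (the paper uses NBoilerpipe [10]) to obtain the full article text.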
3.1.2. Reader Comments Extraction
A three-step methodology was used to extract the comment author, the commented date and the comment text associated with each extracted news article.
Step 1: XPath-based Mapping Rule generation
As the first step of the comment extraction process, three separate Mapping Rules are generated for each website, to aid the extraction of the comment authors, the commented dates and the comments.
Step 2: Creation of DOM trees for news articles containing user comments
A DOM tree structure is generated for each news article webpage.
Step 3: Extraction of comments from news articles based on Mapping Rules
The comment authors, their corresponding comments and the commented dates within each article web page are extracted by traversing the DOM trees.
Many news websites allow users to share news articles via social networks such as Facebook, Twitter, LinkedIn and Google Plus. Social sharing allows users to explicitly publish, or to like, the URL of a news article. The statistics of such social sharing on Facebook, Twitter, LinkedIn and Google Plus were extracted using the two-step methodology described below [11].
For each extracted news article, the corresponding social network API call is made to the given endpoint, incorporating the article's URL. In the next step, the JSON response is queried to extract the counts of likes and shares made for the article.
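The two-step share-count retrieval can be sketched as below. The real endpoints and response fields differ for each social network and have changed since the paper was written, so SHARE_COUNT_ENDPOINT and the "share_count" field here are purely hypothetical placeholders.

import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; each social network exposes its own API.
SHARE_COUNT_ENDPOINT = "https://social.example/api/shares?url="

def share_count(article_url: str) -> int:
    # Step 1: call the network's endpoint, incorporating the article URL.
    query = SHARE_COUNT_ENDPOINT + urllib.parse.quote(article_url, safe="")
    with urllib.request.urlopen(query) as resp:
        payload = json.load(resp)
    # Step 2: query the JSON response for the counts of shares/likes.
    return int(payload.get("share_count", 0))  # hypothetical field name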
3.2. Pre-processor
The major functions of the Pre-processor module are to perform an initial cleansing of each article, tokenization, the removal of stop-words and the stemming of the news content. The Porter stemming algorithm by Martin Porter [12], which is based on an explicit list of suffixes together with the criteria under which each suffix is removed from a word, was utilized for stemming the news articles.
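As a concrete illustration of this pipeline, the sketch below uses NLTK's tokenizer, English stop-word list and PorterStemmer. The paper does not name its tokenizer or stop-word list, so those two choices are assumptions; only the use of Porter stemming [12] comes from the text.

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)      # tokenizer model
nltk.download("stopwords", quiet=True)  # stop-word lists

STOP_WORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def preprocess(article_text: str) -> list:
    # Cleanse and tokenize, drop stop-words, then stem each token.
    tokens = word_tokenize(article_text.lower())
    return [STEMMER.stem(t) for t in tokens
            if t.isalpha() and t not in STOP_WORDS]

print(preprocess("The ministry announced new milk powder regulations."))
# -> ['ministri', 'announc', 'new', 'milk', 'powder', 'regul']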
3.3. News Analyzer
The main functionality of the News Analyzer component is to classify news articles into a set of predefined news stories defined under general categories such as politics, business and health. The approach adopted for the classification process is the Personalized Classification presented in [2] and [8].
3.3.1. Personalized Classification of News Articles
Personalized Classification is a process which enables users to create their own classification categories on the fly. Compared to a general classification process with a fixed set of pre-defined categories, this methodology is capable of classifying documents under diverse user interests.
The News Analyzer component, which utilizes the Personalized Classification approach, follows a three-step methodology for creating personalized categories. First, it enables an administrative user of the tool to define general categories such as politics, business, health and sports. It then enables the user to define more specific news stories (Personalized Categories) under each general category. Finally, the user defines a set of keywords describing each personalized news category. A Personalized Category defined in terms of keywords is called a keyword-based Category Profile [2][8]. Keyword-based Category Profiles reduce the user effort required, compared to selecting an adequate number of training documents for each Personalized Category.
3.3.2. Keyword-based Category Profiles
In this research, each Personalized Category is defined in terms of a list of user-specified keywords instead of a set of training documents. The Personalized Classification process based on keyword Category Profiles works optimally when the provided keywords have a high discriminatory power, distinguishing news articles of the desired category from articles of the other categories.
In order to give keywords a higher discriminating power, the weighting scheme tabulated in Table 1 was introduced. The proposed system enables users to assign a higher weight to keywords which occur infrequently and a lower weight to keywords which occur frequently across many news articles.
Table 1. Weighting scheme used for keywords

Weight      0.2        0.4    0.6      0.8    1.0
Indication  Very Low   Low    Medium   High   Very High
A sample Personalized Category is illustrated below.

General Category: HEALTH
Personalized Category: DCD and Whey Protein products detected
Occurrence of Keywords: Multiple Occurrences
Keywords and Weights:
    Dicyandiamide   1.0
    DCD             1.0
    Whey Protein    1.0
    Botulism        1.0
    Clostridium     1.0
    Milk powder     0.4
3.3.3. Classification Rank Calculation
This research utilizes a classification rank calculation methodology for the classification of news articles. The classification rank indicates how well a given news article matches the classified Personalized Category. Initially, each news article is classified into one of two categories: the user's desired Personalized Category or the OTHER category. All news articles with a classification rank above zero are assigned to the desired Personalized Category, and articles with a zero classification rank are assigned to OTHER.
The proposed classification process is depicted in Figure 2.
Figure 2. Personalized Classification process
The classification rank of each news article is calculated based on the tf-idf values of the keywords describing the desired Personalized Category. The tf-idf weight of a keyword k_i in a news article a_j is computed from the term frequency Freq(k_i, a_j) and the inverse document frequency, as given in Eq. 1 [2].

tfidf(k_i, a_j) = Freq(k_i, a_j) * log2(N / DF(k_i))    (1)

In Eq. 1, N is the number of news articles extracted on the user-specified date and DF(k_i) is the number of articles in the collection of N articles in which the keyword k_i occurs at least once.

The classification rank of an article is defined as the summation of the tf-idf weights of all keywords describing the desired Personalized Category, denoted c_p [2].

rank(a_j) = Σ_{k_i ∈ c_p} tfidf(k_i, a_j)    (2)
Three additional rank calculation equations were derived from Eq. 1 and Eq. 2, with the aim of increasing the accuracy of the classification process and reducing the number of inaccurate articles classified into desired categories with high rank values.

Eq. 3 calculates the weighted tf-idf value of a keyword k_i in a news article a_j by incorporating the weights defined for the keywords describing the desired Personalized Category. This aims to reduce the rank values of articles having only one or two keywords with high frequency, which might not exactly describe the desired news story.

wgt_tfidf(k_i, a_j) = [W_i * Freq(k_i, a_j)] * log2(N / DF(k_i))    (3)

In Eq. 3, W_i denotes the user-defined weight given to the keyword k_i describing the Personalized Category. Based on Eq. 3, weighted-summation and weighted-average rank calculations were derived, denoted by Eq. 4 and Eq. 5 respectively.

wgt_rank(a_j) = Σ_{k_i ∈ c_p} wgt_tfidf(k_i, a_j)    (4)

wgt_avg_rank(a_j) = (Σ_{k_i ∈ c_p} wgt_tfidf(k_i, a_j)) / total(W_i)    (5)

In Eq. 5, total(W_i) denotes the sum of the user-defined weights of the keywords describing the desired Personalized Category. The classification process of the proposed system was evaluated by calculating the classification rank of news articles using Eq. 5.
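Equations 1 to 5 translate directly into code. The sketch below is one reading of those equations, assuming each article is available as plain text; the dictionary category_profile mirrors the sample HEALTH profile above, but the helper names are illustrative rather than the paper's implementation.

import math

# Keyword-based Category Profile for the sample news story,
# mapping each keyword k_i to its user-defined weight W_i.
category_profile = {
    "dicyandiamide": 1.0, "dcd": 1.0, "whey protein": 1.0,
    "botulism": 1.0, "clostridium": 1.0, "milk powder": 0.4,
}

def freq(keyword: str, article: str) -> int:
    # Freq(k_i, a_j): occurrences of the keyword in the article.
    return article.lower().count(keyword)

def doc_freq(keyword: str, articles: list) -> int:
    # DF(k_i): number of articles containing the keyword at least once.
    return sum(1 for a in articles if freq(keyword, a) > 0)

def wgt_avg_rank(article: str, articles: list) -> float:
    # Weighted-average classification rank, Eq. 5 built on Eq. 3.
    n = len(articles)
    total_w = sum(category_profile.values())
    rank = 0.0
    for kw, w in category_profile.items():
        df = doc_freq(kw, articles)
        if df == 0:
            continue  # keyword absent from the day's collection
        rank += (w * freq(kw, article)) * math.log2(n / df)  # Eq. 3
    return rank / total_w  # Eq. 5

Articles scoring above zero would fall into the Personalized Category, and the rest into OTHER, as described above.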
3.4. Comments Analyzer
The main functionality of the Comments Analyzer component is to identify the most popular news articles published within a given period, to graphically plot sudden fluctuations in the popularity of news stories, and to provide a detailed profile for each commenter.
3.4.1. Popularity Score for News Articles
The Comments Analyzer component enables users to obtain a list of the most popular news articles published within a user-desired period. The popularity of each article was calculated using the statistics of the following factors:
• Number of unique users who have made comments.
• Number of days the article has received comments.
• Number of times that the news article has been shared, liked and commented on via Facebook.
• Total number of times that the article has been shared via Twitter.
• Total number of times that the article has been shared via Google plus.
• Total number of times that the article has been shared via LinkedIn.
A popularity score was calculated for each news article using a four-step methodology. As the
first step, statistics for the factors listed above were computed from the comments and sharing
information extracted by the Content Acquisitioner module. Subsequently, the calculated
statistics were analyzed with the IBM SPSS Statistics [13] toolkit in order to derive a weight
for each factor. As the third step, the weights obtained from the previous step were used to
calculate a popularity score for each article using Eq. 6.
popularity_score = ∑i (Weight[i] * Factor[i]) / age (6)
In Eq. 6, Weight[i] denotes the weight of the i-th factor, Factor[i] denotes the total count
recorded for the i-th factor for a given news article, and age denotes the number of days since
the article was published.
As the final step, the articles are ranked according to their popularity scores and the top k
articles are presented to the users of the News Agent toolbox.
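The scoring and ranking steps can be sketched as follows; this is a minimal illustration which
assumes the factor weights from the SPSS analysis are supplied as a list, and the guard against
zero-day-old articles is an assumption not stated above.

    def popularity_score(factors, weights, age_days):
        # Eq. 6: weighted sum of the factor counts, normalized by age.
        # factors: per-article counts in a fixed order, e.g.
        #   [unique_commenters, days_with_comments, facebook_total,
        #    tweets, gplus_shares, linkedin_shares]
        # weights: per-factor weights derived from the SPSS analysis
        score = sum(w * f for w, f in zip(weights, factors))
        return score / max(age_days, 1)  # assumed guard for day-0 articles

    def top_k_articles(articles, weights, k):
        # Final step: rank by popularity score and keep the top k.
        # Each article is assumed to be a dict holding "factors" and
        # "age_days"; the real storage format is not described here.
        return sorted(
            articles,
            key=lambda a: popularity_score(a["factors"], weights, a["age_days"]),
            reverse=True,
        )[:k]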
3.4.2. Profile for Commenters
The Comments Analyzer module maintains user profiles for the commenters of news websites. The
main purpose of maintaining these profiles is to present various statistics on the comments
made by each user and on their commenting behaviour.
3.4.3. Popularity Analysis of News Stories (Personalized Categories)
The main purpose of analyzing the popularity of news stories is to provide a graphical
visualization of sudden increases in the popularity of news articles belonging to a specific
news story (Personalized Category). The popularity analysis of news stories follows the
four-step methodology described below.
As the first step, the tool classifies the news articles based on the user-provided period and
news story (Personalized Category). As the next step, the tool calculates the total number of
comments, Tweets, Facebook shares, LinkedIn shares and Google plus shares that the
classified articles belonging to the given news story have received. Subsequently, the tool
detects sudden increases in popularity by considering the slope (the rate of change) between
consecutive data points representing the combined comment and sharing totals. As the final
step, the tool plots how the popularity of the news story has changed over the user-given
period.
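A minimal sketch of the slope computation follows, assuming the comment and sharing counts have
already been aggregated into one total per day for the story; the threshold used to flag a
"sudden" increase is an assumption, since no specific value is given above.

    def popularity_slopes(daily_totals):
        # daily_totals: ordered list of (day_index, total) pairs, where
        # total = comments + Tweets + Facebook + LinkedIn + Google plus.
        # Returns (day_index, slope) pairs: the rate of change between
        # consecutive data points.
        return [
            (d1, (t1 - t0) / (d1 - d0))
            for (d0, t0), (d1, t1) in zip(daily_totals, daily_totals[1:])
        ]

    def sudden_increases(daily_totals, threshold):
        # Flag the days whose slope exceeds a user-chosen threshold.
        return [(day, s) for day, s in popularity_slopes(daily_totals)
                if s > threshold]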
4. RESULTS
The proposed system was evaluated on the accuracy of the news article extraction process of the
Content Acquisitioner and on the classification results of the News Analyzer module. The
extraction accuracy was evaluated simply as the percentages of correctly and incorrectly
extracted articles. The classification results were evaluated by comparing the results obtained
by the system against those of a manual classification process.
The weighted-average rank calculation given by Eq. 5 was used for the classification evaluation
process.
4.1. Article Extraction Accuracy
The accuracy of the News Article Extraction process was evaluated using data extracted during
the months of October and November. Around six thousand full-text news articles were manually
examined in order to identify news articles and links that had not been extracted as expected.
Articles consisting only of the user commenting sections, text content from navigation
sections, text content from footer sections, and news links pointing to advertisements, "leave
a comment" and "contact us" web pages were counted as incorrectly extracted articles and links.
Figure 3 illustrates the numbers of news articles extracted accurately and inaccurately,
relative to the total number of news articles extracted per day during the chosen period.
Figure 3. Accuracy of article extraction process
4.2. Article Classification Accuracy
The classification process of the News Analyzer component was evaluated by creating several
timely news stories (Personalized Categories) under three main categories: Politics, Health and
Business. The classification accuracy of each news story was measured over the period in which
multiple news websites reported the related news event.
The following keyword-based Category Profile was used to obtain test results for the news story
"Commonwealth (CHOGM) 2013".
General Category: POLITICS
Personalized Category: Commonwealth (CHOGM) 2013
Occurrence of Keywords: Multiple Occurrences
Keywords and Weights:
Commonwealth 1.0
CHOGM 1.0
Commonwealth Heads Government Meeting 1.0
Commonwealth Secretariat 1.0
Commonwealth Leaders 1.0
Summit 0.4
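For illustration, this profile maps directly onto the keyword-to-weight dictionary used by the
wgt_avg_rank sketch shown earlier (a hypothetical representation; the toolbox's actual storage
format is not described here):

    chogm_profile = {
        "Commonwealth": 1.0,
        "CHOGM": 1.0,
        "Commonwealth Heads Government Meeting": 1.0,
        "Commonwealth Secretariat": 1.0,
        "Commonwealth Leaders": 1.0,
        "Summit": 0.4,
    }
    # Articles whose weighted-average rank (Eq. 5) is above zero are
    # assigned to "Commonwealth (CHOGM) 2013"; zero-ranked articles
    # fall into the category OTHER.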
Figure 4 illustrates the test results obtained for the classification of news articles related
to the news story "Commonwealth (CHOGM) 2013". The graph compares the number of news articles
classified by the News Agent toolbox with the number of news articles classified manually; the
gap between the two indicates the number of articles incorrectly classified by the proposed
tool.
Figure 4. Test results obtained for Commonwealth (CHOGM) 2013
The following keyword-based Category Profile was used to obtain test results for the news story
"Whey Protein detected in dairy products".
General Category: HEALTH
Personalized Category: Whey Protein detected in dairy products
Occurrence of Keywords: Multiple Occurrences
Keywords and Weights:
Whey Protein 1.0
Botulism 1.0
Clostridium 1.0
Milk powder 0.4
Dairy products 0.4
Bacteria 0.4
Figure 5 illustrates the test results obtained for the classification of news articles related
to the news story "Whey Protein detected in dairy products".
Figure 5. Test results obtained for Whey Protein detected in dairy products
The following keyword-based Category Profile was used to obtain test results for the news story
"Central Bank related news".
General Category: BUSINESS
Personalized Category: Central Bank related news
Occurrence of Keywords: Multiple Occurrences
Keywords and Weights:
Central Bank 1.0
CB 1.0
Monitor 0.2
Report 0.4
Figure 6 illustrates the test results obtained for the classification of news articles related
to the news story "Central Bank related news".
Figure 6. Test results obtained for Central Bank related news
The test results obtained for classifying news articles into the above news stories show that
the accuracy of the proposed system comes close to that of the manual classification process.
According to the conducted evaluations, the News Analyzer module functions with an approximate
accuracy of 81% to 82%.
4.2.1. Discriminative Power of Keywords vs. Accuracy
The accuracy of the classification process depends on the discriminating power of the keywords
defined under the Category Profiles of the news stories. The better the keywords distinguish
the news articles of one news story from those of another, the higher the accuracy obtained by
the proposed classification process.
The test results from the following sample scenarios show that Category Profiles whose keywords
have high discriminating power yield higher accuracy, while Category Profiles whose keywords
have low discriminating power yield lower accuracy. Figure 7 and Figure 8 illustrate how the
classification accuracy changes when the discriminating power of the keywords is changed.
The following keyword-based Category Profile, containing keywords with high discriminating
power, was used to obtain test results for the news story "Dengue outbreaks in Sri Lanka".
General Category: HEALTH
Personalized Category: Dengue outbreaks in Sri Lanka
Occurrence of Keywords: Multiple Occurrences
Keywords and Weights:
Dengue 1.0
Dengue mosquito 1.0
Dengue virus 1.0
Dengue Hemorrhagic Fever 1.0
Fever 0.4
Mosquito breeding 0.4
Figure 7. Results for keywords with high discriminating power
The following keyword-based Category Profile, containing keywords with low discriminating
power, was used to obtain test results for the news story "Dengue outbreaks in Sri Lanka".
General Category: HEALTH
Personalized Category: Dengue outbreaks in Sri Lanka
Occurrence of Keywords: Multiple Occurrences
Keywords and Weights:
Dengue 1.0
Mosquito 1.0
Virus 1.0
Fever 0.4
Breeding 0.4
Figure 8. Results for keywords with low discriminating power
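As a hedged illustration of this effect, the following toy usage example reuses the
wgt_avg_rank sketch shown earlier on an invented three-article collection; all texts and
numbers are fabricated purely for demonstration.

    collection = [
        "Dengue virus spreads as dengue mosquito breeding sites multiply.",
        "A new influenza virus strain causes fever as mosquito season ends.",
        "The national cricket team won the final match.",
    ]

    def doc_freqs_for(profile, collection):
        # DF(ki): number of articles containing each profile keyword.
        return {kw: sum(kw.lower() in art.lower() for art in collection)
                for kw in profile}

    high_power = {"Dengue": 1.0, "Dengue mosquito": 1.0, "Dengue virus": 1.0}
    low_power = {"Dengue": 1.0, "Mosquito": 1.0, "Virus": 1.0}

    influenza_article = collection[1]
    for profile in (high_power, low_power):
        dfs = doc_freqs_for(profile, collection)
        print(wgt_avg_rank(influenza_article, profile, len(collection), dfs))

Under the high-power profile the influenza article ranks 0.0 and falls into OTHER, while under
the low-power profile the shared words "mosquito" and "virus" give it a rank above zero, so it
would be misclassified into the dengue story.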
5. CONCLUSIONS
In summary, this research develops an Intelligent Agent for analyzing the readily available
news content and subscription patterns of Sri Lankan news websites and news blogs. The aim of
this research is to mine the abundantly available online news content and subscription data in
order to provide an efficient way to monitor different publication and subscription patterns.
The main objectives of this research are to assist the Sri Lankan media authorities in tracking
and regulating the publications made by different disseminators of news, to help identify
incidents that require more attention and news events whose popularity is increasing over time,
and to help assure proper balance, accuracy and equal coverage in news reporting.
The proposed system has been evaluated on the accuracy of the news article extraction process
and of the classification process. The extraction accuracy has been evaluated simply as the
percentages of correctly and incorrectly extracted articles. The classification results have
been evaluated using several news stories together with keyword profiles under the general
categories Politics, Health and Business. The test results show that the extraction process
functions with 98% accuracy and the classification process with an approximate accuracy of
81%.
ACKNOWLEDGEMENTS
I would like to convey my sincere gratitude to my supervisor, Dr. Chathura De Silva, Senior
Lecturer, Department of Computer Science and Engineering, University of Moratuwa, for providing
the valuable research idea, guidance, constructive comments and extensive support throughout
this research.
I also express my sincere gratitude to our MSc Project Coordinator, Dr. A. Shehan Perera,
Senior Lecturer, Department of Computer Science and Engineering, University of Moratuwa, for
providing extensive support and encouragement by conducting weekly sessions, forums and
progress presentations throughout the year.
REFERENCES
[1] I. Flaounas, ‘Pattern analysis of news media content’, University of Bristol, 2011.
[2] A. Sun, E.-P. Lim, and W.-K. Ng, ‘Personalized classification for keyword-based category profiles’, in
Research and Advanced Technology for Digital Libraries, Springer, 2002, pp. 61–74.
[3] I. Flaounas, O. Ali, M. Turchi, T. Snowsill, F. Nicart, T. De Bie, and N. Cristianini, ‘NOAM:
news outlets analysis and monitoring system’, in Proc. of the 2011 ACM SIGMOD International
Conference on Management of Data, 2011, pp. 1275–1278.
[4] L. Lloyd, D. Kechagias, and S. Skiena, ‘Lydia: A system for large-scale news analysis’, in String
Processing and Information Retrieval, 2005, pp. 161–166.
[5] R. J. Kuhns, ‘A news analysis system’, in Proceedings of the 12th conference on Computational
linguistics-Volume 1, 1988, pp. 351–355.
[6] K. R. McKeown, R. Barzilay, D. Evans, V. Hatzivassiloglou, J. L. Klavans, A. Nenkova, C.
Sable, B. Schiffman, and S. Sigelman, ‘Tracking and Summarizing News on a Daily Basis with
Columbia’s Newsblaster’, in Proceedings of the Human Language Technology Conference, San Diego,
USA, 2002.
[7] Geng, Q. Gao, and J. Pan, ‘Extracting content for news web pages based on DOM’, IJCSNS
International Journal of Computer Science and Network Security, vol. 7, no. 2, pp. 124–129, 2007.
[8] C.-H. Chan, A. Sun, and E.-P. Lim, ‘Automated online news classification with
personalization’, in Proc. of the 4th International Conference on Asian Digital Libraries, 2001.
[9] ‘Parsing HTML Documents with the Html Agility Pack’. [Online]. Available:
http://www.4guysfromrolla.com/articles/011211-1.aspx. [Accessed: 29-Apr-2013].
[10] C. Kohlschütter, P. Fankhauser, and W. Nejdl, ‘Boilerplate detection using shallow text
features’, in Proceedings of the Third ACM International Conference on Web Search and Data
Mining, New York, NY, USA, 2010, pp. 441–450.
[11] ‘Get Social Share Counts - A Complete Guide’, CUBE3X. [Online]. Available:
http://cube3x.com/2013/01/get-social-share-counts-a-complete-guide/. [Accessed: 10-Oct-2013].
[12] ‘Porter Stemming Algorithm’. [Online]. Available:
http://tartarus.org/martin/PorterStemmer/index.html. [Accessed: 03-Jun-2013].
[13] ‘IBM SPSS Statistics’, 29-Oct-2013. [Online]. Available:
http://www-01.ibm.com/software/analytics/spss/products/statistics/. [Accessed: 14-Oct-2013].
Authors
W.D.R. Wijedasa received her M.Sc. degree in Computer Science and Engineering from the
University of Moratuwa, Sri Lanka, in 2014. She has been working as a Software Engineer at IFS
RnD Pvt Limited since 2010. Her research interests are Data Mining, Artificial Intelligence and
Agent Technologies.
Dr. Chathura De Silva received his M.Eng. degree and Ph.D. degree from the National University
of Singapore. He has been working as a Senior Lecturer at the Department of Computer Science
and Engineering, University of Moratuwa, Sri Lanka. He is the current Head of the Department of
Computer Science and Engineering, University of Moratuwa.