Big data continuously generated by social network sites such as Twitter and Facebook has been an attractive focus for researchers in recent decades owing to its timeliness, accessibility, and popularity, although these advantages may come with a trade-off in accuracy. Clustering of Twitter data in particular has caught researchers' attention, creating the need for an algorithm that can cluster data, especially streaming data, in less computational time. The presented adaptive clustering and classification algorithm for data streaming in Apache Spark addresses these problems in two phases. In the first phase, the pre-processed Twitter input data is clustered using an Improved Fuzzy C-means algorithm, and the clustering is further refined by an Adaptive Particle Swarm Optimization (PSO) algorithm; the clustered data stream is then evaluated with the Spark engine. In the second phase, the pre-processed Higgs input data is classified using a modified support vector machine (MSVM) classifier with grid-search optimization. Finally, the optimized output is evaluated in the Spark engine and used to derive the resulting confusion matrix. The proposed work uses the Twitter and Higgs datasets for data streaming in Apache Spark. Computational experiments demonstrate the superiority of the presented approach over existing methods in terms of precision, recall, F-score, convergence, ROC curve, and accuracy.
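As a rough illustration of the clustering step, here is a minimal sketch of standard Fuzzy C-means on toy 2-D points in plain Python. It is the textbook baseline only, not the paper's improved or PSO-adapted variant, and the data is invented:

```python
import math

def fuzzy_c_means(points, c=2, m=2.0, iters=50):
    """Standard Fuzzy C-means; a baseline sketch, not the paper's
    improved or PSO-tuned variant."""
    n, dim = len(points), len(points[0])
    # Deterministic init: spread initial centers across the data.
    centers = [points[(k * (n - 1)) // max(c - 1, 1)] for k in range(c)]
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)).
        for i, p in enumerate(points):
            # Guard zero distance (center sitting on a data point).
            d = [math.dist(p, ck) or 1e-12 for ck in centers]
            for k in range(c):
                u[i][k] = 1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0))
                                    for j in range(c))
        # Center update: weighted mean with weights u_ik^m.
        centers = [
            tuple(sum((u[i][k] ** m) * points[i][dd] for i in range(n)) /
                  sum(u[i][k] ** m for i in range(n)) for dd in range(dim))
            for k in range(c)
        ]
    return centers, u

pts = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
       (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, u = fuzzy_c_means(pts)
```

Unlike hard k-means, every point keeps a fractional membership in every cluster, which is what the adaptive PSO step in the paper then tunes.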
In recent years, big data, huge volumes of structured and unstructured data, has been generated from social networks, and the valuable information within it needs to be extracted. Traditional analytic platforms must be scaled up to analyze social big data efficiently and in a timely manner. Sentiment analysis of social big data helps organizations by providing business insights grounded in public opinion. Sentiment analysis based on a multi-class classification scheme classifies text into more detailed sentiment labels. Multi-class classification with a single-tier architecture, where one model is trained on the entire labeled dataset, can increase classification complexity. In this paper, a multi-tier sentiment analysis system on a big data analytics platform (MSABDP) is proposed to reduce multi-class classification complexity and efficiently analyze large-scale datasets. Hadoop is built for big data analytics and is well suited to managing data at scale; it improves scalability and efficiency through a distributed processing environment based on the MapReduce framework and the Hadoop Distributed File System (HDFS). MSABDP combines the SentiStrength lexicon with a learning-based classification scheme in a multi-tier architecture and runs on a big data analytics platform in order to manage large data at scale. The proposed system collects a large amount of real Twitter data using Apache Flume, and this data is used for evaluation. The evaluation results show that the proposed multi-class classification system with a multi-tier architecture improves classification accuracy over single-tier multi-class classification by 7%.
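To illustrate the multi-tier idea in miniature, the sketch below routes text through two tiers: a coarse subjective-vs-neutral tier, then a fine-grained polarity tier applied only to subjective text. The tiny lexicon and thresholds are invented for the example; MSABDP itself combines SentiStrength with learned classifiers on Hadoop:

```python
# Invented toy lexicon; MSABDP uses SentiStrength plus learned models.
LEXICON = {"love": 3, "great": 2, "good": 1, "bad": -1, "awful": -2, "hate": -3}

def tier1(text):
    """Tier 1: cheap coarse split into neutral vs. subjective."""
    score = sum(LEXICON.get(word, 0) for word in text.lower().split())
    return ("neutral" if score == 0 else "subjective"), score

def tier2(score):
    """Tier 2: refine subjective texts into fine-grained labels."""
    if score >= 2:
        return "strongly positive"
    if score == 1:
        return "positive"
    if score == -1:
        return "negative"
    return "strongly negative"

def classify(text):
    """Only text that survives tier 1 reaches the finer tier-2 model,
    which is what reduces per-model class complexity."""
    label, score = tier1(text)
    return label if label == "neutral" else tier2(score)
```

Each tier sees fewer classes than a single flat multi-class model would, which is the complexity reduction the abstract describes.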
A Novel Approach To Answer Continuous Aggregation Queries Using Data Aggregat... (IJMER)
International Journal of Modern Engineering Research (IJMER) is a peer-reviewed, online journal. It serves as an international archival forum of scholarly research related to engineering and science education.
Designing and configuring context-aware semantic web applications (TELKOMNIKA JOURNAL)
Context-aware services are attracting worldwide attention as the use of web services grows rapidly. We designed a context-aware semantic web architecture that provides on-demand flexibility and scalability in extracting and mining research papers from well-known digital libraries, i.e., ACM, IEEE, and SpringerLink. This paper proposes a context-aware services system that supports automatic discovery and integration of context based on Semantic Web services. The work was implemented in the Python programming language with a dedicated semantic web library, CubicWeb, applied to a dataset of publications in order to measure the impact of a context-aware semantic web application on the results. We report the average recall and average accuracy for all the context-aware research journals in our study. As this study is limited to journal documents, future studies can examine other types of publications using this approach. An efficient system has been designed around research-article metadata parameters, such as year of publication, type of publication, number of contributors, evaluation methods, and analysis methods used in the publication, to locate papers on the web using semantic web technology. All of this data has been extracted using the designed context-aware semantic web technology.
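The metadata-parameter selection described above can be sketched as a simple predicate over paper records; the records and field names below are hypothetical illustrations, not the paper's actual schema:

```python
# Hypothetical metadata records of the kind the system extracts from digital libraries.
papers = [
    {"title": "A", "year": 2019, "type": "journal", "contributors": 3},
    {"title": "B", "year": 2021, "type": "conference", "contributors": 2},
    {"title": "C", "year": 2021, "type": "journal", "contributors": 5},
]

def filter_by_context(papers, **criteria):
    """Select papers whose metadata matches every supplied context parameter."""
    return [p for p in papers if all(p.get(k) == v for k, v in criteria.items())]
```

For example, `filter_by_context(papers, year=2021, type="journal")` narrows the collection to journal papers from 2021.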
IJERA (International Journal of Engineering Research and Applications) is an international online, ... peer-reviewed journal. For more details or to submit your article, please visit www.ijera.com
What Sets Verified Users apart? Insights Into, Analysis of and Prediction of ... (IIIT Hyderabad)
Social network and publishing platforms, such as Twitter, support the concept of verification. Verified accounts are deemed worthy of platform-wide public interest and are separately authenticated by the platform itself. There have been repeated assertions by these platforms about verification not being tantamount to endorsement. However, a significant body of prior work suggests that possessing a verified status symbolizes enhanced credibility in the eyes of the platform audience. As a result, such a station is highly coveted among public figures and influencers. Hence, we attempt to characterize the network of verified users on Twitter and compare the results to similar analyses performed for the entire Twitter network. We extracted the whole graph of verified users on Twitter (as of July 2018) and obtained 231,246 English user-profiles and 79,213,811 connections. Subsequently, in the network analysis, we found that the sub-graph of verified users mirrors the full Twitter users graph in some aspects, such as possessing a short diameter. However, our findings contrast with earlier results on multiple fronts, such as the possession of a power-law out-degree distribution, slight dissortativity, and a significantly higher reciprocity rate, as elucidated in the paper. Moreover, we attempt to gauge the presence of salient components within this sub-graph and detect the absence of homophily with respect to popularity, which again is in stark contrast to the full Twitter graph. Finally, we demonstrate stationarity in the time series of verified user activity levels.
It is in this backdrop that we attempt to deconstruct the extent to which Twitter's verification policy mingles the notions of authenticity and authority. To this end, we seek to unravel the aspects of a user's profile which likely engender or preclude verification. The aim of the paper is two-fold: First, we test if discerning the verification status of a handle from profile metadata and content features is feasible. Second, we unravel the characteristics which have the most significant bearing on a handle's verification status. We augmented our dataset with all 494 million tweets of the aforementioned users over a one-year collection period, along with their temporal social reach and activity characteristics. Our proposed models are able to reliably identify verification status (area under curve, AUC > 99%). We show that the number of public list memberships, presence of neutral sentiment in tweets, and an authoritative language style are the most pertinent predictors of verification status.
To the best of our knowledge, this work represents the first quantitative attempt at characterizing verified users on Twitter and also the first attempt at discerning and classifying verification-worthy users on Twitter.
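Network measures such as the reciprocity rate and the out-degree distribution mentioned above can be computed directly from a directed edge list. A minimal sketch on a toy graph (not the authors' 79-million-edge dataset):

```python
from collections import Counter

def reciprocity(edges):
    """Fraction of distinct directed edges whose reverse edge also exists."""
    edge_set = set(edges)
    mutual = sum(1 for a, b in edge_set if (b, a) in edge_set)
    return mutual / len(edge_set)

def out_degrees(edges):
    """Out-degree per node, for inspecting the out-degree distribution."""
    return Counter(a for a, _ in set(edges))

# Toy follower graph: a<->b and a<->c are mutual; b->c is not reciprocated.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"), ("b", "c")]
```

Here 4 of the 5 edges are reciprocated, giving a reciprocity rate of 0.8; the paper reports that the verified sub-graph shows a significantly higher rate than the full Twitter graph.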
A Novel Data Extraction and Alignment Method for Web Databases (IJMER)
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment…. And many more.
Existing work on keyword search relies on an element-level model (data graphs) to compute keyword query results. Elements mentioning keywords are retrieved from this model, and paths between them are explored to compute Steiner graphs. A KRG (Keyword Element Relationship Graph) captures relationships at the keyword level; the relationships it captures are not direct edges between tuples but stand for paths between keywords.
We propose routing keywords only to relevant sources in order to reduce the high cost of processing keyword search queries over all sources. A multilevel scoring mechanism is proposed for computing the relevance of routing plans based on scores at the level of keywords, data elements, element sets, and the subgraphs that connect these elements.
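The routing idea, scoring sources by keyword-level relevance and querying only the best ones, can be sketched as follows. The per-source keyword scores are invented for illustration, and the paper's multilevel mechanism also scores data elements, element sets, and subgraphs, which this sketch omits:

```python
# Hypothetical keyword-level relevance statistics per source.
sources = {
    "s1": {"twitter": 0.9, "spark": 0.1},
    "s2": {"twitter": 0.2, "spark": 0.8},
    "s3": {"hadoop": 0.7},
}

def route(query_keywords, sources, top_k=2):
    """Rank sources by summed keyword-level relevance and route the query
    only to the top-k sources with a non-zero score."""
    scored = {s: sum(kw.get(q, 0.0) for q in query_keywords)
              for s, kw in sources.items()}
    ranked = sorted(scored, key=scored.get, reverse=True)
    return [s for s in ranked[:top_k] if scored[s] > 0]
```

A query on "twitter spark" is routed to s1 and s2 and skips s3 entirely, which is the cost saving the approach targets.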
The International Journal of Engineering and Science (IJES)
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Using Page Size for Controlling Duplicate Query Results in Semantic Web (IJwest)
The Semantic Web is the web of the future. The Resource Description Framework (RDF) is a language for representing resources in the World Wide Web. When these resources are queried, the problem of duplicate query results occurs. Existing techniques use hash-index comparison to remove duplicate query results. The major drawback of using the hash index for this purpose is that a slight change in formatting or word order changes the hash index, so query results are no longer considered duplicates even though they have the same content. We present an algorithm for detecting and eliminating duplicate query results from the semantic web using both hash-index and page-size comparisons. Experimental results show that the proposed technique removes duplicate query results from the semantic web efficiently, solves the problems of using the hash index alone for duplicate handling, and can be embedded in an existing SQL-based query system for the semantic web. Further research could add flexibility to existing SQL-based query systems for the semantic web to accommodate other duplicate detection techniques as well.
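The weakness of raw hashing, and the benefit of pairing a tolerant comparison with a page-size check, can be demonstrated in a few lines. The normalization below (lower-casing and sorting tokens) is one plausible choice for illustration, not necessarily the paper's exact algorithm:

```python
import hashlib

def raw_hash(text):
    """Exact hash: breaks on any formatting or word-order change."""
    return hashlib.sha256(text.encode()).hexdigest()

def normalized_hash(text):
    """Hash after lower-casing, collapsing whitespace, and sorting tokens,
    so small formatting or word-order changes still hash identically."""
    return hashlib.sha256(" ".join(sorted(text.lower().split())).encode()).hexdigest()

def likely_duplicate(a, b, size_tolerance=0.05):
    """Cheap page-size pre-filter followed by normalized-hash comparison."""
    if abs(len(a) - len(b)) > size_tolerance * max(len(a), len(b), 1):
        return False
    return normalized_hash(a) == normalized_hash(b)

# Same content, different whitespace and word order: raw hashes differ,
# but the combined check still flags the pair as duplicates.
r1 = "Semantic  Web query result"
r2 = "query result Semantic Web"
```

The size comparison rejects clearly different pages before any hashing, which is the role page size plays in the paper's combined scheme.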
The World Wide Web is a large repository of interlinked hypertext documents accessed via the Internet. The Web may contain text, images, video, and other multimedia data, and the user navigates through it using hyperlinks. A search engine returns millions of results and applies Web mining techniques to order them; the sorted order of search results is obtained by applying special algorithms called page-ranking algorithms, which measure the importance of pages by analyzing the number of inlinked and outlinked pages. Our proposed system is built on the idea that ranking the relevant pages higher in the retrieved document set requires an analysis of both a page's text content and its link information. The proposed approach is based on the assumption that the effective weight of a term in a page is computed by adding the weight of the term in the current page and an additional weight of the term in the linked pages. In this chapter, we first study the nature of web pages and the various link-analysis ranking algorithms with their limitations, and then present a comparative analysis of the ranking scores obtained through these approaches against our newly suggested ranking approach.
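The stated assumption, that a term's effective weight adds its weight in the current page to its weight in the linked pages, can be sketched directly. The term weights, the link graph, and the damping factor `alpha` below are illustrative assumptions, not values from the chapter:

```python
# Illustrative term weights per page and a toy link graph (hypothetical data).
page_weights = {
    "p1": {"mining": 0.6, "web": 0.3},
    "p2": {"mining": 0.2, "data": 0.5},
}
links = {"p1": ["p2"]}  # p1 links to p2

def effective_weight(term, page, alpha=0.5):
    """Weight of the term in the current page plus a damped contribution
    (alpha is an assumed damping factor) from each linked page."""
    weight = page_weights.get(page, {}).get(term, 0.0)
    for linked in links.get(page, []):
        weight += alpha * page_weights.get(linked, {}).get(term, 0.0)
    return weight
```

For "mining" on p1 this yields 0.6 + 0.5 * 0.2 = 0.7, so a term reinforced by linked pages outranks the same in-page weight alone.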
Multi Similarity Measure based Result Merging Strategies in Meta Search Engine (IDES Editor)
In a meta search engine, result merging is the key component. Meta search engines provide a uniform query interface for Internet users to search for information; depending on users' needs, they select relevant sources, map user queries onto the target search engines, and subsequently merge the results. The effectiveness of a meta search engine is closely related to the result-merging algorithm it employs. In this paper, we propose a meta search engine with two distinct steps: (1) searching through surface and deep search engines, and (2) ranking the results with the designed ranking algorithm. Initially, the user's query is submitted to the deep and surface search engines. The proposed method uses two distinct algorithms for ranking the search results: a concept-similarity-based method and a cosine-similarity-based method. Once the results from the various search engines are ranked, the proposed meta search engine merges them into a single ranked list. Finally, experimentation is carried out to demonstrate the efficiency of the proposed visible- and invisible-web-based meta search engine in merging the relevant pages, with TSAP used as the evaluation criterion for the algorithms.
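A common way to merge ranked lists from engines whose score scales are incomparable is to normalize per engine and then sum, CombSUM-style. The sketch below illustrates that general idea on invented results; it is not the paper's concept- and cosine-similarity algorithms:

```python
def merge_results(result_lists, top_k=5):
    """CombSUM-style merge: min-max-normalize each engine's scores onto [0, 1],
    then sum normalized scores, so URLs returned by several engines rise."""
    combined = {}
    for results in result_lists:          # each: list of (url, score)
        scores = [s for _, s in results]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0           # guard against constant scores
        for url, s in results:
            combined[url] = combined.get(url, 0.0) + (s - lo) / span
    return sorted(combined, key=combined.get, reverse=True)[:top_k]

surface = [("u1", 0.9), ("u2", 0.5), ("u3", 0.1)]   # surface engine, own scale
deep    = [("u2", 10.0), ("u4", 4.0), ("u1", 1.0)]  # deep engine, different scale
```

Here u2 wins the merged list because both engines rank it, even though neither raw score scale is comparable to the other.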
In this research paper, we present an overview of research issues in web mining. We discuss mining with respect to web data, referred to here as web data mining; in particular, our focus is on web data mining research in the context of our web warehousing project. We have categorized web data mining into three areas: web content mining, web structure mining, and web usage mining, and we highlight and discuss the various research issues involved in each of these categories. We believe that web data mining will be a topic of exploratory research in the near future.
Annotation Approach for Document with Recommendation (ijmpict)
A large number of organizations generate and share textual descriptions of their products, facilities, and activities. Such collections of textual data contain a significant amount of structured information, which remains buried in the unstructured text. While information extraction systems facilitate the extraction of structured relations, they are often expensive and inaccurate, particularly when working on text that does not contain any examples of the targeted structured data. We propose an alternative methodology that facilitates structured metadata generation by identifying documents that are likely to contain information of interest, data that is then useful for querying the database. Moreover, we present algorithms to extract attribute-value pairs and devise new mechanisms to map such pairs onto manually created schemas. We apply a clustering technique to the item content information to complement the user rating information, which improves the accuracy of collaborative similarity and addresses the cold-start problem.
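Attribute-value pair extraction can be approximated, for well-behaved text, with a naive pattern match. This regex sketch is far simpler than the learned extraction the paper intends, and the sample string is invented:

```python
import re

def extract_pairs(text):
    """Naive 'attribute: value' / 'attribute = value' extraction; a value
    runs until the next comma, semicolon, or period."""
    return dict(re.findall(r"(\w[\w ]*?)\s*[:=]\s*([^,;.]+)", text))
```

For example, `extract_pairs("color: red, weight = 2kg; brand: Acme")` yields three pairs; real product text rarely states pairs this cleanly, which is why the paper pursues more robust mechanisms.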
Social Targeting: Understanding Social Media Data Mining & Analysis (InfiniGraph)
Chase McMichael – CEO, InfiniGraph
With the advent of the social web, companies that aren’t actively mining, analysing and using social media data are missing a huge commercial advantage. In this session Chase McMichael will explain how social targeting works, including technologies, techniques and opportunities. He will also highlight the privacy challenges facing the industry.
Comparable Analysis of Web Mining Categories (theijes)
Web data mining is a current field of analysis combining two research areas: data mining and the World Wide Web. Web data mining research draws on several disciplines, such as databases, artificial intelligence, and information retrieval. Mining techniques are categorized into Web Content Mining, Web Structure Mining, and Web Usage Mining. In this work, an analysis of these mining techniques is performed. The analysis concludes that Web Content Mining deals with an unstructured or semi-structured view of data, Web Structure Mining deals with link structure, and Web Usage Mining mainly concerns user interaction.
Avoiding Anonymous Users in Multiple Social Media Networks (SMN) (paperpublications3)
Abstract: The main aim of this project is to secure user login and data sharing in social networks such as Gmail and Facebook, and to detect anonymous users of these networks. If the original user is not on the network but a friend or an anonymous user knows the login details, the account's chats can be misused. In this project we aim to detect an anonymous user operating the account without the original user's knowledge, for example an unauthorized user logging in to chat or to share images and videos. To this end, each user registers their details along with a security question and answer. Because an anonymous user can delete chats or data, the security question is used to recover the unauthorized user's chat history and sharing details together with their IP address or MAC address. In this way, the project provides a means of preventing anonymous users from misusing the original user's login details.
A real-time big data sentiment analysis for Iraqi tweets using Spark Streaming (journalBEEI)
The scale of data streaming in social networks such as Twitter is increasing exponentially. Twitter is one of the most important and suitable big data sources for machine learning research in terms of analysis, prediction, knowledge extraction, and opinion mining. People use the Twitter platform daily to express opinions, a fundamental signal that reflects their behavior. In recent years, the flow of Iraqi-dialect content has increased, especially on the Twitter platform, and sentiment analysis and opinion mining for different dialects have become hot topics in data science research. In this paper, we develop a real-time analytic model for sentiment analysis and opinion mining of Iraqi tweets using Spark Streaming, and we also create a dataset for researchers in this field. The Twitter handle Bassam AlRawi is the case study here. The new method is well suited to current machine learning applications and fast online prediction.
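The micro-batch shape of a Spark Streaming job can be mimicked without any Spark dependency. The sketch below groups an incoming tweet stream into fixed-size batches and scores each batch with a tiny invented English lexicon; a real deployment would use `pyspark.streaming` and an Iraqi-dialect resource, neither of which is assumed here:

```python
LEXICON = {"good": 1, "happy": 1, "bad": -1, "sad": -1}  # invented toy lexicon

def score(tweet):
    """Lexicon sum mapped to a coarse sentiment label."""
    s = sum(LEXICON.get(w, 0) for w in tweet.lower().split())
    return "pos" if s > 0 else "neg" if s < 0 else "neu"

def micro_batches(stream, batch_size=3):
    """Group a tweet iterator into fixed-size micro-batches and label each
    batch, mirroring how a DStream job processes data per interval."""
    batch = []
    for tweet in stream:
        batch.append(tweet)
        if len(batch) == batch_size:
            yield [(t, score(t)) for t in batch]
            batch = []
    if batch:  # flush the final partial batch
        yield [(t, score(t)) for t in batch]
```

Each yielded batch corresponds to one streaming interval, which is where the real model would update its online predictions.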
A Novel Data Extraction and Alignment Method for Web DatabasesIJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment…. And many more.
Existing work on keyword search relies on an element-level model (data graphs) to compute keyword query results.Elements mentioning keywords are retrieved from this model and paths between them are explored to compute Steiner graphs. KRG (keyword Element Relationship Graph) captures relationships at the keyword level.Relationships captured by a KRG are not direct edges between tuples but stand for paths between keywords.
We propose to route keywords only to relevant sources to reduce the high cost of processing keyword search queries over all sources. A multilevel scoring mechanism is proposed for computing the relevance of routing plans based on scores at the level of keywords, data elements,element sets, and subgraphs that connect these elements
The International Journal of Engineering and Science (IJES)theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
To Get any Project for CSE, IT ECE, EEE Contact Me @ 09666155510, 09849539085 or mail us - ieeefinalsemprojects@gmail.com-Visit Our Website: www.finalyearprojects.org
Using Page Size for Controlling Duplicate Query Results in Semantic WebIJwest
Semantic web is a web of future. The Resource Description Framework (RDF) is a language
to represent resources in the World Wide Web. When these resources are queried the problem of duplicate
query results occurs. The present techniques used hash index comparison to remove duplicate query
results. The major drawback of using the hash index to remove duplicate query results is that, if there is a
slight change in formatting or word order, then hash index is changed and query results are no more
considered as duplicate even though they have same contents. We presented an algorithm for detection and
elimination of duplicate query results from semantic web using hash index and page size comparisons.
Experimental results showed that the proposed technique removed duplicate query results from semantic
web efficiently, solved the problems of using hash index for duplicate handling and could be embedded in
existing SQL-Based query system for semantic web. Research could be carried out for certain flexibilities
in existing SQL-Based query system of semantic web to accommodate other duplicate detection techniques
as well.
World Wide Web is large sized repository of
interlinked hypertext documents accessed via the Internet. Web
may contain text, images, video, and other multimedia data. The
user navigates through this using hyperlink. Search Engine gives
millions of results and applies Web mining techniques to order the
results. The sorted order of search results is obtained by applying
some special algorithms called—Page ranking algorithms. The
algorithm measures the importance of the pages by analyzing the
number of inlinked and outlinked pages. Our proposed system is
built on an idea that to rank the relevant pages higher in the
retrieved document set, an analysis of both page‘s text substance
and links information is required. The proposed approach is
based on the assumption that the effective weight of a term in a
page is computed by adding the weight of a term in the current
page and additional weight of the term in the linked pages. In
this chapter, we first study the nature of web pages, the various
link analysis ranking algorithms and their limitations and then
show the comparative analysis of the ranking scores obtained
through these approaches with our new suggested ranking
approach.
Multi Similarity Measure based Result Merging Strategies in Meta Search EngineIDES Editor
In a Meta Search Engine, result merging is the key component. Meta Search Engines provide a uniform query interface for Internet users to search for information. Depending on users' needs, they select relevant sources, map user queries onto the target search engines, and subsequently merge the results. The effectiveness of a Meta Search Engine is closely related to the result merging algorithm it employs. In this paper, we propose a Meta Search Engine that operates in two distinct steps: (1) searching through surface and deep search engines, and (2) ranking the results with the designed ranking algorithm. Initially, the query given by the user is passed to the deep and surface search engines. The proposed method uses two distinct algorithms for ranking the search results: a concept similarity based method and a cosine similarity based method. Once the results from the various search engines are ranked, the proposed Meta Search Engine merges them into a single ranked list. Finally, experiments are conducted to demonstrate the efficiency of the proposed visible and invisible web-based Meta Search Engine in merging the relevant pages. TSAP is used as the evaluation criterion, and the algorithms are evaluated against it.
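The cosine-similarity ranking step can be illustrated compactly. This is a sketch under assumptions of our own: bag-of-words vectors over result snippets and a simple deduplicating merge; the paper's actual concept-similarity method is not reproduced here.

```python
# Illustrative cosine-similarity merging: results from several engines are
# deduplicated, then re-ranked by similarity to the query.
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two texts under a bag-of-words model."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def merge_results(query, result_lists):
    """Merge result lists from several engines into one ranked list."""
    seen, merged = set(), []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return sorted(merged, key=lambda d: cosine(query, d), reverse=True)

# Toy result lists from a surface and a deep search engine.
surface = ["deep web search engines", "cooking recipes"]
deep = ["meta search engine ranking", "deep web search engines"]
ranked = merge_results("deep web search", [surface, deep])
print(ranked[0])
```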
In this research paper, we present an overview of research issues in web mining. We discuss mining with respect to web data, referred to here as web data mining. In particular, our focus is on web data mining research in the context of our web warehousing project. We have categorized web data mining into three areas: web content mining, web structure mining and web usage mining. We have highlighted and discussed various research issues involved in each of these web data mining categories. We believe that web data mining will be a topic of exploratory research in the near future.
Annotation Approach for Document with Recommendation (ijmpict)
An enormous number of organizations generate and share textual descriptions of their products, facilities, and activities. Such collections of textual data contain a significant amount of structured information, which remains buried in the unstructured text. While information extraction systems facilitate the extraction of structured relations, they are frequently expensive and inaccurate, particularly when working on top of text that does not contain any examples of the targeted structured data. We propose an alternative methodology that facilitates structured metadata generation by recognizing documents that are likely to contain information of interest; this data is then beneficial for querying the database. Moreover, we design algorithms to extract attribute-value pairs, and likewise devise new mechanisms to map such pairs to manually created schemas. We apply a clustering technique to the item content information to complement the user rating information, which improves the correctness of collaborative similarity and alleviates the cold start problem.
Social Targeting: Understanding Social Media Data Mining & Analysis (InfiniGraph)
Chase McMichael – CEO, InfiniGraph
With the advent of the social web, companies that aren’t actively mining, analysing and using social media data are missing a huge commercial advantage. In this session Chase McMichael will explain how social targeting works, including technologies, techniques and opportunities. He will also highlight the privacy challenges facing the industry.
Comparable Analysis of Web Mining Categories (theijes)
Web Data Mining is a current field of analysis that combines two research areas, Data Mining and the World Wide Web. Web Data Mining research draws on several research disciplines such as Databases, Artificial Intelligence and Information Retrieval. The mining techniques are categorized into several categories, namely Web Content Mining, Web Structure Mining and Web Usage Mining. In this work, an analysis of these mining techniques is carried out. From the analysis it is concluded that Web Content Mining deals with an unstructured or semi-structured view of data, whereas Web Structure Mining deals with link structure and Web Usage Mining mainly concerns user interaction.
Avoiding Anonymous Users in Multiple Social Media Networks (SMN) (paperpublications3)
Abstract: The main aim of this project is to secure user login and data sharing on social networks such as Gmail and Facebook, and to detect anonymous users of these networks. If the original user is not active on the network but friends or an anonymous user know the login details, it is possible to misuse the user's chats. In this project we aim to detect an anonymous user who uses the network without the original user's knowledge, for example an unauthorized user who uses the login to chat or to share images or videos. To this end, the user first registers their details along with a secret question and answer. Because an anonymous user can delete chats or data, the secret question lets us recover the unauthorized user's chat history and sharing details together with their IP address or MAC address. In this way the project provides a means to prevent anonymous users from misusing the original user's login details.
A real-time big data sentiment analysis for iraqi tweets using spark streaming (journalBEEI)
The scale of data streaming in social networks, such as Twitter, is increasing exponentially. Twitter is one of the most important and suitable big data sources for machine learning research in terms of analysis, prediction, knowledge extraction, and opinion mining. People use the Twitter platform daily to express their opinions, a fundamental fact that influences their behavior. In recent years, the flow of Iraqi dialect content has increased, especially on the Twitter platform. Sentiment analysis for different dialects and opinion mining has become a hot topic in data science research. In this paper, we develop a real-time analytic model for sentiment analysis and opinion mining of Iraqi tweets using Spark Streaming, and also create a dataset for researchers in this field. The Twitter handle Bassam AlRawi is the case study here. The new method is well suited to current machine learning applications and fast online prediction.
There are numerous ways to analyse web information; generally, web content is housed in large data sets, and basic queries are used to parse them. As demands have grown over time, mining web data has become a challenging task in web analysis. Machine learning methodologies are the most recent entrants into these analysis processes. Different approaches such as decision trees, association rules, metaheuristics and basic learning methods are adopted for assessing web data and mining data from various web instances. This study highlights these approaches from the perspective of web analysis. One of the prime goals of this work is to investigate further data mining approaches alongside machine learning systems, and to describe the emerging collaboration of web analytics with artificial intelligence.
Trend-Based Networking Driven by Big Data Telemetry for Sdn and Traditional N... (josephjonse)
Organizations face a challenge of accurately analyzing network data and providing automated action based on the observed trend. This trend-based analytics is beneficial to minimize downtime and improve the performance of network services, but organizations use different network management tools to understand and visualize the network traffic, with limited abilities to dynamically optimize the network. This research focuses on the development of an intelligent system that leverages big data telemetry analysis in the Platform for Network Data Analytics (PNDA) to enable comprehensive trend-based networking decisions. The results include a graphical user interface (GUI) delivered via a web application for effortless management of all subsystems, and the system and application developed in this research demonstrate the true potential of a scalable system capable of effectively benchmarking the network to set the expected behavior for comparison and trend analysis. Moreover, this research provides a proof of concept of how trend analysis results are actioned in both a traditional network and a software-defined network (SDN) to achieve dynamic, automated load balancing.
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING (dannyijwest)
Social networks have become one of the most popular platforms that allow users to communicate and share their interests without being at the same geographical location. The great and rapid growth of social media sites such as Facebook, LinkedIn and Twitter produces a huge amount of user-generated content. Thus, improving information quality and integrity becomes a great challenge for all social media sites, allowing users to get the desired content or be linked to the best relation using improved search/link techniques. Introducing semantics to social networks therefore widens the representation of the social networks. In this paper, a new model of social networks based on semantic tag ranking is introduced. This model is based on the concept of multi-agent systems. In the proposed model, the representation of social links is extended by the semantic relationships found in the vocabularies known as tags in most social networks. The proposed model for the social media engine is based on enhanced Latent Dirichlet Allocation (E-LDA) as a semantic indexing algorithm, combined with Tag Rank as the social network ranking algorithm. The E-LDA phase is improved by optimizing the LDA algorithm with optimal parameters, after which a filter is introduced to enhance the final indexing output. In the ranking phase, using Tag Rank on top of the indexing phase improves the ranking output. Simulation results of the proposed model show improvements in both indexing and ranking output.
Service Level Comparison for Online Shopping using Data Mining (IIRindia)
Knowledge discovery in databases (KDD) is the analysis step of data mining. The goal of data mining is to extract knowledge and patterns from large data sets, not data extraction itself. Big-data computing is a critical challenge for the ICT industry. Engineers and researchers are dealing with petabyte data sets in the cloud computing paradigm, so the demand for building a service stack to distribute, manage and process massive data sets has risen drastically. We investigate the problem of a single source node broadcasting a big chunk of data to a set of nodes so as to minimize the maximum completion time. These nodes may be located in the same datacenter or across geo-distributed data centers. The big-data broadcasting problem is modeled as a LockStep Broadcast Tree (LSBT) problem. The main idea of LSBT is to define a basic unit of upload bandwidth, r, such that a node with capacity c broadcasts data to a set of ⌊c/r⌋ children at rate r; note that r is a parameter to be optimized as part of the LSBT problem. The broadcast data are further divided into m chunks, which can then be broadcast down the LSBT in a pipelined manner. In a homogeneous network environment in which each node has the same upload capacity c, the optimal uplink rate r of the LSBT is either c/2 or c/3, whichever gives the smaller maximum completion time. For heterogeneous environments, an O(n log² n) algorithm is presented to select an optimal uplink rate r and to construct an optimal LSBT. The numerical results show better performance, with lower computational complexity and low maximum completion time. The methodology includes building and broadcasting various web applications, followed by the gateway application and batch processing over the TSV data, after which web crawling for resources and the MapReduce process take place, and finally products are picked from recommendations and purchased.
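The LSBT trade-off can be made concrete with a small calculation: a higher rate r gives faster links but a fan-out of only ⌊c/r⌋, hence a deeper tree. The timing model below is a deliberate simplification for illustration, with invented parameter values, not the paper's exact analysis.

```python
# Rough sketch of the LockStep Broadcast Tree completion-time trade-off:
# with uplink rate r, each node of capacity c feeds floor(c/r) children.
import math

def lsbt_completion_time(n, c, r, data_size, m):
    """Approximate time to broadcast `data_size` to n nodes in m pipelined
    chunks, with per-child rate r and fan-out floor(c/r)."""
    fanout = c // r
    if fanout < 1:
        raise ValueError("rate r exceeds node capacity c")
    depth = math.ceil(math.log(n, fanout)) if fanout > 1 else n
    chunk_time = (data_size / m) / r
    # Pipelining: the last chunk arrives after (m - 1 + depth) chunk slots.
    return (m - 1 + depth) * chunk_time

# Compare the two candidate rates highlighted in the abstract, c/2 and c/3.
n, c, size, m = 1000, 6.0, 600.0, 10
t_half = lsbt_completion_time(n, c, c / 2, size, m)
t_third = lsbt_completion_time(n, c, c / 3, size, m)
print(t_half, t_third)
```

For these assumed numbers, r = c/3 (faster links, fan-out 2) beats r = c/2 (slower links, fan-out 3), matching the abstract's point that the better of the two depends on the balance between link speed and tree depth.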
The advent of social networks has changed research in computer science. A massive volume of data is now present in the form of Twitter, Facebook, email, and IoT (Internet of Things) data, so the storage and analysis of these data has become a great challenge for researchers. Traditional frameworks have failed at the processing of large data. R is an open-source programming framework developed for the analysis of large data that yields good accuracy, and it offers the opportunity of implementation in the R programming language. In this paper, we present a study on the use of R for the classification of large social network data. The Naïve Bayes algorithm is used for the classification of large Twitter data. The experiment has shown that an enormous amount of data can be efficiently classified using the R framework with promising results.
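The multinomial Naïve Bayes classification the abstract applies to tweets can be sketched compactly; it is shown here in Python rather than R, and the tiny toy corpus and add-one smoothing are our own illustrative choices.

```python
# Minimal multinomial Naive Bayes for short-text (tweet-like) classification.
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (text, label). Returns priors, word counts, vocabulary."""
    label_docs = defaultdict(list)
    for text, label in docs:
        label_docs[label].extend(text.lower().split())
    vocab = {w for words in label_docs.values() for w in words}
    priors = {lab: sum(1 for _, l in docs if l == lab) / len(docs)
              for lab in label_docs}
    counts = {lab: Counter(words) for lab, words in label_docs.items()}
    return priors, counts, vocab

def classify(text, priors, counts, vocab):
    """Pick the label maximizing log P(label) + sum log P(word|label)."""
    best, best_lp = None, -math.inf
    for lab, prior in priors.items():
        total = sum(counts[lab].values())
        lp = math.log(prior)
        for w in text.lower().split():
            lp += math.log((counts[lab][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = lab, lp
    return best

train = [("great match today", "pos"), ("love this team", "pos"),
         ("terrible traffic again", "neg"), ("awful service", "neg")]
model = train_nb(train)
print(classify("love the match", *model))
```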
Analysing Transportation Data with Open Source Big Data Analytic Tools (ijeei-iaes)
Big data analytics allows a vast amount of structured and unstructured data to be processed effectively so that correlations, hidden patterns, and other useful information can be mined from the data. Several open source big data analytic tools that can perform tasks such as dimensionality reduction, feature extraction, transformation, and optimization are now available. One interesting area where such tools can provide effective solutions is transportation. Big data analytics can be used to efficiently manage transport infrastructure assets such as roads, airports, bus stations or ports. In this paper an overview of two open source big data analytic tools is first provided, followed by a simple demonstration of the application of these tools on a transport dataset.
Amazon products reviews classification based on machine learning, deep learni... (TELKOMNIKA JOURNAL)
In recent times, the trend of online shopping through e-commerce stores and websites has grown to a huge extent. Whenever a product is purchased on an e-commerce platform, people leave reviews about the product. These reviews are very helpful for store owners and product manufacturers for improving their work processes as well as product quality. An automated system is proposed in this work that operates on two datasets, D1 and D2, obtained from Amazon. After certain preprocessing steps, N-gram and word embedding-based features are extracted using term frequency-inverse document frequency (TF-IDF), bag of words (BoW), global vectors (GloVe), and Word2vec, respectively. Four machine learning (ML) models, support vector machines (SVM), random forest (RF), logistic regression (LR) and multinomial Naïve Bayes (MNB); two deep learning (DL) models, a convolutional neural network (CNN) and long short-term memory (LSTM); and standalone bidirectional encoder representations from transformers (BERT) are used to classify reviews as either positive or negative. The results obtained by the standard ML and DL models and BERT are evaluated using several performance measures. BERT turns out to be the best-performing model in the case of D1, with an accuracy of 90% on features derived by word embedding models, while the CNN provides the best accuracy of 97% on word embedding features in the case of D2. The proposed model shows better overall performance on D2 as compared to D1.
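The TF-IDF feature extraction that feeds these classifiers can be illustrated in a few lines. This is a minimal sketch under our own assumptions (a toy review corpus and the plain tf·log(N/df) weighting); real pipelines would typically use a library implementation such as scikit-learn's.

```python
# Minimal TF-IDF: term frequency within a document times log inverse
# document frequency across the corpus.
import math

def tfidf(docs):
    """Return one {term: weight} mapping per document."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = {}
    for words in tokenized:
        for t in set(words):
            df[t] = df.get(t, 0) + 1
    vectors = []
    for words in tokenized:
        vec = {}
        for t in set(words):
            tf = words.count(t) / len(words)
            idf = math.log(n / df[t])
            vec[t] = tf * idf
        vectors.append(vec)
    return vectors

# Invented toy reviews: rare, distinctive words get higher weights.
reviews = ["great product really great", "terrible product", "great value"]
vecs = tfidf(reviews)
print(vecs[1])
```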
Design, simulation, and analysis of microstrip patch antenna for wireless app... (TELKOMNIKA JOURNAL)
In this study, a microstrip patch antenna operating at 3.6 GHz was built and its performance evaluated. Rogers RT/Duroid 5880, with a dielectric permittivity of 2.2 and a thickness of 0.3451 mm, serves as the substrate for the examined antenna. The computer simulation technology (CST) studio suite is used to model the recommended antenna design. The goals of this study were a wider bandwidth, a lower voltage standing wave ratio (VSWR), and a lower return loss, but the main goal was higher gain, directivity, and efficiency. After simulation, the return loss, gain, directivity, bandwidth, and efficiency of the presented antenna are found to be -17.626 dB, 9.671 dBi, 9.924 dBi, 0.2 GHz, and 97.45%, respectively. The simulation also revealed a side-lobe level at phi of -28.8 dB, much better than those of earlier works. Thus, the design is a solid contender for wireless technology and more robust communication.
Design and simulation an optimal enhanced PI controller for congestion avoida... (TELKOMNIKA JOURNAL)
In this paper, the snake optimization algorithm (SOA) is used to find the optimal gains of an enhanced controller for controlling the congestion problem in computer networks. An M-file and Simulink platform is adopted to evaluate the response of the active queue management (AQM) system, and a comparison with two classical controllers is made; all tuned controller gains are obtained using the SOA method, and the fitness function chosen to monitor system performance is the integral time absolute error (ITAE). Transient analysis and robustness analysis are used to show the proposed controller's performance. Two robustness tests are applied to the AQM system: one by varying the queue size over different periods, and the other by changing the number of transmission control protocol (TCP) sessions by ±20% from the original value. The simulation results reflect stable and robust behavior, and the best performance clearly appears in achieving the desired queue size without any noise or transmission problems.
Improving the detection of intrusion in vehicular ad-hoc networks with modifi... (TELKOMNIKA JOURNAL)
Vehicular ad-hoc networks (VANETs) are wireless-equipped vehicles that form networks along the road. The security of these networks has been a major challenge. The identity-based cryptosystem (IBC) previously used to secure them suffers from weak membership authentication. This paper focuses on improving the detection of intruders in VANETs with a modified identity-based cryptosystem (MIBC). The MIBC is developed using a non-singular elliptic curve with Lagrange interpolation. The public keys of vehicles and roadside units on the network are derived from number plates and location identification numbers, respectively. Pseudo-identities are used to mask the real identity of users to preserve their privacy. The membership authentication mechanism ensures that only valid and authenticated members are allowed to join the network. The performance of the MIBC is evaluated using intrusion detection ratio (IDR) and computation time (CT) and then validated against the existing IBC. The results show that the MIBC recorded an IDR of 99.3% against 94.3% for the existing identity-based cryptosystem (EIBC) for 140 unregistered vehicles attempting to intrude on the network. The MIBC also shows a lower CT of 1.17 ms against 1.70 ms for the EIBC. The MIBC can thus be used to improve the security of VANETs.
Conceptual model of internet banking adoption with perceived risk and trust f... (TELKOMNIKA JOURNAL)
Understanding the primary factors of internet banking (IB) acceptance is critical for both banks and users; nevertheless, our knowledge of the role of users' perceived risk and trust in IB adoption is limited. As a result, we develop a conceptual model by incorporating perceived risk and trust into the technology acceptance model (TAM) toward IB. Prior research has emphasized that the most essential component in explaining IB adoption behavior is the behavioral intention to use IB. TAM is helpful for understanding how the elements that affect IB adoption are connected to one another. Given previous literature on IB and the use of such technology in Iraq, one has to choose a theoretical foundation that can justify the acceptance of IB from the customer's perspective. The conceptual model was therefore constructed using TAM as a foundation. Furthermore, perceived risk and trust were added to the TAM dimensions as external factors. The key objective of this work was to extend TAM to construct a conceptual model for IB adoption and to gather sufficient theoretical support from the existing literature for the essential elements and their relationships, in order to unearth new insights about the factors responsible for IB adoption.
Efficient combined fuzzy logic and LMS algorithm for smart antenna (TELKOMNIKA JOURNAL)
Smart antennas are broadly used in wireless communication. The least mean square (LMS) algorithm is a procedure concerned with controlling the smart antenna pattern to meet specified requirements, such as steering the beam toward the desired signal and placing deep nulls in the direction of unwanted signals. The conventional LMS (C-LMS) has some drawbacks, such as slow convergence speed and high steady-state fluctuation error. To overcome these shortcomings, the present paper adopts an adaptive fuzzy control step size least mean square (FC-LMS) algorithm to adjust its step size. Computer simulation results illustrate that the given model has a fast convergence rate as well as a low steady-state mean square error.
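The LMS update rule the abstract builds on is w ← w + μ·e·x. The sketch below identifies a known two-tap system with a fixed step size μ; the fuzzy step-size control of FC-LMS is not reproduced here, and the signal, filter length, and μ are assumed values for illustration only.

```python
# Minimal LMS adaptive filter: adapt weights so the filter output tracks a
# desired signal; here the desired signal is a known two-tap FIR of the input.
import random

def lms_identify(xs, ds, n_taps, mu):
    """Run the LMS recursion w <- w + mu * e * x over the sample stream."""
    w = [0.0] * n_taps
    for i in range(n_taps - 1, len(xs)):
        x = xs[i - n_taps + 1:i + 1][::-1]        # most recent sample first
        y = sum(wi * xi for wi, xi in zip(w, x))  # filter output
        e = ds[i] - y                             # instantaneous error
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]
    return w

# Unknown system: d[n] = 0.5*x[n] + 0.25*x[n-1]; LMS should converge near it.
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(2000)]
ds = [0.5 * xs[i] + (0.25 * xs[i - 1] if i else 0.0) for i in range(len(xs))]
w = lms_identify(xs, ds, n_taps=2, mu=0.1)
print([round(wi, 3) for wi in w])
```

A fuzzy-controlled variant would replace the constant μ with a value chosen each step from the error magnitude, trading convergence speed against steady-state fluctuation.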
Design and implementation of a LoRa-based system for warning of forest fire (TELKOMNIKA JOURNAL)
This paper presents the design and implementation of a forest fire monitoring and warning system based on long range (LoRa) technology, a novel ultra-low power consumption and long-range wireless communication technology for remote sensing applications. The proposed system includes a wireless sensor network that records environmental parameters such as temperature, humidity, wind speed, and carbon dioxide (CO2) concentration in the air, as well as taking infrared photos. The data collected at each sensor node will be transmitted to the gateway via LoRa wireless transmission. Data will be collected, processed, and uploaded to a cloud database at the gateway. An Android smartphone application that allows anyone to easily view the recorded data has been developed. When a fire is detected, the system will sound a siren and send a warning message to the responsible personnel, instructing them to take appropriate action. Experiments in Tram Chim Park, Vietnam, have been conducted to verify and evaluate the operation of the system.
Wavelet-based sensing technique in cognitive radio network (TELKOMNIKA JOURNAL)
Cognitive radio is a smart radio that can change its transmitter parameters based on interaction with the environment in which it operates. The demand for frequency spectrum is growing due to big data issues, as many Internet of Things (IoT) devices are on the network. Previous research has found that most of the frequency spectrum is used, but some portions are not; these are called spectrum holes. Energy detection is one of the most frequently used spectrum sensing methods, since it is easy to use and does not require licensed users to have any prior understanding of the signal. However, this technique is incapable of detection at low signal-to-noise ratio (SNR) levels. Therefore, wavelet-based sensing is proposed to overcome this issue and detect spectrum holes. The main objective of this work is to evaluate the performance of wavelet-based sensing and compare it with the energy detection technique. The findings show that the detection percentage of wavelet-based sensing is 83% higher than that of energy detection. This result indicates that wavelet-based sensing has higher detection precision, and interference towards the primary user can be decreased.
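The energy detection baseline the abstract compares against is simple enough to state in code: declare the band occupied when the average energy over a sample window exceeds a threshold. The threshold and sample windows below are invented for illustration; choosing the threshold well is precisely what becomes hard at low SNR.

```python
# Energy-detection spectrum sensing: average signal energy vs. a threshold.
def energy_detect(samples, threshold):
    """Return True if the window's mean energy indicates an occupied band."""
    energy = sum(s * s for s in samples) / len(samples)
    return energy > threshold

noise = [0.1, -0.12, 0.08, -0.05]      # low-power noise-only window
signal = [0.9, -1.1, 1.0, -0.95]       # window containing a transmission
print(energy_detect(noise, 0.2), energy_detect(signal, 0.2))
```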
A novel compact dual-band bandstop filter with enhanced rejection bands (TELKOMNIKA JOURNAL)
In this paper, we present the design of a new wide dual-band bandstop filter (DBBSF) using nonuniform transmission lines. The method used to design this filter is to replace conventional uniform transmission lines with nonuniform lines governed by a truncated Fourier series. Based on how impedances are profiled in the proposed DBBSF structure, the fractional bandwidths of the two 10 dB-down rejection bands are widened to 39.72% and 52.63%, respectively, and the physical size has been reduced compared to that of the filter with the uniform transmission lines. The results of the electromagnetic (EM) simulation support the obtained analytical response and show an improved frequency behavior.
Deep learning approach to DDoS attack with imbalanced data at the application... (TELKOMNIKA JOURNAL)
A distributed denial of service (DDoS) attack is one in which one or more computers attack or target a server by flooding it with internet traffic, so that the server cannot be accessed by legitimate users. Such an attack causes enormous losses for a company because it reduces user trust and damages the company's reputation, losing customers due to downtime. One of the application-layer services accessible to users is the web-based lightweight directory access protocol (LDAP) service, which provides safe and easy access to directory applications. We used a deep learning approach to detect DDoS attacks on the CICDDoS 2019 dataset on a complex computer network at the application layer, in order to obtain fast and accurate results on unbalanced data. Based on the results, DDoS attack detection using a deep learning approach on imbalanced data performs better when implemented with the synthetic minority oversampling technique (SMOTE) for binary classes. On the other hand, the proposed deep learning approach performs better for detecting DDoS attacks in the multiclass setting when implemented with the adaptive synthetic (ADASYN) method.
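The oversampling idea behind SMOTE can be sketched in a toy form: synthesize new minority-class samples by interpolating between existing minority points. Note the simplification: real SMOTE interpolates toward one of a point's k nearest minority neighbours, whereas this sketch picks a random minority partner, and the points below are invented.

```python
# Toy SMOTE-style oversampling: new minority samples lie on segments
# between pairs of existing minority samples.
import random

def smote_like(minority, n_new, rng):
    """Generate n_new synthetic samples by linear interpolation."""
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        b = rng.choice(minority)   # real SMOTE: a k-nearest neighbour of a
        t = rng.random()           # interpolation factor in [0, 1)
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

rng = random.Random(1)
minority = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
new = smote_like(minority, 4, rng)
print(len(new))
```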
The appearance of uncertainties and disturbances often affects the characteristics of both linear and nonlinear systems. Moreover, the stabilization process may deteriorate, incurring a catastrophic effect on system performance. As such, this manuscript addresses the concept of the matching condition for systems suffering from mismatched uncertainties and exogenous disturbances. The perturbation to the system at hand is assumed to be known and unbounded. To reach this outcome, uncertainties and their classifications are reviewed thoroughly. The structural matching condition is proposed and stated in Proposition 1. Two types of mathematical expressions are presented to distinguish a system with matched uncertainty from a system with mismatched uncertainty. Lastly, two-dimensional numerical examples are provided to exercise the proposed proposition. The outcome shows that the matching condition has the ability to transform the system into a design-friendly model for asymptotic stabilization.
Implementation of FinFET technology based low power 4×4 Wallace tree multipli... (TELKOMNIKA JOURNAL)
Many systems, including digital signal processors, finite impulse response (FIR) filters, application-specific integrated circuits, and microprocessors, use multipliers. The demand for low power multipliers is rising steadily in the current technological trend. In this study, we describe a 4×4 Wallace multiplier based on a carry select adder (CSA) that uses less power and has a better power delay product than existing multipliers. The HSPICE tool at 16 nm technology is used to simulate the results. In comparison to the traditional CSA-based multiplier, which has a power consumption of 1.7 µW and a power delay product (PDP) of 57.3 fJ, the results demonstrate that the Wallace multiplier design employing CSA with first zero finding (FZF) logic has the lowest power consumption of 1.4 µW and a PDP of 27.5 fJ.
Evaluation of the weighted-overlap add model with massive MIMO in a 5G system (TELKOMNIKA JOURNAL)
The flaw in 5G orthogonal frequency division multiplexing (OFDM) becomes apparent in high-speed situations. Because the doppler effect causes frequency shifts, the orthogonality of OFDM subcarriers is broken, lowering both their bit error rate (BER) and throughput output. As part of this research, we use a novel design that combines massive multiple input multiple output (MIMO) and weighted overlap and add (WOLA) to improve the performance of 5G systems. To determine which design is superior, throughput and BER are calculated for both the proposed design and OFDM. The results of the improved system show a massive improvement in performance ver the conventional system and significant improvements with massive MIMO, including the best throughput and BER. When compared to conventional systems, the improved system has a throughput that is around 22% higher and the best performance in terms of BER, but it still has around 25% less error than OFDM.
Reflector antenna design in different frequencies using frequency selective s...TELKOMNIKA JOURNAL
In this study, it is aimed to obtain two different asymmetric radiation patterns obtained from antennas in the shape of the cross-section of a parabolic reflector (fan blade type antennas) and antennas with cosecant-square radiation characteristics at two different frequencies from a single antenna. For this purpose, firstly, a fan blade type antenna design will be made, and then the reflective surface of this antenna will be completed to the shape of the reflective surface of the antenna with the cosecant-square radiation characteristic with the frequency selective surface designed to provide the characteristics suitable for the purpose. The frequency selective surface designed and it provides the perfect transmission as possible at 4 GHz operating frequency, while it will act as a band-quenching filter for electromagnetic waves at 5 GHz operating frequency and will be a reflective surface. Thanks to this frequency selective surface to be used as a reflective surface in the antenna, a fan blade type radiation characteristic at 4 GHz operating frequency will be obtained, while a cosecant-square radiation characteristic at 5 GHz operating frequency will be obtained.
Reagentless iron detection in water based on unclad fiber optical sensorTELKOMNIKA JOURNAL
A simple and low-cost fiber based optical sensor for iron detection is demonstrated in this paper. The sensor head consist of an unclad optical fiber with the unclad length of 1 cm and it has a straight structure. Results obtained shows a linear relationship between the output light intensity and iron concentration, illustrating the functionality of this iron optical sensor. Based on the experimental results, the sensitivity and linearity are achieved at 0.0328/ppm and 0.9824 respectively at the wavelength of 690 nm. With the same wavelength, other performance parameters are also studied. Resolution and limit of detection (LOD) are found to be 0.3049 ppm and 0.0755 ppm correspondingly. This iron sensor is advantageous in that it does not require any reagent for detection, enabling it to be simpler and cost-effective in the implementation of the iron sensing.
Impact of CuS counter electrode calcination temperature on quantum dot sensit...TELKOMNIKA JOURNAL
In place of the commercial Pt electrode used in quantum sensitized solar cells, the low-cost CuS cathode is created using electrophoresis. High resolution scanning electron microscopy and X-ray diffraction were used to analyze the structure and morphology of structural cubic samples with diameters ranging from 40 nm to 200 nm. The conversion efficiency of solar cells is significantly impacted by the calcination temperatures of cathodes at 100 °C, 120 °C, 150 °C, and 180 °C under vacuum. The fluorine doped tin oxide (FTO)/CuS cathode electrode reached a maximum efficiency of 3.89% when it was calcined at 120 °C. Compared to other temperature combinations, CuS nanoparticles crystallize at 120 °C, which lowers resistance while increasing electron lifetime.
In place of the commercial Pt electrode used in quantum sensitized solar cells, the low-cost CuS cathode is created using electrophoresis. High resolution scanning electron microscopy and X-ray diffraction were used to analyze the structure and morphology of structural cubic samples with diameters ranging from 40 nm to 200 nm. The conversion efficiency of solar cells is significantly impacted by the calcination temperatures of cathodes at 100 °C, 120 °C, 150 °C, and 180 °C under vacuum. The fluorine doped tin oxide (FTO)/CuS cathode electrode reached a maximum efficiency of 3.89% when it was calcined at 120 °C. Compared to other temperature combinations, CuS nanoparticles crystallize at 120 °C, which lowers resistance while increasing electron lifetime.
A progressive learning for structural tolerance online sequential extreme lea...TELKOMNIKA JOURNAL
This article discusses the progressive learning for structural tolerance online sequential extreme learning machine (PSTOS-ELM). PSTOS-ELM can save robust accuracy while updating the new data and the new class data on the online training situation. The robustness accuracy arises from using the householder block exact QR decomposition recursive least squares (HBQRD-RLS) of the PSTOS-ELM. This method is suitable for applications that have data streaming and often have new class data. Our experiment compares the PSTOS-ELM accuracy and accuracy robustness while data is updating with the batch-extreme learning machine (ELM) and structural tolerance online sequential extreme learning machine (STOS-ELM) that both must retrain the data in a new class data case. The experimental results show that PSTOS-ELM has accuracy and robustness comparable to ELM and STOS-ELM while also can update new class data immediately.
Electroencephalography-based brain-computer interface using neural networksTELKOMNIKA JOURNAL
This study aimed to develop a brain-computer interface that can control an electric wheelchair using electroencephalography (EEG) signals. First, we used the Mind Wave Mobile 2 device to capture raw EEG signals from the surface of the scalp. The signals were transformed into the frequency domain using fast Fourier transform (FFT) and filtered to monitor changes in attention and relaxation. Next, we performed time and frequency domain analyses to identify features for five eye gestures: opened, closed, blink per second, double blink, and lookup. The base state was the opened-eyes gesture, and we compared the features of the remaining four action gestures to the base state to identify potential gestures. We then built a multilayer neural network to classify these features into five signals that control the wheelchair’s movement. Finally, we designed an experimental wheelchair system to test the effectiveness of the proposed approach. The results demonstrate that the EEG classification was highly accurate and computationally efficient. Moreover, the average performance of the brain-controlled wheelchair system was over 75% across different individuals, which suggests the feasibility of this approach.
Adaptive segmentation algorithm based on level set model in medical imagingTELKOMNIKA JOURNAL
For image segmentation, level set models are frequently employed. It offer best solution to overcome the main limitations of deformable parametric models. However, the challenge when applying those models in medical images stills deal with removing blurs in image edges which directly affects the edge indicator function, leads to not adaptively segmenting images and causes a wrong analysis of pathologies wich prevents to conclude a correct diagnosis. To overcome such issues, an effective process is suggested by simultaneously modelling and solving systems’ two-dimensional partial differential equations (PDE). The first PDE equation allows restoration using Euler’s equation similar to an anisotropic smoothing based on a regularized Perona and Malik filter that eliminates noise while preserving edge information in accordance with detected contours in the second equation that segments the image based on the first equation solutions. This approach allows developing a new algorithm which overcome the studied model drawbacks. Results of the proposed method give clear segments that can be applied to any application. Experiments on many medical images in particular blurry images with high information losses, demonstrate that the developed approach produces superior segmentation results in terms of quantity and quality compared to other models already presented in previeous works.
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...TELKOMNIKA JOURNAL
Drug addiction is a complex neurobiological disorder that necessitates comprehensive treatment of both the body and mind. It is categorized as a brain disorder due to its impact on the brain. Various methods such as electroencephalography (EEG), functional magnetic resonance imaging (FMRI), and magnetoencephalography (MEG) can capture brain activities and structures. EEG signals provide valuable insights into neurological disorders, including drug addiction. Accurate classification of drug addiction from EEG signals relies on appropriate features and channel selection. Choosing the right EEG channels is essential to reduce computational costs and mitigate the risk of overfitting associated with using all available channels. To address the challenge of optimal channel selection in addiction detection from EEG signals, this work employs the shuffled frog leaping algorithm (SFLA). SFLA facilitates the selection of appropriate channels, leading to improved accuracy. Wavelet features extracted from the selected input channel signals are then analyzed using various machine learning classifiers to detect addiction. Experimental results indicate that after selecting features from the appropriate channels, classification accuracy significantly increased across all classifiers. Particularly, the multi-layer perceptron (MLP) classifier combined with SFLA demonstrated a remarkable accuracy improvement of 15.78% while reducing time complexity.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfKamal Acharya
The College Bus Management system is completely developed by Visual Basic .NET Version. The application is connect with most secured database language MS SQL Server. The application is develop by using best combination of front-end and back-end languages. The application is totally design like flat user interface. This flat user interface is more attractive user interface in 2017. The application is gives more important to the system functionality. The application is to manage the student’s details, driver’s details, bus details, bus route details, bus fees details and more. The application has only one unit for admin. The admin can manage the entire application. The admin can login into the application by using username and password of the admin. The application is develop for big and small colleges. It is more user friendly for non-computer person. Even they can easily learn how to manage the application within hours. The application is more secure by the admin. The system will give an effective output for the VB.Net and SQL Server given as input to the system. The compiled java program given as input to the system, after scanning the program will generate different reports. The application generates the report for users. The admin can view and download the report of the data. The application deliver the excel format reports. Because, excel formatted reports is very easy to understand the income and expense of the college bus. This application is mainly develop for windows operating system users. In 2017, 73% of people enterprises are using windows operating system. So the application will easily install for all the windows operating system users. The application-developed size is very low. The application consumes very low space in disk. Therefore, the user can allocate very minimum local disk space for this application.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Vaccine management system project report documentation..pdfKamal Acharya
The Division of Vaccine and Immunization is facing increasing difficulty monitoring vaccines and other commodities distribution once they have been distributed from the national stores. With the introduction of new vaccines, more challenges have been anticipated with this additions posing serious threat to the already over strained vaccine supply chain system in Kenya.
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
Courier management system project report.pdfKamal Acharya
It is now-a-days very important for the people to send or receive articles like imported furniture, electronic items, gifts, business goods and the like. People depend vastly on different transport systems which mostly use the manual way of receiving and delivering the articles. There is no way to track the articles till they are received and there is no way to let the customer know what happened in transit, once he booked some articles. In such a situation, we need a system which completely computerizes the cargo activities including time to time tracking of the articles sent. This need is fulfilled by Courier Management System software which is online software for the cargo management people that enables them to receive the goods from a source and send them to a required destination and track their status from time to time.
Forklift Classes Overview by Intella PartsIntella Parts
Discover the different forklift classes and their specific applications. Learn how to choose the right forklift for your needs to ensure safety, efficiency, and compliance in your operations.
For more technical information, visit our website https://intellaparts.com
TELKOMNIKA ISSN: 1693-6930
An adaptive clustering and classification algorithm for Twitter... (Raed A. Hasan)
TELKOMNIKA Vol. 17, No. 6, December 2019: 3086-3099
query tools that are compatible with Hadoop [12]. Recently, analysing massive unstructured data has become a business necessity. Cluster analysis is one of the mining problems used for investigations such as opinion mining, sentiment analysis and popularity analysis [13]. Current systems process Twitter data with tools and technologies that rely on event processing and one-message-at-a-time analysis [14].
One of the most recent studies used several learning frameworks [15], such as K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Random Forest (RF), and Naïve Bayes (NB) [16, 17]. The RF algorithm produced the best recall, precision, and F-measure values, while SVM performed comparably, achieving about 93% accuracy in every group. In all of these earlier studies, classification was used for spam detection on Twitter. An anomaly detection framework was also developed to identify spammers on Twitter using account information and streaming tweets [18, 19].
The main contributions can be stated as follows: 1) the input Twitter data is pre-processed and then effectively clustered using an Improved Fuzzy C-means algorithm, and the clustering is further improved by an Adaptive Particle Swarm Optimization (PSO) algorithm; 2) the pre-processed information is classified using the modified support vector machine (MSVM) classifier with grid search optimization. This article is organized as follows: section 2 reviews the previous studies related to the proposed system, section 3 describes the suggested approach, section 4 discusses the experimental results, and section 5 presents the conclusion.
2. Research Method
Leilei et al. [20] suggested a Hypertext-Induced Topic Search (HITS) based on the Topic-Decision strategy (TD-HITS) and a Latent Dirichlet Allocation (LDA)-based Three-Step model (TS-LDA). The framework was proposed for detecting and identifying influential spreaders in social media data streams. TD-HITS can easily identify the number of themes as different related posts among a huge number of posts, while TS-LDA can identify powerful propagators of a trending event based on client data and the posts. Results on a Twitter dataset showed the efficiency of the suggested methods in recognizing events and in distinguishing powerful event propagators.
Shangsong Liang et al. [21] addressed the problem of clustering users based on the short text streams they publish. To obtain better user clustering performance, they proposed a user-collaborative interest tracking model that follows the changes of each user's dynamic topic distribution, in collaboration with the dynamic topic distributions of their followers, based both on the content of the current short messages and on the previously estimated distributions. They also proposed two collapsed Gibbs sampling schemes for the collaborative inference of the users' dynamic interests, for both the short-term and the long-term clustering dependency models.
Streaming data is one of the hotspots for concept-evolution studies: when a new class occurs in the data stream, it can be considered a new concept, hence concept-evolution. Tahseen et al. [22] addressed this problem by proposing a new collaborative strategy called the "class-based" ensemble, which replaces the conventional "chunk-based" approach for recurring class identification. The study discussed the attributes of the two techniques in order to provide a detailed analysis and clarification, and demonstrated the superiority of the "class-based" ensembles over chunk-based procedures through an empirical evaluation on various benchmark databases comprising web comments as a text mining challenge.
Lekha et al. [23] developed a cloud-based framework on the open-source big data platform Apache Spark that focuses on machine learning for big data streaming. In this framework, a user tweets his or her health traits, and the application retrieves them progressively, extracts the traits, and builds a machine learning model to predict the user's health status, which is then reported back immediately so that suitable action can be taken.
Senthil and Usha [24] worked on categorizing streams of Twitter data based on sentiment analysis using hybridization. The study used a URL-based security tool to collect 600 million public tweets, and feature selection was applied for the sentiment investigation. A ternary classification was performed based on a pre-processing strategy, and the results of the tweets sent by the users were collected. Then, a hybridization approach based on three optimization methods (PSO, GA and DT) was applied to improve the classification accuracy of the sentiment analysis. The results were compared with previous works, and the developed strategy demonstrated better performance than the other classifiers.
3. Proposed Methodology
3.1. Phase 1: Adaptive Clustering for Twitter Data Streams in Apache Spark
The presented technique consists of the following steps. Initially, the input Twitter data is pre-processed using tokenization and stop word removal. The pre-processed data is then effectively clustered using an Improved Fuzzy C-means algorithm with an Adaptive Particle Swarm Optimization (PSO) algorithm. Finally, Twitter data streaming with the proposed method is examined in the Apache Spark engine. The flow diagram of the proposed phase 1 methodology for Twitter data streaming is given in Figure 1.
Figure 1. Flow diagram of phase 1 proposed methodology
3.1.1. Preprocessing
In the proposed Twitter data streaming, the input data is taken from the dataset D = {D_1, D_2, …, D_n} [25-30]; here, D is the Twitter input dataset. The input Twitter data is then preprocessed using tokenization and stop word removal, which remove inconsistent or noisy information from the dataset. Input data preprocessing incorporates the following processes [31].
a. Symbolization
Symbolization is the task of splitting the input information into pieces, called tokens, possibly discarding certain characters, such as punctuation, in the process. Basically, tokenization is the process of separating the given text into units called tokens for further handling; the tokens may be words, numbers and punctuation samples [32-35]. The purpose of symbolization here is to remove punctuation marks such as commas, full stops, hyphens and brackets. The input data after applying tokenization is given in (1):

T̄_i = {D_1, D_2, D_3, …, D_n}   (1)

where T̄_i is the tokenized data and i = 1, 2, 3, …, n.
b. Stop word removal
After tokenizing, the tokenized information (T̄_i) is given as the input for stop word removal, where undesired words are rejected. Stop words are words that are generally considered to carry no useful information. The purpose of this procedure is to remove conjunctions, prepositions, articles and other frequent words, such as adverbs, verbs and adjectives, from the textual information [36]. Some frequently used stop words are "a", "me", "of", "the", "he", "she" and "you". The tokenized information after applying stop word elimination is given in (2):

S_w = {T_1, T_2, T_3, …, T_n}   (2)

where S_w is the preprocessed set of data after eliminating stop words and w = 1, 2, 3, …, n.
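The two preprocessing steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's implementation: the regular expression and the stop-word list (a small subset of the words quoted above) are assumptions for demonstration.

```python
import re

# Illustrative stop-word list; a real list is much larger.
STOP_WORDS = {"a", "me", "of", "the", "he", "she", "you", "is", "and"}

def tokenize(text):
    """Split raw text into lowercase word tokens, discarding
    punctuation such as commas, full stops, hyphens and brackets,
    as in (1)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens generally considered uninformative, as in (2)."""
    return [t for t in tokens if t not in STOP_WORDS]

def preprocess(text):
    """Full preprocessing chain: tokenization then stop word removal."""
    return remove_stop_words(tokenize(text))
```

For example, `preprocess("The weather, he said, is fine!")` yields `["weather", "said", "fine"]`: the punctuation is discarded by tokenization and "the", "he" and "is" are eliminated as stop words.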
3.1.2. Data Aggregation
Clustering is the process of separating the set of objects in the dataset into subsets, or clusters; each subset is a cluster, and the attributes within a cluster are similar to one another. The proposed modified fuzzy c-means clustering algorithm (MFCM) is used for effective clustering, where the performance of MFCM depends on updating the membership functions using a sigmoid function. MFCM performance is further improved by a support value based adaptive PSO algorithm: the preprocessed data is optimized with the support value based adaptive PSO before modified fuzzy c-means clustering [37, 38].
a. Support value based adaptive PSO
The PSO was developed as a heuristic population-based optimization method inspired by the flocking behaviour of birds. The PSO is represented as a collection of particles, each of which is a potential solution [39]. The particles follow a simple behaviour: emulate the success of neighbouring particles and their own past successes. The position of a particle is therefore influenced by the best particle in its neighbourhood, p′best_j, as well as the best solution found so far, g′best. The particle position y_j is adjusted using:

y_j(k′ + 1) = y_j(k′) + v_j(k′ + 1)   (3)

where the velocity component v_j signifies the step size. The velocity is updated via (4):

v_j(k′ + 1) = w′ v_j(k′) + c_1 r_1 {p′best_j − y_j(k′)} + c_2 r_2 {g′best − y_j(k′)}   (4)

where w′ is the inertia weight, c_1 and c_2 are the acceleration coefficients, r_1, r_2 ∈ [0, 1], p′best_j is the individual best position of particle j, and g′best is the best position among all particles.
The location of each particle is then mapped into the solution space and its fitness value is evaluated according to the support value based fitness function; meanwhile, p′best_j and g′best are updated if required. The support value is estimated using (5):

S̄_v = (y_1 · y_2 ⋯ y_n) / (y_1 + y_2 + ⋯ + y_n)   (5)

where S̄_v denotes the support value and y_1, y_2, …, y_n signify the input population. This updating process proceeds until a stopping criterion is met, usually a maximum number of iterations for finding the optimum solution. The pseudo code of the support value based adaptive PSO algorithm is given in Algorithm 1.
Algorithm 1: Support value based adaptive PSO algorithm
Step 1: Initialization
  Set the iteration counter k′ = 0
  Set a population size NP
  Set the initial velocities v_j of the particles
Step 2:
  While the stopping condition is not reached Do
    For j = 1 to NP
      Step 3: Calculate p′best_j and g′best
        Evaluate the fitness of the particles using (5)
      Step 4: Update position and velocity
        Calculate the positions and velocities of the particles using (3) and (4)
    End For
    Step 5: Increase the generation count
      k′ = k′ + 1
  End While
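A minimal, pure-Python sketch of Algorithm 1 follows. It is illustrative rather than the authors' implementation: the fitness function is left pluggable, and `support_value` computes equation (5) so that a support value based fitness can be substituted by the caller; the parameter defaults are common PSO choices, not values from the paper.

```python
import random

def support_value(ys):
    """Support value of a population, equation (5): the product of the
    values divided by their sum."""
    prod, total = 1.0, 0.0
    for y in ys:
        prod *= y
        total += y
    return prod / total

def pso(fitness, dim, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0):
    """Minimal PSO following update equations (3) and (4); `fitness`
    is minimised over `dim`-dimensional positions."""
    pos = [[random.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal bests p'best_j
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda j: pbest_f[j])
    gbest, gbest_f = pbest[g][:], pbest_f[g]     # global best g'best
    for _ in range(iters):
        for j in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # equation (4): velocity update
                vel[j][d] = (w * vel[j][d]
                             + c1 * r1 * (pbest[j][d] - pos[j][d])
                             + c2 * r2 * (gbest[d] - pos[j][d]))
                # equation (3): position update
                pos[j][d] += vel[j][d]
            f = fitness(pos[j])
            if f < pbest_f[j]:
                pbest[j], pbest_f[j] = pos[j][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[j][:], f
    return gbest, gbest_f
```

Minimising the sphere function `sum(x*x for x in p)` with this sketch converges close to the origin within a couple of hundred iterations.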
b. Modified fuzzy C-means (MFCM) clustering
Fuzzy c-means is a clustering method that permits one data point to belong to more than one cluster at a time. The suggested MFCM provides better clustering performance than the conventional FCM clustering methods. In modified fuzzy c-means clustering, let p = {p_1, p_2, p_3, …, p_I} be the set of data points after adaptive particle swarm optimization and q = {q_1, q_2, q_3, …, q_J} be the set of centers. The pseudo code of the modified fuzzy c-means clustering algorithm is given in Algorithm 2.
The MFCM algorithm allots data to every class using fuzzy memberships. The modified objective function for partitioning the input dataset into clusters is defined as:

J_σn = Σ_{i=1}^{I} Σ_{j=1}^{J} (v_ij)^n ‖p_i − q_j‖² / σ_i   (6)

In (6), p_i represents the data, q_j is the j-th cluster center and n is a constant value. The sigmoid function σ_i denotes the weighted mean distance in cluster i, and it is adapted for effective clustering in (6), given by:
σ_i = { Σ_{j=1}^{k} v_ij^n ‖p_i − q_j‖² / Σ_{j=1}^{k} v_ij^n }^{1/2}   (7)
The membership function signifies the likelihood that data points come from the same cluster. In the FCM algorithm, the probability of a data point is based on the distance of the individual particle from the others in the same cluster. The membership functions and cluster center vectors are updated from the velocities and particle positions by (8) and (9).

Algorithm 2: Pseudo code of modified fuzzy c-means clustering
Input: the set of data points p = {p_1, p_2, p_3, …, p_I} after adaptive particle swarm optimization and the set of initialized centers q = {q_1, q_2, q_3, …, q_J}
Output: clustered data
Begin
1. Initialize the centroids q_j, j = 1, …, J
2. Calculate the objective function J_σn by (6)
3. Calculate the fuzzy membership vectors v_ij using (8)
4. Compute the weighted mean distance σ_i using (7)
5. Update the cluster centroids z_j
6. If the algorithm converges then STOP
7. Otherwise return to step 2 until the algorithm converges
8. Return {Cluster}
End
v_ij = 1 / Σ_{l=1}^{J} [ (‖p_i − q_j‖ / σ_i) / (‖p_i − q_l‖ / σ_i) ]^{2/(n−1)}   (8)
The cluster centroid values are computed using (9):

z_j = Σ_{i=1}^{I} v_ij^n · p_i / Σ_{i=1}^{I} v_ij^n   (9)
The algorithm keeps running until the change between two iterations reaches the given sensitivity threshold ψ:

max_{ij} ‖v_ij^(k) − v_ij^(k+1)‖ < ψ   (10)

where ψ is a termination criterion lying between 0 and 1 and k denotes the iteration step. The steps are repeated until efficient clustering is reached.
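The clustering loop of equations (6)-(10) can be sketched as follows. This is a plain fuzzy c-means skeleton under stated simplifications, not the paper's MFCM: the sigma weighting of equation (7) is omitted, and the deterministic centre initialization is an illustrative choice.

```python
def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def fuzzy_c_means(points, k, n=2.0, psi=1e-4, max_iter=100):
    """Plain fuzzy c-means loop: memberships per equation (8)
    (without the sigma weighting), centroids per equation (9),
    stopping rule per equation (10)."""
    I = len(points)
    # deterministic initialization: centres spread across the input
    centers = [list(points[i * (I - 1) // max(k - 1, 1)]) for i in range(k)]
    V = [[1.0 / k] * k for _ in range(I)]
    for _ in range(max_iter):
        # membership update, cf. (8)
        newV = []
        for p in points:
            d = [max(dist2(p, c), 1e-12) for c in centers]
            newV.append([1.0 / sum((d[j] / d[l]) ** (1.0 / (n - 1.0))
                                   for l in range(k)) for j in range(k)])
        # largest membership change, used by the stopping rule (10)
        delta = max(abs(newV[i][j] - V[i][j])
                    for i in range(I) for j in range(k))
        V = newV
        # centroid update, cf. (9)
        for j in range(k):
            den = sum(V[i][j] ** n for i in range(I))
            centers[j] = [sum(V[i][j] ** n * points[i][t]
                              for i in range(I)) / den
                          for t in range(len(points[0]))]
        if delta < psi:
            break
    return centers, V
```

On two well-separated groups of points, the returned centers settle near the two group means, and each membership row sums to one by construction.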
3.2. Phase 2: Effective Classification for Higgs Data Streams in Apache Spark
In the second phase, Higgs data streaming is performed by pre-processing the input information. The pre-processed information is then classified using the modified support vector machine (MSVM) classifier with grid search optimization. Finally, the optimized information is assessed in the Spark engine, and the assessed value is used to derive the resulting confusion matrix. The proposed phase 2 work uses the Higgs dataset for data streaming in Apache Spark. The flow diagram of the phase 2 methodology for the effective classification of Higgs data streams is given in Figure 2.
Figure 2. Flow diagram of phase 2 proposed methodology
3.2.1. Preprocessing
In the proposed Higgs data streaming, the input information utilized for the effective data streaming is taken from the dataset 𝐷′ = {𝐷′1, 𝐷′2, 𝐷′3, … , 𝐷′𝑛}, where 𝐷′ is the Higgs input dataset. The input Higgs information is then preprocessed utilizing
TELKOMNIKA Vol. 17, No. 6, December 2019: 3086-3099. ISSN: 1693-6930
tokenization and stop-word removal, which remove inconsistent or noisy information from the dataset. The input data is first tokenized by the process given in (1), and the tokenized data is then processed by the stop-word removal process given in (2).
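A minimal sketch of the two preprocessing steps, assuming a simple regex tokenizer and an illustrative stop-word list (the paper's operators (1) and (2) define the actual processes):

```python
import re

# illustrative subset of stop words, not the paper's list
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in"}

def tokenize(text):
    """Tokenization step (cf. (1)): lowercase and split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

def remove_stop_words(tokens):
    """Stop-word removal step (cf. (2)): drop noisy/function words."""
    return [t for t in tokens if t not in STOP_WORDS]

tokens = remove_stop_words(tokenize("The Higgs data is streamed to the Spark engine"))
print(tokens)  # -> ['higgs', 'data', 'streamed', 'spark', 'engine']
```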
3.2.2. Data Streaming Classification Using Grid-search-based Modified SVM
The SVM is a binary classification method based on the structural risk minimization principle. The SVM begins by mapping the training data onto a hyperplane which divides the two classes of information in the feature space and maximizes the margin of separation between itself and the points lying closest to it. This decision surface can then be used as a basis for categorizing unknown information [39]. The SVM classification is improved by using grid search optimization, which effectively tunes the SVM parameters for better classification.
In (11), 𝑌 is the input space, 𝑦𝑖 ∈ 𝑌 are the input vectors, 𝑇 = {1, −1} is the target space, 𝑡𝑖 ∈ 𝑇 are the classes, and 𝑆𝑡 = {(𝑦1, 𝑡1), … , (𝑦𝑁, 𝑡𝑁)} is the training set. In the SVM, the maximum-margin hyperplane performs the partitioning of the two classes 𝑇 = {1, −1}, i.e. it is the hyperplane which maximizes the distance to the closest data points and provides the best generalization on new samples. Hence, a new point 𝑦𝑗 can be categorized by evaluating the classification function 𝑔(𝑦𝑗):

g(y_j) = \operatorname{sgn}\left( \sum_{y_i \in sv} w_i t_i K(y_i, y_j) + b \right) \quad (11)
where, 𝑠𝑣 = the support vectors, 𝐾(𝑦𝑖, 𝑦𝑗)= kernel function, 𝑤𝑖 = weights, 𝑁= number of training
samples, 𝑏 = offset parameter. If 𝑔(𝑦𝑗) = +1, 𝑦𝑗 is in the positive class, if 𝑔(𝑦𝑗) = −1, 𝑦𝑗 is in
the negative class. Training the SVM requires solving the following optimization problem, expressed in (12) and (13), to obtain the weight vector 𝑤 and the offset 𝑏.
\min_{w, b, \xi} \; \frac{1}{2} w^{T} w + C' \sum_{i=1}^{N} \xi_i \quad (12)

where (12) is subject to:

t_i (w^{T} \phi(y_i) + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 \quad (13)
The reason for employing the Gaussian SVM, which uses the parameters 𝐶′ and gamma (𝛾′), is to transform the feature vector space into a higher-dimensional space in which the separation can be performed with higher accuracy. The transformation is accomplished using the kernel function 𝐾(𝑦𝑖, 𝑦𝑗) = 𝜙(𝑦𝑖)ᵀ𝜙(𝑦𝑗), which for the Gaussian SVM is defined as K(y_i, y_j) = \exp(-\gamma \lVert y_i - y_j \rVert^{2}), \; \gamma > 0. The choice of proper learning parameters is a significant step in obtaining well-tuned support vector machines. These parameter settings are generally found by a grid search. The pseudo code for the optimization of the SVM parameters using grid search for better classification is given in algorithm 3.
The SVM is initialized with the two main parameters 𝐶′ and gamma (𝛾′), which govern the regularization of the SVM classifier when separating the data with the hyperplane. The parameter 𝐶′ controls the penalty for misclassified data. As the value of 𝐶′ increases, the penalty on errors grows and the number of points permitted within the error margin decreases. A smaller value of 𝐶′ allows a larger error gap around the separating hyperplane. For the Gaussian SVM, the 𝛾′ parameter determines the flexibility of the hyperplane: for small values of 𝛾′, the separating surface is almost linear, and as 𝛾′ increases it becomes progressively more curved, with too large a 𝛾′ leading to over-fitting on the training data. This grid-search-based modified SVM classification provides an effective process for data streaming.
Algorithm 3: Modified SVM with Grid search optimization
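Since Algorithm 3 itself is not reproduced here, the following is a sketch of grid search over the Gaussian-kernel parameters 𝐶′ and 𝛾′ using scikit-learn's `GridSearchCV`; the synthetic data and the grid values are illustrative assumptions, not the paper's Higgs setup:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in binary data; the real features come from the preprocessed dataset D'.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Grid over the Gaussian-kernel parameters C' and gamma' discussed above.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_tr, y_tr)

print(search.best_params_)        # the tuned (C, gamma) pair
print(search.score(X_te, y_te))   # held-out accuracy of the tuned model
```

Cross-validated grid search evaluates every (𝐶′, 𝛾′) pair on held-out folds, which matches the trade-off described above: large 𝐶′ or 𝛾′ values that over-fit the training folds score poorly on the validation folds and are rejected.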
4. Results and Discussion
The implementation of our proposed data streaming using adaptive clustering and classification is performed in Java on Apache Spark. The Twitter dataset and the Higgs dataset are utilized to assess the proposed data streaming. To investigate its performance, the proposed data streaming is compared with the existing artificial bee colony (ABC) optimization and genetic algorithm (GA) techniques in terms of recall, precision, F-measure and convergence.
4.1. Performance Analysis of Proposed Clustering
The statistical metrics of F-score, precision, and recall can be expressed in terms of the TP (true positive), FP (false positive), FN (false negative) and TN (true negative) values. The performance of our proposed work is analysed using the statistical measures described in this section.
4.1.1. Precision
Precision is the fraction of the recognized data that is actually relevant:

\mathrm{precision} = \frac{TP}{TP + FP} \quad (14)
The comparison graph of the proposed data streaming using improved fuzzy c-means clustering with the existing fuzzy C-means clustering (FCM) and K-means clustering in terms of precision is shown in Figure 3. It depicts that the proposed data streaming using improved fuzzy c-means clustering performs better in terms of precision than the existing FCM and K-means clustering.
4.1.2. Recall
Recall is the fraction of the relevant data that is correctly recognized:
\mathrm{recall} = \frac{TP}{TP + FN} \quad (15)
The comparison graph of the proposed data streaming using improved fuzzy c-means clustering with the existing fuzzy C-means clustering (FCM) and K-means clustering in terms of recall is shown in Figure 4. It depicts that the proposed data streaming using improved fuzzy c-means clustering (IFCM) performs better in terms of recall than the existing FCM and K-means clustering.
Figure 3. Comparison graph in terms
of precision
Figure 4. Comparison graph in terms of recall
4.1.3. F-Score
The F-measure reflects the overall accuracy of a test as the harmonic mean of precision and recall. The best F-measure value is 1 and the worst is 0. The F-measure is computed using (16):

F\text{-}\mathrm{measure} = 2 \cdot \frac{\mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \quad (16)
The comparison graph of the proposed data streaming using improved fuzzy c-means clustering with the existing fuzzy C-means clustering (FCM) and K-means clustering in terms of F-score is shown in Figure 5. It depicts that the proposed data streaming using improved fuzzy c-means clustering performs better in terms of F-score than the existing FCM and K-means clustering.
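The three metrics (14)-(16) can be computed directly from confusion counts; the counts used below are hypothetical, not the paper's evaluation results:

```python
def clustering_metrics(tp, fp, fn):
    """Precision, recall and F-measure from confusion counts, per (14)-(16)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# hypothetical counts for illustration
p, r, f = clustering_metrics(tp=93, fp=4, fn=7)
print(round(p, 3), round(r, 3), round(f, 3))
```

Because the F-measure is a harmonic mean, it always lies between precision and recall and is pulled toward the smaller of the two.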
4.1.4. Convergence Graph
The convergence graph of the proposed PSO-based data streaming with the ABC optimization and GA techniques is given in Figure 6. In the proposed PSO system, the convergence of fitness against the number of iterations is better than that of the existing ABC and GA techniques.
4.1.5. Computational Time
The computational time is the amount of time taken to complete the proposed Twitter data streaming. The computational time of data streaming in seconds is obtained from the data stream size in bits and the bit rate in bit/s as:

T_c = S_d / b_r \quad (17)

where 𝑇𝑐 is the computational time, 𝑆𝑑 is the size of the data stream, and 𝑏𝑟 is the bit rate. The performance result of our proposed IFCM with the existing FCM and K-means
clustering in terms of computational time is given in Figure 7. It depicts that the proposed data streaming using improved fuzzy c-means clustering (IFCM) achieves a better computational time than FCM and K-means clustering. The comparison results for the various performance measures using adaptive clustering are depicted in Table 1.
Figure 5. Comparison graph in terms
of F-score
Figure 6. Convergence graph of proposed
PSO utilized clustering with existing ABC
and GA techniques
Figure 7. Comparison graph in terms of computational time
4.2. Average Classification Error Percentage
The comparative assessment of the classification error percentage is given in Table 2. The classification error percentage of the proposed modified support vector machine (MSVM) is significantly lower than that of the existing SVM and Anti-Bayes Multi classification.
Table 1. Comparison of Proposed Clustering

Method         Precision  Recall  F-measure
Proposed IFCM  95.7       93.2    94.43
FCM            77.9       75      76.42
K-means        69.09      67.81   68.44

Table 2. Comparison of Proposed Classification in Terms of Adaptive Clustering

Algorithm                        Classification Error Percentage
SVM                              26.99
Anti-Bayes Multi Classification  15.99
Proposed (MSVM)                  13.13
4.2.1. Receiver Operating Characteristic (ROC) Curve
The ROC curve is a probability plot which expresses the fitness of a model in class recognition. The ROC curve is generated by plotting the true-positive rate against the false-positive rate. The assessment graph of the proposed MSVM with the existing Anti-Bayes Multi Classification and SVM in terms of ROC is displayed in Figure 8. The convergence graph of the proposed grid-search-optimized classification with the existing BAT and Cuckoo search optimization techniques is given in Figure 9. In the proposed system, the convergence of fitness against the number of iterations is better than that of the existing BAT and Cuckoo search optimizations.
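A small sketch of how a ROC curve and its area are obtained from classifier scores, using scikit-learn; the labels and scores below are hypothetical, not the MSVM's outputs:

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# hypothetical true labels and decision-function scores
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])

# sweep the decision threshold: each threshold yields one (FPR, TPR) point
fpr, tpr, thresholds = roc_curve(y_true, scores)
print(auc(fpr, tpr))   # area under the ROC curve
```

The area under the curve summarizes the plot in a single number: 1.0 is a perfect ranking of positives above negatives, 0.5 is chance level.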
Figure 8. Comparison between the ROC curves obtained for (a) the proposed MSVM, (b) the existing anti-Bayes multi classification, and (c) SVM
4.2.2. Simulation Time
The simulation time is the amount of time taken to complete the proposed data streaming. The simulation time of data streaming in seconds is obtained from the data stream size in bits and the bit rate in bit/s as:

T_c = S_d / b_r \quad (18)

where 𝑇𝑐 is the simulation time of classification, 𝑆𝑑 is the size of the data stream, and 𝑏𝑟 is the bit rate. The performance result of our proposed MSVM with the existing SVM and Anti-Bayes Multi Classification in terms of computational time is given in Figure 10, which depicts that the proposed data streaming using MSVM provides better results in terms of computational time than the existing SVM and Anti-Bayes Multi Classification.
Figure 9. Convergence graph of proposed grid
search optimization utilized classification with
existing BAT and Cuckoo search techniques
Figure 10. Comparison graph in terms of
computational time
4.2.3. Accuracy
Accuracy is the percentage of correct outcomes, whether TP or TN, in a given population; it estimates the overall correctness of a data classification process. Accuracy is computed using (19):

\mathrm{Accuracy} = \frac{TN + TP}{TN + TP + FN + FP} \quad (19)
The comparison results for accuracy using the proposed MSVM classification with the existing SVM and Anti-Bayes Multi Classification are depicted in Table 3. The comparison graph of the proposed MSVM classification with the existing SVM and Anti-Bayes Multi Classification in terms of accuracy is shown in Figure 11. It illustrates that the proposed MSVM classification provides better classification results than the existing SVM and Anti-Bayes Multi Classification.
Figure 11. Comparison graph-based classification accuracy
Table 3. Comparison of Proposed Classification based Accuracy

Dataset size  MSVM   SVM    Anti-Bayes Multi Classification
1000          94.44  84.70  83.01795
2000          92.45  86.30  85.2773
3000          93.21  85.99  83.2773
4000          92.90  85.79  82.8422
5000          93.33  86.08  84.8422
5. Conclusion
In this paper we have presented an effective Twitter data streaming approach using an adaptive clustering and classification algorithm. The pre-processed Twitter data is effectively clustered using Improved Fuzzy C-means clustering, which is further improved by an Adaptive Particle Swarm Optimization (PSO) algorithm. Furthermore, the modified support vector machine (MSVM) classifier with grid search optimization effectively performs the classification of the data streams. The experimental outcomes show that our proposed data streaming outperforms the existing ABC- and GA-optimized clustering, as well as the existing SVM and Anti-Bayes Multi classifications, with respect to performance measures such as accuracy, precision, recall, convergence, ROC curve and F-score. These results prove that the proposed clustering technique processes the Twitter data streams more effectively than the existing techniques, and that the proposed classification technique processes the Higgs data streams more effectively than the existing techniques.
References
[1] Danthala MK. Tweet analysis: twitter data processing using Apache Hadoop. International Journal of
Core Engineering & Management (IJCEM). 2015; 1: 94-102.
[2] Elzayady H, Badran KM, Salama GI. Sentiment Analysis on Twitter Data using Apache Spark Framework. 2018 13th International Conference on Computer Engineering and Systems (ICCES). 2018: 171-176.
[3] Tasoulis SK, Vrahatis AG, Georgakopoulos SV, Plagianakos VP. Real Time Sentiment Change
Detection of Twitter Data Streams. arXiv preprint arXiv: 1804.00482. 2018.
[4] Wang Y, Goutte C. Detecting Changes in Twitter Streams using Temporal Clusters of Hashtags. Proceedings of the Events and Stories in the News Workshop. 2017: 10-14.
[5] Panagiotou N, Katakis I, Gunopulos D. Detecting events in online social networks: Definitions, trends
and challenges. In: Michaelis S, Piatkowski N, Stolpe M. Editors. Solving Large Scale Learning Tasks
Challenges and Algorithms. Springer, Cham. 2016: 42-84.
[6] Wu Y, Cao N, Gotz D, Tan YP, Keim DA. A survey on visual analytics of social media data. IEEE Transactions on Multimedia. 2016; 18(11): 2135-2148.
[7] Cao J, Guo J, Li X, Jin Z, Guo H, Li J. Automatic rumor detection on microblogs: A survey. arXiv
preprint arXiv:1807.03505. 2018.
[8] Anantharam P, Thirunarayan K, Sheth A. Topical anomaly detection from twitter stream. Proceedings of the 4th Annual ACM Web Science Conference. 2012: 11-14.
[9] Sakshi SU. Twitter Streaming API Using Apache Spark in Big Data Analytics. International Journal of
Scientific & Engineering Research. 2016; 7(12): 354- 359.
[10] Ali AH, Abdullah MZ. Recent trends in distributed online stream processing platform for big data: Survey. 2018 1st Annual International Conference on Information and Sciences (AiCIS). 2018.
[11] Yadranjiaghdam B, Yasrobi S, Tabrizi N. Developing a real-time data analytics framework for twitter
streaming data. 2017 IEEE International Congress on Big Data (BigData Congress). 2017: 329-336.
[12] Liu SM, Chen JH. A multi-label classification based approach for sentiment classification. Expert
Systems with Applications. 2015; 42(3): 1083-1093.
[13] Ahmed MA, Hasan RA, Ali AH, Mohammed MA. The classification of the modern Arabic poetry using
machine learning. TELKOMNIKA Telecommunication Computing Electronics and Control. 2019; 17(5):
2667-2674.
[14] Miller Z, Dickinson B, Deitrick W, Hu W, Wang AH. Twitter spammer detection using data stream
clustering. Information Sciences. 2014; 260: 64-73.
[15] Alrubaian M, Al-Qurishi M, Hassan MM, Alamri A. A credibility analysis system for assessing information
on twitter. IEEE Transactions on Dependable and Secure Computing. 2018; 15(4): 661-674.
[16] Shi L, Wu Y, Liu L, Sun X, Jiang L. Event detection and identification of influential spreaders in social
media data streams. Big Data Mining and Analytics. 2018; 1(1): 34-46.
[17] Liang S, Yilmaz E, Kanoulas E. Collaboratively tracking interests for user clustering in streams of short
texts. IEEE Transactions on Knowledge and Data Engineering. 2019; 31(2): 257-272.
[18] Al-Khateeb T, Masud MM, Al-Naami KM, Seker SE, Mustafa AM, Khan L, Trabelsi Z, Aggarwal C,
Han J. Recurring and novel class detection using class-based ensemble for evolving data stream.
IEEE Transactions on Knowledge and Data Engineering. 2016; 28(10): 2752-2764.
[19] Nair LR, Shetty SD, Shetty SD. Applying spark based machine learning model on streaming big data
for health status prediction. Computers & Electrical Engineering. 2018; 65: 393-399.
[20] Nagarajan SM, Gandhi UD. Classifying streaming of Twitter data based on sentiment analysis using
hybridization. Neural Computing and Applications. 2018; 31(5): 1-9.
[21] Mohammed NQ, Ahmed MS, Mohammed MA, Hammood OA, Alshara HAN, Kamil AA. Comparative Analysis between Solar and Wind Turbine Energy Sources in IoT Based on Economical and Efficiency Considerations. 2019 22nd International Conference on Control Systems and Computer Science (CSCS). IEEE. 2019: 448-452.
[22] Hasan RA, Mohammed MA, Salih ZH, Bin Ameedeen MA, Ţăpuş N, Mohammed MN. HSO: A Hybrid
Swarm Optimization Algorithm for Reducing Energy Consumption in the Cloudlets. TELKOMNIKA
Telecommunication Computing Electronics and Control. 2018; 16(5): 2144-2154.
[23] Hasan RA, Mohammed MA, Ţăpuş N, Hammood OA. A comprehensive study: Ant Colony
Optimization (ACO) for facility layout problem. 2017 16th RoEduNet Conference: Networking in
Education and Research (RoEduNet). 2017: 1-8.
[24] Hasan RA, Mohammed MN. A krill herd behaviour inspired load balancing of tasks in cloud
computing. Studies in Informatics and Control. 2017; 26(4): 413-424.
[25] Mohammed MA, Salih ZH, Ţăpuş N, Hasan RAK. Security and accountability for sharing the data stored in the cloud. 2016 15th RoEduNet Conference: Networking in Education and Research. 2016: 1-5.
[26] Mohammed MA, Ţăpuş N. A Novel Approach of Reducing Energy Consumption by Utilizing Enthalpy
in Mobile Cloud Computing. Studies in Informatics and Control. 2017; 26: 425-434.
[27] Mohammed MA, Hasan RA. Particle swarm optimization for facility layout problems FLP—A comprehensive study. 2017 13th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP). 2017: 93-99.
[28] Salih ZH, Hasan GT, Mohammed MA. Investigate and analyze the levels of electromagnetic radiations emitted from underground power cables extended in modern cities. 2017 9th International Conference on Electronics, Computers and Artificial Intelligence (ECAI). 2017: 1-4.
[29] Mohammed MA, Hasan RA, Ahmed MA, Tapus N, Shanan MA, Khaleel MK, Ali AH. A Focal load balancer based algorithm for task assignment in cloud environment. 2018 10th International Conference on Electronics, Computers and Artificial Intelligence (ECAI). 2018: 1-4.
[30] Salih ZH, Hasan GT, Mohammed MA, Klib MAS, Ali AH, Ibrahim RA. Study the Effect of Integrating the Solar Energy Source on Stability of Electrical Distribution System. 2019 22nd International Conference on Control Systems and Computer Science (CSCS). 2019.
[31] Ahmed MA, Hasan RA, Ali AH, Mohammed MA. Using machine learning for the classification of
the modern arabic poetry. TELKOMNIKA Telecommunication Computing Electronics and Control.
2019; 17(5): 2667-2674.
[32] Hasan RA, Mohammed MN, Ameedeen MAB, Khalaf ET. Dynamic Load Balancing Model Based on
Server Status (DLBS) for Green Computing. Advanced Science Letters. 2018; 24(10): 7777-7782.
[33] Hammood OA, Nizam N, Nafaa M, Hammood WA. RESP: Relay Suitability-based Routing Protocol for
Video Streaming in Vehicular Ad Hoc Networks. International Journal of Computers, Communications
& Control. 2019; 14(1): 21-38.
[34] Hammood OA, Kahar MNM, Mohammed MN. Enhancement the video quality forwarding Using Receiver-Based Approach (URBA) in Vehicular Ad-Hoc Network. 2017 International Conference on Radar, Antenna, Microwave, Electronics, and Telecommunications (ICRAMET). 2017: 64-67.
[35] Ayoob AA, Su G, Wang D, Mohammed MN, Hammood OA. Hybrid LTE-VANETs based optimal radio
access selection. International Conference of Reliable Information and Communication Technology.
2017: 189-200.
[36] Salh A, Audah L, Shah NSM, Hamzah SA. Pilot reuse sequences for TDD in downlink multi-cells to
improve data rates. TELKOMNIKA Telecommunication Computing Electronics and Control. 2019;
17(5): 2161-2168.
[37] Al-janabi HDK, Al-janabi HDK, Al-Bukamrh RAH. Impact of Light Pulses Generator in Communication System Application by Utilizing Gaussian Optical Pulse. 2019 22nd International Conference on Control Systems and Computer Science (CSCS). IEEE. 2019: 459-464.
[38] Hammood OA, Kahar MNM, Mohammed MN, Hammood WA, Sulaiman J. The VANET-Solution Approach for Data Packet Forwarding Improvement. Advanced Science Letters. 2018; 24(10): 7423-7427.
[39] Mohammed MN, Nahar AK, Abdalla AN, Hammood OA. Peak-to-average power ratio reduction based on optimized phase shift technique. 2017 17th International Symposium on Communications and Information Technologies (ISCIT). IEEE. 2017: 1-6.