This document presents a new algorithm for extracting and summarizing news from online newspapers. The algorithm first extracts news related to the topic using keyword matching, then distinguishes different types of news about the same topic. A term frequency-based summarization method is used to generate summaries: sentences are scored based on term frequency, and the highest-scoring sentences are selected for the summary. The algorithm was evaluated on news datasets from various newspapers and showed good performance on intrinsic evaluation metrics such as precision, recall, and F-score. Thus, the proposed method can effectively extract and summarize online news for a given keyword or topic.
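The sentence-scoring step described above can be sketched as follows. This is a minimal illustration with hypothetical function names, not the authors' implementation: sentences are scored by the average corpus frequency of their terms, and the top-scoring ones are returned in document order.

```python
from collections import Counter
import re

def summarize(text, num_sentences=2):
    """Score sentences by the mean frequency of their terms and
    return the highest-scoring ones in document order."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    words = re.findall(r'[a-z]+', text.lower())
    freq = Counter(words)
    scored = []
    for i, s in enumerate(sentences):
        terms = re.findall(r'[a-z]+', s.lower())
        score = sum(freq[t] for t in terms) / max(len(terms), 1)
        scored.append((score, i, s))
    top = sorted(scored, reverse=True)[:num_sentences]
    # restore original document order before joining
    return ' '.join(s for _, _, s in sorted(top, key=lambda x: x[1]))
```

Normalizing by sentence length keeps long sentences from dominating purely by containing more words.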
HIGH-LEVEL SEMANTICS OF IMAGES IN WEB DOCUMENTS USING WEIGHTED TAGS AND STREN... (IJCSEA Journal)
Multimedia information retrieval from the World Wide Web is a challenging issue. Describing multimedia objects in general, and images in particular, with low-level features widens the semantic gap. Textual keywords present in an HTML document can be extracted to capture semantic information with a view to narrowing this gap. The high-level textual information about images can be extracted and associated with these keywords, which narrows the search space and improves retrieval precision. In this paper, a strength matrix is proposed, based on the frequency of occurrence of keywords and the textual information pertaining to image URLs. The strength of these textual keywords is estimated and used to associate them with the images present in the documents. The high-level semantics of an image, described in the HTML document in the form of the image name, ALT tag, optional description, etc., is used for estimating this strength. In addition, word position and a weighting mechanism are used to further improve the association of textual keywords with image-related text. The retrieval effectiveness of the proposed technique is found to be better than that of many recently proposed retrieval techniques. The experimental results endorse the conclusion that image retrieval using image information together with textual keywords outperforms both text-based and content-based approaches.
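The strength-matrix idea, as described, amounts to weighting each keyword by the tag it occurs in and by its position. A minimal sketch under assumed tag weights (the paper's actual weights and positional formula are not given here, so all numbers below are illustrative):

```python
# Assumed per-tag weights: image filename terms count most, ALT text next,
# free-text description least. These values are hypothetical.
TAG_WEIGHTS = {"name": 3.0, "alt": 2.0, "description": 1.0}

def strength_matrix(images):
    """images: list of dicts mapping tag name -> list of terms.
    Returns, per image, a dict of keyword -> accumulated strength."""
    matrix = []
    for img in images:
        strengths = {}
        for tag, weight in TAG_WEIGHTS.items():
            for pos, term in enumerate(img.get(tag, [])):
                # earlier words in a tag get a higher positional weight
                strengths[term] = strengths.get(term, 0.0) + weight / (pos + 1)
        matrix.append(strengths)
    return matrix
```

Keywords whose accumulated strength exceeds a threshold would then be associated with the image for retrieval.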
A BOOTSTRAPPING METHOD FOR AUTOMATIC CONSTRUCTING OF THE WEB ONTOLOGY INSTANC... (IJwest)
With the phenomenal growth of Web resources, constructing ontologies from existing structured resources on the Web has attracted increasing attention. Previous studies on constructing ontologies from the Web have not carefully considered all the semantic features of Web documents, making it difficult to correctly construct ontology elements from Web documents that are increasing daily. Machine learning methods play an important role in the automatic construction of Web ontologies. Bootstrapping is a semi-supervised learning technique that can automatically generate many terms from a few seed terms entered by a human. This paper proposes a bootstrapping method that can automatically construct instances and datatype properties of a Web ontology, taking proper nouns as the semantic core elements of Web tables. Experimental results show that the proposed method can rapidly and effectively construct instances and their properties for the Web ontology.
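The bootstrapping loop described above can be sketched roughly as follows. The harvesting rule and parameters are illustrative assumptions, not the paper's exact procedure: starting from a few seed terms, table rows that mention a known instance are mined for co-occurring candidates, and the most frequent candidates are promoted to new seeds on each round.

```python
from collections import Counter

def bootstrap(seeds, table_rows, rounds=2, top_k=2):
    """Expand a small seed set of instance terms using Web-table rows.
    table_rows: list of lists of cell strings."""
    known = set(seeds)
    for _ in range(rounds):
        candidates = Counter()
        for row in table_rows:
            if known & set(row):           # row mentions a known instance
                for cell in row:
                    if cell not in known:
                        candidates[cell] += 1
        # promote the most frequent co-occurring terms to new seeds
        for term, _ in candidates.most_common(top_k):
            known.add(term)
    return known
```

A real system would add filters (e.g. proper-noun checks) so that noisy cells are not promoted.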
An Extensible Web Mining Framework for Real Knowledge (IJEACS)
With the emergence of Web 2.0 applications that bestow a rich user experience and convenience without time and geographical restrictions, web usage logs have become a goldmine for researchers across the globe. User behavior analysis in different domains based on web logs helps enterprises with strategic decision making. Business growth depends on customer-centric approaches, which need knowledge of customer behavior to succeed; the rationale is that customers have alternatives and competition is intense. The business community therefore needs business intelligence for expert decisions, besides focusing on customer relationship management. Many researchers have contributed towards this end. However, there remains a need for a comprehensive framework that caters to the needs of businesses to ascertain the real needs of web users. This paper presents a framework named the eXtensible Web Usage Mining Framework (XWUMF) for discovering actionable knowledge from web log data. The framework employs a hybrid approach that exploits fuzzy clustering methods and methods for user behavior analysis. Moreover, the framework is extensible, as it can accommodate new algorithms for fuzzy clustering and user behavior analysis. We propose an algorithm known as Sequential Web Usage Miner (SWUM) for efficient mining of web usage patterns from different data sets. We built a prototype application to validate our framework, and our empirical results revealed that the framework helps in discovering actionable knowledge.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Text mining has become one of the trending fields that has been incorporated into several research areas, for example computational linguistics, Information Retrieval (IR), and data mining. Natural Language Processing (NLP) methods are used to extract knowledge from text written by people. Text mining parses unstructured data to deliver meaningful information patterns in the shortest possible time. Social networking sites are a major channel of communication, as most people nowadays use them in their daily lives to stay connected with each other. It has become common practice not to write sentences with correct punctuation and spelling. This practice can lead to various types of ambiguity, such as lexical, syntactic, and semantic ambiguity, and because of such unclear data it is hard to discover the actual information pattern. Accordingly, we conduct a survey aimed at examining the various text mining techniques applied to textual queries on social media sites. This review aims to describe how studies of social media have used text analysis and text mining methods to identify the key topics in the data. The study concentrates on text mining work related to Facebook and Twitter, the two dominant social networks in the world. The results of this survey can serve as baselines for future text mining research.
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Abstract: In today's competitive world, paperless work is gaining the utmost importance, and the role of Web-based systems in making this happen is unmatched. Sectors like banking and retail have fully moved to Web-based systems, and the education sector is not far behind. Almost all universities and institutions provide their own web portals for notification of news related to seminars, workshops, examinations, and results. In this article we consider the web portal of BPUT, Odisha, with URL http://results.bput.ac.in, and more specifically the way results are published and displayed. In this web portal, in some cases the results are displayed in an unorganized manner over multiple pages. To this unorganized data we apply the concepts of Web Content Mining and provide a Web Content Mining tool for organized access to and viewing of the above web contents. Keywords: Web Based System, Web Content, Web Content Mining, Web Content Mining Tool, Organized view of unorganized Web Content
Multi-objective NSGA-II based community detection using dynamical evolution s... (IJECEIAES)
Community detection is becoming a highly demanded topic in social networking-based applications. It involves finding the maximally intra-connected and minimally inter-connected sub-graphs in given social networks. Many approaches have been developed for community detection, but few of them have focused on the dynamical aspect of the social network. The community decision has to consider the pattern of changes in the social network and be smooth enough, in order to enable smooth operation for applications that depend on community detection. Unlike existing dynamical community detection algorithms, this article presents a non-domination-aware searching algorithm designated non-dominated sorting based community detection with dynamical awareness (NDS-CD-DA). The algorithm uses the non-dominated sorting genetic algorithm NSGA-II with two objectives: modularity and normalized mutual information (NMI). Experimental results on synthetic networks and real-world social network datasets have been compared with a classical single-objective genetic algorithm, and the proposed method has been shown to be superior in terms of both domination and convergence. NDS-CD-DA accomplished a domination percentage of 100% over dynamic evolutionary community searching (DECS) for almost all iterations.
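The core of the NSGA-II machinery referenced above is non-dominated sorting. A minimal sketch of the domination test and first-Pareto-front extraction for two maximized objectives (e.g. modularity and NMI); this is a generic illustration of the operator, not the paper's full algorithm:

```python
def dominates(a, b):
    """a dominates b if it is no worse on every objective
    and strictly better on at least one (both maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def first_pareto_front(solutions):
    """Return the solutions not dominated by any other solution.
    solutions: list of (objective1, objective2) tuples."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t != s)]
```

NSGA-II repeats this peeling to rank the whole population into successive fronts, then breaks ties within a front by crowding distance.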
These slides are for a presentation at the 2009 IEEE/WIC/ACM International Conference on Web Intelligence. The major emphasis of the paper is on how to provide more personalized search support for a specific user, considering his/her historical or recent interests. Models inspired by cognitive memory retention are proposed and implemented in this system. Other supporting functionalities, such as domain distribution support, are briefly mentioned. The whole paper can be downloaded from http://www.iwici.org/~yizeng/papers/WI2009-camera-ready.pdf
Cluster Based Web Search Using Support Vector Machine (CSCJournals)
Nowadays, searches for the web pages of a person with a given name constitute a notable fraction of queries to Web search engines. This method exploits a variety of semantic information extracted from web pages. The rapid growth of the Internet has made the Web a popular place for collecting information; today, Internet users access billions of web pages online using search engines. Information on the Web comes from many sources, including the websites of companies, organizations, and communities, personal homepages, etc. Effective representation of Web search results remains an open problem in the Information Retrieval community. For ambiguous queries, a traditional approach is to organize search results into groups (clusters), one for each meaning of the query. These groups are usually constructed according to the topical similarity of the retrieved documents, but it is possible for documents to be totally dissimilar and still correspond to the same meaning of the query. To overcome this problem, we exploit the fact that relevant Web pages are often located close to each other in the Web graph of hyperlinks. The paper presents a graphical approach to entity resolution and complements the traditional methodology with an analysis of the entity-relationship (ER) graph constructed for the dataset being analyzed. It also demonstrates a technique that measures the degree of interconnectedness between various pairs of nodes in the graph, which can significantly improve the quality of entity resolution. Support vector machines (SVMs), a set of related supervised learning methods, are used to classify the load of user queries arriving at the server machine and distribute it to different client machines so that the system remains stable; web pages are clustered based on the machines' capacities, while the whole database is stored on the server machine. Keywords: SVM, cluster, ER.
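The abstract names SVMs for classifying query load but gives no implementation details. As a tiny stand-in sketch, a linear SVM can be trained with a Pegasos-style stochastic sub-gradient method on hinge loss and used to route query feature vectors to one of two client machines; the data, features, and routing rule below are all hypothetical:

```python
import random

def train_linear_svm(data, labels, lam=0.01, epochs=200, seed=0):
    """Pegasos-style training: labels are +1/-1, data is a list of
    feature vectors. Returns the learned weight vector."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):
            t += 1
            eta = 1.0 / (lam * t)                 # decaying step size
            x, y = data[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1:                          # hinge-loss sub-gradient
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w

def route(w, x):
    """Positive decision value -> client A, negative -> client B."""
    return "A" if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else "B"
```

A production system would use a tested SVM library rather than this hand-rolled loop.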
This tutorial, offered at the 10th International Conference on Web Engineering, presents the peculiarities of advanced Web search applications, describes some tools and techniques that can be exploited, and offers a methodological approach to development. The approach proposed in this tutorial is based on the paradigm of Model Driven Development (MDD), where models are the core artifacts of the application life-cycle and model transformations progressively refine models to achieve an executable version of the system. To cope with the process-intensive nature of the main interactions (e.g., content analysis, query management), we describe the use of Process Models (e.g., BPMN models). Indeed, search-based applications are considered process- and content-intensive applications, due to the trends towards the exploratory-search and search-as-a-process visions.
Re-Mining Association Mining Results Through Visualization, Data Envelopment ... (ertekg)
Download link > https://ertekprojects.com/gurdal-ertek-publications/blog/re-mining-association-mining-results-through-visualization-data-envelopment-analysis-and-decision-trees/
Re-mining is a general framework that suggests the execution of additional data mining steps based on the results of an original data mining process. This study investigates the multi-faceted re-mining of association mining results, develops and presents a practical methodology, and shows the applicability of the developed methodology on real-world data. The methodology suggests re-mining using data visualization, data envelopment analysis, and decision trees. Six hypotheses regarding how re-mining can be carried out on association mining results are examined in the case study through empirical analysis.
Survey of Machine Learning Techniques in Textual Document Classification (IOSR Journals)
Text document classification aims at assigning one or more predefined categories to a document based on the likelihood expressed by a training set of labeled documents. Many machine learning algorithms play an important role in training the system with predefined categories. Because of the importance of the machine learning approach, this study of text document classification has been undertaken based on the available statistical event models. The aim of this paper is to present the important techniques and methodologies employed for text document classification, while at the same time raising awareness of some of the interesting challenges that remain to be solved, focusing mainly on text representation and machine learning techniques.
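As an illustration of the statistical event models such a survey covers, a multinomial naive Bayes classifier with Laplace smoothing can be sketched as follows; this is a generic example, not code from the paper:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """docs: list of token lists. Returns per-class document counts,
    per-class word counts, and the vocabulary."""
    class_docs = defaultdict(int)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def classify(tokens, class_docs, word_counts, vocab):
    """Pick the class maximizing log P(class) + sum log P(token|class),
    with add-one (Laplace) smoothing over the vocabulary."""
    total = sum(class_docs.values())
    best, best_lp = None, float("-inf")
    for c, n in class_docs.items():
        lp = math.log(n / total)
        denom = sum(word_counts[c].values()) + len(vocab)
        for t in tokens:
            lp += math.log((word_counts[c][t] + 1) / denom)
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```

The multinomial event model treats each document as a bag of token draws, which is the standard baseline the survey's "statistical event models" phrase refers to.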
A survey on ontology based web personalization (eSAT Journals)
Abstract: Over the last decade the data on the World Wide Web has been growing exponentially. According to Google, the data is accelerating at a speed of a billion pages per day [24], and the Internet has around 2 million users accessing the World Wide Web for various information [25]. These numbers raise a severe concern over information-overload challenges for users. Many researchers have been working to overcome this challenge with web personalization, and many are looking at ontology-based web personalization as an answer to information overload, since each individual is unique. In this paper we present an overview of ontology-based web personalization, its challenges, and a survey of the work. The paper also points to future work in web personalization. Index Terms: Web Personalization, Ontology, User modeling, web usage mining.
MULTI-DOCUMENT SUMMARIZATION SYSTEM: USING FUZZY LOGIC AND GENETIC ALGORITHM (IAEME Publication)
In recent times, the requirement for generating multi-document summaries has gained a lot of attention among researchers. Mostly, text summarization techniques use sentence extraction, where the salient sentences in the multiple documents are extracted and presented as a summary. In our proposed system, we have developed a sentence-extraction-based automatic multi-document summarization system that employs fuzzy logic and a Genetic Algorithm (GA). At first, different features are used to identify the significance of sentences, such that each sentence in the documents is assigned a feature score.
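The per-sentence feature scoring described above might look like the following sketch. The feature set and weights are illustrative assumptions, and the fuzzy inference and GA stages are omitted; a crisp weighted sum stands in for the defuzzified output:

```python
def sentence_features(sentence, position, doc_len, title_words):
    """Return per-feature membership scores in [0, 1] for one sentence.
    Feature names here are assumed, not taken from the paper."""
    words = sentence.lower().split()
    return {
        "title_overlap": len(set(words) & title_words) / max(len(title_words), 1),
        "position": 1.0 - position / max(doc_len - 1, 1),   # earlier = higher
        "length": min(len(words) / 20.0, 1.0),              # capped at 1
    }

def crisp_score(features, weights=None):
    """Stand-in for the fuzzy inference stage: a weighted sum of the
    feature memberships (weights are hypothetical)."""
    if weights is None:
        weights = {"title_overlap": 0.5, "position": 0.3, "length": 0.2}
    return sum(weights[k] * v for k, v in features.items())
```

In the actual system, a GA would tune how the features are combined instead of using fixed weights.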
Query Sensitive Comparative Summarization of Search Results Using Concept Bas... (CSEIJJournal)
Query-sensitive summarization aims at providing users with a summary of the contents of single or multiple web pages based on the search query. This paper proposes a novel idea of generating a comparative summary from a set of URLs in the search result. The user selects a set of web page links from the search result produced by a search engine, and a comparative summary of these selected web sites is generated. The method makes use of the HTML DOM tree structure of these web pages: HTML documents are segmented into sets of concept blocks, and a sentence score for each concept block is computed with respect to the query and feature keywords. The important sentences from the concept blocks of different web pages are extracted to compose the comparative summary on the fly. This system reduces the time and effort required for the user to browse various web sites to compare information, and the comparative summary of the contents helps users in quick decision making.
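The scoring and extraction steps described above can be sketched as follows. The overlap-based score and the query-vs-feature weighting are illustrative assumptions rather than the paper's exact formulas:

```python
def score_sentence(sentence, query_terms, feature_terms):
    """Overlap score: query-term matches count double (assumed weighting)."""
    words = set(sentence.lower().split())
    return 2 * len(words & query_terms) + len(words & feature_terms)

def comparative_summary(pages, query_terms, feature_terms, top_n=2):
    """pages: {url: list of concept blocks, each a list of sentences}.
    Returns the top-scoring sentences per page for side-by-side comparison."""
    summary = {}
    for url, blocks in pages.items():
        sentences = [s for block in blocks for s in block]
        ranked = sorted(
            sentences,
            key=lambda s: score_sentence(s, query_terms, feature_terms),
            reverse=True,
        )
        summary[url] = ranked[:top_n]
    return summary
```

Selecting the top sentences per URL, rather than globally, is what makes the output comparative across the selected pages.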
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Abstract In today’s competitive world paper less work is gaining utmost importance. For this to happen role of Web Based Systems are incomparable. Different sectors like banking, retail are fully ported towards Web Based Systems, whereas the education sector is also not far behind to them. All most all universities or institutions are providing their own web portal for notification of news related to seminar/workshop/examination/result. In this article we have considered web portal of BPUT, Odisha with url http://results.bput.ac.in. More specifically we put our interest on the way results are being published or displayed. In this web portal for some cases the results are being displayed in an unorganized manner over multiple pages. On this unorganized data we are applying the concepts of Web Content Mining and providing a Web Content Mining tool for an organized access/view to the above said web contents. Keywords: Web Based System, Web Content, Web Content Mining, Web Content Mining Tool, Organized view of unorganized Web Content
Multi-objective NSGA-II based community detection using dynamical evolution s...IJECEIAES
Community detection is becoming a highly demanded topic in social networking-based applications. It involves finding the maximum intraconnected and minimum inter-connected sub-graphs in given social networks. Many approaches have been developed for community’s detection and less of them have focused on the dynamical aspect of the social network. The decision of the community has to consider the pattern of changes in the social network and to be smooth enough. This is to enable smooth operation for other community detection dependent application. Unlike dynamical community detection Algorithms, this article presents a non-dominated aware searching Algorithm designated as non-dominated sorting based community detection with dynamical awareness (NDS-CD-DA). The Algorithm uses a non-dominated sorting genetic algorithm NSGA-II with two objectives: modularity and normalized mutual information (NMI). Experimental results on synthetic networks and real-world social network datasets have been compared with classical genetic with a single objective and has been shown to provide superiority in terms of the domination as well as the convergence. NDS-CD-DA has accomplished a domination percentage of 100% over dynamic evolutionary community searching DECS for almost all iterations.
This slides are for a presentation at the 2009 IEEE/WIC/ACM International Conference on Web Intelligence. The major emphasis to this paper is concentrating on how to provide more personalized search support for a specific user considering his/her historical interests or recent interests. Cognitive memory retention like models are proposed and implemented in this system. Other supporting functionalities, such as domain distribution support, etc. are briefly mentioned. The whole paper can be downloaded from http://www.iwici.org/~yizeng/papers/WI2009-camera-ready.pdf
Cluster Based Web Search Using Support Vector MachineCSCJournals
Now days, searches for the web pages of a person with a given name constitute a notable fraction of queries to Web search engines. This method exploits a variety of semantic information extracted from web pages. The rapid growth of the Internet has made the Web a popular place for collecting information. Today, Internet user access billions of web pages online using search engines. Information in the Web comes from many sources, including websites of companies, organizations, communications and personal homepages, etc. Effective representation of Web search results remains an open problem in the Information Retrieval community. For ambiguous queries, a traditional approach is to organize search results into groups (clusters), one for each meaning of the query. These groups are usually constructed according to the topical similarity of the retrieved documents, but it is possible for documents to be totally dissimilar and still correspond to the same meaning of the query. To overcome this problem, the relevant Web pages are often located close to each other in the Web graph of hyperlinks. It presents a graphical approach for entity resolution & complements the traditional methodology with the analysis of the entity-relationship (ER) graph constructed for the dataset being analyzed. It also demonstrates a technique that measures the degree of interconnectedness between various pairs of nodes in the graph. It can significantly improve the quality of entity resolution. Using Support vector machines (SVMs) which are a set of related Supervised learning methods used for classification of load of user queries to the sever machine to different client machines so that system will be stable. clusters web pages based on their capacities stores whole database on server machine. Keywords: SVM, cluster; ER.
This tutorial, offered at the 10th International Conference on Web Engineering, presents the peculiarities of advanced Web search applications, describes some tools and techniques that can be exploited, and offers a methodological approach to development. The approach proposed in this tutorial is based on the paradigm of Model Driven Development (MDD), where models are the core artifacts of the application life-cycle and model transformations progressively refine models to achieve an executable version of the system. To cope with the process-intensive nature of the main interactions (i.e., content analysis, query management, etc.), we describe the use of Process Models (e.g., BPMN models). Indeed, search-based applications are considered process- and content-intensive applications, owing to the trend towards exploratory search and "search as a process" visions.
Re-Mining Association Mining Results Through Visualization, Data Envelopment ...ertekg
Download link > https://ertekprojects.com/gurdal-ertek-publications/blog/re-mining-association-mining-results-through-visualization-data-envelopment-analysis-and-decision-trees/
Re-mining is a general framework which suggests the execution of additional data mining steps based on the results of an original data mining process. This study investigates the multi-faceted re-mining of association mining results, develops and presents a practical methodology, and shows the applicability of the developed methodology through real world data. The methodology suggests re-mining using data visualization, data envelopment analysis, and decision trees. Six hypotheses, regarding how re-mining can be carried out on association mining results, are answered in the case study through empirical analysis.
Survey of Machine Learning Techniques in Textual Document ClassificationIOSR Journals
Classification of text documents means associating one or more predefined categories with a document based on the likelihood expressed by a training set of labeled documents. Many machine learning algorithms play an important role in training the system with predefined categories. The importance of the machine learning approach motivated this study of text document classification based on the available statistical event models. The aim of this paper is to present the important techniques and methodologies that are employed for text document classification, while at the same time raising awareness of some of the interesting challenges that remain to be solved, focused mainly on text representation and machine learning techniques.
A survey on ontology based web personalizationeSAT Journals
Abstract: Over the last decade, the data on the World Wide Web has been growing exponentially. According to Google, the data is growing at a speed of a billion pages per day [24]. The Internet has around 2 million users accessing the World Wide Web for various information [25]. These numbers raise a severe concern over information overload challenges for users. Many researchers have been working to overcome this challenge with web personalization, and many are looking at ontology-based web personalization as an answer to information overload, since each individual is unique. In this paper we present an overview of ontology-based web personalization, its challenges, and a survey of the work. The paper also points to future work in web personalization. Index Terms: Web Personalization, Ontology, User modeling, web usage mining.
MULTI-DOCUMENT SUMMARIZATION SYSTEM: USING FUZZY LOGIC AND GENETIC ALGORITHM IAEME Publication
In recent times, the generation of multi-document summaries has gained a lot of attention among researchers. Mostly, text summarization techniques use sentence extraction, where the salient sentences in multiple documents are extracted and presented as a summary. In our proposed system, we have developed a sentence-extraction-based automatic multi-document summarization system that employs fuzzy logic and a Genetic Algorithm (GA). At first, different features are used to identify the significance of sentences, in such a way that each sentence in the documents is assigned a feature score.
Query Sensitive Comparative Summarization of Search Results Using Concept Bas...CSEIJJournal
Query sensitive summarization aims at providing the users with the summary of the contents of single or
multiple web pages based on the search query. This paper proposes a novel idea of generating a
comparative summary from a set of URLs from the search result. User selects a set of web page links from
the search result produced by search engine. Comparative summary of these selected web sites is
generated. This method makes use of HTML DOM tree structure of these web pages. HTML documents are
segmented into set of concept blocks. Sentence score of each concept block is computed with respect to the
query and feature keywords. The important sentences from the concept blocks of different web pages are
extracted to compose the comparative summary on the fly. This system reduces the time and effort required
for the user to browse various web sites to compare the information. The comparative summary of the
contents would help the users in quick decision making.
Design of optimal search engine using text summarization through artificial i...TELKOMNIKA JOURNAL
Natural language processing is a trending research area that allows developers to bring human-computer interactions into existence. Natural language processing is an integration of artificial intelligence, computer science, and computational linguistics. Research in natural language processing is focused on creating devices or machines that operate based on a single command from a human; it allows various bot creations that carry instructions from mobile devices to control physical devices through speech tagging. In our paper, we design a search engine which not only displays data according to the user query but also gives a detailed display of the content or topic the user is interested in, using the summarization concept. We find that the designed search engine has optimal response time for user queries by analyzing it with a number of transactions as inputs. The performance analysis also shows that the text summarization method is an efficient way to improve response time in search engine optimization.
Applying Clustering Techniques for Efficient Text Mining in Twitter Dataijbuiiir1
Knowledge is the ultimate output of decisions on a dataset. The revolution of the Internet has brought the world closer, within a touch on handheld electronic devices. Usage of social media sites has increased in the past decades. One of the most popular social media microblogs is Twitter, which has millions of users around the world. In this paper, the analysis of Twitter data is performed through the text contained in hashtags. After preprocessing, clustering algorithms are applied to the text data. The different clusters formed are compared through various parameters. Visualization techniques are used to portray the results, from which inferences like time series and topic flow can easily be made. The observed results show that the hierarchical clustering algorithm performs better than the other algorithms.
A Multimodal Approach to Incremental User Profile Building dannyijwest
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...mlaij
In this new era, where tremendous information is available on the internet, it is of utmost importance to provide improved mechanisms to extract information quickly and efficiently. It is very difficult for human beings to manually extract summaries of large text documents. There is therefore a problem of searching for relevant documents among the many documents available and absorbing relevant information from them. In order to solve these two problems, automatic text summarization is very much necessary. Text summarization is the process of identifying the most important meaningful information in a document or set of related documents and compressing it into a shorter version while preserving its overall meaning. More specifically, Abstractive Text Summarization (ATS) is the task of constructing summary sentences by merging facts from different source sentences and condensing them into a shorter representation while preserving information content and overall meaning. This paper introduces a newly proposed technique for summarizing abstractive newspaper articles based on deep learning.
Research on Document Indexing in the Search Engines. The main theme of Informational retrieval is to send the exact response of a user for specific Query.
Information search and retrieval is a large process; to realize it, we need to develop an effective application using techniques like document indexing, page ranking, and clustering. Among these, the document index plays a vital role in searching, since instead of scanning hundreds of thousands of documents, the engine can go directly to the relevant index entry and produce the output. Our focus here is indexing: the purpose of storing an index is to optimize speed and performance in finding the documents corresponding to the user's search query.
My conclusion is that the context-based index approach is useful in query retrieval, mainly from the source document. Instead of searching every page on the server, finding via the index is technically better; this saves time and reduces the burden on the server.
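The role of an index described above can be illustrated with a minimal inverted-index sketch in Python (an illustration only, not the system described in the report; the document ids and texts are made up):

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {1: "news summary extraction", 2: "news indexing", 3: "image retrieval"}
idx = build_index(docs)
print(sorted(search(idx, "news")))  # documents containing "news"
```

The lookup goes straight to the index entries for the query terms instead of scanning every document, which is exactly the speed argument made above.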
Data mining is knowledge discovery in databases; the goal is to extract patterns and knowledge from large amounts of data. An important area of data mining is text mining, which extracts high-quality information from text. Statistical pattern learning is used to obtain this high-quality information. "High quality" in text mining refers to a combination of relevance, novelty, and interestingness. Tasks in text mining include text categorization, text clustering, entity extraction, and sentiment analysis. Applications of natural language processing and analytical methods are highly preferred to turn
International Journal of Research in Advent Technology, Vol.7, No.11, November 2019
E-ISSN: 2321-9637
Available online at www.ijrat.org
doi: 10.32622/ijrat.710201947
Abstract— Online newspapers play an important role in the development of the world, but they consist of several types of labels, titles, and links. As online news sites are collections of a variety of newspapers, it is often much more difficult to extract and summarize the news. To improve accuracy, a new algorithm is introduced here based on web extraction and summarization. Firstly, the news related to the topic is extracted from the newspapers. If different types of news are found about the same topic, they are distinguished. Then a summarization-based algorithm is proposed to summarize the news. Basically, term frequency is used for summarization, and the method is evaluated on the contents of several newspapers. Various forms of words, such as nouns, adjectives, and adverbs, are also compared so that the term frequency can be counted more accurately. This will be very helpful for a user who wants to find very specific news from the newspapers.
Keywords— Extraction, online news, precision, sentence
scoring, summary, term frequency.
I. INTRODUCTION
Information retrieval is the term that specifies extraction of
relevant information from various documents. Information
retrieval can be done in different ways. Web data extraction
is one of them. Data contained in websites (newspaper) is
increasing exponentially. But much of this information
cannot be used by other applications. As most of the web data
will be in XML format, it will solve the problem in future.
But now this is not the case and information in web have to
be retrieved efficiently. So, their emerged a new source of
information retrieval technique which is extraction of web
data. It is a process through which data can be extracted from
web without loss of information. Web data is in semi-
structured format. To extract data from web, it is necessary to
analyze each word and tag found in the particular website.
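Analyzing the words and tags of a page, as described above, might be sketched with Python's standard html.parser (a minimal illustration; the sample HTML is invented):

```python
from html.parser import HTMLParser

class TagTextExtractor(HTMLParser):
    """Collect tag names and the words inside them from semi-structured HTML."""
    def __init__(self):
        super().__init__()
        self.tags = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)  # record the tag structure of the page

    def handle_data(self, data):
        self.words.extend(data.split())  # record the words for later analysis

html = "<html><body><h1>Cricket news</h1><p>Test match report</p></body></html>"
parser = TagTextExtractor()
parser.feed(html)
print(parser.tags)   # tag structure of the page
print(parser.words)  # words available for extraction
```

A real extractor would also use the tag context (e.g. headline vs. body text) to decide which words matter, but the word-and-tag walk itself is the step the text refers to.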
The present usability of online news largely depends on news summarization. Applications of web document summarization range from snippet generation by search engines to tailoring the content of web documents to specific displays and to accessibility purposes (e.g., for blind people). Automatic summarization is typically applied to plain-text documents.
Manuscript revised November 23, 2019 and published on December 03, 2019.
Senjuthi Bhattacharjee, Lecturer, Dept. of Computer Science & Engineering, Premier University, Chattogram, Bangladesh.
Asma Joshita Trisha, Lecturer, Dept. of Computer Science & Engineering, Premier University, Chattogram, Bangladesh.
An HTML document contains many elements, such as pictures, which cannot be summarized, and it is difficult to distinguish the relevant information among many news items. In recent years, many applications have been introduced which work particularly with the content of HTML documents. Here the context of the document is used, where information is retrieved from all the documents linking to it.
Online news summarization is a technique that searches newspapers for a specific query and returns a compact summary representing the main content of a given newspaper. The main purpose here is to generate summaries that are as good as those written by a person.
The textual snippet is the most widespread form of search-based summarization (Zhanying He et al., 2013). When a user submits a query, the web search engine provides references to a sequence of top-k documents, each with a title, a snippet, and a URL. When there is little time for browsing a site, the web summary helps the user get an idea of the content of the page. This approach extracts the most significant sentences from a web page and generates a summary for the user. The web includes different kinds of information, such as text, images, video, and audio, so it is necessary to extract relevant results. A good web page summary must be a clear and simple guide to what is on the page.
There are two types of summary: an abstract and an extract. When the summary consists of remarkable text units selected from the input, it is called an extract summary (N. Moratanch and S. Chitrakala, 2017). An abstract is a brief summary of a definite subject, generated by computing the noticeable units selected from an input; text units which are not present in the input text can also be included in an abstract summary (N. Moratanch and S. Chitrakala, 2016).
II. RELATED WORKS
A well-known method is the centroid-based method (Xindong Wu et al., 2011), in which the TF-IDF feature is used for calculating the sentence score. For each single feature, the score is calculated and then combined for the whole sentence. To extract and summarize online newspapers for a single phrase, it is required to categorize the news first.
Sabrina Akter, Student, Dept. of Computer Science & Engineering, Chittagong University of Engineering & Technology, Chattogram, Bangladesh.
An Effective Approach for Online News Extraction and Summarization for a Single Phase
Senjuthi Bhattacharjee, Asma Joshita Trisha, and Sabrina Akter
Then summarization of the particular portion is done. There is an approach based on a Conditional Random Fields (CRF) framework that treats the summarization task as a sequence labeling problem (Dou Shen et al., 2007). The sentences
which have the highest scores are extracted in extraction-based summarization (Xiaojun Wan et al., 2007). There are some approaches which mainly combine several sentence features (Minqing Hu and Bing Liu, 2006). Nowadays there are various extraction-based approaches for web classification/categorization (Ioannis Antonellis et al., 2006) and summarization (Furu Wei et al., 2008). Sentence redundancy is a big obstacle for summary sentences; to remove redundancy between summary sentences, the MMR algorithm (Mohammad Al Hasan, 2009) is another popular approach. Frequent Pattern Mining (FPM) algorithms (Mohammad Al Hasan, 2009) are also used to compute complex features, such as sets, sequences, trees, graphs, etc. But a large output set size causes a lack of interpretability, which is why the potential of this approach is low compared to other algorithms.
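The MMR idea mentioned above can be sketched as follows. This is a simplified illustration, not the cited implementation: Jaccard word overlap stands in for the similarity measure, and the trade-off parameter lam is an assumed default.

```python
def similarity(a, b):
    """Jaccard word-overlap similarity between two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def mmr_select(sentences, query, k=2, lam=0.7):
    """Greedily pick sentences that are relevant to the query
    yet non-redundant with respect to those already selected."""
    selected, candidates = [], list(sentences)
    while candidates and len(selected) < k:
        def mmr(s):
            # relevance minus the worst-case redundancy against the summary so far
            redundancy = max((similarity(s, t) for t in selected), default=0.0)
            return lam * similarity(s, query) - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected

news = ["the team won the test match",
        "the team won the test match today",
        "rain delayed play"]
print(mmr_select(news, "test match", k=2))
```

Note how the near-duplicate second sentence is passed over in favor of a less relevant but novel one; that penalty on redundancy is exactly what MMR adds over plain relevance ranking.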
An online newspaper generally contains a variety of information centered around a main title. To get summarized news for a single phrase, section-based categorization (Giuseppe Attardi et al., 1999) is more workable than other approaches. To filter the desired news from the various news items, the K-nearest neighbor algorithm can be used, and to get the summarized news, pattern mining or term frequency can be used.
III. PROPOSED METHOD
Online newspapers contain various types of news and show the details of each item. Nowadays readers do not have time to read all the news; they want to save their time. In this project, the user only inputs a keyword, and the news related to that keyword is found by extraction. Users can also get compact news that covers all the newspapers, and they can look up previous news.
A. Architecture of proposed method
The methodology or architecture of the proposed method
is discussed below:
Fig. 1. Architecture of proposed method
B. Step by step description of proposed method
This section gives an analytical description of the system
architecture given in previous parts.
B.1. Initialization and Connection
In the initialization and connection module, at first the web pages of each website are stored in separate files. Then each of these pages is connected using its URL. A table is created in the news database for each website, with the fields news no, name, date, headline, and description.
B.2. News Extraction
The most important part of this method is news extraction
(Y. Sankarasubramaniam et al., 2014). First, the input
newspapers are taken and the keywords are given as input.
The keywords are matched against the database contents, and
the matching news is extracted, so every news item is
separated by topic. For a single domain or phrase, different
news items can be gathered: first the news of the same domain
is collected, and then it is divided into different parts. For
cricket, for example, many news items are found, such as
T-20, One Day, and Test matches. The desired news for a
particular phrase can also be found here. A divide-and-conquer
approach with similar text matching is followed for the
extraction of news.
• Divide and Conquer
In computer science, divide and conquer (D&C) is a method
in which the whole problem is divided into several
sub-segments, whose solutions are then combined to obtain
the solution of the original problem.
• Similar text matching
In this method, the query string uses a frequency parameter
that divides its terms into a low-frequency group and a
high-frequency group. The low-frequency group contains the
more important terms, which carry the bulk of the query,
while the high-frequency group contains the less important
terms and is used only for scoring, not for matching.
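The grouping described above can be sketched as a small Python routine; the function and variable names (`split_query`, `match_and_score`, `doc_freq`, `threshold`) are illustrative assumptions, not part of the original system:

```python
def split_query(query_terms, doc_freq, threshold):
    """Split query terms into a low-frequency group (used for matching)
    and a high-frequency group (used only for scoring)."""
    low = [t for t in query_terms if doc_freq.get(t, 0) <= threshold]
    high = [t for t in query_terms if doc_freq.get(t, 0) > threshold]
    return low, high


def match_and_score(doc_tokens, low, high):
    """A document matches only if it contains every low-frequency term;
    high-frequency terms contribute to the score but are not required."""
    tokens = set(doc_tokens)
    if not all(t in tokens for t in low):
        return None  # the important terms are missing: no match
    return len(low) + sum(1 for t in high if t in tokens)
```

For example, in a corpus where "the" appears in almost every document, a query such as "the BGMEA meeting" would keep "bgmea" and "meeting" as mandatory matching terms and use "the" only for scoring.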
B.3. News Summarization
The most important part of this method is summarization
(J. Goldstein et al., 1999). Here, the extracted news is
summarized with respect to the input phrase. First, at least
two extracted news items related to the phrase are taken from
several newspapers. Every sentence of these items is then
checked and compared; a sentence that appears in both items
is taken only once, so no sentence is repeated. The news is
then summarized, and the process checks whether any
extracted news remains to be summarized. If "Yes", the new
item and the summary of the previous news are summarized
together, and the process continues. If "No", the desired
output summary has been obtained. Summarization is done
using term frequency, for which some conditions are applied
to the method.
International Journal of Research in Advent Technology, Vol.7, No.11, November 2019
E-ISSN: 2321-9637
Available online at www.ijrat.org
doi: 10.32622/ijrat.710201947
• Term Frequency
Term frequency is a numerical measure of the importance of
a word to a document in a collection or corpus (Xindong Wu
et al., 2011). It is mainly used in information retrieval and
text mining. The more times a word appears in a document,
the higher its term frequency becomes. It mainly helps to
identify the commonly occurring words.
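As a minimal illustration (not the authors' implementation), term frequency can be computed with a word counter in Python:

```python
import re
from collections import Counter


def term_frequency(text):
    """Count how many times each word appears in a document."""
    words = re.findall(r"\w+", text.lower())
    return Counter(words)


tf = term_frequency("BGMEA leaders met today. BGMEA will meet again.")
# tf["bgmea"] == 2
```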
• Process of summarization using Term Frequency
Steps of summarization:
Step 1: Take the input Bangla documents as text files.
Step 2: Tokenize the sentences of the input documents, and
remove punctuation characters, single words, and digits from
the original Bangla text.
Step 3: Replace each word with a common synonym for
counting keyword frequency.
Step 4: Sort the total term frequencies (𝑇𝑇𝐹) in descending
order.
Step 5: Compute the score 𝑆𝐶𝑘𝑗 of the kth sentence of the jth
document by summing the 𝑇𝑇𝐹𝑖 of the 𝑚 words in that
sentence, weighted by rank:

𝑆𝐶𝑘𝑗 = ∑ᵢ₌₁ᵐ (𝑇 − 𝑛 + 1) ∗ 𝑇𝑇𝐹𝑖

where 𝑇 is the number of terms in the sorted 𝑇𝑇𝐹 list and 𝑛
is the rank of word 𝑖 in that list.
Step 6: Sort all sentence scores in decreasing order.
Step 7: Take only the high-scoring sentences, which represent
the most important sentences in the given documents.
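The Step 5 scoring can be sketched in Python, assuming (as the surrounding text suggests) that 𝑇 is the number of ranked terms and 𝑛 is a word's 1-based rank in the descending 𝑇𝑇𝐹 list; the name `sentence_score` is illustrative:

```python
from collections import Counter


def sentence_score(sentence_words, ttf):
    """Score a sentence as sum over its words of (T - n + 1) * TTF_i,
    where n is the word's 1-based rank in the descending TTF list
    and T is the total number of ranked terms."""
    ranked = [w for w, _ in ttf.most_common()]  # descending TTF order
    T = len(ranked)
    score = 0
    for w in sentence_words:
        if w in ttf:
            n = ranked.index(w) + 1        # rank of this word
            score += (T - n + 1) * ttf[w]  # higher-ranked words weigh more
    return score
```

With this weighting, a word at the top of the sorted 𝑇𝑇𝐹 list contributes 𝑇 times its frequency, while the last-ranked word contributes its frequency only once.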
IV. RESULT
The main goal of this system is to develop an automatic
news extractor and summarizer (Vishal Gupta and Gurpreet
Singh, 2013). This chapter explains the complete
implementation process and also contains a brief description
of the experimental tools.
A. Tool used for Development
The tools used to develop this method are:
✓ Windows 7 Operating System
✓ Xampp
B. User Interface
The interface enables the user to enter the home page, which
has three sections. The first section shows all the news
contained in the database; another section is search, and the
last section is summary.
Fig. 2. Home page
Here, Fig. 3 shows that if the user clicks "all news", they see
all the news stored in the database for a particular date.
Fig. 3. Output of all news
Fig. 4 shows that if the user searches for a keyword, the
corresponding news is returned if it is available in the
database; otherwise, the system shows "not found".
Fig. 4. Output of search news
Fig. 5 shows that if the user wants to summarize the news for
a particular topic, or for every topic, they can do so using this
option.
Fig. 5. Options of summary
Here, Fig. 6 shows the desired summary, which gives a brief
account of the related news.
Fig. 6. Output of summary
C. Experiment Setup
The system retrieves news from "The Daily Sun",
"Bangladesh Independence", and "Prothom Alo" (English
version) in November. This section contains some
experimental results obtained during the experiment. In the
following example, a user wants to know about BGMEA, so
the system extracts the news related to BGMEA.
Fig. 7. How to search a keyword
This is the extraction part of the experiment. If the user wants
the summary, they can obtain it as well.
News:
Fig. 8. Input news
D. Term Frequency & Total Term Frequency Count
The most frequent words in the text are the keywords. The
term frequency counts how many times a word appears in
the text. Each document is then concatenated into a cluster
to obtain the total term frequency, which is calculated by
summing the term frequency over every document. Sentences
with more keywords score higher than those with fewer
keywords. To distinguish the importance of keywords, the
frequencies of keywords positioned higher in the sorted
total-term-frequency list are multiplied by larger weights.
Table 1 shows the occurrence counts of the keywords.
TABLE I. TERM FREQUENCY OF WORDS
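A minimal sketch of this cluster-level count, assuming the per-document term frequencies are already available as counters (the name `total_term_frequency` is illustrative):

```python
from collections import Counter


def total_term_frequency(doc_tfs):
    """Merge per-document term frequencies into one cluster-level
    count by summing the frequency of each word over all documents."""
    ttf = Counter()
    for tf in doc_tfs:
        ttf.update(tf)  # Counter.update adds counts rather than replacing
    return ttf
```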
E. Sentence Score Generation
Scoring is used to decide on the significance of each line in
the documents. Here at most ten sentences are collected for
the initial summarized content. The sentence score relies on
the word score, which is Total Term Frequency. Final
sentence score is the summation of Total Term Frequency.
• Score of Sentence 1:
32+15+45+80+1+8+28+4+18+45= 276
• Score of Sentence 2:
32+45+36+80+80= 273
• Score of Sentence 3:
8 + 28 + 15 = 51
• Score of Sentence 4:
28 + 80 = 108
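The unweighted scoring and top-sentence selection described in this section can be sketched as follows; `summarize` and `top_n` are illustrative names, and the ten-sentence cap mirrors the text above:

```python
def summarize(sentences, ttf, top_n=10):
    """Score each sentence by summing the total term frequency of its
    words, then keep the highest-scoring sentences in original order."""
    scored = [(sum(ttf.get(w, 0) for w in s.lower().split()), i, s)
              for i, s in enumerate(sentences)]
    best = sorted(scored, reverse=True)[:top_n]       # highest scores first
    return [s for _, i, s in sorted(best, key=lambda x: x[1])]
```

Restoring the original sentence order at the end keeps the summary readable even though selection is done by score.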
Summary:
Fig. 9. Obtained summary
In this summary, it can be observed that the most important
sentences obtained the highest scores. The table is given
below.
TABLE II. SCORE OF SENTENCES IN SUMMARY
F. Performance Comparison of the System
To evaluate the system, 7 news sets from different
newspapers were gathered. Summarization evaluation
methods can be divided into two categories: intrinsic and
extrinsic (Inderjeet Mani and Mark T. Maybury, 1999).
✓ Intrinsic evaluation measures the quality of the summaries
directly (e.g., by comparing them to ideal summaries).
✓ Extrinsic evaluation measures how well the summaries
help in performing a particular task.
The system was evaluated by computing the intrinsic
measures: precision, recall, F-score, and document similarity.
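Assuming a sentence-level comparison between the system summary and an ideal (reference) summary, the intrinsic measures can be computed as in this illustrative sketch:

```python
def intrinsic_scores(system_sents, ideal_sents):
    """Precision, recall, and F-score of a system summary against an
    ideal summary, treating each sentence as a retrieval unit."""
    sys_set, ideal_set = set(system_sents), set(ideal_sents)
    overlap = len(sys_set & ideal_set)
    precision = overlap / len(sys_set) if sys_set else 0.0
    recall = overlap / len(ideal_set) if ideal_set else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score
```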
TABLE III. INTRINSIC PERFORMANCE ANALYSIS
Fig. 10. Intrinsic Performance Analysis Graph
V. CONCLUSION
In this paper, a method has been proposed to extract and
summarize online (English) newspapers using basic
statistical and data mining approaches. The challenges taken
up here are saving time and resolving relevancy, and the
extractive summarization is done more easily and concisely.
This work will narrow down the search space for researchers
and thereby save time by providing summaries of various
news. Moreover, as the methodology followed in this
approach is generic, it can in future be extended to
newspapers in other languages. In this report, each online
newspaper has been considered as an isolated document.
VI. REFERENCES
[1] Zhanying He, Chun Chen, Jiajun Bu, Can Wang and Lijun Zhang,
"Document summarization based on data reconstruction", Zhejiang
Provincial Key Laboratory of Service Robot, College of Computer
Science, 2013.
[2] N.Moratanch and S.Chitrakala, “A Survey on Extractive Text
Summarization”, IEEE International Conference on Computer,
Communication, and Signal Processing (ICCCSP-2017).
[3] N. Moratanch and S. Chitrakala, “A survey on abstractive text sum-
marization,” International Conference on Circuit, Power and
Computing Technologies (ICCPCT) 2016, International Conference
on. IEEE, 2016, pp. 1-7.
[4] Xindong Wu, Fei Xie, Gongqing Wu, Wei Ding. “Personalized News
Filtering and Summarization on the Web”, IEEE 23rd International
Conference on Tools with Artificial Intelligence, 2011.
[5] Dou Shen, Jian-Tao Sun, Hua Li, Qiang Yang, and Zheng Chen.
“Document summarization using conditional random fields”, In
Proceedings of IJCAI-07.
[6] Xiaojun Wan, Jianwu Yang, and Jianguo Xiao, “Manifold-Ranking
Based Topic Focused Multi-Document Summarization”, IJCAI 7
(2007), 2903–2908, 2007.
[7] Minqing Hu and Bing Liu, “Opinion Extraction and Summarization on
the Web”, Department of Computer Science,University of Illinois at
Chicago,851 South Morgan Street, Chicago, IL 60607-7053, 2006.
[8] Ioannis Antonellis, Christos Bouras, and Vassilis Poulopoulos
“Personalized News Categorization Through Scalable Text
Classification” Research Academic Computer Technology 36 Institute
N. Kazantzaki, University Campus, bGR-26500 Patras, Greece,
Computer Engineering and Informatics Department, University of
Patras, GR-26500 Patras, Greece, 2006.
[9] Furu Wei, Wenjie Li, Qin Lu and Yanxiang He, "Query-sensitive
mutual reinforcement chain and its application in query-oriented
multi-document summarization", In Proceedings of SIGIR-08.
[10] Mohammad Al Hasan, “Summarization in Pattern Mining”,
Encyclopedia of Data Warehousing and Mining, Second Edition,
pp.1877-1883, 2009.
[11] Giuseppe Attardi, Antonio Gullì, Fabrizio Sebastiani “Automatic Web
Page Categorization by Link and Context Analysis”. Dipartimento di
Informatica, Università di Pisa, Pisa, Italy, 1999.
[12] Y. Sankarasubramaniam, K. Ramanathan, and S. Ghosh, "Text sum-
marization using wikipedia," Information Processing & Management,
vol. 50, no. 3, pp. 443-461, 2014.
[13] J. Goldstein, M. Kantrowitz, V. Mittal and J. Carbonell, "Summarizing
Text Documents: Sentence Selection and Evaluation Metrics",
Proceedings of ACM SIGIR-99.
[14] Vishal Gupta and Gurpreet Singh Lehal, “Automatic Text
Summarization System for Punjabi Language”, Journal of Emerging
Technologies in Web Intelligence 5, 3(2013), 257–271, 2013.
[15] Inderjeet Mani and Mark T. Maybury, “Advances in Automatic Text
Summarization”, 1999.
AUTHORS PROFILE
Senjuthi Bhattacharjee, B.Sc. in Computer Science &
Engineering, Chittagong University of Engineering &
Technology, Chattogram, Bangladesh. Lecturer, Dept. of
Computer Science & Engineering, Premier University,
Chattogram, Bangladesh (from January 2016 to present).
Asma Joshita Trisha, B.Sc. in Computer Science &
Engineering, University of Chittagong, Chattogram,
Bangladesh. M.Sc. in Computer Science & Engineering,
University of Chittagong, Chattogram, Bangladesh.
Lecturer, Dept. of Computer Science & Engineering,
Premier University, Chattogram, Bangladesh (from January
2016 to present).
Sabrina Akter, B.Sc. in Computer Science &
Engineering, Chittagong University of Engineering &
Technology, Chattogram, Bangladesh.