International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nanotechnology & Science, Power Electronics, Electronics & Communication Engineering, Computational Mathematics, Image Processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design, etc.
The International Journal of Engineering & Science aims to provide a platform for researchers, engineers, scientists, and educators to publish their original research results, exchange new ideas, and disseminate information on innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed; only original articles will be published.
The papers for publication in The International Journal of Engineering & Science are selected through rigorous peer review to ensure originality, timeliness, relevance, and readability.
Integrated Web Recommendation Model with Improved Weighted Association Rule M... (ijdkp)
The World Wide Web plays a significant role in human life, and continual technological improvement is required to satisfy user needs. Web log data is essential for improving the performance of the web, but it is large, heterogeneous and diverse, which makes analyzing it a tedious process for web developers, web designers, technologists and end users. In this work, a new weighted association mining algorithm is developed to identify the best association rules for website restructuring and recommendation, reducing false visits and improving users' navigation behavior. The algorithm finds frequent itemsets in a large uncertain database. Existing algorithms repeatedly scan the database, which leads to complex output sets and a time-consuming process. The proposed algorithm scans the database only once, at the beginning of the process, and stores the generated frequent itemsets in the database. Evaluation parameters such as support, confidence, lift and number of rules are used to compare the performance of the proposed algorithm against a traditional association mining algorithm. The new algorithm produced the best results, helping developers restructure their websites to meet end-user requirements within a short time span.
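The paper's algorithm itself is not reproduced in this abstract; purely as a hypothetical illustration of the evaluation parameters it names, the sketch below computes support, confidence and lift for a candidate rule over a toy set of web sessions (all page names and data here are invented):

```python
# Hypothetical sketch: computing support, confidence and lift
# for an association rule A -> B over a toy set of web sessions.

def rule_metrics(sessions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(sessions)
    a = sum(1 for s in sessions if antecedent <= s)                  # sessions containing A
    b = sum(1 for s in sessions if consequent <= s)                  # sessions containing B
    ab = sum(1 for s in sessions if (antecedent | consequent) <= s)  # containing both
    support = ab / n
    confidence = ab / a if a else 0.0
    lift = confidence / (b / n) if b else 0.0
    return support, confidence, lift

sessions = [
    {"home", "products", "cart"},
    {"home", "products"},
    {"home", "about"},
    {"products", "cart"},
]
s, c, l = rule_metrics(sessions, {"products"}, {"cart"})
```

A rule with lift above 1 (as here) indicates the consequent page is visited more often together with the antecedent than chance would predict, which is what makes it a candidate for restructuring or recommendation.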
WSO-LINK: Algorithm to Eliminate Web Structure Outliers in Web Pages (IOSR Journals)
Abstract: Web mining is a specialized field of data mining that applies data mining methods and techniques to extract useful patterns from web data available in web server logs and databases. Web content mining, one classification of web mining, extracts information from web documents containing text, links, video and other multimedia data available in World Wide Web databases. Web structure mining, in turn, extracts patterns and meaningful information from the structure of the hyperlinks contained in web documents within the same domain. Hyperlinks that are unrelated to the content, or that are invalid, are called web structure outliers. The basic aim of this paper is to find these web structure outliers.
Keywords: outliers, web outlier mining, web structure mining, web mining, web structure documents
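The WSO-LINK algorithm itself is not given in this abstract. The sketch below only illustrates the definition of a web structure outlier it states: a hyperlink is flagged when its target URL is malformed or its anchor text shares no terms with the page's vocabulary. The flagging rule is our own assumption, not the authors' method:

```python
# Hypothetical sketch: flag a hyperlink as a web structure outlier when the
# target URL is invalid or the anchor text is unrelated to the page content.
from urllib.parse import urlparse

def is_structure_outlier(page_terms, anchor_text, href):
    """Return True if the link looks like a web structure outlier."""
    parsed = urlparse(href)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return True  # invalid / malformed target
    anchor_terms = set(anchor_text.lower().split())
    # Unrelated: no overlap between the anchor text and the page vocabulary.
    return not (anchor_terms & page_terms)

page_terms = {"web", "mining", "patterns", "hyperlinks"}
spam = is_structure_outlier(page_terms, "cheap pills", "http://spam.example.com/")
ok = is_structure_outlier(page_terms, "web mining survey", "https://example.org/survey")
```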
COST-SENSITIVE TOPICAL DATA ACQUISITION FROM THE WEB (IJDKP)
The cost of acquiring training data instances for inducing data mining models is one of the main concerns in real-world problems. The web is a comprehensive source of many types of data that can be used for data mining tasks, but its distributed and dynamic nature dictates solutions that can handle these characteristics. In this paper, we introduce an automatic method for topical data acquisition from the web. We propose a new type of topical crawler that uses a hybrid link context extraction method to acquire on-topic web pages with minimal bandwidth usage and at the lowest cost. The new link context extraction method, called Block Text Window (BTW), combines a text window method with a block-based method, using the advantages of each to overcome the challenges of the other. Experimental results show the superiority of BTW over state-of-the-art automatic topical web data acquisition methods on standard metrics.
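BTW's block segmentation is not specified in this abstract; the fragment below sketches only the text-window half of the idea it combines, taking the w tokens on either side of a link's anchor as its topical context (the window size and whitespace tokenization are assumptions):

```python
# Hypothetical sketch of a text-window link context: take the w tokens
# on either side of the anchor token as the link's topical context.

def text_window_context(tokens, anchor_index, w=3):
    """Return the tokens within w positions of the anchor token (excluded)."""
    lo = max(0, anchor_index - w)
    hi = min(len(tokens), anchor_index + w + 1)
    return tokens[lo:anchor_index] + tokens[anchor_index + 1:hi]

tokens = "read our full tutorial LINK about topical web crawlers today".split()
context = text_window_context(tokens, tokens.index("LINK"), w=2)
```

A block-based method would instead use the whole page block (e.g. paragraph or table cell) containing the link; BTW, per the abstract, combines the two so the window does not spill across block boundaries.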
In this world of information technology, everyone tends to do business electronically. With so much business now happening on the World Wide Web (WWW), it is very important for website owners to provide a platform that attracts more customers to their sites. Presenting information well is the way to bring in more customers or users; the customer is the end user who accesses the information in a way that yields some credit to the website owner. In this paper we define web mining and present a method to use web mining to better understand user and website behavior, which in turn enhances the website's information to attract more users. This paper also presents an overview of the various research efforts on pattern extraction and web content mining, and how they can act as a catalyst for e-business.
A NEW IMPROVED WEIGHTED ASSOCIATION RULE MINING WITH DYNAMIC PROGRAMMING APPR... (cscpconf)
With the rapid development of the Internet, web search has taken on an important role in our daily lives, and mining frequent patterns in large databases is a major research area within it. As user activity on the web increases, web-searching methods that predict the next request in a user's visit to web pages play a major role. Such methods help provide quality results and timely answers, and also offer customized navigation. In web search, association rule mining is an important data analysis method for discovering associated web pages. Most researchers have implemented association mining using the Apriori algorithm with a binary representation. The problem with this approach is that it does not address issues such as the navigation order of web pages. To overcome this, researchers proposed a weighted Apriori that maintains navigation order but fails to produce optimal results. Aiming for a most favorable result, we propose a novel approach that combines weighted Apriori with dynamic programming. Experimental results show that this approach maintains the navigation order of web pages and achieves the best solution. The proposed technique enhances website effectiveness, increases users' browsing knowledge, improves prediction accuracy and decreases computational complexity.
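As a rough, assumed illustration of the two ingredients this abstract contrasts with binary Apriori, the sketch below computes a weighted support that (a) respects the navigation order of pages and (b) weights sessions by per-page weights. The weighting scheme and data are hypothetical, not taken from the paper:

```python
# Hypothetical sketch of weighted, order-aware support: each page carries a
# weight, and a page sequence is supported only by sessions that visit its
# pages in that order.

def occurs_in_order(session, seq):
    """True if seq appears as an ordered subsequence of session."""
    it = iter(session)
    return all(page in it for page in seq)

def weighted_support(sessions, weights, seq):
    """Fraction of total session weight held by sessions containing seq in order."""
    total = sum(sum(weights.get(p, 1.0) for p in s) for s in sessions)
    hit = sum(sum(weights.get(p, 1.0) for p in s)
              for s in sessions if occurs_in_order(s, seq))
    return hit / total if total else 0.0

sessions = [["home", "products", "cart"], ["cart", "home"], ["home", "cart"]]
weights = {"home": 1.0, "products": 2.0, "cart": 3.0}
ws = weighted_support(sessions, weights, ["home", "cart"])
```

Note that the session ["cart", "home"] contributes nothing, because the order is reversed: this is exactly the information a binary-representation Apriori would discard.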
International Journal of Engineering Research and Development (IJERD), IJERD Editor
Web Page Recommendation Using Web Mining (IJERA Editor)
On the World Wide Web, content of various kinds is generated in huge amounts, so web recommendation has become an important part of web applications for giving relevant results to users. Different kinds of web recommendations are made available to users every day, including images, video, audio, query suggestions and web pages. In this paper we aim to provide a framework for web page recommendation: 1) we first describe the basics of web mining and its types; 2) we detail each web mining technique; and 3) we propose an architecture for personalized web page recommendation.
A Novel Technique to Pre-process Web Log Data Using SQL Server Management Studio (INFOGAIN PUBLICATION)
Web log data available on the server side helps identify user access patterns. Analyzing web log data poses challenges, as it contains plentiful information about each web page: a log file records the user name, IP address, access request, number of bytes transferred, result status, Uniform Resource Locator (URL), user agent and timestamp. Analyzing the log file gives a clear idea about the user. Data pre-processing is an important step in the mining process: web log data contains irrelevant entries, so it must be pre-processed. Once the collected web log data has been pre-processed, it becomes easy to find the desired information about visitors and to retrieve other information from the log. This paper proposes a novel technique to pre-process web log data and discusses the content of web log data in detail. Each URL in the web log data is parsed into tokens based on the web structure, and the technique is implemented using SQL Server Management Studio.
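The paper implements its pre-processing in SQL Server Management Studio; purely as an assumed illustration of the same two steps, this Python sketch splits a Common Log Format entry into the fields the abstract lists and breaks the requested URL path into tokens:

```python
# Hypothetical sketch: split a Common Log Format entry into its fields and
# tokenize the requested URL path on "/" so each site level is a token.
import re

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d+) (?P<bytes>\d+)'
)

def parse_entry(line):
    m = LOG_PATTERN.match(line)
    if m is None:
        return None  # irrelevant / malformed line: drop it during cleaning
    fields = m.groupdict()
    # Strip the query string, then split the path into structure-based tokens.
    fields["tokens"] = [t for t in fields["url"].split("?")[0].split("/") if t]
    return fields

line = ('192.168.1.5 - alice [10/Oct/2023:13:55:36 +0000] '
        '"GET /shop/cart/view.php?id=7 HTTP/1.1" 200 2326')
entry = parse_entry(line)
```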
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of Engineering and Technology.
An Enhanced Approach for Detecting User's Behavior Applying Country-Wise Loca... (IJSRD)
The growth of the web over the past few years has created many challenges in this field. Recent work in the field searches data using a tree-based search pattern, and various sequential mining algorithms have been developed to date. Web usage mining operates on web server logs, which contain users' navigation histories. The recommender system is explained in full, with a walkthrough of its whole procedure. Tree-based search yields proper and efficient results, but time utilization and the quality of the generated results remained problems. Therefore, a new local search algorithm is proposed for country-wise search, making searching more efficient on a local-results basis. This approach has led to an advancement in search-based methods and the results they generate.
A Web Extraction Using Soft Algorithm for Trinity Structure (iosrjce)
IOSR Journal of Computer Engineering (IOSR-JCE) is a double-blind peer-reviewed international journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publication of high-quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
WEB LOG PREPROCESSING BASED ON PARTIAL ANCESTRAL GRAPH TECHNIQUE FOR SESSION ... (cscpconf)
Web access log analysis studies patterns of website usage and features of user behavior. Raw log data is very noisy and unclear, so it is vital to preprocess the log data for an efficient web usage mining process. Preprocessing comprises three phases: data cleaning, user identification and session construction. Session construction is vital, since numerous real-world problems can be modeled as traversals on a graph, and mining these traversals supplies what the preprocessing phase requires. Existing works, however, have only considered traversals on unweighted graphs. This paper generalizes to the case where the vertices of the graph are given weights to reflect their significance. The proposed method constructs sessions as a Partial Ancestral Graph (PAG) containing pages with calculated weights, which helps site administrators find the pages that interest users and redesign their web pages accordingly. After weighting each page according to browsing time, a PAG structure is constructed for each user session. The existing system suffers from the problem of learning with the latent variables of the data, which the proposed method overcomes.
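The PAG construction itself is beyond this abstract; the sketch below covers only the two preprocessing steps it describes, splitting one user's requests into sessions and weighting each page by browsing time. The 30-minute session timeout is a common heuristic assumed here, not taken from the paper:

```python
# Hypothetical sketch of sessionization: split one user's requests into
# sessions on a 30-minute gap, then weight each page by time spent on it.

TIMEOUT = 30 * 60  # seconds; a common heuristic, not from the paper

def build_sessions(requests):
    """requests: list of (timestamp_seconds, page), sorted by time."""
    sessions, current = [], []
    for ts, page in requests:
        if current and ts - current[-1][0] > TIMEOUT:
            sessions.append(current)  # gap too long: close the session
            current = []
        current.append((ts, page))
    if current:
        sessions.append(current)
    return sessions

def page_weights(session):
    """Weight = seconds until the next request; the last page gets a default of 1."""
    weights = {}
    for (ts, page), (nxt, _) in zip(session, session[1:]):
        weights[page] = weights.get(page, 0) + (nxt - ts)
    last_page = session[-1][1]
    weights[last_page] = weights.get(last_page, 0) + 1
    return weights

requests = [(0, "home"), (40, "products"), (100, "cart"),
            (4000, "home"), (4010, "faq")]
sessions = build_sessions(requests)
```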
IRJET - A Survey on Web Personalization of Web Usage Mining (IRJET Journal)
S. Jagan and Dr. S. P. Rajagopalan, "A Survey on Web Personalization of Web Usage Mining", International Research Journal of Engineering and Technology (IRJET), Volume 2, Issue 01, Mar. 2015. e-ISSN: 2395-0056, p-ISSN: 2395-0072. www.irjet.net, published by Fast Track Publications.
Abstract
Nowadays, the World Wide Web (WWW) is a rich and most powerful source of information. Day by day it grows more complex and expands in size as ever more information goes online, yet retrieving exactly the information a user expects is becoming a more complex and critical task. A powerful concept for dealing with this problem is personalization. Personalization is a subclass of information filtering that seeks to predict the 'rating' or 'preference' a user would give to an item they have not yet considered, using a model built from the characteristics of the item (content-based approaches or collaborative filtering approaches). Web mining is an emerging field of data mining used to provide personalization on the web; it comprises three major categories: web content mining, web usage mining and web structure mining. This paper focuses on web usage mining and the algorithms used to provide personalization on the web.
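The collaborative-filtering idea mentioned in the abstract can be illustrated with a minimal, assumed sketch: predict a user's rating for an unseen item as a similarity-weighted average of other users' ratings for that item (the cosine similarity measure and the toy ratings are our choices, not the survey's):

```python
# Hypothetical sketch of collaborative filtering: predict a user's rating
# for an unseen item from similar users' ratings of that item.
import math

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

def predict(ratings, user, item):
    """ratings: {user: {item: rating}}. Predict `user`'s rating for `item`."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += abs(sim)
    return num / den if den else 0.0

ratings = {
    "alice": {"page_a": 5, "page_b": 3},
    "bob":   {"page_a": 5, "page_b": 3, "page_c": 4},
    "carol": {"page_a": 1, "page_c": 2},
}
p = predict(ratings, "alice", "page_c")
```

Because bob's ratings agree with alice's much more than carol's do, the prediction for page_c lands close to bob's rating of 4.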
An Extensible Web Mining Framework for Real Knowledge (IJEACS)
With the emergence of Web 2.0 applications that provide a rich user experience and convenience without time or geographical restrictions, web usage logs have become a goldmine for researchers across the globe. User behavior analysis based on web logs helps enterprises in different domains make strategic decisions. Business growth depends on customer-centric approaches, which require knowledge of customer behavior to succeed: customers have alternatives, and competition is intense. The business community therefore needs business intelligence for expert decisions, alongside a focus on customer relationship management. Many researchers have contributed toward this end; however, a comprehensive framework that caters to businesses' need to ascertain the real needs of web users is still lacking. This paper presents a framework named the eXtensible Web Usage Mining Framework (XWUMF) for discovering actionable knowledge from web log data. The framework employs a hybrid approach that exploits fuzzy clustering methods together with methods for user behavior analysis, and it is extensible in that it can accommodate new algorithms for both. We propose an algorithm known as Sequential Web Usage Miner (SWUM) for efficiently mining web usage patterns from different data sets, and we built a prototype application to validate the framework. Our empirical results reveal that the framework helps discover actionable knowledge.
ANALYTICAL IMPLEMENTATION OF WEB STRUCTURE MINING USING DATA ANALYSIS IN ONLI... (IAEME Publication)
In today's global business, the web has become the most important means of communication. Clients and customers can find products online, which is one benefit of doing business online. Web mining is the process of using data mining tools to autonomously analyze and extract information from web pages and applications. Many firms use web structure mining, together with data mining business strategies, to generate predictions and judgments about business growth, productivity, manufacturing techniques and more. In the online booking domain, optimal web data mining analysis of web structure is a crucial component, providing a systematic way of applying new techniques to real-time data with various levels of implications. Web structure mining emphasizes the structure of the web's hyperlinks; link administration done correctly can lead to future connections and can therefore increase the prediction performance of learned models. With increased interest in web mining, structural analysis research has expanded into a new research area at the crossroads of network analysis, hyperlink and web mining, structural learning, empirical software design techniques and graph mining. Web structure mining is the process of deriving structural data from the web. The proposed WSM approach is a system for finding the structure of data stored on the web; it can help clients recover significant records by analyzing the link-oriented structure of web content. As the amount of data available online has grown, web structure mining has become one of the most important resources for information extraction and knowledge discovery.
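The proposed WSM approach is not detailed in this abstract; as a minimal, assumed illustration of the hyperlink structure it analyzes, the sketch below builds a directed link graph from (source, target) pairs and reports in-degree, one of the simplest structural importance signals:

```python
# Hypothetical sketch: represent a site's hyperlink structure as a directed
# graph and use in-degree as a simple structural importance signal.
from collections import defaultdict

def build_link_graph(links):
    """links: iterable of (source_page, target_page) hyperlink pairs."""
    out_edges = defaultdict(set)
    in_degree = defaultdict(int)
    for src, dst in links:
        if dst not in out_edges[src]:  # count each distinct edge once
            out_edges[src].add(dst)
            in_degree[dst] += 1
    return out_edges, in_degree

links = [("home", "products"), ("home", "about"),
         ("products", "cart"), ("about", "products")]
out_edges, in_degree = build_link_graph(links)
```

More elaborate structure-mining methods (e.g. PageRank-style scores) start from exactly this graph representation.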
Efficient intelligent crawler for hamming distance based on prioritization of...IJECEIAES
Search engines play a crucial role in today's Internet landscape, especially with the exponential increase in data storage. Ranking models are used in search engines to locate relevant pages and rank them in decreasing order of relevance. They are an integral component of a search engine. The offline gathering of the document is crucial for providing the user with more accurate and pertinent findings. With the web’s ongoing expansions, the number of documents that need to be crawled has grown enormously. It is crucial to wisely prioritize the documents that need to be crawled in each iteration for any academic or mid-level organization because the resources for continuous crawling are fixed. The advantages of prioritization are implemented by algorithms designed to operate with the existing crawling pipeline. To avoid becoming the bottleneck in pipeline, these algorithms must be fast and efficient. A highly efficient and intelligent web crawler has been developed, which employs the hamming distance method for prioritizing the pages to be downloaded in each iteration. This cutting-edge search engine is specifically designed to make the crawling process more streamlined and effective. When compared with other existing methods, the implemented hamming distance method achieves a high value of 99.8% accuracy.
`A Survey on approaches of Web Mining in Varied Areasinventionjournals
There has been lot of research in recent years for efficient web searching. Several papers have proposed algorithm for user feedback sessions, to evaluate the performance of inferring user search goals. When the information is retrieved, user clicks on a particular URL. Based on the click rate, ranking will be done automatically, clustering the feedback sessions. Web search engines have made enormous contributions to the web and society. They make finding information on the web quick and easy. However, they are far from optimal. A major deficiency of generic search engines is that they follow the ‘‘one size fits all’’ model and are not adaptable to individual users.
International conference On Computer Science And technologyanchalsinghdm
ICGCET 2019 | 5th International Conference on Green Computing and Engineering Technologies. The conference will be held on 7th September - 9th September 2019 in Morocco. International Conference On Engineering Technology
The conference aims to promote the work of researchers, scientists, engineers and students from across the world on advancement in electronic and computer systems.
A Review on Pattern Discovery Techniques of Web Usage MiningIJERA Editor
In the recent years with the development of Internet technology the growth of World Wide Web exceeded all expectations. A lot of information is available in different formats and retrieving interesting content has become a very difficult task. One possible approach to solve this problem is Web Usage Mining (WUM), the important application of Web Mining. Extracting the hidden knowledge in the log files of a web server, recognizing various interests of web users, discovering customer behavior while at the site are normally referred as the applications of web usage mining. In this paper we provide an updated focused survey on techniques of web usage mining.
Abstract: In many fields, such as industry, commerce, government, and education, knowledge discovery and data
mining can be immensely valuable to the subject of Artificial Intelligence. Because of the recent increase in
demand for KDD techniques, such as those used in machine learning, databases, statistics, knowledge acquisition,
data visualisation, and high performance computing, knowledge discovery and data mining have grown in
importance. By employing standard formulas for computational correlations, we hope to create an integrated
technique that can be used to filter web world social information and find parallels between similar tastes of
diverse user information in a variety of settings
A Generic Model for Student Data Analytic Web Service (SDAWS)Editor IJCATR
Any university management system accumulates a cartload of data and analytics can be applied on it to gather useful
information to aid the academic decision making process. This paper is a novel attempt to demonstrate the significance of a data
analytic web service in the education domain. This can be integrated with the University Management System or any other application
of the university easily. Analytics as a web service offers much benefits over the traditional analysis methods. The web service can be
hosted on a web server and accessed over the internet or on to the private cloud of the campus. The data from various courses from
different departments can be uploaded and analyzed easily. In this paper we design a web service framework to be used in educational
data mining that provide analysis as a service.
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
A Study of Pattern Analysis Techniques of Web Usageijbuiiir1
Web mining is the most important application of data mining techniques to extract knowledge from web data including web document, hyperlinks between documents, usage logs of web sites etc. Web mining has been explored to a vast degree and different techniques have been proposed for a huge variety of applications that includes search engine enhancement, optimization of web services, Business Intelligence, B2B and B2C business etc. Most research on web mining has been from a �process-centric� point of view which defined web mining as a sequence of tasks. In this paper, we highlight the significance of studying the evolving nature of the web pattern analysis (WPA). Web usage mining is used to discover interesting user navigation patterns and can be applied to many real-world problems, such as improving web sites/pages. A Web usage mining system performs five major tasks: i) data collection ii) information filtering iii) pattern discovery iv) pattern analysis and visualization techniques, and v) Knowledge Query Mechanism (KQM). Each task is explained in detail and its related technologies are introduced. The web mining research is a converging research area from several research communities, such as database system, information retrieval, information extraction and artificial intelligence. In this paper we implement how web usage mining techniques can be applied for the customization i.e. web visualization
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
"Impact of front-end architecture on development cost", Viktor Turskyi
Pf3426712675
Mohinder Singh, Navjot Kaur / International Journal of Engineering Research and Applications (IJERA)
ISSN: 2248-9622, www.ijera.com, Vol. 3, Issue 4, Jul-Aug 2013, pp. 2671-2675

Retrieve Information Using Improved Document Object Model Parser Tree Algorithm

Mohinder Singh*, Navjot Kaur**
*(Department of Computer Science and Engg., SGGSW University, Fatehgarh Sahib)
**(Department of Computer Science and Engg., Punjabi University, Patiala)
ABSTRACT
Data mining refers to extracting useful information from raw or unstructured data, whereas in web content mining the data of interest is scattered and unstructured across web pages. Often a user wants to retrieve only a fixed kind of data, yet unwanted data is retrieved along with it. The proposed work removes this unnecessary information. The DOM Parser Tree Algorithm filters unwanted data from web pages and gives reliable output. The Document Object Model Parser Tree Algorithm fetches the HTML links, and the pages are accessed according to these links. The data that is useful to the user is then sent to a table. The DOM Parser Tree Algorithm works on a tree structure, and we have used a table to present the results. As the results are shown in the table, the information displayed is correct and reliable for the user. The user specifies once which data he or she wants to access from time to time, and that data is then fetched dynamically from the particular website or link. Currently the approach is implemented on a limited field of experiment because of some access restrictions; we hope to extend it to a larger experimental area.
Keywords – Data mining, DOM Parser Tree Algorithm, Web Content Mining
I. INTRODUCTION
In today's world, almost every system depends on computers. The data stored in a system can be of various types, such as text, video, audio and rich documents. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Data mining is used in almost every field, including business, the military, finance and art. In the proposed work, the information from different websites is clustered into a table format, which is very useful for an individual business organization. We used the Document Object Model, which allows programs and scripts to dynamically access and update the content, structure and style of documents. The document can be further processed, and the results of that processing can be incorporated back into the presented page. HTML links are fetched by this application dynamically from the official websites of the related product. With the help of this algorithm, an individual organization can retrieve its updated data; this process is quite handy for analysis. The user specifies the required information to the system. The web crawler takes its seed URL and searches for the relevant pages, and a DOM tree is generated for those pages. Irrelevant content such as advertisements, lists of links to related articles, disclaimer information, user comments, navigational menus, headers, footers, copyright notices and privacy policies is then removed using the DOM parser algorithm, which works on the DOM tree structure. Finally, a table is built from the extracted content using the Document Object Model tree. The flow from web crawler to DOM tree implementation plays a crucial part in the proposed work. Web crawlers download web pages by starting from one or more seed URLs, downloading each of the associated pages, extracting the hyperlink URLs contained therein, and recursively downloading those pages. Therefore, any web crawler needs to keep track both of the URLs that are still to be downloaded and of those that have already been downloaded.
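The crawler bookkeeping described above, distinguishing URLs still to be downloaded from those already downloaded, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the in-memory link graph stands in for real HTTP fetches, and all names are illustrative.

```java
import java.util.*;

public class CrawlFrontier {
    // Hypothetical in-memory "web": each URL maps to the links it contains.
    static Map<String, List<String>> links = Map.of(
        "seed", List.of("a", "b"),
        "a", List.of("b", "c"),
        "b", List.of("seed", "c"),
        "c", List.of());

    // Breadth-first crawl: a frontier of URLs still to download,
    // and a seen-set so no page is downloaded twice.
    static List<String> crawl(String seed) {
        List<String> downloaded = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty()) {
            String url = frontier.poll();
            downloaded.add(url);                 // "download" the page
            for (String out : links.getOrDefault(url, List.of())) {
                if (seen.add(out)) {             // enqueue each URL only once
                    frontier.add(out);
                }
            }
        }
        return downloaded;
    }

    public static void main(String[] args) {
        System.out.println(crawl("seed"));       // each page visited exactly once
    }
}
```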
II. LITERATURE REVIEW
Dr. M. S. Shashidhara and Dr. M. Giri [1] proposed a new technique of web data extraction with three phases: in the first phase a list of web documents is selected, in the second the documents are pre-processed, and in the final phase the results are presented to the users. Experimental results are compared with existing methods, and the performance of the proposed system is better. Web mining is a class of data mining that distils an untapped source of abundantly available free textual information, and its importance is growing along with the massive volumes of data generated on the web in day-to-day life.
Jing Li and C. I. Ezeife [12] observe that classifying and mining noise-free web pages will improve the accuracy of search results as well as search speed, and may benefit webpage organization applications. Noise on web pages is irrelevant to the main content being mined, and includes advertisements, navigation bars and copyright notices. The few existing works on web page cleaning detect noise blocks with exactly matching contents but are weak at detecting near-duplicate blocks, characterized by items like navigation bars. They proposed Webpage Cleaner for eliminating noise blocks from web pages, with the aim of improving the accuracy and efficiency of web content mining. A vision-based technique is employed for extracting blocks from web pages, and important blocks are exported for web content mining using Naive Bayes text classification.
Kamlesh Patidar et al. (2011) [2] note that a website plays a significant role in the success of e-business, e-books and knowledge discovery. It is the main starting point of any organization or corporation for its customers, so it is important to customize and design it according to its visitors; websites are also a place to introduce an organization's services and highlight new services to visitors and audiences. As a prototype design for a future search engine, they proposed a web content mining algorithm using a database approach and multilevel data tracking for a digital library.
Qing Lu and Lise Getoor [16] show that in link-based classification, unlabeled data provides useful information in three important ways: first, it gives additional information about the distribution of object attribute values; second, links among unlabeled data in the test set provide useful information about classification; and third, links between labeled (training) data and unlabeled (test) data also provide useful information that should not be ignored. When the classification problem is properly modeled, the data is not distorted by removing links between the test and training sets, and inference is used for collective classification, all of the information that unlabeled data provides can be exploited.
G. N. Shinde et al. [3] present an efficient method for software development based on the MDA approach. MDA is a promising approach in which J2EE (Java 2 Enterprise Edition) is used to describe the behavior of agents, and the JADE (Java Agent Development Environment) framework provides a standard for developing multi-agent systems (MAS). Web usage mining, like web mining in general, is a new research field with a long way to go. The development of the Internet provides a broad application scope for web-based data warehousing and data mining technology, and with the rapid development of the Internet and communications technology, research on web-based data mining and web site design will deepen further. An Agent Unified Modeling Language (AUML) is also being defined for the effective definition of agent roles.
Shekhar Palta [11] proposed some areas where unimportant words can be eliminated. The future work of that project involves many improvements to raise the accuracy of the extracted text, which currently also includes the author's name, dates and similar text that occurs as part of a news article and is very difficult for the algorithm to prune out. The solution considered is to use a suffix array to detect the regular and repeated occurrence of certain unimportant words over the set of news articles extracted from the same web site; this has already been implemented by a colleague at Ask and is being integrated with the present algorithm.
R. Cooley et al. [13] proposed some aspects of mining information and pattern discovery. The term web mining has been used to refer to techniques that encompass a broad range of issues; however, this very broadness, while meaningful and attractive, has caused web mining to mean different things to different people, and a common vocabulary is needed. Towards this goal they proposed a definition of web mining, developed a taxonomy of the various ongoing efforts related to it, and presented a survey of research in the area, concentrating on web usage mining. They provided a detailed survey of the efforts in this area, short though it is because of the area's newness, together with a general architecture of a system for web usage mining and the issues and problems that require further research and development.
Niki R. Kapadia et al. [4] proposed an approach for extracting web content structure based on visual representation. The resulting structure is very helpful for applications such as web adaptation, information retrieval and information extraction. By identifying the logical relationships of web content based on visual layout information, web content structure can effectively represent the semantic structure of the web page. An automatic top-down, tag-tree-independent and scalable algorithm to detect web content structure is presented and compared with a traditional DOM-based algorithm, giving much more reliable partitioning. The algorithm was evaluated manually on a large data set and also used for selecting good expansion terms in a pseudo-relevance feedback process in web information retrieval, both of which achieved very satisfactory performance. Recently, the developed algorithm was implemented on websites with horizontal separation; further work will focus on improved code that also handles vertical separation, and on experiments with websites from different domains, such as education, shopping and social networking, to check the accuracy of the code and judge which considerations give better results. This work is useful for understanding the VIPS algorithm, web content mining through partition-based segmentation, and further research directions in this area. Kapadia et al. [5] conclude that there are many existing methods to mine information; hierarchical and partitioning methods are commonly used to mine information from the web, and each method has its advantages and disadvantages.
G. Poonkujhali et al. [15] proposed a new algorithm using a signed approach to improve the results of web content mining by detecting both relevant and irrelevant web documents. They aimed at an experimental evaluation of web content mining in terms of reliability and at exploring other mathematical tools for mining web content; a comparative study of this algorithm with existing algorithms is also to be done.
III. METHODOLOGY
In the proposed work we have used the Document Object Model Parser Algorithm to extract the useful data for a specific user. Figure 1 shows the methodology of the proposed algorithm as a flow chart.
Input: selected URLs
Output: useful information in tables, and the execution time
Method:
1. Read all the links one by one from the database using an SQL query.
2. Fetch every link using the Java HTTP library.
3. Apply the DOM Parser Algorithm to extract the useful data.
4. Load the fetched links into the table.
5. Output the results in tabular form.
In the methodology flow chart, our contribution is step 3, where the DOM Parser Algorithm is applied.
Figure 1: Flow chart for Proposed System
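The noise-removal stage, where the DOM Parser Algorithm keeps only the useful data, can be sketched with a toy DOM tree. This is an illustrative sketch only: the Node type and the NOISE tag set are assumptions for the example, not the paper's actual implementation.

```java
import java.util.*;

public class DomNoiseFilter {
    // Minimal DOM-like node: a tag name, optional text, and children.
    static class Node {
        String tag, text;
        List<Node> children = new ArrayList<>();
        Node(String tag, String text) { this.tag = tag; this.text = text; }
        Node add(Node child) { children.add(child); return this; }
    }

    // Tags treated as noise in this sketch (the paper removes advertisements,
    // menus, headers, footers and copyright notices, among others).
    static final Set<String> NOISE = Set.of("header", "footer", "nav", "ad");

    // Walk the tree and collect text only from non-noise subtrees.
    static List<String> extractUseful(Node node, List<String> out) {
        if (NOISE.contains(node.tag)) return out;   // prune the whole noise subtree
        if (node.text != null && !node.text.isEmpty()) out.add(node.text);
        for (Node c : node.children) extractUseful(c, out);
        return out;
    }

    public static void main(String[] args) {
        Node page = new Node("html", null)
            .add(new Node("header", "Site banner"))
            .add(new Node("div", null)
                .add(new Node("p", "Car price list"))
                .add(new Node("ad", "Buy now!")))
            .add(new Node("footer", "Copyright"));
        System.out.println(extractUseful(page, new ArrayList<>()));
    }
}
```

The collected strings would then be the rows written into the output table.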
COMPARISON BETWEEN PREVIOUS WORK AND PROPOSED WORK: The previous work used the FP-Growth algorithm, which obtains frequent patterns from log files. The major problem with that algorithm is precisely that it fetches frequent patterns: unwanted data can also be frequent on a particular website, in which case the algorithm does not work as efficiently as it should.
In our proposed Document Object Model Parser Tree Algorithm, the user selects the data only once. After selecting the required website, the user is able to get the updated data or information. The algorithm works on dynamic pages or links, on which the information changes from time to time.
DOM PARSER TREE ALGORITHM
Step 0: START
Step 1: Declare the node fields, with left and right links:
        Data leftAdd;
        Data rightAdd;
        String[] sdata = new String[10];
        int portnum = 0;
Step 2: Declare class DST:
        int v = 0;
        Data Root = null, headNext = null;
Step 3: Declare the DataSet function:
        for (int i = 0; i < 6; i++)
            node.sdata[i] = s1[i];
        node.portnum = num;
Step 4: Check the Root:
        if (Root == null) {
            node.leftAdd = null;
            node.rightAdd = null;
            Root = node;
            return Root;
        }
Step 5: Read the HTML from the source links:
        URLConnection urlConnection;
        DataOutputStream outStream;
        DataInputStream inStream;
        URL url;
Step 6: Connect the database and the links:
        b1.addActionListener(this);
        b2.addActionListener(this);
        b3.addActionListener(this);
        url = new URL(s);
        urlConnection = url.openConnection();
        ((HttpURLConnection) urlConnection).setRequestMethod("POST");
Step 7: END
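The fragments in Steps 1-4 suggest a node type with left and right links and a Root check. A minimal runnable reconstruction is sketched below; the field names (sdata, portnum, leftAdd, rightAdd, Root) are kept from the listing, but ordering insertions by portnum is our assumption, since the listing only shows the empty-root case.

```java
public class DstSketch {
    // Reconstruction of the Data node from Steps 1-2 of the listing.
    static class Data {
        String[] sdata = new String[10];
        int portnum;
        Data leftAdd, rightAdd;
    }

    static Data root = null;

    // Steps 3-4: create a node from its fields and attach it to the tree.
    // Ordering by portnum is an assumption made for this sketch.
    static Data insert(String[] fields, int num) {
        Data node = new Data();
        for (int i = 0; i < fields.length && i < node.sdata.length; i++)
            node.sdata[i] = fields[i];
        node.portnum = num;
        if (root == null) { root = node; return root; }
        Data cur = root;
        while (true) {
            if (num < cur.portnum) {
                if (cur.leftAdd == null) { cur.leftAdd = node; return root; }
                cur = cur.leftAdd;
            } else {
                if (cur.rightAdd == null) { cur.rightAdd = node; return root; }
                cur = cur.rightAdd;
            }
        }
    }

    public static void main(String[] args) {
        insert(new String[]{"cars"}, 5);
        insert(new String[]{"bikes"}, 3);
        insert(new String[]{"mobiles"}, 8);
        System.out.println(root.leftAdd.sdata[0]);  // the node inserted with the smaller portnum
    }
}
```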
IV. RESULT AND DISCUSSION
When we run the project, it shows the results as well as the time taken by the algorithm. We used three modules, cars, bikes and mobiles, with tables as the output of our algorithm. All the unwanted data is cleaned by the DOM Parser Tree Algorithm; the output is shown below. Figure 2 displays the results of the cars module, where the text box shows the time taken by the algorithm to execute the whole process. Figures 3 and 4 likewise show the bikes and mobiles modules. All the links are fetched dynamically by the algorithm, so any updating, modification or alteration made in the related website can be accessed with the help of the proposed algorithm.
Figure 2: Car’s Experimental Result
Figure 3: Bike’s Experimental Result
Figure 4: Mobile’s Experimental Result
The graph shows the throughput of the algorithm in milliseconds for the three modules. These results are dynamic, because the speed and quality of the Internet connection affect the throughput of the algorithm, so the result sets vary with conditions and system performance.
Figure 4.2: Result Based Graph
The major advantage of this algorithm is that the data does not lose its reliability and correctness, so any individual organization can retrieve the updated data from time to time. Another advantage of this work is that the information is stored in tabular form, which is simple and easy to understand.
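The execution time reported in the text box can be measured with standard Java timers. A minimal sketch, where timeMs is an illustrative helper and the loop stands in for the actual parsing work:

```java
public class TimingSketch {
    // Run a task and report how long it took, in milliseconds.
    static long timeMs(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long ms = timeMs(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i; // stand-in for parsing work
        });
        System.out.println("took " + ms + " ms");
    }
}
```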
V. CONCLUSION
We have presented the methodology of the Document Object Model Parser Tree Algorithm, which works on web content mining and data clustering. As shown, the algorithm removes unnecessary data such as add-ons, pop-ups, unwanted material and photographs. With the help of this algorithm, any individual organization can easily access its useful data dynamically, from time to time. The algorithm can be used as the basis of a large-scale project for a company or firm. It is especially useful where there is rich media data: it clusters the data related to a particular organization and isolates it from the data that is not useful for the specific task.
VI. FUTURE WORK
1. Test whether the proposed algorithm can work on large-scale data and under the necessary access privileges, or whether these factors need to be improved.
2. It can help to improve the efficiency of related algorithms.
3. The proposed algorithm works on high-speed Internet; when run on a slow network, its results are not as accurate as they should be. Future work can therefore address the efficiency of the algorithm on slow networks, to give better results.
Acknowledgements
As part of my course I have taken up the problem "Retrieve Information Using Improved Document Object Model Parser Tree Algorithm" as my thesis topic. I am very thankful to Mrs. Navjot Kaur, Assistant Professor, Punjabi University, Patiala, for giving me such valuable support in doing my work. She provided all the relevant material that was sufficient for me to complete my thesis work, and gave her help and time whenever asked. Last but not least, a word of thanks to the authors of all the books and papers I consulted during my thesis work and while preparing the report. Finally, thanks to the Almighty for not letting me down in times of crisis and for showing me the silver lining in the dark clouds.
REFERENCES
Journal Papers:
[1] Dr. M. S. Shashidhara and Dr. M. Giri, "An Efficient Web Content Extraction Using Mining Techniques", International Journal of Computer Science and Management Research, Vol. 1, Issue 4, November 2012.
[2] Kamlesh Patidar, Preetesh Purohit and Kapil Sharma, "Web Content Mining Using Database Approach and Multilevel Data Tracking Methodology for Digital Library", IJCST, Vol. 2, Issue 1, March 2011.
[3] G. N. Shinde and S. A. Inamdar, "Web Data Mining Using An Intelligent Information System Design", Int. J. Comp. Tech. Appl., Vol. 2 (2), pp. 280-283.
[4] Niki R. Kapadia, Kanu Patel and Mehul C. Parikh, "Partitioning Based Web Content Mining", International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, Vol. 1, Issue 3, May 2012.
[5] Niki R. Kapadia and Kinjal Patel, "Web Content Mining Techniques - A Comprehensive Survey", IJREAS, Volume 2, Issue 2, February 2012, ISSN: 2249-3905.
Books:
[6] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", 2nd Edition.
[7] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", 2nd Edition.
[8] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", 2nd Edition.
[9] Jonathan Robie, Texcel Research, "What is the Document Object Model?"
[10] Jonathan Robie, Texcel Research, "What is the Document Object Model?"
[11] Shekhar Palta, "Eliminating Noisy Information from News Websites and Extraction of the News Article", International Master on Information Technology, IV Edition, Pisa, Italy.
Thesis:
[12] Jing Li and C. I. Ezeife, "Cleaning Web Pages for Effective Web Content Mining", School of Computer Science, University of Windsor, Windsor, Ontario, Canada N9B 3P4.
[13] R. Cooley, B. Mobasher and J. Srivastava, "Web Mining: Information and Pattern Discovery on the World Wide Web", Department of Computer Science and Engineering, University of Minnesota, MN 55455, USA.
Proceedings Papers:
[14] Huiping Peng, "Discovery of Interesting Association Rules Based On Web Usage Mining", IEEE Conference, pp. 272-275, 2010.
[15] G. Poonkujhali, K. Thiagarajan, K. Sarukesi and G. V. Uma, "Signed Approach for Mining Web Content Outliers", World Academy of Science and Technology, 32, 2009.
[16] Qing Lu and Lise Getoor, "Link-based Classification Using Labeled to Unlabeled Data", Proceedings of ICML-2003, Washington DC, 2003.