The exponential growth in the number and complexity of information resources and services on the Web
has made log data an indispensable resource for characterizing users in Web-based environments.
Through approximation, related web data is organized into a hierarchy structure. This
hierarchy can be used as input for a variety of data mining tasks such as clustering,
association rule mining, and sequence mining.
In this paper, we present an approach for personalizing the web user's environment dynamically, while the
user interacts with the web, by clustering web usage data using a concept hierarchy. The system is inferred from
the web server's access logs by means of data mining and web usage mining techniques that extract information
about users. The extracted knowledge is then used to offer users a personalized view of the
services.
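As a rough illustration of the kind of hierarchy the abstract describes (the paper's actual construction over related web data is not reproduced here), a concept-hierarchy-like tree can be sketched by grouping logged URLs under their common path prefixes; the function and URLs below are illustrative:

```python
def build_hierarchy(urls):
    """Build a concept-hierarchy-like tree (nested dicts) from URL paths,
    grouping related pages under common path prefixes."""
    root = {}
    for url in urls:
        node = root
        for seg in url.strip("/").split("/"):
            # descend into (or create) the child node for this path segment
            node = node.setdefault(seg, {})
    return root
```

For example, `build_hierarchy(["/sports/cricket", "/sports/football", "/news/local"])` groups the two sports pages under one branch, which a clustering step could then exploit.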
Identifying the Number of Visitors to improve Website Usability from Educatio... (Editor IJCATR)
Web usage mining deals with understanding a visitor's behaviour on a website. It helps in understanding concerns
such as the present and future probability of every website user, and the relationship between behaviour and website usability. Web mining has different
branches, namely web content mining, web structure mining, and web usage mining. The focus of this paper is on mining usage patterns from
an educational institution's web log data. There are three types of web-related log data, namely web access logs, error logs, and proxy logs.
In this paper, web access log data has been used as the dataset because the web access log is the typical source of the navigational
behaviour of website visitors. The study of web server log analysis is helpful in applying web mining techniques.
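A minimal sketch of the visitor-counting idea, assuming the access log is in the Common Log Format (the abstract does not specify the format or the identification heuristic, and counting distinct client IPs is only an approximation of distinct visitors):

```python
import re

# Common Log Format, e.g.:
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')

def unique_visitors(lines):
    """Approximate visitor count: number of distinct client IPs in an access log."""
    ips = set()
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            ips.add(m.group(1))  # group 1 is the client IP/host field
    return len(ips)
```

In practice, sessionization (grouping requests by IP plus user agent within a time window) gives a finer-grained notion of "visitor" than raw IP counting.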
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
The papers for publication in The International Journal of Engineering & Science are selected through rigorous peer review to ensure originality, timeliness, relevance, and readability.
A Novel Technique to Pre-process Web Log Data Using SQL Server Management Studio (INFOGAIN PUBLICATION)
Web log data available on the server side helps in identifying user access patterns. Analysis of web log data poses challenges, as it consists of plentiful information about a web page. A log file contains information about the user name, IP address, access request, number of bytes transferred, result status, Uniform Resource Locator (URL), user agent, and timestamp. Analysing the log file gives a clear idea about the user. Data pre-processing is an important step in the mining process. Web log data contains irrelevant data, so it has to be pre-processed. Once the collected web log data is pre-processed, it becomes easy to find the desired information about visitors and to retrieve other information from the web log data. This paper proposes a novel technique to pre-process web log data and gives a detailed discussion of its content. Each Uniform Resource Locator (URL) in the web log data is parsed into tokens based on the web structure, and the technique is implemented using SQL Server Management Studio.
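The URL-parsing step the abstract describes can be sketched outside SQL Server as well. Assuming each logged URL should be split into path segments and query-string parameters (the paper's exact tokenization rules are not reproduced here), a minimal version might look like:

```python
from urllib.parse import urlparse, parse_qsl

def tokenize_url(url):
    """Split a logged URL into path-segment tokens and query key=value tokens."""
    parts = urlparse(url)
    # path segments, dropping empty strings from leading/double slashes
    tokens = [seg for seg in parts.path.split("/") if seg]
    # query-string parameters as key=value tokens
    tokens += [f"{k}={v}" for k, v in parse_qsl(parts.query)]
    return tokens
```

For example, `tokenize_url("/dept/cs/courses.aspx?year=2015&sem=1")` yields one token per path segment and one per query parameter, which can then be loaded into relational tables.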
International Journal of Engineering Research and Development (IJERD Editor)
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture Engineering,
Aerospace Engineering.
AN INTELLIGENT OPTIMAL GENETIC MODEL TO INVESTIGATE THE USER USAGE BEHAVIOUR ... (ijdkp)
The unexpectedly widespread use of the WWW and the dynamically growing nature of the web create new
challenges for web mining, since web data is inherently unlabelled, incomplete, nonlinear, and
heterogeneous. The investigation of user usage behaviour on the WWW is a real-time problem that involves
multiple conflicting measures of performance. These measures not only make the problem computationally intensive but
also raise the possibility that an exact solution cannot be found. Unfortunately, conventional methods
are of limited use for such optimization problems due to the absence of semantic certainty and the presence of human
intervention. To handle such data and overcome the limitations of conventional methodologies, it is
necessary to use a soft computing model that can work intelligently to attain an optimal solution.
The web is a collection of inter-related files on one or more web servers, while web mining means extracting valuable information from web databases. Web mining is one of the data mining domains in which data mining techniques are used for extracting information from web servers. Web data includes web
pages, web links, objects on the web, and web logs. Web mining is used to understand customer behaviour and to evaluate a particular website based on the information stored in web log files. Web mining applies data mining techniques, namely classification, clustering, and association
rules. It has beneficial application areas such as electronic commerce, e-learning, e-government, e-policies, e-democracy, electronic business, security, crime investigation, and digital libraries. Retrieving the required web page from the web efficiently and effectively becomes a challenging task
because the web is made up of unstructured data, which delivers a large amount of information and increases the complexity of dealing with information from different web service providers. It becomes very hard to find, extract, filter, or evaluate the relevant information for users. In this paper,
we have studied the basic concepts, classification, processes, and issues of web mining. In addition,
this paper also analyses web mining research challenges.
Web Content Mining Based on Dom Intersection and Visual Features Concept (ijceronline)
Structured data extraction from deep web pages is a challenging task due to the underlying complex structures of such pages. Moreover, website developers generally follow different web page design techniques. Data extraction from webpages is highly useful for building our own database for a number of applications. A large number of techniques have been proposed to address this problem, but all of them have inherent limitations and impose different constraints when extracting data from such webpages. This paper presents two different approaches to structured data extraction. The first approach is a non-generic solution based on template detection using the intersection of the Document Object Model (DOM) trees of various webpages from the same website. This approach gives better results in terms of efficiency and accurately locates the main data on a particular webpage. The second approach is based on a partial tree alignment mechanism using important visual features such as the length, size, and position of web tables available on the webpages. This approach is a generic solution, as it does not depend on one particular website and its webpage template. It accurately locates the multiple data regions, data records, and data items within a given web page. We have compared our results with an existing mechanism and found our results much better for a number of webpages.
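The first approach, template detection by intersecting DOM trees, might be sketched as follows. This is a simplified stand-in for the paper's method: each page is represented by its set of tag paths, and paths present in every page are treated as template; the sample pages are made up.

```python
from html.parser import HTMLParser

class PathCollector(HTMLParser):
    """Collects the set of tag paths (e.g. 'html/body/nav') seen in a page."""
    def __init__(self):
        super().__init__()
        self.stack, self.paths = [], set()

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.paths.add("/".join(self.stack))

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

def template_paths(pages):
    """Paths common to every page are assumed to be template; the rest is data."""
    sets = []
    for html in pages:
        c = PathCollector()
        c.feed(html)
        sets.append(c.paths)
    return set.intersection(*sets)
```

A real implementation would also compare attributes and node order, but even this crude intersection separates shared navigation structure from page-specific content regions.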
The World Wide Web (Web) is a popular and interactive medium to disseminate information today.
The Web is huge, diverse, and dynamic, and thus raises scalability, multimedia-data, and temporal issues, respectively.
In this world of information technology, everyone has a tendency to do business electronically. Today
a lot of business happens on the World Wide Web (WWW), so it is very important for website owners to
provide a better platform to attract more customers to their sites. Providing information in a better way is
the solution for bringing in more customers or users. The customer is the end-user, who accesses the information
in a way that yields some credit to the website owners. In this paper we define web mining and present a
method to utilize web mining in a better way to understand user and website behaviour, which in turn
enhances the website's information to attract more users. This paper also presents an overview of the
various research done on pattern extraction and web content mining, and how it can act as a catalyst
for e-business.
Data mining refers to the process of analysing data from different perspectives and summarizing it into useful information.
Data mining software is one of a number of tools used for analysing data. It allows users to analyse data from many different dimensions and angles, categorize it, and summarize the relationships identified.
Data mining is a technique for finding and describing structural patterns in data.
Data mining is the process of finding correlations or patterns among fields in large relational databases.
It is the process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions.
An Enhanced Approach for Detecting User's Behavior Applying Country-Wise Loca... (IJSRD)
The development of the web in the past few years has created many challenges in this field. Recent work in this field searches data following a search-tree pattern. Various sequential mining algorithms have been developed to date. Web usage mining is used to process web server logs, which contain the navigation history of users. The recommender system is explained in detail, along with its whole procedure. The search results should lead to a proper and efficient search, but the problems were the time taken and the quality of the search results generated. So, a new local search algorithm is proposed for country-wise search that makes searching more efficient on a local-results basis. This approach has led to an advancement in search-based methods and the results they generate.
A hybrid fuzzy ANN approach for software effort estimation (ijfcstjournal)
Software development effort estimation is one of the major activities in software project management.
During the project proposal stage there is a high probability of estimates being inaccurate, but
this inaccuracy decreases later on. In the field of software development there are certain metrics on which
effort estimation is based. To date, various methods have been proposed for software effort
estimation, of which the non-algorithmic methods, such as artificial intelligence techniques, have been very
successful. A hybrid Fuzzy-ANN model, known as the Adaptive Neuro-Fuzzy Inference System (ANFIS), is well
suited to such situations. The present paper is concerned with developing a software effort estimation
model based on ANFIS. The study evaluates the efficiency of the proposed ANFIS model, for which
the COCOMO81 dataset has been used. The results obtained have been compared with an Artificial Neural
Network (ANN) and the Intermediate COCOMO model developed by Boehm. The results were analysed using the
Magnitude of Relative Error (MRE) and Root Mean Square Error (RMSE). It is observed that ANFIS
provided better results than the ANN and COCOMO models.
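The two evaluation measures named in the abstract have standard definitions: MRE is the absolute estimation error relative to the actual effort, and RMSE is the root of the mean squared error. A minimal sketch (the helper names and sample values are illustrative, not from the paper):

```python
import math

def mre(actual, estimated):
    """Magnitude of Relative Error for one project: |actual - estimated| / actual."""
    return abs(actual - estimated) / actual

def mmre(actuals, estimates):
    """Mean MRE over a dataset (commonly reported alongside RMSE)."""
    return sum(mre(a, e) for a, e in zip(actuals, estimates)) / len(actuals)

def rmse(actuals, estimates):
    """Root Mean Square Error over a dataset."""
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actuals, estimates)) / len(actuals))
```

Lower values of both measures indicate a better effort-estimation model, which is the basis of the ANFIS-versus-ANN/COCOMO comparison described above.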
Nowadays, technological progress allows us to have highly flexible solutions, easily accessible with
lower levels of investment, which leads to many companies adopting SaaS (Software-as-a-Service) to
support their business processes. Given this movement and the advantages of SaaS, it
is important to understand whether SaaS work being developed is underutilized because companies are
not taking advantage of it, and in that case to understand the reasons why. This
knowledge is important even for people who do not use or do not develop/provide SaaS, since sooner or
later it will be unavoidable due to current trends. In the near future, nearly all decision-makers of IT
strategies will be forced to consider adopting SaaS as an IT solution for the convenience benefits
associated with technology or market competition. At that time they will have to know how to evaluate
impacts and decide. What are the real needs in the Portuguese market? What fears and what is being done
to mitigate them? What are the implications of the adoption of SaaS? Where should we focus attention on
SaaS offerings in order to create greater value? These are questions we must answer to actually be able to
assess and decide. Often, decision-makers of business strategies consider only the attractive incentives of
using SaaS ignoring the impacts associated with new technologies. The need for tools and processes to
assess these impacts before adopting a SaaS solution is crucial to ensure the sustainability of the
information system, reduce uncertainty, and facilitate decision making. This article presents a framework
for evaluating the impacts of SaaS, called SIE (SaaS Impact Evaluation), which, in addition to guiding the
present research, aims to provide guidelines for data collection, data analysis, impact assessment, and
decision making about including SaaS in organizations' strategic plans.
Mining of product reviews at aspect level (ijfcstjournal)
Today's world is a world of the Internet; almost all work can be done with its help, from a simple mobile
phone recharge to the biggest business deals. People spend most of their time surfing the
Web; it has become a new source of entertainment, education,
communication, shopping, etc. Users not only use these websites but also give feedback and
suggestions that will be useful for other users. In this way a large number of user reviews are collected
on the Web, and they need to be explored, analysed, and organized for better decision making. Opinion Mining, or
Sentiment Analysis, is a Natural Language Processing and Information Extraction task that identifies the
user's views or opinions, expressed as positive, negative, or neutral comments and quotes
underlying the text. Aspect-based opinion mining is one of the levels of opinion mining: it determines the
aspects mentioned in the given reviews and classifies the review for each feature. In this paper an aspect-based opinion
mining system is proposed to classify reviews as positive, negative, or neutral for each feature.
Negation is also handled in the proposed system. Experimental results using product reviews show the
effectiveness of the system.
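As a toy illustration of polarity classification with negation handling (the paper's actual aspect detection and lexicons are not reproduced; the word lists below are made up):

```python
POS = {"good", "great", "excellent", "sharp", "fast"}
NEG = {"bad", "poor", "blurry", "slow"}
NEGATORS = {"not", "never", "no"}

def classify(review_tokens):
    """Lexicon-based polarity with a one-token negation window:
    a negator flips the polarity of the next opinion word ('not good' -> negative)."""
    score, negate = 0, False
    for tok in review_tokens:
        t = tok.lower()
        if t in NEGATORS:
            negate = True
            continue
        if t in POS:
            score += -1 if negate else 1
        elif t in NEG:
            score += 1 if negate else -1
        negate = False  # negation only reaches the immediately following word
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

An aspect-level system would first segment the review by feature (battery, screen, etc.) and run a classifier like this per segment; the one-token negation window is the simplest possible treatment.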
α-Nearness ant colony system with adaptive strategies for the traveling sales... (ijfcstjournal)
Because the ant colony algorithm easily falls into local optima, this paper presents an improved ant
colony optimization called α-AACS and reports its performance. First, we provide a concise description
of the original ant colony system (ACS) and, to address ACS's disadvantage, introduce α-nearness based on the minimum 1-tree,
which better reflects the chances of a given link being a member of an optimal tour. Then, we
improve α-nearness by computing a lower bound and propose other adaptations for ACS. Finally, we
conduct a fair comparison between our algorithm and others. The results clearly show that α-AACS has
better global searching ability in finding the best solutions, which indicates that α-AACS is an effective
approach for solving the traveling salesman problem.
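For readers unfamiliar with the baseline, a minimal ant colony system for the TSP can be sketched as below. Note that this uses the classic 1/distance heuristic rather than the paper's α-nearness measure, and all parameter values are illustrative:

```python
import math
import random

def tour_len(tour, d):
    """Length of a closed tour under distance matrix d."""
    return sum(d[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def acs_tsp(coords, n_ants=8, n_iters=100, beta=2.0, q0=0.9, rho=0.1, xi=0.1, seed=0):
    """Minimal Ant Colony System for the symmetric TSP. The paper replaces the
    1/distance desirability below with an alpha-nearness measure from the
    minimum 1-tree; this sketch keeps the classic heuristic."""
    rng = random.Random(seed)
    n = len(coords)
    d = [[max(math.dist(coords[i], coords[j]), 1e-12) for j in range(n)] for i in range(n)]
    # a greedy nearest-neighbour tour fixes the initial pheromone level tau0
    nn = [0]
    while len(nn) < n:
        nn.append(min((j for j in range(n) if j not in nn), key=lambda j: d[nn[-1]][j]))
    tau0 = 1.0 / (n * tour_len(nn, d))
    tau = [[tau0] * n for _ in range(n)]
    best, best_len = nn, tour_len(nn, d)
    for _ in range(n_iters):
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                weights = {j: tau[i][j] * (1.0 / d[i][j]) ** beta for j in unvisited}
                if rng.random() < q0:          # exploitation: greedy best edge
                    j = max(weights, key=weights.get)
                else:                          # biased exploration: roulette wheel
                    r, acc = rng.random() * sum(weights.values()), 0.0
                    for j, w in weights.items():
                        acc += w
                        if acc >= r:
                            break
                # local pheromone update: evaporate the chosen edge toward tau0
                tau[i][j] = tau[j][i] = (1 - xi) * tau[i][j] + xi * tau0
                tour.append(j)
                unvisited.remove(j)
            length = tour_len(tour, d)
            if length < best_len:
                best, best_len = tour, length
        # global update: reinforce only the best-so-far tour
        for i in range(n):
            a, b = best[i], best[(i + 1) % n]
            tau[a][b] = tau[b][a] = (1 - rho) * tau[a][b] + rho / best_len
    return best, best_len
```

Replacing the `1.0 / d[i][j]` desirability with a precomputed α-nearness value, and restricting `unvisited` to an α-nearness candidate list, is where an α-AACS-style variant would differ from this baseline.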
Agriculture is the major occupation in India, and the people involved in agriculture largely belong to the poorer
classes. The farming community is unaware of the new techniques and agro-machines
that would take the field of agriculture to greater heights. Though the farmers
work hard, they are cheated by agents in today's market. This is an opportunity to solve
the problems that farmers face in the current world. The eAgro crop marketing site will serve as a better
way for farmers to sell their products within the country with only modest knowledge of using
a website. It will provide farmers with information about the current market rates of agro-products,
their sale history, and the profits earned on a sale. The site will also help farmers learn about market
information and view the Government's agricultural schemes for farmers.
Defragmentation of Indian legal cases with... (ijfcstjournal)
The main aim of this research paper is to develop a rule-based knowledge database for a legal expert system
for the Consumer Protection Act, a domain within the Indian legal system which is often in demand. The
knowledge database developed here will further help the legal expert system determine the type of a case with
respect to the Indian Judicial System. In this paper a rule-based knowledge database is developed to
determine the type of case. The main aim of the study is to build a prototype which will be rule-based in
nature. The rule-based knowledge database development is the first phase in the development of a
comprehensive rule-based legal expert system for the Consumer Protection Act, which will be of great help in the
process of solving consumer-related cases.
Web Content Mining Based on Dom Intersection and Visual Features Conceptijceronline
Structured Data extraction from deep Web pages is a challenging task due to the underlying complex structures of such pages. Also website developer generally follows different web page design technique. Data extraction from webpage is highly useful to build our own database from number applications. A large number of techniques have been proposed to address this problem, but all of them have inherent limitations because they present different limitations and constraints for extracting data from such webpages. This paper presents two different approaches to get structured data extraction. The first approach is non-generic solution which is based on template detection using intersection of Document Object Model Tree of various webpages from the same website. This approach is giving better result in terms of efficiency and accurately locating the main data at the particular webpage. The second approach is based on partial tree alignment mechanism based on using important visual features such as length, size, and position of web table available on the webpages. This approach is a generic solution as it does not depend on one particular website and its webpage template. It is perfectly locating the multiple data regions, data records and data items within a given web page. We have compared our work's result with existing mechanism and found our result much better for number webpage
The World Wide Web (Web) is a popular and interactive medium to disseminate information today.
The Web is huge, diverse, and dynamic and thus raises the scalability, multi-media data, and temporal issues respectively.
In this world of information technology, everyone has the tendency to do business electronically. Today
lot of businesses are happening on World Wide Web (WWW), it is very important for the website owner to
provide a better platform to attract more customers for their site. Providing information in a better way is
the solution to bring more customers or users. Customer is the end-user, who accessing the information
in a way it yields some credit to the web site owners. In this paper we define web mining and present a
method to utilize web mining in a better way to know the users and website behaviour which in turn
enhance the web site information to attract more users. This paper also presents an overview of the
various researches done on pattern extraction, web content mining and how it can be taken as a catalyst
for E-business.
Data mining refers to the process of analysing the data from different perspectives and summarizing it into useful information.
Data mining software is one of the number of tools used for analysing data. It allows users to analyse from many different dimensions and angles, categorize it, and summarize the relationship identified.
Data mining is about technique for finding and describing Structural Patterns in data.
Data mining is the process of finding correlation or patterns among fields in large relational databases.
The process of extracting valid, previously unknown, comprehensible , and actionable information from large databases and using it to make crucial business decisions.
An Enhanced Approach for Detecting User's Behavior Applying Country-Wise Loca...IJSRD
The development of the web in past few years has created a lot of challenge in this field. The new work in this field is the search of the data in a search tree pattern based on tree. Various sequential mining algorithms have been devoloped till date. Web usage mining is used to operate the web server logs, that contains the navigation history of the user. Recommendater system is explained properly with the explanation of whole procedure of the recommendater system. The search results of the data leads to the proper ad efficient search. But the problem was the time utilization and the search results generated from them. So, a new local search algorithm is proposed for country-wise search that makes the searching more efficient on local results basis. This approach has lead to an advancement in the search based methods and the results generated.
A hybrid fuzzy ann approach for software effort estimationijfcstjournal
Software development effort estimation is one of the major activities in software project management.
During the project proposal stage there is high probability of estimates being made inaccurate but later on
this inaccuracy decreases. In the field of software development there are certain matrices, based on which
the effort estimation is being made. Till date various methods has been proposed for software effort
estimation, of which the non algorithmic methods, like artificial intelligence techniques have been very
successful. A Hybrid Fuzzy-ANN model, known as Adaptive Neuro Fuzzy Inference System (ANFIS) is more
suitable in such situations. The present paper is concerned with developing software effort estimation
model based on ANFIS. The present study evaluates the efficiency of the proposed ANFIS model, for which
COCOMO81 datasets has been used. The result so obtained has been compared with Artificial Neural
Network (ANN) and Intermediate COCOCMO model developed by Boehm. The results were analyzed using
Magnitude of Relative Error (MRE) and Root Mean Square Error (RMSE). It is observed that the ANFIS
provided better results than ANN and COCOMO model.
Nowadays the technological progress allows us to have highly flexible solutions, easily accessible with
lower levels of investment, which leads to many companies adopting SaaS (Software-as-a-Service) to
support their business processes. Associated with this movement and considering the advantages of SaaS, it
is important to understand whether work is being developed that is underutilized because companies are
not taking advantage of it, and in this case it is necessary to understand the reasons thereof. This
knowledge is important even for people who do not use or do not develop/provide SaaS, since sooner or
later it will be unavoidable due to current trends. In the near future, nearly all decision-makers of IT
strategies will be forced to consider adopting SaaS as an IT solution for the convenience benefits
associated with technology or market competition. At that time they will have to know how to evaluate
impacts and decide. What are the real needs in the Portuguese market? What fears and what is being done
to mitigate them? What are the implications of the adoption of SaaS? Where should we focus attention on
SaaS offerings in order to create greater value? These are questions we must answer to actually be able to
assess and decide. Often, decision-makers of business strategies consider only the attractive incentives of
using SaaS ignoring the impacts associated with new technologies. The need for tools and processes to
assess these impacts before adopting a SaaS solution is crucial to ensure the sustainability of the
information system, reduce uncertainty and facilitate decision making. This article presents a framework
for evaluating impacts of SaaS called SIE (SaaS Impact Evaluation) which in addition to guidance for the
present research, aims to provide guidelines for the collection, data analysis, impact assessment and
decision making about including SaaS on the organizations strategic plans.
Mining of product reviews at aspect levelijfcstjournal
Today’s world is a world of Internet, almost all work can be done with the help of it, from simple mobile
phone recharge to biggest business deals can be done with the help of this technology. People spent their
most of the times on surfing on the Web; it becomes a new source of entertainment, education,
communication, shopping etc. Users not only use these websites but also give their feedback and
suggestions that will be useful for other users. In this way a large amount of reviews of users are collected
on the Web that needs to be explored, analyse and organized for better decision making. Opinion Mining or
Sentiment Analysis is a Natural Language Processing and Information Extraction task that identifies the
user’s views or opinions explained in the form of positive, negative or neutral comments and quotes
underlying the text. Aspect based opinion mining is one of the level of Opinion mining that determines the
aspect of the given reviews and classify the review for each feature. In this paper an aspect based opinion
mining system is proposed to classify the reviews as positive, negative and neutral for each feature.
Negation is also handled in the proposed system. Experimental results using reviews of products show the
effectiveness of the system.
α Nearness ant colony system with adaptive strategies for the traveling sales...ijfcstjournal
Because the ant colony algorithm easily falls into local optima, this paper presents an improved ant
colony optimization called α-AACS and reports its performance. First, we provide a concise description
of the original ant colony system (ACS) and, to address its disadvantage, introduce α-nearness based on
the minimum 1-tree, which better reflects the chance of a given link being a member of an optimal tour.
Then we improve α-nearness by computing a lower bound and propose other adaptations for ACS. Finally, we
conduct a fair comparison between our algorithm and others. The results clearly show that α-AACS has
better global search ability in finding the best solutions, which indicates that α-AACS is an effective
approach for solving the traveling salesman problem.
Agriculture is the major occupation in India, and the people involved in it largely belong to the poorer
classes. The farming community is unaware of the new techniques and agro-machines that could take
agriculture to greater heights. Though farmers work hard, they are cheated by agents in today's market.
This is an opportunity to solve the problems that farmers face in the current world. The eAgro crop
marketing site will serve as a better way for farmers to sell their products within the country with only
modest knowledge of using a website. It will provide farmers with information about the current market
rates of agro-products, their sale history and the profits earned on each sale. The site will also help
farmers learn about market information and view the agricultural schemes the Government provides for
farmers.
Defragmentation of Indian legal cases with... (ijfcstjournal)
The main aim of this research paper is to develop a rule-based knowledge database for a legal expert
system for the Consumer Protection Act, a domain within the Indian legal system that is often in demand.
The knowledge database developed here will further help the legal expert system determine the type of a
case with respect to the Indian judicial system. In this paper a rule-based knowledge database is
developed to determine the type of case. The main aim of the study is to build a prototype that is
rule-based in nature. The rule-based knowledge database is the first phase in the development of a
comprehensive rule-based legal expert system for the Consumer Protection Act, which will be of great help
in solving consumer-related cases.
A comparative study on remote tracking of Parkinson's disease progression usi... (ijfcstjournal)
In recent years, applications of data mining methods have become more popular in many fields of medical
diagnosis and evaluation. Data mining methods are appropriate tools for discovering and extracting the
knowledge available in medical databases. In this study, we divided 11 data mining algorithms into five
groups and applied them to a dataset of clinical variables from patients with Parkinson's Disease (PD) to
study the disease progression. The dataset includes 22 properties of 42 people, and all of our algorithms
are applied to this dataset. The Decision Table, with a correlation coefficient of 0.9985, has the best
accuracy, and Decision Stump, with a correlation coefficient of 0.7919, has the lowest.
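The correlation coefficient used above as the accuracy measure is the standard Pearson coefficient between actual and predicted values. A minimal self-contained sketch, not tied to any particular data mining toolkit:

```python
import math

def correlation_coefficient(actual, predicted):
    """Pearson correlation between actual and predicted values, the
    accuracy measure the abstract reports (e.g. 0.9985 for Decision Table)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)
```

A value near 1 means the model's predictions track the clinical variable almost perfectly; values near 0 mean no linear relationship.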
Comparative performance analysis of two anaphora resolution systems (ijfcstjournal)
Anaphora resolution is the process of finding referents in a given discourse and is one of the complex
tasks of linguistics. This paper presents a performance analysis of two computational models that use the
Gazetteer method for resolving anaphora in the Hindi language. In the Gazetteer method, different classes
(gazettes) of elements are created; these gazettes provide external knowledge to the system. The two
models use recency and animistic factors for resolving anaphors. For the recency factor, the first model
uses the centering approach and the second uses the Lappin-Leass approach. Gazetteers are used to provide
animistic knowledge. This paper presents experimental results for both models, obtained on short Hindi
stories, news articles and biography content from Wikipedia. The accuracy of each model is analyzed, and
a conclusion is drawn about the model best suited to the Hindi language.
Migration strategies for object oriented system to component based system (ijfcstjournal)
Migration from an object-oriented system to a component-based system is not an easy task: not only do a
lot of technical changes need to be made, but numerous other issues need to be kept in mind. Component-based software development (CBSE) has been gaining popularity over the past few years and offers higher reusability. Programs built using the CBSE approach are considered suitable for new environments. These days it is universal practice to reuse components in a project to achieve better quality and save time, so moving from object orientation to CBSE seems a wise decision. A number of approaches have been introduced to implement this migration, each with its own pros and cons. This paper gives a brief review of the work of different authors in this area from 2000 to 2014.
An interactive approach to requirements prioritization using quality factors (ijfcstjournal)
As the prevalence of software increases, so do the complexity and the number of requirements associated
with a software project. This presents a dilemma for developers, who must clearly identify and prioritize
the most important requirements in order to deliver the project within a given amount of resources and
time. A number of prioritization methods have been proposed that provide consistent results, but they are
very difficult and complex to implement in practical scenarios and lack a proper structure for analyzing
the requirements. In this study, users can provide their requirements in two forms: text-based story form
and use-case form. Moreover, the existing prioritization techniques have very little or no interaction
with the users. So, in this paper an attempt has been made to make the prioritization process
user-interactive by adding a second level of prioritization: after the developer has properly analyzed
and ranked the requirements on the basis of quality attributes in the first level, the opinions of
distinct users about the requirements priority sequence are taken. The developer then calculates the
disagreement value associated with each user sequence in order to find the final priority sequence.
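The abstract does not spell out the disagreement formula, but one natural choice, sketched below purely as an assumption, is a Kendall-tau-style count of requirement pairs that the developer and a user order differently:

```python
from itertools import combinations

def disagreement(developer_seq, user_seq):
    """Count requirement pairs ordered differently in the two priority
    sequences. This is a Kendall-tau-style distance; the paper's exact
    formula is not given here, so this is an illustrative stand-in."""
    pos = {req: i for i, req in enumerate(user_seq)}
    return sum(
        1
        for a, b in combinations(developer_seq, 2)
        if pos[a] > pos[b]  # developer ranks a before b, the user does not
    )
```

Identical sequences score 0, and a fully reversed user sequence scores the maximum of n*(n-1)/2, so the user sequence with the lowest total disagreement would be closest to the final priority order.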
Distribution of maximal clique size under... (ijfcstjournal)
In this paper, we analyze the evolution of a small-world network and its subsequent transformation to a
random network using the idea of link rewiring under the well-known Watts-Strogatz model for complex
networks. Every link u-v in the regular network is considered for rewiring with a certain probability and if
chosen for rewiring, the link u-v is removed from the network and the node u is connected to a randomly
chosen node w (other than nodes u and v). Our objective in this paper is to analyze the distribution of the
maximal clique size per node by varying the probability of link rewiring and the degree per node (number
of links incident on a node) in the initial regular network. For a given probability of rewiring and initial
number of links per node, we observe the distribution of the maximal clique per node to follow a Poisson
distribution. We also observe the maximal clique size per node in the small-world network to be very close
to that of the average value and close to that of the maximal clique size in a regular network. There is no
appreciable decrease in the maximal clique size per node when the network transforms from a regular
network to a small-world network. On the other hand, when the network transforms from a small-world
network to a random network, the average maximal clique size value decreases significantly.
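The rewiring step described above can be sketched as a minimal illustration of the Watts-Strogatz model: start from a ring lattice, then rewire every link u-v with probability p by removing it and joining u to a random node w (other than u and v). Function and parameter names are our own:

```python
import random

def watts_strogatz_rewire(n, k, p, seed=None):
    """Build a ring lattice of n nodes, each linked to its k nearest
    neighbours on each side, then rewire each link u-v with probability p:
    the link is removed and u is connected to a random node w (w != u, v)
    not already adjacent to u. Returns an adjacency dict."""
    rng = random.Random(seed)
    edges = {(u, (u + d) % n) for u in range(n) for d in range(1, k + 1)}
    adj = {u: set() for u in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for u, v in list(edges):
        if rng.random() < p:
            choices = [w for w in range(n) if w not in (u, v) and w not in adj[u]]
            if not choices:
                continue  # no eligible endpoint; keep the original link
            w = rng.choice(choices)
            adj[u].discard(v)
            adj[v].discard(u)
            adj[u].add(w)
            adj[w].add(u)
    return adj
```

With p = 0 the regular lattice is untouched, small p gives the small-world regime, and p near 1 approaches a random network; since each rewire removes one link and adds one, the total number of links is preserved throughout.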
LOGMIN: A MODEL FOR CALL LOG MINING IN MOBILE DEVICES (ijfcstjournal)
In today's instant communication era, mobile phones play an important role in efficient communication at both the individual and official levels. With the drastic explosion in the number of calls received and made, there is a need to analyse patterns in these call logs to help the user make optimal use of the mobile device.
This paper proposes a model termed "LogMin" (Log Mining of Calls in Mobile devices), aimed at mining the call log in mobile phones to discover patterns and keep the user informed about trends in the log. Logging calls lets the user gain insight into patterns based on the six different parameters identified by the proposed LogMin model. The proposed model is validated with a prototype implementation on the Android platform, on which various experiments were conducted. The results of the experiments with the LogMin Android implementation validate the efficiency of the proposed model with respect to the user relevancy metric, which is computed as 96.52%.
GRAPH COLOURING PROBLEM BASED ON DISCRETE IMPERIALIST COMPETITIVE ALGORITHM (ijfcstjournal)
In graph theory, the Graph Colouring Problem (GCP) is an assignment of colours to the vertices of a given graph such that the colours on adjacent vertices are different. The GCP is known to be an optimization and NP-hard problem. The Imperialist Competitive Algorithm (ICA) is a meta-heuristic optimization and stochastic search strategy inspired by the socio-political phenomenon of imperialistic competition. The ICA contains two main operators: assimilation and imperialistic competition. The ICA has excellent capabilities such as a high convergence rate and better global optimum achievement. In this research, a discrete version of the ICA, which we call DICA, is proposed to solve the GCP. The performance of the proposed method is compared with a Genetic Algorithm (GA) on seven well-known graph colouring benchmarks. Experimental results demonstrate the superiority of DICA on these benchmarks, meaning DICA can produce optimal and valid solutions for different GCP instances.
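Whatever metaheuristic produces the candidate colouring (DICA, GA or otherwise), it must be checked for feasibility. A minimal sketch of that test, with the conflict count as the objective any GCP solver drives to zero:

```python
def colouring_conflicts(edges, colours):
    """Number of edges whose endpoints share a colour. A candidate
    colouring is a valid GCP solution exactly when this count is zero."""
    return sum(1 for u, v in edges if colours[u] == colours[v])

def is_valid_colouring(edges, colours):
    """Feasibility test for a Graph Colouring Problem candidate."""
    return colouring_conflicts(edges, colours) == 0
```

For a triangle, any assignment of three distinct colours passes, while any assignment reusing a colour on adjacent vertices fails.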
SHORT LISTING LIKELY IMAGES USING PROPOSED MODIFIED-SIFT TOGETHER WITH CONVEN... (ijfcstjournal)
The paper proposes the modified-SIFT algorithm, a modified form of the scale-invariant feature transform. The modification consists of considering successive groups of 8 rows of pixels along the height of the image. These are used to construct 8-bin histograms for magnitude and orientation individually. As a result the number of feature descriptors is significantly smaller (by 95%) than in the standard SIFT approach. Fewer feature descriptors lead to reduced accuracy. This reduction is quite drastic when searching for a single (RANK1) image match; however, accuracy improves if a band of likely images (say, a tolerance of 10%) is to be returned. The paper therefore proposes a two-stage approach:
first, modified-SIFT is used to obtain a shortlisted band of likely images; subsequently, SIFT is applied within this band to find a perfect match. This process may appear tedious, but it provides a significant reduction in search time compared to applying SIFT on the entire database, and the minor reduction in accuracy is offset by the considerable time gained while searching a large database. The
modified-SIFT algorithm, when used in conjunction with a face-cropping algorithm, can also be used to find a match against disguised images.
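The two-stage shortlist-then-refine idea can be sketched generically. Here `cheap_score` and `exact_score` are hypothetical caller-supplied similarity functions standing in for modified-SIFT and full SIFT, and the 10% band matches the tolerance mentioned above:

```python
def two_stage_match(query, database, cheap_score, exact_score, band=0.10):
    """Two-stage search: a cheap comparison shortlists the top `band`
    fraction of the database, then an expensive comparison is run only
    inside that shortlist. The scoring functions are placeholders for
    modified-SIFT and standard SIFT similarity."""
    ranked = sorted(database, key=lambda img: cheap_score(query, img), reverse=True)
    shortlist = ranked[: max(1, int(len(ranked) * band))]
    return max(shortlist, key=lambda img: exact_score(query, img))
```

The expensive matcher runs on roughly band*N items instead of N, which is where the claimed search-time reduction on large databases comes from, provided the cheap stage keeps the true match inside the band.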
Formal method techniques provide a suitable platform for software development in software systems.
Formal methods and formal verification are necessary to prove the correctness and improve the performance
of software systems at various levels of design and implementation. Security is an important issue in
computer systems. Since antivirus applications play a very important role in computer system security,
verifying these applications is essential. In this paper, we present four new approaches to antivirus
system behavior, and a behavioral model of the protection services in an antivirus system is proposed. We
divided the behavioral model into preventive behavior and control behavior and then formalized these
behaviors. Finally, using some definitions, we explain how these behaviors are mapped onto each other
using our new approaches.
There are several kinds of errors for which error-detecting and error-correcting codes have been
constructed. Solid burst errors are common in many communications. In general communication, due to long
messages, strings of solid bursts of small length may repeat within a vector itself. The concept of
repeated bursts, introduced by Beraradi, Dass and Verma [3], has opened a new area of study. They defined
2-repeated bursts and obtained results for the detection and correction of such errors.
This paper considers a new, similar kind of error, termed a '2-repeated solid burst error of length b'.
Lower and upper bounds on the number of parity checks required for the existence of codes that detect
2-repeated solid burst errors of length b or less are obtained, followed by an example of such a code.
Further, codes capable of detecting and simultaneously correcting such errors are also dealt with.
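As a rough illustration (our own sketch, not the paper's construction): an error vector falls in this class when its nonzero positions form one or two solid runs, each of length b or less. A recognizer for that pattern:

```python
def is_2_repeated_solid_burst(vector, b):
    """Check whether the error pattern in `vector` consists of at most
    two solid bursts (runs of consecutive nonzero positions), each of
    length b or less -- the error class the codes above must detect."""
    runs, length = [], 0
    for x in vector:
        if x != 0:
            length += 1
        elif length:
            runs.append(length)
            length = 0
    if length:
        runs.append(length)
    return 0 < len(runs) <= 2 and all(r <= b for r in runs)
```

So `[0, 1, 1, 0, 0, 1, 0]` is a 2-repeated solid burst of length 2 or less (runs of length 2 and 1), while three separated nonzero runs, or a single run longer than b, fall outside the class.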
Analysis of software cost estimation using... (ijfcstjournal)
The growing application of software and the resource constraints on software projects demand a more
accurate estimate of cost and effort, because of their importance in program planning, coordinated
scheduling and resource management, including the number of programmers and the software design tools and
modern modeling methods used. Effective control of investment in software development is achieved through
accurate cost estimation. Accurate Software Cost Estimation (SCE) is very difficult in the early stages
of software development because many of the input parameters that affect software effort are still vague
and uncertain. SCE, the basis of software project development planning, needs to be highly accurate: if
the estimate is less than the actual value, the confidence factor is reduced and the project may fail;
conversely, if the project is estimated at more than the actual value, the result is unhelpful investment
and a waste of resources. Deterministic methods are commonly used in the evaluation of software projects,
but the software world is very different from the world of linear variables, and nowadays nonlinear and
non-probabilistic methods should be used for performance and estimation. In this paper, we study SCE
using Fuzzy Logic (FL) and compare it with the COCOMO model. The results of our investigations show that
FL performs well as a model for SCE.
Applications of artificial immune system: a review (ijfcstjournal)
The biological immune system is a remarkable information-processing and self-learning system that offers
inspiration for building Artificial Immune Systems (AIS). During the last two decades, the field of AIS
has progressed slowly and steadily as a branch of Computational Intelligence (CI). At present, AIS
algorithms such as Negative Selection Theory, Clonal Selection Theory, Immune Network Theory, Danger
Theory and the Dendritic Cell Algorithm are widely used to solve many real-world problems in a vast range
of domains, such as Network Intrusion Detection (NID), anomaly detection, clustering, classification and
pattern recognition. This review paper critically discusses the theoretical foundations, research
methodologies and applications of AIS.
A Novel Method for Data Cleaning and User-Session Identification for Web Mining (IJMER)
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
ANALYTICAL IMPLEMENTATION OF WEB STRUCTURE MINING USING DATA ANALYSIS IN ONLI... (IAEME Publication)
In today's global business, the web has become the most important means of communication. Clients and customers can find products online, which is a benefit of doing business online. Web mining is the process of using data mining tools to analyse and extract information from Web pages and applications autonomously. Many firms use web structure mining to generate predictions and judgments for business growth, productivity, manufacturing techniques, and more, using data mining business strategies. In the online booking domain, optimal web data mining analysis of web structure is a crucial component that gives a systematic way of applying new techniques to real-time data with various levels of implications. Web structure mining focuses on the structure of the web's hyperlinks. Linkage administration that is done correctly can lead to future connections, which can in turn increase the predictive performance of learnt models. With increased interest in Web mining, structural analysis research has expanded, resulting in a new research area at the crossroads of network analysis, hyperlink and web mining, structural training, empirical software design techniques, and graph mining. Web structure mining is the process of determining structural data from the web. The proposed WSM approach is a system for finding the structure of data stored on the Web. Web structure mining can help clients recover significant records by analysing the link-oriented structure of Web content. Web structure mining has become one of the most important resources for information extraction and knowledge discovery as the amount of data available online has increased.
The Web is a collection of inter-related files on one or more web servers, while web mining means
extracting valuable information from web databases. Web mining is one of the data mining domains in which
data mining techniques are used for extracting information from web servers. Web data includes web pages,
web links, objects on the web and web logs. Web mining is used to understand customer behaviour and to
evaluate a particular website based on the information stored in web log files. Web mining employs data
mining techniques, namely classification, clustering and association rules. It has beneficial application
areas such as electronic commerce, e-learning, e-government, e-policies, e-democracy, electronic
business, security, crime investigation and digital libraries. Retrieving a required web page from the
web efficiently and effectively becomes a challenging task because the web is made up of unstructured
data, which delivers a large amount of information and increases the complexity of dealing with
information from different web service providers. It becomes very hard to find, extract, filter or
evaluate the relevant information for users. In this paper, we study the basic concepts of web mining,
its classification, processes and issues. In addition, the paper analyzes the web mining research
challenges.
International Journal of Engineering Research and Development (IJERD), IJERD Editor
Implementation of Intelligent Web Server Monitoring (iosrjce)
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
A Comparative Study of Recommendation System Using Web Usage Mining (Editor IJMTER)
Web mining is one of the developing fields in research. The exact purpose of using the Web is to get
useful material from sites. To reduce users' work time, the Web Usage Mining (WUM) technique was
introduced; it uses web page recommendation for the web requests coming from the user. For the
recommendation system in WUM, various authors have introduced different algorithms and techniques to
improve user interest in surfing the Web. Web log files are used to define the user's interests and the
next page to recommend for viewing. The data stored in web log files consist of a large amount of eroded,
incomplete, and unnecessary information, so web log files have to be pre-processed, customized, and
cleaned. In this paper we survey different recommendation techniques to identify the issues in web
surfing and to improve WUM pre-processing for pattern mining and analysis.
The International Journal of Engineering and Science (The IJES), theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
A Web Extraction Using Soft Algorithm for Trinity Structure (iosrjce)
Framework for web personalization using web mining (eSAT Journals)
Abstract: The WWW is a very large source and provider of information, and the number of users accessing web sites increases every day. For efficient and effective handling, web mining coupled with suggestion techniques puts personalized content at users' disposal. Web mining is an area of data mining dealing with the extraction of interesting knowledge from the Web. Here we present a comprehensive overview of the personalization process based on Web usage mining, covering the host of Web usage mining activities required for this process, including the pre-processing and integration of data from multiple sources, and the common pattern discovery techniques applied to the integrated usage data. Index Terms: Web Usage Mining, Data Mining, Personalization, Pattern Discovery, Web Mining, Web Personalization
Similar to Web personalization using clustering of web usage data (20)
ENHANCING ENGLISH WRITING SKILLS THROUGH INTERNET-PLUS TOOLS IN THE PERSPECTI... (ijfcstjournal)
This investigation delves into incorporating a hybridized memetic strategy within the framework of English
composition pedagogy, leveraging Internet Plus resources. The study aims to provide an in-depth analysis
of how this method influences students’ writing competence, their perceptions of writing, and their
enthusiasm for English acquisition. Employing an explanatory research design that combines qualitative
and quantitative methods, the study collects data through surveys, interviews, and observations of students’
writing performance before and after the intervention. Findings demonstrate a beneficial impact of
integrating the memetic approach alongside Internet Plus tools on the writing aptitude of English as a
Foreign Language (EFL) learners. Students reported increased engagement with writing, attributing it to
the use of Internet Plus tools. They also expressed that the memetic approach facilitated a deeper
understanding of cultural and social contexts in writing. Furthermore, the findings highlight a significant
improvement in students’ writing skills following the intervention. This study provides significant insights
into the practical implementation of the memetic approach within English writing education, highlighting
the beneficial contribution of Internet Plus tools in enriching students' learning journeys.
A SURVEY TO REAL-TIME MESSAGE-ROUTING NETWORK SYSTEM WITH KLA MODELLINGijfcstjournal
Messages routing over a network is one of the most fundamental concept in communication which requires
simultaneous transmission of messages from a source to a destination. In terms of Real-Time Routing, it
refers to the addition of a timing constraint in which messages should be received within a specified time
delay. This study involves Scheduling, Algorithm Design and Graph Theory which are essential parts of
the Computer Science (CS) discipline. Our goal is to investigate an innovative and efficient way to present
these concepts in the context of CS Education. In this paper, we will explore the fundamental modelling of
routing real-time messages on networks. We study whether it is possible to have an optimal on-line
algorithm for the Arbitrary Directed Graph network topology. In addition, we will examine the message
routing’s algorithmic complexity by breaking down the complex mathematical proofs into concrete, visual
examples. Next, we explore the Unidirectional Ring topology in finding the transmission’s
“makespan”.Lastly, we propose the same network modelling through the technique of Kinesthetic Learning
Activity (KLA). We will analyse the data collected and present the results in a case study to evaluate the
effectiveness of the KLA approach compared to the traditional teaching method.
A COMPARATIVE ANALYSIS ON SOFTWARE ARCHITECTURE STYLESijfcstjournal
Software architecture is the structural solution that achieves the overall technical and operational
requirements for software developments. Software engineers applied software architectures for their
software system developments; however, they worry the basic benchmarks in order to select software
architecture styles, possible components, integration methods (connectors) and the exact application of
each style.
The objective of this research work was a comparative analysis of software architecture styles by its
weakness and benefits in order to select by the programmer during their design time. Finally, in this study,
the researcher has been identified architectural styles, weakness, and Strength and application areas with
its component, connector and Interface for the selected architectural styles.
SYSTEM ANALYSIS AND DESIGN FOR A BUSINESS DEVELOPMENT MANAGEMENT SYSTEM BASED...ijfcstjournal
A design of a sales system for professional services requires a comprehensive understanding of the
dynamics of sale cycles and how key knowledge for completing sales is managed. This research describes
a design model of a business development (sales) system for professional service firms based on the Saudi
Arabian commercial market, which takes into account the new advances in technology while preserving
unique or cultural practices that are an important part of the Saudi Arabian commercial market. The
design model has combined a number of key technologies, such as cloud computing and mobility, as an
integral part of the proposed system. An adaptive development process has also been used in implementing
the proposed design model.
AN ALGORITHM FOR SOLVING LINEAR OPTIMIZATION PROBLEMS SUBJECTED TO THE INTERS...ijfcstjournal
Frank t-norms are parametric family of continuous Archimedean t-norms whose members are also strict
functions. Very often, this family of t-norms is also called the family of fundamental t-norms because of the
role it plays in several applications. In this paper, optimization of a linear objective function with fuzzy
relational inequality constraints is investigated. The feasible region is formed as the intersection of two
inequality fuzzy systems defined by frank family of t-norms is considered as fuzzy composition. First, the
resolution of the feasible solutions set is studied where the two fuzzy inequality systems are defined with
max-Frank composition. Second, some related basic and theoretical properties are derived. Then, a
necessary and sufficient condition and three other necessary conditions are presented to conceptualize the
feasibility of the problem. Subsequently, it is shown that a lower bound is always attainable for the optimal
objective value. Also, it is proved that the optimal solution of the problem is always resulted from the
unique maximum solution and a minimal solution of the feasible region. Finally, an algorithm is presented
to solve the problem and an example is described to illustrate the algorithm. Additionally, a method is
proposed to generate random feasible max-Frank fuzzy relational inequalities. By this method, we can
easily generate a feasible test problem and employ our algorithm to it.
LBRP: A RESILIENT ENERGY HARVESTING NOISE AWARE ROUTING PROTOCOL FOR UNDER WA...ijfcstjournal
Underwater detector network is one amongst the foremost difficult and fascinating analysis arenas that
open the door of pleasing plenty of researchers during this field of study. In several under water based
sensor applications, nodes are square measured and through this the energy is affected. Thus, the mobility
of each sensor nodes are measured through the water atmosphere from the water flow for sensor based
protocol formations. Researchers have developed many routing protocols. However, those lost their charm
with the time. This can be the demand of the age to supply associate degree upon energy-efficient and
ascendable strong routing protocol for under water actuator networks. During this work, the authors tend
to propose a customary routing protocol named level primarily based routing protocol (LBRP), reaching to
offer strong, ascendable and energy economical routing. LBRP conjointly guarantees the most effective use
of total energy consumption and ensures packet transmission which redirects as an additional reliability in
compare to different routing protocols. In this work, the authors have used the level of forwarding node,
residual energy and distance from the forwarding node to the causing node as a proof in multicasting
technique comparisons. Throughout this work, the authors have got a recognition result concerning about
86.35% on the average in node multicasting performances. Simulation has been experienced each in a
wheezy and quiet atmosphere which represents the endorsement of higher performance for the planned
protocol.
STRUCTURAL DYNAMICS AND EVOLUTION OF CAPSULE ENDOSCOPY (PILL CAMERA) TECHNOLO...ijfcstjournal
This research paper examined and re-evaluates the technological innovation, theory, structural dynamics
and evolution of Pill Camera(Capsule Endoscopy) technology in redirecting the response manner of small
bowel (intestine) examination in human. The Pill Camera (Endoscopy Capsule) is made up of sealed
biocompatible material to withstand acid, enzymes and other antibody chemicals in the stomach is a
technology that helps the medical practitioners especially the general physicians and the
gastroenterologists to examine and re-examine the intestine for possible bleeding or infection. Before the
advent of the Pill camera (Endoscopy Capsule) the colonoscopy was the local method used but research
showed that some parts (bowel) of the intestine can’t be reach by mere traditional method hence the need
for Pill Camera. Countless number of deaths from stomach disease such as polyps, inflammatory bowel
(Crohn”s diseases), Cancers, Ulcer, anaemia and tumours of small intestines which ordinary would have
been detected by sophisticated technology like Pill Camera has become norm in the developing nations.
Nevertheless, not only will this paper examine and re-evaluate the Pill Camera Innovation, theory,
Structural dynamics and evolution it unravelled and aimed to create awareness for both medical
practitioners and the public.
AN OPTIMIZED HYBRID APPROACH FOR PATH FINDINGijfcstjournal
Path finding algorithm addresses problem of finding shortest path from source to destination avoiding
obstacles. There exist various search algorithms namely A*, Dijkstra's and ant colony optimization. Unlike
most path finding algorithms which require destination co-ordinates to compute path, the proposed
algorithm comprises of a new method which finds path using backtracking without requiring destination
co-ordinates. Moreover, in existing path finding algorithm, the number of iterations required to find path is
large. Hence, to overcome this, an algorithm is proposed which reduces number of iterations required to
traverse the path. The proposed algorithm is hybrid of backtracking and a new technique(modified 8-
neighbor approach). The proposed algorithm can become essential part in location based, network, gaming
applications. grid traversal, navigation, gaming applications, mobile robot and Artificial Intelligence.
EAGRO CROP MARKETING FOR FARMING COMMUNITYijfcstjournal
The Major Occupation in India is the Agriculture; the people involved in the Agriculture belong to the poor
class and category. The people of the farming community are unaware of the new techniques and Agromachines, which would direct the world to greater heights in the field of agriculture. Though the farmers
work hard, they are cheated by agents in today’s market. This serves as a opportunity to solve
all the problems that farmers face in the current world. The eAgro crop marketing will serve as a better
way for the farmers to sell their products within the country with some mediocre knowledge about using
the website. This would provide information to the farmers about current market rate of agro-products,
their sale history and profits earned in a sale. This site will also help the farmers to know about the market
information and to view agricultural schemes of the Government provided to farmers.
EDGE-TENACITY IN CYCLES AND COMPLETE GRAPHSijfcstjournal
It is well known that the tenacity is a proper measure for studying vulnerability and reliability in graphs.
Here, a modified edge-tenacity of a graph is introduced based on the classical definition of tenacity.
Properties and bounds for this measure are introduced; meanwhile edge-tenacity is calculated for cycle
graphs and also for complete graphs.
COMPARATIVE STUDY OF DIFFERENT ALGORITHMS TO SOLVE N QUEENS PROBLEMijfcstjournal
This Paper provides a brief description of the Genetic Algorithm (GA), the Simulated Annealing (SA)
Algorithm, the Backtracking (BT) Algorithm and the Brute Force (BF) Search Algorithm and attempts to
explain the way as how the Proposed Genetic Algorithm (GA), the Proposed Simulated Annealing (SA)
Algorithm using GA, the Backtracking (BT) Algorithm and the Brute Force (BF) Search Algorithm can be
employed in finding the best solution of N Queens Problem and also, makes a comparison between these
four algorithms. It is entirely a review based work. The four algorithms were written as well as
implemented. From the Results, it was found that, the Proposed Genetic Algorithm (GA) performed better
than the Proposed Simulated Annealing (SA) Algorithm using GA, the Backtracking (BT) Algorithm and
the Brute Force (BF) Search Algorithm and it also provided better fitness value (solution) than the
Proposed Simulated Annealing Algorithm (SA) using GA, the Backtracking (BT) Algorithm and the Brute
Force (BF) Search Algorithm, for different N values. Also, it was noticed that, the Proposed GA took more
time to provide result than the Proposed SA using GA.
PSTECEQL: A NOVEL EVENT QUERY LANGUAGE FOR VANET’S UNCERTAIN EVENT STREAMSijfcstjournal
In recent years, the complex event processing technology has been used to process the VANET’s temporal
and spatial event streams. However, we usually cannot get the accurate data because the device sensing
accuracy limitations of the system. We only can get the uncertain data from the complex and limited
environment of the VANET. Because the VANET’s event streams are consist of the uncertain data, so they
are also uncertain. How effective to express and process these uncertain event streams has become the core
issue for the VANET system. To solve this problem, we propose a novel complex event query language
PSTeCEQL (probabilistic spatio-temporal constraint event query language). Firstly, we give the definition
of the possible world model of VANET’s uncertain event streams. Secondly, we propose an event query
language PSTeCEQL and give the syntax and the operational semantics of the language. Finally, we
illustrate the validity of the PSTeCEQL by an example.
CLUSTBIGFIM-FREQUENT ITEMSET MINING OF BIG DATA USING PRE-PROCESSING BASED ON...ijfcstjournal
Now a day enormous amount of data is getting explored through Internet of Things (IoT) as technologies
are advancing and people uses these technologies in day to day activities, this data is termed as Big Data
having its characteristics and challenges. Frequent Itemset Mining algorithms are aimed to disclose
frequent itemsets from transactional database but as the dataset size increases, it cannot be handled by
traditional frequent itemset mining. MapReduce programming model solves the problem of large datasets
but it has large communication cost which reduces execution efficiency. This proposed new pre-processed
k-means technique applied on BigFIM algorithm. ClustBigFIM uses hybrid approach, clustering using kmeans algorithm to generate Clusters from huge datasets and Apriori and Eclat to mine frequent itemsets
from generated clusters using MapReduce programming model. Results shown that execution efficiency of
ClustBigFIM algorithm is increased by applying k-means clustering algorithm before BigFIM algorithm as
one of the pre-processing technique.
A MUTATION TESTING ANALYSIS AND REGRESSION TESTINGijfcstjournal
Software testing is a testing which conducted a test to provide information to client about the quality of the
product under test. Software testing can also provide an objective, independent view of the software to
allow the business to appreciate and understand the risks of software implementation. In this paper we
focused on two main software testing –mutation testing and mutation testing. Mutation testing is a
procedural testing method, i.e. we use the structure of the code to guide the test program, A mutation is a
little change in a program. Such changes are applied to model low level defects that obtain in the process
of coding systems. Ideally mutations should model low-level defect creation. Mutation testing is a process
of testing in which code is modified then mutated code is tested against test suites. The mutations used in
source code are planned to include in common programming errors. A good unit test typically detects the
program mutations and fails automatically. Mutation testing is used on many different platforms, including
Java, C++, C# and Ruby. Regression testing is a type of software testing that seeks to uncover
new software bugs, or regressions, in existing functional and non-functional areas of a system after
changes such as enhancements, patches or configuration changes, have been made to them. When defects
are found during testing, the defect got fixed and that part of the software started working as needed. But
there may be a case that the defects that fixed have introduced or uncovered a different defect in the
software. The way to detect these unexpected bugs and to fix them used regression testing. The main focus
of regression testing is to verify that changes in the software or program have not made any adverse side
effects and that the software still meets its need. Regression tests are done when there are any changes
made on software, because of modified functions.
GREEN WSN- OPTIMIZATION OF ENERGY USE THROUGH REDUCTION IN COMMUNICATION WORK...ijfcstjournal
Advances in micro fabrication and communication techniques have led to unimaginable proliferation of
WSN applications. Research is focussed on reduction of setup operational energy costs. Bulk of operational
energy costs are linked to communication activities of WSN. Any progress towards energy efficiency has a
potential of huge savings globally. Therefore, every energy efficient step is an endeavour to cut costs and
‘Go Green’. In this paper, we have proposed a framework to reduce communication workload through: Innetwork compression and multiple query synthesis at the base-station and modification of query syntax
through introduction of Static Variables. These approaches are general approaches which can be used in
any WSN irrespective of application.
A NEW MODEL FOR SOFTWARE COSTESTIMATION USING HARMONY SEARCHijfcstjournal
Accurate and realistic estimation is always considered to be a great challenge in software industry.
Software Cost Estimation (SCE) is the standard application used to manage software projects. Determining
the amount of estimation in the initial stages of the project depends on planning other activities of the
project. In fact, the estimation is confronted with a number of uncertainties and barriers’, yet assessing the
previous projects is essential to solve this problem. Several models have been developed for the analysis of
software projects. But the classical reference method is the COCOMO model, there are other methods
which are also applied such as Function Point (FP), Line of Code(LOC); meanwhile, the expert`s opinions
matter in this regard. In recent years, the growth and the combination of meta-heuristic algorithms with
high accuracy have brought about a great achievement in software engineering. Meta-heuristic algorithms
which can analyze data from multiple dimensions and identify the optimum solution between them are
analytical tools for the analysis of data. In this paper, we have used the Harmony Search (HS)algorithm for
SCE. The proposed model which is a collection of 60 standard projects from Dataset NASA60 has been
assessed.The experimental results show that HS algorithm is a good way for determining the weight
similarity measures factors of software effort, and reducing the error of MRE.
AGENT ENABLED MINING OF DISTRIBUTED PROTEIN DATA BANKSijfcstjournal
Mining biological data is an emergent area at the intersection between bioinformatics and data mining
(DM). The intelligent agent based model is a popular approach in constructing Distributed Data Mining
(DDM) systems to address scalable mining over large scale distributed data. The nature of associations
between different amino acids in proteins has also been a subject of great anxiety. There is a strong need to
develop new models and exploit and analyze the available distributed biological data sources. In this study,
we have designed and implemented a multi-agent system (MAS) called Agent enriched Quantitative
Association Rules Mining for Amino Acids in distributed Protein Data Banks (AeQARM-AAPDB). Such
globally strong association rules enhance understanding of protein composition and are desirable for
synthesis of artificial proteins. A real protein data bank is used to validate the system.
International Journal on Foundations of Computer Science & Technology (IJFCST)ijfcstjournal
International Journal on Foundations of Computer Science & Technology (IJFCST) is a Bi-monthly peer-reviewed and refereed open access journal that publishes articles which contribute new results in all areas of the Foundations of Computer Science & Technology. Over the last decade, there has been an explosion in the field of computer science to solve various problems from mathematics to engineering. This journal aims to provide a platform for exchanging ideas in new emerging trends that needs more focus and exposure and will attempt to publish proposals that strengthen our goals. Topics of interest include, but are not limited to the following:
Because the technology is used largely in the last decades; cybercrimes have become a significant
international issue as a result of the huge damage that it causes to the business and even to the ordinary
users of technology. The main aims of this paper is to shed light on digital crimes and gives overview about
what a person who is related to computer science has to know about this new type of crimes. The paper has
three sections: Introduction to Digital Crime which gives fundamental information about digital crimes,
Digital Crime Investigation which presents different investigation models and the third section is about
Cybercrime Law.
DISTRIBUTION OF MAXIMAL CLIQUE SIZE UNDER THE WATTS-STROGATZ MODEL OF EVOLUTI...ijfcstjournal
In this paper, we analyze the evolution of a small-world network and its subsequent transformation to a
random network using the idea of link rewiring under the well-known Watts-Strogatz model for complex
networks. Every link u-v in the regular network is considered for rewiring with a certain probability and if
chosen for rewiring, the link u-v is removed from the network and the node u is connected to a randomly
chosen node w (other than nodes u and v). Our objective in this paper is to analyze the distribution of the
maximal clique size per node by varying the probability of link rewiring and the degree per node (number
of links incident on a node) in the initial regular network. For a given probability of rewiring and initial
number of links per node, we observe the distribution of the maximal clique per node to follow a Poisson
distribution. We also observe the maximal clique size per node in the small-world network to be very close
to that of the average value and close to that of the maximal clique size in a regular network. There is no
appreciable decrease in the maximal clique size per node when the network transforms from a regular
network to a small-world network. On the other hand, when the network transforms from a small-world
network to a random network, the average maximal clique size value decreases significantly
About
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Technical Specifications
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
Key Features
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface
• Compatible with MAFI CCR system
• Copatiable with IDM8000 CCR
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
Application
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
International Journal in Foundations of Computer Science & Technology (IJFCST), Vol.4, No.5, September 2014
WEB PERSONALIZATION USING CLUSTERING OF
WEB USAGE DATA
Bhawesh Kumar Thakur 1, Syed Qamar Abbas2 and Mohd. Rizwan Beg 3
1Department of Computer Science & Engineering, B.I.E.T., U.P.T.U, Lucknow, India
2A.I.M.T.,U.P.T.U Lucknow, India
3Department of Computer Science & Engineering, Integral University, Lucknow, India
ABSTRACT
The exponential growth in the number and complexity of information resources and services on the Web
has made log data an indispensable resource for characterizing the users of Web-based environments.
Through approximation, related Web data can be organized into a hierarchical structure. This
hierarchical structure can then be used as the input for a variety of data mining tasks such as clustering,
association rule mining, sequence mining, etc.
In this paper, we present an approach for dynamically personalizing a user's Web environment as he
interacts with the Web, by clustering Web usage data using a concept hierarchy. The system infers
knowledge from the web server's access logs by means of data mining and Web usage mining techniques
to extract information about users. The extracted knowledge is used to offer users a personalized view of
the services.
KEYWORDS
Web Mining, Web Usage Data, Clustering, Personalization.
1. INTRODUCTION
Data mining provides the capability to analyze large amounts of data and extract information that is
useful for statistical as well as business purposes. Nowadays, our capacity for generating and
collecting data has been increasing exponentially. Contributing factors include the
computerization of many business, scientific, and government transactions, advances in data
collection, the widespread use of bar codes, satellite remote sensing data, etc. Besides this, heavy
access to the World Wide Web as a knowledge repository provides a large amount of data and
information. This extraordinary growth in collected data has generated an immediate requirement
for new automated platforms that can intelligently help us convert huge collections of data
into valuable information and knowledge.
Data mining aims at discovering valuable information that is hidden in conventional databases.
Data mining applied to the Web has the potential to be quite beneficial. The emerging field of Web
mining aims at finding and extracting relevant information that is hidden in Web-related data.
Etzioni (1996) is believed to have coined the term "Web mining", which he described as: "Web
mining is the use of data mining techniques to automatically discover and extract information
from World Wide Web documents and services."
Web mining is the mining of data related to the World Wide Web. It is categorized into three
active research areas according to which part of the Web data is mined. Among them, usage mining,
also known as web-log mining, studies user access information from logged server data in order
to extract interesting usage patterns [1].
DOI: 10.5121/ijfcst.2014.4507
Web mining is commonly divided into the following three sub-areas:
1. Web Content Mining: It consists of a number of techniques for searching the Web for
documents whose content meets a certain criterion. It is an application of data mining techniques
to unstructured or semi-structured text.
2. Web Structure Mining: It tries to extract information from the hyperlink structure between
different pages, using the hyperlink structure of the Web as an (additional) information source.
3. Web Usage Mining: Web usage mining is defined as "the application of data mining
techniques to discover usage patterns from Web data". It is the analysis of user interaction with a
web server.
The commonly used Web mining technologies are user access pattern analysis, Web document
clustering, classification, and information filtering. There are, however, some deficiencies: most
Web search systems follow a simple search model that focuses on retrieval efficiency and ignores
retrieval accuracy, owing to the inherent subjectivity, inaccuracy, and uncertainty of user
queries. Soft decision-making is not available, and the relevance of a document is only one of its
attributes and does not have very clear boundaries.
Clustering is the process of grouping the data into classes or clusters so that objects within a
cluster have high similarity in comparison to one another, but are very dissimilar to objects in
other clusters. Dissimilarities are assessed based on the attribute values describing the objects.
Clustering has its roots in many areas, including data mining, statistics, biology, and machine
learning. Objects in data mining can have a large number of attributes, and clustering such high
dimensional data creates greater difficulties than in supervised learning. In decision trees,
irrelevant attributes are simply not selected for decision nodes; in clustering, high dimensionality
poses a different problem. First, irrelevant attributes dilute any indication of clustering tendency.
This is also possible in a low dimensional space, but the probability of occurrence and the
number of irrelevant attributes increase with dimension. The second problem is the
dimensionality curse, that is, a lack of separation of data in a high dimensional space.
Web mining is used to automatically discover and extract information from Web-based data
sources such as documents, logs, services, and user profiles [8]. A usage-based Web
personalization system utilizes Web data in order to modify a Web site [9].
According to the Personalization Consortium, the main reasons for using information technology
to provide personalization are [10]:
• to anticipate needs and thus serve customer requirements in a better manner;
• to make the interaction efficient and satisfying for both parties;
• to build a relationship with customers that encourages them to return for subsequent visits.
Personalization is the process of collecting and storing information about website users, analyzing
the information related to them, and, on the basis of this analysis, delivering the proper information
to each visitor at the right time [11].
2. PREPROCESSING OF WEB DATA
In the WWW (World Wide Web) environment, a large amount of traffic can be generated
between clients and web server. The traffic from a client to a server is a URL. The traffic from
server to client is a named HTML file that will be interpreted and displayed on the client screen.
The web usage log is usually not in a format that is directly usable by mining applications; the
data may need to be reformatted and cleaned. Steps that are part of the preprocessing phase
include cleansing, user identification, session identification, path completion, and formatting.
• Let P be a set of literals, called pages or clicks, and U be a set of users. A log is a set of
triples {(u1, p1, t1), …, (un, pn, tn)} where ui ∈ U, pi ∈ P, and ti is a timestamp.
Standard log data consist of the following: source site, destination site, and timestamp. A
common technique for a server site is to divide the log records into sessions.
A session is a set of page references from one source site during one logical period. Historically,
a session was identified by a user logging into a computer, performing work, and then logging
off; the login and logoff represent the logical start and end of the session.
• Data Cleaning
Cleaning a server log to remove irrelevant items is very important for Web log analysis. The
discovered information is only useful if the data in the server log gives a correct picture of the
user accesses to the Web site. Note that under HTTP/1.0, an individual connection is maintained
for each file requested from the Web server.
• User Identification
Next, unique users must be identified. This is a complex task because of the existence of
caches, firewalls, and proxy servers. Web usage mining methods that rely on user co-operation,
or on automatic identification of the web user, are ways to deal with this problem.
• Session Identification
For logs that cover a long duration of time, it is quite likely that users will visit the Web site
multiple times. The objective of session identification is to break the page visits of each
user into individual sessions. The simplest method is a timeout: if the time between page requests
exceeds a certain threshold, it is assumed that the user is starting a new session. Many commercial
products use 15-30 minutes as a default timeout.
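As an illustrative sketch of this timeout rule (the function and parameter names below are ours, not from any standard tool), a user's requests can be split into sessions as follows:

```python
from datetime import datetime, timedelta

def sessionize(requests, timeout_minutes=30):
    """Split one user's (timestamp, url) requests into sessions.

    A new session starts whenever the gap between two consecutive
    requests exceeds the timeout threshold.
    """
    timeout = timedelta(minutes=timeout_minutes)
    sessions, current, last_time = [], [], None
    for ts, url in sorted(requests):          # process in time order
        if last_time is not None and ts - last_time > timeout:
            sessions.append(current)          # gap too large: close session
            current = []
        current.append(url)
        last_time = ts
    if current:
        sessions.append(current)
    return sessions
```

With a 30-minute timeout, two requests an hour apart fall into separate sessions, while requests a minute apart share one session.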
• Path Completion
Another problem in reliably identifying unique user sessions is determining if there are vital
accesses that are not recorded in the web access log. This problem is referred to as path
completion. Methods similar to those used for user identification can be used for path completion.
• Formatting
Once the appropriate pre-processing steps have been applied to the server log, a final preparation
module can be used to properly format the sessions or transactions for the type of data mining to
be accomplished.
Let P = (p1, p2, …, pu) be the sequence of Web pages accessed from a given IP address between
t1 and tu. Then, a user session v(t1, tf), tf ≤ tu, is defined as: v(t1, tf) = (p1, p2, …, pf) :
(Δt = tj − tj−1 ≤ δ, 1 < j ≤ f) ∧ (f = u ∨ tf+1 − tf > δ), where δ is a predefined time threshold [3].
There are many problems associated with the pre-processing activities, and most of these
problems centre around the correct identification of the actual user. User identification is
complicated by the use of proxy servers, client side caching, and corporate firewalls. Cookies can
be used to assist in identifying a single user regardless of machine used to access the web.
3. CLUSTERING HIGH DIMENSIONAL WEB USAGE DATA
Data mining objects can have hundreds of attributes, and clustering in such high dimensional
spaces is considerably harder than predictive learning. In decision trees, irrelevant attributes are
simply not selected for node splitting, and they have little impact on Naïve Bayes. In clustering,
however, high dimensionality introduces a twofold problem. First, irrelevant attributes dilute any
evidence of clustering tendency, under whatever definition of similarity; if no clusters are
actually present, searching for them is futile. The same holds in low dimensional spaces, but both
the probability of occurrence and the number of irrelevant attributes grow with dimension.
Second, the dimensionality curse is, loosely speaking, a lack of data separation in high
dimensional space. Formally, nearest-neighbour queries become unstable: the distance to the bulk
of the points is almost the same as the distance to the nearest neighbour. This effect becomes
severe above roughly fifteen dimensions. Distance-based cluster formation therefore becomes
unreliable in such conditions.
• Dimensionality Reduction
Most clustering algorithms rely on spatial indices over the dataset for rapid search of nearest
neighbours. As a result, index performance is a good proxy for the impact of the dimensionality
curse. Indices work well up to roughly sixteen dimensions, and such indices are used in clustering
algorithms. For dimensions d > 20 their performance degrades to the level of sequential search
(though newer indices achieve significantly higher limits). Hence, data with more than sixteen
attributes can reasonably be called high dimensional.
Web clustering based on site contents easily yields two hundred to one thousand
attributes (pages/contents) for modest Web sites. Genomic and biological data routinely exceed
two thousand to five thousand attributes. In general, data mining and information retrieval deal
with thousands of attributes. Two approaches are used to handle high dimensionality:
(1) attribute transformation, and
(2) domain decomposition.
4. CLUSTERING OF WEB USAGE DATA USING CONCEPT HIERARCHY
Clustering data in a high dimensional space is of great interest in many data mining applications.
Here we present a method for personalizing the web user environment by clustering web usage
data in a high dimensional space. In this approach, the relationships present in the web usage
data are mapped onto a fuzzy proximity relation over user transactions. The user transactions are
then clustered using the concept hierarchy of web usage data and the similarity relation of user
transactions.
• A user transaction is a sequence of URLs. Let the set of user transactions be:
A = {T1, T2, T3, …, Tm}
• Let U be the set of unique URLs appearing in the log data for all transactions:
U = {URL1, URL2, URL3, …, URLn}
Here, each Ti is a non-empty subset of U, the set of unique URLs appearing in the pre-processed
log.
4.1. Concept hierarchy of web usage data
Concept hierarchies play an important role in the knowledge discovery process, as they specify
background and domain knowledge and help in better mining of the data. In the proposed
approach, clusters of user transactions are formed from the pre-processed web usage log, and
these can be arranged in a hierarchy.
• A concept hierarchy is a poset (partially ordered set) (H, <) where H is a finite set of
concepts and < is a partial order on H. [4]
4.2. Clustering user transactions
Clustering of user transactions aims to organize the transactions into groups with similar
characteristics that exhibit similar patterns. In this method, user transactions are mapped into a
multi-dimensional space as vectors of URLs. Clustering techniques then partition this space into
groups of URLs that are near each other under a distance measure. In the context of web-based
transactions, each cluster represents a set of transactions that are similar based on co-occurrence
patterns of URL references [5]. Such knowledge is especially useful for market segmentation in
E-Commerce applications or for providing personalized web content to users.
• A pair (A, B) is a (formal) concept of (D, T, R) if and only if
A ⊆ D, B ⊆ T, A′ = B ∧ B′ = A
where
A′ = {t ∈ T | (d, t) ∈ R for all d ∈ A}
B′ = {d ∈ D | (d, t) ∈ R for all t ∈ B} [6]
• A proximity relation is a mapping S : D × D → [0, 1] such that for all x, y ∈ D the following
rules hold:
S(x, x) = 1 (reflexivity)
S(x, y) = S(y, x) (symmetry)
A fuzzy relation that is reflexive and symmetric is called a fuzzy similarity relation [2].
• α-Similarity:
If S is a proximity relation on D, then given an α ∈ [0, 1], two elements x, z ∈ D are α-similar
(denoted x Sα z) if and only if S(x, z) ≥ α, and are said to be α-proximate (denoted x Sα+ z)
if and only if either x Sα z or there exists a sequence y1, y2, y3, …, yn ∈ D such that
x Sα y1, y1 Sα y2, y2 Sα y3, …, yn Sα z
• Indiscernibility:
If S : D × D → [0, 1] is a proximity relation, then Sα+ fulfils the properties of reflexivity and
symmetry. By the α-similarity condition, clusters (of transactions) merge along chains of
α-similar pairs, which supplies transitivity.
Hence, Sα+ is an equivalence relation.
Let the children of a transaction T be denoted Child(T). The similarity of two transactions Ti
and Tj is defined as [4]:
Sim(Ti, Tj) = |Child(Ti) ∩ Child(Tj)| / |Child(Ti) ∪ Child(Tj)|
Let A = {T1, T2, T3, …, Tm} be the user transactions and U = {URL1, URL2, URL3, …, URLn}
be the set of unique URLs in all the transactions, where Ti ⊆ U for i = 1 to m. For all Ti, Tj ∈ A,
we define a fuzzy relation R on A as
R(Ti, Tj) = Sim(Ti, Tj)
From the concept of indiscernibility, it can be seen that the similarity of two transactions Ti and
Tj is a number between 0 and 1. When the similarity of two transactions is 0, Ti and Tj are
completely dissimilar. On the other hand, if the similarity is 1, the two transactions are
completely similar.
The measure of similarity gives information about users' browsing patterns. The web browsing of
any two users may not be exactly similar, but the users may share common sites of interest. Based
on the above definition, a similarity matrix can be calculated and used as the basis for clustering.
In the context of clustering user transactions, the huge number of dimensions corresponds to the
URLs present in the access logs.
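As an illustrative sketch (a hypothetical helper, not code from the paper), the similarity matrix can be computed directly from the transaction URL sets:

```python
def sim(ti, tj):
    """Similarity of two transactions: |intersection| / |union| of their URL sets."""
    ti, tj = set(ti), set(tj)
    union = ti | tj
    return len(ti & tj) / len(union) if union else 1.0

def similarity_matrix(transactions):
    """Fuzzy proximity relation R with entries R[i][j] = Sim(Ti, Tj)."""
    return [[sim(a, b) for b in transactions] for a in transactions]
```

The resulting matrix is reflexive (ones on the diagonal) and symmetric, as the definition of a proximity relation requires.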
We use this fuzzy relation to decide the degree of similarity between the user transactions. For all
Ti, Tj ∈ A, we define a fuzzy relation R on A as:
R(Ti, Tj) = Sim(Ti, Tj)
The relation R formed by the similarity of the user transactions A = {T1, T2, T3, …, Tm} is a
fuzzy proximity relation, given by the m × m matrix R whose entries are μij = Sim(Ti, Tj).
Now, for any given value α ≥ 0, Rα+ is an equivalence relation, and the corresponding partition
represents the transaction clusters. The user-specified value α is an important factor on which the
quality of the clusters depends; a domain expert can choose the value of α ∈ [0, 1] based on
his/her experience. Algorithm-1 presents the overall approach used for clustering.
Table 1. Algorithm-1 for cluster generation based on concept hierarchy using fuzzy similarity
INPUT:
1. A = {T1, T2, T3, …, Tm} // set of transactions for a unique user
2. U = {URL1, URL2, URL3, …, URLn} // set of unique URLs appearing in the preprocessed log
3. Value of α (chosen by a domain expert)
OUTPUT:
1. Number of clusters
2. Clusters (sets of transactions)
Step-1: Find the children (all URLs) of a single transaction T as Child(T).
Step-2: Find the similarity of two transactions. The similarity of two transactions Ti and Tj is
defined as:
Sim(Ti, Tj) = |Child(Ti) ∩ Child(Tj)| / |Child(Ti) ∪ Child(Tj)|
Step-3: Compute the similarity matrix for all Ti and Tj, where each entry in the matrix is
μij = Sim(Ti, Tj). The proximity relation satisfies
Sim(Ti, Tj) ∈ [0, 1]
Sim(Ti, Ti) = 1 (reflexivity)
Sim(Ti, Tj) = Sim(Tj, Ti) (symmetry)
The value of the similarity measure always lies between 0 and 1 (fuzzy similarity).
Step-4: Compute α-similarity. For a given value α (provided by the domain expert), α ∈ [0, 1],
two elements Ti, Tz are α-similar if Sim(Ti, Tz) ≥ α, and are said to be α-proximate if there exists
a sequence y1, y2, y3, …, yn such that Ti Simα y1, y1 Simα y2, y2 Simα y3, …, yn Simα Tz
(transitivity).
Step-5: The α-proximity relation between Ti and Tj fulfils the following properties:
1. Reflexivity
2. Symmetry
3. Transitivity
Hence it is an equivalence relation, which generates equivalence classes. For a given value
α ≥ 0, the corresponding partition (equivalence classes) represents the transaction clusters.
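Algorithm-1 can be sketched as follows, assuming the α-proximate closure is computed with a union-find structure over the α-cut of the similarity relation (the function names are ours; transactions are represented as URL sets):

```python
def jaccard(ti, tj):
    """Sim(Ti, Tj) = |Child(Ti) n Child(Tj)| / |Child(Ti) u Child(Tj)|."""
    ti, tj = set(ti), set(tj)
    union = ti | tj
    return len(ti & tj) / len(union) if union else 1.0

def alpha_clusters(transactions, alpha):
    """Partition transactions by the alpha-cut of the proximity relation,
    closed transitively: two transactions share a cluster iff a chain of
    pairwise similarities >= alpha links them (alpha-proximity)."""
    n = len(transactions)
    parent = list(range(n))

    def find(i):                          # find root with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if jaccard(transactions[i], transactions[j]) >= alpha:
                parent[find(i)] = find(j)  # merge alpha-similar pairs

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

Taking the transitive closure of the α-cut is what turns the reflexive, symmetric proximity relation into an equivalence relation whose classes are the clusters.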
4.3. Experimental Results & Analysis for Clustering of High Dimensional Data
We assessed the effectiveness of the presented clustering approach by experimenting with real
data sets taken from the MCU Web site's server log file. We show the similarity matrix for one
data set, together with its clusters and partition coefficients, in the tables below. Log data from
accesses to the server over a period of 30 days was used to generate the user session profiles.
In this model, clustering quality depends entirely on the user-specified value α ∈ [0, 1], and
cluster membership varies with α. The validity of the clusters produced by this model is therefore
a subjective matter that depends on the domain expert. Algorithm-2 gives an approach to check
cluster validity.
We evaluate cluster goodness for different values of α on six data sets using the partition
coefficient, a simple method for evaluating cluster goodness that is applicable to high dimensional
(Web) data. The objective is to seek clustering schemes where most of the vectors of the dataset
exhibit a high degree of similarity to one cluster. We note here that a fuzzy clustering is defined
by a matrix U = [uij], where uij denotes the degree of similarity of the vector xi in the jth cluster.
Bezdek proposed the partition coefficient (Bezdek et al., 1984), which is defined as [7]
PC = (1/N) Σi=1..N Σj=1..nc uij²
The PC index values range in [1/nc, 1], where nc is the number of clusters. The closer the index
is to unity, the "crisper" the clustering. In case all membership values of a fuzzy partition are
equal, that is, uij = 1/nc, the PC obtains its lowest value. Thus, the closer the value of PC is to
1/nc, the fuzzier the clustering is; furthermore, a value close to 1/nc indicates that there is no
clustering tendency in the considered dataset.
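A minimal sketch of this index, assuming the membership matrix U is given as N rows (vectors) by nc columns (clusters):

```python
def partition_coefficient(U):
    """Bezdek's partition coefficient: PC = (1/N) * sum_i sum_j u_ij^2,
    where U[i][j] is the membership of vector x_i in cluster j and
    N = len(U) is the number of vectors."""
    N = len(U)
    return sum(u * u for row in U for u in row) / N
```

A crisp partition (memberships all 0 or 1) gives PC = 1, while the maximally fuzzy partition with uij = 1/nc gives PC = 1/nc.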
Table 2. Cluster validity algorithm
INPUT:
nc – number of clusters
N – total number of transactions in all clusters
OUTPUT:
PC – partition coefficient index value
ALGORITHM:
U = 0
For i = 1 to N do
  For j = 1 to nc
    U = U + uij² // uij denotes the degree of similarity of the vector xi in the jth cluster
  end for j
end for i
PC = U / N
4.4. Snapshot of Web Log data of MCU Server:
2013-12-30 02:47:36
#Fields: date time s-sitename s-computername s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username
c-ip cs-version cs(User-Agent) cs(Cookie) cs(Referer) sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken
2013-12-30 02:47:36 W3SVC8521 NS1 70.84.233.186 GET /robots.txt - 80 - 65.54.188.68
HTTP/1.0 msnbot/1.0+(+http://search.msn.com/msnbot.htm) - - 404 0 2 1814 153 46
2013-12-30 02:47:36 W3SVC8521 NS1 70.84.233.186 GET /forum/Default.asp - 80 -
65.54.188.68 HTTP/1.0 msnbot/1.0+(+http://search.msn.com/msnbot.htm) - - 200 0 0 14115 225
421
2013-12-30 03:01:44 W3SVC8521 NS1 70.84.233.186 GET /Study_Institue_Scheme_2005.htm -
80 - 65.54.188.68 HTTP/1.0 msnbot/1.0+(+http://search.msn.com/msnbot.htm) - - 404 0 2 1814
249 46
2013-12-30 03:40:04 W3SVC8521 NS1 70.84.233.186 GET /forum/www.satpuralawcollege.org
- 80 - 65.54.188.68 HTTP/1.0 msnbot/1.0+(+http://search.msn.com/msnbot.htm) - - 404 0 2 1814
250 93
2013-12-30 11:43:44 W3SVC8521 NS1 70.84.233.186 GET /robots.txt - 80 - 65.54.188.68
HTTP/1.0 msnbot/1.0+(+http://search.msn.com/msnbot.htm) - - 404 0 2 1814 153 62
2013-12-30 11:43:44 W3SVC8521 NS1 70.84.233.186 GET /downloads.htm - 80 - 65.54.188.68
HTTP/1.0 msnbot/1.0+(+http://search.msn.com/msnbot.htm) - - 200 0 0 8004 232 140
2013-12-30 13:27:27 W3SVC8521 NS1 70.84.233.186 GET /robots.txt - 80 - 65.54.188.68
HTTP/1.0 msnbot/1.0+(+http://search.msn.com/msnbot.htm) - - 404 0 2 1814 153 62
2013-12-30 13:27:27 W3SVC8521 NS1 70.84.233.186 GET /index.htm - 80 - 65.54.188.68
HTTP/1.0 msnbot/1.0+(+http://search.msn.com/msnbot.htm) - - 200 0 0 16766 219 187
After pre-processing, we obtained the following set of transactions (each covering a period of 15
minutes) and the set of unique URLs as input to Algorithm-1.
Table 3. User Transactions Detail.
User          Transaction   URLs
65.54.188.68  T1            /robots.txt, /forum/Default.asp, /Study_Institue_Scheme_2005.htm
,,            T2            /forum/www.satpuralawcollege.org
,,            T3            /robots.txt, /downloads.htm
,,            T4            /robots.txt, /index.htm
,,            T5            /ResearchProjectPhoto/a2.html, /Media_courses_syllabus_Download_Page.htm
Table 4. Unique URLs Detail.
URLs   URL Name
URL1   /robots.txt
URL2   /forum/Default.asp
URL3   /Study_Institue_Scheme_2005.htm
URL4   /forum/www.satpuralawcollege.org
URL5   /downloads.htm
URL6   /index.htm
URL7   /ResearchProjectPhoto/a2.html
URL8   /Media_courses_syllabus_Download_Page.htm
The similarity matrix is computed from the above 5 transactions and 8 unique URLs.
Table 4. Similarity Matrix obtained from Test data set-I (text) file, IP Address 65.54.188.68, Date 30-12-13
T1 T2 T3 T4 T5
T1 1 0 0.25 0.25 0
T2 0 1 0 0 0
T3 0.25 0 1 0.33 0
T4 0.25 0 0.33 1 0
T5 0 0 0 0 1
Table 5. Test Data Set-I for IP Address 65.54.188.68.
IP Address-65.54.188.68, Date –30-12-13
Total Transactions-5, Number of Unique URLS-8
α ≥        Number of clusters        Partition Coefficient (PC)
0.5 5 1.0
0.6 5 1.0
0.7 5 1.0
0.9 5 1.0
5. WEB PERSONALIZATION ARCHITECTURE
The overall architecture of the web personalization model is depicted in the figure below. Three
main components are emphasized in this figure: the client, the recommendation engine, and the
server. The client simply requests a particular URL, and in response the server provides the
personalized environment, i.e. the set of recommended URLs. When a user starts an active
session, the list of URLs accessed is stored in the server's log file. At the server side, data
pre-processing is done using data cleaning, session identification, and transaction identification.
The structured output of processing the server content data and usage data, in the form of a
transaction file, is used to obtain the transaction clusters using the fuzzy-logic-based similarity
relation. These transaction clusters are then fed to the recommendation engine, where the
recommendation set is generated. The result of this component is the set of recommendations
that will be used to personalize the user's interface to the server.
Figure 1. Architecture for Web Personalization
6. RECOMMENDATION ALGORITHM
For our approach, we use the transaction clusters obtained from the fuzzy-logic-based method.
Transaction clustering results in a set C = {C1, C2, …, Ck} of clusters, where each Ci is a
set of user transactions. Each cluster represents a group of users with "common" access patterns.
In general, transaction clusters alone are not an effective way of capturing a combined view of
common user profiles: each transaction cluster may contain thousands of transactions involving
hundreds of URLs. We present a method to personalize the user session by providing a set of
URL links dynamically, from the highest match score cluster down to the lowest match score
cluster.
Let A = {T1, T2, T3, …, Tm} be the set of user transactions and
U = {URL1, URL2, URL3, …, URLn} be the set of unique URLs in all the transactions, where
Ti ⊆ U for i = 1 to m and Ti, Tj ∈ A.
Let C1, C2, …, Ck be the transaction clusters obtained from the user transactions A with the help
of the fuzzy-logic-based similarity relation.
STEP-1
For the given input, i.e. user transactions, accessed URLs, and clusters, calculate the value of
URLi in cluster Ck, which is defined as follows.
Definition 1: The value of URLi in cluster Ck is the proportion of the number of occurrences of
URLi across transactions in the cluster Ck to the total number of URL references in the
cluster [12]:
Val(URLi, Ck) = |{T | URLi ∈ Child(T), T ∈ Ck}| / Σ T∈Ck |Child(T)|
where Child(T) is the set of children of transaction T.
The value of a URL in a cluster gives the weight of that URL in that cluster.
Suppose a new transaction, say Tnew, is started. Assign Tnew to its own cluster, that is,
Cnew = {Tnew}. The potential usefulness of Ck with respect to Cnew, a factor defining its
interestingness, can be estimated by a utility function such as support.
STEP-2
Definition 2: The support of a new cluster Cnew to a cluster Ci is defined as
Support(Cnew, Ci) = Σ URL |Val(URL, Cnew) − Val(URL, Ci)| / |⋃ T∈Ci∪Cnew Child(T)|
where Child(T) is the set of children of transaction T, Val(URL, Cnew) is the value of URL in
the cluster Cnew, and Val(URL, Ci) is the value of URL in the cluster Ci.
Once a new user starts a session, the objective at every step is to match the partial user session
with suitable clusters and provide appropriate recommendations to the user.
STEP-3
Definition 3: The match score between Ci and Cnew is defined as
Match(Cnew, Ci) = 1 − Support(Cnew, Ci)
The preference set of URLs can be built from the highest match score cluster down to the lowest
match score cluster. We can also enforce a threshold on the match score to reduce the
dimensional space.
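The three steps can be sketched as follows. This is a minimal reading of Definitions 1-3, in which the absolute value inside the support sum and both denominators are reconstructed to be consistent with the worked example in Section 6.1; the function names are ours, and clusters are lists of transactions (URL sets):

```python
def val(url, cluster):
    """Weight of a URL in a cluster: the number of transactions in the
    cluster containing the URL, divided by the total URL count over all
    transactions in the cluster."""
    hits = sum(1 for t in cluster if url in t)
    total = sum(len(t) for t in cluster)
    return hits / total if total else 0.0

def support(c_new, c_i):
    """Support of the new cluster to an existing one: summed absolute
    difference of URL values, normalised by the number of distinct URLs
    appearing in either cluster."""
    urls = set().union(*c_new, *c_i)
    diff = sum(abs(val(u, c_new) - val(u, c_i)) for u in urls)
    return diff / len(urls) if urls else 0.0

def match(c_new, c_i):
    """Match score between the new cluster and an existing cluster."""
    return 1.0 - support(c_new, c_i)
```

With Cnew = {{Url1, Url3}} and C1 = {{Url1, Url2, Url3}}, this yields Support ≈ 0.2222 and hence Match ≈ 0.7778, in agreement with Tables 9 and 10.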
6.1. Experimental Result (User IP Address (65.54.188.68))
We used the access logs from the Web site of the MCRPV (http://www.mcu.ac.in) for our
experiment. Log data from accesses to the server during a period of 30 days was used to generate
user session profiles. The site includes a number of pages related to the university profile and
activities held on the university campus. After pre-processing, we prepared several data sets for
experimental purposes.
1. Log Data File: This is the pre-processed file taken from the MCU web log for the IP address
65.54.188.68. One transaction contains the URLs accessed by the user within 15 minutes.
Table 6. Transactions with their URLs
Transactions URLs
T1 Url1
T1 Url2
T1 Url3
T2 Url4
T3 Url1
T3 Url5
T4 Url1
T4 Url6
T5 Url7
T5 Url8
2. Input Clusters: These clusters are obtained as input from the clustering approach based on the
fuzzy-logic proximity relation.
Table 7. Clusters with their transactions
Clusters Transactions
C1 T1
C2 T2
C3 T3
C4 T4
C5 T5
3. New Transaction: From the web log file we extract a transaction that is assumed to be a
new transaction, for which we have to provide a personalized environment. Initially the
transaction contains the following unique URLs:
{Url1, Url3}
4. Unique URLs: These are the unique URLs accessed by the user within the different
transactions. Each URLi (URL1, URL2, URL3, etc.) represents a URL entry of the web log file,
as given in Table 4.
5. Values of URLs in Clusters: The value of URLi in each cluster Ci is calculated as follows.
Table 8. Value of URLi in Cluster Ci
URLs Cluster1 Cluster2 Cluster3 Cluster4 Cluster5 Cluster NEW
Url1 0.333333 0.000000 0.500000 0.500000 0.000000 0.500000
Url2 0.333333 0.000000 0.000000 0.000000 0.000000 0.000000
Url3 0.333333 0.000000 0.000000 0.000000 0.000000 0.500000
Url4 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000
Url5 0.000000 0.000000 0.500000 0.000000 0.000000 0.000000
Url6 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Url7 0.000000 0.000000 0.000000 0.500000 0.500000 0.000000
Url8 0.000000 0.000000 0.000000 0.000000 0.500000 0.000000
6. Support of the new cluster to the existing clusters: The support calculation is as follows.
Table 9. Clusters with their Support
Clusters Support
Cluster 1 0.222222
Cluster 2 0.666667
Cluster 3 0.333333
Cluster 4 0.333333
Cluster 5 0.500000
7. Match score of ClusterNEW to Clusteri: The match score calculation is as follows:
Table 10. Clusters with Match Score
Clusters Match Score
Cluster 1 0.777778
Cluster 2 0.333333
Cluster 3 0.666667
Cluster 4 0.666667
Cluster 5 0.500000
Here, the highest match score corresponds to Cluster 1. Hence, user 65.54.188.68 will be provided
Cluster 1 as the preference set, i.e. the URLs of the transactions belonging to Cluster 1 will be
suggested as preference links to provide the personalized environment.
6.2 Evaluation of Recommendation Effectiveness
To evaluate the recommendation effectiveness of the given model, we measured performance
using three standard measures, namely precision, coverage, and the F1 measure [13]. These
measures are adaptations of the standard measures precision and recall, often used in information
retrieval.
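The F1 measure combines the two as their harmonic mean; a one-line sketch (assuming this is the combination rule used, which the first row of the table below is consistent with):

```python
def f1_measure(precision, coverage):
    """F1 = 2 * P * C / (P + C), the harmonic mean of precision and coverage."""
    return 2 * precision * coverage / (precision + coverage)
```

For example, precision 0.33 with full coverage 1.0 gives an F1 of about 0.4962.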
Table 11. Measure values for the data sets
Data Set    IP Address       Precision   Coverage   F1 Measure
Data Set 1  65.54.188.68     0.33        1.0        0.4962
Data Set 2  59.94.43.47      0.5714      1.0        0.7261146
Data Set 3  70.84.233.186    0.4705      1.0        0.6394558
Data Set 4  61.0.56.2        0.5625      0.9        0.690411
Data Set 5  67.180.89.140    0.63636     1.0        0.773
In the context of personalization, precision identifies the degree to which accurate
recommendations can be produced by the recommendation engine, while coverage identifies the
ability of the recommendation engine to produce all of the URLs that are likely to be visited by
the user. Low precision will likely result in irritated visitors who are not interested in the
recommendation list, while low coverage reflects the inability of the site to generate relevant
recommendations at the moment the user interacts with it. For our personalization approach, the
proposed model is capable of producing recommendation sets with high coverage and a high F1
measure, with moderate precision.
7. CONCLUSIONS
In this paper, we proposed a method to construct clusters based on the concept hierarchy and
α-similarity. The new model, which clusters web user transactions based on the concept hierarchy
of web usage data and fuzzy proximity relations of user transactions, was tested for its
effectiveness. The fuzzy concepts are used to identify the degree of similarity between user
transaction clusters. The user-specified value α is an important factor on which cluster quality
depends; a domain expert can choose the value of α ∈ [0, 1] based on his/her experience. That is,
the approach makes use of domain knowledge for forming clusters, which is very helpful
especially for e-commerce applications that use these user transaction clusters to build a
personalized environment for Web users.
Employing Web usage mining techniques for personalization is directly related to the discovery
of effective profiles that can successfully capture relevant user navigational patterns. In this
paper, we also proposed an approach for the automatic discovery of a user's interest domain. A
web site should permit users to make their choices in a natural way. This is possible by giving
broad coverage and understandable choices, so that users can easily formulate the choice that
best suits their needs. The proposed model is capable of producing recommendation sets with
high coverage and a high F1 measure, with moderate precision.
REFERENCES
[1] Lin HuaXu, Hong Liu. Web User Clustering Analysis based on KMeans Algorithm, 2010
International Conference on Information, Networking and Automation (ICINA).
[2] Abdolreza Mirzaei and Mohammad Rahmati A Novel Hierarchical-Clustering-Combination Scheme
Based on Fuzzy-Similarity Relations, IEEE Transactions On Fuzzy Systems, VOL. 18, NO. 1,
FEBRUARY 2010.
[3] Dimitrios Pierrakos and Georgios Paliouras, Personalizing Web Directories with the Aid of Web
Usage Data, IEEE Transactions On Knowledge And Data Engineering, Vol. 22, NO. 9, SEPTEMBER
2010.
[4] Quan, Thanh Tho, Hui, Siu Cheung and Cao, Tru Hoang. A Fuzzy FCA-based Approach to
Conceptual Clustering for Automatic Generation of Concept Hierarchy on Uncertainty Data, V.
Snášel, R. Bělohlávek (Eds.): CLA 2004, pp. 1–12, ISBN 80-248-0597-9. VŠB – Technical
University of Ostrava, Dept. of Computer Science, 2004.
[5] B. Mobasher, R. Cooley and J. Srivastava. Automatic personalization based on web usage mining,
Commun. ACM 43, 8 (2000), 142-151.
[6] Yun Zhang, Boqin Feng, Yewei Xue, A New Search Results Clustering Algorithm based on Formal
Concept Analysis, Fifth IEEE International Conference on Fuzzy Systems and Knowledge Discovery,
2008
[7] Halkidi, Maria, Batistakis, Yannis, et al., On Clustering Validation Techniques, Journal of Intelligent
Information Systems, 17:2/3, 107–145, 2001.
[8] Dragos Arotariteia, Sushmita Mitra. “Web mining: a survey in the fuzzy framework” ELSEVIER,
Fuzzy Sets and Systems 148 (2004)
[9] Magdalini Eirinaki and Michalis Vazirgiannis, "Web Mining for Web Personalization", ACM
Transactions on Internet Technology, Vol. 3, No. 1, February 2003.
[10] The Personalization Consortium, http://www.personalization.org/personalization.html
[11] Web site personalization, http://www-128.ibm.com/developerworks/websphere/library/techarticles/hipods/personalize.html
[12] B. Mobasher, R. Cooley and J. Srivastava, "Automatic personalization based on Web usage mining",
Commun. ACM 43(8), 2000.
[13] Bamshad Mobasher, Honghua Dai, Tao Luo and Miki Nakagawa, "Discovery and Evaluation of
Aggregate Usage Profiles for Web Personalization", School of Computer Science,
Telecommunications, and Information Systems, DePaul University, Chicago, Illinois, USA.
http://ai.stanford.edu/users/ronnyk/WEBKDD-DMKD/Mobasher.pd
Authors
Bhawesh Kumar Thakur received the M.Tech degree in CSE from M.C. University,
Bhopal, India. He is currently working toward the PhD degree in computer science at the
Department of Computer Science & Engineering, Integral University, Lucknow, India.
He is a faculty member at the Bansal Institute of Engineering & Technology,
Lucknow, a UPTU-affiliated engineering college. His research interests lie
in the areas of Web mining and Web personalization.
Prof. Dr. Syed Qamar Abbas completed his Master of Science (MS) at BITS Pilani.
His PhD was a computer-oriented study of queueing models. He has more than 20
years of teaching and research experience in the field of Computer Science and
Information Technology. Currently, he is Director of the Ambalika Institute of Management
and Technology, Lucknow.
Prof. Dr. M. Rizwan Beg holds an M.Tech and a Ph.D in Computer Science & Engineering.
Presently he is working as Controller of Examinations at Integral University, Lucknow,
Uttar Pradesh, India. He has more than 16 years of experience, including around 14 years of
teaching experience. His areas of expertise are Software Engineering, Requirement Engineering,
Software Quality, and Software Project Management. He has published more than 40
research papers in international journals and conferences. Presently 8 research scholars
are pursuing their Ph.D under his supervision.