Relevance feedback is a technique used in information retrieval systems to improve search results based on user feedback. It involves the user marking initial search results as relevant or irrelevant. The system then analyzes these relevance judgments to compute a revised query representation that better matches the user's information need. Specifically, it finds a query vector that maximizes similarity to relevant documents while minimizing similarity to irrelevant documents. This revised query is then used to display a new set of hopefully more precise search results to the user.
Object‐oriented software development is an evolutionary process, and hence the opportunities for integration are abundant. Conceptually, classes are encapsulation of data attributes and their associated functions. Software components are amalgamation of logically and/or physically related classes. A complete software system is also an aggregation of software components. All of these various integration levels warrant contemporary integration techniques. Traditional integration techniques towards the end of software development process do not suffice any more. Integration strategies are needed at class level, component level, sub‐system level, and system levels. Classes require integration of methods. Various types of class interaction mechanisms demand different testing strategies. Integration of classes into components presses its own integration requirements. Finally, the system integration demands different types of integration testing strategies.
Object‐oriented software development is an evolutionary process, and hence the opportunities for integration are abundant. Conceptually, classes are encapsulation of data attributes and their associated functions. Software components are amalgamation of logically and/or physically related classes. A complete software system is also an aggregation of software components. All of these various integration levels warrant contemporary integration techniques. Traditional integration techniques towards the end of software development process do not suffice any more. Integration strategies are needed at class level, component level, sub‐system level, and system levels. Classes require integration of methods. Various types of class interaction mechanisms demand different testing strategies. Integration of classes into components presses its own integration requirements. Finally, the system integration demands different types of integration testing strategies.
A study and survey on various progressive duplicate detection mechanismseSAT Journals
Abstract One of the serious problems faced in several applications with personal details management, customer affiliation management, data mining, etc is duplicate detection. This survey deals with the various duplicate record detection techniques in both small and large datasets. To detect the duplicity with less time of execution and also without disturbing the dataset quality, methods like Progressive Blocking and Progressive Neighborhood are used. Progressive sorted neighborhood method also called as PSNM is used in this model for finding or detecting the duplicate in a parallel approach. Progressive Blocking algorithm works on large datasets where finding duplication requires immense time. These algorithms are used to enhance duplicate detection system. The efficiency can be doubled over the conventional duplicate detection method using this algorithm. Severa
A NEAR-DUPLICATE DETECTION ALGORITHM TO FACILITATE DOCUMENT CLUSTERINGIJDKP
Web Ming faces huge problems due to Duplicate and Near Duplicate Web pages. Detecting Near
Duplicates is very difficult in large collection of data like ”internet”. The presence of these web pages
plays an important role in the performance degradation while integrating data from heterogeneous
sources. These pages either increase the index storage space or increase the serving costs. Detecting these
pages has many potential applications for example may indicate plagiarism or copyright infringement.
This paper concerns detecting, and optionally removing duplicate and near duplicate documents which are
used to perform clustering of documents .We demonstrated our approach in web news articles domain. The
experimental results show that our algorithm outperforms in terms of similarity measures. The near
duplicate and duplicate document identification has resulted reduced memory in repositories.
Semantic annotation is done through first representing words and documents in the vector space model using Word2Vec and Doc2Vec implementations, the vectors are taken as features into a classifier, trained and a model is made which can classify a document with ACM classification tree categories, with the help of Wikipedia corpus.
Project Presentation: https://youtu.be/706HJteh1xc
Project Webpage: http://rohitsakala.github.io/semanticAnnotationAcmCategories/
Source Code: https://github.com/rohitsakala/semanticAnnotationAcmCategories
References:
Quoc V. Le, and Tomas Mikolov, ''Distributed Representations of Sentences and Documents ICML", 2014
Web clustering Engines are emerging trend in the field of data retrieval. They organize search results by topic, thus providing a complementary view to the flat ranked list returned by the standard search engines.
Supporting scientific discovery through linkages of literature and dataDon Pellegrino
Poster on "Supporting scientific discovery through linkages of literature and data" by Don Pellegrino, Jean-Claude Bradley and Chaomei Chen. To be presented on May 3, 2011 as part of an event for the Center for Visual and Decision Informatics. More this NSF funded center can be found online at [http://nsf-cvdi.louisiana.edu/]. More on Don Pellegrino's research activities can be found online at [http://www.donpellegrino.com].
International Journal of Computational Engineering Research(IJCER)ijceronline
International Journal of Computational Engineering Research(IJCER) is an intentional online Journal in English monthly publishing journal. This Journal publish original research work that contributes significantly to further the scientific knowledge in engineering and Technology
A Novel Collaborative Filtering Algorithm by Bit Mining Frequent ItemsetsLoc Nguyen
Collaborative filtering (CF) is a popular technique in recommendation study. Concretely, items which are recommended to user are determined by surveying her/his communities. There are two main CF approaches, which are memory-based and model-based. I propose a new CF model-based algorithm by mining frequent itemsets from rating database. Hence items which belong to frequent itemsets are recommended to user. My CF algorithm gives immediate response because the mining task is performed at offline process-mode. I also propose another so-called Roller algorithm for improving the process of mining frequent itemsets. Roller algorithm is implemented by heuristic assumption “The larger the support of an item is, the higher it’s likely that this item will occur in some frequent itemset”. It models upon doing white-wash task, which rolls a roller on a wall in such a way that is capable of picking frequent itemsets. Moreover I provide enhanced techniques such as bit representation, bit matching and bit mining in order to speed up recommendation process. These techniques take advantages of bitwise operations (AND, NOT) so as to reduce storage space and make algorithms run faster.
A study and survey on various progressive duplicate detection mechanismseSAT Journals
Abstract One of the serious problems faced in several applications with personal details management, customer affiliation management, data mining, etc is duplicate detection. This survey deals with the various duplicate record detection techniques in both small and large datasets. To detect the duplicity with less time of execution and also without disturbing the dataset quality, methods like Progressive Blocking and Progressive Neighborhood are used. Progressive sorted neighborhood method also called as PSNM is used in this model for finding or detecting the duplicate in a parallel approach. Progressive Blocking algorithm works on large datasets where finding duplication requires immense time. These algorithms are used to enhance duplicate detection system. The efficiency can be doubled over the conventional duplicate detection method using this algorithm. Severa
A NEAR-DUPLICATE DETECTION ALGORITHM TO FACILITATE DOCUMENT CLUSTERINGIJDKP
Web Ming faces huge problems due to Duplicate and Near Duplicate Web pages. Detecting Near
Duplicates is very difficult in large collection of data like ”internet”. The presence of these web pages
plays an important role in the performance degradation while integrating data from heterogeneous
sources. These pages either increase the index storage space or increase the serving costs. Detecting these
pages has many potential applications for example may indicate plagiarism or copyright infringement.
This paper concerns detecting, and optionally removing duplicate and near duplicate documents which are
used to perform clustering of documents .We demonstrated our approach in web news articles domain. The
experimental results show that our algorithm outperforms in terms of similarity measures. The near
duplicate and duplicate document identification has resulted reduced memory in repositories.
Semantic annotation is done through first representing words and documents in the vector space model using Word2Vec and Doc2Vec implementations, the vectors are taken as features into a classifier, trained and a model is made which can classify a document with ACM classification tree categories, with the help of Wikipedia corpus.
Project Presentation: https://youtu.be/706HJteh1xc
Project Webpage: http://rohitsakala.github.io/semanticAnnotationAcmCategories/
Source Code: https://github.com/rohitsakala/semanticAnnotationAcmCategories
References:
Quoc V. Le, and Tomas Mikolov, ''Distributed Representations of Sentences and Documents ICML", 2014
Web clustering Engines are emerging trend in the field of data retrieval. They organize search results by topic, thus providing a complementary view to the flat ranked list returned by the standard search engines.
Supporting scientific discovery through linkages of literature and dataDon Pellegrino
Poster on "Supporting scientific discovery through linkages of literature and data" by Don Pellegrino, Jean-Claude Bradley and Chaomei Chen. To be presented on May 3, 2011 as part of an event for the Center for Visual and Decision Informatics. More this NSF funded center can be found online at [http://nsf-cvdi.louisiana.edu/]. More on Don Pellegrino's research activities can be found online at [http://www.donpellegrino.com].
International Journal of Computational Engineering Research(IJCER)ijceronline
International Journal of Computational Engineering Research(IJCER) is an intentional online Journal in English monthly publishing journal. This Journal publish original research work that contributes significantly to further the scientific knowledge in engineering and Technology
A Novel Collaborative Filtering Algorithm by Bit Mining Frequent ItemsetsLoc Nguyen
Collaborative filtering (CF) is a popular technique in recommendation study. Concretely, items which are recommended to user are determined by surveying her/his communities. There are two main CF approaches, which are memory-based and model-based. I propose a new CF model-based algorithm by mining frequent itemsets from rating database. Hence items which belong to frequent itemsets are recommended to user. My CF algorithm gives immediate response because the mining task is performed at offline process-mode. I also propose another so-called Roller algorithm for improving the process of mining frequent itemsets. Roller algorithm is implemented by heuristic assumption “The larger the support of an item is, the higher it’s likely that this item will occur in some frequent itemset”. It models upon doing white-wash task, which rolls a roller on a wall in such a way that is capable of picking frequent itemsets. Moreover I provide enhanced techniques such as bit representation, bit matching and bit mining in order to speed up recommendation process. These techniques take advantages of bitwise operations (AND, NOT) so as to reduce storage space and make algorithms run faster.
(Gaurav sawant & dhaval sawlani)bia 678 final project reportGaurav Sawant
PROJECT REPORT
• Performed memory-based collaborative filtering techniques like Cosine similarities, Pearson’s r & model-based Matrix Factorization techniques like Alternating Least Squares (ALS) method
• Studied the scalability of these methods on local machines & on Hadoop clusters
COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS...IJCSEA Journal
Collaborative Filtering is generally used as a recommender system. There is enormous growth in the
amount of data in web. These recommender systems help users to select products on the web, which is the
most suitable for them. Collaborative filtering-systems collect user’s previous information about an item
such as movies, music, ideas, and so on. For recommending the best item, there are many algorithms,
which are based on different approaches. The most known algorithms are User-based and Item-based
algorithms. Experiments show that Item-based algorithms give better results than User-based algorithms.
The aim of this paper isto compare User-based and Item-based Collaborative Filtering Algorithms with
many different similarity indexes with their accuracy and performance. We provide an approach to
determine the best algorithm, which give the most accurate recommendation by using statistical accuracy
metrics. The results are compared the User-based and Item-based algorithms with movie recommendation
data set.
COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS...IJCSEA Journal
Collaborative Filtering is generally used as a recommender system. There is enormous growth in the amount of data in web. These recommender systems help users to select products on the web, which is the most suitable for them. Collaborative filtering-systems collect user’s previous information about an item such as movies, music, ideas, and so on. For recommending the best item, there are many algorithms, which are based on different approaches. The most known algorithms are User-based and Item-based algorithms. Experiments show that Item-based algorithms give better results than User-based algorithms. The aim of this paper isto compare User-based and Item-based Collaborative Filtering Algorithms with many different similarity indexes with their accuracy and performance. We provide an approach to determine the best algorithm, which give the most accurate recommendation by using statistical accuracy metrics. The results are compared the User-based and Item-based algorithms with movie recommendation data set.
Performance Evaluation of Query Processing Techniques in Information Retrievalidescitation
The first element of the search process is the query.
The user query being on an average restricted to two or three
keywords makes the query ambiguous to the search engine.
Given the user query, the goal of an Information Retrieval
[IR] system is to retrieve information which might be useful
or relevant to the information need of the user. Hence, the
query processing plays an important role in IR system.
The query processing can be divided into four categories
i.e. query expansion, query optimization, query classification and
query parsing. In this paper an attempt is made to evaluate the
performance of query processing algorithms in each of the
category. The evaluation was based on dataset as specified by
Forum for Information Retrieval [FIRE15]. The criteria used
for evaluation are precision and relative recall. The analysis is
based on the importance of each step in query processing. The
experimental results show that the significance of each step
in query processing and also the relevance of web semantics
and spelling correction in the user query.
Design of an Effective Method for Image RetrievalAM Publications
Image retrieval has been highly concussed in major sectors and systems. The relevance feedback technique is
commonly used for analyzing the images for maintaining secure systems. It covers the wide range of techniques in
supporting user information and improving the facilities for image retrieval. In this paper, relevance feedback with respect
to Self-Organizing Maps (SOM) has been discussed .Comparative results show the high impact of relevance feedback in image retrieval.
This Presentation is on recommended system on question paper predication using machine learning techniques. We did literature survey and implement using same technique.
Information retrival system and PageRank algorithmRupali Bhatnagar
We discuss the various models for Information retrieval system present in literature and discuss them mathematically. We also study the PageRank Algorithm which is used for relevant search.
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...journalBEEI
The Internet has facilitated the growth of recommendation system owing to the ease of sharing customer experiences online. It is a challenging task to summarize and streamline the online textual reviews. In this paper, we propose a new framework called Fuzzy based contextual recommendation system. For classification of customer reviews we extract the information from the reviews based on the context given by users. We use text mining techniques to tag the review and extract context. Then we find out the relationship between the contexts from the ontological database. We incorporate fuzzy based semantic analyzer to find the relationship between the review and the context when they are not found therein. The sentence based classification predicts the relevant reviews, whereas the fuzzy based context method predicts the relevant instances among the relevant reviews. Textual analysis is carried out with the combination of association rules and ontology mining. The relationship between review and their context is compared using the semantic analyzer which is based on the fuzzy rules.
Costomization of recommendation system using collaborative filtering algorith...eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
An Adaptive Framework for Enhancing Recommendation Using Hybrid Techniqueijcsit
Recommender systems provide useful recommendations to a collection of users for items or products that
might be of concern or interest to them. Several techniques have been proposed for recommendation such
as collaborative filtering, content-based, knowledge-based, and demographic filtering. Each of these
techniques suffers from scalability, data sparsity, and cold-start problems when applied individually
resulting in poor recommendations. This paper proposes an adaptive hybrid recommender system that
combines multiple techniques together to achieve some synergy between them. Collaborative filtering and
demographic techniques are combined in a weighted linear formula. Different experiments applied using
movieLen dataset confirm that the proposed adaptable hybrid framework outperforms the weaknesses
resulted when using traditional recommendation techniques.
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...Daniel Valcarce
Slides of the presentation given at ECIR 2016 for the following paper:
Daniel Valcarce, Javier Parapar, Alvaro Barreiro: Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recommendation. ECIR 2016: 602-613
http://dx.doi.org/10.1007/978-3-319-30671-1_44
APPLYING GENETIC ALGORITHMS TO INFORMATION RETRIEVAL USING VECTOR SPACE MODEL IJCSEA Journal
Genetic algorithms are usually used in information retrieval systems (IRs) to enhance the information retrieval process, and to increase the efficiency of the optimal information retrieval in order to meet the users’ needs and help them find what they want exactly among the growing numbers of available information. The improvement of adaptive genetic algorithms helps to retrieve the information needed by the user accurately, reduces the retrieved relevant files and excludes irrelevant files. In this study, the researcher explored the problems embedded in this process, attempted to find solutions such as the way of choosing mutation probability and fitness function, and chose Cranfield English Corpus test collection on mathematics. Such collection was conducted by Cyrial Cleverdon and used at the University of Cranfield in 1960 containing 1400 documents, and 225 queries for simulation purposes. The researcher also used cosine similarity and jaccards to compute similarity between the query and documents, and used two proposed adaptive fitness function, mutation operators as well as adaptive crossover. The process aimed at evaluating the effectiveness of results according to the measures of precision and recall. Finally, the study concluded that we might have several improvements when using adaptive genetic algorithms.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
2. The same concept may be referred to using different
words.
known as synonymy, has an impact on the recall of most
information retrieval systems.
e.g., aircraft to match plane(only airplane)
The methods for tackling this problem split into two major
classes: global methods and local methods.
Global methods are techniques for expanding or reformulating
query terms independent of the query and results returned from
it, so that changes in the query wording will cause the new query
to match other semantically similar terms.
3. Local methods adjust a query relative to the documents
that initially appear to match the query. The basic
methods here are:
Relevance feedback
Pseudo relevance feedback (Blind relevance feedback)
The idea of relevance feedback (RF) is to involve the user
in the retrieval process so as to improve the final result set
4. The basic procedure is:
1. The user issues a (short, simple) query
2. The system returns an initial set of retrieval results
3. The user marks some returned documents as relevant or
non relevant.
4. The system computes a better representation of the
information need based on the user feedback.
5. The system displays a revised set of retrieval results.
5. Fig: The Rocchio optimal query for separating relevant and
non relevant
Find a query vector q, that
maximizes similarity with
relevant documents while
minimizing similarity with non
relevant documents.
Under cosine similarity, optimal query will be the vector difference between the centroids of
the relevant and non relevant documents
6. In IR, we have a user query and partial knowledge of known
relevant and nonrelevant documents. The algorithm proposes
using the modified query vector qm:
where q0 is the original query vector, Dr and Dnr are the set of
known relevant and non relevant documents respectively, and α,
β, and γ are weights attached to each term.
These control the balance between trusting the judged
document set versus the query: if we have a lot of judged
documents, we would like a higher β and γ.
7. a b
Relevance feedback searching over images. (a) The user views the initial query results
for a query of bike, selects the first, third and fourth result in the top row and the fourth
result in the bottom row as relevant, and submits this feedback. (b) The users sees the
revised result set. Precision is greatly improved.