This document describes an intelligent meta search engine developed to retrieve relevant web documents efficiently. The meta search engine submits user queries to multiple traditional search engines, including Google, Yahoo, Bing and Ask. It then uses a crawler and a modified page-ranking algorithm to analyze and rank the results from the different engines. The top results are displayed to the user and are intended to be more relevant than the results from any individual search engine. The meta search engine was implemented using PHP and MySQL, and comprises components such as a graphical user interface, a query formulator, a metacrawler and a redundant-URL eliminator.
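The merging and redundant-URL elimination steps can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: it assumes each engine returns a best-first list of URLs and uses simple reciprocal-rank scoring to fuse them, with duplicate URLs merged into a single entry.

```python
from collections import defaultdict

def merge_results(result_lists):
    """Combine ranked URL lists from several engines into one ranking.

    Each inner list is assumed to be ordered best-first. Scoring uses
    reciprocal-rank fusion (an illustrative choice, not the paper's
    modified page-ranking algorithm); duplicate URLs are merged, which
    is the role the paper assigns to the redundant-URL eliminator.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / rank
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from three engines for the same query:
google = ["a.com", "b.com", "c.com"]
yahoo = ["b.com", "a.com", "d.com"]
bing = ["b.com", "d.com", "a.com"]
print(merge_results([google, yahoo, bing]))  # b.com ranks first
```

A URL appearing near the top of several engines' lists accumulates the highest score, which is the intuition behind combining engines at all.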
The web has become a resourceful tool for almost all domains today. Search engines prominently use the inverted indexing technique to locate the web pages containing the user's query. The performance of an inverted index fundamentally depends on searching for the keyword in the list maintained by the search engine. Text matching is done with the help of a string matching algorithm, and it is important for any string matching algorithm to quickly locate the occurrences of the user-specified pattern in a large text. In this paper a new string matching algorithm for keyword searching is proposed. The proposed algorithm relies on a new technique based on pattern length and FML (First-Middle-Last) character matching. The algorithm is analysed and implemented, and extensive testing and comparisons are done against Boyer-Moore, Naïve, Improved Naïve, Horspool and Zhu-Takaoka. The results show that the proposed algorithm takes less time than the other existing algorithms.
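The abstract does not give the algorithm's details, but the First-Middle-Last idea can be sketched as a cheap filter: before comparing a whole window of text against the pattern, check only its first, middle and last characters, rejecting most mismatching alignments in constant time. The following is an assumed interpretation, not the paper's actual algorithm.

```python
def fml_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text.

    Sketch of a First-Middle-Last filter (an assumption about the FML
    idea in the abstract): at each alignment, compare the first, middle
    and last characters of the window before falling back to a full
    comparison, so most mismatching windows are rejected cheaply.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    mid = m // 2
    hits = []
    for i in range(n - m + 1):
        if (text[i] == pattern[0]
                and text[i + mid] == pattern[mid]
                and text[i + m - 1] == pattern[-1]
                and text[i:i + m] == pattern):  # full check only after filter
            hits.append(i)
    return hits

print(fml_search("the quick brown fox", "quick"))  # [4]
```

A real implementation would also use the pattern length to skip ahead after a mismatch, as Horspool-style algorithms do.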
The International Journal of Modern Engineering Research (IJMER) is a peer-reviewed online journal. It serves as an international archival forum for scholarly research related to engineering and science education.
BLOSEN: Blog Search Engine Based on Post Concept Clustering
This paper focuses on building a blog search engine that does not restrict itself to keyword search but includes extended search capabilities. It also incorporates blog-post concept clustering, based on the category extracted from semantic analysis of blog post content. The proposed approach, titled "BloSen (Blog Search Engine)", involves extracting posts from blogs and parsing them to extract the blog elements, which are stored as fields in a document format. An inverted index is built over the fields of the documents; searches run against this index, and each query is processed over the documents built so far from blog posts. The system currently focuses on Blogger- and WordPress-hosted blogs, since these two hosting services are the most popular in the blogosphere. The proposed BloSen model is evaluated with a prototype implementation, and a cumulative user-relevance metric value of 95.44% in the experiments confirms the effectiveness of the proposed model.
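The core of the pipeline described above, parsing posts into fields and indexing those fields, can be sketched with a minimal inverted index. The field names and posts here are illustrative, not taken from the paper.

```python
from collections import defaultdict

# Hypothetical parsed blog posts, each reduced to a few named fields.
posts = [
    {"id": 1, "title": "python tips", "body": "indexing with python"},
    {"id": 2, "title": "travel blog", "body": "a trip to python island"},
]

# Inverted index: term -> set of post ids whose fields contain the term.
index = defaultdict(set)
for post in posts:
    for field in ("title", "body"):
        for term in post[field].lower().split():
            index[term].add(post["id"])

def search(term):
    """Look a single term up in the index; returns sorted post ids."""
    return sorted(index.get(term.lower(), set()))

print(search("python"))  # [1, 2]
print(search("trip"))    # [2]
```

A production index would also record which field and position each term came from, so that field-weighted ranking and phrase queries are possible.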
Meta search technologies are search engine tools that pass queries on to many other search engines and then summarize all the results in one handy interface.
The Research on Related Technologies of Web Crawler
ABSTRACT: A web crawler is a program or automation script that automatically downloads pages, and it is an important part of a search engine. With the rapid growth of the Internet and its network resources, search engines alone have been unable to meet people's need for useful information, so the role of the web crawler within them is becoming ever more important. This article discusses the working principle and classification of web crawlers, and then surveys research on the web crawler as an important topic in search engine design.
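The working principle the abstract refers to is essentially a frontier loop: fetch a page, extract its links, and enqueue the unseen ones. A minimal breadth-first sketch, with a stand-in `get_links` function in place of real HTTP fetching and HTML parsing:

```python
from collections import deque

def crawl(seed, get_links, max_pages=10):
    """Breadth-first crawl sketch: maintain a frontier of URLs, visit
    each URL once, extract its links, and enqueue unseen ones.
    `get_links` stands in for real fetching and parsing.
    """
    seen = {seed}
    frontier = deque([seed])
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)  # a real crawler would store/index the page here
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order

# A tiny in-memory "web" instead of live HTTP requests:
web = {"/": ["/a", "/b"], "/a": ["/b", "/c"], "/b": [], "/c": ["/"]}
print(crawl("/", lambda u: web.get(u, [])))  # ['/', '/a', '/b', '/c']
```

Real crawlers layer politeness (robots.txt, rate limits), URL normalization and revisit policies on top of this loop, but the frontier-and-seen-set core is common to all of them.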
Design Issues for Search Engines and Web Crawlers: A Review
Abstract: The World Wide Web is a huge source of hyperlinked information contained in hypertext documents. Search engines use web crawlers to collect these web documents for storage and indexing. The rapid growth of the World Wide Web has posed unprecedented challenges for the designers of search engines and web crawlers, which help users retrieve web pages in a reasonable amount of time. In this paper, a review of the need for and working of a search engine, and the role of a web crawler, is presented.
Key words: Internet, WWW, search engine, types, design issues, web crawlers.
Focused Web Crawling Using Named Entity Recognition for Narrow Domains
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of engineering and technology. Its aim and scope are to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in engineering and technology, bringing together scientists, academicians, field engineers, scholars and students of related fields.
Quest Trail: An Effective Approach for Construction of Personalized Search En...
Personalized search refers to search experiences tailored specifically to an individual's interests by incorporating information about the individual beyond the specific query provided. People working in a software development organization (analysts, developers, testers, maintenance team members) in particular find it increasingly difficult to get relevant results for their searches. We propose methods to personalize searches by resolving the ambiguity of query terms and increasing the relevance of search results so that they match the user's interests. The difficulty of web search has given rise to the need for personalized search engines. Personalized search engines create user profiles to capture users' personal preferences and thereby identify the actual goal of the input query. Since users are usually reluctant to provide their preferences explicitly because of the extra manual effort involved, the search engine bears the entire burden of predicting the user's preferences and intentions behind a query in order to yield more relevant results. In this paper we define a QUEST to be the objective of the user's search; we combine quest-level analysis of the user's search logs with semantic analysis of the user's query in order to personalize the search results. Most personalization methods focus on creating one single profile per user and apply the same profile to all of that user's queries. We instead propose personalized search for a software development organization by creating a QUEST- or domain-based profile rather than an individual user-based profile.
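Profile-based re-ranking of the kind described above can be sketched as scoring each candidate result by how strongly its terms overlap a weighted profile. The profile weights and documents below are invented for illustration; the paper's QUEST profiles are built from search logs and semantic query analysis rather than hand-set weights.

```python
# Hypothetical domain profile for a software team: term -> weight.
profile = {"java": 2.0, "debugging": 1.5, "testing": 1.0}

# Candidate results for the ambiguous query "java": (title, terms).
docs = [
    ("Coffee brewing in Java island", ["java", "coffee"]),
    ("Remote debugging a Java service", ["java", "debugging"]),
]

def score(doc_terms):
    """Sum the profile weights of the terms a document contains."""
    return sum(profile.get(t, 0.0) for t in doc_terms)

# Re-rank candidates by profile score, best first.
ranked = sorted(docs, key=lambda d: score(d[1]), reverse=True)
print(ranked[0][0])  # the debugging article outranks the travel piece
```

This illustrates how a shared domain profile disambiguates "java" toward the programming sense for everyone in the organization, without per-user profiles.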
A Two Stage Crawler on Web Search using Site Ranker for Adaptive Learning
The web is a vast collection of billions of web pages containing terabytes of information, arranged across thousands of servers using HTML. The size of this collection is itself an obstacle to retrieving required and relevant information, which has made search engines a vital part of our lives. Search engines strive to retrieve information that is as useful as possible, and one of their building blocks is the web crawler. The main idea is to propose a framework for efficiently harvesting deep-web interfaces using a site ranker and an adaptive learning methodology, specifically two smart crawlers for efficiently gathering deep web interfaces. In the first stage, a smart web crawler performs site-based searching for center pages with the support of search engines, avoiding visits to a large number of pages. To achieve more accurate results for a targeted crawl, the crawler ranks websites so as to prioritize highly relevant ones for a given topic. In the second stage, the smart crawler achieves fast in-site searching by excavating the most useful links with adaptive link ranking.
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of engineering, science and technology, including new teaching methods, assessment, validation and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance and readability, and the articles published in the journal can be accessed online.
HIGWGET: A Model for Crawling Secure Hidden Web Pages
The conventional search engines on the Internet are effective at finding appropriate information, but they face constraints in obtaining the information sought from different sources. Web crawlers are directed along particular paths of the web, and they are limited in moving along other paths because those paths are protected, or at times restricted out of concern about threats. It is possible to build a web crawler with the ability to penetrate paths of the web not reachable by the usual crawlers, so as to get better answers in terms of information, time and relevancy for a given search query. The proposed web crawler is designed to visit Hyper Text Transfer Protocol Secure (HTTPS) websites, including web pages that require authentication to view and index.
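The essential behavior when crawling login-protected pages is: try the anonymous fetch first, and retry with credentials when the server demands authentication. A self-contained sketch, with `fake_server` standing in for real HTTPS requests (with a library such as `requests`, the same flow would use a session object that stores the login cookie across fetches):

```python
def fake_server(url, authenticated=False):
    """Stand-in for an HTTPS site with one login-protected page."""
    public = {"/": "home page"}
    protected = {"/members": "member-only page"}
    if url in public:
        return 200, public[url]
    if url in protected:
        return (200, protected[url]) if authenticated else (401, "")
    return 404, ""

def fetch_with_auth(url, credentials=None):
    """Fetch a URL; on a 401 response, retry with credentials.

    In a real crawler the credentials would come from per-site
    configuration, and the authenticated state (a cookie or token)
    would be reused for subsequent pages on the same site.
    """
    status, body = fake_server(url)
    if status == 401 and credentials is not None:
        status, body = fake_server(url, authenticated=True)
    return status, body

print(fetch_with_auth("/members", credentials=("user", "pass")))
```

Pages fetched this way can then be indexed like any other, which is what lets such a crawler cover content ordinary crawlers never see.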
Smart Crawler: A Two Stage Crawler for Concept Based Semantic Search Engine
The internet is a vast collection of billions of web pages containing terabytes of information, arranged across thousands of servers using HTML. The size of this collection is itself a formidable obstacle to retrieving necessary and relevant information, which has made search engines an important part of our lives. Search engines strive to retrieve information that is as relevant as possible, and one of their building blocks is the web crawler. We propose a two-stage framework, specifically a two-part smart crawler, for efficiently gathering deep web interfaces. In the first stage, the smart crawler performs site-based searching for center pages with the assistance of search engines, avoiding visits to a large number of pages. To achieve more accurate results for a targeted crawl, the smart crawler ranks websites in order to prioritize highly relevant ones for a given topic. In the second stage, the smart crawler achieves fast in-site searching by excavating the most relevant links with adaptive link ranking.
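The second-stage idea, adaptive link ranking, can be sketched as scoring each out-link by topic keywords in its anchor text and then boosting the weight of words seen on links that led to relevant pages. The keywords, links and update rule below are illustrative assumptions, not the paper's actual scheme.

```python
# Topic keyword weights; the adaptive step updates these over time.
weights = {"deep": 1.0, "web": 1.0, "form": 1.0}

def link_score(anchor_text):
    """Score a link by the total weight of topic words in its anchor."""
    return sum(weights.get(w, 0.0) for w in anchor_text.lower().split())

def reward(anchor_text, bonus=0.5):
    """Adaptive step: boost words on links that led to relevant pages."""
    for w in anchor_text.lower().split():
        weights[w] = weights.get(w, 0.0) + bonus

# Candidate out-links on a page, identified by anchor text:
links = ["Deep web search form", "Contact us", "Site map"]
best = max(links, key=link_score)
print(best)  # "Deep web search form" wins; follow it first
reward(best)  # if it led to a relevant page, reinforce its words
```

Over many pages the weights drift toward whatever vocabulary actually predicts relevant content on the current site, which is what makes the ranking "adaptive" rather than fixed.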
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Information on the Internet is overloaded owing to its unrestrained growth, which makes information search a complicated process. The recommendation system (RS) is a tool now widely used in many areas to suggest items of interest to users. With the development of e-commerce and information access, recommender systems have become a popular technique for pruning large information spaces so that users are directed toward the items that best meet their needs and preferences. As the content generated on the web explodes exponentially, recommendation techniques have become increasingly indispensable. Web recommendation systems assist users in finding exact information and make information search easier. Web recommendation is a web personalization technique that recommends web pages or items to the user based on previous browsing history. However, the tremendous growth in the amount of available information and in the number of visitors to websites in recent years poses key challenges for recommender systems: producing high-quality recommendations over large volumes of information without returning unwanted items in place of targeted ones, and performing many recommendations per second for millions of users and items. To meet these challenges, new recommender technologies are needed that can quickly produce high-quality recommendations even for very large-scale problems. To address these issues we use a two-part recommendation process based on fuzzy clustering and collaborative filtering algorithms. Fuzzy clustering is used to predict the items or products that will be accessed in the future, based on the user's previous browsing behavior. Collaborative filtering then produces the results the user expects from the output of the fuzzy clustering and the collection of web database items.
Using this new recommendation system, the expected product or item is returned to the user in minimal time. The system reduces unrelated and unwanted items in the results and provides results within the user's domain of interest.
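The collaborative filtering half of the pipeline can be sketched in a few lines: find the user most similar to the target user by comparing overlapping ratings, then suggest that neighbor's unseen items. The ratings below are invented, and a real system would combine this with the fuzzy-cluster predictions described above.

```python
# Hypothetical user -> {item: rating} data.
ratings = {
    "alice": {"item1": 5, "item2": 3},
    "bob":   {"item1": 5, "item2": 3, "item3": 4},
    "carol": {"item1": 1, "item3": 2},
}

def similarity(u, v):
    """Similarity from rating agreement on commonly rated items."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    diff = sum(abs(ratings[u][i] - ratings[v][i]) for i in common)
    return 1.0 / (1.0 + diff)  # identical ratings -> 1.0

def recommend(user):
    """Suggest items the most similar other user liked but user hasn't seen."""
    others = [v for v in ratings if v != user]
    nearest = max(others, key=lambda v: similarity(user, v))
    return [i for i in ratings[nearest] if i not in ratings[user]]

print(recommend("alice"))  # ['item3'], borrowed from bob
```

Production systems use better similarity measures (cosine, Pearson) and many neighbors rather than one, but the neighborhood-based structure is the same.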
Enhance Crawler For Efficiently Harvesting Deep Web Interfaces
The web is changing quickly and the size of its resources is growing, so efficiency has become a challenging problem in crawling such data. Hidden web content is data that cannot be indexed by search engines because it stays behind searchable web interfaces. The proposed system aims to develop a focused-crawler framework for efficiently gathering hidden web interfaces. First, the crawler performs site-based searching to find center pages with the help of web search tools, avoiding visits to an excessive number of pages. To get more specific results, the proposed crawler ranks websites, giving high priority to the ones most related to a given search. The crawler accomplishes fast in-site searching by watching for the most relevant links with adaptive link ranking. We have also incorporated a spell checker to correct the input, and apply reverse searching with incremental site prioritizing for wide-ranging coverage of hidden web sites.
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
UiPath Test Automation using UiPath Test Suite series, part 4
G017254554
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 17, Issue 2, Ver. V (Mar – Apr. 2015), PP 45-54
www.iosrjournals.org
DOI: 10.9790/0661-17254554
An Intelligent Meta Search Engine for Efficient Web Document Retrieval

Keerthana I.P.(1), Aby Abahai T.(2)
(1) Dept. of CSE, Mar Athanasius College of Engineering, Kothamangalam, Kerala, India
(2) Assistant Professor, Dept. of CSE, Mar Athanasius College of Engineering, Kothamangalam, Kerala, India
Abstract: In daily use of the Internet, searching for information is difficult due to the rapid growth of information resources. This is because no single search engine can index the entire web of resources. A Meta Search Engine is a solution to this: it submits the query to many other search engines and returns a summary of the results. The search results received are therefore an aggregate of multiple searches. This strategy gives the search a broader scope than searching a single search engine, but the results are not always better, because the Meta Search Engine must use its own algorithm to choose the best results from the multiple search engines. In this paper we propose a new Meta Search Engine to overcome these drawbacks. It uses a new page rank algorithm, called modified ranking, for ranking and optimizing the search results in an efficient way. It is a two-phase ranking algorithm that orders the web pages based on their relevance and popularity. This Meta Search Engine is designed to produce more relevant results than traditional search engines.
Keywords: Metacrawler, Meta Search Engine, Ranking, Search Engine, Web Crawler
I. Introduction
The World Wide Web (WWW) is a collection of information linked together from all over the world. It is based on hypertext, which means that while navigating the ocean of information the user can pick an interesting word or expression within a text and request more information about it. This does not apply to all words in a text, but only to those that have been designated as links by the producer of the information, typically displayed on screen as underlined text. The web is highly dynamic in nature: new web pages are added, some are removed and others are modified. So it is very difficult to find relevant information in it. Search engines are used to extract valuable information from the Internet. A search engine [1] is a program that searches for and identifies items in a database that correspond to keywords or characters specified by the user, especially for finding particular sites on the WWW.
On the Internet, a search engine is a coordinated set of programs. It includes a spider (also called a "crawler" or a "bot") that visits every page, or representative pages, on every website that wants to be searchable, reads it, and uses the hypertext links on each page to discover and read the site's other pages. It includes a program that creates a huge index (sometimes called a "catalog") from the pages that have been read. It also contains a program that receives the search request, compares it to the entries in the index and returns results.
A Meta Search Engine [2] is a search engine that queries other search engines and then combines the results received from all of them. Here the user is not using just one search engine but a combination of many search engines at once, to optimize web searching. Meta Search Engines do not create their own database of information; they search the other search engines' databases. The major advantage of a Meta Search Engine is that it allows the user to search several search engines simultaneously, so there is no need to search each engine separately.
In this paper a new Meta Search Engine is proposed, which is capable of producing relevant results for a particular query string. It displays the results from different search engines as soon as they are found by a crawler. Finally, it ranks each web page based on its popularity and relevance; for that, this paper proposes a new ranking algorithm. The paper is organized as follows: section II discusses work related to search engines; section III describes the Meta Search Engine implementation; section IV discusses the performance analysis of the proposed Meta Search Engine; section V describes the advantages of the proposed system; and section VI concludes the work.
II. Related Works
2.1 Search Engines
A search engine is designed to search the WWW for information about a given search query and returns links to various documents in which the query's keywords are found. There are mainly three types of search engines: Web Search Engines, Meta Search Engines and Vertical Search Engines [3]. A web search engine searches for information in the WWW. A vertical search engine provides the user with results for queries on a particular domain. Meta Search Engines send the user's search queries to various search engines and combine the search results. It is important for a search engine to maintain high-quality websites. A database can be built with attributes such as length of title, keywords in title, number of back-links, in-links, length of the URL, and ranking; data mining tools such as WEKA and Tanagra can be used to extract meaningful knowledge from this large set of data [4].
The search engines used in this paper for the Meta Search Engine construction are Google, Yahoo, Ask and Bing [5]. Google is a web search engine owned by Google Inc.; it is the most widely used search engine on the Web. The main purpose of Google Search is to search for text in web pages, as opposed to other data, such as images in Image Search. Google was developed by Larry Page and Sergey Brin in 1997. Yahoo is a web search engine owned by Yahoo! Inc.; it is the second largest search engine on the web by query volume. It was founded in January 1994 by Jerry Yang and David Filo. Ask, also known as Ask Jeeves, is basically designed to answer the user's queries in a Q&A style and has proved to be a focused search engine. Ask was developed in 1996 by Garrett Gruener and David Warthen in Berkeley, California. Bing, formerly known as Live Search and MSN Search, is a web search engine owned by Microsoft. It went fully online on June 3, 2009, with a preview version released on June 1, 2009.
2.2 Web Crawlers
All search engines internally use web crawlers to keep their copies of data fresh. A search engine is divided into different modules; among these, the crawler module is the one the search engine relies on the most, because it helps provide the best possible results. Crawlers or spiders are small programs that browse the web on the search engine's behalf, similarly to how a human user would follow links to reach different pages. Google crawlers run on a distributed network of thousands of low-cost computers and can therefore carry out fast parallel processing; this is how Google returns results within a fraction of a second. A crawler has three main components [6]: a frontier, which stores the list of URLs to visit; a page downloader, which downloads pages from the WWW; and a web repository, which receives web pages from the crawler and stores them in the database. The repository stores only standard HTML pages. The working of a web crawler is as follows [6]:
1. Initialize the seed URL or URLs
2. Add them to the frontier
3. Select a URL from the frontier
4. Fetch the web page corresponding to that URL
5. Parse the retrieved page to extract its URLs
6. Add all the unvisited links to the frontier
7. Go back to step 3 and repeat until the frontier is empty
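The steps above can be sketched in a few lines of code. The sketch below is illustrative only (the system described in this paper was implemented in PHP, not Python), and the `fetch` callable, page limit and helper names are our own assumptions:

```python
from collections import deque
from urllib.parse import urljoin
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects absolute href targets from anchor tags while parsing a page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

def crawl(seed_urls, fetch, max_pages=50):
    """Run the crawl loop; fetch is any callable mapping a URL to its HTML
    (e.g. one built on urllib.request.urlopen)."""
    frontier = deque(seed_urls)   # steps 1-2: seed URLs go into the frontier
    visited = set()
    repository = {}               # the 'web repository': URL -> raw HTML
    while frontier and len(repository) < max_pages:
        url = frontier.popleft()  # step 3: select a URL from the frontier
        if url in visited:
            continue
        visited.add(url)
        try:
            html = fetch(url)     # step 4: fetch the corresponding web page
        except Exception:
            continue              # unreachable pages are simply skipped
        repository[url] = html
        parser = LinkExtractor(url)
        parser.feed(html)         # step 5: parse the page to extract URLs
        for link in parser.links:
            if link not in visited:
                frontier.append(link)  # step 6: add unvisited links
    return repository             # step 7 is the while-loop condition
```

Injecting `fetch` as a parameter keeps the loop testable without network access; a real crawler would also normalize URLs and respect robots.txt.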
Based on the different strategies employed in web crawling, there are four types of web crawlers: focused crawlers, distributed crawlers, incremental crawlers, and parallel crawlers [6]. A focused crawler tries to download pages that are related to each other; it collects documents that are specific and relevant to a given topic. An incremental crawler is a traditional crawler that refreshes its collection periodically, replacing old documents with newly downloaded ones; it also exchanges less important pages for new and more important pages. A distributed crawler uses a distributed computing technique: many crawlers work together, distributed over the process of web crawling, to get the most coverage of the web, and a central server manages the communication and synchronization of the geographically distributed nodes. In the case of a parallel crawler, multiple crawling processes (called C-procs) run in parallel, possibly on a network of workstations. Parallel crawlers depend on page freshness and page selection, and can run on a local network or be distributed at geographically distant locations.
Popular examples of web crawlers are Yahoo Slurp (the Yahoo search crawler), Googlebot (the Google search crawler) and Bingbot (Microsoft's Bing web crawler).
2.3 Web Page Elements
The main web page elements that will be extracted by a crawler are given below [7].
Title Tag: Each web page has a unique title tag, which helps search engines to know how the page is distinct
from the others on the site. Titles can be both short and informative. If the title is too long, then search engines
will show only a portion of it in the search result. Page titles are an important aspect of search engine
optimization.
Description Meta Tag: A page's description meta tag gives search engines a summary of what the page is
about. A page's title may be a few words or a phrase, but a page's description meta tag might be a sentence or
two or a short paragraph. Fig.1 shows an example for title and description tags.
Fig.1: Title and Description Tags
URL (Uniform Resource Locator): The URL of a document is displayed as part of a search result in search engines, below the document's title and snippet. Search engines are good at crawling all types of URL structures, even quite complex ones, but spending the time to make URLs as simple as possible for both users and search engines can help. Fig.2 shows a sample URL.
Fig.2: URL
Anchor Text: Anchor text (see Fig.3) is the clickable text that users see as a result of a link, and is placed within the anchor tag <a href="..."></a>. This text tells search engines something about the page being linked to. A link may be internal (pointing to other pages on the same site) or external (leading to content on other sites).
Fig.3: Anchor Text
2.4 Searching Techniques & Strategies
Several searching techniques and strategies are essential for narrowing search results and will assist
you with locating valuable resources via the Internet. These techniques and strategies include wildcard, plus and
minus signs, quotation marks and brackets, pipe symbol, Boolean operators, and nesting [8].
Wildcard: A wildcard is a special character that can be added to a phrase while searching and the search engine
or subject directory looks for all possible endings. The results will provide all possible documents in their
database that have those letters.
Plus and Minus Signs: The plus sign used before a keyword or phrase should retrieve results that include that
specific keyword or phrase. The minus sign used before a keyword or phrase should retrieve results that exclude
that specific keyword or phrase.
Quotation Marks and Brackets: Quotation marks and brackets assist with narrowing the search results from
the search tools. When they are used, the search engine will only retrieve documents that have those key terms
appearing together.
Pipe Symbol (|): The pipe (|) symbol is used for narrowing down results within a broad category.
Boolean Operators: Working of AND operator is similar to the plus sign and the NOT operator is similar to the
minus sign. The OR operator tells the search engine to retrieve one term or the other.
Near: The NEAR phrase indicates to the search tools that those terms must be located within a certain number
of words. The results may vary depending on the search tool. To illustrate, some search tools may try to locate
the terms within 2, 10 or 25 words of each other. The command to use is NEAR/#.
Nesting: Nesting allows the user to perform complex searches. Here parentheses are used to group the keywords and Boolean operators together.
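As a minimal illustration of the plus/minus and quotation-mark semantics described above, the hypothetical filter below checks a single document against required keywords, excluded keywords and exact phrases (a real search engine applies these operators against its index, not against raw text):

```python
def matches(document, required=(), excluded=(), phrases=()):
    """Plus sign: every required keyword must appear; minus sign: no excluded
    keyword may appear; quotation marks: quoted phrases must appear verbatim."""
    text = document.lower()
    words = set(text.split())
    return (all(term.lower() in words for term in required)
            and not any(term.lower() in words for term in excluded)
            and all(phrase.lower() in text for phrase in phrases))
```

For example, `+data +mining -weka` corresponds to `matches(doc, required=["data", "mining"], excluded=["weka"])`, and `"data mining"` to `matches(doc, phrases=["data mining"])`.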
2.5 Challenges Faced By Search Engines
Search engines are the major tools for retrieving information from the web, but the list of retrieved documents contains a high percentage of redundant and near-duplicate results, so there is a need to improve the performance of search results. Most current search engines use a data filtering algorithm that eliminates duplicate and near-duplicate documents to save the user's time and effort [9]. The identification of near-duplicate pairs or similar documents in a large collection is a major problem with many applications.
The problem with most search engines [5] is that most people do not take advantage of the offered tools; instead, they just type a few keywords as a query. Thus, search engines have to find a way to improve basic queries so that they can still provide users with successful searches. When using a search engine, an index is searched rather than the entire Web; the index is created and maintained through automated web searching by spiders. Plain search engines prove very effective for certain types of search tasks, such as retrieving a particular URL and transactional queries. However, search engines cannot fully address informational queries, where the user has an information need to be satisfied. A Meta Search Engine overcomes this by sending the user's query to a set of search engines, collecting the data from them and displaying the relevant records by using a clustering algorithm [10].
III. Implementation
The new Meta Search Engine has the following components: Meta Search Engine GUI, Query
Formulator, Metacrawler, Redundant URL Eliminator, Modified Ranking and Result Generation. The new
architecture is built by using four search engines Google, Yahoo, Bing and Ask. The proposed architecture is
shown in Fig.4. It shows the interaction between various components of the Meta Search Engine.
Fig.4: Meta Search Engine Architecture
3.1 Meta Search Engine GUI
In the proposed architecture, the user gives the keyword to be searched through the Meta Search Engine Graphical User Interface (GUI). Here the user can select the search engines he wants to search (see Fig.5). The query is then given to all selected search engines, and the results corresponding to the query are stored in the web database. The new Meta Search Engine retrieves important results from the four different traditional search engines, and the retrieved results are then extracted by the crawler. We developed this application using Adobe Dreamweaver, with PHP for the front end and MySQL as the back end, and used the XAMPP Apache distribution to link the front end and back end.
Fig.5: Meta Search Engine GUI
3.2 Query Formulator
As different search engines follow different styles for the representation of the query search string,
different search query strings are generated for a given user input. The query strings are then sent to various
search engines to extract desired results from the search engines. The query string formats for each of the search
engines are given below.
For Google: http://www.google.co.in/search?q=sword
For Yahoo: http://in.search.yahoo.com/search;_ylt=?p=sword
For Bing: http://www.bing.com/search?q=sword
For Ask: http://www.ask.com/web?q=sword
Here sword is the user input. For example if we want to search “data mining” then the query string
corresponding to this for Google is:
http://www.google.co.in/search?q=data+mining.
In the new Meta Search Engine we focus on crawling the first result page suggested by every search engine, because important and highly relevant results are located on the first page of results, and the majority of users focus on those pages.
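The query formatting step can be sketched as follows. The sketch is illustrative only (the paper's implementation is in PHP), and the Yahoo template is simplified here by omitting the session parameter shown above:

```python
from urllib.parse import quote_plus

# Query-string templates for each engine, following the formats listed above
# (the Yahoo format is simplified; its ';_ylt=' session parameter is omitted).
TEMPLATES = {
    "google": "http://www.google.co.in/search?q={}",
    "yahoo": "http://in.search.yahoo.com/search?p={}",
    "bing": "http://www.bing.com/search?q={}",
    "ask": "http://www.ask.com/web?q={}",
}

def formulate_queries(user_input):
    """Encode the user's keywords once, then build one query URL per engine."""
    encoded = quote_plus(user_input)  # "data mining" becomes "data+mining"
    return {name: url.format(encoded) for name, url in TEMPLATES.items()}
```

So `formulate_queries("data mining")["google"]` yields the Google query string shown above, with spaces replaced by plus signs.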
3.3 Metacrawler
The web crawler used in the new Meta Search Engine is termed a metacrawler, because it crawls web pages through multiple search engines. We developed it using PHP cURL, a library for making HTTP requests in PHP. Using cURL we can set the various parameters for HTTP requests, such as the user agent, follow-location behaviour, HTTP version, header details and certificate verification details, and it also helps to handle authentication. The metacrawler crawls through each search engine to find the web pages relevant to a query string. When crawling a web page it removes scripts, style sheets and other information from the page source and extracts only the HTML code, which it then converts into a DOM (Document Object Model) structure.
Compared with traditional plain text, a web page has more structure. Web pages are regarded as semi-structured data, represented as a DOM structure. The DOM structure of a web page is a tree in which every HTML tag in the page is a node. The web page can be segmented by some predefined structural tags; useful tags include <P> (paragraph), <TABLE> (table), <UL> (list), <H1>-<H6> (heading), etc. The DOM structure can thus be used to facilitate information extraction: from it, the metacrawler extracts the document URL, title and description, and the extracted details are then stored in the web database.
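Extracting the title and description from a page's source can be sketched as below. This uses Python's standard html.parser as a stand-in for the PHP DOM handling described above, and the class and function names are our own:

```python
from html.parser import HTMLParser

class PageSummaryExtractor(HTMLParser):
    """Pulls the <title> text and the description meta tag out of raw HTML,
    the same fields the metacrawler stores for each result."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            a = dict(attrs)
            if a.get("name", "").lower() == "description":
                self.description = a.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract_summary(html):
    """Return the (title, description) pair extracted from a page's HTML."""
    parser = PageSummaryExtractor()
    parser.feed(html)
    return parser.title.strip(), parser.description.strip()
```

The extracted pair, together with the page URL, is what would be written to the web database for each result.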
3.4 Redundant URL Eliminator
Since the metacrawler combines the results from different search engines, there is a chance that a link suggested by one search engine will also be suggested by another. The redundant URL eliminator component checks whether an extracted link is a duplicate by looking up whether it is already in the web database URL list. If the URL is already present in the web database it is a duplicate, and it is not added to the result page.
As soon as results are available from one search engine, they are displayed on the screen. The results therefore appear that much faster, and while the user scrolls down through the list of hits, other results are being added to the page. A newly found link is added to the result page only after checking whether it is a duplicate. To check this, for each newly suggested link the system finds its domain name. If this domain name matches the domain name of another link in the same result set, the earlier one is kept and the later one is discarded. So duplicate link elimination is performed not only on the basis of URL similarity but also on the basis of domain name matches, which helps to optimize the results.
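A minimal sketch of the domain-based duplicate check, assuming results arrive one link at a time (the function name is our own, and a production version would index seen domains instead of recomputing them):

```python
from urllib.parse import urlparse

def add_if_new(result_urls, candidate):
    """Append a crawled link to the result list only if neither its URL nor
    its domain name has been seen before; earlier links take precedence."""
    seen_domains = {urlparse(u).netloc for u in result_urls}
    domain = urlparse(candidate).netloc
    if candidate in result_urls or domain in seen_domains:
        return False   # duplicate: the later suggestion is discarded
    result_urls.append(candidate)
    return True
```

Two links from the same host are thus collapsed to the first one seen, matching the domain-name rule described above.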
3.5 Modified Ranking
Ranking is used for ordering the web pages based on their relevance; it assigns a score to each link in the result set. The system is based on two-phase ranking: the first phase is based on the domain name of each web page, and the second phase is based on the outgoing and incoming link counts of each web page.
Phase 1: First the system finds the domain name of each link in the result set. It then calculates the count of each domain name in the result set and arranges the links in decreasing order of their domain name counts.
Phase 2: In this phase a web page is assigned a rank based on its domain name count and its popularity. A web page that has more incoming and outgoing links is considered more popular and relevant, so this phase applies a modified ranking that calculates a rank for each web page accordingly.
Algorithm: Modified Ranking
Input: Result Set R which contains links from phase 1
Output: Ordered list of Result Set R
1. Start
2. Initialise the score of all links to their domain name count in the result set R
3. for each link Li in result set R
4. Crawl web page corresponding to link Li
5. Find all its outgoing links and store them into list OL
6. for each link Lj in OL
7. if Lj is in R then
8. Increment score of Li by 1
9. Increment score of Lj by 1
10. end if
11. end for
12. end for
13. Order links in R in the decreasing order of their scores
14. return R to result page
15. End
The above algorithm calculates the score of the web pages based on their outgoing links. The incoming links of a web page are counted when the outgoing links of another web page are processed. For example, if a link to web page A is included in web page B, then A is an outgoing link of B and B is an incoming link of A.
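The modified ranking algorithm can be expressed directly in code. The sketch below assumes the pages in the result set have already been crawled, so each link's outgoing links are available in a dictionary; as in the algorithm above, initial scores are domain-name counts and every in-result link pair increments both endpoints:

```python
from urllib.parse import urlparse

def modified_ranking(result_links, outgoing_links):
    """Order the result set R by the two-phase modified ranking score.
    outgoing_links maps each link to the links found on its page."""
    # Phase 1: initial score = count of the link's domain name in R.
    domains = [urlparse(link).netloc for link in result_links]
    score = {link: domains.count(urlparse(link).netloc) for link in result_links}
    # Phase 2: for every outgoing link Lj of Li that is also in R,
    # increment Li's score (outgoing link) and Lj's score (incoming link).
    in_result = set(result_links)
    for li in result_links:
        for lj in outgoing_links.get(li, []):
            if lj in in_result:
                score[li] += 1
                score[lj] += 1
    # Order the links in decreasing order of their scores.
    return sorted(result_links, key=lambda link: score[link], reverse=True)
```

Pages that link to each other within the result set rise to the top, while isolated pages keep only their domain-count score.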
3.6 Results Generation
After the ordering process is performed, the ranked list is displayed as a single list with the most relevant links at the top of the result window. Each result contains the extracted title of the web page, its URL, and its description. The relevancy of the pages is calculated based on the input query, the web pages are ranked using the modified ranking algorithm, and the ranked pages are displayed to the user. The top links are the links with the highest rank, and the rank of the web pages decreases from top to bottom. Fig.6 shows the ranked results for the search query 'Sachin'. It gives the details of web pages suggested by each of the four search engines Google, Yahoo, Bing and Ask; the results shown are merged from the first result page given by each of the search engines. The Meta Search Engine also supports searches other than web search: Fig.7 shows the image search results, Fig.8 the video search results and Fig.9 the latest news search results.
Fig.6: Web Search Results
Fig.7: Image Search Results
Fig.8: Video Search Results
Fig.9: Latest News Search Results
IV. Results
The two basic measures for checking the effectiveness of the search engines in document retrieval are
precision and recall.
Precision: This is the percentage of retrieved documents that are in fact relevant to the query. It is defined as:
Precision = | Relevant ∩ Retrieved | / | Retrieved |
Recall: This is the percentage of documents that are relevant to the given query and were, in fact, retrieved. It is
defined as:
Recall = | Relevant ∩ Retrieved | / | Relevant |
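The two definitions can be computed directly from the sets they name. The document sets below are illustrative, not drawn from the paper's experiments.

```python
# Illustrative relevant/retrieved sets for one query.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d2", "d3", "d5"}

hits = relevant & retrieved  # Relevant ∩ Retrieved

precision = len(hits) / len(retrieved)  # |Relevant ∩ Retrieved| / |Retrieved|
recall = len(hits) / len(relevant)      # |Relevant ∩ Retrieved| / |Relevant|

print(precision, recall)
```

With these sets, two of the three retrieved documents are relevant (precision 2/3), and two of the four relevant documents were retrieved (recall 1/2).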
The total number of relevant documents retrieved by executing ten different queries on the traditional
search engines and the new Meta Search Engine, together with the average precision values, is given in Table 1.
It represents the performance of the proposed Meta Search Engine in terms of precision.
Table 1: Relevant Documents Retrieved

Query Number        Google   Yahoo   Ask    Bing   Meta Search Engine
1                   7        5       5      6      13
2                   9        7       6      7      15
3                   8        7       5      6      16
4                   5        5       5      4      12
5                   7        6       5      5      15
6                   8        6       6      6      11
7                   7        6       5      6      14
8                   7        7       6      6      16
9                   6        6       4      5      14
10                  6        5       5      6      12
Average Precision   0.93     0.88    0.83   0.87   0.96
Fig.10 shows the comparison between the precision of the different search engines. From the results
we can see that the new Meta Search Engine has a higher precision value than the search engines Google,
Yahoo, Ask and Bing. We therefore conclude that the Meta Search Engine is more effective at relevant
document retrieval and performs better than the individual search engines.
Fig.10: Precision
V. Advantages
The proposed Meta Search Engine has many advantages compared to other search engines. It extends
the search coverage of the topic and allows more web pages to be found. It can utilize some of the specific
functionality of the search engines it employs. It searches all the major search engines at once, so users no
longer have to type their query separately into different search engines to look for resources. It is easy to use
and has a format similar to other search engines. It supports and utilizes the searching syntax of each search
engine, including Booleans, phrase searching, pluses and minuses, wildcards, field searching, etc. It can
consolidate the results and remove duplicates from them. It is customizable to the user's preferences. It uses the
indexes created by other search engines, combining and post-processing the results in a unique way, so more
results can be retrieved with the same effort. Overall it provides improved recall and precision.
VI. Conclusion
WWW search engines have a number of deficiencies, including periods of downtime, low coverage of
the WWW, inconsistent and inefficient user interfaces, out-of-date databases, poor relevancy ranking and
precision, and vulnerability to spamming techniques. In this paper a new Meta Search Engine has been
introduced which addresses some of these and other difficulties in searching the WWW. The Meta Search
Engine retrieves more relevant results than any single search engine, and its performance improves both recall
and precision. It combines and ranks the results efficiently using the modified page rank algorithm. It is
powerful because one search can exploit the strengths of a number of top search engines such as Google,
Yahoo, Bing and Ask.