Search Engine
What is a Search Engine?
• We use search engines like Google practically every second.
• A search engine is a software program that helps us find information on the internet from just a short string, or query.
How do you think a search engine works?
Does it literally search the entire internet
when you type something into it?
Ans: Yes and no
How does a search engine work?
• Search engines maintain a vast database containing information about every publicly accessible (and crawl-permitted) page.
• A search engine simply performs a search in its own database to provide users with the results.
So, who is entering data
into these databases?
IS THERE SOMEONE SITTING IN A
CUBICLE AND SAVING EACH PAGE
INTO THE DATABASE?
Heck No!
• No one is doing this manually today. In the dot-com era, directories like Yahoo! did index web pages manually. Automated bots called "crawlers" replaced that process; Google built its engine around crawling, and engines like Yahoo and Bing adopted the same approach.
How does a search engine
work?
Search engines perform a process in which they read and save the content of every reachable page on the internet into their databases.
This process of indexing webpage content into a database is called "crawling".
The program that performs this process is called a crawler.
So how do we
get results?
• We simply get results after the search engine performs a search in its DATABASE.
• Search-engine databases are so large that they contain almost every reachable site on the internet. So we can say they search the internet, without actually performing the search over the internet in real time.
FlexFind does the same,
but on a smaller scale
Components of FlexFind
Search Engine
Crawler
As discussed before, a crawler is an essential component of a search engine, and our project is no different. It too has a crawler, built in Node.js and TypeScript. This crawler is responsible for visiting every reachable page and indexing every accessible page into the FlexFind database.
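The crawl loop described above can be sketched as a breadth-first traversal. This is a minimal illustration, not FlexFind's actual implementation: the page-fetching function is injected so the example stays self-contained, and the "database" here is an in-memory map, where the real crawler would fetch over HTTP and write rows into MySQL.

```typescript
// A fetcher returns a page's content and the links it contains.
type FetchPage = (url: string) => { content: string; links: string[] };

function crawl(seed: string, fetchPage: FetchPage, maxPages = 100): Map<string, string> {
  const indexed = new Map<string, string>(); // url -> content ("the database")
  const queue: string[] = [seed];            // frontier of pages still to visit
  while (queue.length > 0 && indexed.size < maxPages) {
    const url = queue.shift()!;
    if (indexed.has(url)) continue;          // never index the same page twice
    const { content, links } = fetchPage(url);
    indexed.set(url, content);               // save the page content
    for (const link of links) {
      if (!indexed.has(link)) queue.push(link); // follow every reachable page
    }
  }
  return indexed;
}
```

With a fake three-page "web" as the fetcher, the loop discovers and indexes all three pages from a single seed URL.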
So Why TypeScript?
Not PHP?
• Building a bot like a crawler in PHP is possible but not feasible. A crawler's task is never finished: the internet contains a practically unbounded number of pages, and new pages go live every minute. PHP in its usual web-request model cannot sustain such a never-ending process; the browser would eventually cancel the request and report "This site took too long to respond". That's why a language running outside the browser request cycle was needed, and we chose TypeScript on Node.js.
Database
We are using a MySQL
database to store the
indexed data for the
search engine. We
have divided the data
into two tables:
one for domains,
and a second for
pages, linked with
FOREIGN KEY
constraints.
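A hypothetical sketch of that two-table layout follows. The deck does not show FlexFind's actual column names, so every column here is illustrative; only the domains/pages split and the FOREIGN KEY link come from the slides.

```typescript
// Illustrative DDL for the two-table design: domains, and pages that
// reference a domain via a FOREIGN KEY. Column names are assumptions.
const createDomains = `
  CREATE TABLE domains (
    id        INT AUTO_INCREMENT PRIMARY KEY,
    hostname  VARCHAR(255) NOT NULL UNIQUE,
    authority INT NOT NULL DEFAULT 0      -- incremented per inbound backlink
  )`;

const createPages = `
  CREATE TABLE pages (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    domain_id  INT NOT NULL,
    url        VARCHAR(2048) NOT NULL,
    content    MEDIUMTEXT,
    backlinks  INT NOT NULL DEFAULT 0,
    page_score FLOAT NOT NULL DEFAULT 0,  -- on-page optimization score
    FOREIGN KEY (domain_id) REFERENCES domains(id)
  )`;
```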
So, why a SQL
based database?
• Of course, we could use a NoSQL database like MongoDB, but we decided on MySQL because it provides outstanding integration with both PHP and Node.js.
• MongoDB's PHP support, by contrast, is comparatively weak, and complex queries are easier to write in SQL.
Backend
• We are using PHP to process the user's search query, build the corresponding SQL queries, and fetch relevant records from the database.
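The real backend is PHP; the following is a language-neutral sketch (written in TypeScript for consistency with the crawler examples) of how a user query might be turned into a parameterized SQL statement. The table and column names are hypothetical, and the key point is that user input is bound as a parameter, never interpolated into the SQL string.

```typescript
// Build a parameterized LIKE query from a raw user search string.
function buildSearchQuery(userQuery: string): { sql: string; params: string[] } {
  // Escape LIKE wildcards (%, _) and backslash so the user's input is
  // matched literally; the driver then binds the parameter safely.
  const escaped = userQuery.replace(/[%_\\]/g, (c) => "\\" + c);
  return {
    sql: "SELECT url, content FROM pages WHERE content LIKE ? ORDER BY page_score DESC",
    params: [`%${escaped}%`],
  };
}
```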
Don’t ask about HTML or
CSS; that isn't the focus.
Relevancy?
You must have used Google to search for something. How does Google decide what you are searching for, what to rank on top as most relevant, and what to show afterwards? Is it random?
Why does this matter?
• Users will most likely visit one of the top 3 sites in the search results.
• Ranking on top in Google for a high-volume keyword can build a million-dollar business and will surely generate a lot of revenue.
• And this cannot be down to luck, and it is not.
Search Algorithms
• Search engines like Google evaluate a page against over 200 ranking factors, such as backlinks, on-page optimization, mobile friendliness, bounce rate, etc. Other search engines do the same.
• This presentation is not about Google or SEO, so we won't go deep into it.
FlexFind uses a similar
approach
• Remember the crawler? In FlexFind, the crawler is responsible not only for indexing page content but also for indexing metadata about the page, along with a score for each page: the FlexFind crawler calculates a score from the on-page metadata available.
• The crawler also increments the backlink count for each target URL it finds.
• Finally, the crawler increments the domain authority for each backlink it finds.
How does FlexFind perform a search?
FlexFind ranks matched pages in the following order. Pages with:
1. the search string in their URL;
2. the search string distributed within a single type of data about the page;
3. the search string in H1 tags;
4. the search string in H2 / H3 / H4 / H5 / H6 HTML tags;
5. the search string anywhere in the page body;
6. all words of the search string present in the page, separately;
7. any word of the search query (except pronouns) present in the page.
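The tier checks above can be sketched as a single function that returns the best (lowest-numbered) tier a page matches. This is a simplified illustration: the page fields are hypothetical, tier 2 is omitted because the slide does not define "distributed within a single type of data" precisely, and the pronoun filter of tier 7 is left out for brevity.

```typescript
// Hypothetical page shape: URL, H1 text, concatenated H2–H6 text, and body.
interface Page { url: string; h1: string; subheadings: string; body: string }

function matchTier(query: string, page: Page): number {
  const q = query.toLowerCase();
  const words = q.split(/\s+/).filter((w) => w.length > 0);
  const body = page.body.toLowerCase();
  if (page.url.toLowerCase().includes(q)) return 1;          // 1. query in URL
  if (page.h1.toLowerCase().includes(q)) return 3;           // 3. query in H1
  if (page.subheadings.toLowerCase().includes(q)) return 4;  // 4. query in H2–H6
  if (body.includes(q)) return 5;                            // 5. query anywhere in body
  if (words.every((w) => body.includes(w))) return 6;        // 6. all words, separately
  if (words.some((w) => body.includes(w))) return 7;         // 7. any word (pronoun filter omitted)
  return Infinity;                                           // no match at all
}
```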
Algorithm of
FlexFind
• Within the ranking levels from the
previous slide, FlexFind
orders results in
descending order of a derived
value, calculated as
• (domain authority + number of
backlinks + (1.5 × page
optimization score)) / 3
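The derived value above translates directly into code. The 1.5 weight on the on-page optimization score is taken from the slide; the function would be used to sort pages that fall into the same ranking tier.

```typescript
// Derived ranking value: (authority + backlinks + 1.5 * pageScore) / 3,
// sorted in descending order.
function rankScore(domainAuthority: number, backlinks: number, pageScore: number): number {
  return (domainAuthority + backlinks + 1.5 * pageScore) / 3;
}
```

For example, a page on a domain with authority 30, 12 backlinks, and an on-page score of 8 gets (30 + 12 + 12) / 3 = 18.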
Scope of Features &
Corrections
• This project could return more relevant results with a more efficient search algorithm.
• A feature to search multimedia files such as images and videos.
• Using OpenGraph tags to record answers and more accurate information in the database, so users can be given answers directly.
• Storing meta keywords from webpages in a separate table to power an autocomplete feature.
• Lastly, providing pagination or infinite scroll instead of loading all results in one go.
That's it for 10 Marks

Technical Club PPT for BTech CS and Btech IT
