Search Engine
What is a Search Engine?
• We use search engines like Google practically every second.
• A search engine is a software program that helps us find information on the internet from just a short string, or query.
How do you think a search engine works?
Does it literally search the entire internet
when you type something into it?
Ans: Yes and no
How does a search engine work?
• Search engines maintain a vast database containing information about every publicly accessible (and crawl-permitted) page.
• A search engine simply performs a search in its own database to provide users with the results.
So, who is entering data
into these databases?
IS THERE SOMEONE SITTING IN A
CUBICLE AND SAVING EACH PAGE
INTO THE DATABASE?
Heck No!
• No one is doing this manually today. In the dot-com era, directories like Yahoo! did index web pages manually. Automated bots called "crawlers" replaced that process; Google built its engine around crawling, and engines like Yahoo and Bing adopted the same approach.
How does a search engine
work?
Search engines perform a process in which they read and save the content of every reachable page on the internet into their databases.
This process of indexing webpage content into a database is called "crawling".
The program that performs this process is called a crawler.
So how do we
get results?
• We simply get results after the search engine performs a search in its DATABASE.
• Search-engine databases are so large that they contain almost every reachable site on the internet. So we can say they search the internet, without actually performing the search over the internet in real time.
FlexFind does the same,
but on a smaller scale
Components of FlexFind
Search Engine
Crawler
As discussed before, a crawler is an essential component of a search engine, and our project is no different. It too has a crawler, built in Node.js and TypeScript. This crawler is responsible for visiting every reachable page and indexing every accessible page into the FlexFind database.
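The crawl loop described above can be sketched as a breadth-first traversal. This is a minimal illustration, not FlexFind's actual implementation: the page-fetching function is injected so the example stays self-contained, and the "database" here is an in-memory map, where the real crawler would fetch over HTTP and write rows into MySQL.

```typescript
// A fetcher returns a page's content and the links it contains.
type FetchPage = (url: string) => { content: string; links: string[] };

function crawl(seed: string, fetchPage: FetchPage, maxPages = 100): Map<string, string> {
  const indexed = new Map<string, string>(); // url -> content ("the database")
  const queue: string[] = [seed];            // frontier of pages still to visit
  while (queue.length > 0 && indexed.size < maxPages) {
    const url = queue.shift()!;
    if (indexed.has(url)) continue;          // never index the same page twice
    const { content, links } = fetchPage(url);
    indexed.set(url, content);               // save the page content
    for (const link of links) {
      if (!indexed.has(link)) queue.push(link); // follow every reachable page
    }
  }
  return indexed;
}
```

With a fake three-page "web" as the fetcher, the loop discovers and indexes all three pages from a single seed URL.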
So Why TypeScript?
Not PHP?
• Building a bot like a crawler in PHP is possible but not feasible. A crawler's task is never finished: the internet contains a practically unbounded number of pages, and new pages go live every minute. PHP in its usual web-request model cannot sustain such a never-ending process; the browser would eventually cancel the request and report "This site took too long to respond". That's why a language running outside the browser request cycle was needed, and we chose TypeScript on Node.js.
Database
We are using a MySQL
database to store the
indexed data for the
search engine. We
have divided the data
into two tables:
one for domains,
and a second for
pages, linked with
FOREIGN KEY
constraints.
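A hypothetical sketch of that two-table layout follows. The deck does not show FlexFind's actual column names, so every column here is illustrative; only the domains/pages split and the FOREIGN KEY link come from the slides.

```typescript
// Illustrative DDL for the two-table design: domains, and pages that
// reference a domain via a FOREIGN KEY. Column names are assumptions.
const createDomains = `
  CREATE TABLE domains (
    id        INT AUTO_INCREMENT PRIMARY KEY,
    hostname  VARCHAR(255) NOT NULL UNIQUE,
    authority INT NOT NULL DEFAULT 0      -- incremented per inbound backlink
  )`;

const createPages = `
  CREATE TABLE pages (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    domain_id  INT NOT NULL,
    url        VARCHAR(2048) NOT NULL,
    content    MEDIUMTEXT,
    backlinks  INT NOT NULL DEFAULT 0,
    page_score FLOAT NOT NULL DEFAULT 0,  -- on-page optimization score
    FOREIGN KEY (domain_id) REFERENCES domains(id)
  )`;
```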
So, why a SQL
based database?
• Of course, we could use a NoSQL database like MongoDB, but we decided on MySQL because it provides outstanding integration with both PHP and Node.js.
• MongoDB's PHP support, by contrast, is comparatively weak, and complex queries are easier to write in SQL.
Backend
• We are using PHP to process the user's search query, build the corresponding SQL queries, and fetch relevant records from the database.
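The real backend is PHP; the following is a language-neutral sketch (written in TypeScript for consistency with the crawler examples) of how a user query might be turned into a parameterized SQL statement. The table and column names are hypothetical, and the key point is that user input is bound as a parameter, never interpolated into the SQL string.

```typescript
// Build a parameterized LIKE query from a raw user search string.
function buildSearchQuery(userQuery: string): { sql: string; params: string[] } {
  // Escape LIKE wildcards (%, _) and backslash so the user's input is
  // matched literally; the driver then binds the parameter safely.
  const escaped = userQuery.replace(/[%_\\]/g, (c) => "\\" + c);
  return {
    sql: "SELECT url, content FROM pages WHERE content LIKE ? ORDER BY page_score DESC",
    params: [`%${escaped}%`],
  };
}
```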
Don’t ask about HTML or
CSS; that isn't the focus.
Relevancy?
You must have used Google to search for something. How does Google decide what you are searching for, what to rank on top as most relevant, and what to show afterwards? Is it random?
Why does this matter?
• Users will most likely visit one of the top 3 sites in the search results.
• Ranking on top in Google for a high-volume keyword can build a million-dollar business and will surely generate a lot of revenue.
• And this cannot be down to luck, and it is not.
Search Algorithms
• Search engines like Google evaluate a page against over 200 ranking factors, such as backlinks, on-page optimization, mobile friendliness, bounce rate, etc. Other search engines do the same.
• This presentation is not about Google or SEO, so we won't go deep into it.
FlexFind uses a similar
approach
• Remember the crawler? In FlexFind, the crawler is responsible not only for indexing page content but also for indexing metadata about the page, along with a score for each page: the FlexFind crawler calculates a score from the on-page metadata available.
• The crawler also increments the backlink count for each target URL it finds.
• Finally, the crawler increments the domain authority for each backlink it finds.
How does FlexFind perform a search?
FlexFind ranks matched pages in the following order. Pages with:
1. the search string in their URL;
2. the search string distributed within a single type of data about the page;
3. the search string in H1 tags;
4. the search string in H2 / H3 / H4 / H5 / H6 HTML tags;
5. the search string anywhere in the page body;
6. all words of the search string present in the page, separately;
7. any word of the search query (except pronouns) present in the page.
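The tier checks above can be sketched as a single function that returns the best (lowest-numbered) tier a page matches. This is a simplified illustration: the page fields are hypothetical, tier 2 is omitted because the slide does not define "distributed within a single type of data" precisely, and the pronoun filter of tier 7 is left out for brevity.

```typescript
// Hypothetical page shape: URL, H1 text, concatenated H2–H6 text, and body.
interface Page { url: string; h1: string; subheadings: string; body: string }

function matchTier(query: string, page: Page): number {
  const q = query.toLowerCase();
  const words = q.split(/\s+/).filter((w) => w.length > 0);
  const body = page.body.toLowerCase();
  if (page.url.toLowerCase().includes(q)) return 1;          // 1. query in URL
  if (page.h1.toLowerCase().includes(q)) return 3;           // 3. query in H1
  if (page.subheadings.toLowerCase().includes(q)) return 4;  // 4. query in H2–H6
  if (body.includes(q)) return 5;                            // 5. query anywhere in body
  if (words.every((w) => body.includes(w))) return 6;        // 6. all words, separately
  if (words.some((w) => body.includes(w))) return 7;         // 7. any word (pronoun filter omitted)
  return Infinity;                                           // no match at all
}
```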
Algorithm of
FlexFind
• Within the ranking levels from the
previous slide, FlexFind
orders results in
descending order of a derived
value, calculated as
• (domain authority + number of
backlinks + (1.5 × page
optimization score)) / 3
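The derived value above translates directly into code. The 1.5 weight on the on-page optimization score is taken from the slide; the function would be used to sort pages that fall into the same ranking tier.

```typescript
// Derived ranking value: (authority + backlinks + 1.5 * pageScore) / 3,
// sorted in descending order.
function rankScore(domainAuthority: number, backlinks: number, pageScore: number): number {
  return (domainAuthority + backlinks + 1.5 * pageScore) / 3;
}
```

For example, a page on a domain with authority 30, 12 backlinks, and an on-page score of 8 gets (30 + 12 + 12) / 3 = 18.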
Scope of Features &
Corrections
• This project could return more relevant results with a more efficient search algorithm.
• A feature to search multimedia files such as images and videos.
• Using OpenGraph tags to record answers and more accurate information in the database, so users can be given answers directly.
• Storing meta keywords from webpages in a separate table to power an autocomplete feature.
• Lastly, providing pagination or infinite scroll instead of loading all results in one go.
That's it for 10 Marks

Technical Club PPT for BTech CS and Btech IT
