Full text and relational search

VIJAY YADAV
072-BCT-547

SEARCH IS HARD
• Google alone handles over 3.5 billion searches per day on average.
• That's roughly one search for every two people in the world (including babies and grandmothers, but excluding zombies).

• That doesn't even include searches on Amazon, LinkedIn and Facebook. We use search for everything.
• Oh, except for company data. For that we still rely on BI analysts, data scientists, specialized tools and SQL.

So what is full-text search?
• It is a document-based search, employed mainly by word processors and search engines.
• It typically performs two tasks: indexing and searching.

• The indexing stage scans the text of all the documents and builds a list of search terms (often called an index). The indexer ignores stop words such as "the" and "and", and records inflected forms such as "drives", "drove" and "driven" as a single term, "drive".
• In the search stage, a query consults only the index rather than the text of the original documents, which is much faster than scanning every document.
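The two stages can be sketched as a small inverted index in Python. This is a minimal illustration under stated assumptions, not how any particular engine works: the stop-word list and the `LEMMAS` table (which maps "drives", "drove" and "driven" to "drive") are hypothetical stand-ins for the much larger stop lists and stemmers or lemmatizers that real engines use, and a multi-word query is treated as an AND of its terms.

```python
from collections import defaultdict

# Hypothetical, tiny stop-word list; real engines ship much larger ones.
STOP_WORDS = {"the", "and", "a", "of", "to"}

# Hypothetical lemma table standing in for a real stemmer/lemmatizer.
LEMMAS = {"drives": "drive", "drove": "drive", "driven": "drive"}


def normalize(word: str) -> str:
    """Lower-case a token, strip punctuation, map it to a base form if known."""
    w = word.lower().strip(".,!?;:\"'")
    return LEMMAS.get(w, w)


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Indexing stage: map each term to the set of document ids containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            term = normalize(token)
            if term and term not in STOP_WORDS:
                index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Search stage: consult only the index and AND the query terms together."""
    terms = [normalize(t) for t in query.split()]
    terms = [t for t in terms if t and t not in STOP_WORDS]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "She drove the car to work.",
    2: "He drives a truck and a van.",
    3: "The car was driven home.",
}
index = build_index(docs)
print(sorted(search(index, "drive car")))  # [1, 3]
```

Note that the original documents are never re-read at query time; only the `index` dictionary is consulted.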

Two ways to improve search performance
I. Improved query tools
II. Improved search algorithms

Improved search algorithms
• The PageRank algorithm developed by Google, which ranks pages by the links pointing to them.
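As an illustration, the core of PageRank can be computed by power iteration: each page repeatedly shares its score among the pages it links to, and a damping factor (commonly 0.85) models a random jump to any page. The sketch below follows that published formulation on a hypothetical four-page link graph; it is not Google's production ranking system.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Power-iteration PageRank over a link graph {page: [pages it links to]}.

    Every link target must itself be a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Each page first gets the "random jump" share (1 - d) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: most links point at C.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['C', 'A', 'B', 'D']
```

Page C ranks highest because three pages link to it, and A comes second because C's entire score flows to A.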

Improved query tools
• Keywords: document creators are asked to list the words, including synonyms, that best describe the text.
• Phrase search: returns only those documents that contain an exact phrase.
• Fuzzy search: also returns documents containing slight variations of the given term, such as misspellings.
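Phrase search and fuzzy search can both be sketched in Python. The version below is a minimal sketch under stated assumptions: a positional index (term → document → word positions) for phrase matching, and Levenshtein edit distance for fuzzy matching. These are common techniques chosen for illustration; edit distance is a different approach from the phonetic algorithms listed next. The documents and the `max_distance` threshold of 1 are hypothetical.

```python
import re
from collections import defaultdict


def positional_index(docs: dict[int, str]) -> dict[str, dict[int, list[int]]]:
    """Map each term to {doc_id: [positions where the term occurs]}."""
    index: dict[str, dict[int, list[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(re.findall(r"\w+", text.lower())):
            index[token][doc_id].append(pos)
    # Convert to plain dicts so later lookups never create empty entries.
    return {term: dict(postings) for term, postings in index.items()}


def phrase_search(index, phrase: str) -> set[int]:
    """Return ids of documents where the phrase's words occur consecutively."""
    words = re.findall(r"\w+", phrase.lower())
    if not words or words[0] not in index:
        return set()
    hits = set()
    for doc_id, starts in index[words[0]].items():
        for start in starts:
            # Word i of the phrase must sit at position start + i in the same doc.
            if all(start + i in index.get(w, {}).get(doc_id, [])
                   for i, w in enumerate(words[1:], start=1)):
                hits.add(doc_id)
                break
    return hits


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def fuzzy_search(index, term: str, max_distance: int = 1) -> set[int]:
    """Return ids of documents containing any term within max_distance edits."""
    hits = set()
    for indexed_term, postings in index.items():
        if edit_distance(term.lower(), indexed_term) <= max_distance:
            hits.update(postings)  # postings keys are document ids
    return hits


docs = {1: "Full text search in New York",
        2: "New search tools for York",
        3: "Fuzzy text matching"}
idx = positional_index(docs)
print(sorted(phrase_search(idx, "new york")))  # [1]
print(sorted(fuzzy_search(idx, "serch")))      # [1, 2]
print(sorted(fuzzy_search(idx, "txt")))        # [1, 3]
```

Document 2 contains both "new" and "york" but not next to each other, so the phrase search skips it; the misspelled "serch" still finds both documents containing "search".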

Some fuzzy search algorithms
Soundex
Metaphone
Double Metaphone

Soundex
In PostgreSQL, the soundex() function (from the fuzzystrmatch extension) returns the same code for both queries below, so even a misspelled word can return the right result.
1. SELECT soundex('elephant');
=> E415
2. SELECT soundex('elephents');
=> E415
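To see why both words map to E415, here is a minimal Python sketch of the classic American Soundex rules: keep the first letter, encode the remaining consonants as digits, drop vowels, collapse adjacent repeated codes, and pad or truncate to four characters. It is written for illustration rather than copied from PostgreSQL, and Soundex implementations can differ in edge cases such as the handling of H and W.

```python
# Digit groups of the classic (American) Soundex code.
CODES = {ch: digit
         for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                ("l", "4"), ("mn", "5"), ("r", "6"))
         for ch in letters}


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of word (e.g. 'E415')."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()        # the first letter is kept as-is
    prev = CODES.get(letters[0], "")   # ...but its code still blocks a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue                   # H and W do not separate equal codes
        code = CODES.get(ch, "")       # vowels (and y) have no code
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code                    # a vowel resets prev, so a repeat after it counts
    return result.ljust(4, "0")        # pad short codes with zeros


print(soundex("elephant"), soundex("elephents"))  # E415 E415
```

Both words reduce to E + l(4) + p(1) + n(5); the misspelled vowel in "elephents" is discarded, which is exactly why the typo does not matter.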

Software performing full-text search

Problems with full-text search
• The results may not be 100% accurate.
• A large number of irrelevant results, because the search ignores the relations among the words.

Why relational search?
• It gives more accurate and relevant results.
• It is very useful for business analytics.
but…

Relational search is even harder because
1. Company data is complicated
• A search on LinkedIn probably means searching for a person or a company.
• A search on Amazon probably means searching for a product.
• But company data spans multiple databases, tables, columns and rows, with complicated relationships between them.

2. It needs to be 100% accurate, or you put your business at risk
What's worse than guessing? Being convinced by bad data.

Relational search makes a huge difference in the enterprise because it takes deterministic input and gives deterministic output.
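To make the contrast concrete, a relational question such as "how much revenue came from customers in the East region?" has to be answered by joining related tables. The sketch below uses Python's built-in sqlite3 module with hypothetical `customers` and `orders` tables; the schema, the data and the `revenue_by_region` helper are invented for illustration. Given the same question and the same data, the query always returns the same exact figure rather than a ranked list of likely matches.

```python
import sqlite3

# Hypothetical company data: two related tables in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         amount REAL);
    INSERT INTO customers VALUES (1, 'Acme', 'East'), (2, 'Globex', 'West'),
                                 (3, 'Initech', 'East');
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0),
                              (4, 3, 25.0);
""")


def revenue_by_region(conn: sqlite3.Connection, region: str):
    """Exact total order amount for customers in `region` (0 if none)."""
    row = conn.execute(
        """SELECT COALESCE(SUM(o.amount), 0)
           FROM orders AS o
           JOIN customers AS c ON c.id = o.customer_id
           WHERE c.region = ?""",
        (region,),  # parameterized to avoid SQL injection
    ).fetchone()
    return row[0]


print(revenue_by_region(conn, "East"))  # 175.0
```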
