SEARCH ENGINE v1.0
Submitted to:
Ms. NIHARIKA GARG
By:
Sachin Sharma 11CSU123
Mohit Choudhary 11CSU091
Rahul Khatri 11CSU117
S Renuka 11CSU122
Faculty Guide: Mrs. Richa Chhabra
Introduction
A Web search engine is a tool for locating
information on a specific subject. When
you search for information in any Web
database, you use keywords (words or
phrases) to identify or describe the
information you are seeking. The search
engine then searches through the indexes
in its database and returns a list of Web
sites that match your search criteria.
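The keyword lookup described above is usually built on an inverted index, which maps each keyword to the pages that contain it. The project itself uses PHP and MySQL. This Python sketch only illustrates the idea. The sample pages, the function names and the rule that a page must contain every query keyword are assumptions.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Inverted index: map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return, in sorted order, the URLs that contain every keyword of the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

pages = {
    "http://example.com/a": "PHP crawler tutorial",
    "http://example.com/b": "MySQL index tutorial",
}
index = build_index(pages)
print(search(index, "tutorial"))        # ['http://example.com/a', 'http://example.com/b']
print(search(index, "MySQL tutorial"))  # ['http://example.com/b']
```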
Why SE1.0?
• To generate targeted results for websites that do not have a search engine of their own
• General-purpose search engines such as Google, Yahoo and Bing do not generate website-specific results and usually return a lot of unnecessary data
Components of a Search Engine
• Crawling
Process by which web pages are discovered and added to an index
• Indexing
Compiling an index of all the words found on:
◦ The web page itself
◦ Key content tags (e.g. title, alt)
• Process Queries & Rank Results
Searching the index for matching pages and returning the most relevant results to the user, in order of relevance
• Serving Results
Displaying the ranked results to the user
Crawler
• The crawler uses the PHP library PHPCrawl to crawl through the website
• Using HTML_DOM, the crawler will ignore boolean operators and other unnecessary data and store only the keywords and URLs
• The crawler will then send the metadata and the URL to MySQL for processing and storage
WAMP Server
• WAMP (Windows, Apache, MySQL, PHP) is a server bundle that integrates the aforementioned technologies
• It lets us store and access the web pages and the database remotely through a web host
• It also allows us to put our web pages online so that they are accessible from a range of systems
Indexing
• Data will be stored in MySQL
• The database structure will be as follows:
Search(IndexNO, description, tags, URL)
Store(URL, Description, tags)
Rank(URL, Close_fit, Rank_no)
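The three tables above can be sketched as DDL. In this Python sketch, the built-in `sqlite3` module stands in for MySQL. The column types, keys and the `create_db`/`store_page` helpers are assumptions for illustration, not the project's actual schema.

```python
import sqlite3

# Column types and keys are assumptions; only the table and column names come from the slide.
SCHEMA = """
CREATE TABLE Search (
    IndexNO     INTEGER PRIMARY KEY AUTOINCREMENT,
    description TEXT,
    tags        TEXT,
    URL         TEXT NOT NULL
);
CREATE TABLE Store (
    URL         TEXT PRIMARY KEY,
    Description TEXT,
    tags        TEXT
);
CREATE TABLE Rank (
    URL       TEXT NOT NULL,
    Close_fit REAL,
    Rank_no   INTEGER
);
"""

def create_db(path=":memory:"):
    """Open a database (in memory by default) and create the three tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def store_page(conn, url, description, tags):
    """Save a crawled page in Store and add a searchable entry to Search."""
    conn.execute(
        "INSERT OR REPLACE INTO Store (URL, Description, tags) VALUES (?, ?, ?)",
        (url, description, tags),
    )
    conn.execute(
        "INSERT INTO Search (description, tags, URL) VALUES (?, ?, ?)",
        (description, tags, url),
    )
    conn.commit()
```

Two points matter when porting this to MySQL. `INSERT OR REPLACE` is SQLite syntax, and MySQL's closest equivalent is `REPLACE INTO`. `RANK` is also a reserved word in MySQL 8.0 and later, so a table named `Rank` would need backquotes there.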
Process Queries and Rank Results
• The query is processed in MySQL, and matching records are retrieved from the Search table
• The matching records are then ranked before being displayed on the web page
• Ranking is based on closest fit: the record that matches the query most closely is ranked first. If two records match equally well, the record that was searched later is ranked ahead of the one searched earlier
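The ranking rule above can be expressed as a sort on two keys. This Python sketch makes two interpretive assumptions. First, "closest fit" is measured as the fraction of query keywords found in a record's description or tags. Second, "searched later" is read as a larger `last_searched` timestamp. The `Record` type and its fields are hypothetical. The output rows mirror the slide's Rank(URL, Close_fit, Rank_no) table.

```python
import re
from dataclasses import dataclass

@dataclass
class Record:
    url: str
    description: str
    tags: str
    last_searched: float  # hypothetical: time the record was last returned by a search

def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def close_fit(query, record):
    """Fraction of query keywords that appear in the record's description or tags."""
    terms = words(query)
    if not terms:
        return 0.0
    return len(terms & words(f"{record.description} {record.tags}")) / len(terms)

def rank(query, records):
    """Return (Rank_no, URL, Close_fit) rows: best fit first, later-searched first on ties."""
    scored = [(close_fit(query, r), r.last_searched, r.url) for r in records]
    scored = [s for s in scored if s[0] > 0]  # drop records that match no keyword
    scored.sort(key=lambda s: (-s[0], -s[1]))
    return [(i, url, fit) for i, (fit, _, url) in enumerate(scored, start=1)]
```

For the query "php mysql", a record containing both keywords gets a Close_fit of 1.0 and is ranked first. Two records that each contain one keyword tie at 0.5, and the one with the later `last_searched` value is ranked ahead of the other.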
Serving Results
• The results will be formatted and displayed using HTML5 and CSS3
• The results will be listed in rank order, best match first, and each will have an onClick handler that takes the user to the corresponding URL
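The serving step can be illustrated by generating the result list on the server. This Python sketch takes (Rank_no, URL, Close_fit) rows, orders them by Rank_no and emits an HTML5 ordered list with an onclick handler on each item. Keeping the URL in a `data-url` attribute is an assumption, not something the slides specify. It avoids placing the URL inside a JavaScript string, so `html.escape` alone makes it safe in the markup. The `results` and `result` class names are hypothetical hooks for CSS3 styling.

```python
from html import escape

def render_results(ranked):
    """Render (Rank_no, URL, Close_fit) rows as an HTML5 list, lowest Rank_no first."""
    items = []
    for _rank_no, url, _fit in sorted(ranked):
        safe_url = escape(url, quote=True)  # escapes &, <, > and quotes for attribute use
        items.append(
            f'  <li class="result" data-url="{safe_url}" '
            f'onclick="window.location.href = this.dataset.url">{safe_url}</li>'
        )
    return '<ol class="results">\n' + "\n".join(items) + "\n</ol>"
```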
Technologies, IDEs and Languages Used
• WAMP server, which includes:
◦ Apache
◦ MySQL
◦ PHP 5.3
• HTML5, CSS3 and XML
• Browsers: IE 9 and above, Mozilla Firefox, Google Chrome
• Notepad++, Rational Rose (for the SRS), Sublime Text 2
Bibliography
• http://phpcrawl.cuab.de/example.html
• www.phpmyadmin.net/
• www.wikipedia.org