Your SlideShare is downloading. ×
A customized web search engine [autosaved]
Upcoming SlideShare
Loading in...5

Thanks for flagging this SlideShare!

Oops! An error has occurred.


Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

A customized web search engine [autosaved]


Published on

A customized web search engine is graduation project .This presentation displays what search engine is and open source software which used in this project

A customized web search engine is graduation project .This presentation displays what search engine is and open source software which used in this project

Published in: Technology

  • Be the first to comment

  • Be the first to like this

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

No notes for slide


  • 1. Supervised By Dr. Mohamed A. El-Rashidy Eng. Ahmed Ghozia Dept. of Computer Science & Engineering Faculty of Electronic Engineering, Menoufiya University.
  • 2.  The main purpose of this project is to build our own search engine that should suffice for our needs as a nation  In this project has been tried to add customized features to search engine such as building and developing a time-based search engine that is meant to deal with local and international news
  • 3.  Question : What is a Search Engine?  How web search engine work?  Web crawler , Indexing , Ranking  Lucene , Nutch , Solr  Who uses solr?  Setup Nutch for web crawling  Setup Solr for search  Running Nutch in Eclipse for developing  Experiments
  • 4.  Answer: A software that  builds an index on text  answers queries using that index  A search engine offers Scalability Relevance Ranking Integrates different data sources (email, web pages, files, database,...)‫‏‬
  • 5.  A search engine operates, in the following order 1. Web crawling 2. Indexing 3. Ranking
  • 6.  a program or automated script which browses the World Wide Web  used to create a copy of all the visited pages for later processing by a search engine  it starts with a list of URLs to visit, called the seeds  URLs recursively visited according to a set of policies  A selection policy  A re-visit policy  A politeness policy  A parallelization policy
  • 7.  Indexing process entails how data is collected, parsed, and stored to facilitate fast and accurate search query evaluation.  The process involves the following steps  Data collection  Data traversal  Indexing
  • 8.  Indexing process:  Convert document  Extract text and meta data  Normalize text(stop word,stim)  Write (inverted) index  Example:  Document 1: “Apache Lucene at Jazoon“  Document 2: “Jazoon conference“  Index:  apache -> 1  conference -> 2  Jazoon -> 1, 2  lucene -> 1
  • 9.  The web search engine responds to a query that a user enters into a web search engine to satisfy his or her information needs
  • 10.  a high-performance, scalable information retrieval (IR) library  lets you add searching capabilities to your applications.  free, open source project implemented in Java  With Lucene, you can index and search email messages, mailing-list archives, instant messenger chats, your wiki pages…the list goes on.
  • 11.  Web Search Engine Software  Open source web crawler  Coded entirely in the Java programming language  Advantages  Scalability  Crawler Politeness  Crawler Management  Quality
  • 12.  Open source enterprise search platform based on Apache Lucene project.  Powerful full-text search, hit highlighting, faceted search  Database integration, and rich document (e.g., Word, PDF) handling
  • 13.  Download a binary package (  cd apache-nutch-1.X/  bin/nutch crawl urls -dir crawl -depth 3 -topN 5  Now you should be able to see the following directories created:  crawl/crawldb  crawl/linkdb  crawl/segments
  • 14.  If you have a Solr core already set up and wish to index to it we should use bin/nutch crawl urls -solr http://localhost:8983/solr/ - depth 3 -topN 5 Now skip to here for how to set up your Solr instance and index your crawl data.
  • 15.  Download binary file (  cd ${APACHE_SOLR_HOME}/example  java -jar start.jar  After you started Solr admin console, you should be able to access the following link: http://localhost:8983/solr/admin/  Integrate Solr with Nutch cp ${NUTCH_RUNTIME_HOME}/conf/schema.xml ${APACHE_SOLR_HOME}/example/solr/conf/
  • 16.  restart Solr with the command “java -jar start.jar” under ${APACHE_SOLR_HOME}/example  run the Solr Index command: bin/nutch solrindex crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
  • 17.  Crawling the Egyptian Universities
  • 18.  Crawling the Arabic news websites
  • 19.  Crawling the Arabic news websites
  • 20. Mustafa Mohammed Ahmed Elkhiat