Must be similar to screenshots
I must be able to run the projects on Eclipse so that I can upload
the codes to my Github account
The projects must say that they were created by
Juliet Mercado
Zachary Willis
Ihor Panchenko
Craig Anderson
Building a Search Engine, Part I: Governance, Workflow, and
UI
(This is the first project in this series)
You are going to design, build, and test a scaled-down version
of “Google Search”. Rather than searching the Internet's files,
you will only search local files added to your search engine's
index. Your search engine will allow an administrator to add,
update, and remove files from the index. Users will be able to
enter search terms, and select between Boolean AND, OR, or
PHRASE search. The matching file names (if any) are then
displayed in a list.
You also need to design the system architecture (the high-level
design), so you can plan each part.
Search Engine Project Proposal:
Build a search engine with a simple GUI that can do AND, OR,
and PHRASE searches on a small set of text files. The user
should be able to choose the type of search and enter some
search terms. The results should be a list of file pathnames
that match the search. This should be a stand-alone
application.
User Interfaces
In addition to the main user interface (for doing searching), you
will need a separate administrator or maintenance interface to
manage your application. It should be easy to add and remove
files (from the set of indexed files), and to regenerate the index
anytime. When starting, your application should check if any of
the files have been changed or deleted since the application last
saved the index. If so, the administrator should be able to have
the index updated with the modified file(s).
Note that with HTML, Word, or other types of documents, you
would need to extract a plain text version before indexing. That
isn't hard, but the search engine is complex enough already. For
these projects, limit your search engine to only plain text files
(including .txt, .html, and other text files).
The index must be stored on disk, so next time your application
starts it can reload its data. The index, list of files, and other
data, can be stored in one or more file(s) or in a database. The
saved data should be read whenever your application starts. The
saved data should be updated (or recreated) when you add,
update, or remove documents from your set (of indexed
documents), or perhaps just when your application exits. If you
use files, the file formats are up to you; have a format that is
fast and simple to load and store.
To keep things as simple as possible, in this project you can
assume that only a small set of documents will be indexed, and
thus the whole index can be kept in memory at once. (That's
probably not the case for Google's data!) All you need to do is
be able to read the index data from disk at startup into memory,
and write it back either when updating the index, or when your
application shuts down. Note, the names (pathnames) of the
added files as well as their last modification time must be
stored in addition to the index.
If you use an XML file, you can define an XML schema for it and
have some tool such as Notepad++ validate your file format for
you. XML may have other benefits, but it isn't as simple as
using plain text files. JSON might be the easiest format for
storing and reading the index data. In any case, don't forget to
include the list of file pathnames and other data you decide is
needed, along with the index itself.
Requirements:
In this project, we will follow the model-view-controller design
pattern for the project organization. This allows one to develop
each part mostly independently from the other parts.
Develop Stub User Interfaces:
In this part of the project, you must implement a non-functional
(that means looks good but doesn't do a thing) graphic user
interface for the application. (The “view”.) The main (default)
user interface must support searching and displaying results. It
should have various other features, such as an “About...” menu
or button, a way to quit the application (if a stand-alone
application; if your group creates a web application, there is no
need to quit), and a way to get to the administrator/maintenance
view.
The maintenance/administrator view must allow the user to
perform various administration operations: view the list of
indexed file names, add files to the index, remove files from
the index, and update the index (when files have been modified
since they were indexed).
The user interface should be complete, but none of the
functionality needs to be implemented at this time. You should
implement stub methods for the functionality not yet
implemented, and invoke them from your event handlers. The
stub methods can either return “canned” (fake but realistic)
data, or throw an OperationNotSupported exception. The only
button that needs to do anything is the one used to switch to the
maintenance view.
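A minimal sketch of such stubs follows. The class and method names (SearchStubs, search, addFile) and the canned pathnames are hypothetical, and the sketch assumes the standard unchecked java.lang.UnsupportedOperationException for the "not supported" case.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stubs that the Part I event handlers can call. */
final class SearchStubs {
    /** Returns canned (fake but realistic) results and ignores its arguments. */
    static List<String> search(String terms, String mode) {
        return Arrays.asList("/home/user/docs/notes.txt", "/home/user/docs/todo.txt");
    }

    /** Not implemented in Part I; a handler can catch this and show a message. */
    static void addFile(String pathname) {
        throw new UnsupportedOperationException("addFile is not implemented yet");
    }
}
```

An event handler would call SearchStubs.search(...) and display the returned list, so the view can be exercised before any real model code exists.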
Since the user interfaces don't do anything, there is nothing to
test yet. However, you must create a test class with at least one
test method (it can just return success if you wish). I suggest
you agree to use JUnit 4 style tests for now.
Building a Search Engine, Part II: Persistent Data
Please read the background information and full project
description from Search Engine Project, Part I. In this project,
you will implement the persistent data (the “model”) part of the
project: the saving of data and the loading of data at the next
start. The persistent data contains the list of files used in the
index, and the index itself.
First discuss which persistence solution you will use: text files,
XML or JSON files, or a database. If you use a database, choose
between embedded (my suggestion) or server, and choose between
the JDBC and JPA database APIs (I suggest JPA). You can make
this decision before knowing the details of the data structures
used.
Before working on actual code, you need to decide on the data
structures to be used for the file list and the inverted index. Try
to read the Java collections material before deciding.
It should be easy to add and remove files (from the set of
indexed files). When starting, your application should check if
any of the files used have been changed or deleted since the
application last saved the index. If so, the “admin” user should
be able to have the inverted index file(s) updated, from the
maintenance interface.
(Note that with HTML or Word documents, you would need to
extract a plain text version before indexing.) In this project, all
the “indexable” files are plain text. You are free to assume the
system-default text file encoding, or assume UTF-8 encoding,
for all files.
The inverted index can be stored in one or more file(s), and that
should be read whenever your application starts. The file(s)
should be updated (or recreated) when you add, update, or
remove documents from your set (of indexed documents). The
file format is up to you, but should have a format that is fast
and simple to search. However, to keep things simpler, in this
project you can assume that only a small set of documents will
be indexed, and thus the whole index can be kept in memory.
All you need to do is be able to read the index data from a file
at startup into memory, and write it back when updating the
index. Don't forget that the names (pathnames) of the files, and
their last modification times, must be stored as well. It is your
choice to use a single file or multiple files, in plain text, JSON,
XML, or any format your group chooses, to hold the persistent
data. If you want, you can use any DBMS. (In that case, I
suggest an embedded database such as JavaDB, also known as
Apache Derby; note that it was bundled only with JDK 8 and
earlier.) In any case, your file format(s) or database schema
must be documented completely, so that someone else, without
access to your source code, could use your file(s) or database
correctly.
If using XML format, you can define an XML schema for your
file and have some tool such as Notepad++ validate your file
format for you. XML may have other benefits, but it isn't as
simple as plain text files or even JSON files. In any case, don't
forget to include the list of file (path) names, along with the
index itself, in your persistent data store.
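As one illustration of a plain text format, the sketch below writes one line per indexed file ("F", id, last-modified time, pathname) followed by one line per word ("W", word, then fileId:position entries), all tab-separated. The format, the class name IndexStore, and its fields are assumptions made for this example, not the model solution's format; it also assumes pathnames contain no tabs or newlines and that "F" lines appear in id order.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory data plus save/load in a tab-separated text file. */
final class IndexStore {
    final List<String> paths = new ArrayList<>();  // list position = file id
    final List<Long> mtimes = new ArrayList<>();   // last-modified millis, same order
    final Map<String, List<int[]>> index = new TreeMap<>(); // word -> {fileId, position}

    void save(Path target) throws IOException {
        List<String> lines = new ArrayList<>();
        for (int id = 0; id < paths.size(); id++) {
            lines.add("F\t" + id + "\t" + mtimes.get(id) + "\t" + paths.get(id));
        }
        for (Map.Entry<String, List<int[]>> e : index.entrySet()) {
            StringBuilder sb = new StringBuilder("W\t").append(e.getKey());
            for (int[] p : e.getValue()) {
                sb.append('\t').append(p[0]).append(':').append(p[1]);
            }
            lines.add(sb.toString());
        }
        Files.write(target, lines, StandardCharsets.UTF_8);
    }

    static IndexStore load(Path source) throws IOException {
        IndexStore s = new IndexStore();
        for (String line : Files.readAllLines(source, StandardCharsets.UTF_8)) {
            String[] f = line.split("\t");
            if (f[0].equals("F")) {
                // f[1] is the id; it is implied by the order of the F lines.
                s.mtimes.add(Long.parseLong(f[2]));
                s.paths.add(f[3]);
            } else if (f[0].equals("W")) {
                List<int[]> postings = new ArrayList<>();
                for (int i = 2; i < f.length; i++) {
                    String[] p = f[i].split(":");
                    postings.add(new int[] {Integer.parseInt(p[0]), Integer.parseInt(p[1])});
                }
                s.index.put(f[1], postings);
            }
        }
        return s;
    }
}
```

Whatever format you pick, a short written description like the one above (line types, field order, separators) is the kind of documentation someone would need to read the file without your source code.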
Part II Requirements:
In this part, you must implement the file operations of your
search engine application (the model). That includes reading
and updating your persistent data (that is, the inverted index as
well as any other information you need to store between runs of
your application, such as the list of files (their pathnames) that
have been indexed). The main file operations are reading each
file to be indexed a “word” at a time; you also need to check
if the previously indexed files still exist or have been modified
since last indexed.
The maintenance part of the user interface should allow users to
select files for indexing, and to keep track of which files have
been added to the index. For each file, you need to keep the full
pathname of the file as well as the file's last modification time.
Your code should correctly handle the user entering nonexistent
files and unreadable files. How you handle such errors is up to
you.
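One possible way to classify a previously indexed file at startup, or a file the user has just entered, is sketched below using java.nio.file.Files. The class name FileCheck and the Status values are hypothetical; the sketch assumes the saved timestamp is a last-modified time in milliseconds.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: compares a file on disk with its saved timestamp. */
final class FileCheck {
    enum Status { OK, MODIFIED, MISSING, UNREADABLE }

    /** savedMillis is the last-modified time recorded when the file was indexed. */
    static Status check(String pathname, long savedMillis) {
        Path p = Paths.get(pathname);
        if (!Files.isRegularFile(p)) {
            return Status.MISSING;      // deleted, or not a regular file
        }
        if (!Files.isReadable(p)) {
            return Status.UNREADABLE;
        }
        try {
            long now = Files.getLastModifiedTime(p).toMillis();
            return now == savedMillis ? Status.OK : Status.MODIFIED;
        } catch (IOException e) {
            return Status.UNREADABLE;   // treat an I/O failure as unreadable
        }
    }
}
```

The maintenance view could show the returned status next to each pathname and enable an "update" action only for files reported as MODIFIED or MISSING.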
You can download a Search Engine model solution, to play with
it and inspect its user interface. My solution keeps all persistent
data in a single text file in the user's home directory, but you
can certainly use a different persistence solution.
Possible Data Structures:
In Part III, you will implement the index operations, including
Boolean searching, adding to the index, and removing files from
the index. (The index is a complex collection of collections.)
Because the format of the index and file list will affect the
code used to read and write them to and from storage, you must
decide on the in-memory data structures early. In the model
solution, I used a List of FileItem objects for the list of
indexed files; each FileItem contained a file's pathname and the
date it was read for the index. The index data itself is stored
in a Map, using the indexed words as keys and a Set of IndexData
objects as the values. Each IndexData object holds the id of the
file containing the word and the position of the word in that
document. (The classes FileItem and IndexData were trivial to
write.)
This is NOT the only, or the best, way to represent the index or
file list! (For example, a List of int[2] arrays might be simpler
than a Set of IndexData objects.) You should decide on the
types of collections used. Only then can you implement the
methods to read and write the data.
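A minimal sketch of the List-plus-Map layout described above might look like the following. The field names, the use of list position as the file id, and the wrapper class SearchModel are assumptions made for illustration; only the class names FileItem and IndexData come from the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** One indexed file: its pathname and the time recorded for it. */
final class FileItem {
    final String pathname;
    final long timeMillis;

    FileItem(String pathname, long timeMillis) {
        this.pathname = pathname;
        this.timeMillis = timeMillis;
    }
}

/** One occurrence of a word: the file id and the word's position in that file. */
final class IndexData {
    final int fileId;
    final int position;

    IndexData(int fileId, int position) {
        this.fileId = fileId;
        this.position = position;
    }

    // equals and hashCode let a HashSet treat equal occurrences as duplicates.
    @Override public boolean equals(Object o) {
        return o instanceof IndexData
                && ((IndexData) o).fileId == fileId
                && ((IndexData) o).position == position;
    }

    @Override public int hashCode() {
        return 31 * fileId + position;
    }
}

/** Hypothetical holder for the file list and the inverted index. */
final class SearchModel {
    final List<FileItem> files = new ArrayList<>();             // position = file id
    final Map<String, Set<IndexData>> index = new HashMap<>();  // word -> occurrences

    void addOccurrence(String word, int fileId, int position) {
        index.computeIfAbsent(word, k -> new HashSet<>())
             .add(new IndexData(fileId, position));
    }
}
```

Using the list position as the file id keeps the sketch short, but it means removing a file shifts later ids; storing an explicit id in FileItem is one way to avoid that.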
Building a Search Engine, Part III:
Collections
Please read the background information and full project
description from Search Engine Project, Part I.
In this final part of the project, you will complete the
application by implementing the index functions. These include
adding a file to the index, removing a file from the index, and
reading and writing the index from/to a file. (Updating the
index when a file has been changed can then be done by removing
and then re-adding the file.) Other operations include searching
the index for a given word, and returning a Set of pairs
(document ID and position) for that word.
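The index operations listed above could be sketched as follows, assuming the index is a Map from words to Sets of (file id, position) pairs and that each file has already been split into a list of normalized words. The names InvertedIndex, Pair, addFile, removeFile, updateFile, and lookup are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** A (file id, position) pair with value equality. */
final class Pair {
    final int fileId;
    final int position;

    Pair(int fileId, int position) {
        this.fileId = fileId;
        this.position = position;
    }

    @Override public boolean equals(Object o) {
        return o instanceof Pair
                && ((Pair) o).fileId == fileId
                && ((Pair) o).position == position;
    }

    @Override public int hashCode() {
        return 31 * fileId + position;
    }
}

/** Hypothetical inverted index with add, remove, update, and lookup. */
final class InvertedIndex {
    private final Map<String, Set<Pair>> map = new HashMap<>();

    /** Records every word of the file together with its position. */
    void addFile(int fileId, List<String> words) {
        for (int pos = 0; pos < words.size(); pos++) {
            map.computeIfAbsent(words.get(pos), k -> new HashSet<>())
               .add(new Pair(fileId, pos));
        }
    }

    /** Drops all pairs for the file, then drops words left with no pairs. */
    void removeFile(int fileId) {
        for (Set<Pair> pairs : map.values()) {
            pairs.removeIf(p -> p.fileId == fileId);
        }
        map.values().removeIf(Set::isEmpty);
    }

    /** Update is removal followed by re-adding. */
    void updateFile(int fileId, List<String> words) {
        removeFile(fileId);
        addFile(fileId, words);
    }

    /** Returns the pairs for a word, or an empty set if the word is unknown. */
    Set<Pair> lookup(String word) {
        return map.getOrDefault(word, Collections.emptySet());
    }
}
```

Note that removeFile visits every word in the index; for the small document sets assumed in this project that cost should not matter.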
Finally, you will have to implement the Boolean search
functions of the main user interface. (This is complex enough,
that it should have been another project!) I suggest you start
with an “OR” search, then worry about implementing the
“AND” and “PHRASE” search functions.
When building the index, keep in mind you will need to define
what you mean by “word”. One possibility is to strip out any
characters that are not letters or digits, and convert the
result to all lowercase, both when you build the inverted index
and when you read the search terms entered by the user. Ideally,
you can use the I18N methods discussed in class to normalize the
words.
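One possible definition of "word" is sketched below: split on whitespace, drop every character that is not a Unicode letter or decimal digit, and lowercase the rest. The class name Words and the method tokenize are hypothetical, and the sketch does not attempt any further I18N normalization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical tokenizer used for both indexed files and search terms. */
final class Words {
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            // Keep only letters (\p{L}) and decimal digits (\p{Nd}).
            String w = raw.replaceAll("[^\\p{L}\\p{Nd}]", "")
                          .toLowerCase(Locale.ROOT);
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }
}
```

With this definition, "Big-Top, NOW!" becomes the two words "bigtop" and "now", and an input such as " ,." yields no words at all. Running the same method over file contents and over user input keeps the two consistent.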
Implementing Boolean Search:
The exact method depends in part on how you implement the
inverted index. In the suggested implementation (a Map with
words as the keys, and a List or Set of (document ID, position)
pairs as the values), you could implement the Boolean searches
using algorithms similar to the following (you can come up with
your own if you wish):
OR Search
This is the easiest one to implement. The general idea is to start
with an empty Set of matching files. Then add to that Set the
files containing each search term: just search the Map for that
word, and add each document found (if any). The result is the
OR search results, the files that contain any word in the search
list. (If the user inputs no search words, say “ ,.”, then no
files are considered as matching.)
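The OR search described above might be sketched like this, assuming the index is a Map from words to Sets of (file id, position) pairs and the search terms are already normalized. The names OrSearch, Pair, and search are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** A (file id, position) pair with value equality. */
final class Pair {
    final int fileId;
    final int position;

    Pair(int fileId, int position) {
        this.fileId = fileId;
        this.position = position;
    }

    @Override public boolean equals(Object o) {
        return o instanceof Pair
                && ((Pair) o).fileId == fileId
                && ((Pair) o).position == position;
    }

    @Override public int hashCode() {
        return 31 * fileId + position;
    }
}

final class OrSearch {
    /** Returns the ids of files that contain at least one of the terms. */
    static Set<Integer> search(Map<String, Set<Pair>> index, List<String> terms) {
        Set<Integer> matches = new TreeSet<>();   // start with no matching files
        for (String term : terms) {
            for (Pair p : index.getOrDefault(term, Collections.emptySet())) {
                matches.add(p.fileId);            // position is irrelevant for OR
            }
        }
        return matches;
    }
}
```

An empty list of terms leaves the result set empty, which matches the behavior described above.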
AND Search
This is done the opposite way from an OR search, and is only a
little harder to implement. The idea is to start with a set of all
files in the index. Then for each search term, for each file in the
Set, make sure that file is contained in the index for that search
term. Remove any files from the set that don't contain that
word. The resulting final set is the documents matching all
search terms. (If the user inputs no search words, say “ ,.”,
then all files are considered as matching. If that isn't the
behavior you want, you need to treat that as a special case.)
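A sketch of the AND search along these lines follows, assuming the index is a Map from words to Sets of (file id, position) pairs and that the caller supplies the ids of all indexed files. The names AndSearch, Pair, and search are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** A (file id, position) pair with value equality. */
final class Pair {
    final int fileId;
    final int position;

    Pair(int fileId, int position) {
        this.fileId = fileId;
        this.position = position;
    }

    @Override public boolean equals(Object o) {
        return o instanceof Pair
                && ((Pair) o).fileId == fileId
                && ((Pair) o).position == position;
    }

    @Override public int hashCode() {
        return 31 * fileId + position;
    }
}

final class AndSearch {
    /** Returns the ids of files that contain every one of the terms. */
    static Set<Integer> search(Map<String, Set<Pair>> index,
                               Set<Integer> allFileIds,
                               List<String> terms) {
        Set<Integer> matches = new TreeSet<>(allFileIds);   // start with every file
        for (String term : terms) {
            Set<Integer> filesWithTerm = new HashSet<>();
            for (Pair p : index.getOrDefault(term, Collections.emptySet())) {
                filesWithTerm.add(p.fileId);
            }
            matches.retainAll(filesWithTerm);   // drop files lacking this term
        }
        return matches;
    }
}
```

As noted above, an empty list of terms returns every file; add an explicit check at the top of the method if you prefer an empty result in that case.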
PHRASE Search
This is the hardest search to implement. Unlike the OR and the
AND searches, with PHRASE searching, the position of the
search terms in the files matters. The algorithm I came up with
is:
1. Create an initially empty Set of Pair objects.
2. Add to the set the Pair objects for the files that contain the
   first word of the phrase. This is the easy part: just look up
   that word in the Map, and add all Pair objects found to the set.
3. The Set now contains Pair objects for just the files that might
   contain the phrase. Next, loop over the remaining words of the
   phrase, removing any Pairs from the set that are no longer
   possible phrase continuations. (Actually, I just build a new
   Set.) For each remaining word in the phrase:
   a. Create a new, empty set of Pairs.
   b. For each Pair in the previous set, see if the word appears in
      the same file, but in the next position. If so, add the Pair
      object for the word to the new set.
An example may help clarify this. Suppose the search phrase is
“big top now”. The set initially contains all the Pair objects for
the word “big”. Let's say for example, that set looks like:
(file1,position7), (file1,position22), (file3,position4)
For each Pair object in that set, you need to see if “top” is in
that same file, but the next position. If so, you add the Pair
object for that to the new Set. The (inner) loop for this example
checks each of the following:
Is a (file1,position8) Pair object in the Map for the word "top"?
Is a (file1,position23) Pair object in the Map for the word
"top"?
Is a (file3,position5) Pair object in the Map for the word "top"?
If the answer is “yes”, then add that Pair object to the new set.
When this loop ends, the new set will contain the Pair objects
for the phrase “big top” (pointing to the position of the word
“top”).
For example, suppose “top” is only found in (file1,position8)
and (file3,position5). You replace the first set with this new set:
(file1,position8), (file3,position5)
Repeat for the next word in the phrase, using the set built in the
previous loop.
Continue until the set is empty (so phrase not found), or until
the last word of the phrase has been processed. The Pair objects
remaining in the final set are the ones that contain the phrase;
the position will be that of the last word of the phrase. (We only
need to display the file name; in this project, the position of the
phrase doesn't matter.)
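The PHRASE algorithm above could be sketched as follows, assuming the index is a Map from words to Sets of (file id, position) pairs and the phrase has already been split into normalized words. The names PhraseSearch, Pair, and search are hypothetical; Pair needs equals and hashCode so that Set.contains works on a newly built candidate.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** A (file id, position) pair with value equality. */
final class Pair {
    final int fileId;
    final int position;

    Pair(int fileId, int position) {
        this.fileId = fileId;
        this.position = position;
    }

    @Override public boolean equals(Object o) {
        return o instanceof Pair
                && ((Pair) o).fileId == fileId
                && ((Pair) o).position == position;
    }

    @Override public int hashCode() {
        return 31 * fileId + position;
    }
}

final class PhraseSearch {
    /** Returns the ids of files that contain the words consecutively, in order. */
    static Set<Integer> search(Map<String, Set<Pair>> index, List<String> phrase) {
        Set<Integer> files = new TreeSet<>();
        if (phrase.isEmpty()) {
            return files;
        }
        // Start with every occurrence of the first word.
        Set<Pair> current =
                new HashSet<>(index.getOrDefault(phrase.get(0), Collections.emptySet()));
        for (int i = 1; i < phrase.size() && !current.isEmpty(); i++) {
            Set<Pair> occurrences = index.getOrDefault(phrase.get(i), Collections.emptySet());
            Set<Pair> next = new HashSet<>();
            for (Pair p : current) {
                // The next word must be in the same file at the next position.
                Pair candidate = new Pair(p.fileId, p.position + 1);
                if (occurrences.contains(candidate)) {
                    next.add(candidate);
                }
            }
            current = next;   // pairs now point at the latest matched word
        }
        for (Pair p : current) {
            files.add(p.fileId);
        }
        return files;
    }
}
```

With the example data above ("big" at (file1,7), (file1,22), (file3,4) and "top" at (file1,8), (file3,5)), searching for "big top" returns files 1 and 3. A single-word phrase reduces to a lookup of that word.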
Part III Requirements:
This project has been split into three parts. Each part counts as
a separate project. In the first two parts, you designed and
implemented a graphic user interface for the application, and
added all required file operations.
In this part, you must implement the remaining operations of
your search engine application: the index operations, and the
searching.
You can download a Search Engine model solution, to play with
it and inspect its user interface, but please keep in mind you
should not copy that user interface; instead, invent a better,
nicer-looking one.
Hints:
Keep your code as simple as possible.
The inverted index is naturally a Map, from words (the keys) to
a Set of objects (the values). Each of the objects represents a
document and a location within that document, where the word
was found. I called these objects Pairs, since they are a pair of
numbers, but you can use any name for your classes. Note, you
will need to be able to go from a document number to a file
name, when you display the search results.

Today, many infants and toddlers are learning two languages. Wha.docx
 

Recently uploaded

Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 

Recently uploaded (20)

Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 

Build search engine with GUI for AND, OR, PHRASE searches

  • 1. Must be similar to screenshots I must be able to run the projects on Eclipse so that I can upload the codes to my Github account The projects must say that they were created by Juliet Mercado Zachary Willis Ihor Panchenko Craig Anderson Building a Search Engine, Part I: Governance, Workflow, and UI (This is the first project in this series) You are going to design, build, and test a scaled-down version of “Google Search”. Rather than searching the Internet's files, you will only search local files added to your search engine's index. Your search engine will allow an administrator to add, update, and remove files from the index. Users will be able to enter search terms, and select between Boolean AND, OR, or PHRASE search. The matching file names (if any) are then displayed in a list. You also need to design the system architecture (the high-level design), so you can plan each part. Search Engine Project Proposal:
  • 2. Build a search engine with simple GUI, that can do AND, OR, and PHRASE Boolean searches on a small set of text files. The user should be able to say the type of search to do, and enter some search terms. The results should be a list of file pathnames that match the search. This should be a stand-alone application User Interfaces In addition to the main user interface (for doing searching), you will need a separate administrator or maintenance interface to manage your application. It should be easy to add and remove files (from the set of indexed files), and to regenerate the index anytime. When starting, your application should check if any of the files have been changed or deleted since the application last saved the index. If so, the administrator should be able to have the index updated with the modified file(s). Note that with HTML, Word, or other types of documents, you would need to extract a plain text version before indexing. That isn't hard, but the search engine is complex enough already. For these projects, limit your search engine to only plain text files (including .txt, .html, and other text files). The index must be stored on disk, so next time your application starts it can reload its data. The index, list of files, and other data, can be stored in one or more file(s) or in a database. The saved data should be read whenever your application starts. The saved data should be updated (or recreated) when you add, update, or remove documents from your set (of indexed documents), or perhaps just when your application exits. If you use files, the file formats are up to you; have a format that is fast and simple to load and store. To keep things as simple as possible, in this project you can
  • 3. assume that only a small set of documents will be indexed, and thus the whole index can be kept in memory at once. (That's probably not the case for Google's data!) All you need to do is be able to read the index data from disk at startup into memory, and write it back either when updating the index, or when your application shuts down. Note, the names (pathnames) of the added files as well as their last modification time must be stored in addition to the index. If using XML file, you can define an XML schema for it and have some tool such as Notepad++ validate your file format for you. XML may have other benefits, but it isn't as simple as using plain text files. JSON might be the easiest format for storing and reading the index data. In any case, don't forget to include the list of file pathnames and other data you decide is needed, along with the index itself. Requirements: In this project, we will follow the model-view-controller design pattern for the project organization. This allows one to develop each part mostly independently from the other parts. Develop Stub User Interfaces: In this part of the project, you must implement a non-functional (that means looks good but doesn't do a thing) graphic user interface for the application. (The “view”.) The main (default) user interface must support searching and displaying results. It should have various other features, such as an “About...” menu or button, a way to quit the application (if a stand-alone application; if your group creates a web application, there is no need to quit), and a way to get to the administrator/maintenance view. The maintenance/administrator view must allow the user to
  • 4. perform various administration operations: view the list of indexed file names, add files to the index, remove files from the index, and update the index (when files have been modified since they were indexed). The user interface should be complete, but none of the functionality needs to be implemented at this time. You should implement stub methods for the functionality not yet implemented, and invoke them from your event handlers. The stub methods can either return “canned” (fake but realistic) data, or throw an exception such as Java's UnsupportedOperationException. The only button that needs to do anything is the one used to switch to the maintenance view. Since the user interfaces don't do anything, there is nothing to test yet. However, you must create a test class with at least one test method (it can just return success if you wish). I suggest you agree to use JUnit 4 style tests for now. Building a Search Engine, Part II: Persistent Data Please read the background information and full project description from Search Engine Project, Part I. In this project, you will implement the persistent data (the “model”) part of the project: the saving of data and the loading of data at the next start. The persistent data contains the list of files used in the index, and the index itself. First discuss which persistence solution you will use: text files, XML or JSON files, or a database. If using a database, choose between embedded (my suggestion) or server, and between the JDBC and JPA database APIs (I suggest JPA). You can make this decision before knowing the details of the data structures used. Before working on actual code, you need to decide on the data
  • 5. structures to be used for the file list and the inverted index. Try to read the Java collections material before deciding. It should be easy to add and remove files (from the set of indexed files). When starting, your application should check if any of the files used have been changed or deleted since the application last saved the index. If so, the “admin” user should be able to have the inverted index file(s) updated, from the maintenance interface. (Note that with HTML or Word documents, you would need to extract a plain text version before indexing.) In this project, all the “indexible” files are plain text. You are free to assume the system-default text file encoding, or assume UTF-8 encoding, for all files. The inverted index can be stored in one or more file(s), and that should be read whenever your application starts. The file(s) should be updated (or recreated) when you add, update, or remove documents from your set (of indexed documents). The file format is up to you, but should have a format that is fast and simple to search. However, to keep things simpler, in this project you can assume that only a small set of documents will be indexed, and thus the whole index can be kept in memory. All you need to do is be able to read the index data from a file at startup into memory, and write it back when updating the index. Don't forget the names (pathnames) of the files as well as their last modification time must be stored as well. It is your choice to use a single file or multiple files, in plain text, JSON, XML, or any format your group chooses, to hold the persistent data. If you want, you can use any DBMS. (In that case, I suggest using the JavaDB included with the JDK, as an embedded database.) In any case, your file format(s) or database schema must be documented completely, so that someone else, without access to your source code could use your file(s) or database correctly.
  • 6. If using XML format, you can define an XML schema for your file and have some tool such as Notepad++ validate your file format for you. XML may have other benefits, but it isn't as simple as plain text files or even JSON files. In any case, don't forget to include the list of file (path) names, along with the index itself, in your persistent data store. Part II Requirements: In this part, you must implement the file operations of your search engine application (the model). That includes reading and updating your persistent data (that is, the inverted index as well as any other information you need to store between runs of your application, such as the list of files (their pathnames) that have been indexed). The main file operations are reading each file to be indexed a “word” at a time; you also need to check whether the previously indexed files still exist or have been modified since they were last indexed. The maintenance part of the user interface should allow users to select files for indexing, and to keep track of which files have been added to the index. For each file, you need to keep the full pathname of the file as well as the file's last modification time. Your code should correctly handle the user entering nonexistent files and unreadable files. How you handle such errors is up to you. You can download a Search Engine model solution, to play with it and inspect its user interface. My solution keeps all persistent data in a single text file in the user's home directory, but you can certainly use a different persistence solution. Possible Data Structures you can use. In part III, you will implement the index operations, including Boolean searching, adding to the index, and removing files from the index. (The
  • 7. index is a complex collection of collections.) Because the format of the index and file list will affect the code used to read and write them to and from storage, you must decide on the in-memory data structures to be used early. In the model solution, I used a List of FileItem objects for the list of indexed files; each FileItem contained a file's pathname and date it was read for the index. The index data itself is stored in a Map, using the indexed words as keys, and a Set of IndexData objects as the values. Each IndexData object holds the id of the file containing the word and the position of the word in that document. (The classes FileItem and IndexData were trivial to write.) This is NOT the only, or the best, way to represent the index or file list! (For example, a List of int[2] arrays might be simpler than a Set of IndexData objects.) You should decide on the types of collections used. Only then can you implement the methods to read and write the data. Building a Search Engine, Part III: Collections Please read the background information and full project description from Search Engine Project, Part I. In this final part of the project, you will complete the application by implementing the index functions. These include adding a file to the index, removing a file from the index, and reading and writing the index from/to a file. (Updating the index when a file has been changed can then be done by removing and then re-adding a file.) Other operations include searching the index for a given word, and returning a Set of pairs (document ID and position) for that word. Finally, you will have to implement the Boolean search
  • 8. functions of the main user interface. (This is complex enough that it should have been another project!) I suggest you start with an “OR” search, then worry about implementing the “AND” and “PHRASE” search functions. When building the index, keep in mind you will need to define what you mean by “word”. One possibility is to strip out any characters that are not letters or digits, and convert the result to all lowercase, both when you build the inverted index and when you read the search terms entered by the user. Ideally, you can use the I18N methods discussed in class to normalize the words. Implementing Boolean Search: The exact method depends in part on how you implement the inverted index. In the suggested implementation (a Map with words as the keys, and a List or Set of (document ID, position) pairs as the values), you could implement the Boolean searches using algorithms similar to the following (you can come up with your own if you wish): OR Search This is the easiest one to implement. The general idea is to start with an empty Set of matching files. Then add to that Set the files containing each search term: just search the Map for that word, and add each document found (if any). The result is the OR search results, the files that contain any word in the search list. (If the user inputs no search words, say “ ,.”, then no files are considered as matching.) AND Search This is done the opposite way from an OR search, and is only a little harder to implement. The idea is to start with a set of all files in the index. Then for each search term, for each file in the
  • 9. Set, make sure that file is contained in the index for that search term. Remove any files from the set that don't contain that word. The resulting final set is the documents matching all search terms. (If user inputs no search words, say “ ,.”, then all files are considered as matching. If that isn't the behavior you want, you need to treat that as a special case.) PHRASE Search This is the hardest search to implement. Unlike the OR and the AND searches, with PHRASE searching, the position of the search terms in the files matters. The algorithm I came up with is: Create an initially empty Set of Pair objects. Add to the set the Pair objects for the files that contain the first word of the phrase. This is the easy part: Just lookup that word in the Map, and add all Pair objects found to a set. The Set now contains Pair objects for just the files that might contain the phrase. Next, loop over the remaining words of the phrase, removing any Pairs from the set that are no longer possible phrase continuations. (Actually, I just build a new Set.) For each remaining word in the phrase: Create a new, empty set of Pairs. For each Pair in the previous set, see if the word appears in the same file, but in the next position. If so, add the Pair object for the word to the new set. An example may help clarify this. Suppose the search phrase is “big top now”. The set initially contains all the Pair objects for the word “big”. Let's say for example, that set looks like:
  • 10. (file1,position7), (file1,position22), (file3,position4) For each Pair object in that set, you need to see if “top” is in that same file, but the next position. If so, you add the Pair object for that to the new Set. The (inner) loop for this example checks each of the following: Is a (file1,position8) Pair object in the Map for the word "top"? Is a (file1,position23) Pair object in the Map for the word "top"? Is a (file3,position5) Pair object in the Map for the word "top"? If the answer is “yes”, then add that Pair object to the new set. When this loop ends, the new set will contain the Pair objects for the phrase “big top” (pointing to the position of the word “top”). For example, suppose “top” is only found in (file1,position8) and (file3,position5). You replace the first set with this new set: (file1,position8), (file3,position5) Repeat for the next word in the phrase, using the set built in the previous loop. Continue until the set is empty (so phrase not found), or until the last word of the phrase has been processed. The Pair objects remaining in the final set are the ones that contain the phrase; the position will be that of the last word of the phrase. (We only need to display the file name; in this project, the position of the phrase doesn't matter.) Part III Requirements:
  • 11. This project has been split into three parts. Each part counts as a separate project. In the first two parts, you designed and implemented a graphic user interface for the application, and added all required file operations. In this part, you must implement the remaining operations of your search engine application: the index operations, and the searching. You can download a Search Engine model solution, to play with it and inspect its user interface, but please keep in mind you should not copy that user interface; instead, invent a better, nicer-looking one. Hints: Keep your code as simple as possible. The inverted index is naturally a Map, from words (the keys) to a Set of objects (the values). Each of the objects represents a document and a location within that document, where the word was found. I called these objects Pairs, since they are a pair of numbers, but you can use any name for your classes. Note, you will need to be able to go from a document number to a file name, when you display the search results.
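
The OR, AND, and PHRASE algorithms described above can be sketched in Java roughly as follows. This is only a minimal illustration under stated assumptions, not the model solution: the class name MiniIndex and the methods tokenize, addDocument, searchOr, searchAnd, and searchPhrase are hypothetical names chosen for this sketch, and only Pair comes from the text. It assumes Java 16 or later (for records), keeps everything in memory, takes file contents as strings instead of reading from disk, and omits persistence and file removal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class MiniIndex {
    /** One occurrence of a word: document ID plus word position (the "Pair" of the text). */
    record Pair(int docId, int position) {}

    private final List<String> fileNames = new ArrayList<>();      // document ID -> pathname
    private final Map<String, Set<Pair>> index = new HashMap<>();  // word -> its occurrences

    /** One possible definition of "word": lowercase runs of letters and digits. */
    public static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!w.isEmpty()) words.add(w);
        }
        return words;
    }

    /** Adds a document's text to the index and returns its document ID. */
    public int addDocument(String pathname, String text) {
        int docId = fileNames.size();
        fileNames.add(pathname);
        List<String> words = tokenize(text);
        for (int pos = 0; pos < words.size(); pos++) {
            index.computeIfAbsent(words.get(pos), k -> new HashSet<>()).add(new Pair(docId, pos));
        }
        return docId;
    }

    /** OR: start with an empty set and add every file containing any term. */
    public Set<String> searchOr(String query) {
        Set<Integer> docs = new HashSet<>();
        for (String term : tokenize(query)) {
            for (Pair p : index.getOrDefault(term, Set.of())) docs.add(p.docId());
        }
        return toNames(docs);
    }

    /** AND: start with all files and drop those missing any term. */
    public Set<String> searchAnd(String query) {
        Set<Integer> docs = new HashSet<>();
        for (int id = 0; id < fileNames.size(); id++) docs.add(id);
        for (String term : tokenize(query)) {
            Set<Integer> withTerm = new HashSet<>();
            for (Pair p : index.getOrDefault(term, Set.of())) withTerm.add(p.docId());
            docs.retainAll(withTerm);
        }
        return toNames(docs);
    }

    /** PHRASE: keep only Pairs whose next position holds the next word of the phrase. */
    public Set<String> searchPhrase(String query) {
        List<String> terms = tokenize(query);
        if (terms.isEmpty()) return new TreeSet<>();  // assumption: empty phrase matches nothing
        Set<Pair> current = new HashSet<>(index.getOrDefault(terms.get(0), Set.of()));
        for (int i = 1; i < terms.size() && !current.isEmpty(); i++) {
            Set<Pair> forWord = index.getOrDefault(terms.get(i), Set.of());
            Set<Pair> next = new HashSet<>();  // build a new set rather than removing in place
            for (Pair p : current) {
                Pair follower = new Pair(p.docId(), p.position() + 1);
                if (forWord.contains(follower)) next.add(follower);
            }
            current = next;
        }
        Set<Integer> docs = new HashSet<>();
        for (Pair p : current) docs.add(p.docId());
        return toNames(docs);
    }

    /** Maps document IDs back to pathnames, sorted for display. */
    private Set<String> toNames(Set<Integer> docIds) {
        Set<String> names = new TreeSet<>();
        for (int id : docIds) names.add(fileNames.get(id));
        return names;
    }

    public static void main(String[] args) {
        MiniIndex idx = new MiniIndex();
        idx.addDocument("a.txt", "The big top now stands empty");
        idx.addDocument("b.txt", "Now the top is big");
        idx.addDocument("c.txt", "big top");
        System.out.println(idx.searchOr("big now"));          // [a.txt, b.txt, c.txt]
        System.out.println(idx.searchAnd("big now"));         // [a.txt, b.txt]
        System.out.println(idx.searchPhrase("big top now"));  // [a.txt]
    }
}
```

Storing the Pairs in a HashSet makes the "is (file, position + 1) in the Map for the next word?" check a constant-time lookup, which is why the phrase loop stays short. As the text notes, an empty query matches every file in the AND search; treat that as a special case if you want different behavior.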