P1WU
UNIT – I: INTRODUCTION
TOPIC – 1: INFORMATION RETRIEVAL
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
UNIT I INTRODUCTION
1. Information Retrieval
2. Early Developments
3. The IR Problem
4. The User's Task
5. Information versus Data Retrieval
6. The IR System
7. The Software Architecture of the IR System
8. The Retrieval and Ranking Processes
9. The Web
10. The e-Publishing Era
11. How the Web Changed Search
12. Practical Issues on the Web
13. How People Search
14. Search Interfaces Today
15. Visualization in Search Interfaces
INTRODUCTION TO INFORMATION RETRIEVAL
What is IR?
• Information retrieval (IR) is finding material of an unstructured nature that satisfies an information need from within large collections (usually stored on computers).
• Unstructured data means that a formal, semantically overt, easy-for-computer structure is missing,
• in contrast to the rigidly structured data used in database-style searching (e.g. product inventories, personnel records).
What is IR?
• The process of actively seeking out information relevant
to a topic of interest (van Rijsbergen)
• Typically it refers to the automatic (rather than manual) retrieval of documents
• by an Information Retrieval System (IRS).
• “Document” is the generic term for an information holder (book, chapter, article, webpage, etc.).
What is IR?
• Information retrieval is the science of searching
a) for information in a document,
b) for documents themselves,
c) for the metadata that describes data, and
d) within databases of texts, images or sounds.
What is IR?
• An Information Retrieval (IR) system can be defined as a software program that deals with the organization, storage, retrieval, and evaluation of information from document repositories, particularly textual information.
• Information Retrieval is the activity of obtaining material, usually documents of an unstructured nature (i.e. usually text), that satisfies an information need from within large collections stored on computers.
What is IR?
• IR helps users find information that matches their information needs expressed as queries.
• Historically, IR is about document retrieval, emphasizing the document as the basic unit: finding documents relevant to user queries.
• Technically, IR studies the acquisition, organization, storage, retrieval, and distribution of information.
• For example, an Information Retrieval process begins when a user enters a query into the system.
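To make the query-to-documents step concrete, here is a minimal Python sketch. It assumes a strict Boolean AND rule (a document matches only if it contains every query word), plain whitespace tokenization and a tiny in-memory collection; the function name `retrieve` and the sample documents are hypothetical, and real IR systems usually rank documents rather than simply filtering them.

```python
def retrieve(query: str, docs: dict[str, str]) -> list[str]:
    """Return the ids of documents that contain every query word (case-insensitive)."""
    terms = query.lower().split()
    hits = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())  # bag of words for this document
        if all(term in words for term in terms):
            hits.append(doc_id)
    return hits


# Hypothetical toy collection.
docs = {
    "d1": "Railroad financing by the federal government",
    "d2": "Formula 1 racing results",
    "d3": "Government budget for railroad maintenance",
}
print(retrieve("government railroad", docs))  # ['d1', 'd3']
```

Note that several documents can match the same query, which is why the ranking of matches becomes a central concern later in this unit.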
What is IR?
• Information retrieval (IR) is a broad area of Computer Science focused primarily on providing users with easy access to information of their interest.
• Information retrieval deals with
• the representation,
• storage,
• organization of, and
• access to information items such as documents, Web pages, online catalogs, structured and semi-structured records, and multimedia objects.
What is IR?
• Information retrieval (IR)
in computing and information science is the process of
obtaining information system resources that are
relevant to an information need from a collection of
those resources. Searches can be based on full-text or
other content-based indexing.
What is IR?
• Information retrieval (IR) quality: are the retrieved documents
• about the target subject?
• up-to-date?
• from a trusted source?
• satisfying the user’s needs?
• How should we rank documents in terms of these factors?
What are the measures of IR effectiveness?
• Two basic parameters of retrieval effectiveness are:
1. Precision: the fraction of the retrieved documents that are relevant to the user.
2. Recall: the fraction of the relevant documents in the collection that are retrieved.
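Both measures can be computed from two sets of document ids. With Ret the retrieved set and Rel the relevant set, Precision = |Ret ∩ Rel| / |Ret| and Recall = |Ret ∩ Rel| / |Rel|. A minimal Python sketch follows; the document ids and relevance judgments are made up for illustration, and returning 0.0 for an empty set is an assumed convention.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = |Ret & Rel| / |Ret|, Recall = |Ret & Rel| / |Rel| (0.0 for empty sets)."""
    hits = len(retrieved & relevant)  # relevant documents that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result list and relevance judgments.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d3", "d7", "d8", "d9"}
print(precision_recall(retrieved, relevant))  # (0.5, 0.4)
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5), but only 2 of the 5 relevant documents were found (recall 0.4). Retrieving more documents tends to raise recall while lowering precision.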
What are the applications of information retrieval?
• Areas in which information retrieval techniques are employed include:
• Adversarial information retrieval.
• Automatic summarization, including multi-document summarization.
• Compound term processing.
• Cross-lingual retrieval.
• Document classification.
• Spam filtering.
• Question answering.
Why do we do IR?
• Information retrieval can provide organizations with immediate value: while it's important to try to figure out ways to capture tacit knowledge, information retrieval provides a means to get at information that already exists in electronic formats.
• The representation and organization of the information items should provide users with easy access to information of their interest.
• Nowadays, research in IR includes
• modeling, Web search, text classification, systems architecture, user interfaces, data visualization, filtering, and languages.
What is the need for information retrieval on the Web?
• Web: a huge, widely distributed, highly heterogeneous, semi-structured, interconnected, evolving hypertext/hypermedia information repository.
• Main issues:
• Abundance of information: 99% of all the information is not interesting to 99% of all users.
• The static Web is a very small part of the whole Web; much of it consists of dynamic websites.
• To access the Web, users need to exploit Search Engines (SE).
• SEs must be improved
• to help people better formulate their information needs, and
• to provide more personalization.
WEB IR : What is Web IR?
• Web IR can be defined as
• the application of theories and methodologies from IR to the World Wide Web.
Web Information Retrieval models are ways of integrating many sources of
evidence about documents, such as
1. the links,
2. the structure of the document,
3. the actual content of the document,
4. the quality of the document, etc.,
so that an effective Web search engine can be built.
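One simple way to integrate such evidence, sketched below under loud assumptions, is a weighted linear combination of per-source scores. The four scores per page, their normalization to [0, 1] and the weights in `WEIGHTS` are all hypothetical; real search engines combine many more signals, often with learned rather than hand-set weights.

```python
# Hypothetical weights for the four evidence sources listed above (they sum to 1).
WEIGHTS = {"links": 0.3, "structure": 0.1, "content": 0.5, "quality": 0.1}


def combined_score(evidence: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of per-source scores in [0, 1]; a missing source contributes 0."""
    return sum(w * evidence.get(source, 0.0) for source, w in weights.items())


# Hypothetical, already-normalized evidence for two pages.
pages = {
    "pageA": {"links": 0.9, "structure": 0.5, "content": 0.4, "quality": 0.8},
    "pageB": {"links": 0.2, "structure": 0.6, "content": 0.9, "quality": 0.5},
}
ranking = sorted(pages, key=lambda p: combined_score(pages[p]), reverse=True)
print(ranking)  # ['pageB', 'pageA']  (scores about 0.62 and 0.60)
```

Under these weights pageB's strong content score outweighs pageA's link advantage; a different weighting could flip the order, which is why the way evidence is combined matters as much as the evidence itself.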
Issues in IR
• The main issues in Information Retrieval (IR) are:
1. Document and Query Indexing
2. Query Evaluation
3. System Evaluation
Issues in IR : Document and Query Indexing
• Document and Query Indexing
The main goal of document and query indexing is to find the important meanings and to create an internal representation.
• The factors to be considered are accuracy in representing semantics, exhaustiveness, and ease of manipulation by a computer.
Issues in IR : Query Evaluation
• Query Evaluation –
In the retrieval model, how can a document be represented with the selected keywords, and how are document and query representations compared to calculate a score?
• Information Retrieval (IR) deals with issues like uncertainty and vagueness in information systems.
• Uncertainty:
The available representation typically does not reflect the true semantics of objects such as images, videos, etc.
• Vagueness:
The information that the user requires lacks clarity; it is only vaguely expressed in a query, feedback, or user action.
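The query-evaluation questions above can be illustrated with the classic vector space model: each document and the query are represented as term-frequency vectors over their words, and they are compared with cosine similarity to produce a score. This is only one possible retrieval model; the raw term-frequency weighting (no IDF, no stopword removal) and the sample texts are simplifying assumptions.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Represent a text by the raw frequencies of its lower-cased words."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())  # only shared terms contribute
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical collection and query.
docs = {
    "d1": "federal government financing of railroad operations",
    "d2": "railroad history and railroad museums",
    "d3": "formula 1 racing",
}
query = tf_vector("government financing railroad")
scores = {d: cosine(query, tf_vector(text)) for d, text in docs.items()}
for d in sorted(scores, key=scores.get, reverse=True):
    print(d, round(scores[d], 3))  # d1 0.707, then d2 0.436, then d3 0.0
```

Unlike a strict keyword filter, every document receives a graded score, so d2 (which shares only "railroad") still ranks above the unrelated d3.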
Issues in IR : System Evaluation
• System Evaluation –
System evaluation is about determining the impact of the information given on user achievement. Here, we also check the efficiency of the particular system in terms of time and space.
IR in practice
• Not only librarians, professional searchers, etc. engage in the activity of information retrieval;
• nowadays hundreds of millions of people engage in IR every day when they use web search engines.
• Information Retrieval is believed to be the dominant form of information access.
IR in practice
• Information Retrieval is a research-driven theoretical and
experimental discipline
• The focus is on different aspects of the information-seeking process, depending on the researcher’s background or interest:
• Computer scientist – fast and accurate search engine
• Librarian – organization and indexing of information
• Cognitive scientist – the process in the searcher’s mind
• Philosopher – Is this really relevant?
IR in practice
• Progress has been influenced by advances in Computational Linguistics, Information Visualization, Cognitive Psychology, HCI, …
• Experimental vs. operational systems
• Analogy to car manufacturing
IR Process
• An information retrieval process begins when a user enters a
query into the system.
• Queries are formal statements of information needs, for
example search strings in web search engines. In information
retrieval a query does not uniquely identify a single object in
the collection.
• Instead, several objects may match the query, perhaps with
different degrees of relevancy.
Fundamental concepts in IR
• What is information ?
• Meaning vs. form
• Data vs. Information Retrieval
• Relevance
The stages of IR
FORMULATION OF IR PROCESS
IR system
• The IR system assists users in finding the information they require, but it does not explicitly return the answers to their questions.
• It indicates the existence and location of documents that might contain the required information.
• Information retrieval also supports users in browsing or filtering a document collection, or in processing a set of retrieved documents.
• A Web-scale system searches over billions of documents stored on millions of computers.
IR system
• An IR system has the ability to represent, store, organize, and access information items.
• A set of keywords is required to search. Keywords are the words people type into search engines.
• These keywords summarize the description of the information need.
• An email program provides a spam filter: manual or automatic means of classifying mail so that it can be placed directly into particular folders.
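A minimal sketch of the manual (rule-based) variant: the user supplies keyword rules, and each incoming mail is routed to the first folder whose keywords it contains. The folder names, keyword sets and the `route` function are hypothetical; an automatic filter would instead learn such decisions from labelled mail, for example with a statistical classifier.

```python
# Hypothetical user-defined rules, checked in order: (folder, keywords).
RULES = [
    ("Spam", {"lottery", "winner", "prize"}),
    ("Work", {"meeting", "deadline", "report"}),
]


def route(mail: str, rules=RULES, default: str = "Inbox") -> str:
    """Return the first folder whose keyword set shares a word with the mail."""
    words = set(mail.lower().split())
    for folder, keywords in rules:
        if words & keywords:  # any keyword present triggers the rule
            return folder
    return default


print(route("You are a lottery WINNER claim your prize"))  # Spam
print(route("Project meeting moved to Friday"))            # Work
print(route("Happy birthday"))                             # Inbox
```

Because the rules are checked in order, a mail matching both sets goes to the earlier folder; placing the spam rule first is a design choice of this sketch.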
Any Questions?
P1WU
UNIT – I: INTRODUCTION
TOPIC – 2: EARLY DEVELOPMENTS
EARLY DEVELOPMENTS
• For more than 5,000 years, man has organized information for later retrieval and searching.
• In its most usual form, this has been done by compiling, storing, organizing, and indexing
• clay tablets,
• hieroglyphics,
• papyrus rolls, and
• books.
EARLY DEVELOPMENTS
• IR in the 17th century: Samuel Pepys, the famous English diarist, subject-indexed his treasured library of 1000+ books with keywords.
EARLY DEVELOPMENTS
• Document Collection: the text units over which we have built an IR system.
• Usually documents
• But could be
• memos
• book chapters, paragraphs, scenes of a movie
• turns in a conversation...
• Lots of them
EARLY DEVELOPMENTS
• For holding the various items, special-purpose buildings are used: libraries, from the Latin word liber for book, or bibliothekes, from the Greek word biblion for papyrus roll.
• The oldest known library was created in Ebla,
• in the “Fertile Crescent”,
• in what is now northern Syria,
• sometime between 3,000 and 2,500 BC.
EARLY DEVELOPMENTS
• In the seventh century BC, Assyrian king Ashurbanipal created the library of Nineveh, on the Tigris River (today, northern Iraq),
• which contained more than 30,000 clay tablets at the time of its destruction in 612 BC.
• By 300 BC, Ptolemy Soter, a Macedonian general, created the Great Library in Alexandria, the Egyptian city at the mouth of the Nile named after the Macedonian king Alexander the Great (356-323 BC).
EARLY DEVELOPMENTS
• For seven centuries the Great Library, jointly with other major libraries in the city, made Alexandria the intellectual capital of the Western world.
• Since then, libraries have expanded and flourished. Nowadays, they
are everywhere.
EARLY DEVELOPMENTS
• They constitute the collective memory of the human race, and their popularity is on the rise.
• In 2008 alone, people in the US visited their libraries some 1.3 billion times and checked out more than 2 billion items, an increase in both yearly figures of more than 10 percent.
EARLY DEVELOPMENTS
• Since the volume of information in libraries is always growing, it is necessary to build specialized data structures for fast search: the indexes.
• In one form or another, indexes are at the core of every modern information retrieval system.
• They provide fast access to the data and allow speeding up query processing.
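In modern IR systems the central index structure is the inverted index: a map from each term to the sorted list of documents containing it (its postings list). The Python sketch below makes simplifying assumptions (whitespace tokenization, no stemming, everything in memory, hypothetical function names) and shows how a two-term AND query is answered by merging postings lists instead of scanning every document.

```python
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each term to the sorted list of ids of the documents containing it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(p1: list[int], p2: list[int]) -> list[int]:
    """Merge two sorted postings lists, keeping ids present in both (linear time)."""
    i = j = 0
    out: list[int] = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical toy collection echoing the library examples above.
docs = {
    1: "clay tablets in Nineveh",
    2: "papyrus rolls in Alexandria",
    3: "clay tablets and papyrus rolls",
}
index = build_index(docs)
print(index["clay"])                               # [1, 3]
print(intersect(index["clay"], index["papyrus"]))  # [3]
```

Keeping postings sorted is what makes the merge linear in the lengths of the two lists, which is the "fast access" the slide refers to.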
EARLY DEVELOPMENTS
• For centuries indexes have been created manually as sets of categories.
• Each category in the index is typically composed of labels that identify its
associated topics and of pointers to the documents that discuss those
topics.
• While these indexes are usually designed by library and information science researchers, the advent of modern computers has allowed the construction of large indexes automatically,
• which has accelerated the development of the area of Information Retrieval (IR).
EARLY DEVELOPMENTS
• Early developments in IR date back to research efforts conducted in the 1950s by pioneers such as
• Hans Peter Luhn,
• Eugene Garfield,
• Philip Bagley, and
• Calvin Mooers,
• the last of whom allegedly coined the term information retrieval.
• In 1955, Allen Kent and colleagues published a paper describing the precision and recall metrics,
• which was followed by the publication in 1962 of the Cranfield studies by Cyril Cleverdon.
EARLY DEVELOPMENTS
• In 1963, Joseph Becker and Robert Hayes published the first book on
information retrieval [164].
• Throughout the 1960s, Gerard Salton and Karen Sparck Jones, among others, shaped the field by developing the fundamental concepts that led to the modern technologies of ranking in IR.
• In 1968, the first IR book authored by Salton was published.
EARLY DEVELOPMENTS
• In 1971, N. Jardine and C. J. van Rijsbergen articulated the “cluster hypothesis”.
• In 1978, the first ACM Conference on IR (ACM SIGIR) was held in Rochester, New York.
• In 1979, C. J. van Rijsbergen published Information Retrieval, which focused on probabilistic models.
EARLY DEVELOPMENTS
• In 1983, Salton and McGill published Introduction to Modern
Information Retrieval, a classic book on IR focused on vector models.
• Since then,
• the IR community has grown to include
• thousands of professors,
• researchers,
• students,
• engineers, and
• practitioners
• throughout the world.
EARLY DEVELOPMENTS
• The main conference in the area, the ACM International Conference on Information Retrieval (ACM SIGIR), now attracts hundreds of attendees and receives hundreds of submitted papers on a yearly basis.
MODERN IR DEVELOPMENTS
Any Questions?
P1WU
UNIT – I: INTRODUCTION
TOPIC – 3: THE IR PROBLEM
INTRODUCTION TO THE INFORMATION RETRIEVAL (IR) PROBLEM
What is the IR PROBLEM?
Users of modern IR systems, such as search engine users, have
information needs of varying complexity.
In the simplest case, they are looking for the link to the homepage of a
company, government, or institution.
In the more sophisticated cases, they are looking for information
required to execute tasks associated with their jobs or immediate
needs.
IR PROBLEM EXAMPLE
An example of a more complex information need is as
follows:
• Find all documents that address the role of the Federal
Government in financing the operation of the National Railroad
Transportation Corporation (AMTRAK).
• This full description of the user need does not necessarily provide the best formulation for querying the IR system.
IR PROBLEM EXAMPLE
• Instead, the user might want to first translate this information need into a query, or sequence of queries, to be posed to the system.
• In its most common form, this translation yields a set of keywords, or index terms, which summarize the user information need.
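A naive sketch of this translation step for the AMTRAK information need above: lower-case the text, split it into words, and drop stopwords plus words that describe the request rather than its topic. Both word lists and the function name `to_keywords` are hypothetical and deliberately tiny; real systems use curated stopword lists and often add stemming.

```python
import re

# Hypothetical, deliberately tiny word lists.
STOPWORDS = {"a", "an", "the", "of", "in", "and", "all", "that"}
REQUEST_WORDS = {"find", "documents", "address"}  # describe the request, not its topic


def to_keywords(need: str) -> list[str]:
    """Turn a natural-language information need into a de-duplicated keyword list."""
    tokens = re.findall(r"[a-z0-9]+", need.lower())  # drops punctuation such as '(' and '.'
    keywords: list[str] = []
    for tok in tokens:
        if tok in STOPWORDS or tok in REQUEST_WORDS or tok in keywords:
            continue
        keywords.append(tok)
    return keywords


need = ("Find all documents that address the role of the Federal Government in "
        "financing the operation of the National Railroad Transportation "
        "Corporation (AMTRAK).")
print(to_keywords(need))
```

The result keeps "role", "federal", "government", "financing", "operation", "national", "railroad", "transportation", "corporation" and "amtrak"; whether such a keyword set is the *best* query is exactly the open question the slide raises.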
IR PROBLEM EXAMPLE
• Given the user query, the key goal of the IR system is to retrieve information that is useful or relevant to the user.
• The emphasis is on the retrieval of information as opposed to the retrieval of data.
• To be effective in its attempt to satisfy the user information need, the IR system must somehow ‘interpret’ the contents of the information items, that is, the documents in a collection, and rank them according to a degree of relevance to the user query.
IR PROBLEM
• This ‘interpretation’ of a document content involves
extracting syntactic and semantic information from the
document text and using this information to match the user
information need.
IR PROBLEM
• The IR Problem: the primary goal of an IR system is to retrieve all the documents that are relevant to a user query while retrieving as few non-relevant documents as possible.
• The difficulty is knowing not only how to extract information from the documents but also how to use it to decide relevance.
• That is, the notion of relevance is of central importance in IR.
IR PROBLEM
• One main issue is that relevance is a personal assessment that depends on the
task being solved and its context.
• For example:
• Relevance can change
• with time (e.g., new information becomes available),
• with location (e.g., the most relevant answer is the closest one), or
• even with the device (e.g., the best answer is a short document that is easier to download
and visualize).
• In this sense, no IR system can provide perfect answers to all users all the time.
Any Questions?
P1WU
UNIT – I: INTRODUCTION
TOPIC – 4: THE USER'S TASK
AN OVERVIEW OF THE USER TASK: What is the user task in an IRS?
• The user of a retrieval system has to:
• translate their information need into a query in the language provided by the system.
• With an information retrieval system, this normally implies
• specifying a set of words that convey the semantics of the information need.
• With a data retrieval system,
• a query expression (such as a regular expression) is used to
• convey the constraints that must be satisfied by objects in the answer set.
• In both cases, we say that the user searches for useful information by executing
a retrieval task.
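The contrast between the two kinds of query can be sketched in Python. This is a minimal illustration, assuming a toy in-memory collection and whitespace tokenization; the document IDs, texts, and the helper names `keyword_query` and `regex_query` are hypothetical.

```python
import re

# Hypothetical toy collection: IDs and texts are made up for illustration.
docs = {
    "d1": "Formula 1 racing results from the 2023 season",
    "d2": "A history of car racing at Le Mans",
    "d3": "Order ID: 2023-0042 shipped",
}


def keyword_query(words, collection):
    """IR-style query: a set of words conveying the semantics of the need.
    Any document sharing at least one word with the query is returned."""
    wanted = {w.lower() for w in words}
    return sorted(doc_id for doc_id, text in collection.items()
                  if wanted & set(text.lower().split()))


def regex_query(pattern, collection):
    """Data-retrieval-style query: a regular expression states a constraint
    that every object in the answer set must satisfy exactly."""
    rx = re.compile(pattern)
    return sorted(doc_id for doc_id, text in collection.items() if rx.search(text))


print(keyword_query(["racing", "cars"], docs))  # ['d1', 'd2']
print(regex_query(r"\b\d{4}-\d{4}\b", docs))    # ['d3']
```

Note that "cars" does not match "car" in this sketch; handling such variants is one reason IR systems apply text operations such as stemming.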
IRS – USER TASK: What is the user task in an IRS?
• Users of modern IR systems,
• such as search engine users, have information needs of varying complexity.
• The user of a retrieval system has to translate their information need into a query in
the language provided by the system.
• With an IR system, such as a search engine,
• this usually implies specifying a set of words that convey the semantics of the
information need.
• We say that the user is searching or querying for information of interest.
IRS – USER TASK EXAMPLE
To illustrate,
• the user might be interested in documents about
• car racing in general and might decide to glance through related documents about Formula 1 racing,
• Formula Indy, and the ‘24 Hours of Le Mans’.
• We say that
• the user is browsing or navigating the documents in the collection, not searching.
• It is still a process of retrieving information,
• but one whose main objectives are less clearly defined at the beginning.
IRS – USER TASK
• While searching for information of interest is the main
retrieval task on the Web,
• search can also be used to satisfy user needs other than
information access,
• such as buying goods and making reservations.
• Consider now a user
• whose interest is either poorly defined or inherently
broad,
• so that it is unclear what query to specify.
IRS – USER TASK
• The task in this case is more closely related to
• exploratory search and resembles a process of quasi-sequential search for
information of interest.
• Here, we make a clear distinction between the different tasks
the user of the retrieval system might be engaged in.
IRS – USER TASK
• The task might then be of two distinct types, searching and
browsing, as illustrated in the figure.
IRS – USER TASK
• Consider a process of retrieving information whose main
objectives are not clearly defined at the beginning and
whose purpose might change during the interaction with
the system.
• In that case, the user task is browsing rather than searching.
IRS – USER TASK
• User choice of information retrieval:
• Push
• Pull
IRS – USER TASK
• Both retrieval and browsing are, in the language of the World Wide
Web, ‘pulling’ actions.
• That is, the user requests the information in an interactive manner.
• An alternative is to do retrieval in an automatic and permanent
fashion using software agents which push the information towards
the user.
• For instance, information useful to a user could be extracted
periodically from a news service.
• In this case, we say that the IR system is executing a particular retrieval task which
consists of filtering relevant information for later inspection by the user.
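A push-style filtering task can be sketched as a standing user profile matched against each incoming item. This is a hypothetical Python sketch: the `Profile` class, the `filter_items` function, the keyword-overlap rule, and the sample headlines are assumptions, and fetching items periodically from a real news service is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A user's standing interest (hypothetical structure for this sketch)."""
    user: str
    keywords: set = field(default_factory=set)


def filter_items(items, profile, min_matches=1):
    """Push-style filtering: keep the items that share at least `min_matches`
    words with the profile, to be queued for later inspection by the user."""
    selected = []
    for item in items:
        words = set(item.lower().split())
        if len(words & profile.keywords) >= min_matches:
            selected.append(item)
    return selected


profile = Profile("alice", {"racing", "formula"})
feed = ["Formula 1 calendar announced", "Stock markets close higher", "Racing at Le Mans"]
print(filter_items(feed, profile))  # ['Formula 1 calendar announced', 'Racing at Le Mans']
```

The user never issues a query here: the profile is stored once, and the agent applies it to each new batch of items.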
Topic 5: INFORMATION VERSUS DATA RETRIEVAL
INTRODUCTION TO INFORMATION
What is Information?
INTRODUCTION TO RETRIEVAL
What is Retrieval?
• “Unstructured” does not mean that there is
no structure in the data:
• Document structure (headings, paragraphs, lists, ...)
• Explicit markup formatting (e.g., in HTML, XML, ...)
• Linguistic structure (latent, hidden)
A structured, database-style query, for contrast:
SELECT *
FROM business_catalogue
WHERE category = 'florist'
AND city_zip = 'cb1'
INTRODUCTION TO RETRIEVAL
What is information retrieval (IR)?
Information retrieval (IR) is finding material (usually
documents) of an unstructured nature (usually text)
that satisfies an information need from within large
collections (usually stored on computers).
INTRODUCTION TO RETRIEVAL
• An information need is
• the topic about which the user desires to know
more.
• A query is
• what the user conveys to the computer in an
attempt to communicate the information need.
• A document is relevant
• if the user perceives that it contains information
of value with respect to their personal
information need.
Types of information needs:
• Known-item search
• Precise information seeking search
• Open-ended search (“topical search”)
INFORMATION VERSUS DATA RETRIEVAL
• Data retrieval, in the context of an IR system, consists mainly of determining which
documents of a collection contain the keywords in the user query, which most
frequently is not enough to satisfy the user's information need.
• In fact, the user of an IR system is concerned more with retrieving information about
a subject than with retrieving data that satisfies a given query.
• For instance, a user of an IR system is willing to accept documents that contain
synonyms of the query terms in the result set,
• even when those documents do not contain any query terms.
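The difference can be sketched with a toy synonym table. This minimal Python sketch assumes a hand-written `SYNONYMS` dictionary (a real system might use a thesaurus instead) and whitespace tokenization; `exact_match`, `synonym_match`, and the sample documents are hypothetical.

```python
# Hypothetical synonym table; a real system might use a thesaurus instead.
SYNONYMS = {"car": {"automobile", "auto"}, "fast": {"quick", "rapid"}}

docs = {
    "d1": "the automobile was quick",
    "d2": "a fast car",
    "d3": "a slow bicycle",
}


def exact_match(query, collection):
    """Data-retrieval view: a document qualifies only if it contains every query keyword."""
    terms = query.lower().split()
    return sorted(d for d, text in collection.items()
                  if all(q in text.split() for q in terms))


def synonym_match(query, collection):
    """IR view: each query term may be satisfied by the term itself or by a synonym."""
    groups = [{q} | SYNONYMS.get(q, set()) for q in query.lower().split()]
    return sorted(d for d, text in collection.items()
                  if all(g & set(text.split()) for g in groups))


print(exact_match("fast car", docs))    # ['d2']
print(synonym_match("fast car", docs))  # ['d1', 'd2']
```

Document `d1` is accepted by the synonym-aware query even though it contains neither "fast" nor "car".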
INFORMATION RETRIEVAL
• “Ad hoc” retrieval (web retrieval)
• Support for browsing and filtering document collections:
• Clustering
• Classification using fixed labels (common information needs, age groups, topics)
• Further processing of a set of retrieved documents,
• e.g., by using natural language processing:
• Information extraction
• Summarization
• Question answering
RETRIEVAL TYPES
1) Web search
• Search space: billions of documents on millions of computers
• Issues: spidering; efficient indexing and search; malicious manipulation to boost
search engine rankings
2) Enterprise and institutional search
• e.g., a company's documentation, patents, research articles; often domain-specific
• Centralised storage; dedicated machines for search
• Most prevalent IR evaluation scenario: US intelligence analysts' searches
3) Personal information retrieval (email, personal documents)
• e.g., Mac OS X Spotlight; Windows' Instant Search
• Issues: different file types; maintenance-free, lightweight to run in the background
INFORMATION VERSUS DATA RETRIEVAL
• In an IR system, the retrieved objects might be
• inaccurate, and small errors are likely to go unnoticed.
• In a data retrieval system, on the contrary,
• a single erroneous object among a thousand retrieved
objects means total failure.
INFORMATION VERSUS DATA RETRIEVAL
• A data retrieval system, such as a relational database, deals with data that has a
well-defined structure and semantics,
• while an IR system deals with natural language text, which is not
well structured.
• Data retrieval, while providing a solution to the user of a database
system, does not solve
• the problem of retrieving information about a subject or topic.
INFORMATION VERSUS DATA RETRIEVAL
S.No | Data | Information
1 | Unorganized raw facts that need processing; without it, they seem random and useless to humans. | Processed, organized data presented in a given context and useful to humans.
2 | An individual unit that contains raw material and does not carry any specific meaning. | A group of data that collectively carries a logical meaning.
3 | Data does not depend on information. | Information depends on data.
4 | Measured in bits and bytes. | Measured in meaningful units like time, quantity, etc.
5 | Never suited to the specific needs of a designer. | Specific to the expectations and requirements, because all irrelevant facts and figures are removed during the transformation process.
6 | Example: a student's score. | Example: the average score of a class, derived from the given data.
Topic 6: THE IR SYSTEM
The role of an IR system
A modern view:
• Support the user in
– exploring a problem domain, understanding its
terminology, concepts and structure
– clarifying, refining and formulating an information need
– finding documents that match the info need description
• As many relevant docs as possible
• As few non-relevant documents as possible
How does it do this?
• User interfaces and visualization tools for
– exploring a collection of documents
– exploring search results
• Query expansion based on
– Thesauri
– Lexical/statistical analysis of text / context and concept formation
– Relevance feedback
• Indexing and matching model
How well does it do this?
• Evaluation
– Of the components
• Indexing / matching algorithms
– Of the exploratory process overall
• Usability issues
• Usefulness to task
• User satisfaction
What do we want from an IRS?
• Systemic approach
– Goal (for a known information need):
• Return as many relevant documents as possible and as few
non-relevant documents as possible
• Cognitive approach
– Goal (in an interactive information-seeking environment,
with a given IRS):
• Support the user’s exploration of the problem domain and the task completion.
ONLINE INFORMATION RETRIEVAL SYSTEM (OIRS)
Information retrieval is defined as “the techniques of storing and recovering and
often disseminating recorded data especially through
the use of a computerized system”
(Merriam-Webster Dictionary).
QUALITIES OF IRS
• The effectiveness of an IR system (i.e., the quality of its search results) is
determined by two key statistics about the system’s returned results for a query:
1. Precision: What fraction of the returned results are relevant to the
information need?
2. Recall: What fraction of the relevant documents in the collection were
returned by the system?
• Questions to be addressed:
• What is the best balance between the two?
• Easy to get perfect recall: just retrieve everything
• Easy to get good precision: retrieve only the few most relevant documents
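Both measures can be computed directly from sets of document IDs. In this hypothetical Python sketch, the 10-document collection and the relevance judgments are made up; the last two calls reproduce the two extremes listed above. Defining precision as 0 for an empty result set is a convention assumed here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |relevant & retrieved| / |retrieved|
    recall    = |relevant & retrieved| / |relevant|
    Empty denominators yield 0.0 (an assumed convention).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


collection = [f"d{i}" for i in range(1, 11)]  # hypothetical 10-document collection
relevant = {"d1", "d2", "d3", "d4"}           # hypothetical relevance judgments

print(precision_recall(["d1", "d2", "d7"], relevant))  # precision 2/3, recall 1/2
print(precision_recall(collection, relevant))          # (0.4, 1.0): retrieve everything
print(precision_recall(["d1"], relevant))              # (1.0, 0.25): only the top document
```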
THE SOFTWARE ARCHITECTURE OF THE IR SYSTEM
• To describe the IR system, we use a simple and generic software architecture, as shown in the figure.
ONLINE INFORMATION RETRIEVAL SYSTEM (OIRS)
• The first step in setting up an IR system is to
assemble the document collection,
• which can be private or crawled from the Web. In the
latter case, a crawler module is responsible for
collecting the documents.
• The document collection is stored in disk storage,
usually referred to as the central repository.
ONLINE INFORMATION RETRIEVAL SYSTEM (OIRS)
• The documents in the central repository need to be
indexed for fast retrieval and ranking.
• The most widely used index structure is an inverted index,
composed of all the distinct words of the collection
and, for each word,
• a list of the documents that contain it.
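An inverted index of this kind can be sketched in a few lines of Python. This is a minimal in-memory sketch, assuming lowercase whitespace tokenization and integer document IDs; `build_inverted_index` and the sample texts are hypothetical, and production indexes typically compress their postings lists and keep them on disk.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each distinct word in the collection to the sorted list of
    IDs of the documents that contain it (its postings list)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


docs = {1: "new home sales top forecasts",
        2: "home sales rise in july",
        3: "increase in home sales in july"}
index = build_inverted_index(docs)
print(index["home"])  # [1, 2, 3]
print(index["july"])  # [2, 3]
```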
ONLINE INFORMATION RETRIEVAL SYSTEM (OIRS)
• Once the document collection is indexed, the
retrieval process can be initiated.
• It consists of retrieving documents that satisfy either
a user query or a click on a hyperlink.
• In the first case, we say that the user is searching for
information of interest;
• in the second case, we say that the user is browsing for
information of interest.
ONLINE INFORMATION RETRIEVAL SYSTEM (OIRS)
• Here, “retrieval” refers to the searching process; browsing,
and how it compares to searching, is considered separately.
• To search, the user first specifies a query that reflects their
information need.
• Next, the user query is parsed and expanded with, for
instance, spelling variants of a query word.
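Expanding a query with spelling variants can be sketched with Python's standard `difflib.get_close_matches`, which returns up to `n` vocabulary words whose similarity to the given word is at least `cutoff`, best first. The vocabulary, the `cutoff` of 0.8, and the helper `expand_query` are illustrative assumptions; the result is one group of alternative terms per query word, treated here as the system query.

```python
import difflib


def expand_query(query, vocabulary, max_variants=2, cutoff=0.8):
    """Parse the query into words and expand each word with close spelling
    variants found in the index vocabulary. `max_variants` bounds the matches
    difflib returns, which may include the word itself."""
    system_query = []
    for word in query.lower().split():
        variants = difflib.get_close_matches(word, vocabulary, n=max_variants, cutoff=cutoff)
        # Keep the original word first, then any variants that differ from it.
        system_query.append([word] + [v for v in variants if v != word])
    return system_query


vocab = ["color", "colour", "retrieval", "retreival", "system"]
print(expand_query("colour retrieval", vocab))
# [['colour', 'color'], ['retrieval', 'retreival']]
```

Including the misspelt "retreival" as a variant lets documents that contain either spelling be matched.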
ONLINE INFORMATION RETRIEVAL SYSTEM (OIRS)
• The expanded query,
• which we refer to as the system query, is then processed against the index to
retrieve a subset of all documents.
• Next, the retrieved documents are ranked and the top
documents are returned to the user.
• The purpose of ranking is to identify the documents that are
most likely to be considered relevant by the user; ranking
constitutes the most critical part of the IR system.
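Processing a system query against the index and ranking the result can be sketched as follows. The scoring rule (count how many query-term groups each document matches) is a deliberately simple assumption for illustration, not the ranking function of any particular system; the index literal and `retrieve_and_rank` are hypothetical.

```python
from collections import Counter


def retrieve_and_rank(system_query, index, k=3):
    """Retrieve every document in the postings of any query term, then rank
    by the number of query-term groups each document matches.
    `system_query` is a list of term groups; a document matches a group if it
    contains any term (original or variant) in that group."""
    scores = Counter()
    for group in system_query:
        matched = set()
        for term in group:
            matched.update(index.get(term, []))
        for doc_id in matched:
            scores[doc_id] += 1
    # Sort by descending score, breaking ties by document ID for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


index = {"home": [1, 2, 3], "sales": [1, 2, 3], "july": [2, 3], "rise": [2]}
print(retrieve_and_rank([["home"], ["july"], ["rise"]], index))
# [(2, 3), (3, 2), (1, 1)]
```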
ONLINE INFORMATION RETRIEVAL SYSTEM (OIRS)
• Given the inherent subjectivity in deciding relevance,
• evaluating the quality of the answer set is a key step for improving the IR
system.
• A systematic evaluation process allows
• fine-tuning the ranking algorithm and improving the quality of the results.
• The most common evaluation procedure consists of comparing
• the set of results produced by the IR system with results suggested by
human specialists.
Topic 7: THE SOFTWARE ARCHITECTURE OF THE IR SYSTEM
SYSTEM ARCHITECTURE OF IRS
A high-level view of the software architecture of an IR
system will provide:
1) Components
2) Tools
3) Environment
4) Data source(s)
It also covers the additional elements needed for a thorough
understanding of the data flow.
ONLINE INFORMATION RETRIEVAL SYSTEM (OIRS)
• To improve the ranking,
• we might collect feedback from the users and use this
information to change the results.
• On the Web,
• the most abundant form of user feedback is clicks on the
documents in the result set.
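One simple way to use clicks is to blend them into the original ranking score. This hypothetical Python sketch assumes a linear blend with a fixed `weight`; the function name, scores, and click counts are made up. Raw click counts favor results already shown near the top of the list, so a real system would need to correct for that position bias.

```python
def rerank_with_clicks(results, clicks, weight=0.1):
    """Blend each document's original score with its click count.
    `results` is a list of (doc_id, score); `clicks` maps doc_id -> clicks.
    The linear blend and `weight` are illustrative assumptions."""
    blended = [(doc, score + weight * clicks.get(doc, 0)) for doc, score in results]
    return sorted(blended, key=lambda item: item[1], reverse=True)


results = [("a", 1.0), ("b", 0.9), ("c", 0.5)]
clicks = {"b": 5, "c": 1}
print([doc for doc, _ in rerank_with_clicks(results, clicks)])  # ['b', 'a', 'c']
```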
ONLINE INFORMATION RETRIEVAL SYSTEM (OIRS)
• Another important source of information for Web ranking is the
hyperlink structure among pages,
• which allows identifying sites of high authority.
• Many other concepts and technologies have an impact on
• the design of a full-fledged IR system, such as a modern search engine.
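The slides do not name a specific link-analysis method; PageRank is one well-known way to derive authority scores from hyperlinks. Below is a simplified power-iteration sketch in Python, assuming a small hypothetical link graph, a damping factor of 0.85, and a fixed number of iterations instead of a convergence test.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.
    `links` maps each page to the list of pages it links to.
    Pages with no outlinks spread their score evenly over all pages."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # 'c' (linked to by a, b and d)
```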
Topic 8: THE RETRIEVAL AND RANKING PROCESSES
Ranked Retrieval
• In ranked retrieval,
• the documents are ranked based on their score
• Advantages
– Query easy to specify
– The output is ranked based on the estimated relevance of the
documents to the query
– A wide variety of theoretical models exist
• Disadvantages
– Query less precise (although weighting can be used)
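Query-term weighting in ranked retrieval is commonly done with tf-idf. A minimal Python sketch, assuming raw term frequency, idf = log(N/df), whitespace tokenization, and a hypothetical three-document collection; the function name `tfidf_scores` is illustrative.

```python
import math
from collections import Counter


def tfidf_scores(query, docs):
    """Score each document by the sum over query terms of tf * idf,
    with idf = log(N / df). Returns (doc_id, score) pairs, best first."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n = len(docs)
    df = Counter(term for words in tokenized.values() for term in set(words))
    scores = {}
    for d, words in tokenized.items():
        tf = Counter(words)
        # Terms absent from the collection (df == 0) are skipped.
        scores[d] = sum(tf[t] * math.log(n / df[t]) for t in query.lower().split() if df[t])
    # Descending score; ties broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = {"d1": "car racing car", "d2": "car sales", "d3": "horse racing"}
print([d for d, _ in tfidf_scores("car racing", docs)])  # ['d1', 'd2', 'd3']
```

Document `d1` ranks first because it contains both query terms, one of them twice.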
THE RETRIEVAL AND RANKING PROCESSES
• To describe the retrieval and ranking processes, we further
elaborate on our description of the modules shown in Figure
1.2, as illustrated in Figure 1.3.
• Given the documents of the collection, we first apply text operations to them, such as:
1. eliminating stop words,
2. stemming, and
3. selecting a subset of all terms for use as indexing terms.
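These text operations can be sketched as a small pipeline. This Python sketch is illustrative only: the tiny stop-word list, the crude suffix-stripping rule (a stand-in for a real stemmer such as Porter's), and the minimum-length rule for selecting index terms are all assumptions.

```python
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for"}  # tiny illustrative list


def crude_stem(word):
    """Very rough suffix stripping; a real system would use e.g. the Porter stemmer."""
    for suffix in ("ing", "es", "s", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text, min_length=3):
    """Tokenize, drop stop words, stem, then keep only terms of at least
    `min_length` characters as indexing terms (hypothetical selection rule)."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    stemmed = [crude_stem(t) for t in tokens if t and t not in STOP_WORDS]
    return [t for t in stemmed if len(t) >= min_length]


print(index_terms("The racing cars in the Formula 1 races."))
# ['rac', 'car', 'formula', 'rac']
```

Stemming maps "racing" and "races" to the same term, which is what lets a query on one form match the other.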
THE RETRIEVAL AND RANKING PROCESSES
• The retrieval and ranking processes of the IR system are illustrated in the figure.
THE RETRIEVAL AND RANKING PROCESSES
• The indexing terms are then used to compose document
representations,
• which might be smaller than the documents themselves (depending on the
subset of index terms selected).
• Given the document representations, it is necessary to build
an index of the text.
THE RETRIEVAL AND RANKING PROCESSES
• Different index structures might be used,
• but the most popular one is an inverted index.
• The steps required to generate the index compose the indexing process and
must be executed offline,
• before the system is ready to process any queries.
• The resources (time and storage space) spent on the indexing process are amortized by
querying the retrieval system many times.
• Given that the document collection is indexed, the retrieval process can be initiated.
THE RETRIEVAL AND RANKING PROCESSES
• The user first specifies a query that reflects their information
need.
• This query is then parsed and modified by operations that
resemble those applied to the documents.
• Typical operations at this point consist of spelling corrections
and elimination of terms such as stop words,
• whenever appropriate.
THE RETRIEVAL AND RANKING PROCESSES
• Next, the transformed query is expanded and modified.
• For instance, the query might be modified using query suggestions made by
the system and confirmed by the user.
• The expanded and modified query is then processed to obtain the
set of retrieved documents,
• which is composed of documents that contain the query terms.
THE RETRIEVAL AND RANKING PROCESSES
• Fast query processing is made possible by the index structure
previously built.
• The steps required to produce the set of retrieved documents
constitute the retrieval process.
• Next, the retrieved documents are ranked according to their likelihood
of relevance to the user.
• This is the most critical step because the quality of the results,
• as perceived by the users, is fundamentally dependent on the ranking.
THE RETRIEVAL AND RANKING PROCESSES
• The top ranked documents are then formatted for presentation to
the user.
• The formatting consists of retrieving the title of the documents and
generating snippets for them,
• i.e., text excerpts that contain the query terms,
• which are then displayed to the user.
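Snippet generation can be sketched as extracting a window of words around the first occurrence of a query term. This hypothetical Python sketch assumes a fixed word window and simple punctuation stripping; `make_snippet` and the sample text are illustrative, and real search engines use more elaborate passage selection.

```python
def make_snippet(text, query_terms, window=5):
    """Return a short excerpt of `text` centered on the first occurrence of any
    query term, with '...' marking omitted text on either side."""
    words = text.split()
    terms = {t.lower() for t in query_terms}
    for i, w in enumerate(words):
        if w.lower().strip(".,;:!?") in terms:
            start, end = max(0, i - window), min(len(words), i + window + 1)
            prefix = "... " if start > 0 else ""
            suffix = " ..." if end < len(words) else ""
            return prefix + " ".join(words[start:end]) + suffix
    # No query term found: fall back to the beginning of the document.
    head = words[: 2 * window + 1]
    return " ".join(head) + (" ..." if len(words) > len(head) else "")


doc = ("The 24 Hours of Le Mans is an endurance race held annually "
       "near the town of Le Mans in France.")
print(make_snippet(doc, ["endurance"], window=3))
# ... Mans is an endurance race held annually ...
```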
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
ONLINE INFORMATION RETRIEVAL SYSTEM(OIRS)
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
ONLINE INFORMATION RETRIEVAL SYSTEM(OIRS)
• The documents in the central
repository need to be indexed
for fast retrieval and ranking.
• The most widely used index structure
is an inverted index, composed
of all the distinct words of the
collection and, for each word,
a list of the documents that
contain it.
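A minimal in-memory inverted index is sketched below. It assumes whitespace tokenisation and lowercase words, with no stemming or stop-word removal. `build_inverted_index` and `and_query` are hypothetical helper names. `and_query` shows why the index makes query processing fast: it only touches the document lists of the query words.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each distinct word to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def and_query(index, terms):
    """Return ids of documents containing every term (intersection of lists)."""
    postings = [set(index.get(t, [])) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


docs = {1: "web search engines", 2: "search the web", 3: "library catalogues"}
index = build_inverted_index(docs)
print(index["web"])                          # [1, 2]
print(and_query(index, ["web", "engines"]))  # [1]
```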
Any Questions?
UNIT – I: INTRODUCTION
Topic 9: THE WEB
WEB – What is Web?
• The Web, or World Wide Web (W3), is a
system of Internet servers that support specially
formatted documents.
• The documents are formatted in a markup
language called HTML (HyperText Markup
Language) that supports links to other
documents, as well as graphics, audio, and video
files.
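Because HTML pages carry links to other documents, a program can recover those links from the markup. The sketch below uses Python's standard `html.parser` module. The class name `LinkExtractor` and the sample page are illustrative only. It collects `href` values from `<a>` tags and does not resolve relative URLs.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


page = ('<html><body><p>See <a href="https://example.com/a.html">page A</a> '
        'and <a href="/about.html">about</a>.</p>'
        '<img src="logo.png"></body></html>')
parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['https://example.com/a.html', '/about.html']
```

The `<img>` tag is ignored here, although the same hook could collect embedded media as well.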
WEB – What Does Web Mean?
• The Web is the common name for the World Wide
Web, a subset of the Internet consisting of the pages
that can be accessed by a Web browser.
• Many people assume that the Web is the same as the
Internet, and use these terms interchangeably.
• However, the term Internet actually refers to the global network of
servers that makes the information sharing that happens over the
Web possible.
• So, although the Web makes up a large portion of the Internet,
the two are not one and the same.
WEB – KEY TOPICS
• WEB SITE OR WEB PAGES
• WEB SERVER
• WEB CLIENT
• WEB APPLICATIONS OR WEB APPs
• WEB SOFTWARE – WEB 1.0, WEB 2.0, WEB 3.0
• WEB SERVICES
WEB - A Brief History
• “As We May Think” influenced people like Douglas
Engelbart who,
• at the Fall Joint Computer Conference in San Francisco in December
of 1968,
• ran a demonstration in which he introduced the first ever computer
mouse, video conferencing, teleconferencing, and hypertext.
• It was so incredible that it became known as “the
mother of all demos” [1690].
• Of the innovations displayed, the one that interests us
the most here is hypertext.
• The term was coined by Ted Nelson in his Project
Xanadu.
• Hypertext allows the reader to jump from one electronic document
to another,
• which was one important property regarding the problem Tim Berners-Lee
faced in 1989.
• At the time, Berners-Lee worked in Geneva at CERN (Conseil
Européen pour la Recherche Nucléaire).
• Researchers who wanted to share documentation with others had to
• reformat their documents to make them compatible with an internal
publishing system.
• It was annoying and generated many questions,
• many of which ended up being directed to Berners-Lee.
• He understood that a better solution was required.
• It just so happened that CERN was the largest Internet node in Europe.
• Berners-Lee reasoned that
• it would be nice if the solution to the problem of sharing documents were
decentralized, such that the researchers could share their contributions freely.
• He saw that
• a networked hypertext, through the Internet, would be a good solution,
and started working on its implementation.
• In 1990,
• he wrote the HTTP protocol, defined the HTML language,
• wrote the first browser, which he called “World Wide Web”, and the first
Web server.
• In 1991,
• he made his browser and server software available on the Internet. The Web
was born.
Web Information Retrieval: What is the Web in information retrieval?
• Web information retrieval is
• the process of searching within
a huge World Wide Web
document collection for documents
that satisfy a particular information
need (expressed as a query).
• The Web can be considered as
• a large-scale document collection, to which classical text retrieval techniques can be
applied.
• However, its unique features and structure offer new sources of evidence that can be
used to enhance the effectiveness of Information Retrieval (IR) systems.
• Generally, Web IR examines
• the combination of evidence from both the textual content of documents and
• the structure of the Web,
• as well as the search behavior of users and issues related to the evaluation of retrieval
effectiveness in the Web setting.
Web Information Retrieval
• Web documents / data
• No traditional collection
– Huge
• Time and space to crawl and index
• IRSs cannot store copies of documents
– Dynamic, volatile, anarchic, uncontrolled
– Homogeneous sub-collections
• Structure
– In documents (un-/semi-/fully-structured)
– Between docs: network of inter-connected nodes
– Hyper-links - conceptual vs. physical documents
• Web Information Retrieval models are
• ways of integrating many sources of evidence about documents, such
as
1. the links,
2. the structure of the document,
3. the actual content of the document,
4. the quality of the document, etc.,
• so that an effective Web search engine can be achieved.
• In contrast with the traditional library-type settings of IR systems,
• the Web is a hostile environment, where Web search engines have to deal with
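One simple way to integrate several sources of evidence is a weighted linear combination of per-document scores. This is only a sketch. The evidence names (`content`, `links`, `quality`), their values, and the weights in `WEIGHTS` are hypothetical. Production Web IR models combine many more signals, often with learned weights.

```python
# Hypothetical weights for evidence scores already normalised to [0, 1].
WEIGHTS = {"content": 0.5, "links": 0.3, "quality": 0.2}


def combined_score(evidence, weights=WEIGHTS):
    """Weighted sum of the available evidence; missing evidence counts as 0."""
    return sum(w * evidence.get(name, 0.0) for name, w in weights.items())


docs = {
    "a": {"content": 0.9, "links": 0.1, "quality": 0.5},  # strong text match only
    "b": {"content": 0.6, "links": 0.8, "quality": 0.9},  # weaker text, strong links
}
ranking = sorted(docs, key=lambda d: combined_score(docs[d]), reverse=True)
print(ranking)  # ['b', 'a']
```

Here document "a" matches the text better, yet "b" ranks first because its link and quality evidence outweigh the difference in content score (about 0.72 versus 0.58).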
Any Questions?
UNIT – I: INTRODUCTION
Topic 10: THE E-PUBLISHING ERA
What is E-Publishing?
• Electronic publishing (also referred to as e-publishing, digital publishing,
or online publishing) includes
• the digital publication of
• e-books,
• digital magazines, and
• the development of digital libraries and catalogues.
• It also includes
• the editing of
• books,
• journals and
• magazines
to be posted on a screen (computer, e-reader, tablet, or smartphone).
Need for E-Publishing
• Since its inception,
• the Web has become a huge success.
• The number of Web pages now far exceeds 20 billion, and the number of Web users in the world exceeds 1.7 billion.
• It is known that there are
• more than one trillion distinct URLs on the Web,
• even though many of them are pointers to dynamic pages, not static HTML pages.
• A viable model of economic sustainability based on online advertising was
developed.
THE E-PUBLISHING ERA
• Electronic publishing has become common in scientific publishing
• where it has been argued that peer-reviewed scientific journals are in the process of
being replaced by electronic publishing.
• It is also becoming common to distribute
• books, magazines, and newspapers to consumers through tablet reading devices,
• a market that is growing by millions each year, generated by online vendors such as
Apple's iTunes bookstore, Amazon's bookstore for Kindle, and the Google Play
Bookstore.
• Market research suggested that half of all magazine and newspaper
circulation is based on E-Publishing.
• The advent of the Web changed the world in a way that few people
could have anticipated.
• Yet, one has to wonder about the characteristics of the Web that have made it
so successful.
• Is there a single characteristic of the Web that was most decisive
for its success?
• Was it the simple HTML markup language, the low access costs, the widespread
reach of the Internet, the interactive browser interface, or the search engines?
• While providing the fundamental infrastructure for the Web,
• these technologies were not the root cause of its popularity.
THE E-PUBLISHING ERA – Its Time of Birth
• What was it then?
• The fundamental shift in human relationships, introduced by the Web, was freedom to publish.
• Example:
• Jane Austen is one of the most famous writers in English literature. Her books are read
by people all over the world and have been made into countless TV, film, theatre and
radio adaptations.
• This is all the more impressive because she wrote only six full-length novels.
• Jane Austen did not have that freedom,
• so she had to either convince a publisher of the quality of her work or pay for the publication of
an edition of it herself.
• Since she could not pay for it,
• she had to be patient and wait for the publisher to become convinced.
• It took 15 years.
• In the world of the Web, this is no longer the case.
• People can now publish their ideas on the Web and reach millions of
people overnight,
• without paying anything for it and without having to convince the editorial board of
a large publishing company.
• Restrictions imposed by mass communication media companies and
• by natural geographical barriers were almost entirely removed by the invention of
the Web,
• which has led to a freedom to publish that marks the birth of a new era.
• One which we refer to as The e-Publishing Era.
Any Questions?
UNIT – I: INTRODUCTION
Topic 11: HOW THE WEB CHANGED SEARCH
HOW THE WEB CHANGED SEARCH
• Web search is
• today the most prominent application of IR and its techniques.
• Indeed, the ranking and indexing components of
any search engine are
• fundamentally IR pieces of technology.
• An immediate consequence is that
• the Web has had a major impact on the development of IR.
1) The first major impact of the Web on search is
• related to the characteristics of the document collection itself.
• The Web collection is
• composed of documents (or pages) distributed over millions of sites and connected through hyperlinks,
• i.e., links that associate a piece of text of a page with other Web pages.
• The inherent distributed nature of the Web collection requires
• collecting all documents and storing copies of them in a central repository, prior to indexing.
• This new phase in the IR process, introduced by the Web, is called crawling.
• The system that implements the IR process(es) is called a Search Engine or IR System.
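Crawling can be sketched as a breadth-first traversal of the link graph that stores a copy of every page it reaches in a central repository. To keep the example self-contained, the "Web" here is a hypothetical in-memory dictionary (`WEB`) rather than pages fetched over HTTP. `crawl` and its `max_pages` limit are illustrative names, not a real crawler API.

```python
from collections import deque

# Hypothetical toy Web: URL -> (page text, outgoing links).
# A real crawler would fetch each page over HTTP and parse its links.
WEB = {
    "a.html": ("home page", ["b.html", "c.html"]),
    "b.html": ("about us", ["a.html"]),
    "c.html": ("products", ["d.html"]),
    "d.html": ("contact", []),
}


def crawl(seed, web=WEB, max_pages=100):
    """Breadth-first crawl from a seed URL, storing a copy of each page reached."""
    repository = {}          # central repository: URL -> stored copy of the page
    frontier = deque([seed])
    seen = {seed}            # URLs already queued, so no page is fetched twice
    while frontier and len(repository) < max_pages:
        url = frontier.popleft()
        if url not in web:   # dead link: nothing to store
            continue
        text, links = web[url]
        repository[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return repository


repo = crawl("a.html")
print(list(repo))  # ['a.html', 'b.html', 'c.html', 'd.html']
```

Only after the repository is filled would the pages be indexed, matching the order described above.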
2) The second major impact of the Web on search is related to
• the size of the collection and the volume of user queries submitted on a daily basis.
• Given that the Web grew larger and faster than any previously known text collection,
• search engines now have to handle a volume of text that far exceeds 20 billion pages,
• i.e., a volume of text much larger than any previous text collection.
• The volume of user queries is also much larger than ever before, even if estimates vary
widely.
• The combination of
• a very large text collection with a very high query traffic has pushed the performance
and scalability of search engines to limits that largely exceed those of any previous IR
system.
• That is,
• performance and scalability have become critical characteristics of the IR system, much
more than they used to be prior to the Web.
• Performance and scalability of search engines are not discussed further here.
3) The third major impact of the Web on search is also related to the vast
size of the document collection.
• In a very large collection, predicting relevance is much harder than before.
• Basically, any query retrieves a large number of documents that match its terms,
which means that there are many noisy documents in the set of retrieved documents.
• That is, the retrieved set includes documents that seem related to the query
• but that a large fraction of users would judge not relevant to it.
• This problem first showed up in the early Web search engines and became more severe as the Web grew.
• Fortunately, the Web also includes
• new sources of evidence not present in standard document collections that can be used to alleviate the problem,
such as hyperlinks and user clicks on documents in the answer set.
• Two other major impacts of the Web on search derive from the fact that the Web is not just a repository of
documents and data, but also a medium to do business.
• One immediate implication is that the search problem has been extended beyond the seeking of text
information to also encompass other user needs, such as the price of a book, the phone number of a
hotel, or the link for downloading a piece of software.
• Providing effective answers to
• these types of information needs frequently requires identifying structured data associated with the object of
interest, such as price, location, or descriptions of some of its key characteristics.
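As a small illustration of hyperlinks as a source of evidence, the sketch below re-ranks candidate pages by how many distinct pages link to them. The link graph `LINKS` and the helpers `inlink_counts` and `rerank_by_links` are hypothetical. In-link counting is a deliberately crude signal, and link-analysis methods such as PageRank refine it considerably.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "p1": ["p2", "p3"],
    "p2": ["p3"],
    "p3": ["p1"],
    "p4": ["p3"],
}


def inlink_counts(links):
    """Count how many distinct pages link to each page (self-links ignored)."""
    counts = Counter()
    for src, targets in links.items():
        for dst in set(targets):
            if dst != src:
                counts[dst] += 1
    return counts


def rerank_by_links(candidates, links):
    """Order candidates by in-link count, breaking ties by page name."""
    counts = inlink_counts(links)
    return sorted(candidates, key=lambda p: (-counts[p], p))


print(rerank_by_links(["p1", "p2", "p3", "p4"], LINKS))  # ['p3', 'p1', 'p2', 'p4']
```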
The fifth and final impact of the Web on search derives from Web
advertising and other economic incentives.
The continued success of the Web as an interactive medium for the
masses created incentives for its economic exploitation in the
form of, for instance, advertising and electronic commerce.
These incentives also led to the abusive availability of commercial
information disguised in the form of purely informational
content, which is usually referred to as Web spam.
The increasingly pervasive presence of spam on the Web has made the quest for
relevance even more difficult than before,
i.e., spam content is sometimes so compelling that it is confused with truly
relevant content.
Because of that, it is not unreasonable to think that spam makes relevance
negative,
i.e., the presence of spam makes the current ranking algorithms produce
answer sets that are worse than they would be if the Web were spam
free.
This difficulty is so large that today we talk of Adversarial Web Retrieval.
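The effect of spam on a naive ranker can be demonstrated with keyword stuffing, where a page that repeats the query terms outranks an honest page under raw term-frequency scoring. Both pages, the helper names, and the `looks_stuffed` heuristic with its 0.3 threshold are hypothetical. Real spam detection is far more involved, and spammers adapt to any fixed rule, which is why the problem is described as adversarial.

```python
from collections import Counter


def tf_score(text, query_terms):
    """Naive relevance score: total occurrences of the query terms."""
    counts = Counter(text.lower().split())
    return sum(counts[t] for t in query_terms)


def looks_stuffed(text, threshold=0.3):
    """Hypothetical heuristic: flag a page whose most frequent word makes up
    more than `threshold` of all its words."""
    words = text.lower().split()
    if not words:
        return False
    top = Counter(words).most_common(1)[0][1]
    return top / len(words) > threshold


pages = {
    "honest": "cheap flights to paris compare airline fares and book cheap flights",
    "spam": "cheap flights cheap flights cheap flights cheap flights buy now",
}
query = ["cheap", "flights"]
naive = sorted(pages, key=lambda p: tf_score(pages[p], query), reverse=True)
filtered = [p for p in naive if not looks_stuffed(pages[p])]
print(naive)     # ['spam', 'honest']
print(filtered)  # ['honest']
```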
Any Questions?
UNIT – I: INTRODUCTION
Topic 12: PRACTICAL ISSUES ON THE WEB
PRACTICAL ISSUES ON THE WEB
• Electronic commerce is a major trend on the
Web nowadays and one which has benefited
millions of people.
• In an electronic transaction, the buyer usually
submits credit information to the vendor, to be
used for charging purposes.
• In its most common form, such information consists of
a credit card number.
• For security reasons, this information is usually
encrypted, as done by institutions and companies that
deploy automatic authentication processes.
• Besides security, another issue of major interest is
privacy. Frequently, people are willing to exchange
information as long as it does not become public.
• The reasons are many,
• but the most common one is to protect oneself against misuse of private information by third
parties.
• Privacy is another issue which affects the deployment of the Web and which has not
been properly addressed yet.
• Two other important issues are
1. copyright and
2. patent rights.
• It is far from clear how the widespread distribution of data on the Web affects copyright and
patent laws in the various countries.
• This is important because it affects the business of building up and
deploying large digital libraries.
• For instance,
• is a site which supervises all the information it posts acting as a publisher?
• And if so, is it responsible for misuse of the information it posts (even if it
is not the source)?
• Additionally, other practical issues of interest include
• scanning, optical character recognition (OCR), and cross-language retrieval
• (in which the query is in one language but the documents retrieved are in
another language).
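Cross-language retrieval can be sketched as dictionary-based query translation followed by ordinary retrieval in the target language. The English-to-French dictionary `EN_FR`, the sample documents, and the helpers `translate_query` and `retrieve_any` are hypothetical. Practical systems use far larger lexicons or machine translation and must handle ambiguous terms.

```python
# Hypothetical English -> French dictionary; each term may map to several words.
EN_FR = {"library": ["bibliothèque"], "digital": ["numérique"], "book": ["livre"]}


def translate_query(terms, dictionary=EN_FR):
    """Translate each query term; keep terms the dictionary does not cover."""
    translated = []
    for t in terms:
        translated.extend(dictionary.get(t, [t]))
    return translated


def retrieve_any(terms, docs):
    """Return ids of documents containing at least one of the terms."""
    wanted = set(terms)
    return sorted(d for d, text in docs.items() if wanted & set(text.lower().split()))


fr_docs = {
    1: "la bibliothèque numérique de france",
    2: "un livre de cuisine",
    3: "la météo de demain",
}
q = translate_query(["digital", "library"])
print(q)                         # ['numérique', 'bibliothèque']
print(retrieve_any(q, fr_docs))  # [1]
```

Untranslatable terms (for example proper nouns) are passed through unchanged, a common but not universal choice.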
Any Questions?
UNIT – I: INTRODUCTION
Topic 13: HOW PEOPLE SEARCH
HOW PEOPLE SEARCH?
• Search tasks range from the relatively simple
• e.g., looking up disputed facts or finding weather
information
• to the rich and complex
• e.g., job seeking and planning vacations.
• Search interfaces should support a range of tasks,
• while taking into account how people think about searching
for information.
Information Lookup versus Exploratory Search
• User interaction with search interfaces differs depending on
a) the type of task,
b) the amount of time and effort available to invest in the process, and
c) the domain expertise of the information seeker.
• The simple interaction dialogue used in Web search engines is
most appropriate for:
1. finding answers to questions, or
2. finding Web sites or other resources that act as search starting
points.
HOW PEOPLE SEARCH?
• But, as Marchionini notes,
• the “turn-taking” interface of Web search engines is
• inherently limited and in many cases is being supplanted by
specialty search engines,
• such as
• those for travel and health information, that offer richer
interaction models.
Classic versus Dynamic Model of Information Seeking
• Researchers have developed numerous theoretical models of
• how people go about doing search tasks.
• The classic notion of the information seeking process model as
described by
• Sutcliffe and Ennis is formulated as a cycle consisting of four main
activities:
1. problem identification,
2. articulation of information need(s),
3. query formulation, and
4. results evaluation.
• The standard model of the information seeking process contains
• an underlying assumption that the user’s information need is static and the
information seeking process is one of successively refining a query until
• all and only those documents relevant to the original information need have been
retrieved.
• More recent models emphasize the dynamic nature of the search process,
• noting that users learn as they search, and their information needs adjust as they
see retrieval results and other document surrogates.
• This dynamic process is sometimes referred to as the berry picking model
of search.
Any Questions?
UNIT – I: INTRODUCTION
Topic 14: SEARCH INTERFACES TODAY
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I  PPT  IN PDF
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I  PPT  IN PDF
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I  PPT  IN PDF
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I  PPT  IN PDF
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I  PPT  IN PDF
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I  PPT  IN PDF
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I  PPT  IN PDF
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I  PPT  IN PDF
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I  PPT  IN PDF
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I  PPT  IN PDF
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I  PPT  IN PDF
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I  PPT  IN PDF
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I  PPT  IN PDF
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I  PPT  IN PDF
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I  PPT  IN PDF
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I  PPT  IN PDF
CS8080 INFORMATION RETRIEVAL TECHNIQUES - IRT - UNIT - I  PPT  IN PDF


  • 6. What is IR? • Information retrieval is the science of searching for information a) in a document, b) searching for documents themselves, and c) also searching for the metadata that describes data, and d) for databases of texts, images or sounds. AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMESTER – VIII PROFESSIONAL ELECTIVE – IV CS8080- INFORMATION RETRIEVAL TECHNIQUES
  • 7. What is IR? • Information Retrieval (IR) can be defined as • a software program that deals with the organization, storage, retrieval, and evaluation of information from document repositories, particularly textual information. • Information Retrieval is the activity of obtaining material that can usually be documented on an unstructured nature • i.e. usually text which satisfies an information need from within large collections which is stored on computers. AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMESTER – VIII PROFESSIONAL ELECTIVE – IV CS8080- INFORMATION RETRIEVAL TECHNIQUES
  • 8. What is IR? • IR helps users • find information that matches their information needs expressed as queries. • Historically, IR is about document retrieval, emphasizing document as the basic unit. – Finding documents relevant to user queries • Technically, IR studies the acquisition, organization, storage, retrieval, and distribution of information. • For example, Information Retrieval can be when a user enters a query into the system. AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMESTER – VIII PROFESSIONAL ELECTIVE – IV CS8080- INFORMATION RETRIEVAL TECHNIQUES
  • 9. What is IR? • Information retrieval (IR) is a broad area of Computer Science focused • primarily on providing users with easy access to information of their interest. • Information retrieval deals with • the representation, • storage, • organization of, and • access to information items such as documents, Web pages, online catalogs, structured and semi-structured records, and multimedia objects.
  • 10. What is IR? • Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing.
  • 11. What is IR? • Information retrieval (IR) quality: • Are the retrieved documents about the target subject up-to-date? • From a trusted source? • Satisfying the user’s needs? • How should we rank documents in terms of these factors?
  • 12. What are the measures of IR effectiveness? • Retrieval effectiveness is commonly described by two parameters: 1. Precision and 2. Recall. 1. Precision refers to what fraction of the retrieved documents are relevant to the user. 2. Recall refers to what fraction of the relevant documents in the collection are retrieved.
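Both measures reduce to set arithmetic over document ids: precision = |relevant ∩ retrieved| / |retrieved| and recall = |relevant ∩ retrieved| / |relevant|. A minimal sketch, assuming the relevance judgments for a query are already known; the document ids are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids of documents the system returned
    relevant:  ids of documents judged relevant in the collection
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant documents that were retrieved
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical judgments: the system returns 4 documents, 3 of which are
# relevant, while the collection holds 6 relevant documents in total.
p, r = precision_recall(["d1", "d2", "d3", "d4"],
                        ["d1", "d2", "d4", "d7", "d8", "d9"])
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.50
```

Retrieving more documents tends to raise recall while lowering precision, which is why the two are usually reported together.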
  • 13. What are the types of information retrieval? • Methods/techniques in which information retrieval techniques are employed include: • Adversarial information retrieval. • Automatic summarization, including multi-document summarization. • Compound term processing. • Cross-lingual retrieval. • Document classification. • Spam filtering. • Question answering.
  • 15. Why do we do IR? • Information retrieval can provide organizations with immediate value: while it is important to find ways to capture tacit knowledge, information retrieval provides a means to get at information that already exists in electronic form. • The representation and organization of the information items should be such as to • provide users with easy access to information of their interest. • Nowadays, research in IR includes • modeling, Web search, text classification, systems architecture, user interfaces, data visualization, filtering, and languages.
  • 16. What is the need for information retrieval? • Information retrieval can provide organizations with immediate value: while it is important to find ways to capture tacit knowledge, information retrieval provides a means to get at information that already exists in electronic form.
  • 17. What is the need for information retrieval on the Web? • Web: – A huge, widely distributed, highly heterogeneous, semi-structured, interconnected, evolving, hypertext/hypermedia information repository • Main issues – Abundance of information • 99% of all the information is not interesting to 99% of all users – The static Web is a very small part of the whole Web. • Dynamic websites – To access the Web, users need to exploit search engines (SE) • SE must be improved • To help people better formulate their information needs • More personalization is needed
  • 18. WEB IR: What is Web IR? • Web IR can be defined as • the application of theories and methodologies from IR to the World Wide Web. Web Information Retrieval models are ways of integrating many sources of evidence about documents, 1. the links, 2. the structure of the document, 3. the actual content of the document, 4. the quality of the document, etc., so that an effective Web search engine can be achieved.
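One simple way to integrate several sources of evidence, assumed here purely for illustration, is a weighted linear combination of per-source scores. The four feature names, their normalization to [0, 1], and the weights are all hypothetical; they are not the values or the formula of any particular search engine.

```python
from dataclasses import dataclass


@dataclass
class DocEvidence:
    """Per-document evidence scores, each assumed normalized to [0, 1]."""
    doc_id: str
    link_score: float       # e.g. derived from incoming links (hypothetical)
    structure_score: float  # e.g. query terms in the title or headings (hypothetical)
    content_score: float    # e.g. text similarity to the query (hypothetical)
    quality_score: float    # e.g. a source-trust estimate (hypothetical)


# Illustrative weights only; real engines tune such weights on relevance data.
WEIGHTS = {"link": 0.2, "structure": 0.2, "content": 0.5, "quality": 0.1}


def combined_score(d: DocEvidence) -> float:
    """Linear combination of the four evidence sources."""
    return (WEIGHTS["link"] * d.link_score
            + WEIGHTS["structure"] * d.structure_score
            + WEIGHTS["content"] * d.content_score
            + WEIGHTS["quality"] * d.quality_score)


def rank(docs: list[DocEvidence]) -> list[str]:
    """Return document ids ordered by decreasing combined score."""
    return [d.doc_id for d in sorted(docs, key=combined_score, reverse=True)]


docs = [
    DocEvidence("a", link_score=0.9, structure_score=0.1, content_score=0.3, quality_score=0.8),
    DocEvidence("b", link_score=0.2, structure_score=0.8, content_score=0.9, quality_score=0.5),
]
print(rank(docs))  # ['b', 'a']
```

Here document "a" is strongly linked but weakly matches the query, so the heavier content weight lets "b" rank first; changing the weights changes that trade-off.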
  • 19. Issues in IR • The main issues in Information Retrieval (IR) are: 1. Document and Query Indexing 2. Query Evaluation 3. System Evaluation.
  • 20. Issues in IR: Document and Query Indexing • The main goal of document and query indexing is to find the important meanings and create an internal representation. • The factors to be considered are accuracy in representing semantics, exhaustiveness, and ease of manipulation by a computer.
  • 21. Issues in IR: Query Evaluation • Query Evaluation – How, in the retrieval model, a document is represented with the selected keywords, and how document and query representations are compared to calculate a score. • Information Retrieval (IR) deals with issues like uncertainty and vagueness in information systems. • Uncertainty: the available representation does not typically reflect the true semantics of objects such as images, videos, etc. • Vagueness: the information that the user requires lacks clarity and is only vaguely expressed in a query, feedback or user action.
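Comparing document and query representations to calculate a score can be illustrated with the vector space model, one of the classic retrieval models: documents and the query become TF-IDF weighted term vectors, and the score is their cosine similarity. This is a sketch under stated assumptions: the weighting variant (raw term frequency times log(N/df)), the whitespace tokenization, and the three toy documents are choices made for illustration, not the only way to compute the score.

```python
import math
from collections import Counter


def tfidf_vectors(docs: dict[str, list[str]]) -> tuple[dict[str, dict[str, float]], dict[str, float]]:
    """Build TF-IDF vectors (raw tf * log(N/df)) for tokenized documents."""
    n = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = {
        doc_id: {t: tf * idf[t] for t, tf in Counter(terms).items()}
        for doc_id, terms in docs.items()
    }
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(query: list[str], docs: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Rank documents by cosine similarity between query and document vectors."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms absent from the collection get zero weight.
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    ranked = [(doc_id, cosine(q_vec, vec)) for doc_id, vec in vectors.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


docs = {
    "d1": "federal funding of amtrak railroad operations".split(),
    "d2": "amtrak train schedule".split(),
    "d3": "federal budget for highways".split(),
}
for doc_id, s in score("federal amtrak funding".split(), docs):
    print(doc_id, round(s, 3))  # d1 ranks first: it shares all three query terms
```

Rare terms such as "funding" receive a higher IDF weight than terms that occur in several documents, so a match on them contributes more to the score.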
  • 22. Issues in IR: System Evaluation • System Evaluation – System evaluation is concerned with determining the impact of the information given on user achievement. Here, we also check the efficiency of the particular system in terms of time and space.
  • 23. IR in practice • Not only librarians, professional searchers, etc. engage in the activity of information retrieval; • nowadays hundreds of millions of people engage in IR every day when they use web search engines. • Information retrieval is believed to be the dominant form of information access.
  • 24. IR in practice • Information Retrieval is a research-driven theoretical and experimental discipline • The focus is on different aspects of the information-seeking process, depending on the researcher’s background or interest: • Computer scientist – fast and accurate search engine • Librarian – organization and indexing of information • Cognitive scientist – the process in the searcher’s mind • Philosopher – Is this really relevant?
  • 25. IR in practice • Progress influenced by advances in Computational Linguistics, Information Visualization, Cognitive Psychology, HCI, … • Experimental vs. operational systems • Analogy to car manufacturing
  • 26. IR Process • An information retrieval process begins when a user enters a query into the system. • Queries are formal statements of information needs, for example search strings in web search engines. In information retrieval a query does not uniquely identify a single object in the collection. • Instead, several objects may match the query, perhaps with different degrees of relevancy.
  • 27. Fundamental concepts in IR
  • 28. Fundamental concepts in IR • What is information? • Meaning vs. form • Data vs. Information Retrieval • Relevance
  • 29. The stages of IR
  • 30. FORMULATION OF IR PROCESS
  • 31. IR system • The IR system assists users in finding the information they require, • but it does not explicitly return the answers to their questions. • It informs the user about the existence and location of documents that might contain the required information. • Information retrieval also supports users in browsing or filtering a document collection or processing a set of retrieved documents. • The system searches over billions of documents stored on millions of computers.
  • 32. IR system • An IR system has the ability to represent, store, organize, and access information items. • A set of keywords is required to search. Keywords are what people search for in search engines. • These keywords summarize the description of the information. • An email program provides a spam filter and manual or automatic means of classifying mail so that messages can be placed directly into particular folders.
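The keyword-based mail classification mentioned above can be sketched as a manual, rule-based filter: each folder is associated with a set of keywords, and a message goes to the first folder whose keywords appear in it. The folder names and keyword lists are hypothetical; an automatic filter would instead learn such associations from labelled mail.

```python
import re

# Hypothetical keyword rules: folder name -> keywords that send a message there.
# Rules are checked in insertion order, so earlier folders take priority.
RULES = {
    "Spam": {"lottery", "winner", "prize"},
    "Courses": {"cs8080", "syllabus", "assignment"},
}


def route(subject: str, body: str, default: str = "Inbox") -> str:
    """Return the first folder whose keywords occur in the message."""
    # Lowercase and keep only alphanumeric runs so punctuation does not block matches.
    words = set(re.findall(r"[a-z0-9]+", (subject + " " + body).lower()))
    for folder, keywords in RULES.items():
        if words & keywords:
            return folder
    return default


print(route("You are a WINNER!", "Claim your prize now"))  # Spam
print(route("CS8080 assignment 2", "Due next week"))       # Courses
print(route("Lunch?", "Are you free at noon"))             # Inbox
```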
  • 33. Any Questions?
  • 34. P1WU UNIT – I: INTRODUCTION TOPIC – 2: EARLY DEVELOPMENTS
  • 36. EARLY DEVELOPMENTS
  • 37. EARLY DEVELOPMENTS • For more than 5,000 years, man has organized information for later retrieval and searching. • In its most usual form, this has been done by compiling, storing, organizing, and indexing clay tablets, hieroglyphics, papyrus rolls, and books.
  • 38. EARLY DEVELOPMENTS • IR in the 17th century: Samuel Pepys, the famous English diarist, subject-indexed his treasured library of 1,000+ books with keywords.
  • 40. EARLY DEVELOPMENTS • Document collection: the text units over which we have built an IR system. • Usually documents • But could be • memos • book chapters • paragraphs • scenes of a movie • turns in a conversation... • Lots of them
  • 41. EARLY DEVELOPMENTS • For holding the various items, special-purpose buildings called libraries (from the Latin word liber, for book) or bibliothekes (from the Greek word biblion, for papyrus roll) are used. • The oldest known library was created in Ebla, in the “Fertile Crescent” (currently northern Syria), some time between 3,000 and 2,500 BC.
  • 42. EARLY DEVELOPMENTS • In the seventh century BC, Assyrian king Ashurbanipal created the library of Nineveh, on the Tigris River (in today's northern Iraq), which contained more than 30,000 clay tablets at the time of its destruction in 612 BC. • By 300 BC, Ptolemy Soter, a Macedonian general, created the Great Library in Alexandria, the Egyptian city at the mouth of the Nile named after the Macedonian king Alexander the Great (356–323 BC).
  • 43. EARLY DEVELOPMENTS • For seven centuries the Great Library, jointly with other major libraries in the city, • made Alexandria the intellectual capital of the Western world. • Since then, libraries have expanded and flourished. Nowadays, they are everywhere.
  • 44. EARLY DEVELOPMENTS • They constitute the collective memory of the human race, and their popularity is on the rise. • In 2008 alone, people in the US visited their libraries some 1.3 billion times and checked out more than 2 billion items, • an increase in both yearly figures of more than 10 percent.
  • 45. EARLY DEVELOPMENTS • Since the volume of information in libraries is always growing, • it is necessary to build specialized data structures for fast search: the indexes. In one form or another, • indexes are at the core of every modern information retrieval system. • They provide fast access to the data and speed up query processing.
  • 46. EARLY DEVELOPMENTS • For centuries indexes have been created manually as sets of categories. • Each category in the index is typically composed of labels that identify its associated topics and of pointers to the documents that discuss those topics. • While these indexes are usually designed by library and information science researchers, • the advent of modern computers has allowed the construction of large indexes automatically, • which has accelerated the development of the area of Information Retrieval (IR).
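A manual index maps each label to pointers to documents. Its automatic, computer-built counterpart in most IR systems is the inverted index, which maps each term to the set of documents that contain it. A minimal in-memory sketch, assuming simple lowercase alphanumeric tokenization, AND semantics for multi-term lookup, and hypothetical document ids:

```python
import re
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term (the 'label') to the ids of documents containing it (the 'pointers')."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return dict(index)


def lookup(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Documents containing every query term (AND semantics)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


docs = {
    "tablet-1": "Inventory of grain and cattle",
    "scroll-7": "Astronomy and the calendar",
    "book-12": "Cattle breeding and grain storage",
}
index = build_index(docs)
print(sorted(lookup(index, "grain", "cattle")))  # ['book-12', 'tablet-1']
```

A lookup touches only the posting sets of the query terms instead of scanning every document, which is where the speed-up in query processing comes from.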
  • 47. EARLY DEVELOPMENTS • Early developments in IR date back to research efforts conducted in the 1950s by pioneers such as • Hans Peter Luhn, • Eugene Garfield, • Philip Bagley, and • Calvin Mooers, • the last of whom allegedly coined the term information retrieval. • In 1955, Allen Kent and colleagues published a paper describing the precision and recall metrics, • which was followed by the publication in 1962 of the Cranfield studies by Cyril Cleverdon.
  • 48. EARLY DEVELOPMENTS • In 1963, Joseph Becker and Robert Hayes published the first book on information retrieval [164]. • Throughout the 1960s, Gerard Salton and Karen Sparck Jones, among others, shaped the field • by developing the fundamental concepts that led to the modern technologies of ranking in IR. • In 1968, the first IR book authored by Salton was published.
  • 49. EARLY DEVELOPMENTS • In 1971, N. Jardine and C. J. van Rijsbergen articulated the “cluster hypothesis”. • In 1978, the first ACM Conference on IR (ACM SIGIR) was held in Rochester, New York. • In 1979, C. J. van Rijsbergen published Information Retrieval, which focused on probabilistic models.
  • 50. EARLY DEVELOPMENTS • In 1983, Salton and McGill published Introduction to Modern Information Retrieval, a classic book on IR focused on vector models. • Since then, the IR community has grown to include thousands of professors, researchers, students, engineers, and practitioners throughout the world.
  • 51. EARLY DEVELOPMENTS • The main conference in the area, • the ACM International Conference on Information Retrieval (ACM SIGIR), • now attracts hundreds of attendees and receives hundreds of submitted papers on a yearly basis.
  • 52. MODERN IR DEVELOPMENTS
  • 55. Any Questions?
  • 56. P1WU UNIT – I: INTRODUCTION Topic 3: THE IR PROBLEM
  • 58. INTRODUCTION TO THE INFORMATION RETRIEVAL (IR) PROBLEM What is the IR problem? Users of modern IR systems, such as search engine users, have information needs of varying complexity. In the simplest case, they are looking for the link to the homepage of a company, government, or institution. In more sophisticated cases, they are looking for information required to execute tasks associated with their jobs or immediate needs.
  • 59. IR PROBLEM EXAMPLE An example of a more complex information need is as follows: • Find all documents that address the role of the Federal Government in financing the operation of the National Railroad Transportation Corporation (AMTRAK). • This full description of the user need does not necessarily provide • the best formulation for querying the IR system.
  • 60. IR PROBLEM EXAMPLE • Instead, the user might want to first translate this information need into • a query, or sequence of queries, to be posed to the system. • In its most common form, this translation yields • a set of keywords, or index terms, which summarize the user information need.
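The translation of the AMTRAK information need into a set of keywords can be approximated mechanically by lowercasing, tokenizing, and dropping stop words. This is a minimal sketch only: the stop-word list, which deliberately includes request phrasing such as "find" and "documents", is a hypothetical choice, and in practice a user would usually refine the resulting terms by hand.

```python
import re

# A tiny, hypothetical stop-word list; real systems use much larger ones.
STOP_WORDS = {
    "find", "all", "documents", "that", "address", "the", "of", "in", "a", "an", "and",
}


def to_index_terms(information_need: str) -> list[str]:
    """Reduce a natural-language need to distinct keywords, keeping first-seen order."""
    tokens = re.findall(r"[a-z0-9]+", information_need.lower())
    seen: set[str] = set()
    terms = []
    for tok in tokens:
        if tok not in STOP_WORDS and tok not in seen:
            seen.add(tok)
            terms.append(tok)
    return terms


need = ("Find all documents that address the role of the Federal Government in "
        "financing the operation of the National Railroad Transportation "
        "Corporation (AMTRAK).")
print(to_index_terms(need))
# ['role', 'federal', 'government', 'financing', 'operation', 'national',
#  'railroad', 'transportation', 'corporation', 'amtrak']
```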
  • 61. IR PROBLEM EXAMPLE • Given the user query, • the key goal of the IR system is to retrieve information that is useful or relevant to the user. • The emphasis is on the retrieval of information as opposed to the retrieval of data. • To be effective in its attempt to satisfy the user information need, • the IR system must somehow ‘interpret’ the contents of the information items, • that is, the documents in a collection, and rank them according to their degree of relevance to the user query.
  • 63. IR PROBLEM • This ‘interpretation’ of a document content involves extracting syntactic and semantic information from the document text and using this information to match the user information need.
  • 64. IR PROBLEM • The IR Problem: the primary goal of an IR system is to • retrieve all the documents that are relevant to a user query while retrieving as few non-relevant documents as possible. • The difficulty is knowing not only how to extract information from the documents • but also how to use it to decide relevance. • That is, the notion of relevance is of central importance in IR.
  • 65. IR PROBLEM • One main issue is that relevance is a personal assessment that depends on the task being solved and its context. • For example: • Relevance can change • with time (e.g., new information becomes available), • with location (e.g., the most relevant answer is the closest one), or • even with the device (e.g., the best answer is a short document that is easier to download and visualize). • In this sense, no IR system can provide perfect answers to all users all the time.
  • 66. Any Questions?
  • 67. P1WU UNIT – I: INTRODUCTION Topic 4: THE USERS TASK
  • 69. AN OVERVIEW OF THE USER TASK: What is the user task in an IRS? • The user of a retrieval system has to • translate their information need into a query in the language provided by the system. • With an information retrieval system, this normally implies • specifying a set of words which convey the semantics of the information need. • With a data retrieval system, • a query expression (for instance, a regular expression) is used to • convey the constraints that must be satisfied by objects in the answer set. • In both cases, we say that the user searches for useful information by executing a retrieval task.
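The contrast between the two kinds of query can be made concrete. In the sketch below, a regular expression plays the data-retrieval role (a record either satisfies every constraint or is excluded), while a set of words plays the IR role (records are ranked by a degree of match). The records are hypothetical, and counting shared words is a deliberately crude relevance estimate chosen only to show the difference in behaviour.

```python
import re

# Hypothetical records, used only for illustration.
records = [
    "Order 1042 shipped 2023-05-01",
    "Order 1043 pending",
    "Invoice for order 1042 paid",
]

# Data retrieval: the regular expression states exact constraints, and a
# record either satisfies all of them or is excluded.
pattern = re.compile(r"^Order \d+ shipped \d{4}-\d{2}-\d{2}$")
data_hits = [r for r in records if pattern.match(r)]
print(data_hits)  # ['Order 1042 shipped 2023-05-01']

# Information retrieval: a set of words conveys the need, and records are
# ranked by how many query words they share (a deliberately crude score).
query = {"order", "1042", "paid"}


def overlap(text: str) -> int:
    return len(query & set(text.lower().split()))


ranked = sorted(records, key=overlap, reverse=True)
print(ranked)
# ['Invoice for order 1042 paid', 'Order 1042 shipped 2023-05-01', 'Order 1043 pending']
```

Note that the IR-style query still returns every record, only in a different order, whereas the data-retrieval query returns exactly the records that satisfy the pattern.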
  • 70. IRS – USER TASK : What is USER TASK IN IRS? • Users of modern IR systems, • such as search engine users, have information needs of varying complexity. • The user of a retrieval system has to translate their information need into a query in the language provided by the system. • With an IR system, such as a search engine, • this usually implies specifying a set of words that convey the semantics of the information need. • We say that the user is searching or querying for information of their interest.
  • 71. IRS – USER TASK EXAMPLE • To illustrate, the user might be interested in documents about car racing in general and might decide to glance at related documents about Formula 1 racing, Formula Indy, and the ‘24 Hours of Le Mans’. • We say that • the user is browsing or navigating the documents in the collection, not searching. • It is still a process of retrieving information, • but one whose main objectives are less clearly defined in the beginning.
  • 72. IRS – USER TASK • While searching for information of interest is the main retrieval task on the Web, • search can also be used for satisfying other user needs distinct from information access, • such as the buying of goods and the placing of reservations. • Consider now a user • who has an interest that is either poorly defined or inherently broad, • such that the query to specify is unclear.
  • 73. IRS – USER TASK • The task in this case is more related to • exploratory search and resembles a process of quasi-sequential search for information of interest. • Here we make a clear distinction between the different tasks the user of the retrieval system might be engaged in.
  • 74. IRS – USER TASK • The task might then be of two distinct types: searching and browsing, as illustrated in the figure.
  • 75. IRS – USER TASK • When the process of retrieving information is one whose main objectives are not clearly defined at the beginning and whose purpose might change during the interaction with the system, • the user task may consist of browsing only.
  • 76. IRS – USER TASK • User Choice of Information Retrieval: • Push • Pull
  • 77. IRS – USER TASK • Both retrieval and browsing are, in the language of the World Wide Web, "pulling" actions. • That is, the user requests the information interactively. • An alternative is to perform retrieval automatically and permanently, using software agents that push the information towards the user. • For instance, information useful to a user could be extracted periodically from a news service. • In this case, we say that the IR system is executing a particular retrieval task, which consists of filtering relevant information for later inspection by the user.
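The push (filtering) task described above can be sketched as a standing profile applied to each incoming item. This is a minimal sketch under assumptions: the user's interest is stored as a set of keywords, an item is kept if it shares at least `min_overlap` words with that profile, and `PROFILE`, `filter_feed`, and the sample headlines are hypothetical.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def filter_feed(feed: list[str], profile: set[str], min_overlap: int = 1) -> list[str]:
    """Push model: keep only the items that share at least `min_overlap`
    terms with the user's standing profile, for later inspection."""
    return [item for item in feed if len(tokenize(item) & profile) >= min_overlap]

# Hypothetical standing profile and incoming news items.
PROFILE = {"cricket", "chennai"}
feed = [
    "Chennai hosts cricket final",
    "Stock markets close higher",
    "Rain expected in Chennai tomorrow",
]
print(filter_feed(feed, PROFILE))  # ['Chennai hosts cricket final', 'Rain expected in Chennai tomorrow']
```

In a real push system, the same filter would run periodically as new items arrive, with no query typed by the user.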
  • 79. P1WU UNIT – I: INTRODUCTION Topic 5: INFORMATION VERSUS DATA RETRIEVAL
  • 81. INTRODUCTION TO INFORMATION What is Information?
  • 82. INTRODUCTION TO RETRIEVAL What is Retrieval? • "Unstructured" does not mean that there is no structure in the data: document structure (headings, paragraphs, lists...), explicit markup formatting (e.g., in HTML, XML...), and linguistic structure (latent, hidden). • By contrast, structured (database-style) data is searched with exact queries, e.g., SELECT * FROM business_catalogue WHERE category = 'florist' AND city_zip = 'cb1'
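The SQL query above is data retrieval: it returns exactly the rows that satisfy the stated conditions. The sketch below runs a variant of it against an in-memory table using Python's built-in `sqlite3` module; the table contents and the `name` column are assumptions made for illustration, and the query selects only `name` instead of `*` to keep the output short.

```python
import sqlite3

# In-memory database with a hypothetical business catalogue table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business_catalogue (name TEXT, category TEXT, city_zip TEXT)")
conn.executemany(
    "INSERT INTO business_catalogue VALUES (?, ?, ?)",
    [
        ("Rose Corner", "florist", "cb1"),
        ("Petal Point", "florist", "cb2"),
        ("Daily Bread", "bakery", "cb1"),
    ],
)

# Data retrieval: every row that satisfies the conditions exactly, nothing else.
rows = conn.execute(
    "SELECT name FROM business_catalogue WHERE category = ? AND city_zip = ?",
    ("florist", "cb1"),
).fetchall()
print(rows)  # [('Rose Corner',)]
```

A shop stored with the category "flower shop" would not be returned: the query has no notion of meaning, only of exact matches on well-defined fields.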
  • 83. INTRODUCTION TO RETRIEVAL What is Information retrieval (IR)? Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
  • 84. INTRODUCTION TO RETRIEVAL • An information need is the topic about which the user desires to know more. • A query is what the user conveys to the computer in an attempt to communicate the information need. • A document is relevant if the user perceives that it contains information of value with respect to their personal information need. • Kinds of search: known-item search; precise information-seeking search; open-ended search ("topical search").
  • 85. INFORMATION VERSUS DATA RETRIEVAL • Data retrieval, in the context of an IR system, consists mainly of determining which documents of a collection contain the keywords in the user query, which, most frequently, is not enough to satisfy the user's information need. • In fact, the user of an IR system is concerned more with retrieving information about a subject than with retrieving data that satisfies a given query. • For instance, a user of an IR system is willing to accept documents that contain synonyms of the query terms in the result set, even when those documents do not contain any of the query terms.
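The difference between literal keyword matching and accepting synonyms can be shown with a toy collection. This is a sketch under assumptions: `SYNONYMS` is a tiny hypothetical table standing in for a thesaurus, and `keyword_match`, `synonym_match`, and the documents are illustrative names, not part of any specific system.

```python
import re

def terms(text: str) -> set[str]:
    """Lowercased word tokens of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

# Hypothetical synonym table standing in for a thesaurus.
SYNONYMS = {"car": {"automobile", "vehicle"}, "cheap": {"inexpensive"}}

DOCS = {
    1: "Cheap car rentals in Madurai",
    2: "Inexpensive automobile hire",
    3: "Car engine repair guide",
}

def keyword_match(query: str) -> set[int]:
    """Data-retrieval style: documents containing every query keyword literally."""
    q = terms(query)
    return {d for d, text in DOCS.items() if q <= terms(text)}

def synonym_match(query: str) -> set[int]:
    """IR style: each query term may be satisfied by the term or by a synonym."""
    q = terms(query)
    result = set()
    for d, text in DOCS.items():
        t = terms(text)
        if all(({w} | SYNONYMS.get(w, set())) & t for w in q):
            result.add(d)
    return result

print(keyword_match("cheap car"))  # {1}
print(synonym_match("cheap car"))  # {1, 2}
```

Document 2 contains none of the query terms, yet a user looking for cheap car hire would likely judge it relevant.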
  • 86. INFORMATION RETRIEVAL • "Ad hoc" retrieval (e.g., web retrieval) • Support for browsing and filtering document collections: clustering; classification using fixed labels (common information needs, age groups, topics) • Further processing of a set of retrieved documents, e.g., by using natural language processing: information extraction, summarization, question answering.
  • 87. RETRIEVAL TYPES 1) Web search • Search space: billions of documents on millions of computers • Issues: spidering; efficient indexing and search; malicious manipulation to boost search engine rankings 2) Enterprise and institutional search • e.g., a company's documentation, patents, research articles; often domain-specific • Centralised storage; dedicated machines for search • Most prevalent IR evaluation scenario: US intelligence analysts' searches 3) Personal information retrieval (email, personal documents) • e.g., Mac OS X Spotlight; Windows Instant Search • Issues: different file types; maintenance-free, lightweight enough to run in the background
  • 88. INFORMATION VERSUS DATA RETRIEVAL • In an IR system, the retrieved objects might be inaccurate, and small errors are likely to go unnoticed. • In a data retrieval system, on the contrary, a single erroneous object among a thousand retrieved objects means total failure.
  • 89. INFORMATION VERSUS DATA RETRIEVAL • A data retrieval system, such as a relational database, deals with data that has a well-defined structure and semantics, • while an IR system deals with natural language text, which is not well structured. • Data retrieval, while providing a solution to the user of a database system, does not solve the problem of retrieving information about a subject or topic.
  • 90. DATA VERSUS INFORMATION
    S.No. | Data | Information
    1 | Unorganized raw facts that need processing; without it, they seem random and useless to humans. | Processed, organized data presented in a given context; useful to humans.
    2 | An individual unit of raw material that does not carry any specific meaning. | A group of data that collectively carries a logical meaning.
    3 | Does not depend on information. | Depends on data.
    4 | Measured in bits and bytes. | Measured in meaningful units such as time, quantity, etc.
    5 | Never suited to the specific needs of a designer. | Specific to expectations and requirements, because irrelevant facts and figures are removed during the transformation process.
    6 | Example: a student's score. | Example: the average score of a class, derived from the given data.
  • 92. P1WU UNIT – I: INTRODUCTION Topic 6: THE IR SYSTEM
  • 94. The role of an IR system A modern view: • Support the user in – exploring a problem domain, understanding its terminology, concepts and structure – clarifying, refining and formulating an information need – finding documents that match the information need description • As many relevant documents as possible • As few non-relevant documents as possible
  • 95. How does it do this? • User interfaces and visualization tools for – exploring a collection of documents – exploring search results • Query expansion based on – thesauri – lexical/statistical analysis of text/context and concept formation – relevance feedback • Indexing and matching model
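The slide lists relevance feedback without naming a method. One classic approach is the Rocchio algorithm, swapped in here as an illustration: the query vector moves towards the centroid of documents the user marked relevant and away from those marked non-relevant. This is a sketch under assumptions: vectors are sparse dicts of term weights, and `alpha`, `beta`, `gamma` use commonly quoted illustrative values rather than anything prescribed by the source.

```python
from collections import defaultdict

Vector = dict[str, float]

def centroid(vectors: list[Vector]) -> Vector:
    """Average of sparse term-weight vectors (empty dict if no vectors)."""
    total: defaultdict[str, float] = defaultdict(float)
    for vec in vectors:
        for term, w in vec.items():
            total[term] += w
    return {t: w / len(vectors) for t, w in total.items()} if vectors else {}

def rocchio(query: Vector, relevant: list[Vector], nonrelevant: list[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Move the query towards relevant documents and away from non-relevant
    ones; terms whose weight drops to zero or below are removed."""
    rel, nonrel = centroid(relevant), centroid(nonrelevant)
    new_q = {}
    for t in set(query) | set(rel) | set(nonrel):
        w = alpha * query.get(t, 0.0) + beta * rel.get(t, 0.0) - gamma * nonrel.get(t, 0.0)
        if w > 0:
            new_q[t] = w
    return new_q

# Hypothetical feedback: the user wanted Jaguar cars, not the animal.
q = {"jaguar": 1.0}
relevant_docs = [{"jaguar": 1.0, "car": 1.0}, {"jaguar": 1.0, "engine": 1.0}]
nonrelevant_docs = [{"jaguar": 1.0, "cat": 1.0}]
new_query = rocchio(q, relevant_docs, nonrelevant_docs)
print(sorted(new_query))  # ['car', 'engine', 'jaguar']
```

The expanded query now carries "car" and "engine", terms the user never typed, which is how relevance feedback doubles as query expansion.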
  • 96. How well does it do this? • Evaluation – Of the components • Indexing / matching algorithms – Of the exploratory process overall • Usability issues • Usefulness to task • User satisfaction
  • 97. What do we want from an IRS? • Systemic approach – Goal (for a known information need): • Return as many relevant documents as possible and as few non-relevant documents as possible • Cognitive approach – Goal (in an interactive information-seeking environment, with a given IRS): • Support the user's exploration of the problem domain and the completion of the task.
  • 98. ONLINE INFORMATION RETRIEVAL SYSTEM (OIRS) • Information retrieval: "the techniques of storing and recovering and often disseminating recorded data especially through the use of a computerized system" (Merriam-Webster Dictionary)
  • 99. QUALITIES OF IRS • The effectiveness of an IR system (i.e., the quality of its search results) is determined by two key statistics about the system's returned results for a query: 1. Precision: what fraction of the returned results are relevant to the information need? 2. Recall: what fraction of the relevant documents in the collection were returned by the system? • Questions to be addressed: • What is the best balance between the two? • It is easy to get perfect recall: just retrieve everything (at the cost of precision). • It is easy to get good precision: retrieve only the most relevant documents (at the cost of recall).
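Both statistics can be computed directly from the set of returned documents and the set of documents judged relevant. The sketch below assumes hypothetical document IDs and relevance judgments, and reproduces the two extremes mentioned on the slide.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = |retrieved ∩ relevant| / |retrieved|;
    recall = |retrieved ∩ relevant| / |relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {"d1", "d3", "d5", "d7"}             # judged relevant (hypothetical)
system_a = {"d1", "d3", "d4", "d9"}             # an ordinary system's answer
everything = {f"d{i}" for i in range(1, 11)}    # "just retrieve everything"
only_best = {"d1"}                              # only the single most relevant

print(precision_recall(system_a, relevant))     # (0.5, 0.5)
print(precision_recall(everything, relevant))   # (0.4, 1.0)
print(precision_recall(only_best, relevant))    # (1.0, 0.25)
```

Retrieving everything gives perfect recall but poor precision, and retrieving a single sure hit gives perfect precision but poor recall, which is why the two are always reported together.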
  • 100. THE SOFTWARE ARCHITECTURE OF THE IR SYSTEM • To describe the IR system, we use a simple and generic software architecture, as shown in the figure.
  • 101. ONLINE INFORMATION RETRIEVAL SYSTEM (OIRS) • The first step in setting up an IR system is to assemble the document collection, which can be private or crawled from the Web. In the second case, a crawler module is responsible for collecting the documents. • The document collection is stored in disk storage usually referred to as the central repository.
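A crawler module repeatedly takes a URL from a frontier, stores the page in the repository, extracts its links, and adds them to the frontier. The sketch below does a breadth-first crawl using the standard library's `html.parser`; to stay runnable offline, it assumes a hypothetical in-memory `WEB` dict in place of real HTTP fetches, and `crawl` and `LinkExtractor` are illustrative names.

```python
from collections import deque
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags."""
    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical in-memory "Web": URL -> HTML. A real crawler would fetch over HTTP.
WEB = {
    "a.html": '<p>Home</p><a href="b.html">B</a><a href="c.html">C</a>',
    "b.html": '<p>Page B</p><a href="a.html">back</a>',
    "c.html": '<p>Page C</p><a href="d.html">D</a>',
    "d.html": "<p>Page D</p>",
}

def crawl(seed: str, max_pages: int = 100) -> dict[str, str]:
    """Breadth-first crawl from `seed`; returns the central repository (URL -> HTML)."""
    repository: dict[str, str] = {}
    frontier = deque([seed])
    while frontier and len(repository) < max_pages:
        url = frontier.popleft()
        if url in repository or url not in WEB:
            continue                      # already stored, or unreachable
        html = WEB[url]                   # stand-in for an HTTP fetch
        repository[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        frontier.extend(parser.links)     # newly discovered URLs
    return repository

print(list(crawl("a.html")))  # ['a.html', 'b.html', 'c.html', 'd.html']
```

A production crawler would also respect robots.txt, limit its request rate per site, and resolve relative URLs; those concerns are omitted from this sketch.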
  • 102. ONLINE INFORMATION RETRIEVAL SYSTEM (OIRS) • The documents in the central repository need to be indexed for fast retrieval and ranking. • The most widely used index structure is an inverted index, composed of all the distinct words of the collection and, for each word, a list of the documents that contain it.
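The inverted index just described maps each distinct word to the list of documents containing it. The minimal sketch below assumes simple lowercase word tokenization and a hypothetical three-document collection; real indexes also store positions or frequencies and compress the lists.

```python
import re
from collections import defaultdict

def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each distinct word to the sorted list of IDs of documents containing it."""
    index: defaultdict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in sorted(index.items())}

# Hypothetical collection.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
}
index = build_inverted_index(docs)
print(index["home"])  # [1, 2, 3]
print(index["july"])  # [2, 3]
print(index["new"])   # [1]
```

Answering a one-word query then costs a single dictionary lookup instead of a scan over every document.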
  • 103. ONLINE INFORMATION RETRIEVAL SYSTEM (OIRS) • Once the document collection is indexed, the retrieval process can be initiated. • It consists of retrieving documents that satisfy either a user query or a click on a hyperlink. • In the first case, we say that the user is searching for information of interest; • in the second case, we say that the user is browsing for information of interest.
  • 104. ONLINE INFORMATION RETRIEVAL SYSTEM (OIRS) • In what follows, we use "retrieval" as it applies to the searching process; retrieval as a process also requires browsing, which is compared with searching separately. • To search, the user first specifies a query that reflects their information need. • Next, the user query is parsed and expanded with, for instance, spelling variants of a query word.
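Parsing a query and expanding it with spelling variants can be sketched as turning each query word into a small OR-group. This assumes a hypothetical `VARIANTS` table of British/American spellings; a real system might derive variants from a dictionary or from query logs.

```python
import re

# Hypothetical table of spelling variants (British vs American spellings).
VARIANTS = {
    "colour": {"color"},
    "color": {"colour"},
    "organisation": {"organization"},
    "organization": {"organisation"},
}

def parse_query(query: str) -> list[str]:
    """Split the raw query into lowercase word tokens."""
    return re.findall(r"[a-z]+", query.lower())

def expand_query(tokens: list[str]) -> list[set[str]]:
    """Replace each token by the set {token} plus its known spelling variants.
    Each set is an OR-group: a document may match any member of it."""
    return [{tok} | VARIANTS.get(tok, set()) for tok in tokens]

system_query = expand_query(parse_query("Colour printers"))
print(system_query)
```

The printed order of words inside each set may vary between runs, because Python does not guarantee an ordering for sets of strings.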
  • 105. ONLINE INFORMATION RETRIEVAL SYSTEM (OIRS) • The expanded query, which we refer to as the system query, is then processed against the index to retrieve a subset of all documents. • Next, the retrieved documents are ranked and the top documents are returned to the user. • The purpose of ranking is to identify the documents that are most likely to be considered relevant by the user; it constitutes the most critical part of the IR system.
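The retrieve-then-rank flow can be sketched end to end: look up each query term in an inverted index, collect every document that contains any of them, score those documents, and return the top k. The slide does not name a ranking function, so the sketch uses a basic tf-idf score (term frequency times log(N / document frequency)) as one common choice; the collection, `search`, and `k` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())

def build_index(docs: dict[int, str]) -> dict[str, dict[int, int]]:
    """Inverted index with term frequencies: term -> {doc_id: tf}."""
    index: dict[str, dict[int, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index

def search(query: str, docs: dict[int, str], k: int = 2) -> list[int]:
    """Retrieve documents containing any query term, rank by tf-idf, return top k."""
    index = build_index(docs)  # built per call only to keep the sketch short
    n = len(docs)
    scores: Counter[int] = Counter()
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Sort by descending score, breaking ties by document ID for determinism.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:k]

docs = {
    1: "information retrieval ranks documents",
    2: "database systems retrieve records",
    3: "ranking documents by relevance in information retrieval",
    4: "cooking recipes",
}
print(search("information retrieval ranking", docs))  # [3, 1]
```

Document 2 is never retrieved because "retrieve" does not match "retrieval" without stemming, which is one reason text operations are applied before indexing.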
  • 106. ONLINE INFORMATION RETRIEVAL SYSTEM (OIRS) • Given the inherent subjectivity in deciding relevance, evaluating the quality of the answer set is a key step for improving the IR system. • A systematic evaluation process allows fine-tuning the ranking algorithm and improving the quality of the results. • The most common evaluation procedure consists of comparing the set of results produced by the IR system with results suggested by human specialists.
  • 108. P1WU UNIT – I: INTRODUCTION Topic 7: THE SOFTWARE ARCHITECTURE OF THE IR SYSTEM
  • 110. SYSTEM ARCHITECTURE OF IRS A high-level view of the software architecture of an IR system will provide: 1) Components 2) Tools 3) Environment 4) Data source(s), as well as the additional elements needed for a thorough understanding of the data flow.
  • 111. THE SOFTWARE ARCHITECTURE OF THE IR SYSTEM • To describe the IR system, we use a simple and generic software architecture, as shown in the figure.
  • 119. ONLINE INFORMATION RETRIEVAL SYSTEM (OIRS) • To improve the ranking, we might collect feedback from the users and use this information to change the results. • On the Web, the most abundant form of user feedback is clicks on the documents in the result set.
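One simple way to use click feedback is to blend each document's original score with its observed click-through rate (clicks divided by the number of times it was shown). This linear blend and its `weight` parameter are hypothetical choices made for illustration; production engines also correct for position bias, since top-ranked results get clicked more regardless of relevance.

```python
def rerank_with_clicks(results: list[tuple[str, float]],
                       clicks: dict[str, int], impressions: dict[str, int],
                       weight: float = 0.5) -> list[str]:
    """Re-rank (doc, score) pairs by blending the original score with the
    observed click-through rate (clicks / impressions). `weight` is a
    hypothetical tuning knob, not a standard value."""
    def blended(item: tuple[str, float]) -> float:
        doc, score = item
        shown = impressions.get(doc, 0)
        ctr = clicks.get(doc, 0) / shown if shown else 0.0
        return (1 - weight) * score + weight * ctr
    return [doc for doc, _ in sorted(results, key=blended, reverse=True)]

# Hypothetical scores and click logs.
results = [("d1", 0.9), ("d2", 0.8), ("d3", 0.3)]
clicks = {"d1": 5, "d2": 60, "d3": 1}
impressions = {"d1": 100, "d2": 100, "d3": 100}
print(rerank_with_clicks(results, clicks, impressions))  # ['d2', 'd1', 'd3']
```

Users consistently preferred d2 over d1, so d2 moves to the top even though the original ranking placed it second.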
  • 120. ONLINE INFORMATION RETRIEVAL SYSTEM (OIRS) • Another important source of information for Web ranking is the hyperlinks among pages, which allow identifying sites of high authority. • Many other concepts and technologies bear on the design of a full-fledged IR system, such as a modern search engine.
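The slide does not name a link-analysis method. PageRank is one well-known way to turn hyperlinks into an authority score (HITS is another); the sketch below computes it by power iteration. The link graph is hypothetical, and the damping factor 0.85 and 50 iterations are common illustrative settings, not values from the source.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a small link graph (page -> outgoing links).
    Pages with no out-links spread their score evenly over all pages."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}  # random-jump share
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank

# Hypothetical link graph: most pages point to "hub".
graph = {"a": ["hub"], "b": ["hub"], "c": ["hub", "a"], "hub": ["a"]}
scores = pagerank(graph)
print(max(scores, key=scores.get))  # hub
```

A page that many other pages link to, like "hub", accumulates score from all of them, which is the intuition behind treating it as a site of high authority.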
  • 122. P1WU UNIT – I: INTRODUCTION Topic 8: THE RETRIEVAL AND RANKING PROCESSES
  • 124. Ranked Retrieval • Ranked retrieval: the documents are ranked based on their score • Advantages – The query is easy to specify – The output is ranked based on the estimated relevance of the documents to the query – A wide variety of theoretical models exist • Disadvantages – The query is less precise (although weighting can be used)
  • 125. THE RETRIEVAL AND RANKING PROCESSES • To describe the retrieval and ranking processes, we further elaborate on our description of the modules shown in Figure 1.2, as illustrated in Figure 1.3. • Given the documents of the collection, 1. we first apply text operations to them, such as eliminating stop words and stemming, and 2. select a subset of all terms for use as indexing terms.
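The two steps above (text operations, then selection of index terms) can be sketched as a small pipeline. This is illustrative only: `STOP_WORDS` is a tiny hypothetical list, the suffix stripper is deliberately crude (real systems use an algorithm such as the Porter stemmer), and the "keep terms of length 3 or more" rule is a hypothetical stand-in for index-term selection.

```python
import re

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for"}

# Crude suffix stripping for illustration only (not the Porter stemmer).
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text: str) -> list[str]:
    """Tokenize, drop stop words, stem, and keep terms of length >= 3."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [stem(t) for t in tokens if t not in STOP_WORDS]
    return [s for s in stems if len(s) >= 3]

print(index_terms("The indexing of retrieved documents is ranked"))
# ['index', 'retriev', 'document', 'rank']
```

Stems such as "retriev" are not dictionary words; that is acceptable because the same operations are applied to queries, so "retrieved" and "retrieves" both map to the same index term.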
  • 126. THE RETRIEVAL AND RANKING PROCESSES • To describe the IR system's retrieval and ranking processes, we illustrate them through the figure given below:
  • 128. THE RETRIEVAL AND RANKING PROCESSES • The indexing terms are then used to compose document representations, • which might be smaller than the documents themselves (depending on the subset of index terms selected). • Given the document representations, it is necessary to build an index of the text.
  • 129. THE RETRIEVAL AND RANKING PROCESSES • Different index structures might be used, but the most popular one is an inverted index. • The steps required to generate the index compose the indexing process and must be executed offline, before the system is ready to process any queries. • The resources (time and storage space) spent on the indexing process are amortized by querying the retrieval system many times. • Once the document collection is indexed, the retrieval process can be initiated.
  • 130. THE RETRIEVAL AND RANKING PROCESSES • The user first specifies a query that reflects their information need. • This query is then parsed and modified by operations that resemble those applied to the documents. • Typical operations at this point consist of spelling corrections and the elimination of terms such as stop words, whenever appropriate.
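The query-side operations above (stop-word removal and spelling correction) can be sketched with the standard library's `difflib.get_close_matches`, which returns the vocabulary entries most similar to a word. The `VOCABULARY` and `STOP_WORDS` sets and the 0.8 similarity cutoff are assumptions for illustration; in practice the vocabulary would be the terms of the index.

```python
import difflib

# Hypothetical vocabulary; in practice this would be the terms of the index.
VOCABULARY = ["information", "retrieval", "ranking", "index", "query"]
STOP_WORDS = {"the", "of", "for", "a"}

def normalize_query(query: str) -> list[str]:
    """Lowercase, drop stop words, and replace unknown words with the closest
    vocabulary term (if any is similar enough) using difflib's ratio matching."""
    result = []
    for word in query.lower().split():
        if word in STOP_WORDS:
            continue
        if word not in VOCABULARY:
            matches = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.8)
            if matches:
                word = matches[0]
        result.append(word)
    return result

print(normalize_query("the retreival of informaton"))  # ['retrieval', 'information']
```

Words with no sufficiently close vocabulary entry are left unchanged rather than forced onto an unrelated term.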
  • 131. THE RETRIEVAL AND RANKING PROCESSES • Next, the transformed query is expanded and modified. • For instance, the query might be modified using query suggestions made by the system and confirmed by the user. • The expanded and modified query is then processed to obtain the set of retrieved documents, • which is composed of documents that contain the query terms.
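If "documents that contain the query terms" is read as documents containing all of them, the retrieved set is the intersection of the terms' postings lists. Because postings are kept sorted by document ID, two lists can be intersected in a single linear merge. The index contents below are hypothetical, and many systems instead retrieve documents containing any term and let ranking sort them out.

```python
def intersect(p1: list[int], p2: list[int]) -> list[int]:
    """Merge two sorted postings lists, keeping doc IDs present in both."""
    i = j = 0
    answer = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer

def retrieve_all_terms(index: dict[str, list[int]], terms: list[str]) -> list[int]:
    """Documents containing every query term; shortest lists are merged first."""
    postings = sorted((index.get(t, []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for plist in postings[1:]:
        result = intersect(result, plist)
    return result

# Hypothetical sorted postings lists.
index = {
    "brutus": [1, 2, 4, 11, 31, 45, 173, 174],
    "caesar": [1, 2, 4, 5, 6, 16, 57, 132],
    "calpurnia": [2, 31, 54, 101],
}
print(retrieve_all_terms(index, ["brutus", "caesar"]))     # [1, 2, 4]
print(retrieve_all_terms(index, ["brutus", "calpurnia"]))  # [2, 31]
```

Starting with the shortest list keeps intermediate results small, which is where the speed of index-based query processing comes from.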
  • 132. THE RETRIEVAL AND RANKING PROCESSES • Fast query processing is made possible by the index structure previously built. • The steps required to produce the set of retrieved documents constitute the retrieval process. • Next, the retrieved documents are ranked according to their likelihood of relevance to the user. • This is the most critical step, because the quality of the results, as perceived by the users, fundamentally depends on the ranking.
  • 133. THE RETRIEVAL AND RANKING PROCESSES • The top-ranked documents are then formatted for presentation to the user. • The formatting consists of retrieving the titles of the documents and generating snippets for them, i.e., text excerpts that contain the query terms, which are then displayed to the user.
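Snippet generation can be sketched as taking a window of words around the first query-term occurrence and marking the matches. The window size, the `[brackets]` marking, and `make_snippet` itself are hypothetical choices; real engines typically score several candidate passages and pick the best one.

```python
import re

def make_snippet(text: str, query_terms: set[str], window: int = 4) -> str:
    """Return a short excerpt centred on the first query-term occurrence,
    with matches marked in [brackets]; falls back to the opening words."""
    words = text.split()
    norm = [re.sub(r"\W", "", w).lower() for w in words]
    hit = next((i for i, w in enumerate(norm) if w in query_terms), None)
    start = 0 if hit is None else max(0, hit - window)
    end = start + 2 * window + 1
    marked = [f"[{w}]" if n in query_terms else w
              for w, n in zip(words[start:end], norm[start:end])]
    prefix = "... " if start > 0 else ""
    suffix = " ..." if end < len(words) else ""
    return prefix + " ".join(marked) + suffix

doc = ("The central repository stores every crawled page so that the indexer "
       "can build an inverted index over it")
print(make_snippet(doc, {"inverted", "index"}, window=3))
# ... can build an [inverted] [index] over it
```

Note that "indexer" is not marked: matching is on whole normalized words, so the snippet highlights only exact query terms.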
  • 134. ONLINE INFORMATION RETRIEVAL SYSTEM (OIRS)
  • 137. P1WU UNIT – I: INTRODUCTION Topic 9: THE WEB
  • 139. WEB – What is the Web? • The Web, or World Wide Web (W3), is basically a system of Internet servers that support specially formatted documents. • The documents are formatted in a markup language called HTML (HyperText Markup Language) that supports links to other documents, as well as graphics, audio, and video files.
  • 140. WEB – What Does Web Mean? • The Web is the common name for the World Wide Web, a subset of the Internet consisting of the pages that can be accessed by a Web browser. • Many people assume that the Web is the same as the Internet and use the two terms interchangeably. • However, the term Internet actually refers to the global network of interconnected computers that makes the information sharing over the Web possible. • So, although the Web makes up a large portion of the Internet, the two are not one and the same.
  • 141. WEB – KEY TOPICS • WEB SITES OR WEB PAGES • WEB SERVER • WEB CLIENT • WEB APPLICATIONS OR WEB APPS • WEB SOFTWARE – WEB 1.0, WEB 2.0, WEB 3.0 • WEB SERVICES
  • 142. WEB - A Brief History • Vannevar Bush's "As We May Think" influenced people like Douglas Engelbart, who, at the Fall Joint Computer Conference in San Francisco in December 1968, ran a demonstration in which he introduced the computer mouse, video conferencing, teleconferencing, and hypertext. • It was so impressive that it became known as "the mother of all demos" [1690]. • Of the innovations displayed, the one that interests us most here is hypertext. • The term was coined by Ted Nelson in his Project Xanadu.
  • 143. WEB - A Brief History
• Hypertext allows the reader to jump from one electronic document to another, which was an important property for the problem Tim Berners-Lee faced in 1989.
• At the time, Berners-Lee worked in Geneva at CERN (Conseil Européen pour la Recherche Nucléaire).
  • 144. WEB - A Brief History
• Researchers who wanted to share documentation with others had to reformat their documents to make them compatible with an internal publishing system.
• This was annoying and generated many questions, many of which ended up being directed to Berners-Lee.
• He understood that a better solution was required.
  • 145. WEB - A Brief History
• It just so happened that CERN was the largest Internet node in Europe.
• Berners-Lee reasoned that it would be nice if the solution to the problem of sharing documents were decentralized, so that researchers could share their contributions freely.
• He saw that a networked hypertext, through the Internet, would be a good solution, and started working on its implementation.
  • 146. WEB - A Brief History
• In 1990, he wrote the HTTP protocol, defined the HTML language, and wrote the first browser, which he called “World Wide Web”, and the first Web server.
• In 1991, he made his browser and server software available on the Internet. The Web was born.
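The request/response exchange that HTTP defines can be illustrated with a small sketch. It assumes the modern HTTP/1.1 message format (the 1990 version of the protocol was much simpler) and uses hypothetical helper names; it only builds and parses message text, so no network access is needed.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    # HTTP/1.1 requires a Host header; lines end with CRLF,
    # and an empty line marks the end of the headers.
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> tuple:
    # The first response line is: HTTP-version SP status-code SP reason-phrase
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


request = build_get_request("example.org", "/index.html")
print(parse_status_line(b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"))
```

A browser would send `request` over a TCP connection and read the server's reply; the parser above only looks at the status line and ignores headers and body.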
  • 147. Web Information Retrieval: What is Web Information Retrieval?
• Web information retrieval is the process of searching within the huge World Wide Web document collection for information that satisfies a particular information need, expressed as a query.
  • 148. Web Information Retrieval: What is Web Information Retrieval?
• The Web can be considered a large-scale document collection to which classical text retrieval techniques can be applied.
• However, its unique features and structure offer new sources of evidence that can be used to enhance the effectiveness of Information Retrieval (IR) systems.
• Generally, Web IR examines:
  – the combination of evidence from both the textual content of documents and the structure of the Web,
  – the search behavior of users, and
  – issues related to the evaluation of retrieval effectiveness in the Web setting.
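As a rough illustration of combining textual and structural evidence, the sketch below scores each page by a weighted sum of a crude text-match score and a normalized in-link count. The weighting scheme, the parameter `alpha`, the sample pages and all function names are assumptions made for this example; real engines use far richer signals (for instance PageRank-style link analysis) and learned combinations.

```python
from collections import Counter


def text_score(query: str, doc: str) -> float:
    # Fraction of query terms that occur in the document (deliberately crude)
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def in_degree(links: dict) -> Counter:
    # Count how many distinct pages link to each page (self-links ignored)
    counts = Counter()
    for src, targets in links.items():
        for dst in set(targets):
            if dst != src:
                counts[dst] += 1
    return counts


def combined_rank(query: str, docs: dict, links: dict, alpha: float = 0.7) -> list:
    # score = alpha * text evidence + (1 - alpha) * normalized link evidence
    indeg = in_degree(links)
    max_in = max(indeg.values(), default=0) or 1
    scores = {}
    for doc_id, text in docs.items():
        link_score = indeg[doc_id] / max_in
        scores[doc_id] = alpha * text_score(query, text) + (1 - alpha) * link_score
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical pages: "a" and "b" match the query equally, but "b" is linked to more
docs = {"a": "web search engines", "b": "web search engines", "c": "cooking recipes"}
links = {"a": ["b"], "b": [], "c": ["b"]}
print(combined_rank("web search", docs, links))  # ['b', 'a', 'c']
```

The link evidence breaks the tie between two pages whose text matches the query equally well, which is the kind of effect the combination of evidence is meant to achieve.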
  • 149. Web Information Retrieval
• Web documents / data
  – Not a traditional collection
  – Huge: crawling and indexing require considerable time and space
  – IRSs cannot store copies of documents
  – Dynamic, volatile, anarchic, uncontrolled
  • 150. Web Information Retrieval
  – Homogeneous sub-collections
• Structure
  – In documents (un-/semi-/fully-structured)
  – Between docs: network of interconnected nodes
  – Hyperlinks: conceptual vs. physical documents
  • 151. Web Information Retrieval
• Web Information Retrieval models are ways of integrating many sources of evidence about documents, such as:
  1. the links,
  2. the structure of the document,
  3. the actual content of the document,
  4. the quality of the document, etc.,
  so that an effective Web search engine can be achieved.
• In contrast with the traditional library-type settings of IR systems, the Web is a hostile environment, where Web search engines have to deal with
  • 153. Topic 10: THE E-PUBLISHING ERA
  • 155. What is E-Publishing?
• Electronic publishing (also referred to as e-publishing, digital publishing, or online publishing) includes:
  – the digital publication of e-books and digital magazines, and
  – the development of digital libraries and catalogues.
• It also includes the editing of books, journals, and magazines to be posted on a screen (computer, e-reader, tablet, or smartphone).
  • 156. Need for E-Publishing
• Since its inception, the Web has become a huge success.
• The number of Web pages has been estimated to far exceed 20 billion, and the number of Web users in the world to exceed 1.7 billion.
• It has been reported that there are more than one trillion distinct URLs on the Web, even though many of them point to dynamic pages rather than static HTML pages.
• A viable model of economic sustainability based on online advertising was developed.
  • 157. THE E-PUBLISHING ERA
• Electronic publishing has become common in scientific publishing, where it has been argued that peer-reviewed scientific journals are in the process of being replaced by electronic publishing.
• It is also becoming common to distribute books, magazines, and newspapers to consumers through tablet reading devices, a market that is growing by millions each year and is driven by online vendors such as Apple's iTunes bookstore, Amazon's bookstore for Kindle, and the Google Play Bookstore.
• Market research suggested that half of all magazine and newspaper circulation would be delivered digitally.
  • 158. THE E-PUBLISHING ERA
• The advent of the Web changed the world in a way that few people could have anticipated.
• Yet one has to wonder about the characteristics of the Web that have made it so successful.
• Was there a single characteristic of the Web that was most decisive for its success?
• Candidates include the simple HTML markup language, the low access costs, the widespread reach of the Internet, the interactive browser interface, and the search engines.
• While these technologies provided the fundamental infrastructure for the Web, they were not the root cause of its popularity.
  • 159. THE E-PUBLISHING ERA – Its Time of Birth
• What was it, then?
• The fundamental shift in human relationships introduced by the Web was the freedom to publish.
• Example:
  – Jane Austen is one of the most famous writers in English literature. Her books are read by people all over the world and have been made into countless TV, film, theatre, and radio adaptations.
  – This is all the more impressive because she wrote only six full-length novels.
  – Jane Austen did not have that freedom, so she had to either convince a publisher of the quality of her work or pay for the publication of an edition herself.
  – Since she could not pay for it, she had to be patient and wait for a publisher to become convinced.
  – It took 15 years.
  • 160. THE E-PUBLISHING ERA – Its Time of Birth
• In the world of the Web, this is no longer the case.
• People can now publish their ideas on the Web and reach millions of people overnight, without paying anything and without having to convince the editorial board of a large publishing company.
• The restrictions imposed by mass communication media companies and by natural geographical barriers were almost entirely removed by the invention of the Web.
• This has led to a freedom to publish that marks the birth of a new era, one which we refer to as the e-Publishing Era.
  • 162. Topic 11: HOW THE WEB CHANGED SEARCH
  • 164. HOW THE WEB CHANGED SEARCH
• Web search is today's most prominent application of IR and its techniques.
• Indeed, the ranking and indexing components of any search engine are fundamentally IR technology.
• An immediate consequence is that the Web has had a major impact on the development of IR.
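Since indexing is named above as a core IR component, a minimal sketch of an inverted index may help: it maps each term to the set of documents containing it, and a conjunctive (AND) query intersects those sets. Tokenization here is naive whitespace splitting, the function names and documents are hypothetical, and no ranking is performed.

```python
from collections import defaultdict


def build_index(docs: dict) -> dict:
    # Map each term to the set of document ids that contain it (its postings)
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def and_query(index: dict, query: str) -> list:
    # Intersect the postings of all query terms; an unknown term yields no results
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


docs = {
    1: "the web changed search",
    2: "search engines index the web",
    3: "digital libraries",
}
index = build_index(docs)
print(and_query(index, "web search"))  # [1, 2]
```

A search engine would then rank the matching documents rather than return them in id order.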
  • 165. HOW THE WEB CHANGED SEARCH
1) The first major impact of the Web on search is related to the characteristics of the document collection itself.
• The Web collection is composed of documents (or pages) distributed over millions of sites and connected through hyperlinks, i.e., links that associate a piece of text in one page with other Web pages.
• The inherently distributed nature of the Web collection requires collecting all documents and storing copies of them in a central repository prior to indexing.
• This new phase in the IR process, introduced by the Web, is called crawling.
• The system that implements the IR process(es) is called a Search Engine or IR System.
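The phase of collecting documents and storing copies in a central repository (commonly called crawling) can be sketched as a breadth-first traversal of the link graph. To keep the example self-contained, the Web is simulated by a hypothetical in-memory dictionary, and `fetch` is any function returning `(text, links)` or `None`; a real crawler would also download pages over HTTP, parse links from HTML, respect politeness rules, and detect duplicate content.

```python
from collections import deque

# Hypothetical in-memory stand-in for the Web: page id -> (page text, outgoing links).
WEB = {
    "a": ("page a", ["b", "c"]),
    "b": ("page b", ["a", "d"]),
    "c": ("page c", []),
    "d": ("page d", ["x"]),  # "x" is a broken link: it is not in WEB
}


def crawl(seeds, fetch, max_pages=100):
    repository = {}           # central store of page copies, to be indexed later
    frontier = deque(seeds)   # pages waiting to be fetched, in breadth-first order
    seen = set(seeds)         # pages already queued, so none is fetched twice
    while frontier and len(repository) < max_pages:
        url = frontier.popleft()
        page = fetch(url)
        if page is None:      # unreachable page: skip it
            continue
        text, links = page
        repository[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return repository


print(list(crawl(["a"], WEB.get)))  # ['a', 'b', 'c', 'd']
```

The `max_pages` limit stands in for the resource budget that makes crawling the real Web a costly phase.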
  • 166. HOW THE WEB CHANGED SEARCH
2) The second major impact of the Web on search is related to the size of the collection and the volume of user queries submitted on a daily basis.
• Given that the Web grew larger and faster than any previously known text collection, search engines now have to handle a volume of text that far exceeds 20 billion pages, i.e., much larger than any previous text collection.
• The volume of user queries is also much larger than ever before, even if estimates vary widely.
• The combination of a very large text collection with very high query traffic has pushed the performance and scalability of search engines to limits that far exceed those of any previous IR system.
  • 167. HOW THE WEB CHANGED SEARCH
• That is, performance and scalability have become critical characteristics of the IR system, much more than they were prior to the Web. Performance and scalability of search engines are not discussed further here.
3) The third major impact of the Web on search is also related to the vast size of the document collection.
• In a very large collection, predicting relevance is much harder than before.
• Basically, any query retrieves a large number of documents that match its terms, which means that there are many noisy documents in the set of retrieved documents.
• That is, documents are retrieved that seem related to the query but are actually not relevant to it according to the judgment of a large fraction of users.
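One standard way to quantify the problem of noisy documents in the answer set is precision, the fraction of retrieved documents that are relevant; precision at k restricts this to the top k results, which are the ones users typically inspect. These measures are not named in the text above, and the relevance judgments in the example are invented for illustration.

```python
def precision(retrieved: list, relevant: set) -> float:
    # Precision = |retrieved ∩ relevant| / |retrieved|
    if not retrieved:
        return 0.0
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(retrieved)


def precision_at_k(retrieved: list, relevant: set, k: int) -> float:
    # Only the top-k ranked documents are considered
    return precision(retrieved[:k], relevant)


# Hypothetical ranked answer set and relevance judgments
retrieved = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d4"}
print(precision(retrieved, relevant))          # 0.4
print(precision_at_k(retrieved, relevant, 2))  # 0.5
```

In a huge collection many documents match the query terms, so the retrieved list grows while the relevant ones stay few, and precision drops unless ranking pushes the relevant documents to the top.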
  • 168. HOW THE WEB CHANGED SEARCH
• This problem first showed up in the early Web search engines and became more severe as the Web grew.
• Fortunately, the Web also includes new sources of evidence not present in standard document collections that can be used to alleviate the problem, such as hyperlinks and user clicks on documents in the answer set.
• Two other major impacts of the Web on search derive from the fact that the Web is not just a repository of documents and data, but also a medium for doing business.
4) One immediate implication, the fourth major impact, is that the search problem has been extended beyond seeking text information to also encompass other user needs, such as the price of a book, the phone number of a hotel, or the link for downloading a piece of software.
• Providing effective answers to these types of information needs frequently requires identifying structured data associated with the object of interest, such as price, location, or descriptions of some of its key characteristics.
  • 169. HOW THE WEB CHANGED SEARCH
5) The fifth and final impact of the Web on search derives from Web advertising and other economic incentives. The continued success of the Web as an interactive medium for the masses created incentives for its economic exploitation in the form of, for instance, advertising and electronic commerce. These incentives also led to the abusive availability of commercial information disguised as purely informational content, which is usually referred to as Web spam.
  • 170. HOW THE WEB CHANGED SEARCH
The increasingly pervasive presence of spam on the Web has made the quest for relevance even more difficult than before, i.e., spam content is sometimes so compelling that it is confused with truly relevant content. Because of that, it is not unreasonable to think that spam makes relevance negative, i.e., the presence of spam makes current ranking algorithms produce answer sets that are worse than they would be if the Web were spam-free. This difficulty is so large that today we talk of Adversarial Web Retrieval.
  • 172. Topic 12: PRACTICAL ISSUES ON THE WEB
  • 174. PRACTICAL ISSUES ON THE WEB
• Electronic commerce is a major trend on the Web nowadays, and one which has benefited millions of people.
• In an electronic transaction, the buyer usually submits credit information to the vendor to be used for charging purposes.
  • 175. PRACTICAL ISSUES ON THE WEB
• In its most common form, such information consists of a credit card number.
• For security reasons, this information is usually encrypted, as done by institutions and companies that deploy automatic authentication processes.
• Besides security, another issue of major interest is privacy. Frequently, people are willing to exchange information as long as it does not become public.
  • 176. PRACTICAL ISSUES ON THE WEB
• The reasons are many, but the most common one is to protect oneself against misuse of private information by third parties.
• Privacy is thus another issue that affects the deployment of the Web and that has not yet been properly addressed.
• Two other important issues are 1. copyright and 2. patent rights.
• It is far from clear how the widespread distribution of data on the Web affects copyright and patent laws in the various countries.
  • 177. PRACTICAL ISSUES ON THE WEB
• This is important because it affects the business of building up and deploying large digital libraries.
• For instance, is a site that supervises all the information it posts acting as a publisher? And if so, is it responsible for misuse of the information it posts (even if it is not the source)?
• Additionally, other practical issues of interest include scanning, optical character recognition (OCR), and cross-language retrieval (in which the query is in one language but the documents retrieved are in another language).
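Cross-language retrieval is often approached by translating the query before matching it against the documents. The sketch below shows the simplest dictionary-based variant; the tiny English-to-French dictionary, the French documents, and the function names are hypothetical, and real systems must handle ambiguous, multi-word, and missing translations far more carefully.

```python
# Hypothetical toy bilingual dictionary (English -> French).
EN_FR = {
    "library": ["bibliothèque"],
    "digital": ["numérique"],
    "book": ["livre"],
}

# Hypothetical document collection written in French.
FR_DOCS = {
    1: "la bibliothèque numérique",
    2: "un livre ancien",
}


def translate_query(query: str, dictionary: dict) -> list:
    translated = []
    for term in query.lower().split():
        # Keep untranslatable terms (often proper names) unchanged
        translated.extend(dictionary.get(term, [term]))
    return translated


def search(docs: dict, terms: list) -> list:
    # Return ids of documents containing any translated term (OR semantics)
    wanted = set(terms)
    return sorted(doc_id for doc_id, text in docs.items()
                  if wanted & set(text.lower().split()))


terms = translate_query("digital library", EN_FR)
print(terms)                   # ['numérique', 'bibliothèque']
print(search(FR_DOCS, terms))  # [1]
```

The English query never matches the French text directly; only the translated terms do, which is the core idea of query translation.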
  • 179. Topic 13: HOW PEOPLE SEARCH
  • 181. HOW PEOPLE SEARCH?
• Search tasks range from the relatively simple (e.g., looking up disputed facts or finding weather information) to the rich and complex (e.g., job seeking and planning vacations).
• Search interfaces should support a range of tasks, while taking into account how people think about searching for information.
  • 182. Information Lookup versus Exploratory Search
• User interaction with search interfaces differs depending on:
  a) the type of task,
  b) the amount of time and effort available to invest in the process, and
  c) the domain expertise of the information seeker.
• The simple interaction dialogue used in Web search engines is most appropriate for:
  1. finding answers to questions, or
  2. finding Web sites or other resources that act as search starting points.
  • 183. HOW PEOPLE SEARCH?
• But, as Marchionini notes, the “turn-taking” interface of Web search engines is inherently limited and in many cases is being supplanted by specialty search engines, such as those for travel and health information, that offer richer interaction models.
  • 184. Classic versus Dynamic Model of Information Seeking
• Researchers have developed numerous theoretical models of how people go about doing search tasks.
• The classic notion of the information seeking process model, as described by Sutcliffe and Ennis, is formulated as a cycle consisting of four main activities:
  1. problem identification,
  2. articulation of information need(s),
  3. query formulation, and
  4. results evaluation.
  • 185. Classic versus Dynamic Model of Information Seeking
• The standard model of the information seeking process contains an underlying assumption that the user's information need is static and that the information seeking process is one of successively refining a query until all and only those documents relevant to the original information need have been retrieved.
• More recent models emphasize the dynamic nature of the search process, noting that users learn as they search and that their information needs adjust as they see retrieval results and other document surrogates.
• This dynamic process is sometimes referred to as the berry-picking model of search.
  • 187. Topic 14: SEARCH INTERFACES TODAY