Information extraction by selamu shirtawi
Presentation Transcript
Faculty of natural and computational
DEPARTMENT OF INFORMATION SCIENCE
COURSE TITLE: INFORMATION STORAGE AND
ASSIGNMENT TITLE: INFORMATION EXTRACTION
of information extraction
Types of information extraction
Application of information extraction
Function of Information Extraction
The difference between IR and IE
NE - Named Entity recognition
CO - Coreference resolution
TE - Template Element construction
TR - Template Relation construction
ST - Scenario Template production
PR - Public Relations
Information Extraction (IE) is a technology based on
analysing natural language in order to extract snippets of information.
It is a process that takes texts as input and produces fixed-format, unambiguous data as output. This data may be used
directly for display to users.
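As a rough illustration of "text in, fixed-format data out", the sketch below turns one sentence into a typed record. The `Appointment` record, the regular expression, and the example sentence are all hypothetical and chosen only for illustration; a real IE system would rely on much richer linguistic analysis than a single pattern.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Appointment:
    """Fixed-format output record (hypothetical schema)."""
    person: str
    role: str
    company: str


# Assumed pattern for sentences shaped like
# "<Person> was appointed <role> of <Company>".
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*) was appointed "
    r"(?P<role>[a-z ]+?) of "
    r"(?P<company>[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*)"
)


def extract_appointment(text: str) -> Optional[Appointment]:
    """Return a structured record, or None if the pattern does not match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return Appointment(match.group("person"),
                       match.group("role"),
                       match.group("company"))


if __name__ == "__main__":
    print(extract_appointment(
        "In 2014, Jane Doe was appointed chief executive of Acme Corp."))
```

The record has the same fields for every input, so it can be displayed directly or loaded into a spreadsheet or database.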
The user would then read the documents and extract the
requisite information themselves. They might then enter the
information in a spreadsheet and produce a chart for a report
IE systems are more difficult and knowledge-intensive to
build, and are to varying degrees tied to particular domains
Information Extraction (IE) aims to automatically extract
structured information from unstructured and/or semi-structured machine-readable documents.
It is a system that analyses unrestricted text in order to extract
information about pre-specified types of events, entities or
relationships.
It is the automatic extraction of structured information from
IE systems extract clear, factual information from
unstructured documents. Roughly: Who did what to whom?
It is the task of automatically extracting structured
information from unstructured and semi-structured data.
Unstructured data is data which includes web
pages, text documents, office documents,
presentations, emails, and so on. It doesn't have a data model.
It is also referred to as "dark matter".
Information Extraction is split into five types:
1. Named Entity recognition (NE) - The simplest and most
reliable IE technology.
This is about identifying textual information relating to
people, organizations, places, brands, products and so on.
These are typically nouns and proper nouns.
2. Coreference resolution (CO) - It involves
identifying identity relations between entities in texts.
These entities are both those identified by NE
recognition and anaphoric references to those entities.
3. Template Element construction (TE) - The TE task builds on
NE recognition and coreference resolution, associating
descriptive information with the entities.
4. Template Relation construction (TR) - Finds relations
between TE entities.
This helps IR systems to answer particular information-seeking
5. Scenario Template production (ST) - It fits TE and
TR results into specified event scenarios. Scenario
templates (STs) are the prototypical outputs.
NE is about finding entities;
CO is about which entities and references (such as
pronouns) refer to the same thing;
TE is about what attributes entities have;
TR is about what relationships between entities there are;
ST is about events that the entities participate in.
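To make the five tasks concrete, here is a minimal toy sketch that runs each of them over a two-sentence text. Everything in it is an assumption made for illustration: the gazetteer, the pronoun heuristic, the "joined" pattern, and the "hiring" scenario are hypothetical stand-ins, not how a production IE system works.

```python
import re
from dataclasses import dataclass

# Hypothetical gazetteer; a real NE recogniser would be trained or rule-based.
GAZETTEER = {"Jane Doe": "PERSON", "Acme Corp": "ORGANIZATION"}
PRONOUNS = {"she", "he"}


@dataclass
class Mention:
    text: str
    start: int
    kind: str


def named_entities(text):
    """NE: find gazetteer entries in the text, in order of appearance."""
    found = []
    for name, kind in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            found.append(Mention(name, m.start(), kind))
    return sorted(found, key=lambda mention: mention.start)


def coreference(text, entities):
    """CO: link each pronoun to the closest preceding PERSON (naive heuristic)."""
    links = {}
    for m in re.finditer(r"\b\w+\b", text):
        if m.group().lower() in PRONOUNS:
            prior = [e for e in entities
                     if e.kind == "PERSON" and e.start < m.start()]
            if prior:
                links[m.start()] = prior[-1].text
    return links


def template_elements(entities, links):
    """TE: one record per entity with simple descriptive attributes."""
    elements = {}
    for e in entities:
        record = elements.setdefault(e.text, {"type": e.kind, "mentions": 0})
        record["mentions"] += 1
    for name in links.values():
        elements[name]["mentions"] += 1  # pronouns count as mentions too
    return elements


def template_relations(text, elements):
    """TR: relate two known entities joined by the word 'joined'."""
    relations = []
    for m in re.finditer(r"(\w+ \w+) joined (\w+ \w+)", text):
        person, org = m.groups()
        if person in elements and org in elements:
            relations.append(("employee_of", person, org))
    return relations


def scenario_templates(text, relations):
    """ST: fit each relation into a hypothetical 'hiring' event scenario."""
    events = []
    for _, person, org in relations:
        year = re.search(re.escape(org) + r" in (\d{4})", text)
        events.append({"event": "hiring", "person": person,
                       "organization": org,
                       "year": year.group(1) if year else None})
    return events


if __name__ == "__main__":
    text = "Jane Doe joined Acme Corp in 2014. She leads the research team."
    entities = named_entities(text)
    links = coreference(text, entities)
    elements = template_elements(entities, links)
    relations = template_relations(text, elements)
    print(scenario_templates(text, relations))
```

Each function consumes the output of the earlier ones, mirroring the way TE builds on NE and CO, and ST builds on TE and TR.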
1. Financial Analysts: IE can enable analysts
to answer questions such as: how many
instances predicting strong performance for a
particular company are out there?
2. Marketing Strategists: IE can be used to create a range
of media metrics, for example the media distance, or extent
of collocation between concepts and products/companies.
3. Public Relations Workers (PR): Public relations staff are
concerned to identify negative reporting events as quickly
as possible in order to respond.
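The slides do not define how a media metric is computed, so the following is only one possible reading of "extent of collocation": count the sentences in which a concept and a company are mentioned together. The function name `cooccurrence` and the sample text are hypothetical.

```python
import re
from collections import Counter
from itertools import product


def cooccurrence(text, concepts, companies):
    """Count sentences that mention both a concept and a company.

    Matching is a case-insensitive substring test, which is an assumption
    made to keep the sketch short.
    """
    counts = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        for concept, company in product(concepts, companies):
            if concept.lower() in lowered and company.lower() in lowered:
                counts[(concept, company)] += 1
    return counts


if __name__ == "__main__":
    sample = ("Acme Corp leads in innovation. Globex Inc faces a recall. "
              "Innovation at Acme Corp continues.")
    print(cooccurrence(sample, ["innovation", "recall"],
                       ["Acme Corp", "Globex Inc"]))
```

A similar count restricted to negative concepts could serve the public relations use case, and a count of positive forecasts per company could serve the analyst use case.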
Some of the functions of IE are:
To retrieve and store structured data,
To transform unstructured data into something that can be
To automatically extract structured information from
unstructured and/or semi-structured machine-readable documents.
Information Extraction is not Information Retrieval.
Information Retrieval refers to the human-computer interaction
(HCI) that happens when we use a machine to search a body of
information for information objects (content) that match our
It is used to reduce what has been called "information overload".
Information Extraction aims to automatically extract
structured information from unstructured documents.
It refers to the machine's ability to automatically extract
IR is there to find relevant documents, but
IE is there to extract relevant information from the documents.
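The contrast can be sketched in a few lines: an IR step returns whole documents that match a query, while an IE step returns structured facts pulled out of those documents. The document collection, the `retrieve` and `extract` functions, and the acquisition pattern below are all hypothetical.

```python
import re

# Hypothetical three-document collection.
DOCS = {
    "d1": "Acme Corp acquired Widget Ltd in 2012.",
    "d2": "The weather was mild this week.",
    "d3": "Globex Inc acquired Initech LLC in 2013.",
}


def retrieve(query, docs):
    """IR: return ids of documents containing every query term as a substring."""
    terms = query.lower().split()
    return [doc_id for doc_id, text in docs.items()
            if all(term in text.lower() for term in terms)]


# Assumed pattern for "<Buyer> acquired <Target> in <year>".
ACQUISITION = re.compile(
    r"(?P<buyer>[A-Z]\w+ [A-Z]\w+) acquired "
    r"(?P<target>[A-Z]\w+ [A-Z]\w+) in (?P<year>\d{4})"
)


def extract(doc_ids, docs):
    """IE: return one structured record per matching document."""
    records = []
    for doc_id in doc_ids:
        match = ACQUISITION.search(docs[doc_id])
        if match:
            records.append(match.groupdict())
    return records


if __name__ == "__main__":
    hits = retrieve("acquired", DOCS)  # documents for a person to read
    print(hits)
    print(extract(hits, DOCS))         # facts ready for a table or chart
```

With IR alone, the reader still has to open each returned document and pick out the facts by hand; the IE step does that part automatically.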
Information extraction systems search large bodies of
unrestricted text for specific types of entities and relations, and
use them to populate well-organized databases.
These databases can then be used to find answers for specific questions.
The typical architecture for an information extraction system
begins by segmenting, tokenizing, and part-of-speech tagging the text.
The resulting data is then searched for specific types of entity.
Finally, the information extraction system looks at entities that
are mentioned near one another in the text, and tries to determine
whether specific relationships hold between those entities.
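The architecture described above can be sketched as a small pipeline. This is a toy version under stated assumptions: the tagger simply treats capitalised tokens as proper nouns (so sentence-initial words are mis-tagged), and the only relation detected is two nearby entities linked by the word "in". All function names are hypothetical; a real system would use a trained tagger and chunker.

```python
import re


def segment(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


# Toy lexicon (an assumption): a few prepositions tagged IN.
LEXICON = {"in": "IN", "of": "IN", "at": "IN"}


def pos_tag(tokens):
    """Toy tagger: capitalised tokens are NNP, lexicon words IN, the rest X."""
    return [(tok, "NNP" if tok[0].isupper() else LEXICON.get(tok, "X"))
            for tok in tokens]


def detect_entities(tagged):
    """Group consecutive NNP tokens into (text, first_index, last_index)."""
    entities, current, start = [], [], 0
    for i, (tok, tag) in enumerate(tagged):
        if tag == "NNP":
            if not current:
                start = i
            current.append(tok)
        elif current:
            entities.append((" ".join(current), start, i - 1))
            current = []
    if current:
        entities.append((" ".join(current), start, len(tagged) - 1))
    return entities


def detect_relations(tagged, entities, max_gap=3):
    """Relate neighbouring entities with 'in' among the few tokens between."""
    relations = []
    for (a, _, a_end), (b, b_start, _) in zip(entities, entities[1:]):
        between = [tok for tok, _ in tagged[a_end + 1:b_start]]
        if len(between) <= max_gap and "in" in between:
            relations.append((a, "in", b))
    return relations


def run_pipeline(text):
    """Segment, tokenize, tag, find entities, then find relations."""
    results = []
    for sentence in segment(text):
        tagged = pos_tag(tokenize(sentence))
        entities = detect_entities(tagged)
        results.extend(detect_relations(tagged, entities))
    return results


if __name__ == "__main__":
    print(run_pipeline(
        "Acme Corp opened offices in Springfield. Profits rose sharply."))
```

The extracted tuples are the kind of rows that would populate a database, which can then be queried for specific answers.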