Information Extraction by Selamu Shirtawi



Published in: Technology, Education

  2. 2. Acronyms Introduction Definition of information extraction Types of information extraction Application of information extraction Function of Information Extraction The difference between IR and IE Conclusion
  3. Acronyms: IE = Information Extraction; IR = Information Retrieval; NE = Named Entity recognition; CO = Coreference resolution; TE = Template Element construction; TR = Template Relation construction; ST = Scenario Template production; PR = Public Relations
  4. Information Extraction (IE) is a technology based on analysing natural language in order to extract snippets of information. The process takes texts as input and produces fixed-format, unambiguous data as output. This data may be displayed directly to users or stored, for example in a database or spreadsheet, for later analysis. Without IE, users would have to read the documents and extract the required information themselves, then perhaps enter it into a spreadsheet to produce a chart for a report or presentation. However, IE systems are more difficult and knowledge-intensive to build than IR systems, and are to varying degrees tied to particular domains and scenarios.
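To make "texts in, fixed-format data out" concrete, here is a minimal Python sketch that assumes a single hand-written regular expression for one hypothetical event type (company acquisitions). The sample sentences, the `ACQUISITION` pattern and the `extract` and `to_csv` helpers are invented for illustration; real IE systems rely on much richer linguistic analysis, which is why they are knowledge-intensive to build.

```python
import csv
import io
import re

# Hypothetical input: free-text news snippets invented for illustration.
TEXTS = [
    "Acme Corp acquired Widget Ltd for $5 million on Monday.",
    "Shares rose sharply after the announcement.",
    "Globex Inc acquired Initech for $12 million.",
]

# One hand-written pattern for one pre-specified event type (acquisitions).
# Names are runs of capitalized words; the amount is in millions of dollars.
ACQUISITION = re.compile(
    r"(?P<buyer>[A-Z]\w*(?: [A-Z]\w*)*) acquired "
    r"(?P<target>[A-Z]\w*(?: [A-Z]\w*)*) for \$(?P<amount>\d+(?:\.\d+)?) million"
)


def extract(texts):
    """Turn free text into fixed-format rows: (buyer, target, amount_musd)."""
    rows = []
    for text in texts:
        for m in ACQUISITION.finditer(text):
            rows.append((m["buyer"], m["target"], float(m["amount"])))
    return rows


def to_csv(rows):
    """Serialize the rows as CSV, ready to load into a spreadsheet."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["buyer", "target", "amount_musd"])
    writer.writerows(rows)
    return buf.getvalue()


print(to_csv(extract(TEXTS)))
```

The second sentence yields no row because it matches no pre-specified pattern; the rows that are produced could go straight into a spreadsheet instead of being typed in by hand.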
  5. Why is IE needed?
  6. Information Extraction (IE) is the automatic extraction of structured information from unstructured and/or semi-structured documents. An IE system analyses unrestricted text in order to extract information about pre-specified types of events, entities or relationships.
  7. IE systems extract clear, factual information from unstructured documents; roughly, they answer "Who did what to whom, and when?" IE is the task of automatically extracting structured information from unstructured data and semi-structured documents.
  8. Unstructured data is data that has no predefined data model, such as web pages, text documents, office documents, presentations and emails. It is also referred to as "dark matter".
  9. Information extraction is split into five types of task: 1. Named Entity recognition (NE): the simplest and most reliable IE technology. It is about identifying textual information relating to people, organizations, places, brands, products and so on. These mentions are typically nouns and proper nouns.
  10. 2. Coreference resolution (CO): identifying identity relations between entities in texts. These entities include both those identified by NE recognition and anaphoric references to those entities, such as pronouns.
  11. (continued) 3. Template Element construction (TE): the TE task builds on NE recognition and coreference resolution, associating descriptive information with the entities. 4. Template Relation construction (TR): finds relations between TE entities. This helps IR systems answer particular information-seeking queries.
  12. 5. Scenario Template production (ST): fits TE and TR results into specified event scenarios. Scenario templates (STs) are the prototypical outputs of IE systems.
  13. In summary: NE is about finding entities; CO is about which entities and references (such as pronouns) refer to the same thing; TE is about what attributes entities have; TR is about what relationships hold between entities; ST is about the events that the entities participate in.
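The NE task can be sketched in a few lines if we assume a gazetteer (a hand-built list of known names per entity type) in place of the statistical models most NE recognizers use. The `GAZETTEER` contents and the `find_entities` helper are hypothetical.

```python
import re

# Hypothetical gazetteer: known names grouped by entity type.
GAZETTEER = {
    "PERSON": ["Ada Lovelace", "Alan Turing"],
    "ORGANIZATION": ["Acme Corp", "United Nations"],
    "LOCATION": ["Addis Ababa", "London"],
}


def find_entities(text):
    """Return (start, end, type, surface) for every gazetteer hit, in text order."""
    entities = []
    for etype, names in GAZETTEER.items():
        for name in names:
            # \b word boundaries stop "London" from matching inside "Londoner".
            for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
                entities.append((m.start(), m.end(), etype, m.group()))
    # Tuples sort by start offset first, giving text order.
    return sorted(entities)


TEXT = "Ada Lovelace visited London before Acme Corp opened an office in Addis Ababa."
for start, end, etype, surface in find_entities(TEXT):
    print(f"{etype:<12} {surface}")
```

A gazetteer is reliable for the names it lists but blind to any name it does not contain, which is why practical recognizers also use context.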
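Coreference resolution (CO) can be sketched with a deliberately naive heuristic, assumed here for illustration: link each pronoun to the nearest preceding entity whose type agrees with it. The pre-tagged `MENTIONS` list, the `COMPATIBLE` agreement table and the `resolve` helper are hypothetical; practical CO systems weigh many more cues, such as gender, number and syntax.

```python
# Hypothetical pre-tagged mentions in text order: (surface, kind).
# kind is an NE type for named entities, or "PRONOUN" for anaphoric references.
MENTIONS = [
    ("Ada Lovelace", "PERSON"),
    ("Acme Corp", "ORGANIZATION"),
    ("she", "PRONOUN"),
    ("it", "PRONOUN"),
]

# Crude agreement table: which entity types each pronoun may refer to.
COMPATIBLE = {
    "he": {"PERSON"}, "him": {"PERSON"}, "she": {"PERSON"}, "her": {"PERSON"},
    "it": {"ORGANIZATION", "LOCATION"}, "its": {"ORGANIZATION", "LOCATION"},
}


def resolve(mentions):
    """Map each pronoun's index to the index of its nearest compatible antecedent."""
    links = {}
    for i, (surface, kind) in enumerate(mentions):
        if kind != "PRONOUN":
            continue
        allowed = COMPATIBLE.get(surface.lower(), set())
        # Walk backwards so the most recent compatible entity wins.
        for j in range(i - 1, -1, -1):
            if mentions[j][1] in allowed:
                links[i] = j
                break
    return links


for pronoun, antecedent in resolve(MENTIONS).items():
    print(f"{MENTIONS[pronoun][0]!r} -> {MENTIONS[antecedent][0]!r}")
```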
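The TE and TR outputs can be pictured as simple records. This sketch assumes Python dataclasses for a template element (an entity plus descriptive attributes, with aliases that coreference would supply) and for a typed relation between two elements. The class names, the `LOCATED_IN` relation type, the attribute values and the `answer` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TemplateElement:
    """TE: an entity plus the descriptive information associated with it."""
    name: str
    etype: str
    aliases: list = field(default_factory=list)      # e.g. from coreference
    attributes: dict = field(default_factory=dict)   # descriptive information


@dataclass(frozen=True)
class TemplateRelation:
    """TR: a typed relation between two template elements."""
    relation: str
    subject: str
    obj: str


acme = TemplateElement("Acme Corp", "ORGANIZATION", aliases=["Acme", "it"],
                       attributes={"sector": "manufacturing"})
london = TemplateElement("London", "LOCATION")
relations = [TemplateRelation("LOCATED_IN", acme.name, london.name)]


def answer(relations, relation, subject):
    """Answer a simple information-seeking query, e.g. 'Where is Acme Corp located?'."""
    return [r.obj for r in relations if r.relation == relation and r.subject == subject]


print(answer(relations, "LOCATED_IN", "Acme Corp"))  # ['London']
```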
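Scenario template production (ST) can be sketched as filling a fixed set of slots for one event type from TE and TR output. The acquisition scenario, its slot names, the sample entities and relations, and the `fill_acquisition_templates` helper are all assumptions made for illustration.

```python
# Hypothetical TE output: entity name -> entity type.
ENTITIES = {"Acme Corp": "ORGANIZATION", "Widget Ltd": "ORGANIZATION", "London": "LOCATION"}

# Hypothetical TR output: (relation, subject, object) triples.
RELATIONS = [
    ("ACQUIRED", "Acme Corp", "Widget Ltd"),
    ("LOCATED_IN", "Widget Ltd", "London"),
]

# The scenario template: a fixed set of slots for one event type.
ACQUISITION_SLOTS = ("buyer", "target", "target_location")


def fill_acquisition_templates(entities, relations):
    """Fit TE and TR results into 'acquisition' event templates."""
    located = {s: o for rel, s, o in relations if rel == "LOCATED_IN"}
    templates = []
    for rel, buyer, target in relations:
        if rel != "ACQUIRED":
            continue
        # Slot-type constraint: both participants must be organizations.
        if entities.get(buyer) != "ORGANIZATION" or entities.get(target) != "ORGANIZATION":
            continue
        values = (buyer, target, located.get(target))  # unfilled slots stay None
        templates.append(dict(zip(ACQUISITION_SLOTS, values)))
    return templates


for template in fill_acquisition_templates(ENTITIES, RELATIONS):
    print(template)
```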
  14. APPLICATIONS OF INFORMATION EXTRACTION 1. Financial analysts: IE can enable analysts to answer questions such as "How many instances predicting strong performance for a particular company are out there?"
  15. 2. Marketing strategists: IE can be used to create a range of media metrics, for example the media distance, or the extent of collocation between concepts and products/companies. 3. Public relations (PR) workers: PR staff are concerned with identifying negative reporting events as quickly as possible so that they can respond.
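For the marketing use case, one crude proxy for collocation, assumed here rather than taken from any standard metric definition, is the share of sentences mentioning a company that also mention a given concept. The sample `ARTICLES`, the regex sentence splitter and the `collocation_rate` helper are hypothetical.

```python
import re

# Hypothetical media snippets; in practice these would be news text processed by IE.
ARTICLES = [
    "Acme Corp unveiled a cheap new phone. Critics called the phone reliable.",
    "Globex Inc is expensive. Acme Corp remains reliable, analysts say.",
    "Acme Corp shares fell.",
]


def sentences(text):
    """Very rough sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def collocation_rate(articles, company, concept):
    """Share of sentences mentioning `company` that also mention `concept`."""
    with_company = with_both = 0
    for article in articles:
        for sent in sentences(article):
            if company in sent:
                with_company += 1
                # Whole-word, case-insensitive match for the concept.
                if re.search(r"\b" + re.escape(concept) + r"\b", sent, re.IGNORECASE):
                    with_both += 1
    return with_both / with_company if with_company else 0.0


print(collocation_rate(ARTICLES, "Acme Corp", "reliable"))
```

The sentence "Critics called the phone reliable." is not counted because it never names Acme Corp; linking such sentences back to the company is the kind of gap coreference resolution (CO) addresses.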
  16. Some of the functions of IE are: to retrieve and store structured data; to transform unstructured data into something that can be reasoned with; to extract structured information automatically from unstructured and/or semi-structured machine-readable documents.
  17. Information Extraction is not Information Retrieval. Information Retrieval (IR) refers to the human-computer interaction (HCI) that happens when we use a machine to search a body of information for information objects (content) that match our search query. It is used to reduce what has been called "information overload".
  18. Information Extraction (IE) is the machine's ability to automatically extract structured information from unstructured documents. In general, IR finds relevant documents, while IE extracts relevant information from those documents.
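The IR/IE distinction can be shown side by side in a toy sketch: `retrieve` returns the ids of matching documents (IR), while `extract` returns structured records pulled out of those documents (IE). The documents, the naive substring matching and the single acquisition pattern are assumptions for illustration.

```python
import re

# Hypothetical document collection.
DOCS = {
    "d1": "Acme Corp acquired Widget Ltd for $5 million.",
    "d2": "Widget Ltd opened a new factory in London.",
    "d3": "The weather in London was mild.",
}

# Hypothetical pattern for one pre-specified relation type.
ACQUISITION = re.compile(
    r"(?P<buyer>[A-Z]\w*(?: [A-Z]\w*)*) acquired (?P<target>[A-Z]\w*(?: [A-Z]\w*)*)"
)


def retrieve(docs, query):
    """IR: ids of documents containing every query term (naive substring match)."""
    terms = query.lower().split()
    return sorted(doc_id for doc_id, text in docs.items()
                  if all(term in text.lower() for term in terms))


def extract(docs, doc_ids):
    """IE: structured (buyer, target) records found inside the given documents."""
    return [(m["buyer"], m["target"])
            for doc_id in doc_ids
            for m in ACQUISITION.finditer(docs[doc_id])]


hits = retrieve(DOCS, "Widget Ltd")
print(hits)                 # ['d1', 'd2']
print(extract(DOCS, hits))  # [('Acme Corp', 'Widget Ltd')]
```

IR leaves the user with two documents to read; IE hands back a single record that can be stored directly.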
  19. Conclusion: Information extraction systems search large bodies of unrestricted text for specific types of entities and relations, and use them to populate well-organized databases. These databases can then be used to answer specific questions. A typical IE system begins by segmenting, tokenizing and part-of-speech tagging the text. The resulting data is then searched for specific types of entity. Finally, the system looks at entities that are mentioned near one another in the text and tries to determine whether specific relationships hold between them.
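The architecture described above can be sketched end to end in plain Python. Every component here is a hypothetical toy stand-in: a regex sentence splitter and tokenizer, a tiny lexicon in place of a trained part-of-speech tagger, proper-noun chunking plus a small location gazetteer for entity detection, and a single "in between two nearby entities" rule for relation detection. Libraries such as NLTK provide trained versions of these stages.

```python
import re

# Toy lexicon standing in for a trained POS tagger (hypothetical).
LEXICON = {
    "is": "VBZ", "works": "VBZ", "based": "VBN", "for": "IN", "in": "IN",
    "the": "DT", "a": "DT", "company": "NN", ".": ".",
}
KNOWN_LOCATIONS = {"London", "Addis Ababa"}  # hypothetical location gazetteer
MAX_GAP = 4  # hypothetical: max tokens allowed between two related entities


def segment(text):
    """Step 1: split raw text into sentences."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Step 2: split a sentence into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def pos_tag(tokens):
    """Step 3: look tokens up in the lexicon; unknown capitalized words become NNP."""
    return [(t, LEXICON.get(t.lower(), "NNP" if t[0].isupper() else "NN")) for t in tokens]


def detect_entities(tagged):
    """Step 4: chunk runs of NNP tokens into (first, last, name, type) entities."""
    entities, run = [], []
    for i, (tok, tag) in enumerate(tagged + [("", "END")]):  # sentinel flushes last run
        if tag == "NNP":
            run.append((i, tok))
        elif run:
            name = " ".join(word for _, word in run)
            etype = "LOCATION" if name in KNOWN_LOCATIONS else "ORG_OR_PERSON"
            entities.append((run[0][0], run[-1][0], name, etype))
            run = []
    return entities


def detect_relations(tagged, entities):
    """Step 5: for neighbouring entities, propose LOCATED_IN if 'in' lies between them."""
    relations = []
    for (_, end1, name1, _), (start2, _, name2, type2) in zip(entities, entities[1:]):
        between = [tok.lower() for tok, _ in tagged[end1 + 1:start2]]
        if type2 == "LOCATION" and "in" in between and len(between) <= MAX_GAP:
            relations.append((name1, "LOCATED_IN", name2))
    return relations


def extract_relations(text):
    """Run the whole pipeline over raw text."""
    found = []
    for sentence in segment(text):
        tagged = pos_tag(tokenize(sentence))
        found.extend(detect_relations(tagged, detect_entities(tagged)))
    return found


TEXT = "Acme Corp is based in London. Ada works for a company in Addis Ababa."
print(extract_relations(TEXT))
```

In the second sentence "Ada" and "Addis Ababa" are five tokens apart, beyond the hypothetical window of four, so no relation is proposed; the window size trades recall for precision.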