Introduction to Natural Language Processing
Natural Language Processing has matured a lot recently. With the availability of great open source tools complementing the needs of the Semantic Web we believe this field should be on the radar of all software engineering professionals.

Usage Rights

© All Rights Reserved

Introduction to Natural Language Processing: Presentation Transcript

  • Natural Language Processing: Quick Introduction. Rohit Nayak, Talentica Software
    • Part 1: Semantic Web, Uses of NLP, Core Concepts, Intro to GATE
    • Part 2: GATE Detailed Demo
  • NLP 420
    • Falling Tree Hits, Kills OR Forest Service Worker
    • Time flies like an arrow
    • Choosing a Program to Improve Your Future
    • Monkeys like bananas when they wake up
    • Monkeys like bananas when they are ripe
  • I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize. – Tim Berners-Lee, 1999
    • Disaster Type: earthquake
      • location: Afghanistan
      • date: 05/30/1998
      • magnitude: 6.9
      • epicenter: a remote part of the country
      • damage:
        • human-effect:
          • victim: Thousands of people
          • number: Thousands
          • outcome: dead
        • physical-effect:
          • object: entire villages
          • outcome: damaged
    QUAKE IN AFGHANISTAN Thousands of people are feared dead following... (voice-over) ... a powerful earthquake that hit Afghanistan today. The quake registered 6.9 on the Richter scale, centered in a remote part of the country. (on camera) Details now hard to come by, but reports say entire villages were buried by the quake.
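The template above can be approximated with a few hand-written patterns. The following is a minimal sketch, not the method used in the presentation: the slot names, the `PATTERNS` table, and `fill_template` are all hypothetical, and the patterns are tuned to this one passage. The `date` slot is left out because the text only says "today"; the date in the template would have to come from the broadcast's own metadata.

```python
import re

# The news passage from the slide, with the broadcast cues removed.
TEXT = (
    "QUAKE IN AFGHANISTAN Thousands of people are feared dead following "
    "a powerful earthquake that hit Afghanistan today. The quake registered "
    "6.9 on the Richter scale, centered in a remote part of the country. "
    "Details now hard to come by, but reports say entire villages were "
    "buried by the quake."
)

# Hypothetical patterns, one per template slot (assumption: written by hand
# for this example; a real system would use many more, or learn them).
PATTERNS = {
    "disaster_type": r"\b(earthquake|flood|hurricane)\b",
    "location": r"\bhit ([A-Z][a-z]+)",
    "magnitude": r"registered (\d+(?:\.\d+)?) on the Richter scale",
    "epicenter": r"centered in (.+?)\.",
    "victim": r"((?:Thousands|Hundreds|Dozens) of people)",
}


def fill_template(text):
    """Return a dict with one entry per slot; None when a pattern finds nothing."""
    template = {}
    for slot, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        template[slot] = match.group(1) if match else None
    return template


if __name__ == "__main__":
    for slot, value in fill_template(TEXT).items():
        print(f"{slot}: {value}")
```

Patterns like these break as soon as the wording changes, which is one reason the later slides break the task into reusable stages (tokeniser, gazetteer, named entity tagging) rather than one regular expression per slot.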
  • Text Categorization
    • Is the document about plants? sports? health and fitness? corporate acquisitions? … stock market?
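As a rough illustration of text categorization, the sketch below assigns a document to the category whose keywords it mentions most often. The keyword lists and the `categorize` function are hypothetical; a practical classifier (for example one trained in Weka, listed under Tools) would learn its features from labelled documents instead.

```python
import re
from collections import Counter

# Hypothetical keyword lists (assumption: chosen by hand for illustration).
CATEGORY_KEYWORDS = {
    "plants": {"leaf", "soil", "garden", "seed", "flower"},
    "sports": {"match", "team", "score", "goal", "league"},
    "stock market": {"shares", "stock", "index", "traders", "market"},
}


def categorize(document):
    """Return the category with the most keyword hits, or None if there are none."""
    tokens = re.findall(r"[a-z]+", document.lower())
    counts = Counter()
    for category, keywords in CATEGORY_KEYWORDS.items():
        counts[category] = sum(1 for token in tokens if token in keywords)
    best, hits = counts.most_common(1)[0]
    return best if hits > 0 else None


if __name__ == "__main__":
    print(categorize("The team scored a late goal to win the match."))
    print(categorize("Shares fell as traders sold stock."))
```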
  • Sentiment Classification
    • Is the overall sentiment in the document positive? negative?
    • In general, sentiment classification appears to be harder than categorizing by topic.
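A word-list approach to sentiment might look like the sketch below. The lexicons and the one-word negation rule are assumptions made for this example, not part of the presentation. Negation is one plausible illustration of why sentiment can be harder than topic: "not good" contains a positive word but expresses a negative opinion, so counting keywords alone is not enough.

```python
import re

# Hypothetical mini-lexicons (assumption: hand-picked for illustration).
POSITIVE = {"good", "great", "excellent", "love", "enjoyable"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "boring"}
NEGATORS = {"not", "never", "no"}


def sentiment(document):
    """Classify a document as 'positive', 'negative' or 'neutral'."""
    tokens = re.findall(r"[a-z']+", document.lower())
    score = 0
    for i, token in enumerate(tokens):
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        # Flip the polarity when the previous token is a negator.
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(sentiment("The plot was great and the cast was excellent."))
    print(sentiment("This was not good."))
```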
  • Information Extraction
    • [Diagram: text collection → Information Extraction System → one filled template per document, with slots Who, What, Where, When, How]
  • Information Extraction (IE)
    • Recognition, tagging, and extraction into a structured representation of certain key elements of information (e.g. persons, companies, locations, organizations) from large collections of text.
    • These extractions can then be utilized for a range of applications including question-answering, visualization, and data mining.
  • Question-Answering
    • In contrast to Information Retrieval, which provides a list of potentially relevant documents in response to a user’s query, Question-Answering provides the user with either just the text of the answer itself or answer-providing passages.
  • Summarization
    • Reduces a larger text to a shorter, yet richly constituted, narrative representation of the original document.
  • Machine Translation
    • Perhaps the oldest of all NLP applications. Various levels of NLP have been utilized in MT systems, ranging from the ‘word-based’ approach to applications that include higher levels of analysis.
  • Dialogue Systems
    • Perhaps the omnipresent application of the future, in the systems envisioned by large providers of end-user applications.
    • Dialogue systems, which usually focus on a narrowly defined application (e.g. your refrigerator or home sound system), currently utilize the phonetic and lexical levels of language.
    • It is believed that utilizing all the levels of language processing offers the potential for truly habitable dialogue systems.
  • Challenge of Semantic Web
    • Machine processable data to complement hypertext
    • Attach metadata to documents
      • Explicit: title, author, creation date
      • Implicit: deduced information like names of entities and their relation
  • Ontology
    • Specification of conceptualisation
    • Basis of document “understanding”
    • Creating and populating an ontology is very time-consuming, practically impossible
  • Simple Workflow
    • Classification
    • Tokeniser
    • Gazetteer
    • Sentence Splitter
    • Part-of-Speech Tagging
    • Named Entity Tagging
    • Final Extraction
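Some of the workflow stages above can be sketched with the Python standard library alone. This is a simplified illustration, not how GATE or any of the tools on the next slide implement them: the `GAZETTEER` entries and all function names are hypothetical, the sentence splitter and tokeniser are naive regular expressions, and the classification and part-of-speech stages are omitted.

```python
import re

# Hypothetical gazetteer: known strings mapped to an entity kind.
GAZETTEER = {"Afghanistan": "location", "Talentica": "company"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenise(sentence):
    """Naive tokeniser: words (allowing internal hyphens or apostrophes) or single punctuation marks."""
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)


def tag_entities(tokens):
    """Named entity tagging by gazetteer lookup only."""
    return [(token, GAZETTEER[token]) for token in tokens if token in GAZETTEER]


def run_pipeline(text):
    """Run sentence splitting, tokenising and entity tagging; return one record per sentence."""
    results = []
    for sentence in split_sentences(text):
        tokens = tokenise(sentence)
        results.append(
            {"sentence": sentence, "tokens": tokens, "entities": tag_entities(tokens)}
        )
    return results


if __name__ == "__main__":
    text = "A powerful earthquake hit Afghanistan today. Details are hard to come by."
    for record in run_pipeline(text):
        print(record["tokens"], record["entities"])
```

Each stage only consumes the output of the one before it, so any stage could be swapped for a stronger component without touching the rest.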
  • Tools
    • GATE
    • OpenNLP
    • NLTK (python)
    • Stanford Parser
    • Weka for classification
  • GATE
    • General Architecture for Text Engineering
    • Over 10 years old and in active development
    • Among the most popular NLP platforms
    • Current version 5.0
    • Built as a framework for both programmers and developers
    • Powerful GUI and well-documented Java API
    • Multilingual
  • GATE
    • Clean separation of low-level tasks (e.g., data storage) from the NLP components
    • Separation between linguistic data and algorithms that process it
  • JAPE
    • Java Annotation Patterns Engine (jokingly, “Just A Pleasant Experience”)
    • Pattern-matching over annotations
    • Regular-expression-like syntax
    • Can use Java in actions
  • Rule: Company1
    Priority: 25
    (
      ({Token.orthography == upperInitial})+
      {Lookup.kind == companyDesignator}
    ):companyMatch
    -->
    :companyMatch.NamedEntity = {kind = "company", rule = "Company1"}
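To show what the Company1 rule matches, here is a rough Python analogue: one or more capitalised tokens followed by a company designator, annotated as a company. It is only a sketch of the pattern's logic, not of how JAPE executes rules. The `COMPANY_DESIGNATORS` set stands in for the gazetteer's Lookup annotations, and `orthography` is a simplified guess at the tokeniser's orthography feature; both are assumptions.

```python
# Hypothetical stand-in for gazetteer entries with kind == companyDesignator.
COMPANY_DESIGNATORS = {"Ltd", "Inc", "Corp", "GmbH"}


def orthography(word):
    """Simplified orthography feature for a single token."""
    if word[:1].isupper() and word[1:].islower():
        return "upperInitial"
    if word.isupper():
        return "allCaps"
    if word.islower():
        return "lowercase"
    return "mixedCaps"


def match_company1(words):
    """Find runs of upperInitial tokens that are followed by a company designator."""
    matches = []
    i = 0
    while i < len(words):
        j = i
        # ({Token.orthography == upperInitial})+
        # Simplification: the designator itself is not consumed by this run.
        while (
            j < len(words)
            and orthography(words[j]) == "upperInitial"
            and words[j] not in COMPANY_DESIGNATORS
        ):
            j += 1
        # {Lookup.kind == companyDesignator}
        if j > i and j < len(words) and words[j] in COMPANY_DESIGNATORS:
            # Right-hand side of the rule: create the NamedEntity annotation.
            matches.append(
                {"text": " ".join(words[i : j + 1]), "kind": "company", "rule": "Company1"}
            )
            i = j + 1
        else:
            i += 1
    return matches


if __name__ == "__main__":
    sentence = "Shares of Acme Widgets Ltd rose after Example Systems Inc announced results"
    for match in match_company1(sentence.split()):
        print(match)
```

The company names in the usage example are made up. In JAPE the same logic is expressed declaratively over annotations, which is why the rule is so much shorter than this loop.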
  • CREOLE components
    • GATE plugins use CREOLE
    • Collection of Reusable Objects for Language Engineering
    • Modified JavaBeans with XML configuration
    • Minimal component: 10 lines of Java, 10 lines of XML
  • External Slideshow
    • http://www.authorstream.com/presentation/Esteban-22479-ekaw2006-tutorial-Aims-Terminology-Semantic-Annotation-Motivation-Challenge-Web-Metadata-ext-as-Entertainment-ppt-powerpoint/ (27)
  • GATE Demo
    • Quick look
    • Detailed demo at the next SIG