2009 God

  • 1. Information Extraction Tasks
    Yen Ling
    2009
  • 2. Outline
    Information Integration
    Generating an Extractor
    Information Extraction Tasks
  • 3. Introduction
    [Figure: Web sites A, B and C each return result pages; these are merged into integrated information pages]
  • 4. Introduction
    Information integration is the merging of information from disparate sources with differing conceptual, contextual and typographical representations.
    It is used to consolidate data from unstructured or semi-structured sources.
    The final result is displayed in rich modules, e.g. tables, lists, graphs and maps.
    Users can receive the results via RSS, gadgets or e-mail.
  • 5. Related Work
    Relations, Cards, and Search Templates
    UIST’07
    In the figure, the three objects to the left of the arrow stand for search templates, relations and cards.
    The cards to the right of the arrow represent the information after integration.
    Leverages a supervised extractor.
  • 6. Related Work
    Damia: Data Mashups for Intranet Applications
    SIGMOD’08
    Integrates information from a company’s internal data sources.
    Managers can operate the system easily and quickly without programmers.
    Employees receive mashups from a feed server.
  • 7. Related Work
    Transcendence: Enabling a Personal View of the Deep Web
    IUI’08
    Leverages an unsupervised extractor.
    Users must use the Firefox browser, whereas GoD has no such requirement because it is a web-based application.
  • 8. Related Work
    User-centric Web Data Integration: Design and Implementation of Gadget on Demand System
    Leverages an unsupervised extractor.
    Integrates information from multiple sources.
    Only a few clicks are needed to integrate information from multiple sources.
    Users can use the system without programming skills.
  • 9. Related Work
    Dapper
    Its purpose is similar to that of GoD.
    Leverages a supervised extractor.
    Provides a virtual browser to achieve “What You See Is What You Get”.
    Unlike GoD, it does not extract information from multiple sources.
  • 10. Web Information Extraction
    Full operators for a wrapper
    Mapping of an incoming query
    By hand
    The construction of an extractor
    Construct a base framework
  • 11. Outline
    Information Integration
    Generating an Extractor
    Information Extraction Tasks
  • 12. Analysis of Different Extractors
    Unsupervised extractor
    Supervised extractor
    Induction based on labeled page examples
    Knowledge-based extractors
  • 13. GoD with Unsupervised/supervised Extractor
    [Figure: GoD workflow. Supervised: Input web pages, Label page, IE system. Unsupervised: Select Fields & Data, Select Display Module, Publish, Integrate sources]
  • 14. GoD with Unsupervised/supervised Extractor
    Extractor precision:
    Supervised > Unsupervised
    Use-case flow for users:
    Unsupervised is easier than supervised.
    User interface design:
    Supervised is more complex than unsupervised.
  • 15. Extractor for GoD
    Problem Formulation:
    Given a web page and a pattern tree produced by FiVaTech, the task is to use the pattern tree to extract data from the page.
    The problem splits into two sub-problems:
    Pattern matching
    Approximate matching of textual attributes
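The pattern-matching sub-problem can be illustrated with a minimal Python sketch. It assumes a simplified pattern tree in which each node carries a tag plus hypothetical `repeat`, `optional` and `is_data` flags; FiVaTech's actual pattern-tree representation may differ, and `DomNode`, `PatternNode` and `match` are illustrative names only. Matching here is greedy and does not backtrack.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DomNode:
    """A simplified DOM node: tag name, own text, and child nodes."""
    tag: str
    text: str = ""
    children: List["DomNode"] = field(default_factory=list)


@dataclass
class PatternNode:
    """A simplified pattern-tree node (hypothetical flags, not FiVaTech's format)."""
    tag: str
    repeat: bool = False    # may match one or more consecutive siblings
    optional: bool = False  # may match zero siblings
    is_data: bool = False   # this node's text is an extracted value
    children: List["PatternNode"] = field(default_factory=list)


def match(pattern: PatternNode, node: DomNode) -> Optional[List[str]]:
    """Match a pattern subtree against a DOM subtree.

    Returns the extracted values in document order, or None if the
    subtree does not fit the pattern. Greedy: no backtracking.
    """
    if pattern.tag != node.tag:
        return None
    values = [node.text] if pattern.is_data else []
    kids, i = node.children, 0
    for child_pat in pattern.children:
        count = 0
        # Consume as many consecutive DOM children as this pattern child allows.
        while i < len(kids):
            sub = match(child_pat, kids[i])
            if sub is None:
                break
            values.extend(sub)
            i += 1
            count += 1
            if not child_pat.repeat:
                break
        if count == 0 and not child_pat.optional:
            return None
    # Strict: every DOM child must be covered by the pattern.
    return values if i == len(kids) else None
```

For example, a `ul` pattern with a single repeatable `li` data child extracts the text of every list item, while a page whose `ul` is empty does not match because the `li` child is not optional.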
  • 16. Extractor for GoD
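    [Figure: extractor pipeline. Preprocessing produces a DOM tree; pattern matching of the pattern tree against the DOM tree yields candidate paths; content matching of the candidate paths against existing data yields the extracted data]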
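For the content-matching step, one plausible approach is to score each candidate path by how closely its values resemble the existing data. The sketch below uses Python's `difflib.SequenceMatcher` as the approximate string measure; that choice, the 0.6 threshold and the names `similarity`, `path_score` and `best_candidate` are assumptions for illustration, not the method shown on the slide.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def similarity(a: str, b: str) -> float:
    """Approximate similarity in [0, 1] after case and whitespace normalization."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def path_score(values: List[str], existing: List[str]) -> float:
    """Average, over the existing values, of the best match among a path's values."""
    if not values or not existing:
        return 0.0
    return sum(max(similarity(e, v) for v in values) for e in existing) / len(existing)


def best_candidate(candidates: Dict[str, List[str]], existing: List[str],
                   threshold: float = 0.6) -> Optional[str]:
    """Return the candidate path whose values best match the existing data,
    or None if no path reaches the threshold (assumed default 0.6)."""
    best_path, best_score = None, threshold
    for path, values in candidates.items():
        score = path_score(values, existing)
        if score >= best_score:
            best_path, best_score = path, score
    return best_path
```

Approximate rather than exact matching tolerates small differences such as extra whitespace or changed capitalization between the stored data and a freshly downloaded page.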
  • 17. Extractor
  • 27. Outline
    Information Integration
    Generating an Extractor
    Information Extraction Tasks
  • 28. Information Integration Tasks
    Real-time system phase
    Users use the system to create the gadgets they want.
    Background gadget execution phase
    The system updates the gadget’s content periodically or on request.
  • 29. Real-time System Phase
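    [Figure: real-time phase. If the domain does not exist yet, FiVaTech processes the web pages and produces a pattern tree and data; if it exists, the pattern tree is fetched from the DB and the Extractor produces the data]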
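The branch in the real-time phase can be sketched as follows, assuming FiVaTech and the Extractor are passed in as plain functions (`induce` and `extract` are hypothetical stand-ins with assumed signatures) and an in-memory dictionary plays the role of the DB. Storing a newly induced pattern tree is an assumption, inferred from the Yes branch reading the tree back from the DB.

```python
from typing import Any, Callable, Dict, List, Tuple

PatternTree = Dict[str, Any]  # hypothetical stand-in for a FiVaTech pattern tree


class RealTimePhase:
    """Sketch of the real-time phase: reuse a stored pattern tree or induce a new one."""

    def __init__(self,
                 induce: Callable[[List[str]], Tuple[PatternTree, list]],
                 extract: Callable[[PatternTree, List[str]], list]):
        self.pattern_db: Dict[str, PatternTree] = {}  # in-memory stand-in for the DB
        self.induce = induce    # stands in for FiVaTech (assumed signature)
        self.extract = extract  # stands in for the Extractor (assumed signature)

    def run(self, domain: str, pages: List[str]) -> list:
        tree = self.pattern_db.get(domain)
        if tree is None:
            # "Domain exists?" -> No: FiVaTech induces a pattern tree and returns data.
            tree, data = self.induce(pages)
            self.pattern_db[domain] = tree  # assumed: keep the tree for later runs
            return data
        # "Domain exists?" -> Yes: fetch the stored tree and run the Extractor.
        return self.extract(tree, pages)
```

The design choice this illustrates is caching: the comparatively expensive induction step runs once per domain, and later requests for the same domain only pay for extraction.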
  • 30. Background Gadget Execution Phase – Using Extractor
    [Figure: the gadget’s profile is read from the DB, its web pages are downloaded, the Extractor applies the stored pattern tree to them, and the extracted data updates the gadget’s profile]
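A minimal sketch of this background phase, under the assumption that each gadget's profile records its source URLs, a domain key for the stored pattern tree and a refresh interval (`GadgetProfile` and `refresh_due` are hypothetical names; `download` and `extract` are injected stand-ins). A gadget is refreshed when its interval has elapsed or when it is explicitly requested, so content is updated either periodically or on request.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Set


@dataclass
class GadgetProfile:
    urls: List[str]       # pages the gadget draws from
    domain: str           # key into the pattern-tree store
    interval_s: float     # hypothetical refresh period in seconds
    last_run: float = 0.0
    data: list = field(default_factory=list)


def refresh_due(profiles: Dict[str, GadgetProfile],
                pattern_db: Dict[str, Any],
                download: Callable[[str], str],
                extract: Callable[[Any, List[str]], list],
                now: Optional[float] = None,
                requested: Optional[Set[str]] = None) -> List[str]:
    """Refresh every gadget that is due or requested; return their names."""
    now = time.time() if now is None else now
    requested = requested or set()
    updated = []
    for name, prof in profiles.items():
        if name not in requested and now - prof.last_run < prof.interval_s:
            continue  # not due yet and not explicitly requested
        pages = [download(u) for u in prof.urls]             # download web pages
        prof.data = extract(pattern_db[prof.domain], pages)  # apply stored pattern tree
        prof.last_run = now                                  # update gadget's profile
        updated.append(name)
    return updated
```

A scheduler would call `refresh_due` at a fixed tick, while a user action would pass the gadget's name in `requested`.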
  • 31. Background Gadget Execution Phase – Using Schema Matching
    [Figure: the gadget’s profile is read from the DB, its web pages are downloaded and processed by FiVaTech, schema matching is applied to the resulting data, and the matched data updates the gadget’s profile]
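Presumably schema matching is needed here because a fresh FiVaTech run may return its columns in a different order than the gadget expects. As a sketch only, the code below maps each stored field to the new column whose sample values are most similar, greedily and one-to-one; the `difflib` similarity measure, the 0.5 threshold and the names `col_sim` and `match_schema` are assumptions rather than the system's actual schema-matching method.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def col_sim(old: List[str], new: List[str]) -> float:
    """Average best-match similarity of a stored column's values against a new column."""
    if not old or not new:
        return 0.0
    return sum(max(SequenceMatcher(None, o, n).ratio() for n in new) for o in old) / len(old)


def match_schema(profile_cols: Dict[str, List[str]],
                 new_cols: List[List[str]],
                 threshold: float = 0.5) -> Dict[str, int]:
    """Map each profile field to the index of its most similar new column.

    Greedy one-to-one assignment: the highest-scoring pairs are taken first,
    and pairs below the (assumed) threshold are ignored.
    """
    pairs = sorted(((col_sim(old, new), name, j)
                    for name, old in profile_cols.items()
                    for j, new in enumerate(new_cols)), reverse=True)
    mapping: Dict[str, int] = {}
    used = set()
    for score, name, j in pairs:
        if score < threshold:
            break  # remaining pairs score even lower
        if name in mapping or j in used:
            continue
        mapping[name] = j
        used.add(j)
    return mapping
```

Fields left out of the returned mapping have no sufficiently similar column and could be flagged for the user instead of being filled silently.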
  • 32. Future Work
    We will implement the Web information extraction system.
    We will also redesign the interface to be easier to use, along with the information integration chart.
  • 33. Thanks for your time.
