Deep Information and Extraction Tool

  • Good morning, my name is Thomas and I work for Univalor, the commercialization arm of the Université de Montréal.

Presentation Transcript

  • Thomas Martinuzzo, Jr. Eng.
    • What is DIET?
      • DIET is an information extraction and manipulation tool
      • DIET can extract information from the deep web by understanding page structures
      • Surface web: 20 billion pages indexed by search engines
      • Deep web: over 600 billion pages
      • "The 60 largest Deep Web sources contain 84 billion pages of content. That's about 750 terabytes of information, sufficient by themselves to exceed the size of the surface Web by 40 times." (Brightplanet.com)
      • Picture from Maxumowners.org
    • DIET Features & Benefits
      • Uses artificial intelligence to build wrappers automatically
      • Little to no user intervention
      • User can easily extract and manipulate information
    • Car website:
      • Characteristics: List of cars by name with description, date, price, picture … Over 100 pages of data!
      • Problem: No local search engine.
    • But … I am looking for a 2005 Acura MDX or something like that!
    • Job website:
      • Characteristics: List of jobs by title with a short description, salary, and city. Over 800 jobs. Local search engine. Sort capabilities.
      • Problem: We can only see 10 jobs per page. Unable to search by salary range. Unable to sort by city.
    • But … I want to see all jobs paying over $75,000 on a single page and save it for future reference.
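
To make the job-site scenario concrete, here is a minimal sketch of what filtering and sorting already-extracted listings could look like. It does not show how DIET works internally; the `Job` record, the `jobs_over` helper, and the sample data are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical record for one extracted job listing."""
    title: str
    city: str
    salary: int  # annual salary in dollars


def jobs_over(jobs, min_salary):
    """Return jobs paying more than min_salary, sorted by city, then title."""
    selected = [job for job in jobs if job.salary > min_salary]
    return sorted(selected, key=lambda job: (job.city, job.title))


# Made-up sample records standing in for listings extracted from a job site.
SAMPLE_JOBS = [
    Job("Software Developer", "Montréal", 82000),
    Job("Technician", "Laval", 48000),
    Job("Project Manager", "Laval", 91000),
    Job("Analyst", "Québec", 75000),
]

if __name__ == "__main__":
    # Everything over the threshold, on "one single page".
    for job in jobs_over(SAMPLE_JOBS, 75000):
        print(f"{job.city}: {job.title} (${job.salary:,})")
```

Once listings are held as plain records, the missing site features (salary range, sort by city, a single page of results) reduce to an ordinary filter and sort.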
    • DIET Technologies
      • DIET Core Web Services
        • Accessible only to certified clients
      • DIET Web Application
        • Users and services managers
        • Web based application (JSP/Servlet/JavaServer Faces/JavaBean)
      • Based on Java EE 5/GlassFish/MySQL technology
    • Univalor Website
      • List of new technologies grouped by domain
      • Simple search engine available
    • Using DIET
      • We want to extract and then manipulate all available technologies
      • Give the Univalor technologies URL to DIET:
        • http://www.univalor.ca/companies_available_technologies.asp
    • Wrappers are generated
      • DIET creates a wrapper by learning the structure of Univalor's web pages.
      • DIET extracts data through the wrapper.
      • DIET displays the results
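
The slides do not describe how DIET learns a page's structure, so the following is only a toy illustration of the general idea of a wrapper: learn a structural pattern from one page, then reuse it to pull records from another. It assumes a naive heuristic (the most frequent tag path holds the records), which is not DIET's method. All names (`PathTextCollector`, `learn_wrapper`, `extract`) and the sample pages are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class PathTextCollector(HTMLParser):
    """Collect (tag path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.items = []   # (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append(("/".join(self.stack), text))


def collect(html):
    parser = PathTextCollector()
    parser.feed(html)
    parser.close()
    return parser.items


def learn_wrapper(sample_html):
    """Naive 'wrapper': the tag path that carries the most text nodes."""
    counts = Counter(path for path, _ in collect(sample_html))
    return counts.most_common(1)[0][0]


def extract(html, wrapper_path):
    """Apply a learned wrapper: keep only text found at that tag path."""
    return [text for path, text in collect(html) if path == wrapper_path]


# Made-up pages with the same layout but different records.
SAMPLE_PAGE = """
<html><body><h1>Available technologies</h1>
<ul>
  <li><a href="/t/1">Technology A</a></li>
  <li><a href="/t/2">Technology B</a></li>
  <li><a href="/t/3">Technology C</a></li>
</ul>
<p>Contact us</p></body></html>
"""

NEW_PAGE = """
<html><body><h1>Available technologies</h1>
<ul>
  <li><a href="/t/4">Technology D</a></li>
  <li><a href="/t/5">Technology E</a></li>
</ul>
<p>Contact us</p></body></html>
"""

if __name__ == "__main__":
    wrapper = learn_wrapper(SAMPLE_PAGE)
    print(wrapper)
    print(extract(NEW_PAGE, wrapper))
```

A real wrapper-induction system would need to handle multi-field records, layout variations, and pagination; the sketch only shows the learn-once, extract-many pattern.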
    • Manipulate information with DIET
      • Once the information has been extracted, it can be manipulated.
    • Plug-in opportunity
      • DIET Core Web Services can be used by third party clients
      • Internet Explorer and Mozilla Firefox integration
    • Export capabilities
      • Extracted information can be exported to multiple storage formats
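
The slides do not name the supported formats. As a rough sketch of what exporting extracted records might involve, the example below assumes CSV and JSON as targets and uses only the Python standard library; the helper names and the sample records are hypothetical.

```python
import csv
import io
import json


def to_csv(records, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize a list of dicts to indented JSON, keeping accented characters."""
    return json.dumps(records, ensure_ascii=False, indent=2)


# Made-up records standing in for extracted technology listings.
RECORDS = [
    {"name": "Technology A", "domain": "Software"},
    {"name": "Technology B", "domain": "Materials"},
]

if __name__ == "__main__":
    print(to_csv(RECORDS, ["name", "domain"]))
    print(to_json(RECORDS))
```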
    • And more …
      • Users can create their own Wrappers
      • DIET can be the perfect tool for DEEP search
    • Research and Development:
    • Samuel Pierre, Ph.D.
    • [email_address]
    • Commercialization and licensing
    • Didier Leconte, MBA
    • [email_address]
    • Thomas Martinuzzo, Jr. Eng.
    • [email_address]
    Thanks!