Thomas Martinuzzo, Jr. Eng.
 

Deep Information and Extraction Tool


Published in: Business, Technology
  • Good morning, my name is Thomas and I work for Univalor, the commercialization arm of the Université de Montréal.

    1. 1. Thomas Martinuzzo, Jr. Eng.
    2. 3. What is DIET?
            • DIET is an information extraction and manipulation tool.
            • DIET can extract information from the deep web by understanding page structures.
          Surface web: 20 billion pages indexed by search engines.
          Deep web: more than 600 billion pages.
          "The 60 largest Deep Web sources contain 84 billion pages of content. That's about 750 terabytes of information, sufficient by themselves to exceed the size of the surface Web by 40 times." (Brightplanet.com)
          Picture from Maxumowners.org
    3. 4. DIET Features & Benefits
            • Uses artificial intelligence to build wrappers automatically
            • Little or no user intervention
            • Users can easily extract and manipulate information
    4. 5. Car website:
            • Characteristics: list of cars by name with description, date, price, picture … Over 100 pages of data!
            • Problem: no local search engine.
          But … I am looking for an Acura MDX 2005 or something like that!
          …
          Job website:
            • Characteristics: list of jobs by title with a short description, salary, and city. Over 800 jobs. Local search engine. Sort capabilities.
            • Problem: only 10 jobs are shown per page. Unable to search by salary range. Unable to sort by city.
          But … I want to see all jobs over $75,000 on a single page and save it for future reference.
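The job-site wish above (all jobs over $75,000 on one page, sortable by city) becomes a simple filter and sort once the listings are available as structured records. The following is a minimal sketch of that idea, not DIET's implementation: the `Job` record, the `jobs_over` helper, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    title: str
    city: str
    salary: int  # annual salary in dollars


def jobs_over(jobs, min_salary):
    """Return the jobs paying more than min_salary, sorted by city, then title."""
    selected = [job for job in jobs if job.salary > min_salary]
    return sorted(selected, key=lambda job: (job.city, job.title))


# Hypothetical records standing in for the 800+ extracted listings.
jobs = [
    Job("Software Architect", "Montreal", 95000),
    Job("Technician", "Quebec", 48000),
    Job("Project Manager", "Laval", 80000),
    Job("Analyst", "Montreal", 75000),
]

# One "page" with every matching job, regardless of the site's 10-per-page limit.
for job in jobs_over(jobs, 75000):
    print(f"{job.city:10} {job.title:20} {job.salary}")
```

The point of the sketch is that the site's own limits (10 results per page, no salary filter) stop mattering once the data has been extracted into records.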
    5. 6. DIET Technologies
            • DIET Core Web Services
                • Accessible only to certified clients
            • DIET Web Application
                • Users and service managers
                • Web-based application (JSP/Servlet/JavaServer Faces/JavaBean)
            • Based on Java EE 5/GlassFish/MySQL technology
    6. 7. Univalor Website
            • List of new technologies grouped by domain
            • Simple search engine available
    7. 8. Using DIET
            • We want to extract and then manipulate all available technologies
            • Give the Univalor technologies URL to DIET:
                • http://www.univalor.ca/companies_available_technologies.asp
    8. 9. Wrappers are generated
            • DIET creates a wrapper by learning the structure of the Univalor web pages.
            • DIET extracts data through the wrapper.
            • DIET displays the results.
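To make the "learn the structure, then extract through the wrapper" steps concrete, here is a deliberately simple sketch. It is not DIET's algorithm, which the slides do not describe. It assumes a wrapper can be approximated as the set of tag paths whose text repeats once per record; `learn_wrapper`, `extract`, and the sample markup are hypothetical.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class _TextByPath(HTMLParser):
    """Collect text nodes keyed by the tag path that leads to them."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.texts = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get a closing tag
        cls = dict(attrs).get("class")
        self.stack.append(f"{tag}.{cls}" if cls else tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".")[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts["/".join(self.stack)].append(text)


def collect(html):
    parser = _TextByPath()
    parser.feed(html)
    parser.close()
    return parser.texts


def learn_wrapper(sample_html):
    """Induce a wrapper: the paths that repeat the most common number of times (>1)."""
    counts = {path: len(values) for path, values in collect(sample_html).items()}
    repeated = [n for n in counts.values() if n > 1]
    if not repeated:
        return []
    record_count = Counter(repeated).most_common(1)[0][0]
    return [path for path, n in counts.items() if n == record_count]


def extract(wrapper, html):
    """Apply a learned wrapper to a page and return one tuple per record."""
    texts = collect(html)
    return list(zip(*(texts.get(path, []) for path in wrapper)))


# Hypothetical markup; the real Univalor page structure is not shown in the slides.
sample = """
<ul>
  <li class="tech"><span class="name">Biosensor</span><span class="domain">Health</span></li>
  <li class="tech"><span class="name">Solar film</span><span class="domain">Energy</span></li>
</ul>
"""
wrapper = learn_wrapper(sample)
print(wrapper)
print(extract(wrapper, sample))
```

A wrapper learned from one listing page can then be applied to other pages that share the same layout, which is what makes the extraction automatic. Real wrapper induction has to cope with missing fields, nested records, and noisy markup, none of which this sketch handles.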
    9. 10. Manipulate information with DIET
            • Once the information has been extracted, it can be manipulated.
    10. 11. Plug-in opportunity
            • DIET Core Web Services can be used by third-party clients
            • Internet Explorer and Mozilla Firefox integration
          Export capabilities
            • Extracted information can be exported to multiple storage formats
          And more …
            • Users can create their own wrappers
            • DIET can be the perfect tool for deep web search
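The slides do not say which storage formats DIET exports to. As an illustration only, the sketch below assumes CSV and JSON and uses the Python standard library; `to_csv`, `to_json`, and the sample records are hypothetical.

```python
import csv
import io
import json


def to_csv(records, fieldnames):
    """Serialize a list of dict records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize the same records to pretty-printed JSON text."""
    return json.dumps(records, indent=2, ensure_ascii=False)


# Hypothetical extracted records.
records = [
    {"name": "Biosensor", "domain": "Health"},
    {"name": "Solar film", "domain": "Energy"},
]

print(to_csv(records, ["name", "domain"]))
print(to_json(records))
```

Keeping the extracted data as plain records makes each additional format a small, independent serializer.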
    11. 12. Research and development:
              Samuel Pierre, Ph.D.
              [email_address]
            Commercialization and licensing:
              Didier Leconte, MBA
              [email_address]
              Thomas Martinuzzo, Jr. Eng.
              [email_address]
            Thanks!
