In today's web
Information Extraction
from the Web
Benjamin Habegger
University of Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205
Seminar on Information Extraction from the Web
ENSIAS, Rabat, Morocco - June 19, 2013
About Me
@b_habegger
http://www.linkedin.com/in/benjaminhabegger
benjamin.habegger@insa-lyon.fr
Where is the web today?
Web of humans
● Interlinked documents
● Social Web
● Web 2.0
● Crowd-sourcing
Web of machines
● REST / API
● Service Interaction
● Open Data
● Semantic Web
Somehow we're creating 2 webs
Web of humans
● HTML
● JavaScript
● CSS
Web of data
● RDF
● REST
● SPARQL
There are some interactions
Open data still has some way to go
Data thrown on the web in its original format
● Not many standardized formats
● Not many standardized semantics
● Can be
– An Excel or CSV file
– A REST service
Still the Linked Open Data and
Semantic Web are emerging
● Vocabularies
– FOAF
– Dublin Core
– …
● Datasets
– DBpedia
– ...
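As a rough illustration of what a shared vocabulary buys us, here is a minimal sketch that describes a person with FOAF terms and serializes the result as N-Triples. The `example.org` URIs and the helper names (`escape_literal`, `person_triples`) are made up for this example; only the FOAF and RDF namespace URIs are real.

```python
# Real namespace URIs for the FOAF vocabulary and for rdf:type.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def person_triples(uri, name, homepage):
    """Return N-Triples lines describing one person with FOAF terms."""
    return [
        f"<{uri}> <{RDF_TYPE}> <{FOAF}Person> .",
        f'<{uri}> <{FOAF}name> "{escape_literal(name)}" .',
        f"<{uri}> <{FOAF}homepage> <{homepage}> .",
    ]


# Hypothetical person and URIs, used only to show the output shape.
triples = person_triples(
    "http://example.org/people/alice",
    "Alice Martin",
    "http://example.org/~alice",
)
print("\n".join(triples))
```

Because the property names come from a published vocabulary rather than an ad hoc column header, any consumer that knows FOAF can interpret the name and homepage without a site-specific mapping.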
But still, can't we dream a little?
Having (a little) smarter machines...
Shared web
Learning capabilities
Making our web robots smarter
could even help improve our web...
What does the following query give you today?
“lyon informatique emploi” (Lyon computing jobs)
Do you see any jobs there?
Nope, just a listing of pages that
contain lists of jobs...
There's still a long way to go...
but information extraction from the web
is a little step in making machines smarter
And there are many people
interested out there...
Freelancer.com search for web scraping
So where does information
extraction from the web fit in?
Open Data
Linked Data
Semantic Web
Information Extraction
Machine Learning
Pattern Mining
Data Integration
Standardized Vocabularies
Web Scraping
And what is it about?
...
Data for humans
Data for machines
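To make the step from "data for humans" to "data for machines" concrete, here is a minimal sketch using only Python's standard `html.parser`. It assumes a made-up page layout where each job is an `<li class="job">` holding `<span>` elements whose class names the field; the markup, the class names and the `JobListParser` / `extract_jobs` helpers are all hypothetical, and a real site would need its own rules.

```python
from html.parser import HTMLParser


class JobListParser(HTMLParser):
    """Collect one dict per <li class="job"> item (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # record being filled, None outside a job item
        self._field = None    # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "job":
            self._current = {}
        elif tag == "span" and self._current is not None:
            self._field = attrs.get("class")

    def handle_data(self, data):
        # Accumulate text only while inside a named span of a job item.
        if self._current is not None and self._field:
            previous = self._current.get(self._field, "")
            self._current[self._field] = previous + data

    def handle_endtag(self, tag):
        if tag == "span":
            if self._current is not None and self._field:
                value = self._current.get(self._field, "")
                self._current[self._field] = value.strip()
            self._field = None
        elif tag == "li" and self._current is not None:
            self.records.append(self._current)
            self._current = None


def extract_jobs(html):
    """Turn the human-readable listing into a list of records."""
    parser = JobListParser()
    parser.feed(html)
    parser.close()
    return parser.records


# Hypothetical page fragment standing in for a job board result page.
PAGE = """
<ul>
  <li class="job"><span class="title">Python developer</span> <span class="city">Lyon</span></li>
  <li class="job"><span class="title">Data engineer</span> <span class="city">Villeurbanne</span></li>
</ul>
"""

for job in extract_jobs(PAGE):
    print(job["title"], "-", job["city"])
```

The extraction rules here are written by hand; the brittle part is that they break as soon as the page layout changes.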
How do we do that?
We'll see that after the break :)
http://www.slideshare.net/BenjaminHabegger/2013-06ensiasrabatiealg
