Talk given at the 4th International DBpedia Meeting in Poznan, Poland.

Fact Extraction from Wikipedia

  1. Cutting Long Stories Short: Fact Extraction from Wikipedia. Marco Fossati, fossati@spaziodati.eu. Poznan, 25th June 2015.
  2. What? A Google Summer of Code Project for DBpedia.
  3. What? Teaching Machines to Read Natural Language.
  4. Why? Text Contains a Huge Amount of Knowledge.
  5. Why? DBpedia Focuses on Semi-structured Data. Discovery of New Relations. Automatic Knowledge Base Population.
  6. How? Machine Learning + Lexical Semantics.
  7. How? Poland, victory, World Cup, 2014: “Poland won the World Cup in 2014”.
  8. Approach (The Data-driven Way): 1. Lexical Units (1.1. Extraction via POS Tagging; 1.2. Statistical Ranking). 2. Frame Database (FrameNet, Kicktionary).
  9. Approach (The Data-driven Way): 3. Frame + Frame Elements Classification (Unsupervised, Rule-based; Supervised). 4. Crowdsourced Training Set Construction. 5. RDF Serialization.
  10. Crowdsourcing the Annotation: Label words with Frame Elements.
  11. Use Case: Soccer Domain. Widely Represented (223,000 articles). Lots of Semi-structured Data. Italian Wikipedia.
  12. Wanna contribute? https://github.com/dbpedia/fact-extractor
  13. That’s all Folks! Marco Fossati, fossati@spaziodati.eu
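
The idea on slides 7 to 9 (spot a lexical unit, assign a frame, label the frame elements, serialize to RDF) can be illustrated with a toy sketch. This is not the fact-extractor code. It is a minimal rule-based example, and everything in it is an assumption made for illustration: the tiny lexicon, the team and competition lists, the role names `Winner`, `Competition` and `Time`, and the `http://example.org/` namespace. Only the frame label `Victory` and the sample sentence come from the slides.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical toy lexicon: lexical unit -> frame label.
# A real system would draw these from a frame database.
LEXICAL_UNITS = {"win": "Victory", "wins": "Victory", "won": "Victory"}

# Hypothetical gazetteers standing in for real entity recognition.
TEAMS = ("Poland", "Germany", "Italy")
COMPETITIONS = ("World Cup",)
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

# Placeholder namespace, not the one used by DBpedia.
BASE = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Fact:
    frame: str
    elements: Dict[str, str] = field(default_factory=dict)


def first_match(candidates, sentence: str) -> Optional[str]:
    """Return the first candidate that occurs as a whole phrase."""
    for candidate in candidates:
        if re.search(r"\b" + re.escape(candidate) + r"\b", sentence):
            return candidate
    return None


def extract_fact(sentence: str) -> Optional[Fact]:
    """Rule-based frame and frame element labelling for one sentence."""
    tokens = re.findall(r"\w+", sentence.lower())
    frame = next((LEXICAL_UNITS[t] for t in tokens if t in LEXICAL_UNITS), None)
    if frame is None:
        return None  # no known lexical unit, so no frame is evoked

    fact = Fact(frame)
    winner = first_match(TEAMS, sentence)
    if winner:
        fact.elements["Winner"] = winner
    competition = first_match(COMPETITIONS, sentence)
    if competition:
        fact.elements["Competition"] = competition
    year = YEAR.search(sentence)
    if year:
        fact.elements["Time"] = year.group(0)
    return fact


def to_ntriples(fact: Fact, fact_id: str) -> List[str]:
    """Serialize a fact as N-Triples lines with plain string literals."""
    subject = f"<{BASE}fact/{fact_id}>"
    lines = [f"{subject} <{RDF_TYPE}> <{BASE}frame/{fact.frame}> ."]
    for role, value in fact.elements.items():
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} <{BASE}role/{role}> "{literal}" .')
    return lines


if __name__ == "__main__":
    example = extract_fact("Poland won the World Cup in 2014")
    if example is not None:
        print("\n".join(to_ntriples(example, "1")))
```

Hand-written rules like these only cover the sentences they were written for. The supervised classification step listed on slide 9, trained on the crowdsourced annotations, would replace the lookups in `extract_fact`.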
