Crowdsourcing Event Extraction 
Aljaž Košmerlj, Jenya Belyaeva, Gregor Leban, 
Blaž Fortuna, Marko Grobelnik 
Jožef Stefan Institute
Goal 
Identify and extract features (info-box) about events (e.g. earthquake, product launch…) reported in the news. 
Automatically extracting structured information about events from news articles is challenging. 
Even when limited to news articles, there is little structure in the text 
Human annotators can alleviate shortcomings of automatic approaches 
Problem: expert annotators are expensive 
Solution: use crowdsourcing to lower costs
Event type example 
„San Bernardino, California was struck by a moderate earthquake on Thursday night, with shaking felt from Los Angeles to Orange County. 
A preliminary reading by the U.S. Geological Survey showed a 4.5-magnitude quake struck at 7:49pm.…“ 
Event type: earthquake 
Roles: 
• magnitude – What was the magnitude of the earthquake? 
• location – Where did the earthquake occur? 
• time – At what time did the earthquake occur? 
•…
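An event type with its roles works like a small form whose questions are answered with entities from the story. The Python sketch below shows one way to represent such a template and fill it from the example article. The classes `EventTemplate` and `Annotation` and the method `fill_role` are hypothetical names for illustration, not the authors' data model. Because roles live in a plain dictionary, adding a role is just adding an entry, which is one simple way to keep a schema open and extensible.

```python
from dataclasses import dataclass, field


@dataclass
class EventTemplate:
    """Hypothetical event-type template: a name plus one question per role."""
    event_type: str
    roles: dict[str, str]  # role name -> question shown to the annotator


@dataclass
class Annotation:
    """A (possibly partial) filling of a template for one story."""
    template: EventTemplate
    values: dict[str, str] = field(default_factory=dict)

    def fill_role(self, role: str, entity: str) -> None:
        # Only roles defined by the template may be filled.
        if role not in self.template.roles:
            raise KeyError(f"unknown role: {role}")
        self.values[role] = entity

    def missing_roles(self) -> list[str]:
        # Roles the annotator has not filled yet, in template order.
        return [r for r in self.template.roles if r not in self.values]


earthquake = EventTemplate(
    event_type="earthquake",
    roles={
        "magnitude": "What was the magnitude of the earthquake?",
        "location": "Where did the earthquake occur?",
        "time": "At what time did the earthquake occur?",
    },
)

ann = Annotation(earthquake)
ann.fill_role("magnitude", "4.5")
ann.fill_role("location", "San Bernardino, California")
ann.fill_role("time", "7:49pm")
print(ann.values)
```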
Constraints and considerations 
A price of $1 to $10 per article is acceptable 
The annotation process needs to be guided (semi-automatic) in order to be efficient, reliable and cheap. 
We can assume some highly skilled workers (e.g. editors) 
The schema of the extracted data has to be open and extensible
Event extraction subtasks 
1. Identify articles that can be meaningfully structured 
2. Identify a set of event types 
3. For each event type identify a set of roles (a template) 
4. For each new article identify its event type and fill the roles with the entities from the article
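The four subtasks can be chained into a single pipeline. The following is a minimal sketch under assumptions: each stage is a plain callable (`is_structurable`, `classify` and `fill` are hypothetical names), and the event types with their role templates are given as a dictionary. In the crowdsourced setting, any of these callables could stand for a human annotator assisted by a recommender rather than for code.

```python
from typing import Callable, Optional

Template = dict[str, str]  # hypothetical representation: role name -> question


def extract_event(
    article: str,
    is_structurable: Callable[[str], bool],         # subtask 1
    templates: dict[str, Template],                  # subtasks 2 and 3: event types -> roles
    classify: Callable[[str], str],                  # subtask 4a: choose an event type
    fill: Callable[[str, Template], dict[str, str]], # subtask 4b: fill roles with entities
) -> Optional[dict]:
    """Run the subtasks in order; return None if the article cannot be structured."""
    if not is_structurable(article):
        return None
    event_type = classify(article)
    if event_type not in templates:
        return None  # no template (yet) for this event type
    return {"event_type": event_type, "roles": fill(article, templates[event_type])}


templates = {"earthquake": {"magnitude": "What was the magnitude of the earthquake?"}}
article = "A preliminary reading showed a 4.5-magnitude quake struck at 7:49pm."

# Toy stand-ins for the stages (illustration only).
result = extract_event(
    article,
    is_structurable=lambda text: "quake" in text,
    templates=templates,
    classify=lambda text: "earthquake",
    fill=lambda text, tpl: {"magnitude": "4.5"},
)
print(result)
```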
Annotation interface 
We annotate stories, not individual articles. A story is a cluster of articles about the same event. 
Sources of clusters: Event Registry, Google clusters… 
The articles are sent through the Enrycher* service (POS tagging, named entity extraction…) 
Entities proposed for annotation are currently identified using only POS tags (sequences of numerals and nouns) 
Online annotation interface 
Front end: JavaScript 
Back end: Python 
* http://enrycher.ijs.si/
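Proposing entities from POS tags alone amounts to grouping maximal runs of numeral and noun tokens. A minimal sketch, assuming Penn Treebank tags (`CD` for numerals, `NN`, `NNS`, `NNP`, `NNPS` for nouns); the tag set Enrycher actually returns may differ, and the tagged sentence below is hand-labelled for illustration, not real tagger output. Note that a generic noun such as "reading" is proposed too, so POS-only proposals over-generate and the annotator picks among them.

```python
def candidate_entities(tagged: list[tuple[str, str]]) -> list[str]:
    """Return maximal runs of numeral/noun tokens as candidate entities.

    Assumes Penn Treebank tags: CD = cardinal number, NN* = nouns.
    """
    candidates: list[str] = []
    run: list[str] = []
    for token, tag in tagged:
        if tag == "CD" or tag.startswith("NN"):
            run.append(token)          # extend the current numeral/noun run
        elif run:
            candidates.append(" ".join(run))  # a non-matching tag closes the run
            run = []
    if run:
        candidates.append(" ".join(run))      # flush a run at the end of the sentence
    return candidates


# Hand-tagged sentence adapted from the earthquake example.
tagged = [
    ("A", "DT"), ("preliminary", "JJ"), ("reading", "NN"), ("by", "IN"),
    ("the", "DT"), ("U.S.", "NNP"), ("Geological", "NNP"), ("Survey", "NNP"),
    ("showed", "VBD"), ("a", "DT"), ("4.5", "CD"), ("magnitude", "NN"),
    ("quake", "NN"), ("struck", "VBD"), ("at", "IN"), ("7:49pm", "CD"),
]
print(candidate_entities(tagged))  # ['reading', 'U.S. Geological Survey', '4.5 magnitude quake', '7:49pm']
```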
Interface 
http://aidemo.ijs.si/eventAnnotation (pick any username, leave password empty)
Recommender of event types 
QMiner [1] SVM classifier 
Training data: 
100 stories, ~20 per event type 
5 event types: bombing, product launch, protest, road accident, earthquake 
Features: 
event: concepts, title, summary 
articles: concepts, title 
Leave-one-out testing: 
CA = 0.67 (classification accuracy) 
With 50 non-event stories: 
CA = 0.54 
[1] https://github.com/qminer/qminer
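Leave-one-out testing trains on all stories but one, predicts the held-out story, and repeats for every story; assuming CA denotes classification accuracy, it is the fraction of held-out stories predicted correctly. The sketch below is not the QMiner SVM: to stay dependency-free it swaps in a nearest-centroid classifier over bag-of-words vectors, and the toy stories are invented. Sorting labels by similarity, as `rank_labels` does, also gives an ordered list that could serve as event-type recommendations; adding non-event stories as an extra label would be one option, but the slides do not say how they were handled.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words term counts (lower-cased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(examples: list[tuple[str, str]]) -> dict[str, Counter]:
    # Summed term counts per label; cosine similarity ignores vector length,
    # so the sum points in the same direction as the mean.
    centroids: dict[str, Counter] = {}
    for text, label in examples:
        centroids.setdefault(label, Counter()).update(bow(text))
    return centroids


def rank_labels(centroids: dict[str, Counter], text: str) -> list[str]:
    """Event types ordered from most to least similar to the text."""
    vec = bow(text)
    return sorted(centroids, key=lambda lab: cosine(vec, centroids[lab]), reverse=True)


def leave_one_out_ca(examples: list[tuple[str, str]]) -> float:
    correct = 0
    for i, (text, label) in enumerate(examples):
        model = train_centroids(examples[:i] + examples[i + 1:])  # hold out story i
        ranking = rank_labels(model, text)
        if ranking and ranking[0] == label:
            correct += 1
    return correct / len(examples)


stories = [
    ("earthquake magnitude quake shaking", "earthquake"),
    ("quake magnitude epicenter", "earthquake"),
    ("protest crowd march police", "protest"),
    ("protesters march crowd square", "protest"),
]
print(leave_one_out_ca(stories))  # 1.0
```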
Evaluation
Evaluation results 
11 annotators 
10 stories 
Overall stats: 
No. of entities annotated: 13.4 ± 6.9 
% of entities annotated: 12.1% ± 3.1% 
No. of roles filled: 6.2 ± 0.9 
Pairwise annotator agreement: 
No. of agreed event types: 5.9 ± 2.0 
Jaccard index per story: 0.25 ± 0.09 
Recommender success: 
1st recommendation: 6.6 ± 1.9 
In first two recommendations: 7.2 ± 2.0
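Agreement and recommender figures of this kind can be computed along the following lines. This is a sketch under assumptions the slides do not spell out: each annotator's annotation of a story is reduced to a set of entity strings, the per-story Jaccard index |A ∩ B| / |A ∪ B| is averaged over all annotator pairs, and "recommender success" is taken to count stories whose chosen event type appears among the first k recommendations. All data in the example is made up.

```python
from itertools import combinations
from statistics import mean


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets count as full agreement (a convention)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(annotations: dict[str, set[str]]) -> float:
    """Average Jaccard index over all annotator pairs for one story (needs >= 2 annotators)."""
    return mean(jaccard(a, b) for a, b in combinations(annotations.values(), 2))


def top_k_hits(rankings: list[list[str]], chosen: list[str], k: int) -> int:
    """Number of stories whose chosen event type is among the first k recommendations."""
    return sum(1 for ranked, label in zip(rankings, chosen) if label in ranked[:k])


story = {
    "annotator_1": {"4.5", "San Bernardino", "7:49pm"},
    "annotator_2": {"4.5", "7:49pm"},
    "annotator_3": {"4.5", "Thursday night"},
}
print(round(mean_pairwise_jaccard(story), 3))  # 0.417

rankings = [["earthquake", "protest"], ["protest", "bombing"], ["bombing", "road accident"]]
chosen = ["earthquake", "bombing", "road accident"]
print(top_k_hits(rankings, chosen, 1), top_k_hits(rankings, chosen, 2))  # 1 3
```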
Future work 
Improve recommender 
use predicates in features 
Testing in a „professional“ environment 
improvement in speed? 
what is a „correct“ annotation? 
Building a taxonomy of event types 
active learning
Thank you for your attention!