Computational Support in eHumanities
Proof of concept produced during
CLARIN’s Creative Camp
Talk of Europe
Wim Peters
Adam Funk
University of Sheffield, UK
w.peters@sheffield.ac.uk
a.funk@sheffield.ac.uk
CLARIN’s Creative Camp
Talk of Europe
• Our main aim in this event:
• Term identification and structuring
in ToE and UK Parliament data
• Linking ToE and UK Parliament terminology
• Automatic enrichment of ToE data set
• http://linkedpolitics.ops.few.vu.nl/home
Data set 1
• Talk of Europe data set
• Plenary debates of the European Parliament
as Linked Open Data
• http://linkedpolitics.ops.few.vu.nl/
Data set 2
• UK Parliamentary Archives
http://www.parliament.uk/business/publications/parliamentary-archives/
ParlParse
• Speeches scraped from UK Parliamentary web
site
• Converted into structured XML
representations
http://parser.theyworkforyou.com/
Workflow
Output
• For terms in each data set:
– Terms
– Term hierarchies
– Term clusters
– Sentence-based sentiment context
• Between data sets:
– Term relatedness between terms
• To identify and extract relevant information from the source
material, we use the GATE architecture for the production of
semantic metadata in the form of text annotations.
• GATE is a framework for language engineering applications, which
supports efficient and robust text processing including functionality
for both manual and automatic annotation.
• It is highly scalable and has been applied in many large text
processing projects.
• It is an open source desktop application written in Java that
provides a user interface for professional linguists and text
engineers to bring together a wide variety of natural language
processing tools and apply them to a set of documents.
General Architecture for Text
Engineering
• General Architecture for Text Engineering (GATE)
• open source framework which
supports plug-in NLP components
to process a corpus of text.
http://gate.ac.uk/
Free system download and training courses
LEX 2014, Ravenna, Italy
General Architecture for Text
Engineering
Advantages
• Reproducibility
• Reusability
• Flexibility
• Customisability to scholarly requirements
regarding research questions and analysis
methodology
• http://www.gate.ac.uk
Text Annotations
Term Extraction
• TermRaider
• http://www.dcs.shef.ac.uk/~wim/termraider.html
• automatically provides domain-specific noun phrase
term candidates from a text corpus together with a
statistically derived termhood score.
• Possible terms are filtered by means of a multi-word-
unit grammar that defines the possible sequences of
part of speech tags constituting noun phrases.
• It computes various termhood scores, such as Kyoto
Domain Relevance and term frequency/inverse document
frequency (TF-IDF). The scores indicate the salience of
each term candidate for each document in the corpus.
KYOTO domain relevance score
• df × (1 + nh)
– df: number of documents in the current corpus
containing the term
– nh: number of hyponymic term candidates
• W. Bosma and P. Vossen. Bootstrapping language-neutral term extraction.
In 7th Language Resources and Evaluation Conference (LREC), Valletta,
Malta (2010)
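A minimal Python sketch of the score above, assuming df and nh have already been counted per term. The function name is illustrative and not part of TermRaider. The slides later apply thresholds on a 0-100 scale, but the normalisation step is not described, so this sketch returns the raw value only.

```python
def kyoto_domain_relevance(df: int, nh: int) -> int:
    """Raw score df * (1 + nh), as stated on the slide.

    df: number of documents in the corpus containing the term
    nh: number of hyponymic term candidates of the term
    """
    if df < 0 or nh < 0:
        raise ValueError("counts must be non-negative")
    return df * (1 + nh)


# Hypothetical counts: a term found in 12 documents with 5 hyponymic candidates.
print(kyoto_domain_relevance(12, 5))  # 72
```

A term with no hyponymic candidates simply scores its document frequency.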
Tf-Idf
(Wikipedia)
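As an illustration, one common TF-IDF variant (raw term count multiplied by the log of the inverse document fraction) can be sketched as follows. This is an assumption for demonstration: the slides do not state which variant TermRaider uses, and the function name and input format (each document as a list of already extracted term candidates) are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: score} mapping per document.

    tf  = raw count of the term in the document
    idf = log(N / df), with N documents and df documents containing the term
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf})
    return scores


# Hypothetical toy corpus of two "documents".
docs = [["human rights", "fight", "fight"], ["human rights", "pension"]]
scores = tf_idf(docs)
```

A term that occurs in every document, such as "human rights" in the toy corpus, gets a score of zero under this variant, because it does not distinguish any one document.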
Term Relatedness 1: Hyponyms
(rdf: skos:narrowerTransitive)
• Hierarchical relations between terms based on head phrase matching
• fight
– fight against all form of intolerance
• fight
– fight against serious crime and terrorism
• fight
– fight against all form of intolerance and discrimination
• fight
– fight against illegal drug and the organised crime
• fight
– fight against corruption and organised crime
• control
– efficient control
• efficient control of EU fund
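The examples above suggest that a longer term is treated as narrower than a shorter term it extends. A rough Python sketch under that assumption follows: token-level prefix or suffix matching stands in for true head phrase matching, which would need a parse or a part-of-speech pattern. The function names are illustrative.

```python
def is_hyponym(candidate: str, broader: str) -> bool:
    """True if `candidate` extends `broader` on the left or on the right.

    Assumption: prefix/suffix token matching approximates head matching,
    e.g. "efficient control" -> "control", "fight against X" -> "fight".
    """
    c, b = candidate.split(), broader.split()
    if not b or len(c) <= len(b):
        return False
    return c[:len(b)] == b or c[-len(b):] == b


def hyponym_pairs(terms: list[str]) -> list[tuple[str, str]]:
    """All (narrower, broader) pairs among the given term candidates."""
    return [(n, b) for n in terms for b in terms if is_hyponym(n, b)]


terms = [
    "fight",
    "fight against corruption and organised crime",
    "control",
    "efficient control",
    "efficient control of EU fund",
]
for narrower, broader in hyponym_pairs(terms):
    print(f"{broader} -> {narrower}")
```

In this sketch "efficient control of EU fund" is linked only to "efficient control"; its relation to "control" follows transitively, which fits the use of skos:narrowerTransitive.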
Term relatedness 2: Clusters
• Compute Pointwise Mutual Information
– Pair-wise association score for terms that co-occur
within a context window (in our case sentences)
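A minimal Python sketch of sentence-level PMI, assuming each sentence has already been reduced to the set of term candidates it contains. It uses the standard definition log2(p(a,b) / (p(a) p(b))) with probabilities estimated as sentence fractions. The function name and input format are illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(sentences: list[set[str]]) -> dict[tuple[str, str], float]:
    """PMI for every pair of terms that co-occur in at least one sentence."""
    n = len(sentences)
    term_count = Counter()
    pair_count = Counter()
    for terms in sentences:
        term_count.update(terms)
        # Sort so that each unordered pair has exactly one key.
        pair_count.update(combinations(sorted(terms), 2))
    return {
        (a, b): math.log2((c / n) / ((term_count[a] / n) * (term_count[b] / n)))
        for (a, b), c in pair_count.items()
    }


# Hypothetical toy input: four sentences, two of which share both terms.
sentences = [
    {"human rights", "freedom"},
    {"human rights", "freedom"},
    {"pension"},
    {"employment"},
]
print(pmi_scores(sentences)[("freedom", "human rights")])  # 1.0
```

Pairs that never co-occur are left out of the result rather than assigned negative infinity.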
Cluster creation
• Simple clique algorithm
• https://en.wikipedia.org/wiki/Cluster_analysis
• Each cluster member (a term candidate with a Kyoto
Domain Relevance score > 70/100) is connected to all
other cluster members by a PMI score >
70/100
– Result: “statistical thesaurus”
– strongly associated groups of words
– Use: enhance data exploration by expanding
searches with related terms (query expansion)
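The cluster definition above amounts to finding cliques in a graph whose nodes are high-relevance terms and whose edges are high-PMI pairs. The slides only say "simple clique algorithm"; the Python sketch below substitutes the standard Bron-Kerbosch enumeration of maximal cliques, which may differ from what was actually used. It assumes both scores are already normalised to 0-100, and all names are illustrative.

```python
def clique_clusters(relevance, pmi, min_relevance=70, min_pmi=70):
    """Maximal cliques (size >= 2) of terms linked pairwise by strong PMI.

    relevance: {term: score on a 0-100 scale}
    pmi:       {(term_a, term_b): score on a 0-100 scale}
    """
    nodes = {t for t, s in relevance.items() if s > min_relevance}
    adj = {t: set() for t in nodes}
    for (a, b), s in pmi.items():
        if s > min_pmi and a in nodes and b in nodes and a != b:
            adj[a].add(b)
            adj[b].add(a)

    cliques = []

    def expand(current, candidates, excluded):
        # Bron-Kerbosch without pivoting: `current` is maximal when no
        # vertex is left that could extend it.
        if not candidates and not excluded:
            if len(current) > 1:
                cliques.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(nodes), set())
    return cliques


# Hypothetical scores for illustration only.
relevance = {"human rights": 90, "freedom": 85, "access": 80, "pension": 60}
pmi = {
    ("human rights", "freedom"): 80,
    ("human rights", "access"): 75,
    ("freedom", "access"): 90,
    ("pension", "freedom"): 95,
}
clusters = clique_clusters(relevance, pmi)
```

Here "pension" is dropped because its relevance score is below the threshold, even though its PMI with "freedom" is high; the three remaining terms form one cluster.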
Clusters including “human rights”
ToE data
(manually highlighted elements indicative of contrast with UK
perspective)
• end, vote, commission, network, programme, funding, proposal,
report, text, level, service, freedom, fund, concern, president,
access, basis, internet, enforcement, example, instrument,
plastic, money, EU policy
• recommendation, position, level, change, community, right, part,
approach, discussion, dossier, regard, opinion, policy, force,
negotiation, account, public, opportunity, fight
Clusters including “human rights”
UK data
(manually highlighted elements indicative of contrast with EU
perspective)
• foreign, press, answer, election
• realise, MPs, politician, consequence, claim, interest, lesson,
pension, employment
• incentive, accountability, movement, treatment, word,
young people, assessment
Term Relatedness 3: Links between ToE
and UK terms
(rdf: skos:related)
• For now the link is limited to orthographic
overlap of terms’ canonical forms
– Lemmatised
– decapitalised
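The linking step can be sketched in Python as an exact match on canonical forms. The lemmatiser below is a deliberately crude stand-in (it only strips a plural "s") and is an assumption for illustration; a real pipeline would use a proper lemmatiser, and all function names are hypothetical.

```python
def toy_lemmatise(token: str) -> str:
    """Crude stand-in for a lemmatiser: drop a plural "s" on longer words."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def canonical(term: str, lemmatise=toy_lemmatise) -> str:
    """Decapitalise, then lemmatise each token."""
    return " ".join(lemmatise(tok) for tok in term.lower().split())


def link_terms(toe_terms, uk_terms, lemmatise=toy_lemmatise):
    """Return (ToE term, UK term) pairs whose canonical forms are identical."""
    uk_index = {}
    for term in uk_terms:
        uk_index.setdefault(canonical(term, lemmatise), []).append(term)
    return [
        (t, u)
        for t in toe_terms
        for u in uk_index.get(canonical(t, lemmatise), [])
    ]


links = link_terms(["Human Rights", "EU policy"], ["human right", "pensions"])
print(links)  # [('Human Rights', 'human right')]
```

Indexing the UK terms by canonical form keeps the lookup to one pass over each term list instead of comparing every pair.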
Sentiment Context for Terminology
• Sentences have a sentiment value of positive,
negative or neutral
• This allows the exploration of the emotional
load of the context in which terminology is
used
Added RDF
Why RDF output?
• Standard knowledge representation
• Queryable in SPARQL
• Slots additional knowledge into Talk of Europe
data model
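For illustration, the two SKOS properties named in the slides can be serialised as N-Triples with plain string formatting. The base namespace and URI scheme below are hypothetical placeholders, not the Talk of Europe data model; only the SKOS namespace is the real one.

```python
from urllib.parse import quote

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/terms/"  # hypothetical namespace, not the ToE one


def term_uri(dataset: str, term: str) -> str:
    """Build an illustrative URI for a term within a data set ("toe" or "uk")."""
    slug = quote(term.lower().replace(" ", "_"))
    return f"<{BASE}{dataset}/{slug}>"


def narrower_triple(dataset: str, broader: str, narrower: str) -> str:
    """Hierarchy link: the broader term has the narrower term as a hyponym."""
    return (
        f"{term_uri(dataset, broader)} <{SKOS}narrowerTransitive> "
        f"{term_uri(dataset, narrower)} ."
    )


def related_triple(toe_term: str, uk_term: str) -> str:
    """Cross-data-set link between a ToE term and a UK term."""
    return f"{term_uri('toe', toe_term)} <{SKOS}related> {term_uri('uk', uk_term)} ."


print(narrower_triple("toe", "control", "efficient control"))
print(related_triple("human right", "human right"))
```

Triples in this form can be loaded into any triple store and then queried with SPARQL alongside the existing data.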
Coverage of results
• Proof of concept
• EuroParliament
– 2 months (6546 speeches)
– 7900 term candidates
• UK Parliament
– 1 month (January 2014, 7571 UK speeches)
– 28000 term candidates
• Around 750000 triples
• 2900 relations between EU and UK terminology
Usability of data and methodology
• Assists further exploration of
parliamentarians’ styles, priorities and
perspectives through term usage and context
– E.g. compare cluster members of terms in order to
detect contrastive perspectives between ToE and
UK terminological use
– (see “human rights” example)
• Flexible methodology, re-usable on other data

FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 

Recently uploaded (20)

Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 

Information Extraction from EuroParliament and UK Parliament data

  • 1. Computational Support in eHumanities Proof of concept produced during CLARIN’s Creative Camp Talk of Europe Wim Peters Adam Funk University of Sheffield, UK w.peters@sheffield.ac.uk a.funk@sheffield.ac.uk
  • 2. CLARIN’s Creative Camp Talk of Europe • Our main aim in this event: • Term identification and structuring in ToE and UK Parliament data • Linking ToE and UK Parliament terminology • Automatic enrichment of ToE data set • http://linkedpolitics.ops.few.vu.nl/home
  • 3. Data set 1 • Talk of Europe data set • Plenary debates of the European Parliament as Linked Open Data • http://linkedpolitics.ops.few.vu.nl/
  • 4. Data set 2 • UK Parliamentary Archives
  • 6. ParlParse • Speeches scraped from the UK Parliament web site • Converted into structured XML representations
  • 10. Output • For terms in each data set: – Terms – Term hierarchies – Term clusters – Sentence-based sentiment context • Between data sets: – Relatedness between terms
  • 11. General Architecture for Text Engineering • To identify and extract relevant information from the source material, we use the GATE architecture for the production of semantic metadata in the form of text annotations. • GATE is a framework for language engineering applications, which supports efficient and robust text processing, including functionality for both manual and automatic annotation. • It is highly scalable and has been applied in many large text processing projects. • It is an open source desktop application written in Java that provides a user interface for professional linguists and text engineers to bring together a wide variety of natural language processing tools and apply them to a set of documents.
  • 12. General Architecture for Text Engineering (GATE) • Open source framework which supports plug-in NLP components to process a corpus of text • http://gate.ac.uk/ • Free system download and training courses • LEX 2014, Ravenna, Italy
  • 13. Advantages • Reproducibility • Reusability • Flexibility • Customisability to scholarly requirements regarding research questions and analysis methodology • http://www.gate.ac.uk
  • 15. Term Extraction • TermRaider • http://www.dcs.shef.ac.uk/~wim/termraider.html • Automatically provides domain-specific noun phrase term candidates from a text corpus, together with a statistically derived termhood score. • Possible terms are filtered by means of a multi-word-unit grammar that defines the possible sequences of part-of-speech tags constituting noun phrases. • It computes various termhood scores such as Kyoto Domain Relevance and term frequency/inverse document frequency (TF-IDF). The scores indicate the salience of each term candidate for each document in the corpus.
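The TF-IDF score mentioned above can be illustrated with a minimal sketch. It uses the textbook formula tf × log(N / df); the slides do not say which variant TermRaider applies, so the formula, the function name `tf_idf` and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document.

    docs: list of documents, each a list of term candidates (with repeats).
    Textbook variant: tf * log(N / df). This is an assumption, not
    necessarily the formula TermRaider uses.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores

# Toy data (hypothetical, not taken from the corpora)
docs = [["human rights", "fight", "fight"], ["human rights", "pension"]]
scores = tf_idf(docs)
```

A term that occurs in every document ("human rights" here) scores zero, which is why TF-IDF favours terms that are salient for particular documents.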
  • 16. KYOTO domain relevance score • df * (1 + nh) – df: number of documents in the current corpus containing the term – nh: number of hyponymic term candidates • W. Bosma and P. Vossen. Bootstrapping language-neutral term extraction. In 7th Language Resources and Evaluation Conference (LREC), Valletta, Malta (2010)
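As a worked illustration of the formula df * (1 + nh), the sketch below computes the score from a toy corpus. The function name, the input layout (one set of term candidates per document, plus a hyponym mapping) and the data are assumptions; any rescaling to a 0-100 range is not described in the slides and is left out.

```python
from collections import Counter

def kyoto_domain_relevance(docs, hyponyms):
    """Score each term as df * (1 + nh).

    docs: list of sets of term candidates, one set per document.
    hyponyms: dict mapping a term to the set of its hyponymic term candidates.
    """
    # df: number of documents containing the term
    df = Counter(term for doc in docs for term in set(doc))
    # nh: number of hyponymic term candidates (0 if none are known)
    return {term: df[term] * (1 + len(hyponyms.get(term, ()))) for term in df}

# Toy data (hypothetical)
docs = [{"fight", "fight against corruption"}, {"fight", "control"}, {"control"}]
hyponyms = {"fight": {"fight against corruption"}}
scores = kyoto_domain_relevance(docs, hyponyms)
```

Here "fight" and "control" both occur in two documents, but "fight" scores higher because it has a hyponymic term candidate.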
  • 18. Term Relatedness 1: Hyponyms (rdf: skos:narrowerTransitive) • Hierarchical relations between terms based on head phrase matching • fight – fight against all form of intolerance • fight – fight against serious crime and terrorism • fight – fight against all form of intolerance and discrimination • fight – fight against illegal drug and the organised crime • fight – fight against corruption and organised crime • control – efficient control • efficient control of EU fund
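Head phrase matching of this kind could be sketched as follows. The rule used here is an assumption reconstructed from the examples on the slide: a term is narrower than another if it adds a post-modifier (a prepositional phrase) or a pre-modifier to the same head phrase. The preposition list and all function names are hypothetical.

```python
# Small illustrative preposition list (assumption; a real system would use POS tags)
PREPOSITIONS = {"of", "against", "in", "for", "on", "with"}

def head_phrase(term):
    """Tokens before the first preposition, e.g. 'fight against crime' -> ['fight']."""
    tokens = term.split()
    for i, tok in enumerate(tokens):
        if tok in PREPOSITIONS:
            return tokens[:i]
    return tokens

def is_narrower(broader, narrower):
    """True if `narrower` extends `broader` with a pre- or post-modifier."""
    b = broader.split()
    if b == narrower.split():
        return False
    head = head_phrase(narrower)
    if b == head:
        return True  # post-modified: 'fight' -> 'fight against corruption'
    # pre-modified: 'control' -> 'efficient control'
    return len(b) < len(head) and head[-len(b):] == b

terms = ["fight", "fight against corruption and organised crime",
         "control", "efficient control", "efficient control of EU fund"]
pairs = [(a, b) for a in terms for b in terms if is_narrower(a, b)]
```

Each resulting pair corresponds to one skos:narrowerTransitive statement from the broader term to the narrower one.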
  • 19. Term relatedness 2: Clusters • Compute Pointwise Mutual Information – Pair-wise association score for terms that co-occur within a context window (in our case sentences)
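A minimal sketch of sentence-level PMI, assuming the standard definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))) with probabilities estimated as the fraction of sentences containing the term or term pair. The function name and the toy sentences are hypothetical, and the slides do not say how the scores are rescaled to the 0-100 range used later.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_scores(sentences):
    """Return {(term_a, term_b): PMI} for all pairs co-occurring in a sentence.

    sentences: list of collections of term candidates, one per sentence.
    Pair keys are sorted alphabetically so each pair appears once.
    """
    n = len(sentences)
    term_count = Counter()
    pair_count = Counter()
    for terms in sentences:
        unique = sorted(set(terms))
        term_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return {
        (a, b): math.log2((c / n) / ((term_count[a] / n) * (term_count[b] / n)))
        for (a, b), c in pair_count.items()
    }

# Toy data (hypothetical)
sentences = [{"human rights", "freedom"}, {"human rights", "freedom"},
             {"pension"}, {"employment"}]
scores = pmi_scores(sentences)
```

In the toy data the two terms always occur together, so their joint probability (0.5) is twice what independence would predict (0.25), giving a PMI of 1 bit.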
  • 20. Cluster creation • Simple clique algorithm • https://en.wikipedia.org/wiki/Cluster_analysis • Each cluster member (a term candidate with a Kyoto Domain Relevance score > 70/100) is connected to all other cluster members by a PMI score > 70/100 – Result: “statistical thesaurus” – strongly associated groups of words – Use: enhance data exploration by expanding searches with related terms (query expansion)
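The slide only names a "simple clique algorithm", so the sketch below swaps in the basic Bron-Kerbosch procedure for enumerating maximal cliques; it is one possible reading, not necessarily the authors' implementation. Both scores are assumed to be already scaled to 0-100, and the function names, thresholds as parameters and toy scores are hypothetical.

```python
def maximal_cliques(nodes, edges):
    """Enumerate maximal cliques with the basic Bron-Kerbosch algorithm."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if a in adj and b in adj:  # ignore edges to filtered-out terms
            adj[a].add(b)
            adj[b].add(a)
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already processed
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(adj), set())
    return cliques

def term_clusters(relevance, pmi, min_relevance=70, min_pmi=70):
    """Clusters of relevant terms that are all pairwise strongly associated."""
    nodes = [t for t, score in relevance.items() if score > min_relevance]
    edges = [pair for pair, score in pmi.items() if score > min_pmi]
    # Keep only clusters with at least two members
    return [c for c in maximal_cliques(nodes, edges) if len(c) > 1]

# Toy scores on an assumed 0-100 scale (hypothetical)
relevance = {"human rights": 90, "freedom": 85, "internet": 80, "plastic": 40}
pmi = {("human rights", "freedom"): 80, ("human rights", "internet"): 75,
       ("freedom", "internet"): 72, ("freedom", "plastic"): 90}
clusters = term_clusters(relevance, pmi)
```

For query expansion, a search for one cluster member can then be widened to the other members of every cluster that contains it.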
  • 21. Clusters including “human rights” ToE data (manually highlighted elements indicative of contrast with UK perspective) • end, vote, commission, network, programme, funding, proposal, report, text, level, service, freedom, fund, concern, president, access, basis, internet, enforcement, example, instrument, plastic, money, EU policy • recommendation, position, level, change, community, right, part, approach, discussion, dossier, regard, opinion, policy, force, negotiation, account, public, opportunity, fight
  • 22. Clusters including “human rights” UK data (manually highlighted elements indicative of contrast with EU perspective) • foreign, press, answer, election • realise, MPs, politician, consequence, claim, interest, lesson, pension, employment • incentive, accountability, movement, treatment, word, young people, assessment
  • 23. Term Relatedness 3: Links between ToE and UK terms (rdf: skos:related) • For now the link is limited to orthographic overlap of terms’ canonical forms – Lemmatised – decapitalised
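Linking by canonical form could be sketched as below. The lemmatiser is a deliberately naive placeholder that only strips a plural "s"; the actual pipeline presumably relies on a proper lemmatiser, so `naive_lemma`, `canonical`, `link_terms` and the example terms are all assumptions for illustration.

```python
def naive_lemma(token):
    """Toy lemmatiser (assumption): strip a plural 's', leave 'press' etc. alone."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def canonical(term):
    """Decapitalised, lemmatised form used as the matching key."""
    return " ".join(naive_lemma(tok) for tok in term.lower().split())

def link_terms(toe_terms, uk_terms):
    """Pair ToE and UK terms whose canonical forms are identical."""
    uk_index = {}
    for term in uk_terms:
        uk_index.setdefault(canonical(term), []).append(term)
    return [(t, u) for t in toe_terms for u in uk_index.get(canonical(t), [])]

# Toy data (hypothetical)
links = link_terms(["Human Rights", "EU policy"], ["human right", "Pensions"])
```

Each linked pair would correspond to one skos:related statement between a ToE term and a UK term.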
  • 24. Sentiment Context for Terminology • Sentences have a sentiment value of positive, negative or neutral • This allows the exploration of the emotional load of the context in which terminology is used
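Once sentences carry a sentiment label, the sentiment context of a term can be summarised by counting the labels of the sentences it occurs in. The sketch below assumes the labels are already available; the sentiment classifier itself is not described in the slides and is not modelled here. The function name and the data are hypothetical.

```python
from collections import Counter, defaultdict

def sentiment_profile(sentences):
    """Count sentence-level sentiment labels per term.

    sentences: list of (terms, label) pairs, where label is
    'positive', 'negative' or 'neutral'.
    """
    profile = defaultdict(Counter)
    for terms, label in sentences:
        for term in set(terms):
            profile[term][label] += 1
    return profile

# Toy data (hypothetical)
sentences = [
    (["human rights", "freedom"], "positive"),
    (["human rights", "enforcement"], "negative"),
    (["human rights"], "positive"),
]
profile = sentiment_profile(sentences)
```

Comparing such profiles for the same term in the ToE and UK data is one way to explore differences in how the term is framed.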
  • 26. Why RDF output? • Standard knowledge representation • Queryable in SPARQL • Slots additional knowledge into Talk of Europe data model
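To make the RDF output concrete, the sketch below serialises term relations as N-Triples using the two SKOS properties named on the earlier slides (skos:narrowerTransitive and skos:related). The base namespace `http://example.org/term/` and the URI layout are hypothetical; the real Talk of Europe data model and URIs are not shown in the slides.

```python
from urllib.parse import quote

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/term/"  # hypothetical namespace, not the ToE one

def term_uri(dataset, term):
    """Build a URI reference for a term in a given data set (assumed layout)."""
    return f"<{BASE}{dataset}/{quote(term)}>"

def to_ntriples(relations):
    """Serialise (subject, skos_property, object) relations as N-Triples lines."""
    return [f"{term_uri(*s)} <{SKOS}{prop}> {term_uri(*o)} ."
            for s, prop, o in relations]

# Toy relations (hypothetical)
relations = [
    (("toe", "fight"), "narrowerTransitive", ("toe", "fight against corruption")),
    (("toe", "human right"), "related", ("uk", "human right")),
]
lines = to_ntriples(relations)
```

Triples in this form can be loaded into a triple store and queried in SPARQL alongside the existing data.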
  • 27. Coverage of results • Proof of concept • EuroParliament – 2 months (6546 speeches) – 7900 term candidates • UK Parliament – 1 month (January 2014, 7571 UK speeches) – 28000 term candidates • Around 750000 triples • 2900 relations between EU and UK terminology
  • 28. Usability of data and methodology • Assists further exploration of parliamentarians’ styles, priorities and perspectives through term usage and context – E.g. compare cluster members of terms in order to detect contrastive perspectives between ToE and UK terminological use – (see “human rights” example) • Flexible methodology, re-usable on other data