BiographyNet
Linking the world of History
Serge ter Braake, Antske Fokkens, Niels Ockeloen,
Susan Legêne, Guus Schreiber, Piek Vossen, et al.
The Network Institute, VU University Amsterdam
http://wm.cs.vu.nl http://www.biographynet.nl
October 2013
BiographyNet: Linking the world of history
General project info, February 2014
Overview of this presentation
• Introduction of the project
• What is E-history?
• Project goals
• Short overview of use cases
• Illustrative use case example
• Text mining using NLP
• Challenges
• Preliminary results
• Why provenance is important
• Requirements from the perspective of the Historian
• Requirements from the perspective of the Computer scientist
• The BiographyNet schema
• Extending the schema with Provenance
• Aggregated provenance information
• Detailed provenance information
• Demonstrator Interface
• First ideas and sketches
Overview
BiographyNet: Extracting relations between people,
places and historic events
• Multidisciplinary E-History Project
What is BiographyNet?
E-humanities
Investigates what can be done in humanities with modern
techniques which we could not do before, or only with a
great deal of effort
What is E-history?
E-history
Sub domain of E-humanities which aims at improving existing methods
of historical research rather than introducing
a whole new way of doing historical research *
* Zaagsma, G.: Doing history in the digital age: history as a hybrid practice (2013)
http://gerbenzaagsma.org/blog/16-03-2013/doing-history-digital-age-history-hybrid-practice
BiographyNet: Extracting relations between people,
places and historic events
• Multidisciplinary E-History Project
What is BiographyNet?
• Funded by the Netherlands eScience Center
• Partners are the Netherlands eScience Center, the
Huygens/ING Institute of the Royal Dutch Academy of
Sciences and VU University Amsterdam
• Starting Point: The Biographical Portal of the
Netherlands - http://www.biografischportaal.nl
• 125,000 short biographical descriptions with limited meta
data from a variety of Dutch biographical dictionaries
• 76,000 individuals
Short biographical descriptions
with limited meta data
[Bar chart: individuals with available information (%), shown per field: Name, Category, Gender, Date of Death, Date of Birth, Place of Birth, Place of Death, Occupation, Religion, Father, Mother, Claim to Fame, Partner, Text]
Main project goals
• Provide a richer historic knowledge base by creating a semantic layer on
top of the data from the Biographical Portal
• Convert the available data to RDF (first conversion available)
• Enrichments (NLP) and Aggregations
• Link to other sources
• Inspire Historians in setting up new research projects by providing them
with interesting leads
• Development of a demonstrator
• Quantitative analysis, visualisation and browsing techniques
• Re-usable deliverables
• Open-source release of the platform for analyzing texts about people
• Methodology for extraction of a relation network between
people, places and events
Project Goals
Currently 12 use cases developed involving quantitative
analysis, relation discovery, thematic research, etc.
• Simple:
• Group analysis of Governors-general
of the Dutch Indies
• More complex:
• When did Dutch elites get involved
with the ‘New World’?
• Highly complex:
• What can we say about nationalism in biographical
dictionaries from the nineteenth and twentieth century?
Use Case Overview
Governors-General of the Dutch Indies
• Highest Official in the Dutch Indies (1610-1949)
• 129 Biographies describing 71 individuals
• What can we say about these men as a group?
• What properties did they need to have to be appointed?
• Personal qualities
• Relations (already more difficult)
Illustrative use case
Focus on the following information
• Family connections
• Parents
• Partner
• Children
• Dates
• Birth
• Appointment
• Death
• Motivation
• Education
• Religion
• Reasons for appointment
• Reasons for leaving the office
Governors General: Data Mining
Manual analysis
“More than one full week to manually mine this information
from the Biography Portal.” (Serge ter Braake)
The question
“Can a historian do this with (almost) the same results in
less than an hour when using the demonstrator?”
Governors General: Time and effort
Basic System for data enrichment using text:
• Identifying meta data in text
• Linguistically naïve supervised machine learning
• Linguistic processing
• Detection of (co-referenced) named-entities
(persons, places and dates) and events
• Concept identification
Text mining using Natural Language
Processing (NLP)
Challenges for NLP within BiographyNet:
• Deal with alternative spelling
• Texts vary from 19th century Dutch to contemporary Dutch
• Variations in the naming of people and places
• OCR-ed texts contain errors
• The methods used may introduce bias:
• Example: Location identification with GeoNames
Heuristic: when a name has multiple candidate locations, take the one in, or
closest to, the Netherlands
• Problem: ‘America’ is a place in the Netherlands, but
what about trade with the New World?
NLP: Challenges
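The bias in the location heuristic can be illustrated with a minimal Python sketch. It does not call GeoNames and is not the project's implementation: the candidate list, the rounded coordinates, the reference point and the function names are all assumptions made for illustration.

```python
import math

# Approximate reference point in the Netherlands (assumption: roughly Utrecht).
NL_CENTER = (52.1, 5.1)

# Hypothetical candidates for one place name, as a gazetteer lookup might
# return them. Coordinates are rough illustrative values.
CANDIDATES = {
    "America": [
        {"label": "America (village, Limburg, NL)", "lat": 51.4, "lon": 6.0},
        {"label": "America (the New World)", "lat": 39.8, "lon": -98.6},
    ],
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def resolve_closest_to_nl(name):
    """Pick the candidate nearest to the Netherlands (the heuristic on the slide)."""
    options = CANDIDATES.get(name, [])
    if not options:
        return None
    return min(options, key=lambda c: haversine_km(NL_CENTER, (c["lat"], c["lon"])))


choice = resolve_closest_to_nl("America")
print(choice["label"])  # the Dutch village wins, whatever the text is about
```

Because the heuristic ignores context, a biography about overseas trade would still be linked to the village in Limburg, which is exactly the kind of bias the historian needs to be told about.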
NLP: Preliminary results – Governors
[Bar chart: presence of information in text vs. meta data (% of 71 individuals); series: metadata, text; axis 0 to 100]
Before development of the actual demonstrator can
commence, we first need to:
• Convert the data of the Biography Portal to RDF
• Prevent loss of information
• Devise a schema
• Structure the data
• Provide compatibility with other interesting sources
• Facilitate the recording of provenance information on the
manipulation of the data
Towards the demonstrator
Two main requirements for the demonstrator:
• A trace back to all original sources (texts and meta data) involved
in producing a certain result
• Which sources were used for the overall outcome and how often?
• What potentially relevant data was excluded from the end result?
• Which piece of data led to a specific result (e.g. the age of a specific
governor at his appointment)?
• Insight into the processes manipulating and selecting the data
• Indication of overall performance: Focus on recall or precision?
• A global description of the heuristics used should be provided
• Indication of responsibility: Who to contact when results are called
into question?
Requirements from the perspective
of the Historian
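The trace-back requirement can be sketched in Python as a derived value that carries the identifiers of the sources it was computed from, together with the sources that were available but not used. This is a sketch under stated assumptions, not the demonstrator's design: the class names, the source identifiers and all dates are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Fact:
    value: date
    source: str  # hypothetical identifier of a biography text or metadata record


@dataclass
class Derived:
    value: int
    sources: list = field(default_factory=list)   # sources that led to the result
    excluded: list = field(default_factory=list)  # relevant sources left out


def age_at_appointment(birth_candidates, appointment):
    """Compute the age from the first birth candidate; keep the rest as excluded."""
    used, *rest = birth_candidates
    years = appointment.value.year - used.value.year
    # Subtract one year if the birthday had not yet occurred at appointment.
    if (appointment.value.month, appointment.value.day) < (used.value.month, used.value.day):
        years -= 1
    return Derived(years,
                   sources=[used.source, appointment.source],
                   excluded=[f.source for f in rest])


# Placeholder data for an invented person, not taken from the portal.
births = [Fact(date(1700, 5, 1), "bio/A#text"), Fact(date(1701, 5, 1), "bio/B#metadata")]
appointed = Fact(date(1750, 3, 1), "bio/A#text")

result = age_at_appointment(births, appointed)
print(result.value, result.sources, result.excluded)
```

Keeping the excluded sources next to the result lets a historian see that a second biography disagreed on the birth date instead of having that conflict silently resolved.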
Reproducing results:
• Reproducing results in NLP is non-trivial
• Details in implementations or experimental setup can
influence results up to a point where they tell a different story
• Clear registration of all steps involved and storage of
intermediate system output can improve reproducibility
• Systematic testing can help to gain insight into the variation
of the outcome of our systems and hence lead to more
insight into their performance
Antske Fokkens, Marieke van Erp, Marten Postma, Ted Pedersen, Piek Vossen and Nuno
Freire (2013) Offspring from Reproduction Problems: What Replication Failure Teaches
Us. In: Proceedings of ACL 2013, Sofia, Bulgaria, August 2013.
Requirements from the perspective of the
Computer Scientist / Computational Linguist
Translation into requirements for the demonstrator:
• Facilitate Replication and Reproduction
• Recording of information on the tools used, such as creator, version
number, etc.
• Recording of any kind of pre- / post-processing done on
input/output data.
• Recording of the intention behind the various steps in the NLP
pipeline, including assumptions made and possible biases.
• Intermediate results need to be preserved for debugging purposes
• The schema needs to be both generic and flexible
• NLP pipeline design can change
• Future tools and their formats are not yet known
Requirements from the perspective of the
Computer Scientist / Computational Linguist
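A minimal Python sketch of what recording one pipeline step could look like: tool details, intention, assumptions, hashes of input and output, and the preserved intermediate output. The field names, the tool name and the toy "tokenizer" are assumptions for illustration and do not describe the project's actual pipeline.

```python
import hashlib
import json


def record_step(tool, version, creator, intention, assumptions, input_text, output_text):
    """Build a record for one pipeline step."""
    return {
        "tool": tool,
        "version": version,
        "creator": creator,
        "intention": intention,
        "assumptions": list(assumptions),
        # Hashes let a later run check that it saw the same input and output.
        "input_sha256": hashlib.sha256(input_text.encode("utf-8")).hexdigest(),
        "output_sha256": hashlib.sha256(output_text.encode("utf-8")).hexdigest(),
        # The intermediate output itself is kept for debugging.
        "output": output_text,
    }


def same_result(record_a, record_b):
    """True when two runs had identical input and produced identical output."""
    return (record_a["input_sha256"] == record_b["input_sha256"]
            and record_a["output_sha256"] == record_b["output_sha256"])


def toy_tokenize(text):
    # Stand-in for a real tool: split on whitespace.
    return " | ".join(text.split())


text = "born in Zwolle"
run1 = record_step("example-tokenizer", "0.1", "hypothetical author",
                   "split text into tokens", ["whitespace separates tokens"],
                   text, toy_tokenize(text))
run2 = record_step("example-tokenizer", "0.1", "hypothetical author",
                   "split text into tokens", ["whitespace separates tokens"],
                   text, toy_tokenize(text))

print(same_result(run1, run2))
print(json.dumps(run1, indent=2)[:60])
```

Comparing hashes gives a cheap replication check, while the stored output and the stated assumptions support debugging when two runs do differ.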
Foundations of the schema:
• Based on the structure of the original XML files
• Needs to facilitate the coupling of different biographies of the same
person, without compromising the original data
• Needs to facilitate the incorporation of several enrichments, following
from NLP, as well as aggregations
• Compatible with existing
schemas such as the
Europeana Data Model,
PROV, P-PLAN,
DC terms, etc.
The BiographyNet Schema
Purely syntactic conversion
• Preserve the original
structure of the data
• Prevent loss of information
• Allow for reinterpretation of
the original data in the future
The conversion process
<XML> Very simplified BP XML Example
<BioDes>
<FileDes> Source Meta Data
<Author></Author>
</FileDes>
<PersonDes> Person Meta Data
<Name></Name>
</PersonDes>
<BioPart> Biographical Text
<Snippet></Snippet>
</BioPart>
</BioDes>
</XML>
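What a purely syntactic conversion means can be sketched in Python: the XML tree is walked and every element becomes a triple, with element names reused as predicates and nothing reinterpreted. The project itself uses ClioPatria and its XMLRDF tool; the code below is not that tool, and the sample values, node identifiers and function name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified sample in the shape of the example above; values are placeholders.
SAMPLE = """<BioDes>
  <FileDes><Author>example author</Author></FileDes>
  <PersonDes><Name>example name</Name></PersonDes>
  <BioPart><Snippet>example text</Snippet></BioPart>
</BioDes>"""


def to_crude_triples(element, node_id="n0", counter=None):
    """Walk the XML tree and emit (subject, predicate, object) tuples.

    Each element with children becomes a new node; each leaf becomes a literal.
    """
    if counter is None:
        counter = [0]
    triples = []
    for child in element:
        if len(child):  # has sub-elements: create a new node
            counter[0] += 1
            child_id = f"n{counter[0]}"
            triples.append((node_id, child.tag, child_id))
            triples.extend(to_crude_triples(child, child_id, counter))
        else:  # leaf: keep the text as a literal
            triples.append((node_id, child.tag, (child.text or "").strip()))
    return triples


triples = to_crude_triples(ET.fromstring(SAMPLE))
for triple in triples:
    print(triple)
```

Because the structure is preserved one-to-one, the original data can still be reinterpreted later; restructuring and linking happen in separate steps.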
Conversion steps:
• Retrieval of XML dump of the Biography Portal
• Initial conversion to ‘crude’ RDF
• Using ClioPatria and the XMLRDF
tool for ClioPatria
• RDF restructuring
• Correction of purely syntactic
inefficiencies in the data
• TODO: Linking to other sources
• Essential step in the
‘Linked Data’ philosophy
The conversion process
Provenance information is information on how Entities
come into existence
• What are entities?
• Documents, Articles, Pictures, etc.
• Basically anything that can be
‘produced’ by something or someone
• What kind of information?
• Who did what?
• Using which entities?
• In which processes?
• Why use the PROV-DM, i.e. PROV-O?
• PROV-DM now an official W3C recommendation
Adding Provenance Information
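The three questions (who did what, using which entities, in which processes) map onto a handful of PROV-O terms. The Python sketch below stores statements as plain tuples rather than in an RDF store; the `ex:` identifiers and helper functions are hypothetical, and only the `prov:` property and class names come from PROV-O.

```python
# Statements as (subject, predicate, object) tuples. The ex: names are invented.
PROV = [
    ("ex:enrichment1", "rdf:type", "prov:Entity"),
    ("ex:biography1", "rdf:type", "prov:Entity"),
    ("ex:nlpRun1", "rdf:type", "prov:Activity"),
    ("ex:researcher1", "rdf:type", "prov:Agent"),
    ("ex:nlpRun1", "prov:used", "ex:biography1"),
    ("ex:enrichment1", "prov:wasGeneratedBy", "ex:nlpRun1"),
    ("ex:nlpRun1", "prov:wasAssociatedWith", "ex:researcher1"),
]


def objects(graph, subject, predicate):
    """All objects of statements with the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def explain(graph, entity):
    """Which process produced the entity, from what, and who was responsible."""
    result = []
    for activity in objects(graph, entity, "prov:wasGeneratedBy"):
        result.append({
            "activity": activity,
            "used": objects(graph, activity, "prov:used"),
            "agents": objects(graph, activity, "prov:wasAssociatedWith"),
        })
    return result


explanation = explain(PROV, "ex:enrichment1")
print(explanation)
```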
Based on the requirements for the demonstrator,
provenance needs to be modeled:
• From several perspectives:
• Information involved: sources, but also NER input data, etc.
• Processes involved: all steps in enrichment, aggregation, etc.
• People involved: who was responsible for pipeline, tool, etc.
• At multiple levels:
• An aggregated level, i.e. per enrichment: targeted at the historian
• A detailed level, i.e. all individual processes: targeted at the computer scientist and
computational linguist
Provenance in BiographyNet
Needed to ensure credibility of the demonstrator, to
evaluate its performance and to improve the academic
status of the tool
• One needs to be able to validate results
• Replication: Retrieving the same results later using the
demonstrator
• Reproducibility: Manually by the historian
• The aggregated level – Targeted at the historian
• Which original sources were involved?
• Who to contact in case results are called into question?
• The detailed level – Targeted at the computer scientist
• Detailed information on each individual step
• Allows for debugging the internal processing pipeline
Recap: Why is provenance info
important for BiographyNet?
BiographyNet: Schema illustration
http://www.biographynet.nl/schema
BiographyNet: Enrichment example
[Diagram. Example source text (Dutch): "Johan Rudolph Thorbecke werd in 1798 geboren op 14 januari in Zwolle en komt uit een half-Duits" (English: "Johan Rudolph Thorbecke was born in 1798, on 14 January, in Zwolle and comes from a half-German"; the text is cut off on the slide). Node labels: Thorbecke; Biographical Description; File Meta Data (NNBW); Person Meta Data ("Thorbecke"); Biography Parts; Event: Birth, 1798; Biographical Description Enrichment; NLP Pipeline; Person Meta Data; Event: Birth, Zwolle, 1798-01-14; prov:plan]
Provenance and Plans (P-PLAN):* Represents the plans that
guided the execution of scientific processes
• ‘Plans’ describe the original idea behind an activity
• Each ‘Plan’ can consist of one or more ‘Steps’
• Each ‘Step’ corresponds to an ‘Activity’
• ‘Variables’ describe the input/output of an activity
• Structure, format, quantity, etc.
• Each ‘Variable’ corresponds with an input/output ‘Entity’ of an
‘Activity’
• ‘Plans’ have their own provenance info
• E.g. who was responsible for the creation of a plan?
*Daniel Garijo, Yolanda Gil; http://www.opmw.org/model/p-plan
More than just Provenance:
P-PLAN is used to model not only what actually
happened, but also what was supposed to happen
• Forces the recording of what an activity and its
input/output should look like
• Provides abstract description of original idea behind activity
• As such, can provide info on heuristics and assumptions
• Allows for comparing the actual activity and its
input/output with the original plan and its variables
• Do they differ from each other, and to what extent?
• Makes finding errors much easier, as more information is
available about what the input/output should look like
Why model plans besides provenance?
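Comparing what happened with what was planned can be sketched in Python as a check of actual input/output entities against the variables of a plan step. This is a simplified stand-in for the P-PLAN idea, not the ontology itself: the class fields (format, minimum number of items), the variable names and the example values are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str
    fmt: str        # expected format
    min_items: int  # expected minimum quantity


@dataclass(frozen=True)
class Entity:
    variable: str   # name of the plan Variable this entity corresponds to
    fmt: str
    items: int


def deviations(planned, actual):
    """List how the actual input/output entities differ from the planned variables."""
    by_name = {e.variable: e for e in actual}
    problems = []
    for var in planned:
        ent = by_name.get(var.name)
        if ent is None:
            problems.append(f"{var.name}: missing")
        elif ent.fmt != var.fmt:
            problems.append(f"{var.name}: format {ent.fmt}, expected {var.fmt}")
        elif ent.items < var.min_items:
            problems.append(f"{var.name}: {ent.items} items, expected at least {var.min_items}")
    return problems


# Hypothetical plan step and one actual run of it.
plan = [Variable("biography_text", "text/plain", 1),
        Variable("named_entities", "application/json", 1)]
run = [Entity("biography_text", "text/plain", 1),
       Entity("named_entities", "application/json", 0)]

report = deviations(plan, run)
print(report)
```

An empty report means the run matched the plan; a non-empty one points directly at the step whose output did not look the way it should.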
BiographyNet: Schema illustration
[Diagram labels: Activity, Plan, Entity, Variable, Agent, Association, Person, NLP Tool]
• The interface should be easy to use
• The demonstrator should inspire historians to
undertake new research and give
direction, rather than being the ‘closing factor’
in their research
• The interface should allow users to ‘fine-tune’ the
results returned by an initial action
Interface: Focus
• Query composition
• Faceted browsing
• A combination
Interface: Options
• Drop down boxes
to select ‘Verbs’,
data elements
and relations
Interface: Query composition
• No explicit querying, but
convergence of the data through
browsing and selecting
• Provides better feedback to the user
• Allows for more direct and easier
adjustment of the selected data
Interface: Faceted browsing
• Query composition combined with faceted
browsing
• Create new facets by defining a query
– The result of the query is available as a subset of
the data by selecting the defined facet
– As such, combinable with other facets
• Method to integrate ‘open’ querying of the
data into a general interface and visualization
Interface: A combination
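The combination of query composition and faceted browsing can be sketched in Python by treating a saved query as a new facet that is then combined with other facets. The records, field names and function names below are hypothetical placeholders, not the demonstrator's data model.

```python
# Hypothetical records; field names and values are placeholders.
RECORDS = [
    {"id": 1, "occupation": "governor", "birth_year": 1650},
    {"id": 2, "occupation": "merchant", "birth_year": 1710},
    {"id": 3, "occupation": "governor", "birth_year": 1790},
]

FACETS = {}


def define_facet(name, query):
    """Register a query (a predicate over one record) as a selectable facet."""
    FACETS[name] = query


def select(records, facet_names):
    """Return the records that satisfy every selected facet."""
    return [r for r in records if all(FACETS[n](r) for n in facet_names)]


define_facet("governors", lambda r: r["occupation"] == "governor")
define_facet("born before 1700", lambda r: r["birth_year"] < 1700)

only_governors = [r["id"] for r in select(RECORDS, ["governors"])]
combined = [r["id"] for r in select(RECORDS, ["governors", "born before 1700"])]
print(only_governors)
print(combined)
```

Each defined query yields a subset of the data, so selecting several facets simply narrows the result further, which is what makes open querying fit into a browsing interface.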
[Diagram labels: Question, Analysis, Selection, Process, Results, Data, Facets]
Time and place
are primary elements
Interface: Demonstrator
[Interface sketch labels: Results, ?]
Main components of the demonstrator
• Initial schema available
• Schema models enrichments and aggregations alongside original
sources
• Allows for storing various levels of provenance information
• Model will be adapted while progressing with building the
demonstrator
• Initial conversion to RDF available
• Structure according to devised schema
• Next step is linking to external sources
• Initial NLP system setup available
• Preliminary results comparable with manual use case
• Interface
• First ideas and sketches
Current Status
Thank you for your attention
www.biographynet.nl
Feel free to ask questions
The cost of acquiring information by natural selection
Carl Bergstrom
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
PirithiRaju
 
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdfMending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
Selcen Ozturkcan
 
Introduction_Ch_01_Biotech Biotechnology course .pptx
Introduction_Ch_01_Biotech Biotechnology course .pptxIntroduction_Ch_01_Biotech Biotechnology course .pptx
Introduction_Ch_01_Biotech Biotechnology course .pptx
QusayMaghayerh
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
PirithiRaju
 
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
Sérgio Sacani
 
cathode ray oscilloscope and its applications
cathode ray oscilloscope and its applicationscathode ray oscilloscope and its applications
cathode ray oscilloscope and its applications
sandertein
 
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdfHUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
Ritik83251
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
Leonel Morgado
 
BIOTRANSFORMATION MECHANISM FOR OF STEROID
BIOTRANSFORMATION MECHANISM FOR OF STEROIDBIOTRANSFORMATION MECHANISM FOR OF STEROID
BIOTRANSFORMATION MECHANISM FOR OF STEROID
ShibsekharRoy1
 
Sustainable Land Management - Climate Smart Agriculture
Sustainable Land Management - Climate Smart AgricultureSustainable Land Management - Climate Smart Agriculture
Sustainable Land Management - Climate Smart Agriculture
International Food Policy Research Institute- South Asia Office
 
IMPORTANCE OF ALGAE AND ITS BENIFITS.pptx
IMPORTANCE OF ALGAE  AND ITS BENIFITS.pptxIMPORTANCE OF ALGAE  AND ITS BENIFITS.pptx
IMPORTANCE OF ALGAE AND ITS BENIFITS.pptx
OmAle5
 
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDSJAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
Sérgio Sacani
 

Recently uploaded (20)

SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
 
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
 
BIRDS DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
BIRDS  DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptxBIRDS  DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
BIRDS DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
 
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
 
The cost of acquiring information by natural selection
The cost of acquiring information by natural selectionThe cost of acquiring information by natural selection
The cost of acquiring information by natural selection
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
 
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdfMending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
 
Introduction_Ch_01_Biotech Biotechnology course .pptx
Introduction_Ch_01_Biotech Biotechnology course .pptxIntroduction_Ch_01_Biotech Biotechnology course .pptx
Introduction_Ch_01_Biotech Biotechnology course .pptx
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
 
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
 
cathode ray oscilloscope and its applications
cathode ray oscilloscope and its applicationscathode ray oscilloscope and its applications
cathode ray oscilloscope and its applications
 
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdfHUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
 
BIOTRANSFORMATION MECHANISM FOR OF STEROID
BIOTRANSFORMATION MECHANISM FOR OF STEROIDBIOTRANSFORMATION MECHANISM FOR OF STEROID
BIOTRANSFORMATION MECHANISM FOR OF STEROID
 
Sustainable Land Management - Climate Smart Agriculture
Sustainable Land Management - Climate Smart AgricultureSustainable Land Management - Climate Smart Agriculture
Sustainable Land Management - Climate Smart Agriculture
 
IMPORTANCE OF ALGAE AND ITS BENIFITS.pptx
IMPORTANCE OF ALGAE  AND ITS BENIFITS.pptxIMPORTANCE OF ALGAE  AND ITS BENIFITS.pptx
IMPORTANCE OF ALGAE AND ITS BENIFITS.pptx
 
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDSJAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
 

BiographyNet: Linking the world of History

  • 1. BiographyNet Linking the world of History Serge ter Braake, Antske Fokkens, Niels Ockeloen, Susan Legêne, Guus Schreiber, Piek Vossen, et al. The Network Institute, VU University Amsterdam http://wm.cs.vu.nl http://www.biographynet.nl October 2013
  • 2. BiographyNet: Linking the world of history General project info, February 2014 Overview of this presentation • Introduction of the project • What is E-history? • Project goals • Short overview of use cases • Illustrative use case example • Text mining using NLP • Challenges • Preliminary results • Why provenance is important • Requirements from the perspective of the Historian • Requirements from the perspective of the Computer scientist • The BiographyNet schema • Extending the schema with Provenance • Aggregated provenance information • Detailed provenance information • Demonstrator Interface • First ideas and sketches Overview
  • 3. What is BiographyNet? BiographyNet: Extracting relations between people, places and historic events • Multidisciplinary E-History Project
  • 4. What is E-history? E-humanities: Investigates what can be done in the humanities with modern techniques which we could not do before, or only with a great deal of effort. E-history: Subdomain of E-humanities which aims at improving existing methods of historical research rather than introducing a whole new way of doing historical research * * Zaagsma, G.: Doing history in the digital age: history as a hybrid practice (2013) http://gerbenzaagsma.org/blog/16-03-2013/doing-history-digital-age-history-hybrid-practice
  • 6. What is BiographyNet? BiographyNet: Extracting relations between people, places and historic events • Multidisciplinary E-History Project • Funded by the Netherlands eScience Center • Partners are the Netherlands eScience Center, the Huygens/ING Institute of the Royal Dutch Academy of Sciences and VU University Amsterdam • Starting Point: The Biographical Portal of the Netherlands - http://www.biografischportaal.nl • 125,000 short biographical descriptions with limited meta data from a variety of Dutch biographical dictionaries • 76,000 individuals
  • 7. Short biographical descriptions with limited meta data [Bar chart: individuals with available information (%), per field: Name, Category, Gender, Date of Death, Date of Birth, Place of Birth, Place of Death, Occupation, Religion, Father, Mother, Claim to Fame, Partner, Text]
  • 8. Project Goals: Main project goals • Provide a richer historic knowledge base by creating a semantic layer on top of the data from the Biographical Portal • Convert the available data to RDF (first conversion available) • Enrichments (NLP) and Aggregations • Link to other sources • Inspire Historians in setting up new research projects by providing them with interesting leads • Development of a demonstrator • Quantitative analysis, visualisation and browsing techniques • Re-usable deliverables • Open-source release of the platform for analyzing texts about people • Methodology for extraction of a relation network between people, places and events
  • 9. Use Case Overview: Currently 12 use cases developed involving quantitative analysis, relation discovery, thematic research, etc. • Simple: • Group analysis of Governors-general of the Dutch Indies • More complex: • When did Dutch elites get involved with the ‘New World’? • Highly complex: • What can we say about nationalism in biographical dictionaries from the nineteenth and twentieth century?
  • 10. Illustrative use case: Governors-General of the Dutch Indies • Highest Official in the Dutch Indies (1610-1949) • 129 Biographies describing 71 individuals • What can we say about these men as a group? • What properties did they need to have to be appointed? • Personal qualities • Relations (already more difficult)
  • 11. Governors General: Data Mining. Focus on the following information • Family connections • Parents • Partner • Children • Dates • Birth • Appointment • Death • Motivation • Education • Religion • Reasons for appointment • Reasons for leaving the office
  • 12. Governors General: Time and effort. Manual analysis: “More than one full week to manually mine this information from the Biography Portal.” (Serge ter Braake) The question: “Can a historian do this with (almost) the same results in less than an hour when using the demonstrator?”
  • 13. Text mining using Natural Language Processing (NLP): Basic System for data enrichment using text: • Identifying meta data in text • Linguistically naïve supervised machine learning • Linguistic processing • Detection of (co-referenced) named-entities (persons, places and dates) and events • Concept identification
  • 14. NLP: Challenges. Challenges for NLP within BiographyNet: • Deal with alternative spelling • Texts vary from 19th century Dutch to contemporary Dutch • Variations in the naming of people and places • OCR-ed texts contain errors • Used methods may introduce bias: • Example: Location identification with GeoNames. Heuristic: when there are multiple candidates, take the one in, or closest to, the Netherlands • Problem: ‘America’ is a place in the Netherlands, but what about trade with the New World?
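The disambiguation heuristic on slide 14, and the bias it introduces, can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the `Candidate` type, the reference point `NL_REFERENCE` and all coordinates are assumptions made for the example, and no real GeoNames lookup is performed.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Candidate:
    """One possible interpretation of a place name (hypothetical structure)."""
    name: str
    country: str  # ISO country code
    lat: float
    lon: float


# Rough centre of the Netherlands; an assumption for this sketch.
NL_REFERENCE = (52.1, 5.3)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def pick_candidate(candidates):
    """Prefer candidates inside the Netherlands, else the one closest to it."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    in_nl = [c for c in candidates if c.country == "NL"]
    pool = in_nl or candidates
    return min(pool, key=lambda c: haversine_km(c.lat, c.lon, *NL_REFERENCE))


# Approximate, illustrative coordinates only.
america = [
    Candidate("America", "NL", 51.44, 5.98),
    Candidate("America (New World)", "US", 39.8, -98.6),
]
print(pick_candidate(america).country)
```

With these inputs the Dutch village always wins, which is exactly the bias the slide warns about: a biography that mentions trade with "America" would be linked to a place in the Netherlands.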
  • 15. NLP: Preliminary results – Governors [Bar chart: presence of information in text vs. meta data (% on 71 individuals); series: metadata, text]
  • 16. Towards the demonstrator: Before development of the actual demonstrator can commence, we first need to: • Convert the data of the Biography Portal to RDF • Prevent loss of information • Devise a schema • Structure the data • Provide compatibility with other interesting sources • Facilitate the recording of provenance information on the manipulation of the data
  • 17. Requirements from the perspective of the Historian: Two main requirements for the demonstrator: • A trace back to all original sources (texts and meta data) involved in producing a certain result • Which sources were used for the overall outcome and how often? • What potentially relevant data was excluded from the end result? • Which piece of data led to a specific result (e.g. the age of a specific governor at his appointment)? • Insight into the processes manipulating and selecting the data • Indication of overall performance: Focus on recall or precision? • Global description of the used heuristics should be provided • Indication of responsibility: Who to contact when results are called into question?
  • 18. Requirements from the perspective of the Computer Scientist / Computational Linguist: Reproducing results: • Reproducing results in NLP is non-trivial • Details in implementations or experimental setup can influence results up to a point where they tell a different story • Clear registration of all steps involved and storage of intermediate system output can improve reproducibility • Systematic testing can help to gain insight into the variation of the outcome of our systems and hence lead to more insight into their performance. Antske Fokkens, Marieke van Erp, Marten Postma, Ted Pedersen, Piek Vossen and Nuno Freire (2013) Offspring from Reproduction Problems: What Replication Failure Teaches Us. In: Proceedings of ACL 2013, Sofia, Bulgaria, August 2013.
  • 19. Requirements from the perspective of the Computer Scientist / Computational Linguist: Translation into requirements for the demonstrator: • Facilitate Replication and Reproduction • Recording of information on used tools such as Creator, version number, etc. • Recording of any kind of pre- / post-processing done on input/output data. • Recording of the intention behind the various steps in the NLP pipeline, including the assumptions made and possible biases. • Intermediate results need to be preserved for debugging purposes • The schema needs to be both generic and flexible • NLP pipeline design can change • Tools and their formats unclear towards the future
  • 20. The BiographyNet Schema: Foundations of the schema: • Based on the structure of the original XML files • Needs to facilitate the coupling of different biographies of the same person, without compromising the original data • Needs to facilitate the incorporation of several enrichments, following from NLP, as well as aggregations • Compatible with existing schemas such as the Europeana Data Model, PROV, P-PLAN, DC terms, etc.
  • 21. The conversion process: Purely syntactic conversion • Preserve the original structure of the data • Prevent loss of information • Allow for reinterpretation of the original data in the future. Very simplified BP XML example (labels in parentheses are annotations): <XML> <BioDes> <FileDes> (Source Meta Data) <Author></Author> </FileDes> <PersonDes> (Person Meta Data) <Name></Name> </PersonDes> <BioPart> (Biographical Text) <Snippet></Snippet> </BioPart> </BioDes> </XML>
  • 22. The conversion process: Conversion steps: • Retrieval of XML dump of the Biography Portal • Initial conversion to ‘crude’ RDF • Using ClioPatria and the XMLRDF tool for ClioPatria • RDF restructuring • Correction of purely syntactic inefficiencies in the data • TODO: Linking to other sources • Essential step in the ‘Linked Data’ philosophy
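The idea of a purely syntactic conversion (slides 21 and 22) can be illustrated with a small sketch that mirrors an XML tree as subject/predicate/object triples without interpreting it. The project itself used ClioPatria with its XMLRDF tool; the code below is only a stand-in using the Python standard library. The `bgn:` prefix, the blank-node naming and the sample values are assumptions for the example.

```python
import xml.etree.ElementTree as ET
from itertools import count

# Sample input following the simplified structure from the slide;
# the element values are illustrative.
SAMPLE = """<BioDes>
  <FileDes><Author>Example Author</Author></FileDes>
  <PersonDes><Name>Thorbecke</Name></PersonDes>
  <BioPart><Snippet>Johan Rudolph Thorbecke werd in 1798 geboren</Snippet></BioPart>
</BioDes>"""


def xml_to_triples(xml_text, prefix="bgn:"):
    """Mirror the XML tree as triples: one node per element, no interpretation."""
    ids = count(1)
    triples = []

    def visit(elem):
        node = f"_:n{next(ids)}"  # blank node standing in for this element
        triples.append((node, "rdf:type", prefix + elem.tag))
        text = (elem.text or "").strip()
        if text:
            triples.append((node, "rdf:value", text))
        for child in elem:
            child_node = visit(child)
            # Nesting in the XML becomes a property named after the child tag.
            triples.append((node, prefix + child.tag, child_node))
        return node

    visit(ET.fromstring(xml_text))
    return triples


for triple in xml_to_triples(SAMPLE):
    print(triple)
```

Because every element and every nesting relation survives as a triple, nothing is lost and the data can be reinterpreted later; restructuring and linking are separate, subsequent steps.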
  • 23. Adding Provenance Information: Provenance information is information on how Entities come into existence • What are entities? • Documents, Articles, Pictures, etc. • Basically anything that can be ‘produced’ by something or someone • What kind of information? • Who did what? • Using which entities? • In which processes? • Why use the PROV-DM, i.e. PROV-O? • PROV-DM is now an official W3C recommendation
  • 24. Provenance in BiographyNet: Based on the requirements for the demonstrator, provenance needs to be modeled: • From several perspectives: • Information involved: sources, but also NER input data, etc. • Processes involved: all steps in enrichment, aggregation, etc. • People involved: who was responsible for pipeline, tool, etc. • At multiple levels: • An aggregated level, i.e. per enrichment: targeted at the historian • A detailed level, i.e. all individual processes: targeted at the computer scientist and computational linguist
  • 25. Recap: Why is provenance info important for BiographyNet? Needed to ensure credibility of the demonstrator, to evaluate its performance and to improve the academic status of the tool • One needs to be able to validate results • Replication: Retrieving the same results later using the demonstrator • Reproducibility: Manually by the historian • The aggregated level – Targeted at the historian • Which original sources were involved? • Who to contact in case results are called into question? • The detailed level – Targeted at the computer scientist • Detailed information on each individual step • Allows for debugging the internal processing pipeline
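The aggregated provenance questions (which sources were used and how often, which data led to a specific result, who to contact) can be sketched as simple queries over derived values that carry their sources. This is an illustrative data structure, not the BiographyNet schema: the `Derived` type, the source identifiers other than NNBW and the agent name are hypothetical; the comments only point to the PROV-O terms that play a comparable role.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Derived:
    """A derived value with its provenance (hypothetical structure)."""
    subject: str
    prop: str
    value: str
    sources: tuple  # source biographies, cf. prov:wasDerivedFrom
    agent: str      # responsible party, cf. prov:wasAttributedTo


def source_usage(results):
    """How often was each source used for the overall outcome?"""
    return Counter(src for r in results for src in r.sources)


def trace(results, subject, prop):
    """Which pieces of data led to a specific result?"""
    return [r for r in results if r.subject == subject and r.prop == prop]


# Illustrative records; "source-2" and "pipeline-maintainer" are made up.
RESULTS = [
    Derived("Thorbecke", "birth_date", "1798-01-14", ("NNBW",), "pipeline-maintainer"),
    Derived("Thorbecke", "birth_place", "Zwolle", ("NNBW", "source-2"), "pipeline-maintainer"),
]

print(source_usage(RESULTS))
print(trace(RESULTS, "Thorbecke", "birth_date")[0].agent)
```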
  • 26. BiographyNet: Schema illustration http://www.biographynet.nl/schema
  • 27. BiographyNet Enrichment example: Thorbecke. Source text: “Johan Rudolph Thorbecke werd in 1798 geboren op 14 januari in Zwolle en komt uit een half-Duits” (English: “Johan Rudolph Thorbecke was born in 1798 on 14 January in Zwolle and comes from a half-German”; the sentence is cut off on the slide). [Diagram labels: Biographical Description, File Meta Data, NNBW, Person Meta Data, “Thorbecke”, Biography Parts, Birth, 1798, Event, Biographical Description Enrichment, NLP Pipeline, Person Meta Data, Event, Birth, Zwolle, 1798-01-14, prov:plan]
  • 28. More than just Provenance: Provenance and Plans (P-PLAN):* Represent the plans that guided the execution of scientific processes • ‘Plans’ describe the original idea behind an activity • Each ‘Plan’ can consist of one or more ‘Steps’ • Each ‘Step’ corresponds to an ‘Activity’ • ‘Variables’ describe the input/output of an activity • Structure, format, quantity, etc. • Each ‘Variable’ corresponds with an input/output ‘Entity’ of an ‘Activity’ • ‘Plans’ have their own provenance info • E.g. who was responsible for the creation of a plan? *Daniel Garijo, Yolanda Gil; http://www.opmw.org/model/p-plan
  • 29. Why model plans besides provenance? P-PLAN is used to model not only what actually happened, but also what was supposed to happen • Forces the recording of what an activity and its input/output should look like • Provides abstract description of original idea behind activity • As such, can provide info on heuristics and assumptions • Allows for comparing the actual activity and its input/output with the original plan and its variables • Do they differ from each other and to what extent? • Makes finding errors much easier, as more information is available about what the input/output should look like
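Comparing what actually happened with what was planned can be sketched as follows. The classes loosely mirror the P-PLAN notions of Step and Variable and the PROV notions of Activity and Entity, but they are simplified stand-ins written for this example, and the step name and format labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Planned input or output of a step (cf. p-plan:Variable)."""
    name: str
    expected_format: str


@dataclass(frozen=True)
class Step:
    """Planned step of a plan (cf. p-plan:Step)."""
    name: str
    inputs: tuple
    outputs: tuple


@dataclass(frozen=True)
class Entity:
    """Something actually used or produced (cf. prov:Entity)."""
    variable: str  # name of the planned variable it corresponds to
    actual_format: str


@dataclass(frozen=True)
class Activity:
    """What actually ran (cf. prov:Activity)."""
    step: Step
    used: tuple
    generated: tuple


def _check(planned, actual, role):
    """Compare planned variables with actual entities for one role."""
    issues = []
    by_var = {e.variable: e for e in actual}
    for var in planned:
        entity = by_var.get(var.name)
        if entity is None:
            issues.append(f"missing {role} '{var.name}'")
        elif entity.actual_format != var.expected_format:
            issues.append(
                f"{role} '{var.name}': expected {var.expected_format}, "
                f"got {entity.actual_format}"
            )
    planned_names = {v.name for v in planned}
    for e in actual:
        if e.variable not in planned_names:
            issues.append(f"unplanned {role} '{e.variable}'")
    return issues


def deviations(activity):
    """List the ways an activity differs from its planned step."""
    return (_check(activity.step.inputs, activity.used, "input")
            + _check(activity.step.outputs, activity.generated, "output"))


ner_step = Step(
    "named-entity recognition",
    inputs=(Variable("biography_text", "plain text"),),
    outputs=(Variable("entities", "annotated XML"),),
)
run = Activity(
    ner_step,
    used=(Entity("biography_text", "plain text"),),
    generated=(Entity("entities", "plain text"),),
)
print(deviations(run))
```

An empty list means the run matched its plan; any entry points directly at the input or output that deviates, which is what makes errors easier to find.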
  • 30. BiographyNet: Schema illustration
  • 32. Interface: Focus • The interface should be easy to use • The demonstrator should inspire historians to undertake new research and give direction, rather than being the ‘closing factor’ in their research • The interface should allow users to ‘fine tune’ results returned upon an initial action
  • 33. Interface: Options • Query composition • Faceted browsing • A combination
  • 34. Interface: Query composition • Drop down boxes to select ‘Verbs’, data elements and relations
  • 35. Interface: Faceted browsing • No explicit querying, but convergence of the data through browsing and selecting • Provides better feedback to the user • Allows for more direct and easier adjustment of the selected data
  • 37. Interface: A combination • Query composition combined with faceted browsing • Create new facets by defining a query – The result of the query is available as a subset of the data by selecting the defined facet – As such, combinable with other facets • Method to integrate ‘open’ querying of the data into a general interface and visualization
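The combination described on slide 37, where a query defines a new facet that can be combined with ordinary facets, can be sketched by treating every facet as a predicate over records. The record fields, names and values below are hypothetical, and this is one possible reading of the idea rather than the demonstrator's design.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Facet = Callable[[Record], bool]


def value_facet(field: str, value: object) -> Facet:
    """Ordinary facet: select records whose field has a given value."""
    return lambda record: record.get(field) == value


def query_facet(predicate: Facet) -> Facet:
    """Facet defined by an arbitrary query over a record."""
    return predicate


def select(records: Iterable[Record], facets: Iterable[Facet]) -> List[Record]:
    """Records that satisfy every selected facet."""
    active = list(facets)
    return [r for r in records if all(f(r) for f in active)]


# Made-up records for illustration.
RECORDS = [
    {"name": "Person A", "occupation": "governor-general", "birth_year": 1650},
    {"name": "Person B", "occupation": "governor-general", "birth_year": 1780},
    {"name": "Person C", "occupation": "merchant", "birth_year": 1640},
]

governors = value_facet("occupation", "governor-general")
born_before_1700 = query_facet(lambda r: r["birth_year"] < 1700)

print([r["name"] for r in select(RECORDS, [governors, born_before_1700])])
```

Because both kinds of facet share one interface, a user-defined query narrows the data in the same way as clicking a facet value, and deselecting it widens the selection again.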
  • 38. Interface: A combination [Diagram labels: Question Analysis Selection Process Results Data Facets]
  • 39. Interface: Demonstrator Results? Time and place are primary elements
  • 41. Current Status: Main components of the demonstrator • Initial schema available • Schema models enrichments and aggregations alongside original sources • Allows for storing various levels of provenance information • Model will be adapted while progressing with building the demonstrator • Initial conversion to RDF available • Structure according to devised schema • Next step is linking to external sources • Initial NLP system setup available • Preliminary results comparable with manual use case • Interface • First ideas and sketches
  • 42. Thank you for your attention. www.biographynet.nl Feel free to ask questions