The Research and Education Space
a pathway to bring our cultural heritage
(including the BBC archive) to life
Dr Chiara Del Vescovo
Data Architect at BBC
Vision
• Web-like, Web-based
• Interlinking heterogeneous resources
• Capturing semantic interrelations
• Reliable, provably cleared for education
Linked Open Data
A pathway
BL, BM, BFI, Tate, V&A, … BBC
→ aggregating platform
→ users & developers
RES (BBC, Jisc, BUFVC)
Core Platform: “Acropolis”
Project RES: Technical Approach
1. The crawler fetches data via HTTP from published sources. Once retrieved, it is indexed by the full-text store and passed to the aggregation engine for evaluation.
2. The results of the aggregation engine's evaluation process are stored in the aggregate store, which contains minimal browse information and information about the similarity of entities.
3. The public face of the core platform is an extremely basic browsing interface (which presents the data in tabular form to aid application developers), and read-write RESTful APIs.
4. Applications may use the APIs to locate information about aggregated entities, and also to store annotations and activity data.
5. Each component employs standard protocols and formats. For example, we can make use of any capable quad-store as our aggregate store.
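Step 1 amounts to HTTP content negotiation: the crawler asks each published source for RDF rather than HTML. A minimal sketch follows; it is illustrative only, not Anansi's actual implementation, and the URL, Accept values, and user-agent string are assumptions.

```python
# Illustrative sketch of step 1 (not Anansi's actual code):
# requesting RDF from a published source via HTTP content negotiation.
from urllib.request import Request

def make_linked_data_request(url: str) -> Request:
    """Build an HTTP request that asks the server for RDF,
    preferring Turtle and falling back to RDF/XML."""
    return Request(url, headers={
        "Accept": "text/turtle;q=1.0, application/rdf+xml;q=0.8",
        "User-Agent": "res-crawler-sketch/0.1",  # hypothetical name
    })

req = make_linked_data_request("https://example.org/resource/123")
print(req.get_header("Accept"))  # text/turtle;q=1.0, application/rdf+xml;q=0.8
```

A real crawler would then open the request, check the Content-Type of the response, and hand the payload to the full-text store and the aggregation engine.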
Linked data crawler: Anansi
Aggregation engine: Spindle
Full-text store
Aggregate store
Minimal browse interface & APIs: Quilt
Activity store
Acropolis (the index!), informed by users & developers, with planned pilots across BL, BM, BFI, Tate, V&A, … BBC
Acropolis in beta: beta.acropolis.org.uk
What I do
(with my colleague Alex)
1. devise a publishing scheme to determine URIs
2. translate original metadata into RDF
3. discover and reconcile links with “hubs” (e.g., LoC, Geonames, DBpedia)
4. make the existing schema explicit as a local ontology
5. match the ontology onto well-established ontologies (e.g., DCMI, FOAF, SKOS, CIDOC-CRM)
6. advise on how to express machine-readable licenses, for both resources and metadata
7. provide technical support to publish LOD
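Steps 2 and 5 can be illustrated together with a toy mapping from local catalogue fields onto the well-established vocabularies named above. The field names and the example record are hypothetical; the predicate URIs are the real DCMI and SKOS terms.

```python
# Toy sketch of "translate metadata into RDF" plus "match onto
# well-established ontologies". Field names are hypothetical;
# the predicate URIs are real DCMI / SKOS terms.
FIELD_TO_PREDICATE = {
    "title":   "http://purl.org/dc/terms/title",                 # DCMI
    "creator": "http://purl.org/dc/terms/creator",               # DCMI
    "subject": "http://www.w3.org/2004/02/skos/core#prefLabel",  # SKOS
}

def record_to_triples(subject_uri, record):
    """Yield (subject, predicate, object) triples for mapped fields;
    unmapped fields are skipped (a real pipeline would report them)."""
    for field, value in record.items():
        predicate = FIELD_TO_PREDICATE.get(field)
        if predicate is not None:
            yield (subject_uri, predicate, value)

triples = list(record_to_triples(
    "http://example.org/item/42",
    {"title": "Example Catalogue Record", "format": "16mm film"},
))
for s, p, o in triples:
    print(f'<{s}> <{p}> "{o}" .')
```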
BL, BM, BFI, Tate, V&A, … BBC
Examples: DBPedialite, British Museum, DBPedia
Europeana
• a “general” Data Model (EDM)
• collection holders are responsible for fitting their resources and metadata into EDM
British Library
Extreme cases
Challenges
Stakeholders go quiet!
1. Which metadata?
• Currently, resource metadata is mostly oriented towards “physical proximity”
 – i.e., indexes reflect similarity of author’s surname, broad subject, format, media, etc.
• Heterogeneous platforms and data models
 – incompatibility; transformations needed
• Even when RDF is used, there is a proliferation of terms, vocabularies, and formats adopted
 – little (if any) validation
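The "little (if any) validation" point can be made concrete with a toy check that flags predicates drawn from outside an agreed set of vocabularies. The namespace list here is an illustrative assumption, not RES policy.

```python
# Toy validation pass: flag predicates from outside an agreed set of
# vocabulary namespaces. The namespace list is illustrative only.
KNOWN_NAMESPACES = (
    "http://purl.org/dc/terms/",
    "http://xmlns.com/foaf/0.1/",
    "http://www.w3.org/2004/02/skos/core#",
)

def unknown_predicates(triples):
    """Return the predicates not drawn from any known vocabulary."""
    return sorted({p for _, p, _ in triples
                   if not p.startswith(KNOWN_NAMESPACES)})

triples = [
    ("http://example.org/a", "http://purl.org/dc/terms/title", "A title"),
    ("http://example.org/a", "http://my.site/vocab/madeUpTitle", "A title"),
]
print(unknown_predicates(triples))  # ['http://my.site/vocab/madeUpTitle']
```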
2. Linking
• Systems that do not use RDF do not allow collection holders to express their knowledge as they wish
 – underspecified knowledge
• Even when RDF is used, information is often provided as literals rather than as links to URIs
 – ad hoc solutions, unavailable in a machine-readable format
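A minimal illustration of the literal-versus-URI distinction: the same fact expressed as an opaque string versus a link other datasets can agree on. The predicate is the real dcterms:spatial term and the DBpedia identifier is real; both are used purely as examples, and the subject URI is hypothetical.

```python
# Literal vs URI: the same fact as an opaque string and as a link.
# Subject URI is hypothetical; predicate and object URIs are real.
subject = "http://example.org/film/1"

# As a literal: readable by a human, but not a link a machine can follow.
literal_triple = (subject, "http://purl.org/dc/terms/spatial", "London")

# As a URI: the object is itself a linked-data resource that other
# datasets can agree on and that applications can dereference.
uri_triple = (subject, "http://purl.org/dc/terms/spatial",
              "http://dbpedia.org/resource/London")

def is_dereferenceable(obj: str) -> bool:
    """Crude check: does the object look like a resolvable URI?"""
    return obj.startswith(("http://", "https://"))

print(is_dereferenceable(literal_triple[2]))  # False
print(is_dereferenceable(uri_triple[2]))      # True
```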
3. Usability
• Reliability
• Lack of tools
 – developers have little contact with collection holders
• Licensing issues
 – resource licensing (not always explicit)
 – metadata licensing
 – users need to be aware of what that means
 – (note that in education things are slightly easier: blanket licensing etc.)
Interested?
• get in touch!
• chiara.delvescovo@bbc.co.uk
• alex.tucker@bbc.co.uk
• newly advertised position:
 – Junior Data Architect
 – careershub.bbc.co.uk