The Research and Education Space
a pathway to bring our cultural heritage
(including the BBC archive) to life
Dr Chiara Del Vescovo
Data Architect at BBC
Vision
• Web-like, Web-based
• Interlinking heterogeneous resources
• Capturing semantic interrelations
• Reliable, provably cleared for education
Linked Open Data
A pathway
BL, BM, BFI, Tate, V&A, … BBC
→ aggregating platform
→ users & developers
RES (BBC, Jisc, BUFVC)
Core Platform: “Acropolis”
Project RES: Technical Approach
1. The crawler fetches data via HTTP from published sources. Once retrieved, it is indexed by the full-text store and passed to the aggregation engine for evaluation.
2. The results of the aggregation engine's evaluation process are stored in the aggregate store, which contains minimal browse information and information about the similarity of entities.
3. The public face of the core platform is an extremely basic browsing interface (which presents the data in tabular form to aid application developers), and read-write RESTful APIs.
4. Applications may use the APIs to locate information about aggregated entities, and also to store annotations and activity data.
5. Each component employs standard protocols and formats. For example, we can make use of any capable quad-store as our aggregate store.
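Step 1 amounts to HTTP content negotiation: the crawler asks each published source for RDF rather than HTML. A minimal sketch follows; it is illustrative only, not Anansi's actual implementation, and the URL, Accept values, and user-agent string are assumptions.

```python
# Illustrative sketch of step 1 (not Anansi's actual code):
# requesting RDF from a published source via HTTP content negotiation.
from urllib.request import Request

def make_linked_data_request(url: str) -> Request:
    """Build an HTTP request that asks the server for RDF,
    preferring Turtle and falling back to RDF/XML."""
    return Request(url, headers={
        "Accept": "text/turtle;q=1.0, application/rdf+xml;q=0.8",
        "User-Agent": "res-crawler-sketch/0.1",  # hypothetical name
    })

req = make_linked_data_request("https://example.org/resource/123")
print(req.get_header("Accept"))  # text/turtle;q=1.0, application/rdf+xml;q=0.8
```

A real crawler would then open the request, check the Content-Type of the response, and hand the payload to the full-text store and the aggregation engine.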
Linked data crawler: Anansi
Aggregation engine: Spindle
Full-text store
Aggregate store
Minimal browse interface & APIs: Quilt
Activity store
Acropolis (the index!), informed by users & developers, with planned pilots across BL, BM, BFI, Tate, V&A, … BBC
Acropolis in beta: beta.acropolis.org.uk
What I do
(with my colleague Alex)
1. devise a publishing scheme to determine URIs
2. translate original metadata into RDF
3. discover and reconcile links with “hubs” (e.g., LoC, Geonames, DBpedia)
4. make the existing schema explicit as a local ontology
5. match the ontology onto well-established ontologies (e.g., DCMI, FOAF, SKOS, CIDOC-CRM)
6. advise on how to express machine-readable licenses, for both resources and metadata
7. provide technical support to publish LOD
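Steps 2 and 5 can be illustrated together with a toy mapping from local catalogue fields onto the well-established vocabularies named above. The field names and the example record are hypothetical; the predicate URIs are the real DCMI and SKOS terms.

```python
# Toy sketch of "translate metadata into RDF" plus "match onto
# well-established ontologies". Field names are hypothetical;
# the predicate URIs are real DCMI / SKOS terms.
FIELD_TO_PREDICATE = {
    "title":   "http://purl.org/dc/terms/title",                 # DCMI
    "creator": "http://purl.org/dc/terms/creator",               # DCMI
    "subject": "http://www.w3.org/2004/02/skos/core#prefLabel",  # SKOS
}

def record_to_triples(subject_uri, record):
    """Yield (subject, predicate, object) triples for mapped fields;
    unmapped fields are skipped (a real pipeline would report them)."""
    for field, value in record.items():
        predicate = FIELD_TO_PREDICATE.get(field)
        if predicate is not None:
            yield (subject_uri, predicate, value)

triples = list(record_to_triples(
    "http://example.org/item/42",
    {"title": "Example Catalogue Record", "format": "16mm film"},
))
for s, p, o in triples:
    print(f'<{s}> <{p}> "{o}" .')
```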
BL, BM, BFI, Tate, V&A, … BBC
Examples: DBPedialite, British Museum, DBPedia
Europeana
• a “general” Data Model (EDM)
• collection holders are responsible for fitting their resources and metadata into EDM
British Library
Extreme cases
Challenges
Stakeholders go quiet!
1. Which metadata?
• Currently, resource metadata is mostly oriented towards “physical proximity”
 – i.e., indexes reflect similarity of author’s surname, broad subject, format, media, etc.
• Heterogeneous platforms and data models
 – incompatibility; transformations needed
• Even when RDF is used, there is a proliferation of terms, vocabularies, and formats adopted
 – little (if any) validation
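The "little (if any) validation" point can be made concrete with a toy check that flags predicates drawn from outside an agreed set of vocabularies. The namespace list here is an illustrative assumption, not RES policy.

```python
# Toy validation pass: flag predicates from outside an agreed set of
# vocabulary namespaces. The namespace list is illustrative only.
KNOWN_NAMESPACES = (
    "http://purl.org/dc/terms/",
    "http://xmlns.com/foaf/0.1/",
    "http://www.w3.org/2004/02/skos/core#",
)

def unknown_predicates(triples):
    """Return the predicates not drawn from any known vocabulary."""
    return sorted({p for _, p, _ in triples
                   if not p.startswith(KNOWN_NAMESPACES)})

triples = [
    ("http://example.org/a", "http://purl.org/dc/terms/title", "A title"),
    ("http://example.org/a", "http://my.site/vocab/madeUpTitle", "A title"),
]
print(unknown_predicates(triples))  # ['http://my.site/vocab/madeUpTitle']
```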
2. Linking
• Systems that do not use RDF do not allow collection holders to express their knowledge as they wish
 – underspecified knowledge
• Even when RDF is used, information is often provided as literals rather than as links to URIs
 – ad hoc solutions, unavailable in a machine-readable format
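A minimal illustration of the literal-versus-URI distinction: the same fact expressed as an opaque string versus a link other datasets can agree on. The predicate is the real dcterms:spatial term and the DBpedia identifier is real; both are used purely as examples, and the subject URI is hypothetical.

```python
# Literal vs URI: the same fact as an opaque string and as a link.
# Subject URI is hypothetical; predicate and object URIs are real.
subject = "http://example.org/film/1"

# As a literal: readable by a human, but not a link a machine can follow.
literal_triple = (subject, "http://purl.org/dc/terms/spatial", "London")

# As a URI: the object is itself a linked-data resource that other
# datasets can agree on and that applications can dereference.
uri_triple = (subject, "http://purl.org/dc/terms/spatial",
              "http://dbpedia.org/resource/London")

def is_dereferenceable(obj: str) -> bool:
    """Crude check: does the object look like a resolvable URI?"""
    return obj.startswith(("http://", "https://"))

print(is_dereferenceable(literal_triple[2]))  # False
print(is_dereferenceable(uri_triple[2]))      # True
```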
3. Usability
• Reliability
• Lack of tools
 – developers have little contact with collection holders
• Licensing issues
 – resource licensing (not always explicit)
 – metadata licensing
 – users need to be aware of what that means
 – (note that in education things are slightly easier: blanket licensing etc.)
Interested?
• get in touch!
• chiara.delvescovo@bbc.co.uk
• alex.tucker@bbc.co.uk
• newly advertised position:
 – Junior Data Architect
 – careershub.bbc.co.uk