Gildas ILLIEN, Bibliothèque nationale de France
Director, Bibliographic and Digital Information Department
SEMANTIC WEB AND ARCHIVES,
LIBRARIES AND MUSEUMS
Fundacion Ramon Areces
Madrid, April 10, 2014
Semantic Web at the
Bibliothèque nationale de
France: another French
revolution?
Our goal: We want to free
our metadata from the
Library and get it INSIDE
the Web
We are ready to make many
sacrifices to achieve this…
Outline
 Why we chose to do it
 What we’re doing exactly
 What we’ve learnt and what we’ll do
WHY WE CHOSE TO DO IT
BUDGET CUTS:
Doing more with less requires
doing things differently, and no
longer on your own.
Demonstrate more value
for what you cost: a must.
LIBRARIANS :
We need to reinvent ourselves.
We want to have fun.
Data curators indeed?
USERS:
Library catalogues are not so popular
(were they ever?).
But where did our users go?
- on the Web.
Motivations
OPPORTUNITIES:
The Open Data policy movement
The Linked data technologies
= opportunities to remind policy
makers that libraries exist!
Be more visible
 Have search engines (and users!) find our resources
even if they don’t know the Library exists
Be more consistent
 Improve discovery and unity of our resources scattered
all over various silos
Be less expensive yet more generous
 Be part of the new data economy
 Link to, and get linked from, other trusted datasets
 Focus on our added value
Be more useful
Revisit our national bibliographic mission
Encourage and demonstrate better reuse of our data
What we want from
our metadata
WHAT WE’RE DOING EXACTLY
Data.bnf.fr :
The project
 Agile development method, with Logilab
 A small but smart team
 Open source code: CubicWeb
 Milestones:
 2009: design and kick-off
 2011 : proof of concept
 2012 : 10% of our catalogues
 2013 : 20% of our catalogues
 2014: 40% of our catalogues
 2015: 80% of our catalogues ?
Web pages for people
Matching - Clustering
Archives and
manuscripts
Raw data for machines
External links: DBpedia,
Wikipedia, GeoNames, ISNI,
VIAF, DNB, Library of Congress…
Other resources :
bibliographies
web archives
Virtual exhibitions…
Data.bnf.fr:
Baseline
It’s OPEN
Technically
Legally
January 2014: BnF
officially released all its
metadata under a CC-by
licence
It’s LINKED
 URIs based on ARK identifiers
http://data.bnf.fr/ark:/12148/cb119374933
 Resource Description Framework
(143 773 998 triples)
 Download the page; content
negotiation
 Dumps on the page
http://data.bnf.fr/semanticweb
 Now being tested inside the BnF:
SPARQL endpoint
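
As a rough illustration of content negotiation, the sketch below prepares (without sending) a request for the ARK-based URI above that asks for RDF rather than the HTML page. The function name `build_rdf_request` is hypothetical, and which media types the server actually honours is an assumption; only the standard RDF media types are used here.

```python
from urllib.request import Request

# Standard RDF media types. Which of these data.bnf.fr really serves
# is an assumption, not something stated on the slide.
RDF_MEDIA_TYPES = {
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
    "ntriples": "application/n-triples",
}


def build_rdf_request(uri: str, fmt: str = "rdfxml") -> Request:
    """Prepare (but do not send) a GET request asking for RDF instead of HTML."""
    return Request(uri, headers={"Accept": RDF_MEDIA_TYPES[fmt]})


# ARK-based URI taken from the slide.
request = build_rdf_request("http://data.bnf.fr/ark:/12148/cb119374933", "turtle")
print(request.full_url, request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual fetch; the same URI then serves people (HTML) or machines (RDF) depending on the `Accept` header.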
DATA MODEL,
ALIGNMENTS
DEMO
Théophile Gautier : Voyage en Espagne
https://www.google.com/search?q=gautier+voyage+en+espagne&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:fr:official&client=firefox-a
http://www.bing.com/search?q=gautier+voyage+en+espagne&pc=MOZI&form=MOZSBR
http://fr.search.yahoo.com/search?p=gautier+voyage+en+espagne&ei=UTF-8&fr=moz35
Other examples
« Jacquemart Gielée » (poet, 13th c.)
« Thomas Cajetan » (theologian, 15th c.)
« Jacques de Révigny » (jurist, 13th c.)
People, Places,
Dates…
Link to the place Madrid :
http://data.bnf.fr/lieu/madrid__espagne_/
Link to the date 1562 :
http://data.bnf.fr/date/1562/
then to the century
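
The step from a date page to its century can be sketched as below. This is an illustration only: the helper names are hypothetical, and the date URL pattern is assumed to generalise from the single 1562 example on the slide. No century URL is built, since the slide does not show one.

```python
def date_page_url(year: int) -> str:
    """Build a date-page URL following the pattern shown for 1562 (assumed to generalise)."""
    return f"http://data.bnf.fr/date/{year}/"


def century_of(year: int) -> int:
    """Return the century that a year from 1 CE onwards belongs to."""
    if year < 1:
        raise ValueError("only years from 1 CE onwards are handled")
    # Years 1501..1600 all map to the 16th century, hence the (year - 1).
    return (year - 1) // 100 + 1


print(date_page_url(1562), century_of(1562))
```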
Performance:
Cleopatra,
Teatro Real, 1921
http://data.bnf.fr/41375196/cleopatra_spectacle_1921/
Download a page:
 Download each page
 Download the full dump:
http://data.bnf.fr/semanticweb
Easy to reuse?
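
One way to gauge how easy a dump is to reuse is to count which properties it contains. The sketch below assumes a dump in N-Triples form (one triple per line), which is an assumption about the download format, and it runs on made-up `example.org` triples rather than real BnF data. The function name `count_predicates` is hypothetical.

```python
from collections import Counter


def count_predicates(ntriples_lines):
    """Count how often each predicate occurs in an iterable of N-Triples lines."""
    counts = Counter()
    for line in ntriples_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subject and predicate never contain whitespace in N-Triples,
        # so the first two whitespace-separated fields are safe to split off.
        _subject, predicate, _rest = line.split(None, 2)
        counts[predicate] += 1
    return counts


# Illustrative sample only; these triples are invented for the example.
SAMPLE = [
    '<http://example.org/author/1> <http://xmlns.com/foaf/0.1/name> "Example Author" .',
    '<http://example.org/author/2> <http://xmlns.com/foaf/0.1/name> "Another Author" .',
    '<http://example.org/work/1> <http://purl.org/dc/terms/creator> <http://example.org/author/1> .',
]

predicate_counts = count_predicates(SAMPLE)
print(predicate_counts.most_common())
```

For a real dump, the same function could be fed an open file object line by line, so the whole file never has to sit in memory.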
WHAT WE’VE LEARNT
[Chart: number of unique visitors, vertical axis from 0 to 140,000]
Facts and figures:
so far, it works!
73 % are consulting the
catalogues and Gallica (2013)
70 % come from search
engines, 10 % from links (2013)
http://www.ifverso.com/fr/content/robur-le-conquerant-14
http://www.rechercheisidore.fr/
http://www.fevis.com
They use our data:
Examples
http://data.abuledu.org
Wikimedia France
Wikidata
data.bnf.fr
BnF catalogues
(http://catalogue.bnf.fr/ark:/12148/cb12130221r)
Public library
catalogue
(http://catalogue.bnf.fr/ark:/12148/cb12130221r)
OpenCat
Bibliographic
supplements
(fiction indexing,
categories, genres)
Covers
Digitized documents
(text, manuscripts, sound, images)
Portraits
Local data
(shelfmark, availability, indexing)
Authors, subjects ("authorities")
Bibliographic information (grouped by work)
Biographical
supplements
Online lectures
Other BnF content
(virtual exhibitions, lectures)
Other web content
WHAT WE’LL DO
Scaling up
 Coming soon: from 280 000 to 1.3
million authors
 Issues with:
 Architecture and performance
 Design and ergonomics
 Service
 Staff
Aligning with
more people
 Reaching out to the right communities
 Inside and outside libraries
 Publishers' organizations
 Music organizations
 Performing Arts
 Geodata
 …
http://data.bnf.fr/atelier/
Keep exploring:
L’Atelier/The Lab
http://data.bnf.fr/atelier/11957478/voltaire_candide/
http://data.bnf.fr/atelier/11928669/voltaire/
Using data.bnf.fr
As a tool for change
 Feedback to
the source
catalogues and
data (FRBR)?
 A metadata QA
tool for staff?
 A collection
management
tool?
Thank you!
gildas.illien[at]bnf.fr
agnes.simon[at]bnf.fr
data@bnf.fr
Plan B
 Page about the author Averroès
 http://data.bnf.fr/12013155/averroes/
 Image and biography from Wikipedia
 Links to Gallica and to BnF archives and manuscripts : http://data.bnf.fr/en/documents-by-rdt/12013155/70/page1
 Links to the page about Cordoba
 Links to the year 1126 and the 12th century
