Image Retrieval:
State of the Art at the BnF
Jean-Philippe Moreux
DSR/DCP
CONTENT
TECHNIQUES
Mining
USE CASES
SERVICES
• Gallica
• Mandragore
• Banque d'images (image bank)
• …
Which content?
• Diverse document genres, artistic and printing techniques
• All time periods and fields of knowledge
• Heterogeneous metadata
Use cases?
• General audience
• Researchers
• Information retrieval
In-house use cases?
• Iconographic service (DIP), digital mediation
• Catalog curation, digitization workflow
Catalog curation (examples: btv1b90055240, btv1b9005519p, btv1b9003068t, btv1b8413056n, btv1b84130520)
Digitization workflow: no metadata available!
Which techniques?
1. Where are the images? (segmentation)
2. What is their content? (visual recognition)
3. How can we search this content? (information retrieval)
1. Segment
• Use the segmentation contained in the OCR (when available: monographs, newspapers)
• Segment the image (e.g. with supervised learning)
2. Index
• Digital signature
• Classification (0/1)
• Image classification (1/n)
• Multiclass image classification (m/n)
• Object detection
• Instance detection
• Semantic segmentation
Today, deep learning makes a separate, hand-crafted feature extraction phase unnecessary: the artificial neural network learns the features itself.
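The three label granularities above (0/1, 1/n, m/n) differ mainly in how a network's raw output scores are turned into decisions. A minimal sketch, assuming the scores (logits) already come from some trained classifier; the class names and numbers are hypothetical:

```python
import math

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max score for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Independent probability for one class."""
    return 1.0 / (1.0 + math.exp(-z))

def classify_binary(logit, threshold=0.5):
    """(0/1): is the target class present or not?"""
    return sigmoid(logit) >= threshold

def classify_single_label(logits, labels):
    """(1/n): exactly one class per image, the most probable one."""
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]

def classify_multi_label(logits, labels, threshold=0.5):
    """(m/n): every class whose own probability reaches the threshold."""
    return [label for label, z in zip(labels, logits) if sigmoid(z) >= threshold]

# Hypothetical scores for one illustration.
labels = ["engraving", "photo", "map"]
logits = [2.0, 0.5, -1.0]
print(classify_single_label(logits, labels)[0])  # engraving
print(classify_multi_label(logits, labels))      # ['engraving', 'photo']
```

Softmax makes the classes compete, which suits one genre per illustration; independent sigmoids suit multi-class indexing, where one picture can show both an aircraft and a pilot.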
Which tools for indexing?
Yesterday: visual feature models and learning algorithms…
• Generic pretrained models
• Transfer learning
• Software as a Service
• Deep learning frameworks
• Ad hoc models (« dog » or « griffo » ?)
Applications?
Deep learning approaches, with… and tools
3. Search
Similarity search, semantic search:
• In the digital signature space: [12, 5, 87, 54…]
• In the (textual) class labels: "aircraft", "pilot", "biplane"…
• In the visual characteristics: blue, large size, contrast…
• With several indexes combined (multimodal search): "fish" + "red" + "drawing"
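These search modes can be combined in a single query: filter on textual labels and visual characteristics, then rank the remaining images by proximity in the signature space. A minimal in-memory sketch; the `IndexedImage` record, its fields and the cosine measure are illustrative assumptions, not the BnF index schema:

```python
import math
from dataclasses import dataclass, field

@dataclass
class IndexedImage:
    image_id: str
    signature: list                             # visual signature, e.g. a CNN embedding
    labels: set = field(default_factory=set)    # textual class labels
    colors: set = field(default_factory=set)    # visual characteristics
    genre: str = ""

def cosine(a, b):
    """Cosine similarity between two signatures (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(index, query_signature=None, labels=(), colors=(), genre=None, k=10):
    # Semantic / multimodal part: keep only images matching every criterion.
    candidates = [
        img for img in index
        if set(labels) <= img.labels
        and set(colors) <= img.colors
        and (genre is None or img.genre == genre)
    ]
    # Similarity part: rank the survivors by closeness to the query signature.
    if query_signature is not None:
        candidates.sort(key=lambda img: cosine(query_signature, img.signature), reverse=True)
    return [img.image_id for img in candidates[:k]]
```

A query such as "fish" + "red" + "drawing" becomes `search(index, labels=["fish"], colors=["red"], genre="drawing")`; adding `query_signature` turns it into a hybrid similarity and semantic search.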
The workflow
Select → Segment → Index → QA → Use
Use: Search · Disseminate (APIs, datasets) · Access
http://patrimeph.ensea.fr/retin/
ETIS/BnF, Patrimeph project (2018)
3,000 images (Gallica tweets)
1. Visual similarity search
2. Human in the loop for assessing the relevance of the results
• Picard, D., Gosselin, P.-H., Gaspard, M.-C., "Challenges in Content-Based Image Indexing of Cultural Heritage Collections". IEEE Signal Processing Magazine, Institute of Electrical and Electronics Engineers, 2015, 32 (4), pp. 95-102
• Picard, D., Gosselin, P.-H., "Évaluation de descripteurs visuels pour l'annotation automatique d'images patrimoniales" (Evaluation of visual descriptors for the automatic annotation of heritage images), ETIS UMR 8051
Collaboration between BnF/DRE and the ETIS lab (2015)
3,000 annotated images (400 concepts, 6 categories: visual, physical, historical…)
1. Global descriptors (e.g. color and texture histograms)
2. Aggregation of local descriptors
3. Deep learning (CNNs with 8 to 19 layers)
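As an illustration of the first family (global descriptors), here is a minimal joint RGB color histogram compared with histogram intersection. The bin count and the intersection measure are assumptions chosen for simplicity; the descriptors actually evaluated in the study are not detailed here:

```python
def color_histogram(pixels, bins=4):
    """Global descriptor: joint RGB histogram, L1-normalized.

    pixels: iterable of (r, g, b) tuples with values in 0..255.
    Returns bins**3 frequencies summing to 1 (all zeros for an empty image).
    """
    hist = [0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        # Quantize each channel into `bins` intervals, then flatten (r, g, b) into one index.
        ri, gi, bi = (r * bins) // 256, (g * bins) // 256, (b * bins) // 256
        hist[(ri * bins + gi) * bins + bi] += 1
        count += 1
    return [h / count for h in hist] if count else hist

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms (1.0 = same color distribution)."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Such a descriptor sees the color distribution of the whole image but nothing of its shapes or meaning, which is consistent with deep learning doing better on semantic concepts.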
• Best results: deep learning
• Too many concepts, too much diversity, too few images
• Indexing by visual and structural similarities
• Search by sample image
• Search API (e.g. by color)
GallicaSimilitudes (BnF, 2017)
http://gallicastudio.bnf.fr/gallicaimages/
770k images (special collections)
• Global descriptors computed on the dataset
• FCTH (Fuzzy Color and Texture Histogram) footprint computation and Tanimoto similarity measurement
• Good results on color and overall structure criteria
• No consideration of shapes
• Complementary to "semantic" approaches
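For real-valued vectors, the Tanimoto measure is T(x, y) = x·y / (|x|² + |y|² − x·y), which equals 1 for identical vectors. A sketch of the ranking step, assuming the FCTH footprints (192-bin vectors) have already been computed by some other tool; the image ids and toy vectors are hypothetical:

```python
def tanimoto(x, y):
    """Tanimoto coefficient between two non-negative feature vectors (1.0 = identical)."""
    xy = sum(a * b for a, b in zip(x, y))
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    denom = xx + yy - xy
    return xy / denom if denom else 1.0  # two all-zero vectors are treated as identical

def rank_by_tanimoto(query, collection, k=5):
    """Return the ids of the k footprints most similar to the query footprint.

    collection: dict mapping image id -> precomputed footprint vector.
    """
    scored = [(tanimoto(query, vec), image_id) for image_id, vec in collection.items()]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [image_id for _, image_id in scored[:k]]
```

Because the footprint only encodes fuzzy color and texture bins, two images with similar palettes and layouts rank close even when they depict different things, hence the complementarity with semantic approaches.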
• Iconographic retrieval in encyclopedic collections
• Multimodal search (metadata, text, image)
• Creation of iconographic datasets for visual studies
• IIIF driven
GallicaPix (BnF, 2017)
http://demo14-18.bnf.fr:8984/rest?run=findIllustrations-form.xq
220k images from 500k pages (print, images, maps…). Topic: WW1
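Being IIIF-driven means an illustration can be displayed or exported by cropping the page image on the fly. A sketch of URL construction following the IIIF Image API 2.x pattern `{base}/{region}/{size}/{rotation}/{quality}.{format}`. The base URI is assumed to be known; for Gallica it typically looks like `https://gallica.bnf.fr/iiif/ark:/12148/<ark>/f<page>`, but check the service before relying on that shape:

```python
def iiif_image_url(base, region=None, size="full", rotation=0, quality="default", fmt="jpg"):
    """Build an IIIF Image API (2.x-style) request URL.

    base: image service URI, i.e. everything before /{region}/... (assumed known).
    region: (x, y, w, h) in pixels of the page image, or None for the whole page.
    size: IIIF size parameter, e.g. "full", "512," or "!256,256".
    """
    region_str = "full" if region is None else ",".join(str(int(v)) for v in region)
    return f"{base.rstrip('/')}/{region_str}/{size}/{rotation}/{quality}.{fmt}"
```

Storing only the page URI and the illustration's bounding box is then enough to rebuild a thumbnail or a full-resolution crop when the dataset is exported.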
• Data mining of the collections (metadata, OCR)
• Visual recognition (IBM, Google, open source models)
• Classification into 12 genres (picture, engraving, drawing, map, text…) with a CNN trained on 12k BnF illustrations
GallicaPix and Numapresse research project
• Face and gender recognition on Excelsior and Paris-Match front pages (1910-1920)
• Human correction and annotation (named people/anonymous)
• Statistical analysis of representation
Gender in newspaper illustrations: GallicaPix for the Digital Humanities
GallicaPix and DH 2019
• DH Hackathon, May 2019, Helsinki: "Newspapers and Capitalism", focus on ads
• Object detection (YOLOv3) on 60k illustrated ads
• Analysis of means of transport (WW1)
Illustrated ads: GallicaPix for the Digital Humanities
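Detectors such as YOLOv3 typically emit several overlapping boxes for the same object, which are then filtered by non-maximum suppression (NMS). A generic sketch of that post-processing step, assuming boxes as `(x1, y1, x2, y2)` tuples; the labels in the example are hypothetical means-of-transport classes, and this is not the exact pipeline used at the hackathon:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS per label.

    detections: list of (box, score, label). Boxes are visited from best to worst score;
    a box is dropped if it overlaps an already kept box of the same label too much.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        box, _, label = det
        if all(label != k[2] or iou(box, k[0]) < iou_threshold for k in kept):
            kept.append(det)
    return kept

dets = [((0, 0, 10, 10), 0.9, "car"), ((1, 1, 11, 11), 0.8, "car"),
        ((20, 20, 30, 30), 0.7, "car"), ((0, 0, 10, 10), 0.6, "bicycle")]
print([d[1] for d in non_max_suppression(dets)])  # [0.9, 0.7, 0.6]
```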
(to be completed)
• Segmentation for heritage documents: dhSegment (EPFL)
École polytechnique fédérale de Lausanne (2017-2019)
• Creation of training datasets (binary mask + source image)
• Convolutional neural networks (CNN, ResNet-50)
https://drive.google.com/file/d/1QYVRYRW4_chgtatz9cTmxgJg3--LcQYV/view
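A segmentation network like dhSegment outputs a per-pixel map; once it is thresholded into a binary mask, illustration bounding boxes can be recovered with a connected-components pass. This is a generic stand-in written for illustration, not dhSegment's own post-processing code:

```python
from collections import deque

def mask_to_boxes(mask, min_area=1):
    """Bounding boxes (x1, y1, x2, y2), inclusive, of 4-connected foreground regions.

    mask: list of rows of 0/1 values. Regions smaller than min_area pixels are ignored.
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill of one connected region, tracking its extent.
            queue = deque([(y, x)])
            seen[y][x] = True
            x1 = x2 = x
            y1 = y2 = y
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                x1, x2 = min(x1, cx), max(x2, cx)
                y1, y2 = min(y1, cy), max(y2, cy)
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                boxes.append((x1, y1, x2, y2))
    return boxes
```

The `min_area` filter is a simple way to discard the speckles a pixel classifier tends to leave on degraded pages.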
(to be completed)
• Two INRIA teams: ZENITH (Montpellier), LinkedMedia (Rennes)
• Datasets: manuscripts (Mandragore) and WW1
INRIA/BnF research project (2018-2020)
• SNOOP (INRIA, INA), Pl@ntNet: collaborative plant recognition
• Search in large visual indexes
https://interstices.info/plantnet-un-reseau-et-des-outils-pour-une-recherche-participative
Objectives:
• Human in the loop: relevance evaluation
• Object detection on small datasets
• Visual signatures for searching heritage content in large indexes
This is not a lion
Human in the loop: iterative queries (INRIA/Zenith)
Initial query → user selection (Good/Wrong) → new query leveraging the user inputs → selection…
• Tests on 1M images (WW1)
• Iterative retraining of the model
• New classes are created by the user
Iterative query to build a « German helmet » detector
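The loop above retrains a model at each iteration. A much simpler, classical way to exploit Good/Wrong feedback is Rocchio relevance feedback, which moves the query vector in the signature space. It is shown here only to illustrate the idea of a query refined by user input, not as the INRIA/Zenith method; the weights are commonly cited defaults, assumed here:

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query signature toward images marked Good and away from those marked Wrong.

    query: current query vector; relevant / non_relevant: lists of signature vectors.
    """
    dim = len(query)

    def centroid(vectors):
        # Mean vector of a group (zero vector if the user marked nothing in that group).
        if not vectors:
            return [0.0] * dim
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    pos, neg = centroid(relevant), centroid(non_relevant)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]
```

Each round, the new vector is used to re-rank the index, and the user's next Good/Wrong marks feed the following call.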
INRIA/LinkedMedia
• Tests on the Mandragore database (manuscripts): zoology
• Small datasets, visual heterogeneity (1000 BC to 20th c.)
Heritage content, small datasets
INRIA/LinkedMedia
• Activation maps
Tests on unsegmented documents
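The slide does not say which activation-map technique was used. As one well-known example, a class activation map (CAM) weights the last convolutional feature maps by the classifier weights of the target class, which highlights where an object sits on an unsegmented page. A sketch assuming a network with global average pooling before its final layer; the toy maps and weights are invented:

```python
def class_activation_map(feature_maps, class_weights):
    """CAM: weighted sum of the last convolutional feature maps for one class.

    feature_maps: K maps, each an H x W list of lists of activations.
    class_weights: K weights linking each map to the target class.
    Returns an H x W map rescaled to [0, 1]; high values mark regions driving the prediction.
    """
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[sum(wk * fmap[y][x] for wk, fmap in zip(class_weights, feature_maps))
            for x in range(w)] for y in range(h)]
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = hi - lo
    # Min-max normalization; a flat map becomes all zeros.
    return [[(v - lo) / span if span else 0.0 for v in row] for row in cam]
```

Upsampling this map to page size and thresholding it gives a rough localization without any box annotation, which is attractive when annotated data are scarce.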
INRIA/LinkedMedia
• Faster R-CNN
• Manual annotations
Object Detection
Object detection results
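Comparing Faster R-CNN outputs with manual annotations usually means matching boxes by intersection over union (IoU). A minimal precision/recall sketch, assuming an IoU threshold of 0.5 and greedy matching by decreasing score; the "helmet" boxes in the test are invented, not BnF results:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(detections, ground_truth, iou_threshold=0.5):
    """Precision and recall of detections against manual annotations.

    detections: list of (box, score, label); ground_truth: list of (box, label).
    Detections are visited best score first; each one may claim the unmatched
    annotation of the same label that it overlaps most, if IoU >= iou_threshold.
    """
    matched = set()
    tp = 0
    for box, _, label in sorted(detections, key=lambda d: d[1], reverse=True):
        best_j, best_iou = None, iou_threshold
        for j, (gt_box, gt_label) in enumerate(ground_truth):
            if j in matched or gt_label != label:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    precision = tp / len(detections) if detections else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

A duplicate box on an already matched object counts as a false positive, so this measure also penalizes missing or weak non-maximum suppression.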
Image Retrieval app
Indexing workflow and GallicaPix DB
Other BnF apps
APIs
Datasets
Gallica digital repositories
What's next?

CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
 
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@
 

Image Retrieval at the BnF

  • 1. Image Retrieval: State of the Art at the BnF. Jean-Philippe Moreux, DSR/DCP
  • 3. Which content? • Diverse document genres and artistic/print techniques • All time periods and fields of knowledge • Heterogeneous metadata
  • 5. Use cases? General audience, researchers, information retrieval
  • 6. In-house use cases? Iconographic service (DIP), digital mediation
  • 7. In-house use cases? Catalog curation, digitization workflow. Catalog curation examples: btv1b90055240, btv1b9005519p, btv1b9003068t, btv1b8413056n, btv1b84130520
  • 9. Which techniques? 1. Where are the images? (segmentation) 2. What is their content? (visual recognition) 3. How can this content be searched? (information retrieval)
  • 10. 1. Segment: 1. Use the segmentation contained in the OCR (when available: monographs, newspapers) 2. Segment the image (e.g. with supervised learning)
  • 11. 2. Index: Image classification (1/n) • Multiclass image classification (m/n) • Object detection • Instance detection • Semantic segmentation • Digital signature • Classification (0/1)
  • 12. Which tools for indexing? Yesterday: visual feature models and learning algorithms. Today, deep learning makes a separate feature extraction phase unnecessary (the artificial neural network takes care of it).
  • 13. Which tools for indexing? Deep learning approaches, with… and tools: generic pretrained models, transfer learning, Software as a Service, deep learning frameworks • Ad hoc models (« dog » or « griffo »?) • Applications?
  • 14. 3. Search: similarity search, semantic search • In the digital signature space: [12,5,87,54…] • In the (textual) class labels: "aircraft", "pilot", "biplane"... • In the visual characteristics: blue, large size, contrast... • Combining several indexes (multimodal search): "fish" + "red" + "drawing"
  • 15. The workflow: Select, Segment, Index, QA, Use (Search, Access, Disseminate: APIs, datasets)
  • 16. ETIS/BnF, Patrimeph project (2018), http://patrimeph.ensea.fr/retin/ • 3,000 images (Gallica tweets) • 1. Visual similarity search • 2. Human in the loop to assess the relevance of the results
  • 17. Collaboration between BnF/DRE and the ETIS lab (2015) • 3,000 annotated images (400 concepts, 6 categories: visual, physical, historical…) • Approaches compared: 1. Global descriptors (e.g. color and texture histograms) 2. Aggregation of local descriptors 3. Deep learning (CNN, 8-19 layers) • Best results: deep learning • Too many concepts, too much diversity, too few images • Picard, D., Gosselin, P.-H., Gaspard, M.-C., "Challenges in Content-Based Image Indexing of Cultural Heritage Collections", IEEE Signal Processing Magazine, 2015, 32 (4), pp. 95-102 • Picard, D., Gosselin, P.-H., "Évaluation de descripteurs visuels pour l'annotation automatique d'images patrimoniales" (Evaluation of visual descriptors for the automatic annotation of heritage images), ETIS UMR 8051
  • 18. GallicaSimilitudes (BnF, 2017), http://gallicastudio.bnf.fr/gallicaimages/ • 770k images (special collections) • Indexing by visual and structural similarity • Search by sample image • Search API (e.g. by color) • Global descriptors computed on the dataset: FCTH (Fuzzy Color and Texture Histogram) signatures compared with the Tanimoto similarity measure • Good results on color and overall structure criteria • Shapes are not taken into account • Complementary to "semantic" approaches
  • 19. GallicaPix (BnF, 2017), http://demo14-18.bnf.fr:8984/rest?run=findIllustrations-form.xq • Iconographic retrieval in encyclopedic collections • Multimodal search (metadata, text, image) • Creation of iconographic datasets for visual studies • IIIF-driven • 220k images from 500k pages (prints, images, maps…), topic: WW1 • Data mining of the collections (metadata, OCR) • Visual recognition (IBM, Google, open-source models) • Classification into 12 genres (picture, engraving, drawing, map, text…) with a CNN trained on 12k BnF illustrations
  • 20. Gender in newspaper illustrations: GallicaPix for the Digital Humanities • GallicaPix and the Numapresse research project, http://demo14-18.bnf.fr:8984/rest?run=findIllustrations-form.xq • Face and gender recognition on Excelsior and Paris-Match front pages (1910-1920) • Human correction and annotation (named people/anonymous) • Statistical analysis of representation
  • 21. Illustrated ads: GallicaPix for the Digital Humanities • GallicaPix and DH 2019, http://demo14-18.bnf.fr:8984/rest?run=findIllustrations-form.xq • DH Hackathon, May 2019, Helsinki: "Newspapers and Capitalism", focus on ads • Object detection (YOLOv3) on 60k illustrated ads • Analysis of means of transport (WW1)
  • 22. (to be completed) Segmentation for heritage documents: dhSegment, EPFL (École polytechnique fédérale de Lausanne), 2017-2019, https://drive.google.com/file/d/1QYVRYRW4_chgtatz9cTmxgJg3--LcQYV/view • Creation of training datasets (binary mask + source image) • Convolutional neural networks (CNN, ResNet-50)
  • 23. (to be completed) INRIA/BnF research project (2018-2020) • Two INRIA teams: ZENITH (Montpellier), LinkedMedia (Rennes) • Datasets: manuscripts (Mandragore) and WW1 • SNOOP (INRIA, INA), Pl@ntNet: collaborative plant recognition, https://interstices.info/plantnet-un-reseau-et-des-outils-pour-une-recherche-participative • Search in large visual indexes • Objectives: human in the loop (relevance evaluation); object detection on small datasets; visual signatures for searching heritage content in large indexes • Image caption: "This is not a lion"
  • 24. Human in the loop: iterative queries (INRIA/Zenith) • Initial query, user selection (good/wrong), new query leveraging the user inputs, new selection… • Tests on 1M images (WW1) • Iterative retraining of the model • New classes are created by the user
  • 25. Iterative query to build a « German helmet » detector
  • 26. Heritage content, small datasets (INRIA/LinkedMedia) • Tests on the Mandragore database (manuscripts): zoology • Small datasets, visual heterogeneity (1000 BC to 20th c.)
  • 28. Object detection (INRIA/LinkedMedia) • Faster R-CNN • Manual annotations
  • 30. What's next? Image retrieval app • Indexing workflow and GallicaPix DB • Other BnF apps • APIs • Datasets • Gallica • Digital repositories
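Slide 10 suggests reusing the segmentation already present in the OCR output. As a minimal sketch, assuming the OCR is delivered as ALTO XML in which illustrations appear as `<Illustration>` blocks carrying `HPOS`/`VPOS`/`WIDTH`/`HEIGHT` attributes, the boxes can be read with the Python standard library alone. The sample page and the function name `illustration_boxes` are hypothetical; coordinates come out in whatever `MeasurementUnit` the file declares, so they may need converting to pixels before cropping.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed ALTO v3 page (real files carry much more).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page ID="P1" WIDTH="2000" HEIGHT="3000"><PrintSpace>
    <Illustration ID="IL1" HPOS="120" VPOS="340" WIDTH="800" HEIGHT="600"/>
    <TextBlock ID="TB1" HPOS="120" VPOS="1000" WIDTH="800" HEIGHT="200"/>
    <Illustration ID="IL2" HPOS="1000" VPOS="340" WIDTH="700" HEIGHT="500"/>
  </PrintSpace></Page></Layout>
</alto>"""


def illustration_boxes(alto_xml):
    """Return (id, x, y, width, height) for every <Illustration> block.

    The "{*}" wildcard matches any namespace, so the same query works for
    ALTO v2, v3 and v4 files (needs Python 3.8+).
    """
    root = ET.fromstring(alto_xml)
    boxes = []
    for el in root.iterfind(".//{*}Illustration"):
        # ALTO coordinates may be written as floats; truncate them to ints.
        x, y, w, h = (int(float(el.get(a))) for a in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
        boxes.append((el.get("ID"), x, y, w, h))
    return boxes


print(illustration_boxes(SAMPLE_ALTO))
# [('IL1', 120, 340, 800, 600), ('IL2', 1000, 340, 700, 500)]
```

Text blocks are simply skipped, which is why this route only works for documents whose OCR already isolates illustrations (monographs, newspapers).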
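Slide 11 lists single-label (1/n), multi-label (m/n) and binary (0/1) classification. A large part of the difference lies in how the network's raw scores (logits) are turned into decisions. The sketch below assumes a hypothetical four-genre label set and a 0.5 threshold; it illustrates the decoding step only and is not the BnF genre classifier.

```python
import math

# Hypothetical label set; the BnF genre classifier uses its own 12 classes.
GENRES = ["photo", "engraving", "drawing", "map"]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def single_label(logits, labels=GENRES):
    """1/n: exactly one class per image, the most probable one."""
    probs = softmax(logits)
    return labels[max(range(len(probs)), key=probs.__getitem__)]


def multi_label(logits, labels=GENRES, threshold=0.5):
    """m/n: every class is decided independently; zero or several may apply."""
    return [lab for lab, z in zip(labels, logits) if sigmoid(z) >= threshold]


def binary(logit, threshold=0.5):
    """0/1: a single yes/no decision, e.g. 'contains an illustration'."""
    return sigmoid(logit) >= threshold


logits = [2.0, 0.5, 1.0, -3.0]
print(single_label(logits))  # photo
print(multi_label(logits))   # ['photo', 'engraving', 'drawing']
print(binary(-1.0))          # False
```

In practice a multi-label model is trained with independent sigmoid outputs, so the same logits would not come from both kinds of model; they are reused here only to contrast the two decoding rules.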
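Slide 14 describes searching in the signature space, in the class labels, or in both at once (multimodal search). A minimal sketch over a hypothetical three-image index, assuming cosine similarity between signatures and exact matching on labels; the index contents and function names are illustrative only.

```python
import math

# Hypothetical toy index: a signature (embedding) plus textual labels per image.
INDEX = {
    "img1": {"sig": [0.9, 0.1, 0.0], "labels": {"fish", "red", "drawing"}},
    "img2": {"sig": [0.8, 0.2, 0.1], "labels": {"fish", "blue", "photo"}},
    "img3": {"sig": [0.0, 0.1, 0.9], "labels": {"aircraft", "pilot", "biplane"}},
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(query_sig, ids=None, k=10):
    """Nearest neighbours in the signature space (brute force)."""
    ids = list(INDEX) if ids is None else ids
    return sorted(ids, key=lambda i: cosine(query_sig, INDEX[i]["sig"]), reverse=True)[:k]


def semantic_search(*terms):
    """Images whose labels contain every query term."""
    wanted = set(terms)
    return sorted(i for i, rec in INDEX.items() if wanted <= rec["labels"])


def multimodal_search(query_sig, *terms, k=10):
    """Filter on labels first, then rank the survivors by visual similarity."""
    return similarity_search(query_sig, semantic_search(*terms), k)


print(similarity_search([1.0, 0.0, 0.0], k=2))   # ['img1', 'img2']
print(semantic_search("fish", "red", "drawing"))   # ['img1']
print(multimodal_search([0.8, 0.2, 0.1], "fish"))  # ['img2', 'img1']
```

Brute-force ranking is fine for a toy index; collections of hundreds of thousands of images would normally rely on an approximate nearest-neighbour index instead.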
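Slide 17 compares global descriptors such as color histograms with local descriptors and CNNs. A global color histogram can be sketched as follows, assuming the image has already been decoded into a list of (r, g, b) pixels with 0-255 channels; the number of bins per channel is a hypothetical parameter, and texture is not covered.

```python
def color_histogram(pixels, bins=4):
    """Global colour descriptor for a list of (r, g, b) pixels.

    Each channel is quantised into `bins` levels, giving bins**3 cells.
    The histogram is L1-normalised so images of different sizes compare.
    """
    hist = [0.0] * (bins ** 3)
    step = 256 / bins
    for r, g, b in pixels:
        idx = (int(r // step) * bins + int(g // step)) * bins + int(b // step)
        hist[idx] += 1
    n = len(pixels)
    return [h / n for h in hist] if n else hist


pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 0)]
print(color_histogram(pixels, bins=2))
# [0.25, 0.25, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
```

Such a vector says nothing about where colors appear in the image, which is one reason these descriptors did worse than deep learning on semantic concepts.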
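Slide 18 compares FCTH signatures with the Tanimoto measure, T(a, b) = a·b / (|a|² + |b|² − a·b), which equals 1 for identical vectors. The sketch below assumes the signatures have already been extracted (real FCTH vectors have 192 bins; the 4-bin vectors here are hypothetical) and only shows the comparison and ranking step.

```python
def tanimoto(a, b):
    """Tanimoto coefficient a.b / (|a|^2 + |b|^2 - a.b); 1.0 for identical vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Two all-zero signatures are treated as identical.
    return dot / denom if denom else 1.0


def rank_by_tanimoto(query, signatures):
    """Sort image ids by decreasing similarity to the query signature."""
    return sorted(signatures, key=lambda k: tanimoto(query, signatures[k]), reverse=True)


# Hypothetical 4-bin signatures standing in for 192-bin FCTH vectors.
SIGNATURES = {"a": [3, 0, 1, 0], "b": [0, 2, 0, 2], "c": [2, 0, 1, 1]}
print(rank_by_tanimoto([3, 0, 1, 0], SIGNATURES))  # ['a', 'c', 'b']
```

Because FCTH encodes color and texture but not shape, a high Tanimoto score means "similar palette and overall structure", which matches the strengths and limits listed on the slide.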
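Slide 19 describes GallicaPix as IIIF-driven: an illustration can be served as a crop of its page image through the IIIF Image API, whose URL pattern is {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. A sketch, assuming Gallica's IIIF base `https://gallica.bnf.fr/iiif`, an unencoded `ark:/12148/<id>/f<page>` identifier and the `native` quality keyword (IIIF Image API 2.x and 3.0 servers expect `default` instead). The ARK is one of those shown on slide 7; the page number and box are hypothetical.

```python
from urllib.parse import quote


def iiif_region_url(base, identifier, x, y, w, h, size="full", rotation=0,
                    quality="native", fmt="jpg", encode_identifier=False):
    """Build an IIIF Image API URL that crops the box (x, y, w, h) in pixels.

    URL pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # The IIIF spec asks for a percent-encoded identifier; Gallica's URLs
    # appear to keep the ARK's slashes as-is, hence the switch (an assumption).
    ident = quote(identifier, safe="") if encode_identifier else identifier
    region = f"{x},{y},{w},{h}"
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical page (f1) and box; the ARK is one of those listed on slide 7.
print(iiif_region_url("https://gallica.bnf.fr/iiif",
                      "ark:/12148/btv1b90055240/f1", 120, 340, 800, 600))
# https://gallica.bnf.fr/iiif/ark:/12148/btv1b90055240/f1/120,340,800,600/full/0/native.jpg
```

Storing only the box and building the URL on demand avoids duplicating image files: the indexing database keeps coordinates, and the IIIF server does the cropping.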
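The "statistical analysis of representation" step on slide 20 can start with something as simple as the share of each group among the annotated faces, year by year. A sketch over hypothetical (year, label) annotations; the label values ("woman", "man", "unknown") and the choice to leave unknown faces out of the denominator are assumptions, not the Numapresse protocol.

```python
from collections import Counter, defaultdict


def share_by_year(faces, target="woman", ignore=("unknown",)):
    """Share of `target` among annotated faces, per year.

    faces: iterable of (year, label); labels listed in `ignore` are left out
    of both numerator and denominator.
    """
    counts = defaultdict(Counter)
    for year, label in faces:
        if label not in ignore:
            counts[year][label] += 1
    return {year: c[target] / sum(c.values()) for year, c in sorted(counts.items())}


FACES = [(1910, "man"), (1910, "man"), (1910, "woman"), (1910, "unknown"),
         (1915, "woman"), (1915, "man")]
print({y: round(s, 2) for y, s in share_by_year(FACES).items()})  # {1910: 0.33, 1915: 0.5}
```

Years where every face is "unknown" simply do not appear in the result, which avoids a division by zero.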
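Slide 21 runs object detection (YOLOv3) on illustrated ads and then analyses means of transport. Once detections exist, the analysis can be a count of distinct ads showing each vehicle type per year. A sketch assuming detections arrive as (ad_id, year, label, confidence) tuples; the class names, the 0.5 threshold and the sample data are hypothetical.

```python
from collections import Counter

# Hypothetical subset of detector class names treated as "means of transport".
TRANSPORT = {"bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck", "boat"}


def transport_ads(detections, min_conf=0.5):
    """Count, per (year, label), how many distinct ads show that vehicle.

    detections: iterable of (ad_id, year, label, confidence) tuples.
    """
    seen = {(ad, year, label) for ad, year, label, conf in detections
            if label in TRANSPORT and conf >= min_conf}
    return Counter((year, label) for _ad, year, label in seen)


DETECTIONS = [
    ("ad1", 1914, "car", 0.91), ("ad1", 1914, "car", 0.88),  # two cars, one ad
    ("ad2", 1914, "car", 0.40),                              # below threshold
    ("ad3", 1916, "truck", 0.77), ("ad3", 1916, "person", 0.95),
]
print(sorted(transport_ads(DETECTIONS).items()))
# [((1914, 'car'), 1), ((1916, 'truck'), 1)]
```

Counting ads rather than detected objects keeps one ad with several cars from weighing more than an ad with a single car.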
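Slide 22 trains dhSegment on pairs of binary masks and source images. To go from a predicted mask back to illustration boxes, one common step is connected-components labelling. The sketch below is a plain-Python stand-in, not dhSegment's own post-processing; it assumes the mask is already thresholded to 0/1, and `min_area` is a hypothetical noise filter.

```python
from collections import deque


def mask_to_boxes(mask, min_area=1):
    """Return (x, y, width, height) for each 4-connected region of 1s in `mask`."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill of one connected component.
            queue = deque([(y, x)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:  # drop specks of noise
                boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return boxes


MASK = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
print(mask_to_boxes(MASK))              # [(0, 0, 2, 2), (3, 1, 1, 2)]
print(mask_to_boxes(MASK, min_area=3))  # [(0, 0, 2, 2)]
```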
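Slide 24 describes iterative queries in which users mark results as good or wrong and a new query leverages their input. INRIA/Zenith retrains a model at each step; as a simpler, swapped-in illustration of the same feedback loop, the classic Rocchio update moves the query vector toward the "good" examples and away from the "wrong" ones. The vectors, ids and weights below are hypothetical (alpha=1.0, beta=0.75, gamma=0.15 are commonly cited textbook defaults).

```python
def rocchio(query, good, wrong, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector towards 'good' examples and away from 'wrong' ones."""
    def centroid(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    g, b = centroid(good), centroid(wrong)
    return [alpha * q + beta * gi - gamma * bi for q, gi, bi in zip(query, g, b)]


def rank(query, collection):
    """Rank item ids by dot product with the (updated) query."""
    def score(k):
        return sum(q * v for q, v in zip(query, collection[k]))
    return sorted(collection, key=score, reverse=True)


COLLECTION = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.0]}
query = [1.0, 0.0]
print(rank(query, COLLECTION))  # ['img_a', 'img_b']

# The user marks img_b as good and img_a as wrong; the next query adapts.
query = rocchio(query, good=[COLLECTION["img_b"]], wrong=[COLLECTION["img_a"]],
                beta=1.0, gamma=0.5)
print(query)                    # [0.5, 1.0]
print(rank(query, COLLECTION))  # ['img_b', 'img_a']
```

Rocchio only reweights the query, whereas retraining (as on the slide) can also learn new classes defined by the user; the loop structure is the same.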
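Slide 28 pairs a Faster R-CNN detector with manual annotations. Checking such a detector against held-out annotations usually relies on intersection over union (IoU) with a 0.5 threshold, the PASCAL VOC convention. The sketch assumes boxes in (x, y, width, height) form and uses a simplified greedy matching that yields precision and recall, not a full mAP computation.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    iw = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0


def precision_recall(predictions, ground_truth, threshold=0.5):
    """Greedy matching of (box, score) predictions to annotated boxes.

    Highest-scoring predictions are matched first; each annotation can be
    matched at most once, and a match needs IoU >= threshold.
    """
    unmatched = list(ground_truth)
    true_pos = 0
    for box, _score in sorted(predictions, key=lambda p: p[1], reverse=True):
        best = max(unmatched, key=lambda g: iou(box, g), default=None)
        if best is not None and iou(box, best) >= threshold:
            unmatched.remove(best)
            true_pos += 1
    precision = true_pos / len(predictions) if predictions else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall


ANNOTATIONS = [(0, 0, 2, 2), (5, 5, 2, 2)]
PREDICTIONS = [((0, 0, 2, 2), 0.9), ((10, 10, 2, 2), 0.8)]
print(round(iou((0, 0, 2, 2), (1, 0, 2, 2)), 3))     # 0.333
print(precision_recall(PREDICTIONS, ANNOTATIONS))  # (0.5, 0.5)
```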

Editor's Notes

  1. First stage of the data cycle: making the data available. Which data are concerned: data and metadata produced by the institution. Open data / opening of public data: metadata (2014). Gallica reuse conditions. In what forms, and how, should they be made available? Technical provision. Synchronous querying: APIs (application programming interfaces), which let two machines communicate through one or more standardized protocols; web services are APIs that use web protocols. Example: smartphone apps that combine public and private data to create a service. Asynchronous querying. Data: datasets whose compilation is itself added value.
  3. Libraries have a long tradition of data exchange. The situation in 2016: several APIs opened by the BnF: the historic Z39.50, a good example of an API that is not a web service; the SRU protocol on the general catalogue, the web version of Z39.50; heavily used APIs such as the OAI repositories; a service created specifically for data dissemination, data.bnf.fr and its SPARQL endpoint; APIs first created for internal use, some of them accessible from outside: Gallica SRU, IIIF. But access and documentation (when it existed) were scattered, reflecting history and scattered uses. Recall the diversity of audiences: professionals, developers. A turning point, the 2016 hackathon: official documentation and an openly assumed offer of IIIF and Gallica SRU web services; creation of a wiki on the GitHub platform. For the 2017 hackathon: gather the existing documentation, and also the datasets, notably those produced by research projects (link with the CORPUS project): image corpora, metadata dumps, lists of URLs from the internet legal deposit, statistics.
  4. And finally, raw unsegmented images result in generic classes like "frame", "document", "written document", "newspapers", which are definitely not usable…
  9. In-house vs. outsourced production/operation
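The notes mention SRU, the web version of Z39.50, offered on the general catalogue and on Gallica. An SRU searchRetrieve request is a plain HTTP GET whose standard parameters include `operation`, `version`, `query` (a CQL expression), `maximumRecords` and `startRecord`. The sketch below only builds the request URL; the endpoint `https://gallica.bnf.fr/SRU` and the CQL index `gallica` are assumptions to check against the BnF API documentation, and the query itself is hypothetical.

```python
from urllib.parse import urlencode


def sru_search_url(endpoint, cql_query, max_records=10, start_record=1, version="1.2"):
    """Build an SRU searchRetrieve request URL (no network access)."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,          # CQL expression, URL-encoded by urlencode
        "maximumRecords": max_records,
        "startRecord": start_record,
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical query against an assumed Gallica SRU endpoint.
print(sru_search_url("https://gallica.bnf.fr/SRU", 'gallica all "casque"'))
# https://gallica.bnf.fr/SRU?operation=searchRetrieve&version=1.2&query=gallica+all+%22casque%22&maximumRecords=10&startRecord=1
```

Paging through results is a matter of increasing `startRecord` by `maximumRecords` on each request.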