Accidentally Becoming a Digital Librarian
John Resig
Providing Access to Knowledge.
Educating and Empowering Others.
1. Solve your own problems
2. Adapt and solve others’ problems
Talking about 4 problems
Chazen Museum: 1980.2386
Problem 1: Searching by Image
imgSeek
• Compares entire image.
• Finds similar images, not exact.
• Does not find parts of an image.
• Color sensitive.
• Open Source
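The trade-offs in the bullets above can be seen in a small sketch of whole-image comparison. This is not imgSeek's actual algorithm. It assumes a much simpler signature (the mean color of each cell in a coarse grid), and the names `signature` and `distance` and the toy images are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]          # (red, green, blue), each 0-255
Image = List[List[Pixel]]             # rows of pixels


def signature(image: Image, grid: int = 2) -> List[float]:
    """Mean color of each cell in a grid x grid split of the whole image.

    Assumes the image is at least grid x grid pixels.
    """
    height, width = len(image), len(image[0])
    sig: List[float] = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * height // grid, (gy + 1) * height // grid
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            cell = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            for channel in range(3):
                sig.append(sum(p[channel] for p in cell) / len(cell))
    return sig


def distance(a: Image, b: Image, grid: int = 2) -> float:
    """Mean absolute difference between two signatures (0 means identical)."""
    sa, sb = signature(a, grid), signature(b, grid)
    return sum(abs(x - y) for x, y in zip(sa, sb)) / len(sa)


# Toy images: left half red, right half blue.
red, blue, gray = (200, 30, 30), (30, 30, 200), (108, 108, 108)
original = [[red] * 4 + [blue] * 4 for _ in range(8)]
# A slightly brighter photo of the same work.
brighter = [[(r + 5, g + 5, b + 5) for r, g, b in row] for row in original]
# A flat gray stand-in for a black-and-white photo.
grayscale = [[gray] * 8 for _ in range(8)]
# A detail: only the top-left quarter of the original.
detail = [row[:4] for row in original[:4]]

for label, other in [("brighter", brighter), ("grayscale", grayscale), ("detail", detail)]:
    print(label, round(distance(original, other), 2))
```

In this toy setup the brighter photo scores close to the original, while the gray version and the cropped detail both score far away, which mirrors "finds similar images, not exact", "color sensitive", and "does not find parts of an image".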
https://services.tineye.com/MatchEngine
http://pastec.io/
Problem 2: No one agrees on names/titles
Japanese Names
• Utagawa Hiroshige
• Ando Hiroshige
• Andō Hiroshige
• Hiroshige
• 歌川広重
• 広重
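One way to reconcile variants like these is to fold each name to a comparable form and look it up in an alias table. The following is a minimal sketch under stated assumptions: the alias table is hypothetical, the choice of "Utagawa Hiroshige" as the canonical form is made only for illustration, and `fold` and `canonical_name` are made-up helper names.

```python
import unicodedata
from typing import Dict, List, Optional


def fold(name: str) -> str:
    """Lowercase, strip diacritics (ō becomes o), and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_marks.lower().split())


# Hypothetical alias table: canonical name -> known variants.
ALIASES: Dict[str, List[str]] = {
    "Utagawa Hiroshige": ["Ando Hiroshige", "Hiroshige", "歌川広重", "広重"],
}

# Build a lookup from every folded variant to its canonical name.
LOOKUP: Dict[str, str] = {}
for canonical, variants in ALIASES.items():
    for variant in [canonical, *variants]:
        LOOKUP[fold(variant)] = canonical


def canonical_name(name: str) -> Optional[str]:
    """Return the canonical form of a known name, or None if unknown."""
    return LOOKUP.get(fold(name))


for raw in ["Andō Hiroshige", "  HIROSHIGE ", "広重", "Hokusai"]:
    print(repr(raw), "->", canonical_name(raw))
```

Folding handles the macron and casing differences automatically, so "Andō Hiroshige" needs no separate table entry. The kanji forms and the alternate family name still need explicit entries, because no string transformation relates them.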
Similar Images
Different photo, same work of art.
Similar Images
Different photo, slightly different cropping.
Alternate Images
Partial Image vs. Much Larger Image
Alternate Images
Color vs. Black-and-White
Conservation
Repairs and possibly removal of later additions.
Conservation/Destruction
Analysis even spots dramatic conservation work.
Copies
Fondazione Zeri
PHAROS: An International Consortium of Photo Archives
• Bibliotheca Hertziana, Rome (1,065,000)

• Bildarchiv Foto Marburg, Germany (2,000,000)

• Courtauld Institute of Art, London (4,173,500)

• Fondazione Federico Zeri, Bologna (290,000)

• Frick Art Reference Library, New York (1,346,000)

• Getty Research Institute, Los Angeles (2,086,000)

• Villa I Tatti, Florence (239,000)

• Institut National d’Histoire de l’Art, Paris (750,000)

• Kunsthistorisches Institut, Florence (650,000)

• National Gallery of Art, Washington (7,600,000)

• Paul Mellon Centre, London (185,000)

• Rijksbureau, The Hague (7,000,000)

• Warburg Institute, London (3,500,000)

• Yale Center for British Art, New Haven (132,000)
PHAROS Images: Coming June 2016
Problem 3: Bad Images
Idyll: Offline Image Cropping
• Crop and annotate images offline and on a mobile device.

• Saves the selections back to a server.
Computer Vision
• Unsupervised (requires no labeling):
  • Comparing an entire image
  • Categorizing an image
• Supervised (requires labeling):
  • Finding parts of an image
  • Finding and categorizing parts of an image
Unsupervised Training
• Requires little-to-no prepping of data
• Can just give the tool a set of images and have it produce results
• Extremely easy to get started, but results aren’t always as interesting.
• Unsupervised: MatchEngine, Pastec
Supervised Training
• Need lots of training data
• Needs to be pre-selected/categorized
• Think: thousands of images.
• If your collection is smaller than this, it may not benefit.
• Or you may need crowdsourcing.
• Results can be more interesting:
• “Find all the people in this image”
General Computer Vision
• Ideal for some supervised training problems
• CCV
• http://libccv.org/
• https://github.com/liuliu/ccv
• OpenCV
• http://opencv.org/
Object Detection
Problem 4: Education
Does your person have a bald spot at the top of their head?
Does your person have a *hidden* bald spot at the top of their head?
Does your person have a cloth covering the front of their head?
Does your person have a comb in their hair?
• http://ejohn.org/research/

• http://ukiyo-e.org/
• https://github.com/jeresig
