A presentation for the Museum Computer Network conference 2017. Four examples where automated analysis of images and text worked for us, and four where it went wrong, often in an amusing way.
Tristan Roddis, Cogapp
Images
• Sourced from Nationalmuseum Sweden
• Using Europeana API for discovery (see the sketch after this list)
• 2000 images
• http://labs.cogapp.com/iiif-ml/
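A hedged sketch of the kind of Europeana Search API call used for discovery; the API key, provider label, and field names here are placeholders and assumptions, not our exact script:

```python
import requests

# Find Nationalmuseum Sweden records with images via the Europeana Search API.
resp = requests.get(
    "https://www.europeana.eu/api/v2/search.json",
    params={
        "wskey": "YOUR_API_KEY",                         # placeholder key
        "query": "*",
        "qf": 'DATA_PROVIDER:"Nationalmuseum, Sweden"',  # assumed provider label
        "media": "true",                                 # only records with media
        "rows": 100,
    },
)
for item in resp.json().get("items", []):
    print(item.get("edmIsShownBy"))  # usually points at the actual image file(s)
```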
Digital agency based in the UK. We work internationally. Would love to work for you.
Present some experiments and research we’ve been doing.
Look at three problems that can be solved using automated image analysis: finding interesting items, colour extraction, and finding similar images.
In all cases: images only. Deliberately not using metadata: manifests only for providing lists of images to analyse.
We’ll present our findings: some positive and, full disclosure, some negative.
Adrian
We manage the Qatar Digital Library website, which currently has nearly a million scanned pages.
Set ourselves the challenge of automating the process of finding “visually interesting” document pages.
Not sure what “interesting” is, but we know what it’s not (not text, not blank, not bindings...).
Manuscripts can run to over 600 pages each, and several dozen are available, so tens of thousands of images.
And we are constantly adding more, so reviewing them all by hand is impractical.
We have some catalogue information, but not for everything; it sits at the logical level, so the detail we need is hard to extract or simply missing.
Examples
Tristan
First approach we tried was colour analysis
More colours used in illustrations than in plain black script
Imagga extracts foreground/background colours and a “color variance” score (see the sketch below).
Our pages all scored between variance 11 and 17, and the results were hugely mixed.
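A minimal sketch of the sort of Imagga call involved (assuming the v2 /colors endpoint; credentials and the image URL are placeholders):

```python
import requests

# Ask Imagga for the colour breakdown of one page image.
resp = requests.get(
    "https://api.imagga.com/v2/colors",
    params={"image_url": "https://example.org/iiif/page1/full/600,/0/default.jpg"},
    auth=("IMAGGA_API_KEY", "IMAGGA_API_SECRET"),  # placeholder credentials
)
colors = resp.json()["result"]["colors"]
print(colors["color_variance"])         # the score we tried to threshold
print(colors["foreground_colors"][:3])  # dominant foreground colours
```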
Gave up: tried a different tack.
Adrian
Lots of concepts: food, colour, focus ...
Picking up stains
Still not good results
Call this what you want
Tried with just one example - not very good
Built up two training sets: one with positive examples, one with negative examples.
From a IIIF collection we picked 10 random archives, and then 10 images per archive (sketch below).
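A sketch of that sampling step, assuming IIIF Presentation 2.x and a placeholder collection URL:

```python
import random
import requests

# Pick 10 random archives from a top-level IIIF collection, then 10 random
# canvases per archive, and record each canvas's image URL for training.
collection = requests.get("https://example.org/iiif/collection/top.json").json()
archives = random.sample(collection["manifests"], 10)

training_urls = []
for ref in archives:
    manifest = requests.get(ref["@id"]).json()
    canvases = manifest["sequences"][0]["canvases"]
    for canvas in random.sample(canvases, min(10, len(canvases))):
        training_urls.append(canvas["images"][0]["resource"]["@id"])
```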
Red -> interest
Trained it a couple of times (good results).
One error here, so we updated the training set.
Example for one archive
And it works
And again
Created a IIIF manifest of the results, viewable in Mirador (sketch below).
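A minimal sketch of building such a manifest (IIIF Presentation 2.x; all URLs and dimensions are placeholders):

```python
# Wrap the pages flagged as interesting in a bare-bones IIIF manifest so they
# can be browsed side by side in Mirador.
def make_manifest(image_service_ids):
    canvases = []
    for i, svc in enumerate(image_service_ids):
        canvas_id = f"https://example.org/iiif/interesting/canvas/{i}"
        canvases.append({
            "@id": canvas_id,
            "@type": "sc:Canvas",
            "label": f"Hit {i + 1}",
            "width": 1000, "height": 1400,  # placeholder dimensions
            "images": [{
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "on": canvas_id,
                "resource": {
                    "@id": f"{svc}/full/full/0/default.jpg",
                    "@type": "dctypes:Image",
                    "service": {
                        "@id": svc,
                        "profile": "http://iiif.io/api/image/2/level1.json",
                    },
                },
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": "https://example.org/iiif/interesting/manifest",
        "@type": "sc:Manifest",
        "label": "Visually interesting pages",
        "sequences": [{"@type": "sc:Sequence", "canvases": canvases}],
    }
```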
Last point: you can apply this technique of negative/positive training sets to _any_ visual problem. We demonstrated one version specific to our collection, but I’m sure you can think of similar questions about your own.
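To make the positive/negative idea concrete, here is a generic illustration (not the service we actually used) with scikit-learn and deliberately naive pixel features; file paths are placeholders:

```python
import numpy as np
from PIL import Image
from sklearn.linear_model import LogisticRegression

def features(path):
    # Crude features: 32x32 greyscale pixels. Swap in something stronger
    # (e.g. embeddings from a pretrained network) for real use.
    img = Image.open(path).convert("L").resize((32, 32))
    return np.asarray(img, dtype=float).ravel() / 255.0

positives = ["interesting1.jpg", "interesting2.jpg"]  # placeholder paths
negatives = ["plain_text1.jpg", "blank_page1.jpg"]

X = np.stack([features(p) for p in positives + negatives])
y = np.array([1] * len(positives) + [0] * len(negatives))

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba(features("unseen_page.jpg")[None])[:, 1])  # P(interesting)
```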
Tristan
Colour extraction is the low-hanging fruit of automated image analysis
Very easy to do via scripts or APIs
“a machine quite literally just said the Met’s collection is full of shit”
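How easy? A sketch using Pillow’s adaptive palette, one common route (the filename and colour count are placeholders, not our exact script):

```python
from PIL import Image

img = Image.open("artwork.jpg").convert("RGB")
small = img.resize((100, 100))  # downscale: colour proportions survive
paletted = small.convert("P", palette=Image.ADAPTIVE, colors=5)
palette = paletted.getpalette()

# getcolors() yields (count, palette_index); map indices back to RGB.
for count, idx in sorted(paletted.getcolors(), reverse=True):
    r, g, b = palette[idx * 3: idx * 3 + 3]
    print(f"#{r:02x}{g:02x}{b:02x}: {count} px")
```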
Tristan
Paris, Texas vs Paris, France
Adrian
The last problem we looked at: finding similar images.
Tried two approaches
A collection of algorithms for image processing:
• Skeletonize: remove each pixel if removing it doesn’t break connectivity, then segment by colour
• CENSURE: feature detector (scale-invariant centre-surround detector)
• DAISY: local image descriptor based on gradient orientation histograms
These are good at finding the same image (with noise...)
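Those names match scikit-image (whose tagline is “a collection of algorithms for image processing”), so here is a hedged sketch of the calls; the parameters and filenames are assumptions:

```python
from skimage import color, io
from skimage.feature import CENSURE, daisy
from skimage.morphology import skeletonize

img = color.rgb2gray(io.imread("page_a.jpg"))

detector = CENSURE()        # scale-invariant centre-surround detector
detector.detect(img)
print(detector.keypoints)   # (row, col) positions of detected features

descs = daisy(img, step=32, radius=16)  # dense gradient-histogram descriptors
print(descs.shape)

skeleton = skeletonize(img > 0.5)  # thin binarised strokes to one pixel wide
```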
• Mean squared error: square the difference between each pixel in A and B, sum, and divide by the number of pixels
• Structural similarity: measures similarity between two images (framed as a quality measure)
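Both metrics are one-liners in scikit-image; a sketch using the current skimage.metrics names (in 2017 these lived in skimage.measure as compare_mse / compare_ssim):

```python
from skimage import color, io, transform
from skimage.metrics import mean_squared_error, structural_similarity

a = color.rgb2gray(io.imread("page_a.jpg"))
b = color.rgb2gray(io.imread("page_b.jpg"))
b = transform.resize(b, a.shape)  # pixel-wise metrics need matching shapes

# MSE = (1/N) * sum((A_i - B_i)^2): 0 for identical images.
print(mean_squared_error(a, b))
# SSIM: 1.0 for identical images; designed as a perceptual quality measure.
print(structural_similarity(a, b, data_range=1.0))
```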
Results as expected
Very different
These two are more similar.
Then started working with ML
Harder than it looks (ok if you are an ML scientist)
Not quick
Tried something else
Tristan
Ten days before the conference, we tried a different approach.
Only indexed a tag if its confidence value was > 0.75 (sketch below).
Elasticsearch with Searchkit interface
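A sketch of the indexing step, assuming a recent elasticsearch-py client; the index name, field names, and example tags are placeholders, and the tags could come from any vision API:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def index_image(image_id, tags):
    """tags: (label, confidence) pairs from the vision service."""
    keep = [label for label, conf in tags if conf > 0.75]  # confidence cut-off
    es.index(index="iiif-ml", id=image_id, document={"tags": keep})

index_image("page-0001", [("manuscript", 0.93), ("handwriting", 0.81),
                          ("food", 0.40)])
```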
Focus on left-hand tags
Different tags (more examples)