1
2022/10/24
@shawnmjones
Managed by Triad National Security, LLC, for the U.S. Department of Energy’s NNSA.
Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine
Shawn M. Jones & Diane Oyen
Information Sciences (CCS-3)
2022/10/24
LA-UR-22-30888
2
There are few computer vision research papers focused on querying and retrieving abstract, technical drawings
• Technical documents typically contain abstract images
• Many reasons exist to search for abstract images online:
  • protect intellectual property
  • build datasets
  • find evidence for legal cases
  • establish scholarly evidence
  • justify funding through image reuse
Image sources:
• https://commons.wikimedia.org/wiki/File:Complete_neuron_cell_diagram_en.svg
• https://commons.wikimedia.org/wiki/File:Carriage-house-2.jpg
• https://commons.wikimedia.org/wiki/File:Interspiro_DCSC_loop_schematic.png
3
Major search engines now support reverse image search
Screenshots: Baidu, Bing, Google, Yandex
Screenshot source: https://image.baidu.com
Screenshot source: https://images.google.com
Screenshot source: https://www.bing.com/
Screenshot source: https://yandex.com/images
4
With each service, a user can upload a query image and receive two types of results: pages-with results and similar-to results
Figure: Bing results for the uploaded query image, with the pages-with results and similar-to results labeled
Uploaded image source: https://commons.wikimedia.org/wiki/File:Adams_The_Tetons_and_the_Snake_River.jpg
Screenshot source: https://www.bing.com
5
Research Question
When using the reverse image search capability of general web search engines, are natural images more easily discovered than abstract images?
6
To collect query images, we submitted search terms to the Wikimedia Commons API
• Abstract images: “diagram”, “schematic”
• Natural images: “photo”, “photograph”
Images collected per term (in slide order): 100, 100, 100, and 99
Previous studies have shown that Wikipedia content has high retrievability.
Image sources:
• https://commons.wikimedia.org/wiki/File:Galileo_Diagram.jpg
• https://commons.wikimedia.org/wiki/File:Complete_neuron_cell_diagram_en.svg
• https://commons.wikimedia.org/wiki/File:Bicycle_diagram-es.svg
• https://commons.wikimedia.org/wiki/File:Systems_Engineering_V_diagram.jpg
Image sources:
• https://commons.wikimedia.org/wiki/File:Hvdc_bipolar_schematic.svg
• https://commons.wikimedia.org/wiki/File:Beve_gear_schematic.png
• https://commons.wikimedia.org/wiki/File:Interspiro_DCSC_loop_schematic.png
• https://commons.wikimedia.org/wiki/File:Carriage-house-2.jpg
Image sources:
• https://commons.wikimedia.org/wiki/File:Manatee_photo.jpg
• https://commons.wikimedia.org/wiki/File:Frank_W._Micklethwaite_photo_of_downtown_Toronto,_1890_-2.jpg
• https://commons.wikimedia.org/wiki/File:James_Abram_Garfield,_photo_portrait_seated.jpg
• https://commons.wikimedia.org/wiki/File:Wtc-photo.jpg
Image sources:
• https://commons.wikimedia.org/wiki/File:Adams_The_Tetons_and_the_Snake_River.jpg
• https://commons.wikimedia.org/wiki/File:Photographing_sunrise_1745.jpg
• https://commons.wikimedia.org/wiki/File:FEMA_-_5399_-_Photograph_by_Andrea_Booher_taken_on_09-28-2001_in_New_York.jpg
• https://commons.wikimedia.org/wiki/File:Photographing_a_model.jpg
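The slide does not show the exact request used. As a hedged sketch, the MediaWiki Action API that Wikimedia Commons exposes at `https://commons.wikimedia.org/w/api.php` supports full-text search via `action=query&list=search`, and restricting `srnamespace` to 6 (the File: namespace) returns media files. The helper names and the `srlimit=100` choice below are illustrative assumptions, not the authors' collection code; the parser is exercised against a hand-written sample response rather than live data.

```python
import json
from urllib.parse import urlencode

# MediaWiki Action API endpoint for Wikimedia Commons.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_search_url(term: str, limit: int = 100) -> str:
    """Build a list=search request restricted to the File: namespace (6).

    Hypothetical helper; the parameter choices are illustrative assumptions.
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srnamespace": 6,  # File: namespace, so hits are media files
        "srlimit": limit,  # the search module caps this at 500 for normal clients
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


def extract_file_titles(response_text: str) -> list[str]:
    """Return file titles (e.g. 'File:Manatee photo.jpg') from a search response."""
    data = json.loads(response_text)
    return [hit["title"] for hit in data.get("query", {}).get("search", [])]
```

To run a live query, one would pass the URL to `urllib.request.urlopen` and read the body; Wikimedia's User-Agent policy asks clients to send a descriptive User-Agent header, so a real script would wrap the URL in a `urllib.request.Request` that sets one.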
7
We then submitted each query image (e.g., the manatee photo) to every reverse image search engine, then repeated the process with the next image (e.g., the schematic), and so on
Image source: https://commons.wikimedia.org/wiki/File:Manatee_photo.jpg
Image source: https://commons.wikimedia.org/wiki/File:Interspiro_DCSC_loop_schematic.png
Screenshot source: https://images.google.com
Screenshot source: https://www.bing.com/
Screenshot source: https://image.baidu.com
Screenshot source: https://yandex.com/images
8
Using ImageHash’s pHash and GoFigure’s VisHash, we evaluated how often the query image itself appeared in the results
• pHash was designed to compare photographs via the Discrete Cosine Transform (DCT).
• VisHash was designed to compare diagrams and technical drawings by finding shapes in the image.
Uploaded images:
• https://commons.wikimedia.org/wiki/File:Manatee_photo.jpg
• https://commons.wikimedia.org/wiki/File:Interspiro_DCSC_loop_schematic.png
Screenshot source: https://yandex.com/images
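VisHash is not reproduced here. For pHash, the following minimal pure-Python sketch follows the same recipe ImageHash's `phash` uses (2-D DCT, keep the top-left 8x8 block of low-frequency coefficients, set each bit by comparing against their median), but it assumes the image has already been converted to grayscale and resized to 32x32, and it omits the DCT's uniform scaling factor, which cannot change a median comparison. The `max_distance` threshold in `looks_like_same_image` is a hypothetical value, not the one used in the study.

```python
import math
import random
from statistics import median


def dct_1d(values: list[float]) -> list[float]:
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi * k * (2n + 1) / (2N))."""
    size = len(values)
    return [
        sum(x * math.cos(math.pi * k * (2 * n + 1) / (2 * size))
            for n, x in enumerate(values))
        for k in range(size)
    ]


def dct_2d(pixels: list[list[float]]) -> list[list[float]]:
    """Separable 2-D DCT: transform each row, then each column."""
    rows = [dct_1d(row) for row in pixels]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols[l][k] holds coefficient (k, l); transpose so result[k][l] does.
    return [list(r) for r in zip(*cols)]


def phash_bits(pixels: list[list[float]], hash_size: int = 8) -> list[int]:
    """pHash-style fingerprint of an already grayscale, already resized image.

    Keeps the top-left hash_size x hash_size block of low-frequency DCT
    coefficients and sets a bit wherever a coefficient exceeds their median.
    """
    coeffs = dct_2d(pixels)
    low = [c for row in coeffs[:hash_size] for c in row[:hash_size]]
    med = median(low)
    return [1 if c > med else 0 for c in low]


def hamming(a: list[int], b: list[int]) -> int:
    """Count of differing bits; 0 means identical fingerprints."""
    return sum(x != y for x, y in zip(a, b))


def looks_like_same_image(a: list[int], b: list[int], max_distance: int = 10) -> bool:
    # max_distance is a hypothetical threshold, not the value used in the study.
    return hamming(a, b) <= max_distance


if __name__ == "__main__":
    rng = random.Random(0)
    # A synthetic 32x32 grayscale "image" stands in for a real, resized photo.
    img = [[rng.randint(0, 255) for _ in range(32)] for _ in range(32)]
    inverted = [[255 - p for p in row] for row in img]
    h = phash_bits(img)
    print(hamming(h, phash_bits(img)), hamming(h, phash_bits(inverted)))
```

With the real library, the equivalent comparison is `imagehash.phash(Image.open(path_a)) - imagehash.phash(Image.open(path_b))`, where subtracting two hashes yields their Hamming distance.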
9
Precision differs between pages-with and similar-to results, with Yandex performing best
Chart legend: blue = abstract images; green = natural images
Precision@k: of the first k results, what percentage are the same image as the query image?
S. M. Jones and D. Oyen. 2022. “Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine,” Proceedings of the 2nd Drawings and abstract Imagery: Representation and Analysis (DIRA) Workshop, Tel Aviv, Israel.
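Precision@k can be sketched as below, assuming each engine's ranked results have already been labeled as matching or not matching the query image (for example by thresholding a perceptual-hash distance). The function names are hypothetical, and counting missing positions as non-matches when an engine returns fewer than k results is an assumption the slide does not settle.

```python
def precision_at_k(is_match: list[bool], k: int) -> float:
    """Share of the top-k results that are the same image as the query.

    is_match holds one judgment per returned result, in ranked order.
    Positions an engine never filled count as non-matches (an assumption).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return sum(is_match[:k]) / k


def mean_precision_at_k(queries: list[list[bool]], k: int) -> float:
    """Average Precision@k over a set of queries (e.g., one image category)."""
    if not queries:
        return 0.0
    return sum(precision_at_k(q, k) for q in queries) / len(queries)
```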
10
After reviewing 10 pages-with results, Google shows a maximum retrievability difference of 54% between the photograph and diagram categories
Chart legend: blue = abstract images; green = natural images
Retrievability: given a query image, was it retrieved within the first c results (the cutoff)?
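A minimal sketch of retrievability at cutoff c, again assuming results are already labeled as matches. Averaging the per-query yes/no outcome within a category and subtracting one category's rate from another's is one plausible way to obtain a gap like the 54% on the slide; the function names are hypothetical.

```python
def retrieved_within(is_match: list[bool], c: int) -> bool:
    """True if any of the first c ranked results is the query image."""
    return any(is_match[:c])


def retrievability_rate(queries: list[list[bool]], c: int) -> float:
    """Share of queries whose image was retrieved within cutoff c."""
    if not queries:
        return 0.0
    return sum(retrieved_within(q, c) for q in queries) / len(queries)
```

For example, `retrievability_rate(natural, 10) - retrievability_rate(abstract, 10)` would give the gap at c = 10 for two hypothetical lists of labeled queries.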
11
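For similar-to results, Yandex consistently provides a high MRR (0.8) for natural images
MRR (mean reciprocal rank): the average, across all queries, of 1/r, where r is the rank of the first result that is the same image as the query; an MRR near 1 means a visitor usually finds the same image at or near the top of the results
Google does well with pages-with results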
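Mean reciprocal rank can be sketched the same way. This version scores a query whose image never appears as 0, which is the usual convention but an assumption here; the function names are hypothetical.

```python
def reciprocal_rank(is_match: list[bool]) -> float:
    """1/rank of the first matching result (ranks start at 1); 0.0 if none match."""
    for rank, match in enumerate(is_match, start=1):
        if match:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: list[list[bool]]) -> float:
    """Average reciprocal rank across all queries."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(q) for q in queries) / len(queries)
```

Under this definition, three queries whose first match appears at ranks 1, 2, and 4 give an MRR of (1 + 1/2 + 1/4) / 3 ≈ 0.58.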
12
Key Takeaways
• We submitted abstract and natural images from Wikimedia Commons to four major reverse image search engines.
• When they do return results, Bing and Baidu do not perform well.
• Google does not perform well for similar-to results, likely indicating that its definition of similar-to differs from that of the other search engines.
• Yandex performs best in all cases.
• Yandex and Google consistently perform better for natural images in pages-with results.

More Related Content
Similar to Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine
• Exploring Machine Learning for Libraries and Archives: Present and Future (Bohyun Kim)
• ENHANCED WEB IMAGE RE-RANKING USING SEMANTIC SIGNATURES (IAEME Publication)
• Paper id 25201491 (IJRAT)
• Sample CS Senior Capstone Projects (Fred Annexstein)
• Silk Data - Recommendations (Nikolay Karelin)
• Research Inventy: International Journal of Engineering and Science (researchinventy)
• Research Inventy : International Journal of Engineering and Science is publis... (researchinventy)
• HILDA 2023 Keynote Bill Howe (domoritz)
• Main principles of Data Science and Machine Learning (Nikolay Karelin)
• APPLICATIONS OF SPATIAL FEATURES IN CBIR : A SURVEY (cscpconf)
• Applications of spatial features in cbir a survey (csandit)
• Structured data and metadata evaluation methodology for organizations looking... (Emily Kolvitz)
• RDAP 15: You’re in good company: Unifying campus research data services (ASIS&T)
• final ppt.pptx
• Multivariate feature descriptor based cbir model to query large image databases (IJARIIT)
• Image retrieval and re ranking techniques - a survey (sipij)
• MediaEval 2017 Retrieving Diverse Social Images Task (Overview) (multimediaeval)
• Structured Data & Schema.org - SMX Milan 2014 (Bastian Grimm)
• Image Search: Then and Now (Si Krishan)

More from Shawn Jones
• DIRA 2022 Poster -- Abstract Images Have Different Levels of Retrievability P...
• Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
• It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...
• Improving Collection Understanding For Web Archives With Storytelling: Shinin...
• Automatically Selecting Striking Images for Social Cards
• SHARI (StoryGraph Hypercane ArchiveNow Raintale Integration)
• Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
• Storytelling With Web Archives
• Combining Social Media Storytelling With Web Archives
• Improving Understanding of Web Archive Collections Through Storytelling - PhD...
• The Off-Topic Memento Toolkit
• The Many Shapes of Archive-It
• Improving Collection Understanding in Web Archives
• Reference Rot
• Where Can We Post Stories Summarizing Web Archive Collections
• Avoiding Spoilers On MediaWiki Fan Sites Using Memento
• Continuous Integration: Finding problems soonest
• A Brief Introduction to Test-Driven Development
• Reconstructing the past with media wiki

Recently uploaded

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Recently uploaded (20)

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers

Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine

  • 1. Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine. Shawn M. Jones (@shawnmjones) & Diane Oyen, Information Sciences (CCS-3), 2022/10/24. LA-UR-22-30888. Managed by Triad National Security, LLC, for the U.S. Department of Energy’s NNSA.
  • 2. There are few computer vision research papers focused on querying and retrieving abstract, technical drawings. Technical documents typically contain abstract images, and there are many reasons to search for abstract images online: to protect intellectual property, build datasets, find evidence for legal cases, establish scholarly evidence, and justify funding through image reuse. Image sources: • https://commons.wikimedia.org/wiki/File:Complete_neuron_cell_diagram_en.svg • https://commons.wikimedia.org/wiki/File:Carriage-house-2.jpg • https://commons.wikimedia.org/wiki/File:Interspiro_DCSC_loop_schematic.png
  • 3. Major search engines now support reverse image search: Baidu, Bing, Google, and Yandex. Screenshot sources: • https://image.baidu.com • https://www.bing.com/ • https://images.google.com • https://yandex.com/images
  • 4. With each service, a user can upload an image and receive different types of results: pages-with results (web pages that contain the uploaded query image) and similar-to results (images that resemble the uploaded query image). Uploaded image source: https://commons.wikimedia.org/wiki/File:Adams_The_Tetons_and_the_Snake_River.jpg Screenshot from: https://www.bing.com
  • 5. Research Question: When using the reverse image search capability of general web search engines, are natural images more easily discovered than abstract images?
  • 6. To collect query images, we submitted four terms to the Wikimedia Commons API: “diagram” and “schematic” for abstract images, and “photo” and “photograph” for natural images. The four queries yielded 100, 100, 100, and 99 images. Previous studies have shown that Wikipedia content has high retrievability. Image sources (“diagram”): • https://commons.wikimedia.org/wiki/File:Galileo_Diagram.jpg • https://commons.wikimedia.org/wiki/File:Complete_neuron_cell_diagram_en.svg • https://commons.wikimedia.org/wiki/File:Bicycle_diagram-es.svg • https://commons.wikimedia.org/wiki/File:Systems_Engineering_V_diagram.jpg Image sources (“schematic”): • https://commons.wikimedia.org/wiki/File:Hvdc_bipolar_schematic.svg • https://commons.wikimedia.org/wiki/File:Beve_gear_schematic.png • https://commons.wikimedia.org/wiki/File:Interspiro_DCSC_loop_schematic.png • https://commons.wikimedia.org/wiki/File:Carriage-house-2.jpg Image sources (“photo”): • https://commons.wikimedia.org/wiki/File:Manatee_photo.jpg • https://commons.wikimedia.org/wiki/File:Frank_W._Micklethwaite_photo_of_downtown_Toronto,_1890_-2.jpg • https://commons.wikimedia.org/wiki/File:James_Abram_Garfield,_photo_portrait_seated.jpg • https://commons.wikimedia.org/wiki/File:Wtc-photo.jpg Image sources (“photograph”): • https://commons.wikimedia.org/wiki/File:Adams_The_Tetons_and_the_Snake_River.jpg • https://commons.wikimedia.org/wiki/File:Photographing_sunrise_1745.jpg • https://commons.wikimedia.org/wiki/File:FEMA_-_5399_-_Photograph_by_Andrea_Booher_taken_on_09-28-2001_in_New_York.jpg • https://commons.wikimedia.org/wiki/File:Photographing_a_model.jpg
  • 7. We then submitted the same query image to each reverse image search engine, repeated the process with the next query image, and so on (for example, the manatee photo, then the Interspiro DCSC loop schematic). Image sources: • https://commons.wikimedia.org/wiki/File:Manatee_photo.jpg • https://commons.wikimedia.org/wiki/File:Interspiro_DCSC_loop_schematic.png Screenshot sources: • https://images.google.com • https://www.bing.com/ • https://image.baidu.com • https://yandex.com/images
  • 8. Using ImageHash’s pHash and GoFigure’s VisHash, we evaluated how often the query image itself appeared in the results. pHash was designed to compare photographs using the Discrete Cosine Transform (DCT); VisHash was designed to compare diagrams and technical drawings by finding shapes in the image. Uploaded images: • https://commons.wikimedia.org/wiki/File:Manatee_photo.jpg • https://commons.wikimedia.org/wiki/File:Interspiro_DCSC_loop_schematic.png Screenshot source: https://yandex.com/images
  • 9. Precision differs between pages-with and similar-to results, with Yandex performing best (chart legend: blue = abstract images, green = natural images). Precision@k: what percentage of the images in the results are the same as the query image if we stop at k results? S. M. Jones and D. Oyen. 2022. “Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine,” Proceedings of the 2nd Drawings and abstract Imagery: Representation and Analysis (DIRA) Workshop. (Tel Aviv, Israel).
  • 10. After reviewing 10 pages-with results, Google shows a retrievability difference of up to 54% between images from the photograph and diagram categories (chart legend: blue = abstract images, green = natural images). Retrievability: given a query image, was it retrieved within the cutoff c?
  • 11. For similar-to results, Yandex consistently provides a high MRR (0.8) for natural images, while Google does well with pages-with results. MRR (mean reciprocal rank): the average, across all queries, of 1/rank of the first result that is the same image as the query; the higher the MRR, the fewer results a visitor must review before finding the query image again.
  • 12. Key Takeaways: • We submitted abstract and natural images from Wikimedia Commons to four major reverse image search engines. • When they do return results, Bing and Baidu do not perform well. • Google does not perform well for similar-to results, likely indicating that its definition of similar-to differs from those of the other search engines. • Yandex performs best in all cases. • Yandex and Google consistently perform better for natural images in pages-with results.
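Slide 6 does not say which Wikimedia Commons API module was used to collect the query images. As a minimal sketch, assuming the standard MediaWiki Action API endpoint and its full-text search module (`list=search`, restricted to the File namespace, 6), the four query URLs could be built as follows. The `srlimit` value of 100 is an assumption chosen to match the roughly 100 images per term, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the standard MediaWiki Action API endpoint on Wikimedia Commons.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"

# Search terms from slide 6, grouped by image category.
TERMS = {
    "abstract": ["diagram", "schematic"],
    "natural": ["photo", "photograph"],
}


def build_search_url(term, limit=100):
    """Hypothetical helper: full-text search restricted to the File: namespace."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srnamespace": 6,  # namespace 6 holds File: description pages
        "srlimit": limit,  # assumed value, not stated on the slides
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


def file_titles(response):
    """Extract 'File:...' titles from a decoded JSON search response."""
    return [hit["title"] for hit in response.get("query", {}).get("search", [])]
```

Fetching one of these URLs returns JSON whose `query.search` entries carry `File:` page titles; a follow-up query with `prop=imageinfo` and `iiprop=url` on those titles returns the downloadable file URLs.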
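Slide 8 names ImageHash’s pHash but not the matching threshold the authors used. The pure-Python sketch below follows the same steps as ImageHash’s `phash` (grayscale image resized to 32x32, 2-D DCT, keep the top-left 8x8 low-frequency block, set each bit by comparing the coefficient to the block’s median) so the idea can be read without dependencies. It assumes the image is already a 32x32 grid of gray values, `MATCH_THRESHOLD` is hypothetical, and it does not reproduce VisHash.

```python
import math
import random
from statistics import median


def dct_ii(values, num_coeffs):
    """First num_coeffs coefficients of an unnormalized 1-D DCT-II."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi / n * (i + 0.5) * k) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


def phash_bits(gray, hash_size=8):
    """pHash-style bits for a square grayscale grid (assumed already 32x32)."""
    # 1-D DCT along every row, keeping only the low-frequency columns.
    row_dct = [dct_ii(row, hash_size) for row in gray]
    # 1-D DCT down each kept column, keeping only the low-frequency rows.
    col_dct = [dct_ii([row[c] for row in row_dct], hash_size) for c in range(hash_size)]
    # col_dct[c][r] is coefficient (r, c); flatten the 8x8 block row by row.
    low = [col_dct[c][r] for r in range(hash_size) for c in range(hash_size)]
    med = median(low)
    # One bit per coefficient: 1 if it lies above the block's median.
    return [1 if v > med else 0 for v in low]


def hamming(a, b):
    """Number of differing bits; 0 means identical hashes."""
    return sum(x != y for x, y in zip(a, b))


MATCH_THRESHOLD = 10  # hypothetical cutoff, not taken from the slides


def is_same_image(query_gray, result_gray):
    """Treat a result as the query image if the hashes are close enough."""
    return hamming(phash_bits(query_gray), phash_bits(result_gray)) <= MATCH_THRESHOLD


if __name__ == "__main__":
    rng = random.Random(0)
    query = [[rng.uniform(0, 255) for _ in range(32)] for _ in range(32)]
    brighter = [[p + 20 for p in row] for row in query]  # uniform brightness shift
    print(hamming(phash_bits(query), phash_bits(brighter)))
```

A uniform brightness shift only changes the DC coefficient, which already sits far above the median, so the two hashes match. With the real library, `imagehash.phash(Image.open(a)) - imagehash.phash(Image.open(b))` returns the Hamming distance directly.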
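Slide 9 defines precision@k in words. A minimal sketch, assuming each query’s results have already been reduced to a list of booleans (True where a result was judged to be the same image as the query, for example by a hash comparison) and that a result list shorter than k counts the missing positions as misses; the function names are hypothetical:

```python
def precision_at_k(matches, k):
    """Fraction of the first k results that are the query image.

    matches[i] is True if the result at position i was judged the same image.
    Positions beyond the end of the list count as misses (an assumption).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for m in matches[:k] if m) / k


def mean_precision_at_k(all_matches, k):
    """Average precision@k over all queries in a category."""
    if not all_matches:
        raise ValueError("need at least one query")
    return sum(precision_at_k(m, k) for m in all_matches) / len(all_matches)
```

Dividing by k rather than by the number of returned results penalizes engines that return few results, which is one reasonable reading of “if we stop at k results.”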
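Slide 10 frames retrievability as a per-query yes/no question (was the query image retrieved within cutoff c?), which can then be averaged over a category and compared across categories. A sketch under the same boolean-list assumption; the function names and the toy data below are illustrative, not from the study:

```python
def retrieved_within(matches, c):
    """True if the query image appears anywhere in the first c results."""
    return any(matches[:c])


def retrievability(all_matches, c):
    """Fraction of a category's queries whose image was retrieved within c."""
    if not all_matches:
        raise ValueError("need at least one query")
    return sum(retrieved_within(m, c) for m in all_matches) / len(all_matches)


# Toy per-query match lists (hypothetical data).
photographs = [[False, True], [True], [False, False, True]]
diagrams = [[False, False], [], [True]]
gap = retrievability(photographs, 2) - retrievability(diagrams, 2)
```

A gap like the 54% reported for Google on slide 10 would correspond to this difference computed over the real photograph and diagram query sets at c = 10.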
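Slide 11 reports MRR (mean reciprocal rank). The sketch assumes the usual definition: for each query, take 1/rank of the first matching result (0 when nothing matches), then average over queries, so a value close to 1 means the query image usually appears at or near the top of the results. The boolean-list input format and function names are assumptions carried over from the earlier sketches.

```python
def reciprocal_rank(matches):
    """1/rank of the first result judged the same image; 0.0 if none match."""
    for rank, is_match in enumerate(matches, start=1):
        if is_match:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(all_matches):
    """Average reciprocal rank over all queries in a category."""
    if not all_matches:
        raise ValueError("need at least one query")
    return sum(reciprocal_rank(m) for m in all_matches) / len(all_matches)
```

For example, three queries whose first match is at rank 1, at rank 2, and nowhere give (1 + 0.5 + 0) / 3 = 0.5.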