Image Tagging at the Associated Press

AP's project to apply additional metadata to our images, using custom image recognition technology. Presented to the IPTC on October 16th 2018

Stuart Myles
Director of Information Management at the Associated Press

Improve Search and Auto-Publishing
• Improve search experience
• In AP portals and in customer CMSes
• Keywords to match more queries and so surface more content
• Filters to narrow results to more relevant content
• Simplify auto publishing
• AP customers have fewer – or even no – editors to manually review content before publishing
• Filters let customers fine-tune saved searches
• Eliminate customer need to manually identify graphic content
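As a rough illustration of how tag-based filters could narrow a saved search and keep graphic content out of an auto-published feed, here is a minimal sketch. The `Image` and `SavedSearch` classes, the tag names, and the `auto_publishable` helper are all hypothetical; they are not AP's data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Image:
    """Hypothetical image record carrying the tags applied to it."""
    image_id: str
    tags: set[str] = field(default_factory=set)


@dataclass
class SavedSearch:
    """Hypothetical saved search: tags that must appear and tags that must not."""
    required_tags: set[str]
    excluded_tags: set[str] = field(default_factory=set)

    def matches(self, image: Image) -> bool:
        # Every required tag must be present, and no excluded tag may appear.
        has_required = self.required_tags <= image.tags
        has_excluded = bool(self.excluded_tags & image.tags)
        return has_required and not has_excluded


def auto_publishable(images: list[Image], search: SavedSearch) -> list[str]:
    """Return the IDs of images that pass the saved search's filters."""
    return [img.image_id for img in images if search.matches(img)]


images = [
    Image("a", {"baseball", "batter"}),
    Image("b", {"baseball", "graphic"}),
    Image("c", {"soccer"}),
]
search = SavedSearch(required_tags={"baseball"}, excluded_tags={"graphic"})
publishable = auto_publishable(images, search)
```

With a filter like this, a customer without editors could exclude anything tagged as graphic rather than reviewing each image by hand.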
@smyles

AP Images Metadata
• AP handles 3,000 – 4,000 images a day
• Digitize 700 – 800 photos a month
• AP Images has about 34 million photos
• AP already applies metadata to images
• Manually by photographers and editors
• Mapped from third party feeds
• Automatically based on photo text – such as caption – via AP’s tagging service
• We manually keyword some archive images

Early Days: First Half of 2017
• Early in 2017, we evaluated leading vendors
• None offered custom tagging
• Disappointing results
• Too many keywords that do not apply to image
• Inaccurate keywords scored with high confidence
• Some were strong for stock images, not so good for news
• Others were too generic

High confidence: Sunglasses, Woman, Man
Low confidence: Finger, Hand
And notice: watch

High confidence: Bat, Batter, Baseball, Softball
And notice: watch

Technology Evolves: Second Half of 2017
• We evaluated open source software
• Future option, but the software isn’t mature yet
• Later in 2017, vendors upgraded their offerings
• Most added custom tagging
• Working with business and sales, we designed a new image taxonomy
• Sports actions, NSFW filters, emotions, and image attributes
• Complements the existing news taxonomy we apply to text content

A Hybrid Approach: Out of the Box + Custom
• Use out-of-the-box tagging for most concepts
• Train a custom tagger for any concepts not covered (or not covered well) by the out-of-the-box (OOB) tagger
• Find example images for each concept e.g. “tackle” - anywhere from 500 to 5,000 examples per concept
• Test the tagger to make sure it is accurate
• Feed the tagger more examples where it underperforms
• Proof image tagging in Production
• High confidence tags accepted as-is
• Ignore low confidence tags
• Medium confidence tags reviewed by Editorial
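The three-way handling of confidence scores described above can be sketched as follows. This is only an illustration: the threshold values, the `Tag` type, and the `route_tags` function are assumptions, since the slides do not say where AP drew the lines between high, medium, and low confidence.

```python
from typing import NamedTuple

# Hypothetical cut-offs; the actual thresholds are not given in the slides.
HIGH_CONFIDENCE = 0.90
LOW_CONFIDENCE = 0.50


class Tag(NamedTuple):
    label: str
    confidence: float


def route_tags(tags, low=LOW_CONFIDENCE, high=HIGH_CONFIDENCE):
    """Split tagger output into accepted tags and tags needing editorial review.

    Tags scoring below `low` are ignored and appear in neither list.
    """
    accepted, needs_review = [], []
    for tag in tags:
        if tag.confidence >= high:
            accepted.append(tag.label)      # high confidence: accept as-is
        elif tag.confidence >= low:
            needs_review.append(tag.label)  # medium confidence: send to Editorial
        # low confidence: drop silently
    return accepted, needs_review


accepted, needs_review = route_tags(
    [Tag("baseball", 0.97), Tag("softball", 0.70), Tag("finger", 0.20)]
)
```

In practice the two thresholds would probably be tuned per concept against the held-back test images, since some concepts are harder to distinguish than others.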

Train and Test Management
• We assembled training sets that we shared with the partner
• And we held back test images
• Testing for accuracy
• Precision
• Recall
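For one concept, precision is the share of images the tagger labelled with that concept that really show it, and recall is the share of images that really show it that the tagger found. The sketch below computes both from sets of image IDs; the function name and the example IDs are made up for illustration and do not come from AP's test sets.

```python
def precision_recall(predicted: set, actual: set) -> tuple:
    """Return (precision, recall) for a single concept.

    predicted: IDs of test images the tagger labelled with the concept.
    actual:    IDs of test images that humans labelled with the concept.
    """
    true_positives = len(predicted & actual)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical held-back test images for the concept "tackle".
tagger_says_tackle = {"img1", "img2", "img3", "img4"}
editors_say_tackle = {"img1", "img2", "img5"}

precision, recall = precision_recall(tagger_says_tackle, editors_say_tackle)
# Two of the four predicted images are correct, and two of the three
# true "tackle" images were found.
```

Low precision corresponds to the earlier complaint about keywords that do not apply to the image, while low recall means relevant images go untagged and so would not surface in search.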

Things We Learnt
• Assembling test and train sets is arduous
• But also where most of the value lies
• Some concepts are difficult to distinguish
• Dawn / Dusk, Happy / Jubilant
• Perceived concepts are different than text subjects
• May require some reorganization of our taxonomy and how we represent it

What is Image Recognition?
• Technology to recognize people, places, things or emotions in an image
• Available as APIs, as well as open source software
• Image recognition involves building a model to identify a set of topics
• Topics can be anything you want - baseball, happy faces, drug use, war…
• Requires lots of example images, so the software works out what patterns to look for
• Consumers of image recognition services are often stock agencies
• Off the shelf models are therefore available for concepts like
• Graphic/NSFW
• Celebrities
• Emotion
• General keywords – wedding, food
• Many commercial companies also offer to create custom models – for a higher fee
Thank you! Questions?