
Image Tagging at the Associated Press

AP's project to apply additional metadata to our images, using custom image recognition technology.

Presented to the IPTC on October 16th 2018

Published in: Technology

  1. Stuart Myles, Director of Information Management at the Associated Press
  2. What is Image Recognition?
     • Technology to recognize people, places, things or emotions in an image
     • Available as APIs, as well as open source software
     • Image recognition involves building a model to identify a set of topics
     • Topics can be anything you want: baseball, happy faces, drug use, war…
     • Requires lots of example images, so the software works out what patterns to look for
     • Consumers of image recognition services are often stock agencies
     • Off-the-shelf models are therefore available for concepts like:
         • Graphic/NSFW
         • Celebrities
         • Emotion
         • General keywords (wedding, food)
     • Many commercial companies also offer to create custom models, for a higher fee
     @smyles
  3. Improve Search and Auto-Publishing
     • Improve search experience
         • In AP portals and in customer CMSes
         • Keywords to match more queries and so surface more content
         • Filters to narrow results to more relevant content
     • Simplify auto-publishing
         • AP customers have fewer (or even no) editors to manually review content before publishing
         • Filters let customers fine-tune saved searches
         • Eliminate customer need to manually identify graphic content
  4. AP Images Metadata
     • AP handles 3,000 to 4,000 images a day
     • Digitize 700 to 800 photos a month
     • AP Images has about 34 million photos
     • AP already applies metadata to images
         • Manually by photographers and editors
         • Mapped from third-party feeds
         • Automatically based on photo text, such as the caption, via AP’s tagging service
     • We manually keyword some archive images
  5. Early Days: First Half of 2017
     • Early in 2017, we evaluated leading vendors
         • None offered custom tagging
     • Disappointing results
         • Too many keywords that do not apply to the image
         • Inaccurate keywords scored with high confidence
         • Some were strong for stock images, not so good for news
         • Others were too generic
  6. High confidence: Sunglasses, Woman, Man. Low confidence: Finger, Hand. And notice: watch
  7. High confidence: Bat, Batter, Baseball, Softball. And notice: watch
  8. Technology Evolves: Second Half of 2017
     • We evaluated open source software
         • Future option, but the software isn’t mature yet
     • Later in 2017, vendors upgraded their offerings
         • Most added custom tagging
     • Working with business and sales, we designed a new image taxonomy
         • Sports actions, NSFW filters, emotions, and image attributes
         • Complements the existing news taxonomy we apply to text content
  9. A Hybrid Approach: Out of the Box + Custom
     • Use out-of-the-box (OOB) tagging for most concepts
     • Train a custom tagger for any concepts not covered (or not covered well) by the OOB tagger
         • Find example images for each concept, e.g. “tackle”: anywhere from 500 to 5,000 examples per concept
         • Test the tagger to make sure it is accurate
         • Feed the tagger more examples where it underperforms
     • Proof image tagging in Production
         • High confidence tags accepted as-is
         • Ignore low confidence tags
         • Medium confidence tags reviewed by Editorial
  10. Train and Test Management
     • We assembled training sets that we shared with the partner
     • And we held back test images
     • Testing for accuracy
         • Precision
         • Recall
  11. Things We Learnt
     • Assembling test and train sets is arduous
         • But also where most of the value lies
     • Some concepts are difficult to distinguish
         • Dawn / Dusk, Happy / Jubilant
     • Perceived concepts are different from text subjects
         • May require some reorganization of our taxonomy and how we represent it
  12. Thank you! Questions?
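
Two ideas from slides 9 and 10 can be sketched in Python: routing tags by confidence (accept, review, ignore) and measuring precision and recall against held-back test images. This is an illustrative sketch only. The threshold values, the `Tag` class, and the function names are hypothetical; the slides do not state AP's actual cut-offs, score scale, or tooling.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the slides do not give AP's actual cut-offs.
HIGH_CONFIDENCE = 0.90
LOW_CONFIDENCE = 0.50


@dataclass(frozen=True)
class Tag:
    label: str
    confidence: float  # assumed to be a score between 0 and 1


def triage(tags):
    """Split tagger output into accepted, needs-review, and ignored tags."""
    accepted, review, ignored = [], [], []
    for tag in tags:
        if tag.confidence >= HIGH_CONFIDENCE:
            accepted.append(tag)   # high confidence: accept as-is
        elif tag.confidence < LOW_CONFIDENCE:
            ignored.append(tag)    # low confidence: drop
        else:
            review.append(tag)     # medium confidence: editorial review
    return accepted, review, ignored


def precision_recall(predicted, expected):
    """Precision and recall for one concept over a held-back test set.

    predicted: set of image IDs the tagger labelled with the concept
    expected:  set of image IDs that reviewers say show the concept
    """
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


if __name__ == "__main__":
    tags = [Tag("baseball", 0.97), Tag("watch", 0.70), Tag("finger", 0.20)]
    accepted, review, ignored = triage(tags)
    print("accepted:", [t.label for t in accepted])
    print("review:  ", [t.label for t in review])
    print("ignored: ", [t.label for t in ignored])

    p, r = precision_recall({"img1", "img2", "img3", "img4"},
                            {"img1", "img2", "img5"})
    print(f"precision={p:.2f} recall={r:.2f}")
```

Precision answers "of the images the tagger labelled with a concept, how many really show it?", while recall answers "of the images that show the concept, how many did the tagger find?". Raising the acceptance threshold typically trades recall for precision, which is one reason a middle band routed to human review can be useful.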
