The document discusses how AI and deep learning can help photographers by improving cameras with features like object detection and enabling automated image curation on websites. It provides examples of cameras that use AI, such as Google Clips, and how websites like Yelp use AI to select the best user-uploaded photos. The document also demonstrates AI tools from services like AWS, Cloudinary, and IBM Watson that can be used to build photo applications.
1. Become A Photo Pro With AI and Deep Learning
Dan Zeitman - Developer Advocate - Cloudinary
Marek Sadowski - Developer Advocate - IBM
2. Outline
● What is AI / Deep Learning powered Photography?
● INPUT: Taking better and more professional photos
○ AI in New Camera Devices
● OUTPUT: AI powered Websites
○ Automated Curation of images, using Auto-tagging,
Deep Learning & Visual Analysis
○ Advances in algorithmic and AI-based Image
manipulation, filtering, optimization and speedy
delivery
● Demos
○ AI Playground
○ Selfie Camera
○ Upload Widget
3. Key Features For Visual Recognition
● Object, Scene, and Activity detection
● Facial Recognition
● Facial analysis
● Person tracking
● Content Moderation - Unsafe content detection
● Celebrity recognition
● OCR - Text in images
11. AI / Deep Learning Cameras
Google Clips - Consumer product that captures family members and their pets.
AWS DeepLens - Learning tool aimed at AI developers.
Lighthouse - Security camera on steroids.
Furbo - Pet camera that dispenses treats.
Spectacles (Snapchat, v2) - Popular eyeglass camera; the next version will add AI labels and AR capabilities.
Arsenal Camera Assistant - Black-box device that controls DSLR cameras.
12. Use Case: Google Clips
Google Clips features Moment IQ, a machine learning algorithm that’s smart enough to recognize great expressions, lighting and framing. And it’s always learning.
Source: Google Clips
13. Use Case: DeepLens
The world’s first deep learning enabled video camera for developers.
AWS DeepLens helps put deep learning in the hands of developers, literally, with a fully programmable video camera, tutorials, code, and pre-trained models designed to expand deep learning skills.
Source: AWS DeepLens
14. Use Case: Lighthouse
Lighthouse is an interactive assistant using advanced camera technology and machine learning for your home. You tell it the security, pet and family related activities you care about, and it tells you when those things happen.
Source: Lighthouse
15. Use Case: Arsenal Camera Assistant
Arsenal’s ultralight hardware uses state-of-the-art AI to take better photos in any condition.
Source: Arsenal Camera Assistant
16. Use Case: Arsenal Camera Assistant
Arsenal quickly examines the scene. It uses image recognition to identify environment and subject-specific needs (e.g. a fast shutter speed for birds, or compensation for camera vibration).
Arsenal then finds great settings by comparing the current scene with thousands of professional photos, using a convolutional deep neural network.
Lastly, Arsenal optimizes settings based on 18 different factors, like hyperfocal distance, sensor dynamic range and lens transmission.
Source: Arsenal Camera Assistant
18. Use Case: Yelp
Yelp users upload around 100,000 photos a day to a collection of tens of millions, and that rate continues to grow.
Yelp turned to various computer vision techniques, trying to discover intrinsic features of a given image that could be associated with a quality score.
Source: Yelp
19. Use Case: Yelp
At Yelp, each business’s page showcases a few of its best photos, which we call cover photos.
First, this system was highly subject to selection bias. Cover photos are viewed and clicked significantly more often than average. As a result, once a photo ends up on the business page, it is highly likely to remain there, even if more attractive and useful photos are uploaded at a later date.
Additionally, relying solely on likes to determine prominent photos can end up promoting “clickbait” photos: those that may have low relevance and quality but are upvoted due to their provocative nature.
Source: Yelp
20. Cloudinary Search Demo (TJ Bot)
● Overview of Cloudinary DAM / Admin console
● Search Bot Demo
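Cloudinary’s AI-assisted manipulation and delivery (mentioned in the outline: automatic cropping, quality, and format selection) is driven entirely by the delivery URL. As a minimal sketch, the following builds such a URL by hand; the cloud name "demo" and the asset "sample.jpg" are placeholders for illustration, and `g_auto`, `q_auto`, and `f_auto` are Cloudinary’s content-aware gravity, automatic quality, and automatic format transformation parameters.

```python
def cloudinary_url(public_id, cloud_name="demo",
                   transformations=("w_400", "c_fill", "g_auto", "q_auto", "f_auto")):
    """Compose a Cloudinary delivery URL with comma-joined transformations."""
    return "https://res.cloudinary.com/{}/image/upload/{}/{}".format(
        cloud_name, ",".join(transformations), public_id)

# w_400 + c_fill crop to 400px wide; g_auto lets Cloudinary's AI pick the crop focus
url = cloudinary_url("sample.jpg")
print(url)
```

In production the Cloudinary SDKs generate these URLs for you; the point here is that every AI-driven transformation is expressed declaratively in the URL itself, so the CDN can cache each variant.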
Object, scene, and activity detection
With AI, you can identify thousands of objects (e.g. bike, telephone, building) and scenes (e.g. parking lot, beach, city).
When analyzing video, you can also identify specific activities happening in the frame, such as "delivering a package" or "playing soccer".
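As a minimal sketch of object and scene detection with one of the services the deck covers (AWS Rekognition), the function below labels a local image; it assumes `boto3` is installed and AWS credentials are configured, and the file name `beach.jpg` is a placeholder.

```python
def label_photo(path, min_confidence=75.0):
    """Return (label, confidence) pairs Rekognition detects in a local image."""
    import boto3  # assumed dependency; needs configured AWS credentials/region

    client = boto3.client("rekognition")
    with open(path, "rb") as f:
        resp = client.detect_labels(
            Image={"Bytes": f.read()},
            MaxLabels=10,
            MinConfidence=min_confidence,
        )
    return [(label["Name"], label["Confidence"]) for label in resp["Labels"]]

if __name__ == "__main__":
    for name, conf in label_photo("beach.jpg"):
        print(f"{name}: {conf:.1f}%")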
Facial recognition
Fast and accurate search capability allows you to identify a person in a photo or video using your private repository of face images.
Facial analysis
You can analyze the attributes of faces in images and videos to determine things like happiness, age range, eyes open, glasses, facial hair, etc. In video, you can also measure how these things change over time, such as constructing a timeline of the emotions of an actor.
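Facial analysis can be sketched the same way: Rekognition’s `detect_faces` with `Attributes=["ALL"]` returns age range, emotions, and attributes like eyes-open per detected face. This is an illustrative sketch assuming `boto3` and AWS credentials, not part of the original deck.

```python
def analyze_faces(path):
    """Summarize age range, dominant emotion, and eye/glasses state per face."""
    import boto3  # assumed dependency; needs configured AWS credentials/region

    client = boto3.client("rekognition")
    with open(path, "rb") as f:
        resp = client.detect_faces(Image={"Bytes": f.read()}, Attributes=["ALL"])

    results = []
    for face in resp["FaceDetails"]:
        # Each emotion comes with a confidence; keep the most confident one
        top_emotion = max(face["Emotions"], key=lambda e: e["Confidence"])
        results.append({
            "age_range": (face["AgeRange"]["Low"], face["AgeRange"]["High"]),
            "emotion": top_emotion["Type"],
            "eyes_open": face["EyesOpen"]["Value"],
            "glasses": face["Eyeglasses"]["Value"],
        })
    return results
```

Running the same call frame-by-frame over a video is how the emotion-timeline use case above is typically built.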
Person tracking
Track people through a video even when their faces are not visible, or as they go in and out of the scene. You can also identify their movements in the frame to tell things like whether someone was entering or exiting a building.
Unsafe content detection
Content Moderation helps you identify potentially unsafe or inappropriate content across both image and video assets and provides you with detailed labels that allow you to accurately control what you want to allow based on your needs.
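A hedged sketch of the moderation call, again assuming Rekognition via `boto3`: each moderation label carries a name, a parent category, and a confidence you can threshold against your own policy.

```python
def moderation_labels(path, min_confidence=60.0):
    """Return (label, parent_category, confidence) for potentially unsafe content."""
    import boto3  # assumed dependency; needs configured AWS credentials/region

    client = boto3.client("rekognition")
    with open(path, "rb") as f:
        resp = client.detect_moderation_labels(
            Image={"Bytes": f.read()},
            MinConfidence=min_confidence,
        )
    return [(m["Name"], m["ParentName"], m["Confidence"])
            for m in resp["ModerationLabels"]]
```

An empty result list means nothing exceeded the confidence threshold, which is how an upload pipeline typically decides to auto-approve an image.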
Celebrity recognition
You can quickly identify well known people in your video and image libraries to catalog footage and photos for marketing, advertising, and media industry use cases.
Text in images
Specifically built to work with real world images, AI can detect and recognize text from images, such as street names, captions, product names, and license plates.
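Text-in-image detection (OCR) follows the same pattern; `detect_text` returns both whole `LINE` detections and individual `WORD` detections, so this sketch keeps only the lines. Assumes `boto3` and AWS credentials as above.

```python
def read_text(path):
    """Return the lines of text Rekognition detects in an image."""
    import boto3  # assumed dependency; needs configured AWS credentials/region

    client = boto3.client("rekognition")
    with open(path, "rb") as f:
        resp = client.detect_text(Image={"Bytes": f.read()})
    # Filter to LINE detections; WORD entries duplicate the same text word-by-word
    return [d["DetectedText"] for d in resp["TextDetections"]
            if d["Type"] == "LINE"]
```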
https://www.youtube.com/watch?v=JXh1yyvXpwo
https://aws.amazon.com/deeplens/
Lighthouse - https://www.light.house
Arsenal
https://witharsenal.com
https://techcrunch.com/2017/11/14/furbo-unveils-treat-tossing-dog-camera-with-smart-alerts-like-when-your-dog-is-pacing/
Furbo is calling this the “first AI-powered dog camera,” which uses machine learning and computer vision to detect when your dog is chewing, pacing back and forth, or playing with another pup. Furbo will also automatically take a photo of your pup when it’s looking at the camera and let you know when a human (like a dog-walker or puppy thief) comes into view.
http://www.mobyaffiliates.com/blog/snap-to-launch-second-version-of-spectacles-with-ai-capabilities/
Google Clips is smart enough to recognize great expressions, lighting and framing. So the camera captures beautiful, spontaneous images. And it gets smarter over time.
Clips learns to recognize familiar faces over time. You can help it learn faster by pressing the shutter button to shoot a portrait of a friend or family member.
Amazon is calling DeepLens the world’s first deep learning enabled video camera for developers.
Product Developer’s Comments:
AI / Deep Learning:
A small amount is done on the device (mainly to filter out objects too small to be classified), but much of the heavy lifting is done in our neural networks in the cloud.
Closed source?
Lighthouse built their own natural language processing, computer vision algorithms and custom 3D sensing hardware.
Biggest Challenges for a product manufacturer?
Everything! It's hard to build hardware! Getting the quality right, at the right place, at the right scale, at the right time. Lots goes into it.
Have you considered 3rd party integration with a service like Cloudinary?
We're launching a product with the most complicated and sophisticated computer vision that's ever been created on a consumer product. That's hard enough! At some point we'll focus on integrations, but that's not a priority for us quite yet.
Arsenal’s smart assistant AI suggests settings based on your subject and environment. It uses an advanced neural network to pick the optimal settings for any scene (using similar algorithms to those in self driving cars). Like any good assistant, it then lets you control the final shot.
https://witharsenal.com/features
Tinder:
If you’ve ever quickly swiped through Tinder, you know that sometimes your fingers can get away from you – and, all of a sudden, you’ve Super Liked someone without meaning to. Oops! Tinder today is addressing that problem with a new feature now testing in select markets that will make Super Liking a more intentional experience. Called “Super Likeable,” the feature will pop up at random times in the app to offer you a free Super Like which can be used on one of four people presented on the Super Likeable card.
Tinder says the experience itself is powered by artificial intelligence that helps select the people it thinks will be “of special interest to you.” According to TechCrunch, Tinder says that, broadly, it uses a history of your interactions on the service to figure out who sparks your interest.
Veo:
https://techcrunch.com/2017/12/01/crunch-report-tinder-is-using-ai-to-get-you-hooked-up/
https://techcrunch.com/2017/08/30/veo/
https://engineeringblog.yelp.com/2016/11/finding-beautiful-yelp-photos-using-deep-learning.html
Depth of Field:
For example, one important feature for photographers is depth of field, which measures how much of the image is in focus. Using a “shallow” depth of field can be an excellent way to distinguish the subject of an image from its background, and photos uploaded to Yelp are no exception. In many cases, the most beautiful images of a given restaurant were very sharply focused on a specific entrée.
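A common proxy for the in-focus detail described above is the variance of the image’s Laplacian: a sharply focused subject produces strong second-derivative responses, while blurred regions produce weak ones. This is a generic sharpness heuristic, not Yelp’s actual feature; the sketch below implements it with plain NumPy array shifts so no OpenCV/SciPy is needed.

```python
import numpy as np

def sharpness_score(gray):
    """Variance of the Laplacian response over a 2-D grayscale array.

    Higher values indicate more in-focus detail; a perfectly flat image scores 0.
    """
    g = gray.astype(float)
    # Discrete Laplacian on the interior: up + down + left + right - 4*center
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] +
           g[1:-1, :-2] + g[1:-1, 2:] -
           4.0 * g[1:-1, 1:-1])
    return lap.var()
```

Comparing this score between the subject region and the background is one cheap way to detect a shallow depth of field.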
Contrast:
Contrast measures the difference in brightness and color between an object in an image and other nearby objects. There are several formulas for contrast, but most involve comparing the luminance, or light intensity of neighboring regions of an image.
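One of the standard contrast formulas alluded to above is RMS contrast: the standard deviation of normalized pixel luminance. A minimal sketch (again a generic measure, not Yelp’s exact implementation):

```python
import numpy as np

def rms_contrast(gray):
    """Root-mean-square contrast of an 8-bit grayscale array.

    Normalizes luminance to [0, 1]; 0 means a perfectly flat image.
    """
    g = gray.astype(float) / 255.0
    return g.std()
```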
Alignment:
Finally, the location of objects in an image with respect to one another can be a significant aesthetic consideration. Studies have shown, for example, that people have an innate predisposition towards symmetry in art. In addition, some photographers also promote what is called the “rule of thirds,” a method of aligning important elements of an image along certain axes to create a sense of motion or energy.
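The rule of thirds mentioned above divides the frame with two horizontal and two vertical lines; their four intersections (“power points”) are the conventional anchors for important elements. The trivial sketch below just computes those points, which an alignment feature could compare against a detected subject’s position.

```python
def thirds_points(width, height):
    """Return the four rule-of-thirds intersection points for a frame."""
    xs = (width / 3, 2 * width / 3)
    ys = (height / 3, 2 * height / 3)
    return [(x, y) for x in xs for y in ys]
```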
https://engineeringblog.yelp.com/2016/11/finding-beautiful-yelp-photos-using-deep-learning.html
https://engineeringblog.yelp.com/2015/10/how-we-use-deep-learning-to-classify-business-photos-at-yelp.html
The engineering team at Yelp believes that the quality of cover photos for restaurants has significantly improved.