
Amazon Rekognition & Amazon Polly

  1. Introducing Amazon Rekognition & Amazon Polly. Sara Mitchell, AWS Solutions Architect, sarmitc@amazon.com. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Amazon Rekognition: extract rich metadata from visual content. • Object and Scene Detection • Facial Analysis • Face Comparison • Facial Recognition • Celebrity Recognition • Image Moderation
  3. https://console.aws.amazon.com/rekognition/home
  4. Automating Footage Tagging with Amazon Rekognition. Previously, only about half of all footage was indexed because manual processing took so much time. • Built in 3 weeks • Indexed against 99,000 people • Index created in one day • Saved ~9,000 hours a year of manual curation • Live video with frame sampling
  5. Automating Footage Tagging with Amazon Rekognition: Solution Architecture. [Architecture diagram with steps 1 to 4. Labels: Encoders & Feeds, Stills Extraction, Stills, Frames, Bucket, SQS, Trigger, Amazon Rekognition, R3, Results Cache, users.]
  6. Rekognition APIs – Advanced Usage
     Image analysis operations: CompareFaces, DetectFaces, DetectLabels, DetectModerationLabels, GetCelebrityInfo, RecognizeCelebrities
     Collection operations: CreateCollection, DeleteCollection, DeleteFaces, IndexFaces, ListCollections, ListFaces, SearchFaces, SearchFacesByImage
     CLI examples:
       aws rekognition recognize-celebrities --image "S3Object={Bucket=mybucket,Name=cam.jpg}"
       aws rekognition create-collection --collection-id "persons-of-interest"
       aws rekognition index-faces --image "S3Object={Bucket=mybucket,Name=subject.jpg}" --collection-id "persons-of-interest"
       aws rekognition search-faces-by-image --image "S3Object={Bucket=mybucket,Name=cam.jpg}" --collection-id "persons-of-interest"
     Response excerpt (cut off on the slide):
       { "FaceMatches": [ {"Face": {"BoundingB… "Height": 0.2683333456516266, "Left": 0.5099999904632568, "Top": 0.1783333271741867, "Width": 0.17888888716697693}, …
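The bounding-box values in the response excerpt are ratios of the overall image size rather than pixels, so a client usually converts them before drawing a box. The following is a minimal sketch of that conversion; the function name and the 900×600 image size are assumptions chosen for illustration and do not come from the slides.

```python
from typing import Dict, Tuple


def bounding_box_to_pixels(
    box: Dict[str, float], image_width: int, image_height: int
) -> Tuple[int, int, int, int]:
    """Convert a Rekognition-style BoundingBox (ratios of the image size)
    into pixel values, returned as (left, top, width, height)."""
    left = round(box["Left"] * image_width)
    top = round(box["Top"] * image_height)
    width = round(box["Width"] * image_width)
    height = round(box["Height"] * image_height)
    return left, top, width, height


# Ratios copied from the response excerpt on the slide.
slide_box = {
    "Height": 0.2683333456516266,
    "Left": 0.5099999904632568,
    "Top": 0.1783333271741867,
    "Width": 0.17888888716697693,
}

# Assumed image size of 900x600 pixels (hypothetical, for illustration only).
pixels = bounding_box_to_pixels(slide_box, 900, 600)
print(pixels)  # (459, 107, 161, 161)
```

Rounding to whole pixels is a design choice for drawing; code that crops the image may prefer to clamp the values to the image bounds as well.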
  7. Rekognition APIs – Advanced Usage: decision trees and processing pipelines. Why? • Many use cases require more than a single operation to arrive at actionable data. How? • S3 event notifications, Lambda, Step Functions • DynamoDB for persistent pipeline storage • Augmenting results with third-party AI/ML: OpenCV, MXNet, etc. on EC2 Spot, ECS, AI/ML AMI. Sample use cases: • Person of interest near a celebrity • Multi-pass motion detection enhancement • Subjects leaving a location without possessions. [Diagram labels: IndexFaces, DetectLabels, “person”]
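A two-step decision pipeline of the kind described on this slide might look like the sketch below: detect labels first, and only search the face collection when a person is present. The function name, the result shape, and the choice of these two operations are assumptions for illustration; the client is passed in so that the logic can be exercised without AWS credentials.

```python
from typing import Any, Dict


def analyse_image(
    rekognition: Any,
    bucket: str,
    key: str,
    collection_id: str,
    min_confidence: float = 80.0,
) -> Dict[str, Any]:
    """Run DetectLabels, then SearchFacesByImage only if a person was found.

    `rekognition` is expected to behave like a boto3 Rekognition client.
    """
    image = {"S3Object": {"Bucket": bucket, "Name": key}}

    # Step 1: cheap, general-purpose labelling.
    labels = rekognition.detect_labels(Image=image, MinConfidence=min_confidence)["Labels"]
    names = sorted(label["Name"] for label in labels)
    result: Dict[str, Any] = {"labels": names, "face_ids": []}

    # Decision point: skip the collection search when no person is present.
    if not any(name.lower() == "person" for name in names):
        return result

    # Step 2: look the face up in the collection.
    # Note: the real service rejects images in which it finds no face,
    # so production code would also handle that error.
    matches = rekognition.search_faces_by_image(CollectionId=collection_id, Image=image)
    result["face_ids"] = [m["Face"]["FaceId"] for m in matches["FaceMatches"]]
    return result
```

With real credentials, a client created with `boto3.client("rekognition")` could be passed as the first argument. In a larger pipeline each step could instead be its own Lambda function coordinated by Step Functions, as the slide suggests.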
  8. Rekognition from HTML Client
  9. Data Ingestion • Ingest from camera • Take a snapshot • Query Amazon Rekognition – detect labels • Query Amazon Rekognition Collection – is this person known? [Diagram: HTML client with an Amazon Cognito unauthenticated identity; Amazon Rekognition extracts metadata for the image; the Image Collection validates that the image matches a stored image.] <script type="text/javascript" src="https://s3.amazonaws.com/.../webcam.min.js"></script>
  10. Rekognition APIs – example: Detect labels using ML model. Rekognition stores metadata about the image.
  11. Rekognition APIs – example: Match against image collection using ML models
  12. Rekognition APIs – example: How close a match?
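"How close a match?" is answered by the `Similarity` score that accompanies each entry in `FaceMatches`. The sketch below filters a SearchFacesByImage-style response by a similarity cut-off; the function name, the 90% cut-off, and the sample face IDs are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def strong_matches(
    response: Dict[str, Any], min_similarity: float = 90.0
) -> List[Tuple[str, float]]:
    """Return (face_id, similarity) pairs at or above `min_similarity`,
    ordered from best to worst match."""
    pairs = [
        (match["Face"]["FaceId"], match["Similarity"])
        for match in response.get("FaceMatches", [])
        if match["Similarity"] >= min_similarity
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical response with made-up face IDs and scores.
sample_response = {
    "FaceMatches": [
        {"Similarity": 84.5, "Face": {"FaceId": "face-b"}},
        {"Similarity": 99.1, "Face": {"FaceId": "face-a"}},
        {"Similarity": 92.0, "Face": {"FaceId": "face-c"}},
    ]
}

print(strong_matches(sample_response))  # [('face-a', 99.1), ('face-c', 92.0)]
```

The service can also apply a cut-off itself: SearchFacesByImage accepts a `FaceMatchThreshold` parameter, so client-side filtering like this is mainly useful when different consumers need different thresholds.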
  14. Amazon Rekognition Video Features • Detects objects, activities, and scenes (label name, timestamp, confidence) • Detects and analyzes faces (timestamp, face bounding box, age range, emotion, gender, pose, …) • Searches video for matches in a face collection (timestamp, person and face bounding boxes, matched face IDs, similarity %, …) • Detects and tracks unique people, including through occlusions and shot changes (person and face bounding boxes, person ID) • Recognizes celebrities throughout video (celebrity name, bounding box, confidence %) • Detects nudity and explicit nudity (timestamp, confidence, label name)
  17. Amazon Rekognition Video Streams • CCTV for building entry • Ensuring workers on a building site are registered and known • Foreign object detection on a runway • Trespassing on railway track • Customer personalization • Looking for celebrities in your street • Working out who’s at the front door
  18. Amazon Rekognition Video: Who’s at the front door? [Diagram label: Producer Application]
  19. CreateKinesisVideoStream: aws kinesisvideo create-stream --stream-name "video-stream-name" --data-retention-in-hours "24"
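The same stream can be created from Python, and the stream can then be wired to a Rekognition stream processor that writes face-search results to a Kinesis Data Stream, which is the data flow this deck describes. The sketch below is one possible arrangement, not the presenter's implementation: the function name, the 85% threshold, and every ARN and name passed in are assumptions, and both clients are injected so the wiring can be checked without AWS access.

```python
from typing import Any


def create_front_door_pipeline(
    kinesisvideo: Any,
    rekognition: Any,
    *,
    stream_name: str,
    data_stream_arn: str,
    collection_id: str,
    role_arn: str,
    processor_name: str,
) -> str:
    """Create a Kinesis video stream and a Rekognition stream processor
    that searches its frames against a face collection.

    `kinesisvideo` and `rekognition` are expected to behave like the
    corresponding boto3 clients. Returns the new video stream's ARN.
    """
    # Equivalent of: aws kinesisvideo create-stream ... --data-retention-in-hours 24
    stream_arn = kinesisvideo.create_stream(
        StreamName=stream_name, DataRetentionInHours=24
    )["StreamARN"]

    # Connect the video stream (input) to the data stream (output).
    rekognition.create_stream_processor(
        Name=processor_name,
        Input={"KinesisVideoStream": {"Arn": stream_arn}},
        Output={"KinesisDataStream": {"Arn": data_stream_arn}},
        Settings={
            # 85.0 is an assumed threshold, not a value from the slides.
            "FaceSearch": {"CollectionId": collection_id, "FaceMatchThreshold": 85.0}
        },
        RoleArn=role_arn,
    )
    rekognition.start_stream_processor(Name=processor_name)
    return stream_arn
```

With real credentials the two clients could come from `boto3.client("kinesisvideo")` and `boto3.client("rekognition")`; the IAM role must allow Rekognition to read the video stream and write to the data stream.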
  20. Requirements for your application code • Producer Application: any video data generating device; streams media data in real time, after buffering it for a few seconds, or as after-the-fact media uploads • Kinesis Video Streams Producer Libraries and SDK: Java, Android, C++ • Consumer Application: Lambda function (Python), triggered by the data arriving on the Kinesis Data Stream • Remember: the data flow is Camera > Kinesis Video Stream > Rekognition Data Processor > Kinesis Data Stream
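The consumer side of this data flow can be sketched as a Python Lambda handler that decodes each Kinesis record and collects the faces matched by the stream processor. The handler below is illustrative only: the function names, the 80% cut-off, and the use of `ExternalImageId` as a person's name are assumptions, and the payload field names follow the face-search output format as the author of this sketch understands it.

```python
import base64
import json
from typing import Any, Dict, List


def matched_names(event: Dict[str, Any], min_similarity: float = 80.0) -> List[str]:
    """Extract the identifiers of matched faces from a Kinesis-triggered
    Lambda event whose records carry stream processor output."""
    names = set()
    for record in event.get("Records", []):
        # Kinesis delivers each record's payload base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        for detected in payload.get("FaceSearchResponse", []):
            for match in detected.get("MatchedFaces", []):
                if match.get("Similarity", 0.0) < min_similarity:
                    continue
                face = match["Face"]
                # Prefer the label given at IndexFaces time, if one was set.
                names.add(face.get("ExternalImageId") or face["FaceId"])
    return sorted(names)


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Lambda entry point: report who was recognised in this batch."""
    names = matched_names(event)
    print(f"Recognised at the front door: {names}")
    return {"matched": names}
```

A real consumer would typically act on the result, for example by sending a notification, rather than only printing it.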
  21. Introduction to Amazon Polly
  22. Amazon Polly • Natural sounding speech: a subjective measure of how close TTS output is to human speech. • Accurate text processing: the ability of the system to interpret common text formats such as abbreviations, numerical sequences, homographs, etc. Examples: Today in Las Vegas, NV it's 90°F. / "We live for the music", live from the Madison Square Garden. • Highly intelligible: a measure of how comprehensible speech is. Example: "Peter Piper picked a peck of pickled peppers."
  23. What is Amazon Polly? • A service that converts text into lifelike speech • Offers 50 lifelike voices across 24 languages • Low latency responses enable developers to build real-time systems • Developers can store, replay and distribute generated speech
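Storing generated speech for later replay can be as simple as writing the returned audio stream to a file. The sketch below assumes a client that behaves like a boto3 Polly client; the function name, the MP3 format, and the "Joanna" voice are illustrative choices rather than details from the slides.

```python
from typing import Any


def synthesize_to_file(polly: Any, text: str, path: str, voice_id: str = "Joanna") -> int:
    """Convert `text` to speech and save the audio at `path`.

    `polly` is expected to behave like a boto3 Polly client.
    Returns the number of bytes written.
    """
    response = polly.synthesize_speech(Text=text, OutputFormat="mp3", VoiceId=voice_id)
    # AudioStream is a file-like object holding the encoded audio.
    audio = response["AudioStream"].read()
    with open(path, "wb") as audio_file:
        audio_file.write(audio)
    return len(audio)
```

With real credentials, `synthesize_to_file(boto3.client("polly"), "Hello from Polly", "hello.mp3")` would be a typical call; the saved file can then be replayed or distributed without calling the service again.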
  24. Features and Functionality
  25. Amazon Polly features: SSML. Speech Synthesis Markup Language is a W3C recommendation, an XML-based markup language for speech synthesis applications. Example: <speak>My name is Kuklinskei. It is spelled <prosody rate="x-slow"><say-as interpret-as="characters">Kuklinskei</say-as></prosody></speak>
  26. Speech Synthesis Markup Language (SSML) • SSML is a W3C recommendation, an XML-based markup language for speech synthesis applications • All SSML documents must start with an opening <speak> tag and end with a closing </speak> tag. All other tags are inserted between <speak></speak>.
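Because SSML is XML, text inserted into it needs escaping, and the `<speak>` wrapper rule can be checked with an ordinary XML parser. The helpers below are a minimal sketch using only the Python standard library; all function names are hypothetical, and the check covers well-formedness and the root tag only, not the full set of tags that Polly accepts.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def text(plain: str) -> str:
    """Escape plain text so characters such as & and < are safe inside SSML."""
    return escape(plain)


def spell_out(word: str, rate: str = "x-slow") -> str:
    """Mark a word to be read character by character at the given rate."""
    return (
        f"<prosody rate={quoteattr(rate)}>"
        f'<say-as interpret-as="characters">{escape(word)}</say-as>'
        "</prosody>"
    )


def speak(*fragments: str) -> str:
    """Join already-escaped fragments and wrap them in the <speak> root tag."""
    return "<speak>" + " ".join(fragments) + "</speak>"


def is_well_formed_ssml(ssml: str) -> bool:
    """Check that the document parses as XML and that its root is <speak>."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError:
        return False
    return root.tag == "speak"


ssml = speak(text("My name is Kuklinskei. It is spelled"), spell_out("Kuklinskei"))
print(ssml)
print(is_well_formed_ssml(ssml))  # True
```

When the result is sent to Polly through boto3, `synthesize_speech` needs `TextType="ssml"` so that the markup is interpreted rather than read aloud.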
  27. The <lang> tag: English in Italian. The pronunciation of English is like that of a non-bilingual Italian speaker. Without <lang>: <speak>Mi piace Bruce Springsteen.</speak> With <lang>: <speak>Mi piace <lang xml:lang="en-US">Bruce Springsteen.</lang></speak>
  28. Fun with SSML: "Can you make your voices sound like an auctioneer?" <speak><prosody rate="+60%">I’m at 500 and I want 550<prosody volume="x-loud">550</prosody></prosody> <prosody rate="+60%">bid on 550 I’m at 500 would you go 550 550 for the gentleman in the corner</prosody> <prosody rate="+90%">A big black bug bit a big black bear a big black bug bit a big black bear</prosody> Do we get 600? <prosody rate="+90%">A big black bug bit a big black bear</prosody><prosody rate="+60%">We got 600 for the whole herd</prosody><prosody rate="default" volume="x-loud">Sold <prosody rate="+60%">for 600.</prosody></prosody></speak>
  29. RNIB provides the largest library in the UK for people with sight loss. The Royal National Institute of Blind People (RNIB) creates and distributes accessible information in the form of synthesized content. “Amazon Polly delivers incredibly lifelike voices which captivate and engage our readers.” John Worsfold, Solutions Implementation Manager, RNIB • RNIB delivers the largest library of audiobooks in the UK for nearly 2 million people with sight loss • Naturalness of generated speech is critical to captivate and engage readers • The absence of restrictions on speech redistribution enables RNIB to create and distribute accessible information in the form of synthesized content
  30. Thank You! aws.amazon.com/blogs/machine-learning/
