AnnoTone (CHI 2015)

Ryohei Suzuki, Daisuke Sakamoto and Takeo Igarashi
"AnnoTone: Record-time Audio Watermarking for Context-aware Video Editing"
Presented at CHI 2015, Seoul

Published in: Science

  1. AnnoTone: Record-time Audio Watermarking for Context-aware Video Editing. Ryohei Suzuki, Daisuke Sakamoto, Takeo Igarashi, The University of Tokyo. CHI 2015 @ Seoul, session "What do I hear? Communicating with Sound"
  2. Video recording and sharing have become casual hobbies for many people.
  3. (Figure: camera, computer, software, broadcasting)
  4. Video Editing Is Still Difficult. Why? 1. The cost of learning video authoring tools is high. 2. Context-aware editing requires considerable labor for careful review and trial and error, e.g.: • Adding visual effects • Clipping scenes • Adding captions and overlays • Using additional information (e.g., GPS)
  5. Our Objective: Annotating videos with contextual information during recording to facilitate video editing, in order to 1. automate and speed up video editing, and 2. enhance expression using additional data.
  6. In this talk, we propose 1. a video-annotation technique requiring no special equipment, and 2. a video-editing workflow that exploits contextual information for efficient editing.
  7. Core Ideas • Encoding contextual information as inaudible sound signals • Embedding the encoded annotations directly into the video's audio track during recording • Extracting the embedded information on demand during editing
  8. Annotation Embedding with a Smartphone
  9. Step 1: Hardware Setup • Attach a smartphone to the video camera • Launch the annotation-embedding application (Photos: attaching the smartphone; launching the application)
  10. Step 2: Video Recording • Gather annotations from user input or sensors • Convert them into inaudible audio signals (Figure: user input and sensors → scene annotation → signals)
  11. Editing Workflow with Embedded Annotations
  12. Workflow Overview • Extract embedded annotations from the audio track • Remove the annotation signals after editing
  13. Editing Pipeline: Video editing generally involves a sequence of pipelined processes (Figure: a pipeline of steps such as clipping, color correction, and adding captions & effects).
  14. Editing Pipeline: The annotated audio track can pass through the existing pipeline just like an ordinary audio track.
  15. Annotation Extraction: Annotation data is extracted on demand using our Watermark Extraction API (Figure: the editing pipeline with a watermark extractor that outputs annotation data).
  16. Annotation Removal: After the process, the annotation signals can be removed by applying an audio filter (Figure: the editing pipeline with audio mastering, where an audio filter is applied).
  17. Applications
  18. Record-time Editing. Recording: success/failure information for each take. Editing: automatic extraction of the successful parts (Figure: takes marked Success, Failure, Success ("Good!", "Bad!", "Good!") along a time axis; the successful takes are automatically extracted and combined).
  19. Video Editing with GPS. Recording: GPS positions. Editing: location-aware editing, such as clipping a movie by sketching on a map and automatic map overlays.
  20. Automatic Overlaying. Recording: chess notation of a game. Editing: automatic overlay of board graphics (Figure: notation UI; synthesized video).
  21. Integrating with After Effects: The AnnoTone plugin provides annotation data to AE that can be used to generate effects, so annotations can be exploited within existing practice (Figure: controlling an AE animation with sensor data).
  22. Integrating with After Effects: 1. Analyze the footage to extract annotations. 2. Generate a text layer containing JSON-formatted annotation data for each time frame. 3. Associate video effects and parameters with annotations using the expressions mechanism (Figure: footage; effect control (JavaScript); JSON text layer [{x: 138.0019, y: 38.13840}, {x: 139.0133, y: 38.43405}]…).
  23. Annotation by Audio Watermarking
  24. Human Hearing Characteristics: Humans can barely perceive high-frequency sounds (Sakamoto, Masayuki, et al. "Average thresholds in the 8 to 20 kHz range as a function of age." Scandinavian Audiology 27.3 (1998): 189-192).
  25. Data-hiding as High-frequency Audio Signals: We can hide information in the audio track as high-frequency signals (audio watermarks) that a microphone can record but humans can barely hear (Figure: frequency axis from 20 Hz to 22 kHz, with marks at 18 kHz and 20 kHz, comparing the audible range for humans, the recordable range for microphones, and the high-frequency range).
  26. Data-hiding as High-frequency Audio Signals (Figure: spectrogram of an audio track, with the hidden information in the almost inaudible high-frequency region)
  27. Benefits of Audio Watermarking • Compatible with almost all video cameras • Consistent synchronization between the annotations and the video sequence • Removable by applying a low-pass filter
  28. Watermarking Protocol • Dual-Tone Multi-Frequency (DTMF): each symbol represents 4 bits of information as a combination of two single tones chosen from 7 frequencies (7 frequencies give 21 possible pairs, enough for the 16 values of 4 bits) • Packet representation: variable-length payload, 400 bps gross data rate (Figure: spectrogram of a watermark packet)
  29. Related Work
  30. ContextCam [Patel & Abowd, 2004]: Uses a special camera to record the contexts of home videos and stores the annotations in the frames by image watermarking. It is incompatible with existing video cameras.
  31. Cryptone (Ultra Sound Control) [Hirabayashi & Shimizu, 2012]: Interaction between a loudspeaker and smartphones, using high-frequency tones to convey information (Figure: bit strings 01001, 11010). AnnoTone uses a similar audio data-hiding method to support video editing.
  32. Performance Evaluation
  33. Data Rate vs. Reliability: A correct detection rate of about 100% was achieved at an annotation data rate of 400 bps (Figure: correct detection rate (%) vs. gross bitrate, 364 to 667 bps, with silent, public, rock, and electronic background audio).
  34. Travel Distance: The watermark signal can travel up to 20 cm through the air from a smartphone speaker (Figure: correct detection rate (%) vs. distance between speaker and microphone, 0 to 30 cm, with silent, public, rock, and electronic background audio).
  35. Durability against Conversion: Watermarks are preserved after conversion into Ogg Vorbis, AC-3, and AAC at sufficient bitrates (Figure: correct detection rate (%) vs. bitrate, 128 to 320 kbps, for MP3, Ogg Vorbis, AC-3, and AAC).
  36. Transparency for the Human Ear: We measured how noticeable the watermarks are to humans; 6 participants clicked a button whenever they noticed noise (Figure: noticed-watermark rate (%) for silent, public, rock, and electronic audio, before and after erasure).
  37. Limitations • Annotation-embedding applications are developed one-off for each use case • Audio quality loss from watermark removal • Limited annotation data rate
  38. Future Work
  39. Embedding from Public Speakers • Synchronizing and integrating a large number of videos, e.g., to create multi-view videos • Entertainment use at amusement parks, etc. (Photo credits: "Sleeping Beauty Castle at Disneyland" by Lyght, licensed under CC BY-SA 3.0; "Picture of Stadium" by Jazza5, licensed under CC BY-SA 3.0)
  40. Conclusion
  41. We proposed a video-annotation technique using audio watermarking, and a video-editing workflow exploiting the annotations. Benefit: AnnoTone can facilitate and enhance the non-professional video-editing process without special equipment.
  43. Compared with Smartphone Recording: Some smartphone camera apps can record annotations in a metadata format (e.g., Adobe XMP), and such apps are a sensible choice when recording with a smartphone. What is AnnoTone's advantage? • Dedicated video cameras are still superior to smartphone cameras in resolution, definition, lens quality, etc. • There is no need to deal with external metadata, because the annotations are embedded directly as sound.
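
Slide 28 describes the watermark as DTMF-style symbols: each symbol carries 4 bits as two simultaneous tones chosen from 7 frequencies. The Python sketch below renders a payload that way. The slides give neither the tone frequencies nor the symbol timing, so the 48 kHz sample rate, the seven tones (18.0 to 21.0 kHz in 500 Hz steps), the 10 ms symbol length, the amplitude, and the use of the first 16 of the 21 possible tone pairs are all assumptions; the packet header that delimits the variable-length payload is left out.

```python
import math
from itertools import combinations

# Hypothetical parameters: the slides do not specify any of these values.
SAMPLE_RATE = 48_000                            # samples per second
TONES = [18_000 + 500 * i for i in range(7)]    # 7 tone frequencies in Hz (18.0-21.0 kHz)
PAIRS = list(combinations(range(7), 2))[:16]    # 16 of the C(7, 2) = 21 possible tone pairs
SYMBOL_SECONDS = 0.010                          # 10 ms per 4-bit symbol
AMPLITUDE = 0.05                                # per-tone amplitude (full scale = 1.0)


def to_nibbles(payload: bytes) -> list[int]:
    """Split each byte into two 4-bit symbols, high nibble first."""
    nibbles = []
    for byte in payload:
        nibbles.extend((byte >> 4, byte & 0x0F))
    return nibbles


def encode(payload: bytes) -> list[float]:
    """Render the payload as back-to-back two-tone symbols (no packet framing)."""
    samples_per_symbol = round(SAMPLE_RATE * SYMBOL_SECONDS)
    signal = []
    for nibble in to_nibbles(payload):
        i, j = PAIRS[nibble]                    # tone pair assigned to this 4-bit value
        f1, f2 = TONES[i], TONES[j]
        for n in range(samples_per_symbol):
            t = n / SAMPLE_RATE
            signal.append(AMPLITUDE * (math.sin(2 * math.pi * f1 * t)
                                       + math.sin(2 * math.pi * f2 * t)))
    return signal


if __name__ == "__main__":
    wave = encode(b"Hi")                        # 2 bytes -> 4 symbols
    print(len(wave), "samples")                 # 4 symbols x 480 samples = 1920
```

A practical embedder would likely also taper the edges of each symbol to avoid audible clicks; this sketch switches tones abruptly for brevity.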
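
If each DTMF symbol carries 4 bits (slide 28) and symbols follow one another without gaps, which is an assumption since the slides do not describe symbol timing, then the gross bitrates tested on slide 33 correspond to whole-millisecond symbol durations $T_s$:

```latex
\begin{aligned}
R &= \frac{b}{T_s}, \qquad b = 4\ \text{bits per symbol} \\
T_s = 10\ \text{ms} &\;\Rightarrow\; R = \frac{4\ \text{bits}}{0.010\ \text{s}} = 400\ \text{bps} \\
T_s \in \{6, 7, 8, 9, 10, 11\}\ \text{ms} &\;\Rightarrow\; R \approx \{667, 571, 500, 444, 400, 364\}\ \text{bps}
\end{aligned}
```

Under this reading, shorter symbols raise the data rate but leave less signal per symbol for detection, which fits the trade-off the slide plots.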
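
The Watermark Extraction API mentioned on slide 15 is not documented in the slides. As a hedged sketch of what on-demand extraction could involve, the code below measures the power of each of 7 candidate tones in a symbol-length window with the Goertzel algorithm, takes the two strongest, and maps that pair back to a 4-bit value. The codebook is hypothetical (48 kHz sampling, 10 ms symbols, seven tones from 18.0 to 21.0 kHz in 500 Hz steps, the first 16 of the 21 tone pairs), symbol boundaries are assumed to be known, and packet synchronization and error checking are omitted.

```python
import math
from itertools import combinations
from typing import Optional

# Hypothetical codebook; it must match whatever the embedding side uses.
SAMPLE_RATE = 48_000
TONES = [18_000 + 500 * i for i in range(7)]        # Hz
PAIRS = list(combinations(range(7), 2))[:16]
PAIR_TO_NIBBLE = {pair: value for value, pair in enumerate(PAIRS)}
SAMPLES_PER_SYMBOL = 480                            # 10 ms at 48 kHz


def goertzel_power(block: list[float], freq: float) -> float:
    """Power of the `freq` component of `block`, via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / SAMPLE_RATE)
    s_prev = s_prev2 = 0.0
    for x in block:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def decode_symbol(block: list[float]) -> Optional[int]:
    """Return the 4-bit value whose tone pair is strongest, or None if unknown."""
    powers = [goertzel_power(block, f) for f in TONES]
    strongest = sorted(range(len(TONES)), key=lambda k: powers[k], reverse=True)[:2]
    return PAIR_TO_NIBBLE.get(tuple(sorted(strongest)))


def decode(signal: list[float]) -> bytes:
    """Decode back-to-back, already aligned symbols into bytes."""
    nibbles = []
    for start in range(0, len(signal) - SAMPLES_PER_SYMBOL + 1, SAMPLES_PER_SYMBOL):
        value = decode_symbol(signal[start:start + SAMPLES_PER_SYMBOL])
        if value is None:
            raise ValueError(f"unrecognized tone pair at sample {start}")
        nibbles.append(value)
    # Pair nibbles high-first; a trailing odd nibble is ignored.
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
```

With 480-sample windows at 48 kHz the analysis bins are 100 Hz apart, so tones spaced 500 Hz apart fall on exact bins and do not leak into each other's measurements.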
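
Slides 16 and 27 say the annotation signals can be removed with an audio filter, specifically a low-pass filter, but the filter itself is not specified. The sketch below uses a generic windowed-sinc FIR design with a Hamming window; the 16 kHz cutoff and 101 taps are assumptions chosen so that a watermark band starting around 18 kHz lands in the stopband at 48 kHz sampling. `tone_amplitude` is a small helper for checking the result.

```python
import math


def lowpass_taps(cutoff_hz: float, sample_rate: float, num_taps: int = 101) -> list[float]:
    """Windowed-sinc low-pass FIR design (Hamming window), unity gain at DC."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd number of taps")
    fc = cutoff_hz / sample_rate                  # cutoff in cycles per sample
    mid = num_taps // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def apply_fir(signal: list[float], taps: list[float]) -> list[float]:
    """Zero-padded convolution, shifted so the output lines up with the input."""
    mid = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, h in enumerate(taps):
            idx = i + mid - j
            if 0 <= idx < len(signal):
                acc += h * signal[idx]
        out.append(acc)
    return out


def tone_amplitude(signal: list[float], freq: float, sample_rate: float) -> float:
    """Amplitude of the `freq` sinusoid (exact when the window holds whole cycles)."""
    re = sum(x * math.cos(2 * math.pi * freq * n / sample_rate) for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * n / sample_rate) for n, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)
```

Any program audio above the cutoff is removed along with the watermark, which is the audio quality loss listed among the limitations on slide 37.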
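
Slide 18's record-time editing marks each take as a success or failure while recording, then automatically extracts and combines the successful parts. Assuming the extracted annotations have already been turned into hypothetical `Take` records with start and end times, the selection and the resulting edit list could be sketched as:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Take:
    """One recorded attempt (a hypothetical annotation record)."""
    start: float      # seconds from the beginning of the footage
    end: float
    success: bool     # marked by the user while recording


def successful_segments(takes: list[Take]) -> list[tuple[float, float]]:
    """Keep successful takes in time order, merging ones that touch or overlap."""
    segments: list[tuple[float, float]] = []
    for take in sorted(takes, key=lambda t: t.start):
        if not take.success:
            continue
        if segments and take.start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], max(segments[-1][1], take.end))
        else:
            segments.append((take.start, take.end))
    return segments


def edit_list(segments: list[tuple[float, float]]) -> list[tuple[float, float, float]]:
    """Map each kept (start, end) source segment to its start on the output timeline."""
    out, cursor = [], 0.0
    for start, end in segments:
        out.append((start, end, cursor))
        cursor += end - start
    return out
```

Merging adjacent successes keeps the edit list short, so a continuous good stretch of footage is not cut needlessly.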
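
For slide 19's clipping by sketching on a map, one plausible approach, assumed here rather than taken from the slides, is to treat the sketch as a polygon and keep the time ranges whose GPS samples fall inside it. The sketch uses even-odd ray casting and treats (longitude, latitude) as planar coordinates, which is a reasonable approximation only for small regions; `inside` and `clip_by_region` are hypothetical names.

```python
Point = tuple[float, float]   # (longitude, latitude), treated as planar x, y


def inside(point: Point, polygon: list[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray going in the +x direction."""
    x, y = point
    result = False
    for k in range(len(polygon)):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):                    # edge straddles the ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                result = not result
    return result


def clip_by_region(samples: list[tuple[float, float, float]],
                   polygon: list[Point]) -> list[tuple[float, float]]:
    """Given (t, lon, lat) samples sorted by t, return (first_t, last_t) for each
    run of consecutive samples inside the polygon."""
    intervals = []
    run_start = last_inside = None
    for t, lon, lat in samples:
        if inside((lon, lat), polygon):
            if run_start is None:
                run_start = t
            last_inside = t
        elif run_start is not None:
            intervals.append((run_start, last_inside))
            run_start = None
    if run_start is not None:
        intervals.append((run_start, last_inside))
    return intervals
```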
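
Slide 22 places annotation data in an After Effects text layer as JSON, shown as an array of {x, y} objects. The plugin's actual format is not given, so the following sketch assumes one entry per video frame, filled by sample-and-hold from timestamped annotations; `per_frame` and `json_text_layer` are hypothetical names, and the AE expression side is not shown.

```python
import bisect
import json


def per_frame(samples: list[tuple[float, dict]], fps: float, duration_s: float) -> list:
    """Sample-and-hold: for each frame, the latest annotation at or before its time
    (None before the first annotation). `samples` must be sorted by time."""
    times = [t for t, _ in samples]
    frames = []
    for f in range(round(duration_s * fps)):
        i = bisect.bisect_right(times, f / fps) - 1
        frames.append(samples[i][1] if i >= 0 else None)
    return frames


def json_text_layer(samples: list[tuple[float, dict]], fps: float, duration_s: float) -> str:
    """Serialize the per-frame list as compact JSON text for a text layer."""
    return json.dumps(per_frame(samples, fps, duration_s), separators=(",", ":"))


if __name__ == "__main__":
    gps = [(0.0, {"x": 138.0019, "y": 38.1384}), (0.5, {"x": 139.0133, "y": 38.43405})]
    print(json_text_layer(gps, fps=4, duration_s=1.0))
```

Sample-and-hold keeps every frame populated even when sensors report less often than the frame rate; interpolating between samples would be an alternative for smooth motion.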
