AnnoTone:
Record-time Audio Watermarking
for Context-aware Video Editing
RYOHEI SUZUKI
DAISUKE SAKAMOTO
TAKEO IGARASHI
THE UNIVERSITY OF TOKYO
CHI 2015 @ Seoul
Session: What do I hear? Communicating with Sound
1
Video recording and sharing have become
casual hobbies for everyone.
2
Camera Computer
Software Broadcasting
3
Video Editing is Still Difficult
4
Why?
1. Cost of learning video authoring tools is high
2. Context-aware editing requires much labor
for careful review and trial-and-error
• Adding visual effects
• Clipping scenes
• Adding captions and overlays
• Using additional information (e.g., GPS)
Our Objective
Annotating videos with contextual information
during recording to facilitate video editing
5
1. Automate & speed-up video editing activity
2. Enhance expressions using additional data
In this talk, we propose
1. A video-annotation technique requiring
no special equipment
2. A video-editing workflow that exploits
contextual information for efficient editing.
6
Core Ideas
• Encoding contextual information as
inaudible sound signals
• Embedding encoded annotations directly
into the audio track of video during recording
• Extracting the embedded information
on demand during the editing process
7
Annotation Embedding
with Smartphone
8
1. Hardware Setup
• Attach smartphone to video camera
• Launch annotation-embedding application
Attaching Launching application
9
2. Video Recording
• Gathering annotations from user input or sensors
• Converting them into inaudible audio signals
User Input Sensors
Scene
Annotation Signals
10
Editing Workflow with
Embedded Annotations
11
Workflow Overview
12
• Extract embedded annotation from audio track
• Remove annotation signals after editing
Editing Pipeline
Video editing generally involves
a pipeline of sequential processes.
[Diagram: pipeline stages: Adding Captions & Effects, Color Correction, Clipping, …]
13
Editing Pipeline
An annotated audio track can pass through
the existing pipeline like an ordinary one.
[Diagram: pipeline stages: Adding Captions & Effects, Color Correction, Clipping, …]
14
Annotation Extraction
[Diagram: pipeline stages: Adding Captions & Effects, Color Correction, Clipping, …]
15
Annotation data is extracted on demand
using our Watermark Extraction API
Watermark Extractor
Annotation Data
Annotation Removal
[Diagram: pipeline stages: Adding Captions & Effects, Audio Mastering, Clipping, …]
16
After the process, annotation signals
can be removed by applying an audio filter.
Audio Filter
Applications
17
Record-time Editing
Recording: success/failure information
Editing: automatic extraction of successful parts
[Figure: recording timeline with segments marked Success, Failure, Success ("Good!", "Bad!", "Good!"); the successful parts are automatically extracted and combined]
18
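The automatic extraction step could be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: each extracted annotation is assumed to be a (time, label) pair, and each mark is assumed to rate the footage recorded since the previous mark. The names `Mark` and `successful_segments` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical annotation format: (time in seconds, "success" or "failure").
Mark = Tuple[float, str]

def successful_segments(marks: List[Mark], start: float = 0.0) -> List[Tuple[float, float]]:
    """Return (begin, end) intervals of footage that was marked as a success.

    Assumption: each mark rates the footage between the previous mark
    (or the start of the recording) and the mark itself.
    """
    segments = []
    previous = start
    for time, label in sorted(marks):
        if label == "success":
            segments.append((previous, time))
        previous = time
    return segments

marks = [(12.0, "success"), (20.5, "failure"), (31.0, "success")]
segments = successful_segments(marks)
print(segments)
```

An editing tool could then concatenate the returned intervals to build the combined clip.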
Video-editing with GPS
19
Recording: GPS positions
Editing: location-aware editing
Clipping movie by
sketching on a map
Automatic map overlay
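Clipping by a region sketched on a map could work roughly like this. The sketch assumes the extracted annotations are timestamped positions and that the user's sketch is a closed polygon. The data layout and the function names `inside` and `clips_in_region` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]        # (x, y), e.g. longitude and latitude
Fix = Tuple[float, float, float]   # hypothetical format: (time in seconds, x, y)

def inside(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: is the point inside the closed polygon?"""
    x, y = point
    hit = False
    for i in range(len(polygon)):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # this edge crosses the horizontal line through the point
            crossing_x = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < crossing_x:
                hit = not hit
    return hit

def clips_in_region(track: List[Fix], region: List[Point]) -> List[Tuple[float, float]]:
    """Return (begin, end) time intervals during which the position is inside the region."""
    clips = []
    begin = last = None
    for time, x, y in track:
        if inside((x, y), region):
            if begin is None:
                begin = time
            last = time
        elif begin is not None:
            clips.append((begin, last))
            begin = None
    if begin is not None:
        clips.append((begin, last))
    return clips

square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
track = [(0.0, -5.0, 5.0), (1.0, 2.0, 5.0), (2.0, 8.0, 5.0), (3.0, 12.0, 5.0), (4.0, 5.0, 5.0)]
clips = clips_in_region(track, square)
print(clips)
```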
Automatic Overlaying
20
Recording: chess notation of a game
Editing: automatic overlaying of board graphics
Notation UI, Synthesized video
Integrating with After Effects
The AnnoTone plugin provides annotation data to After Effects (AE),
which can be used to generate effects
Exploiting annotations with existing practice
21
Controlling AE animation
with sensor data
Integrating with After Effects
1. Analyzing footage to extract annotations
2. Generating a text layer containing JSON-formatted
annotation data at each time frame
3. Associating video effects/parameters with
annotations using the expressions mechanism
22
Footage
Effect control (JavaScript)
JSON text layer
[{x: 138.0019,
y: 38.13840},
{x: 139.0133,
y: 38.43405}]…
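Step 2 could be prototyped outside After Effects as below. This is a minimal sketch that assumes the extracted annotations are (time, x, y) samples and that one JSON string is produced per video frame. The function name `annotations_by_frame` and the 30 fps default are assumptions, and `json.dumps` writes quoted keys, unlike the abbreviated sample above.

```python
import json
from collections import defaultdict

def annotations_by_frame(samples, fps=30.0):
    """Group (time_seconds, x, y) samples by frame index and serialize each group as JSON."""
    frames = defaultdict(list)
    for time, x, y in samples:
        frames[int(time * fps)].append({"x": x, "y": y})
    return {frame: json.dumps(items) for frame, items in sorted(frames.items())}

samples = [(0.00, 138.0019, 38.13840), (0.02, 139.0133, 38.43405), (0.05, 139.5, 38.5)]
layers = annotations_by_frame(samples)
for frame, text in layers.items():
    print(frame, text)
```

Each JSON string could then be placed as the text of the layer at the matching frame, where an expression can parse it.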
Annotation by
Audio Watermarking
23
Human Hearing Characteristics
Humans can hardly perceive high-frequency sounds.
Sakamoto, Masayuki, et al. "Average thresholds in the 8 to 20 kHz range as a function of age."
Scandinavian Audiology 27.3 (1998): 189-192.
24
Data-hiding as High-frequency
Audio Signals
25
We can hide information in the audio track
as high-frequency signals (audio watermarks).
[Figure: frequency axis (Hz) marked at 20, 18k, 20k, and 22k, showing the audible range (human), the recordable range (microphone), and the high-frequency range]
Spectrogram of audio track
High-frequency region
(almost inaudible)
26
Data-hiding as High-frequency
Audio Signals
Hidden
information
Benefit of Audio Watermarking
27
• Compatible with almost all video cameras
• Consistent synchronization between
annotations and video sequence
• Removable by applying low-pass filter
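Removal by low-pass filtering could look like the sketch below. It is not the filter used by AnnoTone: the 17 kHz cutoff, the 44.1 kHz sample rate, the 101-tap length, and the Hamming-windowed sinc design are all assumptions chosen for illustration.

```python
import math

def lowpass_taps(cutoff_hz, sample_rate, num_taps=101):
    """Windowed-sinc low-pass FIR coefficients (Hamming window)."""
    fc = cutoff_hz / sample_rate              # cutoff as a fraction of the sample rate
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]          # normalize to unit gain at 0 Hz

def fir_filter(samples, taps):
    """Convolve samples with taps, keeping only fully overlapped outputs."""
    return [
        sum(t * samples[i - k] for k, t in enumerate(taps))
        for i in range(len(taps) - 1, len(samples))
    ]

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

SAMPLE_RATE = 44100                            # assumed audio sample rate
taps = lowpass_taps(17000, SAMPLE_RATE)        # assumed cutoff below the watermark band
audible = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(2000)]
watermark = [math.sin(2 * math.pi * 19000 * n / SAMPLE_RATE) for n in range(2000)]
audible_gain = rms(fir_filter(audible, taps)) / rms(audible)
watermark_gain = rms(fir_filter(watermark, taps)) / rms(watermark)
print(audible_gain, watermark_gain)
```

The first ratio should stay near 1 and the second should drop close to 0. Any genuine audio content above the cutoff is removed as well.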
Watermarking Protocol
28
• Dual-Tone Multi-Frequency (DTMF)
– Represents 4 bits of information as a combination of
two single tones chosen from 7 frequencies
• Packet representation
– Variable-length payload
– 400 bps gross data rate
Spectrogram of a watermark packet
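The tone-pair idea can be sketched as follows. The specific frequencies, the pair-to-value mapping, the 10 ms symbol length (400 bps divided by 4 bits per symbol, assuming no gaps), and the correlation-based detector are all assumptions made for illustration. The slides do not give these details, and packet framing is omitted.

```python
import math
from itertools import combinations

SAMPLE_RATE = 44100                               # assumed audio sample rate
SYMBOL_RATE = 100                                 # assumed: 400 bps gross / 4 bits per symbol
SAMPLES_PER_SYMBOL = SAMPLE_RATE // SYMBOL_RATE   # 441 samples = 10 ms
FREQS = [18000 + 300 * i for i in range(7)]       # seven hypothetical high-frequency tones (Hz)
PAIRS = list(combinations(FREQS, 2))[:16]         # first 16 of 21 tone pairs encode values 0..15

def encode_nibble(value):
    """Synthesize one symbol: the sum of the two tones assigned to a 4-bit value."""
    f1, f2 = PAIRS[value]
    return [
        0.5 * (math.sin(2 * math.pi * f1 * n / SAMPLE_RATE)
               + math.sin(2 * math.pi * f2 * n / SAMPLE_RATE))
        for n in range(SAMPLES_PER_SYMBOL)
    ]

def tone_power(samples, freq):
    """Energy of one frequency in the symbol, by correlation with sine and cosine."""
    re = sum(s * math.cos(2 * math.pi * freq * n / SAMPLE_RATE) for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * n / SAMPLE_RATE) for n, s in enumerate(samples))
    return re * re + im * im

def decode_nibble(samples):
    """Pick the two strongest tones and look up the 4-bit value they stand for."""
    strongest = sorted(FREQS, key=lambda f: tone_power(samples, f), reverse=True)[:2]
    return PAIRS.index(tuple(sorted(strongest)))

def encode_bytes(data):
    signal = []
    for byte in data:
        signal += encode_nibble(byte >> 4)        # high nibble first
        signal += encode_nibble(byte & 0x0F)
    return signal

def decode_bytes(signal):
    nibbles = [
        decode_nibble(signal[i:i + SAMPLES_PER_SYMBOL])
        for i in range(0, len(signal), SAMPLES_PER_SYMBOL)
    ]
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))

signal = encode_bytes(b"GPS")
decoded = decode_bytes(signal)
print(decoded)
```

This round trip runs on a clean, perfectly aligned signal. A real decoder would also need symbol synchronization and tolerance to noise, which this sketch does not attempt.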
Related Work
29
ContextCam
[Patel & Abowd, 2004]
Uses a special camera to record the context of home videos
Stores annotations in frames by image watermarking
Incompatible with existing video cameras.
30
Cryptone (Ultra Sound Control)
[Hirabayashi & Shimizu, 2012]
Interaction between loudspeaker and smartphones
using high-frequency tones to convey information
AnnoTone uses a similar audio data-hiding method
to support video editing.
31
Performance Evaluation
33
Data-rate vs. Reliability
~100% correct detection rate was achieved
with 400 bps annotation data rate.
[Figure: correct detection rate (%) vs. gross bitrate (bps) at 667, 571, 500, 444, 400, and 364 bps, for silent, public, rock, and electronic conditions]
34
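The tested bitrates match what 4-bit symbols lasting 6 to 11 ms would give. The symbol durations are an inference from the axis values, not something the slides state, and the helper name `gross_bitrate` is hypothetical.

```python
BITS_PER_SYMBOL = 4  # one tone pair carries 4 bits, per the protocol slide

def gross_bitrate(symbol_ms: float) -> int:
    """Gross bitrate in bps for a given symbol duration in milliseconds."""
    return round(BITS_PER_SYMBOL * 1000 / symbol_ms)

rates = [gross_bitrate(ms) for ms in range(6, 12)]  # assumed durations: 6, 7, ..., 11 ms
print(rates)
```

Under this reading, the 400 bps operating point corresponds to 10 ms per symbol.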
Travel Distance
Watermark signals can travel up to 20 cm
through the air from a smartphone speaker.
[Figure: correct detection rate (%) vs. distance between speaker and microphone (cm), from 0 to 30 cm, for silent, public, rock, and electronic conditions]
35
Durability against Conversion
36
Watermarks are preserved after conversion into
Ogg Vorbis, AC-3, and AAC at a sufficient bitrate.
[Figure: correct detection rate (%) vs. bit rate (kbps) at 128, 192, 256, and 320 kbps, for MP3, Ogg Vorbis, AC-3, and AAC]
Transparency for Human Ear
37
Measured how noticeable the watermarks are to human listeners
• Participants clicked a button when they noticed noise (6 participants)
[Figure: noticed watermark rate (%) in silent, public, rock, and electronic conditions, before and after erasure]
Limitations
38
• One-off development of
annotation-embedding applications
• Audio quality loss in watermark removal
• Limited data-rate of annotation
Future Work
39
Embedding from Public Speaker
40
• Synchronization & integration of large number
of videos to create multi-view videos, etc.
• Entertainment use at amusement parks, etc.
“Sleeping Beauty Castle at Disneyland” by Lyght
Licensed under CC BY-SA 3.0
“Picture of Stadium” by Jazza5
Licensed under CC BY-SA 3.0
Conclusion
41
We proposed
42
a video annotation technique using audio watermarking,
and a video-editing workflow exploiting annotations.
Benefit
AnnoTone can facilitate and enhance non-professional
video editing without special equipment.
43
Compared with
Smartphone Recording
Some smartphone camera apps can record
annotations as metadata (e.g., Adobe XMP)
– Using such apps is a sensible choice when
recording with a smartphone
What are AnnoTone’s advantages?
• Dedicated video cameras are still superior to
smartphone cameras
– In resolution, definition, lens quality, etc.
• No need to deal with external metadata
– Because annotations are embedded directly as sound
44

  • 21. Integrating with AfterEffects AnnoTone plugin provides annotation data for AE which can be used for generating effects Exploiting annotations with existing practice 21 Controlling AE animation with sensor data
  • 22. Integrating with AfterEffects 1. Analyzing footage to extract annotations 2. Generating a text layer containing JSON-formatted annotation data at timeframe 3. Associating video effects/parameters with annotations using expressions mechanism 22 Footage Effect control (Javascript) JSON text layer [{x: 138.0019, y: 38.13840}, {x: 139.0133, y: 38.43405}]…
  • 24. Human’s Hearing Characteristics Human cannot perceive high-frequency sounds. Sakamoto, Masayuki, et al. "Average thresholds in the 8 to 20 kHz range as a function of age.” Scandinavian audiology 27.3 (1998): 189-192. 24
  • 25. Data-hiding as High-frequency Audio Signals 25 Frequency(Hz) 20 20k 22k 18k High-frequency Range Recordable Range Audible Range We can hide information in the audio track as high-frequency signals (audio watermarks). Microphone Human
  • 26. Spectrogram of audio track High-frequency region (almost inaudible) 26 Data-hiding as High-frequency Audio Signals Hidden information
  • 27. Benefit of Audio Watermarking 27 • Compatible with almost all video cameras • Consistent synchronization between annotations and video sequence • Removable by applying low-pass filter
  • 28. Watermarking Protocol 28 • Dual-Tone Multi-Frequency (DTMF) – Representing 4-bits information by combination of two single tones from 7 frequencies • Packet representation – Variable-length payload – 400 bps gross data rate Spectrogram of a watermark packet
  • 30. ContextCam [Patel & Abowd, 2004] Incompatible with existing video cameras. Using special camera to record contexts of home videos Storing annotations in frames by image watermarking 30
  • 31. Cryptone (Ultra Sound Control) [Hirabayashi & Shimizu, 2012] AnnoTone uses similar audio data-hiding method for video editing support. 01001 11010 Interaction between loudspeaker and smartphones using high-frequency tones to convey information 31
  • 33. Data-rate vs. Reliability: ~100% correct detection rate was achieved with 400 bps annotation data rate. [Chart: correct detection rate (%) vs. gross bitrate (364 to 667 bps); series: silent, public, rock, electronic]
  • 34. Travel Distance: Watermark signal can travel up to 20 cm through air from a smartphone speaker. [Chart: correct detection rate (%) vs. distance between speaker and microphone (0 to 30 cm); series: silent, public, rock, electronic]
  • 35. Durability against Conversion: Watermarks are preserved after conversion into Ogg Vorbis, AC-3 and AAC with enough bitrate. [Chart: correct detection rate (%) vs. bit rate (128 to 320 kbps); series: MP3, Ogg Vorbis, AC-3, AAC]
  • 36. Transparency for Human Ear: Measured noticeability of watermarks for human • Click a button after notice of noise (6 participants) [Chart: noticed watermark rate (%) before and after erasure; conditions: silent, public, rock, electronic]
  • 37. Limitations 38 • One-off development of annotation-embedding applications • Audio quality loss in watermark removal • Limited data-rate of annotation
  • 39. Embedding from Public Speaker 40 • Synchronization & integration of large number of videos to create multi-view videos, etc. • Entertainment use at amusement parks, etc. “Sleeping Beauty Castle at Disneyland” by Lyght Licensed under CC BY-SA 3.0 “Picture of Stadium” by Jazza5 Licensed under CC BY-SA 3.0
  • 41. We proposed 42 a video annotation technique using audio watermarking, and a video-editing workflow exploiting annotations. Benefit AnnoTone can facilitate and enhance non-professional video editing process without special equipment.
  • 42. 43
  • 43. Compared with Smartphone Recording Some smartphone camera apps can record annotation as metadata format (e.g., Adobe XMP) – Of course, using such apps is clever for smartphone recording occasions What’s AnnoTone’s superiority? • Dedicated video cameras are still superior to smartphone camera – In resolution, definition, lens quality, etc. • No need of dealing with external metadata – Because annotations are directly embedded as sound 44
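
As a rough illustration of the watermarking protocol on slide 28, the following Python sketch maps 4-bit values to pairs of tones drawn from 7 frequencies. The concrete frequencies, the use of unordered tone pairs (7 tones give 21 pairs, of which 16 are used), the 10 ms symbol length, and all function names are assumptions made for this sketch; the slides do not specify them, and the packet framing is not modeled. With 4 bits per symbol, a 10 ms symbol gives 400 bps gross, and symbol lengths of 6 to 11 ms would give the 667 to 364 bps values shown on the data-rate chart axis.

```python
import math
from itertools import combinations

# Hypothetical carrier frequencies in Hz. The slides only say "7 frequencies"
# in the high-frequency band, so these values are illustrative.
TONES_HZ = [18000, 18300, 18600, 18900, 19200, 19500, 19800]

# Assumption: one symbol is an unordered pair of distinct tones.
# 7 tones give 21 pairs; the first 16 are enough to carry 4 bits.
PAIRS = list(combinations(range(len(TONES_HZ)), 2))[:16]


def encode_nibbles(data):
    """Map each byte to two tone-pair symbols (high nibble first)."""
    symbols = []
    for byte in data:
        symbols.append(PAIRS[byte >> 4])
        symbols.append(PAIRS[byte & 0x0F])
    return symbols


def decode_nibbles(symbols):
    """Inverse of encode_nibbles: rebuild bytes from tone-pair symbols."""
    out = bytearray()
    for high, low in zip(symbols[0::2], symbols[1::2]):
        out.append((PAIRS.index(high) << 4) | PAIRS.index(low))
    return bytes(out)


def synthesize(symbols, sample_rate=44100, symbol_sec=0.010, amplitude=0.25):
    """Render symbols as float samples: the sum of two sine tones per symbol."""
    n = round(sample_rate * symbol_sec)
    samples = []
    for a, b in symbols:
        fa, fb = TONES_HZ[a], TONES_HZ[b]
        for i in range(n):
            t = i / sample_rate
            samples.append(amplitude * (math.sin(2 * math.pi * fa * t)
                                        + math.sin(2 * math.pi * fb * t)))
    return samples


def gross_bitrate(symbol_sec, bits_per_symbol=4):
    """Gross data rate in bits per second for a given symbol length."""
    return bits_per_symbol / symbol_sec


if __name__ == "__main__":
    symbols = encode_nibbles(b"Hi")
    print(symbols, decode_nibbles(symbols), round(gross_bitrate(0.010)))
```

A real decoder would have to detect the tone pairs in the recorded audio (for example with a per-frequency energy detector) before the table lookup in decode_nibbles; that step is omitted here.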

Editor's Notes

  1. Hello everyone, I am Ryohei Suzuki from the University of Tokyo, Japan. Today, I’m going to talk about our new work, AnnoTone.
  2. Nowadays, recording videos, making video content, and sharing it online have become casual hobbies for everyone, possibly including little children.
  3. We have cheap, high-definition video cameras. The computer hardware and software needed for video editing are available at a reasonable price. We also have YouTube, Vimeo, Dailymotion, and other video-sharing services that work as broadcasting platforms for everyone.
  4. So, it seems that we have everything needed to enjoy creating and sharing video content. But we know that video editing is still difficult in spite of all this. What is the major challenge? Yes, it takes a lot of time to master video authoring tools, and their user interfaces need improvement. But in this project, we focused on another problem: context-aware editing requires … In this talk, we define “context-aware editing” as any type of video editing process that intensively uses the contexts of recording such as
  5. The objective of our project is to annotate videos with contextual information during recording to facilitate video editing for… First, automating and speeding up existing video editing activity. Second, enhancing video expressions by using additional data.
  6. In this talk, we propose …
  7. The core ideas of our work can be summarized as follows. Firstly, we encode … Secondly, we embed encoded annotations … Then, finally we extract the embedded …
  8. First, let me show you how the user can use our system to annotate videos.
  9. First, the user attaches a smartphone directly to a video camera, as in these pictures. Then, the user launches an annotation-embedding application on the phone. (Show.)
  10. During video recording, the smartphone gathers annotation information from either user input or its built-in sensors. The phone converts the annotation into a sequence of inaudible audio signals and transmits it from its loudspeaker. Then, the video camera records the scene with the audio watermarks superimposed.
  11. Next, let me explain the editing workflow with embedded annotations.
  12. The overview of the workflow is as follows. First, the user imports recorded footage into their PC and loads it into video authoring software. Then, the authoring software uses the watermark extractor to obtain the embedded annotation data and uses the data to facilitate the editing process. Finally, the user gets the edited video and an audio track without annotation signals, and combining them completes the video content. Next, let me go into the editing process in detail.
  13. Generally, video editing involves a pipeline of simple editing processes, starting from clipping, then adding captions, animation, color correction, and so on.
  14. Our annotated audio track can pass through the existing pipeline as an ordinary one, because it is merely audio data.
  15. When one of the processes in the pipeline needs annotation data for editing, it can use the provided watermark extraction API to extract annotation data from the annotated track.
  16. When the annotations are no longer needed after that process, their signals can be removed by applying an audio filter. The user then gets a clean audio track and can proceed to the following processes, for example, audio mastering.
  17. Then, let me show you some applications of AnnoTone.
  18. The first example introduces a concept we call “Record-time Editing”. In this application, the camera operator annotates a long piece of footage with “success / failure” information about the actor’s performance, instead of repeatedly stopping and restarting the recording whenever the actor makes a mistake. After recording, the software automatically extracts only the successful parts of the footage and combines them to create a complete video. It would be useful for recording lecture videos, which may involve a lot of mistakes by the actor.
  19. Using the GPS receiver of a smartphone, we can exploit positional information for video editing. While recording, the smartphone periodically embeds the location data of the camera, so the user can get a sequence of locations, or a path, for the footage while editing. Such a sequence can be used to create an overlaid map like the left image. In the right movie, the footage is mapped onto a geographical map as a path, and the user can clip a portion of the footage by sketching the corresponding path on the map. This map-based clipping might be very useful for dealing with a long video that involves movement, such as a touring video.
  20. Annotations can also be used to create various kinds of overlaid content. In this application, the camera operator annotates a video of a chess game with the chess notation, using the notation interface shown on the left. Then, the system automatically generates overlaid graphics of the chess board, as in the right movie.
  21. We also prepared a mechanism to integrate AnnoTone with Adobe AfterEffects. Our plugin provides annotation data to the AfterEffects editing workflow, where it can be used to generate various effects and animations. It enables the user to exploit annotations within established editing practice.
  22. The integration works as follows. First, the AnnoTone software analyzes the footage to extract annotations. The system generates a text layer containing JSON-formatted annotation data at each timeframe. Then, the user can associate video effects, parameters, and animations with annotations using the expression function of AfterEffects, which is a lightweight end-user scripting mechanism.
  23. Next, let me talk about the technical details of AnnoTone’s audio watermarking.
  24. It is known that human hearing sensitivity drops as sound frequency increases, and most people cannot hear high-frequency tones above 17 or 18 kHz.
  25. On the other hand, an ordinary video camera with a microphone can record high-frequency sounds up to 22 kHz. Therefore, we can hide information in the high-frequency range of the audio track as modulated signals.
  26. This is an example spectrogram of the audio track of a video. This is the high-frequency region, which is almost inaudible to humans, and we hide information like this.
  27. But what is the benefit of using audio watermarking for data hiding? First of all, it is compatible with almost all video cameras, because it only requires a microphone for embedding. Second, the synchronization between annotations and video timestamps is preserved throughout the editing process, because the annotations are embedded directly in the video sequence. Additionally, the watermark signals can be easily removed by applying a low-pass filter.
  28. Our watermarking protocol uses Dual-Tone Multi-Frequency (or DTMF) signaling to modulate digital data into audio signals. It represents 4 bits of information per unit signal by a combination of two single tones from 7 frequencies. Our packet representation has a variable-length payload and a 400 bps gross data rate.
  29. Next, let me introduce some work related to AnnoTone.
  30. ContextCam is a special camera that can record contextual information about home videos, such as location and the presence of people. It stores annotations in the frames of the video using an image watermarking technique. Indeed, it realizes context-aware video recording for a specific purpose. However, because it is a special camera, this technique is not compatible with existing equipment.
  31. Cryptone, or the Ultra Sound Control protocol, is an interaction technique between loudspeakers and the smartphones of the audience at music venues. It uses high-frequency modulation to convey simple information. AnnoTone’s audio watermarking technique is very similar to that of Cryptone and can be seen as an extension of it, but its purpose is very different.
  32. Let me show you the results of performance evaluations briefly.
  33. First, we measured the maximum watermarking data rate that achieves sufficient reliability. The result showed that an almost 100% correct detection rate can be achieved at a 400 bps annotation data rate in four acoustic environments: a silent room, a public street, and with rock music and electronic music playing.
  34. Next, we measured how far watermark signals can travel through air from a smartphone speaker. The result showed that they can travel up to about 20 cm, which implies that users have some flexibility in the hardware setup when the shapes of the devices make it difficult to attach a smartphone directly to a camera.
  35. Third, we tested the durability of watermark signals against audio format conversion, a common process in video editing. According to the results, watermarks are preserved after conversion into the Ogg Vorbis, AC-3 and AAC formats if enough bitrate is given. On the other hand, when we used MP3 as the destination format, we could not preserve watermarks even when the bitrate setting was very high.
  36. Finally, we tested the transparency, or imperceptibility, of audio watermark signals to the human ear. We hired 6 participants and gave them the task of clicking a button when they noticed a noise while listening to an annotated audio track. The result showed that watermark signals are not completely transparent, and young participants in particular were able to notice them. But the signals became almost completely transparent after applying a low-pass filter.
  37. We admit that AnnoTone has some limitations. First, it requires one-off development of a smartphone annotation-embedding application for each different type of annotation. Second, some audio quality loss in the process of watermark removal is inevitable, because it simply uses low-pass filtering. Third, the annotation data rate is significantly limited, so if we want to annotate a video with a large amount of data, we should consider another way. For example, we can use AnnoTone’s annotations as anchors for separately recorded body annotation data.
  38. As future work,
  39. We are thinking about transmitting watermark signals from publicly installed speakers to annotate many videos simultaneously. It could be used to synchronize or integrate a large number of videos recorded at the same place, such as a stadium, to create new types of video content, like multi-view videos. A similar technique could also be used for entertainment at amusement parks.
  40. Then, let me conclude the presentation.
  41. We proposed … The benefit of the technique is that … Thank you for listening.
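
The "Record-time Editing" idea from note 18 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Mark type, the successful_segments function, and the convention that each annotation marks the end of a take and judges everything since the previous mark are all hypothetical; the talk does not describe the actual annotation format.

```python
from typing import NamedTuple


class Mark(NamedTuple):
    """Hypothetical annotation: the take ending at time_sec had this outcome."""
    time_sec: float
    outcome: str  # "success" or "failure"


def successful_segments(marks, start_sec=0.0):
    """Return (start, end) spans of takes judged "success".

    Assumes each mark closes the take that began at the previous mark
    (or at start_sec). Back-to-back successful takes are merged.
    """
    segments = []
    take_start = start_sec
    for mark in sorted(marks):  # sort by time in case marks arrive unordered
        if mark.outcome == "success":
            if segments and segments[-1][1] == take_start:
                # Extend the previous span instead of starting a new one.
                segments[-1] = (segments[-1][0], mark.time_sec)
            else:
                segments.append((take_start, mark.time_sec))
        take_start = mark.time_sec
    return segments


if __name__ == "__main__":
    marks = [
        Mark(12.0, "success"),
        Mark(20.5, "failure"),
        Mark(31.0, "success"),
        Mark(40.0, "success"),
    ]
    print(successful_segments(marks))
```

The resulting spans could then be handed to any video tool as a cut list; concatenating them in order gives the "automatic extraction and combining" step shown on slide 18.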