2. Motivation
● 2011: Build an NRCS (Newsroom Computer System) for 35 journalists
○ Peak 45 people on special events (elections), 8 people on weekends
○ Based in Barcelona, offices in Madrid, but contributions from everywhere
○ Cheap, reliable, easy to support, etc
3. Motivation
● 2017: we have available:
○ Cloud CMS
○ Cloud video editing APIs
○ Cloud storage
○ Reliable, powerful, and more affordable internet links
○ Video editor player? Broadcast UX?
5. Introduction - design goals
● Accuracy is our main concern
● Full control & feedback about displayed video / audio frame
● Use common browser technologies: Javascript / HTML5
● Assumptions (just trimming player):
○ BW is NOT the 1st concern (few users, good connections)
○ Maximising image quality is NOT a concern
○ Full mobile device compatibility is NOT a concern
● Market research:
○ Vimond IO (IBC 09/2016)
○ Grabyo (tested 03/2016)
○ Volicon (tested 03/2016)
○ Accurate player
7. VOD Backend process
● Extracts media information (fps, length, audio fs, etc)
● Extracts time code (SMPTE timecode) information
● Detects scene changes and adds them as cue point information
● Extracts the initial A/V delay, to compensate for it later
● Ready to extract other timed metadata (Cue points)
● For video
○ Decodes video and encodes each frame using JPEG (quality as a parameter)
● For audio
○ Decodes audio and encodes each portion as PCM (video frame aligned) with sample accuracy
● Generate a JSON manifest with all the information
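As an illustration of what the backend emits, a manifest along these lines would carry the extracted information (field names here are hypothetical, not the actual format):

```javascript
// Hypothetical manifest shape; every field name is illustrative.
const manifest = {
  fps: 30,                        // video frame rate
  audioSampleRate: 44100,         // audio fs (Hz)
  numFrames: 600,                 // e.g. a 20 s clip
  startTimecode: "10:00:00:00",   // SMPTE TC of frame 0
  avInitialDelayMs: 21,           // measured A/V offset, compensated by the backend
  cuePoints: [{ frame: 120, type: "sceneChange" }],
  videoFrames: "frames/%06d.jpg", // one JPEG per video frame
  audioFrames: "audio/%06d.pcm"   // one PCM chunk per video frame
};

// Each audio "frame" must hold exactly fs / fps samples (see the backend constraint):
const samplesPerFrame = manifest.audioSampleRate / manifest.fps; // 44100 / 30 = 1470
```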
9. Backend: step by step
● Transcode video to single frame files: ffmpeg
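A self-contained sketch of this step. The exact ffmpeg flags used in production are not in the slides, so these are assumptions; the first command only synthesizes a 1-second, 30 fps test clip so the example can run standalone:

```shell
# Synthesize a 1 s / 30 fps test clip (input.mp4 is a placeholder name)
ffmpeg -y -f lavfi -i testsrc=duration=1:size=320x240:rate=30 -c:v mpeg4 input.mp4

# Dump every decoded frame as one JPEG.
# -qscale:v maps to the slide's "quality as a parameter" (1 = best, 31 = worst).
mkdir -p frames
ffmpeg -y -i input.mp4 -qscale:v 2 frames/%06d.jpg
```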
10. Backend: step by step
● Decode audio to PCM (wav): ffmpeg
● Create one PCM audio “frame” per video frame file: Our own lib
● Compensate A/V delay at the beginning and end of stream: Our own lib
Current constraint: audio fs must be a multiple of the fps
ex: 44.1 kHz / 30 fps = 1470 audio samples per video frame
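A minimal Node.js sketch of the per-frame PCM split; the in-house lib is not public, so the function name and the 16-bit mono assumption are illustrative. The source WAV could come from e.g. `ffmpeg -i input.mp4 -vn -acodec pcm_s16le audio.wav`:

```javascript
// Split raw 16-bit mono PCM into one chunk per video frame.
// Enforces the current constraint: fs must be an integer multiple of fps.
function splitPcmIntoFrames(pcm, sampleRate, fps) {
  if (sampleRate % fps !== 0) {
    throw new Error("audio fs must be a multiple of fps");
  }
  const samplesPerFrame = sampleRate / fps;  // e.g. 44100 / 30 = 1470
  const bytesPerFrame = samplesPerFrame * 2; // 2 bytes per 16-bit sample
  const frames = [];
  for (let off = 0; off + bytesPerFrame <= pcm.length; off += bytesPerFrame) {
    frames.push(pcm.slice(off, off + bytesPerFrame));
  }
  return frames;
}

// 1 second of silence at 44.1 kHz / 30 fps → 30 audio "frames" of 1470 samples each
const oneSecond = Buffer.alloc(44100 * 2);
const frames = splitPcmIntoFrames(oneSecond, 44100, 30);
```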
13. Frontend process: Pure javascript (NO MSE)
● Fetch the manifest
● Fetch all video frame files in the manifest (&)
○ Download & store them in an Image() array matrixV[0..NFrames]
● Fetch all the audio files in the manifest (&)
○ Store them as a byte object matrixA[0..NFrames] (Uses XMLHttpRequest with arraybuffer)
● Wait for user events (pos, play, rev, +/-1 frame, etc):
○ Position X
■ Show video frame X and metadata
■ Create audio context (if not created), write all frame samples into buffer and play it.
○ Playback
■ For Video: Show video frame X+1 every 1/fps (setInterval function)
■ For Audio: Create audio context (if not created), write the samples for each played frame into a buffer, and play it.
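A browser-side sketch of the steps above (function names are hypothetical; the real player also handles seeks, stop, and scheduling). It shows the PCM-to-AudioBuffer path through the Web Audio API and the setInterval playback loop:

```javascript
// Convert one frame's 16-bit PCM bytes into Web Audio float samples in [-1, 1).
function pcmToFloat32(arrayBuffer) {
  const int16 = new Int16Array(arrayBuffer);
  const out = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) out[i] = int16[i] / 32768;
  return out;
}

// Play the audio "frame" for position X (matrixA[x] is an ArrayBuffer of PCM).
function playFrameAudio(audioCtx, pcmArrayBuffer, sampleRate) {
  const samples = pcmToFloat32(pcmArrayBuffer);
  const buf = audioCtx.createBuffer(1, samples.length, sampleRate); // mono
  buf.getChannelData(0).set(samples);
  const src = audioCtx.createBufferSource();
  src.buffer = buf;
  src.connect(audioCtx.destination);
  src.start();
}

// Playback: advance one frame every 1/fps seconds; onFrame draws the Image()
// from matrixV and queues its audio via playFrameAudio.
function play(startFrame, fps, numFrames, onFrame) {
  let x = startFrame;
  const timer = setInterval(() => {
    if (x >= numFrames) { clearInterval(timer); return; }
    onFrame(x++);
  }, 1000 / fps);
  return timer;
}
```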
14. Now yes, the demo (VOD)!
● Check it out!
○ https://jordicenzano.github.io/frame-accurate-scrubbing/
17. Pros and Cons (as a trimming tool)
● Pros:
○ Accuracy
○ Responsiveness
○ Cloud: accessed from everywhere, easy support & upgrade
○ Runs (almost) everywhere: HTML5 + Javascript
○ Broadcast friendly (uses TC, easy to integrate into broadcast workflows)
● Cons:
○ Requires more BW than regular playback (x3...x5)
○ Probable audio clicks at every anchor point (20s). Not designed for long playback.
○ Current limitations:
■ Audio fs must match the playback device fs (most common = 44.1 kHz)
■ Audio fs must be a multiple of the fps
18. Future work
● Accept any audio sample frequency
● Implement live ingest approach
● Add intelligence to download algorithm:
○ Download all the audio, but for the video just the range that surrounds the cursor
○ Using ABR, download the lowest quality first, and improve quality around the cursor
● Test JPEG 2000 instead of JPEG (½ BW savings?)
● Compensate video speed in long playbacks (avoid long term A/V drift)
● Try to use WebWorkers to download (and/or process) audio
● Implement multiple speeds (super easy)
● Implement multiple qualities (ABR approach)
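As a hypothetical illustration of the cursor-centred download idea above (not part of the current code), frame indices can simply be ordered by distance from the cursor, so the fetch queue fans out from the playhead:

```javascript
// Order frame indices by distance from the cursor position.
// Ties resolve lower-index-first because Array.prototype.sort is stable.
function downloadOrder(numFrames, cursor) {
  return Array.from({ length: numFrames }, (_, i) => i)
    .sort((a, b) => Math.abs(a - cursor) - Math.abs(b - cursor));
}

// downloadOrder(7, 3) → [3, 2, 4, 1, 5, 0, 6]
```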