Generating a time shrunk lecture video by event

1
Generating a time shrunk
lecture video by event
detection
Presented by: Mona Ragheb
Yara Ali
Supervised by: Dr. Aliaa Youssif

2
Agenda
 Introduction
 Generating lecture video using virtual camera work
 Event detection steps
 Evaluation
 Results
 Conclusion
 References

3
Introduction
 E-learning has become a popular method used in higher
education.
 However, video recording
by a cameraman and video
editing takes a long time
and costs a great deal.
 To solve this problem, a
system has been developed
to generate dynamic lecture
video using virtual camerawork from the high resolution
images recorded by a HDV (high-definition video) camcorder

4
How the system works?
 The system generates a lecture video:
1. Using virtual camerawork based on shooting
techniques of broadcast cameramen
2. By cropping from the high resolution image to track
the region of interest (ROI) such as the instructor.
3. Generate time shrunk video using event detection
 Camera motion analysis is used to detect scene
changes.

5
Shooting techniques
 People invariably make the
same sets of mistakes when
they first start shooting video:
1. Trees or telephone poles sticking out of the back
of someone's head
2. Interview subjects who are just darkened blurs
because there was bright light in the background
3. Boring shots of buildings with no action

6
Event Detection
 Two kinds of events were detected:
1. A speech period
2. Chalkboard writing period

7
Kinds of event detection
1. Speech period:
It’s detected by voice activity detection with LPC
cepstrum and classified into speech or non-speech
using Mahalanobis distance

8
LPC
 Linear predictive coding (LPC) is a tool used mostly in
audio signal processing and speech processing for
representing the spectral envelope of a digital signal of
speech in compressed form, using the information of a
linear predictive model.
 one of the most useful methods for encoding good quality
speech at a low bit rate and provides extremely accurate
estimates of speech parameters.
 A spectral envelope is a curve in the frequency-amplitude
plane, derived from a Fourier magnitude spectrum. It
describes one point in time (one window, to be precise).

9
How LPC works?
 LPC analyzes the speech signal by estimating the
formants, removing their effects from the speech signal,
and estimating the intensity and frequency of the
remaining buzz.
 The process of removing the formants is called inverse
filtering, and the remaining signal after the subtraction of
the filtered modeled signal is called the residue.
 use the buzz parameters and the residue to create a
source signal, use the formants to create a filter (which
represents the tube), and run the source through the
filter, resulting in speech.

11
Kinds of event detection (Cont.)
2. Chalkboard writing period
 It’s detected by using a graph cuts technique to
segment a precise region of interests such as an
instructor.
 By deleting content-free period, i.e, period without
the events of speech and writing, and fast-forwarding
writing periods, our method can
generate a time shrunk lecture video
automatically.

12
Generating lecture video using virtual
camera work
 A HDV camcorder is located at the back of the
classroom to videotape images with high resolution
(1,400 × 810 pixels), which contain the whole area
of the chalkboard, so that students can read the
handwritten characters on the chalkboard.
 Problem: it is impossible to display the high
resolution image on the small screen of a general
notebook PC.

13
camera work (Cont.)

14
Solution?
1. The system detects a moving object by
temporal differencing
2. The timing for virtual camerawork is
detected using bilateral filtering and zero
crossing.

Bilateral Filter
 The bilateral filter was introduced by
Tomasi et al. as a non-iterative means of
smoothing images while retaining edge
detail.
 It involves a weighted convolution in which
the weight for each pixel depends not only
on its distance from the center pixel, but
also its relative intensity.
15

16
Bilateral Filter
• (a) and (b) show the potential of bilateral filtering for the
removal of texture. The picture "simplification" illustrated
by figure 2 (b) can be useful for data reduction without loss
of overall shape features in applications such as image
transmission, picture editing and manipulation, image
description for retrieval.

17
camerawork (Cont.)
 If the ROI has a large movement, this period of the video is classified into
panning, and if the ROI has no motion but voice activity, this period is
classified into zooming.
 Panning is used to show motion and speed. It is a technique that requires
practice since it has to be done in one smooth, continuous motion.

18
Event detection steps
1. Voice activity detection
2. Chalkboard writing detection
1. Object detection and segmentation
2. Generation of current chalkboard image
3. Chalkboard writing detection
3. Generating a time shrunk video

19
1- Voice activity detection

1- Voice activity detection (Cont.)
 Whenever you do a finite Fourier transform, you're
implicitly applying it to an infinitely repeating signal.
So, for instance, if the start and end of your finite
sample don't match then that will look just like a
discontinuity in the signal, and show up as lots of
high-frequency nonsense in the Fourier transform,
which you don't really want.
20

1- Voice activity detection (Cont.)
 If your sample happens to be a beautiful sinusoid
but an integer number of periods don't happen to fit
exactly into the finite sample, your FT will show
appreciable energy in all sorts of places nowhere
near the real frequency. You don't want any of that.
 Windowing the data makes sure that the ends
match up while keeping everything reasonably
smooth; this greatly reduces the sort of "spectral
leakage" described in the previous paragraph
21

2-Chalkboard writing detection
1-Object detection and segmentation
 Extracting a precise object region is needed for
detecting periods of writing characters on
chalkboard.
 Temporal differencing ( object detection ) is robust
to lighting change.
 Temporal differencing can not extract all foreground
pixels of moving objects so another technique is
used to support ( Graph cuts technique )

23
2-Chalkboard writing detection
1-Object detection and segmentation

2- Chalkboard writing detection
2-1-Object detection and segmentation (Cont.)
24

Image Segmentation
 Image segmentation is an important problem
in computer vision and medical image
analysis.
 The objective of image segmentation is to
provide a visually meaningful partition of the
image domain. Although it is usually an easy
task for human to separate background and
different objects for a given image.
25

26
2-2- Generation of current chalkboard image

27
2-3- Chalkboard writing detection

28
2-3-Chalkboard writing detection
(Cont.)

29
3- Generating a time shrunk video

30
3- Generating a time shrunk video

31
Evaluation
 We videotaped 3 lectures (each video is 90 min
long) by HDV camcorder.
 In this evaluation, we use “recall” and “precision” for
determining effectiveness of detection result.

Precision Vs Recall
• Precision (also called positive predictive value)
• Recall (also known as sensitivity) both calculate no of
correct returned in in total frames
 In this figure the relevant items are to the left of the straight
line while the retrieved items are within the oval. The red
regions represent errors. On the
left these are the relevant items
not retrieved (false negatives),
while on the right they are the
retrieved items that are not
relevant (false positives).

33
Results
 There are 200 false positive frames in a 90
min video because of the students’ voices.

36
Conclusion
 This paper presents a novel approach for generating a time
shrunk lecture video using event detection.
 Our method detects speech periods by voice activity detection
and chalk board writing periods by a combination of object
detection and segmentation techniques.
 By deleting the content-free periods and fast-forwarding the
chalkboard writing periods, our method can generate a time
shrunk lecture video automatically.
 The resulting generated video is about 20%∼30% shorter than
the original video in time. This is almost the same as the results
of manual editing by a human operator.

37
References
1. http://www.vision.cs.chubu.ac.jp/04/pdf/e-learning08.pdf
2. http://research.cs.tamu.edu/prism/lectures/sp/l9.pdf
3. http://www.princeton.edu/~achaney/tmve/wiki100k/docs/4. http://www.ee.columbia.edu/~dpwe/e4896/lectures
/E4896-L06.pdf

Generating a time shrunk lecture video by event

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (8)

Similar to Generating a time shrunk lecture video by event

Similar to Generating a time shrunk lecture video by event (20)

Recently uploaded

Recently uploaded (20)

Generating a time shrunk lecture video by event

Editor's Notes