Browsing Digital Video

Francis Li
Group for User Interface Research, CS Dept.
University of California, Berkeley
Berkeley, CA 94720-1776
fli@cs.berkeley.edu

Anoop Gupta, Elizabeth Sanocki, Li-wei He, Yong Rui
Microsoft Research
Redmond, WA 98052
{anoop, a-elisan, lhe, yongrui}@microsoft.com

ABSTRACT
Video in digital format coupled with digital/programmable playback devices presents opportunities for significantly enhancing the user’s viewing experience. For example, time compression can shorten the viewing length of a video and shot boundary frames can provide a visual index into the content. Such features have primarily been evaluated in isolation with a narrow set of video content types. We investigated as well as implemented the design of a software video browsing application that combines many such features. In addition, we evaluated its use in watching six different video content types and present the resulting data for analysis and discussion. The participants in the evaluation found the browser to be useful and effective for watching the different types of video in a limited amount of time. Also, the results show that both the experience of using the browser and value of each feature varies depending on the content type.

Keywords
Digital video; Video browsing; Video indexing; Time compression; Pause removal; Next-generation video playback interfaces.

INTRODUCTION
One of the primary mediums for content creation and distribution is video. However, the way we watch video has not changed significantly since the invention of the analog video-cassette recorder (VCR) in the 1970-80s. The VCR makes it possible to watch a video with the additional ability to 1) pause the video and 2) fast-forward or rewind the video for skipping or re-watching particular segments. Seeking to a random location is possible, but has a large delay associated with it due to the use of tape storage.

Today, Internet video streaming and set-top devices like ReplayTV [18] and TiVo [21] are technologies that are defining a platform for more interactive video playback. Unlike traditional VCRs, ReplayTV and TiVo devices store video in digital form (MPEG-2) on large hard disks. With digital video stored on hard disks and/or as Internet-based streaming media, instant random access into the content is possible. This allows indices into the content such as the chapter lists of digital versatile disc (DVD) videos [7]. In addition, as computing costs continue to drop, processing techniques can be utilized to automatically generate such indices or shorten the viewing length of a video without losing content. Such features can potentially allow a viewer to save significant amounts of time watching a video as well as more effectively filter content during playback.

In exploring the ability to browse digital video, we considered the following questions:
  • What are potential high-value features that we can provide for browsing digital video?
  • Will users derive significant benefits from their use and availability? How will the benefits vary with the task and type of content being watched?
  • How does the usage affect the enjoyment and/or other factors of the viewing experience?
  • What should the interface be for an application or device that provides these advanced features?

This paper attempts to answer the questions raised above. We designed and implemented a prototype software video browsing application that provides a wide array of features enabled by digital video technologies. In addition to traditional VCR controls, the prototype provides rich indexes for navigation (e.g., table of contents and video shot boundaries), speeded-up playback features (e.g., time compression and pause removal), the ability to make personal annotations to the video, and other advanced browsing controls. Many of these features have been studied previously, but primarily in isolation within a narrow set of video content types. We evaluated the combined use of these features using our prototype across six different video content types and tasks: classroom lectures, conference presentations, sports, television dramas, news, and travelogues. The results of this study are presented in this paper for analysis and discussion within the context of the above questions.

In the next section, we discuss related work in browsing digital media. Then, in the following section, we describe the design of our prototype software video browsing application. This is followed by a description of the general experimental method used in the evaluation. We then detail the six video content types and present the results of the study. Finally, we present our conclusions.
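The claim that speed-up features can shorten the viewing length of a video can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the prototype: the function names `viewing_time` and `effective_speedup`, their parameters, and the simplifying assumption that pause removal deletes a known number of seconds before time compression scales whatever remains are all hypothetical.

```python
def viewing_time(duration_s: float, speed: float = 1.0, pauses_s: float = 0.0) -> float:
    """Estimate how long a video takes to watch, in seconds.

    duration_s: original length of the video.
    speed:      time-compression factor (1.0 = normal speed, 1.5 = 150%).
    pauses_s:   total length of the pauses removed (hypothetical input;
                a real system would have to detect them in the audio).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not 0 <= pauses_s <= duration_s:
        raise ValueError("pauses_s must lie between 0 and duration_s")
    # Assumption: pauses are cut first, then the remainder is played faster.
    return (duration_s - pauses_s) / speed


def effective_speedup(duration_s: float, speed: float = 1.0, pauses_s: float = 0.0) -> float:
    """Ratio of the original length to the estimated viewing time."""
    return duration_s / viewing_time(duration_s, speed, pauses_s)


if __name__ == "__main__":
    one_hour = 3600.0
    # Time compression alone at 150% playback speed.
    print(viewing_time(one_hour, speed=1.5) / 60, "minutes")
    # Same speed, with an assumed six minutes of pauses removed as well.
    print(viewing_time(one_hour, speed=1.5, pauses_s=360.0) / 60, "minutes")
```

Under these assumptions a one-hour video played at 150% takes 40 minutes, and removing six minutes of pauses as well brings it to 36 minutes, which is why the two features are worth considering in combination.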
RELATED WORK
Previous research in browsing digital media has often focused on either audio or video, but not both. The SpeechSkimmer provided an interface for selecting time-compressed and pause-removed audio playback as well as facilitating jumping back and forward between pre-defined segments of the recording [3,4]. The Audio Notebook [20] uses pen strokes to index audio as it is recorded and allows time-compressed playback.

For video, the Hierarchical Video Magnifier [14] was designed to provide users with a context of the contents of a video by displaying video frames near the current position. Arman, et al. [2] improved the frame selection methods by detecting shot boundaries, which is useful in editing systems [12]. The Classroom-2000 project at Georgia Tech [5] investigated richly indexed videos of lectures, including indexing based on strokes drawn on a blackboard. None of these systems explores the wide range of browsing techniques and/or user scenarios explored here.

Christel, et al., describe an evaluation of techniques for shortening the viewing time of a video based on both audio and video analysis [6]. Such techniques, used in systems like CueVideo [16], condense the content into a shortened video summary that is intended to be watched in its entirety. The user does not control what is deleted to create the shortened summary and cannot browse the resulting video, whereas browsing is the focus of this study.

The Informedia [8] project at CMU has performed substantial research in indexing and searching video in the context of information retrieval and digital library systems. Companies like Virage [21] and MediaSite [11] are providing these services for finding video on the Internet. Others have used domain knowledge to improve such services for specific video content types like news [10]. Such work focuses on query-based searching of collections of video content rather than on browsing an individual video, which is the focus of this study.

The computer software industry has quickly embraced the Internet as a platform for digital video. However, the main focus of industry development has been the creation and distribution of content, not viewing or browsing. As a result, the leading software playback applications such as the Real Networks RealPlayer [17], Apple QuickTime Player [1], and Microsoft Windows Media Player [13] offer relatively few controls for browsing. In addition to the controls found on a VCR, these applications add a seek bar allowing random access via a “thumb” and a table of contents index.

The consumer electronics industry has begun to incorporate more advanced browsing features in the next generation of hardware video playback devices. DVD Video players support random access into the content using a table of contents index. ReplayTV and TiVo set-top boxes offer an index to the shows recorded. In addition, they provide the ability to jump forward by 30 or 60 seconds, primarily intended for skipping commercials, and back 8 – 10 seconds for “instant replays”. However, none of these devices provides features like time compression or shot boundary frames. The user interface design is also quite different as input must be performed using a remote control device. Finally, no public data is available on how the provided controls are actually being used.

PROTOTYPE FEATURES AND FUNCTIONALITY
Our study used two video browsers: “Basic” and “Enhanced”. The enhanced browser was developed using a modified version of the Microsoft Windows Media Player. The basic browser leveraged the same software, but displayed only a subset of the functionality.

Basic browser controls: The basic controls provide the features typically found on current software video playback applications. They include Play, Pause, Fast-forward, Seek, Skip-to-beginning of video, and Skip-to-end of video. No audio was played during fast-forward, as is common with current media players, and seek was accomplished by dragging the seek thumb on the timeline in the interface. Due to limitations of the Windows Media Player, a traditional rewind feature could not be provided.

Enhanced browser controls: Figure 1 shows the user interface for the enhanced browser. The following additional controls were provided:
  • Speed-up controls: Time compression (TC), Pause removal (PR)
  • Textual indices: Table of contents (TOC), Notes
  • Visual indices: Shot boundary (SB) frames, Timeline markers
  • Jump controls: Jump-back, Jump-next

The speed-up controls allow the user to shorten the viewing time of a video. Time compression (TC) uses signal processing techniques to increase the playback speed while preserving the pitch of the audio. Pause removal (PR) detects the pauses in continuous speech and removes both the audio and video segments associated with them.

The textual indices provide the user with a means to browse the contents of a video in the same manner that they might browse a text document. The user can seek to the location in the video that is associated with a particular entry. The table of contents (TOC) is used to provide a pre-generated list of entries that cannot be modified. The notes feature allows the user to create their own entries as well as add longer text comments to each entry. When the user creates a note, the video is paused and the title and comment entered by the user are anchored to the current position of the video. We expected that users might use the notes feature to bookmark significant parts of the video for later reference as well as to record their thoughts regarding the content of the video at that location.

The visual indices provided are the shot boundary frames and the timeline markers. The numbered shot boundary frames allow the user to visually identify and then seek to a particular shot by clicking on it. As the video plays, the frame corresponding to the currently playing shot is
highlighted. The timeline markers show the location of the TOC and notes entries in the video with color-coded bars. They can be used to judge the locations of entries relative to the current position of the video (shown by the thumb).

Figure 1. Enhanced Browser User Interface. The callouts in the figure read:
  • Basic controls: Play, pause, fast-forward, timeline seek bar with thumb, skip-to-beginning, skip-to-end. No rewind feature was available.
  • Elapsed time indicator.
  • Table of contents (TOC): Opens separate dialog with textual listing of significant points in the video based on content of video. Contains "seek" feature allowing user to seek to points in video. Index entries also indicated on timeline seek bar.
  • Jump back/next controls: Seek video backward or forward by fixed increments or to the previous/next entry in an index. Jump intervals are selected from a drop-down list activated by clicking the down-pointing arrows. List varies with availability of Notes, Shot Boundaries, and TOC.
  • Personal notes button: Opens separate dialog with user-generated personal notes index. Contains "seek" feature allowing user to seek to the points in video. Notes index entries also indicated on timeline seek bar.
  • Markers: Indicate placement of entries for TOC, personal notes and shot boundary indices.
  • Pause removal: Toggles between the selection of the pause-removed video and the original video.
  • Timeline zoom: Zoom in and zoom out.
  • Shot boundary frames: Index of video. A shot is an unbroken sequence of frames recorded from a single camera. Shot boundaries are generated from a detection algorithm that identifies the transitions between shots and records their locations into an index. Current shot is highlighted as video plays when sync box is checked. Can seek to selected part of video by clicking on shot.
  • Time compression: Allows the adjustment of playback speed from 50% to 250% in 10% increments. 100% is normal speed.
  • Duration: Displays the length of the video taking into account the combined setting of Pause-removal and Time compression controls.

The jump-back and jump-next controls seek the video backward or forward, respectively, by a fixed interval or by entries in an index. Users can jump by 5 seconds, 10 seconds, TOC entry, note, or shot boundary. It was hypothesized, for example, that a user watching the video might use the jump-back 5 and 10 seconds controls to repeat significant events just passed, whereas the jump-next TOC entry control might be used to preview the first few minutes of each consecutive entry in the TOC. Also, it is very difficult to do these operations using the seek thumb. For example, a one-hour video (3600 seconds) spread across roughly 400 pixels (width of our browser) means that moving the thumb one pixel seeks 9 seconds.

Our goal for the prototype was to expose the functionality of the browser with a user interface adequate for evaluation. Although not discussed in this paper, the study was also used to evaluate the usability of the interface. Both the basic and enhanced browsers were instrumented to record the usage of each feature during the study.

STUDY DESIGN
The user study was designed to evaluate feature usage and the experience with the enhanced browser via observation, subjective surveys, and comparison with the basic browser. In addition, we chose to conduct the study across a broad range of content types, ultimately choosing six such “browsing scenarios”. The scenarios included watching classroom lectures, conference presentations, sports, television dramas, news, and travelogues. We detail the scenarios with the presentation of results in the next section.

Participants were recruited from a pool of non-Microsoft employees who expressed interest in participating in a usability study at Microsoft. In addition, the participants were selected and assigned to a scenario based upon matching interests with the scenario content. Five participants per scenario completed the study for a total of 30 participants. Each participant received a Microsoft software product for their involvement in the study.

The participants were assigned tasks related to their browsing scenario. Each participant first completed their
task watching a video using the basic browser. Then, after completing a practice task to learn the enhanced features, they watched two more videos using the enhanced browser. To encourage the use of the browsing features, they were limited to ½ hour for watching each 45 min. – 1 hour video.

In addition to pre- and post-study surveys, the participants completed a survey after watching each video. The participants were asked to describe their browsing strategy as well as rate their interest in the contents of the video, the quality of their experience, and the usefulness of the features available in each condition.

SCENARIOS AND RESULTS
We present the results with the descriptions of the six browsing scenarios, including quantitative data on what features were used most and a qualitative analysis of the users’ browsing experience.

First, we present tables that we will be using in discussing the results for scenarios. Table 1 presents the users’ qualitative ratings of various features provided in the basic and enhanced interfaces on a scale of 1…7, where 1 is “not useful at all” and 7 is “very useful”. Table 2 shows the frequency of use of the features across the scenarios. Table 3 shows the overall playback speed using time-compression and pause-removal across the scenarios. Table 4 shows what fraction of content the users watched 0, 1, 2, or 3+ times. Finally, Table 5 shows how the users utilized their time during the study, i.e., with the video paused, playing at normal speed, fast-forward (FF), playing time-compressed (TC), pause-removed (PR), or both (TC, PR).

              Seek        FF          Enhanced browser only
              Bas   Enh   Bas   Enh   SB    TC    PR    Jmp   TOC   Note
  Classroom   4.8   5.6   4.4   4.1   5.0   5.4   5.1   4.8   6.8   3.5
  Conference  5.6   4.1   3.6   3.3   4.9   6.9   6.5   5.1   N/A   3.8
  Sports      5.2   4.7   5.6   5.9   6.1   5.7   4.3   5.6   5.3   4.5
  Shows       5.0   3.6   4.4   4.3   5.1   6.0   4.3   2.8   N/A   2.5
  News        5.8   4.9   5.4   4.3   6.4   6.7   6.6   5.6   6.6   4.6
  Travel      5.2   5.7   5.4   4.2   6.3   6.6   6.0   6.3   N/A   6.4
  Average     5.3   4.8   4.8   4.4   5.6   6.2   5.5   5.0   6.2   4.1
  SB = Shot Boundaries, Jmp = Jump-back and Jump-next
Table 1. Qualitative Ratings of Browser Features (1 = not useful, 7 = very useful)

              Seek        FF          Enhanced browser only
              Bas   Enh   Bas   Enh   SB    Jmp   Jmp   TOC   Note  Note
                                            Bck   Nxt   Sk    Add   Sk
  Classroom   21.6  0.0   10.8  0.0   1.5   4.5   2.0   12.5  0.0   0.0
  Conference  15.7  0.5   4.2   0.0   2.0   0.5   7.0   N/A   3.0   1.0
  Sports      20.0  7.0   12.8  4.5   26.5  0.0   4.0   1.5   2.0   0.5
  Shows       14.8  3.0   9.8   1.0   4.5   0.0   11.0  N/A   0.0   0.0
  News        34.0  0.5   10.2  0.0   9.5   2.0   10.5  3.5   1.0   0.0
  Travel      51.8  3.0   11.0  0.0   55.0  14.5  4.5   N/A   9.5   5.0
  Average     26.3  2.3   9.8   0.9   16.5  3.6   6.5   5.8   2.6   1.1
  SB = Shot Boundary Seek, TOC Sk = TOC Seek, Note Sk = Note Seek
Table 2. Avg. Feature Use per Participant per Video

              Time Comp.   Time Comp. and Pause-Removed (Gain)
  Classroom   126.6%       136.8% (10.2%)
  Conference  122.0%       150.4% (28.3%)
  Sports      116.9%       137.3% (20.4%)
  Shows       132.3%       144.6% (12.3%)
  News        123.3%       142.2% (18.9%)
  Travel      139.9%       149.5% (9.6%)
  Average     126.8%       143.4% (16.6%)
Table 3. Avg. Playback Speed in Enhanced Conditions

              Basic                      Enhanced
              %W0   %W1   %W2   %W+     %W0   %W1   %W2   %W+
  Classroom   64.2  32.8  2.6   0.2     51.9  41.1  6.6   0.9
  Conference  32.2  64.5  2.7   0.8     14.0  73.9  11.0  0.9
  Sports      78.2  20.8  1.0   0.0     58.8  33.3  6.3   1.7
  Shows       59.4  40.4  0.0   0.0     46.2  52.9  0.9   0.0
  News        63.4  33.8  2.4   0.0     48.5  46.3  5.1   0.4
  Travel      66.8  25.3  7.3   0.5     42.9  30.8  11.5  15.1
  Average     60.7  36.3  2.7   0.3     43.7  46.4  6.9   3.2
Table 4. Percent of video not watched (%W0), watched once (%W1), twice (%W2), 3 or more times (%W+)

Classroom Lecture
Increasing resource demands on education have led to the adoption of video offering of courses by many institutions. Stanford University, for example, offers hundreds of courses each year, live and on-demand, via television broadcast, videotape, and Internet delivery [19]. The classroom lecture scenario simulates a student taking a traditional live course with a video archive. The participants were asked to imagine they were taking a C programming class. A quiz was going to be administered in ½ hour but they did not attend the previous one-hour lecture. The task was to watch the lecture video and summarize the main points in preparation for the quiz.

The time constraint ensured that the participants would not be able to watch the entire video. However, the participants were selected based upon previous programming experience in a language other than C. Since many programming concepts are similar across different languages, it was presumed that the participants could effectively skim the video based upon previous knowledge.

Using the basic browser, though, the participants had a difficult time skimming the video. The participants fast-forwarded through topics and skipped topics using the seek thumb. However, with no indication of the position of topic changes, the participants made random guesses to seek. Table 2 shows they used the seek thumb an average of 21 times in the half hour, or roughly once every 1.5 minutes.

The enhanced browser provided a TOC generated from slides used in the lecture. Participants in this scenario used the TOC to seek the video with greater frequency than any other scenario (avg. use 12.5 times vs. 2 times overall, Table 2). They reported that they “used the TOC to jump to the main parts of the lecture rather than guessing”. They also made considerable use of TC and PR. This increased the fraction of content they watched once or more from 35% to 48% (Table 4), corresponding to a combined speed-up factor of 1.37 (Table 3). The TOC, TC, and PR were the top-three valued controls, with ratings of 6.8, 5.4, and 5.1 respectively (Table 1).¹

¹ Although “Seek” is rated high in Table 1, notice that it is used zero times in the enhanced browser (Table 2). The high rating is due to the fact that the participants thought of TOC also as a seek mechanism.

The basic browser interface is not unlike that of the VCRs that many Stanford students use in campus libraries to watch televised courses. The initial findings of this scenario suggest that significant benefits could be gained by providing a TOC for courses, if not also TC and PR.

Conference Presentation
Conference and seminar presentations are valuable for keeping up with contemporary work in various academic and professional fields. Electronically accessible on-demand presentations can provide added flexibility of anytime, anywhere viewing. However, the ability to browse a presentation can potentially be of great value when time is limited.

The participants were asked to pretend they had ½ hour before attending a meeting with co-workers to discuss a conference they had attended. The participants did not attend the same presentations as their co-workers, but would still like to take part in the discussion. The task was to review a video of the missed presentation and summarize the main points in preparation for the meeting.

The videos for the study ranged from 40 to 50 minutes and were selected from the ACM 97 presentations of “The Next 50 Years of Computing”. Participants were recruited based upon background interests in the future of computing and education. Unlike the classroom lecture scenario, the contents of videos were not technical or highly structured so a TOC was not provided for the enhanced browser.

Using the basic browser, the participants used the seek thumb and the fast forward to skim the video much like in the classroom scenario.

Using the enhanced browser, the highest rated controls were TC and PR (6.9 and 6.5, Table 1). On average, a combined speed-up of 1.5 was used by the participants (Table 3) and, as compared to the basic browser, they covered 86% of the content instead of 68% (Table 4). Shot boundary frames were used twice on average, usually to skip lengthy introductions as the transition between the host and the speaker could be seen in the frames.

Although the average rating was neutral (3.8, Table 1), personal notes were used effectively by several participants. Two of the five participants used notes to mark interesting locations in the video. One of them included the shot boundary frame number in the title of her notes, providing a visual indicator for the location of the note. Both participants used their notes to review the main points of the video for their summary. A third participant used the notes feature to bookmark the start and end of video segments he skipped to review them later if time allowed. This behavior suggests the need for a bookmark feature that does not require typing a title for a note or a logging feature that automatically marks the portions of the video skipped.

TC and PR made it possible to watch significantly more of the video, as in the classroom scenario. However, without a TOC, both the shot boundaries and the notes were utilized to effectively browse the video.

Sports
These days, twenty-four hour networks bombard us with a wide array of sports programming. However, the time available to watch sports has not increased. The sports scenario gave participants the chance to browse sports events. Each participant reported that they watched sports or sports news shows on a regular basis.

The specific task was to find highlights in a baseball game to discuss with friends at the health club in ½ hour. A single baseball game was divided into three one-hour segments and presented in order to the participants. Since baseball can have long periods of little or no scoring activity, it was expected that there was ample opportunity to skim the video. As an aid, a table of contents was provided in the enhanced condition indexing the top and bottom of each inning in the video (~6 entries).

Using the basic browser, most of the participants started out by using the fast forward button to skip the commercials and dead time between plays. The participants spent nearly 40% of their time watching the game in fast-forward (Table 5), higher than any other scenario. Play highlights can be identified visually, so the lack of audio was insignificant. Fast-forward, however, was not enough to skim the game in ½ hour. As a result, the participants also made considerable use of the seek thumb (~15 times in 30 minutes).

With the enhanced browser, the participants most frequently used the shot-boundary frames to seek the video (~27 times in 30 minutes, Table 2) and rated it highest in surveys (6.1, Table 1). Using the five frames at the bottom of the browser, the participants could determine the outcome of the current play. By scrolling the frames ahead, the participants could preview and seek to successive plays. In contrast, the TOC inning index was only used once or twice, mainly to skip the ads at the end of an inning. TC, PR, and fast-forward were also very popular in the enhanced browser. Unlike other scenarios, fast-forward remained quite attractive as it allowed greater
speed-up than time compression and key information was in the video channel anyway. TC and PR combined offered a speed-up of 1.37, allowing more of the game to be watched.

                        Paused  Playing  FF     TC     PR     TC, PR
  Classroom   Basic     6.0     86.4     11.0
              Enhanced  11.1    27.5     0.3    19.3   5.5    26.4
  Conference  Basic     10.2    90.0     2.2
              Enhanced  14.7    10.2     0.1    13.5   2.4    54.6
  Sports      Basic     8.6     54.0     38.0
              Enhanced  4.6     35.9     10.7   21.2   6.4    22.2
  Shows       Basic     15.4    72.2     12.6
              Enhanced  4.6     18.8     0.9    29.6   0.2    36.0
  News        Basic     7.4     79.4     17.0
              Enhanced  9.6     10.8     0.0    23.6   12.4   44.5
  Travel      Basic     11.8    53.8     15.2
              Enhanced  19.3    17.5     0.0    19.7   1.3    27.7
Table 5. Percentage of Study Time Spent

In this scenario, we saw the development of more sophisticated strategies over time. For example, when watching the second video using the enhanced browser, two participants chose to watch the home team at bat while completely skipping the visitors. Another two participants used the notes feature to bookmark interesting plays for later reference. Both strategies exemplify the user in control over the game, unlike watching a set of highlights from a news show. When asked if the availability of the enhanced browser would affect how they watched television, the participants’ responses increased from 4.2 using the basic browser (neutral), to 6 after the second use of the enhanced browser (agree, scale of 1 – 7, 7 being strongly agree). Similarly, when asked about the quality of their experience, ratings increased from 4.8 to 6 (scale of 1 – 7, 7 being best).

The results show that having the ability to browse and skim a baseball game can potentially be very appealing. Features that support skimming visually, such as shot boundaries, TC, and PR, are far more useful than others in this scenario.

Shows
Every day, millions of viewers watch the countless sitcoms, soap operas, dramas, and other shows that fill the airwaves. The VCR has proven to be an indispensable aid in allowing viewers to skim and browse their recordings, primarily through skipping advertisements. How would they react to the features in our application?

Each participant regularly watched at least one weekly television show. They were asked to pretend that they wanted to watch the final episode of their favorite show airing in ½ hour, but they still needed to watch the previous episode that they had recorded. The task was to review the major events in the show before watching the final episode. Each participant watched a full episode of “E.R.”, “Ally McBeal”, and “Babylon 5” (including commercials).

Few expectations were made regarding the browsing behavior of the participants in this scenario. It was an absolute certainty that the features would be used to skip commercials. However, how each participant might choose to browse the content of the shows could depend heavily upon personal preference.

Using the basic browser, it was not possible for the participants to watch the entire show in ½ hour even if they skipped commercials. The seek thumb was used 14 times on average, or roughly one seek every 2 minutes (Table 2). The participants guessed randomly when seeking.

In the enhanced conditions, time compression was the highest rated feature of the browser (6, Table 1). It was used to increase the amount of the show watched from an average of 40% in the basic condition to 54% over the enhanced conditions (Table 4). The second highest rated feature was shot boundaries (5.1, Table 1). By scrolling the shot boundary frames, the participants could instantly and accurately skip commercials. The average use of 5 shot boundary seeks (Table 2) corresponds roughly to the number of commercials in a one-hour show.

When asked to rate their satisfaction with their coverage of the show, the participants reported an increase from 3.4 using the basic browser to 5.4 after the second use of the enhanced browser (scale of 1 – 7, 7 being best). However, unlike the sports condition, the participants did not agree that the availability of a video browser would affect the way they watched television (3.6 in basic, avg. 4.3 over enhanced). The participants all reported that they would not regularly watch a show under such time constraints. One participant called time compression and pause removal “mentally fatiguing”.

News
Like sports, news is available 24 hours a day, 7 days a week on television and the Internet, but the time for watching it has not increased. The participants in this scenario were asked to pretend that they were forced by family members to spend less time watching the news. The task was to watch a one-hour news program in the ½ hour before dinner and summarize the news for discussion at the table. Participants were selected based upon a prerequisite of watching at least ½ hour of news daily.

The participants were presented three consecutive airings of “The News Hour with Jim Lehrer”, which consists of a general news summary followed by five in-depth reports. Since the content is highly structured into discrete story segments, we expected that the participants would want to choose the stories they were interested in watching. To facilitate this behavior, a table of contents was generated for the enhanced conditions to index the beginning of the news summary and each story.

Using the basic browser, the seek thumb was used heavily (35 times, Table 2) to skip. The participants had to make
many guesses to find the beginnings of stories in the video.

Using the enhanced browser, participants were able to use TC and PR to watch more of the video (37% watched with basic vs. 52% with enhanced, Table 4). In addition, the TOC made it possible for participants to “select which one [story to watch] or in which order I watched them”. Like the classroom scenario, TC, PR, and the TOC were the highest rated features (6.7, 6.6, 6.6, respectively, Table 1).

Unlike the classroom scenario, though, shot boundary frames proved to be a useful preview feature for the participants as they watched the video (rated 6.4, Table 1). Participants would scroll the frames to get an overview of the contents of a story, using the jump-next button or clicking on a frame to skip ahead.

Ultimately, all the participants felt that they could better cover the news program using the enhanced browser, with an average satisfaction of coverage rating of 6.6 on a scale of 7, 7 being best, versus an average rating of 4.8 in the basic condition. When asked if a video browser would affect the way they watched television, the participants were even more enthusiastic than those in the sports scenario, rating an average of 6.9 on a scale of 7 (strongly agree).

Overall, the results indicate that news is a very rich video content type and that browsers can take advantage of both textual and visual indices for searching as well as TC and PR for saving time. And, the overwhelmingly positive subjective results show that many people could immediately benefit from such features.

Travel

Travel videos can be effective previews for destination getaways. The participants were asked to construct a five minute summary of a travel video that outlined a potential itinerary for a vacation. The summary would be used to convince their families where they wanted to go on their vacation. The travel video scenario was designed to evaluate the use of the advanced features in a simple editing task: identifying and assembling clips into a sequence.

Each participant reported an interest in travel as well as having planned or taken a vacation in recent years. The travel videos contained tourist points of interest and used narrator voice-overs to describe the contents of the scenes.

Using the basic browser, the seek thumb was used nearly twice as often as in the next most used scenario (64 times vs. 34 for news, Table 2). The greater accuracy needed for defining clips required many adjustments using the seek thumb.

In the enhanced condition, the participants relied on the shot boundary frames to navigate the videos, using them to identify interesting looking destinations. They used the shot boundary frames to seek the video an average of 55 times versus an average of 16.5 over all scenarios (Table 2) and rated it the third most useful feature (6.3, Table 1).

The notes were invaluable for marking the start and end points of clips, receiving their highest rating across the scenarios (6.4 vs average 4.1, Table 1). An average of 9.5 notes was added by each participant versus 2.6 overall (Table 2). They positioned their notes by hitting the jump-back button after noticing an interesting landmark. Jump-back was also used the most (14.5 times, Table 2) and rated the highest here across the scenarios (6.25, Table 1).

Ultimately, the participants rated TC the highest in this scenario (6.6, Table 1). TC and PR made it possible to watch nearly 25% more of the video, increasing from 33% with the basic browser to 57% with the enhanced (Table 4). When asked to rate the quality of their itinerary summaries, the participants reported an increase from 4.4 using the basic browser to 5.8 by the second use of the enhanced browser (scale of 1 – 7, 7 being best).

These early results indicate that casual users find the combination of different features very useful in a simple editing task. Yet some of these features have yet to be found even in professional editing packages.

CONCLUDING REMARKS

The widespread adoption of Internet streaming video and the development of devices like ReplayTV and TiVo present an unprecedented opportunity to provide new features for browsing digital video. We investigated and implemented the design of a software video browsing application that included features like time compression, pause removal, and different forms of content indices. In addition, we evaluated this design using six different video content types and presented the resulting data for analysis and discussion.

Using the enhanced browser, the participants viewed nearly 20% more of the video using TC and PR (average video watched for basic browser condition = 39% vs. average watched for enhanced = 57%, Table 4). They extensively used the shot boundary frames (overall average 16.5 times, Table 2) in place of the seek thumb to advance in the video. Based on the individual results of each scenario, we can informally classify our six video content types into different categories: informational audio-centric, informational video-centric, narrative-entertainment.

Informational audio-centric videos like classroom lectures and conference presentations contain most of their content in the audio channel and usually have little visual activity in the content. As such, visual browsing features like shot boundary frames provide minimal cues. For structured content, a TOC provides a valuable index, although users can take advantage of notes and shot boundaries when it is unavailable.

With informational video-centric content like travel and sports videos, the video contains significant information and the shot boundary frames become indispensable. When combined with notes and the jump-back button, it was possible to accurately bookmark locations in the video. News can fall equally into both the informational
audio-centric and informational video-centric categories, and can take advantage of a combination of the different indices for effective browsing.

When watching narrative-entertainment like television dramas, the viewing experience was affected when the participants were forced to use browsing features like TC and PR. One participant succinctly stated the general sentiment: “I saved time but I would seldom want to watch a show in a fast version”.

However, when watching news and sports, the participants reported the opposite response. A sports participant remarked that “anything to remove excess time from viewing is positive”. A news participant went further to say that “saving time isn’t the best part – being in control is”. The features provided the ability to “move to what interested me most and then return to the other segments as time permitted”.

In the travel scenario, the participants could identify the editing nature of the task. When asked about the technology, one participant responded: “It’s exciting. I think editing home movies would be fun”. Another remarked, “I would buy this software in a minute if it would allow me to edit video”.

The results also show that the availability of such advanced features could be immediately beneficial to users. Immediate plans for future work include refining the browser interface based upon observed usability problems. Long term plans include performing in-depth evaluations of the browser features with greater numbers of participants over a much longer period of time. Given the increasing production of video content in different contexts, we feel that there is no shortage of applications for new technology in this area.

REFERENCES

1. Apple QuickTime Player, http://www.apple.com/quicktime/
2. Arman, F., Depommier, R., Hsu, A., and Chiu, M.-Y. “Content-based browsing of video sequences.” In Proceedings of the second ACM international conference on Multimedia '94, 1994, Page 97.
3. Arons, B. “SpeechSkimmer: A System for Interactively Skimming Recorded Speech.” ACM Transactions on Computer Human Interaction, 4, 1, 1997, 3-38.
4. Arons, B. “Techniques, Perception, and Applications of Time-Compressed Speech.” In Proceedings of 1992 Conference, American Voice I/O Society, Sep. 1992, pp. 169-177.
5. Brotherton, J. A., Bhalodia, J. R., and Abowd, G. D. “Automated Capture, Integration, and Visualization of Multiple Media Streams.” In the Proceedings of IEEE Multimedia '98, July, 1998.
6. Christel, M. G., Smith, M., Taylor, C. R., and Winkler, D. B. “Evolving video skims into useful multimedia abstractions.” In Proceedings of CHI '98 (Los Angeles, CA, 1998), ACM Press, 171-178.
7. DVD Video Group, http://www.dvdvideogroup.com/
8. Informedia, http://www.informedia.cs.cmu.edu/
9. Komlodi, A. and Marchionini, G. “Key frame preview techniques for video browsing.” In Proceedings of the third ACM Conference on Digital libraries, 1998, Pages 118 – 125.
10. Low, C. Y., Tian, Q., and Zhang, H. “An automatic news video parsing, indexing and browsing system.” In Proceedings ACM Multimedia 96, 1996, Page 425.
11. MediaSite, http://www.mediasite.com/
12. Meng, J. and Chang, S. “CVEPS - a compressed video editing and parsing system.” In Proceedings ACM Multimedia 96, 1996, Page 43.
13. Microsoft Windows Media, http://www.microsoft.com/windows/windowsmedia/
14. Mills, M., Cohen, J., and Wong, Y. Y., A magnifier tool for video data, in Proceedings of CHI '92, 1992, ACM Press, 93-98.
15. Omoigui, N., He, L., Gupta, A., Grudin, J., and Sanocki, E., Time-Compression: Systems Concerns, Usage, and Benefits, in Proceedings of CHI ’99 (Pittsburgh, PA, 1999), ACM Press, 136-143.
16. Ponceleon, D., Srinivasan, S., Amir, A., Petkovic, D., and Diklic, D. “Key to effective video retrieval: effective cataloging and browsing.” In Proceedings of the 6th ACM international conference on Multimedia, 1998, Pages 99 – 107.
17. Real Networks RealPlayer, http://www.real.com/
18. Replay Networks ReplayTV, http://www.replaytv.com/
19. Stanford Online, http://stanford-online.stanford.edu/
20. Stifelman, L. “The Audio Notebook: Paper and Pen Interaction with Structured Speech.” Ph.D. dissertation, MIT Media Laboratory, 1997.
21. TiVo Inc., http://www.tivo.com/
22. Virage, http://www.virage.com
