Annotations for Streaming Video on the Web: System Design and Usage Studies

David Bargeron, Anoop Gupta, Elizabeth Sanocki, and Jonathan Grudin
November 11, 1998
Redmond, WA 98052 USA
ABSTRACT
Streaming video on the World Wide Web is being widely deployed, and workplace training and distance education are key applications of the technology. The ability to annotate video presentations on the Web can provide significant added value to these applications. Written and spoken annotations can provide "in context" personal notes and enable asynchronous collaboration, turning passive internet media content into interactive, living content. We discuss design considerations in constructing a collaborative video annotation system and introduce our prototype, called MRAS. We present preliminary data on use of MRAS for personal note-taking and for sharing notes. The subjects in our studies showed a strong preference for using MRAS to take notes (although it took longer to do so), as compared to pen-and-paper. The subjects also indicated they would make more comments and questions with MRAS than in a "live" class, and that sharing added substantial value. We also discovered some surprising results with respect to usability aspects of a video-annotation system.

Keywords
Video Annotation, Multimedia Annotation, Streaming Video, Distance Learning, Asynchronous Collaboration

1 INTRODUCTION
There has been much discussion about using streaming video on the World Wide Web for workplace training and distance learning. The ability to view content on-demand, anytime and anywhere, could expand education from a primarily synchronous "live" activity to include more flexible, asynchronous interaction [Sta98]. However, key parts of the classroom experience are missing from on-demand video environments, including questions and comments by other students and the instructor's responses. A key challenge to making on-demand video a viable educational tool is supporting this kind of interactivity asynchronously.

A Web-based system for annotating multimedia web content can satisfy many requirements of asynchronous educational environments. It can allow students to record notes, questions, and comments as they watch Web-based lecture videos. It can allow them to share annotations with each other and instructors, either via the system or via email. It can enable them to watch their own and others' annotations scroll by as the video plays. It can support using annotations as a table of contents to allow jumping to relevant portions of the video. And finally, such a system can allow students to organize their annotations for personal and public use.

We have designed a prototype system called MRAS (Microsoft Research Annotation Service) which implements this functionality. We present the MRAS architecture, its functionality, and its user interface. MRAS is a Web-based client/server framework that supports the association of any kind of addressable media with any other kind of addressable media. The underlying MRAS framework can be used for virtually any annotation task, such as taking notes on other types of "target" documents such as ordinary web pages, archiving virtual reality sessions, and composing dynamic user-specific user interfaces. The current client-side user interface is specifically tailored for annotating streaming video on the Web.

In this paper we focus on the use of MRAS in the asynchronous educational environment presented by on-demand streaming video. In addition to describing the MRAS system design, we report results from preliminary studies on use of the system. We compare taking notes with MRAS to taking notes with pen and paper. We look at how users built on each others' comments and questions asynchronously using MRAS, similar to the way discussions evolve in "live" classroom situations. We examine issues in positioning annotations for video and how users track previously made annotations. Finally, we examine the differences and similarities in how users create text and audio annotations with MRAS.

The extensive literature on annotation systems primarily addresses annotation of text [DaH95, PhW97, Mar98]. The few systems concerned with annotations of video have mainly focused on the construction of video databases [WeP94, CLA+97] rather than on promoting collaboration. We discuss the relationship between our work and earlier projects in Section 5.

The paper is organized as follows. Section 2 describes the architecture, functionality, and user interface of the MRAS system. Section 3 presents results of our personal note-taking study, and Section 4 presents results of our shared-notes study. We discuss related work in Section 5, and we present concluding remarks in Section 6.

2 MRAS SYSTEM DESIGN AND INTERFACE
Building a system for annotating streaming video on the Web requires careful consideration of architecture, functionality, and interfaces. In this section we present the MRAS architecture, including how it interfaces to other server components (database, video, web, email), the protocols that provide universal access across firewalls, and how it provides rich control over bandwidth use for annotation retrieval. We show how MRAS supports asynchronous collaboration via annotation sharing, fine-grained access control to annotation sets, threaded discussions, and an interface to email. We discuss the unique user-interface challenges and opportunities involved in annotating video, including precise temporal annotation positioning, video seek by annotation position, and annotation tracking while a video is playing.

2.1 MRAS System Overview
When a user accesses a web page containing video, the web browser contacts the web server to get the HTML page, the video server to get the video content, and the MRAS Annotation Server to get any annotations associated with that video. Figure 1 shows the interaction of these networked components. The MRAS Annotation Server manages the Annotation Meta Data Store and the Default Annotation Content Store, and communicates with clients via HTTP. Meta data about target content is keyed on the target content's URL. The MRAS Server communicates with Email Servers via SMTP, and can send and receive annotations in email. Since the display of annotations is composed with target media at runtime on the client, target content location and user access rights are not restricted.

Figure 1: MRAS System Overview.

2.2 MRAS System Components
Figure 2 shows the MRAS system components and illustrates how they work together. The MRAS Annotation Server receives commands from the client via HTTP requests (allowing it to go through firewalls), which are unpacked and forwarded to the Annotation Back End (ABE) by the Multimedia Annotation Web Server (MAWS) and HttpSvcs modules. ABE is the main system module, containing objects for accessing annotation data stores, composing outgoing email from annotation data, processing incoming email from the Email Reply Server, and making method calls to the server from the client side. Annotation meta data and default annotation content [i.e., annotation content authored in the MRAS user interface] are stored in Microsoft SQL 7.0 relational databases. Incoming email is handled by the Email Reply Server.

Communication between client and server consists of HTTP commands encoded as URLs and data formatted as OLE Structured Storage documents. OLE Structured Storage is a flexible binary standard for storing arbitrarily complex composition documents, but it is not widely supported on the Web, so we plan to convert our wire data format to XML.

The last piece of the MRAS system is the client, where the HttpSvcs module handles communication with the server and ABE translates user actions into commands destined for the server. The MRAS user interface is encapsulated in the Multimedia Annotations module (MMA), which hosts ActiveX controls that display an interface for streaming video on the Web.

Figure 2: MRAS System Components.

2.3 MRAS User Interface
Figure 5 shows the MRAS toolbar, the top-level user interface supported by the ActiveX controls in the MMA module. This toolbar is positioned in a web browser just below the web page display, or it can be embedded anywhere in a web page. After logon to an MRAS server, users choose the annotation activities they want to perform from this toolbar.

2.3.1 Adding New Annotations
Figure 3 shows the dialog box for adding new annotations. When a user presses the "Add" button on the top-level annotations toolbar (Figure 5), this dialog box appears. As shown, it currently supports adding text and audio annotations. If there is a video in the current web page, then when the dialog comes up, the annotation's target position is set to the current position in the video's timeline, and the video pauses. Pausing is required when adding an audio annotation due to hardware constraints on most PCs. Based on results from a pilot test, we decided to automatically pause the video in both the audio and text cases to make the user experience more consistent.

2.3.2 Positioning Annotations
Just as an annotation on a text-based document can correspond to a text span, so an MRAS annotation can correspond to a time range within a target video. Each annotation maintains its own range positioning data. Annotation ranges may overlap. Controls at the top of the 'Add' dialog box are for refining or changing a range beginning and end so that it can be positioned precisely "over" the appropriate segment of the video.

2.3.3 Organizing Annotations
After the annotation has been positioned within the target, the user must choose from a drop-down list the Annotation Set to which the annotation will be added. Annotation Sets are the fundamental organizational mechanism in MRAS. Each annotation belongs to one and only one set. An Annotation Set roughly corresponds to any traditional collection of comments, questions, or notes. For example, it may correspond to an individual user's personal notebook, or to the transcript of a meeting. As discussed later, annotation sets also form the basis for access control and sharing.
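The positioning and grouping model described above can be made concrete with a short sketch: annotations are time ranges over a target URL, each belonging to exactly one set, and retrieval selects annotations whose range overlaps a queried interval. This is an illustrative re-implementation in Python only; the class and field names are ours, not MRAS's (which keeps this data in SQL tables and speaks HTTP on the wire).

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """A note anchored to a time range [start, end] (in seconds) of a target video."""
    author: str
    summary: str      # text summary line, kept even for audio annotations
    start: float      # beginning of the target range
    end: float        # end of the target range; ranges may overlap freely
    set_name: str     # every annotation belongs to exactly one Annotation Set

@dataclass
class AnnotationStore:
    """Annotations keyed by target content URL, as in the Meta Data Store."""
    by_url: dict = field(default_factory=dict)

    def add(self, url: str, ann: Annotation) -> None:
        self.by_url.setdefault(url, []).append(ann)

    def query(self, url: str, t0: float, t1: float, sets: set) -> list:
        """Retrieve annotations in the chosen sets whose range overlaps [t0, t1]."""
        return [a for a in self.by_url.get(url, [])
                if a.set_name in sets and a.start <= t1 and a.end >= t0]

# Hypothetical usage: two annotations on one lecture, queried by time range and set.
store = AnnotationStore()
store.add("http://example.com/lecture",
          Annotation("ann", "Good point", 120.0, 135.0, "class discussion"))
store.add("http://example.com/lecture",
          Annotation("bob", "Question", 300.0, 310.0, "bob's notebook"))
hits = store.query("http://example.com/lecture", 100.0, 200.0, {"class discussion"})
```

The overlap test (`start <= t1 and end >= t0`) is what lets ranges overlap freely while still supporting the time-windowed retrieval described for the "Query Annotations" dialog.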
Figure 3: MRAS "Add New Annotation" dialog box. A user positions an annotation within the target video using the controls at the top of the dialog box. The Annotation Set is chosen from a drop-down box of sets to which the user has write access. Email addresses and a summary line are entered in the spaces provided.

After an Annotation Set has been specified, a user can enter SMTP email addresses to which the annotation will be sent after being validated by the server. The server copies contextual information and annotation data into the email message, allowing the recipient(s) to quickly get to the appropriate place in the target video. Replies to annotation email messages are handled by the MRAS Email Reply Server, which receives and processes email replies as if they were annotation replies (described below).

2.3.4 Retrieving Annotations
Figure 4 shows the MRAS "Query Annotations" dialog box, used to retrieve existing annotations from the MRAS server. It is accessed via the "Query" button on the MRAS toolbar (Figure 5). The controls at the top of the dialog are similar to those in the "Add New Annotation" dialog box; however, here they describe the time range in the target video for which annotations are to be retrieved. This, along with the "Max To Retrieve" and "Level Of Detail" boxes in the dialog box, narrows the target range of annotations to be retrieved, providing a user with control of bandwidth usage when downloading annotations. Users are also presented with a list of sets as in the "Add" dialog (Figure 3) and can choose any combination from which to query. "Annotation Create Date" and "Summary Keyword Search" allow further query refinement.

Figure 4: Query Annotations dialog box. Users specify the time range within the target video over which annotations are to be retrieved. All sets to which a user has read access (and which contain annotations for the current target video) are listed, and the user chooses which sets to query. Other search criteria support fine-grained queries and Quality of Service in the MRAS system.

2.3.5 Access Control and Sharing
The Annotation Set list presented in the "Query Annotations" dialog differs slightly from the list in the "Add" dialog. Here, it lists sets to which a user has read access and which contain annotations on the current target video, whereas in the "Add" dialog it is simply a list of sets to which the user has write access. In addition to being the fundamental organizational mechanism, Annotation Sets serve as the basic entities against which access rights are established and controlled. Users are identified and authenticated in MRAS by their network user accounts and user group affiliations. A table in the Annotation Meta Data Store (Figure 2) stores a permissions map that relates user groups to Annotation Sets. In this way, MRAS affords a fine-grained mechanism for controlling access rights between groups of users and sets of annotations.

Another important use of Annotation Sets is to support collaboration through fine-grained, structured sharing of annotations. In a college course, for instance, one set could be created for each student called "student X's notebook," to which only the student has read/write access and the professor has read access. Another set entitled "class discussion" could grant read/write access to all class members. And a set called "comments on homework" could be created, to which TAs have read/write access and everyone else has read access.

Figure 5: MRAS toolbar. The top-level toolbar is displayed at the bottom of the web browser window. It allows the user to logon to an MRAS server, retrieve existing annotations, add new annotations, and see annotations that have been retrieved from the server.

2.3.6 Viewing and Using Annotations
The MRAS "View Annotations" window (Figure 6) displays annotations retrieved from the server in a tree structure. Several interesting MRAS features are introduced here, including annotation reply, seek, tracking, and preview. Reply and seek functionality are accessed through a menu that appears when a user clicks over an annotation in the window. "Reply" allows users to elaborate on a previously created annotation by creating a child annotation. This is the basis for threaded discussions in MRAS. "Seek" automatically changes the current position of the target video to the beginning of the target range over which an annotation was created. Using seek, users can easily see which part of the target video corresponds to the annotation they are viewing. This also allows annotations to be used to create a personal or shared table of contents.

The tracking feature polls the target video at regular intervals for its current position and updates the "View" window. Annotations close to the current target video position are highlighted with red icons and bold typeface, and the closest annotation gets an arrow icon. Tracking allows users to see where they are in a list of annotations as the target video plays. As annotations are tracked, the preview feature displays the content of the closest annotation, without the user having to manually expand it. This was a popular feature in our usage studies, especially in the "shared annotations" condition, in which users added comments and questions to a set containing annotations by previous users.

Figure 6: MRAS "View Annotations" window. The annotation entitled "Educating the Teachers" and its children are being tracked. The annotation entitled "Teacher/Student relationships" has been selected by the user. The preview pane shows the contents of the user-selected annotation in this example, but it can also show the contents of the tracked annotations. The mouse-click menu for seeking, replying, expanding, and deleting is also displayed for the user-selected annotation.

3 PERSONAL NOTES USAGE STUDY
We conducted two studies of annotation creation on streaming video over the Web. The first concerned personal note-taking using MRAS. Since there is little if any past experience with such systems or their use, our primary motivation was to build intuition about how people would use the system: how many annotations will they create, how will they position the annotations within the lecture video, and what will their reaction be in contrast to pen and paper (something they have been using for tens of years)?

3.1 Experimental Procedure
3.1.1 Participants
Six subjects participated in the "Personal Notes" usage study. They were intermediate to advanced Windows users who had expressed interest in participating in software usability studies in the Microsoft Usability Labs. Subjects were given software products for their participation in the study.

Subjects were told to assume that they were students in a course. As preparation for a class discussion planned for the next class meeting, they needed to watch a video presentation of Elliot Soloway's 1997 ACM talk "Educating the Barney Generation's Grandchildren". Their goal was to generate questions or comments from the video for the class discussion. Subjects were given as much time as they needed to complete the task.

During the video, subjects made annotations in two ways: using handwritten notes and using text-based MRAS notes. Half of the subjects (three) took handwritten notes during the first half of the video and switched to MRAS for the second half of the video (Paper-first condition). The other three used MRAS for text-based note taking during the first half, and pen and paper for the second half (MRAS-first condition).

Subjects began the study session by completing a background questionnaire. For the MRAS-first condition, subjects watched the first half of the video after familiarizing themselves with the MRAS user interface. For the Paper-first condition, no instructions were given for how to take handwritten notes. Halfway through the talk, the video stopped and subjects switched to the other note-taking method. Those in the Paper-first condition familiarized themselves with taking notes using MRAS before continuing with the video.

Subjects were observed while watching the video for behaviors such as pausing or rewinding the video. Task time and the number of annotations and their content were also recorded. After the study, subjects completed a post-study questionnaire and participated in a debriefing session. Table 1 summarizes the results of the Personal Notes Study.

Table 1: Personal Notes Study Results (times in minutes; "–" marks values not recoverable from the source).

Cond.        Subject     MRAS Notes     Paper Notes    Total Notes    MRAS Time      Paper Time     Total Time
Paper-first  1           15             16             31             46.58          28.05          74.63
             2           19             13             32             40.00          28.00          68.00
             3           39             21             60             38.50          28.20          66.70
             Avg (SEM)   24.33 (7.42)   16.67 (2.33)   41.00 (9.50)   41.69 (2.48)   28.08 (0.06)   69.78 (2.46)
MRAS-first   4           7              14             21             31.28          30.00          61.28
             5           13             14             27             34.00          17.07          51.07
             6           –              –              –              –              –              –
             Avg (SEM)   –              –              –              –              21.19 (4.41)   51.61 (5.44)
Total        Avg (SEM)   16.83 (4.79)   16.17 (1.30)   33.00 (5.63)   36.06 (2.95)   24.64 (2.50)   60.69 (4.86)

3.2.1 Number of Notes and Time Spent Taking Notes
Although the video was just 33 minutes long, the subjects took many notes. Subjects in both conditions using MRAS added an average of 16.8 notes, and subjects using pen and paper added an average of 16.2 notes. No significant differences were observed in the number of notes taken using the two methods.
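The Avg (SEM) entries in Table 1 are the per-condition mean and standard error of the mean (sample standard deviation divided by √n). As a worked check, the snippet below recomputes the Paper-first MRAS-notes entry, 24.33 (7.42), from the three subjects' counts; the helper function is ours, written only to show the arithmetic.

```python
from math import sqrt

def mean_sem(values):
    """Return (mean, standard error of the mean) using the sample std (n - 1)."""
    n = len(values)
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return m, sd / sqrt(n)

# MRAS note counts for the three Paper-first subjects (Table 1)
m, sem = mean_sem([15, 19, 39])
print(round(m, 2), round(sem, 2))   # 24.33 7.42
```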
Interestingly, taking notes using MRAS took 32% longer than on paper (repeated measures ANOVA, p = 0.012). This was because with paper the users would often take notes while the video continued to play, thus overlapping listening and note-taking. With MRAS, the system always pauses the video when a note is added, since audio annotations would otherwise interfere. Thus one of the design lessons was to remove this automatic pausing when conditions allow (e.g., for text annotations). To our surprise, this difference failed to negatively affect the subjects' perception of MRAS: as discussed later, all six subjects felt that the benefits of taking notes with MRAS outweighed the costs.

Another very surprising discovery was how MRAS-first subjects took paper notes differently from Paper-first subjects. All three MRAS-first subjects paused the video while taking paper notes, while none of the Paper-first subjects paused. This suggests that subjects modified their note-taking styles as a result of using MRAS.

3.2.2 Contextualization and Positioning of Personal Notes
An important aspect of MRAS is that annotations are automatically linked (contextualized) to the portion of video being watched. We were curious how subjects associated their paper notes with portions of the lecture, and what their reactions to MRAS would be.

Examining the paper notes of subjects, we found there was little contextualization in the handwritten notes. It was difficult or impossible to match up subjects' handwritten notes with the video. The exception was one subject in the MRAS-first condition who had added timestamps to his handwritten notes. On the other hand, it was easy to match up subjects' notes taken with MRAS with the video, using the seek feature. Subjects' comments reinforced this. One subject in the Paper-first condition told us that MRAS "...allows me to jump to areas in the presentation pertaining to my comments." We elaborate more on this topic in our shared-notes study.

3.2.3 User Experience
When asked to compare their experience taking notes with MRAS to that using paper, all six subjects expressed a preference for MRAS over paper. Subjects preferred using MRAS because of the features it offered. Comments emphasized organization, readability, and contextualization as reasons for preferring MRAS annotations. One subject stated that the notes she took with MRAS were "...much more legible, and easier to access at a later time to see exactly what the notes were in reference to".

Subjects also felt that the notes they took with MRAS were more useful. When asked which method resulted in notes that would be the most useful 6 months down the road (for instance, for studying for an exam), all six subjects chose MRAS. They again cited issues such as better organization, increased readability, and the fact that MRAS automatically positions notes within the lecture video. In this regard a subject said of her notes "the end product is more organized...The outline feature...allows for a quick review, with the option to view detail when time permits". Subjects noted that the seek and tracking features of the MRAS user interface were particularly useful for relating their notes to the lecture video, and this added to the overall usefulness of their notes. While the subject pool was too small to make any broad generalizations, these results are extremely encouraging.

4 SHARED NOTES STUDY
Having established that subjects preferred MRAS to paper for personal notes on a videotaped lecture, the second study sought to determine whether there was a benefit to sharing notes, and to using a particular medium for making notes. When notes are taken with pen and paper, the possibilities for collaboration and sharing are limited at best, and people generally are restricted to making text annotations, incidental markings, and drawings. With a Web-based system such as MRAS, users can share their notes with anyone else on the Web. Users can group and control access to annotations. The potential also exists for annotations using virtually any kind of media supported by a PC. In our second study, we explore these possibilities.

4.1 Experimental Procedure
The design for the "Shared Notes" study was similar to that of the "Personal Notes" study. The same video was used and similar instructions were given to subjects. The exceptions were that all subjects used MRAS to take notes for the entire duration of the video, and they were told that they were adding their comments and questions to a shared set of notes.

4.1.1 Subjects
Eighteen subjects participated in the study. They were recruited from the same subject pool used in the "Personal Notes" study. Subjects participated in one of three conditions: a Text-Only condition, where they were only allowed to add text annotations; an Audio-Only condition, wherein only audio annotations were allowed; and a Text-and-Audio condition, where both text and audio annotations were allowed.

4.1.2 Procedure
Again, subjects were told to assume that they were creating discussion questions and comments for participation in a class discussion. For each condition, subjects participated in sequence and notes were saved to a common annotation set, so that each subsequent subject could see, review, and respond to the annotations created by previous subjects. The first subject in each condition started with the same set of "seed" annotations, which had been created by a subject in an earlier pilot study and adapted to the appropriate annotation medium for each condition. Subjects were instructed to look at existing annotations before adding their own, to avoid redundant annotations.

4.2 Results
4.2.1 Number and Nature of Annotations
One goal of the "Shared Notes" study was to evaluate the impact of different annotation media on how users annotate. Several previous studies have compared the use of text and audio for annotation [CFK91] [NCC+94]; however, little if any work has been done to investigate the effect of annotation medium on Web-based video annotations. Also, whereas these previous studies have focused on a qualitative analysis of annotation content, we focus on quantitative issues.
7. There was a fairly high rate of participation in all three
conditions: A total of 50 annotations were added in Text-
Only, 41 in Audio-Only, and 76 in the Text and Audio
8. Text Only Condition Audio Only Condition Text+Audio Condition Time
Subject New Reply Total New Reply Total New Reply Audio Text Total Text Audio Text+Audio
1st 7 0 7 6 2 8 4 2 2 4 6 55.30 118.85 55.45
2nd 0 1 1 2 1 3 3 3 1 5 6 52.63 71.40 62.32
3rd 7 2 9 5 4 9 5 3 3 5 8 61.98 87.08 45.63
4th 9 6 15 1 0 1 6 6 7 5 12 52.00 44.33 77.05
5th 4 8 12 4 7 11 6 4 1 9 10 75.00 82.48 55.07
6th 5 1 6 6 3 9 14 20 12 22 34 87.07 58.70 95.53
Avg (SEM) 5.33 (1.28) 3.00 (1.32) 8.33 (1.99) 4.00 (0.86) 2.83 (1.01) 6.83 (1.60) 6.33 (1.61) 6.33 (2.79) 4.33 (1.78) 8.33 (2.82) 12.67 (4.37) 63.99 (5.79) 77.14 (10.51) 65.18 (7.42)
Table 2: Shared Notes Study Results.
Table 2 shows the results of the “Shared Notes” usage study, in which subjects were divided into three conditions (Text Only, Audio Only, and Text+Audio).
The number of annotations added by each subject was analyzed for type (new vs reply). In the Text+Audio condition, it was also analyzed for medium (text vs
audio). Subjects tended to use text as an annotation medium slightly more than audio for both new and reply annotations, subjects did not take significantly
longer to complete the task in any of the three conditions, and on average the number of replies went up per subject in sequence.
condition (). However, the number per subject was not reply type, repeated measures ANOVA, p = 0.009). User
significantly different for adding new annotations (one-way feedback from both the Text and Audio and Audio-Only
ANOVA, p=0.455), replying to existing ones (one-way conditions explains why: Subjects generally felt it took
ANOVA, p=0.355), or for the combination of the two (one- more effort to listen to audio than to read text. One subject
way ANOVA, p=0.367). Neither medium nor the choice of in the Text and Audio condition was frustrated with the
medium had a significant effect on the number of speed of audio annotations, saying that “I read much faster
annotations added. than I or others talk. I wanted to expedite a few of the slow
Furthermore, subjects in the Text-Only condition took an talkers.” Another subject pointed out that “it was easy to
average of 63.99 minutes (+/- 5.79 SEM) to complete read the text [in the preview pane] as the video was running.
watching the video and taking notes, whereas Audio-Only By the time I stopped to listen to the audio (annotations) I
subjects took 77.14 minutes (10.51 SEM), and Text and lost the flow of information from the video.”
Audio subjects took 65.18 minutes (7.42 SEM). Again, the Medium therefore does have a quantitative effect on the
differences were not significant (one-way ANOVA, p = creation of annotations in a Web-based system, albeit a
0.469), indicating that neither medium nor the choice of subtle one. We anticipated some of these difficulties in our
medium affected the time to take notes using MRAS. design phase (e.g., as pointed by [NCC+94]), and that was
Subjects in the Text and Audio condition tended to use text our reason for providing a text-based summary line for all
more than audio for creating annotations (both new and annotations, regardless of media type. Several subjects in
reply, see Table 3) (Student’s t-test, p=0.062), and most the Audio-Only and Text and Audio conditions suggested a
expressed a preference for text. When asked which medium speech-to-text feature that would allow audio annotations to
they found easier for adding annotations, 4 out of 6 chose be stored as both audio and text. This would make it easy to
text. One said typing text “...gives me time to think of what scan audio annotation content in the preview window. We
I want to put down. With the audio, I could have paused, gathered my thoughts, then spoke. But, it is easier to erase a couple of letters... than to try to figure out where in the audio to erase, especially if I talked for a while.”

Original Medium   Reply Medium   Text-Only Cond.   Audio-Only Cond.   Text+Audio Cond.   All Cond. Total
Audio             Audio          0 (0%)            24 (100%)          6 (16%)            30 (37%)
Audio             Text           0 (0%)            0 (0%)             7 (18%)            7 (9%)
Text              Audio          0 (0%)            0 (0%)             3 (8%)             3 (4%)
Text              Text           19 (100%)         0 (0%)             22 (58%)           41 (50%)

Table 3: Effect of Medium on Reply Annotation Counts. The effect of annotation medium on the number of replies was substantial. Subjects in the Audio-Only and Text-Only conditions were limited to a single medium, so all replies were in that medium. Subjects in the Text and Audio condition were more likely to reply to text annotations using text, and were more likely to reply to text annotations overall.

Subjects also favored text for creating reply annotations in particular. This is shown in Table 3. When we looked at replies in the Text and Audio condition, we discovered that subjects were just as likely to use text to reply to audio annotations as they were to use audio. Furthermore, subjects were more likely to use text when replying to text annotations (main effect for reply type, repeated measures ANOVA, p = 0.033).

Interestingly, subjects in the Text and Audio condition were also much more likely to reply to text annotations in the first place (interaction effect for annotation type and [...] agree that some such approach is appropriate, and plan to try it with the next version of MRAS.

4.2.2 Annotation Sharing
Web-based annotations can be shared easily. This opens the door for rich asynchronous collaboration models. We explored several dimensions of annotation sharing, including the impact on the number of annotations over time and the distribution of new and “reply” annotations added over time. Neither of these dimensions has been explored in depth in the literature.

Since subjects participated sequentially, we expected to see the number of new annotations drop off as the study progressed. We thought earlier subjects would make notes on all the interesting parts of the lecture video. Later subjects would have fewer opportunities for adding new and different annotations, and so would add fewer overall. In fact, something very different happened. We looked at total annotations added in sequence across all three conditions, and found that the number increased significantly (Pearson r = 0.491, p = 0.039). This is illustrated in Figure 7. Furthermore, this increase in the total number of annotations arose from a significant increase in replies alone (Pearson r = 0.525, p = 0.025), suggesting that more sharing occurred as the study progressed. Obviously, this cannot go on forever, and some easy means for supporting moderation would be nice to add to the system.
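The trend statistics above are Pearson product-moment correlations between a subject's position in the participation sequence and the number of annotations that subject added. As a minimal sketch of that computation (the per-subject counts below are invented for illustration; the paper reports only the aggregate r and p values):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Position of each subject in the sequence (1st, 2nd, ...) versus the total
# annotations that subject added -- hypothetical counts, not the study's data.
sequence = [1, 2, 3, 4, 5, 6]
annotations = [14, 17, 16, 22, 25, 28]

r = pearson_r(sequence, annotations)  # strongly positive for these counts
```

A positive r with a small p-value, as in Figure 7, indicates that later subjects tended to add more annotations rather than fewer; the paper attributes the growth specifically to reply annotations.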
4.2.3 Positioning Shared Notes
Positioning annotations in this study was particularly important because the notes were shared. Subjects who participated later in the study relied on the positioning of earlier subjects’ annotations to contextualize the annotation contents. Our hypothesis was that if annotations were not accurately positioned in the video, they would be confusing.

[Figure 7: chart of annotations added at each position in the subject sequence, 1st through 6th.]

Figure 7: Annotations Added Over Time. The number of annotations added over time increased (Pearson r = 0.491, p = 0.039). Here, the numbers for all three conditions (Text-Only, Audio-Only, and Text and Audio) have been combined. A linear least-squares trend line is superimposed in black.

We expected an “accurately” positioned annotation to be placed directly over the place in the video that contextualized it. As in the “Personal Notes” study, users mostly positioned their annotations 10 to 15 seconds after the relevant part of the video. (This was not a conscious decision on the part of the subjects; they simply pressed the “add” button after they had heard the thought on the video.) Contrary to our expectations, subjects were not confused or distracted by the lack of accuracy. In fact, when they watched the lecture video from beginning to end, they found it natural to see annotations display in the preview window a few seconds after the relevant part in the video.

4.2.4 User Experience
Subjects generally liked using MRAS for taking shared notes on streaming video over the Web. As already noted, their feedback reinforced many of our observations, especially with respect to their choice of medium when creating annotations. Moreover, their feedback lends fresh insight into the process of creating and sharing Web-based annotations.

Asked whether they made more comments and questions using MRAS than they would have in a live lecture, 14 out of 18 said yes. However, this was due in part to the fact that they were watching a video that could be paused and replayed. One participant in the Text-Only condition told us he “...would not have had the opportunity to make as many comments or ask as many questions” in a live lecture. Another subject in the Audio-Only condition commented that she “...would not have had as much to say since... I’d be taking up everyone’s time [and] ‘hogging the stage.’” Regardless of the reason, this is good news for asynchronous collaboration in education contexts: Web-based support for making comments and asking questions asynchronously may increase class participation.

The fact that Web-based annotations can be shared also had a big impact on users. Asked whether the other annotations helped or distracted them, some reported that the additional information was confusing because it was not perfectly synchronized with the activity in the lecture video. The majority, however, found others’ annotations useful in guiding their own thinking. This indicates that the positive results of [Mar98] for annotation of text carry over to Web-based annotations. One Text and Audio subject said “it was thought provoking to read the comments of others.” Another remarked “it was more of an interactive experience than a normal lecture. I found myself reading the other notes and assimilating the lecture and comments together.” These comments were echoed by subjects in the other conditions: an Audio-Only subject said that it was interesting and useful to see “...what other people find important enough to comment on.” And a Text-Only participant pointed out that MRAS “...gives you the opportunity to see what others have thought, thus giving you the chance to prevent repetition or [to] thoughtfully add on to other's comments.” Note-sharing in a Web-based annotation system could significantly enhance collaboration in an asynchronous environment.

Finally, subjects were asked whether they found using MRAS to be an effective way of preparing for a class discussion. A strong majority across all conditions strongly agreed. The average response of subjects in the Text and Audio condition was 6.0 out of 7 (with 7 “strongly agree” and 1 “strongly disagree”). The average in both the Audio-Only and Text-Only conditions was 5.0 out of 7. One Text-Only subject was particularly enthusiastic: “I think that this software could really get more students involved, to see different viewpoints before classroom discussions, and could increase students’ participation who would otherwise not get too involved.”

5 RELATED WORK
Annotations for personal and collaborative use have been widely studied in several domains. Annotation systems have been built and studied in educational contexts. CoNotes [DaH95] and Animal Landlord [SmR97] support guided pedagogical annotation experiences. Both require preparation of annotation targets (for instance, by an instructor). MRAS can support a guided annotation model through annotation sets, but does not require it, and requires no special preparation of target media. The Classroom 2000 project [AAF+96] is centered on capturing all aspects of a live classroom experience, but in contrast to MRAS, facilities for asynchronous interaction are lacking. Studies of handwritten annotations in the educational sphere [Mar98] have shown that annotations made in books are valuable to subsequent users. Deployment of MRAS-like systems will allow similar value to be added to video.

The MRAS system architecture is related to several other designs. OSF [SMB96] and NCSA [LaB97] have proposed scalable Web-based architectures for sharing annotations on web pages. These are similar in principle to MRAS, but neither supports fine-grained access control, annotation grouping, video annotations, or rich annotation positioning. Knowledge Weasel [LaS93] is Web-based. It offers a common annotation record format, annotation grouping, and fine-grained annotation retrieval, but does not support access control and stores meta data in a distributed file
system, not in a relational database as does MRAS. The ComMentor architecture [RMW97] is similar to MRAS, but access control is weak and annotations of video are not supported.

Several projects have concentrated on composing multimedia documents from a base document and its annotations. These include Open Architecture Multimedia Documents [GaS93], DynaText [Smi93], Relativity Controller [Gou93], Multivalent Annotations [PhW97], DIANE [BHB+97], and MediaWeaver [Wei98]. These systems either restrict target location and storage, or alter the original target in the process of annotation. MRAS does neither, yet it can still support document segmentation and composition with the rich positioning information it stores.

The use of text and audio as annotation media has been compared in [CFK91] and [NCC+94]. These studies focused on using annotations for feedback rather than collaboration. Also, the annotations were taken on text documents using pen and paper or a tape recorder. Our studies targeted an instructional video and were conducted online. Finally, these studies reported qualitative analyses of subjects’ annotations. We concentrated on a quantitative comparison and have not yet completed a qualitative analysis of our subjects’ notes.

Considerable work on video annotation has focused on indexing video for video databases. Examples include Lee’s hybrid approach [LeK93], Marquee [WeP94], VIRON [KKK96], and VANE [CLA+97], and they run the gamut from fully manual to fully automated systems. In contrast to MRAS, they are not designed as collaborative tools for learning and communication.

6 CONCLUDING REMARKS
The ability to annotate video over the Web provides a powerful means for supporting “in-context” user notes and asynchronous collaboration. We have presented MRAS, a prototype client-server system for annotating video. MRAS interfaces to other server components (database, video, web, email), employs protocols that provide access across firewalls, and provides control over bandwidth use in annotation retrieval. MRAS supports asynchronous collaboration via annotation sharing, fine-grained access control to annotation sets, threaded replies, and an email interface. Unique interface design issues arose, including annotation tracking while video is playing, seek capability, and annotation positioning.

We report results from two studies of video annotation. The results have been very encouraging. In a study of personal note-taking, all subjects preferred MRAS over pen-and-paper (despite spending more time with MRAS). In a study of shared note-taking, 14 of 18 subjects said they made more comments and questions on the lecture than they would have in a live lecture. Precise positioning of annotations was less of an issue than we expected, and annotation tracking was heavily used by all subjects when annotations were shared.

Although these preliminary results are exciting, we are only scratching the surface of the possibilities that a system like MRAS provides. This Fall, we will experimentally deploy MRAS for a course at Stanford University [Sta98]. Students will watch the lecture videos in advance on the web and annotate them with questions and comments. Face-to-face time in class will be used for in-depth discussion, rather than the professor regurgitating textbook material.

7 ACKNOWLEDGEMENTS
Thanks to the Microsoft Usability Labs for use of their lab facilities. Mary Czerwinski assisted in our study designs. Ziyad Hakura reviewed the paper and gave us valuable suggestions for improvement. Scott Cottrille, Brian Christian, Nosa Omoigui, and Mike Morton contributed valuable suggestions for the architecture and implementation of MRAS.

8 REFERENCES
[AAF+96] Abowd, G., Atkeson, C.G., Feinstein, A., Hmelo, C., Kooper, R., Long, S., Sawhney, N., and Tani, M. Teaching and Learning as Multimedia Authoring: The Classroom 2000 Project, Proceedings of Multimedia ’96 (Boston, MA, USA, Nov 1996), ACM Press, 187-198.
[BHB+97] Bessler, S., Hager, M., Benz, H., Mecklenburg, R., and Fischer, S. DIANE: A Multimedia Annotation System, Proceedings of ECMAST ’97 (Milan, Italy, May 1997).
[CLA+97] Carrer, M., Ligresti, L., Ahanger, G., and Little, T.D.C. An Annotation Engine for Supporting Video Database Population, Multimedia Tools and Applications 5 (1997), Kluwer Academic Publishers, 233-258.
[CFK91] Chalfonte, B.L., Fish, R.S., and Kraut, R.E. Expressive Richness: A Comparison of Speech and Text as Media for Revision, Proceedings of CHI ’91 (1991), ACM Press, 21-26.
[DaH95] Davis, and Huttonlocker. CoNote System Overview. (1995) Available at http://www.cs.cornell.edu/home/dph/annotation/annotations.html.
[GaS93] Gaines, B.R. and Shaw, M.L.G. Open Architecture Multimedia Documents, Proceedings of Multimedia ’93 (Anaheim, CA, USA, Aug 1993), ACM Press, 137-46.
[Gou93] Gould, E.J. Relativity Controller: Reflecting User Perspective in Document Spaces, Adjunct Proceedings of INTERCHI ’93 (1993), ACM Press.
[KKK96] Kim, K.W., Kim, K.B., Kim, H.J. VIRON: An Annotation-Based Video Information Retrieval System, Proceedings of COMPSAC ’96 (Seoul, South Korea, Aug 1996), IEEE Press, 298-303.
[LaB97] Laliberte, D., and Braverman, A. A Protocol for Scalable Group and Public Annotations. 1997 NCSA Technical Proposal, available at http://union.ncsa.uiuc.edu/~liberte/www/scalable-annotations.html.
[LaS93] Lawton, D.T., and Smith, I.E. The Knowledge Weasel Hypermedia Annotation System, Proceedings of HyperText ’93 (Nov 1993), ACM Press, 106-117.
[LeK93] Lee, S.Y., and Kao, H.M. Video Indexing – An Approach based on Moving Object and Track, Proceedings of the SPIE, vol. 1908 (1993), 25-36.
[Mar98] Marshall, C.C. Toward an ecology of hypertext annotation, Proceedings of HyperText ’98 (Pittsburgh, PA, USA, June 1998), ACM Press.
[NCC+94] Neuwirth, C.M., Chandhok, R., Charney, D., Wojahn, P., and Kim, L. Distributed Collaborative Writing: A Comparison of Spoken and Written Modalities for Reviewing and Revising Documents. Proceedings of CHI ’94 (Boston, MA, April 1994), ACM Press, 51-57.
[PhW97] Phelps, T.A., and Wilensky, R. Multivalent Annotations, Proceedings of the First European Conference on Research and Advanced Technology for Digital Libraries (Pisa, Italy, Sept 1997).
[RMW97] Roscheisen, M., Mogensen, C., Winograd, T. Shared Web Annotations as a Platform for Third-Party Value-Added Information Providers: Architecture, Protocols, and Usage Examples, Technical Report CSDTR/DLTR (1997), Stanford University. Available at http://www-diglib.stanford.edu/rmr/TR/TR.html.
[SMB96] Schickler, M.A., Mazer, M.S., and Brooks, C. Pan-Browser Support for Annotations and Other Meta Information on the World Wide Web. Proceedings of the Fifth International World Wide Web Conference (Paris, France, May 1996), available at http://www5conf.inria.fr/fich_html/papers/P15/Overview.html.
[SmR97] Smith, B.K., and Reiser, B.J. What Should a Wildebeest Say? Interactive Nature Films for High School Classrooms, Proceedings of ACM Multimedia ’97 (Seattle, WA, USA, Nov. 1997), ACM Press, 193-201.
[Smi93] Smith, M. DynaText: An Electronic Publishing System. Computers and the Humanities 27 (1993), 415-420.
[Sta98] Stanford Online: Masters in Electrical Engineering. http://scpd.stanford.edu/cee/telecom/onlinedegree.html
[WeP94] Weber, K., and Poon, A. Marquee: A Tool for Real-Time Video Logging, Proceedings of CHI ’94 (Boston, MA, USA, April 1994), ACM Press, 58-64.
[Wei98] Wei, S.X. MediaWeaver – A Distributed Media Authoring System for Networked Scholarly Workspaces. Multimedia Tools and Applications 6 (1998), Kluwer Academic Publishers, 97-111.