animation, narration, and multimedia synchronization of the briefing, the system
should be able to plan and synthesize the actual presentation, which the user can
then post-edit.
As an example, consider a user putting together a briefing about a particular
individual, say Victor Polay, a leader of Peru's Tupac Amaru movement. A person
preparing this briefing by hand (using, say, Microsoft PowerPoint) would need to
plan the outline, and assemble the various textual and multimedia elements that
constitute the core content. This latter activity could involve searching, doing
background research, collecting existing material, drafting new text, etc. The person
must also design or select a template for the overall briefing, as well as the layout of
individual slides, and then produce each slide individually, realizing the content of
each slide by embedding or linking the media elements on it, specifying the
animation strategies, and then rehearsing and refining the briefing. Finally, the
person has to present the briefing in front of the intended audience(s).
In this paper, we report on a system called AutoBriefer where the author of the
briefing interacts with a system via a graphical user interface (GUI) to specify some
of the briefing content. As before, the user can select or create an outline. However,
the user doesn't have to produce each slide and realize the media elements on it.
Instead, the AutoBriefer GUI allows the user to select and associate the media
elements directly with the outline. In doing so, the user may specify the temporal
layout of the content. Moreover, instead of using pre-existing media elements, the
GUI also allows the user to direct personalized information agents to assemble
media elements from existing (possibly distributed) databases or text collections.
The user can then leave the rest of the briefing preparation to the system, by asking
AutoBriefer to generate a PowerPoint briefing. The user thereby allows the
system to plan how many slides are needed, and to plan and realize the media
elements on each slide, synchronizing them spatially and temporally. Further, the
user doesn't need to present the briefing; the system itself can generate a spoken
narrative for the briefing, synchronizing the narrative elements it introduces with the
other media elements, producing an animated briefing as a result.
One challenge in automating this process involves automatically selecting and
assembling media elements; here AutoBriefer exploits new technologies for
automated summarization. In order to generate a narrative automatically, the system
needs knowledge about the domain objects; here, AutoBriefer provides scalability
across domains by allowing the user to specify or edit domain knowledge in the
form of meta-information attributes associated with media elements. This ability to
specify and edit domain knowledge in turn addresses an important bottleneck in
current multimedia presentation strategies, which are difficult to adapt across
domains. AutoBriefer leverages the ability to edit domain knowledge along with
declarative presentation planning strategies to synthesize a narrated multimedia
briefing in various presentation formats.
2. APPROACH
The design of AutoBriefer is different from most multimedia presentation planners
in the way the problem is decomposed. An easy-to-use GUI is used to specify high-level structural and temporal constraints for arranging domain objects in a
presentation. Information associated with these domain objects is used to generate a
narrative outline. This narrative outline is then fed as data along with a high-level
communicative goal to a presentation planning component (PPP). The planner
chooses a sequence of plan operators to realize the goals associated with the outline.
The plan operators constitute a declarative representation of presentation strategies
for deciding on narrator animation and on spatial and temporal synchronization.
The system architecture is shown in Figure 1. The user interacts with the system
via a GUI using a simple point-and-click interface. She may load in a domain model
of existing domain objects, for possible editing, or she may create new domain
objects using information agents.
Figure 1. System Architecture. [Diagram omitted; it connects the User and the Briefing GUI / User Interface, the Domain Knowledge Agents, the Narrative Outline and its XML2Prop Converter (yielding a Propositional Narrative Outline), the Narrative Planner, the Layout Generator, and the Briefing Generator, whose SMIL or PowerPoint XML output plays in a Multimedia Player or PowerPoint.]
As shown in Figure 2, the user arranges the objects along two dimensions. The
first, the structural dimension, allows objects to be nested under others to create a
hierarchical structure. This is shown in the left frame, which uses a directory-browser
interaction metaphor. The user here has chosen to specify the goal of
retrieving an existing media object, namely a picture of "Polay". The user has also
specified goals for the creation of a summary of events and a biography; the execution of
these goals, of type "create", involves the use of information agents. Finally, the user
has also included a particular audio file as a coda to end the briefing.
The second, the temporal dimension, allows objects to be ordered temporally. This is
shown in the right frame: objects from the left frame can be dropped into numbered
temporal slots in the right frame (the system lays them out, by default, based on the
outline).
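To make these two dimensions concrete, here is a minimal sketch in Python of how such an outline could be represented; the class and field names are our own illustrative assumptions, not AutoBriefer's actual data model.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class OutlineNode:
    # Hypothetical representation of a briefing outline node.
    label: str                                  # e.g. "A File Photo of Victor Polay"
    goal: Optional[str] = None                  # a create(...) or retrieve(...) goal, if any
    children: List["OutlineNode"] = field(default_factory=list)  # structural dimension
    temporal_slot: Optional[int] = None         # numbered slot in the temporal ordering

photo = OutlineNode("A File Photo of Victor Polay",
                    goal='retrieve("D:rawdata/polay.jpg")', temporal_slot=2)
profile = OutlineNode("Profile of Victor Polay", children=[photo])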
Figure 2. Briefing GUI
Each object can be edited by filling in meta-information in the form of values for
slots (Figure 3). Here the user has typed in a caption, and has required that the
system generate the running text.
Figure 3. Editing an Object
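A minimal sketch of such a slot/value record, assuming hypothetical slot names (the text specifies only that a caption was typed in and that running text should be generated):

# Hypothetical meta-information slots for the object edited in Figure 3.
meta_information = {
    "caption": "A File Photo of Victor Polay",  # typed in by the user
    "generate-running-text": True,              # the system should generate the running text
}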
Figure 4. Narrative Outline
Peru Action Brief
1 Preamble
2 Situation Assessment
2.1 Chronology of Events
2.1.2 Latest Document Summary
create("summarize-generic compression .10/peru/p32.txt")
2.2 Biographies
2.2.1 Profile of Victor Polay
2.2.1.1 A File Photo of Victor Polay
retrieve("D:rawdata/polay.jpg")
2.2.1.2 Profile of Victor Polay
create("summarize -bio -length 350 -span multi -person Victor Polay -out table/peru/'*'")
3 Coda
"This briefing has assessed aspects of the situation in Peru. Overall, the crisis appears to be worsening."
<seq>
<par>1</par>
<par>2.1.2</par>
<par>2.2.1.1 2.2.1.2</par>
<par>3</par>
</seq>
The trailing <seq>/<par> block records the user-specified temporal layout in SMIL style: the items play in sequence, with 2.2.1.1 and 2.2.1.2 grouped to play in parallel.
3. INTELLIGENT MULTIMEDIA PRESENTATION PLANNING
The author of a briefing may choose to flesh out as much of the narrative as desired.
For instance, she may indicate exactly how a certain image should be labeled, but is
not required to do so. Likewise, she may specify certain narrative goals herself, or
she may leave it entirely to the Narrative Planner component to introduce them. If
domain objects should appear in a certain order in the final presentation, she may
also provide the desired temporal ordering relations; note, however, that the
complete temporal layout need not be specified by the user.
the user. In addition to the narrative outline, the user may input a number of
presentation constraints, such as the degree of detail, the style of presentation (e.g.
agent-based versus audio presentation) or the output format, e.g., SMIL or
PowerPoint. The task of the presentation planner is then to work out a presentation
based on the user's input. This includes the realization of the create, retrieve and
narrative goals as well as the determination of the spatial and temporal layout. The
result of the presentation planning is a document in the required format.
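As a sketch, the planner's overall input could be summarized as follows; the parameter names are hypothetical, while the values correspond to the options named above:

# Hypothetical input bundle for the presentation planner.
planner_input = {
    "communicative_goal": "provide-briefing",  # high-level goal handed to the planner
    "narrative_outline": outline_root,         # assumed variable: the tree built in the GUI (cf. Figure 4)
    "degree_of_detail": "high",                # presentation constraint
    "style": "agent-based",                    # versus plain audio presentation
    "output_format": "PowerPoint",             # or "SMIL"
}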
3.1 Basic Technology
To accomplish the tasks sketched above, we rely on our earlier work on
presentation planning. The essential idea of this approach is to formalize action
sequences for composing multimedia material and designing scripts for presenting
this material to the user as operators of a planning system. The header (i.e., effect) of
a planning operator refers to a complex communicative goal (e.g. to provide a
biography of a certain person) whereas the expressions of the body of the operator
indicate which acts have to be executed in order to achieve this goal (e.g. to show a
picture of that person).
The input goal to the presentation planner is a complex presentation goal, e.g. to
provide a briefing. To accomplish this goal, the planner looks for operators whose
headers subsume it. If such an operator is found, all expressions in the body of the
operator will be set up as new subgoals. During the decomposition process, a
presentation is built up by collecting and expanding the nodes of the user's narrative
outline. To accomplish this task, we start from an empty output file that is
continuously augmented with SMIL expressions or expressions in an XML-based
markup language that serves as an intermediate format for the PowerPoint slides to
be generated. The planning process terminates if all subgoals have been expanded to
elementary execution acts or have already been worked out by the user. Elementary
execution acts include retrieve or create goals as well as commands, such as
SAddString, that augment the output file with expressions in the required language.
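The decomposition process can be sketched as a recursive loop along the following lines; is_elementary, subsumes and execute are assumed helpers standing in for machinery the text does not spell out:

def plan(goal, operators, output):
    # Expand a communicative goal into elementary execution acts (sketch).
    if is_elementary(goal):            # retrieve/create goals or SAddString-style commands
        execute(goal, output)          # augments the output file in the required language
        return
    for op in operators:
        if subsumes(op.header, goal):  # find an operator whose header subsumes the goal
            for subgoal in op.body:    # all body expressions become new subgoals
                plan(subgoal, operators, output)
            return
    raise ValueError("no operator found for goal: %r" % (goal,))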
3.2 Narrative Planning
As mentioned above, the system can construct a narrative to accompany the briefing.
Narrative goals are generated to cover captions, running text, and segues. The
captions and running text are realized from information provided in the narrative
outline. In the case of create goals, the running text and captions may be easily
generated by the system, as a by-product of creating a media object. In the case of
retrieve goals, the objects retrieved may not have any meta-information, in which
case a default caption and running text are generated. For example, if a retrieve goal
associated with a picture lacked a user-supplied caption, the Narrative Planner
would use the narrative outline as the basis for constructing a caption. This
flexibility allows the system to fill in what the user left unspecified. Clearly, a
system's explanatory narrative will be enhanced by the availability of rich meta-
information.
The segues are realized by the system. For example, an object with a label slot
"A biography of Victor Polay" could result in a generated segue "Here is a
biography of Victor Polay". The Narrative Planner, when providing content for
narrative nodes, uses a variety of canned text patterns. For the above
example, the pattern would be "Here is ?label", where ?label is a variable whose
value is the label of the current node.
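Such pattern instantiation amounts to simple variable substitution, as in this sketch (the function itself is hypothetical; the ?label notation is the one used above):

def instantiate(pattern, bindings):
    # Replace each ?variable in a canned text pattern with its bound value (sketch).
    for var, value in bindings.items():
        pattern = pattern.replace("?" + var, value)
    return pattern

segue = instantiate("Here is ?label", {"label": "a biography of Victor Polay"})
# -> "Here is a biography of Victor Polay"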
Segue goals are generated automatically by the system and integrated into the
narrative outline. We always introduce a segue node at the beginning of the
presentation (called a preamble node), which provides a segue covering the root of
the narrative outline tree. A segue node is also produced at the end (called a coda).
(Both the preamble and the coda can, of course, be specified by the author if desired.)
For introducing intervening segue nodes, we use the following algorithm, based on
the distance between nodes and their height in the narrative outline tree. We
traverse the non-narrative leaves of the tree in their temporal order, evaluating each
pair of adjacent nodes A and B where A precedes B temporally. A segue is
introduced between nodes A and B if either (a) the maximum of the two distances from
A and B to their least common ancestor is greater than 3 nodes, or (b) the sum of the
two distances from A and B to the least common ancestor is greater than 4 nodes. This
is less intrusive than introducing segues at random or between every pair of
successive nodes, and appears to perform better than introducing a segue at each
depth of the tree.
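In code, the criterion reads roughly as follows; the tree accessors and the least_common_ancestor and add_segue_between helpers are assumed, and distances are counted in nodes, as in the text:

def distance_to(node, ancestor):
    # Number of steps from a leaf up to a given ancestor (sketch).
    d = 0
    while node is not ancestor:
        node = node.parent
        d += 1
    return d

def insert_segues(leaves):
    # Traverse non-narrative leaves in temporal order; compare adjacent pairs.
    for a, b in zip(leaves, leaves[1:]):       # a precedes b temporally
        lca = least_common_ancestor(a, b)      # assumed helper
        da, db = distance_to(a, lca), distance_to(b, lca)
        if max(da, db) > 3 or da + db > 4:     # the paper's two-part criterion
            add_segue_between(a, b)            # assumed helper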
In case an animated presenter is used, the Narrative Planner also decides on the
non-verbal communicative means to be employed. In particular, we make use of
facial expressions and body gestures to convey the discourse structure of the presentation and
the character's emotions; see also (Poggi et al., this volume) or (Theune et al., this
volume). Since a deliberate use of communicative means requires detailed
knowledge about the content of the presentation, it is hard to determine appropriate
non-verbal gestures for the presentation of content associated with retrieve and
create goals (which rely heavily on pre-authored multimedia material). The use of non-verbal
gestures for the realization of narrative goals depends essentially on their
type. For instance, segues that relate to a change in topic are accompanied by a shift
in posture; see also (Cassell et al. 2001).
Strategies for expanding the narrative content are coded by means of plan
operators. In the following, we present a simple plan operator which may be used to
work out a narrative goal, namely to present a new PowerPoint slide. The body
includes a communicative gesture as well as a canned text pattern which is
instantiated with the value for ?label and uttered by the character. It should be
obvious how this operator could be used to realize the segues for several of the
nodes in the narrative outline in Figure 4.
(define-plan-operator
:header (A0 (PresentPersona ?id ?SlideNumber))
:constraints (Label ?id ?label)
:inferiors ( (A1 (SAddPPCode ("PLAY OneHand;;")))
(A2 (SAddPPCode ("SAY Here is a " ?label ";;"))))
)
The presentation operators also specify which media type is used to realize a
certain goal. By default, we attempt to realize narrative goals as speech,
using rules on text length and truncatability to less than 250 bytes to decide when to
use text-to-speech. Captions are always realized as text (i.e., they have a text
realization and a possible audio realization).
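A sketch of this media-allocation rule; the 250-byte threshold comes from the text, while the truncatable helper is our assumption about how truncatability might be tested:

def media_type_for_narrative_goal(text):
    # Decide whether a narrative goal is realized as synthesized speech or as text (sketch).
    if len(text.encode("utf-8")) < 250 or truncatable(text, limit=250):  # assumed helper
        return "speech"   # realized via text-to-speech
    return "text"

# Captions, by contrast, always receive a text realization,
# with a possible additional audio realization.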
3.3 Layout Generation
As described above, we are able to generate both SMIL and PowerPoint
presentations. The generation of PowerPoint presentations starts from an XML-based
markup language that represents both the contents of a PowerPoint slide
and the accompanying comments and gestures by the presenter. The XML
representation is then converted into PowerPoint presentations using
Bridge2Java from IBM alphaWorks, a tool that allows Java programs to
communicate with ActiveX objects. While the material of a presentation appears on
the slide pages, suggestions regarding its presentation are included in the notes
pages. Essentially, these suggestions are related to AutoBriefer's narrative goals.
The spatial layout of the slides is given by the corresponding PowerPoint style sheet
and can be directly inferred from the hierarchical structure indicated by the
human author.
When generating SMIL presentations, we also allow for dynamic media, such as
video and audio, all of which have to be displayed in a spatially and temporally
coordinated manner and need to be synchronized with the agents' communicative
gestures. For instance, a character may point to a certain object in a video while it is
being played. In order to relieve the user of the burden of completely working out a
spatial and temporal layout, the following extensions of the presentation planner
have become necessary (Andre and Rist 1996):
• The formulation of temporal and spatial constraints in the plan operators
We distinguish between metric and qualitative constraints. Metric temporal
and spatial constraints appear as metric (in-)equalities, e.g. (5 ≤ Bottom
VideoRegion − Top VideoRegion). Qualitative temporal constraints are
represented in an "Allen-style" fashion, which allows for the specification of
thirteen temporal relationships between two named intervals, e.g.
(SpeakInterval (During) PointInterval); see the sketch following this list.
Qualitative spatial constraints are represented by a set of topological
relations, such as LeftOf, CenterHor or TopAlign. Note that the plan
operators include general temporal constraints that should be satisfied for a
certain document type, while the temporal constraints the user specifies with
the GUI refer to a particular document instance.
• The development of a mechanism for designing a spatial and temporal
layout
We collect all spatial and temporal constraints during the presentation
planning process, combine them with the temporal constraints specified by
the user for the document to be generated, and use the incremental constraint-solving
toolkit Cassowary (Borning et al. 1997) to determine a consistent
spatial and temporal layout, which is then represented as a SMIL document.
In case of a conflict between the constraints specified by the GUI user and
the constraints specified by the author of the plan operators, the user's
constraints are given preference, but a warning is produced.
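As an illustration of the first extension, a qualitative Allen relation can be reduced to metric inequalities over interval endpoints before solving; this sketch uses plain tuples rather than the actual Cassowary API:

def during(inner, outer):
    # Allen's During: inner lies strictly inside outer (sketch).
    # Returns endpoint inequalities of the kind collected during planning
    # and passed, together with the user's GUI constraints, to the solver.
    return [
        (outer.begin, "<", inner.begin),  # outer starts before inner starts
        (inner.end, "<", outer.end),      # inner ends before outer ends
    ]

# e.g. (SpeakInterval (During) PointInterval) yields two such inequalities.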
In the following, we present a plan operator which may be used to synchronize
text, spoken voice and video. The body of the plan operator contains a complex
action (PresentVideo) that requires further expansion and two elementary actions
(SAddSpeech and SAddSmilCode). SAddSmilCode will extend the SMIL output
file with an animation that visualizes the talking. SAddSpeech will cause the
Festival speech synthesizer (Taylor et al. 1998) to generate an audio file for ?text
which will be included into the SMIL output file as well.
The title should be visible while the comment is spoken and the video is run. In
particular, the verbal comment starts one time unit after the display of the title, which
precedes the video. After the video is played, the title remains visible for one more
time unit. The spatial constraints specify where the regions for the title A1 and the
video sequence A3 should be located with respect to the region corresponding to the
superior presentation segment A0.
(define-plan-operator
:header (A0 (PresentSegment ?segment))
:constraints (*and* (BELP (Title ?segment ?title))
(BELP (Comment ?segment ?comment))
(BELP (Video ?segment ?video)))
:inferiors ((A1 (SAddSmilCode (?title)))
            (A2 (SAddSpeech Festival ?comment))
            (A3 (PresentVideo (?video))))
:temporal ((1 <= Begin A2 - Begin A1 <= 1)
           (A2 (Before) A3)
           (1 <= End A1 - End A3 <= 1))
:spatial ((CenterHor A1)
          (AlignTop A1)
          (CenterHor A3)
          (1 <= Bottom A1 - Top A3 <= 1)))
Note that the plan operators do not necessarily represent a complete specification
of the temporal and spatial layout of the resulting presentation, but only include
specific design constraints. For instance, the plan operator above does not specify
how long the title should be displayed, but only that it should start one time unit
earlier and end one time unit later than the video. The advantage of our approach is
that a consistent schedule is generated automatically on the basis of the constraints
specified by the author of the plan operators and the user of the GUI shown in
Figure 2.
4. EXAMPLE OUTPUT
In the following, we will present two different multimedia briefings that may be
generated (among many other variations) from one and the same narrative outline: a
PowerPoint slide show that makes use of an animated character and a SMIL-based
briefing that just employs synthesized audio.
We start from the narrative outline discussed in Section 2. It includes two create
goals, involving agents that carry out summarization functions: a multi-document
summarizer (Mani and Bloedorn 1999) and a biographical summarizer (Schiffman et
al. 2001).
Figure 5 shows the XML-representation of the first slide of the PowerPoint
presentation, which was created automatically by the Narrative Planner.
Figure 5. An Introductory Slide Represented by the Narrative Planner in XML
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE Presentation SYSTEM "presentation.dtd">
<Presentation>
<Slide>
<Title>Peru Action Brief</Title>
<Content>
<S>Preamble</S>
<S>Situation Assessment
<S> <S>Chronology of Events</S>
<S>Biographies</S> </S>
</S>
<S>Coda</S>
</Content>
<Persona>
SHOW kaufmann, Kaufmann4.31.acs, 1,69;;
PLAY Tie;;
PLAY Normal;;
PLAY OneHand;;
SAY In this briefing I will go over the situation assessment.;;
PLAY Normal;;
SAY This will cover an overview of the chronology of events and
a biography of Victor Polay.;;
PLAY SpeakGiveTurn;;
SLIDE 2=SAY Next slide please.
</Persona>
</Slide>
...
</Presentation>
The narrative goals include both natural language utterances and the
corresponding gestures to illustrate the contents of the corresponding slide. The
final PowerPoint presentation generated is shown in Figure 6: screen
dumps of the six PowerPoint segments produced, which also include a virtual
speaker. To enable this style of presentation, we make use of Microsoft Narrator, a
tool that allows for the enhancement of PowerPoint slides with Microsoft Agents
that narrate and automatically advance through each slide.
Figure 6. PowerPoint Presentation. [Screen dumps of the six generated segments; the virtual speaker's narration reads:]
"In this briefing I will go over the situation assessment. This will cover an overview of the chronology of events and a biography of Victor Polay."
"Here is the latest document summary."
"Here is an overview of the chronology of events."
"Next, a biography of Victor Polay."
"Victor Polay, also known as Comandante Rolando, is the Tupac Amaru founder, a Peruvian guerrilla commander, a former rebel leader, and the Tupac Amaru rebels' top leader. He studied in both France and Spain. His wife is Rosa Polay and his mother is Otilia Campos de Polay. His associates include Alan Garcia."
"This briefing has assessed aspects of the situation in Peru. Overall, the crisis appears to be worsening."
The same narrative outline can be used to plan a presentation in SMIL. The
corresponding presentation generated in SMIL is shown in Figure 7.
Figure 7. SMIL Presentation. [Screen dump; the synthesized narration reads:]
"In this briefing I will go over the situation assessment. This will cover an overview of the chronology of events and a biography of Victor Polay."
"Here is an overview of the chronology of events."
Mani, I. and Bloedorn, E. (1999). Summarizing Similarities and Differences Among Related Documents. Information Retrieval 1(1), 35-67.
Mani, I., Gates, B., and Bloedorn, E. (1999). Improving Summaries by Revising Them. Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, College Park, MD, 558-565.
Mittal, V., Roth, S., Moore, J., Mattis, J., and Carenini, G. (1995). Generating Explanatory Captions for Information Graphics. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI'95), 1276-1283.
Nagao, K. and Hasida, K. (1998). Automatic Text Summarization Based on the Global Document Annotation. Proceedings of COLING'98, Montreal, 917-921.
ORATRIX (2002). www.oratrix.com
Piwek, P., Power, R., Scott, D., and van Deemter, K. (2003). Generating Multimedia Presentations: From Plain Text to Screenplay, this volume.
Poggi, I., Pelachaud, C., de Rosis, F., De Carolis, B., and Carofiglio, V. (2003). Greta: A Believable Intelligent Embodied Conversational Agent, this volume.
REAL (2002). www.real.com
Schiffman, B., Mani, I., and Concepcion, K. (2001). Producing Biographical Summaries: Combining Linguistic Knowledge with Corpus Statistics. Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL'2001), 450-457.
SMIL (2002). Synchronized Multimedia Integration Language. www.w3.org/AudioVideo/
Taylor, P., Black, A., and Caley, R. (1998). The Architecture of the Festival Speech Synthesis System. Proceedings of the Third ESCA Workshop on Speech Synthesis, Jenolan Caves, Australia, 147-151.
Theune, M., Heylen, D., and Nijholt, A. (2003). Generating Embodied Information Presentations, this volume.
Wahlster, W., Andre, E., Finkler, W., Profitlich, H.-J., and Rist, T. (1993). Plan-Based Integration of Natural Language and Graphics Generation. AI Journal, 63:387-427.