animation, narration, and multimedia synchronization of the briefing, the system
should be able to plan and synthesize the actual presentation, which the user can
then post-edit.
As an example, consider a user putting together a briefing about a particular
individual, say Victor Polay, a leader of Peru's Tupac Amaru movement. A person
preparing this briefing by hand (using, say, Microsoft PowerPoint) would need to
plan the outline, and assemble the various textual and multimedia elements that
constitute the core content. This latter activity could involve searching, doing
background research, collecting existing material, drafting new text, etc. The person
must also design or select a template for the overall briefing, as well as the layout of
individual slides, and then produce each slide individually, realizing the content of
each slide by embedding or linking the media elements on it, specifying the
animation strategies, and then rehearsing and refining the briefing. Finally, the
person has to present the briefing in front of the intended audience(s).
In this paper, we report on a system called AutoBriefer where the author of the
briefing interacts with a system via a graphical user interface (GUI) to specify some
of the briefing content. As before, the user can select or create an outline. However,
the user doesn't have to produce each slide and realize the media elements on it.
Instead, the AutoBriefer GUI allows the user to select and associate the media
elements directly with the outline. In doing so, the user may specify the temporal
layout of the content. Moreover, instead of using pre-existing media elements, the
GUI also allows the user to direct personalized information agents to assemble
media elements from existing (possibly distributed) databases or text collections.
The user can then leave the rest of the briefing preparation to the system, by asking
AutoBriefer to generate a PowerPoint briefing. The user thereby allows the
system to plan how many slides are needed, and to plan and realize the media
elements on each slide, synchronizing them spatially and temporally. Further, the
user doesn't need to present the briefing; the system itself can generate a spoken
narrative for the briefing, synchronizing the narrative elements it introduces with the
other media elements, producing an animated briefing as a result.
One challenge in automating this process involves automatically selecting and
assembling media elements; here AutoBriefer exploits new technologies for
automated summarization. In order to generate a narrative automatically, the system
needs knowledge about the domain objects; here, AutoBriefer provides scalability
across domains by allowing the user to specify or edit domain knowledge in the
form of meta-information attributes associated with media elements. This ability to
specify and edit domain knowledge in turn addresses an important bottleneck in
current multimedia presentation strategies, which are difficult to adapt across
domains. AutoBriefer leverages the ability to edit domain knowledge along with
declarative presentation planning strategies to synthesize a narrated multimedia
briefing in various presentation formats.
2. APPROACH
The design of AutoBriefer is different from most multimedia presentation planners
in the way the problem is decomposed. An easy-to-use GUI is used to specify high-level structural and temporal constraints for arranging domain objects in a
presentation. Information associated with these domain objects is used to generate a
narrative outline. This narrative outline is then fed as data along with a high-level
communicative goal to a presentation planning component (PPP). The planner
chooses a sequence of plan operators to realize the goals associated with the outline.
The plan operators constitute a declarative representation of presentation strategies
for deciding on narrator animation and on spatial and temporal synchronization.
The system architecture is shown in Figure 1. The user interacts with the system
via a GUI using a simple point-and-click interface. She may load in a domain model
of existing domain objects, for possible editing, or she may create new domain
objects using information agents.
Figure 1. System Architecture. [Diagram omitted; it connects the User and the Briefing GUI / User Interface, the Domain Knowledge Agents, the Narrative Outline and its XML2Prop Converter (yielding a Propositional Narrative Outline), the Narrative Planner, the Layout Generator, and the Briefing Generator, whose SMIL or PowerPoint XML output plays in a Multimedia Player or PowerPoint.]
As shown in Figure 2, the user arranges the objects along two dimensions. The
first, the structural dimension, allows objects to be nested under others to create a
hierarchical structure. This is shown in the left frame, which uses a directory-browser
interaction metaphor. The user here has chosen to specify the goal of
retrieving an existing media object, namely a picture of "Polay". The user has also
specified goals for the creation of a summary of events and a biography; the execution of
these goals, of type "create", involves the use of information agents. Finally, the user
has also included a particular audio file as a coda to end the briefing.
The second, the temporal dimension, allows objects to be ordered temporally. This is
shown in the right frame: objects from the left frame can be dropped into numbered
temporal slots in the right frame (the system lays them out, by default, based on the
outline).
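To make these two dimensions concrete, here is a minimal sketch in Python of how such an outline could be represented; the class and field names are our own illustrative assumptions, not AutoBriefer's actual data model.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class OutlineNode:
    # Hypothetical representation of a briefing outline node.
    label: str                                  # e.g. "A File Photo of Victor Polay"
    goal: Optional[str] = None                  # a create(...) or retrieve(...) goal, if any
    children: List["OutlineNode"] = field(default_factory=list)  # structural dimension
    temporal_slot: Optional[int] = None         # numbered slot in the temporal ordering

photo = OutlineNode("A File Photo of Victor Polay",
                    goal='retrieve("D:rawdata/polay.jpg")', temporal_slot=2)
profile = OutlineNode("Profile of Victor Polay", children=[photo])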
Figure 2. Briefing GUI
Each object can be edited by filling in meta-information in the form of values for
slots (Figure 3). Here the user has typed in a caption, and has required that the
system generate the running text.
Figure 3. Editing an Object
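A minimal sketch of such a slot/value record, assuming hypothetical slot names (the text specifies only that a caption was typed in and that running text should be generated):

# Hypothetical meta-information slots for the object edited in Figure 3.
meta_information = {
    "caption": "A File Photo of Victor Polay",  # typed in by the user
    "generate-running-text": True,              # the system should generate the running text
}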
Figure 4. Narrative Outline
Peru Action Brief
1 Preamble
2 Situation Assessment
2.1 Chronology of Events
2.1.2 Latest Document Summary
create("summarize-generic compression .10/peru/p32.txt")
2.2 Biographies
2.2.1 Profile of Victor Polay
2.2.1.1 A File Photo of Victor Polay
retrieve("D:rawdata/polay.jpg")
2.2.1.2 Profile of Victor Polay
create("summarize -bio -length 350 -span multi -person Victor Polay -out table/peru/'*'")
3 Coda
"This briefing has assessed aspects of the situation in Peru. Overall, the crisis appears to be worsening."
<seq>
<par>1</par>
<par>2.1.2</par>
<par>2.2.1.1 2.2.1.2</par>
<par>3</par>
</seq>
The trailing <seq>/<par> block records the user-specified temporal layout in SMIL style: the items play in sequence, with 2.2.1.1 and 2.2.1.2 grouped to play in parallel.
3. INTELLIGENT MULTIMEDIA PRESENTATION PLANNING
The author of a briefing may choose to flesh out as much of the narrative as desired.
For instance, she may indicate exactly how a certain image should be labeled, but is
not required to do so. Likewise, she may specify certain narrative goals herself, or
she may leave it entirely to the Narrative Planner component to introduce them. If
domain objects should appear in a certain order in the final presentation, she may
also provide the desired temporal ordering relations; note, however, that the
complete temporal layout need not be specified by the user.
the user. In addition to the narrative outline, the user may input a number of
presentation constraints, such as the degree of detail, the style of presentation (e.g.
agent-based versus audio presentation) or the output format, e.g., SMIL or
PowerPoint. The task of the presentation planner is then to work out a presentation
based on the user's input. This includes the realization of the create, retrieve and
narrative goals as well as the determination of the spatial and temporal layout. The
result of the presentation planning is a document in the required format.
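As a sketch, the planner's overall input could be summarized as follows; the parameter names are hypothetical, while the values correspond to the options named above:

# Hypothetical input bundle for the presentation planner.
planner_input = {
    "communicative_goal": "provide-briefing",  # high-level goal handed to the planner
    "narrative_outline": outline_root,         # assumed variable: the tree built in the GUI (cf. Figure 4)
    "degree_of_detail": "high",                # presentation constraint
    "style": "agent-based",                    # versus plain audio presentation
    "output_format": "PowerPoint",             # or "SMIL"
}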
3.1 Basic Technology
To accomplish the tasks sketched above, we rely on our earlier work on
presentation planning. The essential idea of this approach is to formalize action
sequences for composing multimedia material and designing scripts for presenting
this material to the user as operators of a planning system. The header (i.e., effect) of
a planning operator refers to a complex communicative goal (e.g. to provide a
biography of a certain person) whereas the expressions of the body of the operator
indicate which acts have to be executed in order to achieve this goal (e.g. to show a
picture of that person).
The input goal to the presentation planner is a complex presentation goal, e.g. to
provide a briefing. To accomplish this goal, the planner looks for operators whose
headers subsume it. If such an operator is found, all expressions in the body of the
operator will be set up as new subgoals. During the decomposition process, a
presentation is built up by collecting and expanding the nodes of the user's narrative
outline. To accomplish this task, we start from an empty output file that is
continuously augmented with SMIL expressions or expressions in an XML-based
markup language that serves as an intermediate format for the PowerPoint slides to
be generated. The planning process terminates if all subgoals have been expanded to
elementary execution acts or have already been worked out by the user. Elementary
execution acts include retrieve or create goals as well as commands, such as
SAddString, that augment the output file with expressions in the required language.
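The decomposition process can be sketched as a recursive loop along the following lines; is_elementary, subsumes and execute are assumed helpers standing in for machinery the text does not spell out:

def plan(goal, operators, output):
    # Expand a communicative goal into elementary execution acts (sketch).
    if is_elementary(goal):            # retrieve/create goals or SAddString-style commands
        execute(goal, output)          # augments the output file in the required language
        return
    for op in operators:
        if subsumes(op.header, goal):  # find an operator whose header subsumes the goal
            for subgoal in op.body:    # all body expressions become new subgoals
                plan(subgoal, operators, output)
            return
    raise ValueError("no operator found for goal: %r" % (goal,))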
3.2 Narrative Planning
As mentioned above, the system can construct a narrative to accompany the briefing.
Narrative goals are generated to cover captions, running text, and segues. The
captions and running text are realized from information provided in the narrative
outline. In the case of create goals, the running text and captions may be easily
generated by the system, as a by-product of creating a media object. In the case of
retrieve goals, the objects retrieved may not have any meta-information, in which
case a default caption and running text are generated. For example, if a retrieve goal
associated with a picture lacked a user-supplied caption, the Narrative Planner
would use the narrative outline as the basis for constructing a caption. This
flexibility allows the system to fill in what the user left unspecified. Clearly, a
system's explanatory narrative will be enhanced by the availability of rich meta-
information.
The segues are realized by the system. For example, an object with a label slot
"A biography of Victor Polay" could result in a generated segue "Here is a
biography of Victor Polay". The Narrative Planner, when providing content for
narrative nodes, uses a variety of canned text patterns. For the above
example, the pattern would be "Here is ?label", where ?label is a variable whose
value is the label of the current node.
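Such pattern instantiation amounts to simple variable substitution, as in this sketch (the function itself is hypothetical; the ?label notation is the one used above):

def instantiate(pattern, bindings):
    # Replace each ?variable in a canned text pattern with its bound value (sketch).
    for var, value in bindings.items():
        pattern = pattern.replace("?" + var, value)
    return pattern

segue = instantiate("Here is ?label", {"label": "a biography of Victor Polay"})
# -> "Here is a biography of Victor Polay"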
Segue goals are generated automatically by the system and integrated into the
narrative outline. We always introduce a segue node at the beginning of the
presentation (called a preamble node), which provides a segue covering the root of
the narrative outline tree. A segue node is also produced at the end (called a coda).
(Both the preamble and the coda can, of course, be specified by the author if desired.)
For introducing intervening segue nodes, we use the following algorithm, based on
the distance between nodes and their height in the narrative outline tree. We
traverse the non-narrative leaves of the tree in their temporal order, evaluating each
pair of adjacent nodes A and B where A precedes B temporally. A segue is
introduced between nodes A and B if either (a) the maximum of the two distances from
A and B to their least common ancestor is greater than 3 nodes, or (b) the sum of the
two distances from A and B to the least common ancestor is greater than 4 nodes. This
is less intrusive than introducing segues at random or between every pair of
successive nodes, and appears to perform better than introducing a segue at each
depth of the tree.
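In code, the criterion reads roughly as follows; the tree accessors and the least_common_ancestor and add_segue_between helpers are assumed, and distances are counted in nodes, as in the text:

def distance_to(node, ancestor):
    # Number of steps from a leaf up to a given ancestor (sketch).
    d = 0
    while node is not ancestor:
        node = node.parent
        d += 1
    return d

def insert_segues(leaves):
    # Traverse non-narrative leaves in temporal order; compare adjacent pairs.
    for a, b in zip(leaves, leaves[1:]):       # a precedes b temporally
        lca = least_common_ancestor(a, b)      # assumed helper
        da, db = distance_to(a, lca), distance_to(b, lca)
        if max(da, db) > 3 or da + db > 4:     # the paper's two-part criterion
            add_segue_between(a, b)            # assumed helper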
In case an animated presenter is used, the Narrative Planner also decides on the
non-verbal communicative means to be employed. In particular, we make use of
facial expressions and body gestures to convey the discourse structure of the presentation and
the character's emotions; see also (Poggi et al., this volume) or (Theune et al., this
volume). Since a deliberate use of communicative means requires detailed
knowledge about the content of the presentation, it is hard to determine appropriate
non-verbal gestures for the presentation of content associated with retrieve and
create goals (which rely heavily on pre-authored multimedia material). The use of non-verbal
gestures for the realization of narrative goals depends essentially on their
type. For instance, segues that relate to a change in topic are accompanied by a shift
in posture; see also (Cassell et al. 2001).
Strategies for expanding the narrative content are coded by means of plan
operators. In the following, we present a simple plan operator which may be used to
work out a narrative goal, namely to present a new PowerPoint slide. The body
includes a communicative gesture as well as a canned text pattern which is
instantiated with the value for ?label and uttered by the character. It should be
obvious how this operator could be used to realize the segues for several of the
nodes in the narrative outline in Figure 4.
(define-plan-operator
:header (A0 (PresentPersona ?id ?SlideNumber))
:constraints (Label ?id ?label)
:inferiors ( (A1 (SAddPPCode ("PLAY OneHand;;")))
(A2 (SAddPPCode ("SAY Here is a " ?label ";;"))))
)
The presentation operators also specify which media type is used to realize a
certain goal. By default, we attempt to realize narrative goals as speech,
using rules on text length and truncatability to less than 250 bytes to decide when to
use text-to-speech. Captions are always realized as text (i.e., they have a text
realization and a possible audio realization).
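A sketch of this media-allocation rule; the 250-byte threshold comes from the text, while the truncatable helper is our assumption about how truncatability might be tested:

def media_type_for_narrative_goal(text):
    # Decide whether a narrative goal is realized as synthesized speech or as text (sketch).
    if len(text.encode("utf-8")) < 250 or truncatable(text, limit=250):  # assumed helper
        return "speech"   # realized via text-to-speech
    return "text"

# Captions, by contrast, always receive a text realization,
# with a possible additional audio realization.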
3.3 Layout Generation
As described above, we are able to generate both SMIL and PowerPoint
presentations. The generation of PowerPoint presentations starts from an XML-based
markup language that represents both the contents of a PowerPoint slide
and the accompanying comments and gestures by the presenter. The XML
representation is then converted into PowerPoint presentations using
Bridge2Java from IBM alphaWorks, a tool that allows Java programs to
communicate with ActiveX objects. While the material of a presentation appears on
the slide pages, suggestions regarding its presentation are included in the notes
pages. Essentially, these suggestions are related to AutoBriefer's narrative goals.
The spatial layout of the slides is given by the corresponding PowerPoint style sheet
and can be directly inferred from the hierarchical structure indicated by the
human author.
When generating SMIL presentations, we also allow for dynamic media, such as
video and audio, all of which have to be displayed in a spatially and temporally
coordinated manner and need to be synchronized with the agents' communicative
gestures. For instance, a character may point to a certain object in a video while it is
being played. In order to relieve the user of the burden of completely working out a
spatial and temporal layout, the following extensions of the presentation planner
have become necessary (Andre and Rist 1996):
• The formulation of temporal and spatial constraints in the plan operators
We distinguish between metric and qualitative constraints. Metric temporal
and spatial constraints appear as metric (in-)equalities, e.g. (5 ≤ Bottom
VideoRegion − Top VideoRegion). Qualitative temporal constraints are
represented in an "Allen-style" fashion, which allows for the specification of
thirteen temporal relationships between two named intervals, e.g.
(SpeakInterval (During) PointInterval); see the sketch following this list.
Qualitative spatial constraints are represented by a set of topological
relations, such as LeftOf, CenterHor or TopAlign. Note that the plan
operators include general temporal constraints that should be satisfied for a
certain document type, while the temporal constraints the user specifies with
the GUI refer to a particular document instance.
• The development of a mechanism for designing a spatial and temporal
layout
We collect all spatial and temporal constraints during the presentation
planning process, combine them with the temporal constraints specified by
the user for the document to be generated, and use the incremental constraint-solving
toolkit Cassowary (Borning et al. 1997) to determine a consistent
spatial and temporal layout, which is then represented as a SMIL document.
In case of a conflict between the constraints specified by the GUI user and
the constraints specified by the author of the plan operators, the user's
constraints are given preference, but a warning is produced.
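As an illustration of the first extension, a qualitative Allen relation can be reduced to metric inequalities over interval endpoints before solving; this sketch uses plain tuples rather than the actual Cassowary API:

def during(inner, outer):
    # Allen's During: inner lies strictly inside outer (sketch).
    # Returns endpoint inequalities of the kind collected during planning
    # and passed, together with the user's GUI constraints, to the solver.
    return [
        (outer.begin, "<", inner.begin),  # outer starts before inner starts
        (inner.end, "<", outer.end),      # inner ends before outer ends
    ]

# e.g. (SpeakInterval (During) PointInterval) yields two such inequalities.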
In the following, we present a plan operator which may be used to synchronize
text, spoken voice and video. The body of the plan operator contains a complex
action (PresentVideo) that requires further expansion and two elementary actions
(SAddSpeech and SAddSmilCode). SAddSmilCode will extend the SMIL output
file with an animation that visualizes the talking. SAddSpeech will cause the
Festival speech synthesizer (Taylor et al. 1998) to generate an audio file for ?text
which will be included into the SMIL output file as well.
The title should be visible while the comment is spoken and the video is run. In
particular, the verbal comment starts one time unit after the display of the title, which
precedes the video. After the video is played, the title remains visible for one more
time unit. The spatial constraints specify where the regions for the title A1 and the
video sequence A3 should be located with respect to the region corresponding to the
superior presentation segment A0.
(define-plan-operator
:header (A0 (PresentSegment ?segment))
:constraints (*and* (BELP (Title ?segment ?title))
(BELP (Comment ?segment ?comment))
(BELP (Video ?segment ?video)))
:inferiors ((A1 (SAddSmilCode (?title)))
            (A2 (SAddSpeech Festival ?comment))
            (A3 (PresentVideo (?video))))
:temporal ((1 <= Begin A2 - Begin A1 <= 1)
           (A2 (Before) A3)
           (1 <= End A1 - End A3 <= 1))
:spatial ((CenterHor A1)
          (AlignTop A1)
          (CenterHor A3)
          (1 <= Bottom A1 - Top A3 <= 1)))
Note that the plan operators do not necessarily represent a complete specification
of the temporal and spatial layout of the resulting presentation, but only include
specific design constraints. For instance, the plan operator above does not specify
how long the title should be displayed, but only that it should start one time unit
earlier and end one time unit later than the video. The advantage of our approach is
that a consistent schedule is generated automatically on the basis of the constraints
specified by the author of the plan operators and the user of the GUI shown in
Figure 2.
4. EXAMPLE OUTPUT
In the following, we will present two different multimedia briefings that may be
generated (among many other variations) from one and the same narrative outline: a
PowerPoint slide show that makes use of an animated character and a SMIL-based
briefing that just employs synthesized audio.
We start from the narrative outline discussed in Section 2. It includes two create
goals, involving agents that carry out summarization functions: a multi-document
summarizer (Mani and Bloedorn 1999) and a biographical summarizer (Schiffman et
al. 2001).
Figure 5 shows the XML-representation of the first slide of the PowerPoint
presentation, which was created automatically by the Narrative Planner.
Figure 5. An Introductory Slide Represented by the Narrative Planner in XML
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE Presentation SYSTEM "presentation.dtd">
<Presentation>
<Slide>
<Title>Peru Action Brief</Title>
<Content>
<S>Preamble</S>
<S>Situation Assessment
<S> <S>Chronology of Events</S>
<S>Biographies</S> </S>
</S>
<S>Coda</S>
</Content>
<Persona>
SHOW kaufmann, Kaufmann4.31.acs, 1,69;;
PLAY Tie;;
PLAY Normal;;
PLAY OneHand;;
SAY In this briefing I will go over the situation assessment.;;
PLAY Normal;;
SAY This will cover an overview of the chronology of events and
a biography of Victor Polay.;;
PLAY SpeakGiveTurn;;
SLIDE 2=SAY Next slide please.
</Persona>
</Slide>
...
</Presentation>
The narrative goals include both natural language utterances and the
corresponding gestures to illustrate the contents of the corresponding slide. The
final PowerPoint presentation generated is shown in Figure 6: screen
dumps of the six PowerPoint segments produced, which also include a virtual
speaker. To enable this style of presentation, we make use of Microsoft Narrator, a
tool that allows for the enhancement of PowerPoint slides with Microsoft Agents
that narrate and automatically advance through each slide.
Figure 6. PowerPoint Presentation. [Screen dumps of the six generated segments; the virtual speaker's narration reads:]
"In this briefing I will go over the situation assessment. This will cover an overview of the chronology of events and a biography of Victor Polay."
"Here is the latest document summary."
"Here is an overview of the chronology of events."
"Next, a biography of Victor Polay."
"Victor Polay, also known as Comandante Rolando, is the Tupac Amaru founder, a Peruvian guerrilla commander, a former rebel leader, and the Tupac Amaru rebels' top leader. He studied in both France and Spain. His wife is Rosa Polay and his mother is Otilia Campos de Polay. His associates include Alan Garcia."
"This briefing has assessed aspects of the situation in Peru. Overall, the crisis appears to be worsening."
The same narrative outline can be used to plan a presentation in SMIL. The
corresponding presentation generated in SMIL is shown in Figure 7.
Figure 7. SMIL Presentation. [Screen dump; the synthesized narration reads:]
"In this briefing I will go over the situation assessment. This will cover an overview of the chronology of events and a biography of Victor Polay."
"Here is an overview of the chronology of events."
Mani, I. and Bloedorn, E. (1999). Summarizing Similarities and Differences Among Related Documents. Information Retrieval 1(1), 35-67.
Mani, I., Gates, B., and Bloedorn, E. (1999). Improving Summaries by Revising Them. Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, College Park, MD, 558-565.
Mittal, V., Roth, S., Moore, J., Mattis, J., and Carenini, G. (1995). Generating Explanatory Captions for Information Graphics. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI'95), 1276-1283.
Nagao, K. and Hasida, K. (1998). Automatic Text Summarization Based on the Global Document Annotation. Proceedings of COLING'98, Montreal, 917-921.
ORATRIX (2002). www.oratrix.com
Piwek, P., Power, R., Scott, D., and van Deemter, K. (2003). Generating Multimedia Presentations: From Plain Text to Screenplay, this volume.
Poggi, I., Pelachaud, C., de Rosis, F., De Carolis, B., and Carofiglio, V. (2003). Greta: A Believable Intelligent Embodied Conversational Agent, this volume.
REAL (2002). www.real.com
Schiffman, B., Mani, I., and Concepcion, K. (2001). Producing Biographical Summaries: Combining Linguistic Knowledge with Corpus Statistics. Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL'2001), 450-457.
SMIL (2002). Synchronized Multimedia Integration Language. www.w3.org/AudioVideo/
Taylor, P., Black, A., and Caley, R. (1998). The Architecture of the Festival Speech Synthesis System. Proceedings of the Third ESCA Workshop on Speech Synthesis, Jenolan Caves, Australia, 147-151.
Theune, M., Heylen, D., and Nijholt, A. (2003). Generating Embodied Information Presentations, this volume.
Wahlster, W., Andre, E., Finkler, W., Profitlich, H.-J., and Rist, T. (1993). Plan-Based Integration of Natural Language and Graphics Generation. AI Journal, 63:387-427.