Automating book covers w/ XML

Steve Kotrch
Director of Publishing Technology
steve.kotrch@simonandschuster.com
Twitter: steveko

Simon & Schuster

1

Automating
Book Covers with
MarkLogic

2

What we’ll cover
• Simon & Schuster, a Trade publisher
• The problem with the cover workflow
• The solution, “Cover Automation.” We’re
going to explore how MarkLogic Server,
together with InDesign’s XML capabilities,
helps us make this happen.
• Where it works, where it doesn’t. Where it
might work in the future.

3

Point 3 is of course the meat of this presentation.

Simon & Schuster
Trade Book Publisher

4

Fiction, non-fiction, audio books, children’s books, ebooks
Roughly 2000 titles/year
In business since 1924

Trade Publishing
• Finance: venture capital for authors
• Content creation
• Prepare content for publication
• Manufacturing
• Distribution
• Marketing
• Sales
5

What it means to be a trade book publisher. We pay authors to write books, much as a venture
capitalist would. We don’t create our own content; that’s for the authors to do. We do prepare
content for publication: we copyedit, design, and create marketing material (and a book cover
these days is marketing, like packaging for any product). We arrange for manufacturing. We
distribute, market, and sell books.

Technology

• .NET development environment
• SQL Server on the back end
• MarkLogic for content management

6

In terms of technology, the infrastructure is mostly Microsoft, with a .NET development
environment. We use SQL Server for our databases, including business-critical systems. And
of course MarkLogic for content management.
So why MarkLogic?

Digital Warehouse
• Scanning of 15,000 books
• Printable, searchable PDF; XML; OEBPS
• Rights database
• Scan contracts
• Enter rights information into a database
• Data Distribution
• XML, binaries

7

Six years ago or more: future-proof content. >15,000 titles. POD PDF (search), OEBPS, XML.
80 years of publishing: >15,000 titles, chosen for marketability & rights. => scan author
contracts & create rights database.
ML Server to help w/ distribution of content & metadata: in XML, to multiple destinations,
each with unique requirements.
First implementation: store scanned, OCR’d contracts.

Digital Warehouse

Diagram labels: TMM / Product Database (data); Chuckwalla (DAM); MarkLogic Server (XML);
the Internet: distribution platforms, retailers, search engines.

8

Now our Syndication Server, as we call it, is part of a 3-pronged approach: structured data
(SQL Server); binary data (DAM); XML data (MarkLogic Server). Now book XML is used not just
for future-proofing, but also for search and for extracting sample chapters.

Key to Online Marketing:
Search

9

Why search is important, and not just for books. 19% of all online retail is done through
Amazon.com.
Completeness and quality of information get top hits. This is a 10-year-old book, still the
top hit for this topic because of the completeness and quality of its information.

Current ML “Servers”

• Contracts (Executed: Scanned, OCR’d)
• Syndication
• Cover Automation
• Content Enrichment (soon)

10

I talked about three uses to which we’ve put MarkLogic at S&S, and later I’ll discuss a fourth
one which will be coming online soon, along with other plans we have for the technology.

Cover Automation

11

Let’s turn back to the main topic of this talk. Mentioned before: Not STM, not Educational. On
the XML FIRST value graph, down near zero. But here I found a compelling case for creating
an XML-based publishing system.

12

Here is the jacket for a pretty successful hardcover book, Sing You Home by Jodi Picoult. Let’s look at
the problems involved in developing book covers like this one.
The file is headed for manufacturing.
Book covers in today’s marketplace are marketing: packaging, the representation to the world of that
product. (A couple of slides ago we saw images representing the products that S&S publishes. You probably
didn’t even give it a thought that these were pictures of book covers.) “Placeholder image” => 10%
bump in pre-publication sales.
Just the right look, and just the right words.
One impetus to developing the system: print = online. Another revolves around two issues . . . <click>

Two Issues
Multiple formats Multiple inputs

13

14

One problem facing our designers is that for a publisher the size of Simon & Schuster, the
books come in a lot of different sizes, various formats, and from various imprints.
Imprints are profit units, each of which has a slightly different outlook on the world and is likely to
be run by someone with a healthy ego and strong opinions, resulting in each having a slightly
different layout. All these different layouts and sizes pose a problem for designers and for the
workflow: getting clear, definitive, and accurate information to the designers about what they’re
supposed to create.

Two Issues
Multiple formats Multiple inputs

15

EMAILS

WORD FILES ON A FILE SERVER

PRINTED OR WRITTEN NOTES

16

Added to this was the fact that information about what is supposed to go on the cover was
coming from multiple and sometimes contradictory inputs: emails, Word files that were
dropped in one or another folder on a file server, or printed or even handwritten notes. Also, these
weren’t coming to the designer in any particular order or even following a schedule. Editors
with clout left off entire sections of copy until the last minute.
You’re going to hear me use the term “copy” here, which comes from the advertising and
publishing industries. It means “text.”

The Solution

• Definitive, authoritative information about
cover format and size
• A stable, controlled set of cover templates
• A single source for cover copy
• Digital delivery of copy to Design

17

<click>The thing is, we have all the pertinent format information about book covers in our
production database. I figured we should use it.
<click>To help avoid confusion, I also got buy-in from the art directors for them to develop a
definitive, concise library of templates and, though it might not seem important, a vocabulary
of terms. <click>We also have a system for editors to write descriptive copy about books for
internal use, the tipsheet system. So theyʼre used to writing copy in a web-based system.
Writing cover copy in a related system isnʼt that much of a stretch.

Cover Workflow

Acquisition Editor: Tipsheet
Marketing: Specifications
Editor or Copywriter: Cover Copy
Review: Finished Copy
Designers: White Layout
Designers: Finished Layout

18

What we established with the aid of Cover Automation is a clear workflow for book covers,
from the creation of cover copy through to the design of the cover itself. This is a somewhat
simplified view. 1. The Acquisition Editor writes his or her Tipsheet. 2. Once Editorial and Marketing
agree on a format for the book and some initial specifications (format, size, approximate
length), this information is shared with Production. 3a. The Editor or sometimes a Copywriter
writes the cover copy, usually based on what was written for the Tipsheet. 3b. Production
finalizes the specifications. 4. The cover copy is reviewed and revised, digitally. 5. The Cover
Automation system combines the finished copy, specs, and metadata to create the White Layout.

The Cover and its Features
BACK FLAP BACK COVER SPINE FRONT COVER FRONT FLAP

Labeled elements: AUTHOR PHOTO, AUTHOR BIO, CREDITS, QUOTES, ISBN & PRICES, TITLE, SUBTITLE,
AUTHOR, QUOTE, READING LINE, “ALSO” LINE, DESCRIPTION, PRICES

19
We have a vocabulary for describing the elements of a cover, and it’s got sections that we
have names for, but you can’t say that covers are really structured. The elements appear in
different places, in a different order; not all covers have all elements. Still, we can use the
information we have to our advantage, especially because we have a tool like MarkLogic to
work with.

The Cover Editor

20

Here is the Cover Editor. It is generated by our product database, Title Management. The Cover
Editor provides keys to solving the problem. <click> Important metadata: the type (hardcover, so a
dust jacket) and size of the cover.
The Cover Editor is also the single source for cover copy: It allows Editorial to write and edit
the copy for the various sections of a cover. <click> It allows them to place items in the order
they want them to appear in (although here the author’s name isn’t on the top of the front
cover, where it should be). Of course it also allows them to add elements, such as praise
quotes.

21

Let’s scroll down and take a closer look at the book’s description, and follow that through. The
Cover Editor is a pretty full-featured program. It allows for basic formatting (bold and
italics), keeps track of all text changes, and records when they happened and who worked
on them. I noticed that Sarah Branham added most of the elements to this cover, and that the
copyeditor Carole Schwindeller worked on it as well.
The right-hand column is for notes.

22

The system also provides routing of the cover copy through various departments. When the user <click> “sends” the cover
copy to someone, the system creates an email that identifies the sender. They can write a message to the recipient, and this
email contains a link to this Web page.
Once all the approvals are received, Editorial sends the cover copy to the designer. <click> She clicks the Build Cover button.
She then chooses <click> the imprint, format and size, and tells the system to build a cover for her. We debated creating the
cover automatically because the system has all that information, but the art directors felt that there were enough departures from
the standards involved that they were more comfortable having the designers go through this step and choose manually.

The White Layout

23

The result that is downloaded to the designerʼs desk is what I call the White Layout. <click>
Here you can see the Description that we have been following.
Letʼs take a look at what is happening behind the scenes.

24

Behind the layout--and what the designer doesnʼt see--is an XML structure. Hereʼs the
Description again <click>, with its corresponding element in the structure. So how do we get to
this point?

Cover Editor/SQL Server: Cover Copy in XML
Template in IDML

The “Replacement Engine”

“White Layout”

25

We took a look at the Cover Editor, which contains the metadata and the cover copy. The SQL database which provides the
Cover Editor generates an XML version of the data to go on the cover.
Another key element is the template, which weʼll examine in a moment. The InDesign template corresponding to the format
and size of the book is exported to a variety of XML called IDML, which the MarkLogic server can read, and the server
combines the two to create the White Layout. Weʼve all heard about Babbageʼs Difference Engine. What we have here is a
Replacement Engine. Replacement is a key concept to understanding how this system works.
Letʼs take a look at that process step-by-step.

The Template (indd)

26

Here it is in InDesign format. Once the designer has the layout set, we identify the elements that are to be populated with
cover copy, using InDesign’s capability to introduce structure into a layout. Notice that this one contains 14 structural
elements. So in addition to the cover copy that Editorial writes, it allows us to populate items like prices and ISBN, and what
you might call “boilerplate,” like the line that tells buyers that the book is also available as an ebook or an audio. This is a great
timesaver, and eliminates the possibility of a lot of errors, as it used to be up to the designers (notoriously bad typists) to
type in the ISBN, prices, and tag lines like “Also available as an ebook.”
I want to bring to your attention the element that is on the flap. It is a placeholder frame <click>. There is only one element
here, yet, as we’ll see, there will be more than one element that is to be placed in the white layout here. We could never
predict how many elements there might be in a section, and we’ll discuss how we solved that particular problem a little later
on.

The Template (idml)

27

Once the template is set, it is saved in IDML format. IDML is an XML format, consisting of XML files describing each aspect and
piece in the layout, then placed in a ZIP wrapper, similar to what Microsoft is doing with its Office documents. What is
important about the fact that Adobe has done this is that the IDML format is both complete and native. That is key for us,
because it allows us to edit or even create InDesign documents through XML.
Here is the template opened in an XML editor. Notice all the folders listed down the left side of the window.<click>
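
A minimal Python sketch of what “XML files in a ZIP wrapper” means in practice. It builds a tiny stand-in package (the entry names follow the usual IDML layout, with a Stories folder and a designmap.xml at the root, but the contents are invented for illustration) and groups its entries by folder, much as the XML editor’s folder list does. The function names are hypothetical.

```python
import io
import zipfile
from collections import defaultdict


def list_package_parts(idml_bytes: bytes) -> dict[str, list[str]]:
    """Group the entries of an IDML (ZIP) package by top-level folder."""
    parts = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(idml_bytes)) as package:
        for name in package.namelist():
            folder, _, rest = name.partition("/")
            # Entries at the root of the package (no "/") go under ".".
            parts[folder if rest else "."].append(name)
    return dict(parts)


def build_toy_package() -> bytes:
    """Build a tiny stand-in package so the sketch runs without InDesign."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("designmap.xml", "<Document/>")
        package.writestr("Stories/Story_u1a2.xml", "<Story Self='u1a2'/>")
        package.writestr("Stories/Story_u1b7.xml", "<Story Self='u1b7'/>")
        package.writestr("Spreads/Spread_u10.xml", "<Spread Self='u10'/>")
    return buffer.getvalue()


if __name__ == "__main__":
    for folder, names in sorted(list_package_parts(build_toy_package()).items()):
        print(folder, names)
```

A real template exported from InDesign could be inspected the same way by passing the bytes of the .idml file to `list_package_parts`.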

The Template (idml)

28

Here Iʼve opened the “Stories” folder, and chosen one of the “Stories.” Iʼve highlighted the tag
that identifies this Story as the “placeholder” frame for copy that is to be placed on the flap
<click>. A “placeholder” frame is the first in a column of one or more frames in that Section.

Cover-Info

29

The other key building block is the XML output by the SQL database. We refer to it as “cover-info.” Cover-info XML is not valid,
but it is well-formed and MarkLogic is perfectly fine with it. It is a collection of the elements that are to appear on the cover,
both those that were written by Editorial and data generated from the product database.
<click> We can see the first section, frontcover, and the first three and part of a fourth element, identified as TextFrames: the
TITLE, SUBTITLE, AUTHOR, and part of the HEADLINE, “#1 New York Times bestselling author.” The words “New York
Times” are in a SPAN identified as “emphasis.”

30

If we scroll down through this file, we come to the Description, which we were following. It’s part of the Section called “flap.”
There is a quote from Stephen King followed by the Description we were looking at before. So, as I pointed out before, here
are two text frames that are to be positioned in the same section.
Here’s how we solve that problem: Notice the InDesign-specific information. There is an attribute in the section tag called
y-offset. <click> This indicates that each TextFrame for that section is to start 54 points (3/4 inch) below the previous one.
Also notice the ParagraphStyle attribute. It corresponds with InDesign paragraph styles, and the CharacterStyle attribute
corresponds with character styles set up in the template.
Where MarkLogic comes into the equation, as you might have guessed, is putting these two XML streams (the IDML of the
template and the XML of the cover-info) together to generate the White Layout.
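
The stacking rule can be sketched in a few lines of Python. The XML fragment below is a hypothetical stand-in for cover-info: its element and attribute names echo the ones mentioned in this talk, but the real structure is not reproduced here. The sketch also assumes the offset is measured from the top of one frame to the top of the next.

```python
import xml.etree.ElementTree as ET

POINTS_PER_INCH = 72  # 54 points / 72 = 3/4 inch

# Hypothetical cover-info fragment; the attribute names are assumptions.
COVER_INFO = """
<section placeholder-type="flap" y-offset="54">
  <TextFrame pageitem-type="quote" pageitem-number="1">A praise quote.</TextFrame>
  <TextFrame pageitem-type="description" pageitem-number="2">The description.</TextFrame>
</section>
"""


def frame_offsets(section_xml: str) -> list[tuple[str, float]]:
    """Return (pageitem-type, vertical shift in points) for each TextFrame."""
    section = ET.fromstring(section_xml.strip())
    step = float(section.get("y-offset", "0"))
    frames = sorted(
        section.findall("TextFrame"),
        key=lambda frame: int(frame.get("pageitem-number")),
    )
    # The first frame sits in the placeholder; each later one moves down a step.
    return [(frame.get("pageitem-type"), index * step)
            for index, frame in enumerate(frames)]


if __name__ == "__main__":
    for kind, shift in frame_offsets(COVER_INFO):
        print(f"{kind}: {shift} pt ({shift / POINTS_PER_INCH} in)")
```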

31

Letʼs look behind the scenes at what happens when the Designer pushes the “Generate Layout” and then “Start” buttons. In
this administratorʼs view we can see the templates weʼve saved in IDML format. These correspond to the choices made by the
designer in terms of division, format and size. Clicking on one of these listings we can see the ISBNs which were processed
with that template.<click> Clicking on that displays the date and time that it was processed, along with a string that shows us
what is happening: <click> At the bottom of the browser we see a string that calls a module called Controller, an XQuery, and
passes to it two arguments, the ISBN and the template.

32

I’m not going to go through the XQueries line by line. I’m not an XQuery expert by any stretch,
and that would take us the better part of a month to do.

33

Suffice it to say that Controller does what its name implies. It runs the show. The heavy lifting is
done by another XQuery, lib-ss. That’s where “run-layout” resides.

34

Lib-ss is 14 screens long, but the operative phrase appears in this snippet: “replace all the
replaceable content.” A few slides back we looked at the XML contained in one of the IDML
templates. It has placeholder-type tags, and values for them. We also looked at the cover-info
XML delivered by the SQL database. It has tags in it called placeholder-type, pageitem-type,
and pageitem-number.

35

The cover-info is placed into a corresponding folder in one part of the MarkLogic system <click>. What lib-ss and its minions
do is open the cover-info XML and, for each XML element <click>, look into the content folder <click> for the relevant
JobTicket and find the Story with the relevant placeholder-type <click> (the “flap” in our case). It then replaces the contents of
that Story with the contents from cover-info <click> and stores the result in the staging folder <click>.
If there is more than one pageitem for that placeholder-type (the Description in “flap” in our case), it generates another Story,
identical in type and size, and drops it in, 3/4” lower on the layout.
Once it is through processing the contents of cover-info, it combines, in memory, the Stories that are in Staging with the
ones in the template, replacing whichever template Stories have the same ID.
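
The replacement logic described above might look something like the following Python sketch. This is not the XQuery in lib-ss: the `Story` class, the `run_layout` signature, and the scheme for naming cloned Stories are all assumptions made for illustration, and plain objects stand in for the IDML and cover-info XML.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Story:
    story_id: str
    placeholder_type: str
    content: str
    y_pt: float = 0.0


def run_layout(template, cover_info, y_offset_pt=54.0):
    """Fill template Stories from (placeholder_type, text) pairs.

    The first item for a placeholder replaces the placeholder's content;
    each further item gets a clone of that Story, one offset lower.
    """
    placeholders = {story.placeholder_type: story for story in template}
    staged = {}   # plays the role of the staging folder
    seen = {}     # how many items each placeholder has received so far
    for placeholder_type, text in cover_info:
        base = placeholders.get(placeholder_type)
        if base is None:
            continue  # this template has no frame for that element
        count = seen.get(placeholder_type, 0)
        seen[placeholder_type] = count + 1
        # Hypothetical ID scheme for clones: "<id>-2", "<id>-3", ...
        story_id = base.story_id if count == 0 else f"{base.story_id}-{count + 1}"
        staged[story_id] = replace(
            base,
            story_id=story_id,
            content=text,
            y_pt=base.y_pt + count * y_offset_pt,
        )
    # Combine in memory: staged Stories win over template Stories with the same ID.
    merged = {story.story_id: story for story in template}
    merged.update(staged)
    return list(merged.values())


if __name__ == "__main__":
    template = [Story("u1", "spine", "SPINE"), Story("u2", "flap", "PLACEHOLDER", 100.0)]
    copy = [("spine", "Sing You Home"), ("flap", "A quote"), ("flap", "The description")]
    for story in run_layout(template, copy):
        print(story)
```

Keeping the staged Stories separate until the final merge mirrors the staging-folder step: the template itself is never modified, so it can be reused for the next ISBN.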

36

It ZIPs this collection of files into an IDML, which the user downloads. Here is the IDML,
opened in an XML editor, with the first Story of the flap displayed. If you look closely enough,
<click> youʼll see that the local formatting is preserved from the Cover Editor.

37

Here is the Description that we were following. <click>It is tagged as item-content, and a
description.

38

So the IDML that is delivered to the user--what I call the “white layout”--contains a wealth of
structural information. We donʼt turn on the structure view for the user, of course.
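
The packaging step mentioned earlier, zipping the collection of files into an IDML, can be sketched in Python as follows. The sketch assumes the packaging convention documented for IDML, in which a `mimetype` entry comes first and is stored uncompressed; the helper name `zip_idml` and the sample parts are invented for illustration.

```python
import io
import zipfile

# Media type an IDML package declares in its "mimetype" entry (assumed here).
IDML_MIMETYPE = "application/vnd.adobe.indesign-idml-package"


def zip_idml(parts: dict[str, str]) -> bytes:
    """Wrap named XML parts in a ZIP container laid out like an IDML package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        # The mimetype entry goes first and is stored without compression.
        package.writestr("mimetype", IDML_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for name in sorted(parts):
            if name != "mimetype":
                package.writestr(name, parts[name])
    return buffer.getvalue()


if __name__ == "__main__":
    data = zip_idml({"designmap.xml": "<Document/>", "Stories/Story_u2.xml": "<Story/>"})
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        print(package.namelist())
```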

39

The designer then proceeds to add graphics, photos, color, and to set the type to create the
finished layout.

40

The result still contains the structure of the template even at this stage. Here the book description is highlighted. The system is
actually designed so that if this layout is saved as an IDML, it can be submitted to the system and the corresponding fields we
saw in the Cover Editor could be updated with any changes made to the text in the InDesign file. The idea was that this
finished cover would contain the final word on what editors, publishers and marketing types wanted to say about the book.
However, we uncovered two flaws in our logic.

Cover Text as Art

41

For one thing, many covers, like this one, have artwork instead of text. For Children’s, this meant that our approach of having
cover copy delivered from a database wasn’t of much use. There wasn’t much text on their covers anyway. But designers in
the Adult Division wanted to be able to turn some of the text into art, too. And they were apt to break up or combine frames of
text, resulting in the IDML Stories getting out of sync.
More importantly, waiting for the final, approved text on the cover has become too much of a luxury. At the time we
began this project, total, letter-perfect correspondence between the cover and online copy was mandatory. But online marketing
demands that this information be timely. That means it must be out before the cover is even close to being ready for the
printer. We’re getting to the point that what goes out on the Net does not have to match the cover character for character.

Future Directions
“Catalog Automation”

42

We are thinking of other applications for this technology, though. Here is the page from our
HTML-based digital catalog for Sing You Home, but weʼre exploring the idea of using a similar
approach to building print catalogs, as well.

Future Directions
Content Enrichment

43

One of the reasons I pushed so hard for ML, besides XML manipulation, was search.
Our Digital Group, with the help of MarkLogic Professional Services and some third-party
tools, has built a content enrichment and search tool which will be going live shortly. Here
I’ve typed in the phrase “civil rights.” It has turned up a list of relevant titles and extracted
appropriate sections of text. The tool also allows the user to drill down into the title. At a
demonstration of the tool, someone typed in Quaddafi and turned up a book from years ago,
that everyone had forgotten about, written by a personal friend of his. I think this will
become a valuable tool, and it shows why saving content in XML is important, even though
we don’t have an XML-based workflow.

Conclusion
• What it means to be a Trade publisher
• The problem: haphazard copy, unclear
format information
• How we use MarkLogic Server to create
InDesign layouts
• Limitations of this approach.
• Where it might work in the future.

44

So weʼve covered (read) <click> <click> <click> <click> <click>

Acknowledgements

I would just like to acknowledge the
contributions of Frank Rubino and Jason
Myatt of MarkLogic. Not only did they build
the MarkLogic part of Cover Automation,
but they also helped me with this presentation.

45

Frank Rubino & Jason Myatt

Thank You—Questions?
Steve Kotrch
Director of Publishing Technology
steve.kotrch@simonandschuster.com
Twitter: steveko

Simon & Schuster

46
