Steve Kotrch
 Director of Publishing Technology
steve.kotrch@simonandschuster.com
         Twitter: steveko




 Simon & Schuster




Automating
Book Covers with
   MarkLogic


What we’ll cover
           •    Simon & Schuster, a Trade publisher
           •    The problem with the cover workflow
           •    The solution, “Cover Automation.” We’re
                going to explore how MarkLogic Server
                together with InDesign’s XML capabilities
                help us make this happen.
           •    Where it works, where it doesn’t. Where it
                might work in the future.



Point 3 is of course the meat of this presentation.
Simon & Schuster
                               Trade Book Publisher





Fiction, non-fiction, audio books, Childrenʼs books, ebooks
Roughly 2000 titles/year
In business since 1924
Trade Publishing
        •   Finance: venture capital for authors
        •   Content creation
        •   Prepare content for publication
        •   Manufacturing
        •   Distribution
        •   Marketing
        •   Sales

What it means to be a trade book publisher. We pay authors to write books, which makes us
something like venture capitalists. We donʼt create our own content; thatʼs for the authors to do.
We do prepare content for publication: we copyedit, design, and create marketing material--and a
book cover these days is marketing, like packaging for any product. We arrange for manufacturing.
We distribute, market and sell books.
Technology

          •   .NET development environment
          •   SQL Server on the back end
          •   MarkLogic for content management





In terms of technology, the infrastructure is mostly Microsoft, with a .NET development
environment. We use SQL Server for our databases, including business-critical systems. And
of course Mark Logic for content management.
So Why MarkLogic?
Digital Warehouse
          •    Scanning of 15,000 books
              •   Printable, searchable PDF; XML; OEBPS
          •    Rights database
              •   Scan contracts
              •   Enter rights information into a database
          •    Data Distribution
              •   XML, binaries



Six years ago or more: Future-proof content. >15,000 titles. POD PDF (search), OEBPS, XML.
80 years: >15,000 titles. Chosen for marketability & rights. => scan author contracts &
create rights database.
ML Server to help w/ distro of content & metadata--in XML, multiple destinations, unique
req’s
First implementation: store scanned OCR’d contracts
Digital Warehouse

[Diagram labels: TMM / Product Database (DATA); Chuckwalla DAM; MarkLogic Server (XML); THE INTERNET: Distribution Platforms, Retailers, Search Engines]

Now our Syndication Server, as we call it, is part of a 3-pronged approach: structured data
(SQL Server); binary data (DAM); XML data (MarkLogic Server). Now book XML is used not just
for future-proofing, but also for search and for extracting sample chapters.
Key to Online Marketing:
                 Search





Why search is important, and not just for books. 19% of all online retail is done through
Amazon.com.
Completeness and quality of information get top hits. This is a 10-year-old book, still the
top hit for this topic because of the completeness and quality of its information.
Current ML “Servers”

           •   Contracts (Executed: Scanned, OCR’d)
           •   Syndication
           •   Cover Automation
           •   Content Enrichment (soon)





I talked about three uses to which we’ve put Mark Logic at S&S, and later I’ll discuss a fourth
one which will be coming online soon, along with other plans we have for the technology.
Cover Automation


Let’s turn back to the main topic of this talk. Mentioned before: Not STM, not Educational. On
the XML FIRST value graph, down near zero. But here I found a compelling case for creating
an XML-based publishing system.

Here is the jacket for a pretty successful hardcover book, Sing You Home by Jodi Picoult. Letʼs look at
the problems involved in developing book covers like this one.
File is headed for manufacturing.
Book covers in todayʼs marketplace: Marketing--packaging, the representation to the world of that
product. (A couple of slides ago, images representing the products that S&S publishes. You probably
didnʼt even give it a thought that these were pictures of book covers.) “Placeholder image” => 10%
bump in pre-publication sales.
Just the right look, and just the right words.
One impetus to developing the system: print = online. Another revolves around two issues . . . <click>
Two Issues
Multiple formats   Multiple inputs





One problem facing our designers is that for a publisher the size of Simon & Schuster, the
books come in a lot of different sizes, various formats, and from various imprints.
Imprints are profit units, each of which has a slightly different outlook on the world and is likely
to be run by someone with a healthy ego and strong opinions, resulting in each having a slightly
different layout. All these different layouts and sizes pose a problem for designers and for the
workflow--getting clear, definitive and accurate information to the designers about what theyʼre
supposed to create.
Two Issues
Multiple formats   Multiple inputs




EMAILS



                                                     WORD FILES ON A FILE SERVER




                                                      PRINTED OR WRITTEN NOTES





Added to this was the fact that information about what is supposed to go on the cover was
coming from multiple and sometimes contradictory inputs--emails, Word files that were
dropped in one or another folder on a file server, or printed or even written notes. Also, these
werenʼt coming to the designer in any particular order or even following a schedule. Editors
with clout left off entire sections of copy until the last minute.
Youʼre going to hear me use the term “copy” here, which comes from the advertising and
publishing industries. It means “text.”
The Solution

           •   Definitive, authoritative information about
               cover format and size
           •   A stable, controlled set of cover templates
           •   A single source for cover copy
           •   Digital delivery of copy to Design




<click>The thing is, we have all the pertinent format information about book covers in our
production database. I figured we should use it.
<click>To help avoid confusion, I also got buy-in from the art directors for them to develop a
definitive, concise library of templates and, though it might not seem important, a vocabulary
of terms. <click>We also have a system for editors to write descriptive copy about books for
internal use, the tipsheet system. So theyʼre used to writing copy in a web-based system.
Writing cover copy in a related system isnʼt that much of a stretch.
Cover Workflow

        Acquisition Editor: Tipsheet
        Marketing: Specifications
        Editor or Copywriter: Cover Copy
        Review: Finished Copy
        Designers: White Layout
        Designers: Finished Layout

What we established with the aid of Cover Automation is a clear workflow for book covers,
from the creation of cover copy through to the design of the cover itself. This is a somewhat
simplified view.
1. The Acquisition Editor writes his or her Tipsheet.
2. Once Editorial and Marketing agree on a format for the book and some initial specifications
(format, size, approximate length), this information is shared with Production.
3a. The Editor or sometimes a Copywriter writes the cover copy, usually based on what was
written for the Tipsheet.
3b. Production finalizes the specifications.
4. The cover copy goes through a digital review and revision process.
5. The Cover Automation system combines the finished copy, specs, and metadata to create the
White Layout.
The Cover and its Features
[Diagram of a jacket. Panels, left to right: BACK FLAP, BACK COVER, SPINE, FRONT COVER, FRONT FLAP. Labeled elements: AUTHOR PHOTO, AUTHOR BIO, “ALSO” LINE, QUOTES, CREDITS, ISBN & PRICES, AUTHOR, TITLE, QUOTE, READING LINE, SUBTITLE, PRICES, DESCRIPTION]

We have a vocabulary for describing the elements of a cover, and itʼs got sections that we
have names for, but you canʼt say that covers are really structured. The elements appear in
different places, in a different order; not all covers have all elements. Still, we can use the
information we have to our advantage, especially because we have a tool like Mark Logic to
work with.
The Cover Editor





Here is the Cover Editor. Generated by our product database, Title Management. The Cover
Editor provides keys to solving the problem. <click> Important metadata: type (hardcover, so a
dust jacket) and size of the cover.
The Cover Editor is also the single source for cover copy: It allows Editorial to write and edit
the copy for the various sections of a cover. <click> It allows them to place items in the order
they want them to appear in (although here the authorʼs name isnʼt on the top of the front
cover, where it should be). Of course it also allows them to add elements, such as praise
quotes.

Letʼs scroll down and take a closer look at the bookʼs description, and follow that through. The
Cover Editor is a pretty full-featured program. It allows for basic formatting--bold and
italics--and keeps track of all text changes and records when they happened and who worked
on it. I noticed that Sarah Branham added most of the elements to this cover, and that the
copyeditor Carole Schwindeller worked on it as well.
The right-hand column is for notes.

The system also provides routing of the cover copy through various departments. When the user <click> “sends” the cover
copy to someone, the system creates an email that identifies the sender. They can write a message to the recipient, and this
email contains a link to this Web page.
Once all the approvals are received, Editorial sends the cover copy to the designer. <click> She clicks the Build Cover button.
She then chooses <click> the imprint, format and size, and tells the system to build a cover for her. We debated creating the
cover automatically because the system has all that information, but the art directors felt that there were enough departures from
the standards involved that they were more comfortable having the designers go through this step and choose manually.
The White Layout





The result that is downloaded to the designerʼs desk is what I call the White Layout. <click>
Here you can see the Description that we have been following.
Letʼs take a look at what is happening behind the scenes.

Behind the layout--and what the designer doesnʼt see--is an XML structure. Hereʼs the
Description again <click>, with its corresponding element in the structure. So how do we get to
this point?
Cover Editor/SQL Server: Cover Copy in XML                                             Template in IDML




                                                                          The “Replacement Engine”

                                                                 “White Layout”





We took a look at the Cover Editor, which contains the metadata and the cover copy. The SQL database which provides the
Cover Editor generates an XML version of the data to go on the cover.
Another key element is the template, which weʼll examine in a moment. The InDesign template corresponding to the format
and size of the book is exported to a variety of XML called IDML, which the MarkLogic server can read, and the server
combines the two to create the White Layout. Weʼve all heard about Babbageʼs Difference Engine. What we have here is a
Replacement Engine. Replacement is a key concept to understanding how this system works.
Letʼs take a look at that process step-by-step.
The Template (indd)





Here it is in InDesign format. Once the designer has the layout set, we identify the elements that are to be populated with
cover copy, using InDesignʼs capability to introduce structure into its layouts. Notice that this one contains 14 structural
elements. So in addition to the cover copy that Editorial writes, it allows us to populate items like prices and ISBN, and what
you might call “boilerplate,” like the line that tells buyers that the book is also available as an ebook or an audio. This is a great
timesaver, and eliminates the possibility of a lot of errors, as it used to be up to the designers (notoriously bad typists) to
type in the ISBN, prices, and tag lines like “Also available as an ebook.”
I want to bring to your attention the element that is on the flap. It is a placeholder frame <click>. There is only one element
here, yet, as weʼll see, there will be more than one element that is to be placed in the white layout here. We could never
predict how many elements there might be in a section, and weʼll discuss how we solved that particular problem a little later
on.
The Template (idml)





Once the template is set, it is saved in IDML format. IDML is an XML format: a set of XML files describing each aspect and
piece of the layout, placed in a ZIP wrapper, similar to what Microsoft is doing with its Office documents. What matters
about Adobeʼs approach is that the IDML format is both complete and native. That is key for us,
because it allows us to edit or even create InDesign documents through XML.
Here is the template opened in an XML editor. Notice all the folders listed down the left side of the window.<click>
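
To make the ZIP-of-XML idea concrete, here is a minimal Python sketch (not part of the S&S system, which is built on XQuery in MarkLogic). It builds a tiny stand-in package in memory and lists the files in its Stories folder. The file names and Story IDs are invented for illustration, and the toy package is not a complete IDML document.

```python
import io
import zipfile


def list_story_parts(idml_bytes: bytes) -> list[str]:
    """Return the names of the Story XML files inside an IDML-style ZIP package."""
    with zipfile.ZipFile(io.BytesIO(idml_bytes)) as package:
        return sorted(
            name for name in package.namelist()
            if name.startswith("Stories/") and name.endswith(".xml")
        )


def make_toy_package() -> bytes:
    """Build a tiny stand-in package in memory (hypothetical names, not real IDML)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("designmap.xml", "<Document/>")
        package.writestr("Stories/Story_u1a2.xml", "<Story Self='u1a2'/>")
        package.writestr("Stories/Story_u1b3.xml", "<Story Self='u1b3'/>")
        package.writestr("Spreads/Spread_u10.xml", "<Spread/>")
    return buffer.getvalue()


print(list_story_parts(make_toy_package()))
```

Pointing the same function at the bytes of a real template saved as IDML should list one file per Story, which is the folder view shown in the XML editor.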
The Template (idml)





Here Iʼve opened the “Stories” folder, and chosen one of the “Stories.” Iʼve highlighted the tag
that identifies this Story as the “placeholder” frame for copy that is to be placed on the flap
<click>. A “placeholder” frame is the first in a column of one or more frames in that Section.
Cover-Info





The other key building block is the XML output by the SQL database. We refer to it as “cover-info.” Cover-info XML is not valid,
but it is well-formed and Mark Logic is perfectly fine with it. It is a collection of the elements that are to appear on the cover,
both those that were written by Editorial and data generated from the product database.
<click> We can see the first section, frontcover, and the first three and part of a fourth element, identified as TextFrames: the
TITLE, SUBTITLE, AUTHOR, and parts of the HEADLINE, like “#1 New York Times bestselling author” The words “New York
Times” are in a SPAN identified as “emphasis.”

If we scroll down through this file, we come to the Description, which we were following. Itʼs part of the Section called “flap.”
There is a quote from Stephen King followed by the Description we were looking at before. So, as I pointed out before, here
are two text frames that are to be positioned in the same section.
Hereʼs how we solve that problem: Notice the InDesign-specific information. There is an attribute in the section tag called
y-offset. <click> This indicates that each TextFrame for that section is to start 54 points--or 3/4 inch--below the previous one.
Also notice the ParagraphStyle attribute. It corresponds with InDesign paragraph styles, and the CharacterStyle attribute
corresponds with character styles set up in the template.
Where Mark Logic comes into the equation, as you might have guessed, is putting these two XML streams--the IDML of the
template and the XML of the cover-info--together to generate the White Layout.
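
The y-offset arithmetic can be sketched in a few lines of Python. This is only an illustration: the cover-info fragment below is invented, its element and attribute names (other than y-offset and TextFrame) are guesses, and it assumes the offset is measured from the top of the previous frame. At 72 points to the inch, 54 points works out to 3/4 inch.

```python
import xml.etree.ElementTree as ET

POINTS_PER_INCH = 72  # standard typographic points per inch

# Hypothetical cover-info fragment; the real structure is not shown in full.
COVER_INFO = """
<cover-info>
  <section name="flap" y-offset="54">
    <TextFrame pageitem-type="quote">(praise quote)</TextFrame>
    <TextFrame pageitem-type="description">(book description)</TextFrame>
  </section>
</cover-info>
"""


def frame_offsets(cover_info_xml: str) -> list[tuple[str, str, float]]:
    """Return (section, pageitem type, vertical shift in points) for each frame."""
    root = ET.fromstring(cover_info_xml)
    result = []
    for section in root.iter("section"):
        step = float(section.get("y-offset", "0"))
        for index, frame in enumerate(section.iter("TextFrame")):
            # The first frame sits at the placeholder position; each later
            # frame is shifted one more step down the layout.
            result.append(
                (section.get("name"), frame.get("pageitem-type"), index * step)
            )
    return result


for section_name, kind, shift in frame_offsets(COVER_INFO):
    print(section_name, kind, shift, "pt =", shift / POINTS_PER_INCH, "in")
```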

Letʼs look behind the scenes at what happens when the Designer pushes the “Generate Layout” and then “Start” buttons. In
this administratorʼs view we can see the templates weʼve saved in IDML format. These correspond to the choices made by the
designer in terms of division, format and size. Clicking on one of these listings we can see the ISBNs which were processed
with that template.<click> Clicking on that displays the date and time that it was processed, along with a string that shows us
what is happening: <click> At the bottom of the browser we see a string that calls a module called Controller, an XQuery, and
passes to it two arguments, the ISBN and the template.

Iʼm not going to go through the XQueries line by line. Iʼm not an XQuery expert by any stretch,
and that would take us the better part of a month to do.

Suffice it to say that Controller does what its name implies. It runs the show. The heavy lifting is
done by another XQuery, lib-ss. Thatʼs where “run-layout” resides.

Lib-ss is 14 screens long, but the operative phrase appears in this snippet: “replace all the
replaceable content.” A few slides back we looked at the XML contained in one of the IDML
templates. It has placeholder-type tags, and values for them. We also looked at the cover-info
XML delivered by the SQL database. It has tags in it called placeholder-type, pageitem type,
and pageitem-number.

The cover-info is placed into a corresponding folder in one part of the MarkLogic system <click>. What lib-ss and its minions
do is open the cover-info XML and for each XML element <click> it looks into the content folder <click> for the relevant
JobTicket and finds the Story with the relevant placeholder-type <click>--the “flap” in our case. It then replaces the contents of
that story with the contents from cover-info <click>. It stores the result in the staging folder <click>.
If there is more than one pageitem for that placeholder-type--the Description in “flap” in our case--it generates another Story,
identical in type and size, and drops it in, 3/4” lower on the layout.
Once it is through processing the contents of cover-info, it then combines--in memory--the Stories that are in Staging with the
ones in the template, replacing whatever stories there that have the same ID.
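
The replace-duplicate-merge steps just described can be sketched as follows. This is a simplified Python model, not the lib-ss XQuery: the Story class, the ID scheme for extra frames, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Story:
    story_id: str
    placeholder_type: str
    content: str
    y_shift_pt: float = 0.0


def run_layout(
    template: list[Story],
    items: list[tuple[str, str]],
    y_offset_pt: float = 54.0,
) -> list[Story]:
    """Replace placeholder content, add frames for extra items, merge by ID."""
    by_type = {story.placeholder_type: story for story in template}
    staging: dict[str, Story] = {}
    seen: dict[str, int] = {}
    for placeholder_type, text in items:
        base = by_type.get(placeholder_type)
        if base is None:
            continue  # this template has no frame for that item
        count = seen.get(placeholder_type, 0)
        seen[placeholder_type] = count + 1
        # The first item reuses the placeholder's ID; later items get a new
        # Story of the same type, shifted down by one more offset.
        story_id = base.story_id if count == 0 else f"{base.story_id}-{count + 1}"
        staging[story_id] = replace(
            base, story_id=story_id, content=text, y_shift_pt=count * y_offset_pt
        )
    # Staged stories override template stories that share the same ID.
    merged = {story.story_id: story for story in template}
    merged.update(staging)
    return list(merged.values())


template = [
    Story("u1", "title", "TITLE"),
    Story("u2", "flap", "FLAP COPY"),
    Story("u3", "spine", "SPINE"),
]
items = [("title", "Sing You Home"), ("flap", "(praise quote)"), ("flap", "(description)")]
for story in run_layout(template, items):
    print(story)
```

In this toy run the flap placeholder receives the quote, a second flap Story is generated for the description 54 points lower, and the spine Story passes through untouched because nothing in the input matched it.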

It ZIPs this collection of files into an IDML, which the user downloads. Here is the IDML,
opened in an XML editor, with the first Story of the flap displayed. If you look closely enough,
<click> youʼll see that the local formatting is preserved from the Cover Editor.
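
The packaging step can also be sketched in Python, again as an illustration rather than the production XQuery. It assumes the packaging convention from Adobeʼs IDML documentation as I understand it: a file named mimetype comes first in the archive and is stored uncompressed. The part names passed in are hypothetical.

```python
import io
import zipfile

IDML_MIMETYPE = "application/vnd.adobe.indesign-idml-package"


def pack_idml(parts: dict[str, str]) -> bytes:
    """Zip a mapping of part name -> XML text into an IDML-style package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        # Assumed convention: the mimetype entry is first and not compressed.
        package.writestr("mimetype", IDML_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for name in sorted(parts):
            package.writestr(name, parts[name], compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


package_bytes = pack_idml({
    "designmap.xml": "<Document/>",
    "Stories/Story_u2.xml": "<Story Self='u2'/>",
})
print(len(package_bytes), "bytes")
```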

Here is the Description that we were following. <click>It is tagged as item-content, and a
description.

So the IDML that is delivered to the user--what I call the “white layout”--contains a wealth of
structural information. We donʼt turn on the structure view for the user, of course.

The designer then proceeds to add graphics, photos, color, and to set the type to create the
finished layout.

The result still contains the structure of the template even at this stage. Here the book description is highlighted. The system is
actually designed so that if this layout is saved as an IDML, it can be submitted to the system and the corresponding fields we
saw in the Cover Editor could be updated with any changes made to the text in the InDesign file. The idea was that this
finished cover would contain the final word on what editors, publishers and marketing types wanted to say about the book.
However, we uncovered two flaws in our logic.
Cover Text as Art





For one thing, many covers, like this one, have artwork instead of text. For Children's, this meant that our approach of having
cover copy delivered from a database wasn't of much use. There wasn't much text on their covers anyway. But designers in
the Adult Division wanted to be able to turn some of the text into art, too. And they were apt to break up or combine frames of
text, resulting in the IDML Stories getting out of sync.
More importantly, waiting for the final, approved text on the cover has become too much of a luxury. At the time we
began this project, total, letter-perfect correspondence between the cover and online copy was mandatory. But online marketing
demands that this information be timely. That means it must be out before the cover is even close to being ready for the
printer. Weʼre getting to the point that what goes out on the Net does not have to match the cover character for character.
Future Directions
                               “Catalog Automation”





We are thinking of other applications for this technology, though. Here is the page from our
HTML-based digital catalog for Sing You Home, but weʼre exploring the idea of using a similar
approach to building print catalogs, as well.
Future Directions
                                  Content Enrichment





One of the reasons I pushed so hard for ML, besides XML manipulation, was search.
Our Digital Group, with the help of MarkLogic Professional Services and some third-party
tools, has built a content enrichment and search tool which will be going live shortly. Here
I’ve typed in the phrase “civil rights.” It has turned up a list of relevant titles and extracted
appropriate sections of text. The tool also allows the user to drill down into the title. At a
demonstration of the tool, someone typed in Qaddafi and turned up a book from years ago,
that everyone had forgotten about, written by a personal friend of his. I think this will
become a valuable tool, and it shows why saving content in XML is important, even though
we don’t have an XML-based workflow.
Conclusion
           •   What it means to be a Trade publisher
           •   The problem: haphazard copy, unclear
               format information
           •   How we use MarkLogic Server to create
               InDesign layouts
           •   Limitations of this approach.
           •   Where it might work in the future.


So weʼve covered (read) <click> <click> <click> <click> <click>
Acknowledgements

           I would just like to acknowledge the
           contributions of Frank Rubino and Jason
           Myatt of MarkLogic. Not only did they build
           the MarkLogic part of Cover Automation,
           but they also helped me with this presentation.





Frank Rubino & Jason Myatt
Thank You—Questions?
             Steve Kotrch
    Director of Publishing Technology
  steve.kotrch@simonandschuster.com
             Twitter: steveko




         Simon & Schuster


More Related Content

Similar to Automating book covers w/ XML

Agile Publishing Model - version 2012
Agile Publishing Model - version 2012Agile Publishing Model - version 2012
Agile Publishing Model - version 2012
Dominique Raccah
 
Whitepaper channel cloud computing paper 2
Whitepaper channel cloud computing paper 2Whitepaper channel cloud computing paper 2
Whitepaper channel cloud computing paper 2
Ian Moyse ☁
 
Microservices for-java-developers
Microservices for-java-developersMicroservices for-java-developers
Microservices for-java-developers
Sandeep Rangdal
 
Benchmark of ecommerce solutions (short version, english)
Benchmark of ecommerce solutions (short version, english)Benchmark of ecommerce solutions (short version, english)
Benchmark of ecommerce solutions (short version, english)
Philippe Humeau
 
Benchmark of e-commerce solutions
Benchmark of e-commerce solutionsBenchmark of e-commerce solutions
Benchmark of e-commerce solutions
NBS System
 
Slidepresentation
SlidepresentationSlidepresentation
Slidepresentation
T_design
 
Understanding big data
Understanding big dataUnderstanding big data
Understanding big data
IBM India Smarter Computing
 
DevOps vs. ShadowOps (Pulse 2013)
DevOps vs. ShadowOps (Pulse 2013)DevOps vs. ShadowOps (Pulse 2013)
DevOps vs. ShadowOps (Pulse 2013)
Michael Elder
 
Total access building and delivering a stand-out investor presentation
Automating book covers w/ XML

  • 1. Steve Kotrch Director of Publishing Technology steve.kotrch@simonandschuster.com Twitter: steveko Simon & Schuster 1
  • 3. What we’ll cover • Simon & Schuster, a Trade publisher • The problem with the cover workflow • The solution, “Cover Automation.” We’re going to explore how MarkLogic Server together with InDesign’s XML capabilities help us make this happen. • Where it works, where it doesn’t. Where it might work in the future. 3 Point 3 is of course the meat of this presentation.
  • 4. Simon & Schuster Trade Book Publisher 4 4 Fiction, non-fiction, audio books, Childrenʼs books, ebooks Roughly 2000 titles/year In business since 1924
  • 5. Trade Publishing • Finance: venture capital for authors • Content creation • Prepare content for publication • Manufacturing • Distribution • Marketing • Sales 5 5 What it means to be a trade book publisher. We pay authors to write books--venture capitalist. We donʼt create our own content; thatʼs for the authors to do. We do prepare content for publication: copyedit, and design, create marketing material--and a book cover these days is marketing, like packaging for any product. We arrange for manufacturing. We distribute, market and sell books.
  • 6. Technology • .NET development environment • SQL Server on the back end • MarkLogic for content management 6 In terms of technology, the infrastructure is mostly Microsoft, with a .NET development environment. We use SQL Server for our databases, including business-critical systems. And of course Mark Logic for content management. So Why MarkLogic?
  • 7. Digital Warehouse • Scanning of 15,000 books • Printable, searchable PDF; XML; OEBPS • Rights database • Scan contracts • Enter rights information into a database • Data Distribution • XML, binaries 7 Six years ago or more: Future-proof content. >15,000 titles. POD PDF (search), OEBPS, XML. 80 years: >15,000 titles. Chosen for marketability & rights. => scan author contracts & create rights database. ML Server to help w/ distro of content & metadata--in XML, multiple destinations, unique req’s First implementation: store scanned OCR’d contracts
  • 8. Digital Warehouse TMM / Product Database THE INTERNET DATA Distribution Platforms Chuckwalla Retailers DAM Search Engines MarkLogic Server XML 8 Now our Syndication Server, as we call it, is part of a 3-pronged approach: structured data (SQL Server); binary data (DAM); XML data (MarkLogic Server). Now book XML is used not just for future-proofing, but also for search and for extracting sample chapters.
  • 9. Key to Online Marketing: Search 9 Why search is important, and not just for books. 19% of all online retail is done through Amazon.com. Completeness and quality of information gets top hits. This is a 10-year-old book, still the top hit for this topic because of completeness, quality of info
  • 10. Current ML “Servers” • Contracts (Executed: Scanned, OCR’d) • Syndication • Cover Automation • Content Enrichment (soon) 10 I talked about three uses to which we’ve put Mark Logic at S&S, and later I’ll discuss a fourth one which will be coming on line soon, along with other plans we have for the technology
  • 11. Cover Automation 11 11 Let’s turn back to the main topic of this talk. Mentioned before: Not STM, not Educational. On the XML FIRST value graph, down near zero. But here I found a compelling case for creating an XML-based publishing system.
  • 12. 12 12 Here is the jacket for a pretty successful hardcover book, Sing You Home by Jodi Picoult. Letʼs look at the problems involved in developing book covers like this one. File is headed for manufacturing. Book covers in todayʼs marketplace: Marketing--packaging, the representation to the world of that product. (A couple of slides ago, images representing the products that S&S publishes. You probably didnʼt even give it a thought that these were pictures of book covers.) “Placeholder image” =>10% bump in pre-publication sales. Just the right look, and just the right words. One impetus to developing the system: print = online. Another revolves around two issues . . . <click>
  • 13. Two Issues Multiple formats Multiple inputs 13
  • 14. 14 14 One problem facing our designers is that for a publisher the size of Simon & Schuster, the books come in a lot of different sizes, various formats, and from various imprints. Imprints are profit units each of which has a slightly different outlook on the world, and likely to be run by someone with a healthy ego and strong opinions, resulting in each having a slightly different layout. All these different layouts and sizes pose a problem for designers and for the workflow--getting clear, definitive and accurate information to the designers about what theyʼre supposed to create.
  • 15. Two Issues Multiple formats Multiple inputs 15
  • 16. EMAILS WORD FILES ON A FILE SERVER PRINTED OR WRITTEN NOTES 16 16 Added to this was the fact that information about what is supposed to go on the cover was coming from multiple and sometimes contradictory inputs—emails, word files that were dropped in one or another folder on a file server, or printed or even written notes. Also, these werenʼt coming to the designer in any particular order or even following a schedule. Editors with clout left off entire sections of copy until the last minute. Youʼre going to hear me use the term “copy” here, which comes from the advertising and publishing industries. It means “text.”
  • 17. The Solution • Definitive, authoritative information about cover format and size • A stable, controlled set of cover templates • A single source for cover copy • Digital delivery of copy to Design 17 <click>The thing is, we have all the pertinent format information about book covers in our production database. I figured we should use it. <click>To help avoid confusion, I also got buy-in from the art directors for them to develop a definitive, concise library of templates and, though it might not seem important, a vocabulary of terms. <click>We also have a system for editors to write descriptive copy about books for internal use, the tipsheet system. So theyʼre used to writing copy in a web-based system. Writing cover copy in a related system isnʼt that much of a stretch.
  • 18. Cover Workflow Acquisition Editor or Editor Marketing Copywriter Review: Designers Designers Tipsheet Specifications Cover Copy Finished Copy White Layout Finished Layout 18 What we established with the aid of Cover Automation is a clear workflow for book covers, from the creation of cover copy through to the design of the cover itself. This is a somewhat simplified view. 1. Acquisition Editor writes his or her Tipsheet. 2. Once Editorial and Marketing agree on a format for the book and some initial specifications (format, size, approximate length) this information is shared with Production. 3a. The Editor or sometimes a Copywriter writes the cover copy, usually based on what was written for the Tipsheet. 3b. Production finalizes the specifications. 4. Cover copy review and revision process, digitally. 5. Cover Automation system combines the finished copy, specʼs, metadata to create White Layout
  • 19. The Cover and its Features BACK FLAP BACK COVER SPINE FRONT COVER FRONT FLAP AUTHOR QUOTES QUOTE PHOTO AUTHOR PRICES READING LINE DESCRIPTION SUBTITLE AUTHOR BIO TITLE “ALSO” LINE CREDITS READING LINE ISBN & PRICES 19 19 We have a vocabulary for describing the elements of a cover, and itʼs got sections that we have names for, but you canʼt say that covers are really structured. The elements appear in different places, in a different order; not all covers have all elements. Still, we can use the information we have to our advantage, especially because we have a tool like Mark Logic to work with.
  • 20. The Cover Editor 20 20 Here is the Cover Editor. Generated by our product database, Title Management. The Cover Editor provides keys to solving the problem. <click> Important metadata: type (hardcover, so a dust jacket) and size of the cover. The Cover Editor is also the single source for cover copy: It allows Editorial to write and edit the copy for the various sections of a cover. <click> It allows them to place items in the order they want them to appear in (although here the authorʼs name isnʼt on the top of the front cover, where it should be). Of course it also allows them to add elements, such as praise quotes.
  • 21. 21 21 Letʼs scroll down and take a closer look at the bookʼs description, and follow that through. The Cover Editor is a pretty full-featured program. It allows for basic formatting--bold and italics--and keeps track of all text changes and records when they happened and who worked on it. I noticed that Sarah Branham added most of the elements to this cover, and that the copyeditor Carole Schwindeller worked on it as well. The right-hand column is for notes
  • 22. 22 22 The system also provides routing of the cover copy through various departments. When the user <click> “sends” the cover copy to someone, the system creates an email that identifies the sender. They can write a message to the recipient, and this email contains a link to this Web page. Once all the approvals are received, Editorial sends the cover copy to the designer. <click> She clicks the Build Cover button. She then chooses <click> the imprint, format and size, and tells the system to build a cover for her. We debated creating the cover automatically because the system has all that information, but the art directors felt that there were enough departures from the standards involved that they were more comfortable having the designers go through this step and choose manually.
  • 23. The White Layout 23 23 The result that is downloaded to the designerʼs desk is what I call the White Layout. <click> Here you can see the Description that we have been following. Letʼs take a look at what is happening behind the scenes.
  • 24. 24 24 Behind the layout--and what the designer doesnʼt see--is an XML structure. Hereʼs the Description again <click>, with its corresponding element in the structure. So how do we get to this point?
  • 25. Cover Editor/SQL Server: Cover Copy in XML Template in IDML The “Replacement Engine” “White Layout” 25 25 We took a look at the Cover Editor, which contains the metadata and the cover copy. The SQL database which provides the Cover Editor generates an XML version of the data to go on the cover. Another key element is the template, which weʼll examine in a moment. The InDesign template corresponding to the format and size of the book is exported to a variety of XML called IDML, which the MarkLogic server can read, and the server combines the two to create the White Layout. Weʼve all heard about Babbageʼs Difference Engine. What we have here is a Replacement Engine. Replacement is a key concept to understanding how this system works. Letʼs take a look at that process step-by-step.
  • 26. The Template (indd) 26 26 Here it is in InDesign format. Once the designer has the layout set, we identify the elements that are to be populated with cover copy, using InDesignʼs capability to introduce structure into their layouts. Notice that this one contains 14 structural elements. So in addition to the cover copy that Editorial writes, it allows us to populate items like prices and ISBN, and what you might call “boilerplate,” like the line that tells buyers that the book is also available as an ebook or an audio. This is a great timesaver, and eliminates the possibility of a lot of errors, as it used to be up to the designers (notoriously bad typists) to type in the ISBN, prices, and tag lines like “Also available as an ebook.” I want to bring to your attention the element that is on the flap. It is a placeholder frame <click>. There is only one element here, yet, as weʼll see, there will be more than one element that is to be placed in the white layout here. We could never predict how many elements there might be in a section, and weʼll discuss how we solved that particular problem a little later on.
  • 27. The Template (idml) 27 27 Once the template is set, it is saved in idml format. IDML is an XML format, consisting of XML files describing each aspect and piece in the layout, then placed in a ZIP wrapper, similar to what Microsoft is doing with its Office documents. What is important about the fact that Adobe has done this, is that the IDML format is both complete and native. That is key for us, because it allows us to edit or even create InDesign documents through XML. Here is the template opened in an XML editor. Notice all the folders listed down the left side of the window.<click>
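Because an IDML file is a ZIP wrapper around XML files, any ZIP library can look inside one. The following is a minimal Python sketch, not the system described in the talk. The file names in `build_demo_package` are stand-ins chosen for illustration, and the demo package is not a valid InDesign document.

```python
import io
import zipfile


def list_stories(idml_bytes: bytes) -> list[str]:
    """Return the names of the Story XML files inside an IDML package."""
    with zipfile.ZipFile(io.BytesIO(idml_bytes)) as package:
        return sorted(
            name
            for name in package.namelist()
            if name.startswith("Stories/") and name.endswith(".xml")
        )


def build_demo_package() -> bytes:
    """Build a tiny stand-in package (illustrative only) to run the demo on."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("designmap.xml", "<Document/>")
        package.writestr("Stories/Story_u1a.xml", "<Story Self='u1a'/>")
        package.writestr("Stories/Story_u2b.xml", "<Story Self='u2b'/>")
        package.writestr("Spreads/Spread_u3c.xml", "<Spread Self='u3c'/>")
    return buffer.getvalue()


if __name__ == "__main__":
    print(list_stories(build_demo_package()))
```

To inspect a real template, the same function could be given the bytes of an `.idml` file read from disk.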
  • 28. The Template (idml) 28 28 Here Iʼve opened the “Stories” folder, and chosen one of the “Stories.” Iʼve highlighted the tag that identifies this Story as the “placeholder” frame for copy that is to be placed on the flap <click>. A “placeholder” frame is the first in a column of one or more frames in that Section.
  • 29. Cover-Info 29 29 The other key building block is the XML output by the SQL database. We refer to it as “cover-info.” Cover-info XML is not valid, but it is well-formed and Mark Logic is perfectly fine with it. It is a collection of the elements that are to appear on the cover, both those that were written by Editorial and data generated from the product database. <click> We can see the first section, frontcover, and the first three and part of a fourth element, identified as TextFrames: the TITLE, SUBTITLE, AUTHOR, and parts of the HEADLINE, like “#1 New York Times bestselling author” The words “New York Times” are in a SPAN identified as “emphasis.”
  • 30. 30 30 If we scroll down through this file, we come to the Description, which we were following. Itʼs part of the Section called “flap.” There is a quote from Stephen King followed by the Description we were looking at before. So, as I pointed out before, here are two text frames that are to be positioned in the same section. Hereʼs how we solve that problem: Notice the InDesign-specific information. There is an argument in the section tag called y-offset. <click> This indicates that each TextFrame for that section is to start 54 points--or 3/4 inch--below the previous one. Also notice the ParagraphStyle argument. It corresponds with InDesign paragraph styles, and the CharacterStyle corresponds with character styles set up in the template. Where Mark Logic comes into the equation, as you might have guessed, is putting these two XML streams--the IDML of the template and the XML of the cover-info--together to generate the White Layout.
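The y-offset rule can be sketched in a few lines of Python. The XML below is a made-up stand-in for cover-info: the element and attribute names (`Section`, `TextFrame`, `placeholder-type`, `pageitem-type`, `y-offset`, `ParagraphStyle`) follow the terms used in these notes, but their exact spelling and nesting in the real files are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical cover-info fragment; the real schema is not shown in the slides.
COVER_INFO = """
<cover-info>
  <Section placeholder-type="flap" y-offset="54">
    <TextFrame pageitem-type="quote" ParagraphStyle="Quote">A praise quote.</TextFrame>
    <TextFrame pageitem-type="description" ParagraphStyle="Body">The description.</TextFrame>
  </Section>
</cover-info>
"""


def frame_positions(cover_info_xml: str) -> list[tuple[str, str, float]]:
    """Return (section, item type, vertical shift in points) for each frame."""
    root = ET.fromstring(cover_info_xml)
    positions = []
    for section in root.iter("Section"):
        step = float(section.get("y-offset", "0"))
        for index, frame in enumerate(section.iter("TextFrame")):
            # The first frame stays put; each later frame starts one step lower.
            positions.append(
                (
                    section.get("placeholder-type"),
                    frame.get("pageitem-type"),
                    index * step,
                )
            )
    return positions


if __name__ == "__main__":
    for row in frame_positions(COVER_INFO):
        print(row)
```

With 72 points to the inch, the 54-point step is the 3/4 inch mentioned above.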
  • 31. 31 31 Letʼs look behind the scenes at what happens when the Designer pushes the “Generate Layout” and then “Start” buttons. In this administratorʼs view we can see the templates weʼve saved in IDML format. These correspond to the choices made by the designer in terms of division, format and size. Clicking on one of these listings we can see the ISBNs which were processed with that template.<click> Clicking on that displays the date and time that it was processed, along with a string that shows us what is happening: <click> At the bottom of the browser we see a string that calls a module called Controller, an XQuery, and passes to it two arguments, the ISBN and the template.
  • 32. 32 32 Iʼm not going to go through the XQueries line by line. Iʼm not an XQuery expert by any stretch, and that would take us the better part of a month to do.
  • 33. 33 33 Suffice to say that Controller does what its name implies. It runs the show. The heavy lifting is done by another XQuery, lib-ss. Thatʼs where “run-layout” resides.
  • 34. 34 34 Lib-ss is 14 screens long, but the operative phrase appears in this snippet: “replace all the replaceable content.” A few slides back we looked at the XML contained in one of the IDML templates. It has placeholder-type tags, and values for them. We also looked at the cover-info XML delivered by the SQL database. It has tags in it called placeholder-type, pageitem-type, and pageitem-number.
  • 35. 35 35 The cover-info is placed into a corresponding folder in one part of the MarkLogic system <click>. What lib-ss and its minions do is open the cover-info XML and for each XML element <click> it looks into the content folder <click> for the relevant JobTicket and finds the Story with the relevant placeholder-type <click>--the “flap” in our case. It then replaces the contents of that story with the contents from cover-info <click>. It stores the result in the staging folder <click>. If there is more than one pageitem for that placeholder-type--the Description in “flap” in our case--it generates another Story, identical in type and size, and drops it in, 3/4” lower on the layout. Once it is through processing the contents of cover-info, it then combines--in memory--the Stories that are in Staging with the ones in the template, replacing whatever stories there have the same ID.
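The replacement step described above can be sketched in Python. This is not the lib-ss XQuery, which is not reproduced in these notes. It is a toy model under stated assumptions: each Story carries a `Self` ID and a `placeholder-type` attribute, holds its text in a single `Content` child, and records its vertical shift in a made-up `y-shift` attribute. Real IDML keeps frame geometry in the Spread files rather than in the Story.

```python
import copy
import xml.etree.ElementTree as ET


def replace_content(template_stories, cover_items, y_offset_pt=54.0):
    """Stage one Story per cover item, cloning the placeholder when reused."""
    by_type = {story.get("placeholder-type"): story for story in template_stories}
    used = {}
    staged = []
    for placeholder_type, text in cover_items:
        source = by_type.get(placeholder_type)
        if source is None:
            continue  # no placeholder of this type in the template
        count = used.get(placeholder_type, 0)
        story = copy.deepcopy(source)  # leave the template untouched
        story.find("Content").text = text
        if count > 0:
            # Second and later items get a new ID (hypothetical naming scheme).
            story.set("Self", f"{source.get('Self')}_{count}")
        story.set("y-shift", str(count * y_offset_pt))
        used[placeholder_type] = count + 1
        staged.append(story)
    return staged


def merge(template_stories, staged):
    """Combine in memory: a staged Story replaces the template Story with the same ID."""
    merged = {story.get("Self"): story for story in template_stories}
    for story in staged:
        merged[story.get("Self")] = story
    return merged


if __name__ == "__main__":
    template = [
        ET.fromstring('<Story Self="u10" placeholder-type="flap"><Content>placeholder</Content></Story>'),
        ET.fromstring('<Story Self="u20" placeholder-type="spine"><Content>spine text</Content></Story>'),
    ]
    items = [("flap", "Praise quote"), ("flap", "Description")]
    for story_id, story in merge(template, replace_content(template, items)).items():
        print(story_id, story.get("y-shift"), story.find("Content").text)
```

Keeping the template Stories immutable and merging by ID mirrors the staging-then-combine order in the notes, so the same template can serve many ISBNs.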
  • 36. 36 36 It ZIPs this collection of files into an IDML, which the user downloads. Here is the IDML, opened in an XML editor, with the first Story of the flap displayed. If you look closely enough, <click> youʼll see that the local formatting is preserved from the Cover Editor.
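The packaging step might look like the Python sketch below. It assumes, and this should be checked against Adobe's IDML specification, that the package begins with an uncompressed `mimetype` entry holding the string shown in `IDML_MIMETYPE`. The part names in the usage example are illustrative only, and the result is not a complete InDesign document.

```python
import io
import zipfile

# Assumed IDML mimetype string; verify against Adobe's IDML specification.
IDML_MIMETYPE = "application/vnd.adobe.indesign-idml-package"


def package_idml(parts: dict[str, str]) -> bytes:
    """Zip a mapping of archive path -> XML text into one package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        # Write the mimetype entry first and store it without compression.
        package.writestr("mimetype", IDML_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for name in sorted(parts):
            package.writestr(name, parts[name])
    return buffer.getvalue()


if __name__ == "__main__":
    data = package_idml(
        {
            "designmap.xml": "<Document/>",
            "Stories/Story_u10.xml": "<Story Self='u10'/>",
        }
    )
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        print(package.namelist())
```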
  • 37. 37 37 Here is the Description that we were following. <click>It is tagged as item-content, and a description.
  • 38. 38 38 So the IDML that is delivered to the user--what I call the “white layout”--contains a wealth of structural information. We donʼt turn on the structure view for the user, of course.
  • 39. 39 39 The designer then proceeds to add graphics, photos, color, and to set the type to create the finished layout.
  • 40. 40 40 The result still contains the structure of the template even at this stage. Here the book description is highlighted. The system is actually designed so that if this layout is saved as an IDML, it can be submitted to the system and the corresponding fields we saw in the Cover Editor could be updated with any changes made to the text in the InDesign file. The idea was that this finished cover would contain the final word on what editors, publishers and marketing types wanted to say about the book. However, we uncovered two flaws in our logic.
  • 41. CoverTextAsArt For one thing, many covers, like this one, have artwork instead of text. For Children's, this meant that our approach of having cover copy delivered from a database wasn't of much use. There wasn't much text on their covers anyway. But designers in the Adult Division wanted to be able to turn some of the text into art, too. And they were apt to break up or combine frames of text, resulting in the IDML Stories getting out of sync. More importantly, waiting for the final, approved text on the cover has become too much of a luxury. At the time we began this project, total, letter-perfect correspondence between the cover and online copy was mandatory. Now online marketing demands that this information be timely. That means it must be out before the cover is even close to being ready for the printer. We’re getting to the point that what goes out on the Net does not have to match the cover character for character.
  • 42. Future Directions “Catalog Automation” We are thinking of other applications for this technology, though. Here is the page from our HTML-based digital catalog for Sing You Home, and we’re exploring the idea of using a similar approach to building print catalogs, as well.
  • 43. Future Directions Content Enrichment One of the reasons I pushed so hard for ML, besides XML manipulation, was search. Our Digital Group, with the help of MarkLogic Professional Services and some third-party tools, has built a content enrichment and search tool which will be going live shortly. Here I’ve typed in the phrase “civil rights.” It has turned up a list of relevant titles and extracted appropriate sections of text. The tool also allows the user to drill down into the title. At a demonstration of the tool, someone typed in Qaddafi and turned up a book from years ago, which everyone had forgotten about, written by a personal friend of his. I think this will become a valuable tool, and it shows why saving content in XML is important, even though we don’t have an XML-based workflow.
  • 44. Conclusion • What it means to be a Trade publisher • The problem: haphazard copy, unclear format information • How we use MarkLogic Server to create InDesign layouts • Limitations of this approach. • Where it might work in the future. So we’ve covered (read) <click> <click> <click> <click> <click>
  • 45. Acknowledgements I would just like to acknowledge the contributions of Frank Rubino and Jason Myatt of MarkLogic. Not only did they build the MarkLogic part of Cover Automation, but they also helped me with this presentation. Frank Rubino & Jason Myatt
  • 46. Thank You--Questions? Steve Kotrch Director of Publishing Technology steve.kotrch@simonandschuster.com Twitter: steveko Simon & Schuster
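
The replacement step described for slide 35 can be sketched roughly as follows. This is only an illustration in Python, not the XQuery in lib-ss: the XML shapes, the `y-shift` attribute, the derived `Self` IDs, and the function names `build_staging` and `merge` are all assumptions made for the example, and real IDML Stories are considerably more detailed.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-ins for the template Stories and
# the cover-info XML; the real structures are richer than this.
TEMPLATE = """
<Stories>
  <Story Self="u10" placeholder-type="frontcover"><Content>TITLE</Content></Story>
  <Story Self="u20" placeholder-type="flap"><Content>FLAP COPY</Content></Story>
</Stories>
"""

COVER_INFO = """
<cover-info>
  <section name="flap" y-offset="54">
    <TextFrame>Praise quote</TextFrame>
    <TextFrame>Book description</TextFrame>
  </section>
</cover-info>
"""


def build_staging(template_xml, cover_info_xml):
    """Fill placeholder Stories with cover copy; return the staged Stories."""
    template = ET.fromstring(template_xml)
    cover_info = ET.fromstring(cover_info_xml)
    by_type = {s.get("placeholder-type"): s for s in template.iter("Story")}
    staged = []
    for section in cover_info.iter("section"):
        placeholder = by_type.get(section.get("name"))
        if placeholder is None:
            continue  # no placeholder frame for this section in the template
        step = float(section.get("y-offset", "0"))  # points between frames
        for index, frame in enumerate(section.iter("TextFrame")):
            story = copy.deepcopy(placeholder)  # identical in type and size
            story.find("Content").text = frame.text
            if index > 0:
                # Extra items get a new (made-up) ID so they do not collide.
                story.set("Self", f"{placeholder.get('Self')}-{index}")
            story.set("y-shift", str(step * index))  # assumed attribute name
            staged.append(story)
    return staged


def merge(template_xml, staged):
    """Combine staged Stories with the template, replacing matching IDs."""
    template = ET.fromstring(template_xml)
    staged_by_id = {s.get("Self"): s for s in staged}
    merged = [staged_by_id.pop(s.get("Self"), s) for s in template.iter("Story")]
    merged.extend(staged_by_id.values())  # Stories generated for extra items
    return merged


if __name__ == "__main__":
    for story in merge(TEMPLATE, build_staging(TEMPLATE, COVER_INFO)):
        print(story.get("Self"), story.get("y-shift"), story.find("Content").text)
```

The 54-point offset corresponds to the 3/4 inch mentioned in the talk, since there are 72 points to the inch.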

Editor's Notes

  3. Point 3 is of course the meat of this presentation.
  4. Fiction, non-fiction, audio books, Children’s books, ebooks. Roughly 2000 titles/year.
  5. What it means to be a trade book publisher. We pay authors to write books--venture capitalist. We don’t create our own content; that’s for the authors to do. We do prepare content for publication: copyedit, and design, create marketing material--and a book cover these days is marketing, like packaging for any product. We arrange for manufacturing. We distribute, market and sell books.
  6. In terms of technology, the infrastructure is mostly Microsoft, with a .NET development environment. We use SQL Server for our databases, including business-critical systems. And of course MarkLogic for content management. So why MarkLogic?
  7. XML important: Six or more years ago, tasked to identify a way to future-proof our content. We were embarking on a project that involved digitizing thousands of books for which we had no files, the Digital Warehouse project. Not to be confused with a Data Warehouse. Three-pronged approach: structured data (SQL Server); binary data (DAM); XML data (MarkLogic Server). We knew we would have to transform the content into various forms, so MarkLogic. Now book XML is used not just for future-proofing, but also for search and for extracting sample chapters.
  8. Why search is important. 19% of all online retail is done through Amazon.com. Even having a “placeholder image” for a book cover gets us a 10% bump in pre-publication sales.
  11. Here is the jacket for a pretty successful hardcover book. Let’s look at the problems involved in developing book covers like this one. Let me take a moment to talk about book covers in today’s marketplace. They are extremely important to the marketing of our products. They act as packaging, but even more than that they are the representation to the world of that product. A couple of slides ago you saw a bunch of images representing the products that S&S publishes. You probably didn’t even give it a thought that these were pictures of book covers. As a result, a lot of effort goes into getting just the right look, and just the right words on a cover, and they go through many rounds of approvals and revisions before they’re considered ready to be printed. One impetus to developing the system I’m about to describe to you is the demand on the part of Sales and Marketing to have exactly the words we use on the cover of a book represent it online. Another revolves around two issues . . . <click>
  12. One problem facing our designers is that for a publisher the size of Simon & Schuster, the books come in a lot of different sizes, various formats, and from various imprints. Imprints are profit units, each of which has a slightly different outlook on the world and is likely to be run by someone with a healthy ego and strong opinions, resulting in each having a slightly different layout. All these different layouts and sizes pose a problem for designers and for the workflow--getting clear, definitive and accurate information to the designers about what they’re supposed to create. Added to this was the fact that information about what is supposed to go on the cover was coming from multiple and sometimes contradictory inputs--emails, Word files that were dropped in one or another folder on a file server, or printed or even written notes. Also, these weren’t coming to the designer in any particular order or even following a schedule. Editors with clout left off entire sections of copy until the last minute. You’re going to hear me use the term “copy” here, which comes from the advertising and publishing industries. It means “text.”
  13. <click> The thing is, we have all the pertinent format information about book covers in our production database. That’s one of the SQL databases I alluded to earlier. I figured we should use it. <click> To help avoid confusion, I also got buy-in from the art directors for them to develop a definitive, concise library of templates and, though it might not seem important, a vocabulary of terms by which to call the various sizes and formats (jacket, cover, trade paper, French flap, step-back and so on). <click> We also have a system for editors to write descriptive copy about books for internal use, the tipsheet system. So they’re used to writing copy in a web-based system. Writing cover copy in a related system isn’t that much of a stretch.
  14. What we established with the aid of Cover Automation is a clear workflow for book covers, from the creation of cover copy through to the design of the cover itself. This is a somewhat simplified view. 1. Acquisition Editor writes his or her Tipsheet. 2. Once Editorial and Marketing agree on a format for the book and some initial specifications (format, size, approximate length) this information is shared with Production. 3a. The Editor or sometimes a Copywriter writes the cover copy, usually based on what was written for the Tipsheet. 3b. Production finalizes the specifications. 4. Cover copy review and revision process, digitally. 5. Cover Automation system combines the finished copy, spec’s, metadata to create the White Layout.
  15. We have a vocabulary for describing the elements of a cover, and it’s got sections that we have names for, but you can’t say that covers are really structured. The elements appear in different places, in a different order; not all covers have all elements. Still, we can use the information we have to our advantage, especially because we have a tool like MarkLogic to work with.
  16. Here is the Cover Editor. It’s a Web page generated out of our product database, Title Management. The Cover Editor provides keys to solving the problem. <click> Notice that it contains important metadata, especially the definitive type (hardcover, so a dust jacket) and size of the cover. Again, this is taken from the database that we use to actually order manufacturing, so it has to be dead accurate. The Cover Editor is also the single source for cover copy: It allows Editorial to write and edit the copy for the various sections of a cover. <click> It allows them to place items in the order they want them to appear in (although here the author’s name isn’t on the top of the front cover, where it should be). Of course it also allows them to add elements, such as praise quotes.
  17. Let’s scroll down the Web page and take a closer look at the book’s description, and follow that through. The Cover Editor is a pretty full-featured program. It allows for basic formatting--bold and italics--and keeps track of all text changes and records when they happened and who worked on it. I noticed that Sarah Branham added most of the elements to this cover, and that the copyeditor Carole Schwindeller worked on it as well. The right-hand column is for notes--italicize this, make that prominent, are you sure you want to use this adjective three times in the same paragraph? For this title there didn’t seem to be any need for that.
  18. The system also provides routing of the cover copy through various departments. When the user <click> “sends” the cover copy to someone, the system creates an email that identifies the sender. They can write a message to the recipient, and this email contains a link to this Web page. Once all the approvals are received, Editorial sends the cover copy to the designer. <click> She clicks the Build Cover button. She then chooses <click> the imprint, format and size, and tells the system to build a cover for her. We debated creating the cover automatically because the system has all that information, but the art directors felt that there were enough departures from the standards involved that they were more comfortable having the designers go through this step and choose manually.
  19. The result that is downloaded to the designer’s desk is what I call the White Layout. <click> Here you can see the Description that we have been following. Let’s take a look at what is happening behind the scenes.
  20. Behind the layout--and what the designer doesn’t see--is an XML structure. Here’s the Description again <click>, with its corresponding element in the structure. So how do we get to this point?
  21. We took a look at the Cover Editor, which contains the metadata and the cover copy. The SQL database which provides the Cover Editor generates an XML version of the data to go on the cover. Another key element is the template, which we’ll examine in a moment. The template corresponding to the format and size of the book is exported to a version of XML which the MarkLogic server can read, and the server combines the two to create the White Layout. We’ve all heard about Babbage’s Difference Engine. What we have here is a Replacement Engine. Replacement is a key concept to understanding how this system works. Let’s take a look at that process step-by-step.
  25. Here it is in InDesign format. Once the designer has the layout set, we identify the elements that are to be populated with cover copy, using InDesign’s capability to introduce structure into their layouts. Notice that this one contains 14 structural elements. So in addition to the cover copy that Editorial writes, it allows us to populate items like prices and ISBN, and what you might call “boilerplate,” like the line that tells buyers that the book is also available as an ebook or an audio. This is a great timesaver, and eliminates the possibility of a lot of errors, as it used to be up to the designers (notoriously bad typists) to type in the ISBN, prices, and tag lines like “Also available as an ebook.” I want to bring to your attention that there is just one placeholder frame in the flap section <click>. Yet, as we’ll see, there will be more than one element that is to be placed in the flap. We could never predict how many elements there might be in a section, and we’ll discuss how we solved that particular problem a little later on.
  26. Once the template is set, it is saved in IDML format. IDML is an XML format, consisting of XML files describing each aspect and piece in the layout, then placed in a ZIP wrapper, similar to what Microsoft is doing with its Office documents. What is important about the fact that Adobe has done this is that the IDML format is both complete and native. That is key for us, because it allows us to edit or even create InDesign documents through XML. Here is the template opened in an XML editor. Notice all the folders listed down the left side of the window.
  27. Here I’ve opened the “Stories” folder, and chosen one of the “Stories.” I’ve highlighted the tag that identifies this Story as the “placeholder” frame for copy that is to be placed on the flap <click>. A “placeholder” frame is the first in a column of one or more frames in that Section.
  28. The other key building block is the XML output by the SQL database. We refer to it as “cover-info.” Cover-info XML is not valid, but it is well-formed and MarkLogic is perfectly fine with it. It is a collection of the elements that are to appear on the cover, both those that were written by Editorial and data generated from the product database. We can see the first section, frontcover, and the first three and part of a fourth element, identified as TextFrames: the TITLE, SUBTITLE, AUTHOR, and part of the HEADLINE, “#1 New York Times bestselling author.” The words “New York Times” are in a SPAN identified as “emphasis.”
  29. If we scroll down through this file, we come to the Description, which we were following. It’s part of the Section called “flap.” There is a quote from Stephen King followed by the Description we were looking at before. So, as I pointed out before, here are two text frames that are to be positioned in the same section. Here’s how we solve that problem: Notice the InDesign-specific information. There is an argument in the section tag called y-offset. This indicates that each TextFrame for that section is to start 54 points--or 3/4 inch--below the previous one. Also notice the ParagraphStyle argument. It corresponds with InDesign paragraph styles, and the CharacterStyle corresponds with character styles set up in the template. Where MarkLogic comes into the equation, as you might have guessed, is putting these two XML streams--the IDML of the template and the XML of the cover-info--together to generate the White Layout.
  30. Let’s look behind the scenes at what happens when the Designer pushes the “Generate Layout” and then “Start” buttons. In this administrator’s view we can see the templates we’ve saved in IDML format. These correspond to the choices made by the designer in terms of division, format and size. Clicking on one of these listings we can see the ISBNs which were processed with that template. <click> Clicking on that displays the date and time that it was processed, along with a string that shows us what is happening: <click> At the bottom of the browser we see a string that calls a module called Controller, an XQuery, and passes to it two arguments, the ISBN and the “Job Ticket” that corresponds to the template.
  34. I’m not going to go through the XQueries line by line. I’m not an XQuery expert by any stretch, and that would take us the better part of a month to do.
  35. Suffice it to say that Controller does what its name implies. It runs the show. The heavy lifting is done by another XQuery, lib-ss. That’s where “run-layout” resides.
  36. Lib-ss is 14 screens long, but the operative phrase appears in this snippet: “replace all the replaceable content.” A few slides back we looked at the XML contained in one of the IDML templates. It has placeholder-type tags, and values for them. We also looked at the cover-info XML delivered by the SQL database. It has tags in it called placeholder-type, pageitem type, and pageitem-number.
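
The packaging step mentioned in the notes, where a collection of XML files is zipped into an IDML, might look something like the sketch below. It is written in Python rather than the XQuery used in the actual system, and the function name `package_idml` and the sample file contents are invented for the example. It also assumes, from my reading of Adobe's IDML documentation, that the package should begin with an uncompressed `mimetype` entry containing the string shown; treat that detail as something to verify against the specification.

```python
import io
import zipfile

# Assumed from the IDML documentation; verify before relying on it.
IDML_MIMETYPE = "application/vnd.adobe.indesign-idml-package"


def package_idml(files):
    """Zip a mapping of archive path -> bytes into an in-memory IDML package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry goes first and is stored without compression.
        archive.writestr("mimetype", IDML_MIMETYPE,
                         compress_type=zipfile.ZIP_STORED)
        for path, data in sorted(files.items()):
            archive.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder contents; a real package holds many more files.
    package = package_idml({
        "designmap.xml": b"<Document/>",
        "Stories/Story_u20.xml": b"<Story/>",
    })
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        print(archive.namelist())
```

Building the archive in memory mirrors the talk's point that the Stories are combined in memory before the user downloads the result.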