Publish Your Own ebooks with FOSS Tools - Notes


The notes for a presentation given at FSOSS 2011 on publishing ebooks (in EPUB format) using free and Open Source tools.


  1. Publish Your Own ebooks with Open Source Tools
     or: Adventures in Modern Publishing

     By: Scott Nesbitt
     © 2011 Scott Nesbitt

     Publishing has changed dramatically in the last 20 years. In fact, it’s undergone something of a minor revolution in the last 10. You don’t need me to tell you that, but it’s a good opportunity for me to tell a story.

     Once upon a time, publishing was the domain of the folks who could afford printing presses. Not just to own them, but to operate and maintain them. Printing presses were, and are, huge machines that require skilled people to work with them.

     Compounding that, if you were a writer, the only way you could get your book published was to put its fate in the hands of an editor at a publishing house. Your chances then, as now, weren’t that good. You could go the vanity publishing route, but that was expensive and out of reach for most.

     But as time went on, alternatives emerged. Devices like the typewriter, the mimeograph machine, and
  2. the photocopier. The quality varied, depending on what people were using, but those devices put printing in the hands of ordinary people. They enabled everyday folk to put things in print and in some level of mass quantity. Still, paper and binding could be expensive. That was, and is still (to a degree), a big barrier to entry.

     But high-quality publishing did move closer into the hands of the average person and most authors. I like to think that that move started with Donald Knuth and his creation, the TeX typesetting system. For me, though, the turning point came in 1988. I was in journalism school and the newsroom of the student newspaper got a battery of Macintosh computers. The ones that we now call Classic Macs. Using Microsoft Word, a laser printer, and the venerable techniques of paste up, we were able to quickly assemble an edition of the paper and send it to the printer.

     I remember one instance in particular, where my class covered an event late one Thursday afternoon. We rushed back to the newsroom, wrote up our stories, put together the paper, and sent it to the printer. All before 7:00 p.m. the same day.

     What we did was primitive, but it opened my eyes. As did more powerful tools like Ventura Publisher, QuarkXPress, PageMaker and, in the world of technical communication, FrameMaker. But those tools had one thing in common. Although the work was done on computers, the goal was to put the work into print.

     Going mobile

     It wasn’t until the mid-1990s, when truly mobile devices -- ones with screens only a few inches across -- started to hit the market en masse, that some people got the idea to create books that were meant to be read on those devices.

     Admittedly, most of those books were public domain works, classics, and reference material. There was little, if any, new or mainstream content. But the seeds were sown, and from those days until today a variety of devices (whether for reading ebooks or not) have come and gone. And ebook formats have bloomed like a thousand flowers.

     There are a number of ebook formats available today. Most are just niche or marginal. The two that are arguably the most popular are PDF and EPUB.
  3. PDF versus EPUB

     PDF has been around since the early 1990s. At the time, it was somewhat revolutionary. Here was a format that could literally take a snapshot of the look and feel of any document no matter how complex the layout. That, in itself, was pretty impressive. For the time and even for now.

     But PDF, no matter what Adobe says, is really a format for printing. At best, it’s a format for viewing on larger screens -- desktop monitors, laptops, and (in a stretch) on larger tablets.

     EPUB, on the other hand, is a young upstart. From day one, EPUB had the advantage of being created in the right place at the right time. EPUB was built for viewing on screen. Print wasn’t even an afterthought -- I don’t think it was even considered to be a necessary feature of EPUB.

     While EPUB files might not be as visually pretty as PDFs, they’re more than up to the task for reading on screen. Any screen. Let me give you a few reasons why.

     Why EPUB?

     I’d like to take a moment to look at why I think that EPUB is the best format (at the moment, anyway) for publishing ebooks.

     First off, EPUB is based on open standards. I’ll be talking a bit more about this in a moment. While PDF (or, at least, some variants of PDF) is an ISO standard, it’s not really open. To be honest, I’d rather use an open standard than a closed one. Or a closed format.

     Secondly, EPUB is widely supported. Most ebook readers can handle EPUB files, and reader software for computers and tablets and smartphones (most of it free or Open Source) can too. There are even browser-based EPUB readers, like the extension for Firefox called EPUBReader.

     Third, EPUB content is designed to flow. What do I mean by that? Think of all of the devices that you’d read an ebook on: a computer or laptop, a tablet, an ebook reader, or a smartphone. All of them have different sized screens and different screen resolutions. EPUB pages aren’t exactly one-size-fits-all. They’re more one-size-adapts-to-all. You always (well, there are exceptions) get text on a single page, within the margins of the screen.
  4. With a PDF file, things can be very different. I’ve used readers that leave widows and orphans. On top of that, one strength of PDF is a major drawback when the format is used for ebooks. And that’s PDF’s ability to maintain the layout, the look, and the feel of a printed document. It’s always nice to admire the work of a good layout or design person. But when reading that on a small screen, you often wind up scrolling and resizing. That disrupts the flow of reading, and gets really frustrating.

     Finally, EPUB is very well suited for text-heavy books. You can include vector and raster images as well. And, unlike PDF, including graphics won’t overly bloat the size of the file.

     Drawbacks of EPUB

     I’d be remiss if I didn’t mention a few of EPUB’s drawbacks. The main ones are:

     ● It’s not suited to books with more complex and precise layouts -- for example photo books or digital comics.
     ● When it comes to scientific and technical publishing, EPUB doesn’t support equations set using MathML (an XML variant for presenting the structure and content of mathematics). Instead, you need to use image files.
     ● There’s no provision for linking into or between books.

     Taking a peek into EPUB

     This isn’t going to be an in-depth technical look at the innards of EPUB. I just want to give you a bird’s-eye view of the format so you know what it consists of and how it works.

     Remember when I said that EPUB is based on open standards? Well, those standards are XHTML, XML, and CSS.

     The text of a book is in XHTML. Yes, one of the file formats used to create Web pages. So if you have existing content -- for example, articles that have been published on the Web or blog posts -- you can use them as the basis of an EPUB book. More about this shortly.

     CSS, if you don’t know, is short for Cascading Style Sheets. Cascading Style Sheets let you apply formatting to a Web page. Think of a CSS file as being like a template in a word processor. By changing attributes in a CSS file, you can change the
  5. look and feel of an EPUB file.

     XML comes in with an EPUB file’s table of contents file (usually named toc.ncx) and a metadata file (usually named content.opf). The table of contents file not only provides structure to an ebook, it also provides navigation. Yes, a true table of contents. The metadata file, obviously, contains information about the book -- like its title, author, language, the software used to publish it, and the like. This is information that readers rarely, if ever, see but which should be in an EPUB file to make it complete.

     EPUB files have the extension .epub. What a surprise … But .epub isn’t some esoteric and murky format like, say, .doc. It’s actually a ZIP file. You can open an EPUB file using any file compression utility -- like Archive Manager in GNOME or WinZip in Windows.

     Let’s look at some tools

     A while ago, I heard someone say that creating EPUBs in 2011 is like creating Web pages circa 1997. The implication there was that a lot of manual work is involved. I don’t agree. Sure, you can assemble your own EPUB books (including building your own table of contents files by hand). Why do that? Why not let the tools do the bulk of the work for you?

     I’ll be looking at five tools. Well, not all of them are tools -- two of them are markup languages. For the purposes of this talk, let’s just pretend they’re all tools. They’re not the only games in town, but they’re the ones I’m most familiar with.

     I’m going to put these tools into three broad categories:

     ● Conversion
     ● Native authoring
     ● A hybrid solution

     The tools I’m going to discuss, for the most part, aren’t meant for high-volume publishing. But for a lone writer wanting to produce EPUB books, or even a small firm wanting to put out content as EPUB, they’re more than up to the task.

     Let’s get to the tools.

     DocBook

     DocBook is a markup language, based on XML, that’s widely used in documenting hardware and software. But a few publishers, notably O’Reilly Media and XML
  6. Press, use DocBook for publishing their books. There’s even a subset of DocBook aimed at publishers.

     If you want to create EPUBs from DocBook source files, it’s a lot easier than it used to be. That’s because the DocBook stylesheets now support EPUB output. In case you’re wondering, DocBook’s stylesheets are simply a set of files that aid in converting XML documents to various formats like HTML, PDF, and EPUB.

     The EPUB stylesheets are a relatively recent addition. When I first tried them, the EPUB stylesheets left a lot to be desired. They’ve gotten a lot better though.

     In addition to the stylesheets, you’ll need an XSLT processor. An XSLT processor is software that does the actual work of transforming a DocBook file into another format. Most XSLT processors are command line tools, but they’re easy to use. If you use Linux, many distributions come with one called xsltproc already installed. You can also download and install a couple of other popular processors called Saxon and Xalan.

     Let’s assume you’ve got everything you need -- the stylesheets and an XSLT processor installed, and a DocBook source file. What do you do? You point the processor at the right stylesheet and tell it the name of the file you want to transform. With xsltproc, you’d use this command:

         xsltproc [path to stylesheets]/epub/docbook.xsl [your_file.xml]

     That was easy, wasn’t it? Well, things get a bit messier from here. Remember the .epub container I mentioned earlier? While the DocBook tools just create the files that go into that container, you’ll need to create that container yourself. That’s fairly easy. Just use a file archiving utility to create a .zip file, then change the extension to .epub. There are a few other things you need to do, which are explained in detail in this article.

     To me, what I just mentioned is the biggest drawback to using DocBook to create EPUB books. One complaint (well, actually a whine) that I constantly hear is that DocBook has too many tags. Over 400 of them, as I recall. People complain that they can’t possibly learn them all. Guess what? You don’t have to learn them all. You might use a dozen or two tags at the most. Focus on those ones, and use reference
  7. material for the rest.

     AsciiDoc

     AsciiDoc is one of those quintessential Open Source projects. The programmer behind it, Stuart Rackham, wanted to use DocBook to document the software he was writing. But he found that:

         DocBook is a complex language, the markup is difficult to read and even more difficult to write directly -- I found I was spending more time typing markup tags, consulting reference manuals and fixing syntax errors, than I was writing the documentation.

     So he came up with AsciiDoc.

     AsciiDoc is a couple of things. First, it’s a lightweight markup language. Unlike HTML and XML, which use tags surrounded by angle brackets to format a document, AsciiDoc uses keyboard symbols to apply formatting. For example, if you want to create a heading you put a set of dashes below the text of the heading. A numbered list consists of items with a number and a period before them. I think that you get the idea.

     Second, it’s a set of scripts and stylesheets that will convert a marked-up file to various formats like XHTML, PDF, and EPUB.

     One thing that I should mention is that AsciiDoc is a command line tool. But don’t worry: you don’t need to remember a long string of commands. Rackham wrote a script named a2x which does all the heavy lifting for you. All you need to do is tell the script what format you want to output and what file you want to convert.

     Here’s how to use the script:

         a2x -fepub -dbook [ebook_source.txt]

     Overall, AsciiDoc outputs a nice looking EPUB. Of course, to do that you should follow the format for preparing the source file. If you do that, you’ll run into fewer headaches.

     OK, you’re probably thinking: using a word processor as an ebook publishing tool? There’s no reason why you can’t. People have written and published ebooks using OpenOffice.org and LibreOffice. OK, those ebooks were PDFs ... What about EPUB?
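For the DocBook route described earlier, the container step (zip the generated files, then rename to .epub) can be sketched in a few lines of Python. This is an illustration, not something from the talk: the function names make_epub and list_epub and the folder layout are assumptions, and the folder is assumed to already hold a mimetype file, META-INF/container.xml, and the generated OPF, NCX, and XHTML files. One of the "few other things" is that the EPUB container format expects the mimetype file to be the first entry in the archive and to be stored uncompressed, which a plain zip-and-rename does not guarantee.

```python
import zipfile
from pathlib import Path


def make_epub(source_dir, epub_path):
    """Zip the files under source_dir into an EPUB container at epub_path.

    Assumes source_dir already contains a 'mimetype' file plus the
    generated content. Write the output outside source_dir so the
    archive does not try to include itself.
    """
    source = Path(source_dir)
    with zipfile.ZipFile(epub_path, "w") as epub:
        # The mimetype entry goes first and is stored without compression.
        epub.write(source / "mimetype", "mimetype",
                   compress_type=zipfile.ZIP_STORED)
        # Everything else can be compressed as usual.
        for path in sorted(source.rglob("*")):
            name = path.relative_to(source).as_posix()
            if path.is_file() and name != "mimetype":
                epub.write(path, name, compress_type=zipfile.ZIP_DEFLATED)
    return epub_path


def list_epub(epub_path):
    """Return the names of the files inside an EPUB, in archive order."""
    with zipfile.ZipFile(epub_path) as epub:
        return epub.namelist()
```

Calling make_epub("book_files", "my_book.epub") and then list_epub("my_book.epub") should show mimetype as the first entry, followed by the rest of the book’s files. The result is still worth running through a validator, since this sketch checks nothing about the files themselves.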
  8. Thanks to an extension for OpenOffice.org Writer called Writer2EPUB, you can do just that. In case you’re wondering: the extension works with LibreOffice Writer, too.

     After you’ve installed the extension, using it is quite easy. Just open your book file in OpenOffice.org or LibreOffice Writer. Then, just click the Writer2EPUB button on your toolbar. You can enter metadata (remember, that’s information about the book) and even attach a separate cover file if you have one. Then, click OK.

     I’ll be honest: I’ve only experimented with files about 50 or 60 pages long at the most. That said, the conversion was fairly fast and quite smooth. The book looked good to boot.

     When you’re preparing content for conversion to EPUB with Writer (and even if you aren’t), always keep this in mind: use styles. Don’t apply formatting manually -- for example, don’t create a heading by making text 22 point DejaVu Sans and applying bolding. Apply the Heading 1 style instead.

     The reason you need to do this is simple. EPUB files are very structured. Styles, while they can help make a document look nice, are there to enforce consistency and structure. If you don’t use styles, there can be problems. The biggest one is that the table of contents for your EPUB file won’t generate properly. Which means you won’t have proper navigation or structure.

     Sigil

     In some ways, I consider Sigil to be the main event. It pretty much does it all when it comes to creating and publishing EPUBs. Sigil is a simple application, but it works. Consider it a WYSIWYG word processor for creating EPUB files. And guess what? Its native format is .epub.

     All you need to do is download and install Sigil, and then just fire it up. From there, you can start typing. Just remember to start a new document for each chapter and for the cover of your ebook.

     Earlier in this talk, I mentioned that if you have articles published on the Web or blog posts you can use them as the basis of an ebook. Sigil can help you do just that. You can import HTML and XHTML files into Sigil, and they’ll become chapters in your book.
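Why headings matter for navigation can be illustrated with a short Python sketch. It is not how Writer2EPUB or Sigil work internally; it only assumes that a converter builds the table of contents from heading elements (h1, h2, and so on) in each chapter’s XHTML. The class name HeadingCollector, the function toc_entries, and the sample markup are made up for the example.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for h1-h6 elements in a chapter."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []   # finished (level, text) entries
        self._level = None   # level of the heading being read, if any
        self._parts = []     # text fragments of that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._level is not None:
            # Collapse runs of whitespace inside the heading text.
            text = " ".join("".join(self._parts).split())
            self.headings.append((self._level, text))
            self._level = None


def toc_entries(xhtml):
    """Return the headings found in one chapter's markup."""
    collector = HeadingCollector()
    collector.feed(xhtml)
    collector.close()
    return collector.headings


chapter = """
<h1>Getting Started</h1>
<p>Some introductory text.</p>
<p><b><big>Looks Like a Heading</big></b></p>
<h2>Installing the Tools</h2>
"""

# Print an indented outline, one line per real heading.
for level, title in toc_entries(chapter):
    print("  " * (level - 1) + title)
```

Run as written, the outline lists the two real headings and skips the paragraph that was only made big and bold. That is the practical effect of formatting by hand instead of applying a heading style: the text looks right on the page but never reaches the table of contents.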
  9. You can arrange the chapters, edit them, add images … well, everything that you’d do in a word processor to tidy up or change their look.

     Sigil will also automatically generate a table of contents using the headings in the chapters of your ebook. You can also add some basic metadata to the file -- title, author, and language.

     Sigil is a quick and easy way to create an EPUB book. In fact, I used it to create the EPUB version of my first ebook. I was very pleased with the results.

     Booki

     I have a soft spot in my heart for Booki. It’s the tool that the FLOSS Manuals project uses to write and publish its guides. Booki isn’t a desktop application. It’s a wiki. In fact, I’ve heard Booki described as a wiki that, instead of Web pages, produces books. It does produce Web pages, too, but that’s beside the point …

     Booki is fairly easy to use. There’s no wiki markup to deal with. The editing interface is like a Web-based word processor. You can change formatting with a click or two. I’ve worked with a number of people who, when thrown into Booki, adapted to it within 30 minutes.

     Being a wiki, Booki’s backend format is (as you might have guessed) XHTML. Which, as we know, is one of the components of an EPUB file.

     There are two ways you can create an EPUB with Booki. One, just go to any manual on the FLOSS Manuals site. Then, click the EPUB button in the navigation panel on the left side of any page. After a few seconds, you get a nicely-formatted EPUB file.

     The other way is to go to Objavi. Objavi is the publishing backend of Booki, and using it enables you to choose from a number of output types including EPUB. You can also modify the default Cascading Style Sheet or point to another one on the Web. That, as you know, will let you change the look and feel of the book. Why do that? While the default stylesheet is fine, you might want to change the font being used or the spacing between paragraphs or the size of headings.

     In either case, Booki assembles the chapter files, creates a table of contents, and surrounds it all with the EPUB wrapper.

     A quick note about PDF

     DocBook, AsciiDoc, and Booki all have one crucial
  10. piece of flexibility: if you need a PDF, you can easily create it. That probably sounds strange, especially after what I said about PDF earlier in this presentation.

     Even though ebooks are all the rage, you might want to print your work. EPUB isn’t suited for that. But PDF is.

     Last year, I ran a FLOSS Manuals book sprint at Toronto Open Source Week. We used Booki to create a manual for the Thunderbird email client. To do something special for the participants, I generated a PDF and printed copies of the manual using something called the Espresso Book Machine.

     But let’s face it: like it or not, PDF is a de facto electronic publishing standard. Some commercial electronic publishing channels will only distribute PDFs. And there are a number of people who only know PDF.

     And not everyone owns an ebook reader or a tablet. They read on their desktop or laptop computers.

     For now, PDF is still a bit more popular than EPUB. Here’s a very unscientific example. Recently, I published my first ebook. It’s sold as a PDF through an electronic fulfillment service and as an EPUB in Amazon’s Kindle Store. The PDF version outsells the EPUB version by about 1.5:1.

     Validation and testing

     So you’ve got a nicely-formatted EPUB file. Now, all you have to do is let it loose into the wild. Not so fast. You can do that, but it’s not the best move. Before offering your EPUB for download or for sale, you should validate and test it first.

     Let’s take a look at both processes.

     Validation

     Validation is the process of making sure that your EPUB books contain all the elements that ebook readers expect. Like what? Here’s a partial list:

     ● Complete metadata
     ● The proper directory structure in the EPUB file
     ● Valid XHTML
     ● Working links and references to files in the EPUB file
     ● A table of contents

     And a lot more. If you don’t validate your EPUB book, chances are it will still render properly in your ebook reader. But why take that chance? But don’t worry:
  11. validation isn’t difficult to do. There are some good software and services that let you do just that.

     One of the features of Sigil that I didn’t mention earlier is its built-in validator. All you need to do is open your EPUB file in Sigil, click a button, and after a few seconds it points out any problems.

     Another validator you might want to consider is the online validator maintained by digital publishing firm ThreePress. Just upload your ebook and the service does the rest.

     If you don’t want to do that, then download and install epubcheck. epubcheck is what powers the ThreePress validator. It’s a command line Java application that’s quite easy to use. Just run the command:

         java -jar epubcheck-0.9.2.jar ebook_file.epub

     That seems simple enough, doesn’t it? There is one catch, though. Validators are great at finding problems. But in many cases, they’re lacking when it comes to explaining what those problems are, specifically. The validators assume that you have a certain level of knowledge, including the knowledge to fix the problem. That’s not always the case.

     When I was validating an ebook, I got an error message telling me that there was invalid HTML syntax in a particular file. I went to the line number that the validator pointed to in the file, and I didn’t see anything wrong. And I have a strong knowledge of HTML. Well, it turned out that the validator was expecting paragraph tags (<p> and </p>) around text surrounded by <blockquote> tags. I only figured that out by running the offending HTML file through an HTML validator.

     Testing

     Like validation, testing is optional. But it’s worthwhile doing it, if only as a final quality check. Crossing “i”s, dotting “t”s, making sure that line and paragraph breaks are accurate. That sort of thing.

     In a perfect world, someone publishing an ebook would have access to one of every device on which people read electronic books -- ebook readers, tablets, and smartphones. Sadly, it’s not a perfect world.

     So, what do you do? Use the devices that you have. They should give you a good idea of how your ebook will look when people read it. Also, consider using Calibre. Calibre is an Open Source ebook management application for desktop and laptop
  12. computers. While it’s not (as some people believe) a tool for reading ebooks, Calibre does have a solid ebook reading feature. One sneaky trick you can use is to resize Calibre’s ereader window to simulate how your ebook will look on screens of various sizes. Chances are, you won’t find many (if any) problems.

     Final thoughts

     As with a number of other areas, Open Source tools are more than up to the task of publishing ebooks. It doesn’t hurt when one of the most popular formats for distributing ebooks is an open standard, either.

     Whether you’re creating a short report or manual, a nonfiction book, or a novel, there’s an Open Source tool that will help you do the job. While I don’t agree that creating an EPUB in 2011 is like creating a Web page in 1997, I do have to admit that there’s still a way to go. That said, those of us in the Open Source world have some solid tools at our disposal. And they’re only getting better.

     Remember, though, that all the tools in the world won’t make your book worth reading. That will only happen if you have an interesting idea and do a good job of presenting that idea in writing. EPUB is just a delivery system. It’s the content that matters.

     Want to connect?

     Web site: http://scottnesbitt.net
     Blog: http://weblog.scottnesbitt.net