Introducing xml Single Source Publishing
Copyright © 2011 Ecomind
2011
Table of Contents
This document describes XML technology for publishing and has itself been prepared using that technology.
It is therefore a collection of specifications, instructions and explanations, but it is also an example of a publishing process based on DocBook single source publishing.
The best way to use it is not just to read the content, but also to explore the related documents:
The base document everything comes from.
The epub document can be read, but also uncompressed and explored inside.
This is the file that should be imported into InDesign.
Yes, you can transform DocBook documents into HTML pages or web sites.
Another important transformation of DocBook documents.
Writing is a hard matter.
Whatever the subject, the task, the context and the target, few human accomplishments are as difficult as the act of writing.
The advent of word processing has made writing easier from a certain point of view, but we have partially lost the habit of structuring our thoughts and concepts because we are distracted by the way they look.
The tendency to "format" documents has taken the place of ordering and arranging ideas in a well-constructed design.
So, first of all, we need to make it clear that formatting is just a visual aid for conveying the structure of a document.
When we write a chapter title, for instance, we tend to use large characters, larger than those of sub-titles, obviously.
What we are not used to thinking about is that the size of the characters is just a visual trick. It lets the reader immediately grasp that the title has a higher place in a structural hierarchy, where chapter titles describe more general concepts, fields or subjects than their sub-titles.
So, when it comes to the art of structuring itself, it doesn't really matter which font we are using or how big the chapter title is. What actually matters is that the text we are writing is unambiguously a chapter title. And the best way to ensure this is to write a label next to that text which says something like: this-is-a-chapter-title[1].
Well, the XML language does exactly that.
When we write an XML file we write the content and some labels which describe what each block of text means in terms of structure.
In other words, structuring has nothing to do with the presentation (fonts, borders, margins, colors, etc.).
Structuring means arranging ideas and concepts in a meaningful way using a specific hierarchy, and setting the role of each piece of information in relation to the rest of the document.
On the other hand, the art of presentation has to do with how we decide to render that structural design, what kind of look and feel it should have. So the art of presentation has to do with margins, borders, fonts, alignments, colors, etc. What a professional has to keep in mind is that whatever the presentation, it doesn't change the structural design.
So when we talk about a document we need to make it clear whether we are talking about its structural design or its presentation design, otherwise known as layout.
Here are some examples of structural parts of a document:
a chapter title represents the topic of an entire section of the document, which can be divided into smaller sub-topics;
a quotation is a piece of content written or said by someone else and reported with the name of its author;
a table is a grid of data in which each value is related to some parameters and to the other values;
a box is a piece of content that is not part of the main text flow, but contains something useful for understanding or analysing the topic;
a caption is a description of a figure.
The overall design of a document is a weave of all these fragments in a specific order.
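As a rough illustration of such labels, the sketch below marks up some of the structural parts listed above. The element names are borrowed from DocBook, the vocabulary this document turns to later; the titles, texts and the file name figure.png are placeholders invented for the example.

```xml
<chapter>
  <!-- chapter title: the topic of the whole division -->
  <title>Placeholder chapter title</title>
  <para>Ordinary text in the main flow.</para>

  <!-- quotation, reported with the name of its author -->
  <blockquote>
    <attribution>Placeholder author</attribution>
    <para>Placeholder quoted text.</para>
  </blockquote>

  <!-- box: content outside the main text flow -->
  <sidebar>
    <title>Placeholder box title</title>
    <para>Something useful for understanding the topic.</para>
  </sidebar>

  <!-- figure with its caption -->
  <mediaobject>
    <imageobject>
      <imagedata fileref="figure.png"/>
    </imageobject>
    <caption>
      <para>Placeholder description of the figure.</para>
    </caption>
  </mediaobject>
</chapter>
```

Nothing in this fragment says how the title, the quotation or the box should look; it only states what each piece is.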
Well, once we have stated that using XML means structuring a document, we need to become somewhat familiar with the XML language itself.
This is a piece of XML:
<fruit>
  <item>banana</item>
  <item>orange</item>
  <item>apple</item>
</fruit>
You can see that the information is self-descriptive. And this is a general feature of XML.
But what we need to highlight here is the syntax we have used.
First of all, tags must be opened and closed. The tag fruit is opened, then the item elements are added, then it is closed. We can say that it contains the item elements.
Each item tag has to be closed as well. And each item contains some text.
Technically speaking, the tag fruit is the parent of the item tags, and the item tags are children of the tag fruit.
Another important term in the XML language is element.
An element is a tag plus what it contains. So, the element fruit is made up of its opening tag, its children and its closing tag.
There are three item elements. Each is made up of the opening tag, the text it contains and the closing tag.
By the way, the text inside an element is also a child of that element.
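The same fruit example can be annotated to show where each of these terms applies. This is only a sketch: the XML comments are added for explanation and are not part of the data.

```xml
<!-- Everything from <fruit> to </fruit> is the fruit element. -->
<fruit>                 <!-- opening tag of fruit, the parent -->
  <item>banana</item>   <!-- an item element, child of fruit; the text "banana" is a child of item -->
  <item>orange</item>   <!-- opening tag, text, closing tag -->
  <item>apple</item>
</fruit>                <!-- closing tag of fruit -->
```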
So you have learned the most important concepts of the XML language: tags and elements, parents and children.
But what about publishing content? What kind of XML tags do we have to use?
Fortunately, we do not have to re-invent the wheel because almost every possible structural element has already been precisely pre-defined by two very effective standards: DITA and DocBook.
This document is about how to use the DocBook standard to structure our documents and about how to transform these DocBook files into specific formats and layouts.
The enormous advantage of structuring a document with DocBook is that a DocBook document can be easily and almost automatically converted into almost any rendered format: PDF, ebook, HTML pages, or even an InDesign layout.
We have already said this: the magic is made possible by XML technology.
A DocBook document is, in fact, an XML document where each piece of content has a label attached which describes what this content is about and what its place is in the overall hierarchy.
But, before hearing your complaints, I must say that in order to write a good DocBook document you don't need to become an XML geek.
It is another fortunate circumstance that a lot of DocBook editors assist us in preparing well-formed DocBook documents, almost as easily as using a word processor.
The only significant, extremely significant, difference between a word processor and an XML editor is that the word processor isn't concerned with the structure of the text. You can do whatever you want and nobody complains. With DocBook, instead, you MUST follow the XML rules (well-formedness) and you MUST follow the DocBook rules (validity).
These are the final two XML concepts you need to keep in mind. First, when you write an XML document you need to make it well-formed, which means that it has to comply with the XML syntax. In any case, the editor will do this job for you: you don't need to write a single XML tag. Second, you need to make the document valid as a DocBook document, which means that you must use only the DocBook tags, in the places where DocBook allows them. When you make a mistake, the editor will complain until you fix it. Fortunately, the editor will suggest how to do that.
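The difference between the two requirements can be sketched with three small fragments. The two faulty ones are kept inside XML comments so that the block as a whole stays well-formed. The tag name paragraph is a deliberately invented mistake, and the fragments leave out the DOCTYPE or namespace declaration that a real DocBook file would carry.

```xml
<!-- Not well-formed: the para tag is opened but never closed.
<chapter>
  <title>Lists</title>
  <para>Some text
</chapter>
-->

<!-- Well-formed, but not valid DocBook: paragraph is not a DocBook tag.
<chapter>
  <title>Lists</title>
  <paragraph>Some text</paragraph>
</chapter>
-->

<!-- Well-formed and built only from DocBook tags, in places where DocBook allows them. -->
<chapter>
  <title>Lists</title>
  <para>Some text</para>
</chapter>
```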
Now we need to become familiar with the most important DocBook tags because they are the core of the DocBook philosophy. Each tag has a specific structural meaning and there are a lot of them. Fortunately you don't need to use all the tags: you will probably use just 20% of them in your entire copy-editing career. I'll describe the most important tags in the next chapter.
Although DocBook editors can assist us in working with files, it is important to grasp the idea of the main DocBook tags.
In this chapter we'll describe the most frequently used tags in the DocBook XML file that will be converted into InDesign styles.
The tag <book>
is the root tag, i.e. it includes everything.
The tag doesn't need to be explicitly inserted because the editor does it for you.
The tag <part>
is a division of the book. You can insert this tag using your DocBook editor of choice. The part tag is optional.
In order to specify the number of the part, you need to insert the root="number-of-part" attribute.
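A minimal sketch of how these two tags nest is given below. The titles and the single chapter are placeholders added only to make the fragment complete, and the attribute for the part number is left out of the sketch.

```xml
<book>
  <title>Placeholder book title</title>
  <!-- part is optional: chapters could also sit directly inside book -->
  <part>
    <title>Placeholder part title</title>
    <chapter>
      <title>Placeholder chapter title</title>
      <para>Chapter text.</para>
    </chapter>
  </part>
</book>
```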
Sections are sub-divisions of a chapter. There are two ways of inserting sections. The first one is using the predefined section tags:
<sect1>, <sect2>, <sect3>, <sect4>
These represent four levels of sections. Remember that a lower-level sect must be nested inside a higher-level one. For instance, you cannot use a sect2 without a sect1 that contains it.
The other way is using the generic tag:
<section>
Sections of this kind have no predefined level and can be nested to any depth.
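The two styles can be compared side by side in the sketch below. The first chapter uses the numbered tags, the second uses the generic tag; the two are kept in separate chapters because DocBook does not let them sit next to each other at the same level. All titles and texts are placeholders.

```xml
<book>
  <chapter>
    <title>Numbered sections</title>
    <sect1>
      <title>First level</title>
      <para>Text of the first-level section.</para>
      <!-- a sect2 is only allowed inside a sect1 -->
      <sect2>
        <title>Second level</title>
        <para>Text of the second-level section.</para>
      </sect2>
    </sect1>
  </chapter>

  <chapter>
    <title>Generic sections</title>
    <section>
      <title>Outer section</title>
      <para>Text of the outer section.</para>
      <!-- the level comes from the nesting, not from the tag name -->
      <section>
        <title>Inner section</title>
        <para>Text of the inner section.</para>
      </section>
    </section>
  </chapter>
</book>
```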
Blocks are discrete pieces of content like paragraphs, sidebars, lists, images, tables, examples, etc.
Here is a list of some blocks:
paragraph <para>
It is the most elementary block, made of text content preceded and followed by a line break. Most DocBook editors don't require you to insert the tag explicitly: it is inserted automatically when you press Enter.
sidebar <sidebar>
A block of content isolated from the main text flow. Don't be confused by the term: it doesn't mean that the block must be displayed at the side. DocBook elements have nothing to do with the layout or the way they are presented. The term sidebar just means that the content included in this block is not part of the main text flow.
Here is an example of the sidebar:
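In the XML source, a sidebar might look like the following sketch. The title and the text are placeholders written for this illustration, not content taken from the original example.

```xml
<para>Text in the main flow, before the sidebar.</para>
<sidebar>
  <title>Placeholder sidebar title</title>
  <para>Content that stands apart from the main text flow.</para>
</sidebar>
<para>The main flow resumes here.</para>
```

Whether the sidebar ends up in a margin, in a box or somewhere else is decided later, by the presentation.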
lists
There are three main kinds of lists: <itemizedlist>, <orderedlist>, <variablelist>
Variable lists pair short terms with their definitions. Ordered lists give each item a sequential label, which can be a number or a letter. Itemized lists are bulleted lists. What is really interesting about lists is their flexibility: all three kinds of list may include items and their definitions. The only difference is that variable lists are mainly used for short terms and short definitions, whereas the items of itemized lists and ordered lists may contain almost any kind of block. The item you are reading here is an example of an item in an itemized list.
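The three kinds of list can be sketched as follows. The item texts are placeholders, and the numeration attribute on the ordered list is one possible way of asking for letters instead of numbers.

```xml
<!-- bulleted list -->
<itemizedlist>
  <listitem><para>banana</para></listitem>
  <listitem><para>orange</para></listitem>
</itemizedlist>

<!-- sequential labels; loweralpha requests a, b, c -->
<orderedlist numeration="loweralpha">
  <listitem><para>Write the content.</para></listitem>
  <listitem><para>Transform it into the target format.</para></listitem>
</orderedlist>

<!-- short terms with their definitions -->
<variablelist>
  <varlistentry>
    <term>element</term>
    <listitem><para>A tag plus what it contains.</para></listitem>
  </varlistentry>
</variablelist>
```

In all three cases the content of an item goes inside listitem, which is why an item can hold paragraphs, nested lists or other blocks.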
table <table>
A table with columns and rows.
Tables are inserted automatically by almost all WYSIWYG XML editors.
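For readers curious about what the editor writes behind the scenes, here is a minimal sketch of a two-column table. The title, the headings and the cell texts are placeholders.

```xml
<table>
  <title>Placeholder table title</title>
  <!-- cols states how many columns the table has -->
  <tgroup cols="2">
    <thead>
      <row>
        <entry>First heading</entry>
        <entry>Second heading</entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry>First cell</entry>
        <entry>Second cell</entry>
      </row>
    </tbody>
  </tgroup>
</table>
```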
codes and symbols <programlisting>