XML – DITA, DocBook, S1000D or Shipdex – Are you confused?
More and more technical writers are realizing the value of XML as a format for creating documents. One of the first questions facing a new XML user is which information structure standard to use. In our view the standard must fit the company’s business requirements, and our experience is that a very complex data model rarely pays for itself. This short article tries to sort out the differences and also introduces our own “standard”, Simonsoft Techdoc.
We were once asked: what XML standards are there for our industry? One simple Google search later we had found some 60 different XML standards, all claiming to be just that: a standard. That is enough to make anyone catch their breath, so let’s start with the principles.
An XML document separates content, style and structure.
To make every author follow the same document principles, the structure is controlled by a “DTD” (Document Type Definition) or a “Schema”. This is a structure definition that controls which content is allowed where, how content can be reused, which elements are allowed in which places, and so on. In short, two authors who use the same DTD can always merge their documents into one without any style issues such as page breaks.
So far so good, but once you look at industry-specific documentation, the content is rather different depending on whether you are documenting a Linux server or a submarine. Different industries therefore began to form standard DTDs that fit their needs. The three most common are DITA, DocBook and S1000D (Shipdex).
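To make the idea of a controlling structure concrete, here is a minimal sketch in Python. It is not a real DTD validator: the content model `ALLOWED_CHILDREN`, the element names and the helper functions `find_violations` and `merge_books` are all assumptions made up for illustration. It only shows that two documents written against the same rules can be merged and still follow those rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: which child elements each element may contain.
# A real DTD or Schema expresses far more (order, cardinality, attributes).
ALLOWED_CHILDREN = {
    "book": {"title", "chapter"},
    "chapter": {"title", "para"},
    "title": set(),
    "para": set(),
}


def find_violations(xml_text):
    """Return a list of messages for elements that break the content model."""
    root = ET.fromstring(xml_text)
    problems = []
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            problems.append(f"unknown element <{parent.tag}>")
            continue
        for child in parent:
            if child.tag not in allowed:
                problems.append(
                    f"<{child.tag}> is not allowed inside <{parent.tag}>"
                )
    return problems


def merge_books(first_xml, second_xml):
    """Append the chapters of the second book to the first one."""
    first = ET.fromstring(first_xml)
    second = ET.fromstring(second_xml)
    for chapter in second.findall("chapter"):
        first.append(chapter)
    return ET.tostring(first, encoding="unicode")


# Two authors writing against the same (hypothetical) structure.
author_a = (
    "<book><title>Manual</title>"
    "<chapter><title>Install</title><para>Step 1</para></chapter></book>"
)
author_b = (
    "<book><title>Manual</title>"
    "<chapter><title>Operate</title><para>Press start</para></chapter></book>"
)
merged = merge_books(author_a, author_b)
```

Calling `find_violations(merged)` returns an empty list, while a document that puts a `para` directly inside `book` is reported as a violation. Note that neither document carries any style information, so there is nothing about layout to reconcile when merging.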
Book-based DTDs - DocBook
The most common publication of all is the classic book. It has a front page, an index, chapters and back matter. Some more technical publications might also include subchapters or sections.
This is a format that everyone is comfortable with. The abstract structure of front page, title, index and chapters can act as a placeholder for any kind of content. The book as a DTD works!
There are a few standardized DTDs built around the book, and by far the most common is DocBook. It has some 20 years of development behind it and is included in any standard XML editor you purchase. Most existing DTDs are book-based, with these advantages:
- Easy to migrate MS Word, InDesign or other documents into
- Comprehensive and easy to learn
- Easy to build style sheets for
Topic-based DTDs - DITA
In rather sharp contrast to the book-based DTD, many in the XML industry talk about topics. By far the best-known standard is DITA (Darwin Information Typing Architecture). DITA was invented by IBM, mainly for software documentation.
To explain the topic-based structure, one can draw an analogy with a company’s website. Each page represents a topic, and the complete set of pages is the document. For the web this sounds like a brilliant idea, but what if I actually need to produce a book? DITA solves this with a structural placeholder called the “book map”. In this map, all topics are linked in the order wanted for the actual publication.
As you can imagine, DITA makes it rather hard to migrate old content from book form, but it fits extremely well for software documentation or any content aimed at web publishing. Writing rules must also be enforced much more strictly, since each topic must stand on its own. Remember the old game where the class writes a story on one sheet of paper: each person writes a paragraph and then folds the paper so that the next writer can only see the last sentence. If no rules are given beforehand, the story can turn out rather funny in the end.
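The way a map pulls standalone topics into one publication can be sketched in a few lines of Python. This is a simplified illustration, not a DITA processor: it uses the plain `map` and `topicref` elements rather than the richer book map, and the topic store `TOPICS`, the output element `publication` and the function `assemble` are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic store: file name -> topic content. In a real project
# these would be separate files on disk or in a content management system.
TOPICS = {
    "intro.dita": "<topic id='intro'><title>Introduction</title></topic>",
    "install.dita": "<topic id='install'><title>Installation</title></topic>",
    "faq.dita": "<topic id='faq'><title>FAQ</title></topic>",
}

# Simplified map: it holds no content, only references in publication order.
BOOK_MAP = """
<map>
  <title>User Guide</title>
  <topicref href="intro.dita"/>
  <topicref href="install.dita"/>
</map>
"""


def assemble(map_xml, topics):
    """Build one publication by resolving each reference in map order."""
    book_map = ET.fromstring(map_xml.strip())
    publication = ET.Element("publication")
    ET.SubElement(publication, "title").text = book_map.findtext("title")
    for ref in book_map.findall("topicref"):
        publication.append(ET.fromstring(topics[ref.get("href")]))
    return publication


book = assemble(BOOK_MAP, TOPICS)
titles = [topic.findtext("title") for topic in book.findall("topic")]
```

Here `titles` ends up as `['Introduction', 'Installation']`. The FAQ topic exists in the store but is left out, because this particular map does not reference it; a second map could reuse the same topics in a different order for a different publication.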
Advantages of DITA:
- Extremely modular, which allows for more reuse
- Good for software documentation and web publishing
- Topics can be translated individually, since they have no context dependency
Disadvantages:
- Hard to learn
- Costly migration
- Style sheets are more complex to create
Module-based DTDs – S1000D, ATA and Shipdex
There is one more major alternative to the book-based and topic-based DTDs, and that is to chop up the XML content into modules. In our opinion this is not that far from the topic-based approach, but it is really about linking the content to the product structure.
The three standards are very similar but are initiatives from specific industries: S1000D – military, ATA – commercial airlines, Shipdex – shipping.
The purpose of these standards is actually quite easy to understand. If an airplane manufacturer wants to assemble all the documentation for an airplane, probably more than a hundred suppliers are involved. If each supplier used its own format and way of writing, it would be almost impossible to merge the information into one documentation set. Suppliers working within these standards must even deliver the information as XML with no style applied, all to better serve the creation of a complete documentation set.
These standards are extremely demanding for a manufacturer to follow, so they are very rarely used outside the industries mentioned. For suppliers that only partly supply these industries, we would even recommend using another standard and creating an export function to the required standard.
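As a rough illustration of what such an export function might look like, the Python sketch below splits a book-based document into one standalone module per chapter. Everything in it is an assumption for the sake of the example: the source element names, the `export_modules` function, the product code and above all the `module` element, which is invented here and is not S1000D, ATA or Shipdex markup. A real export would have to map to the data module structure of the target standard.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical book-based source document.
BOOK = """
<book>
  <title>Pump Manual</title>
  <chapter id="install"><title>Installation</title><para>Bolt the pump down.</para></chapter>
  <chapter id="service"><title>Service</title><para>Change the seal yearly.</para></chapter>
</book>
"""


def export_modules(book_xml, product_code):
    """Split a book into standalone modules, one per chapter.

    The <module> element and its attributes are made up for this sketch;
    they are not S1000D, ATA or Shipdex markup.
    """
    book = ET.fromstring(book_xml.strip())
    modules = {}
    for chapter in book.findall("chapter"):
        code = f"{product_code}-{chapter.get('id')}"
        module = ET.Element("module", {"product": product_code, "code": code})
        # Copy the chapter's children so the source book is left untouched.
        for child in chapter:
            module.append(copy.deepcopy(child))
        modules[code] = ET.tostring(module, encoding="unicode")
    return modules


modules = export_modules(BOOK, "PUMP01")
```

The result is a dictionary with one XML string per chapter, keyed by a code that ties the module to the product. The authors keep writing in the simpler book structure, and the modular form is generated only when a customer requires it.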
A standard is always a standard, and as such it tries to serve everyone. As a result, they are all over-engineered and very hard to use out of the box. For instance, there are 5 different ways to tag an image in DITA, which means that reuse takes a big hit. Most projects therefore require a preliminary phase in which the standard is adapted to the company’s needs. This is expensive and time-consuming.
Simonsoft has developed “Techdoc”, our own version of a book-based DTD but with somewhat more modularization, better process descriptions and a predefined style. We have also enabled techniques to export the information to any modularized standard such as Shipdex or S1000D.
All of this is meant to enable a fast XML deployment for the medium-sized company.