XML Basics

XML—Is It Really So Xtraordinary?

More and more publishers are planning to code their documents in XML. Why has XML become so popular? And how does it differ from SGML and HTML?

Markup languages identify information, either by its appearance or by its content. At the most basic level, a markup language gives publishers a standardized way to identify formatting in their documents. HTML (Hypertext Markup Language), for example, was created to display information on the Web, and its fixed set of tags is mostly limited to that presentational role.

XML (Extensible Markup Language) and SGML (Standard Generalized Markup Language) are far more versatile when it comes to categorizing information. SGML is the parent standard from which both HTML and XML derive, but it is verbose and complex. XML, a simplified subset of SGML, is often described as providing 90% of SGML’s functionality with 10% of its complexity. XML also offers significant advantages over HTML.

For instance, HTML tags mostly describe how the document should look: to italicize a word, you write <i>text</i>. XML tags, by contrast, describe the data itself.

XML looks like this: <dog>Gomer</dog>

As you can see, XML tells us about the information in the document. We now know that the word “Gomer” refers to a dog. You can go further and describe the same word in more than one way, with a different tag such as <swimmer> or with attributes, as in <dog breed="labrador" color="black">. (Attribute values must be enclosed in straight quotation marks.) And if you want a word to appear visually distinctive (italicized, for instance), you can use a Cascading Style Sheets (CSS) stylesheet, which defines how the tagged text will be displayed.
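
To make the idea concrete, here is a minimal sketch in Python, using the standard library's xml.etree.ElementTree module, of how a program could read the tag, the attributes, and the text of the dog element described above. The function name describe is an illustrative assumption, not part of any XML tool.

```python
import xml.etree.ElementTree as ET

# The element from the text, written with straight quotes around attribute values.
SNIPPET = '<dog breed="labrador" color="black">Gomer</dog>'


def describe(xml_text):
    """Return the tag name, the attributes, and the text of a single XML element."""
    element = ET.fromstring(xml_text)
    return element.tag, dict(element.attrib), element.text


if __name__ == "__main__":
    tag, attributes, text = describe(SNIPPET)
    print(tag, attributes, text)
```

The tag says what the word is, the attributes add detail, and none of it says anything about how the word should look on the page.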

The structure of an XML document can be defined by a document type definition (DTD), which declares the elements, attributes, entities, and notations the document may contain. With XML you can create a DTD that is tailored to your own documents.
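
As a rough illustration of what a DTD declares, the sketch below embeds a small, made-up DTD inside a document and uses Python's standard xml.parsers.expat module to list the element and attribute declarations it contains. The kennel and dog declarations and the function name declared_structure are assumptions invented for this example. Note that expat reads the declarations but does not validate the document against them; validation would need a separate validating parser.

```python
import xml.parsers.expat

# A hypothetical document with an internal DTD subset (invented for illustration).
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE kennel [
  <!ELEMENT kennel (dog+)>
  <!ELEMENT dog (#PCDATA)>
  <!ATTLIST dog breed CDATA #REQUIRED
                color CDATA #IMPLIED>
]>
<kennel>
  <dog breed="labrador" color="black">Gomer</dog>
</kennel>
"""


def declared_structure(xml_text):
    """Collect the element names and attribute names declared in the DTD."""
    elements = []
    attributes = {}

    def on_element(name, model):
        # Called once for each <!ELEMENT ...> declaration.
        elements.append(name)

    def on_attlist(element_name, attribute_name, type_, default, required):
        # Called once for each attribute declared in an <!ATTLIST ...>.
        attributes.setdefault(element_name, []).append(attribute_name)

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return elements, attributes


if __name__ == "__main__":
    print(declared_structure(DOCUMENT))
```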

So, how does that relate to book and journal publishing? Instead of searching through entire pages of text, a search engine can go straight to the relevant tag. You can pick out all the dog types in the entire document. Say you are looking for a dog that likes to fetch. You would search for <retriever> and be led directly to a list of all dogs with a particular affinity for chasing sticks. Here are some perks of XML:

  • You can code a document in XML and it can be read on different, or even incompatible, systems. In fact, XML can be read on almost any platform.
  • In XML, markup can be customized to fit the content.
  • XML is not limited to Internet use. It can be used to organize information and is ideal for exchanging data between different systems.
  • XML’s precise tags make it easier to index documents.
  • It allows you to create cross-references.
  • The language has powerful hypertext linking capabilities. You can link to a URL, a span of text, portions within your document, or a separate document.
  • You can display math in XML if you use a DTD that supports math. For instance, MathML is composed of XML tags that can mark up an equation in terms of its appearance, its meaning, or both.
  • Documents can be XML-coded as the book is typeset. For instance, we can use avenue.quark to extract each tagged element into an XML element type. And within LaTeX, we can convert LaTeX commands into XML code. LaTeX converts to XML even more efficiently than Quark because LaTeX is itself a markup language, so conversion is largely a matter of moving from one syntax to another.
  • And of course, we can always use your company’s proprietary markup system or process to convert to XML.
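
The tag-based search and indexing described above can be sketched in a few lines of Python with the standard library's xml.etree.ElementTree module. The kennel document, the names Pip and Sunny, and the function name names_tagged are all made up for this example.

```python
import xml.etree.ElementTree as ET

# A hypothetical document in which each dog is tagged by type.
CATALOG = """<kennel>
  <retriever color="black">Gomer</retriever>
  <terrier color="white">Pip</terrier>
  <retriever color="yellow">Sunny</retriever>
</kennel>"""


def names_tagged(xml_text, tag):
    """Return the text of every element with the given tag, in document order."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(tag)]


if __name__ == "__main__":
    print(names_tagged(CATALOG, "retriever"))
```

Because the search keys on the tag rather than on the words in the text, it finds every retriever even though the word “retriever” never appears in the content itself.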

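The point about LaTeX, that conversion is a matter of moving from one syntax to another, can be illustrated with a deliberately small Python sketch. It handles only simple, non-nested commands of the form command{argument}, and the mapping from LaTeX commands to XML element names is a hypothetical one chosen for this example. It is not a description of any production conversion process.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical mapping from LaTeX command names to XML element names.
COMMAND_TO_ELEMENT = {"emph": "emphasis", "textbf": "bold", "section": "title"}

# Matches a backslash, a command name, and a brace-delimited argument without nested braces.
_COMMAND = re.compile(r"\\([A-Za-z]+)\{([^{}]*)\}")


def latex_to_xml(text):
    """Rewrite known, non-nested LaTeX commands as XML elements."""
    parts = []
    position = 0
    for match in _COMMAND.finditer(text):
        # Plain text between commands is escaped so that & and < stay legal in XML.
        parts.append(escape(text[position:match.start()]))
        element = COMMAND_TO_ELEMENT.get(match.group(1))
        if element is None:
            # Unknown commands are left in place as ordinary text.
            parts.append(escape(match.group(0)))
        else:
            parts.append(f"<{element}>{escape(match.group(2))}</{element}>")
        position = match.end()
    parts.append(escape(text[position:]))
    return "".join(parts)


if __name__ == "__main__":
    print(latex_to_xml(r"Gomer is a \emph{very} good dog & friend."))
```

A real conversion would also need to handle nested commands, environments, and math, which is where a DTD and a proper parser come in.
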
If you have questions or comments, please send a message to “Ask Gomer” at gomer@iccorp.com.


Gomer is XML’s biggest fan.