:: Digital Libraries Columns


Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant

Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.

XML: The Digital Library Hammer


   Abraham Maslow once said, 'When the only tool you own is a hammer,
   every problem begins to resemble a nail.' Once you understand XML and
   the opportunities it offers for creating and managing digital library
   services and collections, you will begin seeing nails everywhere. It is
   not the only tool you have, but it is by far the most useful.

   XML (Extensible Markup Language) is born of a marriage of SGML
   (Standard Generalized Markup Language) and the web. HTML can't do much
   more than describe the look of a web page, whereas SGML is too
   complicated and unwieldy for most applications. XML achieves much of
   the power of SGML without the complexity and adds web capabilities
   beyond HTML.

   XML provides a method by which you can mark (or 'tag') the structure of
   an object -- a document, a database entry, or just about anything made
   up of definable components. XML tags define the beginning of a
   structure, such as a section title, and the end. Whether you know it or
   not, you use a similar kind of technology frequently. Whatever word
   processing software you use tags the text you write with style
   information such as the font used.

   How it works
   XML differs from word processing software in several essential ways.
   First, it is open and transparent -- the specifications can be freely
   read and adapted by anyone, and the markup (the tags themselves) can be
   seen as well as the text. Also, it is capable of describing the
   structure of a virtually infinite variety of objects, not just a text
   document. Finally, it is being built for the web -- with a robust set
   of linking, transformation, and rendering capabilities.

   To understand the XML applications described below, it will help to
   know the various required pieces and how they interact. First, there is
   the information that has been tagged either by hand or by using an
   editor similar to an HTML editor. Unlike an HTML document, an XML
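   document must be tagged according to three basic rules: 1) tag names
   are case-sensitive, so a beginning tag and its ending tag must match
   exactly (lowercase is a common convention rather than a requirement),
   2) all beginning tags must have an ending tag,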
   and 3) no tag can span another tag (that is, all tags must properly
   nest).
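
   Rules 2 and 3 can be checked mechanically, because an XML parser
   rejects any document that breaks them. The sketch below is only an
   illustration using Python's standard xml.etree.ElementTree module;
   the function name is_well_formed and the sample strings are invented
   for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the string parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample markup, one string per case.
good = "<book><title>Tobacco War</title></book>"
unclosed = "<book><title>Tobacco War</book>"             # <title> never ends
overlapping = "<book><title>Tobacco War</book></title>"  # tags span each other

for label, sample in [("good", good), ("unclosed", unclosed),
                      ("overlapping", overlapping)]:
    print(label, is_well_formed(sample))
```

   Only the first sample parses; the other two raise a parse error,
   which the function reports as False.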

   Second, you must have an XML style sheet written in Extensible
   Stylesheet Language Transformations, or XSLT. While a style sheet usually
   specifies how the information that references it (typically an HTML
   file) should be displayed within a browser, XSLT offers more powerful,
   transformative features. For example, you can choose to display text or
   not. In some ways, XSLT resembles a simple programming language that
   has been optimized for transforming XML files into other forms.
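
   To give a rough feel for the kind of work an XSLT style sheet does,
   the sketch below uses Python's standard xml.etree.ElementTree module
   as a stand-in. It is not XSLT itself, and the element names (novel,
   title, author, note) and the function to_html are assumptions made
   up for this illustration. The function turns a small XML record into
   an HTML fragment and can choose to leave the note out of the display.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the element names are assumptions for illustration.
SOURCE = """<novel>
  <title>I, Claudius</title>
  <author>Robert Graves</author>
  <note>Shelved in storage</note>
</novel>"""

def to_html(xml_text: str, show_note: bool = True) -> str:
    """Build an HTML fragment from the record, optionally hiding the note."""
    record = ET.fromstring(xml_text)
    div = ET.Element("div")
    ET.SubElement(div, "h2").text = record.findtext("title")
    ET.SubElement(div, "p").text = "by " + record.findtext("author")
    if show_note:
        ET.SubElement(div, "p").text = record.findtext("note")
    return ET.tostring(div, encoding="unicode")

print(to_html(SOURCE, show_note=False))
```

   A real XSLT style sheet would express the same decisions as template
   rules rather than as Python statements, but the idea is the same: the
   XML source stays untouched and the transformation decides what to
   show.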

   Third, there should be an HTML style sheet (Cascading Style Sheets, or
   CSS) that tells the web browser how to display the page. All three
   elements reside on the server, usually (but not necessarily) close to
   each other.

   XML and software
   If you use software such as the Cocoon publishing framework, when a
   user requests an XML document from your web server, the request is
   passed to special software. The software then applies the XML style
   sheet transformations to produce the HTML version that is sent to the
   client along with the HTML style sheet.

   If you don't use special software on the server for these operations,
   the client software (typically a web browser) must attempt to process
   the XML file. The latest versions of Microsoft Internet Explorer will
   attempt to process the file, but you're unlikely to be pleased with the
   result. Don't even try with Netscape.

   Few people know this, but any library with an integrated library system
   from Innovative Interfaces (with Update D) can view XML versions of
   catalog records. Kyle Banerjee of Oregon State University has used
   this capability to provide information essential to relocating 50,000
   items to a storage facility. Banerjee also uses it to solve problems
   that many other libraries face, as with his program ILL ASAP
   (Interlibrary Loan Automatic Search and Print). Banerjee says that
   'XML and XSLT are the most significant developments in information
   management since relational databases and SQL.'

   XML and structured information
   Bibliographies are commonplace in libraries, whether as lists of books
   by a particular author or pathfinders by subject. What are
   bibliographic citations but a structured set of textual elements? XML
   is made for this.

   Imagine a list of historical novels. Marked up in XML (which is almost
   as easy as HTML markup), this same list could be used to create
   several different web pages: one that lists items by author, another by
   title, and another by time period. All you'd need is an XSLT style
   sheet that produces a different view of the document, depending on the
   user request. You need only maintain one XML document. This is not yet
   a trivial task, but employing XML gets steadily easier, as tools
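
   The one-document, many-views idea can be sketched in a few lines.
   The example below uses Python's standard library rather than XSLT,
   and the markup (a novels list with title and author elements and a
   period attribute) and the function view are assumptions invented for
   this illustration.

```python
import xml.etree.ElementTree as ET

# Sample list; the element and attribute names are assumptions.
NOVELS = """<novels>
  <novel period="Roman Empire">
    <title>I, Claudius</title>
    <author>Graves, Robert</author>
  </novel>
  <novel period="Medieval Italy">
    <title>The Name of the Rose</title>
    <author>Eco, Umberto</author>
  </novel>
  <novel period="Napoleonic Wars">
    <title>War and Peace</title>
    <author>Tolstoy, Leo</author>
  </novel>
</novels>"""

def view(xml_text: str, order_by: str) -> list[str]:
    """Return one line per novel, sorted by 'author', 'title', or 'period'."""
    def key(novel):
        if order_by == "period":
            return novel.get("period")      # period is stored as an attribute
        return novel.findtext(order_by)     # author and title are elements
    novels = sorted(ET.fromstring(xml_text).iter("novel"), key=key)
    return [
        f"{n.findtext('author')}: {n.findtext('title')} ({n.get('period')})"
        for n in novels
    ]

for line in view(NOVELS, "author"):
    print(line)
```

   Calling view with "title" or "period" produces the other two listings
   from the same source, so only the single XML document needs to be
   maintained.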

   XML and digital publishing
   XML seems made to order for publishing. A book, for instance, is a
   highly structured object, with basic bibliographic information, front
   matter, chapters, headings, paragraphs, and back matter. A book marked
   up in XML can then be displayed in various ways -- chapter by chapter,
   as a table of contents (by extracting section headings from the file),
   and more. XML-encoded books will soon be viewed both on the web and on
   personal devices such as e-book readers and personal digital
   assistants. The Open eBook Forum has promulgated a standard method of
   encoding e-books in XML specifically to provide an easy method for
   interchanging books across reading devices.
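
   As a small sketch of how one marked-up book can yield several
   displays, the example below builds a table of contents by extracting
   headings and also pulls out a single chapter. The markup (book,
   chapter, heading, para) and both function names are hypothetical and
   do not follow the Open eBook standard or any other schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical book markup; element names are assumptions, not a standard.
BOOK = """<book>
  <title>Sample Book</title>
  <chapter><heading>Beginnings</heading><para>First text.</para></chapter>
  <chapter><heading>Middles</heading><para>More text.</para></chapter>
  <chapter><heading>Endings</heading><para>Last text.</para></chapter>
</book>"""

def table_of_contents(xml_text: str) -> list[str]:
    """Collect chapter headings in document order, numbered from 1."""
    book = ET.fromstring(xml_text)
    return [
        f"{number}. {chapter.findtext('heading')}"
        for number, chapter in enumerate(book.findall("chapter"), start=1)
    ]

def chapter_text(xml_text: str, number: int) -> str:
    """Return the paragraphs of one chapter, for chapter-by-chapter display."""
    chapter = ET.fromstring(xml_text).findall("chapter")[number - 1]
    return "\n".join(p.text for p in chapter.findall("para"))

print(table_of_contents(BOOK))
print(chapter_text(BOOK, 2))
```

   Both displays read the same file; nothing about the book has to be
   stored twice.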

   To see this in action on the web, see Tobacco War: Inside the
   California Battles, which exists only as a single XML file but is
   delivered in chunks of HTML to the user upon request. This one example
   will soon be joined by several dozen books on topics in international
   and area studies, all provided to the client in HTML from XML source

   An essential precondition for interoperability is the capacity to share
   information effectively with other systems. XML supports that
   capability by providing information in a structured way. For example,
   the Open Archives Initiative is using XML as the carrier syntax for
   bibliographic information about e-prints ('Open Archives: A Key
   Convergence,' LJ 2/15/00, p. 122ff.).

   XML toolbox
   To learn more, the site offers a good start, while Robin
   Cover's XML Cover Pages is nearly exhaustive. Also see Norman
   Desmarais's The ABCs of XML: The Librarian's Guide to the eXtensible
   Markup Language. To discuss XML and its use in libraries, join the
   newly created XML4Lib discussion.

   If you are running an Apache web server and want to jump in with both
   feet, download and install Cocoon. This will provide a platform to
   transform your XML documents to HTML on the fly. For information on
   using XSLT, an essential part of the process, see Chapter 14 of the XML
   Bible as well as Michael Kay's excellent book XSLT: Programmer's
   Reference.

   MARC then, XML now
   MARC was the data encoding standard upon which librarians built modern
   librarianship. It enabled us to create automated library systems,
   shared cataloging, resource networks, and many things that we today
   take for granted. Now libraries are becoming publishers. We must
   provide access to a wider array of information resources, and resource
   sharing is more essential than before. XML likely will provide the
   carrier syntax for all of this, thereby becoming the MARC-equivalent of
   the 21st century.

   LINK LIST

   The ABCs of XML
   Apache XML Project
   Cocoon Publishing Framework
   Open Archives Initiative
   ILL Automatic Search and Print
   Open eBook Forum
   Tobacco War
   XML Bible, Chapter 14
   XML Cover Pages
   XSLT: Programmer's Reference