[XML4LIB] Re: XML -> PDF considerations
eriksta at vpl.vancouver.bc.ca
Tue Jul 17 15:20:48 EDT 2001
On Tue, 17 Jul 2001, Roy Tennant wrote:
> The short answer to the question in your last sentence is "Yes".
> There are no doubt any number of ways to do this, but the technology
> I have a little (by no means a lot) familiarity with is Cocoon (see
> The thing is, it's the conversion from the "native" format (MSWord,
> Pagemaker, etc.) which will kill you. Allow me to repeat myself.
> Converting documents from most word processing or desktop publishing
> programs to decent XML is NOT quick and easy.
The key to these conversions is to treat the process as another exercise
in XML/XSLT. Typically, if you are dealing with a library of documents, you
have a finite repertoire of formatting conventions in each "class" of
document. You need to develop a processing stream for each class which
does justice to the unique formatting values of that class. With this
precept in mind, the process goes something like this:
1. insert conversion markup tags into the documents of a class, to
capture the display values you wish to preserve
2. export the content to plain text
3. massage the resulting file (insert the DTD declaration, stylesheet
processing instructions, etc.)
4. you may wish to reprocess the resulting files into a tighter or more
elaborate XML; this reprocessing can now be treated as a batch process,
because you have classes of XML documents
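
As a rough illustration of step 3, here is a minimal Python sketch of the
"massage" stage. It assumes the exported plain text already carries the
conversion tags inserted in step 1, and simply wraps it in an XML
declaration, a DOCTYPE declaration, a stylesheet processing instruction and
a root element, then checks that the result is well-formed. The function
name wrap_exported_text and the names report, report.dtd and report.xsl are
hypothetical placeholders, not anything from an actual project.

```python
import xml.etree.ElementTree as ET


def wrap_exported_text(body, root="report", dtd="report.dtd",
                       stylesheet="report.xsl"):
    """Wrap already-tagged exported text in an XML prolog and root element.

    The root, dtd and stylesheet defaults are hypothetical; substitute
    whatever suits the document class being converted.
    """
    prolog = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE {root} SYSTEM "{dtd}">\n'
        f'<?xml-stylesheet type="text/xsl" href="{stylesheet}"?>\n'
    )
    document = f"{prolog}<{root}>\n{body.strip()}\n</{root}>\n"
    # Well-formedness check only: raises ET.ParseError if the conversion
    # tags are unbalanced. This does not validate against the DTD.
    ET.fromstring(document.encode("utf-8"))
    return document


if __name__ == "__main__":
    exported = "<title>Annual Report</title>\n<para>Circulation rose.</para>"
    print(wrap_exported_text(exported))
```

Because each class of document gets its own root element, DTD and
stylesheet, the same function can be run over a whole directory of exported
files, which is what makes the later reprocessing in step 4 a batch job.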
In these days of mass conversions I gaze longingly back at the good old
days of WordPerfect 5.1, which was perfectly suited to this type of batch
tag replacement. <sigh/> Progress is so often misguided.
Erik Stainsby (604) 331-4083
SST - Web & Database Specialist
Systems Librarian's Webmonkey
Vancouver Public Library (604) 331-3600