erik stainsby eriksta at vpl.vancouver.bc.ca
Tue Jul 17 15:20:48 EDT 2001

On Tue, 17 Jul 2001, Roy Tennant wrote:

> The short answer to the question in your last sentence is "Yes". 
> There are no doubt any number of ways to do this, but the technology 
> I have a little (by no means a lot) familiarity with is Cocoon (see 
> http://xml.apache.org/cocoon/).
> The thing is, it's the conversion from the "native" format (MSWord, 
> Pagemaker, etc.) which will kill you. Allow me to repeat myself. 
> Converting documents from most word processing or desktop publishing 
> programs to decent XML is NOT quick and easy. 

The key to these conversions is to treat the process as another exercise
in XML/XSLT.  Typically if you are dealing with a library of documents you
have a finite repertoire of formatting conventions in each "class" of
document.  You need to develop a processing stream for each class which
does justice to the unique formatting values of such a class.  With this
precept in mind, the process goes something like this:

1. insert conversion markup tags into each document of a class, to
capture the display values you wish to preserve

2. export the content to plain text

3. massage the resulting file (insert the DTD declaration, stylesheet
processing instructions, etc.)

4. you may wish to reprocess the resulting files into a tighter, or more
elaborate, XML. That reprocessing can now be treated as a batch process,
because you have classes of XML documents
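
To make step 3 concrete, here is a rough sketch in Python of what the
"massage" might look like for one class of documents. It is only an
illustration under my own assumptions: the root element "report", the
files "report.dtd" and "report.xsl", and the function names are all
hypothetical, and it assumes the plain-text export from step 2 already
carries the conversion tags inserted in step 1.

```python
import re
import xml.etree.ElementTree as ET

# Ampersands that are not already part of an entity or character reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def wrap_export(tagged_text, root="report", dtd="report.dtd", xsl="report.xsl"):
    """Wrap tagged plain text in a prolog, stylesheet PI, DOCTYPE and root.

    The default names are hypothetical placeholders for one document class.
    """
    # Plain-text exports usually contain bare ampersands, which XML rejects.
    body = BARE_AMP.sub("&amp;", tagged_text.strip())
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<?xml-stylesheet type="text/xsl" href="{xsl}"?>\n'
        f'<!DOCTYPE {root} SYSTEM "{dtd}">\n'
        f"<{root}>\n{body}\n</{root}>\n"
    )


def is_well_formed(xml_text):
    """Return True if the text parses as XML (well-formedness only, no DTD validation)."""
    try:
        ET.fromstring(xml_text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False
```

Running wrap_export over every exported file in a class, and setting
aside whatever fails is_well_formed for hand repair, gives you the
uniform input that step 4 depends on.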

In these days of mass conversions I gaze longingly back at the good old
days of WordPerfect 5.1, which was perfectly suited to this type of batch
tag replacement. <sigh/> Progress is so often misguided.

Cheers, Erik 

