[XML4LIB] Re: Announcement for James (Java MARC Events) beta 2
banerjek at ucs.orst.edu
Thu Dec 13 17:48:59 EST 2001
> 4.2.3 b concerning the directory of a record is also interesting.
If you're using slow computers to load data sequentially from tape, this
is useful. If not, I'm not sure why you'd want to do this. All the directory
does is record where each field's data starts and how long it is, and it
becomes worthless as soon as you change the length of a single field.
Besides, your XML parser already performs the function of identifying where
the data is.
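To illustrate why the directory goes stale, here is a minimal Python sketch. It assumes the usual ISO 2709 / MARC 21 layout of a 12-character directory entry (3-character tag, 4-digit field length, 5-digit starting position) with a one-byte field terminator counted in the length. The function name and the sample fields are made up for illustration and are not part of James or any MARC library.

```python
FIELD_TERMINATOR = "\x1e"  # ISO 2709 field terminator, counted in each field's length


def build_directory(fields):
    """Return 12-character directory entries for a list of (tag, data) pairs.

    Each entry is tag (3 chars) + field length (4 digits) + start offset (5 digits),
    with offsets measured from the start of the data area.
    """
    entries = []
    offset = 0
    for tag, data in fields:
        length = len((data + FIELD_TERMINATOR).encode("utf-8"))
        entries.append(f"{tag}{length:04d}{offset:05d}")
        offset += length
    return entries


# Hypothetical sample fields: control number, title, publisher.
fields = [
    ("001", "12345"),
    ("245", "10\x1faA title"),
    ("260", "  \x1fbPublisher"),
]
before = build_directory(fields)

# Lengthen a single field: its own length changes and every later offset shifts.
fields[1] = ("245", "10\x1faA longer title")
after = build_directory(fields)

for old, new in zip(before, after):
    print(old, "->", new)
```

Only the entry for the first field survives the edit unchanged; the edited field and everything after it need new entries, which is why the directory has to be regenerated on output rather than carried around in the XML.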
> > But I suspect that such a kind of error, and other much more complex
> > errors, like ``this tag is not allowed for cartographic material'' or
> > ``you must specify both width and height of a folio'' (I'm simply
> > guessing), which are specific to every national MARC format, could be
> > handled better by means of a (quite complex) combination of XML schemas
> > and XSL stylesheets. At the moment I don't know how, but I'm almost sure
> > it's possible.
Theoretically possible, but not practical. Even if we were to ignore the
enormous task of keeping the data up to date and the fact that this level of
validation would destroy performance, you'd still have the problem that
standards change over time. If you strictly validate on the current
standard, a lot of the legacy data won't validate. If you let through all
the old stuff, the validation doesn't accomplish anything. It is totally
unfeasible to fix all the old records.
> Have you ever looked at this URL http://www.xfront.com/isbn.html? An ISBN
> datatype Schema (using XSLT to check some additional constraints). What
> would a fully validating MARC Schema look like?
More complicated by several orders of magnitude. An ISBN is structurally
much simpler than MARC in that it contains only 4 data elements that are
rigidly prescribed. Even the simplest MARC records are far more complex in
terms of structure and the data they contain. A validating MARC schema would
probably be so big that you'd blow all your memory in no time flat
processing a file (assuming that the stylesheet itself didn't consume it
all) ;). However, I hadn't seen this ISBN schema before. Pretty cool stuff.
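For a sense of how little there is to check in an ISBN, here is a minimal Python sketch of the standard ISBN-10 check-digit rule (weights 10 down to 1, sum divisible by 11, with X standing for 10 in the last position). This is not the xfront schema or its XSLT, just the arithmetic such a check has to cover; the function name is made up for illustration.

```python
def isbn10_is_valid(isbn):
    """Return True if the string is a well-formed ISBN-10 with a correct check digit."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10  # X is only legal as the check digit
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        total += (10 - position) * value  # weights run 10, 9, ..., 1
    return total % 11 == 0


print(isbn10_is_valid("0-306-40615-2"))
print(isbn10_is_valid("0-306-40615-3"))
```

A dozen lines cover the whole rule. Nothing comparable exists for MARC, where what is valid in one field depends on the record type, on other fields, and on which version of the format the record was created under.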