[XML4LIB] Re: The impact of XML/RDF on digital libraries
em at w3.org
Thu Feb 21 13:28:46 EST 2002
At 08:00 AM 2/21/2002 -0800, Jerome McDonough wrote:
>At 09:40 AM 2/21/2002 -0500, Rhyno Art wrote:
>I suspect that the whole RDF argument boils down to another instance of the
>debates about artificial intelligence; if you believe in the ability to
>semantic/pragmatic understanding in machines, you believe in RDF's promise.
I disagree with this assessment. The real question in my mind is how much
knowledge of the data you wish to store procedurally (in code, etc.) and
how much you want to represent declaratively. My assertion is that declarative
is far better than procedural in most cases (but certainly not all).
Artificial intelligence has absolutely nothing to do with
this; it's about effective cataloging, semantic interoperability and
effective data reuse.
For example (one relevant to the author), in
http://www.loc.gov/standards/mets/sfquad.xml we find...
<gdm:creator NAMETYPE="cn" SRCCHECK="false" ROLE="Publisher"
If I (working on another Digital Library project for example) wanted to
build services that allow me to search for 'authors/creators' and retrieve
corresponding resources, the above makes this extremely difficult to
do. It's not that you *can't* do this (I'm sure an example will even be sent
illustrating that you can), it's just a question of how difficult it is to
do. And how much of the information that makes this possible would be
represented in the code vs some additional set of information (e.g. just
Further, how difficult will it be for the next user of this information
who is not familiar with your particular domain (e.g. a museum collection
interested in maps trying to reuse this data, or even another dig lib
project)? If this information is stored in the code, the data becomes
increasingly difficult to integrate and reuse effectively.
If you're not concerned about this, you have consistent data, and you have no
plans for sharing it outside of your application, then RDF may simply not be for
you... I'm a firm believer in using the right tools for the right job. But
it seems to me that in the diglib communities in particular data reuse is
high on the list.
If reuse is important, then declaring how the terms used to describe
resources are related is a step (not the whole solution) in a
direction which facilitates more effective data reuse.
in RDF (syntax neutral) to accomplish the above searching we might say
'gdm:creator' is a 'rdfs:subPropertyOf' of 'METS:agent' .
'METS:agent' is a 'rdfs:subPropertyOf' of 'dc:creator' .
This says anything that is a 'gdm:creator' is also a 'METS:agent' (more
specifically, it is a semantic refinement of METS:agent). Anything that's a
'METS:agent' is also a 'dc:creator'.
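To make the chain above concrete, here is a minimal sketch in Python. It is only an illustration: the dictionary and the function name `super_properties` are made up for this message and are not part of METS, Dublin Core, or any RDF toolkit. It also assumes each property has a single declared parent, which RDF Schema does not require.

```python
# Hypothetical, hand-written copy of the two subPropertyOf statements above.
SUB_PROPERTY_OF = {
    "gdm:creator": "METS:agent",
    "METS:agent": "dc:creator",
}


def super_properties(prop):
    """Return every property that `prop` refines, nearest first."""
    found = []
    seen = {prop}
    while prop in SUB_PROPERTY_OF:
        prop = SUB_PROPERTY_OF[prop]
        if prop in seen:  # guard against an accidental cycle in the declarations
            break
        seen.add(prop)
        found.append(prop)
    return found


print(super_properties("gdm:creator"))  # ['METS:agent', 'dc:creator']
```

The relationships live in the table, not in the function, so adding another vocabulary means adding a row rather than changing code.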
in RDF/XML this would look like...
de-referencing &gdm;#creator would yield:
<rdf:Property rdf:about = "&gdm;#creator">
<rdfs:subPropertyOf rdf:resource = "&METS;#agent" />
</rdf:Property>
de-referencing &METS;#agent would yield:
<rdf:Property rdf:about = "&METS;#agent" />
<rdfs:subPropertyOf rdf:resource =
and http://purl.org/dc/elements/1.1/creator yields (err... follow the link :)
in essence, just adding more metadata. And if this schema information follows
(and it does) the same model as your RDF instance data (the bit that
talked about METS:agent and gdm:creator) in the first place, the same tools
can be used to manage both the schema and instance data ... in essence this
is exactly what Jena provides http://www.hpl.hp.com/semweb/
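As a rough sketch of "same model, same tools" (plain Python, and explicitly not Jena's API; the triples, the 'map-1' identifier and the `match` helper are all invented for illustration), schema statements and instance statements can sit in one list and be queried by one function:

```python
# Hypothetical triples: the first two are schema, the last two are instance data.
TRIPLES = [
    ("gdm:creator", "rdfs:subPropertyOf", "METS:agent"),
    ("METS:agent", "rdfs:subPropertyOf", "dc:creator"),
    ("map-1", "gdm:creator", "usgs"),
    ("map-1", "METS:agent", "Rick"),
]


def match(subject=None, predicate=None, obj=None):
    """Return the triples that agree with every argument that is not None."""
    return [
        (s, p, o)
        for s, p, o in TRIPLES
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# The same function answers a question about the schema...
print(match(predicate="rdfs:subPropertyOf"))
# ...and a question about the instance data.
print(match(subject="map-1"))
```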
Now, say I'm that museum project I mentioned above (and let's say I happen to
use Dublin Core), and via an HTTP GET I retrieve a chunk of XML that has
METS:agent and gdm:creator. Not knowing anything about METS:agent and
gdm:creator per se, my application has a better chance at understanding
that they relate to something I do know (dc:creator).
Searching then for 'dc:creator's with a term 'Rick' would return results,
searching for 'usgs' would return results... it's not AI, it's not magic...
it's taking advantage of a systematic means of declaring vocabularies and
how they relate.
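A minimal sketch of such a search, in Python, under loud assumptions: the records below are invented (the identifiers 'map-1' and 'map-2' and the pairing of 'Rick' and 'usgs' with particular properties are placeholders, not data read from the METS file), and `refines` and `search` are hypothetical helper names.

```python
# Hypothetical declarations; a property may list more than one parent.
SUB_PROPERTY_OF = {
    "gdm:creator": {"METS:agent"},
    "METS:agent": {"dc:creator"},
}

# Hypothetical records as (resource, property, value).
RECORDS = [
    ("map-1", "gdm:creator", "usgs"),
    ("map-1", "METS:agent", "Rick"),
    ("map-2", "dc:title", "Rick's atlas"),
]


def refines(prop, target):
    """True if `prop` is `target` or is declared (transitively) a subproperty of it."""
    pending, seen = [prop], set()
    while pending:
        current = pending.pop()
        if current == target:
            return True
        if current not in seen:
            seen.add(current)
            pending.extend(SUB_PROPERTY_OF.get(current, ()))
    return False


def search(target, term):
    """Resources whose value contains `term` under `target` or any refinement of it."""
    return sorted({
        resource
        for resource, prop, value in RECORDS
        if term.lower() in value.lower() and refines(prop, target)
    })


print(search("dc:creator", "Rick"))  # ['map-1']
print(search("dc:creator", "usgs"))  # ['map-1']
```

Note that 'map-2' is not returned for 'Rick': its match is under dc:title, which nothing declares to be a refinement of dc:creator.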
Please note: In this specific METS example above, however, given its
current structural representation it's very unclear what specific results
would indeed be returned, as it's unclear what the 'thing' is that either the
agents or publishers actually created... Being clear about what these
'things' are, who created them, etc. is precisely why modelling is
important. Even if you don't buy into the RDF means for doing this, it's
important for maximum reuse that it's done!
RDF is not about making this more complex, but rather about providing some
modeling primitives that help communities think about their data a bit more
(and what you're really trying to represent) and more effectively communicate
this information among applications.
--eric (promising to resume his lurking status after this message)