[XML4LIB] Re: The impact of XML/RDF on digital libraries

Eric Miller em at w3.org
Thu Feb 21 13:28:46 EST 2002

At 08:00 AM 2/21/2002 -0800, Jerome McDonough wrote:
>At 09:40 AM 2/21/2002 -0500, Rhyno Art wrote:
>I suspect that the whole RDF argument boils down to another instance of the
>debates about artificial intelligence; if you believe in the ability to
>create real semantic/pragmatic understanding in machines, you believe in
>RDF's promise.

I disagree with this assessment.  The real question in my mind is how much 
knowledge of the data you wish to store procedurally (in code, etc.) and 
how much you want to represent declaratively.  My assertion is that 
declarative is far better than procedural in most cases (but certainly not 
all).  Artificial intelligence has absolutely nothing to do with this; it's 
about effective cataloging, semantic interoperability and effective data 
reuse.

For an example (one relevant to the author), in 
http://www.loc.gov/standards/mets/sfquad.xml we find...

   <METS:name>Rick Beaubien</METS:name>
   <gdm:creator NAMETYPE="cn" SRCCHECK="false" ROLE="Publisher"
                SEQ="1">U.S.Geological Survey</gdm:creator>

If I (working on another Digital Library project, for example) wanted to 
build services that allow me to search for 'authors/creators' and retrieve 
corresponding resources, the above makes this extremely difficult to 
do.  It's not that you *can't* do this (I'm sure an example will be sent 
illustrating that you can); the question is how difficult it is to do, and 
how much of the information that makes this possible would be represented 
in the code vs. some additional set of information (e.g. just more 
metadata).

Further, how difficult will it be for the next user of this information 
who is not familiar with your particular domain (e.g. a museum collection 
interested in maps trying to reuse this data, or even another dig lib 
project)?  If this information is stored in the code, the data becomes 
increasingly difficult to integrate and reuse effectively.

If you're not concerned about this, you have consistent data and you have 
no plans for sharing it outside of your application, then RDF may simply 
not be for you... I'm a firm believer in using the right tools for the 
right job. But it seems to me that in the diglib communities in particular, 
data reuse is high on the list.

If reuse is important, then stating in a declarative manner how the terms 
used to describe resources are related is a step (not the whole solution) 
in a direction which facilitates more effective data reuse.

In RDF (syntax neutral), to accomplish the above searching we might say 
something like...

'gdm:creator' is a 'rdfs:subPropertyOf' of 'METS:agent' .
'METS:agent' is a 'rdfs:subPropertyOf' of 'dc:creator' .

This says anything that is a 'gdm:creator' is also a 'METS:agent' (more 
specifically, 'gdm:creator' is a semantic refinement of 'METS:agent').  
Anything that's a 'METS:agent' is also a 'dc:creator'.
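
To make the refinement chain concrete, here is a minimal sketch in plain 
Python (no RDF tooling involved) of how an application might follow the two 
statements above.  The names SUB_PROPERTY_OF and super_properties are 
hypothetical, made up for this illustration.

```python
# Hypothetical illustration: the two subPropertyOf statements as a lookup table.
SUB_PROPERTY_OF = {
    "gdm:creator": "METS:agent",
    "METS:agent": "dc:creator",
}


def super_properties(prop):
    """Return prop followed by every property it refines, nearest first."""
    chain = [prop]
    while chain[-1] in SUB_PROPERTY_OF:
        parent = SUB_PROPERTY_OF[chain[-1]]
        if parent in chain:  # guard against an accidental cycle in the table
            break
        chain.append(parent)
    return chain


print(super_properties("gdm:creator"))
# -> ['gdm:creator', 'METS:agent', 'dc:creator']
```

An application that only knows 'dc:creator' can use a chain like this to 
recognise 'gdm:creator' values as creators without any code written 
specifically for gdm or METS.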

In RDF/XML this would look like...

de-referencing &gdm;#creator would yield:

<rdf:Property rdf:about = "&gdm;#creator">
   <rdfs:subPropertyOf rdf:resource = "&METS;#agent" />
</rdf:Property>

de-referencing &METS;#agent would yield:

<rdf:Property rdf:about = "&METS;#agent">
   <rdfs:subPropertyOf rdf:resource = 
      "http://purl.org/dc/elements/1.1/creator" />
</rdf:Property>

and http://purl.org/dc/elements/1.1/creator yields (err... follow the link :)

In essence, this is just adding more metadata.  And if this schema 
information follows (and it does) the same model as your RDF instance data 
(the bit that talked about METS:agent and gdm:creator in the first place), 
the same tools can be used to manage both the schema and instance data ... 
this is exactly what Jena provides: http://www.hpl.hp.com/semweb/

Now, suppose I'm that museum project I mentioned above (and let's say I 
happen to use Dublin Core), and via an HTTP GET I retrieve a chunk of XML 
that has METS:agent and gdm:creator.  Not knowing anything about METS:agent 
and gdm:creator per se, my application has a better chance of understanding 
that they relate to something I do know (dc:creator).

Searching then for 'dc:creator's with a term 'Rick' would return results, 
searching for 'usgs' would return results... it's not AI, it's not magic... 
it's taking advantage of a systematic means of declaring vocabularies and 
how they relate.
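
As a rough sketch of that search, the plain-Python example below keeps the 
schema statements and the instance statements in one list of triples and 
expands 'dc:creator' to its declared refinements before matching.  It is an 
illustration under assumptions, not how any particular RDF toolkit works: 
the function names are hypothetical, and using "sfquad" as the described 
resource is a guess at what the METS record is about.

```python
# Hypothetical data: schema and instance statements share one triple list.
# "sfquad" as the subject is an assumption made for this illustration.
TRIPLES = [
    ("gdm:creator", "rdfs:subPropertyOf", "METS:agent"),
    ("METS:agent", "rdfs:subPropertyOf", "dc:creator"),
    ("sfquad", "METS:agent", "Rick Beaubien"),
    ("sfquad", "gdm:creator", "U.S.Geological Survey"),
]


def refinements_of(prop, triples):
    """Return prop plus every property declared (directly or not) as its sub-property."""
    found = {prop}
    changed = True
    while changed:  # repeat until no new sub-property turns up
        changed = False
        for s, p, o in triples:
            if p == "rdfs:subPropertyOf" and o in found and s not in found:
                found.add(s)
                changed = True
    return found


def search(prop, term, triples):
    """Return statements whose property refines prop and whose value contains term."""
    props = refinements_of(prop, triples)
    return [t for t in triples if t[1] in props and term.lower() in t[2].lower()]


print(search("dc:creator", "rick", TRIPLES))
print(search("dc:creator", "geological", TRIPLES))
```

Matching an abbreviation such as 'usgs' against "U.S.Geological Survey" 
would need some extra normalisation or an alias, which this sketch does not 
attempt.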

Please note: in the specific METS example above, however, given its 
current structural representation, it's very unclear what specific results 
would indeed be returned, as it's unclear what the 'thing' is that either 
the agents or the publishers actually created...  Being clear about what 
these 'things' are, who created them, etc. is precisely why modelling is 
important.  Even if you don't buy into the RDF means for doing this, it's 
important for maximum reuse that it's done!

RDF is not about making this more complex, but rather about providing some 
modeling primitives that help communities think about their data a bit more 
(and what you're really trying to represent) and more effectively 
communicate this information among applications.

--eric (promising to resume his lurking status after this message)
