[XML4LIB] Re: Native XML and Exist for Large Collections?

Kevin S. Clarke ksclarke at stanford.edu
Fri Nov 14 21:10:05 EST 2003


Hi Eli, you wrote:

> What I want to do is parse a multi-"field" user query from a form into
> a single XPath.  So for example, if I want to search for EAD finding
> aids containing "Brown" as a personal name and "Berkeley" in the
> abstract, I want to be able to do this:
>
> //persname[contains(., "Brown")] and
> /ead/archdesc/did/abstract[contains(., "Berkeley")]
>
> This works in an <xsl:if> statement, not in any NXD I've tried.

I think it works in XSLT and not in NXDs because the expression returns
a boolean.  XSLT's "test" attribute evaluates the expression and says,
in this case, "This expression is true if the first path finds something
and the second path finds something."  The "test" attribute doesn't
return a node-set, though; it only tells you that both node-sets you are
testing for are non-empty.  It is then up to you (in XSLT) to "select"
them.
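
To make the boolean-versus-node-set distinction concrete, here is a
minimal Python sketch using only the standard library's
xml.etree.ElementTree.  It emulates the two XPath steps by hand
(ElementTree's XPath subset has no contains() or "and"), so it only
illustrates the semantics, not what XSLT or any NXD does internally.
The sample EAD fragment and the helper name nodes_containing are made up
for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD record (not a real finding aid).
EAD = """
<ead>
  <archdesc>
    <did>
      <abstract>Papers collected in Berkeley, California.</abstract>
    </did>
    <bioghist><persname>Brown, Jane</persname></bioghist>
  </archdesc>
</ead>
"""

def nodes_containing(root, path, text):
    """Emulate path[contains(., text)]: keep nodes whose string-value contains text."""
    return [n for n in root.findall(path) if text in "".join(n.itertext())]

root = ET.fromstring(EAD)

# Each path on its own yields a list of nodes (a "node-set").
names = nodes_containing(root, ".//persname", "Brown")
abstracts = nodes_containing(root, "./archdesc/did/abstract", "Berkeley")

# "A and B" converts each node-set to a boolean (non-empty means true),
# so the result is a single True/False, not the matching nodes.
both_found = bool(names) and bool(abstracts)

print(len(names), len(abstracts), both_found)
```

The two lists are what a "select" would hand back; both_found is all
that a "test" reports.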

This is unlike what is described in the section on node-sets in the
XPath spec.  That section talks about paths, predicates, and unions (a
node-set 'or'), but not intersections (a node-set 'and').  The union
operator is described right beneath one of the snippets you provided.
You quoted:

> "A location path can be used as an expression. The expression returns
> the set of nodes selected by the path."

It continues: "The | operator computes the union of its operands, which
must be node-sets."  This is why "//SPEECH | //STAGEDIR" works, I guess.

What you referenced from the spec also seems, to me, to support the idea
that the ANDed paths would return a boolean (not a node-set)...

> "An and expression is evaluated by evaluating each operand and
> converting its value to a boolean as if by a call to the boolean
> function. The result is true if both values are true and false
> otherwise. The right operand is not evaluated if the left operand
> evaluates to false."
> 
> And from the documentation on the boolean() function:
> 
> "a node-set is true if and only if it is non-empty"

My understanding is that you'd like to get back all the elements where
both occurred in a single EAD record, but you do not want a container
element (something to group them by, to say "these two or these three
were found in the same EAD record")?  Without a record container, how
would you know which to display together (as from the same record)?

In a stylesheet you are working from a context (you have selected
something previously), but when you are searching an NXD there is no
context... if you do not specify one, how does the database know you
want these ANDed within a single record?  Without specifying that the
search should be done at the document level, the search seems equivalent
to a union like:

/PLAY/PERSONAE/PERSONA[.&='Hamlet'] | //LINE[.&='Nay']

(which works in the Cocoon interface but not the raw XPath one b/c of
the previously mentioned bug).

However, if you do want a container element:

/PLAY[self::TITLE&='Elsinore' and PERSONAE/PERSONA&='Hamlet']

seems to mirror the type of search you describe above... it finds any
TITLE with the text 'Elsinore' and then a specific path and contained
text value.

You could even do something like:
  /PLAY[self::TITLE&='Elsinore' and PERSONAE/PERSONA&='Hamlet']/TITLE
for a hot-linkable brief display...
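
The container idea can be sketched in Python (standard library only):
apply both conditions per PLAY, then return either the PLAY or its
TITLE.  This is an emulation using plain substring matching, not eXist's
&= keyword operator, and the two tiny PLAY documents and the helper
names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two made-up, minimal PLAY documents standing in for a collection.
DOCS = [
    "<PLAY><TITLE>Hamlet, at Elsinore</TITLE>"
    "<PERSONAE><PERSONA>Hamlet</PERSONA></PERSONAE></PLAY>",
    "<PLAY><TITLE>Macbeth</TITLE>"
    "<PERSONAE><PERSONA>Macbeth</PERSONA></PERSONAE></PLAY>",
]

def any_contains(context, path, text):
    """True if any node selected by path (relative to context) contains text."""
    return any(text in "".join(n.itertext()) for n in context.findall(path))

def matching_plays(docs, title_text, persona_text):
    """Emulate /PLAY[<title test> and <persona test>]: both tests run per PLAY."""
    hits = []
    for xml in docs:
        play = ET.fromstring(xml)
        if (any_contains(play, "TITLE", title_text)
                and any_contains(play, "PERSONAE/PERSONA", persona_text)):
            hits.append(play)
    return hits

plays = matching_plays(DOCS, "Elsinore", "Hamlet")

# Appending /TITLE to the path corresponds to pulling the title from each hit.
titles = [p.findtext("TITLE") for p in plays]
print(titles)
```

Because the "and" sits inside the predicate on PLAY, each hit is a
record in which both conditions held, which is the grouping a bare
"A and B" cannot give you.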

> Something like this seems to work, kind of, in the eXist example.  But this:
> 
> /PLAY[contains(.//SPEECH, "the") and //STAGEDIR]
> 
> returns only three documents, which seems wrong.

It seems there is a bug in the old(?) version of eXist: it does not
properly handle '//' in the second expression of a predicate (it
mistakenly uses the context of the first part).  For more info see:
  http://sourceforge.net/mailarchive/message.php?msg_id=5070819
This means the three hits are for plays that had SPEECHes with STAGEDIRs
somewhere underneath them.

Oops, bad example... but I think the XPath above (with "self::TITLE")
provides the functionality you are looking for(? -- and is pretty fast).

>   And this:
> 
> /PLAY[contains(.//SPEECH, "the")]
> 
> doesn't work at all.

I think the context of this is confused.  The following works:

/PLAY[.//SPEECH and contains(., 'the')]
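
One possible reading of the difference, under standard XPath 1.0 rules:
when a node-set such as .//SPEECH is passed to contains(), it is
converted to a string using only the first node in document order, so
only the first SPEECH is examined.  Whether the eXist version discussed
here followed that rule is an assumption, not something confirmed in
this thread.  A small Python emulation (standard library only, with an
invented sample document and hypothetical helper names):

```python
import xml.etree.ElementTree as ET

# Invented sample: "the" appears only in the second SPEECH.
PLAY = ET.fromstring(
    "<PLAY><SPEECH><LINE>Who goes?</LINE></SPEECH>"
    "<SPEECH><LINE>Long live the king!</LINE></SPEECH></PLAY>"
)

def string_value(node):
    """XPath string-value of an element: all descendant text concatenated."""
    return "".join(node.itertext())

def contains_first(nodes, text):
    """Emulate contains(node-set, text): only the first node's string-value counts."""
    return bool(nodes) and text in string_value(nodes[0])

speeches = PLAY.findall(".//SPEECH")  # returned in document order

# /PLAY[contains(.//SPEECH, 'the')] -> tests the first SPEECH only
first_only = contains_first(speeches, "the")

# /PLAY[.//SPEECH and contains(., 'the')] -> tests the whole PLAY's text
whole_play = bool(speeches) and "the" in string_value(PLAY)

print(first_only, whole_play)
```

Under that reading the first form misses any play whose opening SPEECH
lacks the word, while the second form looks at all of the play's text.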

Sometimes it is trial and error to find the right (or more efficient)
path (in my experience at least, maybe there is an easier way).  The
ones in this email seem to be faster than the ones from yesterday at
least.  I should also mention that the latest CVS drop of eXist has
preliminary support for XPath2/XQuery... though I haven't tried it out
yet.

FWIW...
Kevin

-- 
Kevin S. Clarke <ksclarke at stanford.edu>
Lane Medical Library, Stanford University


