[ANNOUNCEMENT] VTD-XML released under GPL
crackeur at comcast.net
Sat Jul 10 00:16:36 EDT 2004
I am pleased to announce that version 0.5 of VTD-XML -- a new,
non-extractive, Java-based XML processing API licensed under GPL
-- is now freely available on sourceforge.net. For source code,
documentation, detailed description of API and code examples,
Capable of random-access, VTD-XML attempts to be both memory
efficient and high performance. The starting point of this project is
the observation that, for XML documents that don't declare entities
in DTD, tokenization can indeed be done by only recording the starting
offset and length of a token. A discussion on this subject appeared
in a recent article on xml.com
The core technology of VTD-XML is a binary format specification
called Virtual Token Descriptor (VTD). A VTD record is a 64-bit integer
that encodes the starting offset, length, type and nesting depth of a
token in an XML document. Because VTD records don't contain actual
token content, they work alongside the original XML document, which
is maintained intact in memory by the processing model.
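
To make the idea of a 64-bit record concrete, here is a minimal sketch of
packing and unpacking the four fields. The field widths and bit positions
below (32-bit offset, 20-bit length, 8-bit depth, 4-bit type) and all names
are assumptions chosen for illustration; they are not the layout defined by
the VTD specification.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: a hypothetical bit layout, not the official VTD format.
final class VtdRecordSketch {
    // Assumed field widths; together they fill exactly 64 bits.
    static final int LENGTH_BITS = 20;
    static final int DEPTH_BITS = 8;
    static final int TYPE_BITS = 4;

    // Pack the four fields into one long: type | depth | length | offset.
    static long pack(int offset, int length, int type, int depth) {
        if (offset < 0) throw new IllegalArgumentException("offset");
        if (length < 0 || length >= (1 << LENGTH_BITS)) throw new IllegalArgumentException("length");
        if (depth < 0 || depth >= (1 << DEPTH_BITS)) throw new IllegalArgumentException("depth");
        if (type < 0 || type >= (1 << TYPE_BITS)) throw new IllegalArgumentException("type");
        return ((long) type << 60)
             | ((long) depth << 52)
             | ((long) length << 32)
             | (offset & 0xFFFFFFFFL);
    }

    static int offset(long rec) { return (int) (rec & 0xFFFFFFFFL); }
    static int length(long rec) { return (int) ((rec >>> 32) & 0xFFFFF); }
    static int depth(long rec)  { return (int) ((rec >>> 52) & 0xFF); }
    static int type(long rec)   { return (int) (rec >>> 60); }

    // The record holds no content; the text is read from the intact document.
    static String text(byte[] doc, long rec) {
        return new String(doc, offset(rec), length(rec), StandardCharsets.UTF_8);
    }
}
```

For the document `<a><b>hi</b></a>`, the character data "hi" starts at byte 6
and is 2 bytes long, so `pack(6, 2, type, 2)` describes it without copying it.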
VTD's memory-conserving features can be summarized as follows:
* Avoid per-object overhead -- In many VM-based object-oriented
programming languages, per-object allocation incurs a small amount
of memory overhead. A VTD record is immune to the overhead because
it is not an object.
* Bulk-allocation of storage -- Fixed in length, VTD records can be
stored in large memory blocks, which are more efficient to allocate
and GC. By allocating a large array for 4096 VTD records, one incurs
the per-array overhead (16 bytes in JDK 1.4) only once across 4096
records, thus reducing per-record overhead to very little.
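
The bulk-allocation point can be sketched as a growable buffer that stores
records in fixed-size long[] blocks of 4096 entries. The class and method
names here are hypothetical and this is not the buffer used inside VTD-XML;
it only shows how one array header is shared by a whole block of records.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical chunked buffer for 64-bit records.
final class VtdBufferSketch {
    static final int BLOCK_SIZE = 4096; // records per block, as in the text

    private final List<long[]> blocks = new ArrayList<>();
    private int size;

    // Append one record, allocating a new block only when the last one is full.
    void append(long record) {
        int indexInBlock = size % BLOCK_SIZE;
        if (indexInBlock == 0) {
            blocks.add(new long[BLOCK_SIZE]);
        }
        blocks.get(size / BLOCK_SIZE)[indexInBlock] = record;
        size++;
    }

    long get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i);
        return blocks.get(i / BLOCK_SIZE)[i % BLOCK_SIZE];
    }

    int size() { return size; }

    int blockCount() { return blocks.size(); }

    // Array header cost spread over a full block, in bytes per record.
    static double overheadPerRecord(int arrayHeaderBytes) {
        return (double) arrayHeaderBytes / BLOCK_SIZE;
    }
}
```

With the 16-byte array overhead quoted above, `overheadPerRecord(16)` comes
to 16 / 4096 = 0.00390625 bytes per 8-byte record.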
Our benchmark indicates that VTD-XML processes XML at a performance
level similar to (and often better than) SAX with a NULL content handler.
Memory usage is typically between 1.3x and 1.6x the size of the
document, with "1" being the document itself.
Other features included in this release are:
* Incremental update -- VTD-XML allows one to modify content of XML
without touching irrelevant parts of the document.
* Content extraction -- VTD-XML also allows one to pull an element
out of XML in its serialized format. This can be an important
feature for partial signing/encryption of SOAP payload for
In upcoming releases, we plan to add persistence support so
that one can save/load VTD to/from the disk along with the XML documents
to avoid repetitive parsing in read-only situations. XPath support is
also on the development roadmap. However, we would like to collect as
many suggestions and bug reports as possible before taking the next step.
Your input and suggestions are very important to make VTD-XML a truly
useful XML processor.