[xsd-users] Poor performance of Unicode conversion

Boris Kolpackov boris at codesynthesis.com
Wed Sep 26 11:05:00 EDT 2007


Hi Ray,

Ray Lischner <rlischner at proteus-technologies.com> writes:

> In our performance measures, we see that a significant amount of time
> is spent transcoding UTF-8 to UTF-16 and vice versa.

Do you know if the bulk of it is spent in the XML-to-DOM stage or
the DOM-to-object model stage? Or is it your own manipulation of DOM?
I am also somewhat surprised, since transcoding should be very
fast.
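Converting UTF-8 to UTF-16 should be cheap because it is a single linear pass over the input, with a few bit operations per character. Below is a minimal sketch of that kind of conversion. It is not the Xerces-C++ transcoder. The function name utf8_to_utf16 is hypothetical, and the code assumes mostly well-formed input: it rejects bad lead/continuation bytes and truncated sequences, but it does not check for overlong encodings or encoded surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Xerces-C++): converts UTF-8 to UTF-16
// in one linear pass. Validation is minimal; overlong forms and encoded
// surrogates are not rejected.
std::u16string utf8_to_utf16 (const std::string& in)
{
  std::u16string out;
  out.reserve (in.size ()); // UTF-16 never needs more units than UTF-8 bytes

  std::size_t i = 0;
  while (i < in.size ())
  {
    unsigned char c = static_cast<unsigned char> (in[i]);
    std::uint32_t cp;
    std::size_t len;

    // Decode the lead byte: payload bits and sequence length.
    if (c < 0x80)              { cp = c;        len = 1; }
    else if ((c >> 5) == 0x6)  { cp = c & 0x1F; len = 2; }
    else if ((c >> 4) == 0xE)  { cp = c & 0x0F; len = 3; }
    else if ((c >> 3) == 0x1E) { cp = c & 0x07; len = 4; }
    else throw std::runtime_error ("invalid UTF-8 lead byte");

    if (i + len > in.size ())
      throw std::runtime_error ("truncated UTF-8 sequence");

    // Append 6 payload bits from each continuation byte (10xxxxxx).
    for (std::size_t k = 1; k < len; ++k)
    {
      unsigned char cc = static_cast<unsigned char> (in[i + k]);
      if ((cc >> 6) != 0x2)
        throw std::runtime_error ("invalid UTF-8 continuation byte");
      cp = (cp << 6) | (cc & 0x3F);
    }
    i += len;

    // BMP code points take one UTF-16 unit; others need a surrogate pair.
    if (cp < 0x10000)
      out.push_back (static_cast<char16_t> (cp));
    else
    {
      cp -= 0x10000;
      out.push_back (static_cast<char16_t> (0xD800 + (cp >> 10)));
      out.push_back (static_cast<char16_t> (0xDC00 + (cp & 0x3FF)));
    }
  }
  return out;
}
```

The cost is proportional to the input size, so if profiling shows transcoding dominating, it is more likely due to how often strings are converted (for example, repeated conversions of the same data) than to the per-character work itself.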

> Our code would be simpler, smaller, and faster if we could work purely
> with UTF-8, and skip all transcoding.
>
> I realize that Xerces is the source of the problem. Have you investigated
> any other XML libraries that might not have this performance penalty? We
> use cxx-tree and need to manipulate the DOM at times.

The only other candidate I can think of that is viable in terms of
stability and maturity is libxml2. There are a number of problems, however:

1. XML Schema validation support in libxml2 is not usable.

2. It's a C library and its DOM-like representation is very low-level.
   I am not sure how usable it will be.

3. I find the documentation unreadable.


We are also working on another in-memory (or rather hybrid in-memory/
event-driven) mapping that will address this and other issues. It won't
be based on DOM but there will still be a way to get under the hood with
an event-driven API similar to SAX (it will be based on the C++/Parser
mapping). Do you think this will be a viable approach for you in the
long run?

Boris
