[xsd-users] Poor performance of Unicode conversion

Wed Sep 26 11:40:45 EDT 2007

> Do you know if the bulk of it is spent in the XML-to-DOM stage or
> the DOM-to-object model stage? Or is it your own manipulation of DOM?
> I am also somewhat surprised since the transcoding should be very
> fast.

DOM-to-object model stage. Some of our schemas are predominantly strings.

> 1. XML Schema validation support in libxml2 is not usable.

That's a show-stopper. I've seen libxml2 report an XML document as valid when Xerces says it isn't. With only one exception, Xerces has been correct and libxml2 has been wrong.

> 2. It's a C library and its DOM-like representation is very low-level.
>   I am not sure how usable it will be.

The Xerces library is extremely C-like. (No destructors, so everything needs an explicit release call. You test a node's type by checking a node-type field that contains an enumerated value. Raw pointers are passed everywhere. There is no UTF-16 string type, so raw XMLCh* pointers get passed around. And so on.)
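
To illustrate the kind of wrapper this style of API tends to force on client code, here is a minimal sketch. LegacyNode, Releaser, NodePtr, make_node, and is_element are all hypothetical stand-ins invented for this example; none of them is a Xerces-C++ class or function. The sketch only assumes an object that must be freed through release() and that exposes its kind through an enumerated field.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a library object that is freed through an
// explicit release() call rather than a destructor. Not a Xerces-C++ class.
struct LegacyNode {
    enum NodeType { ELEMENT_NODE = 1, TEXT_NODE = 3 };
    NodeType type;        // callers inspect this field to learn the node kind
    int* release_count;   // lets the example observe when release() runs
    void release() { ++*release_count; delete this; }
};

// Deleter that forwards to release(), so std::unique_ptr does the cleanup.
struct Releaser {
    void operator()(LegacyNode* n) const { n->release(); }
};
using NodePtr = std::unique_ptr<LegacyNode, Releaser>;

// Hypothetical factory that hands back an owning smart pointer.
NodePtr make_node(LegacyNode::NodeType type, int* counter) {
    assert(counter != nullptr);
    return NodePtr(new LegacyNode{type, counter});
}

// Wraps the enumerated-field check in a named predicate.
bool is_element(const LegacyNode& n) {
    return n.type == LegacyNode::ELEMENT_NODE;
}
```

With a wrapper like this, release() runs when the NodePtr goes out of scope, so the caller no longer has to remember the explicit call on every exit path.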

> 3. I find the documentation unreadable.

I agree, but I also find the Apache Xerces-C++ documentation to be unusable. I spend a lot of time experimenting to understand what the library actually does.

> We are also working on another in-memory (or rather hybrid in-memory/
> event-driven) mapping that will address this and other issues. It won't
> be based on DOM but there will still be a way to get under the hood with
> an event-driven API similar to SAX (it will be based on the C++/Parser
> mapping). Do you think this will be a viable approach for you in the
> long run?

Maybe. We need to manipulate document trees, and we don't always have a schema. If the interface is event-driven, I would need to invent my own DOM-like model to store the tree. Maybe that would be best, however.
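
As a rough idea of what such a home-grown model could look like, here is a minimal sketch of a tree assembled from SAX-style callbacks. Node, TreeBuilder, its callback names, and to_xml are hypothetical and invented for this example; they are not part of XSD, the C++/Parser mapping, or Xerces-C++. Attributes, namespaces, and escaping are left out.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical minimal tree node; not part of XSD or Xerces-C++.
struct Node {
    std::string name;                              // element name
    std::string text;                              // concatenated character data
    std::vector<std::unique_ptr<Node>> children;   // child elements in order
};

// Hypothetical handler whose SAX-style callbacks assemble a Node tree.
class TreeBuilder {
public:
    void start_element(const std::string& name) {
        auto node = std::make_unique<Node>();
        node->name = name;
        Node* raw = node.get();
        if (stack_.empty()) {
            assert(!root_ && "only one root element is allowed");
            root_ = std::move(node);
        } else {
            stack_.back()->children.push_back(std::move(node));
        }
        stack_.push_back(raw);   // this element is now the innermost open one
    }

    void characters(const std::string& data) {
        if (!stack_.empty()) stack_.back()->text += data;
    }

    void end_element() {
        assert(!stack_.empty() && "end_element without start_element");
        stack_.pop_back();
    }

    // Hands the finished tree to the caller; the builder is empty afterwards.
    std::unique_ptr<Node> take_root() {
        assert(stack_.empty() && "some elements are still open");
        return std::move(root_);
    }

private:
    std::unique_ptr<Node> root_;
    std::vector<Node*> stack_;   // open elements, innermost last
};

// Writes the tree back out. Text is emitted before child elements, so the
// original ordering of mixed content is not preserved, and nothing is escaped.
std::string to_xml(const Node& n) {
    std::string out = "<" + n.name + ">" + n.text;
    for (const auto& child : n.children) out += to_xml(*child);
    return out + "</" + n.name + ">";
}
```

The stack of open elements is the only state the builder needs between events, which is why a tree like this can sit on top of almost any event-driven parser without a schema.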
--
Ray Lischner, Proteus Technologies LLC