[xsd-users] dealing with xml written/read on-the-fly

Boris Kolpackov boris at codesynthesis.com
Thu Nov 5 09:02:23 EST 2009


Hi Cerion,

Cerion Armour-Brown <cerion at kestrel.ws> writes:

> > What actually happens is this: if the raw character buffer has less 
> > than 100 bytes when Xerces-C++ tries to transcode the next batch of 
> > characters, then it will try to read some more. There is actually a 
> > technical reason for this other than efficiency (it has to do with 
> > multi-byte encodings and the buffer containing only some of the bytes 
> > constituting a code point).
>
> Indeed.
>
> > Because Xerces-C++ won't keep trying to read more if the stream returned
> > less than 100 bytes, one way to mitigate this would be to return the 
> > data from InputSource::readBytes() in small chunks. If you return it 
> > one byte at a time, there will be no buffering at all.
>
> Eugh - that's horrible! :-)
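
For reference, the small-chunk workaround quoted above can be sketched roughly as follows. This is only an illustration and does not depend on Xerces-C++: ChunkedReader and count_reads are hypothetical names, and readBytes() here only mimics the shape of the real call, which in Xerces-C++ lives on the BinInputStream that an InputSource creates. The idea is simply to cap how much each call returns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a stream's readBytes(): it hands out data that
// has already arrived, but never more than `chunk` bytes per call.
class ChunkedReader
{
public:
  ChunkedReader (const std::string& data, std::size_t chunk)
    : data_ (data), pos_ (0), chunk_ (chunk)
  {
    assert (chunk_ > 0); // a zero-byte read would look like end of stream
  }

  // Fill `toFill` with up to `maxToRead` bytes and return the number of
  // bytes written (0 once the data is exhausted).
  std::size_t readBytes (unsigned char* toFill, std::size_t maxToRead)
  {
    std::size_t n = std::min (std::min (maxToRead, chunk_),
                              data_.size () - pos_);
    std::memcpy (toFill, data_.data () + pos_, n);
    pos_ += n;
    return n;
  }

private:
  std::string data_;
  std::size_t pos_;
  std::size_t chunk_;
};

// Drain the reader and count how many calls returned data.
std::size_t count_reads (ChunkedReader& r)
{
  unsigned char buf[256];
  std::size_t calls = 0;
  while (r.readBytes (buf, sizeof (buf)) != 0)
    ++calls;
  return calls;
}
```

With a chunk size of 1 the caller sees each byte as soon as it is available, at the cost of one call per byte, which is why this is a mitigation rather than a fix.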

A quick update: I have fixed this issue for the upcoming Xerces-C++ 3.1.0
(should be out in about a month). The 100-byte threshold described above
(the "low water mark") can now be changed for each parser instance. Setting
it to 0 disables buffering altogether. For details, see:

https://issues.apache.org/jira/browse/XERCESC-1607
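
A minimal sketch of what setting the low water mark looks like, assuming Xerces-C++ 3.1.0 or later is installed. The names used here (setLowWaterMark() on the DOM parser and the XMLUni::fgXercesLowWaterMark property for SAX2) are how I understand the 3.1 API rather than something spelled out in this message, so check them against the headers of your version:

```cpp
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLUni.hpp>
#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/sax2/SAX2XMLReader.hpp>
#include <xercesc/sax2/XMLReaderFactory.hpp>

int main ()
{
  using namespace xercesc;

  XMLPlatformUtils::Initialize ();

  {
    // DOM: the setting is per parser instance. 0 means "do not wait for
    // more raw bytes before processing what is already available".
    XercesDOMParser dom;
    dom.setLowWaterMark (0);

    // SAX2: assumed to be exposed as a property that takes a pointer to
    // an XMLSize_t value.
    SAX2XMLReader* sax = XMLReaderFactory::createXMLReader ();
    XMLSize_t lwm = 0;
    sax->setProperty (XMLUni::fgXercesLowWaterMark, &lwm);

    // ... call dom.parse() or sax->parse() on the on-the-fly source ...

    delete sax;
  }

  XMLPlatformUtils::Terminate ();
}
```

The parsers live in an inner scope so that they are destroyed before Terminate() is called.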

Boris


