[xsd-users] Non-xml data in attributes with xsd:string, xsd:normalizedString

Boris Kolpackov boris at codesynthesis.com
Sun Feb 22 12:42:25 EST 2009


Hi Bill,

Bill Pringlemeir <bpringle at sympatico.ca> writes:

> It appears that 'tab', etc. are escaped.  I guess that there is no
> allowable XML character reference for some values.  I thought that
> only 'null'/zero would be disallowed.

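XML 1.0 disallows most C0 control characters: everything below #x20
except tab (#x9), line feed (#xA), and carriage return (#xD). These
cannot appear even as character references such as &#x1;. XML 1.1
only disallows zero; the other control characters are allowed there,
as long as they are written as character references. (Tab, line feed,
and carriage return are legal in XML 1.0. They get escaped in attribute
values because attribute-value normalization would otherwise turn them
into spaces.)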
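The two rules can be read straight off the Char production of each
specification. Here is a minimal sketch on Unicode code points; the
function names is_xml10_char and is_xml11_char are hypothetical and not
part of Xerces-C++ or XSD:

```cpp
// XML 1.0 (Fifth Edition), production [2] Char:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// A code point outside this set may not appear in the document at all,
// not even as a character reference.
constexpr bool is_xml10_char(char32_t c) {
    return c == 0x9 || c == 0xA || c == 0xD ||
           (c >= 0x20 && c <= 0xD7FF) ||
           (c >= 0xE000 && c <= 0xFFFD) ||
           (c >= 0x10000 && c <= 0x10FFFF);
}

// XML 1.1, production [2] Char:
//   [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// Of the control characters only #x0 is excluded. The "restricted"
// ones (#x1-#x8, #xB-#xC, #xE-#x1F, #x7F-#x84, #x86-#x9F) are legal
// but must be written as character references, e.g. &#x1;.
constexpr bool is_xml11_char(char32_t c) {
    return (c >= 0x1 && c <= 0xD7FF) ||
           (c >= 0xE000 && c <= 0xFFFD) ||
           (c >= 0x10000 && c <= 0x10FFFF);
}

static_assert(!is_xml10_char(0x1), "U+0001 is not allowed in XML 1.0");
static_assert(is_xml11_char(0x1), "U+0001 is allowed in XML 1.1");
```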


> So it seems if the data contains this range, you must use
> 'base64Binary'?

Correct. Or use XML 1.1, which Xerces-C++ supports.
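The reason base64Binary sidesteps the problem: its lexical form uses
only A-Z, a-z, 0-9, '+', '/' and '=', so any byte sequence, control
characters included, becomes legal XML text. A minimal stdlib-only
encoder for illustration follows; base64_encode is a hypothetical name,
and with generated code you would normally let the base64Binary type
mapping do the encoding rather than hand-rolling it:

```cpp
#include <cstddef>
#include <string>

// Encodes raw bytes with the standard base64 alphabet (RFC 4648).
std::string base64_encode(const std::string& bytes) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((bytes.size() + 2) / 3 * 4);

    std::size_t i = 0;
    // Each full 3-byte group becomes four 6-bit alphabet indices.
    for (; i + 2 < bytes.size(); i += 3) {
        unsigned n = (static_cast<unsigned char>(bytes[i]) << 16) |
                     (static_cast<unsigned char>(bytes[i + 1]) << 8) |
                      static_cast<unsigned char>(bytes[i + 2]);
        out += alphabet[(n >> 18) & 0x3F];
        out += alphabet[(n >> 12) & 0x3F];
        out += alphabet[(n >> 6) & 0x3F];
        out += alphabet[n & 0x3F];
    }

    // A trailing 1- or 2-byte group is padded with '='.
    const std::size_t rest = bytes.size() - i;
    if (rest == 1) {
        unsigned n = static_cast<unsigned char>(bytes[i]) << 16;
        out += alphabet[(n >> 18) & 0x3F];
        out += alphabet[(n >> 12) & 0x3F];
        out += "==";
    } else if (rest == 2) {
        unsigned n = (static_cast<unsigned char>(bytes[i]) << 16) |
                     (static_cast<unsigned char>(bytes[i + 1]) << 8);
        out += alphabet[(n >> 18) & 0x3F];
        out += alphabet[(n >> 12) & 0x3F];
        out += alphabet[(n >> 6) & 0x3F];
        out += '=';
    }
    return out;
}
```

For example, the bytes 0x01 0x02 0x03 (all illegal in XML 1.0) encode
to "AQID". The cost is roughly a third more space than the raw data.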


> A problem is that the serializer doesn't bother to tell you that 
> the value is illegal.  If we are scanning for character entity
> escaping, can't an exception be thrown when this value range is
> encountered?

Yes, I think that's how it should work. I have filed a bug report, and
we will fix this in the next release of Xerces-C++:

https://issues.apache.org/jira/browse/XERCESC-1854

The fix will most likely be to simply fail (i.e., there will be no
"remove the bad characters for me, please" mode).


> I will try to strip the values in my code that come from 'untrusted
> sources'.

You can do it that way, or you can serialize the object model to a DOM
document and then "sanitize" that document by detecting or removing the
bad characters before writing it out. That way the check lives in one
place, and it takes just a few lines of code.
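A sketch of the detecting/removing step, assuming each text node or
attribute value has already been pulled out of the DOM as UTF-16
(Xerces-C++ stores text as 16-bit XMLCh code units). The DOM walk
itself is left out because it depends on the Xerces-C++ API; the
helper names xml10_char_len, find_invalid_xml10 and strip_invalid_xml10
are hypothetical. The check follows the XML 1.0 Char production,
treating a surrogate pair as one valid character and a lone surrogate
as invalid:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Length in code units (1 or 2) of the valid XML 1.0 character that
// starts at s[i], or 0 if the code unit there is not allowed.
constexpr std::size_t xml10_char_len(std::u16string_view s, std::size_t i) {
    const char16_t u = s[i];
    if (u == 0x9 || u == 0xA || u == 0xD) return 1;
    if (u >= 0x20 && u <= 0xD7FF) return 1;
    if (u >= 0xE000 && u <= 0xFFFD) return 1;
    // U+10000..U+10FFFF arrive as a high surrogate followed by a low one.
    if (u >= 0xD800 && u <= 0xDBFF && i + 1 < s.size() &&
        s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
        return 2;
    return 0;  // C0 control, U+FFFE/U+FFFF, or unpaired surrogate
}

// Detection: index of the first offending code unit, or npos if the
// whole string can be serialized as XML 1.0.
constexpr std::size_t find_invalid_xml10(std::u16string_view s) {
    for (std::size_t i = 0; i < s.size();) {
        const std::size_t n = xml10_char_len(s, i);
        if (n == 0) return i;
        i += n;
    }
    return std::u16string_view::npos;
}

// Removal: a copy of s with every offending code unit dropped.
std::u16string strip_invalid_xml10(std::u16string_view s) {
    std::u16string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size();) {
        const std::size_t n = xml10_char_len(s, i);
        if (n == 0) {
            ++i;  // skip the bad unit and re-examine the next one
            continue;
        }
        out.append(s.data() + i, n);
        i += n;
    }
    return out;
}
```

find_invalid_xml10 suits a fail-fast policy (report or throw at the
first bad position, much like the planned Xerces-C++ behavior), while
strip_invalid_xml10 silently drops the offending code units. Which one
fits depends on whether the untrusted data may be altered.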

Boris
