[xsd-users] XML validation against a schema

Boris Kolpackov boris at codesynthesis.com
Mon Oct 17 09:45:48 EDT 2005


Andrey,

Andrey Yanukovich <jas.post at gmail.com> writes:

> I think schema validation is very expensive to run on every XML
> document that is processed,

You mean you did some measurements and they showed that it takes
too much time -or- you simply declared so? ;-)


> so for some time I have been trying to configure the parser in
> libxsd not to use schema validation. I did the following:
> in the file elements.txx, in the function
> xsd::cxx::xml::dom::auto_ptr<xercesc::DOMDocument> xsd::cxx::xml::dom::dom(...)
> I changed the parser features from true to false in the calls
> parser->setFeature (XMLUni::fgDOMNamespaces, true);
> parser->setFeature (XMLUni::fgDOMValidation, true);
> parser->setFeature (XMLUni::fgXercesSchemaFullChecking, true);

Yeah, that should turn it off.


> But it caused a crash and core dump while creating a class
> instance from the DOM document. It happened in the parser class's
> next_element () call, while casting DOMNode* to DOMElement*.

I think I know what's happening. When there is a schema, the parser can
figure out which whitespace is significant and which is not, and remove
the insignificant part. When you turn validation off, the parser can no
longer do this cleanup, so it inserts a DOMText node for every run of
whitespace (which can be somewhat expensive too, btw ;-)). The code in
next_element () then presumably hits one of these DOMText nodes where
it expects a DOMElement, hence the crash in the cast.

The funny thing is that I already fixed this in
libxsd/xsd/cxx/parser.hxx when I was adding support for the mixed
content model. I've attached the file.
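
As a rough illustration of the idea (this is not the contents of the
attached parser.hxx), a child iterator can skip every node that is not
an element before handing it out, so that whitespace-only text nodes
never reach the cast. The sketch below is self-contained and uses
hypothetical stand-ins (Node, NodeType, ChildCursor, element_names)
instead of the real Xerces-C++ DOM types:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the Xerces-C++ DOM node types. The real
// code would work with xercesc::DOMNode and check its node type.
enum class NodeType { element, text };

struct Node {
  NodeType type;
  std::string value;  // element name, or the text content
};

// Walks the children of one element and returns only element nodes.
class ChildCursor {
public:
  explicit ChildCursor(const std::vector<Node>& children)
      : children_(children), pos_(0) {}

  // Returns the next element child, or nullptr when none are left.
  // Text nodes (for example whitespace between tags) are skipped
  // instead of being treated as elements.
  const Node* next_element() {
    while (pos_ < children_.size()) {
      const Node& n = children_[pos_++];
      if (n.type == NodeType::element)
        return &n;
    }
    return nullptr;
  }

private:
  const std::vector<Node>& children_;
  std::size_t pos_;
};

// Collects the names of all element children, ignoring text nodes.
std::vector<std::string> element_names(const std::vector<Node>& children) {
  std::vector<std::string> names;
  ChildCursor cursor(children);
  while (const Node* e = cursor.next_element())
    names.push_back(e->value);
  return names;
}
```

With children such as text, element "a", text, element "b", text (what
a non-validating parse of an indented document typically produces),
element_names returns only "a" and "b".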


> Could you suggest some way to switch off the schema validation

In 1.4.0 you can hack libxsd as you did, or you can use one of the
parsing functions that expect a DOMDocument. This way you will have
to set up the XML->DOM stage yourself, so you can select whatever
options you like.

In 1.5.0 (due in a couple of days) there will be a dont_validate
parsing flag.

Note that passing an invalid instance document to xsd-generated code
when validation is turned off results in undefined behavior (read:
core dump).


> or maybe pre-parse the schema grammar or do something else in the libxsd code?

I think this is a much better idea than switching validation off
completely. If you are parsing several instance documents that use the same
schema then it could be beneficial to pre-load (and pre-parse) the
grammar once and then re-use this grammar every time you parse an
instance document. I would think this should improve performance
significantly.

The only way to do it now is to set up the XML->DOM stage yourself (see
above) and use loadGrammar to pre-load the schemas (see the Xerces-C++
docs for details).
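
To make the "parse once, reuse many times" pattern concrete, here is a
minimal self-contained sketch. It does not use Xerces-C++ at all:
Grammar, GrammarCache, parse_schema and parse_instance are hypothetical
names, and the expensive schema parse is only simulated by a counter.
In real code that step would be the loadGrammar call mentioned above.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a parsed (compiled) schema grammar.
struct Grammar {
  std::string location;
};

// Parses each schema at most once and hands out the cached result.
class GrammarCache {
public:
  // Returns the grammar for 'location', parsing it on first use only.
  std::shared_ptr<const Grammar> get(const std::string& location) {
    auto it = cache_.find(location);
    if (it != cache_.end())
      return it->second;  // already parsed, reuse it

    auto g = std::make_shared<const Grammar>(parse_schema(location));
    cache_.emplace(location, g);
    return g;
  }

  // Number of times a schema actually had to be parsed.
  std::size_t parse_count() const { return parse_count_; }

private:
  // Simulates the expensive step; a real parser would read and
  // compile the schema here.
  Grammar parse_schema(const std::string& location) {
    ++parse_count_;
    return Grammar{location};
  }

  std::map<std::string, std::shared_ptr<const Grammar>> cache_;
  std::size_t parse_count_ = 0;
};

// Hypothetical instance-document parse that validates against the
// cached grammar instead of re-reading the schema every time.
bool parse_instance(GrammarCache& cache,
                    const std::string& schema,
                    const std::string& document) {
  std::shared_ptr<const Grammar> g = cache.get(schema);
  return g != nullptr && !document.empty();
}
```

Calling parse_instance repeatedly with the same schema location leaves
parse_count () at 1, which is the saving one would hope to get from
pre-loading the grammar.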

hth,
-boris
-------------- next part --------------
A non-text attachment was scrubbed...
Name: parser.hxx
Type: text/x-c++hdr
Size: 1758 bytes
Desc: not available
Url : http://codesynthesis.com/pipermail/xsd-users/attachments/20051017/df55fa25/parser.hxx

