[xsd-users] Large XSD-schema, speed and identity constraint validation

Boris Kolpackov boris at codesynthesis.com
Thu May 14 08:43:28 EDT 2020


Stefan de Konink <stefan at konink.de> writes:

> >Also, keep in mind that CodeSynthesis XSD delegates XML Schema validation,
> >including identity constraint validation, to Xerces-C++.
> 
> Does this practically mean that if I only cared about XSD validation,
> there would be no net benefit to using the XSD toolset, because the
> resulting code is not used to generate a specific parser that is employed
> while doing an XSD validation? I am thinking in the direction of the XML
> Screamer research.

Correct. Validation in generated code (also called a "perfect parser") works
well for smaller/simpler schemas (which is why we went this way for XSD/e,
our mobile/embedded version). But for the schemas we are talking about
(e.g., GML), the size of the generated code becomes impractical in many
cases.


> >>real	17m6.611s
> >>user	17m1.399s
> >>sys	0m3.917s
> >>
> >>real	5m21.199s
> >>user	5m19.587s
> >>sys	0m1.450s
> >
> >I am confused, what are these two results for? Hot vs cold?
> 
> Same machine, same data, multiple runs, same code, showing the min and max.
> From my benchmarking background I would consider them both cold. I cannot
> explain (other than hardware reasons, tested it on a laptop Ryzen 2500U) why
> the results give huge outliers for both libxml2 and xerces-c. I cannot
> exclude the initial loading (i/o) of the XSD-schema either.

Do you perhaps have remote (e.g., http://) schema references in (some
of) your schemaLocation attributes? That would explain these results
quite well.
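
One way to check for this is to scan the schemas and instance documents for
remote locations. The following is only a rough sketch in Python using the
standard library, not something taken from XSD or Xerces-C++. The function
name remote_schema_refs, the SAMPLE document and the example.org URL are
made up for illustration, and it assumes that "remote" means an http, https
or ftp URL.

```python
import re
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Assumption: a location counts as remote if it starts with one of these schemes.
REMOTE = re.compile(r"^(https?|ftp)://", re.IGNORECASE)

# Schema elements that carry a single URI in an unqualified schemaLocation attribute.
SCHEMA_REF_TAGS = {f"{{{XS}}}{name}" for name in ("import", "include", "redefine")}


def remote_schema_refs(xml_text):
    """Return the remote schema locations referenced by a schema or instance document."""
    found = []
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag in SCHEMA_REF_TAGS:
            loc = elem.get("schemaLocation")
            if loc and REMOTE.match(loc):
                found.append(loc)
        # xsi:schemaLocation holds namespace/location pairs, so the
        # locations are every second token starting at index 1.
        pairs = elem.get(f"{{{XSI}}}schemaLocation", "").split()
        found.extend(loc for loc in pairs[1::2] if REMOTE.match(loc))
        # xsi:noNamespaceSchemaLocation holds a single location.
        no_ns = elem.get(f"{{{XSI}}}noNamespaceSchemaLocation")
        if no_ns and REMOTE.match(no_ns):
            found.append(no_ns)
    return found


# Hypothetical schema with one remote import and one local include.
SAMPLE = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:import namespace="urn:example:gml"'
    ' schemaLocation="http://example.org/remote/gml.xsd"/>'
    '<xs:include schemaLocation="local.xsd"/>'
    '</xs:schema>'
)

if __name__ == "__main__":
    for location in remote_schema_refs(SAMPLE):
        print(location)
```

Running this over every schema file that gets loaded (and over the instance
document itself) should show whether any of them is fetched over the
network on each run.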


> So I am missing the "Key/Value" report but get an ocean of duplicates where
> I can't find out the reason.

I haven't looked into this in detail, but maybe you can resolve the schema
names referenced in the error message back to schema locations based on
the loaded schema grammar.
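
As a rough illustration of the idea, and not of the Xerces-C++ grammar API
(which I have not checked for this), the sketch below builds the same kind
of lookup outside the parser by reading the targetNamespace of each schema
file directly. It assumes the names in the error message are
namespace-qualified. The function name namespace_index and the sample file
names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XS_SCHEMA = "{http://www.w3.org/2001/XMLSchema}schema"


def namespace_index(schemas):
    """Map each target namespace to the schema locations that declare it.

    `schemas` maps a location (file path or URL, as a string) to the text
    of the schema document found there. Schemas without a targetNamespace
    are listed under the empty string.
    """
    index = defaultdict(list)
    for location, text in schemas.items():
        root = ET.fromstring(text)
        if root.tag != XS_SCHEMA:
            continue  # not a schema document, skip it
        index[root.get("targetNamespace", "")].append(location)
    return {ns: sorted(locations) for ns, locations in index.items()}


# Hypothetical input: two files contributing to the same namespace.
SCHEMAS = {
    "base.xsd": '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"'
                ' targetNamespace="urn:example:a"/>',
    "extra.xsd": '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"'
                 ' targetNamespace="urn:example:a"/>',
    "plain.xsd": '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"/>',
}

if __name__ == "__main__":
    for namespace, locations in namespace_index(SCHEMAS).items():
        print(repr(namespace), locations)
```

Given a namespace from the error message, the index lists the files to
look at for the offending identity constraint. Note that a schema without
a targetNamespace that is pulled in via xs:include takes on the namespace
of the including schema, which this sketch does not track.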


