[xsd-users] Large XSD-schema, speed and identity constraint validation

Thu May 14 08:43:28 EDT 2020

Stefan de Konink <stefan at konink.de> writes:

> >Also, keep in mind that CodeSynthesis XSD delegates XML Schema validation,
> >including identity constraint validation, to Xerces-C++.
> 
> Does this practically mean that if I would only care about XSD-validation,
> there would not be any net benefit to use the XSD toolset, because the
> resulting code is not used to generate a specific parser that is employed
> while doing a XSD validation? I am thinking in the direction of XML Screamer
> research.

Correct. Validation in generated code (also called a "perfect parser")
works well for smaller/simpler schemas, which is why we went this way
for XSD/e, our mobile/embedded version. But for schemas of the size we
are talking about (e.g., GML), the generated code becomes impractically
large in many cases.

> >>real	17m6.611s
> >>user	17m1.399s
> >>sys	0m3.917s
> >>
> >>real	5m21.199s
> >>user	5m19.587s
> >>sys	0m1.450s
> >
> >I am confused, what are these two results for? Hot vs cold?
> 
> Same machine, same data, multiple runs, same code, showing the min and max.
> From my benchmarking background I would consider them both cold. I cannot
> explain (other than hardware reasons, tested it on a laptop Ryzen 2500U) why
> the results give huge outliers for both libxml2 and xerces-c. I cannot
> exclude the initial loading (i/o) of the XSD-schema either.

Do you perhaps have remote (e.g., http://) schema references in (some
of) your schemaLocation attributes? That would explain these results
quite well, since the run time would then depend on fetching those
schemas over the network.
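
One quick way to check for this is to scan the schemas and instance
documents for location hints that point at a remote URL. Below is a
minimal sketch in Python, using only the standard library. The function
names and the list of URL prefixes are my own assumptions for
illustration and are not part of XSD or Xerces-C++. It also only looks
at the text you pass in and does not follow imports recursively.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Assumption: anything starting with one of these is treated as "remote".
REMOTE_PREFIXES = ("http://", "https://", "ftp://")

# Schema elements that carry a plain schemaLocation attribute.
SCHEMA_REFS = {f"{{{XS}}}{name}" for name in ("import", "include", "redefine")}


def schema_locations(xml_text):
    """Yield every schema location hint in an XSD or instance document."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in SCHEMA_REFS:
            loc = elem.get("schemaLocation")
            if loc:
                yield loc
        # xsi:schemaLocation is a whitespace-separated list of
        # (namespace, location) pairs; take every second token.
        pairs = elem.get(f"{{{XSI}}}schemaLocation", "").split()
        yield from pairs[1::2]
        no_ns = elem.get(f"{{{XSI}}}noNamespaceSchemaLocation")
        if no_ns:
            yield no_ns


def remote_schema_locations(xml_text):
    """Return the location hints that look like remote URLs."""
    return [
        loc
        for loc in schema_locations(xml_text)
        if loc.lower().startswith(REMOTE_PREFIXES)
    ]
```

Calling remote_schema_locations() on the text of each .xsd file (and on
the instance document) and printing any non-empty result should show
whether anything is being fetched over the network during a run.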

> So I am missing the "Key/Value" report but get an ocean of duplicates where
> I can't find out the reason.

I haven't looked into this in detail, but maybe you can resolve the
schema names referenced in the error message back to schema locations
based on the loaded schema grammar.