[xsd-users] Re: In-memory validation

Mon Jan 21 02:39:56 EST 2008

Hi David,

After some more thinking and experimentation on the subject of in-
memory validation I would like to get your thoughts on our current
view of how it can be implemented and whether it will still be
useful for your project.

Your initial use-case (quoted below) calls for what I would call
"immediate detection" of validation errors. Based on this notion
of immediate detection, we can place every XML Schema validation
construct (e.g., maxOccurs/minOccurs, ordering of elements, facets,
uniqueness, etc.) into one of the three categories:

1. Desirable and possible/efficient to implement

2. Desirable and impossible/inefficient to implement

3. Undesirable

Into the "desirable and possible/efficient" category fall, for
example, the enforcement of xsd:ID uniqueness constraint as well
as maxOccurs and the maxLength list facet.
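
To illustrate the first category, here is a minimal sketch of a maxOccurs
check performed at the point of modification. The bounded_sequence class
and its interface are hypothetical and are not part of the generated
C++/Tree code; the point is only that the check is cheap and that every
single push_back can be judged on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical sequence wrapper (not XSD-generated code) that enforces
// maxOccurs immediately.
template <typename T>
class bounded_sequence
{
public:
  explicit bounded_sequence (std::size_t max_occurs)
    : max_ (max_occurs) {}

  void push_back (const T& x)
  {
    // O(1) check; the offending call is rejected on the spot.
    if (v_.size () == max_)
      throw std::length_error ("maxOccurs exceeded");

    v_.push_back (x);
  }

  std::size_t size () const { return v_.size (); }

private:
  std::size_t max_;
  std::vector<T> v_;
};

inline void max_occurs_usage ()
{
  bounded_sequence<int> s (2); // Assumed maxOccurs="2".
  s.push_back (1);
  s.push_back (2);

  bool rejected = false;
  try { s.push_back (3); }
  catch (const std::length_error&) { rejected = true; }

  assert (rejected && s.size () == 2);
}
```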

The "desirable and impossible/inefficient" category contains the
bulk of the XML Schema validation constructs. These include most
of the facets and key/keyref/unique constructs. Let's consider
the minInclusive and maxInclusive facets from your example below.
The range checking code will have to be called after every
modification to the underlying int value. In the current
architecture you can do, for example, the following:

int& i = rt->bounded_int(); // Get a reference to the "base" type (int)
i = 100; // Impossible to detect.

This is an example of a check that is impossible to implement in the
current C++/Tree architecture. A check that is possible but inefficient
to implement is, for example, the pattern facet on xsd:string. The
pattern checking code (most likely a virtual function with quite an
expensive body) will have to be called every time a modification is
made to any single character in the underlying string.
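
To make the reference problem concrete, here is a minimal sketch. The
bounded_int_holder class is a hypothetical stand-in for a generated type
(the real generated code looks different); it assumes the 20-80 range
from the example and checks the range in the modifier only.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a generated type holding an int that is
// restricted to [20, 80].
class bounded_int_holder
{
public:
  explicit bounded_int_holder (int v) : value_ (checked (v)) {}

  // Checked modifier: the range is enforced on every call.
  void bounded_int (int v) { value_ = checked (v); }

  // Accessor returning a reference to the underlying int. Writes made
  // through this reference never pass through checked().
  int& bounded_int () { return value_; }

  bool in_range () const { return value_ >= 20 && value_ <= 80; }

private:
  static int checked (int v)
  {
    if (v < 20 || v > 80)
      throw std::out_of_range ("bounded_int must be in [20, 80]");

    return v;
  }

  int value_;
};

inline void reference_bypass_usage ()
{
  bounded_int_holder h (50);

  bool rejected = false;
  try { h.bounded_int (100); }
  catch (const std::out_of_range&) { rejected = true; }
  assert (rejected && h.in_range ()); // The modifier catches the error.

  int& i = h.bounded_int ();
  i = 100;                            // No exception is thrown here.
  assert (!h.in_range ());            // The object is now silently invalid.
}
```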

Then there are a number of undesirable checks that, if enforced
immediately, would make the object model very awkward to use.
These are minOccurs, the length and minLength list facets, ordering
of elements, as well as compound keys in key/unique. The problem
with all these constraints is that you may need to perform several
operations (e.g., several push_back's for minOccurs and element
ordering or modification of several elements/attributes for
compound key/unique) before the resulting object model becomes
valid.
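
The minOccurs case can be sketched as follows. The route type and its
waypoint member are hypothetical names, and minOccurs="2" is an assumed
constraint; the sketch only shows that the object has to pass through
invalid intermediate states on its way to a valid one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical element type whose waypoint children have minOccurs="2".
struct route
{
  static const std::size_t min_occurs = 2;

  std::vector<std::string> waypoint;

  bool valid () const { return waypoint.size () >= min_occurs; }
};

inline void min_occurs_usage ()
{
  route r;
  assert (!r.valid ());       // A freshly created object is invalid.

  r.waypoint.push_back ("A");
  assert (!r.valid ());       // Still invalid; an immediate check would
                              // have had to reject this push_back.

  r.waypoint.push_back ("B");
  assert (r.valid ());        // Valid only after the second push_back.
}
```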

Based on these considerations, it appears that while the "immediate
detection" model may look appealing on the surface, in reality it is
either not practical or undesirable for the majority of XML Schema
constructs.

This brings us to the next option: "on-demand detection". The idea
is that each generated class will have the validate() function which
can be called to detect validation errors on a fragment of object
model:

rt->bounded_int( 50 );
rt->restricted_string( "abc" );
rt->validate (); // Exception is thrown if invalid.
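
As a rough sketch of what this could look like (the memory_test class,
the exception type, and the constraints are all assumptions made for
illustration, not the actual generated code), the modifiers accept any
value and all checks are deferred to validate():

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a generated class with on-demand validation.
class memory_test
{
public:
  // Unchecked modifiers: any value is accepted.
  void bounded_int (int v) { bounded_int_ = v; }
  void restricted_string (const std::string& s) { restricted_string_ = s; }

  // Assumed constraints: bounded_int in [20, 80] and restricted_string
  // at most 8 characters long. Throws on the first violation found.
  void validate () const
  {
    if (bounded_int_ < 20 || bounded_int_ > 80)
      throw std::runtime_error ("bounded_int out of range");

    if (restricted_string_.size () > 8)
      throw std::runtime_error ("restricted_string too long");
  }

private:
  int bounded_int_ = 50;
  std::string restricted_string_;
};

inline void on_demand_usage ()
{
  memory_test t;
  t.bounded_int (100);      // Accepted: nothing is checked here.
  t.restricted_string ("abc");

  bool invalid = false;
  try { t.validate (); }
  catch (const std::runtime_error&) { invalid = true; }
  assert (invalid);

  t.bounded_int (50);
  t.validate ();            // Does not throw: the object is valid again.
}
```

Because nothing is checked until validate() is called, the object model
is free to pass through invalid intermediate states.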

There are, however, a number of questions about practical usefulness
and implementation of this model:

1. How to point to the error location? Possible options: (1) a
   reference to the invalid node passed as xml_schema::type&
   (drawback: hard to know the actual type and thus to do
   anything about the error), (2) XPath identifying the error
   location (drawback: impossible to use to correct the error).

2. Some errors may be impossible for the application to correct.
   For example, if an error indicates that a string does not match
   a pattern, what is the application going to do?

3. If error correction by the application is hard/impossible then
   what is the use of in-memory validation other than to know
   whether the object model is valid/invalid?
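
To illustrate the first question, here is a sketch of an error object
that carries both kinds of location. Everything in it is hypothetical:
the type struct merely stands in for xml_schema::type, and the
validation_error class, the path string format, and the "clamp to 80"
correction policy are assumptions made for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical common base, standing in for xml_schema::type.
struct type { virtual ~type () {} };

struct bounded_int : type
{
  explicit bounded_int (int v) : value (v) {}
  int value;
};

// Hypothetical error carrying a node reference and a path.
class validation_error : public std::runtime_error
{
public:
  validation_error (type& node,
                    const std::string& path,
                    const std::string& what)
    : std::runtime_error (what), node_ (&node), path_ (path) {}

  type& node () const { return *node_; }
  const std::string& path () const { return path_; }

private:
  type* node_;
  std::string path_;
};

// Option (1): the handler only sees the base type and has to guess the
// dynamic type before it can do anything about the error.
inline bool try_fix (type& node)
{
  if (bounded_int* b = dynamic_cast<bounded_int*> (&node))
  {
    b->value = 80; // Assumed policy: clamp to the upper bound.
    return true;
  }

  return false;    // Unknown type: nothing can be done.
}
```

The path, on the other hand, is easy to log or show to the user but gives
the application no handle on the object that needs to be changed.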

I would therefore appreciate any feedback on these concerns as
well as on what people expect from the in-memory validation
facility in their applications.

Thanks,
Boris

david.r.moss at selex-comms.com <david.r.moss at selex-comms.com> writes:

> A while ago there was talk of in-memory validation as outlined in an
> example below.
>
> Example code:
>
> // Load will fail when content is invalid.
> auto_ptr<memory_test_t> rt( root( "in-memory-validation-test.xml" ) );
>
> // Modify with a valid value (range is 20-80 inclusive).
> rt->bounded_int( 50 );
> cout << *rt << endl;
>
> rt->bounded_int( 100 ); // Ideally, this should fail (throw exception - as
> // initial file load would)
> cout << *rt << endl;
>
>
> I believe this kind of capability was on your 'to do' list at one point;
> is this still the case and, if so, do you have an idea of time-scales?!
>
> Cheers,
> Dave.