[xsd-users] Ignoring unknown elements

Fri Apr 6 00:39:16 EDT 2018

Henry Dornemann <henry at commvault.com> writes:

> Could you elaborate further on the steps you are proposing here? At a
> glance they seem rather involved [...]
>
> Do you consider what I am trying to do here an invalid use case for xsd?

Schema evolution is a hard problem that cannot be solved generally at
the XSD level. But provided certain conditions are met and with some
effort on your side, it can be handled.

The two common strategies are:

1. Modify the schema with element wildcards that will serve as
   "extension points" for future versions of the schema. Essentially,
   this boils down to adding:

   <xsd:any namespace="##targetNamespace" processContents="lax"
            minOccurs="0" maxOccurs="unbounded" />

   at the end of every complex content type. While you could do this
   yourself (perhaps with the help of a script or some such), ideally
   you would want to ask the authors of the schema to maintain this.

   One significant advantage of this approach is that you can keep
   XML Schema validation enabled.

2. The second approach involves parsing XML to DOM, filtering out
   unknown elements, and then passing the "fixed up" DOM to the
   XSD-generated object model.

   This approach relies on the ability to distinguish new elements
   from old. For example, if the authors of the schema place new
   elements for each version in their own namespace, then you can
   easily and generically remove all such elements from DOM. Again,
   ideally, you would want the authors of the schema to maintain
   this.
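
To make the second approach a bit more concrete, here is a minimal
sketch of the filtering step. It is written in Python with the standard
xml.dom.minidom module purely for brevity; in a real XSD-based
application the same walk would be done on a Xerces-C++ DOM before
handing the document to the generated parsing function. The namespace
URIs, the element names, and the helper names strip_unknown() and
filtered_names() are all made up for illustration, and the sketch
assumes that every new element lives in a namespace other than the
known one.

```python
from xml.dom import minidom

# Hypothetical namespace of the schema version our generated code knows.
# Elements in any other namespace are treated as later additions.
KNOWN_NS = {"http://example.com/order/v1"}


def strip_unknown(node):
    """Recursively remove child elements whose namespace is not known."""
    # Iterate over a copy because we modify childNodes while walking.
    for child in list(node.childNodes):
        if child.nodeType != child.ELEMENT_NODE:
            continue  # keep text, comments, etc. as they are
        if child.namespaceURI not in KNOWN_NS:
            node.removeChild(child)
        else:
            strip_unknown(child)


# A made-up "version 2" document: o2:* elements are the new additions.
DOC = """<o:order xmlns:o="http://example.com/order/v1"
         xmlns:o2="http://example.com/order/v2">
  <o:id>42</o:id>
  <o2:priority>high</o2:priority>
  <o:item><o:name>widget</o:name><o2:color>red</o2:color></o:item>
</o:order>"""


def filtered_names(xml_text):
    """Parse, filter, and return the local names of surviving elements."""
    dom = minidom.parseString(xml_text)
    strip_unknown(dom.documentElement)
    return [e.localName for e in dom.getElementsByTagName("*")]


print(filtered_names(DOC))
```

After filtering, only the order, id, item, and name elements remain, so
the "fixed up" DOM looks like a version 1 document.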

> I don't quite understand why not breaking from the loop would break
> inheritance. I can see each object calling it's parent parse method
> that has its own for loop parsing the elements. Perhaps the derived
> parse picks up where the other method left off so it would miss out.

Yes, that's exactly what would happen.
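
Here is a toy model of the effect, in Python for brevity. It is not
the generated code, just a sketch under the assumption that the base
and derived parsers share one cursor over the child elements and that
the base runs first. The Cursor class, the function names, and the
element names (id, name, salary) are all hypothetical.

```python
class Cursor:
    """Shared position over the child elements of one XML element."""

    def __init__(self, names):
        self.names = list(names)
        self.pos = 0

    def more(self):
        return self.pos < len(self.names)

    def peek(self):
        return self.names[self.pos]

    def advance(self):
        self.pos += 1


def parse_level(cursor, known, break_on_unknown):
    """Consume elements from the cursor; return the recognized ones."""
    seen = []
    while cursor.more():
        name = cursor.peek()
        if name in known:
            seen.append(name)
        elif break_on_unknown:
            break  # leave this element for the derived type
        cursor.advance()  # element was recognized, or skipped as unknown
    return seen


def parse_derived(names, break_on_unknown):
    """Base type knows id and name; derived type adds salary."""
    cursor = Cursor(names)
    base = parse_level(cursor, {"id", "name"}, break_on_unknown)
    derived = parse_level(cursor, {"salary"}, break_on_unknown)
    return base, derived


children = ["id", "name", "salary"]
print(parse_derived(children, break_on_unknown=True))
print(parse_derived(children, break_on_unknown=False))
```

With the break, the base stops at salary and the derived level picks it
up. Without the break, the base skips salary as "unknown" and the
derived level finds nothing left to parse.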

> Couldn't the parse method be broken up to the for loop and a
> parseElement which would handle each element and could call up the
> class hierarchy looking for an object that recognizes each element?

Yes, this could be an alternative, much slower, implementation. It
will penalize the 99% of cases (where people are parsing valid XML)
to handle the 1%. And it still won't be a general solution to the
schema evolution problem since the addition of elements is only one
issue. What if an existing element's type was changed? For example,
it was an integer and now it is a string: all the old XML documents
are still valid per the new schema, but documents written against
the new schema are not necessarily valid per the old one.
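
The asymmetry can be illustrated with a small sketch (Python, for
brevity). The two parse functions are hypothetical stand-ins for the
old and new versions of the element's type; Python's int() is only an
approximation of xsd:integer, but it is close enough to show the point.

```python
def parse_v1(value):
    """Old schema version (hypothetical): the element is an integer."""
    return int(value)


def parse_v2(value):
    """New schema version (hypothetical): the element is a string."""
    return str(value)


def accepts(parser, value):
    """Return True if the given parser accepts the element's text."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


old_doc_value = "42"          # written against the old schema
new_doc_value = "forty-two"   # only possible under the new schema

for value in (old_doc_value, new_doc_value):
    print(value, accepts(parse_v1, value), accepts(parse_v2, value))
```

Old content is accepted by both versions, while new content is
rejected by the old one, and no amount of skipping unknown elements
helps here.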

Boris