[xsd-users] Tree Parsing: How to handle schema namespaces missing from XML data

Boris Kolpackov boris at codesynthesis.com
Mon Mar 31 06:20:48 EDT 2008


Hi Henry,

I've CC'ed xsd-users to my reply in case someone else is interested
in this.

Bruce, Henry <henry.bruce at intel.com> writes:

> I guess I could change the schema, but am reluctant to do so if there is a
> chance of a workaround. I am OK with disabling schema validation in the
> spirit of "just getting it to work".
>
> I have attached the schema as I'm not quite sure how to answer your
> namespace-qualified question (I'm guessing the answer is yes). Although
> I'm new to XML, even I can tell that schema seems to be badly coded.

I've checked your schema and it appears that only the root element in
the document should be qualified: both elementFormDefault and
attributeFormDefault in your schema are set to 'unqualified' and all
the inner element declarations are local (that is, there are no
ref=""-style declarations). In other words, a valid document should
look like this:

<p:root xmlns:p="http://www.example.com/example">

  <inner> ... </inner>

</p:root>

BTW, compared to other schemas I have seen, your schema doesn't look
too bad.

There are two ways to handle the absence of the proper namespace
qualification in the root element. The first is quite clean but
does not support validation in Xerces-C++. The second is a bit
hairy but will work with validation enabled.

The first approach involves parsing the XML document to DOM and
then calling the root element's type constructor directly instead
of using one of the parsing functions. We are essentially bypassing
the element name and namespace check for the root element which
is performed in the parsing functions.

Let's say your document root element name is 'root' and its type
is 'root_t'. Normally, you would parse it like this:

std::istringstream istr ("<xml document content>");

std::auto_ptr<root_t> r (root (istr));

With this approach you would do it like this:

xml_schema::dom::auto_ptr<xercesc::DOMDocument> doc (
  parse (istr, "", false)); // Parse to DOM without validation.

// The root element's name and namespace are not checked here.
xercesc::DOMElement* e (doc->getDocumentElement ());

std::auto_ptr<root_t> r (new root_t (*e));

In this code fragment, the parse() function that is used to parse
XML to DOM is taken from the 'multiroot' example. Note also that
you will need to initialize and terminate the Xerces-C++ runtime
as shown in the 'multiroot' example. You may also find some of the
answers in the C++/Tree Mapping FAQ useful:

http://wiki.codesynthesis.com/Tree/FAQ

The 'multiroot' example will probably also be useful to you by
itself since it shows how to handle several document types
(in your case requests) without knowing which one it is.

The second approach involves "fixing up" the document before it
is parsed in order to make it valid per the schema. To do this,
you will need to detect the beginning of the opening tag (e.g.,
"<root") and replace it with the namespace-qualified version
plus the namespace declaration (e.g.,
"<p:root xmlns:p='http://www.example.com/example'") as well
as do the same with the closing tag (that is, replace "</root>"
with "</p:root>"). This is quite easy to do if the XML document
is saved in a string or buffer and harder if you get a chunk of
the document at a time. A more advanced version of this method
which avoids copying would be to create a special implementation
of std::istream or xercesc::InputSource which will do this "fixing
up" on the fly without modifying the original buffer.
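
For the simple case where the whole document is already in a string,
the fix-up could be sketched roughly as below. Note that qualify_root()
is a hypothetical helper written for this illustration; it is not part
of XSD or Xerces-C++. It also assumes that the first "<root" in the
text is the root element's opening tag and the last "</root" is its
closing tag (so no look-alike text in comments, DOCTYPE, or CDATA).

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not part of XSD or Xerces-C++): return a copy
// of 'doc' in which the unqualified root element 'name' is given the
// prefix 'prefix' bound to the namespace 'ns'.
std::string
qualify_root (const std::string& doc,
              const std::string& name,
              const std::string& prefix,
              const std::string& ns)
{
  const std::string open ("<" + name);
  const std::string close ("</" + name);

  // Find the first "<name" that is followed by whitespace, '>' or '/',
  // so that, for example, "<rootless" is not mistaken for "<root".
  std::string::size_type b (doc.find (open));
  while (b != std::string::npos)
  {
    std::string::size_type n (b + open.size ());
    if (n < doc.size () &&
        (doc[n] == '>' || doc[n] == '/' ||
         std::isspace (static_cast<unsigned char> (doc[n]))))
      break;
    b = doc.find (open, b + 1);
  }

  if (b == std::string::npos)
    return doc; // No unqualified root element found; leave as is.

  std::string r (doc);

  // Fix the closing tag first (assumed to be the last "</name" in the
  // document) so that the opening tag position 'b' stays valid.
  std::string::size_type e (r.rfind (close));
  if (e != std::string::npos && e > b)
    r.replace (e, close.size (), "</" + prefix + ":" + name);

  // Now qualify the opening tag and add the namespace declaration.
  r.replace (b, open.size (),
             "<" + prefix + ":" + name +
             " xmlns:" + prefix + "='" + ns + "'");
  return r;
}
```

The result can then be wrapped in std::istringstream and passed to the
normal parsing function (root (istr) in the example above) with
validation enabled. This version copies the document; the custom
std::istream or xercesc::InputSource variant would avoid that.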

Boris



