[xsd-users] Using XSD to validate and process documents without namespaces specified in top level elements

Boris Kolpackov boris at codesynthesis.com
Thu Aug 21 14:56:19 EDT 2008


Hi Karl,

Karl Mutch <karlmutchlists at gmail.com> writes:
 
> I have an issue where I am trying to parse documents that are coming from a
> third party. These have none of the normal namespace attributes and I would
> like to parse input sources in such a way as they can be validated using my
> own xsd files.

There are two problems that you are trying to solve here. First is how
to associate the schema with XML documents that don't contain the schema
location attributes. The second problem is how to work around the absent
namespace attributes in the XML documents. Let's consider each of these
problems separately.

For the first problem there are two solutions. You can specify the
schema locations using the XercesSchemaExternalNoNameSpaceSchemaLocation
and XercesSchemaExternalSchemaLocation Xerces-C++ properties (or, if you
don't do your own XML-to-DOM parsing, use no_namespace_schema_location
and schema_location in xml_schema::properties, which are just shortcuts
for these Xerces-C++ properties). Alternatively, you can pre-parse and
cache the schemas using the loadGrammar function (see the 'caching'
example in examples/cxx/tree/ for more information on how to do this).

You can also use EntityResolver to intercept schema loading but you still
need to trigger this loading with one of the above methods (or via the
schema location attributes in the XML document).

In your code you seem to be trying to use all these methods together
which is not really necessary. If you are doing your own XML-to-DOM
parsing then the loadGrammar approach is probably the best choice. 

The second problem (how to work around the absent namespace attributes)
is quite a bit harder to solve. First, it is helpful to understand that
an XML document without the namespace attributes is invalid per your
schema. So the general approach to solving this is to either change
the XML document to conform to the schema or change the schema to
allow XML documents without the namespace.

One variant of the changing XML document approach is to programmatically
fix it up with the necessary namespace attributes before parsing it. If
the document is small, you can probably load it first into a memory buffer
or string and then change it there. If the document is large then it may
make sense to provide your own InputSource implementation (see 
libxsd/xsd/cxx/xml/sax/std-input-source.hxx for an example) that does this
on the fly as chunks of the document become available.
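
As a rough illustration of the on-the-fly variant, here is a minimal
sketch of the fix-up logic that such an InputSource's stream could
delegate to. It uses only the standard library. The namespace_injector
class and the find_root_start function are hypothetical names, not part
of XSD or Xerces-C++, and the fix applied (adding a default namespace
declaration to the root start tag) is just one assumed choice. Chunks are
held back only until the root element name has been seen in full; after
that they pass through untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: offset of the '<' that opens the root element, or
// npos if it has not been seen yet. Skips the XML declaration, processing
// instructions, comments, and a DOCTYPE without an internal subset.
std::size_t find_root_start(const std::string& xml)
{
  const std::size_t npos = std::string::npos;
  std::size_t pos = 0;
  while ((pos = xml.find('<', pos)) != npos)
  {
    std::string term;
    std::size_t skip;
    if (xml.compare(pos, 4, "<!--") == 0)    { term = "-->"; skip = 4; }
    else if (xml.compare(pos, 2, "<?") == 0) { term = "?>";  skip = 2; }
    else if (xml.compare(pos, 2, "<!") == 0) { term = ">";   skip = 2; }
    else return pos;

    const std::size_t end = xml.find(term, pos + skip);
    if (end == npos)
      return npos; // Construct is incomplete: need more input.
    pos = end + term.size();
  }
  return npos;
}

// Hypothetical class: feed it chunks in order and pass whatever it
// returns on to the parser.
class namespace_injector
{
public:
  explicit namespace_injector(const std::string& ns)
    : decl_(" xmlns=\"" + ns + "\""), done_(false)
  {
    // Assumption: the namespace needs no escaping inside the attribute.
    assert(ns.find('"') == std::string::npos);
  }

  std::string feed(const std::string& chunk)
  {
    if (done_)
      return chunk; // Root already fixed: pass through as is.

    pending_ += chunk;
    const std::size_t lt = find_root_start(pending_);
    if (lt == std::string::npos)
      return std::string();

    // The element name ends at the first whitespace, '/' or '>'.
    const std::size_t name_end =
      pending_.find_first_of(" \t\r\n/>", lt + 1);
    if (name_end == std::string::npos)
      return std::string(); // Name may continue in the next chunk.

    pending_.insert(name_end, decl_);
    done_ = true;
    std::string out;
    out.swap(pending_);
    return out;
  }

  // Call at end of input to flush a document that had no root element.
  std::string finish()
  {
    std::string out;
    out.swap(pending_);
    return out;
  }

private:
  std::string decl_;
  std::string pending_;
  bool done_;
};
```

A custom stream's read function would call feed() on each chunk it pulls
from the underlying source, hand the returned text to the parser, and
call finish() once the source is exhausted.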

The best way to fix the document depends on the kind of vocabulary you
have. If all your elements are qualified (e.g., you have
elementFormDefault="qualified" in your schema) then you can simply add
the default namespace declaration to the start tag of the root element:

<?xml version="1.0" encoding="utf-8"?>
<request xmlns="http://www.enterprise.com/GetNextLaneResponse" task="Monitor">
  <message status="12">A Message</message>
</request>

This automatically puts the root element and all the elements under it in
the specified namespace.
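
For the small-document case, where the whole document is already in a
string, the fix-up above could look something like the following. This is
only a sketch using the standard library; add_default_namespace and
find_root_start are hypothetical helper names, and the code assumes the
namespace contains no characters that need escaping in an attribute value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: offset of the '<' that opens the root element, or
// npos if there is none. Skips the XML declaration, processing
// instructions, comments, and a DOCTYPE without an internal subset.
std::size_t find_root_start(const std::string& xml)
{
  const std::size_t npos = std::string::npos;
  std::size_t pos = 0;
  while ((pos = xml.find('<', pos)) != npos)
  {
    std::string term;
    std::size_t skip;
    if (xml.compare(pos, 4, "<!--") == 0)    { term = "-->"; skip = 4; }
    else if (xml.compare(pos, 2, "<?") == 0) { term = "?>";  skip = 2; }
    else if (xml.compare(pos, 2, "<!") == 0) { term = ">";   skip = 2; }
    else return pos;

    const std::size_t end = xml.find(term, pos + skip);
    if (end == npos)
      return npos;
    pos = end + term.size();
  }
  return npos;
}

// Hypothetical helper: returns a copy of xml with xmlns="ns" added to the
// start tag of the root element. The input is returned unchanged if no
// root start tag is found.
std::string add_default_namespace(std::string xml, const std::string& ns)
{
  // Assumption: the namespace needs no escaping inside the attribute.
  assert(ns.find('"') == std::string::npos);

  const std::size_t lt = find_root_start(xml);
  if (lt == std::string::npos)
    return xml;

  // The element name ends at the first whitespace, '/' or '>'.
  const std::size_t name_end = xml.find_first_of(" \t\r\n/>", lt + 1);
  if (name_end == std::string::npos)
    return xml;

  xml.insert(name_end, " xmlns=\"" + ns + "\"");
  return xml;
}
```

The result can then be wrapped in a std::istringstream (or a memory buffer
input source) and passed to the parsing function as usual.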

If only the root element is qualified (like in your case) then things are
a bit more complex since you will need to rewrite the opening and closing
tags of the root element, for example:

<?xml version="1.0" encoding="utf-8"?>
<p:request xmlns:p="http://www.enterprise.com/GetNextLaneResponse" task="Monitor">
  <message status="12">A Message</message>
</p:request>

If you can make some assumptions about the content of your XML documents
(e.g., that they cannot contain the "<request" and "</request" strings
anywhere other than in the root element tags) then this can be as simple
as searching for a substring in a string.
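
Under exactly that assumption, the substring approach might be sketched
as follows. The qualify_root function and its parameters are hypothetical
names used for illustration only; the code relies on "<request" and
"</request" occurring nowhere except in the root element tags.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: rewrites <root ...> ... </root> into
// <prefix:root xmlns:prefix="ns" ...> ... </prefix:root> using plain
// substring searches. Returns the input unchanged if the root start tag
// is not found.
std::string qualify_root(std::string xml,
                         const std::string& root,
                         const std::string& prefix,
                         const std::string& ns)
{
  assert(!root.empty() && !prefix.empty());

  const std::string open = "<" + root;
  const std::string close = "</" + root;

  const std::size_t o = xml.find(open);
  if (o == std::string::npos)
    return xml;

  // Patch the closing tag first: it comes later in the string, so the
  // offset of the opening tag stays valid. An empty root (<request/>)
  // has no closing tag, in which case this step is skipped.
  const std::size_t c = xml.rfind(close);
  if (c != std::string::npos && c > o)
    xml.insert(c + 2, prefix + ":");

  // Add the declaration right after the element name, then the prefix.
  xml.insert(o + open.size(), " xmlns:" + prefix + "=\"" + ns + "\"");
  xml.insert(o + 1, prefix + ":");
  return xml;
}
```

Calling qualify_root(doc, "request", "p",
"http://www.enterprise.com/GetNextLaneResponse") on the unqualified
document is meant to produce the form shown above.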

One way to implement this in the general case would be to run the original
XML document through a SAX parser that adds the necessary namespace
declarations to the root element and outputs the rest as is.

There are also simpler ways to work around the lack of namespace declarations
if you don't need XML Schema validation. Let me know if you are interested
in those.

Boris



