[xsd-users] Schema Validation

Fri Dec 14 02:41:13 EST 2007

Hi Shiva,

Balasubramanyam, Shivakumar <sbalasub at qualcomm.com> writes:

> I used the alternative you mentioned in the email and available in test
> program (examples/xsd/tree/caching), which works great so far.
>
> The XSD's have the namespaces and also I have to specify only the
> physical location to the schema file.

If by "works" you mean that it allows you to load more than one
schema with the same target namespace, then I am afraid you are
mistaken. The loadGrammar function uses target namespace as a
key to determine whether a schema for this namespace is already
available (see below). So if you try to load more than one schema
with the same target namespace, all schemas except the first will
be ignored.
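
To illustrate the keying rule described above, here is a toy model in
plain C++. It is not the Xerces-C++ grammar pool or its API; the
schema_repository class and its functions are hypothetical names made
up for this sketch. It only shows that once a schema is registered for
a target namespace, a later schema with the same namespace is dropped.

```cpp
#include <map>
#include <string>

// Toy model of a schema repository keyed by target namespace.
// Hypothetical illustration only; not the Xerces-C++ implementation.
//
class schema_repository
{
public:
  // Return true if the schema was stored and false if a schema for
  // this target namespace was already present (the new one is ignored).
  //
  bool
  load (const std::string& target_namespace, const std::string& location)
  {
    // std::map::emplace() does not overwrite an existing key.
    //
    return schemas_.emplace (target_namespace, location).second;
  }

  // Return the location that was kept for the namespace or an empty
  // string if nothing was loaded for it.
  //
  std::string
  location (const std::string& target_namespace) const
  {
    auto i (schemas_.find (target_namespace));
    return i != schemas_.end () ? i->second : std::string ();
  }

private:
  std::map<std::string, std::string> schemas_;
};
```

With this model, loading schemafile1.xsd and then schemafile2.xsd for
the same namespace leaves only schemafile1.xsd in the repository.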

> However, I am curious why Code Synthesis is not supporting this scheme.
>
> For instance, in Code Synthesis, if I use schemaLocation in schemas,
> then I could not find a way to validate without specifying the mapping
> of namespaces to physical schema files.

Most XML Schema processors (in fact I am not aware of any processor
that uses a more elaborate approach) use target namespaces as keys
in their schema repositories to identify and ignore subsequent
references to the same schemas. This mechanism is used both for
referencing schemas from XML documents and for importing one
schema into another. Note also that this is what the spec
prescribes. For the schemaLocation attributes, see Section
4.3.2, "How schema definitions are located on the Web", sub-section
"Schema Representation Constraint: Schema Document Location Strategy":

http://www.w3.org/TR/xmlschema-1/#schema-loc

As a result, and this is a well known "best practice", you need to
have an "entry point" schema for a particular target namespace which
should bring in all type and element declarations for this namespace.
You would then use this root schema in XML documents, xsd:import's,
etc.

It is often tempting (as is probably the case in your situation) to
"optimize" by referencing only the necessary declarations from a
particular XML document. You can normally achieve this by creating
smaller "entry point" schemas which include all the necessary
declarations for various document classes, but remember that there
should always be one schema per namespace.

If you do not want to (or cannot) create any extra schema files,
then you can create an "entry point" schema in memory dynamically
and then use loadGrammar to load all the necessary declarations
(if it is likely that your application will need all declarations
during its execution, then it makes sense to load all declarations
from the beginning).

Using the example from the previous emails, let's say we have two
schema files schemafile1.xsd and schemafile2.xsd both with target
namespace http://www.example.com/namespace. We can manually create
an entry point schema and then use loadGrammar or schema location
properties to load it:

<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.example.com/namespace">

  <include schemaLocation="schemafile1.xsd"/>
  <include schemaLocation="schemafile2.xsd"/>

</schema>

Alternatively, we can create the entry point schema on the fly
(C++ namespaces are assumed to be as in the caching example):

    std::string entry_schema (
"<schema xmlns='http://www.w3.org/2001/XMLSchema'\n"
"        targetNamespace='http://www.example.com/namespace'>\n"
"  <include schemaLocation='schemafile1.xsd'/>\n"
"  <include schemaLocation='schemafile2.xsd'/>\n"
"</schema>");

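    // Wrap the in-memory schema text in a Xerces-C++ input source. The
    // second argument is the system id, which serves as the base for
    // resolving the relative schemaLocation values in the includes.
    //
    std::istringstream istr (entry_schema);
    xml::sax::std_input_source isrc (istr, "entry-schema.xsd");
    Wrapper4InputSource wrap (&isrc, false); // Do not adopt isrc.

    // Load the grammar and cache it (third argument) for later use.
    //
    parser->loadGrammar (wrap, Grammar::SchemaGrammarType, true);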
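
If the list of schema files is only known at runtime, the entry point
schema text can be assembled from it. Below is a minimal sketch that
uses only the standard library; make_entry_schema is a hypothetical
helper name, not part of XSD or Xerces-C++, and it assumes that the
namespace and file names contain no characters that need XML escaping
(such as ', < or &).

```cpp
#include <string>
#include <vector>

// Hypothetical helper: build an entry point schema for target_namespace
// that includes every schema file listed in locations. The arguments
// are inserted as-is, without any XML escaping.
//
std::string
make_entry_schema (const std::string& target_namespace,
                   const std::vector<std::string>& locations)
{
  std::string r ("<schema xmlns='http://www.w3.org/2001/XMLSchema'\n"
                 "        targetNamespace='" + target_namespace + "'>\n");

  // One include element per schema file.
  //
  for (const std::string& l: locations)
    r += "  <include schemaLocation='" + l + "'/>\n";

  r += "</schema>";
  return r;
}
```

The returned string can then be used in place of the hand-written
entry_schema string above, for example by calling it with
"http://www.example.com/namespace" and the two schema file names.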

Boris