From binjiang at alcatel-lucent.com  Wed Oct 10 00:58:28 2007
From: binjiang at alcatel-lucent.com (Jiang, Bin (Bin))
Date: Tue Jul  1 03:37:21 2008
Subject: [xsde-users] What's the difference between XSD/e and XSD,
	when I using XSD with expat as the underlying parser
Message-ID: <D9F4444D4E5294438E29E618DCA1B80124A6C7@CNEXC1U02.bj.lucent.com>

Hi Boris & all,

What's the difference between XSD/e and XSD if I configure XSD to use Expat
as the underlying parser?
More specifically, are they the same in schema validation capability,
performance (memory/CPU usage), etc.?
Thank you!

Thanks,
Jiang Bin (Bin)
GLMS Developer
Alcatel-Lucent


From boris at codesynthesis.com  Wed Oct 10 02:19:54 2007
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Tue Jul  1 03:37:21 2008
Subject: [xsde-users] What's the difference between XSD/e and XSD,
	when I using XSD with expat as the underlying parser
In-Reply-To: <D9F4444D4E5294438E29E618DCA1B80124A6C7@CNEXC1U02.bj.lucent.com>
References: <D9F4444D4E5294438E29E618DCA1B80124A6C7@CNEXC1U02.bj.lucent.com>
Message-ID: <20071010061954.GA28418@karelia>

Hi,

Jiang, Bin (Bin) <binjiang@alcatel-lucent.com> writes:

> What's the difference between XSD/e and XSD, if I config XSD with expat
> as the underlying parser?
> More specifically, are they same in schema validation capability,
> performance (memory/CPU usage), etc?

The main difference is the ability of XSD/e to work without many
C++ features, such as exceptions, STL, RTTI, iostream, and templates.
As a result, XSD/e-generated code is smaller, and can be compiled
with older, legacy compilers, especially if some or all of the
above C++ features are disabled. On the other hand, XSD provides
some extra features, such as different underlying parsers and
configurable character type (char or wchar_t).

XSD/e and XSD are the same in schema validation capabilities and
should have roughly equivalent performance when configured similarly
(that is, when XSD/e is configured with exceptions, STL, etc.).

To put this in more general terms, XSD/e prioritizes portability,
small footprint, and performance, while XSD prioritizes convenience,
ease of use, and performance.

Boris


From binjiang at alcatel-lucent.com  Sun Oct 21 22:21:35 2007
From: binjiang at alcatel-lucent.com (Jiang, Bin (Bin))
Date: Tue Jul  1 03:37:21 2008
Subject: [xsde-users] How to config xsde to parse without namespace check /
	How to gain a better error message
Message-ID: <D9F4444D4E5294438E29E618DCA1B8012733B1@CNEXC1U02.bj.lucent.com>

Hi Boris & all,

Sometimes I need to parse an XML fragment without checking the namespace.
Is there any convenient way to do this?
I've used XSD 2.1.1 and 2.3.0, and now XSD/e 1.1.0. My approach is to modify
the XSD library files and the generated parser files.
Taking XSD/e 1.1.0 as an example, I modify
xsde/cxx/parser/elements.hxx  
xsde/cxx/parser/elements.hxx  
to add a flag to class parser_base that controls whether namespace
checking is done during parsing:

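      struct parser_base
      {
        // ....
        // Accessors for the namespace-checking flag.
        bool getNamespaceCheck() { return bNamespaceCheck; }
        void setNamespaceCheck(bool b) { bNamespaceCheck = b; }
      private:
        // true: match element namespaces as usual; false: ignore them.
        // Should be initialized (e.g., to true) in the constructors.
        bool bNamespaceCheck;
        // ....
      };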

Then I modify the generated parser files as follows:
Before:
if (n == "display-name" &&
    ns == "urn:oma:xml:poc:list-service")
After:
if (n == "display-name" &&
    (!getNamespaceCheck() || ns == "urn:oma:xml:poc:list-service"))

This way, I can configure whether to check namespaces when
constructing the parsers.
But is there a better way to do this?
I can't modify the schema definitions, because I still need namespace
checking when parsing a WHOLE XML document;
only for XML fragments do I want to skip namespace checking.

Another question: when there is an exception, such as an unexpected
element or attribute, how can I get the namespace and name of the
encountered element or attribute? I can only find line/column and text().

Thank you!

Thanks,
Jiang Bin (Bin)
GLMS Developer
Alcatel-Lucent


From boris at codesynthesis.com  Mon Oct 22 12:04:31 2007
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Tue Jul  1 03:37:21 2008
Subject: [xsde-users] How to config xsde to parse without namespace check
	/ How to gain a better error message
In-Reply-To: <D9F4444D4E5294438E29E618DCA1B8012733B1@CNEXC1U02.bj.lucent.com>
References: <D9F4444D4E5294438E29E618DCA1B8012733B1@CNEXC1U02.bj.lucent.com>
Message-ID: <20071022160431.GB8418@karelia>

Hi Bin,

Jiang, Bin (Bin) <binjiang@alcatel-lucent.com> writes:

> Sometimes I need parsing a xml fragment without checking the namespace,
> it there any convenient way to do this?
> I've used XSD 2.1.1 2.3.0, and now xsde 1.1.0, my way is modifiy the xsd
> library files and the generated parser files.

I think there is a more elegant way to accomplish this. The idea is
to override the _start_element (and _start_attribute if necessary)
low-level hook on the root parser and call the original version with
a proper namespace when necessary:

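// 'base' stands for the parser skeleton this root parser derives
// from; need_to_add_namespace is a flag you set yourself, e.g., when
// parsing a fragment.
virtual void
_start_element (const xml_schema::ro_string& ns,
                const xml_schema::ro_string& name)
{
  if (need_to_add_namespace)
  {
    // Substitute the namespace the schema expects.
    xml_schema::ro_string qualified_ns ("urn:oma:xml:poc:list-service");
    base::_start_element (qualified_ns, name);
  }
  else
  {
    base::_start_element (ns, name);
  }
}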

This works because all events go through the root element
parser. It can get a bit more complicated if your vocabulary
mixes qualified and unqualified elements. But normally either
all elements are qualified or only the root element is, and
both of these cases are easy to handle with this method.
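To make the control flow concrete, here is a self-contained sketch that models the idea with std::string in place of xml_schema::ro_string. The class names list_service_pskel and list_service_pimpl, and the "record what the skeleton receives" behavior, are hypothetical stand-ins rather than the generated XSD/e code. This variant only substitutes the namespace when the incoming one is empty, which is one possible way to leave already-qualified elements untouched; that check is a design choice, not part of the suggestion above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a generated parser skeleton. It only records
// the (namespace, name) pairs it receives. NOT the real XSD/e class.
struct list_service_pskel
{
  virtual ~list_service_pskel () {}

  virtual void
  _start_element (const std::string& ns, const std::string& name)
  {
    seen.push_back (ns + "#" + name);
  }

  std::vector<std::string> seen;
};

// Root parser implementation: supplies the expected namespace for
// unqualified elements when parsing a fragment.
struct list_service_pimpl: list_service_pskel
{
  typedef list_service_pskel base;

  explicit list_service_pimpl (bool fragment)
      : need_to_add_namespace (fragment) {}

  virtual void
  _start_element (const std::string& ns, const std::string& name)
  {
    // Only rewrite elements that arrived without a namespace.
    if (need_to_add_namespace && ns.empty ())
      base::_start_element ("urn:oma:xml:poc:list-service", name);
    else
      base::_start_element (ns, name);
  }

  bool need_to_add_namespace;
};
```

Because every event is dispatched through the virtual hook, the substitution happens even when the caller only holds a reference to the skeleton type.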

> Another question is when there is an exception, like unexpected element
> or attribute, how to get the namespace and name of the encountered
> element or attribute, I only find there is line/column and text().

The current error propagation architecture makes it costly to pass
this information up to the caller. Because of that we decided not
to provide it. However, we are planning to change the inner workings
of C++/Parser which will also make it fairly cheap to pass extra
error information around and we will add support for names and
namespaces then. Unfortunately, this is not planned for XSD/e 2.0.0
(due in a couple of weeks) and will be implemented in XSD/e 2.1.0
which is scheduled for the end of 2007 or the beginning of 2008. How
urgent is this feature for your project?

Boris


From boris at codesynthesis.com  Wed Oct 24 12:45:09 2007
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Tue Jul  1 03:37:21 2008
Subject: [xsde-users] How to config xsde to parse without namespace check
	/ How to gain a better error message
In-Reply-To: <D9F4444D4E5294438E29E618DCA1B801273608@CNEXC1U02.bj.lucent.com>
References: <D9F4444D4E5294438E29E618DCA1B8012733B1@CNEXC1U02.bj.lucent.com>
	<20071022160431.GB8418@karelia>
	<D9F4444D4E5294438E29E618DCA1B801273608@CNEXC1U02.bj.lucent.com>
Message-ID: <20071024164509.GC861@karelia>

Hi Bin,

Jiang, Bin (Bin) <binjiang@alcatel-lucent.com> writes:

> [Jiang, Bin (Bin)] This may not work if there is more than one
> namespace definitions in a instance document, right?

It can still work, but it gets harder. You will need to know
which element is in which namespace (e.g., have a map of
element names to namespaces).

Also, since there are several namespaces involved, all but
one of them must be bound to a prefix. You can configure
Expat to ignore namespaces, in which case it will pass names
with namespace prefixes (e.g., "nsp:name"). You can then use
the prefix to figure out the corresponding namespace.
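A minimal sketch of the prefix lookup, assuming the parser hands you raw names such as "nsp:name" (in Expat terms, a parser created with XML_ParserCreate rather than XML_ParserCreateNS). The function resolve_qname and its default_ns parameter (the namespace assumed for unprefixed names) are hypothetical. In a real document you would build the prefix map from the xmlns:* attributes, which a non-namespace-aware Expat reports as ordinary attributes; this sketch ignores prefix scoping.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: splits a raw qualified name ("nsp:name") into
// (namespace, local name) using a prefix -> namespace map. Unprefixed
// names get default_ns; unknown prefixes yield an empty namespace.
std::pair<std::string, std::string>
resolve_qname (const std::string& qname,
               const std::map<std::string, std::string>& prefixes,
               const std::string& default_ns)
{
  std::string::size_type colon = qname.find (':');

  if (colon == std::string::npos)
    return std::make_pair (default_ns, qname);

  std::string prefix (qname, 0, colon);
  std::string local (qname, colon + 1);

  std::map<std::string, std::string>::const_iterator i =
    prefixes.find (prefix);

  return std::make_pair (
    i != prefixes.end () ? i->second : std::string (), local);
}
```

The resolved pair can then be passed to the original _start_element hook in place of the namespace the parser reported.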


> [Jiang, Bin (Bin)] It's not so urgent now, since there is
> line/column information and we can always locate the error
> position. But if I can provide name or namespace, the error
> message would be more user-friendly and complete. Is there
> any work-around I can use?

I can't think of an easy way to make it work. The schema
validation code is generated by the compiler so you will
have to modify that. I guess the easiest way is to use XSD
in the meantime (which uses exceptions to propagate errors
and includes name/namespace information).


Boris


From binjiang at alcatel-lucent.com  Wed Oct 24 12:40:50 2007
From: binjiang at alcatel-lucent.com (Jiang, Bin (Bin))
Date: Tue Jul  1 03:37:21 2008
Subject: [xsde-users] How to config xsde to parse without namespace check
	/ How to gain a better error message
In-Reply-To: <20071022160431.GB8418@karelia>
References: <D9F4444D4E5294438E29E618DCA1B8012733B1@CNEXC1U02.bj.lucent.com>
	<20071022160431.GB8418@karelia>
Message-ID: <D9F4444D4E5294438E29E618DCA1B801273608@CNEXC1U02.bj.lucent.com>

Hi Boris,

Really appreciate your timely and detailed response, thank you!
Please see my further questions below.

Thanks,
Jiang Bin (Bin)
GLMS Developer
Alcatel-Lucent

> -----Original Message-----
> From: Boris Kolpackov [mailto:boris@codesynthesis.com]
> Sent: 2007-10-23 0:05
> To: Jiang, Bin (Bin)
> Cc: xsde-users@codesynthesis.com
> Subject: Re: [xsde-users] How to config xsde to parse without namespace
> check / How to gain a better error message
> 
> Hi Bin,
> 
> Jiang, Bin (Bin) <binjiang@alcatel-lucent.com> writes:
> 
> > Sometimes I need parsing a xml fragment without checking the namespace,
> > it there any convenient way to do this?
> > I've used XSD 2.1.1 2.3.0, and now xsde 1.1.0, my way is modifiy the xsd
> > library files and the generated parser files.
> 
> I think there is a more elegant way to accomplish this. The idea is
> to override the _start_element (and _start_attribute if necessary)
> low-level hook on the root parser and call the original version with
> a proper namespace when necessary:
> 
> virtual void
> _start_element (const xml_schema::ro_string& ns,
>                 const xml_schema::ro_string& name)
> {
>   if (need_to_add_namespace)
>   {
>     xml_schema::ro_string ns ("urn:oma:xml:poc:list-service");
>     base::_start_element (ns, name);
>   }
>   else
>   {
>     base::_start_element (ns, name);
>   }
> }
> 
> This will work because all events are going through the root
> element parser. This can get a bit more complicated if your
> vocabulary mixes qualified and unqualified elements. But
> normally it is either all qualified or only root element that
> is qualified and both of these cases are easy to handle with
> this method.
> 

[Jiang, Bin (Bin)] This may not work if there is more than one namespace in an instance document, right?

> > Another question is when there is an exception, like unexpected element
> > or attribute, how to get the namespace and name of the encountered
> > element or attribute, I only find there is line/column and text().
> 
> The current error propagation architecture makes it costly to pass
> this information up to the caller. Because of that we decided not
> to provide it. However, we are planning to change the inner workings
> of C++/Parser which will also make it fairly cheap to pass extra
> error information around and we will add support for names and
> namespaces then. Unfortunately, this is not planned for XSD/e 2.0.0
> (due in a couple of weeks) and will be implemented in XSD/e 2.1.0
> which is scheduled for the end of 2007 - beginning of 2008. How
> urgent is this feature for your project?

[Jiang, Bin (Bin)] It's not so urgent now, since there is line/column information and we can always locate the error position. But if I could provide the name or namespace, the error message would be more user-friendly and complete. Is there any workaround I can use? Performance is not a big issue for me at the moment, since according to our performance testing results, XSD/e 1.1.0 (with --no-iostream) is better than XSD 2.1.1, which we used before.

> 
> Boris


From binjiang at alcatel-lucent.com  Mon Oct 29 05:52:32 2007
From: binjiang at alcatel-lucent.com (Jiang, Bin (Bin))
Date: Tue Jul  1 03:37:21 2008
Subject: [xsde-users] Why xml_schema::schema exception is thrown when the
	xml document is not well formed
References: <D9F4444D4E5294438E29E618DCA1B8012733B1@CNEXC1U02.bj.lucent.com>
	<20071022160431.GB8418@karelia> 
Message-ID: <D9F4444D4E5294438E29E618DCA1B80127383F@CNEXC1U02.bj.lucent.com>

Hi, Boris and all,

I found that in some cases an xml_schema::schema exception is thrown when the XML document is not well-formed. Take the library example in the XSD/e 1.1.0 release:

<lib:catalog xmlns:lib="http://www.codesynthesis.com/library"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.codesynthesis.com/library library.xsd">

  book id="MM" available="false" >
    <isbn>0679760806</isbn>
    <title>The Master and Margarita</title>
    <genre>fiction1</genre>
...
</lib:catalog>

If I remove the "<" from the book tag, the parser reports:
schema error: line[15] column[34] unexpected characters encountered

whereas I think this should be an xml_schema::xml exception.

Thanks,
Jiang Bin (Bin)
GLMS Developer
Alcatel-Lucent



From boris at codesynthesis.com  Mon Oct 29 06:06:25 2007
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Tue Jul  1 03:37:21 2008
Subject: [xsde-users] Re: Why xml_schema::schema exception is thrown when
	the xml document is not well formed
In-Reply-To: <D9F4444D4E5294438E29E618DCA1B80127383F@CNEXC1U02.bj.lucent.com>
References: <D9F4444D4E5294438E29E618DCA1B8012733B1@CNEXC1U02.bj.lucent.com>
	<20071022160431.GB8418@karelia>
	<D9F4444D4E5294438E29E618DCA1B80127383F@CNEXC1U02.bj.lucent.com>
Message-ID: <20071029100625.GE5582@karelia>

Hi Bin,

Jiang, Bin (Bin) <binjiang@alcatel-lucent.com> writes:

> I found in some cases, xml_schema::schema will be thrown when the
> xml document is not well-formed. Take the example library in the
> xsde release 1.1.0 as an example:
>
> <lib:catalog xmlns:lib="http://www.codesynthesis.com/library"
>              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>          xsi:schemaLocation="http://www.codesynthesis.com/library library.xsd">
>
>   book id="MM" available="false" >
>     <isbn>0679760806</isbn>
>     <title>The Master and Margarita</title>
>     <genre>fiction1</genre>
> ...
> </lib:catalog>

Actually, this XML is perfectly well-formed. By removing '<' from
the book tag, you made it appear as just a text fragment. Note
that you don't have to escape '>' in element content.

> If I remove the "<" of book, the parser would say:
> schema error: line[15] column[34] unexpected characters encountered

Which is correct. The catalog type specifies that its content should
be a sequence of book elements. As a result, when the parser encounters
text, it reports it as a validation error.

I guess this example shows how far a well-formed XML can be from what
an application expects and why it is generally a good idea to validate
the documents against the vocabulary schema :-).

Boris


From binjiang at alcatel-lucent.com  Mon Oct 29 12:59:19 2007
From: binjiang at alcatel-lucent.com (Jiang, Bin (Bin))
Date: Tue Jul  1 03:37:21 2008
Subject: [xsde-users] RE: Why xml_schema::schema exception is thrown when
	the xml document is not well formed
In-Reply-To: <20071029100625.GE5582@karelia>
References: <D9F4444D4E5294438E29E618DCA1B8012733B1@CNEXC1U02.bj.lucent.com>
	<20071022160431.GB8418@karelia>
	<D9F4444D4E5294438E29E618DCA1B80127383F@CNEXC1U02.bj.lucent.com>
	<20071029100625.GE5582@karelia>
Message-ID: <D9F4444D4E5294438E29E618DCA1B801273857@CNEXC1U02.bj.lucent.com>

Hi Boris,

Thank you for the explanation!
Please see my other two questions below.

Thanks,
Jiang Bin (Bin)
GLMS Developer
Alcatel-Lucent

> -----Original Message-----
> From: Boris Kolpackov [mailto:boris@codesynthesis.com]
> Sent: 2007-10-29 18:06
> To: Jiang, Bin (Bin)
> Cc: xsde-users@codesynthesis.com
> Subject: Re: Why xml_schema::schema exception is thrown when the xml
> document is not well formed
> 
> Hi Bin,
> 
> Jiang, Bin (Bin) <binjiang@alcatel-lucent.com> writes:
> 
> > I found in some cases, xml_schema::schema will be thrown when the
> > xml document is not well-formed. Take the example library in the
> > xsde release 1.1.0 as an example:
> >
> > <lib:catalog xmlns:lib="http://www.codesynthesis.com/library"
> >              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >          xsi:schemaLocation="http://www.codesynthesis.com/library library.xsd">
> >
> >   book id="MM" available="false" >
> >     <isbn>0679760806</isbn>
> >     <title>The Master and Margarita</title>
> >     <genre>fiction1</genre>
> > ...
> > </lib:catalog>
> 
> Actually, this XML is perfectly well-formed. By removing '<' from
> the book tag, you made it to appear as just a text fragment. Note
> that you don't have to escape '>' in the element content.
> 
> > If I remove the "<" of book, the parser would say:
> > schema error: line[15] column[34] unexpected characters encountered
> 
> Which is correct. The catalog type specifies that its content should
> be a sequence of book elements. As a result, when parser encounters
> text, it reports it as a validation error.
[Jiang, Bin (Bin)] 
In this case, why is the column number 34, the end of the line, rather than the beginning of the line?

When would characters be ignored by the parser?
I added a type like the one below to the schema:
       <xsd:complexType name="identityType">
           <xsd:complexContent>
               <xsd:restriction base="xsd:anyType">
                   <xsd:choice maxOccurs="unbounded">
                       <xsd:element name="one" type="xsd:string"/>
                       <xsd:element name="two" type="xsd:string" minOccurs="0"/>
                   </xsd:choice>
               </xsd:restriction>
           </xsd:complexContent>
       </xsd:complexType>

And I found that characters inside an "identityType" element are ignored. Is this because the "identityType" type is derived from "xsd:anyType"?


> 
> I guess this example shows how far a well-formed XML can be from what
> an application expects and why it is generally a good idea to validate
> the documents against the vocabulary schema :-).
> 
> Boris


From boris at codesynthesis.com  Mon Oct 29 13:59:54 2007
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Tue Jul  1 03:37:21 2008
Subject: [xsde-users] Re: Why xml_schema::schema exception is thrown when
	the xml document is not well formed
In-Reply-To: <D9F4444D4E5294438E29E618DCA1B801273857@CNEXC1U02.bj.lucent.com>
References: <D9F4444D4E5294438E29E618DCA1B8012733B1@CNEXC1U02.bj.lucent.com>
	<20071022160431.GB8418@karelia>
	<D9F4444D4E5294438E29E618DCA1B80127383F@CNEXC1U02.bj.lucent.com>
	<20071029100625.GE5582@karelia>
	<D9F4444D4E5294438E29E618DCA1B801273857@CNEXC1U02.bj.lucent.com>
Message-ID: <20071029175954.GB9333@karelia>

Hi Bin,

Jiang, Bin (Bin) <binjiang@alcatel-lucent.com> writes:

> In this case, why the column number is not the beginning of the
> line, but is 34, the end of the line?

That's how the underlying XML parser (Expat) works. If you query
the column after the event has been triggered, you get a value
that points at the end of the content that triggered the event.


> When the characters would be ignored by the parser?

One way to achieve this would be to declare your type as having
mixed content (add the mixed="true" attribute to xsd:complexType).
The text content will then be delivered to the _any_characters()
hook, which by default does nothing.
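The difference between element-only and mixed content can be illustrated with a simplified, hypothetical model; content_model and text_collector are made-up names, not XSD/e's generated code, and only the _any_characters() name comes from the reply above. In element-only content, XML Schema allows whitespace between child elements but no other text, so "book id=..." is rejected; in mixed content, the text goes to a hook that ignores it unless overridden.

```cpp
#include <cassert>
#include <string>

// Simplified model of character-data handling inside one element.
// Hypothetical: NOT the code XSD/e generates.
struct content_model
{
  explicit content_model (bool mixed): mixed_ (mixed) {}
  virtual ~content_model () {}

  // Returns false if the characters are a validation error.
  bool
  characters (const std::string& s)
  {
    if (mixed_)
    {
      _any_characters (s); // Mixed content: hand text to the hook.
      return true;
    }

    // Element-only content: whitespace is allowed, anything else is not.
    return s.find_first_not_of (" \t\r\n") == std::string::npos;
  }

  // The default hook does nothing, so text in mixed content is ignored.
  virtual void _any_characters (const std::string&) {}

private:
  bool mixed_;
};

// Overriding the hook lets an application collect the text instead.
struct text_collector: content_model
{
  text_collector (): content_model (true) {}
  virtual void _any_characters (const std::string& s) { text += s; }
  std::string text;
};
```

With this model, the identityType behavior described below (text silently accepted in a type without mixed="true") would correspond to the non-mixed branch returning true for non-whitespace text, which is why it looks like a bug.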


>        <xsd:complexType name="identityType">
>            <xsd:complexContent>
>                <xsd:restriction base="xsd:anyType">
>                    <xsd:choice maxOccurs="unbounded">
>                        <xsd:element name="one" type="xsd:string"/>
>                        <xsd:element name="two" type="xsd:string" minOccurs="0"/>
>                    </xsd:choice>
>                </xsd:restriction>
>            </xsd:complexContent>
>        </xsd:complexType>
>
> And find characters inside "identityType" element would be ignored, is
> this because the "identityType" type is derived from "xsd:anyType"?

Hm, the characters should still be flagged as an error. I guess you just
found a bug! We will try to fix it for the next release.

Boris