From scott.houchin at aero.org  Mon Apr  2 17:06:32 2018
From: scott.houchin at aero.org (Scott Houchin)
Date: Mon Apr  2 17:10:51 2018
Subject: [xsd-users] Default parsers for incomplete polymorphic schemas
Message-ID: <1BDE6FC4-503F-4BAF-860E-D51D8EC99FED@aero.org>

Hi,

I am starting to look at xsd cxx-parser as part of development of a forward compatible XML interface standard. What I want to do is define a parser that will handle all extended types as if they were the base type if the xsi:type is unknown.

As a simple example, take the example code in examples/cxx/parser/polymorphism and remove everything about batman, but still have it parse the supermen.xml file, which includes an element with xsi:type=batman.

The XML document itself might still validate, but it is based on a newer version of the schema than was used to generate the C++ code, and I don't want the C++ code to outright reject the element when partial parsing would provide some information.

I am perfectly OK if the resulting behavior ignores any contained element or attribute unknown to the generic definition of the superman type, but if there was a way to allow DOM or SAX based access to the raw element, that would be preferred.

I did look at creating a custom type map, but in this case I can't create a reliable regex to match "batman" if I also consider possible other superman subclasses like "wonderwoman" and "wolverine".
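
The fallback asked for above can be sketched as a lookup keyed on the xsi:type value that returns the parser for the element's declared type whenever the value is unknown. This is only a minimal illustration under stated assumptions: parser, parser_map, and select_parser are hypothetical names, not part of the XSD C++/Parser API, and the real generated skeletons are not modelled here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a generated parser skeleton (not the XSD API).
struct parser
{
  explicit parser (std::string n): name (std::move (n)) {}
  std::string name;
};

// Parsers for the xsi:type values known when the code was generated.
using parser_map = std::map<std::string, parser*>;

// Pick the parser for an element: the entry registered for its xsi:type if
// there is one, otherwise the parser for the element's declared type.
parser&
select_parser (const parser_map& known,
               const std::string& xsi_type,
               parser& declared)
{
  auto i (known.find (xsi_type));

  if (i == known.end ())
    return declared; // No xsi:type, or one from a newer schema version.

  assert (i->second != nullptr);
  return *i->second;
}
```

With only "superman" registered, a call such as select_parser (known, "batman", declared) returns the declared parser, so content specific to the unknown type would still have to be ignored or captured separately.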


Regards,
Scott Houchin


From henry at commvault.com  Mon Apr  2 11:54:47 2018
From: henry at commvault.com (Henry Dornemann)
Date: Wed Apr  4 06:39:47 2018
Subject: [xsd-users] Ignoring unknown elements
Message-ID: <aa3492ca3778486c9970211055c273c4@POST-1.gp.cv.commvault.com>

Is there some way to ignore unknown elements while parsing an XML document?

The situation that I have is the xsd schema file is provided by a 3rd party and I utilize xsd to generate code to parse this.  The 3rd party will extend their schema with new versions and this breaks our ability to parse the documents even though I only care about the already known elements and not the new elements that are added in the new version.

What's worse here is that this condition doesn't even generate a parsing error; we just get a partially populated object and don't even realize that there was something wrong with the input document.

I found a previous question on this same topic
https://www.codesynthesis.com/pipermail/xsd-users/2014-October/004453.html
but I don't see any resolution from it; was any command line option added to avoid breaking out of the element parsing loop?


Thanks
-Henry


***************************Legal Disclaimer***************************
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."
**********************************************************************
From boris at codesynthesis.com  Wed Apr  4 06:45:15 2018
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Wed Apr  4 06:49:37 2018
Subject: [xsd-users] Ignoring unknown elements
In-Reply-To: <aa3492ca3778486c9970211055c273c4@POST-1.gp.cv.commvault.com>
References: <aa3492ca3778486c9970211055c273c4@POST-1.gp.cv.commvault.com>
Message-ID: <boris.20180404123930@codesynthesis.com>

Henry Dornemann <henry@commvault.com> writes:

> Is there some way to ignore unknown elements while parsing an XML document?
>
> The situation that I have is the xsd schema file is provided by a 3rd party
> and I utilize xsd to generate code to parse this. The 3rd party will extend
> their schema with new versions and this breaks our ability to parse the
> documents even though I only care about the already known elements and not
> the new elements that are added in the new version.
> 
> What's worse here is that this condition doesn't even generate a parsing
> error, we just get a partially populated object and don't even realize
> that there was something wrong with the input document

You would have gotten a validation error (against the schema that you
used to generate the code) if you hadn't disabled validation. In other
words, if you disable validation and feed invalid XML, pretty much all
bets are off.


> I found a previous question on this same topic
> https://www.codesynthesis.com/pipermail/xsd-users/2014-October/004453.html
> but I don't see any resolution from it; was any command line option
> added to avoid breaking out of the element parsing loop?

Unfortunately, there is no auto-magic resolution other than what's
suggested in that email (either use wildcards or pre-filter the DOM).
Specifically, not breaking out of the loop will break support for
inheritance.

Boris

From boris at codesynthesis.com  Wed Apr  4 07:01:36 2018
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Wed Apr  4 07:06:59 2018
Subject: [xsd-users] Default parsers for incomplete polymorphic schemas
In-Reply-To: <1BDE6FC4-503F-4BAF-860E-D51D8EC99FED@aero.org>
References: <1BDE6FC4-503F-4BAF-860E-D51D8EC99FED@aero.org>
Message-ID: <boris.20180404125804@codesynthesis.com>

Scott Houchin <scott.houchin@aero.org> writes:

> I am starting to look at xsd cxx-parser as part of development of a forward
> compatible XML interface standard. What I want to do is define a parser that
> will handle all extended types as if they were the base type if the xsi:type
> is unknown.
>
> [...]
> 
> I did look at creating a custom type map [...]

We did something like this in XSD/e -- see the examples in
examples/cxx/hybrid/evolution/.

We could probably do the same in XSD for C++/Parser, though this mapping
is no longer actively developed (we will still fix bugs, etc). 

Boris

From henry at commvault.com  Wed Apr  4 08:11:54 2018
From: henry at commvault.com (Henry Dornemann)
Date: Fri Apr  6 00:25:52 2018
Subject: [xsd-users] Ignoring unknown elements
In-Reply-To: <boris.20180404123930@codesynthesis.com>
References: <aa3492ca3778486c9970211055c273c4@POST-1.gp.cv.commvault.com>
	<boris.20180404123930@codesynthesis.com>
Message-ID: <b7225724ba79440084989a2b0badb134@POST-1.gp.cv.commvault.com>

Boris,

Could you elaborate further on the steps you are proposing here?  At a glance they seem rather involved, and it looks as though I'd essentially be parsing the XML myself, which would defeat the purpose of using xsd to generate the parsing code.

I don't quite understand why not breaking from the loop would break inheritance.  I can see each object calling its parent's parse method, which has its own for loop parsing the elements.  Perhaps the derived parse picks up where the parent's method left off, so it would miss out.  Couldn't the parse method be split into the for loop and a parseElement function that handles each element and can call up the class hierarchy looking for an object that recognizes it?

Do you consider what I am trying to do here an invalid use case for xsd?  

Thanks
-Henry



From boris at codesynthesis.com  Fri Apr  6 00:39:16 2018
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Fri Apr  6 00:43:40 2018
Subject: [xsd-users] Ignoring unknown elements
In-Reply-To: <b7225724ba79440084989a2b0badb134@POST-1.gp.cv.commvault.com>
References: <aa3492ca3778486c9970211055c273c4@POST-1.gp.cv.commvault.com>
	<boris.20180404123930@codesynthesis.com>
	<b7225724ba79440084989a2b0badb134@POST-1.gp.cv.commvault.com>
Message-ID: <boris.20180406062222@codesynthesis.com>

Henry Dornemann <henry@commvault.com> writes:
 
> Could you elaborate further on the steps you are proposing here? At a
> glance they seem rather involved [...]
>
> Do you consider what I am trying to do here an invalid use case for xsd?

Schema evolution is a hard problem that cannot be solved generally at
the XSD level. But provided certain conditions are met and with some
effort from your side, it could be handled.

The two common strategies are:

1. Modify the schema with element wildcards that will serve as
   "extension points" for future versions of the schema. Essentially,
   this boils down to adding:

   <xsd:any namespace="##targetNamespace" processContents="lax" maxOccurs="unbounded" />

   at the end of every complex content type. While you could do this
   yourself (perhaps with the help of a script or some such), ideally
   you would want to ask the authors of the schema to maintain this.

   One significant advantage of this approach is that you can keep
   XML Schema validation enabled.

2. The second approach involves parsing XML to DOM, filtering out
   unknown elements, and then passing the "fixed up" DOM to the
   XSD-generated object model.

   This approach relies on the ability to distinguish new elements
   from old. For example, if the authors of the schema place new
   elements for each version in their own namespace, then you can
   easily and generically remove all such elements from DOM. Again,
   ideally, you would want the authors of the schema to maintain
   this.
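
The filtering step in the second strategy can be sketched on a toy element tree. This is a minimal illustration, not Xerces-C++ or XSD code: element and strip_unknown are hypothetical names, and it assumes (as described above) that every schema version puts its new elements in its own namespace, so that anything outside the set of known namespaces can be dropped before the tree is handed to the generated code.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy element tree standing in for a real DOM (hypothetical, for
// illustration only).
struct element
{
  std::string ns;                 // Namespace URI.
  std::string name;               // Local name.
  std::vector<element> children;  // Child elements in document order.
};

// Recursively remove every descendant element whose namespace is not in
// known_ns, that is, elements added by schema versions newer than the one
// the code was generated from.
void
strip_unknown (element& e, const std::set<std::string>& known_ns)
{
  // Only descendants are filtered, so the starting element must be known.
  assert (known_ns.count (e.ns) != 0);

  auto& c (e.children);
  c.erase (std::remove_if (c.begin (), c.end (),
                           [&known_ns] (const element& x)
                           {
                             return known_ns.count (x.ns) == 0;
                           }),
           c.end ());

  for (element& x: c)
    strip_unknown (x, known_ns);
}
```

A real implementation would walk the DOM produced by the XML parser in the same way and then pass the filtered document to the generated parsing functions.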


> I don't quite understand why not breaking from the loop would break
> inheritance. I can see each object calling it's parent parse method
> that has its own for loop parsing the elements. Perhaps the derived
> parse picks up where the other method left off so it would miss out.

Yes, that's exactly what would happen.


> Couldn't the parse method be broken up to the for loop and a
> parseElement which would handle each element and could call up the
> class hierarchy looking for an object that recognizes each element?

Yes, this could be an alternative, much slower, implementation. It
will penalize the 99% of cases (where people are parsing valid XML)
to handle the 1%. And still it won't be a general solution to the
schema evolution problem since the addition of elements is only one
issue. What if an existing element's type was changed? For example,
it was an integer and now it is a string: all the old XML documents
are still valid per the new schema but not the other way around.
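
The interaction between the sequential loop and inheritance can be illustrated with a toy model. This is a sketch, not the code XSD generates: parse_level and parse_derived are hypothetical names, elements are reduced to their names, and "name" and "wing-span" are made-up base and derived elements. Each level consumes the elements it recognizes and, in the normal mode, stops at the first one it does not, which is where the derived type picks up.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using elements = std::vector<std::string>;

// Consume elements starting at pos while they belong to this level (own).
// With skip_unknown == false the loop breaks at the first unrecognized
// element; with skip_unknown == true it steps over it instead.
// Returns the names this level actually handled.
elements
parse_level (const elements& in, std::size_t& pos,
             const std::set<std::string>& own, bool skip_unknown)
{
  assert (pos <= in.size ());

  elements handled;
  for (; pos != in.size (); ++pos)
  {
    if (own.count (in[pos]) != 0)
      handled.push_back (in[pos]);
    else if (!skip_unknown)
      break; // Leave it for the derived level.
  }
  return handled;
}

struct result
{
  elements base;
  elements derived;
};

// A derived type's content is the base content followed by its own, so the
// derived parse runs the base level first and then continues from pos.
result
parse_derived (const elements& in, bool skip_unknown)
{
  std::size_t pos (0);
  result r;
  r.base = parse_level (in, pos, {"name"}, skip_unknown);
  r.derived = parse_level (in, pos, {"wing-span"}, skip_unknown);
  return r;
}
```

For the input name, wing-span the breaking variant hands wing-span to the derived level, while the skipping variant lets the base level run past it, so the derived level sees nothing.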

Boris

From n.e.x.g.e.n.s at gmail.com  Fri Apr  6 06:09:02 2018
From: n.e.x.g.e.n.s at gmail.com (Shrikant)
Date: Fri Apr  6 06:13:49 2018
Subject: [xsd-users] Need to speed up performance of parser
Message-ID: <CAMTKWrFhhmTZ93uzNAXmLST=PPsLHUk0qK8hncJD4pF1Dg55eQ@mail.gmail.com>

Hi,

We have used the cxx/tree/embedded example exactly as it is in our
application, parsing the incoming XML and mapping the XML elements to
internal structures. We have tried optimizing our code as much as possible,
but with no luck. Can anyone suggest setParameter options to tweak in the
parser configuration so that parser performance improves? We are
looking to parse and process 100 or more XML messages per second, but with
CodeSynthesis we are getting about 80 XML messages per second.

Appreciate any help anyone can offer.

Thanks,
Shrikant
From boris at codesynthesis.com  Mon Apr  9 02:04:20 2018
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Mon Apr  9 02:08:51 2018
Subject: [xsd-users] Need to speed up performance of parser
In-Reply-To: <CAMTKWrFhhmTZ93uzNAXmLST=PPsLHUk0qK8hncJD4pF1Dg55eQ@mail.gmail.com>
References: <CAMTKWrFhhmTZ93uzNAXmLST=PPsLHUk0qK8hncJD4pF1Dg55eQ@mail.gmail.com>
Message-ID: <boris.20180409080042@codesynthesis.com>

Shrikant <n.e.x.g.e.n.s@gmail.com> writes:
 
> We have used the cxx/tree/embedded example exactly as it is in our
> application and parsing the incoming XML and mapping the XML elements to
> internal structures. We have tried optimizing our code as much as possible.
> But with no luck. Can anyone suggest tweaking any setParameter options
> for parser config, so that the parser performance can improve.

The idea is to cache and reuse as much as possible. Take a look at the
'performance' example for some ideas.

If that still doesn't get you to your desired performance, then the
next step would be to profile your application and see if there are
any hotspots.

Boris

From email at yangkaijin.cn  Tue Apr 24 02:55:11 2018
From: email at yangkaijin.cn (杨开锦)
Date: Tue Apr 24 09:25:41 2018
Subject: [xsd-users] Problem: About targetNamespace
Message-ID: <tencent_BE969569C6A1C3989D3C25A49DA8A53A3609@qq.com>

I use xsd.exe to generate xxx.hxx and xxx.cxx from my xxx.xsd,

but when I try to parse the XML file, it fails with: no declaration found for element 'xxx'.

I tried to locate the problem, and it seems to be caused by the targetNamespace declaration in my xxx.xsd.

I reproduced the situation with the Hello World example:


xsd:


<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema_instance"
           targetNamespace="http://www.w3.org/2001/XMLSchema_instance"
           elementFormDefault="qualified" attributeFormDefault="unqualified">


  <xs:complexType name="hello_t">
    <xs:sequence>
      <xs:element name="greeting" type="xs:string"/>
      <xs:element name="name" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>


  <xs:element name="hello" type="xsi:hello_t"/>


</xs:schema>


xml:


<?xml version="1.0"?>
<hello xmlns:xsi="http://www.w3.org/2001/XMLSchema_instance"
       xsi:noNamespaceSchemaLocation="hello.xsd">


  <greeting>Hello</greeting>


  <name>sun</name>
  <name>moon</name>
  <name>world</name>


</hello>


With these files parsing does not work. I need to know whether there is some way to get around this.


Note:
When I put the xsi prefix before hello, greeting, and name in the XML file, the result stays the same.
My XSD file comes from a third party and is quite long, so it would be a lot of work to change all the declarations in the XSD file to remove the targetNamespace.