From scott.houchin at aero.org Mon Apr 2 17:06:32 2018
From: scott.houchin at aero.org (Scott Houchin)
Date: Mon Apr 2 17:10:51 2018
Subject: [xsd-users] Default parsers for incomplete polymorphic schemas
Message-ID: <1BDE6FC4-503F-4BAF-860E-D51D8EC99FED@aero.org>

Hi,

I am starting to look at xsd cxx-parser as part of development of a forward
compatible XML interface standard. What I want to do is define a parser that
will handle all extended types as if they were the base type if the xsi:type
is unknown.

As a simple example, take the example code for
examples/cxx/parser/polymorphism and remove everything about batman. But
still have it parse the supermen.xml file that includes that element with
xsi:type=batman. The XML document itself might still validate, but is based
on a newer version of the schema than was used to generate the C++ code; but
I don't want the C++ code to outright reject the element when partial
parsing would provide some information.

I am perfectly OK if the resulting behavior ignores any contained element or
attribute unknown to the generic definition of the superman type, but if
there was a way to allow DOM or SAX based access to the raw element, that
would be preferred.

I did look at creating a custom type map, but in this case I can't create a
reliable regex to match "batman" if I also consider possible other superman
subclasses like "wonderwoman" and "wolverine".

Regards,
Scott Houchin

From henry at commvault.com Mon Apr 2 11:54:47 2018
From: henry at commvault.com (Henry Dornemann)
Date: Wed Apr 4 06:39:47 2018
Subject: [xsd-users] Ignoring unknown elements
Message-ID:

Is there some way to ignore unknown elements while parsing an XML document?

The situation that I have is the xsd schema file is provided by a 3rd party
and I utilize xsd to generate code to parse this.
The 3rd party will extend their schema with new versions and this breaks our
ability to parse the documents even though I only care about the already
known elements and not the new elements that are added in the new version.

What's worse here is that this condition doesn't even generate a parsing
error, we just get a partially populated object and don't even realize that
there was something wrong with the input document.

I found a previous question on this same topic
https://www.codesynthesis.com/pipermail/xsd-users/2014-October/004453.html
but I don't see any resolution from it; was any command line option added to
avoid breaking out of the element parsing loop?

Thanks
-Henry

***************************Legal Disclaimer***************************
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or
distribution by others is strictly prohibited. If you have received the
message by mistake, please advise the sender by reply email and delete the
message. Thank you."
**********************************************************************

From boris at codesynthesis.com Wed Apr 4 06:45:15 2018
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Wed Apr 4 06:49:37 2018
Subject: [xsd-users] Ignoring unknown elements
In-Reply-To:
References:
Message-ID:

Henry Dornemann writes:

> Is there some way to ignore unknown elements while parsing an XML document?
>
> The situation that I have is the xsd schema file is provided by a 3rd party
> and I utilize xsd to generate code to parse this. The 3rd party will extend
> their schema with new versions and this breaks our ability to parse the
> documents even though I only care about the already known elements and not
> the new elements that are added in the new version.
>
> What's worse here is that this condition doesn't even generate a parsing
> error, we just get a partially populated object and don't even realize
> that there was something wrong with the input document

You would have gotten a validation error (against schema that you have
used to generate the code) if you haven't disabled it. In other words,
if you disable validation and feed invalid XML, pretty much all bets
are off.

> I found a previous question on this same topic
> https://www.codesynthesis.com/pipermail/xsd-users/2014-October/004453.html
> but I don't see any resolution from it; was any command line option
> added to avoid breaking out of the element parsing loop?

Unfortunately, there is no auto-magic resolution other than what's
suggested in that email (either use wildcards or pre-filter the DOM).
Specifically, not breaking out of the loop will break support for
inheritance.

Boris

From boris at codesynthesis.com Wed Apr 4 07:01:36 2018
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Wed Apr 4 07:06:59 2018
Subject: [xsd-users] Default parsers for incomplete polymorphic schemas
In-Reply-To: <1BDE6FC4-503F-4BAF-860E-D51D8EC99FED@aero.org>
References: <1BDE6FC4-503F-4BAF-860E-D51D8EC99FED@aero.org>
Message-ID:

Scott Houchin writes:

> I am starting to look at xsd cxx-parser as part of development of a forward
> compatible XML interface standard. What I want to do is define a parser that
> will handle all extended types as if they were the base type if the xsi:type
> is unknown.
>
> [...]
>
> I did look at creating a custom type map [...]

We did something like this in XSD/e -- see the examples in
examples/cxx/hybrid/evolution/. We could probably do the same in XSD
for C++/Parser, though this mapping is no longer actively developed
(we will still fix bugs, etc).

Boris

From henry at commvault.com Wed Apr 4 08:11:54 2018
From: henry at commvault.com (Henry Dornemann)
Date: Fri Apr 6 00:25:52 2018
Subject: [xsd-users] Ignoring unknown elements
In-Reply-To:
References:
Message-ID:

Boris,

Could you elaborate further on the steps you are proposing here? At a glance
they seem rather involved, and it seems that I'd essentially be parsing the
XML myself, which would defeat the purpose of using xsd to generate the
parsing code.

I don't quite understand why not breaking from the loop would break
inheritance. I can see each object calling its parent parse method that has
its own for loop parsing the elements. Perhaps the derived parse picks up
where the other method left off so it would miss out. Couldn't the parse
method be broken up into the for loop and a parseElement which would handle
each element and could call up the class hierarchy looking for an object
that recognizes each element?

Do you consider what I am trying to do here an invalid use case for xsd?

Thanks
-Henry

From boris at codesynthesis.com Fri Apr 6 00:39:16 2018
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Fri Apr 6 00:43:40 2018
Subject: [xsd-users] Ignoring unknown elements
In-Reply-To:
References:
Message-ID:

Henry Dornemann writes:

> Could you elaborate further on the steps you are proposing here? At a
> glance they seem rather involved [...]
>
> Do you consider what I am trying to do here an invalid use case for xsd?

Schema evolution is a hard problem that cannot be solved generally at
the XSD level. But provided certain conditions are met and with some
effort from your side, it could be handled.
The two common strategies are:

1. Modify the schema with element wildcards that will serve as "extension
   points" for future versions of the schema. Essentially, this boils down
   to adding an element wildcard (xs:any) at the end of every complex
   content type. While you could do this yourself (perhaps with the help of
   a script or some such), ideally you would want to ask the authors of the
   schema to maintain this. One significant advantage of this approach is
   that you can keep XML Schema validation enabled.

2. The second approach involves parsing XML to DOM, filtering out unknown
   elements, and then passing the "fixed up" DOM to the XSD-generated
   object model. This approach relies on the ability to distinguish new
   elements from old. For example, if the authors of the schema place new
   elements for each version in their own namespace, then you can easily
   and generically remove all such elements from DOM. Again, ideally, you
   would want the authors of the schema to maintain this.

> I don't quite understand why not breaking from the loop would break
> inheritance. I can see each object calling its parent parse method
> that has its own for loop parsing the elements. Perhaps the derived
> parse picks up where the other method left off so it would miss out.

Yes, that's exactly what would happen.

> Couldn't the parse method be broken up to the for loop and a
> parseElement which would handle each element and could call up the
> class hierarchy looking for an object that recognizes each element?

Yes, this could be an alternative, much slower, implementation. It will
penalize the 99% of cases (where people are parsing valid XML) to handle
the 1%. And still it won't be a general solution to the schema evolution
problem since the addition of elements is only one issue. What if an
existing element's type was changed? For example, it was an integer and
now it is a string: all the old XML documents are still valid per the new
schema but not the other way around.
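
The second strategy above (pre-filtering the DOM by namespace) can be
sketched as follows. This is only a minimal illustration under stated
assumptions: the Element struct is a toy stand-in for a real DOM tree and
is not the Xerces-C++ API, and the names strip_unknown and sample as well
as the example.com namespace URIs are hypothetical. It assumes, as the
text requires, that each schema version places its new elements in its own
namespace.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a DOM element (hypothetical; not the Xerces-C++ API).
struct Element
{
  std::string ns;                 // Namespace URI.
  std::string name;               // Local name.
  std::vector<Element> children;  // Child elements in document order.
};

// Recursively remove every child element whose namespace is not in
// 'known'. A removed element takes its whole subtree with it and is
// counted once. Returns the number of elements removed.
std::size_t
strip_unknown (Element& e, const std::set<std::string>& known)
{
  auto i (std::remove_if (
            e.children.begin (), e.children.end (),
            [&known] (const Element& c)
            {
              return known.count (c.ns) == 0;
            }));

  std::size_t removed (static_cast<std::size_t> (e.children.end () - i));
  e.children.erase (i, e.children.end ());

  // Only the surviving (known) children need to be descended into.
  for (Element& c: e.children)
    removed += strip_unknown (c, known);

  return removed;
}

// Build a small document that mixes "v1" elements (known to the
// generated code) with "v2" elements added by a later schema version.
// Both URIs are made up for this example.
Element
sample ()
{
  const std::string v1 ("http://www.example.com/v1");
  const std::string v2 ("http://www.example.com/v2");

  return Element {v1, "person", {
    Element {v1, "name", {}},
    Element {v2, "nickname", {}},
    Element {v1, "address", {
      Element {v1, "city", {}},
      Element {v2, "geo", {}}}}}};
}
```

Calling strip_unknown on the result of sample () with a known set that
contains only the v1 namespace leaves just the v1 elements in the tree.
With a real DOM the same walk would be done on the parsed document before
it is handed to the generated object model.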

Boris

From n.e.x.g.e.n.s at gmail.com Fri Apr 6 06:09:02 2018
From: n.e.x.g.e.n.s at gmail.com (Shrikant)
Date: Fri Apr 6 06:13:49 2018
Subject: [xsd-users] Need to speed up performance of parser
Message-ID:

Hi,

We have used the cxx/tree/embedded example exactly as it is in our
application and parsing the incoming XML and mapping the XML elements to
internal structures. We have tried optimizing our code as much as possible,
but with no luck. Can anyone suggest tweaking any setParameter options for
parser config, so that the parser performance can improve.

We are looking to parse and process 100 or more XML messages per second,
but with Codesynthesis, we are getting performance of about 80 XML messages
per second.

Appreciate any help anyone can offer.

Thanks,
Shrikant

From boris at codesynthesis.com Mon Apr 9 02:04:20 2018
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Mon Apr 9 02:08:51 2018
Subject: [xsd-users] Need to speed up performance of parser
In-Reply-To:
References:
Message-ID:

Shrikant writes:

> We have used the cxx/tree/embedded example exactly as it is in our
> application and parsing the incoming XML and mapping the XML elements to
> internal structures. We have tried optimizing our code as much as possible.
> But with no luck. Can anyone suggest tweaking any setParameter options
> for parser config, so that the parser performance can improve.

The idea is to cache and reuse as much as possible. Take a look at the
'performance' example for some ideas. If that still doesn't get you to
your desired performance, then the next step would be to profile your
application and see if there are any hotspots.

Boris

From email at yangkaijin.cn Tue Apr 24 02:55:11 2018
From: email at yangkaijin.cn (=?gb18030?B?0e6/qr31?=)
Date: Tue Apr 24 09:25:41 2018
Subject: [xsd-users] Problem: About targetNamespace
Message-ID:

I use xsd.exe to generate xxx.hxx and xxx.cxx for my xxx.xsd, but when I try
to parse the XML file, it fails with:

no declaration found for element 'xxx'.

I tried to locate the problem, and it seems to be that my xxx.xsd has a
targetNamespace declaration.

I reproduced the situation with the Hello World example:

xsd:

xml:

Hello sun moon world

With this change it no longer works. I need to know if there is some way
around this.

Note: When I put the xsi prefix before hello, greeting, and name in the XML
file, it still does not work.

My xsd file comes from a third party and it is quite long, so it would be a
lot of work to change all the declarations in the xsd file to remove the
targetNamespace.