From laurence.davies at unsw.edu.au Wed Sep 2 03:34:18 2015
From: laurence.davies at unsw.edu.au (Laurence Davies)
Date: Wed Sep 2 03:34:27 2015
Subject: [xsd-users] xlink href use of idref class
Message-ID:

Hi Boris,

I'm revisiting the xlink.href idref feature of XSD, as per the below email
from you in 2008. My question is with regard to the scope of the search
that idref carries out when calling get().

In the below code you initialise the idref as follows:

  color_t& color = ... // reference to an object containing href
  xml_schema::idref ref (href, 0, &color);

The `color` element is a "reference to an object containing href". Does
this mean that color is a _type object which contains a tree of children,
one of which containing a _tree object with ID==href? Or does it explicitly
mean that color.ID==href?

The reason I ask is because I am streaming a document as it is being
parsed into custom classes and discarding the generated _type objects.
Thus for the href to point to something valid, I need to know where XSD
is looking for said valid object.

Thank you and regards,
Laurence

> Boris Kolpackov boris at codesynthesis.com
> Fri Aug 8 04:39:13 EDT 2008
>
> I checked the GML schema and the gml:id attribute is of xsd:ID type. If
> all href attributes always refer to one of those gml:id attributes then
> this is pretty simple to do:
>
>   std::string href = ... // contains "COLOR_70", without #
>   color_t& color = ... // reference to an object containing href
>
>   xml_schema::idref ref (href, 0, &color);
>   xml_schema::type* p = ref.get (); // p points to the object with ID COLOR_70
>
>   // Now you can use static_cast (if you are sure about the type of p) or
>   // dynamic_cast (if you are not) to cast it to the Color_t type.
>
> If you cannot assume that all href attributes point to one of the gml:id
> attributes (or other attributes as long as they are of the xsd:ID type)
> then things get a bit more complex.
> Let me know if this is the case and
> I will describe how this can be done.

Laurence Davies
____________________
Research Assistant in eGeodesy
CRC-SI, UNSW
Desk phone: (03) 8636 2373
Mobile: 0427 519 289

From boris at codesynthesis.com Fri Sep 4 11:02:20 2015
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Fri Sep 4 11:02:28 2015
Subject: [xsd-users] xlink href use of idref class
In-Reply-To:
References:
Message-ID:

Hi Laurence,

Laurence Davies writes:

>   color_t& color = ... // reference to an object containing href
>   xml_schema::idref ref (href, 0, &color);
>
> The `color` element is a "reference to an object containing href". Does
> this mean that color is a _type object which contains a tree of children,
> one of which containing a _tree object with ID==href?

Yes, that's correct.

From laurence.davies at unsw.edu.au Fri Sep 4 19:20:16 2015
From: laurence.davies at unsw.edu.au (Laurence Davies)
Date: Fri Sep 4 19:20:28 2015
Subject: [xsd-users] xlink href use of idref class
In-Reply-To:
References:
Message-ID:

> >   color_t& color = ... // reference to an object containing href
> >   xml_schema::idref ref (href, 0, &color);
> >
> > The `color` element is a "reference to an object containing href". Does
> > this mean that color is a _type object which contains a tree of children,
> > one of which containing a _tree object with ID==href?
>
> Yes, that's correct.

In that case, in the streaming example would this tree of children (stored
in map_?) be complete? I see in the idref::get() method it performs
root_()->lookup() which would assume that the correct root container can be
referenced in root_() and that lookup() can find the matching ID element
from map_. Considering that in the streaming code the type_ objects are
often discarded, would keeping a "shell" of the root element solve this
problem? Or do I need to have the complete tree in memory for it to work
correctly?
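To make the question concrete, here is a deliberately simplified, hypothetical model of the lookup described above: each object knows its container, idref::get() walks up to the root, and the root keeps a map_ from ID values to the objects currently in the tree. The classes node and idref below are invented for this sketch and are not XSD's actual implementation. The model also assumes that a destroyed (discarded) object unregisters its ID, which is what makes a discarded subtree invisible even when a "shell" root is kept.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for a tree object (NOT XSD's real
// xml_schema::type). Each object knows its container; the root keeps a
// map_ from ID values to the objects that currently carry them.
class node
{
public:
  explicit node (node* container = nullptr, std::string id = std::string ())
      : container_ (container), id_ (std::move (id))
  {
    if (!id_.empty ())
      root_ ()->map_[id_] = this; // Register the ID with the root.
  }

  // Assumption of this model: a destroyed (discarded) object unregisters
  // its ID, so it can no longer be found via the root.
  ~node ()
  {
    if (!id_.empty ())
      root_ ()->map_.erase (id_);
  }

  node (const node&) = delete;
  node& operator= (const node&) = delete;

  // Search the whole tree this object belongs to, not just its children.
  node* lookup (const std::string& id)
  {
    node* r (root_ ());
    auto i (r->map_.find (id));
    return i != r->map_.end () ? i->second : nullptr;
  }

private:
  node* root_ ()
  {
    node* n (this);
    while (n->container_ != nullptr)
      n = n->container_;
    return n;
  }

  node* container_;
  std::string id_;
  std::map<std::string, node*> map_; // Only used on the root.
};

// Hypothetical idref: an ID value plus the object that contains it.
class idref
{
public:
  idref (std::string id, node* container)
      : id_ (std::move (id)), container_ (container) {}

  node* get () const
  {
    assert (container_ != nullptr);
    return container_->lookup (id_);
  }

private:
  std::string id_;
  node* container_;
};
```

In this model the lookup succeeds from any object attached to the tree, and it fails as soon as the object carrying the ID has been destroyed, even though the root itself is still alive.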
Thank you and regards,
Laurence

From boris at codesynthesis.com Mon Sep 7 10:11:19 2015
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Mon Sep 7 10:11:27 2015
Subject: [xsd-users] xlink href use of idref class
In-Reply-To:
References:
Message-ID:

Hi Laurence,

Laurence Davies writes:

> In that case, in the streaming example would this tree of children (stored
> in map_?) be complete? I see in the idref::get() method it performs
> root_()->lookup() which would assume that the correct root container can be
> referenced in root_() and that lookup() can find the matching ID element
> from map_. Considering that in the streaming code the type_ objects are
> often discarded, would keeping a "shell" of the root element solve this
> problem? Or do I need to have the complete tree in memory for it to work
> correctly?

I think you know the answer to this. The ID/IDREF mechanism allows you to
establish a "pointer" from one element anywhere in the *entire* document
to another element, also anywhere in the *entire* document. The streaming
approach allows you not to store the *entire* document in memory. What do
you think, are they compatible, in the general case?
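The incompatibility can be simulated with a toy streaming loop. The sketch below is hypothetical: the element and chunk types and unresolved_hrefs() are invented for illustration, and each chunk is assumed to be a flat list of elements with an optional ID and an optional href. Only the IDs of the chunk currently in memory are kept, so a reference resolves only when its target lies in the same chunk. The whole document is passed in just to keep the sketch short; the loop never looks beyond the current chunk.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical element of a streamed chunk: its own ID (empty if none)
// and the ID its href refers to (empty if none, no leading '#').
struct element
{
  std::string id;
  std::string href;
};

using chunk = std::vector<element>;

// Simulate streaming: chunks are processed one at a time and only the IDs
// of the current chunk are kept. Returns the hrefs that could not be
// resolved within their own chunk, in document order.
std::vector<std::string> unresolved_hrefs (const std::vector<chunk>& chunks)
{
  std::vector<std::string> r;

  for (const chunk& c: chunks)
  {
    std::set<std::string> ids; // Forgotten when the next chunk starts.

    for (const element& e: c)
      if (!e.id.empty ())
        ids.insert (e.id);

    for (const element& e: c)
      if (!e.href.empty () && ids.count (e.href) == 0)
        r.push_back (e.href);
  }

  return r;
}
```

For example, with chunks {COLOR_70 defined, href to COLOR_70}, {href to COLOR_70, href to COLOR_99} and {COLOR_99 defined}, the first href resolves, while both hrefs in the second chunk (one backward, one forward) are reported as unresolved because their targets live in other chunks.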
The only situation where you can use the ID/IDREF mechanism with streaming
is if you know (i.e., this is your vocabulary-specific knowledge) that the
ID points to an element inside the same chunk that you have parsed.

Boris

From vladimir.zykov at ncloudtech.ru Tue Sep 8 10:46:26 2015
From: vladimir.zykov at ncloudtech.ru (Vladimir Zykov)
Date: Tue Sep 8 10:47:44 2015
Subject: [xsd-users] XercesC security problems
Message-ID:

Hi Boris,

We've encountered a nasty security problem with the way we use XSD. In
general the problem is not in XSD; it's in Xerces-C++. But since XSD wraps
it, we need some way to configure XercesC at runtime through XSD. The
problem is known and described here:

https://www.owasp.org/index.php/XML_External_Entity_%28XXE%29_Processing

Before investigating/trying to fix it on our own I wanted to ask you if
XSD already provides a way to disable external entity resolving in
XercesC. Actually, we want to completely disable entity processing, even
if that means we won't be able to parse some XML instances, as there is
another related problem, described here, which we may potentially hit:

http://projects.webappsec.org/w/page/13247002/XML%20Entity%20Expansion

Thanks,
Vladimir Zykov
Software Engineer
New Cloud Technologies, Ltd

From boris at codesynthesis.com Thu Sep 10 06:27:57 2015
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Thu Sep 10 06:28:02 2015
Subject: [xsd-users] XercesC security problems
In-Reply-To:
References:
Message-ID:

Hi Vladimir,

Vladimir Zykov writes:

> Before investigating/trying to fix it on our own I wanted to ask you
> if XSD already provides a way to disable external entity resolving
> in XercesC.

No, XSD doesn't have built-in support for this. And it is surprisingly
hard to achieve in Xerces-C++. It has a security manager that can be
used to limit the number of entity expansions. So one would think it is
a simple matter of setting this limit to 0. But, no, Xerces-C++ first
expands the entity and then checks the limit.
Brilliant!

In any case, I've decided to create an example that demonstrates how
this can be done. In a nutshell, we have to provide a customized DOM
parser that intercepts attempts to parse DOCTYPEs that contain either an
internal or an external DTD subset. It still accepts simple DOCTYPE
declarations, though. It is not exactly what you asked for (i.e.,
ignoring external entity expansions), but I think this is the best we
can do in Xerces-C++. I think we might be able to ignore the external
subset via the EntityResolver.

The example archive is here:

http://codesynthesis.com/~boris/tmp/xsd/secure.tar.gz

See the README file to get started.

Boris

From vladimir.zykov at ncloudtech.ru Fri Sep 18 12:00:22 2015
From: vladimir.zykov at ncloudtech.ru (Vladimir Zykov)
Date: Fri Sep 18 12:47:03 2015
Subject: [xsd-users] XercesC security problems
In-Reply-To:
References:
Message-ID: <997F7630-5328-46C7-9295-2427DC57250B@ncloudtech.ru>

Hi Boris,

Thanks a lot for your hint. It really saved us a lot of trouble. I had
completely forgotten that the generated XSD parsing functions accept a
Xerces-C++ XML DOM and that we can separate XML parsing from the
creation of the domain model.
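The DOCTYPE policy Boris describes (reject a DOCTYPE with an internal or external DTD subset, accept a simple one) can be illustrated with a naive textual pre-check that runs before the document reaches any parser, which fits the separation of parsing from model creation mentioned above. This is a swapped-in technique, not the customized DOM parser from Boris's example: check_doctype() and DoctypeCheck are hypothetical names, the input is assumed to be UTF-8 or ASCII text held in memory, and anything the check does not recognize (including UTF-16 input) is rejected rather than passed through. Hooking the parser itself, as Boris's example does, is likely the more robust place to enforce this.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Outcome of inspecting the prolog (everything before the root element).
enum class DoctypeCheck
{
  none,     // no DOCTYPE before the root element
  simple,   // <!DOCTYPE name> with no external ID and no internal subset
  rejected  // DTD subset present, or markup this check does not recognize
};

namespace
{
  // Does s contain p starting at position i?
  bool at (const std::string& s, std::size_t i, const char* p)
  {
    assert (i <= s.size ()); // Callers never advance past the end.
    return s.compare (i, std::strlen (p), p) == 0;
  }

  bool is_space (char c)
  {
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
  }

  void skip_space (const std::string& s, std::size_t& i)
  {
    while (i < s.size () && is_space (s[i]))
      ++i;
  }

  // Could c start an element name? Bytes >= 0x80 are accepted as the
  // start of a UTF-8 multi-byte character.
  bool is_name_start (char c)
  {
    unsigned char u (static_cast<unsigned char> (c));
    return std::isalpha (u) || c == '_' || c == ':' || u >= 0x80;
  }
}

// Hypothetical pre-check: reject any DOCTYPE with an external ID
// (SYSTEM/PUBLIC) or an internal subset ([...]).
DoctypeCheck check_doctype (const std::string& s)
{
  std::size_t i (0);
  bool doctype (false);

  if (at (s, i, "\xEF\xBB\xBF")) // Skip a UTF-8 byte order mark.
    i += 3;

  for (;;)
  {
    skip_space (s, i);

    if (at (s, i, "<?")) // XML declaration or processing instruction.
    {
      std::size_t e (s.find ("?>", i + 2));
      if (e == std::string::npos)
        return DoctypeCheck::rejected;
      i = e + 2;
    }
    else if (at (s, i, "<!--")) // Comment.
    {
      std::size_t e (s.find ("-->", i + 4));
      if (e == std::string::npos)
        return DoctypeCheck::rejected;
      i = e + 3;
    }
    else if (at (s, i, "<!DOCTYPE"))
    {
      if (doctype) // A second DOCTYPE is malformed; refuse it.
        return DoctypeCheck::rejected;

      i += 9;
      skip_space (s, i);

      // Skip the root element name.
      while (i < s.size () && !is_space (s[i]) && s[i] != '[' && s[i] != '>')
        ++i;

      skip_space (s, i);

      // Anything but '>' here (SYSTEM, PUBLIC or '[') means a DTD subset.
      if (!at (s, i, ">"))
        return DoctypeCheck::rejected;

      ++i;
      doctype = true;
    }
    else if (at (s, i, "<") && i + 1 < s.size () && is_name_start (s[i + 1]))
      return doctype ? DoctypeCheck::simple : DoctypeCheck::none; // Root element.
    else
      return DoctypeCheck::rejected; // Other "<!" markup, text, or unknown bytes.
  }
}
```

Since entity declarations can only appear in a DTD subset, a document that passes this check can reference only the predefined entities and character references. Only documents classified as none or simple would then be handed to the parser and, from the resulting DOM, to the generated parsing functions.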