From laurence.davies at unsw.edu.au  Wed Sep  2 03:34:18 2015
From: laurence.davies at unsw.edu.au (Laurence Davies)
Date: Wed Sep  2 03:34:27 2015
Subject: [xsd-users] xlink href use of idref class
Message-ID: <C5E007B651C50B45ADF202294E955B8937513EE1@INFPWXM004.ad.unsw.edu.au>

Hi Boris,

I'm revisiting the xlink:href idref technique in XSD that you described in your 2008 email, quoted below. My question concerns the scope of the search that idref carries out when get() is called. In that email you initialise the idref as follows:

    color_t& color = ... // reference to an object containing href
    xml_schema::idref ref (href, 0, &color);

The `color` object is described as a "reference to an object containing href". Does this mean that color is a _type object which contains a tree of children, one of which contains a _tree object with ID==href? Or does it mean specifically that color.ID==href?

The reason I ask is that I am streaming a document as it is being parsed into custom classes and discarding the generated _type objects. Thus, for the href to resolve to a valid object, I need to know where XSD looks for that object.

Thank you and regards,
Laurence

> Boris Kolpackov boris at codesynthesis.com 
> Fri Aug 8 04:39:13 EDT 2008
> 
> I checked the GML schema and the gml:id attribute is of xsd:ID type. If
> all href attributes always refer to one of those gml:id attributes then
> this is pretty simple to do:
> 
> std::string href = ... // contains "COLOR_70", without #
> color_t& color = ... // reference to an object containing href
> 
> xml_schema::idref ref (href, 0, &color);
> xml_schema::type* p = ref.get (); // p points to the object with ID COLOR_70
> 
> // Now you can use static_cast (if you are sure about the type of p) or
> // dynamic_cast (if you are not) to cast it to the Color_t type.
> 
> If you cannot assume that all href attributes point to one of the gml:id
> attributes (or other attributes as long as they are of the xsd:ID type)
> then things get a bit more complex. Let me know if this is the case and
> I will describe how this can be done.
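
The href in the quoted snippet holds the bare ID ("COLOR_70", without #), while a same-document xlink:href value is normally written with the leading # (for example "#COLOR_70"). A minimal sketch of that conversion follows; href_to_id is a hypothetical helper introduced here for illustration and is not part of XSD:

```cpp
#include <string>

// Hypothetical helper (not part of XSD): turn a same-document xlink:href
// value such as "#COLOR_70" into the bare ID "COLOR_70". Returns an empty
// string for anything that is not a same-document fragment reference.
std::string href_to_id (const std::string& href)
{
  if (href.size () > 1 && href[0] == '#')
    return href.substr (1);

  return std::string ();
}
```

The returned string could then be passed as the first argument of the idref constructor shown in the quoted email.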


Laurence Davies
____________________
Research Assistant in eGeodesy
CRC-SI, UNSW
Desk phone: (03) 8636 2373
Mobile: 0427 519 289

From boris at codesynthesis.com  Fri Sep  4 11:02:20 2015
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Fri Sep  4 11:02:28 2015
Subject: [xsd-users] xlink href use of idref class
In-Reply-To: <C5E007B651C50B45ADF202294E955B8937513EE1@INFPWXM004.ad.unsw.edu.au>
References: <C5E007B651C50B45ADF202294E955B8937513EE1@INFPWXM004.ad.unsw.edu.au>
Message-ID: <boris.20150904170017@codesynthesis.com>

Hi Laurence,

Laurence Davies <laurence.davies@unsw.edu.au> writes:

>     color_t& color = ... // reference to an object containing href
>     xml_schema::idref ref (href, 0, &color);
> 
> The `color` element is a "reference to an object containing href". Does
> this mean that color is a _type object which contains a tree of children,
> one of which containing a _tree object with ID==href?

Yes, that's correct. 

From laurence.davies at unsw.edu.au  Fri Sep  4 19:20:16 2015
From: laurence.davies at unsw.edu.au (Laurence Davies)
Date: Fri Sep  4 19:20:28 2015
Subject: [xsd-users] xlink href use of idref class
In-Reply-To: <boris.20150904170017@codesynthesis.com>
References: <C5E007B651C50B45ADF202294E955B8937513EE1@INFPWXM004.ad.unsw.edu.au>,
	<boris.20150904170017@codesynthesis.com>
Message-ID: <C5E007B651C50B45ADF202294E955B8937514FC2@INFPWXM004.ad.unsw.edu.au>


> >     color_t& color = ... // reference to an object containing href
> >     xml_schema::idref ref (href, 0, &color);
> >
> > The `color` element is a "reference to an object containing href". Does
> > this mean that color is a _type object which contains a tree of children,
> > one of which containing a _tree object with ID==href?
> 
> Yes, that's correct.

In that case, in the streaming example, would this tree of children (stored in map_?) be complete? I see that idref::get() calls root_()->lookup(), which assumes that root_() returns the correct root container and that lookup() can find the object with the matching ID in map_. Given that the streaming code often discards the _type objects, would keeping a "shell" of the root element solve this problem? Or do I need to have the complete tree in memory for it to work correctly?

Thank you and regards,
Laurence



From boris at codesynthesis.com  Mon Sep  7 10:11:19 2015
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Mon Sep  7 10:11:27 2015
Subject: [xsd-users] xlink href use of idref class
In-Reply-To: <C5E007B651C50B45ADF202294E955B8937514FC2@INFPWXM004.ad.unsw.edu.au>
References: <C5E007B651C50B45ADF202294E955B8937513EE1@INFPWXM004.ad.unsw.edu.au>
	<boris.20150904170017@codesynthesis.com>
	<C5E007B651C50B45ADF202294E955B8937514FC2@INFPWXM004.ad.unsw.edu.au>
Message-ID: <boris.20150907160526@codesynthesis.com>

Hi Laurence,

Laurence Davies <laurence.davies@unsw.edu.au> writes:

> In that case, in the streaming example would this tree of children (stored
> in map_?) be complete? I see in the idref::get() method it performs
> root_()->lookup() which would assume that the correct root container can be
> referenced in root_() and that lookup() can find the matching ID element
> from map_.  Considering that in the streaming code the type_ objects are
> often discarded, would keeping a "shell" of the root element solve this
> problem? Or do I need to have the complete tree in memory for it to work
> correctly?

I think you know the answer to this. The ID/IDREF mechanism allows you
to establish a "pointer" from one element anywhere in the *entire*
document to another element, also anywhere in the *entire* document.
The streaming approach allows you not to store the *entire* document
in memory. What do you think, are they compatible, in the general
case?

The only situation where you can use the ID/IDREF mechanism with
streaming is if you know (i.e., this is your vocabulary-specific
knowledge) that the ID points to an element inside the same chunk
that you have parsed.
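
A toy model makes the limitation concrete. The sketch below assumes, as described earlier in the thread, that an IDREF is resolved by walking to the root of the tree that contains it and consulting an ID map kept at that root. The node class and its members are hypothetical stand-ins written for this illustration; they are not XSD's xml_schema::type or xml_schema::idref.

```cpp
#include <map>
#include <string>

// Toy model only: a hypothetical stand-in for a tree node, not XSD code.
struct node
{
  explicit node (node* parent = 0): parent_ (parent) {}

  // Walk up the container chain to the root of this tree.
  node* root ()
  {
    node* n = this;
    while (n->parent_ != 0)
      n = n->parent_;
    return n;
  }

  // Register this node under an ID in the map kept by the root.
  void register_id (const std::string& id)
  {
    root ()->ids_[id] = this;
  }

  // Resolve an ID against the root's map; 0 if it is not in this tree.
  node* lookup (const std::string& id)
  {
    std::map<std::string, node*>& m = root ()->ids_;
    std::map<std::string, node*>::iterator i = m.find (id);
    return i != m.end () ? i->second : 0;
  }

private:
  node* parent_;
  std::map<std::string, node*> ids_;
};

// Two separately parsed chunks: an ID registered in one chunk cannot be
// resolved from the other, because each chunk has its own root and map.
bool resolves_across_chunks ()
{
  node chunk1, chunk2;
  node color (&chunk1), user (&chunk2);
  color.register_id ("COLOR_70");
  return user.lookup ("COLOR_70") != 0; // false
}
```

In this model a lookup succeeds only when the referencing node and the identified node share a root, which corresponds to the case where the ID points inside the same parsed chunk.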

Boris

From vladimir.zykov at ncloudtech.ru  Tue Sep  8 10:46:26 2015
From: vladimir.zykov at ncloudtech.ru (Vladimir Zykov)
Date: Tue Sep  8 10:47:44 2015
Subject: [xsd-users] XercesC security problems
Message-ID: <F72937A2-C0F9-4194-B5BF-6CB190369A82@ncloudtech.ru>

Hi Boris,

We've encountered a nasty security problem with the way we use XSD. Strictly speaking, the problem is not in XSD but in Xerces-C++; however, since XSD wraps it, we need some way to configure Xerces-C++ at runtime through XSD. The problem is well known and is described here: https://www.owasp.org/index.php/XML_External_Entity_%28XXE%29_Processing

Before investigating or trying to fix it on our own, I wanted to ask you whether XSD already provides a way to disable external entity resolution in Xerces-C++.

Actually, we want to disable entity processing completely, even if that means we can no longer parse some XML instances, because there is a related problem that we may also hit, described here: http://projects.webappsec.org/w/page/13247002/XML%20Entity%20Expansion

Thanks,

Vladimir Zykov
Software Engineer
New Cloud Technologies, Ltd


From boris at codesynthesis.com  Thu Sep 10 06:27:57 2015
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Thu Sep 10 06:28:02 2015
Subject: [xsd-users] XercesC security problems
In-Reply-To: <F72937A2-C0F9-4194-B5BF-6CB190369A82@ncloudtech.ru>
References: <F72937A2-C0F9-4194-B5BF-6CB190369A82@ncloudtech.ru>
Message-ID: <boris.20150910121833@codesynthesis.com>

Hi Vladimir,

Vladimir Zykov <vladimir.zykov@ncloudtech.ru> writes:

> Before investigating/trying to fix it on our own I wanted to ask you 
> if XSD already provides a way to disable external entity resolving
> in XercesC.

No, XSD doesn't have built-in support for this. And it is surprisingly
hard to achieve in Xerces-C++. It has a security manager that can be
used to limit the number of entity expansions. So one would think it
is a simple matter of setting this limit to 0. But, no, Xerces-C++
first expands the entity and then checks the limit. Brilliant!

In any case, I've decided to create an example that demonstrates
how this can be done. In a nutshell, we have to provide a customized
DOM parser that intercepts attempts to parse DOCTYPEs that contain
either an internal or an external DTD subset. It still accepts simple
DOCTYPE declarations, though. It is not exactly what you asked for
(i.e., ignoring external entity expansions), but I think this is
the best we can do in Xerces-C++. I think we might be able to
ignore the external subset via the EntityResolver.

The example archive is here:

http://codesynthesis.com/~boris/tmp/xsd/secure.tar.gz

See the README file to get started.
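
The acceptance policy described above (reject a DOCTYPE with an internal or external DTD subset, accept a simple one) can be illustrated on a plain string. This is only a sketch of the policy, not the customized DOM parser from the archive and not Xerces-C++ code: classify_doctype and doctype_kind are hypothetical names, the scan assumes an ASCII-compatible encoding such as UTF-8, and real enforcement belongs inside the parser.

```cpp
#include <string>

// Hypothetical classification of a document's DOCTYPE (illustration only).
enum doctype_kind
{
  doctype_none,     // No DOCTYPE declaration before the root element.
  doctype_simple,   // <!DOCTYPE name> with no DTD subset: accepted.
  doctype_rejected  // Internal or external subset, or malformed prolog.
};

doctype_kind classify_doctype (const std::string& xml)
{
  using std::string;
  string::size_type p = 0;

  while (p < xml.size ())
  {
    if (xml.compare (p, 4, "<!--") == 0)          // Skip a comment.
    {
      string::size_type e = xml.find ("-->", p + 4);
      if (e == string::npos) return doctype_rejected;
      p = e + 3;
    }
    else if (xml.compare (p, 2, "<?") == 0)       // Skip XML decl or PI.
    {
      string::size_type e = xml.find ("?>", p + 2);
      if (e == string::npos) return doctype_rejected;
      p = e + 2;
    }
    else if (xml.compare (p, 9, "<!DOCTYPE") == 0)
    {
      // A simple declaration is just a name followed by '>'. A quote
      // means an external identifier; '[' opens an internal subset.
      for (p += 9; p < xml.size (); ++p)
      {
        char c = xml[p];
        if (c == '>') return doctype_simple;
        if (c == '[' || c == '"' || c == '\'') return doctype_rejected;
      }
      return doctype_rejected;                    // Unterminated declaration.
    }
    else if (xml[p] == '<')
      return doctype_none;                        // Root element reached.
    else
      ++p;                                        // Whitespace between items.
  }

  return doctype_none;
}
```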

Boris

From vladimir.zykov at ncloudtech.ru  Fri Sep 18 12:00:22 2015
From: vladimir.zykov at ncloudtech.ru (Vladimir Zykov)
Date: Fri Sep 18 12:47:03 2015
Subject: [xsd-users] XercesC security problems
In-Reply-To: <boris.20150910121833@codesynthesis.com>
References: <F72937A2-C0F9-4194-B5BF-6CB190369A82@ncloudtech.ru>
	<boris.20150910121833@codesynthesis.com>
Message-ID: <997F7630-5328-46C7-9295-2427DC57250B@ncloudtech.ru>

Hi Boris,

Thanks a lot for your hint. It really saved us a lot of trouble. I had completely
forgotten that the generated XSD parsing functions accept a Xerces-C++ DOM
document and that we can separate XML parsing from the creation of the domain model.

On Sep 10, 2015, at 13:27, Boris Kolpackov <boris@codesynthesis.com> wrote:

> In any case, I've decided to create an example that demonstrates
> how this can be done. In a nutshell, we have to provide a customized
> DOM parser that intercepts attempts to parse DOCTYPEs that contain
> either an internal or an external DTD subset. It still accepts simple
> DOCTYPE declarations, though. It is not exactly what you asked for
> (i.e., ignoring external entity expansions), but I think this is
> the best we can do in Xerces-C++. I think we might be able to
> ignore the external subset via the EntityResolver.

