[xsde-users] variable vs fixed-length types

Wed Sep 23 02:40:52 EDT 2009

Hi Samuel,

I have CC'ed the xsde-users mailing list to my reply since other people
may be interested in this. In general, let's keep our discussions CC'ed to
this list so that they will be available to others.

Samuel Toulouse <stoulouse at me.com> writes:

> I'm using XSD/e more and more. I'm using XDR binary as it is very  
> simple, I'd love to see a non-cross platform, compiler specific binary  
> format,

Have you tried the raw binary format from the 
examples/cxx/hybrid/binary/custom/ example yet?

> but that's a discussion we already had, if one day you'll find  
> it useful, then I'll start to use XSD/e everywhere to generate nearly  
> all my code/data structures :)

I remember you promised some benchmark results for XDR and that custom
format I mentioned above ;-).

> I've encountered a problem. Let's say you want to have a Vector4 type.
> There are two ways to do it:
> 	<complexType name="Vector4">
> 		<sequence>
> 			<element name="x" type="float" default="0.0" />
> 			<element name="y" type="float" default="0.0" />
> 			<element name="z" type="float" default="0.0" />
> 			<element name="w" type="float" default="0.0" />
> 		</sequence>
> 	</complexType>
> or:
> 	<simpleType name="FloatList">
> 		<list itemType="float" />
> 	</simpleType>
> 	<simpleType name="Vector4">
> 		<restriction base="FloatList">
> 			<length value="4" />
> 		</restriction>
> 	</simpleType>
>
> the problem is:
> if you use the first method, you won't be able to change the default  
> value when declaring a Vector4 element:
> 	<complexType name="Struct">
> 		<sequence>
> 			<element name="vector" type="state:Vector4" minOccurs="1" maxOccurs="1"/>
> 		</sequence>
> 	</complexType>
> if you use the second method, then this time, XSD/e will consider  
> Vector4 as a variable-length type with a podSequence, which is not true 
> because we've used the length facet to have a fixed length type.
> I'd like to be able to use the second method if it generates a
> fixed-length structure because I'll be able to use default values.

Hm, I see what you mean. However, it is not that clear-cut. What if
the length is set to 1000? Would you still want the resulting type 
to be treated as fixed-length and copied by value? While there won't
be dynamic memory allocations, there will still be the copying of
all the elements which can be quite expensive. I guess there should
be some limit (probably configurable) on the length. Once it is 
exceeded, the type is considered variable-length.
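
To make the idea of a configurable limit concrete, here is a minimal
sketch of how such a decision could look. All the names in it
(list_type_info, options, fixed_length_limit, fixed_length) are
hypothetical and are not part of XSD/e; the default of 16 is an
arbitrary assumption.

```cpp
#include <cstddef>

// Hypothetical description of an xsd:list type as seen by a schema
// compiler: whether it has a length facet and, if so, its value.
struct list_type_info
{
  bool has_length;
  std::size_t length;
};

// Hypothetical compiler options. The limit and its default value are
// assumptions made for this sketch, not an existing XSD/e option.
struct options
{
  std::size_t fixed_length_limit;

  options (): fixed_length_limit (16) {}
};

// Treat a list as fixed-length (copied by value) only if it has a
// length facet and that length does not exceed the limit. Anything
// else stays variable-length.
inline bool
fixed_length (const list_type_info& t, const options& o)
{
  return t.has_length && t.length <= o.fixed_length_limit;
}
```

With these defaults a Vector4 with length 4 would be fixed-length while
a list with length 1000 would stay variable-length unless the limit is
raised.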

There are also implementation issues. In the above example, Vector4
will be derived from FloatList which is a normal, unlimited sequence,
with dynamic memory allocation, etc. It is not clear how to convert
it to a fixed-length one with pre-allocated buffer. I guess one way
would be to make the base sequence use the buffer provided by the
derived type.
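
As a rough illustration of that last idea, the sketch below shows a
base sequence that works on a buffer supplied by the derived type. It
is a simplified, hypothetical model (float_list, vector4_storage, and
vector4 are made-up classes, not the code XSD/e generates), and it
assumes exceptions are enabled.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical base sequence that does not own its storage: the
// buffer and its capacity are supplied by whoever constructs it.
class float_list
{
public:
  float_list (float* buf, std::size_t capacity)
    : data_ (buf), capacity_ (capacity), size_ (0) {}

  void push_back (float v)
  {
    if (size_ == capacity_)
      throw std::length_error ("float_list: buffer full");
    data_[size_++] = v;
  }

  std::size_t size () const {return size_;}
  float operator[] (std::size_t i) const {return data_[i];}

protected:
  float* data_;
  std::size_t capacity_;
  std::size_t size_;
};

// Holds the pre-allocated buffer. It is listed as the first base of
// vector4 so that it is initialized before float_list.
struct vector4_storage
{
  float buf_[4];

  vector4_storage () {for (std::size_t i = 0; i != 4; ++i) buf_[i] = 0;}
};

class vector4: private vector4_storage, public float_list
{
public:
  vector4 (): float_list (buf_, 4) {}

  // Copy by value: copy the elements but keep pointing at our own
  // buffer rather than at the buffer of the source object.
  vector4 (const vector4& x)
    : vector4_storage (x), float_list (buf_, 4)
  {
    size_ = x.size_;
  }

  vector4& operator= (const vector4& x)
  {
    for (std::size_t i = 0; i != 4; ++i) buf_[i] = x.buf_[i];
    size_ = x.size_;
    return *this;
  }
};
```

The point of the design is that code written against float_list keeps
working with a vector4, while vector4 itself needs no dynamic memory
allocation and can be copied by value.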

BTW, how many types like this do you have?

Samuel Toulouse <stoulouse at me.com> writes:

> I was thinking that in my opinion the rule should be:
> if there is a length or a maxLength facet, then memory size should be  
> reserved and structure should be of fixed size.
> for example, if you declare a string like this:
> 	<complexType name="Struct">
> 		<attribute name="filename" default="">
> 			<simpleType>
> 				<restriction base="string">
> 					<maxLength value="64" />
> 				</restriction>
> 			</simpleType>
> 		</attribute>
> 	</complexType>
>
> then it should generate something like this:
> class Struct {
> private:
> 	char	filename_[64];
> };

I am not sure maxLength should always result in pre-allocation. 
Semantically, maxLength places an upper limit on the number of elements
and doesn't really say how many elements there will typically be.
For example, an implementation may have a limit of 64KB and put 
this value in the maxLength facet while, typically, there won't be
more than, say, 10 elements. In this case we probably won't want to
have 64KB pre-allocated for every instance. On the other hand, if
maxLength is some small number, say 4, then it probably makes sense
to pre-allocate the buffer. Again, I think the strategy here should
be to have a configurable limit that one can set while compiling the
schemas.

The other issue is the string-based types. The length and maxLength
facets for these types are defined in terms of Unicode characters,
while XSD/e stores strings in UTF-8, where a character can take up to
4 bytes. So if you have 64 as maxLength for string, the pre-allocated
buffer should be 64*4 = 256 bytes to accommodate all cases while,
typically, most of that space will be unused. It will also be impossible
to pull off that buffer substitution trick mentioned above with
std::string.
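
The buffer size arithmetic can be sketched as follows. The two helper
functions, utf8_size and utf8_max_buffer, are hypothetical names used
only for this illustration; the byte counts follow from the UTF-8
encoding itself.

```cpp
#include <cstddef>

// Number of bytes UTF-8 needs to encode one Unicode code point.
inline std::size_t
utf8_size (unsigned long cp)
{
  if (cp < 0x80)    return 1; // ASCII
  if (cp < 0x800)   return 2;
  if (cp < 0x10000) return 3;
  return 4;                   // up to U+10FFFF
}

// Worst-case buffer size, in bytes, for a string whose maxLength
// facet is given in characters. The null terminator is not included.
inline std::size_t
utf8_max_buffer (std::size_t max_length)
{
  return max_length * 4;
}
```

For maxLength 64 this gives 256 bytes, while a 64-character ASCII file
name would only use 64 of them.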

So it seems the only types for which such an optimization is practical
are xsd:list and xsd:{hex,base64}Binary. I could try to implement it
for xsd:list if you really need it.

Boris