[xsd-users] C++11, value semantics, and polymorphism

Fri Feb 22 08:57:02 EST 2013

> There are so many issues/complications with this approach that I don't
> even know where to begin (inheritance, type discovery, code bloat,
> customizability, etc, etc). I don't think we will go this route.

I put together a prototype that addresses most of these concerns. For polymorphic types, it uses two parallel class hierarchies. One has "type" at its root, which holds a pointer to "impl", the root of the other hierarchy. The impl classes are the real classes with the real data members, following the normal Code Synthesis pattern. The type classes are the ones the developer uses: each has all the member functions of its impl class, implemented as inline functions that forward to the impl. Parsing functions create impl objects and return type objects.

A downcast<> function template can cast a type to a different type, performing the equivalent dynamic_cast<> on the impl pointers.
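
To make the type/impl split and downcast<> concrete, here is a minimal sketch. It is not the prototype's code and not what XSD generates: the class names (shape, circle, shape_impl, circle_impl), the clone() member, and the impl_type typedef are all assumptions made for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <typeinfo>
#include <utility>

// All names here are hypothetical; none come from XSD-generated code.

// impl hierarchy: the real, polymorphic classes that own the data.
class shape_impl {
public:
    explicit shape_impl(std::string n) : name_(std::move(n)) {}
    virtual ~shape_impl() = default;
    virtual std::unique_ptr<shape_impl> clone() const {
        return std::unique_ptr<shape_impl>(new shape_impl(*this));
    }
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

class circle_impl : public shape_impl {
public:
    circle_impl(std::string n, double r) : shape_impl(std::move(n)), radius_(r) {}
    std::unique_ptr<shape_impl> clone() const override {
        return std::unique_ptr<shape_impl>(new circle_impl(*this));
    }
    double radius() const { return radius_; }
private:
    double radius_;
};

// type hierarchy: value-like handles with inline forwarding, no virtuals.
class shape {
public:
    using impl_type = shape_impl;
    explicit shape(std::string n) : impl_(new shape_impl(std::move(n))) {}
    // Copying the handle deep-copies the impl, preserving its dynamic type.
    shape(const shape& o) { if (o.impl_) impl_ = o.impl_->clone(); }
    shape(shape&&) noexcept = default;
    shape& operator=(shape o) noexcept { impl_ = std::move(o.impl_); return *this; }
    const std::string& name() const { assert(impl_ != nullptr); return impl_->name(); }
protected:
    explicit shape(std::unique_ptr<shape_impl> p) : impl_(std::move(p)) {}
    const shape_impl& impl() const { return *impl_; }
private:
    std::unique_ptr<shape_impl> impl_;
    template <typename To> friend To downcast(const shape&);
};

class circle : public shape {
public:
    using impl_type = circle_impl;
    circle(std::string n, double r)
        : shape(std::unique_ptr<shape_impl>(new circle_impl(std::move(n), r))) {}
    double radius() const { return static_cast<const circle_impl&>(impl()).radius(); }
private:
    explicit circle(std::unique_ptr<shape_impl> p) : shape(std::move(p)) {}
    template <typename To> friend To downcast(const shape&);
};

// downcast<>: the equivalent of dynamic_cast<>, applied to the impl pointer.
// Throws std::bad_cast if the impl is not of the requested kind.
template <typename To>
To downcast(const shape& from) {
    const typename To::impl_type* p =
        dynamic_cast<const typename To::impl_type*>(from.impl_.get());
    if (!p) throw std::bad_cast();
    return To(p->clone());
}

// Usage: the handle can be copied as a plain shape without losing the dynamic type.
inline double roundtrip_radius() {
    circle c("c", 2.0);
    shape s = c;                        // handle is copied; impl is cloned as circle_impl
    circle back = downcast<circle>(s);  // dynamic_cast<> happens on the impl pointer
    return back.radius();
}
```

In this sketch downcast<> returns a copy rather than a reference, which keeps everything by value at the cost of one clone; a real design might well choose differently.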

The upshot is that a developer who does not want to know which types are polymorphic can have that fact completely hidden. Our use of Code Synthesis often involves constructing objects from other objects, so we end up with code sort of like this:

A a(B("string", C(D("string", value), "string", "string")));

except that some of those types are polymorphic. Right now, we can create all the intermediate objects dynamically and sling auto_ptr<>s around to avoid excess copying. With C++11, we want to use value semantics where possible, but that forces the user to know which types are polymorphic and which are not, so we end up with the following:

A a(B("string", new C(D("string", value), "string", "string")));

It seems like a minor point until you sit down with a couple dozen schemas and a hundred types, and you need to know which few types are polymorphic roots and which ones belong to substitution groups. You need to keep the schemas open on another monitor just to write your code.

With the type/impl classes hiding the polymorphism, the user no longer needs to know. The user can write everything as value classes, pass objects by value, return objects by value, and everything just works. Value types are implemented as values; polymorphic classes are implemented with pimpls.

The only time it matters to the user of a type is when code needs to downcast. The developer knows the type is polymorphic, so there is no problem exposing polymorphism. The only cost is cognitive, using downcast<> instead of dynamic_cast<>.

The other time the type hackery is exposed is when the developer needs to customize a type. In that case, the type and the impl need to be customized; although I imagine that in many cases, one would want to customize only one or the other. Nonetheless, one must assume that the cost of customizing a type is doubled.

The code bloat in executable code is minimal because the type classes are purely inline, with no virtual functions. The one<>, optional<>, and sequence<> templates are different, but there are still only two basic specializations: polymorphic and value. The value specialization uses a template I wrote that stores the value in std::aligned_storage<>, using placement new to construct the object when present, or leaving the buffer uninitialized when absent. An additional byte keeps track of whether the object is initialized. Thus, the normal semantics of these templates is preserved when fund=true.
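
The value specialization's storage can be sketched roughly as follows. This is a simplified illustration, not the template from the prototype: the name value_slot and its member functions are hypothetical. Note that std::aligned_storage<> is deprecated as of C++23; an alignas(T) unsigned char array is the usual replacement.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical name and interface; a sketch of optional-like in-place storage.
template <typename T>
class value_slot {
public:
    value_slot() noexcept : present_(false) {}
    value_slot(const T& v) : present_(false) { set(v); }
    value_slot(const value_slot& o) : present_(false) {
        if (o.present_) set(o.get());
    }
    value_slot(value_slot&& o) : present_(false) {
        if (o.present_) set(std::move(o.get()));
    }
    value_slot& operator=(const value_slot& o) {
        if (this != &o) {
            reset();
            if (o.present_) set(o.get());
        }
        return *this;
    }
    ~value_slot() { reset(); }

    bool present() const noexcept { return present_; }

    // Construct a T in the buffer with placement new, destroying any old value.
    // Caveat: this sketch does not guard against s.set(s.get()).
    template <typename... Args>
    void set(Args&&... args) {
        reset();
        ::new (static_cast<void*>(&buf_)) T(std::forward<Args>(args)...);
        present_ = true;
    }

    // Destroy the value, if any; the buffer is then treated as uninitialized.
    void reset() noexcept {
        if (present_) {
            get().~T();
            present_ = false;
        }
    }

    T& get() {
        assert(present_);
        return *reinterpret_cast<T*>(&buf_);
    }
    const T& get() const {
        assert(present_);
        return *reinterpret_cast<const T*>(&buf_);
    }

private:
    // Raw, suitably aligned bytes; no T exists here until set() is called.
    typename std::aligned_storage<sizeof(T), alignof(T)>::type buf_;
    bool present_;  // the "additional byte" that tracks initialization
};
```

With something like this, value_slot<std::string> s; s.set("hello"); performs no allocation beyond whatever std::string itself does, and an absent value costs only the flag.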

In the end, we decided not to follow the route of this prototype. We decided that the complexity costs in Code Synthesis were too high for the modest benefit gained, especially because we would be maintaining these considerable changes to Code Synthesis on our own. But the prototype was interesting to write and I can see the value of hiding polymorphism in certain cases.

Ray Lischner,
Distinguished Member of Technical Staff
133 National Business Pkwy, Ste 150     t. 443.539.3448
Annapolis Junction, MD 20701            c. 410.854.9787
rlischner at proteuseng.com                f. 443.539.3370
________________________________________
From: Boris Kolpackov [boris at codesynthesis.com]
Sent: Thursday, February 14, 2013 10:32 AM
To: Ray Lischner
Cc: xsd-users at codesynthesis.com
Subject: Re: [xsd-users] C++11, value semantics, and polymorphism

Hi Ray,

Ray Lischner <rlischner at proteuseng.com> writes:

> We are jumping head first into C++11 and very much want to take full
> advantage of value semantics. We want to be able to construct most Code
> Synthesis objects without resorting to any dynamic memory allocation.
> Unless there are plans for Code Synthesis 4 to support this functionality,
> we will undertake the work ourselves.

Yes, we do plan C++11 support, including move support, for the next release.
While it would be great to have someone give this a try before the final
release, the bad news is that I will only have something to try probably
around end-June. Until then my schedule is very tight.

> It is a straightforward matter to modify xsd::cxx::tree::one<>,
> optional<>, and sequence<> to support value semantics as long as
> the types are not polymorphic.

Right, we will have to have two different versions of these. In fact,
we already do, for fundamental and user-defined types. In C++11 mode
we will just use the by-value version for all non-polymorphic types.

> But we need to support polymorphism, too. That's where we run into a
> hitch. I want the parsing functions to return values for value classes,
> but they must return pointers for polymorphic classes. I certainly
> don't want to force the user of a class to write different code
> depending on whether it is polymorphic,

I don't think this would be forcing anybody. Down the line, polymorphic
and non-polymorphic roots will be handled in very different ways. With
polymorphic, there will probably be a dynamic_cast to discover the
actual type (try supporting this with your pimpl approach).

Generally, polymorphism and by-value are in conceptual contradiction.
By-value assumes the size of the object is known and fixed (so stack
can be allocated for it). Polymorphism assumes we don't even know
which object it is, let alone its size. By-value is: here is a chunk
of memory, construct object foo there. Polymorphism is: this chunk of
memory contains some object that is-a foo. Using dynamic memory to
return a polymorphic object is the "C++ way" (think of the virtual
destructor mechanism -- it was augmented to call the correct operator
delete). Well, that was a bit of philosophical aside ;-).

Now, in ODB, for example, we have two versions of the database::load()
function. One which returns a (dynamically-allocated) object and one
that can load the state into an existing instance:

person p;
db.load (p, "John Doe");

This works well for when you want to avoid allocations. We could do
this with parsing functions, but I am not even sure it makes sense:
How will you know which object is being parsed? The only way to do
this generally is to examine the element name (or xsi:type). But
then, seeing that you already have the DOM element, you can just
call the parsing constructor directly and place the object anywhere
you want.

I guess this could still be useful for situations like "the response
to this message can only be class foo".

> so the only solution I see is for xsd to generate a pimpl wrapper
> for polymorphic classes to hide the difference.

There are so many issues/complications with this approach that I don't
even know where to begin (inheritance, type discovery, code bloat,
customizability, etc, etc). I don't think we will go this route.

> "is there any other way to support value semantics and polymorphism?"

I think we should just keep it simple and return values by-value and
use pointers/dynamic memory for polymorphic objects. Parsing functions
will return std::unique_ptr for polymorphic roots.

Boris
