[odb-users] Automatic generation of C++ classes from database schema

Fri Jan 31 00:11:34 EST 2014

Hi All,

There seems to be quite a bit of interest in being able to
automatically generate C++ classes from the database schema.
However, this is a fairly "hairy" feature in the sense that
there are a lot of unclear/complex aspects that need to be
better understood. This is especially so since we are trying
to design a general tool.

The goal of this thread is to try and flesh out an overall
design for this feature based on experience and use-cases.
So if you have some ideas or a need for this functionality,
feel free to chime in.

I've been thinking about this on and off for a couple of
years now and here is an initial list of things that I
believe we need to consider/discuss. Note also that not
all of these features/ideas will be implemented in the
first version (or even ever). However, it is a good
idea to think through them to a certain level in order
to understand how everything fits (or will fit) together.

* What is the input to this tool? It can be an .sql file
  (dump from the database or manually created/maintained).
  Or it could be programmatically retrieved from a running
  database instance.

  The .sql approach feels cleanest to me but the complexity
  of parsing SQL is probably too much (don't believe me?
  check the Oracle SQL reference ;-)).

  The programmatic approach is probably the most practical
  even though it has a number of serious drawbacks (like
  the need to connect to a running database). Also, most
  likely it will be a separate tool that connects to the
  database and extracts the schema since we cannot link
  the ODB compiler to every database API library. So we
  need some kind of an intermediate format that the tool
  can produce and the ODB compiler can read. The XML
  format that we already have for the schema evolution
  sounds like a good candidate.

  Other things to consider in this area:

  - A way to limit the list of tables considered.

  - Do we use the ODB runtimes to access databases or
    should we just use the C APIs? Runtimes are
    not that convenient for manual database access
    though we could probably improve that. Also, for
    cases where we need to run plain SQL queries (as
    opposed to a special-purpose C API), we could even
    use ODB (views, etc).

  - We could make the ODB compiler call the extraction
    tool automatically and pipe the output to it.
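
To make the "intermediate format" and "limit the list of
tables" points a bit more concrete, here is a rough sketch of
what the extraction tool might hold in memory before writing
the schema out. The table/column structs and filter_tables()
are made up for illustration; they are not existing ODB
interfaces and say nothing about the actual XML layout.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical in-memory schema model produced by the
// extraction tool (names are assumptions, not ODB types).
struct column
{
  std::string name;
  std::string type; // Database type, e.g., "VARCHAR(255)".
  bool null;        // True if the column allows NULL.
};

struct table
{
  std::string name;
  std::vector<column> columns;
};

// Keep only the tables whose names fully match at least one
// of the user-supplied regex patterns.
std::vector<table>
filter_tables (const std::vector<table>& all,
               const std::vector<std::string>& patterns)
{
  std::vector<std::regex> res;
  for (const std::string& p: patterns)
    res.emplace_back (p);

  std::vector<table> r;
  for (const table& t: all)
  {
    for (const std::regex& re: res)
    {
      if (std::regex_match (t.name, re))
      {
        r.push_back (t);
        break;
      }
    }
  }
  return r;
}
```

With a pattern such as "person.*" this would keep person and
person_emails but drop an unrelated audit_log table.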

* What is the output of the tool?

  - File per class? File per schema? Something in-between?
    For large schemas, the file-per-schema approach is not
    going to scale, especially when the database support
    code generated by ODB is concerned. The file per class
    approach can also get unwieldy very quickly for a large
    number of classes. We have the same problem in XSD
    (may end up with a couple of thousand source files).
    It is manageable but not pretty.

    The in-between solution is to somehow allow the user
    to specify how to group classes into files (e.g.,
    all related classes in a single file).
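
One way the in-between solution could work, as a minimal
sketch: the user supplies groups (file name to class names)
and anything not listed falls back to file-per-class. The
file_groups type and assign_files() are hypothetical names
used only for this example.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical user-specified grouping: file name (without
// extension) -> names of the classes that go into that file.
using file_groups =
  std::map<std::string, std::vector<std::string>>;

// Return file -> classes. A class that is not listed in any
// group gets its own file named after the class.
file_groups
assign_files (const std::vector<std::string>& classes,
              const file_groups& groups)
{
  // Build the class -> file index. If a class is listed in
  // several groups, the group whose file name sorts first
  // wins since emplace() does not overwrite.
  std::map<std::string, std::string> owner;
  for (const auto& g: groups)
    for (const std::string& c: g.second)
      owner.emplace (c, g.first);

  file_groups r;
  for (const std::string& c: classes)
  {
    auto i (owner.find (c));
    r[i != owner.end () ? i->second : c].push_back (c);
  }
  return r;
}
```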

* Intended uses: "rough draft" or "data access".

  What happens if/when the schema changes? Does the user
  re-generate the classes or update them manually?

  In other words, is this feature going to generate classes
  that are the "rough draft" and the user can fill them in
  with customizations (e.g., functions) or are they only for
  "data access" (i.e., don't have anything other than
  accessors and modifiers)?

  The problem with the "rough draft" approach is that
  re-generating the classes after a schema change will
  lose those customizations.

  The problem with the "data access" approach is that
  no functionality/logic can be added to the generated
  classes.

  We will probably have to support both use-cases.

* Support for customization?

  There are some options for supporting the customization of
  the generated classes though none of them are particularly
  elegant.

  We could also consider doing the unspeakable and extracting
  user customizations from the C++ header files. The only
  reason why I am even bringing this option up is because we
  are parsing these files as C++ anyway (during the database
  support code generation). The user will still have to mark the
  regions (e.g., with pragmas which ODB could pre-insert
  for each class) so it could be brittle (if you make your
  changes in the wrong place, they will be gone). Though
  there doesn't seem to be anything better.
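
To illustrate the marked-regions idea, here is a minimal
sketch that carries user code from the old header over to a
freshly generated one. It assumes comment markers of the form
"// user-code-begin <id>" and "// user-code-end" instead of
pragmas, simply to keep the example short; the marker syntax
and both function names are invented for this example.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical region markers (an assumption of this sketch).
const std::string begin_mark ("// user-code-begin ");
const std::string end_mark ("// user-code-end");

using regions = std::map<std::string, std::string>; // id -> body

// Strip leading spaces and tabs.
static std::string
trim_left (const std::string& l)
{
  std::string::size_type p (l.find_first_not_of (" \t"));
  return p == std::string::npos ? std::string () : l.substr (p);
}

// Collect the text between the markers in the old header.
regions
extract_regions (const std::string& old_text)
{
  regions r;
  std::istringstream is (old_text);
  std::string l, id;
  bool in (false);

  while (std::getline (is, l))
  {
    std::string t (trim_left (l));

    if (!in && t.compare (0, begin_mark.size (), begin_mark) == 0)
    {
      id = t.substr (begin_mark.size ());
      r[id]; // Create the entry even if the region is empty.
      in = true;
    }
    else if (in && t == end_mark)
      in = false;
    else if (in)
      r[id] += l + '\n';
  }
  return r;
}

// Copy the new header, replacing the body of each region for
// which we have saved user code. Regions without a saved body
// are copied as generated.
std::string
merge_regions (const std::string& new_text, const regions& saved)
{
  std::ostringstream os;
  std::istringstream is (new_text);
  std::string l;
  bool skip (false);

  while (std::getline (is, l))
  {
    std::string t (trim_left (l));

    if (!skip && t.compare (0, begin_mark.size (), begin_mark) == 0)
    {
      os << l << '\n';
      regions::const_iterator i (
        saved.find (t.substr (begin_mark.size ())));

      if (i != saved.end ())
      {
        os << i->second;
        skip = true; // Drop the freshly generated body.
      }
    }
    else if (skip && t == end_mark)
    {
      os << l << '\n';
      skip = false;
    }
    else if (!skip)
      os << l << '\n';
  }
  return os.str ();
}
```

The brittleness is easy to see here: anything the user writes
outside a begin/end pair never makes it into the saved regions
and is gone after the merge.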

* Basic types mapping (string, containers, smart pointers)

  Different users will want different basic types to be used
  in their generated classes (std::string, QString, etc).
  In a sense, this is a reverse mapping of what ODB currently
  does: C++ type to database type. What we need is a database
  type to C++ type mapping. The big question is how and where
  it is specified.

  It would also be nice if this somehow tied in with profiles.
  That is, if the user specifies -p qt, then ODB would use
  Qt types (QString, Qt smart pointers, Qt containers, etc)
  in the generated C++ classes automatically.
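
A minimal sketch of how such a reverse mapping could be looked
up, assuming a simple precedence of user overrides over
profile defaults. The table contents, the function names, and
the precedence rule are all assumptions for illustration, not
something ODB does today.

```cpp
#include <cctype>
#include <map>
#include <string>

// Database base type (upper case) -> C++ type name.
using type_map = std::map<std::string, std::string>;

// Reduce, e.g., "varchar(255)" to "VARCHAR".
std::string
base_type (const std::string& db_type)
{
  std::string r (db_type.substr (0, db_type.find ('(')));

  while (!r.empty () && r.back () == ' ')
    r.pop_back ();

  for (char& c: r)
    c = static_cast<char> (
      std::toupper (static_cast<unsigned char> (c)));

  return r;
}

// Hypothetical built-in defaults, adjusted by the profile.
type_map
profile_types (const std::string& profile)
{
  type_map m;
  m["INT"] = "int";
  m["BIGINT"] = "long long";
  m["DOUBLE"] = "double";
  m["TEXT"] = "std::string";
  m["VARCHAR"] = "std::string";

  if (profile == "qt")
  {
    m["TEXT"] = "QString";
    m["VARCHAR"] = "QString";
  }
  return m;
}

// Look in the user overrides first, then in the profile map.
// Return an empty string if the type is not mapped.
std::string
cxx_type (const std::string& db_type,
          const type_map& profile,
          const type_map& user)
{
  const std::string b (base_type (db_type));

  type_map::const_iterator i (user.find (b));
  if (i != user.end ())
    return i->second;

  i = profile.find (b);
  return i != profile.end () ? i->second : std::string ();
}
```

This leaves the "how and where is it specified" question open;
the user map could just as well come from command line options
or from an annotations file.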

* Mapping for relationships, containers, (polymorphic)
  inheritance.

  This one is hard. ODB would somehow need to recognize
  certain patterns and map them to relationships, containers,
  etc. It may also need user guidance (see mapping
  customization/annotations).

  Generally, there are a lot more ways to structure
  these things (relationships, containers, inheritance)
  in relational databases than in C++ so for more esoteric
  cases there might not even be a sensible mapping. What
  would be nice is to come up with a general mechanism
  that would allow the user to specify the mapping for such
  cases. The big problem, of course, is that it can become
  so complex (see Hibernate and their relationship mapping)
  as to be completely unusable.

  An alternative could be to only support the straightforward
  cases and map the rest to plain objects for the user to
  deal with (i.e., one will be able to access the data but
  working with it won't be very convenient).
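
To give an idea of what "recognizing certain patterns" might
look like for the straightforward cases, here is a deliberately
naive sketch. The heuristics, the rel_table/rel_column structs,
and the mapping enum are assumptions made up for this example;
a real implementation would need much more information (unique
constraints, composite keys, etc) and user guidance.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified view of a table for classification.
struct rel_column
{
  std::string name;
  bool primary_key;
  std::string references; // Referenced table, empty if not a
                          // foreign key.
};

struct rel_table
{
  std::string name;
  std::vector<rel_column> columns;
};

enum class mapping
{
  plain_object,   // No recognizable pattern.
  object_pointer, // Object with to-one pointer(s).
  container,      // Values owned by the referenced object.
  many_to_many    // Link table between two objects.
};

mapping
classify (const rel_table& t)
{
  std::size_t fk (0), pk (0);
  for (const rel_column& c: t.columns)
  {
    if (!c.references.empty ()) ++fk;
    if (c.primary_key) ++pk;
  }

  // Exactly two columns, both foreign keys: link table.
  if (t.columns.size () == 2 && fk == 2)
    return mapping::many_to_many;

  // One foreign key and no primary key: container of values.
  if (fk == 1 && pk == 0)
    return mapping::container;

  // Own primary key plus foreign key(s): object with pointers.
  if (fk != 0 && pk != 0)
    return mapping::object_pointer;

  // Everything else is left as a plain object.
  return mapping::plain_object;
}
```

Anything that does not fit one of the first three rules ends
up as a plain object, which is the fallback described above.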

* Mapping customization/annotations.

  Where and how is it specified?

  Things that the user may want to specify:

  - which tables to map
  - how to map tables (container, poly-inheritance, etc)
  - column type mapping

* Naming convention used in the generated classes.

  We have licked this problem nicely in XSD. The idea is
  to use a set of regex patterns to transform names to
  conform to a specific naming convention. XSD comes
  with a set of predefined patterns (K&R, Camel Case,
  and Java). The user can "adjust" one of these with
  a few regexes of their own or can create a completely
  custom naming convention. We should most likely just
  use the same mechanism since it seems to work great.

  Probably should also make spacing/indentation adjustable,
  especially if the user is expected to add their code to
  the generated files (see customization).
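
As a rough illustration of the regex-based approach (this is
not the XSD implementation or its option syntax), the sketch
below applies the first matching pattern/replacement rule to a
database name and then changes its case. The name_rule type
and both functions are hypothetical.

```cpp
#include <cctype>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical rule: pattern and replacement (ECMAScript
// format, so $1 refers to the first capture group).
using name_rule = std::pair<std::regex, std::string>;

// Apply the first rule whose pattern matches the whole name.
// Return the name unchanged if no rule matches.
std::string
transform_name (const std::string& n,
                const std::vector<name_rule>& rules)
{
  for (const name_rule& r: rules)
    if (std::regex_match (n, r.first))
      return std::regex_replace (n, r.first, r.second);

  return n;
}

// Convert snake_case to UpperCamelCase.
std::string
upper_camel (const std::string& n)
{
  std::string r;
  bool up (true);

  for (char c: n)
  {
    if (c == '_')
    {
      up = true;
      continue;
    }

    r += up
      ? static_cast<char> (
          std::toupper (static_cast<unsigned char> (c)))
      : c;
    up = false;
  }
  return r;
}
```

For example, a rule with the pattern "tbl_(.+)" and the
replacement "$1" would strip a table prefix before the case
conversion is applied.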