CLI in C++: CLI Definition Language

This is the seventh installment in the series of posts about designing a Command Line Interface (CLI) parser for C++.

In the last post we decided that the only way for us to achieve the ideal solution is to design our own domain-specific language (DSL) for command line interface definition. And this is what we are going to be doing today.

As I mentioned in one of the previous posts, I like to explore and think things through even though there may be no plans to implement everything in the early versions of the program. This helps to ensure that once I decide to implement more advanced features, they will fit into the existing model and won’t require a complete redesign. So today I am going to try to cover as many features of a CLI definition language as I can come up with. At the end of the post I will narrow the number of features to a much smaller subset that will be supported in the initial version.

Let’s start with the overall design principles. While it would be more expressive to introduce custom keywords, it is more practical to reuse C++ keywords whenever possible. By introducing new keywords we make more identifiers unavailable to the user. For example, we could use options as a keyword:

options foo
{
  ...
};

But then options cannot be used as an identifier and for most applications it is natural to call the options containers options. So it makes sense to use the already reserved class keyword, for example:

class options
{
  ...
};

We should also strive to model the CLI constructs on conceptually similar C++ constructs. For example, the option declaration needs to capture a type, a name, as well as the default value. The closest C++ construct is probably a variable declaration with initialization. For example:

class options
{
  bool --help;
  short --compression = 5;
};

An option can have a number of aliases and the idiomatic way to represent alternatives in C++ is the OR operator (|). So we can extend our option declaration syntax to allow several names:

class options
{
  bool --help|-h;
  short --compression|-c = 5;
};
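
To make the alias syntax a bit more concrete, here is a minimal sketch of how a CLI compiler might break the name part of an option declaration into individual aliases. The function name split_aliases is made up for illustration and is not part of any existing tool; it also assumes the name part has already been extracted from the declaration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (an assumption for illustration): split the name
// part of an option declaration, e.g. "--help|-h", into its aliases.
std::vector<std::string> split_aliases (const std::string& names)
{
  std::vector<std::string> r;
  std::string::size_type b (0);

  for (;;)
  {
    // Find the next separator; npos means this is the last alias.
    std::string::size_type e (names.find ('|', b));
    r.push_back (names.substr (b, e == std::string::npos ? e : e - b));

    if (e == std::string::npos)
      break;

    b = e + 1; // Skip the separator.
  }

  return r;
}
```

With this sketch, split_aliases ("--compression|-c") yields two entries, --compression and -c, and a name without any | yields a single entry.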

It also seems natural to reuse the C++ type system for option types. The fundamental C++ types such as bool or short will be recognized by the CLI compiler. However, the user of our CLI language will most likely also want to use user-defined C++ types. While the CLI compiler may not need to do any type analysis (such as whether the type is actually defined), we need to provide a mechanism for inclusion of such user-defined type definitions into the generated C++ code. The most natural way to do this is to mimic the C++ preprocessor #include mechanism without actually doing any preprocessing. However, if at some later stage we decide to run a preprocessor on the CLI definition files, this choice will cause problems. The next best thing is then to use include without #. Here we have no choice but to introduce a new keyword since there are no existing C++ keywords with a similar meaning. [There is a module proposal for C++ which introduces the import keyword. However, the semantics of this new keyword will be very different from what we are trying to achieve here.] Plus, the use of include as an identifier does not seem very common. Here is an example:

include <string>
include <vector>
 
class options
{
  std::string --name = "foo";
  std::vector<std::string> --names;
};

Since we support user-defined types, the default value can actually be more complex than a single literal initialization, for example:

include <complex>
 
class options
{
  std::complex<float> --value = std::complex<float> (0, -1);
};

While this approach works, it is verbose. We can, therefore, support the construction syntax in addition to the assignment, for example:

include <complex>
 
class options
{
  std::complex<float> --value (0, -1);
};
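
For reference, these are the two plain C++ initialization forms that the two CLI syntaxes mirror. This is just a small sketch (the function name same_default is made up) showing that both forms produce the same value in C++, which is why it seems safe to treat them as equivalent in the CLI language:

```cpp
#include <complex>

// Illustrative only: compare the two C++ initialization forms that the
// CLI default value syntaxes are modeled on.
bool same_default ()
{
  std::complex<float> a = std::complex<float> (0, -1); // Assignment form.
  std::complex<float> b (0, -1);                       // Construction form.
  return a == b;
}
```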

Since we will be generating C++ code that may be used throughout the application, we will need to support namespaces sooner or later. Naturally, we will reuse the namespace C++ keyword:

namespace example
{
  class options
  {
    bool --help|-h;
    short --compression|-c = 5;
  };
}

Another feature that might come in handy in more complex applications is option inheritance. For example, the XSD and XSD/e compilers that I am working on support a number of XML Schema to C++ mappings. Each mapping has some unique command line options but also a large set of common options, for example, --output-dir, --namespace-map, --reserved-name, etc. It makes sense to factor such common options out into a separate option class that is then inherited by each mapping-specific option class. Here is an example:

include <string>
include <vector>
 
class common_ops
{
  std::string --output-dir;
  std::vector<std::string> --namespace-map;
  std::vector<std::string> --reserved-name;
};
 
class cxx_tree_ops: common_ops
{
  bool --generate-serialization;
};
 
class cxx_parser_ops: common_ops
{
  bool --generate-print-impl;
};

Once we start splitting option declarations into several classes, the next thing we will want to do is to place them into different files. And for that to work we will need an inclusion mechanism for CLI definition files.

It would be straightforward to reuse the include keyword that we already use to “include” C++ files. However, there is one problem. Since we are not actually parsing the C++ files but merely including them in the generated C++ code, it will be impossible to know whether we are including a C++ file (which we don’t need to parse) or a CLI file (which we do need to parse). As a result, we will need a way to distinguish between different include types. One way to achieve this would be to introduce a new keyword for CLI inclusion. Or we can add an inclusion type prefix to the file path, similar to the scheme part in URIs. For example:

include <cxx:string>
include "cli:common.cli"
 
class cxx_tree_ops: common_ops
{
  bool --generate-serialization;
};
 
class cxx_parser_ops: common_ops
{
  bool --generate-print-impl;
};

The type prefix approach is preferable because we don’t need to introduce yet another keyword. It also looks more consistent. Since there will most likely be more C++ inclusions than CLI, we should default to C++ when the prefix is not specified.
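
Here is a rough sketch of how the prefix handling could look inside the CLI compiler. The names include_kind, include_path, and classify_include are assumptions made up for this example, and the function expects the text that appears between <> or "" in the include directive. A path without a recognized prefix is treated as a C++ inclusion, as suggested above.

```cpp
#include <string>

// Hypothetical types for illustration; not an existing API.
enum class include_kind { cxx, cli };

struct include_path
{
  include_kind kind;
  std::string path; // Path with the type prefix removed.
};

// Determine the inclusion type from the optional "cxx:" or "cli:" prefix.
include_path classify_include (const std::string& s)
{
  const std::string cxx ("cxx:"), cli ("cli:");

  if (s.compare (0, cli.size (), cli) == 0)
    return {include_kind::cli, s.substr (cli.size ())};

  if (s.compare (0, cxx.size (), cxx) == 0)
    return {include_kind::cxx, s.substr (cxx.size ())};

  // No prefix: default to C++ inclusion.
  return {include_kind::cxx, s};
}
```

One thing this sketch does not address is a C++ header whose path itself happens to start with cli: or cxx:. In that case an explicit cxx: prefix would presumably be needed to disambiguate.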

Other constructs that would be nice to have are comments, using declarations/directives, and typedefs, for example:

include <string>
include <vector>
 
namespace example
{
  using namespace std;
 
  // Application options.
  //
  class options
  {
    typedef vector<string> strings;
 
    string --name = "foo";
    strings --names; /* List of names. */
  };
}

The last big feature that we need to consider is option documentation. In its simplest form we would like to associate a documentation string or two with each option. The first string may provide a short description that is used, for example, in the usage information. The second string may contain a more detailed description for, say, automatic man page generation. The use of {} feels appropriate here, something along these lines:

namespace example
{
  class options
  {
    bool --help|-h {"Show usage and exit."};
    bool --version|-v {"Show version and exit."};
 
    short --compression|-c = 5
    {
      "Set compression level.",
      "Set compression level between 0 (no compression) "
      "and 9 (maximum compression). 5 is the default."
    };
  };
}
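
Note that the detailed description above is split into two adjacent string literals, the same way one would break a long string in C++. As a small sketch of the C++ behavior this syntax borrows (the option_doc structure and compression_doc function are made up for illustration and say nothing about how the generated code will actually look), adjacent literals are concatenated into a single string:

```cpp
#include <string>

// Hypothetical structure holding the two documentation strings.
struct option_doc
{
  std::string brief;    // Short description, e.g., for usage information.
  std::string detailed; // Longer description, e.g., for man pages.
};

option_doc compression_doc ()
{
  return {
    "Set compression level.",
    // Adjacent string literals are joined by the C++ compiler.
    "Set compression level between 0 (no compression) "
    "and 9 (maximum compression). 5 is the default."
  };
}
```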

For applications that need to support multiple languages, a separate file for each language or locale would be appropriate. Such a file would use a special CLI documentation format. Something along these lines:

include "options.cli"
 
namespace example
{
  documentation options ("en-US")
  {
 
    --help {"Show usage and exit."};
    --version {"Show version and exit."};
 
    --compression
    {
      "Set compression level.",
      "Set compression level between 0 (no compression) "
      "and 9 (maximum compression). 5 is the default."
    };
  }
}

Now that we have identified every major feature that could be useful in a CLI definition language, we can try to narrow them down to a set that is minimal but still complete enough to be usable by a typical application. We will then use this set of features for the initial implementation. Here are the core features that I have identified:

  • option class
  • option declaration (without documentation)
  • C++ inclusion
  • namespaces

And here is the list of features to be added in subsequent releases:

  • option inheritance
  • option documentation
  • CLI inclusion
  • using declarations/directives and typedefs
  • comments

Next time we will start thinking about how to map these CLI definitions to C++. As always, if you have any thoughts, feel free to add them in the comments.
