Software Anatomies, part 2: Anatomy of a CLI Program Written in C++

by Matthew Wilson

This article was first published in ACCU's CVu, September 2015. All content copyright Matthew Wilson 2015.

Abstract

This article, the second in a series looking at software anatomy, examines the structure of a small C++ command-line interface (CLI) program in order to highlight what is boilerplate and what is application-specific logic. Based on that analysis, a physical and logical delineation of program contents will be suggested, representing the basis of the design principles of a new library for assisting in the development of CLI programs in C and C++ to be discussed in the third instalment.

Introduction

In the first instalment of this series, Anatomy of a CLI Program written in C [AoCLIC], I considered in some depth the different aspects of structure and coupling of a simple but serious C program. The issues examined included: reading from/writing to input/output streams; failure handling; command-line arguments parsing (including standard flags "--help" and "--version" and application-specific flags); cross-platform compatibility.

The larger issues comprised:

In this second instalment I will consider further these issues, in the context of a small but serious C++ program, with the aim of defining a general CLI application structure that can be applied for all sizes of CLI programs.

Strictly speaking, some of the differences in sophistication and scope between the first instalment and this do not directly reflect the differences between the languages C and C++. Rather, they reflect the different levels of complexity that it's worth considering when deciding in which language to implement a CLI application. I'll come back to this, and point out some rather important differences, in the third instalment.

DADS Separation

Before we start working on the example program, I want to revisit the classification issue. In the first instalment I argued that CLI program code written (or wizard-generated) by the programmer is one of:

To this list I now add a fourth:

In the examples of both instalments, the clearest example of declarative logic is the "aliases" array (interpreted by the CLASP library [CLASP]) that defines which command-line flags and options are understood by the program.
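
The idea of declarative logic can be sketched without reference to any particular library: a table states what the program understands, and a generic routine interprets it. The following is a minimal sketch only; the names flag_spec, KnownFlags, and is_known_flag are hypothetical and are not part of the CLASP API.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for an alias table; this is NOT the CLASP API
struct flag_spec
{
  char const* name; // e.g. "--help"
  char const* help; // text to show in usage
};

// Declarative logic: data stating which flags the program understands
static flag_spec const KnownFlags[] =
{
  { "--help",    "shows this help and terminates" },
  { "--version", "shows version information and terminates" },
};

// Generic interpreter: knows nothing about any particular program
inline
bool
is_known_flag(
  flag_spec const*  specs
, std::size_t       numSpecs
, char const*       arg
)
{
  for(std::size_t i = 0; i != numSpecs; ++i)
  {
    if(0 == std::strcmp(specs[i].name, arg))
    {
      return true;
    }
  }
  return false;
}
```

Adding a flag then means adding a row to the table, not writing more parsing code.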

Example Program: pown

I'm going to keep the code listings as short as possible by focusing on a small program, albeit one with real-world concerns. For the purposes of pedagogy, I ask you to imagine that we need to write a program to show the owner of one or more files on Windows; in reality this is a feature (/Q) of the built-in dir command.

The features/behaviours of such a program include:

  1. parse the command-line, either for the standard "--help" or "--version" flags, or for the path(s) of the file(s) whose owner(s) should be listed;
  2. properly handle the "--" special flag. It's very easy to simulate the problem with naïve command-line argument handling: just create a file called "--help" or "--version" (or the name of any other flags/options), and then run the program in that directory with an argument of * (or *.* on Windows);
  3. expand wildcards on Windows, since its shell does not provide wildcard-expansion before program invocation;
  4. for each value specified on the command-line, attempt to determine its owner and write to standard output stream; if none is specified, fail and prompt the user;
  5. provide contingent reports on all failures, including program identity as prefix (according to UNIX de facto standard).
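
The handling of "--" in item 2 can be sketched as below. This is an illustrative, library-free sketch under my own assumptions: the type parsed_args and the function split_args are hypothetical names, and a real program would delegate this work to a command-line parsing library.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical result type for this sketch; not part of any library
struct parsed_args
{
  std::vector<std::string> flags;  // arguments beginning with '-', seen before "--"
  std::vector<std::string> values; // everything else
};

inline
parsed_args
split_args(
  int                 argc
, char const* const*  argv
)
{
  parsed_args r;
  bool        treatAllAsValues = false;

  for(int i = 1; i < argc; ++i) // argv[0] is the program name
  {
    char const* const arg = argv[i];

    if(treatAllAsValues)
    {
      r.values.push_back(arg);
    }
    else if(0 == std::strcmp(arg, "--"))
    {
      treatAllAsValues = true; // the "--" itself is consumed
    }
    else if('-' == arg[0] && '\0' != arg[1])
    {
      r.flags.push_back(arg);
    }
    else
    {
      r.values.push_back(arg); // includes a lone "-"
    }
  }
  return r;
}
```

With logic of this kind, invoking the program as pown -- * in a directory containing a file named "--help" causes that name to be treated as a path, not as a request for usage information.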

Non-functional behaviour includes:

  1. use diagnostic logging;
  2. initialise diagnostic logging library before all other sub-systems (other than language runtime);
  3. initialise command-line parsing library before all other sub-systems (except diagnostic logging library and language runtime);
  4. incorporate program identity and version information, and include them in output as required;
  5. do not violate DRY SPOT in program identity and version information.

pown.monolith

The first version of this program is done all in one file, as shown in Listing 1. Even with as many editorial elisions as I can manage, such a simple program is remarkably large and a big part of its size is boilerplate.

Listing 1

    // includes
    #include <pantheios/pan.hpp>
    #include <systemtools/clasp/main.hpp>
    #include <pantheios/extras/main.hpp>
    #include <pantheios/extras/diagutil.hpp>
    #include <pantheios/inserters/windows/sc.hpp>
    #include <winstl/filesystem/file_information_functions.h>
    #include <winstl/security/security_functions.h>
    #include <winstl/system/console_functions.h>
    #include <stlsoft/smartptr/scoped_handle.hpp>
    #include <stlsoft/system/environment/functions.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include "MBldHdr.h"

    #include <systemtools/clasp/implicit_link.h>
    #include <pantheios/implicit_link/core.h>
    #include <pantheios/implicit_link/fe.simple.h>
    #include <pantheios/implicit_link/be.WindowsDebugger.h>
    #include <recls/implicit_link.h>

    // aliases
    static
    clasp::alias_t const Aliases[] =
    {
      CLASP_FLAG(NULL, "--help", "shows this help and terminates"),
      CLASP_FLAG(NULL, "--version", "shows version information and terminates"),

      CLASP_ALIAS_ARRAY_TERMINATOR
    };

    // identity
    #define TOOL_NAME  "pown"
    int const          toolVerMajor     = __SYV_MAJOR;
    int const          toolVerMinor     = __SYV_MINOR;
    int const          toolVerRevision  = __SYV_REVISION;
    int const          toolBuildNumber  = __SYV_BUILDNUMBER;
    char const* const  toolToolName     = TOOL_NAME;
    char const* const  toolSummary      = "Example project for Anatomies article series in CVu";
    char const* const  toolCopyright    = "Copyright Matthew Wilson 2015";
    char const* const  toolDescription  = "Prints the owner of one or more files/directories";
    char const* const  toolUsage        = "USAGE: " TOOL_NAME " { --help | --version | <path-1> [ ... <path-N> ] }";
    extern "C" char const PANTHEIOS_FE_PROCESS_IDENTITY[] = "pown";

    // pown
    int
    pown(
      char const* path
    )
    {
      pan::log_DEBUG(
        "pown("
        "path="
      , path
      , ")"
      );

      SECURITY_DESCRIPTOR* psd;
      if(!winstl::file_information::get_SECURITY_DESCRIPTOR(path, OWNER_SECURITY_INFORMATION, &psd))
      {
        DWORD const e = GetLastError();
        pan::log_ERROR(
          "could not obtain file security for '"
        , path
        , "': "
        , pan::windows::sc(e)
        );
        fprintf(
          stderr
        , "%s: could not obtain file security for '%s': %lu\n"
        , TOOL_NAME
        , path
        , e
        );
        return -1;
      }
      else
      {
        stlsoft::scoped_handle<SECURITY_DESCRIPTOR*> scoper1(psd, winstl::file_information::free_SECURITY_DESCRIPTOR);

        PSID psidOwner;
        BOOL ownerDefaulted;
        if(!GetSecurityDescriptorOwner(psd, &psidOwner, &ownerDefaulted))
        {
          DWORD const e = GetLastError();
          pan::log_ERROR(
            "could not obtain owner SID for '"
          , path
          , "': "
          , pan::windows::sc(e)
          );
          fprintf(
            stderr
          , "%s: could not obtain owner SID for '%s': %lu\n"
          , TOOL_NAME
          , path
          , e
          );
          return -1;
        }
        else
        {
          char*   accountName;
          char*   domainName;
          size_t  accountNameLen;
          size_t  domainNameLen;
          SID_NAME_USE  snu;
          if(!winstl::security::lookup_account_SID_info(NULL, psidOwner, &accountName, &accountNameLen, &domainName, &domainNameLen, &snu))
          {
            DWORD const e = GetLastError();
            pan::log_ERROR(
              "could not obtain SID info for '"
            , path
            , "': "
            , pan::windows::sc(e)
            );
            fprintf(
              stderr
            , "%s: could not obtain SID info for '%s': %lu\n"
            , TOOL_NAME
            , path
            , e
            );
            return -1;
          }
          else
          {
            fprintf(
              stdout
            , "'%s' owned by '%.*s\\%.*s'\n"
            , path
            , (int)domainNameLen, domainName
            , (int)accountNameLen, accountName
            );
            winstl::security::free_string(accountName);
            winstl::security::free_string(domainName);
          }
        }
      }
      return 0;
    }

    // main/program entry
    static
    int
    program_main(
      clasp::arguments_t const* args
    )
    {
      // process flags and options
      if(clasp::flag_specified(args, "--help"))
      {
        clasp_usageinfo_t info  = { 0 };
        info.version.major      = toolVerMajor;
        info.version.minor      = toolVerMinor;
        info.version.revision   = toolVerRevision;
        info.version.build      = toolBuildNumber;
        info.toolName           = toolToolName;
        info.summary            = toolSummary;
        info.copyright          = toolCopyright;
        info.description        = toolDescription;
        info.usage              = toolUsage;
        info.param              = stdout;
        info.width              = (int)winstl::get_console_width();
        info.assumedTabWidth    = -4;
        info.blanksBetweenItems = stlsoft::environment_variable_exists("SS_TOOLS_NO_HELP_BLANK_LINES");
        clasp_showHeaderByFILE(args, &info, Aliases);
        clasp_showBodyByFILE(args, &info, Aliases);
        return EXIT_SUCCESS;
      }
      if(clasp::flag_specified(args, "--version"))
      {
        clasp_usageinfo_t info  = { 0 };
        info.version.major      = toolVerMajor;
        info.version.minor      = toolVerMinor;
        info.version.revision   = toolVerRevision;
        info.version.build      = toolBuildNumber;
        info.toolName           = toolToolName;
        info.param              = stdout;
        clasp_showVersionByFILE(args, &info, Aliases);
        return EXIT_SUCCESS;
      }
      clasp::verify_all_flags_and_options_used(args);

      // process values
      if(0 == args->numValues)
      {
        fprintf(
          stderr
        , "%s: no paths specified; use --help for usage\n"
        , TOOL_NAME
        );
        return EXIT_FAILURE;
      }
      { for(size_t i = 0; i != args->numValues; ++i)
      {
        pown(args->values[i].value.ptr);
      }}
      return EXIT_SUCCESS;
    }

    // main/boilerplate
    static
    int
    main_cmdline_(
      int     argc
    , char**  argv
    )
    {
      return clasp::main::invoke(argc, argv, program_main, NULL, Aliases, 0);
    }
    static
    int
    main_memory_leak_trace_(
      int     argc
    , char**  argv
    )
    {
      return ::pantheios::extras::diagutil::main_leak_trace::invoke(argc, argv, main_cmdline_);
    }
    int
    main(
      int     argc
    , char**  argv
    )
    {
      return ::pantheios::extras::main::invoke(argc, argv, main_memory_leak_trace_);
    }
 

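The program source has the following sections, as marked by the comments in Listing 1:

  1. includes (with the implicit-link headers);
  2. aliases;
  3. identity;
  4. pown (the action logic);
  5. main/program entry;
  6. main/boilerplate.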

It's pretty clear that boilerplate is eating space, not to mention effort. Furthermore, structuring source in such a manner is an imposition on programmer visibility (and, I would suggest, happiness).

Note the inclusion of the MBldHdr.h header and use of the symbols __SYV_MAJOR, __SYV_MINOR, etc. (whose names violate the standard's reservation of identifiers containing a double underscore). These are aspects of an extremely old, but still used, mechanism for controlling module version by an external tool, and I include them only to show how such schemes can be used with the proposed anatomical delineation discussed herein.

Separation of Concerns : pown.alone

The first obvious thing is to partition the file. This can be done, at least in part, by identifying what parts conform to the DADS classification. Let's tackle all the identified sections (apart from includes, which is a necessary evil of C and C++ programming):

Given these designations, the parts may now be separated physically according to the scheme I have been evolving over the last few years, as follows:

Salient fragments of all the above are presented in Listing 2. Note that, for now:

Listing 2

    // pown.hpp:

    extern "C"
    int
    pown(
      char const* path
    )
    ;

    // pown.cpp:

    . . .
    #include "pown.hpp"
    #include "identity.hpp"
    . . .
    int
    pown(
      char const* path
    )
    {
      . . .

    // entry.cpp:

    . . .
    #include "pown.hpp"
    #include "identity.hpp"
    . . .
    static
    clasp::alias_t const Aliases[] =
    {
      . . .
    static
    int
    program_main(
      clasp::arguments_t const* args
    )
    {
      . . .
    . . . // other "main"s, including main()

    // identity.hpp:

    #define TOOL_NAME     "pown"

    extern int const          toolVerMajor;
    extern int const          toolVerMinor;
    . . .

    // identity.cpp:

    #include "identity.hpp"
    #include "MBldHdr.h"

    int const          toolVerMajor     = __SYV_MAJOR;
    int const          toolVerMinor     = __SYV_MINOR;
    . . .
    char const* const  toolToolName     = TOOL_NAME;
    char const* const  toolSummary      = "Example project for Anatomies article series in CVu";
    . . .


    // diagnostics.cpp:

    #include "identity.hpp"

    extern "C" char const PANTHEIOS_FE_PROCESS_IDENTITY[] = TOOL_NAME;

    // implicit_link.cpp:

    #include <systemtools/clasp/implicit_link.h>
    #include <fastformat/implicit_link.h>
    #include <pantheios/implicit_link/core.h>
    #include <pantheios/implicit_link/fe.simple.h>
    #include <pantheios/implicit_link/be.WindowsDebugger.h>
    #include <recls/implicit_link.h>
 

"Program Design is Library Design" : pown.alone.decoupled

In the first instalment, I mentioned the importance I attach to being able, as much as is reasonable, to subject the guts of CLI programs to automated testing. As such, separating out the action logic into pown.[ch]pp is an important step. However, there's still a problem. Consider the current definition of pown() (which is that from Listing 1 transplanted into its own source file, with requisite includes): it has three areas of undue coupling:

  1. it writes its results to the hard-coded stream stdout;
  2. it writes its contingent reports to the hard-coded stream stderr;
  3. it depends on the preprocessor symbol TOOL_NAME for program identity.

You may offer a fourth area of undue coupling: use of Pantheios C++ API diagnostic statements. The rejoinder I would offer to that would be an article in itself, so in this context I will simply observe the following. Diagnostic logging is important; it must reliably be available at all points during the lifetime of a program; it must be very efficient when enabled and have negligible runtime cost when not; and it should be near impossible to write defective code using it. Any non-standard (and they are all non-standard) diagnostic logging API will incur some coupling, however far one might wish to abstract it, and there is no (possibility of a) perfect solution. Though I couldn't be more biased, I believe that Pantheios offers the best mix of features and, since it may be stubbed readily at both compile time and link time, I think its coupling is about as benign as coupling can get.

To our three areas of undue coupling. The first two are basically the same thing: the output streams are hard-coded into the function, which restricts potential uses of the function. Even if we would always want those output streams in the wild, hard-coding makes automated testing more difficult. The answer is simple - to pass in the stream pointers as parameters to pown() - though the rationale may be less clear cut (see sidebar).

That just leaves coupling to identity. Fairly obviously, coupling to any preprocessor symbol is not a great idea. (The main reason why TOOL_NAME is even a preprocessor symbol is to facilitate widestring builds, which I'm not dealing with in this instalment; the other, minor one, is that it can be used in composing string fragments, as seen in the definition of the literal string initialiser for toolUsage in Listing 1.) The fix here is just as simple as with the streams: a parameter to the function, as shown in Listing 3.

Listing 3

    #include <stdio.h>
    int
    pown(
      char const* path
    , char const* program_name
    , FILE*       stm_output
    , FILE*       stm_cr
    );
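
With a parameter list like that of Listing 3, an automated test can substitute temporary files for the standard streams and inspect what was written. The sketch below illustrates this under stated assumptions: pown_stub is a hypothetical stand-in with an invented body (the real pown() requires Windows security APIs), and read_all is a hypothetical test helper.

```cpp
#include <cstdio>
#include <string>

// Stand-in with the same parameter list as Listing 3. The body is
// hypothetical: it fails for an empty path so that both streams are used.
inline
int
pown_stub(
  char const* path
, char const* program_name
, FILE*       stm_output
, FILE*       stm_cr
)
{
  if('\0' == path[0])
  {
    std::fprintf(stm_cr, "%s: no path given\n", program_name);
    return -1;
  }
  std::fprintf(stm_output, "'%s' owned by '%s'\n", path, "DOMAIN\\user");
  return 0;
}

// Test helper: rewinds a stream and reads back everything written to it
inline
std::string
read_all(FILE* stm)
{
  std::string s;
  std::rewind(stm);
  for(int ch; EOF != (ch = std::fgetc(stm)); )
  {
    s += static_cast<char>(ch);
  }
  return s;
}
```

A test would obtain two streams from std::tmpfile(), pass them as stm_output and stm_cr, and then compare the result of read_all() on each with the expected text. Nothing is written to the console, and the program name used in contingent reports is whatever the test supplies.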
 

Finally, though it's not shown in this example, I believe it's appropriate to place the action logic library components in a namespace, since it's conceivable that the names may clash with those in an automated framework (less likely) or with those of other components when used in other program contexts (more likely). I'll illustrate this clearly in the next instalment.
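
As a rough sketch of the namespace point (and not the form I will present next time), the action logic might be wrapped as follows. The namespace name pown_logic is hypothetical, and the function bodies are placeholders; the second, global pown() stands for a clashing name supplied by some other component.

```cpp
#include <cstdio>

// Hypothetical namespace name, chosen only for this sketch
namespace pown_logic
{
  // Same parameter list as Listing 3; the body is a placeholder
  inline
  int
  pown(
    char const* path
  , char const* program_name
  , FILE*       stm_output
  , FILE*       stm_cr
  )
  {
    (void)program_name;
    (void)stm_cr;
    std::fprintf(stm_output, "%s\n", path);
    return 0;
  }
}

// A same-named, same-signature function from elsewhere (say, a test
// framework or another component) can now coexist without a clash
inline
int
pown(
  char const*
, char const*
, FILE*
, FILE*
)
{
  return 42;
}
```

Callers select the action logic explicitly, as pown_logic::pown(path, name, stdout, stderr), while an unqualified call at global scope resolves to the other function.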

Summary

In these two articles I have considered some of the fundamental - important, but not very exciting - aspects of program structure in C and C++ CLI programs, and have outlined in this instalment a delineation scheme that is now sufficient for all CLI programs, even large ones for which multiple (implementation and/or header) files for action logic are required, and may be encapsulated into a framework and/or generated by a wizard. Program generating wizards can follow the separation defined previously, and can, in the same operation, generate automated test client programs that include the action logic header and implementation files.

There's nothing inherent in the scheme that requires use of CLASP for command-line parsing and Pantheios for diagnostic logging (and Pantheios.Extras.Main and Pantheios.Extras.DiagUtil for handling initialisation, outer-scope exception-handling, and memory-leak tracing); you may substitute your own preferences to suit, and a well-written wizard would be able to allow you to select whatever base libraries you require.

In the next instalment I will introduce a new library, libCLImate, which is a flexible mini-framework for assisting with the boilerplate of any command-line programs and which may be used alone or in concert with program suite-specific libraries to almost completely eliminate all the boring parts of CLI programming in C or C++. Listing 4 is a version of the exemplar pown project's entry.cpp using libCLImate alone; Listing 5 is the entry.cpp for the pown program in Synesis' system tools program suite: as you can see, almost every line pertains to the specific program, rather than any common boilerplate. (Having written this tool as an exemplar for this article I realised a few enhancements - adding some behaviour options, splitting into ownership retrieval and output functions - would make pown()'s functionality a useful library in several tools, including a new, more powerful standalone pown.)

Listing 4

    // includes
    #include "pown.hpp"
    #include "identity.hpp"
    #include <libclimate/main.hpp>
    #include <stlsoft/system/environment/functions.h>
    #include <stdio.h>
    #include <stdlib.h>

    // aliases
    extern "C"
    clasp::alias_t const libCLImate_aliases[] =
    {
      CLASP_FLAG(NULL, "--help", "shows this help and terminates"),
      CLASP_FLAG(NULL, "--version", "shows version information and terminates"),

      CLASP_ALIAS_ARRAY_TERMINATOR
    };

    // main / program entry
    extern "C++"
    int
    libCLImate_program_main(
      clasp::arguments_t const* args
    )
    {
      namespace sscli = ::SynesisSoftware::CommandLineInterface;
      if(clasp::flag_specified(args, "--help"))
      {
        return sscli::show_usage(args, libCLImate_aliases, stdout, toolVerMajor, toolVerMinor, toolVerRevision, toolBuildNumber, toolToolName, toolSummary, toolCopyright, toolDescription, toolUsage, stlsoft::environment_variable_exists("SS_TOOLS_NO_HELP_BLANK_LINES"));
      }
      if(clasp::flag_specified(args, "--version"))
      {
        return sscli::show_version(args, libCLImate_aliases, stdout, toolVerMajor, toolVerMinor, toolVerRevision, toolBuildNumber, toolToolName);
      }
      clasp::verify_all_flags_and_options_used(args);

      if(0 == args->numValues)
      {
        fprintf(
          stderr
        , "%s: no paths specified; use --help for usage\n"
        , TOOL_NAME
        );
        return EXIT_FAILURE;
      }
      { size_t i; for(i = 0; i != args->numValues; ++i)
      {
        if(0 != pown(args->values[i].value.ptr, TOOL_NAME, stdout, stderr))
        {
          break;
        }
      }}
      return EXIT_SUCCESS;
    }
 

Listing 5

    // includes
    #include "pown.hpp"
    #include "identity.h"
    #include <SynesisSoftware/SystemTools/program_identity_globals.h>
    #include <SynesisSoftware/SystemTools/standard_argument_helpers.h>
    #include <systemtools/clasp/clasp.hpp>
    #include <stlsoft/util/bits/count_functions.h>

    using namespace ::SynesisSoftware::SystemTools::tools::pown;

    // aliases
    extern "C"
    clasp::alias_t const libCLImate_aliases[] =
    {
      // stock
      SS_SYSTOOLS_STD_FLAG_help(),
      SS_SYSTOOLS_STD_FLAG_version(),

      // logic
      CLASP_BIT_FLAG("-a", "--show-account", POWN_F_SHOW_ACCOUNT, "shows owner account. By default, this is shown only when multiple paths are to be powned"),
      CLASP_BIT_FLAG("-d", "--show-domain", POWN_F_SHOW_DOMAIN, "shows owner account domain. By default, this is shown only when multiple paths are to be powned"),

      CLASP_BIT_FLAG("-r", "--show-file-rel-path", POWN_F_SHOW_FILE_REL_PATH,  "shows file relative path, By default, no path is shown for a single powned path, and relative paths are shown for multiple powned paths"),
      CLASP_BIT_FLAG("-s", "--show-file-stem", POWN_F_SHOW_FILE_STEM,      "shows the file stem only. By default, . . .
      CLASP_BIT_FLAG("-p", "--show-file-path", POWN_F_SHOW_FILE_PATH, "shows the file full path. By default, . . .

      CLASP_ALIAS_ARRAY_TERMINATOR
    };

    // main
    int
    tool_main_inner(
      clasp::arguments_t const* args
    )
    {
      // process flags & options
      int flags = 0;
      clasp_checkAllFlags(args, SSCLI_aliases, &flags);
      clasp::verify_all_options_used(args);

      // can specify at most one file-path flag
      if(stlsoft::count_bits(flags & POWN_F_SHOW_FILE_MASK_) > 1)
      {
        fprintf(
          stderr
        , "%s: cannot specify more than one file-path flag; use --help for usage\n"
        , systemtoolToolName
        );
        return EXIT_FAILURE;
      }

      // process values
      switch(args->numValues)
      {
        case 0:
          fprintf(
            stderr
          , "%s: no paths specified; use --help for usage\n"
          , systemtoolToolName
          );
          return EXIT_FAILURE;
        case 1:
          break;
        default:
          if(0 == (POWN_F_SHOW_FILE_MASK_ & flags))
          {
            flags |= POWN_F_SHOW_FILE_REL_PATH;
          }
          break;
      }
      { size_t i; for(i = 0; i != args->numValues; ++i)
      {
        char const* const path = args->values[i].value.ptr;
        if(0 != pown(path, flags, systemtoolToolName, stdout, stderr))
        {
          break;
        }
      }}
      return EXIT_SUCCESS;
    }
 

In the meantime, I plan to release wizards that generate CLI programs, starting with Visual Studio (2010-15), and possibly moving on to Xcode if I get time. Look out on the Synesis Software website in September for these resources, and feel free to make requests or lend a hand.

Acknowledgements

Many thanks to the members of accu-general who volunteered suggestions for the name of libCLImate, and to Jonathan Wakely in particular, whose ghastly pun I will explain next time. Thanks too to the long-suffering editor whose patience with my lateness is never taken for granted.

Author Bio

Matthew is a software development consultant and trainer for Synesis Software who helps clients to build high-performance software that does not break, and an author of articles and books that attempt to do the same. He can be contacted at matthew@synesis.com.au.

References

[AoCLIC] Anatomy of a CLI Program written in C, Matthew Wilson, CVu September 2012.

[AoUP] Art of UNIX Programming, Eric Raymond, Addison-Wesley, 2003

[C++PS] C++ Coding Standards, Herb Sutter and Andrei Alexandrescu, Addison-Wesley, 2004

[CLASP] An Introduction to CLASP, part 1: C, Matthew Wilson, CVu January 2012; also http://sourceforge.net/projects/systemtools

[EAM] http://c2.com/cgi/wiki?ExecuteAroundMethod

[FF-1] An Introduction to FastFormat, part 1: The State of the Art, Matthew Wilson, Overload #89, February 2009

[FF-2] An Introduction to FastFormat, part 2: Custom Argument and Sinks Types, Matthew Wilson, Overload #90, April 2009

[FF-3] An Introduction to FastFormat, part 3: Solving Real Problems, Quickly, Matthew Wilson, Overload #91, June 2009

[IC++] Imperfect C++, Matthew Wilson, Addison-Wesley, 2004

[PragProg] The Pragmatic Programmer, Dave Thomas and Andy Hunt, Addison-Wesley, 2000

[PANTHEIOS] http://pantheios.org/

[QM-6] Quality Matters #6: Exceptions for Practically-Unrecoverable Conditions, Matthew Wilson, Overload #98, August 2010