Quality Matters, #2: Correctness, Robustness, and Reliability

by Matthew Wilson

This column instalment first published in ACCU's Overload, #93, October 2009. All content copyright Matthew Wilson 2009-2010.


In the previous instalment I defined correctness as 'the degree to which a software entity's behaviour matches its specification' [QM-1], but didn't offer definitions of robustness or reliability. This time I'm going to take the plunge and attempt definitions of them. I embark on a (possibly deranged) attempt to equate computing with the worlds of Newtonian and Quantum Physics, along with the somewhat more obvious parallel drawn between the behaviour of software systems and chaos theory.

I'll do my best to keep my feet on planet Earth by using examples from real-world experience, illustrating how some software entities can be established to be correct, but the best we can hope for with most is to ensure adequate levels of robustness. I'll also comment on why correctness may be of no interest to non-programmers, and reliability is not of much interest to programmers.

Weaving together a cogent narrative for this instalment has been exceedingly difficult, and you may well feel that it's escaped me. If so, all I can say is watch out for the instalment on contract programming!

Extant definitions

Before I can begin to pontificate about robustness and reliability, I need to consider the definitions that currently exist in the canon.

The Shorter Oxford English Dictionary (SOED) [SOED] gives the following definitions:

Steve McConnell [CC] gives these definitions:

Bertrand Meyer [OOSC] gives these definitions:

As is probably quite obvious, my definition of correctness is informed by these definitions, which I've examined many times previously. The important aspect taken from Meyer's definition [OOSC] is that correctness is relative to a specification. Indeed, Meyer states this most clearly in his Software Correctness Property [OOSC]:

Software Correctness Property: Correctness is a relative notion.

Without a specification against which to compare behaviour, the notion of correctness is meaningless.

The important aspect taken from McConnell [CC] is that correctness is a variable notion, and that a software entity's behaviour may correspond to a specification to a certain degree. At first blush this may seem a bizarre idea. Certainly, a software entity that is known to fail to meet its specification is defective (aka incorrect), plain and simple. From the perspective of a potential user of a software entity, that its creator (or any other agent) may volunteer that it is 50% correct or 90% correct is of no use, because such figures, even if obtained by repeatable quantitative measurements, e.g. unit-tests, cannot be meaningfully used in the calculation of quantitative failure probabilities of a software system built from the offending entity. We'll discuss why in the next section.

Beyond this, there are other, serious, objections to attempting to make use of a software entity that is known to be defective - something known as the Principle of Irrecoverability - but those discussions will have to wait until another time.

So what is the purpose of considering correctness as a quantitative concept (in addition to its being a relative one)? Well, there are the practical benefits to the producers of a software entity in being able to quantify its degree of divergence from a specification. Of course, we all know that any given defect can, upon cursory examination, appear to be of the same magnitude (of corrective development effort) as another, and yet take two, five, ten, sometimes hundreds of times longer to correct. But averaged out over the course of a project, team, career, there is a usefulness to being able to quantify. Certainly, when I'm developing software libraries, I can take the temperature for the project velocity (if you'll pardon the atrocious mixing of my metaphors) via the defect fix rate in a new (or regressive) group of tests.

But all of the foregoing paragraph, while having some utility to our consideration of the subject, is pretty pedestrian stuff, infused with more than a whiff of equivocation. It could even be taken as an invitation to debates I'm not much interested in having. In point of fact, we needn't care about this stuff, because there's a point of far greater significance in eschewing the speciously attractive binary notion of correctness. Simply, a given software entity can exist in three apparent states of correctness:

  • correct
  • defective
  • of unknown correctness

The third state is somewhat like poor old Schrödinger's cat [GRIBBIN], who is neither dead nor alive until examined. So too, software can be correct, or defective, or neither (known to be) correct nor defective. The latter state collapses into exactly one of the former when it is evaluated against a specification.

In this instalment I'm going to consider the notion that most, perhaps all, software systems are built up from layers of abstraction most of which are in the disconcerting third state of uncertain correctness. Furthermore, I'm going to argue that software has to be like this, and that's what makes it challenging, fun, and not a little frightening.

(Note: I'm still not going to discuss the definitions of what a specification is in this instalment. What a tease ...)

Exactitude, non-linearity, Newtonian software, quantum execution environments, and why software development is not an engineering discipline

A perennial debate within (and without) the software community is whether software development is an engineering discipline, and, if not, why not. Well, despite plentiful (mis)use of the term 'software engineer' in my past, I'm increasingly moving over to the camp of those whose opinion is that it is not an engineering discipline. To illustrate why, I'm going to draw from three of my favourite branches of science: Newtonian physics, Chaos theory, and Quantum physics, with a modicum of logic thrown in for good measure.

The Unpredictable Exactitude Paradox

As my career has progressed - both as practitioner (programmer, consultant) and as (imperfect) philosopher (author, columnist) - the issues around software quality have grown in importance to me. The one that confounds and drives me more than all others is (what I believe to be) the central dichotomy of software system behaviour:

The Unpredictable Exactitude Paradox: Software entities are exact and act precisely as they are programmed to do, yet the behaviour of (non-trivial) computer systems cannot be precisely understood, predicted, nor relied upon to refrain from exhibiting deleterious behaviour.

Note that I say programmed to do, not designed to do, because a design and its reification into program form are often, perhaps mostly, perhaps always, divergent. Hence the purpose of this column, and, to a large extent, the purposes of our careers. (The issue of the commonly defective transcription of requirements to design to code will have to wait for another time.)

Consider the behaviour of the computer on which I'm writing this epistle. Assuming perfect hardware, it's still the case that the sequence of actions - involving processor, memory, disk, network - carried out on this machine during the time I've written this paragraph have never been performed before, and that it is impossible to rely on the consequences of those actions. And that is despite the fact that the software is doing exactly what it's been programmed to do.

I mentioned earlier that the relationship between the size/number of defects and the effort involved to remedy them is not linear. This non-linearity is also to be seen in the relationship between the size/number of defects and their effects. Essentially, this is because software operates on the certain, strict interpretation of binary states, and there are no natural attenuating mechanisms in software at the scale of these states. If one iron atom in a bridge is replaced by, say, a molybdenum atom, the bridge will not collapse, nor exhibit any measurable difference in its ability to be a bridge. Conversely, an errant bit in a given process may have no effect whatsoever, or may manifest benignly (e.g. a slightly different hue in one pixel in a picture), or may have major consequences (e.g. sending confidential information to the wrong customer).

We, as software developers, need language to support our reasoning and communication about software, and it must address this paradox, otherwise we'll be stuck in fruitless exchanges, often between programmers and non-programmers (clients, users, project managers), each of whom, I believe, tend to think and see the software world at different scales. I will continue the established use of the term correctness to represent exactitude. And I will, influenced somewhat by Meyer and McConnell, use the terms robustness and reliability in addressing the inexact, unpredictable, real behaviour of software entities.

Bet-Your-Life?: review

Let's look at some code. Remember the first of the Bet-Your-Life? Test cases from the previous instalment [QM-1]:

    bool invert(bool value);

We can implement this easily, as follows:

    bool invert(bool value)
    {
      return !value;
    }

In fact, it'd be pretty hard to write any implementation other than this. Certainly there are plenty of (possibly apocryphal) screeds of long-winded alternative implementations available on the web (such as on www.thedailywtf.com), but pretty much any functionally correct implementation that does not involve fatuous complexity/dependencies - such as converting value to string and then using strcmp() against "0" or "1" - will evaluate to the pseudo-machine code shown in Listing 1.

Listing 1

    bool invert(register bool value)
    {
      register bool result;
      if(0 != value)
      {
        result = 0;
      }
      else
      {
        result = 1;
      }
      return result;
    }

With languages that have a bona-fide Boolean type, such as Java and C#, the value may not need to be compared against 0, and may well be implemented as equal to true (or to false). Other languages such as C and C++ represent (for historical and performance reasons [IC++]) a notional Boolean false value as being 0, and a notional Boolean true value as being all non-0 values. In those, comparison against zero is necessary, even for their built-in bool types! In either case, it's almost impossible to implement this function incorrectly.

If we permit ourselves the luxury of assuming a correctly functioning execution environment, then without recourse to any automated techniques, or even to a detailed written specification, we may reasonably assert the correctness of this function by visual inspection.

Now consider the definition of strcmp(), the second Bet-Your-Life? Test case:

    int strcmp(char const* lhs, char const* rhs);

Here's an implementation I knocked up during the preparation of this instalment, without recourse to any I've written in the past (or to various open-source and commercial implementations).

    int strcmp(char const* s1, char const* s2)
    {
      for(; '\0' != *s1 && *s1 == *s2; ++s1, ++s2)
      {}
      return (int)(unsigned char)*s1 - (int)(unsigned char)*s2; /* C99: 7.21.4(1) */
    }

Notwithstanding an issue I had with the signedness of the comparison (see sidebar), I intended to use the example of strcmp() as a (modest) stepping-stone in complexity - up from invert(), and down from b64_encode() - which relies on more assumptions about the execution environment:

  • that each of s1 and s2 points into valid, readable memory
  • that each pointed-to block of memory contains a terminating NUL character ('\0') within its accessible range

If this smells suspiciously like a contract pre-condition [OOSC, IC++], well, that's something we'll examine in a later instalment.

This additional reliance on external factors is a significant part of the increased complexity over invert(). In languages such as C#(/.NET) and Java, it is reasonable to assume that an object reference is valid (or is the sentinel value, null), but in C (and C++) where pointers have free range, it is possible for strcmp() to receive pointers that:

  • point to valid NUL-terminated strings
  • are NULL
  • point to invalid memory, or to memory lacking a NUL terminator within its accessible range

The possibility of the latter two options makes reasoning about the correctness of strcmp() and software entities built in terms of it more complicated than is the case for invert(). Specifically, it is possible for strcmp() to be passed invalid arguments (as a result of a defect elsewhere within the program), whereas all 'physically' possible arguments to invert() are valid.

The next Bet-Your-Life? Test case is b64_encode() (see Listing 2).

Listing 2

    size_t b64_encode(
      void const* src
    , size_t      srcSize
    , b64_char_t* dest
    , size_t      destLen)
    {
      . . .
      b64_char_t* p   =   dest;
      b64_char_t* end =   dest + destLen;
      size_t      len =   0;

      for(; NUM_PLAIN_DATA_BYTES <= srcSize; srcSize -= NUM_PLAIN_DATA_BYTES)
      {
        characters[0] = (b64_char_t)((src[0] & 0xfc) >> 2);
        characters[1] = (b64_char_t)(((src[0] & 0x03) << 4) + ((src[1] & 0xf0) >> 4));
        characters[2] = (b64_char_t)(((src[1] & 0x0f) << 2) + ((src[2] & 0xc0) >> 6));
        characters[3] = (b64_char_t)(src[2] & 0x3f);
        src += NUM_PLAIN_DATA_BYTES;
        *p++ = b64_chars[(unsigned char)characters[0]];
        *p++ = b64_chars[(unsigned char)characters[1]];
        *p++ = b64_chars[(unsigned char)characters[2]];
        *p++ = b64_chars[(unsigned char)characters[3]];
        if( ++len == lineLen &&
            p != end)
        {
          *p++ = '\r';
          *p++ = '\n';
          len = 0;
        }
      }

      if(0 != srcSize)
      {
        unsigned char dummy[NUM_PLAIN_DATA_BYTES];
        size_t        i;
        for(i = 0; i < srcSize; ++i)
        {
          dummy[i] = *src++;
        }
        for(; i < NUM_PLAIN_DATA_BYTES; ++i)
        {
          dummy[i] = '\0';
        }
        b64_encode_(&dummy[0], NUM_PLAIN_DATA_BYTES,
           p, NUM_ENCODED_DATA_BYTES * (1 + 2),
           0, rc);
        for(p += 1 + srcSize; srcSize++ < NUM_PLAIN_DATA_BYTES; )
        {
          *p++ = '=';
        }
      }
      . . .
    }

I'm not going to show the full implementation of this for brevity's sake. (If you're interested you can download the library [B64] and see for yourself.) Like strcmp() (and invert()), the b64 library has no dependencies on any other software libraries, not even on the C runtime library (except when contracts are being enforced, e.g. in debug builds). This permits a substantial level of confidence in behaviour, because only the b64 software entities themselves are involved in such considerations. Broadly speaking, it means that behaviour, once 'established', can be relied on regardless of other activities in the execution environment. However, it's fair to say that the internal complexity of b64_encode() is substantially increased over that of strcmp(). Consequently, I think it is impossible in a library such as this to stipulate its correctness based on visual inspection of the code; anyone who would do so would be rightly seen as reckless (at best).

Thus we can see that increasing complexity acts strongly against human-assessed correctness. But there's more to this than correctness. Let's now consider the final member of the Bet-Your-Life? Test cases, Recls_Search() from the recls library [RECLS]:

    RECLS_API Recls_Search(
      char const* searchRoot
    , char const* patterns
    , int         flags
    , hrecls_t*   phSrch
    );

An incomplete description of the semantics of this function is as follows:

Clearly the recls library (or at least this part of it) has substantial behavioural complexity. That alone makes it, in my opinion, impossible for any reasonable developer to stipulate its correctness. But that's only part of it. Of greater significance is that recls is implemented in terms of a great many other software entities, including library components (from STLSoft) and operating system facilities (e.g. the opendir() API on UNIX and the FindFirstFile() API on Windows). And even that is not the major issue. The predominant concern is that recls interacts with the file-system, whose structure and contents can (and do) change independently of the current process. By definition, it is impossible to establish correctness for recls or any other software entities that interact with aspects of the execution environment that are subject to change from other, independent software entities.

By now you're probably starting to worry that I'm asserting that correctness cannot be stipulated. Am I saying that software cannot be correct?

Newtonian software, quantum execution environment

At the risk of embarrassing myself, because it's been 20 years since I did any formal study of the subject, I will now draw parallels between software+hardware and Newtonian+quantum physics.

Consider a point object travelling through an empty universe. In Newtonian physics, the object will continue to travel in the same line, at the same speed, forever more. If there are two point objects, they will influence each other's travel in predictable ways, based on their masses, positions and velocities. But add in a third, fourth, ... trillionth object, and the behaviour of the universe becomes complex, and therefore unpredictable (beyond small timescales within which simplifying assumptions may be used to form reasonable approximate results). As is the case in reality, if the objects are non-point, then we have to consider rotation of the bodies, and heat, and a whole lot more besides, including chemistry, biology, even sociology and technology! Thus, even in a Newtonian universe, behaviour is non-linear (and unpredictable) due to the interactions of entities (in part because some of the quantities involved are irrational, and calculations thereby require infinite precision).

In a quantum universe, there are two challenges to our understanding even in the case of a single point object. For one thing, it is, in principle, impossible to state with certainty the position and momentum of the object. Second, it's possible that a virtual particle will spring into existence in any part of the otherwise empty universe at any time. (Here my inadequate training lets me down in understanding whether a virtual particle can have a net effect on our single travelling particle, but I think you get enough of the picture for us to have a working analogy.)

I contend that software is conceived in a Newtonian frame, where we imagine we can rely on perfect (non-defective) execution environments, and that hardware, necessarily, introduces a quantum aspect, due to the imperfect reliability of hardware systems (and the occasional cosmic ray that might flip a bit inside your processor) and the actions of other operating entities (programs, hardware, etc.). Let's look back to the Bet-Your-Life? Test examples from the previous instalment, and consider the behaviours in light of the two perspectives, where imperfect execution environments are subject to 'Quantum' surprises:


I'm not going to engage in discussion about specifications in this instalment, but must at least provide a definition in order that we can properly engage in further reasoning about correctness. Without further ado, a specification is one (or both, if used in concert) of two things:

Specification: A software entity's specification is the sum of all its passing unit-tests.


Specification: A software entity's specification is the sum of all its unfired active contract enforcements.

Everything else is fluff and air.

(Note: for today, I'm considering only functional aspects of specifications. Other aspects, such as performance - time and/or resource consumption - are outside the scope of this instalment, and will be discussed at another time. I'm also only going to be talking about measuring specifications in terms of unit-tests.)

Final definitions

Given the foregoing discussion, I'm now in a position to offer my definitions of these three important aspects of software quality.

Correctness: The degree to which a software entity's behaviour matches its specification.

Robustness: The adjudged ability of a software entity to behave according to the expectations of its stakeholders.

Reliability: The degree to which a software system behaves robustly over time.


Correctness is exact and measurable. It is the concern of software developers.

When measured (against its specification), the correctness of a software entity 'collapses' from the unknown state to exactly one of two states: correct and defective.

The binary nature of measured correctness is a great thing. For example, consider that we measure the correctness of invert() as shown in Listing 3 (assuming a C# implementation, with NUnit [NUNIT]).

Listing 3

    [Test]
    public void Test_False()
    {
      Assert.IsTrue(BetYourLifeTests.Invert(false));
    }

    [Test]
    public void Test_True()
    {
      Assert.IsFalse(BetYourLifeTests.Invert(true));
    }

That's a complete functional test for BetYourLifeTests.Invert(). Informed by this, we could now implement another function, Nor(), as shown in Listing 4 (sticking with C#).

Listing 4

    public static class LogicalOperations
    {
      public static bool Nor(bool v1, bool v2)
      {
        return BetYourLifeTests.Invert(v1) && BetYourLifeTests.Invert(v2);
      }
    }

Knowing that Invert() is correct, we may choose to assert that Nor() will faithfully give expected behaviour based on visual inspection. And we could go on to completely measure that correctness with ease, involving just four unit-tests.


However, add in just a little complexity and things get sticky very quickly. Consider that we've measured strcmp()'s correctness against a unit-test suite as shown in Listing 5, this time in C, with xTests [XTESTS].

Listing 5

    static void test_equal()
    {
      XTESTS_TEST_INTEGER_EQUAL(0, strcmp("", ""));
      XTESTS_TEST_INTEGER_EQUAL(0, strcmp("a", "a"));
      XTESTS_TEST_INTEGER_EQUAL(0, strcmp("ab", "ab"));
      XTESTS_TEST_INTEGER_EQUAL(0, strcmp("abc", "abc"));
    }

    static void test_less()
    {
      XTESTS_TEST_INTEGER_LESS(0, strcmp("a", "b"));
      XTESTS_TEST_INTEGER_LESS(0, strcmp("ab", "bc"));
      XTESTS_TEST_INTEGER_LESS(0, strcmp("abc", "bcd"));
    }

    static void test_greater()
    {
      XTESTS_TEST_INTEGER_GREATER(0, strcmp("b", "a"));
      XTESTS_TEST_INTEGER_GREATER(0, strcmp("bc", "ab"));
      XTESTS_TEST_INTEGER_GREATER(0, strcmp("bcd", "abc"));
    }

Clearly, this is not a comprehensive test suite. But the permutations of arguments passed to strcmp() in the myriad programs built from it will dwarf that found in any unit-test suite. Consequently, we are all using strcmp() beyond its specification. Specifically, we are using strcmp() in a state of unknown correctness. How do we get away with it? We apply judgement.

A correct software entity has been proven so by mechanical means. A robust software entity has been judged as likely to behave according to expectations. This judgement is based on our knowledge of the software entity's interface, its likely complexity, its author(s), its published test suite, the skills and experience of the judge, and many other factors.

We can define a principle for robustness as:

The Robustness Principle: A robust software entity is composed of:

  • 0 or more correct software entities
  • 0 or more robust software entities
  • 0 defective software entities

We must now concern ourselves with how correctness propagates between software entity dependencies. Consider the function f(), which is implemented in terms of strcmp():

    int f(char const* s)
    {
      return strcmp(s, "fgh");
    }

What can we say about the correctness of f()? Well, until we test it, by definition it has unknown correctness. But howsoever we make use of it - whether in test or in a software application - we are using strcmp() outside the bounds of its specification, because "fgh" is not included in strcmp()'s test suite. By definition, therefore, we will be using strcmp() in a manner in which its correctness is unknown.

Consider that we now write a suite of tests for f(), as in Listing 6.

Listing 6

    static void test_equal()
    {
      XTESTS_TEST_INTEGER_EQUAL(0, f("fgh"));
    }

    static void test_less()
    {
      XTESTS_TEST_INTEGER_LESS(0, f("abc"));
      XTESTS_TEST_INTEGER_LESS(0, f("bcd"));
      XTESTS_TEST_INTEGER_LESS(0, f("cde"));
      XTESTS_TEST_INTEGER_LESS(0, f("def"));
    }

    static void test_greater()
    {
      XTESTS_TEST_INTEGER_GREATER(0, f("ghi"));
      XTESTS_TEST_INTEGER_GREATER(0, f("hij"));
      XTESTS_TEST_INTEGER_GREATER(0, f("ijk"));
      XTESTS_TEST_INTEGER_GREATER(0, f("jkl"));
    }

Since we have a specification for f(), and f() meets that specification, we can state that f() is correct. However, there is something a little strange about having a component that is correct when it is implemented using another component that has unknown correctness.

Taking this notion to extreme, we might wonder whether we can implement a correct software entity in terms of a defective one? Let's imagine that our implementation of strcmp() always returns a value of 0 when passed a string of less than three characters. With this behaviour it would fail four of the tests of its specification and thus be proven defective. But since the test suite for f() always uses strings of length three, it would still pass all cases. f() is proven correct, yet is implemented in terms of a defective component. That is more than a little strange, and violates the robustness principle given above.

The answer to this apparent conundrum lies in the notion of robustness. Confidence is placed in a software entity based on a number of factors, knowing that it will be used outside the exact, but necessarily limited, aspects of its specification. An implementation of f() that uses a correct strcmp() outside the bounds of its specification is, while common, something that should give pause for thought. An implementation of f() that uses a defective implementation of strcmp() violates the robustness principle and, in my opinion, should never be countenanced.

In both cases, the implementation of f() is brittle. And as each layer of abstraction and dependency is added, this brittleness spreads and compounds, and the combinatorial cracks through which extra-correctness behaviour can permeate increase. Thus, an important part of the skill/art of the software developer lies in making judgements about robustness when implementing software entities in terms of others that have unknown correctness (and that must therefore be judged on their robustness).

Robustness is inexact and subjective. It cannot be measured or proven, and it cannot be automated (beyond a few static analysis tricks). It is equally the concern of software developers, who must provide it, and stakeholders, whose experiences of the software system define it.


I am moved to almost completely agree with McConnell's definition of reliability, but I do feel that reliability is a measurable, quantifiable, emergent property of a software system's behaviour. In some senses, it could be thought of as robustness over time, but robustness can't be measured, so maybe it's better thought of as apparent robust action over time.

Reliability is more a concern to stakeholders than it is to developers, reflecting the differing perspectives between these groups. To stakeholders, it is almost entirely irrelevant how many constituent software entities were correct versus those adjudged robust. To stakeholders, the proof of the pudding is in the eating, and that's its reliability.

Conversely, to software developers, the more correctness that can be adduced the better, because it simplifies the construction of dependent software entities. Reliability, on the other hand, is a distant prospect to a developer, and probably viewed in different ways. For example, I can say that I am motivated, by pride, to have 0 failures ever; 1 or 10 failures would be equally galling. Conversely, frequency of failure is of proper relevance to a user, who may well tolerate one failure per month if the software costs him/her significantly less than a version that fails once per year (or never). Many do, and many others simply expect software failure - otherwise how do we explain the popularity of certain operating systems, editors, websites, ...

Naturally, I'm not suggesting that tuning software failure frequencies is a good thing; I believe that we can all write much more robust software without suffering in the process. That's the raison d'être of this column, and as we proceed I intend to pursue the notion that we should all be aiming for maximum quality all the time.


At this point I'd intended to go on to examine some of the interesting conflicts between correctness and robustness, and between them and other software characteristics, as well as discussing practical techniques for ensuring robustness when correctness is not achievable. I've even got an argument in favour of Java's hateful checked exceptions. But I've run out of space (and time), and these will have to wait until another instalment.

For the moment, I'll posit a parting rubric that correctness is the worthy aim wherever possible (which is rare), and robustness is the practical must-have in all other circumstances.

I'm not sure what's coming in the next instalment, but I'm determined that it's going to have a lot more code, and a lot less philosophy than this one. It's too exhausting!

See you next time.

References and asides

[B64] http://synesis.com.au/software/b64/

[CC] Code Complete, 2nd Edition, Steve McConnell, Microsoft Press, 2004

[GRIBBIN] In Search of Schrödinger's Cat, John Gribbin, Corgi, 1984

[IC++] Imperfect C++, Matthew Wilson, Addison-Wesley, 2004

[NUNIT] http://nunit.org/

[OOSC] Object Oriented Software Construction, 2nd Edition, Bertrand Meyer, Prentice-Hall, 1997

[QM-1] Quality Matters, Part 1: Introductions, and Nomenclature, Matthew Wilson, Overload 92, August 2009

[RECLS] http://www.recls.org/

[SOED] The New Shorter Oxford English Dictionary, Thumb Index Edition, ed. Lesley Brown, Clarenden Press, Oxford, 1993.

[XTESTS] http://xtests.org/