On formalism in specifications

Bertrand Meyer
(1985)



To the article (PDF).



Reference: On Formalism in Specifications, in IEEE Software, vol. 3, no. 1, January 1985, pages 6-25.

(Also published in: in T. Colburn, J. Fetzer, and T. Rankin (eds.), Program Verification: Fundamental Problems in Computer Science, Kluwer Academic Publishers, Dordrecht, Netherlands, 1993. Also in Dutch translation: Over het gebruijk van formalismen in specificaties, in Informatie, jaargang 28 nr. 5-6, 1986.)

Click here for the PDF version of the article.

Below: a subset of the article, some text parts only. Not really usable -- refer to the PDF above.


ON FORMALISM IN SPECIFICATIONS


Bertrand Meyer
Department of Computer Science
University of California
Santa Barbara, Ca. 93106

We emphasize the inadequacy of natural language for writing precise specifications of software systems. The discussion is based on a careful study of a short text which was explicitly written to provide a good specification of a simple programming problem. The deficiencies of this specification are analyzed and a classification of common mistakes is proposed, which may be of use to all authors of software-related documentation. A mathematical specification is offered for the example problem, and it is shown that such a formal specification may in return yield a better natural-language statement of the requirements. The paper concludes with a discussion of the respective ro^les of natural language and formal notations in requirements and specifications.

1. INTRODUCTION

Specification is the phase of the software life-cycle which is concerned with obtaining a precise definition of the tasks to be performed by the system under construction. Although the necessity for such a phase is emphasized in software engineering textbooks, it is often overlooked in practice. More precisely, there is a tendency to confuse specification with either the preceding phase - the defini- tion of system objectives - or the following one - design. In the first case, which we shall particularly consider here, a natural-language requirements document is deemed sufficient to proceed to the system design, without any further specification activity.

The aim of this paper is to emphasize the drawbacks of such an informal approach and to show the usefulness of for- mal specifications. To avoid any misunderstanding, however, one point should be made clear from the beginning: in no way do we advocate formal specifications as a replacement for natural language requirements. We view them as a complement to natural language descriptions, and also, as will be illustrated by an example, as an aid in improving the qual- ity of natural language specifications.

We expect that many readers will already be convinced of the benefits of formal specifications; they may find in this paper some useful arguments to reinforce their viewpoint. Readers who do not share it will, we hope, find some interesting ideas to ponder.

2. THE SEVEN SINS OF THE SPECIFIER

The study of requirements documents as they are rou- tinely produced in industry yields recurring patterns of deficiencies. Table 1 lists seven classes of deficiencies which we have found to be both common and particularly damaging to the quality of requirements.

The classification given here seems to us interesting for two reasons. First, by showing the pitfalls of natural-language requirements documents, it gives some weight to the thesis that formal specifications are needed as an intermediate step between requirements and design. Second, since natural-language requirements will be neces- sary anyway (whether or not one accepts the thesis that they should be complemented with formal specifications), it pro- vides writers of such requirements with a checklist of com- mon mistakes which they should avoid as much as possible. In fact, writers of most kinds of software documentation (e.g. user manuals, programming language manuals, etc.) should find this checklist useful.

center allbox tab (!); L C. Noise!T{ The presence of an element in the text which does not carry information relevant to any feature of the problem. Variants: redundancy; remorse. T} Silence!T{ The existence of a feature of the problem which is not covered by any element of the text. T} Overspecification!T{ The presence of an element in the text which does not correspond to a feature of the problem, but to features of a possible solution. T} Contradiction!T{ The presence of two or more elements in the text which define a feature of the system in an incompatible way. T} Ambiguity!T{ The presence of an element in the text which makes it possible to understand a feature of the problem in at least two different ways. T} Forward reference!T{ The presence of an element in the text which uses features of the problem defined later in the text. T} Wishful thinking!T{ The presence of an element in the text which defines a feature of the problem in such a way that a candidate solution may not realistically be validated with respect to this feature. T}

Table 1: The Seven Sins of the Specifier

Rather than commenting on these defects in detail now, we prefer to study them on a particular example, which exhi- bits all but the last.

3. A REQUIREMENTS DOCUMENT

The reader is invited to study some of the software documentation available to him in light of the previous list. We might do just the same here and discuss actual requirements documents, taken from industrial software pro- jects. We did this in a previous version of this paper Meyer formalisme globule . Such a discussion is not entirely satisfactory, however, since the reader may always feel that the examples chosen are not representative. Also, one some- times hears the remark that there is nothing inherently wrong with natural language specifications, and that all one has to do is to be careful when writing them, or hire people with good writing skills. Although well-written require- ments are obviously preferable to poorly written ones, we think that they do not solve the problem; in our view, natural language descriptions of any significant system, even if they are of good quality, are doomed to exhibit deficiencies which will make them unacceptable for rigorous software development.

To emphasize this view, we have chosen to concentrate on a single example, which, although openly academic in nature, is particularly interesting because it was expli- citly and carefully designed as a "good" natural language specification.

This example is the specification of a well-known text processing problem. The problem first appeared in a 1969 paper by Peter Naur where it was described as reproduced here in table 2 (see box for the precise references on that problem).

box center tab(!); C. T{ Given a text consisting of words separated by BLANKS or by NL (new line) characters, convert it to a line-by-line form in accordance with the following rules: T} T{ (1) line breaks must be made only where the given text has BLANK or NL; T} T{ (2) each line is filled as far as possible as long as T} T{ (3) no line will contain more than MAXPOS characters T}

Table 2: Naur's Specification

Naur's paper was on a method for program construction and program proving; the problem statement of table 2 was thus accompanied by a program and by a proof that the pro- gram indeed satisfied the requirements.

The problem appeared again in a paper by Goodenough and Gerhart, which had two versions Both versions included a criticism of Naur's original specification.

Goodenough and Gerhart's work was on program testing. To understand why a paper on program testing would have included a criticism of Naur's text, it is necessary to sum- marize the methodological dispute surrounding the very con- cept of testing. Some researchers dismiss testing as a method of software validation on the ground that only a fraction of significant cases may ever be covered by a test; in the words of E.W. Dijkstra Dijkstra Humble Programmer , "testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence". Thus in the view of such critics, testing is futile; the only acceptable way to validate a program is to prove its correctness mathematically.

Since Goodenough and Gerhart (whom we shall refer to as G & G for the rest of this paper) were discussing test data selection methods, they felt it necessary in their article to refute this a priori objection to any research on test- ing. They chose to deal with it by showing significant errors remaining in some programs whose "proofs" had been published. Among those examples was Naur's program, in which G & G were able to find seven errors, some minor, some serious.

Our purpose here is not to enter the "testing vs. proving" controversy. The Naur-Goodenough-Gerhart problem is interesting, however, because it exhibits in a particularly clear fashion some of the difficulties associated with natural language specifications. G & G mention that the trouble with Naur's paper was partly due to inadequate specification; since their paper proposes a replacement for Naur's program, they give a corrected specification. Of par- ticular interest is the fact that this specification was prepared with particular care, and was changed as the paper was rewritten. Apparently somebody criticized the initial version, since the last version contains the following foot- note:

Making these specifications precise is difficult and is an excellent example of the specification task. The specifications here should be compared with those in our original paper.

So when we examine G & G's final specification, it is fair to consider that what we have is not an imperfect docu- ment written under the schedule constraints usually imposed on software projects in industry, but the second version of a carefully thought out text, describing what is really a toy problem, unplagued by any of the numerous special con- siderations which often obscure "real-life" problems. If a natural language specification of a programming problem has ever been written with care, then this is it. However, as we shall see, it is not without its own shadows.

G & G's final specification is given as table 3, which the reader is invited to read carefully at this point. In the sequel, numbers in parentheses, e.g. (21), refer to the lines of this text, as numbered on table 3.

4. AN ANALYSIS OF THE SPECIFICATION

The first remark that comes to mind when seeing G & G's specification after Naur's one is its length: about four times that of the original by a simple character count. Clearly, the authors have gone to great pains to make sure that nothing is left out and that no ambiguity remains. As we shall see, this overzealous effort in fact induces prob- lems. In any case, such a length does not seem appropriate for the specification of a problem which, after all, looks fairly simple to the unprejudiced observer.

Before embarking on a more detailed analysis of this text, we should emphasize that the aim of the game is not to criticize this particular paper; the official subject matter of G & G's work was testing, not specification, and the prescription period has expired anyway. We take the paper as an example because it provides a particularly compact basis for the study of common mistakes.

4.1. Noise

"Noise" elements are underlined by a solid line on table 3. Noise is not necessarily a bad thing in itself; in fact, it may play the same ro^le as comments in programs. It is often the case, however, that noise elements will actu- ally obscure the text: what happens is that, when first encountering such an element, the reader thinks it brings new information, but then upon closer examination he real- izes that the element only repeats known information in new terms. The reader must thus ask himself non-essential ques- tions, which take his attention away from the truly diffi- cult aspects of the problem.

Here, a fraction of a second is needed to realize that a "nonempty sequence" of characters (8) is the same thing as "one or more" characters (9). These two expressions are used within a line of each other: the authors' aim was presumably to avoid a repetition. One is indeed taught in elementary writing courses that repetitions should be avoided, and no doubt this is a good rule as far as literary writing is concerned. In a technical document, however, the rule to observe is exactly the opposite, namely that the same concept should always be denoted by the same words, lest the reader be confused.

An interesting variant of noise is rrrreeeemmmmoooorrrrsssseeee, i.e. a res- triction to the description of a certain element of the specification, which is made not where the element is defined, but at a place where it is used, as if the specif- ier suddenly regretted his initial definition. Such an exam- ple is given here by the expression "the output text, if any" (21): up to this point, the specification freely used the notion of "output text" (13, 18); nowhere was there any hint that such a text might not exist; if the reader won- dered about this problem, the specification did not provide any answer. Now suddenly, as the discussion is on something else, the reader is "reminded" that there might be no such thing as an output text. Note that no precise criterion is given as to when there is and where there isn't.

Another remorse is the late definition of the "line" concept (25), to which we will come back. We shall meet again the tendency to say too much, which generates noise, as a source of contradictions and ambiguity.

9 box center tab(!); C. T{

The program's input is a stream of characters whose end is signaled with a special end-of-text character, ET. There is exactly one ET character in each input stream. Characters are classified as:

- Break characters - BL (blank) and NL (new line);

- Nonbreak characters - all others except ET;

- the end-of-text indicator - ET.

A word is a nonempty sequence of nonbreak characters. A break is a sequence of one or more break characters. Thus, the input can be viewed as a sequence of words separated by breaks, with possibly leading and trailing breaks, and ending with ET.

The program's output should be the same sequence of words as in the input, with the exception that an oversize word (i.e. a word containing more than MAXPOS characters, where MAXPOS is a positive integer) should cause an error exit from the program (i.e. a variable, Alarm, should have the value TRUE). Up to the point of an error, the program's output should have the following properties:

1. A new line should start only between words and at the beginning of the output text, if any.

2. A break in the input is reduced to a single break character in the output.

3. As many words as possible should be placed on each line (i.e., between successive NL characters).

4. No line may contain more than MAXPOS characters (words and BLs). T}

- 10 -

Table 3: The Specification by Goodenough and Gerhart

4.2. Silence

In spite of all his efforts, the specifier often leaves, along with over-documented elements, undefined features. Quite commonly, these features may be fairly obvi- ous for a community of specialists of the application domain, close to the initial customers, but they will be much more obscure outside this initial circle. An example here is the concept of "line" which is not really defined except in a parenthesized remorse towards the end of the text (25), according to which a line is a sequence of char- acters "between successive NL characters" (by the way, are those characters part of the line ?).

An interesting point here is the cultural background which is necessary to understand this concept. In ASCII oriented environments, "New Line" is a character; people working in such environments (e.g. on DEC machines) will probably understand easily the basic hypothesis of the specification, namely that NL is treated upon input as an ordinary character, but upon output triggers a carriage return. These concepts are foreign, however, to somebody working in an EBCDIC environment, especially on IBM OS sys- tems, on which files are made up of a sequence of "records" (corresponding e.g. to lines), each made up of a sequence of characters. A person coming from such an environment would not have written the above specification, and will probably have trouble understanding it.

Besides this, the late definition of "line" is also interesting in that it is plainly wwwwrrrroooonnnngggg: it only applies to lines which are neither at the very beginning nor at the very end of the text, since in both these cases a line is not "between successive NL characters" but between the beginning of the file and an NL, or between an NL and the end of the file (i.e. between an NL and an ET). So if we accept the authors' definition, the first and last lines of the output may be of arbitrary length; in fact, an output containing no NL at all is acceptable regardless of its length, since it does not have "lines" according to the definition given! This is obviously absurd and is not what the authors had in mind; but the use of natural language leads naturally to such slips of the pen.

Another interesting silence concerns the variable Alarm: it is specified (17-18) that this variable should be set to ttttrrrruuuueeee in case of an error, but nothing is said about what happens to it in other cases. The answer is obvious, of course; but the matter may only be brushed aside as a minor one by programmers who have never run into a bug due to an uninitialized variable...

Regarding silence, it must be pointed out that G & G's specification corrects a notable silence of Naur's original description: Naur's text does not explain what should be done with consecutive groups of more than one break charac- ter. This is one of the seven errors analyzed in G & G's paper; their specification corrects it by requiring that such groups be reduced to a single break character in the output. Although it is clear that something had to be done about the problem, we may note that this solution is to some extent obtained at the expense of simplicity. Eliminating redundant break characters and dividing a text into lines are two unrelated problems; merging these problems into a single specification makes the whole affair more compli- cated.

It is probably better to deal with these two require- ments separately. This is what we shall do in the formal specification given below; this view is also emphasized by some important trends in programming methodology, notably under the influence of the Unix programming environment, which, at least in principle, favors tools that are simple and composable rather than large and multi-purpose.

4.3. Contradictions

There is another problem with the concept of "line". Given a type t, one should distinguish between the types sssseeeeqqqq [t], whose elements are finite sequences of objects of type t, and sssseeeeqqqq [sssseeeeqqqq [t]], whose elements are sequences of sequences of objects of type t. Such a confusion is made here when we are first told (1) that the input is a "stream" (i.e. sequence) of characters, and later (10) that it "may be viewed" as a sequence of words and breaks. As any Lisp programmer knows, the sequences

<a b a c c a> [sequence of objects]

and

<<a> <b a> <c c a>> [sequence of sequences of objects]

are not the same. Note that the same problem also arises with respect to the output, redeemed only by ambiguity; in fact, the type of the output is not clear:

- is it seq [CHAR], as (22) would seem to imply?

- or is it seq [WORD], i.e. seq [seq [CHAR]], see (13- 14)?

- or is it even sssseeeeqqqq [LINE], i.e. sssseeeeqqqq [sssseeeeqqqq [sssseeeeqqqq [CHAR]]] if we consider a line as a sequence of words and breaks?

Thus a sentence which first appears to be only noise -12) yields in fact, a few lines down (13-14), a contrad- iction: "The program's output should be the same sequence of words as in the input". This last comment is particularly remarkable since neither the input nor the output is a sequence of words. Worse yet, even if we parse the input into a sequence of words, this sequence is not sufficient to determine the output: one also needs two binary informations (whether there is a leading and/or a trailing break).

The same sentence (8-11), in its overzealous effort to leave no stone unturned, ends up introducing another con- tradiction. An unbiased reader would be rather puzzled when trying to understand how the input may "end with [the char- acter] ET" (11-12) and at the same time possibly have a "trailing break" (11): "trailing" precisely means "at the end"! What's the last character if there is a "trailing" break: ET or a break character?

A more experienced reader, e.g. a programmer, will have no difficulty resolving this contradiction by relying on his experience, which will tell him that end markers come even after "trailing" characters. This kind of reliance on intuition and knowledge of the application domain may be particularly damaging, however, when transposed to large requirements documents which will be handed down to a group of system designers and implementors comprising people of diverse backgrounds and abilities.

4.4. Overspecification

Overspecification in requirements is often annoyingly associated with silence: the reader is being told too much about what the solution should be, while desperately trying to grasp what the problem is and to figure out by himself features not covered by the text. Overspecification is typ- ically, although certainly not exclusively, found in requirements documents written by programmers. Psychologi- cally, it is quite understandable: implementation-level con- cepts are good, concrete, technical stuff, whereas true requirements deal with much less palpable material; to a computer specialist, a stack is easier to visualize than, say, the flow of information in a company or the needs of a radar operator. Thus many specifiers have a natural tendency to cling to programming concepts. There is a price to pay for this, however, in terms of implementation decisions taken too early, which may later turn out to be wrong, and of important problem features being overlooked.

The text studied here contains an example of over- specification right from the first sentence: the notion of the end-of-text character ET. The only reason for the pres- ence of this notion is that G & G wanted to correct Naur's original program, written in Algol 60; the version of Algol 60 input-output used by Naur (and, for fairness, by G & G) does not provide for the detection of end of file when read- ing, so that one has to assume the presence of a special character at the end of the file to make up for this defi- ciency. But ET is an implementation detail and should not be included in an abstract specification. Conceptually, the input is a finite sequence of characters; it should be transformed into an output which is a sequence of lines, or (depending on the interpretation which one chooses) a sequence of characters. It is a programmer's vice to insist that finite sequences should be specially marked at the end.

Why does the ET character receive such emphasis in G & G's specification? The reason is one of the errors in Naur's original program: that program would go into an infinite loop unless the input was incorrect (i.e. contained an oversize word). Upon closer examination, however, there is a case to be made for Naur's solution (without the other errors, of course): it is not so unrealistic to consider the required program as a potentially infinite process, which takes characters as input and produces lines as output, working somewhat like a device handler (for instance one which drives a printer) in an operating system. If such is the interpretation, it should of course be clearly described in the specification, which was not the case with Naur's text. Such a decision would be less arbitrary than the one taken here: the inclusion of ET is a change of the data structure made at the specification level in order to accom- modate the programming language which will be used at the implementation stage.

That such a change is not acceptable here is further evidenced by the fact that G & G's output does not satisfy the requirement on the input. Is it realistic to expect of an eeeexxxxiiiissssttttiiiinnnngggg file that it be terminated by an explicit marker? Well, if it were, then the output produced by G & G's program should satisfy that condition; however, examina- tion of their specification (which is not completely clear on this matter) and, as a final criterion, of their proposed program, shows that ET will not be passed on to the output file. So assume that we want to write another program, e.g. for right-justifying the text, which will take as input G & G's output (in "pipe" mode a` la Unix): when designing that program, we shall nnnnooootttt be able to make the same assumption on its input. We thus have an example of overspecification opening the way to serious inconsistencies.

Another overspecification in the text is the concept of "error exit" (14-15) which causes a "variable", Alarm, to have the value ttttrrrruuuueeee. Clearly, the notion of variable belongs to the program level, not to a specification. It may be noted that this piece of overspecification would have been less shocking if the problem had been defined as the task of writing a procedure with Alarm as one of its param- eters, or as one of the "exceptions" (in the sense of the languages CLU or Ada) which it may raise. A variable is internal to the program unit to which it belongs, whereas the specification of a parameter or of an exception may be given relative to the environment of that unit.

The problem of the Alarm variable is less innocuous that it seems. One of the reasons why it is shocking to meet the reference to this variable in a sequential reading of the text is that the definition of the error case (the one in which there is an oversize word) looks like overspecifi- cation as long as one has not seen the last sentence of the text (26-27), ten lines down, which gives the basic con- straint on line size (MAXPOS). The world is really standing upside down here: clearly, the constraint on word sizes is a consequence of the constraint on line size, and the defini- tion of the error case cannot be understood until the latter constraint has been introduced.

What we see here is one of the major deficiencies which plague requirements documents of more significant size: the early inclusion of detailed descriptions of error handling, interwoven with the description of normal cases, which is usually much simpler (here the matter is even worse since error processing is described before the reader has had a chance to understand what the problem is about, i.e. what the normal processing should be). Failure to clearly separate the normal cases from the erroneous ones makes the document much harder to understand.

It should be noted that, mathematically, a program which performs input-to-output transformations often corresponds to the implementation of a ppppaaaarrrrttttiiiiaaaallll ffffuuuunnnnccccttttiiiioooonnnn, which is not defined for some arguments of the input domain. Error processing then consists in "completing" the function with alternate results (e.g. error messages) for those argu- ments; this completion should not be confused with the definition of the function in its normal cases. Here, as we shall see when we come to a formal specification, the fact that words of size greater than MAXPOS cannot be normally accommodated is a consequence of the requirements for the normal processing, which may be proved, as a theorem, from the definition of the function.

4.5. Ambiguities

Error processing also gives rise to an ambiguity in the text. The requirement that the output text satisfy proper- ties 1 to 4 "up to the point of an error" is susceptible of at least two interpretations, evidenced by the following example, assuming MAXPOS = 10:

box center tab (!); C || C C C C C C C C C C C C. !1!2!3!4!5!6!7!8!9!10 = 1!U!N!I!X! !I!S! !A! 2!T!R!A!D!E!M!A!R!K! 3!O!F! !B!E!L!L! 4!L!A!B!O!R!A!T!O!R!I!E!S = !1!2!3!4!5!6!7!8!9!10

The text says that up to (and presumably including) the point of the error, the program's output should correspond to the input; but where is the "point of the error" here? Is it [line 4, column 10] (last acceptable letter) or [3, 7] (end of the last acceptable word)? Nothing in the text allows the reader to decide between these two interpreta- tions.

Another interesting ambiguity is connected with the basic constraint on acceptable solutions (24-25): "As many words as possible should be placed on each line". If we have, say, MAXPOS = 10 and the input text

WHO WHAT WHEN

then there are two equally correct two-line solutions (WHAT may be on either the first or second line). This ambiguity may be considered acceptable since no solution appears supe- rior to the other; the specification as such is thus non- deterministic. We suspect (perhaps wrongly) that this non- determinism was not intentional and that there was an implicit overspecification in the authors' mind: they con- sidered it obvious that the input would be processed sequen- tially, so that any ambiguity as in the example above would be solved by placing as many words as possible on the ear- lier line (here giving line WHO WHAT followed by line WHEN). In this interpretation, property 3 (24-25) actually means "as many words as possible should be placed on each line as it is encountered in the sequential construction of the out- put". If such is the case, however, the specification should state it precisely.

Another potential source of ambiguity is the use of imprecise or poorly defined terms. Here we have examples such as the use of "stream" (1) rather than the more stan- dard "sequence". The expression "error exit" (16), stemming from the overspecification seen above, is ambiguous, and the reader is not comforted by the "explanation" which follows it ("i.e. a variable, Alarm, should have the value TRUE"), since the notion of assigning a value to a variable does not by itself imply the idea of an "exit", which also means that the program stops in some fashion. We have seen that the concept of "line" is not well defined (25). We may also note that the expression "new line" is to be parsed as a single entity (the new line character) in the first place where it appears (5) and as to separate words ("a new line should start...") in the second (20).

4.6. Forward References

In a requirements documents, not all forward references are bad. Some, corresponding to a top-down presentation of the concepts ("the notion of ... will be studied in detail in section ..."), may even be considered good practice, pro- vided there are not too many. Others, however, are much more of a problem: those which are implicit (a notion is used before it has been properly introduced, but the reader has not been made aware of it). These are the kind which may make a document extremely hard to read - especially in the absence of the technical apparatus (index, glossary, etc.) which should always be part of requirements specifications and other software documents.

Here, of course, the text is very short, and the annoy- ance brought in by forward references cannot be anywhere close to what it may be in full-size documents. We may note, however, that ET is used three times (2, 3, 5) before it is defined (6); that the notion of line, defined not quite satisfactorily (25), had been used earlier (20-21); and that MAXPOS is used just before its definition (15-16).

4.7. So What?

By dissecting G & G's specification, we brought to light a significant number of problems in a text which may seem innocuous to a superficial observer. Not all of these problems, of course, are equally serious, and the reader may have felt that we were a bit pedantic at times. What we would like to emphasize, however, is that one has to be pedantic when dealing with such matters. Inconsistencies, ambiguities and the like may not warrant the gallows when the problem is to split up a sequence of characters into lines; but what the reader should have in mind is how the above defects transpose to more serious matters - a nuclear reactor control system, a missile guidance system, or even just a payroll program. The problem is that the computer which will execute the code resulting from a faulty specifi- cation is more pedantic than any human referee can ever be.

Thus in the above analysis we should not only consider G & G's specification as an object of study in itself, but also and more importantly as a microcosm in which one can conveniently observe deficiencies that are typical of more meaningful requirements documents. It is particularly noteworthy that this text was written with great care. What we have witnessed is how the authors, starting out to improve upon a terse but simple text (Naur's), sentence after sentence get a little more entangled in their own rosary of caveats. We think that this says a lot about why so much shelf space in programmers' offices and computer rooms is occupied by interminable manuals.

In our opinion, the situation can be significantly improved by a reasoned use of more formal specifications. We should again emphasize that such specifications are a com- plement to natural language documents, not a replacement. We shall in fact show that a detour through formal specifica- tion may eventually lead to a better English description. In our opinion, this and other benefits of formal approaches more than compensate for the effort needed to write and understand mathematical notations.

We will now introduce such notations, which will allow us to give a formal specification of the Naur-Goodenough- Gerhart problem.

5. ELEMENTS FOR A FORMAL SPECIFICATION

Many formal specification languages have been designed in recent years (see box). Choosing one of these languages would force the reader to learn the particular notation used and obscure the essential fact, namely that the underlying concepts are for the most part well-known mathematical notions like sets, functions, relations, sequences and the like. We thus prefer to use here a more or less standard mathematical notation. The style of exposition will be simi- lar to that found in mathematical texts; translation to a specific formal specification language should not be hard provided that language supports the relevant concepts.

5.1. Overview

Perhaps the only difficult aspect of the Naur- Goodenough-Gerhart problem is that the processing to be per- formed on the text involves three aspects: reducing breaks to a single break character; making sure no line has more than MAXPOS characters; and filling lines as much as possi- ble. If these three requirements are separated, things become much simpler. Consequently, we shall define the prob- lem formally by considering two simple binary relations, called shortbreaks and limitedlength, and a function called FEWESTLINES (the reader may wish to refer to figure 3 for a picture of the overall structure of the relations and functions involved in this specification).

Relation shortbreaks holds between two sequences of characters a and b if and only if b is identical to a except that breaks in a (i.e. successive break characters) have been reduced to a single break character in b.

Relation limitedlength holds between two sequences of characters b and c if and only if c is identical to b except that some blanks may have been replaced with new lines, or conversely, and that no line in c has length greater than MAXPOS.

By applying these two relations successively, we asso- ciate with any sequence of characters a all sequences of characters which are "made of the same words", separated only by single breaks, and fitting on lines no longer than MAXPOS. Given such a set of sequences, say SSC, then FEWESTLINES (SSC) is the subset of SSC containing those sequences which consist of a minimum number of lines and thus are acceptable outputs for the program.

We now define these notions formally. A few simple con- ventions are first needed.

5.2. Basic Form of the Specification

As a general convention, we use upper-case for sets and for functions whose results are sets, and lower-case for other functions, elements of sets (except for MAXPOS which we write in upper case as in the original specification), sequences and relations.

The program to be written is the implementation of a function

sol : INPUT $->$ OUTPUT

where INPUT and OUTPUT are the sets of possible inputs and outputs, which we will describe below as sets of sequences. Function sol must satisfy certain constraints, which it is the ro^le of the specification to express.

As noted above, there may be more than one correct out- put for a given input; in other words, a truly general specification of the problem at hand should be non- deterministic. We shall represent this fact by defining a binary rrrreeeellllaaaattttiiiioooonnnn between sets INPUT and OUTPUT We call goal this binary relation; then a function sol will be a correct solution if and only if the following two conditions are satisfied (readers who are not so sure about functions and relations are referred to the refresher in the adjacent box):

- function sol is defined wherever relation goal is defined (i.e. sol (i) exists for any i in the domain of goal);

- for any i for which goal is defined, then sol (i) yields a "solution" to goal, that is goal (i, sol (i)) holds.

This definition is expressed in mathematical notation by writing that sol is an acceptable function if and only

$forall ~ i ~ member~ bold dom ~ (goal) , ~ i member bold dom ~ (sol) ~ bold and ~ goal~(i,~sol~(i))$

where ddddoooommmm (sol) is the domain of function sol. Note that there may be some inputs for which there is no acceptable solution (those which are not in the domain of goal), so sol may be a partial function. Also, in more concise notation, the above property may simply be expressed by writing that the domain of sol is at least as large as the domain of goal, and that sol is iiiinnnncccclllluuuuddddeeeedddd in goal (both being defined as sets of pairs):

$bold dom ~ (goal) subset bold dom ~ (sol) ~ bold and ~ sol ~ subset ~ goal$

This way of presenting a specification is of very gen- eral applicability for programs performing input-to-output transformations: such a program may be viewed as the imple- mentation of a certain function (sol) which must ensure that a certain relation (goal) is satisfied between its argument and its result; in mathematical terms, the function is included in (is a subset of) the relation. To specify the pppprrrroooobbbblllleeeemmmm is to define the relation; to construct the pppprrrrooooggggrrrraaaammmm is to find an implementable function sol satisfying the above conditions Meyer basis Tokyo constructive IFIP .

5.3. Characters and Sequences

The principal set of interest in our problem is the set of characters, which we denote by CHAR. The only property of CHAR that matters here that CHAR contains two elements of particular interest, blank and newline. We call BREAKCHAR the subset of CHAR consisting of these two elements:

BREAKCHAR $==$ {blank, newline}

The basic concept in this problem is that of sequence. If X is a set, we denote by sssseeeeqqqq [X] the set whose elements are finite sequences of elements of X. Such a sequence will be written e.g.

<a, b, a, c, c, d>

and has a length which is a non-negative integer; thus length is a function from sssseeeeqqqq [X] to the set of natural numbers. Elements are numbered starting at 1; the i-th ele- ment of a sequence s (for $1 <= i <= length (s)$) is written s(i). A ssssuuuubbbbsssseeeeqqqquuuueeeennnncccceeee of s is a sequence made of zero or more of the elements of s, in the same order as in s; for exam- ple, if s is the above sequence, then some of its subse- quences are

<a, b, c, d> <b, c, c>

etc. On the other hand, <b, d, c> is not a subsequence of s even though all its elements appear in s, since their origi- nal order is not preserved.

The set of subsequences of s will be written SUBSE- QUENCES (s).

We rely on the reader's understanding of sequences here, since sequences may be considered a well-known con- cept; a formal definition of sequences and of the above notions is given in the box on the adjacent page.

5.4. Minima and Maxima

If X is a set, and f is a function from X to the set of natural numbers, we shall denote by

MINSET (X, f)

the subset of X consisting of the elements for which the value of f is minimum. For example, if X is the following set, containing four sequences:

X $==$ {<a, c, b, a>, <a, b>, <b, a, b>, <c, c>}

and f is the length function on sequences, then MINSET (X, f) will be the set consisting of the shortest of these sequences, namely the second and last.

In the same fashion, we denote by

MAXSET (X, f)

the subset of X consisting of the elements for which the value of f is maximum; thus in the above case MAXSET (X, f) is the set {<a, c, b, a>}, containing just one sequence.

MAXSET, however, is not always defined; we have to be careful to apply it only to sets X which are finite, other- wise there might be no maximum value for f. Note that the result of MINSET and MAXSET is a subset of X rather than a single element, since there may be more than one element with minimum or maximum f value. This subset is non-empty if and only if X is non-empty.

We shall also need a way to denote the minimum and max- imum elements of a set of natural numbers SN. They will be written, in the usual fashion, min (SN) and max (SN). Thus if SN is the set SN $==$ {341, 7, 3, 654} then min (SN) is 3 and max (SN) is 654. Note that min and max, contrary to MINSET and MAXSET, yield a natural number, not a set. Also in contrast to MINSET and MAXSET which are defined for empty sets (they yield an empty result), both min and max are only defined if the set SN is not empty; max further requires that SN be finite. It is essential to check for these conditions whenever using these functions.

5.5. Input and Output Sets

In the problem at hand, the input is a sequence of characters; we choose to describe the output as a sequence of characters as well. Thus we define the two sets:

INPUT $==$ sssseeeeqqqq [CHAR] OUTPUT $==$ sssseeeeqqqq [CHAR]

Note that, as mentioned above, another interpretation could have been to define the set of possible outputs as sssseeeeqqqq [LINE], with LINE itself being defined as sssseeeeqqqq [CHAR] (or possibly sssseeeeqqqq [WORD] with WORD $==$ sssseeeeqqqq [CHAR], plus informa- tion on leading and trailing breaks).

We shall now define the relations shortbreaks and limitedlength and the function FEWESTLINES in turn.

6. THE FORMAL SPECIFICATION

6.1. Short Breaks

Let a be a sequence of characters. We define SINGLEBREAKS (a) as the set of subsequences of a such that no two consecutive characters are break characters:

SINGLEBREAKS (a) $==$ {s $member$ SUBSEQUENCES (a) | $forall$ i $member$ 2..length(s), s(i-1) $member$ BREAKCHAR $=>$ s(i) $nomem$ BREAKCHAR}

Note that we use the Pascal notation, a..b, to denote the (possibly empty) set of integers i such that $a <= i <= b$.

Next we define COMPACTED (a) as the subset of SINGLEBREAKS (a) containing those sequences of maximum length:

COMPACTED (a) $==$ MAXSET (SINGLEBREAKS (a), length)

As stated above, MAXSET (X, f) may be undefined if X is an infinite set. This cannot occur here, however, since SINGLEBREAKS (a) is a subset of SUBSEQUENCES (a) which, for any sequence of characters a, is finite.

Note that any sequence b in COMPACTED (a) must have retained from a all non-break characters (if such a charac- ter had been omitted, it could be inserted into b and yield a longer element of SINGLEBREAKS (a)), and has a single break character where a had one or more consecutive break characters.

Thus the relation shortbreaks (a, b), which holds between a and b if and only if a and b are made of the same sequence of words and breaks but the breaks in b consist of a single break character, can be expressed simply by:

$shortbreaks~(a,~b)~==~~b~member~COMPACTED (a)$

6.2. Limited Length

The relation limitedlength (b, c) holds between sequences b and c if and only if:

- c is the same sequence as b except that it may have a newline wherever b has a blank or conversely;

- and the maximum line length of c, defined as the max- imum number of consecutive characters none of which is a newline, is less than or equal to MAXPOS.

This is expressed more precisely as follows:

$limitedlength~(b,~c)~==~c~member~TRIMMED~(b)$

where

TRIMMED (b) $==$ {s $member$ EQUIVALENT (b) | maxlinelength (s) $<=$ MAXPOS}

EQUIVALENT (b) $==$ {s $member$ sssseeeeqqqq [CHAR] | length (s) = length (b) aaaannnndddd ($forall$ i $member$ 1..length (b), s(i) $!=$ b(i) $=>$ s(i) $member$ BREAKCHAR aaaannnndddd b(i) $member$ BREAKCHAR)}

maxlinelength (s) $==$ max ({j-i | 0 $<=$ i $<=$j $<=$ length (s) aaaannnndddd ($forall$ k $member$ i+1..j, s(k) $!=$ newline)})

A few explanations may help understand these definitions. If s is a sequence of characters, maxlinelength (s) is the maximum length of a line in s, expressed as the maximum number of consecutive characters, none of which is a new line, in other words the maximum value of j-i such that s (k) is not a new line for any k in the interval i+1..j (we will have more to say about this definition below). EQUIVALENT (b) is the set of sequences that are "equivalent" to sequence b in the sense of being identical to b except that newline characters may be substituted for blank char- acters and conversely. Finally, TRIMMED (b) is the set of sequences which are "equivalent" to b and have a maximum line length less than or equal to MAXPOS.

6.3. Fewest Lines

Let SSC be a set of sequences of characters. These sequences may be interpreted as consisting of lines, separated by newline characters. We define the set FEWESTLINES (SSC) as the subset of SSC consisting of those sequences which have as few lines as possible:

FEWESTLINES (SSC) $==$ MINSET (SSC, numberofnewlines)

where the function numberofnewlines is defined by:

numberofnewlines (s) $==$ ccccaaaarrrrdddd ({i $member$ 1..length (s) | s (i) = newline})

and ccccaaaarrrrdddd (X), defined for any finite set X, is the number of elements (cardinal) of X.

6.4. The basic relation

The above definitions allow us to define the basic relation of the problem, relation goal, precisely. Relation goal (i, o) holds between input i and output o, both of which are sequences of characters, if and only if

o $member$ FEWESTLINES (TRANSF (i))

where TRANSF (i) is the set of sequences related to i by the composition of the two relations shortbreaks and limitedlength:

TRANSF (i) $==$ {s $member$ sssseeeeqqqq [CHAR] | tr (i, s)}

with

tr $==$ limitedlength . shortbreaks

where the dot operator denotes the composition of relations (see box). A look at figure 3 may help understand the role of the various functions and relations in the above specifi- cation.

center expand tab(!); C C C C C C C.

i!shortbreaks (r)!!limitedlength (r)! !FEWESTLINES (input)!TRIMMED (F)!!COMPACTED (F) !!!!!!o !!!!!!(acceptable !!!!!!outputs)

!!tr(r) !!TRANSF(F)

!!!!goal

(r) : relation (F) : FUNCTION

Figure 3: Overall Structure of the Specification

6.5. Existence of Solutions

Once we have such a formal specification, what can we do with it? There are several good uses for it; the most obvious is to rely on the specification as a basis for the next stages of the software lifecycle, program design and implementation (e.g. translating $forall$s into loops, etc.). Here we would like to emphasize two other important aspects. One (studied in the next section) is to use it as a starting point for better nnnnaaaattttuuuurrrraaaallll llllaaaannnngggguuuuaaaaggggeeee requirements. The other, to which we now turn, is to query the specifica- tion in order to learn as much as possible about properties of the problem and of any valid solution.

What can the given specification teach us about the Naur-Goodenough-Gerhart problem and its solutions? First, let us determine when solutions do exist. It is trivial to prove that, given a sequence of characters a, there is always at least one sequence b such that relation shortbreaks (a, b) holds. Given b, however, the necessary and sufficient condition for the existence of at least one sequence c such that limitedlength (b, c) holds is that b contains no word (i.e. contiguous subsequence of non-break characters) of length greater than MAXPOS. This follows from the definitions of TRIMMED and maxlinelength used in the definition of limitedlength. Thus the domain of definition of the relation tr, which is also the domain of the function TRANSF and thus of the relation goal, is the set of input texts containing no word longer than MAXPOS. This can be formulated as a theorem:

ddddoooommmm (goal) = {s $member$ sssseeeeqqqq [CHAR] | $forall$ i $member$ 1..length(s)-MAXPOS, $exists$ j $member$ i..i+MAXPOS, s(j) $member$ BREAKCHAR}

The property expressed by this theorem is that the domain of relation goal consists of those sequences which are such that if a character c is followed by MAXPOS other charac- ters, then at least one among c and these other characters must be a break.

An important problem which we do nnnnooootttt address here is how the specification can deal with erroneous cases, i.e. with inputs that are not in the domain of the goal relation, like sequences with oversize words in the example studied here. Clearly, if the specification is to be robust and com- plete, it should include together with goal another rela- tion, say exceptionalgoal, whose domain is INPUT - ddddoooommmm (goal) (set difference); this relation complements goal by defining alternative results (usually some kind of error messages) for erroneous inputs. Formal specification of erroneous cases, however, falls beyond the scope of this paper. A discussion of this problem and a precise defini- tion of terms such as "error", "failure" and "exception" can be found in a paper by Cristian Cristian exceptions failures .

6.6. Discussion

What we have obtained is an abstract specification, i.e. a mathematical description of the problem. Note that it would be difficult to criticize this specification as being oriented towards a particular implementation: if followed to the letter, the above specification would lead to a pro- gram which (as illustrated in figure 3) would first generate all possible distributions of the input over lines of length less than or equal to MAXPOS, and then search the resulting list for solutions with minimum number of newline charac- ters - not a very efficient implementation!

An element which does seem to point towards a particu- lar implementation technique is the composition of relations shortbreaks and limitedlength, which seems to imply a two-step process (first remove extra break characters, then cut into lines). A first design could indeed use a solution in two steps; using coroutine-like concepts, such as the Unix notion of pipe or the "program inversion" idea of Jackson's program design methodology Jackson Principles , the two steps could then be merged into a single one.

It should be noted that we chose to model the objects and operations of the problem by resorting to very simple mathematical notions (sets, relations, functions, sequences). Due to the specific nature of this problem, another approach would have been to rely on a more advanced

- 33 -

theory such as the theory of regular languages. As will be emphasized below, a realistic specification system should make it possible to re-use existing theories Burstall Goguen together theories .

Starting from the above definition, the specification should of course be refined in order to take into account the physical form of the data structure (including e.g. the end-of-file marker) and the particular response which should be given by the program in case of erroneous input.

7. CONCLUSION

Natural language is the ideal notation for most aspects of human communication, from love letters to introductory programming language manuals. There are, however, cases Hill English would where natural language is not the appropriate notation; this paper purports to show that software specifi- cations require a more rigorous formalism.

Clearly, the use of a formal notation does not preclude that of natural language; in fact, after one has written a mathematical specification of a problem, one is usually able to write a bbbbeeeetttttttteeeerrrr natural language description. The reason is that formal notations naturally cause the specifier to raise some questions which might remain unasked, and thus unanswered, in an informal approach; on the other hand, they help bring to light ambiguities and contradictions because they force the specifier to describe features of the problem precisely and rigorously. The problem studied in this paper contains many examples of this. For example, let us try to redefine the function maxlinelength using the definition of "line" taken from G & G's specification (line 25: "between successive NL characters"). Writing this defini- tion mathematically, we shall obtain something like:

maxlinelength (s) $==$ max ({linelength (s, i) | 1 $<=$ i $<=$ length (s)

- 34 -

aaaannnndddd s(i) = newline})

where linelength (s, i), the length of the line beginning after the newline at position i in sequence s, may be defined as a minimum:

linelength (s, i) $==$ min ({k | 0 $<=$ k $<$ length (s)-i aaaannnndddd s(i+k+1) = newline})

However, as mentioned above, the maximum or minimum of a set of natural numbers is defined if and only if this set is non-empty and, in the "maximum" case, finite; so the use of the mathematical notation prompts us to check for these conditions. Finiteness presents no problem but we see immediately that the set whose maximum is sought in the definition of maxlinelength will be empty if the sequence s does not contain any newline character; even if it con- tains one, linelength (s, i), itself a minimum, will not be defined if there is no other newline further in the sequence. This prompts us to look for a better definition.

A fairly natural remark at this point is that we do not really need to define the concept of "line", but only that of mmmmaaaaxxxxiiiimmmmuuuummmm lllliiiinnnneeee lllleeeennnnggggtttthhhh. Once we have noticed this, it is easy to come up with a correct definition: the maximum number of consecutive characters, none of which is a new line. This is the definition which was given above:

maxlinelength (s) $==$ max ({j-i | 0 $<=$ i $<=$j $<=$ length (s) aaaannnndddd ($forall$ k $member$ i+1..j, s(k) $!=$ newline)})

Note that we have been careful to apply max to a set which always contains at least one value (zero, obtained for i = j = 0), even if s is an empty sequence (see box).

Once such a mathematical definition has been produced, it may in return influence the natural language definition. In this example, the formal definition suggests that we should refrain from trying to define the concept of "a line in the text" which, although intuitively clear, is slightly tricky when one attempts to specify it precisely, as G & G's text shows; instead, we should focus on the notion of "max- imum line length", which is always defined, even for a text consisting of newline characters only. Once we have obtained the specification of maxlinelength as above, then it is natural to build on it and include into the English problem definition a sentence such as

the maximum number of consecutive characters, none of which is a newline, should not exceed MAXPOS which is a direct translation from the formal definition. This sentence is not, admittedly, of the most gracious style; but then it is easy to remove the double negation, yielding

any consecutive MAXPOS+1 characters should include a newline.

The main advantage of natural language texts is their understandability. Rather than trying to use natural language for what it is hopelessly inadequate - precision and rigor -, one should concentrate on this main asset. Understandability is seriously hindered, however, when natural language requirements are made ridiculously long in a vain attempt to chase away silence, ambiguity, contradic- tion etc. What we have evidenced on the text studied here is that such attempts only make the matter worse. For many of the requirements documents found in actual industrial prac- tice, often extending over hundreds or even thousands of pages, the length is due to such misuse of natural language. NNNNaaaattttuuuurrrraaaallll llllaaaannnngggguuuuaaaaggggeeee ddddeeeessssccccrrrriiiippppttttiiiioooonnnnssss sssshhhhoooouuuulllldddd rrrreeeemmmmaaaaiiiinnnn rrrreeeeaaaassssoooonnnnaaaabbbbllllyyyy sssshhhhoooorrrrtttt; the exact description of fine points, special cases, precise details etc. should be left to a formal specifica- tion.

The advantages of brevity cannot be overemphasized. It may even be claimed that Naur's specification, once corrected so that the problems of termination and consecu- tive break characters are properly tackled, is preferable to G & G's because it is shorter and does not make a fuss about what does not deserve one.

It would be fair game for the reader at this point to ask us what natural language specification we have to offer in lieu of both Naur's and G & G's texts. To answer such a request, we would try to capitalize on the lessons gained from writing the mathematical definition and propose some- thing like the text in table 4, which is directly deduced from that definition (see in particular its relation to fig- ure 3).

box center tab(!); C. T{

Given are a non-negative integer MAXPOS and a character set including two "break characters" blank and newline.

The program shall accept as input a finite sequence of characters and produce as output a sequence of characters satisfying the following conditions:

- it only differs from the input by having a single break character wherever the input has one or more break characters;

- any MAXPOS+1 consecutive characters include a newline;

- the number of newline characters is minimal.

If (and only if) an input sequence contains a group of MAXPOS+1 consecutive non-break characters, there exists no such output. In this case, the program shall produce the output associated with the initial part of the sequence up to and including the MAXPOS-th character of the first such group, and report the error. T}

Table 4: Yet Another Statement of the Requirements

No doubt this text is subject to some criticism of its own. In particular, it still needs to be refined: for exam- ple, the implementor must know how to "report the error" before embarking upon detailed design and coding; he must know what the allowable characters are apart from blank and newline, etc. One may also note that this text avoids defining specific concepts (e.g. line length, word) expli- citly; rather, it substitutes the definition for the concept when needed. Although this device can lead to interesting

- 38 -

literary experiments Oulipo Gallimard , it is certainly not recommended for large requirements documents where one must repeatedly refer to the same basic concepts.

It seems to us, however, that the above statement of the requirements embodies the essential elements of the problem and achieves a reasonable tradeoff between the imprecision of Naur's and the verbosity of G & G's specifi- cations (its length is in fact slightly more than double the former's and half the latter's). Its most important feature is that it draws heavily from the lessons gained in writing the formal specification, while retaining (we hope) clarity and simplicity.

An objection which is often voiced against formal specifications relates to the needs of end-users, who request easily understandable documents. Such an objection, we think, is based on an incorrect assessment of what specification is about. There is a need for requirements documents which must be read, checked and discussed by non- computer scientists; but there is also a need for technical documents for use by computer professionals; the difference is the same as exists between user requirements and engineering specifications in other engineering disciplines. Of course, there must be a way to communicate back the con- tents of technical specifications (for example in the case of changes). As we have seen, the existence of a good mathematical specification is a great asset for improving a natural language description.

Other ways of translating formal elements into forms which are more easily understandable may be found. Many peo- ple like graphical descriptions, which play a basic ro^le in such (non-formal) specification methods as SADT Ross Struc- tured Analysis Schoman or SREM Alford IEEE 1977 . A picture may be worth a thousand words at times, but it may also be dangerously misleading. On the other hand, a pictorial explanation of a well-defined concept can certainly do no harm. If the picture is considered more understandable than the function defini- tion

f: A $->$ B

then why not have graphic tools generate the picture from the formula for the benefit of those who want it? There is certainly a great need for software tools of this kind in specification systems.

The last point which we would like to emphasize is that formal specification is not necessarily difficult. The reader who is familiar with specification techniques will certainly have noted that the above example did not rely (at least explicitly) on such notions as abstract data types, finite-state machines, attribute grammars etc. In fact, only very simple notions from elementary set theory and logic were used. These notions are certainly no more diffi- cult than the basic core of college calculus, even if most of today's university students are regrettably less at ease when dealing with such concepts as sets, relations, partial functions, composition, predicate calculus, etc. than with other mathematical objects and operations which are better established in the traditional curriculum.

Of course, the example studied here is a small problem. Experience with e.g. the Z language Abrial Meyer Schuman Construction Abrial Z Syntax Semantics and subsequent work prompted by this experience Abrial Schuman Kahn Springer Morgan Sufrin Unix Sufrin Science Computer Programming Display shows, however, that the same basic concepts can be carried through to the description of much more complex sys- tems. The main limitation of the problem studied here is that it is defined by a simple input-to-output relation, whereas most significant programs may be characterized, in our view, as systems which are able to offer various ser- vices in response to possible user requests. We are currently working on methods, notations and tools for the modular specification of such systems Meyer System Descrip- tion Method babb mili .

An essential requirement on a good specification for- malism is that it should favor rrrreeee----uuuusssseeee of previously written elements of specifications. For example, the notion of sequence and the associated operations (of which the appen- dix contains a partial specification) should be available as predefined specification elements. Languages Z and Affirm, among others, provide for such "libraries" of basic specifi- cations. More work is needed to share and re-use the work of formal specifiers. Along with the availability of simple and efficient software tools, this is one of the conditions which must be met before formal specifications become for software engineers what (say) differential equations are for engineers of other fields.

Acknowledgements

Much of the material of sections 2, 3 and 4 was con- tained in earlier paper, written in French and published in in a newsletter meyer formalisme globule ; I am grate- ful to Axel van Lamsweerde for reminding me of the existence of that paper and suggesting that it might be of interest to a wider audience (and to him and Jean-Pierre Finance for some heated discussions on specification). I thank Flaviu Cristian for important remarks, relating in particular to the specification of exceptional cases, and the referees for making useful suggestions and pointing out an error in a previous version.

This paper also benefited from the involuntary contri- butions made by the authors of all the system requirements and other software documentation I have had to struggle with over a number of years.

I am of course the only one to blame for any error or misrepresentation contained in this paper.




Meyer home  -   Publications  -  Events  -  Chair of Software Engineering  -  CS Department