The ABI Files: More on Errno.h -- by Warren Toomey, UNIX Heritage Society

Saturday, February 21 2004 @ 08:25 PM EST

Contributed by: PJ

SCO's Chris Sontag said, at the Harvard appearance, that despite Linus' claiming authorship of ABI files such as errno.h and stating he didn't refer to UNIX when writing them, he, Sontag, still had issues with those files. So Groklaw member Warren Toomey from the Unix Heritage Society has done some work digging up a bit more on errno.h. I'm sure it will convince the reasonable folks at SCO that they are barking up the wrong tree. Or it won't. But the rest of you may read it and reach your own conclusions. This is only the first in what will be several articles on the ABI files by Dr. Toomey.


The ABI Files: Errno.h
~ by Warren Toomey, the Unix Heritage Society



SCO objects to this question as overly broad and unduly burdensome, and on the basis that it seeks information neither relevant nor calculated to reasonably lead to the discovery of admissible evidence insofar as it requests the identity of source code and other material in Linux contributed to Linux by parties other than IBM or Sequent. Subject to and without waiving these objections, as it pertains to SCO's rights involving IBM's contributions to Linux, SCO has set forth that information in response to Interrogatories Nos. 1 and 9 and the corresponding exhibits. As to others who have violated the terms of their Software and Sublicensing Agreements, that information is contained in Exhibits A through C. Specifically, in Exhibit A, it details the line-for-line copying of UNIX System V code that improperly appears in Linux. Similarly, in Exhibit B, SCO identifies the application binary interfaces ("ABIs") that SCO has rights to that are improperly in Linux. Specifically, in 1992, Unix Systems Laboratories (USL), SCO's predecessor in interest, sued Berkeley Software Design, Inc. (BSD) for, among other things, copyright infringement. One of the bases of that action was BSD's copying and distributing some USL UNIX System V files without proper permission or attribution. The confidential Settlement Agreement that ended the Unix Systems Laboratories, Inc. v. Berkeley Software Design, Inc., litigation required BSD to change the copyright information in certain of these files, including the nine files listed in Exhibit B. To SCO's knowledge, BSD complied with the terms of the Agreement, and gave USL the proper attribution, as also set forth in Exhibit B. At a later time, persons as yet unknown copied these files into Linux, erasing the USL copyright attribution in the process. The files in Linux that improperly use the ABIs are as follows [list omitted]:

SCO asserts that "line-for-line copying of UNIX System V code . . . improperly appears in Linux" and that "persons as yet unknown copied these files into Linux, erasing the USL copyright attribution in the process".

This report looks at SCO's assertion of direct copying of System V code into Linux with copyright removal and compares it with the assertion from Linus Torvalds that the code in question came from another source. This report examines only the ABI file errno.h.

The errno.h file in all Unix and Unix-like systems (and in many other non-Unix systems) is a list of possible errors that can be returned to an application program when it asks the operating system to perform a task, known as a "system call", and that task cannot proceed normally. Some system calls fail for lack of permissions, others for a temporary lack of resources, and others because the application program gave an invalid request to the operating system.

Many systems share a common list of errors, and this list of errors is defined by the POSIX standard and also the Single UNIX Specification. As these are both open standards, SCO cannot claim any copyright on the list of error names. However, each error must have a unique number, so that the operating system can communicate the error number back to the application program. For example, the error "operation not permitted" (known as EPERM in the POSIX standard) might be given the value 2 in a specific Unix or Unix-like system. The actual value for each error is not defined by the POSIX standard, but if systems do use a consistent error numbering scheme, then executable binaries from one system can run on other systems and understand the errors that the other systems report. The choice of numbers and how they are allotted to the errors is arbitrary and without 'expressive content', so the mere facts of what number goes with which error cannot normally be copyrighted.
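To make the role of these numbers concrete, here is a minimal C sketch of how an application sees an error. The helper name errno_for_open is hypothetical and is not taken from any of the systems discussed here. The point is that the program compares against the symbolic name (ENOENT), while the number behind that name is whatever the system's errno.h happens to assign.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: try to open `path` for reading and return the
 * errno value reported by the failed call, or 0 if the open succeeds. */
int errno_for_open(const char *path)
{
    assert(path != NULL);
    errno = 0;
    FILE *fp = fopen(path, "r");
    if (fp != NULL) {
        fclose(fp);
        return 0;
    }
    return errno;  /* a small positive integer such as ENOENT */
}
```

Calling errno_for_open() on a path that does not exist should return ENOENT on a POSIX-like system. A binary compiled against one numbering can only interpret that result correctly on another system if both systems agree on the number.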

To have a valid assertion that "line-for-line copying of UNIX System V code . . . improperly appears in Linux" for errno.h, SCO needs to demonstrate that error names, their numeric values, and any associated program comments were directly copied from System V to Linux.

Errno.h in Linux 0.01 to 0.96c

Linus Torvalds released version 0.01 of the Linux kernel source around the "middle of [19]91", and this includes the kernel file linux/include/errno.h. SCO asserts that this file was copied from System V source code as noted above. Linus and others, on the other hand, assert that the file "was taken from Minix". Let's examine Linus' assertion and then SCO's assertion.

Linus believes that he used the error definitions in Minix to construct the errno.h file in Linux 0.01. The Minix operating system, version 1.1, was released by Andy Tanenbaum and Prentice-Hall around 1987 as a book and an accompanying set of floppy disks. Subsequent releases quickly followed: 1.2 around 1988, 1.3 in 1988, 1.4 in January 1989, 1.5.0 in November 1989 and 1.5.10 in May 1990. Minix 1.6 was developed in-house after 1.5, then released to beta-testers in October 1992. It was followed up by Minix 1.7.1 in November 1995. If Linus did use Minix to construct the errno.h file, then it would have been based on the file from Minix 1.5.10.

The early versions of Minix (1.1 to 1.4) had a very plain errno.h file: no copyright notice, no comment header, no comment for each definition. Minix 1.5 was a significant rewrite; although there is no copyright notice, the 1.5.10 errno.h file contains a comment header and comments for each error definition. More importantly, each definition's value is wrapped with a _SIGN macro to convert from negative numbers in the Minix kernel to the positive numbers used by the applications.
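As an illustration of how a sign-wrapping macro of this kind can work, here is a small C sketch. It is not the Minix source: the names EX_SIGN, EX_EPERM, EXAMPLE_KERNEL and user_errno_from_kernel are all hypothetical, and the mechanism is only my assumption based on the description above. The same header yields negative values when compiled for the kernel and positive values when compiled for applications.

```c
#include <assert.h>

/* Hypothetical stand-in for a _SIGN-style macro: a kernel build defines
 * EXAMPLE_KERNEL and gets a minus sign, an application build gets nothing. */
#ifdef EXAMPLE_KERNEL
#define EX_SIGN -
#else
#define EX_SIGN
#endif

#define EX_EPERM (EX_SIGN 1)  /* operation not permitted */

/* Convert a kernel-style negative return value into the positive error
 * number that an application would see. */
int user_errno_from_kernel(int kernel_ret)
{
    assert(kernel_ret < 0);
    return -kernel_ret;
}
```

Without such a macro, as in Linux 0.01, the header holds only positive numbers and each kernel routine has to negate the value itself when it returns an error.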

The earliest errno.h from Linux 0.01 has this comment:

 * ok, as I hadn't got any other source of information about  
 * possible error numbers, I was forced to use the same numbers  
 * as minix.  
 * Hopefully these are posix or something. I wouldn't know (and posix  
 * isn't telling me - they want $$$ for their f***ing standard). 
 * We don't use the _SIGN cludge of minix, so kernel returns must 
 * see to the sign by themselves.  
 * NOTE! Remember to change strerror() if you change this file!
Taking this comment at face value, it gives the impression that Linus did in fact use the errno.h file from Minix to construct the Linux 0.01 errno.h file. But do the error names and values match up? By stripping the Minix _SIGN macro and the error comments away using:

    grep define Minix/1.5/errno.h | sed 's/(_SIGN//;s/).*//'
and comparing the results with the errno.h file from the Linux 0.01 kernel, we see that every error definition has the same name and value. The definition of the external variable errno is also identical between the files:

   extern int errno;
Thus there is significant evidence that Linus did refer to the Minix 1.5.10 errno.h file to produce Linux 0.01 errno.h. Let's now examine SCO's assertion that the errno.h file from Linux originated in System V.
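The name-and-value comparison described above can also be sketched in C. This is only an illustration: the function names are hypothetical, and the assumed shape of a Minix 1.5 definition, "#define NAME (_SIGN n)", is inferred from the sed expression rather than quoted from the file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one definition line, accepting either the plain form
 * "#define EPERM 1" or the wrapped form "#define EPERM (_SIGN 1)".
 * Returns 1 and fills name/value on success, 0 otherwise. */
int parse_errno_define(const char *line, char name[32], int *value)
{
    assert(line != NULL && name != NULL && value != NULL);
    if (sscanf(line, " #define %31s (_SIGN %d", name, value) == 2)
        return 1;
    if (sscanf(line, " #define %31s %d", name, value) == 2)
        return 1;
    return 0;
}

/* Return 1 if both lines define the same error name with the same number,
 * ignoring any _SIGN wrapper and any trailing comment. */
int same_definition(const char *a, const char *b)
{
    char name_a[32], name_b[32];
    int val_a, val_b;

    if (!parse_errno_define(a, name_a, &val_a))
        return 0;
    if (!parse_errno_define(b, name_b, &val_b))
        return 0;
    return strcmp(name_a, name_b) == 0 && val_a == val_b;
}
```

Fed corresponding lines from the two files, same_definition() reports a match only when both the name and the number agree, which is the property the grep and sed comparison checks.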

Obtaining a copy of System V (binaries or otherwise) has proved to be difficult. However, I have been able to obtain a copy of System V errno.h with this copyright notice:

/*      Copyright (c) 1984, 1986, 1987, 1988, 1989, 1990 AT&T   */ 
/*      All Rights Reserved                                     */
/*      The copyright notice above does not evidence any        */ 
/*      actual or intended publication of such source code.     */ 
#ifndef _SYS_ERRNO_H 
#define _SYS_ERRNO_H 
#ident  "@(#)/usr/include/sys/ 1.1 4.0 10/15/90 58840 AT&T-SF"
 *              PROPRIETARY NOTICE (Combined)  
 *  This source code is unpublished proprietary information  
 *  constituting, or derived under license from AT&T's Unix(r) System V.  
 *  In addition, portions of such source code were derived from Berkeley  
 *  4.3 BSD under license from the Regents of the University of  
 *  California.  
 *              Copyright Notice   
 *  Notice of copyright on this source code product does not indicate   
 *  publication.  
 *      (c) 1986,1987,1988,1989  Sun Microsystems, Inc.  
 *      (c) 1983,1984,1985,1986,1987,1988,1989  AT&T.  
 *                All rights reserved.
The file, with its October 15, 1990 ident stamp, is roughly contemporaneous with the first release of Linux. It is interesting that the file carries a combined proprietary notice, acknowledging both AT&T and code derived from 4.3 BSD under license from the Regents of the University of California.

The System V errno.h file is wrapped by the C-preprocessor defines

#ifndef _SYS_ERRNO_H 
#define _SYS_ERRNO_H 
#endif  /* _SYS_ERRNO_H */
which is similar to Linus' file, but which is also standard C practice to prevent a header file from being included twice into a C program. The System V errno.h file does not have a definition of the errno variable, unlike the Linux and Minix files. Each error definition also has a comment, as does the Minix file, but there are several differences between the System V and Minix comments:

Error     System V Comment        Minix 1.5.10 Comment
EPERM     Not super-user          operation not permitted
EBADF     Bad file number         bad file descriptor
ECHILD    No children             no child process
EAGAIN    No more processes       resource temporarily unavailable
ENOMEM    Not enough core         not enough space
ENOTBLK   Block device required   Extension: not a block special file
EBUSY     Mount device busy       resource busy
EXDEV     Cross-device link       improper link
ENFILE    File table overflow     too many open files in system
ENOTTY    Not a typewriter        inappropriate I/O control operation
ETXTBSY   Text file busy          no longer used

There are several more examples of different comments. This indicates that the Minix 1.5.10 errno.h file did not come directly from System V, and the earlier versions of Minix errno.h did not have comments.

Returning to the Linux 0.01 & System V comparison, the error names and values are identical from EPERM up to ERANGE, but then the equivalence breaks down:

Error          System V Value   Minix 1.5.10 Value   Linux 0.01 Value
EDEADLK        45               no value             35
ENAMETOOLONG   78               no value             36
ENOLCK         46               no value             37
ENOSYS         89               no value             38
ENOTEMPTY      93               no value             39

The simplest explanation here is that Linus borrowed error names and values from Minix from EPERM up to ERANGE, but Minix did not define errors 35 onwards. As new errors were required in Linux, these were added on an as-required basis, and so the numbers 35 to 39 were allocated. The difference in numbering between Linux 0.01 and System V supports the assertion that Linux 0.01 errno.h came from Minix 1.5.10 and not from System V.

Errno.h in Linux 0.97 Onward

The errno.h file in Linux does not change substantially from 0.01 to 0.96c of the kernel. The definition of ERROR is removed, and three new errors are defined: ELOOP as 40, ERESTARTSYS as 512 and ERESTARTNOINTR as 513.

However, from Linux version 0.97 the file (timestamped July 26 1992) changes significantly. In fact, this has been the only significant change to errno.h, and it remains essentially unchanged from 0.97 through to the 2.4.18 Linux kernel. In the new 0.97 errno.h file, the header comment about Minix _SIGN and the POSIX standard is removed, errors now have comments, and error numbers go from 1 up to 121 (then 512 and 513):

#ifndef _LINUX_ERRNO_H
#define _LINUX_ERRNO_H
#define EPERM            1      /* Operation not permitted */
#define ENOENT           2      /* No such file or directory */
#define ESRCH            3      /* No such process */
#define EINTR            4      /* Interrupted system call */
#define EIO              5      /* I/O error */
#define ENXIO            6      /* No such device or address */
#define E2BIG            7      /* Arg list too long */
#define ENOEXEC          8      /* Exec format error */
#define EBADF            9      /* Bad file number */
#define ECHILD          10      /* No child processes */
#define ENAVAIL         119     /* No XENIX semaphores available */
#define EISNAM          120     /* Is a named type file */
#define EREMOTEIO       121     /* Remote I/O error */
/* Should never be seen by user programs */
#define ERESTARTSYS     512
The large number of new errors, and the fact that this predates Minix 1.6, strongly suggests that this new file was not derived from Minix. Was it directly derived from System V? Again, the evidence does not suggest so. From error number 35 onwards, the System V and Linux 0.97 files use different numbers for the same error names. Linux 0.97 has 121 errors; System V has 151 errors. While some error comments are identical apart from letter case, many error comments are different.

Where did the errno.h file for Linux 0.97 originate? The members of the Linux Kernel Archive mailing list searched for the origins of the file, and after some analysis, Linus Torvalds came to the conclusion that the errno.h file was automatically generated from the release of the libc-2.2.2 library that was part of the GNU C compiler 2.2.2 for Linux (released on July 19 1992). Linus shows that "I can re-create _exactly_ the linux-0.97 "errno.h" file by using the "sys_errlist[]" contents from "libc-2.2.2". In particular, [a] trivial [C program] will generate the exact (byte-for-byte) list that is in the kernel". Importantly, the regularity of the spacing within the 0.97 errno.h file strongly supports the idea that the file was not written by hand.
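Linus's actual program is not reproduced in this article, so the following is only a hypothetical sketch of the general approach: walk a table of error names and messages and emit one #define line per entry. It uses a small hard-coded table built from the excerpts quoted in this article instead of the real sys_errlist[] array, and the output format string is my own guess, not a claim about the byte-for-byte layout of the 0.97 file.

```c
#include <assert.h>
#include <stdio.h>

struct err_entry {
    const char *name;     /* symbolic name, e.g. "EPERM" */
    const char *message;  /* message text, as in a sys_errlist[]-style table */
};

/* First few entries only, taken from the excerpts quoted in this article. */
static const struct err_entry err_table[] = {
    { "EPERM",  "Operation not permitted" },
    { "ENOENT", "No such file or directory" },
    { "ESRCH",  "No such process" },
    { "EINTR",  "Interrupted system call" },
    { "EIO",    "I/O error" },
};

/* Format table entry `index` (0-based) as a header line; error numbers
 * start at 1. Returns the character count, as snprintf does. */
int format_define(char *buf, size_t size, size_t index)
{
    assert(buf != NULL);
    assert(index < sizeof err_table / sizeof err_table[0]);
    return snprintf(buf, size, "#define %-16s%2d\t/* %s */",
                    err_table[index].name, (int)index + 1,
                    err_table[index].message);
}
```

Because every line comes out of the same format string, the column alignment is perfectly regular, which is the kind of regularity one would expect from a generated file and not from one typed by hand.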

The file string/errlist.c from the libc-2.2.2 library has no copyright notice, and begins thus:


/* This is a list of all known signal numbers.  */

CONST char *CONST sys_errlist[] = {
        "Unknown error",                        /* 0 */
        "Operation not permitted",              /* EPERM */
        "No such file or directory",            /* ENOENT */
        "No such process",                      /* ESRCH */
        "Interrupted system call",              /* EINTR */
        "I/O error",                            /* EIO */
        "No such device or address",            /* ENXIO */
        "Arg list too long",                    /* E2BIG */
        "Exec format error",                    /* ENOEXEC */
From all of this analysis, I conclude that the errno.h file in Linux was not copied directly from UNIX System V. Early versions of the file were derived from the Minix source code, and the version of errno.h from Linux 0.97 onwards originated from a file distributed with the GNU C compiler 2.2.2 for Linux.

Regardless of the origins of the errno.h files in Minix and the GNU C compiler, it cannot be asserted, in my opinion, that Linus Torvalds or some other person directly copied a System V ABI file into Linux. Nor can it be asserted that Linus Torvalds or some other person removed a copyright notice from a file when the Linux errno.h file was constructed: it has been shown that neither the Minix errno.h files nor the libc-2.2.2 errlist.c file contained copyright notices.

I'll end with a short comment on this assertion by SCO in their supplemental response:

In 1992, Unix Systems Laboratories (USL), SCO's predecessor in interest, sued Berkeley Software Design, Inc. (BSD) for, among other things, copyright infringement. One of the bases of that action was BSD's copying and distributing some USL UNIX System V files without proper permission or attribution.
Firstly, Berkeley Software Design, Inc. is known by the acronym BSDi; the acronym BSD is reserved for the distributions of code that were released by the University of California, Berkeley. Secondly, SCO is completely wrong when they assert that infringing distribution of System V code was one of the bases of the lawsuit. In fact, nowhere in the court papers is System V even mentioned, except as a product that USL sells. All mention of copyright infringement in the USL vs. BSDi lawsuit relates to the 32V distribution from USL. For this reason, I consciously decided not to discuss BSD code in this article; that's a whole topic for later consideration.

Coming up next: a look at signal.h and the other ABI files.