Signal.h -- Part 2 of Warren Toomey's look at the ABI Files

Monday, March 01 2004 @ 05:30 AM EST

Contributed by: PJ

Here is UNIX Heritage Society's Warren Toomey's second article on the ABI files, as promised. The first, as you recall, looked at errno.h. Now it's time, he writes, "to turn our attention to SCO's assertion that signal.h was one of the files involved in the 'line-for-line copying of UNIX System V code'" which SCO alleges "improperly appears in Linux".

To determine if their accusation is well-founded or not, we need to understand what signal.h is, what's in it, and a bit of its history.

It's important to point out that there are two versions of signal.h in most versions of UNIX (/usr/include/signal.h and /usr/include/sys/signal.h), and as yet -- to the best of our collective knowledge -- SCO Group has not specified which, if either, is the file they claim has been improperly copied. The same is true of errno.h.

We have yet to see SCO list any "UNIX Derived Files" publicly, for that matter. The files SCO mentions in their Revised Supplemental Responses to IBM's 1st and 2nd Set of Interrogatories are all from AIX, Dynix and Linux, although on page 59 it references an Exhibit A that SCO says lists them. However, Exhibit A is not attached to the publicly available Revised Supplemental Responses, at least not yet. SCO has referenced UNIX files being attached to letters sent to their "dear Unix licensees" ("A complete listing of the UNIX Derived Files is attached"), but so far we have not heard of anyone actually getting the attachment with the letter. In Red Hat's most recent filing, they include the letter to Lehman Brothers, which also references the attachment, but again, there is no such attachment in the public court filing.

Has anyone who got a letter from SCO received this attachment listing "UNIX Derived Files"?


~ by Warren Toomey


Following on from my report into errno.h in Linux, it's time to turn our attention to SCO's assertion that signal.h was one of the files involved in the "line-for-line copying of UNIX System V code [which] improperly appears in Linux" and that "persons as yet unknown copied these files into Linux, erasing the USL copyright attribution in the process".

In Unix and Unix-like systems, the underlying operating system can send a message to a running program to inform it of some exceptional event: a signal. The program's execution is diverted to a signal handler which deals with the event, before returning the program to what it was originally doing.

The sorts of events that can occur are numerous: access to an 'out of bounds' area of memory, a divide by zero operation, a request from the user to stop executing, etc. For (nearly) each signal type on the system, a running program can decide to ignore the signal, catch the signal and deal with it, or simply let the default Unix behaviour happen for that signal type. Most signals, if uncaught, result in the program being terminated, and the SIGKILL signal can never be caught: it is the "terminate with extreme prejudice" signal in Unix.

To have a valid assertion that "line-for-line copying of UNIX System V code . . . improperly appears in Linux" for signal.h, SCO needs to demonstrate that the signal names, their numeric values, any associated program comments and other function definitions could only have been directly copied from System V to Linux, and from nowhere else. Our job here is to track down the origins of signal.h in Linux.

What's in Signal.h?

What's in a typical signal.h file on most Unix or Unix-like systems? First of all, there is a set of defined signal names, their values, and possibly a C comment describing the signal. Systems which comply with the POSIX standard need to define about 28 signal names and associated numeric values; the values are not defined by the POSIX standard, but nearly every Unix and Unix-like system uses the same numbering scheme.

The earliest version of the signal name/numbering scheme still in existence is the nsys/param.h file from the 3rd Edition of UNIX in August 1973, with 12 defined signals. As Unix grew, so too did the number of signals, and by the 7th Edition of UNIX and the 32V distribution in 1979, the file now called signal.h had 15 signals.

By the end of the 1970s, there were already Unix clones like Idris and Coherent, and of course they also had to enumerate the set of signals. Not surprisingly, they followed the same numbering convention as Unix, as is shown by this file from Idris in 1978, where nearly all of the names and numbers are derived from 6th Edition UNIX.

This sort of code "cloning" is exactly the thing that seems to make SCO see red. However, when AT&T asked Dennis Ritchie (one of the developers of Unix) to visit Coherent's makers [first link] and determine whether the Mark Williams Company had relied on Unix code when they wrote Coherent, Dennis determined that he "couldn't find anything that was copied", and that "what they generated was [...] reproducible from the [Unix] manual". It must be remembered that the manual pages for Unix were published and publicly available; in fact, each new version of Unix was known by the edition of the printed manuals.

Dennis goes on to indicate that AT&T "backed off, possibly after other thinking and investigation [... and] so far as I know, after that MWC and Coherent were free to offer their system and allow it to succeed or fail in the market". This decision and others like it, together with the publicly available enumeration of the signal values, allowed the Unix signal numbers to be used in many Unix clones and non-Unix systems such as:

The list is probably endless; hyperlinks to other examples of the Unix numbering in non-Unix systems can be posted as replies to this article.[1]

We've digressed from the topic of "What's in signal.h?" to observing that the contents of the original Unix file were copied with AT&T's knowledge as early as 1978. Let's get back to what is in a typical signal.h file.

Along with the list of signal types, there is a list of operations that a running program can do when a signal arrives. Typically:

There is no numeric definition for the program handling the signal itself. Instead, signal.h defines a prototype for the signal() function. This system function takes two arguments: the signal number to catch, and the name of a program-specified function that will catch it. This program-specific function must receive an integer (the number of the signal that has arrived) but not return any value. These days, example definitions of the program-specific function and the signal() function might look like:

   typedef void (*sig_t)(int);   /* a pointer to the program-specific function */
   sig_t signal(int sig, sig_t func);

Earlier versions of signal.h often rolled both definitions into one line, giving an unreadable definition like:

   void (* signal(int sig, void (*func)(int)))(int);

The behaviour of signals and their handlers in Unix has changed dramatically over time, and now the whole signal system is mind-bogglingly complex. The POSIX standard lists many, many more type definitions and function definitions that must be found in modern signal.h files.

Signal.h in Linux 0.01

Linus Torvalds released version 0.01 of the Linux kernel source around the "middle of [19]91", and this includes the kernel file linux/include/signal.h. We have:

Linux 0.01 vs Minix 1.5.10

If you're still awake at this point, then you are doing well. What sources of information did Linus use when he wrote this file? We saw that with errno.h, the most likely source of information was Minix 1.5. The evidence below suggests that Minix 1.5.10's signal.h was the source of inspiration for Linux 0.01 signal.h:

There are some differences though. The Minix 1.5.10 file defines the signal functions differently to Linux 0.01; in particular, the parameter names are different (_set vs. set, _oset becomes oldset etc.). The parameter names are really for decoration here, and serve no purpose to the compiler, so perhaps Linus was not so keen on the Minix parameter names.

One important difference is the different definitions of the signal() function:

void (*signal()) (); in Minix 1.5.10

void (*signal(int _sig, void (*_func)(int)))(int); in Linux 0.01

One possible clue here is Linus' comment in the file that he is "trying to keep headers POSIX''. The POSIX standard defines the signal() function thus:

   void (*signal(int, void (*)(int)))(int);

and Linus has followed the POSIX standard and also decorated his definition with parameter names.

Linux 0.01 vs System V R4

Let's now compare the Linux 0.01 signal.h file to the corresponding file /usr/include/sys/signal.h from the 1990 version of System V R4.0 for i386:

I think it's pretty obvious that Linus neither had access to nor used System V source code to generate his 0.01 signal.h file.

Since Linux 0.01, the signal.h file has changed and expanded somewhat, but even the signal.h file from the Linux 2.4.22 distribution still bears little resemblance to the System V signal.h file; even a cursory inspection shows that the Minix 1.5.10 signal numbers are still used here.

Postscript: errno.h Proliferates

At the beginning I mentioned that, as early as 1978, the signal names and values from AT&T's original signal.h file had been used in other systems. The same is true for errno.h. Here is an example list that I put together in about 30 minutes of searching on Google:

AT&T did not put copyright notices on the "ABI files" from 3rd Edition UNIX in 1973 up to and including the first release of System V in 1983. It makes you wonder, if Whitesmiths were putting copyright notices on their files in 1978, who really can claim copyright on the content of these files?

[1] These links are merely to demonstrate that the signal names and numbers have been used elsewhere. Copyright notices in the linked files should be observed. Copyrighted materials may not be used without the permission of the author.