Dick Gingras, a programmer and Groklaw reader, spent the time to figure out some of SCO's math. SCO is talking about millions of lines of code. Dick has figured out the numbers for SMP/RCU/NUMA code in Linux, and even if you put them all together in one heap, it doesn't add up to millions of lines of code. Here is Dick Gingras' work, and thank you for it.
"Just finished spending about eight hours compiling info on the lines of SMP/RCU/NUMA code contained in the Linux kernel (see below).
"I'm not a member of the Linux kernel community, but I've been programming for upwards of 35 years, the first 12 of which I worked on operating systems and compilers, so I have sufficient background to do a credible job analyzing the code base.
"Because I had to eyeball each file that possibly contained some of the disputed code, I thought I might as well include the name(s) of author(s) and the last copyright year. So without further ado, here's the data:
"Lines of code (LOC) in Linux SMP, RCU and NUMA.
"The total LOC for all of SMP/RCU/NUMA is 5,124. To provide perspective, the total LOC for all of the Linux kernel is approximately 5.2 million, including the code for all twenty architectures that Linux will run on plus all the drivers for the myriad supported peripherals.
"The results here were obtained by searching the kernel tree for:
"1. a filename that contains the string smp/rcu/numa, or
"2. a source file that contains #ifdef for SMP/RCU/NUMA.
"Each resulting file was then manually examined and the lines pertaining to SMP/RCU/NUMA were counted.
"All line counts include comments and blank lines.
"Only files used as part of the Intel i386 architecture are included because that's the only platform on which SCO's OpenServer and UnixWare run. Most of the code for SMP and NUMA is completely different for other architectures, including the Intel IA64 (Itanium).
"Not counted: source files that contain trivial code, i.e.,
". includes of header files (.h)
". variable definitions
". macro definitions
". calls to external subroutines defined in one of the principle modules, for instance, drivers for peripheral hardware
"Names of authors and last copyright date is noted if copyright statements or authorship was given. If an author indicated his company, that is so noted. Where a source file was worked on by many programmers, only the principle authors are listed."
Linux Kernel 2.6.0-test3 (latest as of 8/17/03)
Symmetric MultiProcessing (SMP) Code:
592 arch/i386/kernel/smp.c 1995 Alan Cox, Red Hat; 2000 Ingo Molnar, Red Hat
1186 arch/i386/kernel/smpboot.c 1995 Alan Cox, Red Hat; 2000 Ingo Molnar, Red Hat
295 kernel/module.c 2002 Rusty Russell, IBM
528 kernel/sched.c 2002 Linus Torvalds; Ingo Molnar
60 kernel/timer.c 1992 Linus Torvalds; Ingo Molnar, Red Hat; David S Miller; Alexey Kuznetsov
5 kernel/exit.c 1992 Linus Torvalds
35 kernel/posix-timers.c 2002 George Anzinger, MontaVista Software; Richard Henderson
22 mm/swap.c 1994 Linus Torvalds
60 mm/slab.c 1997 Mark Hemment; 2002 Manfred Spraul
3367 = Total SMP Code
Read-Copy Update (RCU) Code: (actually part of SMP code)
267 kernel/rcupdate.c 2001 Dipankar Sarma, IBM
135 include/linux/rcupdate.h 2001 Dipankar Sarma, IBM
402 = Total RCU Code
Non-Uniform Memory Architecture (NUMA) Code:
164 kernel/sched.c (see under SMP)
58 arch/i386/kernel/mpparse.c 1995 Alan Cox, Red Hat
25 arch/i386/kernel/smpboot.c 1995 Alan Cox, Red Hat
106 arch/i386/kernel/numaq.c 2002 Patricia Gaughen, IBM
429 arch/i386/mm/discontig.c 2002 Patricia Gaughen, IBM
129 arch/i386/pci/numa.c no copyright statement
19 arch/i386/mach-default/topology.c 2003 Patrick Mochel, OSDL; Paul Dorwin, IBM; Matthew Dobson, IBM
186 drivers/acpi/numa.c 2002 Takayoshi Kochi, NEC
23 mm/page_alloc.c 1999 Kanoj Sarcar, SGI
~50 mm/slab.c 2002 Manfred Spraul
166 include/asm-i386/numaq.h 2002 Patricia Gaughen, IBM
1355 = Total NUMA Code
Dick Gingras, August 19, 2003
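The arithmetic in the listing can be checked with a short script. The sketch below uses only numbers that appear in this article: the per-file counts as reproduced in the listing (with the "~50" entry taken as 50), the three stated category totals, and the approximate 5.2 million LOC for the whole kernel. The names `listed_rows`, `stated_totals` and `summarize` are illustrative.

```python
# Per-file counts copied from the listing (the "~50" entry is taken as 50).
listed_rows = {
    "SMP": [592, 1186, 295, 528, 60, 5, 35, 22, 60],
    "RCU": [267, 135],
    "NUMA": [164, 58, 25, 106, 429, 129, 19, 186, 23, 50, 166],
}

# Category totals as stated in the listing.
stated_totals = {"SMP": 3367, "RCU": 402, "NUMA": 1355}

KERNEL_LOC = 5_200_000  # approximate whole-kernel figure quoted in the email


def summarize(rows, totals, kernel_loc):
    """Return per-category (row sum, stated total), grand total and kernel share."""
    per_category = {name: (sum(rows[name]), totals[name]) for name in totals}
    grand_total = sum(totals.values())
    return per_category, grand_total, grand_total / kernel_loc


per_category, grand_total, share = summarize(listed_rows, stated_totals, KERNEL_LOC)
for name, (row_sum, stated) in per_category.items():
    print(f"{name}: rows sum to {row_sum}, stated total {stated}")
print(f"All three categories: {grand_total:,} LOC, {share:.2%} of the kernel")
```

On the figures as reproduced here, the RCU and NUMA rows match their stated totals, while the SMP rows sum to 2,783 against a stated 3,367, which suggests that some SMP rows may be missing from this copy of the listing. The three stated totals add up to the 5,124 lines given in the email, roughly a tenth of one percent of the approximately 5.2 million-line kernel.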
I asked another programmer to repeat the work, and he reports that, in his opinion, the work is good: his numbers differ slightly, but not enough to affect the main point. Gingras chose to use the 2.6 kernel, because it presumably has the most high-end code.
Then I got another email. Another coder has been doing some math homework too, and when he also found that the code can't add up to millions of lines, he came up with a theory:
"I think SCO is including everything that _uses_ the 3 disputed technologies and not just allegedly copied SYSV code. I grepped for files that use the 3 technologies (using a rough method) and counted their lines.
$ grep -irlE '_smp|smp_' . | xargs cat | wc -l
1120087 (SCO claims 750k)
$ grep -irlE '_rcu|rcu_' . | xargs cat | wc -l
79138 (SCO claims 110k)
$ grep -irlE '_numa|numa_' . | xargs cat | wc -l
41809 (SCO claims 55k)
"The figures don't exactly match but they're in the right ballpark. I think this is similar to the method SCO has been using to discover "derivative forks". They think anything that links against their allegedly copied SYSV code is a derivative work of SYSV. For example, the ext2 filesystem code uses spinlock code from the SMP core. I think SCO is claiming that ext2 is 'copied' from SYSV because of those spinlocks.
"I hope I've got it wrong because if this is what SCO is doing then they're engaged in a IP land-grab. They're using their allegedly copied SMP and NUMA and RCU code to steal millions of lines of code from thousands of Linux copyright holders. The hypocrisy of SCO claiming they're protecting IP rights for the 'little guy' while trampling over the IP rights of Linux copyright holders... it makes me sick to the stomach."