David Wheeler has just written an article in which he calculates the cost to re-develop the Linux 2.6 kernel. He figures about $612 million. That is the least it is worth, however, as he notes:
"It's worth noting that these approaches only estimate development cost, not value. All proprietary developers invest in development with the presumption that the value of the resulting product (as captured from license fees, support fees, etc.) will exceed the development cost -- if not, they're out of business. Thus, since the Linux kernel is being actively sustained, it's only reasonable to presume that its value far exceeds this development estimate. In fact, the kernel's value probably well exceeds this estimate of simply redevelopment cost."
What is Linux's value, then? A lot. The word billions comes to mind. I enjoyed watching him do the calculations, and I hope you do too. My thanks to him for permission to share this with you on Groklaw.
Linux Kernel 2.6: It's Worth More!
David A. Wheeler
October 12, 2004
This paper refines Ingo Molnar's estimate of the development effort it would take to redevelop Linux kernel version 2.6. Molnar's rough estimate found it would cost $176M (US) to redevelop the Linux kernel using traditional proprietary approaches. By using a more detailed cost model and much more information about the Linux kernel, I found that the effort would be closer to $612M (US) to redevelop the Linux kernel. In either case, the Linux kernel is clearly worth far more than the $50,000 proposed by Jeff Merkey.
On October 7, 2004, Jeff V. Merkey made the following offer on the linux.kernel mailing list:
We offer to kernel.org the sum of $50,000.00 US for a one time license to the Linux Kernel Source for a single snapshot of a single Linux version by release number. This offer must be accepted by **ALL** copyright holders and this snapshot will subsequently convert the GPL license into a BSD style license for the code.
Groklaw, for example, included an article that mentioned this proposal. It also noted that someone with the same name is listed on a patent recently obtained by the Canopy Group, and SCO is a Canopy Group company. Thus, this proposal made many people suspicious of Mr. Merkey's motivations.
Many respondents noted that Merkey's proposal would require complete agreement by all copyright holders. Not only would such a process be lengthy, but many copyright holders made it clear in various replies that they would not agree to any such plan. Many Linux kernel developers expect improved versions of their code to be continuously available to them, and a release using a BSD-style license would violate those developers' expectations. Indeed, it was clear that many respondents felt that such a move would strip the Linux kernel of legal protections against someone who wanted to monopolize a derived version of the kernel. Many open source software / Free software (OSS/FS) developers allow conversion of their OSS/FS programs to a proprietary program; some even encourage it. The BSD-style licenses are specifically designed to allow conversion of an OSS/FS program into a proprietary program. However, the GPL is the most popular OSS/FS license, and it was specifically designed to prevent this. Based on the thread responses, it's clear that many Linux kernel developers prefer that the GPL continue to be used as the Linux kernel license.
In one of the responses, Ingo Molnar calculated the cost to re-develop the Linux kernel using my tool SLOCCount. Molnar didn't specify exactly which version of the Linux kernel he used, but he did note that it was in the version 2.6 line, and presumably it was a recent version as of October 2004. He found that "the Linux 2.6 kernel, if developed from scratch as commercial software, takes at least this much effort under the default COCOMO model":
Total Physical Source Lines of Code (SLOC)                = 4,287,449
Development Effort Estimate, Person-Years (Person-Months) = 1,302.68 (15,632)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 8.17 (98.10)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 159.35
Total Estimated Cost to Develop                           = $ 175,974,824
 (average salary = $56,286/year, overhead = 2.40).
SLOCCount is Open Source Software/Free Software, licensed under the FSF GPL.
Please credit this data as "generated using David A. Wheeler's 'SLOCCount'."
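The Basic COCOMO figures above follow directly from the formulas SLOCCount prints. The Python sketch below reproduces them using only those formulas and the default salary and overhead shown in the output; it is an illustration under those assumptions, not SLOCCount's own code, and the function name basic_cocomo is hypothetical.

```python
# Sketch of the Basic COCOMO (organic) calculation shown in the SLOCCount output.
SLOC = 4_287_449      # physical SLOC reported for the 2.6 kernel
SALARY = 56_286       # average annual salary in USD (SLOCCount default)
OVERHEAD = 2.40       # overhead multiplier (SLOCCount default)


def basic_cocomo(sloc: int) -> dict:
    """Return effort, schedule, staffing, and cost for the Basic COCOMO organic model."""
    ksloc = sloc / 1000.0
    person_months = 2.4 * ksloc ** 1.05        # effort equation
    months = 2.5 * person_months ** 0.38       # schedule equation
    developers = person_months / months        # average staffing (effort / schedule)
    person_years = person_months / 12.0
    cost = person_years * SALARY * OVERHEAD    # loaded labor cost
    return {
        "person_months": person_months,
        "person_years": person_years,
        "schedule_months": months,
        "developers": developers,
        "cost_usd": cost,
    }


if __name__ == "__main__":
    for key, value in basic_cocomo(SLOC).items():
        print(f"{key:16s} {value:,.2f}")
```

The cost is simply person-years times salary times overhead, so any change to the effort equation flows straight through to the dollar figure.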
After noting the redevelopment cost of $176M (US), Ingo Molnar then commented, "and you want an unlimited license for $0.05M? What is this, the latest variant of the Nigerian/419 scam?"
Strictly speaking, the value of a product isn't the same as the cost of developing it. For example, if no one wants to use a software product, then it has no value, no matter how much was spent in developing it. The value of a proprietary software product to its vendor can be estimated by computing the amount of money that the vendor will receive from it over all future time (via sales, etc.), minus the costs (development, sustainment, etc.) over that same time period -- but predicting the future is extremely difficult, and the Linux kernel isn't a proprietary product anyway. Estimating a product's value to its users is also difficult; in general, value is surprisingly hard to compute directly. But if a software product is used so widely that you'd be willing to redevelop it, then its development cost is a reasonable lower bound on its value. After all, if you're willing to redevelop a program, then it must have at least that value. The Linux kernel is widely used, so its redevelopment cost gives at least a lower bound on its value.
Thus, Molnar's response is quite correct -- offering $50K for something that would cost at least $176M to redevelop is ludicrous. It's true that the kernel developers could continue to develop the Linux kernel after a BSD-style release; after all, the *BSD operating systems do this now. But with a BSD-style release, someone else could take the code and establish a competing proprietary product, and it would take time for the kernel developers to add enough additional material to compete with such a product. It's not clear that a proprietary vendor could really pick up the Linux kernel and maintain the same pace without many of the original developers, but that's a different matter. Certainly, the scale of the difference between $176M and $50K is enough to show that the offer is tiny compared to what the offerer is trying to buy.
But in fact, it's even sillier than it appears; I believe the cost to redevelop the Linux kernel would actually be much greater than this. Molnar correctly notes that he used the default Basic COCOMO model for cost estimation. This is the default cost model for SLOCCount, because it's a reasonable model for rough estimates about typical applications. It's also a reasonable default when you're examining a large set of software programs at once, since the ranges of real efforts should eventually average out (this is the approach I used in my More than a Gigabuck paper). So, what Molnar did was perfectly reasonable for getting a rough order of magnitude of effort.
But since there's only one program being considered in this analysis -- the Linux kernel -- we can use a more detailed model to get a more accurate cost estimate. I was curious what the answer would be. So I've estimated the effort to create the Linux kernel, using a more detailed cost model. This paper shows the results -- and it shows that redeveloping the Linux kernel would cost even more.
Computing a Better Estimate
To get better accuracy in our estimation, we need to use a more detailed estimation model. An obvious alternative, and the one I'll use, is the Intermediate COCOMO model. This model requires more information than the Basic COCOMO model, but it can produce more accurate estimates if you can provide the data it needs. We'll also use the version of COCOMO that uses physical SLOC (since we don't have the logical SLOC counts). If you don't want to know the details, feel free to skip ahead to the results.
First, we now need to determine if this is an "organic", "embedded", or "semidetached" application. The Linux kernel is clearly not an organic application; organic applications have a small software team developing software in a familiar, in-house environment, without significant communication overheads, and allow hard requirements to be negotiated away. It could be argued that the Linux kernel is embedded, since it often operates in tight constraints; but in practice these constraints aren't very tight, and the kernel project can often negotiate requirements to a limited extent (e.g., providing only partial support for a particular peripheral or motherboard if key documentation is lacking). While the Linux kernel developers don't ignore resource constraints, there are no specific constraints that the developers feel are strictly required. Thus, it appears that the kernel should be considered a "semidetached" system; this is the intermediate stage between organic and embedded. "Semidetached" isn't a very descriptive word, but that's the word used by the cost model so we'll use it here. It really just means between the two extremes of organic and embedded.
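To see how much the choice of mode matters, the sketch below computes the nominal (unadjusted) effort for the kernel in each of the three modes. The semidetached coefficients (3.0 and 1.12) are the ones used in the calculation below; the organic (3.2, 1.05) and embedded (2.8, 1.20) coefficients are taken from Boehm's Intermediate COCOMO tables and should be treated as an assumption of this sketch. The names MODES and nominal_effort are hypothetical.

```python
# Nominal effort, person_months = a * KSLOC**b, for each COCOMO 81 development mode.
KSLOC = 4_287_449 / 1000.0

# (a, b) coefficients for Intermediate COCOMO; organic and embedded values are assumed
# from Boehm's tables, the semidetached values match the calculation in the text.
MODES = {
    "organic": (3.2, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}


def nominal_effort(ksloc: float, mode: str) -> float:
    """Nominal person-months before any cost-driver adjustment."""
    a, b = MODES[mode]
    return a * ksloc ** b


if __name__ == "__main__":
    for mode in MODES:
        print(f"{mode:13s} {nominal_effort(KSLOC, mode):>10,.0f} person-months")
```

Because the exponent grows from organic to embedded, the gap between modes widens as the program gets larger, which is why the mode choice matters so much for a program the size of the kernel.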
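The intermediate COCOMO model also requires a number of additional parameters (cost drivers), each of which multiplies the nominal effort estimate. Here are those parameters, and their values for the Linux kernel (as I perceive them); the parameter values are based on Software Engineering Economics by Barry Boehm:
RELY (required software reliability): High, 1.15
DATA (database size): Nominal, 1.0
CPLX (product complexity): Extra high, 1.65
TIME (execution time constraint): High, 1.11
STOR (main storage constraint): Nominal, 1.0
VIRT (virtual machine volatility): High, 1.15
TURN (computer turnaround time): Nominal, 1.0
ACAP (analyst capability): High, 0.86
AEXP (applications experience): Nominal, 1.0
PCAP (programmer capability): High, 0.86
VEXP (virtual machine experience): Nominal, 1.0
LEXP (programming language experience): High, 0.95
MODP (modern programming practices): High, 0.91
TOOL (use of software tools): Nominal, 1.0
SCED (required development schedule): Nominal, 1.0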
So now we can compute a new estimate for how much effort it would take to re-develop the Linux kernel 2.6:
MM-nominal-semidetached = 3 * (KSLOC)^1.12
                        = 3 * (4287.449)^1.12
                        = 35,090 MM
Effort-adjustment = 1.15 * 1.0 * 1.65 * 1.11 * 1.0 * 1.15 * 1.0 * 0.86 * 1.0 * 0.86 * 1.0 * 0.95 * 0.91 * 1.0 * 1.0
                  = 1.54869
MM-adjusted = 35,090 * 1.54869
            = 54,343.6 Man-Months
            = 4,528.6 Man-years of effort to (re)develop
If average salary = $56,286/year, and overhead = 2.40, then:
Development cost = 56286 * 2.4 * 4528.6
                 = $611,757,037
(Intermediate values are shown rounded; the products were computed with unrounded figures.)
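The calculation above can be checked with a short Python sketch. Only the numbers come from the calculation; the driver names in the comment assume the multipliers are listed in Boehm's standard cost-driver order, and the function name is hypothetical.

```python
import math
from typing import Sequence

SLOC = 4_287_449
SALARY = 56_286    # USD per year
OVERHEAD = 2.40

# Multipliers from the calculation, assumed to be in COCOMO 81 order:
# RELY, DATA, CPLX, TIME, STOR, VIRT, TURN, ACAP, AEXP, PCAP, VEXP, LEXP, MODP, TOOL, SCED
MULTIPLIERS = [1.15, 1.0, 1.65, 1.11, 1.0, 1.15, 1.0, 0.86, 1.0, 0.86, 1.0, 0.95, 0.91, 1.0, 1.0]


def intermediate_cocomo_semidetached(sloc: int, multipliers: Sequence[float]) -> dict:
    """Intermediate COCOMO, semidetached mode: effort = 3.0 * KSLOC**1.12 * EAF."""
    ksloc = sloc / 1000.0
    nominal_pm = 3.0 * ksloc ** 1.12       # nominal semidetached effort
    eaf = math.prod(multipliers)           # effort adjustment factor
    adjusted_pm = nominal_pm * eaf
    person_years = adjusted_pm / 12.0
    cost = SALARY * OVERHEAD * person_years
    return {
        "nominal_pm": nominal_pm,
        "eaf": eaf,
        "adjusted_pm": adjusted_pm,
        "person_years": person_years,
        "cost_usd": cost,
    }


if __name__ == "__main__":
    for key, value in intermediate_cocomo_semidetached(SLOC, MULTIPLIERS).items():
        print(f"{key:13s} {value:,.5f}")
```

Carrying the unrounded values through, as this sketch does, is what yields the final cost figure rather than the product of the rounded intermediate numbers.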
In short, it would actually cost about $612 million (US) to re-develop the Linux kernel.
Why is this estimate so much larger than Molnar's original estimate? The answer is that SLOCCount presumes that it's dealing with an "average" piece of software (i.e., a typical application) unless it's given parameters that tell it otherwise. This is usually a reasonable default, since almost nothing is as hard to develop as an operating system kernel. But operating system kernels are so much harder to develop that, if you include that difficulty in the calculation, the effort estimates go way up. This difficulty shows up in the nominal equation: semidetached development is fundamentally harder, and thus its estimation equation has a larger factor and exponent (3 and 1.12) than the Basic COCOMO default (2.4 and 1.05). This difficulty also shows up in factors such as "complexity"; the task the kernel does is fundamentally hard. The strong capabilities of analysts and developers, use of modern practices, and programming language experience all help, but they can only partly compensate; it's still very hard to develop a modern operating system kernel.
This difference is smoothed over in my paper More than a Gigabuck because that paper includes a large number of applications. Some of the applications would cost less than was estimated, while others would cost more; in general you'd expect that by computing the costs over many programs the differences would be averaged out. Providing that sort of information for every program would have been too time-consuming for the limited time I had available to write that paper, and I often didn't have that much information anyway. If I do such a study again, I might treat the kernel specially, since the kernel's size and complexity make it reasonable to do so. SLOCCount actually has options that allow you to provide the parameters for more accurate estimates, if you have the information they need and you're willing to take the time to provide them. Since the nominal factor is 3, the adjustment for this situation is 1.54869, and the exponent for semidetached projects is 1.12, just providing SLOCCount with the option "--effort 4.646 1.12" would have created a more accurate estimate. But as you can see, it takes much more work to use this more detailed estimation model, which is why many people don't do it. For many situations, a rough estimate is really all you need; Molnar certainly didn't need a more exact estimate to make his point. And being able to give a rough estimate when given little information is quite useful.
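Folding the adjustment into the nominal factor is simple arithmetic. This sketch checks that 3 * 1.54869 rounds to the 4.646 passed via "--effort 4.646 1.12", and that the folded single-equation estimate matches the two-step one. It assumes, based on my understanding of SLOCCount's documentation, that the option sets F and E in person-months = F * KSLOC^E; the variable names are illustrative.

```python
# Assumed meaning of SLOCCount's "--effort F E": person_months = F * KSLOC**E.
KSLOC = 4_287_449 / 1000.0
NOMINAL_FACTOR = 3.0   # semidetached nominal factor
EAF = 1.54869          # effort adjustment factor from the cost drivers
EXPONENT = 1.12        # semidetached exponent

# Multiplying the factor by the EAF gives a single equation with the same result.
folded_factor = NOMINAL_FACTOR * EAF
effort_two_step = NOMINAL_FACTOR * KSLOC ** EXPONENT * EAF
effort_folded = round(folded_factor, 3) * KSLOC ** EXPONENT   # uses the rounded 4.646

if __name__ == "__main__":
    print(f"factor for --effort: {folded_factor:.3f}")
    print(f"two-step estimate:   {effort_two_step:,.1f} person-months")
    print(f"folded estimate:     {effort_folded:,.1f} person-months")
```

Rounding the factor to three decimals changes the effort by well under a person-month out of more than 54,000, so the shortcut loses essentially nothing.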
In the end, Ingo Molnar's response is still exactly correct. Offering $50K for something that would cost hundreds of millions of dollars to redevelop, and is actively used and supported, is absurd.
It's interesting to note that there are already several kernels with BSD licenses: the *BSDs (particularly FreeBSD, OpenBSD, and NetBSD). These are fine operating systems for many purposes; indeed, my website currently runs on OpenBSD. But clearly, if there is a monetary offer to buy Linux code, the Linux kernel developers must be doing something right. Certainly, from a market share perspective, Linux-based systems are far more popular than BSD-based systems. If you just want a kernel licensed under a BSD-style license, you know where to find one.
It's worth noting that these approaches only estimate development cost, not value. All proprietary developers invest in development with the presumption that the value of the resulting product (as captured from license fees, support fees, etc.) will exceed the development cost -- if not, they're out of business. Thus, since the Linux kernel is being actively sustained, it's only reasonable to presume that its value far exceeds this development estimate. In fact, the kernel's value probably well exceeds this estimate of simply redevelopment cost.
It's also worth noting that the Linux kernel has grown substantially. That's not surprising, given the explosion in the number of peripherals and situations that it supports. In Estimating Linux's size, I used a Linux distribution released in March 2000, and found that the Linux kernel had 1,526,722 physical source lines of code. In More than a Gigabuck, the Linux distribution had been released in April 2001, and its kernel (version 2.4.2) had 2,437,470 physical source lines of code. At that point, this Linux distribution would have cost more than $1 Billion (a Gigabuck) to redevelop. The much newer and larger Linux kernel considered here, with far more drivers and capabilities than the one in that paper, now has 4,287,449 physical source lines of code, and is starting to approach a Gigabuck of effort all by itself. And that's just the kernel. There are other components that weren't included in More than a Gigabuck (such as OpenOffice.org) that are now common in Linux distributions, which are also large and represent massive investments of effort. More than a Gigabuck noted the massive rise in size and scale of OSS/FS systems, and that distributions were rapidly growing in invested effort; this brief analysis is evidence that the trend continues.
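The growth described above is easy to quantify from the three kernel sizes cited. This sketch only restates those counts and computes each one's ratio to the March 2000 kernel; the labels are my own shorthand, not release names from the text.

```python
# Physical SLOC counts for the Linux kernel cited in the text (labels are shorthand).
KERNEL_SLOC = [
    ("March 2000 distribution", 1_526_722),
    ("2.4.2, April 2001 distribution", 2_437_470),
    ("2.6, as of October 2004", 4_287_449),
]


def growth_ratios(counts):
    """Return (label, sloc, ratio to the first entry) for each kernel size."""
    base = counts[0][1]
    return [(label, sloc, sloc / base) for label, sloc in counts]


if __name__ == "__main__":
    for label, sloc, ratio in growth_ratios(KERNEL_SLOC):
        print(f"{label:32s} {sloc:>10,} SLOC  ({ratio:.2f}x the 2000 kernel)")
```

By these counts the kernel grew by a factor of roughly 2.8 in about four and a half years.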
In short, the amount of effort that today's OSS/FS programs represent is rather amazing. Carl Sagan's phrase "billions and billions," which he applied to astronomical objects, easily applies to the effort (measured in U.S. dollars) now invested in OSS/FS programs.
I'd like to thank Ingo Molnar for doing the original analysis (using SLOCCount) that triggered this paper. Indeed, I'm always delighted to see people doing analysis instead of just guesswork. Thanks for doing the analysis! This paper is not in any way an attack on Molnar's work; Molnar computed a quick estimate, and this paper simply uses more data to refine his effort estimation further.
Feel free to see my home page at http://www.dwheeler.com. You may also want to look at my paper More than a Gigabuck: Estimating GNU/Linux's Size, my article Why OSS/FS? Look at the Numbers!, and my papers and book on how to develop secure programs.
© Copyright 2004 David A. Wheeler. All rights reserved.