I heard that Andrew Morton gave a wonderful keynote speech on July 24th at the Ottawa Linux Symposium. This is "a core technology conference, targeting software developers working on the Linux kernel, OS infrastructure, security, networking, and related research projects." In short, it's where you don't want an earthquake to happen, if you want the Linux kernel to progress at its current clip, because core Linux experts gather from 30 different countries in one spot. So it's a speech to programmers. He does address some big picture items, though, and I found it fun and fascinating, even though I am not a programmer.
I asked him if by any chance Groklaw could do a transcript. He was kind enough to say yes. So, a group of us have been working on it, and here it is for your enjoyment. If you wish to listen, you can do so here. It is in what he calls a "funky format", so here are his instructions:
"speexdec keynote-8.spx keynote-8.wav

turned it into something which I could play."
His speaking notes are available at http://www.zip.com.au/~akpm/linux/patches/stuff/ols-2004.txt but the intro and of course the followup discussions are not there. So here is the full transcript. Any errors are Groklaw's, and if you spot any, do let us know, so we can improve. We wish to thank Chris Jones, beast, MathFox, Scriptwriter, Totosplatz, and Pug Majere for all their hard work.
Rusty Russell: OK. Welcome to the final session. This is the long-awaited keynote. Is it all right if I say something nice about Andrew? Most of you know Andrew Morton as the person who in, back in 1986, released the Applix 1616. Who remembers that? Yeah? Both of you. Good. So, if you're not an Australian, the Applix 1616 was this 68000 based, self-built computer thing that Andrew wrote four articles in the Electronics Australia Magazines on. You know, how to put it together - was it? One of those magazines, anyway, that no one has forgotten. And if you search hard enough, you'll still find this up on the web. I did a Google search years ago, to find out who this Andrew Morton guy was, and the first hit was this Applix 1616 memorial page, basically, for people who were still using it. So it's still out there.
But I did the same search, to try to find the same page again, and some of you may know that for a long time Andrew Morton, the official biographer of Princess Diana used to get the first Google hit. Well, I'm happy to say, he is now down to number 9, and seven of the other eight are the Andrew Morton that we have all come to know and respect in the last year.
Seriously, for a moment, I mean, one of the things that happened last year at the kernel summit was people were asked "What are the good and bad things that have happened"? You know, "What - what have we done right, what have we done wrong in the last year of kernel development at that point?" And somebody, and I can't remember who it was, basically yelled out, "One thing that has gone right is Andrew Morton". And he got a huge round of applause, and we all feel that that's definitely true, I mean the kernel sources and the kernel community is much healthier for having Andrew's input.
And then I met him and I realized how *old* he is. Did you know that Andrew, Andrew reached *forty* this year I believe. Forty-FIVE? - Really? You lied! He's old. So . . . [laughter and applause] Yes, but he carries it with dignity. So, I would like to introduce, Andrew Morton.
(silence, audio off)
Andrew Morton: Oop - got caught there, didn't I? Thank you Rusty, I think. Today I'll be talking about the increasing interest in Linux from the large IT corporations -- I think we all know who they all are -- what effect it's had on us thus far, what effect I expect it'll have on us in the future, and how we should as a team, mainly the kernel team, react to these changing circumstances.
I think it's gone pretty well. If you look, you have a significant number of very large and vigorously competitive IT corporations who are becoming quite dependent upon Linux; all of a sudden they're interacting with this group of crazy long-haired hackers. There's a significant cultural difference there, considerable potential for both commercial and cultural clashes, but I think things have gone very well. There has been very little friction.
If I had any concerns about the process so far, I think that perhaps we have not been as efficient as we could be. There are some areas in which our processes and communications could be improved so that we can deliver better software more quickly than we have been, and I'll touch on some of those things as we go through. The question that might be asked is, "Is what I'm about to say an O-fficial position?" Well, there is no such thing as an official position in Linux, is there? Linus has opinions, I have opinions, everybody else has opinions, and the only consistency here is that most of us are wrong most of the time. Everyone is free to disagree because we have so little invested in a particular position. We are individuals. We are always open to argument and usually the only limiting factor is we all run out of bandwidth to discuss things and whoever has the simplest position ends up winning out.
Most of the pressure for change is upon the kernel and a lot of the changes we are seeing are happening in the kernel. My talk will be kernel-centric but is presumably, hopefully, applicable to and generalizable to other projects within the free software stack -- capital "F" Free. We need to find ways in which the pressure to add new features and developers to the kernel project does not end up impacting the way we've historically gone about our development and release processes. Yes? But you're shaking your head. You should be, because what I've said there is grade "A" lawn fertilizer. We need to find ways to avoid . . . Do we need to find ways to keep our current processes intact, regardless of the pressure upon us?
Well, no. There is no reason why we should fail to adapt to the changing circumstances, and a conservatism such as that would be quite wrong and would eventually lead to all sorts of trouble. We need to recognize the changes which are occurring with open eyes and at the very least we need to react to them appropriately and ideally, if possible, we'd do better than that and actually anticipate events so that we won't be surprised by them. There is change in processes, in people, and in practices.
The most important and successful free software is what I would call system software. That's software which other software builds upon to provide a particular service. There is some successful free end user software. I guess most people would point to the desktop software, such things as Open Office, [unclear], KDE, Gnome, [unclear] software suites.
I think end user software is not where the bulk of the attention and the buzz lies at present. And I expect it never really will be. The successful system software we have at present includes the kernel, low-level software libraries, web, email and other services, databases, many system tools, the compiler tool chain, the X server, windowing systems, GUI toolkits, and even a dot-net workalike.
I think it's not a coincidence that the most successful free software is the system software. I'll take a shot at explaining why that is so. Now's when I need to spout out some economic theories. You should never pay any attention to anybody's economic theories, especially when they come from journalists, or politicians, or economists or academics. I'm about to add stable kernel maintainers to that list.
But it's true. I mean paying attention to other people's economic theories is a major cause of violent death in the twentieth century, so don't do it. OK, you're not listening? Normal processes of market competition do not work with system software due to what the economists will call "high substitution costs." The cost of switching from one set of system software to a competing set is too high.
We'll walk through the costs to the major players. End consumers, if they're basically to use a different operating system, they need to retrain their admin staff and their users. They need to obtain new versions of their applications to run on the new operating system or go to completely different products if their current application set doesn't work on the new OS. If, as is very commonly the case, the end consumers use applications which were developed in-house, those applications will need additional development, testing and deployment work but the organization still must support the version which runs on their old system software because nobody migrates their IT systems in one day.
So in all cases, end consumers can only run their software on the hardware platforms for which their new OS vendor has chosen to make their system software available. So if they want to move to a hardware platform or peripheral hardware which has better price performance or offers some features they particularly want, they may not be able to.
If we look at the substitution costs for the independent software vendors, in a competitive system software world, the ISVs need to develop, maintain and support their products on various versions of 5, 10, or more different vendor system software suites. That’s a lot of cost which, when you compare it with the world in which all customers use the same operating system, doesn’t bring in a single new sale, and the additional development support work inevitably impacts the quality and feature richness of their overall product line.
The ISVs, I think, would much prefer that all the machines in the world be running the same system software so they’d need to develop, maintain and support a single version. So of course what happens is they’ll develop for the most popular OS amongst their target customers, starving off minor players in the system software marketplace and creating a really big barrier for any new entrants.
The third party in the industry who is seriously affected by system software choice is the hardware vendors. They need to ensure their hardware works well with the system software which is in the field, going back many releases, many years' worth of releases. Often the hardware vendors develop and maintain the device drivers for their hardware themselves, which puts them in the same boat as the ISVs with respect to support and development costs. So the end result is the hardware vendors also will only support the most popular system software and minor or new entrants have to take on the costs of supporting somebody else's hardware, and sometimes without adequate access to the hardware documentation.
So as a consequence of these high substitution costs, all the main players in the industry tend to gravitate toward a single suite of system software. Which is great if you happen to be the provider of that software, you get to make 85% margins on it. But the situation obviously places that provider in a monopolistic position, and leaves the users of the software with a single source for a vital component and often from a direct competitor.
So to get around this fundamental tension between the single provider and the industry’s need for a uniform set of system software, the industry’s doing what I think is an amazing thing. There are, as we know, many IT companies that are congealing around the suite of free software, which in fact nobody owns or, if you like, which everybody owns. This allows the industry players to use the same basic set of system software but without relinquishing control to the provider of that software.
This adoption of free software to resolve incompatibility between the economic need for provider diversity and the engineering need to avoid product diversity is, I think, fairly unique across all industry. I can’t think of similar examples. There are some analogous situations such as the adoption by various competitors of written specifications, standards, but I don’t think that’s really the same thing because here we’re talking about the sharing of an actual, ongoing implementation, an end product.
And the uniqueness of this industry response, I believe, derives from the fact that software is an exceptional product. It's uniquely different in several critical ways from bridges and cars and pharmaceuticals and petrochemicals and anything else you care to think of. We all know that software is an exceptional product but still we hear people trying to draw analogies between software and the output of other engineering activities. And often the people who say these things are making serious errors because software is exceptional. Mainly because of the low cost of reproduction versus production, but also because of the high cost of substitution versus the cost of the initial acquisition. Sometimes people compare the software industry with the publishing industry, say writing books. And yes, with a book the cost of reproduction is also lower than the cost of production. But nobody bases their entire business on one particular book. Nobody is faced with huge re-engineering costs if they need to base their product line on a different book. So if you do hear people drawing analogies between programming and other forms of industry, beware, and be prepared to poke holes in the analogy, because software is exceptional.
Now although people do go on about the sticker price of free software, as in you can download it for free, use of free system software by the IT providing companies, of course, does have costs to them. They must employ staff to maintain it, staff to support internal deployment and other support teams. They must add staff to add features which their employees require.
And often these features are fed back into the main line free software product, and this all can be viewed as part of the acquisition cost of the free software. These companies pay the cost of training engineers and other technologists to become familiar with, and productive with, our software, thus increasing our base of developers and support staff. And the companies also bear part of the cost of making end users familiar with, and comfortable with, the very concept of using free software. And that’s also in our interests.
One point I like to make occasionally is that free software is not free. When a company chooses to include free software in their product offerings, they are obliged by both the licence and by their self interest to make any enhancements available for inclusion back into the public version of the software. And this can be viewed as a form of payment in kind. You get to use our software but the price you pay is in making that software stronger, more feature-rich, more widely disseminated and more widely understood. I don’t know if this self-regenerating aspect of the GPL was a part of Richard Stallman’s evil plan all those years ago. It wouldn’t surprise me. He’s a clever guy, and he’s thought a lot about these things. People like to diss Richard and give him a hard time but we owe him a lot for his consistency of purpose and uncompromising advocacy for our work. Few of us will do as much for our fellow man as he has done.
As the large IT corporations with diverse customer bases adopt our software to run end user applications, they, and we, do come under pressure to add new features every day. The first requirement which we’ve seen across the past few years was to be able to scale up to really large and expensive systems which we as developers had never even seen, let alone owned. The 2.6 kernel and the new glibc threading code have largely solved that problem. We scale OK up to 32 CPUs and beyond, with a few glitches here and there which we will fix. Thousands of disks, lots of memory, etcetera. Enterprise features such as CPU hotplug have been added and many others are inching forward. New features which are currently under consideration for the kernel are mainly in the area of reliability and serviceability. Things like crash dump collection analysis, fault hardening, fault recovery, standardized driver fault logging, memory hot add/remove, enhanced monitoring tools and lots of things I couldn’t think of.
Now all this stuff’s coming at us, and we need to be aware of where we are, where we’re heading, what drivers are forcing us there and how we’re going to handle it all. I don’t see such pressure for user space applications. Maybe it's because I don’t pay attention to user space. We just consider the entirety of user space to be a test case for the kernel.
That wasn’t a joke!
Well, I assume there are similar approaches on selected user space libraries and utilities.
I’ll come back to tie these thoughts together again in a few minutes but let's for a while dive off, and let's review the traditional free software requirements gathering and analysis process, such as it is.
What have been the traditional resources for the free software world's requirements? I'd identify four major ones. To a large extent the process has been what our friends at Microsoft in the first Halloween document called "following tail lights"; doing what early Unix-like systems have done. And that's fine, there's nothing wrong with that. We are heavily committed to standards whether they be written or de facto, and we're committed as a matter of principle. We want to be as compatible with as much other system software as we can, so it can be as useful as possible to as many people as possible.
Second source of requirements we see as very strong is personal experience of the individual developers who are doing the work on the project. We also receive requirements from the users who are prepared to contact us via email and say, “I want this”. We say, “That’s a good idea, we’ll do it”. And of course we receive requirements directly from the distributors, the Red Hats and the SuSEs and the rest, who are in contact with end users and deliver features which their users require.
So to date those four have been our major sources of requirements, and I think at times we’ve been pretty bad at turning requirements into project teams and then crunching out the end product, but I don’t think it's been a huge problem so far.
But as the big IT companies become committed to our free software, they're becoming a prominent new source of requirements. The companies who are now adopting Linux bring on board new requirements which are based on their extensive contact with possibly more sophisticated customers of system software than we've seen before – banks, finance companies, travel companies, telecommunications, aerospace, defence, and all the rest. A whole bunch of people who just wouldn't have dreamed of using Linux a few years ago.
These tend to be mature requirements, in the sense that the features did exist in other Unixes and downstream customers and the field staff who support them, they found the features useful, and they continued to require them in Linux. And this does all constitute a new set of requirements for the incumbent team of system software developers. In some cases, it’s a little hard for us to understand the features, how they’ll be used, how important are the problems which they solve. And sometimes it’s hard for us old timers to exactly understand how the users will use these features in the field. We regularly see people from IT corporations coming to us -- that’s Linus, me, and everybody else -- with these weirdo features, and we’re frequently not given sufficient explanation of what the feature is for, and who requires it, and what value it has to the end users, and what the usage scenarios are, and what the implications of not having it are, and so the thing just tends to drop on the floor and drifts on for months and months and months. Sorry, I do feel guilty when it happens.
As the people who are responsible for reviewing and integrating the code, this does make things hard for us. If the feature is outside our area of personal experience, and if the proposers of the feature present us with a new implementation without having put sufficient effort into educating us as to the underlying requirement, it becomes hard for us to judge whether the feature should be included. And this is a particularly important point. Because the feature provides something of which we have no personal operating experience, it’s very hard for us to judge whether the offered implementation adequately addresses the requirement. So we need help understanding this stuff! So you, the people who are submitting the patches (and you know who you are), the people with the field experience which drove the development of the feature, please put more effort into educating others as to the requirement which backs up the feature.
Now it could be asked, why do their requirements affect us, us being the developers who aren't employed by them? If the features are to be shipped to end users, I believe that all the parties in Linux development do prefer the features being merged into the main line free software project, rather than sitting out in separate patches, because for the submitter of the feature, they will find other people will fix their bugs, other people will update their code in response to system wide code sweeps, other people add new features, they'll have broader test and review, and they don't need to maintain the external patch sets. And from the point of view of us, the kernel tree owners, we get to keep all the various downstream vendors' kernels in sync from a features perspective so we can offer a uniform feature set to all users.
An alternative to chucking everything into the main line kernel would be, if you like, an ongoing sort of mini-forking process in which we'd have lots of different implementations of the Linux software stack out there, all of them slightly different. I believe that's bad because it fragments the external appearance of Linux, and different versions of Linux would have different features and the whole thing starts to get a reputation amongst our users –- I almost said customers but I don't have customers! Maintaining this sort of miniature fork is expensive for the maintainer of the fork and of course gets more expensive the further it gets away from main line. And incidentally, I think that expense is why we'll never see a fork in the kernel project itself, because, quite simply, if somebody forks the kernel, all of a sudden they own millions of lines of incomprehensible code and they don't have the resources to maintain it. I think the only way you could ever see a fork in the kernel is if there's actually a great big bun fight amongst the kernel team and the actual development team splits. And I've seen no sign that that will ever happen. So in practice a full fork is less likely than multiple alternate trees which diverge a lot from main line but which continually track it. So basically patch sets, as many people have been running for quite a number of years now.
But even maintaining an external patch set is bad for the maintainer of that set, and bad for everyone else due to the feature and bug divergence between the various streams of the kernel. After a while it begins to re-introduce the substitution costs where, probably not so much for the end user, but the ISVs and the hardware providers are again in the situation of having to test and certify their products against multiple kernel versions. OK, they’ve always been in that situation, even if the kernels are very similar. The ISVs do tend to recertify their products against new versions of the underlying software, but we can and should work to minimise the pain and cost of doing that.
Now getting back to the new requirements which we're seeing flowing into the kernel project. My own inclination when I'm unsure about a feature's value to the users is to trust the assertions of the person who is sending the patch, because we know these people have customers and they have experience of other Unixes, using these features out in the field, and we are seeing that they usually don't write code just for fun.
So it means we are presented with features which we, the developers, frankly have no personal interest in. That's OK, as long as the feature is well encapsulated and has a long-term maintainer who will regression test it. And if, as a last resort, the feature can be reasonably easily ripped out if people stop maintaining it, then we're OK with putting it in.
Now there might be a bit of a tendency for the IT and hardware companies out there to develop and test the new features within the context of a partner Linux vendor's kernel, rather than within the main line kernel. It's understandable. It's easier to do. You're working with someone who is contractually motivated to help you out, and the feature will probably get to your end users more quickly. But again, working against vendor kernels has the risk of causing feature set divergence, and it could mean that something which is acceptable to a Linux vendor is deemed unacceptable for the main stream kernel, because the main stream kernel is not contractually motivated to take the patches. And if it does get merged into the main line kernel, there might be changes made to it during that merging process which will make it incompatible with the version which has already been shipped by the Linux vendor. So we'll end up with a choice of either putting a sub-standard feature into the main kernel just to remain compatible with some fait accompli which someone else decided on, or putting in something which is incompatible with versions which are out in the field. Neither outcome is particularly nice, so I would ask that the corporations target development against the main stream kernel instead of, or in parallel with, the vendor kernels. Just keep in touch with the main stream kernel developers and make sure there aren't going to be any nasty surprises down the track.
When considering a new feature submission, one factor which we look at is the question of how many downstream users need this. Obviously if a lot of people can use it and get value from it, that makes the change more attractive. But adding features which only a small number of users need is OK, as long as the cost on everybody else who uses and develops Linux is low. We have code in the kernel now which virtually nobody uses, and if the kernel team was going to be really hardheaded about it, these things just wouldn’t have been included in the first place or we’d be ripping them out now. Code which has little use covers things such as drivers, file systems, even entire architectures such as IA64 which…
That was a joke! Made me lose my place! But there is little pressure to rip these features out because generally the cost of having them sitting there is low. They tend to be well encapsulated, and if they break we just don’t compile them or we don’t use them, and the system tends to be self-correcting in that if a significant number of people are affected by the broken feature then somebody will either fix it or pay to have somebody else fix it.
Some features do tend to encapsulate poorly and they have their little sticky fingers into lots of different places in the code base. An example which comes to mind is CPU hot plug, and memory hot unplug. We may not, we may end up not being able to accept such features at all, even if they're perfectly well written and perfectly well tested due to their long-term impact on the maintainability of those parts of the software which they touch, and also to the fact that very few developers are likely to even be able to regression test them.
So if all what I’ve said so far is an accurate overview of where the base Linux components are headed, what lessons can we learn from it and how, if at all, should we change our existing processes so that we can respond to the increasing wider use of free software and the broadening requirements which we are seeing?
Some of it, of course, is in the motherhood stuff. We need to keep the code maintainable. Techniques for this are well established and Linus, bless his heart, is very strong on these matters: a flexible, powerful configuration system; minimizing interaction between sub-systems; careful interface design; consistent coding practices, commenting style, etcetera. OK, Linus doesn't believe in comments but I do!
Similarly we need to keep the code comprehensible. This is a particular bee in my bonnet. The kernel particularly (I won't even go into glibc), the kernel is becoming more and more complex. It's growing more sub-systems, larger sub-systems, more complexity in the core, drivers, etcetera. And despite Robert's sterling efforts, I'd have to say that I don't think out-of-tree documentation such as books and websites really works, because the subject matter is so large and the documentation is so much work to produce and yet it goes out of date so quickly. So although I'm acutely aware of the comprehensibility problem, and I'm concerned about it, I really don't have a magical answer to it apart from keeping the code clean, commenting it well, good change logging, and using a revision control system. Concentrate on keeping discussions on the mailing lists rather than sliding off into face-to-face discussions or IRC or whatever.
We need to recognize that new people do need help in coming up to speed and that any time spent helping other developers will have longer term returns. And I think that we should start to recognize that the central maintainer’s role will continue to weaken in that the top level sub-system maintainers, David Miller, Greg and James and all those other guys, they’ll continue to gain more responsibility across more code, and the top level maintainer’s role will increasingly become pointy-haired, moving away from nitty gritty code level matters in favor of things like timing releases, timing of feature introduction, acceptance of feature work at a high level, co-ordination with distributors, tracking bugs, general quality issues and beating people up at conferences and things like that.
Now fortunately, as the complexity and size of the code base increases, so too does the number of people and companies which use that code, that have an interest in seeing it work well. The kernel development team is getting larger. I haven’t run up any statistics on this but empirically it’s pretty hard to identify many people who’ve abandoned kernel work in recent years but we’ve got a ton of new faces.
Kernel development tends to self-organize into teams in their, or their employer’s, area of interest and that resource allocation algorithm does seem to be working pretty well at present. We do have some gaps in the developer resource allocation. There are abandoned drivers. There are various project-wide maintenance tasks which should be undertaken. But it’s hard to identify an appropriately skilled developer who will do that work. It hasn’t been a huge problem so far but it does come up occasionally.
What else can we do to accommodate all these new requirements and all this new code? We need to be able to accommodate, within the stable kernel, large changes and a high rate of change without breaking the code base. Across the lifetime of the 2.6 kernel we'll see many changes as features are added and as we support new hardware. And the rate of change within the kernel has sped up. If we look at the changes in the first six months from the release of 2.4.0 and compare that to the changes from the first six months after the release of 2.6.0, in 2.4 we deleted 22,000 lines and added 600,000 lines. And in 2.6 we deleted 600,000 lines and added 900,000 lines. In the first six months. That's 1.5 million lines changed in a 6.2 million-line tree, a 64 MB diff in the first six months of the stable kernel. We changed a quarter of it!
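[The churn figures quoted above can be checked with a quick back-of-the-envelope calculation. This is an illustrative sketch using the round numbers from the talk, not part of the speech itself:]

```python
# Rough check of the 2.6 churn figures quoted in the talk.
deleted = 600_000        # lines deleted in the first six months of 2.6
added = 900_000          # lines added in the first six months of 2.6
tree_size = 6_200_000    # approximate total lines in the 2.6 tree

changed = deleted + added        # total lines touched
fraction = changed / tree_size   # share of the tree that changed

print(changed)                   # 1500000
print(round(fraction * 100))     # 24 -- roughly "a quarter of it"
```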
So I think we need to change our mindset about the stable kernel a bit. Traditionally once we declare a stable branch, people seem to think that we need to work towards minimizing the amount of changes to that tree, and that the metric of our success is how little change we’re introducing, rather than how much. And that model has been unrealistic for a long time, and we need to challenge it. And I think in the 2.4 series, it was a contributing factor to a large divergence between the public kernel and the vendor trees, so we end up with vendors carrying huge patch sets. And also the model introduces large gaps in time where the development kernel tree is basically unusable because it’s under furious development while the stable branch is too static and causing vendors to have to add all those patches.
So we should look at maintenance and development of the stable kernel tree in a new way. Yes, the stable kernel should stay stable as much as possible. But the super stable kernels are the vendor trees. Vendors will pick a kernel at whatever time suits them, put it through an extensive stabilization cycle, then ship it. Meanwhile the public kernel forges ahead. It may not be quite as stable as the vendor kernels, but it's still suitable for production use by people who take the kernel.org kernel -- individuals, Debian, Gentoo, you name it.
And this is the model that we’ve been following since 2.6.0 was released. It's fallen into a four- to five-week cycle wherein patches are pummelled into the tree for the first two weeks, then we go into a two- to three-week stabilization period. At the end of that, we do a release, and people pummel in the patches which they’ve been saving up during the stabilization period. For example, right now we’re about to release 2.6.8, and we’re seeing a 16 MB diff between 2.6.7 and 2.6.8.
Is this a sustainable model? I think so. It hasn’t caused any problems thus far. Well, I haven’t heard many complaints. People are getting their work into the tree and it's getting better.
We do need to help external people understand that there is a new game in town. There is a new paradigm, and the size of the diff between 2.6.7 and 2.6.8 is not indicative of any particular problem in 2.6.7; it’s a good kernel. We are hundreds of kernel developers who are continuing to advance the kernel and we have found processes which permit the changes to keep flowing into the main tree. I’ll say a bit more about the “not new” kernel development model at the end.
Something else we need to do is to come to a general recognition that there’s pressure to add enterprise features to Linux. In a few years Linux can, and I personally believe should, offer a similar feature set to the big proprietary Unix systems. We should recognize up front that these things are going to happen. All those weirdo features in Solaris and HP-UX and AIX, you name it, will I suspect end up in Linux in some shape or form. And they should do so. If there’s a proven user need for these features, and we end up being unable to find an acceptable way to integrate these changes then it's we, the kernel development team, who have failed. So we need to plan for these changes and make sure that we can introduce large features into the kernel, which the current developers don’t even understand the need for, let alone understand the code.
Another small process change which I believe we need is that the IT corporations who are developing and contributing these new features need to be careful to explain the new requirement and its offered implementation. To document it carefully, to be patient with us. Treat it as an education exercise. The better you can communicate the need for the feature and its implementation, the smoother its ride will be.
I have a few words for the Linux vendors here. It’s understandably tempting for Linux vendors of various forms to seek to differentiate their products by adding extra features to their kernels. OK. But moving down this path does mean that there will be incompatible versions of Linux released to the public. This will, by design, lock some of our users into a particular vendor’s implementation of Linux. And this practice exposes the entire Linux industry to charges of being fragmented. And it exposes us to the charge that we are headed along the same path as that down which the proprietary Unixes are deservedly vanishing. I think we all know where these charges are coming from. And it's undeniable that some of the charges do have merit. To be very frank here, I don’t view it as a huge problem at this time. But as a person who has some responsibility for Linux as a whole, I see the perfectly understandable vendor strategy of offering product differentiation as being in direct conflict with the long-term interests of Linux. It’s not for me to tell vendors how to run their business, but I do urge them to find other ways in which to provide value to their customers. I strongly oppose the practice and I will actively work to undermine it.
That was just a fancy way of saying: send me patches! Let's get these features into the mainline kernel -- if not first, at least at the same time, please.
Now the hardware and system software vendors. Even though they do have partnerships with the Linux vendors, they should work with their partners to target the public tree wherever possible. Yes, you can do all of the QA and certification activities within your vendor’s kernel but the public kernel should always be kept up to date. Doing this has a number of benefits for the device driver developers. You’ll find all users of Linux are able to use your hardware, nobody has to carry any patches, your code gets wider review and wider testing, and other people will fix your bugs and add features and ensure your driver doesn’t get broken by external or kernel-wide changes. Yes, I must say that the acceptance criteria for the public kernel are more stringent than for vendor trees. We don’t take so much crap. And you may have to do additional work to get your code merged. But getting the code into the public kernel avoids the terrible situation in which you’ve managed to get your driver into the vendor tree and then all your staff are sent off to do other things and you never have the resources to do the additional work which might be needed for a mainline merge. So your driver ends up skulking along in vendor trees for the rest of its life or at least until you are sick of paying 100% of the cost of its maintenance. And getting your code into the main tree avoids the even worse situation where a competing implementation of whatever it is your code does is merged into the mainline instead of your code. That leaves all the users of your feature and your vendor partners bent over a barrel because either the vendor will need to carry the duplicated feature forever or your users will need to implement some sort of migration.
Still with the hardware and system software vendors. Please, you need to educate your own internal development teams regarding free software development practices and avoid the temptation, when time pressures are high, to regress into cathedral-style development wherein the rest of the public development team don’t get to see the implementation of a new feature until it is near complete, by which time the original developers are too wedded to the work which they’ve thus far done, and any rework to make the feature acceptable to the public tree becomes harder for them and more expensive. Tell us what you’re up to, keep us in the loop, get your designs reviewed early and avoid the duplication of effort, and this way we’ll minimize any nasty surprises that could occur further down the track. So, please use our processes. If sometimes people don’t reply, well, just tell me and I’ll get someone to kick them, and generally we can get some activity happening.
Now, we do have processes. They’re different from the normal processes. They’re pretty simple. We can help explain them to you and your staff, and your kernel vendor partners know the free software processes intimately and can help to educate your teams.
As the various strands of the 2.6 kernel outside kernel.org approach their first release milestones, I do sometimes perceive that the communication paths which we’re using are becoming a bit constricted. Whether it's for competitive reasons or confidentiality or most probably time pressure, it appears to me that the flow of testing results and the promptness of getting fixes out to the rest of the world is slowing down a bit. So I would ask the people involved in this release work to remain conscious of this and try to keep the old golden goose laying her eggs. You may need to lean on your management and customers and partners and others to get the necessary resources allocated to keep the rest of the world on the same page but it's better that way. We send you our test results and patches, so please send us yours. That’s probably a bit unfair, people are being very good about this but let's keep it in line please.
Someone’s written summary here, that’s a good sign!
Apparently I’m too focused on the server side of things. I know this because I read it in a comment thread on Slashdot!
It seems that all those megabytes which were shaved off the kernel memory footprint and all that desktop interactivity stuff was done by some of the other Andrew Mortons on Google!
The 2.6 development cycle has led to large changes in the kernel, large increases in kernel capability. Due to the care we’ve taken, I don’t believe that this progress has compromised the desktop and embedded applications for Linux. They’ve also advanced, although not to the same extent as the server stuff. And the emphasis on server performance in 2.6 was in fact not principally in response to the interest from the three-letter corps. We knew it had to be done simply because the 2.4 kernel performed so poorly in some situations on big machines, so it was a matter of pride and principle to fix the performance problems.
And believe it or not, this is one area in which our friends at SCO actually said something which was slightly less than wholly accurate…
The performance increases in 2.6 would have happened even if IBM had disappeared in a puff of smoke, because we knew about the problems and we wanted to fix them up. Now in this respect we need to distinguish between the server performance work and all the other enterprise requirements. We did the speed-up work for fun because it was cool! All the other requirements which we’ve been discussing will be implemented for other reasons!
We should expect further large changes in the enterprise direction, particularly in the kernel, and we should plan the technical and procedural means to accommodate them, because the alternative (and that is not applying the patches) will be a slowdown in the progress of free software across the industry. And here I’m assuming that increased adoption within industry is a good thing. You don’t have to agree with that. Our failure to adapt to this new level of interest in Linux could even lead to a degree of fragmentation of the feature set which we offer, and when you take all Linux development effort into account, a failure to skilfully adapt to these new requirements will cause additional programmer effort which could have been applied elsewhere.
Now what I’ve said today is, to a large extent, a description of changes which have already happened and which are continuing to evolve today. There are no radically new revelations or insights here. But I believe we should always attempt to understand the environment in which we are operating, and if possible come to some consensus about the direction in which we’re heading and how we should collectively react to the changing circumstances. Free software, or at least particularly this project, is moving away from something which gifted and altruistic individuals do “just for fun”, to quote the book title, towards, if you like, an industry-wide cooperative project in which final control is conditionally granted to a group of independent individuals. There’s a lot of good will on all sides, but we do need to understand and remain within the limits of that good will.
But you shouldn’t take all of this to mean that Linux is going to become some sort of buttoned down corporate quagmire. I don’t think it will. I expect that the free software ethos, this very lofty set of principles and ethics which underlie our work, will continue to dominate. And I suspect that the Linux-using IT corporations will want it to stay that way, as our way of operating tends to level the playing field and it fends off the temptation for any particular group to engage in corporate shenanigans. If you like, we provide a neutral ground for ongoing development.
End users with their desktop machines will continue to be a very important constituency for Linux developers, apart from that guy at Slashdot. At times, we don’t serve these people as well as I’d like. I’ll see that random device driver X has done a face plant yet again, and I don’t know anyone who I can reliably turn to, to get the problem fixed. But one promising sign here is that the desktop is, finally, becoming increasingly prominent in the words of Novell, various OSDL partners, new distributions, old distributions, maybe even Sun. So the desktop, we say this every year, the desktop is coming. Or at least the simple desktop, I believe, is coming, over the next twelve months. So I expect more resources will become available in that area soon.
It's nice to watch when corporations make their developers available to work on a free software project. Some of them, to various degrees, tend to become subverted. They begin to get it. They gain an insight into the project and its general quality and cultural goals. No doubt their loyalty to the project sometimes comes into conflict with their obligations to their employer. But when this conflict does come, I expect these Linux developers end up standing in their bosses’ offices, imparting a few open source clues, explaining why there’s "no way, we’re not going to do that".
And when they do this, they’re standing up for the project’s interest because they’ve become free software programmers. It's all very good. There’s an analogy we can draw here, perhaps a slightly strained one. We know that the way Linux has progressed over the past five years is to enter organizations at the bottom. You take some random individual sys admin or programmer who gets sick and tired of resetting or reinstalling Windows boxes. So he drags a server into work. A Linux server. It works, so a few more boxes are dragged in. Soon Linux becomes acceptable to the decision makers, and it ends up propagating throughout the organization. We’ve all heard the stories, there are many of them. I did it myself at Nortel. Well, I think a similar process is coming into play as the corporate programmers are assigned to sit in their cubicles and work on Linux. They come in as good little corporate people, but some of them get subverted. We end up taking over their brains and owning them! They become members of the community!
They become free software developers. Now here’s the plan.
Some of these guys are going to be promoted into management! So in a few years time we’ll have lots of little pre-programmed robotic penguins infiltrating the corporate hierarchy…
…imparting their little penguin-like clues in upwards, downwards and sideways directions. I’m going to have to kill you all after telling you this, but that’s the real reason why we’re applying their patches!
So rest assured, world domination is proceeding according to plan!
Thanks everyone! I was going to blab on a bit about the new kernel development process for a few minutes. You can sit down and shut up!
For about two and a half years, since about the 2.5.10, 2.5.20 kernels, the process has gone very smoothly. We’ve just been following a process of integrating more features, integrating speed-ups, various cleanups, and it's gone along in pretty much a linear progression across that whole time. So when we released the 2.6.0 kernel, nothing really changed. We basically did that as a trick to get more testers. So what we decided last Tuesday at the kernel summit was a decision to do nothing at all. We’ll just continue on the current process until something indicates that we need to change. So if you like, the mental model would be: we’ve got a continuing progression of features going into the kernel, a continuing progression of bugs being added, because that's an unavoidable effect of adding features, and also bugs being removed at a decreasing rate. And right now we’re still at the stage where we’re removing bugs faster than we’re adding them, but at some point in time those two lines will cross and the kernel will have a constant number of bugs!
Unless we stop adding features! So that’s a simple mental model; it seems to be working thus far. So, vendors, they’ll get this gravy train of kernel versions; they’ll pick a particular version and sit on it for a few months, reject all new features, back port all the new fixes, and what they have will auto-magically become more stable. And then they can, if they’re extremely naughty, put their value-add into it, then ship it to their customers.
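The mental model above -- bugs entering at a roughly constant rate as features land, and being removed at a decreasing rate, with the two lines eventually crossing -- can be sketched numerically. The rates and time horizon here are invented purely for illustration:

```python
# Toy model of the bug bookkeeping described above. Bugs arrive at a
# constant rate as features land; the fix rate decays over time. The
# bug count falls while fixes outpace additions, levels off where the
# two lines cross, then rises afterwards.
ADD_RATE = 40          # bugs introduced per month (illustrative)
bugs = 1000.0          # starting bug count (illustrative)
history = []
for month in range(24):
    removed = 120 / (1 + month / 6)   # fix rate decays as easy bugs run out
    bugs += ADD_RATE - removed
    history.append(bugs)

# The lines cross where 120 / (1 + month/6) == ADD_RATE, i.e. at month 12;
# before that the count falls, after that it climbs again.
```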
The question has been asked, concerns have been raised: when will the kernel ever become sufficiently stable? I believe it's sufficiently stable now; at least, I’ve not seen a sufficient number of bug reports to disabuse me of that. But if the requirement is there, what we can do at kernel.org is to release, if you like, a bug-fix-only series. So say we release 2.6.16: we let it sit there on the servers, and then after a couple of weeks we’ll back port a few fixes and release 2.6.16 post 1… post 2. So if we do have people who are especially concerned about the stability of kernel.org’s kernels, they wouldn’t have picked up 2.6.16 within the first couple of weeks of it being released anyway, so by the time they doodle over to kernel.org they’ll find 2.6.16 post 2 there, and it will be nice and stable and there shouldn’t be any particular problems.
That was a five-minute rundown of what we’re doing. Do you want to say something, Rusty?
OK. Questions! Surely there’s somebody out there who violently disagrees with all the rubbish I’ve said? Hand raised.
Questioner: Hello? Hello? OK. One of the things that the vendors like to have is obviously stable APIs, so that, you know, all of their applications don’t have to be recompiled and things like that. What do we see about API changes going into the kernel, because traditionally the development kernels have, you know, been free for all in terms of API changes, and the stable kernel –- that’s one of the reasons why it can’t accept some changes, because, you know, it has to be upgradeable.
Morton: Well, there are APIs and there are APIs. Obviously the syscall API never changes. That’s absolutely verboten. We just, we will add to it but we’ll never take anything away from the syscall API. So there should never ever be a need to recompile an application.
There are other parts of the API such as the location of /proc files and things like that which we will fiddle with and which will break user space. The various monitoring tools in 2.6 needed to be upgraded.
The internal kernel APIs to the modules and drivers and things, we’ve decided not to keep that stable within the stable kernel. Tough luck if you need to recompile.
But yes, at some point in time there is going to be sufficient pressure on Linus, with people coming up to him with big, intrusive patches, that he’s going to have to declare 2.7. And what we’ll do at that point in time is, 2.6 will just keep on going as it’s always been going, 10 MB of patches a month. 2.7 will pick up all the 2.6 features on a daily basis, so 2.7 will be 2.6 plus these big scary patches which break everything.
At some point in time those big scary patches will stabilize, and we can then do one of two things: either just plop them down into 2.6, or rename 2.7 to 2.8, carry on for a couple of months and then abandon 2.6. But in terms of APIs to user space, I think the sticky issue here is if we want to do things like move around /proc files. That's not necessarily part of the syscall API, which is golden, but, yeah, our hands will be tied there until we can come out with 2.8.
I’m also removing features. There’s been some contention about that lately -- cryptoloop, devfs, etcetera. Traditionally the way we’d remove a feature like that was to tell people it was deprecated, then go through an entire kernel cycle, and then remove it. So it could take up to three years to get rid of the feature. What I’m proposing is that we speed that up, get it down to, say, twelve months. So we’ll stick a printk in the kernel that says, you know, this feature’s going to go away at the end of June 2005, so please stop using it.
Questioner: Hi! Does OSDL accept funding from the corporations that it accepts patches from?
Morton: Oh yes!
Same Questioner: Well, as a nonprofit organization, how do you separate the funding and the patches from the perception of backhanded persuasion, etcetera?
Morton: That is a concern. I’ve heard very little about that and - I’ve no way of proving that there are not issues surrounding that. In fact, I’ve heard stories that naïve OSDL sales people have in fact led companies to believe that working with OSDL can help them get things into the kernel. All I can do is assert that it's not the case. I personally, I actually work for Digeo, a separate company from OSDL, and there’s a contract between Digeo and OSDL which lets me work on Linux full time. And Linus himself is of course a fellow of OSDL. All I can do is, just promise it doesn’t happen.
[Inaudible question from audience]
Morton: Certainly our management at OSDL are very careful not to do that sort of thing. Occasionally they’ll ask me to go out and talk to a company, help explain a few things to them, impart clues in the other direction. And I will explain to those companies what we as the kernel development team deem to be acceptable. Plus the fact that, if I slipped a dodgy patch into the kernel, I’d get beaten up. I already have been. But if there was a lot of dissension, yes, Linus and I would have the final call. It's very much a consensus decision what goes in. And if people who I know and respect have serious issues with a patch, that’s basically a blocker as far as I’m concerned until those issues are cleared away.
Someone else – Rusty?: I’d like to point out that the Red Cross, as a nonprofit, gives out blood whether or not you’ve contributed to them.
Questioner: OK, one comment on the previous question is that I think a lot of us here don’t have an issue because we all know Linus would simply walk out the door of OSDL the day somebody forced him to do that. The proper question for that is the longer term one. The question I have for you, Andrew, is, what do you think about things like the Linux standard base and the ISO standardization project for Linux?
[Inaudible, then ominous silence]
[Laughter, thumping, applause]
Morton: Technical guy. So that was a bit of a non-answer I’m afraid, I just haven’t played in that area.
[End of recording]