Declaration of Randall Davis of MIT: SCO Is Wrong, Wrong, Wrong

Saturday, August 28 2004 @ 02:12 AM EDT

Contributed by: PJ

We don't have to reverse engineer the legal documents, so to speak, any more to try to figure out what Randall Davis's Declaration says. We have it, thanks to the indefatigable Frank Sorenson. Being a paper exhibit, it was scanned in, which means it's a large file and a little hard to read. But it's readable, and a very enjoyable read it is.

He is a Professor of Computer Science at MIT, as most of you know, and that's not telling one-tenth of the story. More on his credentials, which are amazing, in the attached Exhibits and the first few paragraphs of the Declaration. You will notice how IBM establishes him as an expert ("From 1995-1998 I served on the Scientific Advisory Board of the U.S. Air Force... I have served as a member of the Advisory Board to the US Congressional Office of Technology Assessment study on software and intellectual property . . . From 1998-2000 I served as the chairman of the National Academy of Sciences study on intellectual property rights and the emerging information infrastructure...I have been retained as an expert in over thirty cases dealing with alleged misappropriation of intellectual property, such as the allegations raised in this case, and have done numerous comparisons of code. . . . In 1990 I served as expert to the Court . . . in Computer Associates v. Altai, a software copyright infringement case that articulated the abstraction, filtration, comparison test for software. . . .", etc.). Now contrast that with how SCO introduced Sandeep Gupta and Christopher Sontag (more or less: "I am an employee of SCO.").

Some highlights:

"11. In summary, I find fundamental errors in Mr. Sontag's conclusions. He grossly exaggerates what is required to determine whether there is substantial similarity between Linux and SCO's allegedly copyrighted works. In fact, the materials necessary to determine substantial similarity have been available to SCO for years (at least since it acquired access to the allegedly copyrighted works in 2001). Tools capable of efficiently evaluating that material have also been publicly available to SCO for years. The task Mr. Sontag says could take 25,000 man years to complete should take capable programmers no more than several months. . . .

"18. As stated, SCO has had, since before the initiation of this case, all the raw material it needs to find any alleged substantial similarity between Linux and Unix. It of course has all relevant versions of Unix; it can get any version of the Linux kernel from publicly available web sites. As one example, www.linuxhq.com contains every version of the Linux kernel since the original 1.0.0 and a complete history of every change made to every kernel file over its entire development history. . . .

"19. There are a number of tools that are publicly available (in both executable and source code form) to compare large, complex programs for the purpose of determining substantial similarity. . . . SCO acknowledges using such tools, but misrepresents their utility. They are in my opinion quite effective.
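
[An aside from us, not part of the Declaration: to give a feel for what a comparison tool does at its very simplest, here is a toy sketch in Python. It is not any of the tools Davis refers to, which this excerpt does not name. The function names, the keyword list, and the `min_tokens` cutoff are all our own assumptions. The idea is to replace every identifier with a placeholder and ignore spacing, so that renamed variables and reformatted lines still line up.]

```python
import re

# A small, incomplete set of C keywords that should NOT be treated as identifiers.
C_KEYWORDS = {
    "int", "char", "long", "short", "unsigned", "void", "struct", "static",
    "if", "else", "for", "while", "do", "return", "break", "continue",
    "switch", "case", "default", "sizeof",
}

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(line):
    """Reduce one line of C-like code to a spacing- and name-independent form."""
    line = re.sub(r"//.*", "", line)  # drop trailing // comments
    tokens = []
    for tok in TOKEN.findall(line):
        if IDENTIFIER.fullmatch(tok) and tok not in C_KEYWORDS:
            tokens.append("ID")  # every non-keyword name looks the same
        else:
            tokens.append(tok)
    return " ".join(tokens)


def shared_lines(text_a, text_b, min_tokens=4):
    """Return (line number, line) pairs from text_a whose normalized form
    also occurs somewhere in text_b. Very short lines are skipped because
    something like a lone '}' would match everywhere."""
    seen_in_b = {normalize(line) for line in text_b.splitlines()}
    hits = []
    for number, line in enumerate(text_a.splitlines(), start=1):
        key = normalize(line)
        if len(key.split()) >= min_tokens and key in seen_in_b:
            hits.append((number, line.strip()))
    return hits
```

[Feeding it two snippets that differ only in variable names and spacing reports every substantive line as a match. A sketch this crude will also flag plenty of innocent look-alikes, so any hit is only a candidate for a human to read, not a finding.]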

"20. In an attempt to show that SCO could not possibly compare the works at issue without more time and information, Mr. Sontag states that existing tools may not detect minor changes in the code... are subject to false positives...and will require years to implement unless SCO is afforded more information. ...

"21. First, the existing tools are entirely adequate, even accepting the observation . . . that minor changes can prevent an absolutely literal matching process from being effective. There are several reasons why the existing tools will do the job. Despite SCO's implication, one cannot casually change punctuation, rename variables, change spelling, alter the text (whatever that means), or insert, delete, or reorder lines of code.... Code is extremely brittle and thus is in some respects quite similar to a complex and intricately designed mechanical device, like a finely made wristwatch. One can no more casually change punctuation, insert, delete, or reorder lines of code than one could casually insert, delete, or reorder the parts in the watch and still expect it to work.

"22. Software development is difficult in large part exactly because the code has to be just right. In the C language (in which both Unix and Linux are written), for example, a semicolon means something very different from a comma; substituting one for the other changes the modified code completely (and very likely breaks it). Similarly, a single equal sign '=' means one thing, but two of them in a row '==' mean something entirely different. Hence even a minor typo can go unnoticed (because it can produce a syntactically valid program), yet wreak havoc on program behavior. Programmers routinely have the experience of serious and obscure malfunctions arising out of the simplest typographical mistake, or out of the well-intentioned act of making a small change to code. Code is sensitive to even slight alterations; changes are not easy to make.
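
[Another aside from us, not part of the Declaration: Davis is talking about C, but the same kind of one-character slip exists in other languages. Here is a small Python analog, with function names we made up for the illustration. Every line below is valid Python, which is exactly the problem: nothing stops the typo versions from running.]

```python
def reset_correct(count):
    count = 0        # one equal sign: assignment, count becomes 0
    return count


def reset_typo(count):
    count == 0       # two equal signs: a comparison whose result is thrown away
    return count     # count is unchanged


def value_with_comma():
    value = 1, 2     # comma: builds the tuple (1, 2)
    return value


def value_with_semicolon():
    value = 1; 2     # semicolon: assigns 1, then evaluates 2 and discards it
    return value
```

[Calling `reset_typo(7)` hands back 7 rather than 0, and the comma and semicolon versions return different kinds of values altogether. The details differ from C, so take this as an illustration of the general point about single-character changes, not of C's exact rules.]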

"23. This is one reason why, while it is in principle possible to copy code and then purposely obscure its origin, that practice is generally carried out on programs at the scale of freshman homework assignments (where it is more easily detectable than freshmen think), not sections of multi-million line operating systems. Especially where large, complex programs are concerned, changes are that much more difficult to make, and purposeful obfuscation on a large scale is nearly impossible. . . .

"27. Mr. Sontag poses the problem as if no results will be known until the entire comparison task is complete. Even if the entire task were daunting (which it is not), if 'much' of SCO's 3.5 million lines of code were copied..., this would imply that there must be thousands of examples waiting to be found, and hundreds able to be found after a modest amount of effort. Mr. Sontag's own declaration acknowledges that SCO has used one or more of the existing tools to do the requisite comparisons . . . but SCO has yet to present any credible examples of substantial similarity. . . .

"29. . . . Mr. Sontag states that the only way for SCO to determine substantial similarity is to get a vast amount of additional materials from IBM and a number of other individuals or entities. In fact, none of this additional material identified by Mr. Sontag is necessary to the substantial similarity task. . . .

"32. Having access to all of the materials concerning AIX and Dynix to which Mr. Sontag refers in his declaration (which appears to be a huge amount of information) would not, in my opinion, be of any assistance in determining whether Linux is substantially similar to Unix. Those materials are not useful for the task at hand. . . .

"34. To suggest otherwise leads to the absurd notion that one work can be considered similar to another even if the two are currently completely different, if only one can show a (perhaps very long) sequence of small changes that lead from one to the other. This would be like playing the game of 'telephone,' in which a sentence is successively whispered from one person to the next in a long line, and claiming that, even though the sentence that emerged was totally different from the one that started the process, they were 'substantially similar' because the last was the result of many small changes to the first. Similarity means just that -- similarity. And the determination of similarity is made on the code as it is, independent of how it got that way. . . .

"39. It is estimated that the additional AIX and Dynix source code that SCO seeks exceeds 2 billion lines of code. Based upon the estimates Mr. Sontag used to arrive at his 25,000 man-years calculation, it would take SCO more than 14 million man-years to review just the additional AIX and Dynix code that SCO says it needs, putting aside how many more man-years it would require SCO to review the other materials it says it needs. . . .
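
[An aside from us, not part of the Declaration: the excerpt does not spell out how the 14 million figure was computed. One way to arrive at a number of that size, and this is purely our assumption, is to scale Sontag's 25,000 man-years linearly from the 3.5 million lines of SCO code mentioned in paragraph 27 up to 2 billion lines. The function name below is ours.]

```python
def scaled_man_years(base_man_years, base_lines, target_lines):
    """Scale an effort estimate in proportion to the amount of code.

    Assumes effort grows linearly with line count, which is our
    simplification and not something stated in the Declaration.
    """
    return base_man_years * target_lines / base_lines


# 25,000 man-years for 3.5 million lines, scaled up to 2 billion lines.
estimate = scaled_man_years(25_000, 3_500_000, 2_000_000_000)
print(f"{estimate:,.0f} man-years")
```

[Under that assumption the result lands a little above 14 million man-years, which is at least consistent with the "more than 14 million" in the paragraph above.]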

"42. Second, Mr. Sontag provides vanishingly little rationale for this voluminous request, which is not surprising, as the requested information is irrelevant to the task at hand. Once again, the task at hand is finding substantial similarity between Unix and Linux as it is now. Gathering information regarding the entire development history of Linux, including from potentially hundreds or even thousands of individuals, would not merely require a considerable amount of time, it would be of little or no meaningful assistance. The notion, for example, that 'Mr. Torvalds can answer specific questions as to what each contributor intended, and where and how the contributor acquired or developed the derived code,' suggests a wholly unrealistic picture of any mortal and of the code development process. The task would be done far faster, and the time better spent, if SCO were simply to put even part of the effort imagined by Mr. Sontag to the task of comparing the Unix and Linux source code SCO already has. . . .

"SUMMARY

"44. Mr. Sontag grossly exaggerates what is required to determine whether there is substantial similarity between Linux and SCO's allegedly copyrighted works. The materials necessary to the task have been available to SCO for years and tools capable of evaluating that material in a matter of months have also been available to SCO for years."

Translation: Your Honor, they are pulling your leg. They don't need more time for discovery and they don't need any more code. Period. By the way, he testified as an expert in the Gates Rubber, Inc., v. Bando American, Inc. case too.

So, having read that and considered the man's credentials, does it sound to you like IBM has been refusing to cooperate with SCO's legitimate discovery requests? Or does it sound to you like SCO has been gaming the system so as to achieve delay after delay by making inappropriate and even downright silly requests? SCO said it had three groups deep dive into the code, implying they already used the tools Davis says are useful for finding not only copied code but obfuscated code, and they have found nothing credible. And if they can't find anything, does it now sound to you like there is a problem with the provenance of the Linux kernel? That SCO has a leg to stand on? You be the judge. But can you not sense Davis's curled lip?

And some say Groklaw has a point of view.


http://www.groklaw.net/article.php?story=20040828002640200