Google Scholar - A Suggestion: Why Not Google Legal?

Monday, November 22 2004 @ 02:51 AM EST

Contributed by: PJ

Google has announced a new service, Google Scholar. I think it's a fabulous idea. John Markoff of the NYTimes explains [sub req'd]:

Google Scholar . . . is a result of the company's collaboration with a number of scientific and academic publishers and is intended as a first stop for researchers looking for scholarly literature like peer-reviewed papers, books, abstracts and technical reports.

Google executives declined to say how many additional documents and books had been indexed and made searchable through the service. While the great majority of recent scholarly papers and periodicals are indexed on the Web, many have not been easily accessible to the public.

You can read about Google Scholar on their FAQ page, which is quite enjoyable, as there is a bit of the usual dry Google wit.

Here are a couple of their "frequently asked" questions, and since they just began the service, I assume they mean inevitably anticipated:

A Suggestion for Google -- Google Legal

I have a suggestion for Google. Why not a Google Legal? Would it not be wonderful to have legal documents readily available and organized in one place for the general public? Groklaw is the proof of concept that non-lawyers are interested. Proprietary services like LexisNexis currently have cases sewn up, but here is the thing: legal documents are public domain. If they are obtained not from proprietary sources but from the courts directly, it is fine to post them. We do it on Groklaw all the time. Just harvesting what is already on the Internet would be useful, but at 7 cents a page on Pacer, it seems economically feasible to do pretty much everything, at least on the federal level, even if you didn't wish to go directly to the courthouse. Some states are more digitally with it than others. Utah, for example, is cutting edge, Delaware has discovered the Internet and is working on it, and some states, like Nevada, haven't yet addressed the digital issue. Here is a list of the courts that do not participate on Pacer, and local courts you'd need to contact directly. But even with digitally-challenged states or courts, you can usually contact transcription services the courts use, in-house or out, to get legal documents for a minimal fee.
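
To put the 7-cents-a-page figure in perspective, here is a rough back-of-the-envelope sketch in Python. Only the per-page rate comes from the paragraph above; the page counts and document counts are made-up numbers for illustration, not real docket statistics, and the function names are hypothetical.

```python
# Rate quoted in the article; everything else below is an illustrative assumption.
PACER_CENTS_PER_PAGE = 7


def pacer_cost_cents(pages: int) -> int:
    """Cost in whole cents of retrieving the given number of pages."""
    return pages * PACER_CENTS_PER_PAGE


def format_dollars(cents: int) -> str:
    """Format a whole number of cents as a dollar amount."""
    return f"${cents // 100}.{cents % 100:02d}"


# Hypothetical example: one 30-page filing.
single_filing = pacer_cost_cents(30)

# Hypothetical example: a whole case with 500 documents averaging 20 pages each.
whole_case = pacer_cost_cents(500 * 20)

print("One filing:", format_dollars(single_filing))
print("Whole case:", format_dollars(whole_case))
```

Working in whole cents avoids floating-point rounding surprises. Under these assumed numbers, a single filing costs a couple of dollars and an entire large case runs to hundreds, not thousands, which is the sense in which the task looks fundable.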

In short, with funding, it's a doable task. I seriously see a need for this and a way to do it, and I hope they look into it. Of course, they would need to read the case law first and run it past their legal department, to make sure they don't repeat the errors of others in the past. But in no way would a service like this undermine Westlaw or Lexis, which provide a wonderful service tailored to lawyers, who need and appreciate the value add they get from such services. Obviously, links to such services would be included. The rest of us nonlawyers, though, and smaller law firms that can't afford such services, can make do with the documents, plain vanilla, plus scholarly articles and legal commentary explaining how the system works. Some of that is already available on the Internet and could likely be greatly enhanced by a Google Legal collaboration with legal publishers. Just having it all organized would be a great help.

If you are in a legal scholarly mood, you might enjoy Dan Hunter's article, "Amateur to Amateur", which you can get from a great legal service called Social Science Research Network Electronic Library. Here is their searchable Legal Scholarship Network page, which states its goal like this: "The goal of LSN is to facilitate the distribution of scholarly information related to law to legal, economics, and business scholars and practitioners throughout the world." The article is free to download, although others available there are $5 or so.

Cornell University Law School's Legal Information Institute Releases Code Under CC License

And there is another wonderful resource. Cornell University Law School's Legal Information Institute makes US Code freely available, as well as US Supreme Court oral argument previews from liibulletin, The Federal Rules of Civil Procedure, Federal Rules of Criminal Procedure, Federal Rules of Evidence, Federal Rules of Bankruptcy Procedure, Uniform Commercial Code, and other key reference works in linked and structured pdf format, which you can download for personal use for a fee, or you are free to use their website until the cows come home: "The content of all of these publications can be explored and indeed used, without limit, at our website." They also offer a lexicon of legal terms. And they have appellate decisions that are in the news as well as a page on how to do legal research.

I wrote to Thomas R. Bruce, Director, Legal Information Institute, whose bio says that he cofounded the LII (with Co-Director Emeritus Peter Martin) in 1992, served for several years as Director of Educational Technologies at the Cornell Law School, and is the author of Cello, the first Web browser for Microsoft Windows, and of a variety of other software tools used by the LII and others.

What prompted me to write was this press release last month, announcing that LII was for the first time releasing the complete US Code in XML format under a Creative Commons license, as well as the underlying XML version as a dataset for use by researchers interested in legal text:

Legal Information Institute Releases Complete United States Code in XML Format.

Cornell Law School's Legal Information Institute has announced the release of a new online edition of the United States Code, including all the Federal law passed by Congress currently in force. For the first time, the project team is also releasing the underlying XML version as a dataset for use in research.

The XML data set has been generated from the most recent official version made available by the US House of Representatives, codified under fifty "titles". The United States Code "is the official compilation of the Federal statutes of a general and permanent nature; by Federal statute, the Law Revision Counsel of the U.S. House of Representatives is the publisher and compiler of the Code, and the Counsel is an appointee of the Speaker of the House."

Thomas R. Bruce, Director of Cornell's Legal Information Institute (LII), suggests that this edition of the United States Code represents perhaps the largest body of legislation ever made available online in XML format for use by researchers interested in legal text. One of the goals of the US Code project is to stimulate interest on the part of the research community in working with legal text, and to survey the uses to which people put XML versions of legislation.

According to the LII's USC Bell Code Browsing Environment User Guide, the Institute is sponsoring a "continuing effort to render the United States Code as an open-source multi-use XML data set. An important part has been to develop an environment to make the raw data, and emerging interpretations of it, as visible as possible in an analytical mode. As this is primarily a laboratory artifact, not many user friendliness features have been implemented; the emphasis has been utility for someone who knows the project."

The US Code supplied to the Legal Information Institute "is marked up for typesetting; [the project team] uses this specialized markup to help discover the structure to motivate more generalized XML elements. In a preliminary micro-translation, the control-code based input is rendered in a quite literal readable format, which is then stored as a file with the same scope as the input (title or appendix) as well as fragmented along data-natural boundaries and rendered as static HTML for easy viewing."

The U.S. Code XML data is licensed under a Creative Commons License. The relevant Creative Commons "Attribution-NonCommercial-ShareAlike 1.0" license ensures that users are free to: (1) copy, distribute, display, and perform the work, and (2) to make derivative works, provided that attribution is given to the original author credit, that the use is noncommercial, and that derivative works which alter, transform, or build upon the original are distributed only under the identical Creative Commons License.

It's actually under the 2.0 license. Here is how Mr. Bruce describes what they are doing and how:

"This version of the US Code is the latest in a project that’s been going on since we put the first Federal legislation on the Net in 1992-1993 (you and Groklaw-ites won’t be surprised to learn that the first thing we did was Title 17, Copyright – in Gopher, no less, for those whose memory is that long). We’ve been at it ever since, with some funding from our parent institution (the Cornell Law School), and from the former Red Hat Foundation, which became the Center for the Public Domain. We’re now largely supported by contributions from our user base and of course I’d encourage anybody who likes what we do to make a donation – we’re a very small organization, for all that we’re inside a big university, and every little bit helps us a lot.

"We’ve been using XML quietly, in the background, for a few years now, but this is the first time we’ve released the underlying XML to the world. We hope it’ll encourage information scientists, computational linguists, and others to work with this material – we’re particularly eager to encourage more work on legal text, hoping that it will result in more freely-available information of the kind we offer on our site. The version we use is up-converted from something called 'locator-code data' – essentially typesetting data with escape codes – that we get from the US House of Representatives and its Office of the Law Revision Counsel. It’s converted using a series of rather complicated Perl scripts to create the XML version, and XSLT to make a static HTML derivative; we use Mason, swish-e, Apache, mySQL, and all that good stuff in serving it up. The technically-curious will find a disorganized collection of research notes here, and a dated but accessible explanation of the project in presentation form here. We owe special thanks to Eric Loach and Elliott Chabot of the House staff for their cheerful assistance through the years, and to any number of volunteers and students who have bent their minds around the textual contortions of Congress and helped stuff them into structured and accessible data formats. Anyway… hope you and everyone else gets some use out of it, and I’ll be eager to hear about any derivatives."

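For the technically curious, the pipeline Mr. Bruce describes (escape-coded typesetting data, converted up to XML, then rendered as static HTML) can be sketched in miniature. This is only a toy illustration in Python, not the LII's code: their conversion uses Perl scripts and XSLT, and the real locator-code data has its own control codes. The `@H`, `@S`, and `@T` codes, the element names, and the function names below are all invented for this sketch.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical escape codes, invented for this sketch. The real
# locator-code data uses its own, different, control codes.
CODE_TO_TAG = {"@H": "heading", "@S": "section", "@T": "text"}


def upconvert(lines, title_number):
    """Turn toy escape-coded lines into a small XML tree."""
    root = ET.Element("title", {"number": str(title_number)})
    current_section = None
    for line in lines:
        code, _, body = line.partition(" ")
        tag = CODE_TO_TAG.get(code)
        if tag is None:
            continue  # skip anything this sketch does not recognize
        if tag == "heading":
            ET.SubElement(root, "heading").text = body
        elif tag == "section":
            current_section = ET.SubElement(root, "section", {"number": body})
        else:
            # Text belongs to the most recent section, or to the title itself
            # if no section has been opened yet.
            parent = current_section if current_section is not None else root
            ET.SubElement(parent, "text").text = body
    return root


def to_html(root):
    """Render the XML tree as a minimal static HTML fragment.

    Plain Python stands in here for the XSLT step mentioned in the letter.
    """
    parts = []
    for node in root:
        if node.tag == "heading":
            parts.append(f"<h1>{html.escape(node.text or '')}</h1>")
        elif node.tag == "section":
            parts.append(f"<h2>Section {html.escape(node.get('number', ''))}</h2>")
            for para in node.findall("text"):
                parts.append(f"<p>{html.escape(para.text or '')}</p>")
        else:
            parts.append(f"<p>{html.escape(node.text or '')}</p>")
    return "\n".join(parts)


sample = ["@H COPYRIGHTS", "@S 101", "@T Definitions.", "@X ignored"]
tree = upconvert(sample, 17)
print(ET.tostring(tree, encoding="unicode"))
print(to_html(tree))
```

The point of the intermediate XML is that the structure (title, section, text) is recovered once and can then feed any number of derivatives, of which the HTML view is just one.
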
You can find the HTML version at; the XML is available for download at Here are some of the other references Cornell lists [scroll down]:

Contact information is on that page as well. But back to my idea for Google. All the pieces are available and online now as far as laws go, and yet there remains a key piece missing: filings and decisions in legal cases. The public can't get those without subscriptions and money being exchanged. Even if they are willing to pay, most don't know where to go to do so. Even when they know, it's scattered all over the place.

If Google did it, I absolutely would use it, and I'd gladly endure ads to get at it. Or if Cornell's LII wanted to expand and do it, I'd encourage everyone to donate money (in fact, they already deserve our support and here's where you can go to do so) and volunteer labor, if needed. It's really important, I think, for someone to do it. The law affects people more directly now than ever in history, particularly in the area of intellectual property law, and they know it and are interested in participating more knowledgeably. And the simple truth is, without complete case law, you really can't understand the law thoroughly. There are many case decisions available on the Internet, but only recent cases, for the most part, and usually only higher profile ones. I know that because I frequently hit that stone wall, when I'm trying to link to cases for Groklaw. I'd been thinking of explaining that whole process more thoroughly on Groklaw, but if you click on Cornell's legal research link, you'll get a fine overview of how to do research in case law, and how impossible it is to do it without a paid service currently.

So, there's my idea, Google. Hope you like it enough to do it, because otherwise I might get it into my head that I need to do it, and you know what that can lead to. Joke. Joke. Groklaw can't do it, because we're volunteers, and it's an idea that requires some funding, so that is why I am passing the idea along to you, and I sincerely hope you decide to implement it. If not, the idea is out there now, and as Thomas Jefferson pointed out, it's a mighty powerful and useful thing to send an idea into the world.