
Google Book Search / The Google Print Library Project

Research Report by Lisa Swanstrom
(created 1/19/06; version 1.1, 1/21/06)

Related Categories: New Reading Interfaces, Online Text Archives, Search and Data Mining Innovations

Original Object for Study description

Google Book Search is a controversial initiative by the California-based search engine company Google to dramatically increase the amount of print literature available for online consumption. Google’s strategy is double-pronged, involving negotiations with both publishers and libraries to scan print works and convert them into searchable digital formats. While Google claims the project serves the public good by increasing access to print texts, opposition has been fierce, resulting by early 2006 in at least two separate lawsuits arguing that Google Book Search infringes copyright law. Despite continuing litigation, the project is operational in beta testing and is growing steadily, boasting over one hundred thousand searchable texts.

This evolving project, as well as the controversy surrounding it, raises questions regarding intellectual property rights, authorship, access to information, and the shifting material nature of the book in the face of digital culture.

In October 2004, at the annual international Frankfurt Book Fair, the online search engine company Google announced its intention to launch “The Google Print Library Project,” an ambitious program whose ultimate, far-reaching goal would be the online availability of all print books.

Google Book Search has two divisions: the Print Publisher Program, which works in conjunction with publishing companies to make portions of their print books available online for promotional purposes, and the Print Library Project, which scans and digitizes the contents of a number of major cooperating libraries, including Stanford University Library, Harvard University’s Widener Library, Oxford Library, the University of Michigan Library, and the New York Public Library.

Google has made arrangements with these libraries to scan their books and then host the full-text documents on secure servers. Yet although Google plans to digitize each and every book–in its entirety–on these libraries’ shelves, it has also made multiple assurances on its website that readers will not have full access to copyright-protected texts. Rather, readers will only have access to “snippets” of the text surrounding their search items, which Google argues fully complies with “fair use” copyright rules. Only books that are in the public domain, they insist repeatedly, will be offered in their entirety.

Controversy surrounding The Google Print Library Project has been forceful. In late 2005, a group of opponents, including the Association of American Publishers and the Authors Guild, filed separate lawsuits against Google. The focus of the suits is what publishers see as a violation of copyright law. Another focus is Google’s intention to scan all books unless the publisher explicitly “opts out,” a policy that essentially places the burden of non-participation upon the publisher, rather than upon Google. The publishers’ opposition to Google Book Search resulted in a three-month hiatus to allow publishers more time to “opt out” of the program. Despite this concession, litigation against Google remains in process at the time of this writing.

Research Context:
In terms of its technological capabilities, Google Book Search is perhaps best understood within the research context of information storage and retrieval, and in particular the area known as data mining. An innovator in “crawling” technology–i.e., technology that sends out probing entities known as “crawlers” or “web bots” to search, index, and catalogue information on Internet web pages–Google holds a patent for its “connectivity-based” search strategies and is an acknowledged forerunner in the arena of information management. Google Book Search is of particular interest to the field of information retrieval because this use of its technology raises a host of questions about intellectual property rights, authorship, access, and the materiality of the book in digital culture.

Technical Analysis:
The fact that the technology behind Google Book Search is not entirely transparent is not very surprising. Google, after all, has been notably silent regarding the details of its 2003 patent for “interconnectivity-based” search capabilities. Regardless of this opacity, however, it is clear that the Book Search project makes use of computer technology to fulfill at least two broad functions: conversion & hosting and searching & retrieval.

The conversion and hosting of books requires a conjunction of capabilities, including initial access to the printed text, scanning and conversion, and secure hosting once a book has been converted. For purposes of searching and retrieving, Google presumably makes use of its already impressive data-mining capabilities–with the difference that the infusion of book data will offer yet more data for Google to mine, and more data for its users to search.
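Since Google has not disclosed its methods, the following is only a minimal illustrative sketch of the general search-and-retrieval idea described above, not Google's implementation. It assumes the books have already been scanned and converted to plain text, builds a simple word index over them, and returns a short “snippet” of surrounding words for each match. The function names (build_index, snippet, search), the sample texts, and the snippet width are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict

def build_index(books):
    """Map each lowercased word to the (book_id, word_position) pairs where it occurs."""
    index = defaultdict(list)
    for book_id, text in books.items():
        for pos, word in enumerate(re.findall(r"\w+", text)):
            index[word.lower()].append((book_id, pos))
    return index

def snippet(text, pos, width=3):
    """Return up to `width` words on either side of the word at position `pos`."""
    words = re.findall(r"\w+", text)
    start = max(0, pos - width)
    return " ".join(words[start:pos + width + 1])

def search(index, books, term, width=3):
    """Return a (book_id, snippet) pair for every occurrence of `term`."""
    return [(book_id, snippet(books[book_id], pos, width))
            for book_id, pos in index.get(term.lower(), [])]

# Hypothetical sample "library" standing in for converted book text.
books = {
    "moby": "Call me Ishmael. Some years ago, never mind how long precisely",
    "pride": "It is a truth universally acknowledged that a single man",
}
index = build_index(books)
print(search(index, books, "truth"))
```

In this toy version the full text is stored and indexed, yet a reader who searches only ever sees the few words around each hit, which mirrors the distinction the report draws between hosting complete texts and exposing only snippets of copyright-protected works.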

Evaluation of Opportunities/Limitations for the Transliteracies Project:
Google Book Search is an object ripe for study by the Transliteracies Project, especially in regard to how its development will shape legislation and set precedents regarding authorship and intellectual property rights (IPR) in the digital era. While conventions regarding intellectual property and copyright laws have a rich history, there has been an explosion of debate in the past ten years regarding the management of digital property, especially in terms of how it is protected, sold, and disseminated across the World Wide Web (see, for example, Lawrence Lessig’s Code and Other Laws of Cyberspace, which reflects this increased urgency of interest).

In this debate, the digitized book becomes an object of scrutiny that raises many questions: Is the digitized book an object? Is it a property? Is it a commodity, and, if so, who holds the rights to profit from its reproduction? In essence, projects such as Google Book Search put publishers and authors of print works in the position of having to reckon with the subtleties and complexities of digital culture. Conversely, lawsuits about property rights filed by publishers and authors caution business giants like Google to tread very carefully in what, until recently, has been a print-dominated cultural domain.

Apart from the controversy surrounding the legality of Google Book Search, the potential ramifications for reading practices are fascinating to contemplate. If Google does succeed in its ambition to create, in its own words, “… a world where all books are online and searchable [their emphasis],” then its claims about access could really be put to the test because we could see (or not) Google’s egalitarian dream of equal book access put into practice. Additionally, the ability to search any text in its entirety from the convenience of one’s own private or public workspace would be a decided–if not invaluable–boon to most researchers.

On the other hand, Google’s status as the dominant business working on this project raises troubling questions in terms of access. For starters, how would the introduction of a central “authority” to which digitized books must conform change the way information is disseminated? Book collections have traditionally been distributed widely, not only geographically, but according to varying levels of ownership and organization. One benefit of this arrangement is that if a reader/researcher cannot locate a book in her local library, she can visit another institution and very likely find the book she is looking for. If all digitized books were stored, maintained, and organized by Google, what recourse would the reader/researcher have if, for whatever reason, she found Google to be an unsatisfactory gatekeeper of this vast digital archive? Furthermore, Google’s status as a for-profit enterprise may set off warning bells for traditional libraries that house print works–libraries which are typically not-for-profit. What would it mean to confer all of their treasures–albeit in digital form–upon one dominant profit-making entity?

In addition to issues of intellectual property, authorship, and access, there is at least one other intriguing layer to the Google Book controversy: the status of what a book becomes if it now exists as both a print document and an entirely searchable digital file. By divorcing words from the printed page, Google Book Search strips content from its traditional form and in principle is free to reconfigure that form in any way it deems fit. Whatever form the object ultimately takes, it will undoubtedly complicate cherished notions about the book’s status as a material object.

Resources for Further Study:

Point(s) for Expansion:
Other projects that involve book digitization (e.g., Amazon.com’s “Search Inside” option, Random House’s pay-per-page service, and Microsoft et al.’s Open Content Alliance).

  tl, 01.19.06
