April 02, 2008

Could Jeanneney's Euro-Centric Digital Library Increase Competition?

Less than a full chapter into Jeanneney’s brief book, we all knew he’d be torn to shreds on our uchicago blog. This post seeks to do two things: First, to give Mr. Jeanneney a fair shake. His concerns may be somewhat ill-founded, but the plan for a European digital library to be the Airbus to Google’s Boeing has the spirit of competition. (63) His idea is not simply protectionist and self-serving; it serves the marketplace of ideas and competition for viewers, readers, and providers of content. These are precisely the things we support, and accordingly, we can smile on Jeanneney’s goal of (if not his motivation and means for) more texts, more search engines, more algorithms, etc. Second, I would like to pursue more vigorously the practical problems and theoretical shortcomings of Jeanneney’s position discussed briefly in Claire and Anglee’s posts. Specifically, this post will focus on a) how his texts would be selected and compartmentalized (compared with Google and normal libraries), and b) why he thinks either there is no market for such content or such a market is underserved by Google.

1) Jeanneney’s Admirable Goal

As his translator points out, the internet has made the world smaller. The question, as the world shrinks, is whether we want a uniform, monolingual, two-dimensional webscape. (92) The answer among people who pick up this book is supposed to be a resounding “no.” Prior posters have responded that the market can, in fact, provide a varied and diverse set of views. Perhaps it can. Jeanneney analogizes the web to television, a field in which the United States has gone private, the United Kingdom has the BBC, and France is somewhere in the middle. Just as there are niche markets for HBO, the Sundance Channel, and Fox Reality, there will be markets for lesser known books and viewpoints. The internet is not a place where independent or small publishers are necessarily crushed. Rather, they can make their content available for relatively little money to a wide audience who may stumble upon it or seek it out. Further, those niche content providers, like, say, the Robb Report or Forbes magazines, allow advertisers to target a narrow slice of the population for particular products. This, market-fans say, is a good thing and supports competition. Accordingly, Jeanneney is wrong to worry about the small publisher.

If we are to be so market sympathetic, however, why oppose the entry of a Euro-centric digital library? While Jeanneney speaks fondly of a maximum discount of 5% on books hampering the sales of big corporate book chains in Europe, he is not advocating such regulation of Google Books. His essay is a call to arms precisely because “Google [is] triumph[ing] over weak rivals,” meaning that “Europe [must] pool its energies and embark on a vast plan to digitize our writings.” (65) His goal of a more continental search engine and database isn’t reprehensible; it’s supplemental. Just as Hausman touted Google Books in her post as a supplement to normal research methods, Jeanneney can tout his continental/world-oriented database as a supplement to Google Books. We should no more oppose his goal than he should oppose Google’s.

I believe prior posts were a little too quick to jump on Jeanneney for being anti-privatization, anti-market. Those were his motivations, but his goal was ultimately one that would increase competition. However, the devil is in the details. While both sides of this debate (market v. government and popular majority v. diverse minority) have big goals, the means of achieving them are the nitty-gritty, and that is where the rest of this post’s focus lies.

2) Jeanneney’s Gameplan: Why? How?

a) Why?

This is a major issue in most prior posts: Jeanneney’s motivation. I believe it’s been sufficiently ripped up already, but I’ll say this about his concerns: they are nothing that minor tweaking cannot fix. He complains about how poorly navigable e-books are and how a table of contents can help. This complaint is already being addressed in other areas: it is how even a medium-length Wikipedia entry is organized; it is available from store websites on the search results page, e.g. if you search “crate and barrel,” the search result page has not only the homepage, but subheadings that take you directly to those specialty pages. Something comparable is easily possible for books. He complains about the order of the search results. For instance, when searching “Cervantes,” a Spanish-language text comes up 9th, after English-language texts. This is already fixed, for example, in the Law School library’s database: you can select the language of the results for which you’re searching. It seems to me that these and most of his other complaints are minor at best.

Even his major assault (that Google’s market-driven digital library won’t appropriately select or provide access to non-Anglo texts) undermines itself: France, Germany, and the world have vast cultural histories and billions of people to whom those histories matter (or ought to). If Google is so mercilessly money-oriented, why would it focus on English language works to the exclusion of other languages? Jeanneney’s “quick way to make a buck” account of Google makes sense only if he also admits that Google will attempt to make a buck in foreign markets, in foreign languages by tailoring their results to various non-Anglo target demographics. Jeanneney surely has to concede that no culture will be left behind if that culture can be digitized and sold. And if a particular culture has an insufficient target audience to make digitalization profitable, why would a paper library or bookstore choose to stock its history and literature rather than something more popular? The internet has made the world smaller, and because of it there is less purely local content, but more total content accessible worldwide.

This is all by way of saying that digital libraries are like other libraries in terms of content selection and preservation of diverse ideas. The local library does not especially carry local writers or texts, it carries the most important ones and the most popular ones. The e-library is no different. It seems to me, in fact, that Jeanneney’s fantasy continental/global digital library would have the same content as Google’s, as we’ll see immediately below.

b) How?

i) The Selection Process?

To counterbalance the 800-lb. googrilla, Jeanneney proposes a digital collection and algorithm more to his liking (in the vaguest of terms). What does he want? “[F]ounding texts,... major writings that have contributed to democracy, to human rights,... writings that have fostered the development of literary, scientific, legal, and economic knowledge.” (78) Google may be greedy, but they’re not dumb. It seems impossible that Google would not choose such texts on its own since there’s obviously a big audience for them. Further, this does not, as Jeanneney pretends, further the goals of the small publisher or the niche writer. The unprofitable minority view allegedly ignored by Google would not be a part of this group either.

And how would such a laundry list be satisfied? How would we know what qualified? Well, for one Jeanneney thinks a good gauge of a work’s importance is whether it’s “appeared in numerous translations, thus attesting to [its] influence.” (78) This sorting method for deciding what gets digitized is exactly the same as Google “classif[ying] the results according to criteria of frequency and density of links” to the page. (45) When something is translated, it’s linked to a group of readers who otherwise wouldn’t get it by putting it in their language right in front of them. It’s strikingly similar to linking someone to an article on the internet.

But don’t worry, there’s more to Jeanneney’s brilliant plan: he’ll have “scholarly councils [] deal with the precise choice of works to be digitized.” (80) The idea of such a council picking out the right texts as opposed to Google’s wrong texts is plainly silly. Jeanneney’s plan presupposes that there exist experts who can assess the objective value of a work independent of its popularity, independent of other people’s opinions, and without any apparent referent outside of the panel itself and some unannounced “jointly defined framework.” (80) Even if the texts selected fit the definition above (founding texts... fostered development...), the only thing that would prove their value is that the panel chose them rather than popular works. (Not to nerd up, but for more on the hairy issue of taste and true judges of value and the stupidity of Jeanneney, check out David Hume’s essay “Of the Standard of Taste.”)

ii) The Search Process

Suppose Jeanneney had the right texts, how would his search process differ from Google’s? It seems impossible that it would be very different. As discussed above, the number of translations can help rank importance and set up a hierarchy, just like Google’s sorting based on how many links there are to the site. More importantly, though, when there’s a search for a topic on Jeanneney’s search engine, won’t it be just like a keyword search on a library website? Keywords allow sorting by metadata, which is exactly what Google is doing. (56) All we really know about Jeanneney’s gameplan is that there’d be more French and Spanish language stuff at the top of search results, that there’d be classics, and (without explanation) that he’d throw in minority views that are unprofitable to digitize. I don’t understand what good Jeanneney thinks the unprofitable minority views will do, though. They’ll diversify his database, but a) they won’t be searched for because they’re unpopular, and b) they won’t come up in related searches because they’re unpopular relative to the classic, well-known majority view that drowned them out in the first place (but of course the panel of experts can probably rank the results, too).

Either Jeanneney’s search results will have classics and popular views ranked first, in which case it’ll effectively be like Google, or it will have many unpopular results returned contrary to users’ desires, making it diverse, but unattractive and unable to attract users. I want to defend Jeanneney’s goal because it sounds different from Google in relevant ways. For it to have teeth, however, I don’t think it should be done fully through government agencies. It’d be like having a minister of good taste telling me what to read and in what order. Rather, some hybrid public/private version could work. Perhaps the governments could take bids to create a database that helped people navigate from the popular views to the unpopular minority works; e.g. below the link to Adam Smith’s Wealth of Nations, there could be a link to books by Antonio Gramsci or Althusser, or below a link to “Google and the Myth of Universal Knowledge” (which is on Google Books, by the way) there could be a link to something by Richard Epstein.

April 01, 2008

A Very Brief Note on European Efforts

Just to point out one thing that I haven't seen much discussion of yet, but which I was actually reading about before seeing Jeanneney's book: Jeanneney's touted BNF project Gallica has existed since 1997, yet it has translated only 90,000 complete books (see here). Quaero, another project Jeanneney cites with approval, was announced in 2005, but only obtained funding a week ago. It's not actually a united effort, but a collaboration of 23 companies and organizations. Despite the promise of €298 million over the next five years, Quaero has a website, but apparently does not yet have any working applications online. Germany has already all but abandoned it. Perhaps the most revealing comment on Quaero, however, is this one from the founder of one of the Quaero project members, Exalead:

[Exalead founder François] Bourdoncle conceded that Quaero would not be able to develop a rival to Google in search-related advertising, the company's most lucrative business. "The scale is just too large for anyone to catch up," he said.

Just thought people might find these facts interesting.

P.S. I'm assigned to comment this week, not post, but since I just read about these projects last week, I thought the class might find this material interesting.

A new front in the Franco-American Culture Wars

Jeanneney’s work raises several interesting arguments regarding Google’s relationship with the European Union. Jeanneney ostensibly claims that he seeks merely to provide an alternative model for Europeans to follow. At its heart, however, Jeanneney’s concerns largely stem from a suspicion of the potential threat of US cultural dominance and a sharp disagreement with the underlying principles of free-market capitalism.

European Resistance to Google: a Tribute to De Gaulle

As the other posters have noted, his arguments contain significant flaws and biases that weaken the basis for his claims. Ms. Agarwal correctly notes that upon closer inspection, Jeanneney’s concern about selection bias seems more focused on the American source of the bias. This concern echoes those voiced at various times when European leaders felt the need to take a stand against the American colossus. Chapter 3, appropriately named Hyperpower, demonstrates that Jeanneney is at least partially concerned with fighting this old war on its newest front. It is noteworthy that the chapter begins with De Gaulle’s cynical view of NATO and the international market as nothing more than “camouflage for American hegemony.” (Pg. 35) Jeanneney, though careful to praise the close ties between the New and Old Worlds, clearly views the EU’s primary role as a much-needed counterweight to America’s international influence. (Pg. 36)

Such an aspiration for the European Union at first glance seems simple enough: as the nascent superpower grows in influence, it is entitled to develop along a different path from its American counterpart. However, one cannot discount that Jeanneney is looking at this issue through a distinctively French, rather than simply a European, perspective. Jeanneney notes that new nations are the scene of competition between Latin and Anglo-Saxon law and worries that the dominance of Google will shift the balance to the latter. (Pg. 43) This concern cannot properly be called purely European; the United Kingdom and Ireland clearly have little to fear from the potential ascendancy of their own legal systems. Rather, this reflects a Continental concern. Over the course of modern European history, French leaders such as De Gaulle have often steered the development of the European Community in ways designed to ensure that European influence was synonymous with French influence. It should come as little surprise that Jeanneney cites the inaccuracies of Google.fr (with no mention of any other European versions of Google) as evidence that Google poses a dangerous threat to the culture of Europe. (Pg. 43) Ms. Agarwal’s point can thus be refined a step further: Jeanneney’s primary concern is to ensure that French culture is not marginalized in the hyperlinked world of the digital age.

Network Effects: Mixed Blessing Rather Than Uniform Evil

One of the key justifications Jeanneney cites for the development of a publicly-funded European search engine and digital archive is the need to introduce a degree of competition into the technology industries that appear to be dominated by American firms. (Pg. 53) Jeanneney criticizes the French government’s decisions in the past to reduce or eliminate France’s presence in markets ranging from international news reporting to advanced photographic imaging. (Pg. 53) Jeanneney goes on to extol the past instances where European firms successfully competed with dominant American players, such as in the aerospace and satellite industries. (Pg. 61) Based on these past successes, he argues that Europe must again meet the challenge of Google and develop its own unique plan to digitize its writings. (Pg. 65) Jeanneney’s arguments that competition between America and Europe has produced benefits to consumers rest on fairly solid grounds. Ironically, to the “Chicago-school” liberals, this should come as no surprise: competition between firms can lower prices for consumers and lead to more efficient outcomes. What Jeanneney fails to recognize, however, is that in an industry that in all likelihood demonstrates considerable network effects, an increase in the number of competing search engines should not be treated as an unmixed blessing. As with all companies, Google and other search engines benefit considerably as the number of people who use their services grows. Yet as Jeanneney notes, algorithms set up by search engines adapt as a particular site receives more and more traffic. (Pg. 45) These adaptations can in turn allow search engines to understand more about people’s preferences. It is undeniable that private profit-seeking plays an important role here. As Google gains more information about its users’ patterns, it is better able to deliver “available human brain time.” (Pg. 26) Yet at the same time, as the algorithms become more responsive to the desires of internet users, the value of Google as an information gathering tool to each individual increases. It may be true that Google’s international editions leave something to be desired in terms of foreign-language searching abilities. Yet Jeanneney seems to believe that the only way this can be remedied is by constructing an entirely new search engine and digitization project for the European Union. A less expensive and possibly more beneficial alternative would be to partner with Google to improve its capabilities in this area. Jeanneney acknowledges that the European model will cost taxpayers a great deal of money (Pg. 82), yet appears perfectly willing to incur this cost despite the fact that it may in fact be duplicative. Fostering artificial competition for the sake of competition itself risks repeating the experience of the US under the 1996 Telecommunications Act.

Jeanneney views Google’s algorithm with suspicion, noting that it favors incumbents and popular sites while making it difficult for newcomers to get as much exposure. Indeed, he goes on to highlight the risk that Legifrance might not receive the attention that it deserves due to the inherent bias towards sites with heavier traffic. (Pg. 43) As Ms. Hausman correctly points out, this result is by no means guaranteed: European searchers seeking information from domestic sources may have to scroll down farther on the list, but they will not automatically substitute American sources. Yet even if this were to be the case, one might ask whether this is not simply a function of the market itself. If Legifrance is overlooked by Google because not enough people care to use its service, isn’t that simply a reflection on the value of Legifrance rather than Google? Jeanneney seems to believe that people’s preferences as expressed by the market can and should be overridden by the “better” views of the government.

To conclude, Jeanneney’s views are based on two key pillars: the notion that French culture should not and must not be eclipsed by American culture and a view that the regulation of the state is superior to the ordering of the private market. As the comments on his book have persuasively demonstrated, it is not clear that these bases provide sufficient support for his assertions.

More on Jeanneney's Myth

While I think there is a strong case to be made for the importance of protecting cultural heritage, particularly in the form of ancient documents, and for the dangers of the consolidation of information in the hands of one entity, I do not believe that Jean-Noel Jeanneney makes either this broad case or his specific case against Google. Jeanneney’s argument would have been stronger if he had limited himself to a careful analysis of the nature and implications of the Google Library Project. However, as Claire, Chang, and Ruben have already pointed out, Jeanneney instead engages in a fairly undisciplined analysis in which he frequently conflates Google’s Library Project with the entire Google operation, research in general with a Google search in particular, commercialism with absolute cultural death, and the operation of a single American company with America as a whole.

Mischaracterizations of Google’s digitization project abound. While Google’s general mission is to organize all of the world’s knowledge, Google has never suggested that the digitization project contains all of the books in the world, nor is Google the sole owner of the copies of the digitized books (they are at minimum also held by cooperating libraries). Further, Google is not destroying copies of books as it progresses, and they are still available for public consumption. Google’s power to monopolize lies only in a world where no other copies of the digitized books exist, where people’s preferences have no sway, and where people uncritically use only the Google Book Search resource to the exclusion of other tools. To the extent that Jeanneney can be seen as making a case for competition in digitization so that these other tools exist, his point is well-taken, but his analytical method takes away from this helpful point.

Jeanneney’s concerns seem to be premised on the idea that people are mindless drones with weak or non-existent preferences who will all uncritically accept whatever preferences Google presents to them. In order to accept Jeanneney’s premise, we have to assume that people come to Google (or Google Book Search– it’s difficult to tell which he is really talking about at points) without preferences. I agree with Claire that this is an obviously false assumption and that Google (or America– again, the distinction is not clear) cannot claim “unilateral control over the thinking of the world” (p. 41) unless people’s preferences crumble in the face of Google. This is a highly implausible view of individual action and the way people’s preferences are formed (as others have noted, education plays a huge role here). Users will look at results with their own preferences in mind, and if an engine does not produce the preferred results, a user will go elsewhere. (I realize Jeanneney in part is concerned that there will be nowhere else for them to go, but he also describes digitization projects other than Google’s, so this concern seems at best theoretical.) For example, if Google does not present a Hugo work in French and that is what a user is looking for, then they will simply look for that work elsewhere.

Based on what I have written thus far, it is probably obvious that I am a fan of the free market. That said, I think that strong and well-reasoned arguments about free market alternatives or at least exceptions can be made. Jeanneney, however, relies mostly on sneers and straw men to make what might otherwise be compelling points. For example, he suggests that the kind of book search Google is developing is valueless (“An indeterminate, disorganized, unclassified, uninventoried profusion is of little interest.” p. 5). However, this can’t possibly be true, because first, if a book digitization project of the type Google is undertaking is valueless, it is unlikely that Google would do it because there would be “little interest”. (As Jeanneney is fond of pointing out, they have the bottom line to worry about.) Second, if the Google digitization project is of “little interest,” then Jeanneney is making much ado about nothing, because a project that interests no one can’t possibly have the dire effects he suggests.

As Frank points out, many of Jeanneney’s concerns and arguments are premised on ideas that do not prevail in America. While I think this is a fair point and certainly helps us understand where Jeanneney is coming from, I don’t think this provides a strong defense of his position. His skepticism of markets, for example, is only convincing to the extent that he provides justifications for such skepticism. Similarly, his reliance on government and the ability of government to provide a quality digitization product is only compelling to the extent that we believe that government is better able to provide this kind of service. As others have noted, Jeanneney does not provide any real evidence that a government-produced search engine will be able to produce any less culturally-biased algorithms than any other entity, and in some places, seems to suggest that he would be fine with a bias as long as it skewed towards the language and culture that Jeanneney prefers and feels is underrepresented in Google.

In many ways, I think “Google and the Myth of Universal Knowledge” can be understood best not as a carefully constructed treatise on the intellectual property problems and cultural responsibilities of corporations and governments in the Internet age, but rather an attempt on the part of a bureaucrat to encourage his government to fund his pet project. I in no way fault Jeanneney for this effort, but I do think it’s essential that the book be viewed for what it is. Jeanneney would have us believe that he is a cultural philosopher of sorts, but most of the book reads more as an appeal for funding or a PR effort to angle for a position on a new EU committee for digitization. Again, I take no issue with what I see as Jeanneney’s efforts to advance his interests and the interests of the organization that he directs, but I think it’s important to realize that these interests are in play.

The greatest value in Jeanneney’s work, as I see it, is that he attempts to make a case for competition between and among digitization efforts. As Anglee pointed out, there is certainly danger in one entity, whether that is a government or a corporation, being responsible for the digitization of all books everywhere. If a single entity controls the flow of information, there is always the potential for abuse. And, while I think the reasons that Jeanneney gives for Google being a particular threat are somewhat overblown, I think the underlying point is well-taken. Competition among providers of book digitization can only lead to better, more useful, and more accountable book databases. In addition, different providers, particularly governments, may be able to encourage authors and publishers to provide access to some copyrighted works on public interest grounds, or may alter their laws in order to facilitate digitization efforts, and this too would help to expand and improve the breadth and utility of digitization. On this point, Jeanneney and I agree. However, we diverge when it comes to valuing the various actors in the digitization competition. Jeanneney quite obviously thinks that Google offers an inferior product because the company might, for example, thematically link advertisements to books. However, I see no problem with these sorts of activities, as the users of Google will be using the search despite knowing that there are ads, and Google certainly does not hide the fact that it places ads on its search pages.

I want to note that while I take a critical approach to “Google and the Myth of Universal Knowledge,” I do agree with certain underlying assumptions made by Jeanneney regarding the importance of preserving cultural heritage and the encouragement of a broad, context-based view of a society’s intellectual and cultural history. However, I question whether we overstate the ability of a single book search or search engine to strip away all cultural context from a work, let alone the user’s preferences in searching a database. I do not dispute Jeanneney’s underlying assumptions about the value of culture; however, I mostly disagree with him about the methods he proposes to protect those values. I do not believe that governments are any better, on the whole, at achieving these goals than private enterprise, because governments are equally susceptible to individual biases and pressures. We might even see something relatively accessible like Google, which develops its product based on user preferences, as more democratic and responsive to the true preferences of the greater population than government decisions that may be influenced by small but powerful interest groups. I propose that the best digitization model is one that involves competition among government actors and the private sector, which will ultimately lead to the greatest variety and best level of service for users from digitization efforts as a whole.

March 31, 2008

Jeanneney's own Myths on Universal Knowledge

Jean-Noël Jeanneney's European call-to-action against the development of a Google-dominated digital library makes for an interesting read.  I tend to agree that a private sector, market-driven endeavor such as Google's probably won't succeed in truly providing "universal knowledge" in a global sense.  For one thing, Jeanneney is probably right that Google's book search service will favor the dissemination of Anglo-American or English language works over those of other countries or other languages.  This of course doesn't stop other countries and other companies from digitizing foreign works.  Jeanneney also correctly points out some of the problems with entrusting "public" knowledge to a private entity that controls access to that public information.  (What if Google goes belly up, or gets bought out by someone whose motto is "Don't be evil… some of the time"?  Sure, it's improbable, but it's far from impossible.  Also, what if Google decided to "recapture" public domain works with its own licenses on scanned content, something it already tries to do with its "not for commercial use" disclaimer?)

That being said, I do think Jeanneney overstates some of the issues with Google's service, and in particular he mischaracterizes (in my opinion) the market for search services generally and for that of a book search.  I'm also not convinced that a large, expert-driven, government sponsored initiative such as the one Jeanneney proposes is necessary to assure a publicly accessible digital library, particularly given projects already under way to compete with Google's book search.  The history of technological innovation on the internet has usually been one of diffuse, bottom-up innovation.  A large government-driven organization may itself be susceptible to biases similar to the market-bias Jeanneney fears.  I'm less concerned with the market-style of bias than the expert-panel style of bias, since the first is presumably capable of providing search results that users actually want.  Jeanneney also seems to overlook the fact that the market already plays a large role in determining which publications are made available in print to begin with. 

Jeanneney seems to fear that Google's market-driven approach means that many works simply won't be selected for digitization because they lack demand in the market.  I don't think that's actually correct; I believe it conflates two types of products: the first being the books themselves, the second being a searchable database of digitized books.  Google's "product" is the latter, and the value of the latter is determined by the exhaustiveness of the database and quality of the search, and not so much the value of each book itself.  I'd argue that market forces would pressure Google into including as many books as possible, not just those that would be the most popular in search results.

While Jeanneney attempts to rally support for his vision of a large government-driven digital library project, he overlooks some of the non-governmental/non-profit attempts at book digitization.  The most prominent of these is probably the Open Content Alliance.  Jeanneney mentions OCA in his book, but mistakenly as another example of a corporate project like Google's simply because of Yahoo!'s commitment to it.  OCA, however, is very different from Google's book search and is entirely a not-for-profit endeavor.  Many U.S. libraries, such as those of the Boston Library Consortium, have declined invitations to join Google's initiative, citing some of the fears shared by Jeanneney.  The OCA hopes to be a depository of public domain works that will then be accessible over the internet, to all, for free, with no strings attached.  Other projects include Project Gutenberg, a decentralized attempt at having users themselves scan and upload public domain works.  Yet another attempt involves a topical approach to digitizing books: the Encyclopedia of Life hopes to one day catalogue all works on biodiversity and currently relies on the catalogue provided by the Biodiversity Heritage Library.  Jeanneney discusses at length the need to discuss how information is to be "organized" in the digital context; the ability to take topical approaches such as EoL's seems to be a great advantage of the power of the internet.  Mandating a government-run digitization program might take away resources from innovative knowledge-organization schemes such as EoL.

While reading Jeanneney's book I was surprised by the omission of what I perceive to be one large barrier to the development of useful and widely accessible digital libraries: copyright law. While most of these book scanning projects focus on archiving public domain works, that doesn't make for an easy distinction in the international context. Sometimes, what is in the public domain in the U.S. might not be in the public domain in a European country. Also, most U.S. companies have little experience with the European concept of moral rights. These aspects of international copyright law might unexpectedly hinder a U.S. corporation's ability to scan and index international works. I'd be interested to hear Jeanneney's opinion on this matter. I'd also be interested to know to what extent, if any, such international copyright issues limited the amount of international content on Google's book search. I'd be amused if it wasn't actually the market that hindered Jeanneney's ability to find foreign works on google.fr, but rather European regulation.

Throughout Jeanneney's book, the concern seems to be the dissemination of knowledge, and an attempt to avoid the discriminatory dominance of one culture. Interestingly, though, most of the "culture" and "knowledge" in these digital book projects involves public domain works, those published before 1923 in the U.S. (or before 1930 in France). While I can see the value in preserving these works for their historical significance, and as a way of preserving our past, I really do wonder how big a role these public domain works play in shaping our culture. My first hunch is that works published in the past 70 years are probably more important for promoting culture, and contain knowledge that is more useful to the people of today.

Jeanneney often describes his concern that projects like Google's might marginalize certain works or disadvantaged populations. If access to culture and knowledge for marginalized populations is really a concern, then is all this discussion about access to public domain works really that important? Perhaps a better discussion would be how to leverage the benefits of digitization and the power of the internet to make information under copyright more accessible. Jeanneney does discuss the need to talk with publishers to digitize works still under copyright, but it's not clear what he hopes the end goal of this discussion will be. Google currently provides snippets of copyrighted works under its interpretation of "Fair Use"… but as Jeanneney likes to point out (citing former ALA president Michael Gorman), retrieving just a few pages or snippets of a work, out of the context of the work as a whole, presents little value to the reader. If this in fact hampers the dissemination of culture and knowledge, then what is the best way to digitally disseminate works currently under copyright while still providing a return and incentive to authors? The current international standard for copyright terms is Life+70; by the time such works reach the public domain, the vast majority of them have little commercial value, and likely have little value as disseminators of culture and knowledge as well. However, many works might lose commercial significance long before their copyright expires. In such situations the knowledge and culture contained in such works might still be valuable (particularly to marginalized populations), yet the default rules of antiquated copyright regimes would prevent the digital dissemination of such works for years--even where a publisher or author would care little about such dissemination. How could a modern-era copyright regime more effectively promote the dissemination of culture and knowledge? Should we reduce copyright terms?
Should we require copyright renewals as U.S. copyright law once did? Should we create pricing schemes that allow authors and publishers to recover fees that cover costs and create incentives to write while allowing cheap digital access?  I think there's a really interesting discussion to be had here.   

Critique of Google and the Myth of Universal Knowledge

Jean-Noel Jeanneney raises several interesting issues with Google’s Library Project in his book “Google and the Myth of Universal Knowledge.” Unfortunately, several flaws in his argument distract from the otherwise sensible idea of having a government establish a digital library for the general public.

Jeanneney first makes a fundamental mistake in treating Google’s Library Project as a project to create an online digital library. Although Google claims that its mission is to “organize the world’s information” and that its Library Project is the “first step toward a long-dreamed-of universal library,” Google’s primary business is advertising. This mistake is somewhat surprising because Jeanneney himself casts doubt on Google’s stated mission by noting that the CEO of TF1, “the first commercial television station” in France, stated that its business is not to provide entertainment but rather to create an atmosphere in which viewers comfortably receive commercial messages. Similarly, the primary purpose behind Google’s highly touted search engine is not to provide accurate search results to users but rather to deliver advertisements. Google generates substantially all of its revenue from advertising. Even its widely used Google Mail is primarily a platform for serving advertisements relevant to the user’s emails. Therefore, it is more accurate to view and treat Google’s Library Project not as a project to create an online digital library but rather as a platform to facilitate the online sale of books through advertisements. What Google is providing through its Library Project is an experience similar to what one might find in a traditional brick-and-mortar bookstore. A person does not go to a bookstore to conduct research on a complicated subject. Instead, that person goes to a bookstore to purchase an appropriate book. Google makes it easier to find the right book by allowing the person to browse the book at the person’s convenience and by making it easier to search for books that might interest the reader. If this insight is correct, most of Jeanneney’s arguments become moot.
Just as it would be ridiculous to raise these issues with a traditional bookstore, it will not matter if Google’s Library Project has mostly American/English books, or if there are advertisements, or if Google is acting primarily out of for-profit motives.

Even if Jeanneney is correct in treating Google’s Library Project as a project to create an online digital library, his concern about the potential domination of American culture is overstated. First, although Jeanneney cites a study which claims that 75 percent of online requests are handled by Google, Google simply does not have worldwide dominance over the search market. In certain countries such as China, Korea and Japan, Google is only a marginal player in the local search market. People in those countries primarily use local search engines because those engines are able to provide far more relevant information to users, probably because of their better understanding of the local culture. Consequently, if creating a digital library is a viable business plan, the leading internet companies in those local markets will create their own local digital libraries, which will be removed from the influence of American cultural hegemony.


As a side note, a more interesting question is the effect of the dominance of the English language in programming languages. As far as I know, no non-English programming languages have been used in a commercial setting. Furthermore, most technical documentation is provided primarily in English. Nevertheless, the risk of American cultural hegemony in software development has never been an issue. One could dismiss this question by saying computer languages are merely machine code. However, I believe that software, as with any creative product (and software is largely (unfortunately) the result of a creative process), has cultural significance. For example, I do not believe it is a mere coincidence that the most popular open-source project, Linux, was originally started by a Finnish programmer instead of an American programmer. Furthermore, there are substantial differences in the software produced in different countries, which can be confirmed merely by observing the differences in the news websites of different countries.

Moreover, Jeanneney underestimates the difficulties in creating a global digital library. A myriad of different copyright laws would make it difficult for Google to provide foreign books through its service to everyone around the world. For example, Google had to severely limit the functionality of its Google News service, which provides news headlines and corresponding links to the news articles, in Belgium because of differences in copyright laws. There are also significant differences in the concept of freedom of speech between countries. For example, a French court ordered Yahoo to block French residents from accessing certain parts of Yahoo’s US site which were facilitating the auction of Nazi memorabilia, because French law bars the sale of Nazi memorabilia. Furthermore, it is not clear how any organization will deal with religious differences. Wikipedia has recently been heavily criticized by Muslims around the world for allowing a historical (non-offensive) 14th century picture of Prophet Mohammad to be displayed on its website in a biographical entry on Prophet Mohammad.

Jeanneney also raises concern about the for-profit nature of Google. However, it will be in Google’s best interest to provide the most accurate and relevant information to users, because otherwise users will not use Google’s Library Project and Google will face a drop in advertising revenue. This particular issue has already been discussed extensively in the Internet community, and my general feeling is that the consensus is that the risk of Google subverting the rankings by selling placement in the search results is fairly minimal, and there is no reason to believe that the risk will be higher for Google’s Library Project. Jeanneney also cites a study which claims that most users fail to distinguish between advertisements and search results. I do not know which study he is referring to, but I do know that this particular issue has also been discussed extensively and, again, my understanding of the community consensus is that the risk of confusion is fairly minimal for the relatively discreet text-only advertisements that Google uses on its search website.

Jeanneney’s proposal to have a government-funded digital library may still be a sensible idea. If we are living in an information age, access to information may be fundamental to our liberty, and it may be proper for the government to provide free information through an open digital library. Furthermore, government may be in the best position to resolve sticky copyright issues. Nevertheless, there are still significant technical hurdles (some of which have been identified by Jeanneney, such as the problem of categorizing books), and it will take more time before we are ready to start building an open digital library.

March 30, 2008

Google and the Myth that it will Destroy European Knowledge

Jean-Noel Jeanneney in “Google and the Myth of Universal Knowledge” has two main complaints about Google Book Search: (1) it furthers American cultural domination and (2) its search algorithm does not allow for a socially beneficial dissemination of knowledge. My Anglo-Saxon capitalism, Chicago School dominated mind was not convinced. This post discusses reasons Google Book Search is not as problematic as Jeanneney claims.

Google Book Search Will Not Lead to American Cultural Hegemony
Jeanneney raises the specter of populism destroying cultural variety but does not explain why a search engine should serve to steer people away from accessing the material in which they are most interested. Jeanneney states that “specifically concerning books and images, there is the danger that cultural populism will organize channels of access in favor of the most elementary, the least disturbing, and most commonplace products.” (31) Google Book Search will of course facilitate this process by listing first the materials that are the most popular results for a search. But why is it a flaw that a search engine efficiently returns the sources that a searcher is looking for? Jeanneney’s real complaint is not with Google Book Search, but with the uncultured, unsophisticated masses who will use it to find the latest mystery novel instead of A la recherche du temps perdu. But can a book search engine really change popular taste by returning materials the government thinks are most beneficial for an educated mind ahead of the material a person was actually searching for? Are people really so aimless in their searches that the order in which a search engine returns results will cause them to read what the government wants them to read? Jeanneney places too much emphasis on a search engine’s ability to manipulate taste, and further, fails to explain why popular taste should not be respected.
Even when it comes to general searches where a reader seeks to educate himself about a topic, Google Book Search will not reinforce the American cultural hegemony that Jeanneney fears. Jeanneney is suspicious that Google’s top ordering of American sources will lead to the American view becoming the dominant one on historical events such as the French Revolution, September 11th, or the Cuban Missile Crisis. (41-42) Considering Jeanneney’s statements denigrating America’s unfortunate market-oriented bent-- “[t]his [short-term profit for shareholders] philosophy, fortunately is not ours” (27)-- and his assertion that those in France and probably in the rest of Europe who subscribe to it are “clearly in the minority,” (27) it seems unlikely that European citizens mistrustful of America will blindly depend on American sources to inform themselves about events just because Google Book Search happens to rank them first. Never mind the effect of the language barrier, which Jeanneney does not address. Despite English’s place as the current lingua franca, many European citizens do not read English well, and even if they do, they prefer to read in their native language unless a source is otherwise unavailable. It is not intuitive that a French reader doing a search on the French Revolution will read the American source just because it is listed first. Instead, he will scroll down until he finds a French source because of his greater linguistic and cultural comfort with French sources. Finally, Jeanneney underestimates the power of feelings of cultural superiority, which is strange, since the tone of his book conveys his belief in the superiority of France’s, and the European Union’s, economic and social structuring over America’s. Nationalistic spirit gained from living and being educated in a country leads most citizens instinctively to believe that their culture’s point of view is superior. People are therefore most comfortable with and trusting of sources written by their fellow citizens.
Users of Google Book Search will largely skip past higher ranked foreign sources to read sources written by citizens of their own country, in their own language.

Google Book Search Will Not Destroy Research Methodology
Jeanneney has also overblown the threat Google Book Search’s algorithm poses to a socially beneficial dissemination of knowledge. At the beginning of his book, Jeanneney reassures readers that books will survive, as will the need for booksellers and librarians. (20-24) As he notes, “librarians have always helped to organize chaos, to guide readers to the information they are seeking among the vast quantity of sources and media that contain it. And now, with the irruption of digitization, this essential function will be enhanced.” (23) So it would seem to follow that Google Book Search would be an additional useful tool for researchers, and one that librarians could use to help researchers find the sources they need. Instead, Jeanneney assesses Google Book Search from the perspective that its search algorithm will somehow replace current research methods rather than supplement them. Jeanneney criticizes Google’s search algorithm for bringing up hundreds of thousands of pages to scan in response to “a difficult question such as whether democratization favors equality or not.” (45) But this criticism misses the point. This question could be the subject of an entire book, and has no concrete or factual answer. Google Book Search never claimed to be a magical book-writing engine that pops out a complete and concise bibliography at a searcher’s command. A researcher still has to know how to research, and how to choose smart searches for Google Book Search. In reality, a researcher would never do such a broad search in the first place expecting an easy answer, whether in a library catalogue or in Google Book Search. Instead, he might start with a keyword search of “democratization and equality” to see whether any books had been written on this exact subject. If not, he would then decide how he was going to frame his research, for example by researching measures of equality other authors have used, and researching those in recently democratized countries.
Google simply enhances available searches by expanding them from librarians’ keyword categorizations of material to the text of the book itself. Jeanneney somehow casts text searches as a negative, asserting that a found paragraph in a book that contains information relevant to a keyword search is “all but useless” (69) because without more searching, or reading of the book, the information is meaningless out of context. Jeanneney doesn’t clarify why the researcher could not peruse other chapters of the book for context. Anyone who has researched in a library has had the experience of finding a source that seems relevant but, upon further reading, reveals itself not to be helpful. Jeanneney does not explain why a researcher using Google would not verify the usefulness of a source after a keyword text search brings up a source that initially seems relevant. Jeanneney’s statement that “Google is concerned only with keywords and with individual pages, not with works considered as wholes” (69) may be true, but is not a condemnation unless we assume that researchers bring with them no research methodology. So yes, Google Book Search may not be a good research tool for an elementary school child researching a complex topic, but that’s why, as Jeanneney stated, librarians are still important, as is teaching children how to research properly.

A much larger threat to complete and useful source listings is the intellectual property law of each country, not Google Book Search’s algorithm. Intellectual property laws mean that publishers can prevent any book not in the public domain from being scanned. Jeanneney would be better served by fearing those limitations than by fearing American cultural hegemony or a keyword search that destroys all knowledge of how to research.

A Critique of Google and the Myth of Universal Knowledge

Jean-Noel Jeanneney’s evaluation of the Google library project critiques Google on several fronts, some more convincing than others. Three of his primary critiques, however—concerning the problem of selection, the problem of hierarchical search results, and the problem of context—are unpersuasive. Jeanneney’s discussion of these three issues simply highlights problems that already exist even in the print world. Moreover, his solution to these issues, in the form of a European digitization effort, will not solve any of his concerns.

Jeanneney’s first major critique of the Google library project revolves around the problem of selection. Jeanneney worries that, when Google is choosing which books to digitize, the list of priorities will necessarily “weigh in favor of Anglo-Saxon culture.” (p. 6) Moreover, if commercial concerns are the only ones that drive the selection and exclusion of certain books, then smaller, innovative works, as well as books from less developed countries, will be left behind. (p. 31) That is, the motivation for profit will dominate the selection of books, meaning that only books will be chosen that are best suited to satisfy the demands of advertisers. (p. 31)

Jeanneney does not recognize, however, that both of these issues are already at play in libraries and bookstores all over the world. Google is digitizing works from partner libraries. These libraries have been using various criteria to select books for their collections for decades. Google is not personally selecting books to digitize; rather, it is relying on libraries to make selections for it. Of course, Google can choose which libraries to partner with, and perhaps this is an area of concern, but it is a different concern than the one expressed by Jeanneney.

Furthermore, the profit motive that Jeanneney is so concerned with already partially drives the publishing market. He worries that if Google sells advertisements on the web pages of the library project, best sellers will crush more scholarly works. (p. 32) But Jeanneney does not explain how this system would be any different from the current print system, where books that are not expected to be best sellers are nevertheless still printed and stocked in libraries. Perhaps the digitization project will make it less likely that publishers would be willing to print less mainstream works, but this is not at all clear. In fact, Jeanneney himself admits that the Internet may actually lead to obscure works gaining some popularity by making it easier for such works to be noticed. (p. 21) Thus, Jeanneney’s concerns about selection are misplaced.

Indeed, Jeanneney’s proposal of a competing European database does not fully address these concerns as he has laid them out. Jeanneney seems to assume that the European database would not marginalize underdeveloped countries’ works in the same way as Google’s library project. This seems rather over-confident. Perhaps, then, Jeanneney is really only worried about dominant European culture being marginalized by Google’s project, not all less dominant cultures (European and otherwise).

Furthermore, Jeanneney asserts that national governments should be involved in digitization efforts in order to encourage creativity and to act as a buttress against commercial interests. (p. 33, 76). He does not explain why governments will be any less biased in making book selections or defining standards than a commercially driven enterprise. To take an American example, the government is involved in making artistic judgments through the National Endowment for the Arts. Those decisions are often very controversial and the NEA is often accused of being biased in its decisions. Though some might be concerned about “an excessive governmental role,” Jeanneney insists that those “concern[s] seem[] groundless.” (p. 76). This, however, is completely inconsistent with his earlier discussion of Google’s possible impulses toward censorship of particular materials that “might seem inappropriate to the American mainstream.” (p. 48). If Google could be pushed towards censorship out of some misguided sense of American patriotism, then surely the government, if involved in the selection of books, is even more likely to have a “patriotic” agenda. Thus, either Jeanneney is again assuming that a European database run by European governments would not fall prey to biased selection or he is actually only concerned with a pro-American selection bias and would be fine with a pro-European selection bias. In either case, Jeanneney’s proposal does not respond to the problem of selection biases.

Another major concern for Jeanneney is the way the Google library project will list search results. Google Book Search is likely to list obscure books or books in languages other than English lower than others. (p. 42) While the “gondola end” may indeed favor well known books, Google’s search algorithm would simply be adapting the way searching for material works in a non-digitized world. That is, well known books are well known for a reason; consumers have heard of such books and actively seek them out. If a customer is physically searching for a book on a particular subject at a bookstore or library, he is likely to choose the most popular, most prominently displayed, or first book on the subject. If he wants to find a more obscure book, then he will look further along the stacks, just as a digital searcher could easily scroll down through results if he wanted a more obscure book. If anything, searching digitally through Google might actually lead to more minor books being seen because scrolling through results is relatively costless. Jeanneney’s solution to the search results problem is to propose a European search algorithm. (p. 46). As with his solution to the selection bias problem, however, he seems to be assuming that a European algorithm would be free from the gondola end. There is no reason to think that the European algorithm, however, would be any different from Google’s algorithm, except that it would prioritize search results in a way that is more pleasing to Jeanneney.

Finally, Jeanneney asserts that cultural and linguistic context will be lost if search engines pull up only a few pages of a book at a time. (p. 68) He quotes Michael Gorman, who argues that books are designed “to be read sequentially and cumulatively.” (p. 68) Jeanneney seems to want to force people to read whole works in their original context and language. However, even without digital copies of books there is no guarantee that a searcher will read an entire work if he only wants to find one or two simple facts. Indices and tables of contents can be thought of as the first search engines, making it easier for the reader to read only a portion of the work. Indeed, any one piece of information is just a piece of the overall cultural and historical landscape. Whether it is one page in a larger work, or just one book in a larger series, no one can define what “enough” context for a work is. The implication of Jeanneney’s assertions is that search engines should not pull up individual pages in books. Rather, people should be forced to read the entire work, or at least flip through the pages and look for their search terms manually. Jeanneney’s paternalistic instinct is to tell people what the appropriate amount of context is that will allow them to truly appreciate a work, but people can and will make this decision for themselves. Jeanneney does not seem to precisely outline what should be done in response to this contextual problem beyond saying that information must be organized in a clear manner. (p. 72). Any organization of works, however, immediately runs into the two problems discussed above: selection bias and search results bias.

Jeanneney’s criticism of Google and his counterproposal raise some interesting concerns. There is a real danger in one entity being responsible for the digitization of all of the world’s works because as Jeanneney points out, Google is not invincible. In that sense, Jeanneney’s proposal for a European database is valuable because it will distribute the responsibility for the digitization project. His other reasons for suggesting a European database, however, are unconvincing. A European library project would not solve his concerns of marginalization of obscure or minor works because of selection bias, search results bias, or lack of context. Indeed, these problems are already inherent in the nature of libraries and bookstores, and are not grounds for criticizing the Google library project.

Anglee Agarwal
Session 1
March 31, 2008