Search Engines
From A2K Wiki
Panelists
Speakers
- Judith Dueck - Vice-Chair, Human Rights Information and Documentation Systems International (HURIDOCS)
- Michael Geist - Canada Research Chair in Internet and E-commerce Law, Faculty of Law, University of Ottawa, Canada.
- Robin Gross - Executive Director, IP Justice
- Richard Owens - Director, Copyright E-Commerce Technology and Management Division - WIPO
Moderator
- Sudhir Krishnaswamy - National Law School of India University, Bangalore
A2K2 Conference Organizer
- James Grimmelmann - Knight Fellow, Information Society Project, Yale Law School
Panel Description
Search and information retrieval tools are an essential component of any information policy. Without effective search, even nominally open access to knowledge resources remains theoretical, given the scale and dispersion of information today, particularly online information. At the same time, questions of search are profoundly political. Search tools can be biased to favor particular information providers or to censor some forms of information altogether by hiding them from view. They similarly raise deep and problematic questions of control over information resources, including privacy, intellectual property, and telecommunications. This panel will ask how debates about access to knowledge infrastructure apply to search in the international development context. It will examine issues such as:
- The relationship of global, national, and local search tools to global, national, and local cultures.
- Strategies for the provision of search.
- The nature of search engines' power, and the distinctive threats posed by concentration of that power.
- The role of search in bringing to light human rights abuses--and in facilitating them by repressive governments.
- The relationship between search policy and free speech policy.
- The place of search technologies within a healthy overall ICT agenda.
- The appropriateness of search tools for traditional knowledge and noncommodified knowledge goods.
- Whether the current prominence of the centralized search engine is a transitory phenomenon.
- Designing open-access information resources to promote effective searchability.
Speaker Presentation Slides
- Judith Dueck demonstrated and discussed a human rights search engine: HuriSearch (http://www.hurisearch.org/). Her slides are here.
She also brought several handouts: HuriSearch, What is HURIDOCS?, HURIDOCS Tools, HURIDOCS Partners
- Robin Gross discussed law and policy issues for search engines and access to knowledge. See Robin's slides [here].
Remote Questions for the Panelists
Notes
Judith Dueck -- HURISearch for Human Rights Information
Information can be powerful in the defense of human rights. Knowledge provides the basis for good decisions in civil society. Civic rights are without meaning if they can't be based on good information. I hope to provide an example of specialized search that meets a common need.
HURIDOCS has no formal membership. It works in an open-ended, international spirit. It's a network that draws on a wide range of volunteers: programmers, companies, translators, trainers, and others. Our tools are used around the world. HuriSearch is just one of the information tools we create for human rights.
Online, people both consume and create content. HuriSearch provides human rights information to the world, but it also gives 3200 human rights organizations the tools to make their information available. It also gives them tools to keep control of their information: in the human rights community, people die if information gets into the wrong hands.
HuriSearch helps in finding non-English materials, especially in other scripts. It indexes documents in 77 languages and offers its interface in 8. NGOs, IGOs, and academic institutions supply the information (governments do not). Often it is small, unknown organizations that have the most authentic information. They can get it online, perhaps not in a sophisticated way, but it's there. We bring it to the fore.
(Demonstration of search -- she searches for "child soldiers," filters for results in Arabic, then copies an Arabic word from the results and searches on it. Next she switches the interface into Russian, which also translates the HURIDOCS index terms. Results can also be filtered by the country in which the website was published, or by organization.)
HURIDOCS terms are there to help non-native speakers use a controlled vocabulary to focus and filter their searches. There's also a help page, and a search box webmasters can use to search their own sites. We always welcome support. The OSCE has created a customized adaptation for its 53 countries, along with canned searches on specific themes.
What do we need? Training for smaller organizations in good metatagging (both for us and for other search engines). Funding, or an egalitarian business plan. Translations into other languages, and programming -- there are lots of needs for those who want to get involved. I just returned from Romania, where I did a demonstration. Those whose sites were included were proud; those whose sites weren't went home and submitted them.
Michael Geist -- a2rk (Access to OUR Knowledge)
This idea grew out of the Haifa search conference. This talk will focus on Google because it's dominant, but the issues it raises are endemic to search. The conference had interesting discussions of Googlebombing and of what Google does with searches for "Jew." And a really interesting idea grows out of Battelle's book: the database of intentions, meaning all the information that search engines can aggregate about us.
AOL's release of search queries last summer was, and still is, the biggest story. Good old searcher number 4417749: AOL's anonymization didn't work, and it was easy to identify searchers from their queries. AOL took the queries down; researchers objected that this would set back work on search quality. He came away from the conference thinking that this was just the tip of the iceberg. Here's another example: Google Web History. And others: Google Docs, Google Calendar, Blogger. Who else has access to that knowledge? SlideShare, Mozy; the list goes on.
Privacy policies are layered onto this. Google's policy is very upfront about the implications. There's a privacy center, and specific information for each product.
His concern isn't with Google per se, but rather with who else might seek that information, particularly government and law enforcement. It's no longer about seizing a particular laptop; it's about going to a service provider who has access to that information. Examples include the Google-DoJ suit, the Li Zhi suit against Yahoo, and the German government's acknowledgment that it conducts online searches of computers. Canadians have been afraid that outsourced data would be transferred to the U.S., exposing private Canadian data to U.S. eyes. There are some very interesting cases about squaring U.S. legal obligations with Canadian ones. Some Canadians have objected that LSAT takers must provide a thumbprint; that biometric is sent to the LSAC in Pennsylvania and stored for seven years. Some students have said they're unwilling to take the test if the data will be stored in the U.S.
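The AOL point can be made concrete. As widely reported, the release replaced account names with numbers, but every query from one account kept the same number. The sketch below shows why that is pseudonymization, not anonymization: the log entries, names, and function names are invented for illustration, and this is not a reconstruction of AOL's actual process.

```python
from collections import defaultdict
from itertools import count

# Hypothetical raw query log: (account name, query). All data is invented.
RAW_LOG = [
    ("alice_smith", "dentists in springfield"),
    ("bob_jones", "used bicycles"),
    ("alice_smith", "springfield garden club"),
    ("alice_smith", "smith family tree"),
]

def pseudonymize(log):
    """Replace each account name with a stable numeric ID (one ID per account)."""
    ids = {}
    next_id = count(1)
    released = []
    for user, query in log:
        if user not in ids:
            ids[user] = next(next_id)
        released.append((ids[user], query))
    return released

def profiles(released):
    """Group released queries by ID: linkage between queries survives."""
    grouped = defaultdict(list)
    for uid, query in released:
        grouped[uid].append(query)
    return dict(grouped)

if __name__ == "__main__":
    # ID 1 still carries a town, a hobby, and a surname together.
    print(profiles(pseudonymize(RAW_LOG))[1])
```

No single query names the searcher, but read together under one number, the town, the club, and the surname narrow the field quickly. That is the sense in which identification was easy.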
Question Period 1:
(1) "Universally available" -- has it ever struck you that not everyone is on the Internet? Certain villages will call their local community radio station saying, "I need information from the Internet on how to make milk into yogurt." People in town search Google and call the station back; the station translates the answer into the local language and broadcasts it the next day. Another issue is tracking people. I'm from Nigeria, where you get those emails saying I have $21 million. We track the senders by IP address; these guys move from place to place. When we talk about privacy and all of that, I hope we'll keep these cyber-vagabonds in mind.
Judith: We have a philosophy of using whatever technology works, in whatever form. I love your example about radio and translation. If it's on the web somewhere, HuriSearch will promote it, which ties into . . .
(2) Are the "wrong hands" capable of using HuriSearch to make their searches more effective?
. . . (Judith, continued) We had a big debate about information in the 1980s: should there be one big database, or distinct groups that use the information to connect? When groups want to make information about torturers public, they put it on the web. So those decisions are made by the groups themselves, who decide when to make information public.
(3) Are the human-rights community and the privacy community connected?
Judith: That's one of the reasons I'm here. What I've heard so far is fascinating, but it's a long way from Eastern Europe or Rwanda. There's a disconnect, and also a deep digital divide.
(4) What about Fast?
Judith: They created AllTheWeb; we've had some informal connections. And now they host us and provide us with software at no charge.
(5) Aren't many people concerned that big business has this information, in the same way that Canadians are concerned that Americans have it?
Michael: I can understand that. I see a continuum. The most concerning end is where the collection is secret. People might have a certain comfort level with some tools, and not with others.
(6) What outreach efforts have enough impact that people respond? If Canadians know that they have these rights, to what extent do citizens care?
Michael: As for what citizens can do, companies and governments are sensitive to being called out on these issues. A lot of this takes place below the radar. One of the real issues in the Canadian debate over "lawful access" is the question of oversight.
Robin Gross
I'll talk about some legal issues in search. What is the role of a search engine? It often comes down to analogies. Is it a digital delivery truck, or a sponsor/endorser of information? Some would say it's similar to a library catalog system. Some legal issues: privacy, freedom of expression, IP, and public policy (antitrust, media concentration).
Privacy: at least in the U.S., there's very little protection for individuals against this kind of data collection. Michael Geist went over many of the issues already.
Freedom of expression: is there liability for linking to "illegal speech"? What about de-listing controversial or minority viewpoints? Ranking algorithms will always carry the biases of their technical design, but market forces could create competing, alternative search engines. People need to know that alternative search engines exist.
IP: copyright, trademark, database rights, trade secrets, patents on algorithms. The distinction between primary and secondary liability is important. Fair use is the big issue for search engine legal teams; the billion-dollar industry is built on it. This requires analysis of (at least) the traditional four factors.
Some copyright issues: Google Print Library (she thinks this is fair). Images (compare Kelly v. Arriba Soft, which found fair use, with Perfect 10 v. Google, which found infringement). Caching has been fair use so far (Field v. Google). Google News is an open question. Usenet archiving is fair (Parker v. Google).
Trademark is also a big issue, because of AdWords sales. Are these a "use in commerce"? The courts are split. Hyperlinking might raise some trademark issues; she cites the Shetland Times deep-linking case, which settled. Most people agree that hyperlinking is noninfringing, but you never know how a court will go. Using a logo as a link might be a closer case. There are some other IP-like legal barriers: trespass to chattels (eBay), and database rights (EU Database Directive). The EU has held that search engine indices don't qualify for protection as databases.
Other public policy issues: think about Google's huge market power. In the marketplace of ideas, that's worth looking at. Search could powerfully promote democracy, and it offers an opportunity to provide universal access to education. In summary: leave room for innovation and progressive use of IP rights. Promote policies that value free expression. Show respect for Internet users' privacy. A dynamic coalition on A2K has come out of the Internet Governance Forum, and it is dealing with the liability of third parties, much like search engines.
Richard Owens
Robin did a great job of covering what I was going to say, except for one thing: it was U.S.-focused. In other Commonwealth countries, we have fair dealing. Australia is making moves to broaden fair use in its FTA negotiations. So we are de facto looking at broadening fair use in common-law systems. In civil-law systems, we're looking at much more narrowly defined exceptions. For example, Belgium's news-reporting case was strongly constrained by that narrowness. Limitations and exceptions are classically one of the most important ways to build flexibility into the treaty system in countries without fair use. It's not going to be easy to rely on statutory exceptions everywhere.
What do we need to look at? In terms of public international law, there are a number of possible answers. In the U.S., there was Grokster and contributory, vicarious, and inducement liability. Elsewhere, there is sometimes liability for enabling infringement, but there aren't secondary liability doctrines in the same way. If there were to be some way of dealing with this, it probably would not be a treaty, and NOT WIPO. With respect to the broadcast treaty, we're approaching something very narrowly focused on signal piracy and nothing more, and we can hope that this process will end (either up or down) soon. No one is really happy with treaties right now. So we're not creating a secondary liability treaty. There could be some soft-law approaches that build up from the grassroots. The differences in standards remain a big problem.
I really wanted to talk about how we could use technology to address problems once these issues are clearer. Search engines will need to be able to read and interpret content. Right now, robots.txt is an up-or-down thing. The Automated Content Access Protocol (ACAP) is being developed by rightsholders and some ISPs, with some interest from search engines, to find a way to automate licensing and permissions. Litigation is extremely expensive, so clarity in the law is important, and beyond that, we need ways to signal these things. This metadata is the lifeblood of content provision.
How do you find a rich and accessible public domain? You need to be able to avoid infringing the integrity and paternity rights of creators. That information could be applied as metadata in a standard way, allowing moral rights to be respected too. Is this a WIPO issue? It's a governance issue. I'm trying to think ahead to a time when WIPO isn't in the doghouse.
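To illustrate what "up-or-down" means, the sketch below uses Python's standard `urllib.robotparser` to evaluate a made-up robots.txt. For a given crawler and URL, the only answer the file can give is yes or no to fetching; there is no field for licensing terms or conditions of use, which is the gap ACAP aims to fill. The site, paths, and bot name are assumptions for illustration, and the sketch does not attempt ACAP's own syntax.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an invented site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_fetch(agent, url):
    """Return the binary crawl decision robots.txt gives for this URL."""
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    for url in ("https://example.org/index.html",
                "https://example.org/private/report.html"):
        # Each answer is simply True or False; nothing like
        # "index but pay a fee" can be expressed here.
        print(url, may_fetch("ExampleBot", url))
```

Anything richer, such as "index this but don't cache it" or "show snippets only under a license," has to come from some other signal, which is the kind of machine-readable permissions metadata the talk describes.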
Sudhir comments:
The question of not starting with infringements -- I think I agree with this approach. James Grimmelmann asks who the actors are, what information flows, etc. -- and then see where IP fits in. Other articles on this wiki are ways of thinking about the legal side. I also worry about Internet exceptionalism. We have existing vocabularies that would help us cast things in different terms.
Questions:
(1): I'd like to hear about the reasoning that was persuasive on the caching problem, and also why the eBay decision was problematic. I'd like to hear from both of you: in the caching case, was the result based on fair use, or something else? If it was fair use, how do you deal with caching where there is no fair use?
Robin: The cache was legal on a traditional four-factor analysis.
(2): Ten days ago, the U.S. Department of Commerce banned Google Earth in Sudan, just after Google mapped conflict areas. This was part of a sanction on software distributed from the United States. Where does software end and information begin? How do we treat freedom of expression and diversity when looking at media structures? It's alarming to treat some things just as software platforms.
Follow-up: It's just that it uses encryption, which is banned for export to five countries. (So it had nothing to do with mapping directly.)
(3) It's becoming increasingly difficult to leave a search engine, especially when search is integrated with all sorts of other applications. Your data stays with them forever.
(4) Is there something truly distinctive about search? Or is it just another online intermediary?
Michael: It is. Increasingly, it feels like an operating system, with all of the tools in the hands of a single search provider. Not quite the same lock-in, I suppose.
Judith: Search has about it a whiff of the mysterious and holy.
(5) How about open source search?
Robin: This gets us into the question about community among diverse groups. A lot has to do with the A2K movement.
Judith: This has value, but how open do we want to be? Clickstream exhaust can tell us about society. And what about those who don't have access to it?
(6) None of the speakers addressed specific liability/nonliability regimes. (E.g. U.S. Section 512, and the E.U. E-commerce directive). Could the revision process in the EU be a chance for rightsholders to increase pressures on search, or to open things up and get it right?
Robin: I share your concerns about some of the drafting, but I also see opportunity, particularly at places like WIPO. I do have hope to use legal mechanism to build in protections.
Michael: There are opportunities to focus domestically, and to build principles into domestic legislation. Israel is very close to enacting a fair use provision modeled on the U.S. one; the Gowers Review in the U.K. would build in personal copying rights; Canada is thinking about caching. It's good policy to make these things available.
(7) Should we really draw a big distinction between TPMs and rights management information? If you require that certain actions be taken when certain information is present, then you have a TPM. Thus, robots.txt is particularly salient here.
Richard: Is RMI a TPM or not? This shows, perhaps, why a distinction needs to be made from a public-policy perspective. The broadcast flag is a case where there's a connection, some would say a nefarious one: it would lead automatically to technology mandates. From a treaty-regime perspective, there are different systems. We at WIPO want to initiate thinking about what that might mean from a policy perspective. RMI is not necessarily about protecting content.
Resources and papers
Articles
James Grimmelmann, The Structure of Search Engine Law, Iowa Law Review (forthcoming 2007)
Urs Gasser, Regulating Search Engines: Taking Stock and Looking Ahead, Yale Journal of Law and Technology (2006)
Frank Pasquale, Rankings, Reductionism, and Responsibility, Cleveland St. L. Rev. (2006)
Jonathan Band, The Google Print Library Project: A Copyright Analysis (2005)
Lucas Introna and Helen Nissenbaum, Shaping the Web: Why the Politics of Search Engines Matters, The Information Society (2000)
Books
Web Resources
* A2K@IGF Dynamic Coalition *
UN Internet Governance Forum (IGF) coalition of NGOs, businesses, governments, and academics working together to promote access to knowledge and freedom of expression. The A2K@IGF Coalition focuses on the appropriate balance for freedom of expression in the online environment. The A2K@IGF Dynamic Coalition website is http://www.a2k-igf.org/.

