Ethical retrieval

It may surprise you to know that librarians have codes of professional ethics. The main UK membership organisation for librarians, CILIP, requires its members to follow its ethical code; the American Library Association have something similar. Subject classification and indexing is one of the more interesting areas where ethical concerns can arise. Headings that seemed fine in earlier ages may no longer seem fit for purpose (a bit like the famous moment in 1973 when the American Psychiatric Association voted to remove homosexuality from the DSM). The great US cataloguer, Sanford Berman, has been a leader in pressing for bias to be removed from subject headings. See this article summarising his achievements (pdf). Sanitising the catalogue in this way may be seen as political correctness, but sometimes it is just common sense (e.g. putting Mark Twain under “English literature” is just wrong!).

In these days of internet search engines and full-text retrieval, library subject headings seem rather arcane and unnecessary. You can search for whatever term you want in Google, be it abusive or polite, but there are still problems. Google Scholar is an index of scholarly literature, but the way that it defines and detects what is scholarly has led to some disquiet recently and a petition to remove creationist material from its index. PZ Myers has pointed out that the petition is wrong-headed:

Google Scholar does not index on content; it can’t, it’s just a dumb machine sorting text …The way items get on Google Scholar is based entirely on whether they’re formatted like a scholarly paper.

Google, then, is not concerned with the content and makes no judgement on whether it is right or wrong, rather like the principle of net neutrality, which is in the news right now.

Google does make some value judgements though. There has been a growing wave of complaints that its search service is becoming dominated by spam sites:

Google’s search results [are] full of spammy links that lead to nothing of value… content scrapers, marketers, or sites that consisted of nothing but keywords surrounded by useless crappy content.

Some people were suspicious that the presence of Google ads on a site affected its position in the search rankings. Google have denied this and have now responded with a promise to work harder to remove these so-called ‘content farms’ from search results. The Blekko search engine is taking similar steps; these spam sites are not just a Google problem.

All search engines share the same problem, of course: how to find everything relevant and only what is relevant, and how to present the most relevant items at the top of the list. They each find their own way to resolve that problem, giving slightly different results. Or do they? Danny Sullivan, who blogs about search engines, has reported that Bing, Microsoft’s search engine, has been copying results from Google. In an elaborate sting operation Google created some ‘synthetic’ search terms and seeded false results for them into its database. They then searched for these terms using laptops with Internet Explorer and the Bing toolbar installed. Within two weeks the false results were appearing in Bing. Microsoft have admitted that they do watch how their customers use Google but say that this is not copying Google, and that in any case all search engines do the same.
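For anyone who likes to see the arithmetic, the twin aims of finding "everything relevant" and "only what is relevant" are usually measured as recall and precision, and "the most relevant items at the top" can be checked by looking at precision over the first few results. Here is a minimal sketch in Python. The document ids and the two result lists are invented purely for illustration, and the function names are my own; this is not how Google, Bing or anyone else actually scores their results.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision: share of the retrieved items that are relevant
    recall:    share of the relevant items that were retrieved
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def precision_at_k(ranked, relevant, k):
    """Share of the top k ranked results that are relevant."""
    top_k = ranked[:k]
    if not top_k:
        return 0.0
    relevant_set = set(relevant)
    return sum(1 for doc in top_k if doc in relevant_set) / len(top_k)


# Hypothetical data: four documents are truly relevant to the query.
relevant = ["d1", "d2", "d3", "d4"]

# Two imaginary engines each return five results, best first.
engine_a = ["d1", "d9", "d2", "d8", "d7"]
engine_b = ["d9", "d8", "d1", "d2", "d3"]

for name, results in [("A", engine_a), ("B", engine_b)]:
    p, r = precision_recall(results, relevant)
    top2 = precision_at_k(results, relevant, 2)
    print(f"Engine {name}: precision={p:.2f} recall={r:.2f} precision@2={top2:.2f}")
```

In this made-up case engine B finds more of the relevant documents overall, but engine A puts a relevant one first, which is one way two engines can both be "right" and still give you a different experience.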

Who would have thought that Search Engine Ethics 101 could be so interesting? I was surprised that a Google search for “search engine ethics” brought up quite a few results, including some from the International Review of Information Ethics, which was a new one on me. There is even a book, The Blackwell Guide to the Philosophy of Computing and Information, if you want to immerse yourself in the topic.

Google, of course, are famously the company whose motto is “Don’t be evil”. But Siva Vaidhyanathan has just published a book called The Googlization of Everything: And Why We Should Worry. He doesn’t think that Google is evil, but he does think that its dominance and the speed with which it has reached that position are a little worrying. I confess I haven’t read the book but there is an interesting interview with its author in Publishers Weekly. I think this comment from Vaidhyanathan gets to the core of things:

The assumption for years has been that Google merely aggregates our decisions, perceptions, and our judgments. But it’s not that simple. Google is not without its biases, and I wanted to try to unpack the nature of some of its biases, which, not surprisingly, skew toward what’s new, popular, and tech-savvy. The major realization I had in doing this book is that Google now governs the Web, and more because of the choices it makes than the choices we make. Think back to when Google first started. There were a handful of search engines, and if you went to any of them and typed in common words like “Asian” or “facial,” you’d get porn sites. It was Google that figured out how to make our Web experience better by filtering—not by censoring or blocking access to porn sites. But while Google is officially content-neutral, de facto it’s not, because it filters. For example, it favors certain aspects of page design. That’s a good thing, of course. It has made the Web better. But it is also important that we acknowledge what Google does, and that Google now pretty much runs the Web, albeit with our tacit, implicit consent.

Maybe Google should be signing up to one of those codes of ethics, or recruiting Sanford Berman to advise it?

5 Responses to Ethical retrieval

Steve Caplan says:

February 2, 2011 at 2:58 am

Great commentary Frank. While I tend to rely on Google searches for personal matters, for some reason I seem to avoid Google Scholar for science-based searches–although I have to admit that simple Google searches have turned up quite a cache of Ph.D. dissertations and other not-so-accessible documents on work in my field that I would otherwise never have known existed. However, it is alarming to see the enormous power that search engines have in their inherent (and sometimes designed?) biases.

Frank says:

February 2, 2011 at 4:30 pm

Steve – thanks! I think any new entrant into search has to prove itself and at first some healthy scepticism is in order. Google Scholar has by now shown its worth, though it’s far from perfect, as I’ve pointed out before. It’s always worth trying more than one search tool and we should never imagine that one tool has all the answers. Scepticism is the best approach I think.

Cath@VWXYNot? says:

February 2, 2011 at 7:28 pm

Great post, Frank! I’d never thought of the problem of bias in subject headings before – very interesting!

I worry that Google are waiting until we’re all 100% dependent on them and will then announce that we’re going to have to start paying for our searches, email, documents, maps, RSS feeds, and everything else they do for us. So while I love Google (hell, I named one of my cats Google!), I love it when they start facing competition and/or have some of their products fail (Google Wave, anyone?)

Frank says:

February 2, 2011 at 9:43 pm

Thanks Cath. I know what you mean about excessive dependence, but I remain optimistic that the magic of the net will save us. Openness and cooperation have remained the bywords of Internet development and ensured that we can always route around a problem.

About Frank Norman
