Search Engines Don’t Work

“A good question is much more difficult than a brilliant answer.”

That’s a decent summation of Richard Saul Wurman’s life’s work. That he said it suggests he may think so, too. Wurman is probably best known for creating the TED conference but, in my estimation, he should be best known for his work as a thinker about information.

In 1989, he published a book called Information Anxiety, which not only seeks to define a phenomenon created by an increasing volume of information within our daily lives, but also to summarize an inflection point for our culture. Take this passage, for example:

Information is power, a world currency upon which fortunes are made and lost. And we are in a frenzy to acquire it, firm in the belief that more information means more power. But just the opposite is proving to be the case. The glut has begun to obscure the radical distinctions between data and information, between facts and knowledge. Our perception channels are short-circuiting. We have a limited capacity to transmit and process images, which means that our perception of the world is inevitably distorted in that it is selective; we cannot notice everything. And the more images with which we are confronted, the more distorted is our view of the world.

What strikes me now, over thirty years later, is how incredibly applicable this passage is to our time — how much of our culture for over a generation has been focused on this aggregate quest: a pursuit that looks like one of knowledge, but is really one of facts.

Hindsight also reveals a curious semantic twist on Wurman’s take. The word “we,” for a reader in 2021, has two objects. When Wurman writes of information that “we are in a frenzy to acquire it,” an easy contemporary interpretation is to read “we” as “our culture,” or more specifically, those who have the most power today to do the acquiring. If our culture is one of information-given power, then that power is held by a short list of massive corporations — Google, Facebook, Amazon, etc. — who treat it as proprietary. Finders, keepers!

But then later, when Wurman writes that “we have a limited capacity to…process,” and that the more with which “we are confronted, the more distorted is our view of the world,” it is nearly impossible not to read those wes as us individually. You and I. We likely wouldn’t have read it that way in 1989, but after decades of expanding our reach into the world’s information and then experiencing the unforeseen costs of doing so, a read informed by history is much more complicated, though no less grim.

In 1989, Wurman was concerned about the human experience resulting from an already information-hungry culture. The idea of information anxiety came from his observation of an “ever-widening gap between what we understand and what we think we should understand.” What we think we should understand comes, of course, from the perception of available information. So consider, for a moment, what that looked like thirty years ago. In a 1989 review, the L.A. Times summarized the world into which Information Anxiety was published quite nicely:

If you went to the Library of Congress and looked at one book, manuscript or other library resource each minute eight hours a day, five days a week, it would take you more than 688 years to see all 85,895,835 items. (That’s up from 59,890,533 in 1969.) The size of the average American newspaper has more than doubled since 1970 to 91 pages, and we spend 45 minutes a day reading it—though that is just 10 minutes more than a decade ago. The average Sunday paper in 1970 had 145 pages; in 1986 it had 351.
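The review's arithmetic holds up. A quick check, assuming a 52-week working year with no holidays (the review does not state that assumption):

```python
ITEMS = 85_895_835            # Library of Congress item count quoted in the review
MINUTES_PER_DAY = 8 * 60      # one item per minute, eight hours a day
DAYS_PER_WEEK = 5
WEEKS_PER_YEAR = 52           # assumption: no holidays or time off

items_per_year = MINUTES_PER_DAY * DAYS_PER_WEEK * WEEKS_PER_YEAR
years_needed = ITEMS / items_per_year

print(f"{items_per_year:,} items viewed per year")
print(f"{years_needed:.1f} years to see everything")
```

At that pace a person gets through 124,800 items a year, which puts the total at a little over 688 years, just as the review says.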

If that situation prompted enough anxiety to birth a book about it, imagine how much longer the litany would be had it been written today, if it included the web, texting, social media, e-books, streaming video, and on and on. Nevertheless, had it included the many forms and sources of information that cause us anxiety today, Wurman’s first step, to organize them all in only five different ways, would have been the same. He argues that all information can be organized in only five ways: by category, time, location, alphabet, and continuum. When I first read that, I immediately resisted (That can’t be!, I thought), but the more I pondered it, the more I came back with examples that were really just another way of expressing one of Wurman’s five. (In particular, “continuum” threw me. Continuum is Wurman’s way of talking about arranging information by some kind of priority. For example, sorting a list of search results on Amazon by average customer rating is arranging by continuum. So is sorting by price.)
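To see how little room Wurman's five leave, here is a minimal sketch that arranges one small catalog each of the five ways. The `Item` fields, the titles, and the ratings are all hypothetical, invented only for illustration:

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Item:
    title: str
    category: str
    year: int
    city: str
    rating: float  # hypothetical average customer rating, 0 to 5


# Hypothetical catalog; none of these entries refer to real works.
CATALOG = [
    Item("Delta", "essay", 1989, "Boston", 4.1),
    Item("alpha", "book", 2021, "Austin", 3.7),
    Item("Charlie", "book", 1945, "Boston", 4.8),
    Item("bravo", "essay", 1996, "Austin", 4.4),
]


def titles(items):
    return [item.title for item in items]


def grouped(items, key):
    """Group titles under each distinct key value, keys in sorted order."""
    ordered = sorted(items, key=key)
    return {k: titles(group) for k, group in groupby(ordered, key=key)}


by_category = grouped(CATALOG, key=lambda i: i.category)
by_time = titles(sorted(CATALOG, key=lambda i: i.year))
by_location = grouped(CATALOG, key=lambda i: i.city)
by_alphabet = titles(sorted(CATALOG, key=lambda i: i.title.casefold()))
# Continuum: order by some measure of priority, here the rating, best first.
by_continuum = titles(sorted(CATALOG, key=lambda i: i.rating, reverse=True))

print(by_continuum)
```

Swapping `rating` for a price field would still be continuum; only the measure of priority changes.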

However unable I was to introduce a sixth means of organizing information, I was also stuck with a problem: which of these ways is how a search engine works? When Google makes good on its mission to “organize the world’s information,” how exactly does it do that? Once I get information back from Google’s search engine, I can, of course, sort it in some of the ways Wurman lists. But the most curious absence in the tools available to a searcher is continuum. I cannot sort results by any kind of priority or subjective rating. And yet, continuum of some kind is at the heart of how Google does it. The problem is, Google’s continuum is its trade secret.

That Google seeks to organize the world’s information, but doesn’t tell the world how it does it, seems an almost self-defeating contradiction. But because most people find Google’s search results to be satisfactory and aren’t bothered by not knowing exactly how they produce them, Google has amassed an enormous amount of power. They control how we all find information. They also control what information we find.

Is that good?

It seems to me that there are many good reasons to answer that question with a quick and resounding “no.” That there should be a limit to the amount of power any individual or corporation is able to amass is becoming a more favored position in our culture. That position, though, is informed by ideas about the distribution of power that stand apart from how we’d otherwise judge how an agent with power operates. And if we don’t know how they work, we can’t judge whether how they work is good.

Tobias Blanke, a philosopher and computer scientist, came to a similar conclusion fifteen years ago in an ethical review of search engines. Of Google specifically, he wrote:

The full details of its algorithm are not known because of property rights, but an early paper of the two Google founders indicates that it is based not only on word occurrence statistics but also on a system of authorities and hubs. Authorities are web pages that are linked by many others, while hubs link themselves to many other pages. Web pages achieve a better ranking if they optimise their relation within this system of hubs and authorities.

In other words, Google’s algorithm is entirely based on a continuum method — of ordering information based upon a perceived priority. Google calls it PageRank. They say that PageRank “works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.”
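The quoted description can be sketched in a few lines. This is a simplified version of the published PageRank idea, not Google's actual ranking system, which is not public. The four-page link graph is hypothetical, and the damping factor of 0.85 is the value commonly cited from the founders' early paper:

```python
def pagerank(links, damping=0.85, iterations=100):
    """Estimate page importance by repeatedly passing rank along links.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            # A page with no outgoing links spreads its rank over all pages.
            recipients = targets if targets else pages
            share = damping * rank[page] / len(recipients)
            for target in recipients:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: three pages link to "c", and "c" links back to "a".
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
most_important = max(ranks, key=ranks.get)
print(most_important, round(ranks[most_important], 3))
```

Note that "a" ends up ranked well above "b" and "d" even though each has the same single outgoing link: its one inbound link comes from the most important page. That is the "quality of links" part of the description.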

I’ve read those words many times over the course of my career as a designer; understanding search engines has been important to how I make things and advise that others make things because the persistent implied purpose of it all has always been to be found, to be seen, to be engaged with. But over the course of that time, how I’ve received this explanation of how Google works — the fundamental assumption they make about importance, especially — has radically changed. Fifteen years ago, while Tobias Blanke was asking the right questions, I’d have likely thought that organization by authority made good sense. Today, with the accumulated power Google has, it’s an outrageously cynical method with a wide-open back door.

Richard Saul Wurman does not include organization by authority for good reason. Authority is too subjective. It’s kicking continuum’s can. Authority vested by whom? Importance according to what?

Authority is also too vulnerable to exploitation.

Once you understand that information on the web will gain prominence as the number of other web pages linking to it increases, boosting prominence is a pretty easy thing to do. It may not be a good thing to do, but that’s not stopping anyone. An entire industry of Black Hat search engine optimization arose out of this vulnerability, using a very long list of tactics to get bad information to be treated as good by Google.
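A toy illustration of why this is so easy to exploit. It ranks pages by a raw count of inbound links, which is a deliberate simplification of the real thing, and every page name is hypothetical:

```python
from collections import Counter


def inbound_counts(links):
    """Count distinct pages linking to each target, ignoring self-links."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


def ranking(links):
    """Order pages by inbound link count, most linked first."""
    counts = inbound_counts(links)
    return sorted(counts, key=lambda page: (-counts[page], page))


# Hypothetical web where a reference page has earned its links.
honest_web = {
    "blog": ["reference"],
    "forum": ["reference"],
    "news": ["reference", "spam"],
}

# A link farm: five throwaway pages that exist only to link to "spam".
link_farm = {f"farm-{i}": ["spam"] for i in range(5)}
manipulated_web = {**honest_web, **link_farm}

print(ranking(honest_web))
print(ranking(manipulated_web))
```

Five throwaway pages are enough to push "spam" above "reference" here. Real ranking systems weigh links far more carefully, but the incentive this creates is the same.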

The simplest way to exploit Google’s vulnerability, though, is not to bother messing with metadata — the information Google extracts from pages and the pages that link to them — but to create new information to manipulate Google’s perception of importance. That’s called Google bombing, and you’ve likely encountered or heard about it before as it’s been a pretty useful way of embarrassing political figures. Perhaps you remember the time that Googling “miserable failure” returned George W. Bush’s own Whitehouse.gov biography. Or when “liar” led to Tony Blair. Or, more recently, when “idiot” returned pictures of Donald Trump. There was a particularly famous Google bomb that connected Rick Santorum with bodily fluids. I’ll leave that to you to, well, Google for yourself.

The point is, PageRank, Google’s black box of information organization, is really a Pandora’s box. We don’t even need to know exactly what’s in there to know that it’s full of trouble.

Again, Blanke puts it well when he repeatedly stresses that the problem isn’t just the subjective nature of the search engine’s design, but the way in which that subjectivity is rendered: “The ethical problems of search engines do not begin with the fact that they decide about relevance but with how they decide about it.”

Subjectivity, after all, isn’t a bad thing. It’s an inescapable part of who and what we are. But that also means that it’s an inevitable part of what we make.

Joseph Weizenbaum, a computer scientist considered to be one of the fathers of modern artificial intelligence, pointed out that what “search engines still cannot do and probably will also not be able to do in the near future is to understand the content of what they retrieve and reflect that in their relevance decisions.” In other words, search engines are not designed to be objective because they’re not able to be objective.

In fact, the subjective manner in which search engines organize information has forced people to create information in strange ways. We have to frame it, annotate it, add layers to it so that the machine will see it and understand what it is. Even the title we’ve given that practice — search engine optimization — is subjective. Does adding metadata make my information better? It makes it more likely to be found by Google, sure. But is that better?

All the labor a creator must do to make their information Googleable is truly bizarre, when you think on it. It’s akin to the Dewey Decimal System expecting the book to produce its own catalog card — which would be odd enough — but without telling anyone what should be on the card!

The entire world is compliant with a system of organization that has its origins in one small act of hypothetical subjectivity by two graduate students in 1996. Whereas 25 years ago the anxiety Sergey Brin and Larry Page attempted to address was one of how to possibly find the right information, today’s most common anxiety is how to properly word a Google search query so that the first thing you get back is the truth. Imagine that.

Is that good?

It’s fascinating to me that though the origins of the search engine can be traced to the need to organize the world’s information, the origins of the internet and the web can be traced to a much more personal need and idea.

More than forty years before Richard Saul Wurman wrote Information Anxiety, Vannevar Bush wrote As We May Think, an essay on the future of information organization that started from the exact same lament: There’s too much of it! What the hell do we do? Bush wrote:

“The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.”

Namely, slow, painstaking, scribal work that drove monks mad.

Like Wurman, Bush was not principally concerned with finding the objectively right in a sea of information. He was concerned about finding what was important. His solution was something he called the “memex,” a “device in which an individual stores all his books, records, and communications, and which is mechanized so that it can be consulted with exceeding speed and flexibility.” Notice that Bush wasn’t proposing a device for all information, but one for all of one’s information. A meaningful difference.

All vs. mine is a difference especially meaningful to me now. More and more, the notion of a personal web is exactly what I want. Not necessarily instead of the World Wide Web, but perhaps alongside it. A collection, after all, is a personal statement of value: these are the important things to me.

When a collection is established around volume (these are all the things), it quickly becomes unmanageable and uninteresting. That is what the web is now, and that is why the tools we have to find things within it don’t work and are not good. They don’t help me to find what is important to me, and by delegating the question of importance to the so-called “wisdom of crowds,” they guarantee a volatile, fickle, and manipulated answer as often as one that can be trusted.

Plenty of philosophers and ethicists who study technology have openly questioned the subjectivity of search engines for as long as they’ve existed. I am, frankly, quite late to the discussion. But I also wonder whether anyone would care about search engine neutrality if there were no money at stake. If advertising didn’t monetize so much of the web’s information and our engagement with it, would a search engine dependent upon the aggregate authority of incoming links have been the final word on how best to organize the world’s information? I can’t imagine it would. But when that way directly benefits those who first attached cash to clicks, you can bet on a quick entrenchment.

Google’s control over our access to information produces revenue for Google. The more control, the more revenue. They make more money when more of us turn to them for help finding things. Though we may be motivated to find the truest or most objective information, they are only motivated to preserve our perception that they can provide it. We shouldn’t expect Google to care about what is actually true. But if they don’t care, who will?

Richard Saul Wurman’s broader conclusion was spot on. Before anyone ever used an internet search engine, Wurman questioned the connection between information and commerce, but also worried that “information is power…the more [of it] with which we are confronted, the more distorted is our view of the world.”

Thanks to information anxiety and our subjective solutions to it, we now live in a world defined by distortions. We all look through the same lens, and yet what we see could not be more different.

How ironic that the tools we created to provide organization — on which we depended to identify what is, and what is true, and what is important — have metastasized our anxiety into a chaotic cloud of emotional volatility through which we cannot see a single commonality strong enough to maintain the bond between us. And how sad that when we have never had more tools to create consensus and community, we instead use them to construct mutually uninhabitable worlds with intolerable names and purposes. Information anxiety has become information animus.



Written by Christopher Butler on February 5, 2021
 