
Language Barriers in Blogging

For a while now, I've been interested in how language barriers affect our ability to communicate online. With some real blog census data now coming in (and with the better half gone to her sister's graduation, and so unable to keep me from wasting a perfectly good Saturday) I spent today trying to measure how high those barriers are.

As I write this, the database has 380,000 entries and is pretty evenly split between weblogs in English and weblogs in other languages. If language barriers meant nothing, and bloggers could read material no matter what language it was written in, you would expect the average link to have about a 54/46 chance of hitting an English versus a non-English weblog.

Of course, language does matter, so links tend not to cross language boundaries. If you look at all the outgoing links from English-language blogs, only about 1.75% point to a non-English weblog. In the reverse direction, however, the figure is much higher. A full 7% of links from non-English-language weblogs point to an English site.

This means that non-English speakers, on average, link in to our community at four times the rate at which we link into the rest of the world. This is a kind of one-way mirror effect: because English dominates the Internet, we are less likely to see anything outside our own community, while non-English speakers will still be exposed to a lot of what goes on here. In the global conversation, we're the ones standing at the microphone.
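For anyone who wants to check that arithmetic, here is a minimal sketch in Python. The function name `link_asymmetry` is made up for illustration; the only real inputs are the two percentages quoted above.

```python
def link_asymmetry(inbound_pct: float, outbound_pct: float) -> float:
    """Ratio of the rate at which others link to us to the rate at which we link to them."""
    if outbound_pct <= 0:
        raise ValueError("outbound_pct must be positive")
    return inbound_pct / outbound_pct


# Figures quoted in the post:
#   7%    of links from non-English weblogs point to an English site
#   1.75% of links from English weblogs point to a non-English site
ratio = link_asymmetry(7.0, 1.75)
print(f"Non-English blogs link to English ones at {ratio:g} times the reverse rate")
```

This compares aggregate rates only. It says nothing yet about how many links you would expect in each direction if language made no difference.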

That figure of 7% is for the aggregate of all links coming from non-English languages. The effect for any individual language community will necessarily be more pronounced, especially if the community is a small one.

Take Iceland, for example. The Icelanders are avid bloggers, with about 3500 weblogs (out of an online population of about 160,000). In any given Icelandic weblog, 12% of the links will point to a site written in English. So even those Icelandic readers who don't speak any English are fairly likely to come into contact with ideas that cross over from the English-language Internet.

But in the other direction, my own chance as an English speaker of coming across a link to an Icelandic site is a whopping 0.02%. In fact, I've found fewer than 300 such links to Icelandic sites across the entire data set. Unless I happen to read Kristiv's Weird Existence or Digital Dreaming, there's no way I'll ever hear about anything cool happening among Icelandic bloggers.

"Big deal, Mr. Multicultural", you'll think to yourself, "of course you won't find links to Icelandic blogs. Iceland is a tiny country, and we've got them outnumbered". But the imbalance in links is far greater than relative numbers would suggest. Once again, if you assumed that links were completely independent of language, you would expect about 54% of all Icelandic links to point to English sites, and 0.9% of English links to point to Icelandic ones. Predictably enough, both languages have fewer links to each other because of the language barrier, but to a very different degree. Icelandic blogs underlink to English ones by a factor of about 4.5 (54% predicted, 12% actual). But English blogs underlink Icelandic ones by a factor of 80. Just the fact that they're writing in Icelandic makes these 3,500 bloggers eighty times less visible to us than an equivalent group of English-language bloggers would be.
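The "predicted versus actual" comparison can be sketched the same way. This is a rough illustration, not the code that produced the numbers: the helper names `expected_share_pct` and `underlinking_factor` are hypothetical, and the baseline assumes, as the paragraph above does, that links would be spread in proportion to the number of weblogs in each language.

```python
def expected_share_pct(target_blogs: int, total_blogs: int) -> float:
    """Percent of links a group of blogs would receive if links ignored language."""
    if total_blogs <= 0:
        raise ValueError("total_blogs must be positive")
    return 100.0 * target_blogs / total_blogs


def underlinking_factor(predicted_pct: float, actual_pct: float) -> float:
    """How many times fewer links are observed than the language-blind baseline predicts."""
    if actual_pct <= 0:
        raise ValueError("actual_pct must be positive")
    return predicted_pct / actual_pct


# Baseline for English -> Icelandic: about 3,500 Icelandic weblogs
# out of roughly 380,000 entries in the database.
icelandic_baseline = expected_share_pct(3_500, 380_000)
print(f"Predicted share of links to Icelandic blogs: {icelandic_baseline:.1f}%")

# Icelandic -> English: 54% predicted, 12% actual.
icelandic_to_english = underlinking_factor(54.0, 12.0)
print(f"Icelandic blogs underlink English ones by a factor of {icelandic_to_english:g}")
```

The same `underlinking_factor` call works in the other direction once you plug in the predicted and observed percentages for links from English blogs to Icelandic ones.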

I propose we call this 'underlinking' coefficient the "Bennett Factor", in honor of the great thinker who said "Our common language is English. And our common task is to ensure that our non-English-speaking children learn this common language." Our Bennett Factors to other languages remain astronomical. We continue to keep ourselves isolated from world opinion, which is particularly troubling at a time when our country's politics are becoming more exceptionalist and unilateral.

Of course, having a lingua franca is a blessing. It makes it possible to communicate in a common forum, and it's vastly better than having the kind of language soup you find at EU headquarters or the UN. But for us English speakers it's a mixed blessing, because it tempts us to get all solipsistic and insular.

The problem isn't that everyone else is learning English. It's that Americans, as a rule, do not bother to learn foreign languages. In 1998, across the entire United States, there were only 841 students studying Hindustani, a language with half a billion speakers. A grand total of 5055 were studying Arabic.

You'd think that in an age of empire, there would be a strong incentive for us to pick up the lingo. But of course you would be wrong:

What [the WMD search team] could not do was ask a question, should they find someone there. Yet they were supposed to ask questions under the guidelines for surveying a suspected secret police site such as this. One suggested query is, "Was there a lot of noise, such as people screaming?" Others ask about covered buses and unusual activity at night.

Anderson, the only team member learning Arabic, still does not have the ability to ask those questions. He has taught himself five phrases so far: "Good morning," "Good evening," "Drop your weapon," "That's dangerous," and "Keep away."

As Team 3 worked, it became evident more than once that even a passive reading knowledge would help.

On its way through one darkened corridor, the team reached an especially recalcitrant door. Sgt. Ivan Westrick, the team's explosive ordnance technician, swung the sledgehammer in a powerful arc that struck sparks with every blow, like flint on steel. A reporter later translated a snapshot of a sign across that door. It said, "No Smoking."

A longer announcement, in bold red and blue strokes, attracted the team's attention. The sign had been positioned in such a way that Saddam Hussein, gazing sternly off the canvas of a youthful portrait, appeared to be reading it. Anderson wondered briefly what it might say.

Had anyone known the answer then, the chamber of vacuum cleaners in the next corridor would have come as no surprise. Neither would the contents of the other sealed rooms: air conditioners, rolls of fabric, marble facing stones.

"Honorable Brother and Packer," the sign began. "Packaged goods cannot be returned after leaving the depot." The sign welcomed suggestions, apologized for delays, and thanked patrons for their cooperation. It concluded with a two-word signature: "STORAGE ADMINISTRATION."

We can't change the Internet overnight just by worrying about language. But we should at least recognize the magnitude of the problem. BlogTalk in Vienna is a good first step. My proposal to Tim O'Reilly: Hold the 2004 Emerging Technology conference in Brazil.



Your Host

Maciej Cegłowski


Please ask permission before reprinting full-text posts or I will crush you.