The Internet Lost in Translation

Tonight I went to the BayChi presentation on Web 2.0. The panelists/presenters were Stewart Butterfield from Flickr, David Sifry from Technorati, Paul Rademacher from HousingMaps, and Thomas Vander Wal from Personal Infocloud.

The topic of tags and tagging came up since Flickr and Technorati use them. Sifry mentioned that only 33% of the tags in Technorati are in English. (That leaves 67% that are not.) That’s a lot of information that I can’t read, since my Spanish and French are embarrassingly poor.

Vander Wal talked about how tags would be helpful in identifying the different “information clouds”. Then you could categorize the information into clusters to get a better idea of what that information is really about without having to go into the pages to see exactly what’s there. Flickr already has clusters of photos that you can browse. Tags could help you find information not only in your personal or local infocloud but also in the global infocloud.

That got me thinking.

English tags make up only 33% of all tags on Technorati, and Sifry said that the Chinese are starting to adopt it. That’s potentially a lot of tags I can’t read. Every day at work I take into consideration how easy or difficult it might be to translate something that I write in English, because there are many localized versions of the site. But most people don’t have their sites translated into 20 different languages and don’t think about translation issues. So there are volumes of information out there that people aren’t seeing when they search.

Right now, there’s no real point in showing search results in foreign languages because most people couldn’t read them anyway. But what if you could, because there was on-the-fly translation? Granted, the translations would be rather poor, but you could get the gist of it. And those information clusters would be a truer representation of the information out there in the entire world.

The ideal application to try this out on is Flickr. There’s nothing really to translate except the caption. It’s visual, and most photos cross cultural boundaries.

So I went to Flickr to see what the search differences would be if I entered “Tokyo” and “東京” (Babelfish’s translation for Tokyo). Well, I got a very strange search result for “東京”: “There aren’t any photos available to you tagged with ‘2140723487’.” My only guess is that Flickr’s search function cannot handle the characters, which is a shame because I’d probably get a better representation of Tokyo through the eyes of locals rather than through the eyes of a tourist.
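I don’t know what Flickr’s search actually does with the characters, so this is only a guess at the kind of handling that would avoid the problem. One common approach is to percent-encode the tag as UTF-8 before it goes into a search URL and decode it the same way on the other end. Here is a minimal Python sketch; the function names `encode_tag` and `decode_tag` are my own and are not part of any Flickr API.

```python
from urllib.parse import quote, unquote


def encode_tag(tag: str) -> str:
    """Percent-encode a tag as UTF-8 so it can travel safely in a URL."""
    # safe="" means every reserved character is escaped, including "/".
    return quote(tag, safe="", encoding="utf-8")


def decode_tag(encoded: str) -> str:
    """Reverse encode_tag, turning the escaped bytes back into text."""
    return unquote(encoded, encoding="utf-8")


if __name__ == "__main__":
    tokyo = "東京"
    encoded = encode_tag(tokyo)
    print(encoded)              # %E6%9D%B1%E4%BA%AC
    print(decode_tag(encoded))  # 東京
```

If both ends agree on UTF-8, the tag survives the round trip intact instead of turning into a number like the one I got back.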

Butterfield indirectly touched on this issue when talking about how tags depend on the user. He said that if you type in Tokyo you probably wouldn’t get a photo of skyscrapers and neon signs because most tourists will take a picture of their hotel room (and massaging toilet because we do not have them here). Now if in the background they translated tags into various languages, we would get a better view of the world. Literally.
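To make the idea of translating tags in the background a little more concrete, here is a rough Python sketch of how a search might expand one tag into its equivalents in other languages before looking anything up. The table `TAG_CONCEPTS` and the function `expand_tag` are hypothetical and hand-written for illustration; a real service would need a far larger dictionary and some way to deal with words that have more than one meaning.

```python
# Hypothetical, hand-written table: each concept maps to the tags
# people might use for it in different languages.
TAG_CONCEPTS = {
    "tokyo": {"Tokyo", "東京", "Tokio"},
    "red": {"red", "rojo", "rouge", "赤"},
}


def expand_tag(tag: str) -> set[str]:
    """Return every known variant of a tag, or just the tag if unknown."""
    needle = tag.casefold()
    for variants in TAG_CONCEPTS.values():
        if any(variant.casefold() == needle for variant in variants):
            return set(variants)
    # No match in the table: search on the original tag only.
    return {tag}


if __name__ == "__main__":
    # A search for either spelling would look up all of the variants.
    print(sorted(expand_tag("東京")))
    print(sorted(expand_tag("Tokyo")))
```

A search front end could then run the query once per variant and merge the results, so a search for “Tokyo” would also pick up photos tagged “東京”.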

I pasted “東京” into Technorati and got results back in Japanese. But again, there’s the translation issue. Yahoo and MSN actually return some English results, but that’s probably because of the site owner’s SEO and not the search engine. Google returns all Japanese. But what if I were Italian? I wouldn’t want to read English.

It might be a huge undertaking, or it might not. If you speak more than one language, try the Google Language Tools to translate this site. How close is it? Is it enough to understand what I’m writing? If so, then maybe we’re not that far away from having a true global internet where everyone can read what anyone else on the planet is saying. And then we’d be able to find information no matter what language it’s tagged in.

1 thought on “The Internet Lost in Translation”

  1. I have been thinking about how to do searches in Spanish or in languages I don’t know. August 31 is “Blogday”, which is an international effort to blog about blogs outside of your native language, country, or culture. I find that I can get good results if I pick a word in another language, or, for a place name, use a smaller town or neighborhood to avoid getting only tourist blogs or Flickr photos. To follow your example, “D.F.” gets you Spanish speakers but “mexico city” gets tourists. That is somewhat obvious, but you can also try “DF” plus some word in Spanish (I’m always doing “amigas” or “mujeres”, “feminismo” or “blogueras”, stuff like that), which makes it more likely that you’ll get a Spanish-language source.

    I have the same vision of searching and translating across languages, and Flickr would be an ideal place for that… so that if you search on “red” you get photos tagged “rojo” and whatever the equivalent is in other languages. Obviously there would be huge problems, but any elementary set of words would at least provide some level of information… it would be very much worth doing. Maybe a central dictionary-matching tag concept service & API that other web services would subscribe to & use.
