If you’re brave, today you can finally download the Internet Explorer 7 public beta. Why should you be interested? Not because the browser’s wonderful. It isn’t—initial reports are that it’s not ready for prime-time.
But you might be interested to know that as of today, users of IE will be able to use internationalized domain names (IDNs). These are domain names that use non-Roman type—everything from umlauts in German to entirely different alphabets and scripts. Many other browsers are already IDN-capable, including Firefox, but most people in the world use Explorer.
Think China, Japan, India. Think most of the world’s population… Think of millions of new Internet users working in their own language, customers for commercial goods and services. But think also about intellectual property nightmares, think about phishing, think about whether there’s one interoperable Internet, or several Internets acting very weird. These issues and others will become big news when people start using IDNs massively—and with support from Internet Explorer, that’s about to happen.
Intellectual Property
If you own trademarks, you’ve got serious trouble ahead. As the International Trademark Assn (INTA) puts it in a comment to ICANN,
“The IDN Guidelines do not mention, let alone deal with, any Whois issues that will arise from the implementation of IDNs…. INTA respectfully submits that the draft IDN Guidelines must deal with the creation, maintenance and publication of Whois data, and that a further draft dealing with Whois issues should be prepared and resubmitted for public comment.”
What they’re really talking about here is finding violators of their trademarks; about doing searches for IDNs. But respectfully or not, INTA will get nowhere with this one, because searching of IDNs is really tough. While you can create and register IDNs (and have been able to for some time), you can’t at present search for anything except an exact match. So if you were farsighted enough to have registered your trademark as a domain name in (say) one of the several Chinese scripts, you will be able to search for that exact domain name, but you won’t be able to do any pattern searching. No plurals, no misspellings, no instances of your mark contained in longer names—no variations at all unless you search for them one by one.
That’s because the Domain Name System (DNS) doesn’t actually use non-Roman-character domain names natively. Instead, the browser or other application translates them into random-looking ASCII strings (prefixed with “xn--”) so that the DNS can resolve them, then back again so the user can read them, as illustrated below.
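To make the round trip concrete, here is a minimal sketch using Python's built-in "idna" codec, which implements the original IDNA standard (RFC 3490). The domain bücher.example and the helper names to_ascii and to_unicode are illustrative choices, not anything taken from the browser itself.

```python
def to_ascii(domain: str) -> str:
    """Encode a Unicode domain name into its ASCII-compatible (xn--) form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Decode an ASCII-compatible domain name back into readable Unicode."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    encoded = to_ascii("bücher.example")
    print(encoded)              # the form the DNS actually resolves
    print(to_unicode(encoded))  # the form the user sees
```

The encoded form is what gets stored in the zone and sent over the wire; the Unicode form only exists at the edges, in the application.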
The curious can find out much more than they might ever have wanted to know about the technical aspects of IDNs by looking at the IDN standard (RFC 3490), Unicode, and stringprep.
Phishing
It’s not just passing off trademarks that has some people worried. There’s the issue of homographic spoofing: using confusingly similar characters, for instance 0 (zero) and O (capital “o”). This gets much, much worse when you consider homographs that span alphabets, for instance the string “CCCP” (the old initials for the Soviet Union). To human eyes, “CCCP” in Roman letters looks exactly the same as the completely different “CCCP” in Cyrillic characters.
Solutions that depend on people understanding character sets are unconvincing. Muscovites may have no idea which Cingular.ru they’re buying from, and Americans may be fooled by the IDN domain name that looks just like Paypal.com.
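Software, unlike people, can tell the look-alikes apart. The sketch below is one rough way to do it, assuming the first word of each character's Unicode name (LATIN, CYRILLIC, and so on) is a good enough stand-in for its script. That is a heuristic rather than the official Unicode Script property, and the function names are made up for illustration.

```python
import unicodedata


def scripts_used(label: str) -> set:
    """Return rough script names for the letters in a string.

    Uses the first word of each character's Unicode name as a proxy
    for its script. Digits, dots and hyphens are ignored.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def is_mixed_script(label: str) -> bool:
    """Flag labels that combine letters from more than one script."""
    return len(scripts_used(label)) > 1


latin = "CCCP"                          # four Latin capitals
cyrillic = "\u0421\u0421\u0421\u0420"   # Cyrillic ES, ES, ES, ER
spoof = "\u0440aypal"                   # Cyrillic "er" followed by Latin "aypal"

if __name__ == "__main__":
    print(latin == cyrillic)            # identical on screen, different in memory
    print(scripts_used(latin), scripts_used(cyrillic))
    print(is_mixed_script(spoof))
```

A mixed-script check like this catches the fake Paypal, but not an all-Cyrillic “CCCP”, which is a perfectly legitimate single-script name that merely happens to look Latin.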
IDNs and Domainers
Speculators are getting into the act. They can see the writing on the wall. While no-one is making much money yet, people are stocking up and waiting for the “explosion.” There’s a lot of activity recently with regard to IDNs:
Splitting the Root?
When it comes to the interests and desires of large numbers of Internet users, ICANN is notoriously deliberate, by which I do not mean to imply that they come to good decisions. But with IDNs even their severest critics praise them, sort of:
I’ve often said that ICANN regulates the business of buying and selling of domain names and that ICANN’s claim that it coordinates technical matters to preserve the stability of DNS is a fantasy. Well I am proven wrong. ICANN has done something technical.
ICANN has issued Guidelines for the Implementation of Internationalized Domain Names, Draft Version 2 [PDF] (pending approval by the ICANN board.) It’s only four pages long, but those few pages contain a lot of significant material. But as in all things IDN-ish, solutions are not easy (in fact, IDN is a particularly difficult subject.)
—Karl Auerbach, from his Cave Bear blog
So why have ICANN and many other people put so much work into this? Is it to help the under-Internetted billions in Asia? Um, maybe. Is it to help large companies to part the growing Chinese middle class from their hard-earned dollar-denominated currency? That’s certainly part of it.
But the hugely important reason for this push, the reason that just about every one of the fractious ICANN sub-splinter-groups is behind this, is that if they don’t figure out a way to make IDNs work for everyone, the Chinese and the Japanese and Koreans and Arabic-speakers will just do it on their own without ICANN.
That might not sound so bad, except that to do it they would create their own root—their own Internet(s), either interacting unpredictably with the one we know, or outside of it completely. Say “split the root” and you’ll give unpleasant little shivers to everyone from corporate executives to touchy-feely Free the Internet types.
A split root could mean, for instance, that there were two namesatwork.com domain names, owned by different people, with different websites, because two different Internets could create two different .COM top-level domains. When you clicked a link or typed in the URL, you might go to either site, more or less randomly. Chaos.
Actually, it’s already happened, so the current efforts are more damage control than prevention. The Chinese government has created at least two Chinese-script-only top-level (Chinese-level?) domains. Particularly in Arabic-speaking countries, ISPs are creating their own little Internets by rerouting traffic destined for the public Internet to their own servers. For instance, in certain parts of the world, type in “redcross.org” and you end up at the Red Crescent web site.
Summary
Domain names are technical little beasts, none more so than IDNs. But the wider Internet community ignores them at their peril. Yes, Virginia, domain names do matter...
Some links
Some other links that may be useful.
* ICANN Forum on IDNs
* FAQ on IDNs from gtld.com
* FAQ on IDNs from DENIC, the German registry (in English)
Corrections of any sort are welcome.
While otherwise great, this article has a significant technical error. It says “you can’t at present search for anything except an exact match”. That is plainly incorrect. You cannot search the untranslated zone file for an exact match, but as the article shows in the next paragraph, there is a single unambiguous way to convert every IDN name to its Unicode equivalent. If you want to search for “plurals, misspellings, instances of your mark contained in longer names” and so on, you simply convert all the names in the zone to their Unicode equivalents and search the converted list. People have been doing this since the first day that IDNs were introduced into any of the TLDs.
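The approach described in this comment might look something like the following sketch. It assumes you already have the zone's names as a list of ASCII-compatible strings; here a tiny stand-in zone is built in memory, and search_zone and decode_name are hypothetical helper names. Python's built-in "idna" codec (RFC 3490) does the conversion.

```python
def decode_name(ace_name: str) -> str:
    """Convert one ASCII-compatible name to Unicode; leave it as-is on failure."""
    try:
        return ace_name.encode("ascii").decode("idna")
    except UnicodeError:
        return ace_name


def search_zone(ace_names, pattern: str):
    """Return (ascii_form, unicode_form) pairs whose Unicode form contains pattern."""
    needle = pattern.casefold()
    hits = []
    for ace in ace_names:
        readable = decode_name(ace)
        if needle in readable.casefold():
            hits.append((ace, readable))
    return hits


# Stand-in for a real zone file: names stored in their xn-- form.
zone_sample = [
    name.encode("idna").decode("ascii")
    for name in ["bücher.example", "meinebücher.example", "books.example"]
]

if __name__ == "__main__":
    for ace, readable in search_zone(zone_sample, "bücher"):
        print(ace, "->", readable)
```

A substring search is only the simplest case; once the list is decoded, any ordinary pattern matching can be run over it.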
I’ll take the “otherwise great” with a huge grin, coming from someone who really does know this area.
I don’t know of anyone who’s offering this service to the public, though I suspect it won’t be long in coming. But that only works if you can get the zone file, which is possible in .com and some other TLDs, but in many cases zone files are inaccessible, even in otherwise reasonable countries such as Canada and Ireland.
>I don’t know of anyone who’s offering this service to the public, though I suspect it won’t be long in coming.
That would probably depend on whether the agreement with the zone owner even allows publishing that information.
>But that only works if you can get the zone file, which is possible in .com and some other TLDs, but in many cases zone files are inaccessible, even in otherwise reasonable countries such as Canada and Ireland.
Sure, but then you have the same problem with IDNs as you do with all-ASCII names. You can’t search a zone for plurals, near-matches, and so on without access to the zone file. The presence or absence of IDNs doesn’t affect that. In other words, please don’t make this sound like an IDN-specific issue: it isn’t.
What I meant by “offering this service” was a public interface to search IDNs—in other words, someone with access to a zone (e.g., .com), who encoded (decoded?) the ASCII equivalents and made them searchable via the Web.
I agree, the problem of searching inaccessible zones for IDNs is no different than searching for non-IDNs, except insofar as the language itself made a difference.
Practically speaking, solutions for non-IDN English word searches exist already. Mark Kudlacik at CheckMark put together a system that spun out likely variants of a word, then one-by-one determined if those existed in the otherwise-inaccessible zone (you can always find out if one particular domain name exists). However, that was English-specific, built on rules for pluralizing, likely misspellings, etc. in the English language. I don’t know if such rule sets exist for other languages. If they did, they could be applied to native-character IDNs.
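As a rough illustration of the generate-and-probe idea (not a reconstruction of the CheckMark system, whose rules I haven't seen), the sketch below spins out a few English-style variants of a mark and asks a caller-supplied function whether each one exists. The variant rules and the names english_variants, find_registered and exists are all assumptions; in practice exists would wrap a DNS or Whois lookup.

```python
def english_variants(mark: str) -> set:
    """Generate simple plural and typo variants of a mark."""
    variants = {mark, mark + "s", mark + "es"}
    # Dropped letter: "acme" -> "cme", "ame", "ace", "acm"
    for i in range(len(mark)):
        variants.add(mark[:i] + mark[i + 1:])
    # Adjacent letters swapped: "acme" -> "came", "amce", "acem"
    for i in range(len(mark) - 1):
        variants.add(mark[:i] + mark[i + 1] + mark[i] + mark[i + 2:])
    variants.discard("")
    return variants


def find_registered(mark: str, tld: str, exists) -> list:
    """Probe each variant one by one; exists(name) -> bool is supplied by the caller."""
    candidates = (f"{variant}.{tld}" for variant in english_variants(mark))
    return sorted(name for name in candidates if exists(name))


if __name__ == "__main__":
    # Fake registry standing in for real lookups against an inaccessible zone.
    registered = {"acme.com", "acmes.com", "amce.com", "other.com"}
    print(find_registered("acme", "com", registered.__contains__))
```

For another language or script, only english_variants would need replacing with that language's own rules; the probing loop stays the same.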
>While no-one is making much money yet, people are stocking up and waiting for the “explosion.” There’s a lot of activity recently with regard to IDNs:
Thank you for the mention, Anthony. I’m the owner of IDN Forums and want to say that this info is a bit incorrect. It is not easily seen, but the site has been up for a little more than 4 months, and sales have been happening steadily since January.
Most of the sales are still kept private by request, but from what I personally know has been sold by a few close members and myself, the total is already in the low XX,XXX range.
I assume buyers do not yet want it made public that they are investing in premium IDN domains.
Excellent article.
In addition to the trademark and technical challenges, don’t forget that this whole thing falls apart without proper language and variant tables. It took the CDNC 2+ years to wrap up the first guidelines and tables that allowed registrations in the Chinese, Japanese & Korean scripts (CJK) which share some common parts of the alphabet.
-Ram