With the IGF underway, there’s a lot of discussion surrounding Internationalized Domain Names (IDN). There has been a great deal of progress in IDN technology: IE7 and Firefox are now fully IDN-aware, and IDN registrations are strong, with websites behind them.
Now that many of the implementation hurdles have been addressed, and the technology is either available to most Internet users or soon will be, the focus is shifting to other aspects of IDN.
Most folks who work in a non-Romanized character set say that their user experience is better if the entire domain name is in their native character set (Arabic, Chinese, Hebrew, etc.) rather than having to switch back and forth between, say, Bopomofo and ASCII keyboard entry to compose a URL.
With discussions now turning to the implementation of IDN top-level domains, I wanted to pose some (rhetorical) questions to our community on CircleID as fodder for a good dialogue on the topic.
Here are some of the questions I see in the IDN discussions…
How many folks will step up and state that they should operate an IDN version of an existing Top-Level Domain (TLD):
(translated using Google)
.?? (.network in Chinese) or
.?? (.info in Korean)?
Who is the most appropriate operator when there is an incumbent operator of the Roman-character string, or when there are versions in multiple languages in addition to the original string?
.Ásia (.asia in Portuguese) or
.???? (.asia in Arabic)
in addition to .asia.
Who, then, is the registrant of news.asia, and is it the same registrant as news.Ásia, news.????, ???????.???? or notícia.Ásia?
Now add generic words that are also trademarks (e.g. united, apple, or delta), with intellectual property interests clamoring for these strings and every interested party registering domains. How will this play out?
Is the registry the government, or is it the private entity that currently operates the equivalent existing TLD, or should it be a new company?
What if the government is one registry, with one sovereign set of laws, and another registry of the same string in a different language is a company?
These are tough questions, and they are part of the larger discussions now under way about the IDN solutions on the horizon.
On ICANN’s roadmap for IDN:
(from their most recent newsletter)
* ICANN laboratory testing of top level IDNs in the root
+ Crucial for observing DNS resolver reactions
+ Allows testing of maximum- and minimum-length IDNs and other variations
* ICANN testing of IDNs in live Root Zone File
+ Crucial for observing browser and other application reactions
* Cooperative, multi-stakeholder IDN policy formation
+ Adoption of new gTLD (and IDN) rules
+ Identification of IDN ccTLDs
+ Specification of characters allowed in IDNs
+ Note potential disputes over who has rights to IDN TLDs
* Implementation and testing of infrastructure and application software
* Installation of IDN TLDs into root zone file by ICANN
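The roadmap item on testing maximum- and minimum-length IDNs comes down to the fact that the DNS carries each label in its ASCII-compatible (Punycode) form, and RFC 1035 caps every label at 63 octets. A minimal sketch, assuming Python's built-in `idna` codec (which implements IDNA 2003, not the later IDNA 2008 rules); the helper name `fits_in_dns` is hypothetical:

```python
# Hypothetical helper: does a single candidate label (e.g. an IDN TLD)
# still fit the DNS once converted to its ASCII-compatible form?
# Uses Python's built-in "idna" codec (IDNA 2003 semantics).

MAX_LABEL_OCTETS = 63  # per-label limit from RFC 1035


def fits_in_dns(label: str) -> bool:
    if "." in label:
        return False  # a single label must not contain a dot
    try:
        ascii_label = label.encode("idna")  # e.g. "bücher" -> b"xn--bcher-kva"
    except UnicodeError:
        return False  # the codec rejects labels longer than 63 octets
    return 1 <= len(ascii_label) <= MAX_LABEL_OCTETS


print(fits_in_dns("bücher"))        # True
print(fits_in_dns("a" * 64))        # False
print(fits_in_dns("x" * 60 + "ü"))  # False: the xn-- form exceeds 63 octets
```

The last case shows why length testing has to be done on the converted form: a label of 61 characters fits as ASCII but not once it needs the `xn--` prefix and Punycode suffix.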
ICANN is also in the process of developing and implementing an outreach plan to ensure better coordination among all the bodies necessary for the implementation of internationalized top-level labels in the Internet.
Jothan,
I have the same questions as you, and I feel that the current approach is much too commercially minded. I have been following public opinion among Internet users in Korea, and many of them have objected to ICANN’s plan to create a new namespace if it would place a heavy burden on them to register names under yet another IDN TLD.
Aside from the technical considerations of IDN TLDs, people suggest that IDN TLDs should be implemented using DNAME records rather than by creating new NS delegations. (For example, “.kr” -> “.??”, “.co.kr” -> “.??”, “.or.kr” -> “.??”, etc.)
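The DNAME idea can be pictured as a suffix rewrite: a resolver that meets a DNAME record at an IDN label substitutes the existing ASCII suffix for every name below it, so names already registered under .kr would also answer under the IDN label without a second registration. A minimal sketch of that substitution only (simplified from the DNAME semantics in RFC 6672, with no actual DNS traffic); the function name is hypothetical and the Korean label is purely illustrative:

```python
from typing import Optional


def dname_rewrite(qname: str, owner: str, target: str) -> Optional[str]:
    """Rewrite qname if it lies strictly below the DNAME owner name.

    Simplified sketch: labels are compared case-insensitively as ASCII,
    and the owner name itself is NOT rewritten (DNAME only applies to
    names below its owner).
    """
    q = qname.rstrip(".").lower()
    o = owner.rstrip(".").lower()
    if not q.endswith("." + o):
        return None  # DNAME does not apply (this also covers qname == owner)
    prefix = q[: -(len(o) + 1)]
    return prefix + "." + target.rstrip(".").lower()


# Illustrative only: an IDN label, in its ASCII (A-label) form, mapped onto "kr".
idn_tld = "한국".encode("idna").decode("ascii")
print(dname_rewrite("example." + idn_tld, idn_tld, "kr"))  # example.kr
print(dname_rewrite(idn_tld, idn_tld, "kr"))               # None
```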
We should think about what will be the most beneficial decision for Internet users who want to use the Internet with domain names in their native characters.
At this point, I think we should try harder to collect more extensive opinions from the various language communities and their governments. Their silence does not mean acceptance of the current discussions of IDN TLD implementation issues in ICANN.
What if the government is one registry, and several foreign entities want to become the registry for IDN TLDs in a language that is part of the national heritage of that government and the people of its language community?
I am sure that the government will do whatever they can to prohibit other foreign entities from making money out of their national heritage and property.
If the government is against foreign entities operating IDN TLDs in its native language, how can we resolve the potential disputes over who has rights to IDN TLDs?
I worry that many people are counting their chickens before they have hatched.
Anyway, the Google translation for “.info” is correct. It is “.??” in Korean.
The view that languages belong to nations is a very narrow one and totally unworkable.
From a Japanese or Korean perspective it is easy to see why some would take this view, as their languages are little spoken outside their national boundaries. But even here there is huge overlap in the use of Chinese characters.
If you take certain other languages, the problems with this concept become obvious. What would be the situation if the British claimed copyright over English and attempted to prevent US organisations from using anything that could be construed as English? Much of English is derived from French and Latin, so many of the words are indistinguishable. Who is to say which language a keyword belongs to?
Turning to non-Latin scripts: there are about 26 Arabic-speaking countries, which share a language and a culture, but there is no political unity. The script and vocabulary are also shared with Farsi and Urdu.
Take Urdu as another example. It is the official language of Pakistan, but more Indians use Urdu than Pakistanis. Most Pakistanis actually speak one language and write another.
There are 100 million Chinese who do not live in the PRC.
And where would you go with Hebrew? Can Israel really claim a monopoly over that language?
The basic reality that we are all going to have to accept is that all human culture is a joint legacy, irrespective of our nationality or racial origins. Any other approach is going to be divisive and unhelpful. We also need to grasp that domain names are fundamentally about scripts rather than languages.
IDNs need to support all protocols, not just WWW!
Most of the IDN proposals I’ve seen have really been designed to support international-character-set Web pages, using some kind of kluge to make HTTP 1.1 happy and perhaps another kluge to also support SMTP. But DNS isn’t designed to support one or two Layer 7 applications (or, more to the point, Layers 8 and 9, if you’ve got the T-shirt with Financial and Political on it); it’s designed to provide correct Layer 3 addressing for all sorts of Layer 4, 5, 6, 7, etc. protocols. If an IDN can’t support ssh, it’s broken, so the original Sitefinder-related IDN proposals were non-starters.
I also don’t like Punycode, though I guess it’s better than nothing; I’d prefer that we bite the bullet and change the rules about 7-bit-cleanness and case-folding, though I realize that that’s hard.
@Bill Stewart,
I am sorry, Bill. I have been working with IDN for a couple of years and you have me completely lost. As far as I am aware, Punycode at the second level resolves through the DNS in the same way that any other ASCII domain resolves. Email also functions with Punycode in the same way that it functions with other domains. The problems lie in client-side applications. These applications almost universally support IDN, with the notable exception, until recently, of Microsoft. It appears that Microsoft, in order to leverage its near-monopoly position in the OS and browser markets for purposes of market manipulation and profit maximisation, has opted to hamstring Internet navigation in most of the non-English-speaking world. Its latest move is not even to publicise the IDN capability of IE7, and to delay its rollout in the Far East by about 6 months.
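The point that a Punycode name resolves like any other ASCII domain can be seen by doing the conversion an IDN-aware client performs before it queries the DNS. A minimal sketch using Python's built-in `idna` codec (IDNA 2003); `bücher.example` is a made-up name and `to_wire_name` a hypothetical helper:

```python
import string

# Characters allowed in a conventional host name: letters, digits, hyphen, dot.
LDH = set(string.ascii_letters + string.digits + "-.")


def to_wire_name(name: str) -> str:
    """Convert a Unicode domain name to the ASCII form the DNS actually sees."""
    return name.encode("idna").decode("ascii")


wire = to_wire_name("bücher.example")
print(wire)              # xn--bcher-kva.example
print(set(wire) <= LDH)  # True: nothing the DNS has not always handled
```

Only the application needs to know about the conversion; resolvers and name servers treat `xn--bcher-kva.example` exactly like any other ASCII name.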
The reasons for this are far from clear, but obviously cynical. It would seem that they want to convince people to upgrade their operating systems and office suites rather than get most of the functionality they need by downloading IE7. I think the calculation is that they will sell millions more copies of Vista by suppressing IE7 in this way.
David is right that Punycode looks icky and unnatural, and that making it reasonably readable requires conversion, which takes some technical aptitude or tools.
But the goal is that Punycode is somewhat transparently handled behind the scenes.
The good news is that the very extensive effort within the IETF also produced IDNA, the magic an application performs so that it can talk Punycode on the back end but present strings to the user that read naturally.
So web applications like Firefox, IE7, and other browsers have adopted it, and we’ll see more and more applications choosing to include IDNA behavior so that these ‘automagical’ conversions happen for the end user, and email and other applications will also interoperate in native languages.
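The display half of IDNA is the reverse conversion: labels that carry the `xn--` ACE prefix are decoded back to Unicode before being shown, and plain ASCII labels pass through untouched. A minimal sketch, again assuming Python's built-in `idna` codec; the function name is hypothetical:

```python
def to_display_name(wire_name: str) -> str:
    """Decode an ASCII (A-label) domain name into Unicode for display."""
    # The idna codec decodes each "xn--" label and leaves other labels as-is.
    return wire_name.encode("ascii").decode("idna")


print(to_display_name("xn--bcher-kva.example"))  # bücher.example
print(to_display_name("www.example.com"))        # www.example.com
```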
The Punycode approach changes how DNS is used, in ways that break many things. “Many things” means far more than just browsers: it’s all the other applications on the Internet, some of which have user interfaces and some of which don’t, and most of which work differently from browsers. It’s not just that IE version N+1 and Firefox version N.0 need to support it; it’s every telnet or ssh client, every instant messaging client, every email client, every ping and traceroute client, many distributed file system clients, every BitTorrent client, every ftp client, probably many firewalls that handle ftp, etc., and any word processor, spreadsheet, PDF reader, etc. that lets you open web pages.
As a programmer (well, mostly a former programmer), I find that the concept that Punycode is “somewhat transparently handled behind the scenes” doesn’t work for me. The way DNS works is that it translates a string of human-readable characters into a number that’s an IP address, or a number into a string of characters. The Punycode approach to IDNs replaces this: the user interface translates a string of characters into Punycode and then translates that into the number, and in the reverse direction it translates the number into an ASCII Punycode string and then into Unicode. You could automate some of this by replacing the old DNS library routines with a library that first checked whether a domain-name string started with xn-- or contained only ASCII characters, leaving only the simple matter of package management to recompile all your programs and DLLs. In the reverse direction, the library would translate numbers into characters and, if they started with xn--, translate them into Unicode, at least if the application didn’t choke on Unicode, and it probably wouldn’t fail much more often than smart quotes fail when viewed on different platforms.
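The library replacement described above can be sketched as a thin wrapper around the existing resolver calls. This is a hypothetical wrapper, not any real library’s API: it assumes Python’s standard `socket` functions stand in for the “old DNS library routines” and uses the built-in `idna` codec for conversion. The lookup functions are injectable so the sketch can be exercised without network access.

```python
import socket
from typing import Callable


def resolve_idn(name: str,
                lookup: Callable[[str], str] = socket.gethostbyname) -> str:
    """Forward direction: Unicode name -> ASCII (Punycode) name -> address."""
    ascii_name = name if name.isascii() else name.encode("idna").decode("ascii")
    return lookup(ascii_name)


def reverse_idn(address: str,
                reverse_lookup: Callable[[str], str] = lambda a: socket.gethostbyaddr(a)[0]) -> str:
    """Reverse direction: address -> ASCII name -> Unicode if any label is xn--."""
    host = reverse_lookup(address)
    if any(label.lower().startswith("xn--") for label in host.split(".")):
        return host.encode("ascii").decode("idna")
    return host


# Offline usage with made-up data (192.0.2.0/24 is reserved for documentation).
forward = {"xn--bcher-kva.example": "192.0.2.1"}
reverse = {"192.0.2.1": "xn--bcher-kva.example"}
print(resolve_idn("bücher.example", forward.__getitem__))  # 192.0.2.1
print(reverse_idn("192.0.2.1", reverse.__getitem__))       # bücher.example
```

The catch the comment points to remains: every program has to be relinked against such a wrapper, and every program has to be able to display the Unicode it gets back.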
But it’s not clear to me that that’s enough. What about domains like 3ld.2ld.tld where only some parts are internationalized? If the Unicode representation has characters that only use octet values that would be digits or lower-case letters if you interpreted them as ASCII, do you Punycode it or not? Or do you require domain name owners to also buy the misinterpreted-as-ASCII domain name and hope it’s not a trademark violation?
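IDNA answers the mixed-label question by converting label by label: an ASCII label such as `www` is left alone, and only a label containing a code point above U+007F is Punycode-encoded and given the `xn--` prefix. The decision is made on Unicode code points, not on how their octets would look if misread as ASCII. A simplified sketch using Python’s built-in `punycode` codec; it deliberately skips IDNA’s nameprep mapping and validity checks, so it is illustrative rather than conformant, and the function names are hypothetical:

```python
def label_to_ascii(label: str) -> str:
    """Encode one label; ASCII labels are returned unchanged."""
    if label.isascii():
        return label
    # Simplified: real IDNA applies nameprep mapping and checks before this step.
    return "xn--" + label.encode("punycode").decode("ascii")


def name_to_ascii(name: str) -> str:
    """Convert each dot-separated label independently."""
    return ".".join(label_to_ascii(label) for label in name.split("."))


print(name_to_ascii("www.bücher.example"))  # www.xn--bcher-kva.example
```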
It would be much simpler if we could just change the rules for DNS name-to-number lookups instead of adding a translation layer; only a small number of DNS server programs would have to change, and most application programs are 8-bit-clean even if they don’t admit it. (It’s possible that Unicode characters containing all-zero octets would break this naive approach, since C-language strings use a zero octet as the string terminator, but it’s probably still simpler to work around that small set of characters than to change everything.)
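Whether the all-zero-octet worry applies depends on how the Unicode would be encoded on the wire, which the comment leaves open. If one assumes UTF-8 (an assumption here), two useful properties hold: no character other than U+0000 encodes to a zero octet, so C strings are safe, and every octet of a non-ASCII character is 0x80 or above, so it can never be misread as an ASCII letter or digit. UTF-16, by contrast, puts zero octets into ordinary ASCII text. A small check of both properties, with hypothetical helper names:

```python
def is_c_string_safe(data: bytes) -> bool:
    """True if the bytes contain no NUL octet that would end a C string early."""
    return b"\x00" not in data


def non_ascii_octets_are_high(text: str) -> bool:
    """True if every UTF-8 octet of every non-ASCII character is >= 0x80."""
    return all(octet >= 0x80
               for ch in text if ord(ch) > 0x7F
               for octet in ch.encode("utf-8"))


label = "bücher"
print(is_c_string_safe(label.encode("utf-8")))      # True
print(is_c_string_safe(label.encode("utf-16-be")))  # False: "b" becomes 00 62
print(non_ascii_octets_are_high(label))             # True: "ü" is C3 BC
```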
If Unicode is ever put directly into the Root, which frankly I doubt will actually happen, then this would only ever be done at the top level. Personally, I think ICANN will see sense and adopt the DNAME approach.
Vista and Office 2007 will support IDN in just about every way that is necessary. Those who build sites and manage hosting and FTP are quite capable of dealing with Punycode until full Unicode support arrives. It will come, but it doesn’t need to be here now. For things like email, Punycode will likely need to persist, so that people who do not have the appropriate keyboard to input an address can still do so using Punycode. I don’t see this as a major problem.
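Keeping Punycode usable for email only requires converting the domain part of the address; IDNA itself does not cover the local part before the @. A minimal sketch using Python’s built-in `idna` codec, with a made-up address and a hypothetical function name:

```python
def email_to_ascii(address: str) -> str:
    """Convert only the domain part of an email address to its Punycode form."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return f"{local}@{domain.encode('idna').decode('ascii')}"


print(email_to_ascii("info@bücher.example"))  # info@xn--bcher-kva.example
```

Someone without the right keyboard could type the ASCII form directly, which is the fallback described above.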