ICANN has once again opened up a veritable can of worms with its latest decision on the ‘horrors’ of Name Collision. While we are sure that ICANN and the Interisle Consulting Group have very good reasons for the decision they have made (delaying the delegation of several TLDs), we believe that the findings contained in Interisle’s report do not give sufficient cause to delay the new gTLD program in the manner proposed by ICANN staff.
Apart from the concerns voiced by NTAG (New gTLD Applicant Group), there are some additional issues that are of relevance, the most prominent ones being:
1. Quality of queries is MUCH more important than quantity of queries
The Interisle study simply counts the absolute number of queries received per string. Clearly, this should not be the only metric used by ICANN staff to make the far-reaching recommendations that have been proposed. Also, the study makes no mention of the following more important statistics:
a) The number and type of unique domain names queried in each string
b) The number and distribution of unique sources / IP addresses of queries in each string
These statistics matter more than raw counts for a purely logical reason. Imagine a hypothetical but possible situation in which a handful of large organizations use the .corp extension in their internal networks and therefore account for the bulk of the 1.4 million queries seen in .corp (Rank 2). If this is true, there may be a finite number of unique domain name queries issued in .corp, originating from a handful of easily identifiable sources / IP addresses. It is also likely that the same would apply to .brand strings like .hsbc (Rank 14), where a small number of unique domain name queries may have been issued from a few easily identifiable sources. If this is found to be true, mitigating such a risk in a targeted manner would actually be fairly straightforward. If not, the recommendations outlined below will serve to mitigate the risk anyway.
Conversely, imagine a large number of smaller stock broking businesses that possibly use the .trade (Rank 303) extension (44,000 queries) on their internal networks for trading or for storing sensitive and confidential information. Naturally, if this were true, there would be a relatively larger number of unique queries in .trade, originating from a large number of widely distributed sources / IP addresses. This risk could potentially be a lot more challenging and complicated to deal with. Yet ICANN staff currently proposes to classify it as “low risk” without analyzing the data more thoroughly, thereby allowing it to proceed to delegation.
Thus, it would be grossly incorrect to classify some strings as being less or more risky than other strings simply by summing the number of queries, without analyzing the unique domain names queried and the unique sources of these queries.
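To make the distinction concrete, here is a minimal sketch of how the three figures discussed above (total queries, unique domain names, unique sources) could be computed per string. The log format, the `summarize` helper and the sample records are all assumptions made up for illustration; they are not taken from the Interisle data.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TldStats:
    """Per-string counters: raw volume plus the two 'quality' metrics."""
    total_queries: int = 0
    unique_names: set = field(default_factory=set)
    unique_sources: set = field(default_factory=set)


def summarize(queries):
    """Group (source_ip, queried_name) pairs by their rightmost label.

    The pair format is a hypothetical log layout, not a real capture format.
    """
    stats = defaultdict(TldStats)
    for source_ip, name in queries:
        name = name.rstrip(".").lower()   # normalize trailing dot and case
        tld = name.rsplit(".", 1)[-1]     # rightmost label is the string
        entry = stats[tld]
        entry.total_queries += 1
        entry.unique_names.add(name)
        entry.unique_sources.add(source_ip)
    return stats


# Made-up sample: .corp is dominated by one organization,
# .trade is spread across unrelated sources.
sample = [
    ("192.0.2.1", "mail.bigco.corp"),
    ("192.0.2.1", "mail.bigco.corp"),
    ("192.0.2.1", "intranet.bigco.corp"),
    ("192.0.2.2", "mail.bigco.corp"),
    ("198.51.100.10", "db.alpha.trade"),
    ("198.51.100.20", "files.beta.trade"),
    ("203.0.113.30", "ledger.gamma.trade"),
]

for tld, s in sorted(summarize(sample).items()):
    print(tld, s.total_queries, len(s.unique_names), len(s.unique_sources))
```

In this toy data .corp has more queries than .trade, yet fewer unique names and fewer sources, which is exactly the situation a raw query count cannot reveal.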
2. Any new data used for further studies can easily be compromised or gamed
In the situation that ICANN does choose to carry out further studies in relation to the “uncategorized risk” strings, it is imperative for them to avoid using any data that has been or will be collected after the commissioning of the Interisle study. There is an almost undeniable likelihood of such “new” data being gamed or compromised. Some important points to consider:
a) The Internet community and any interested members of the general public are now aware that query counts have been used to determine the level of risk that any proposed gTLD poses
b) They are also aware of the exact list of proposed gTLDs for which further studies of data will be carried out in order to ascertain refined risk levels
c) It is extremely easy to purchase software or a service that can send an extremely large number of targeted queries from multiple sources for non-existent strings of a particular kind
d) There are several parties that have vested interests, or stand to benefit from the delay or potential non-delegation of the so-called “riskier” strings
e) Thus it would be, or already is, incredibly easy for any newer data to be manipulated for personal gain
In light of the above, ICANN should make it mandatory for all future studies to use only data collected before the commissioning of the Interisle study, which is more likely to be representative of the actual usage of proposed gTLDs in internal networks.
3. The threshold for dividing strings into “low risk” and “uncategorized risk” is arbitrary
Section 8.3.1 of the Interisle report suggests that one way of setting the threshold for dividing strings into “low risk” and “uncategorized risk” could be by reference to the number of queries for existing TLDs that have empty zone files. The report mentions two such existing TLDs, .sj and .bv, both ccTLDs associated with mostly uninhabited Norwegian territories. ICANN simply picked .sj and decided that all the proposed strings that appeared in the data stream more frequently than .sj (49,842 queries) should be classified as “uncategorized risk”.
The result of this arbitrary selection is that .bio (Rank 281) with 50,000 queries (rounded to the nearest thousand) is part of the “uncategorized risk” list and is delayed by 3 to 6 months, whereas .engineering (Rank 282) with 49,000 queries (rounded to the nearest thousand) is part of the “low risk” list and can proceed without any significant delays. What this means is that there may have been only a handful more queries for .bio than for .engineering (as few as 344 more, given the .sj threshold and the rounding), which somehow made .bio appear more “risky” to ICANN than .engineering.
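The 344 figure follows from the numbers quoted above, and the arithmetic can be checked with a short sketch. It assumes the cut-off rule is "strictly more queries than .sj" and that the published counts were rounded half-up to the nearest thousand; the `classify` and `round_to_thousand` helpers are illustrative names, not anything from the report.

```python
SJ_THRESHOLD = 49_842  # .sj query count quoted in this post


def classify(query_count):
    """Apply the cut-off as described in this post (assumed strict '>')."""
    return "uncategorized risk" if query_count > SJ_THRESHOLD else "low risk"


def round_to_thousand(n):
    """Round a non-negative integer half-up to the nearest thousand."""
    return (n + 500) // 1000 * 1000


# Smallest count that lands on the "uncategorized risk" side.
min_bio = SJ_THRESHOLD + 1             # 49,843
# Largest count that still rounds to 49,000.
max_engineering = 49_000 + 499         # 49,499

assert round_to_thousand(min_bio) == 50_000
assert round_to_thousand(max_engineering) == 49_000

min_gap = min_bio - max_engineering
print(min_gap, classify(min_bio), classify(max_engineering))
```

Under these assumptions the two strings could differ by as few as 344 queries and still end up on opposite sides of the line.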
Further support for the view that this division is unjustified comes from DigiCert, one of the world’s largest CAs and a founding member of the CA/Browser Forum. In a presentation given specifically to discuss the Name Collision issue on the 22nd of August 2013, DigiCert’s associate general counsel, Jeremy Rowley, stated that in DigiCert’s opinion the inclusion of 20% of all proposed strings in the “uncategorized risk” segment was unnecessary (see Slide 9 of the presentation). In fact, it was verbally stated that only the top 14 of the proposed gTLDs from the Interisle report are substantially risky. An opinion that bears thinking about…?
Stay tuned for Part 2 of this post!
The Interisle report does look at the source IP addresses and the second-level domains queried, doesn’t it?
Hi Kevin, The Interisle report does give counts of source IP addresses for the top 35 ranked proposed TLDs. Not the rest. My intention was to convey that ICANN staff should have taken such counts (for all strings) into consideration before making their sweeping recommendations for hundreds of strings together. Apologies for the incorrect wording. Thanks, Shweta
Hi Shweta -
The Interisle report makes the same argument that you do in your point 1 in as many places as we could—for example, in the Executive Summary: “The risk associated with delegating a new TLD label arises from the potentially harmful consequences of name collision, not the name collision itself”; and in Section 8.3.3: “Properly calculating the risk of delegating a proposed TLD in this category [Calculated Risk] would require an investigation of the context(s) in which the corresponding string is currently used and the circumstances under which it might collide with a syntactically identical delegated TLD.”
And a nit: with respect to your point 3, we used the terms “low risk” and “calculated risk” (not “uncategorized risk”).
Hi Lyman, Point well taken. I wouldn't take anything away from the Interisle team's output. It did a great job within the limited time frame and scope offered to it. I believe that at some level, even ICANN staff is probably aware that query counts should not be the only metric used to make such far-reaching decisions. Evidently, my point is reinforced by the fact that the Interisle report also says that there is a requirement for investigating the context(s) in which strings are currently used, among other things. Ideally ICANN staff should have considered this before announcing their "let's delay 279 strings" plan. Thanks, Shweta