Name Collision: Why ICANN Is Looking at It the Wrong Way (Part 1)



By Shweta Sahjwani, Manager, Strategic Partnerships at Radix
September 03, 2013

ICANN has, once again, opened up a veritable can of worms with its latest decision on the ‘horrors’ of Name Collision. While we are sure that ICANN and the Interisle Consulting Group have good reasons for the decision that has been made (delaying the delegation of several TLDs), we believe that the findings in Interisle’s report do not give sufficient cause to delay the new gTLD program in the manner proposed by ICANN staff.

Apart from the concerns already voiced by the NTAG (New gTLD Applicant Group), several additional issues are relevant. The most prominent are:

1. Quality of queries is MUCH more important than quantity of queries

The Interisle study simply counts the absolute number of queries received for each string. This should clearly not be the only metric ICANN staff use to make the far-reaching recommendations they have proposed. The study also makes no mention of the following, more important statistics:

a) The number and type of unique domain names queried in each string

b) The number and distribution of unique sources / IP addresses of queries in each string

These statistics matter more than raw query counts for a simple, logical reason. Imagine a hypothetical but plausible situation in which a handful of large organizations use the .corp extension on their internal networks and therefore account for the bulk of the 1.4 million queries seen for .corp (Rank 2). In that case, the .corp queries may involve only a limited number of unique domain names, originating from a handful of easily identifiable sources (IP addresses). The same is likely to apply to .brand strings such as .hsbc (Rank 14), where a small number of unique domain names may have been queried from a few easily identifiable sources. If this turns out to be true, mitigating the risk in a targeted manner would be fairly straightforward. If not, the recommendations outlined below will serve to mitigate the risk anyway.

Conversely, imagine a large number of smaller stockbroking businesses that use the .trade extension (Rank 303, 44,000 queries) on their internal networks for trading or for storing sensitive, confidential information. If this were true, .trade would show a relatively larger number of unique domain names queried, originating from a widely distributed set of sources (IP addresses). That risk could be far more challenging and complicated to deal with. Yet ICANN staff currently propose to classify .trade as “low risk” without analyzing the data more thoroughly, thereby allowing it to proceed to delegation.

It would therefore be grossly incorrect to classify some strings as more or less risky than others simply by summing the number of queries, without analyzing the unique domain names queried and the unique sources of those queries.
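To make the distinction concrete, here is a minimal sketch of the per-string profile argued for above: total queries, unique names queried, and unique sources, computed from a query log. The record layout (`Query` with `source_ip` and `qname`) and the helper names are hypothetical, and the demo log is synthetic data built to mimic the .corp and .trade scenarios, not Interisle's data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """One query seen in the data stream (hypothetical record layout)."""
    source_ip: str
    qname: str  # full name queried, e.g. "host1.corp"


def tld_of(qname: str) -> str:
    """Return the rightmost label of a name, i.e. the proposed string."""
    return qname.rstrip(".").rsplit(".", 1)[-1].lower()


def profile_by_tld(queries):
    """Count total queries, unique names and unique sources per string."""
    totals = defaultdict(int)
    names = defaultdict(set)
    sources = defaultdict(set)
    for q in queries:
        tld = tld_of(q.qname)
        totals[tld] += 1
        names[tld].add(q.qname.rstrip(".").lower())
        sources[tld].add(q.source_ip)
    return {
        tld: {
            "queries": totals[tld],
            "unique_names": len(names[tld]),
            "unique_sources": len(sources[tld]),
        }
        for tld in totals
    }


def synthetic_log():
    """Synthetic example only: heavy use by a few sources vs. light, spread use."""
    log = []
    # A few large internal users: many queries, few names, two sources.
    for i in range(1000):
        log.append(Query(f"10.0.0.{i % 2}", f"host{i % 3}.corp"))
    # Many small users: fewer queries, many names, 150 distinct sources.
    for i in range(300):
        log.append(Query(f"192.0.2.{i % 150}", f"desk{i % 100}.trade"))
    return log


if __name__ == "__main__":
    for tld, stats in sorted(profile_by_tld(synthetic_log()).items()):
        print(tld, stats)
```

On this synthetic log, ranking by raw query count puts "corp" first (1,000 vs. 300), while ranking by unique names or unique sources puts "trade" first, which is exactly the inversion the argument above describes.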

2. Any new data used for further studies can easily be compromised or gamed

If ICANN does choose to carry out further studies of the “uncategorized risk” strings, it is imperative that it avoid using any data collected after the Interisle study was commissioned. There is a very strong likelihood that such “new” data has been, or will be, gamed or compromised. Some important points to consider:

a) The Internet community and any interested members of the general public are now aware that query counts have been used to determine the level of risk that any proposed gTLD poses

b) They are also aware of the exact list of proposed gTLDs for which further studies of data will be carried out in order to ascertain refined risk levels

c) It is extremely easy to purchase software or a service that can send a very large number of targeted queries, from multiple sources, for particular non-existent strings

d) Several parties have vested interests in, or stand to benefit from, the delay or potential non-delegation of the so-called “riskier” strings

e) It would therefore be (or already is) very easy for any newer data to be manipulated for private gain

In light of the above, ICANN should require all future studies to use only data collected before the Interisle study was commissioned, since that data is more likely to represent actual usage of the proposed gTLDs on internal networks.
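The recommendation amounts to a hard cutoff on collection time. A minimal sketch, assuming each query record carries a timezone-aware timestamp; the record type, the helper name, and the cutoff date used in the demo are all hypothetical placeholders (the actual commissioning date of the Interisle study is not given here).

```python
from datetime import datetime, timezone
from typing import Iterable, List, NamedTuple, Tuple


class TimedQuery(NamedTuple):
    """A query record with the time it was observed (hypothetical layout)."""
    timestamp: datetime
    source_ip: str
    qname: str


def split_at_cutoff(
    queries: Iterable[TimedQuery], cutoff: datetime
) -> Tuple[List[TimedQuery], List[TimedQuery]]:
    """Separate queries seen strictly before the cutoff from those seen after."""
    before, after = [], []
    for q in queries:
        (before if q.timestamp < cutoff else after).append(q)
    return before, after


if __name__ == "__main__":
    # Placeholder cutoff, NOT the real commissioning date of the study.
    cutoff = datetime(2013, 6, 1, tzinfo=timezone.utc)
    log = [TimedQuery(datetime(2013, 3, 10, tzinfo=timezone.utc), "10.1.1.1", "files.bio")]
    # A burst of synthetic queries injected from many sources after the cutoff.
    log += [
        TimedQuery(datetime(2013, 8, 20, tzinfo=timezone.utc), f"198.51.100.{i}", f"x{i}.bio")
        for i in range(50)
    ]
    kept, dropped = split_at_cutoff(log, cutoff)
    print(len(kept), len(dropped))  # 1 50
```

Because the cutoff is fixed in the past, no volume of queries sent today, from however many sources, can change the dataset that a future study is allowed to use.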

3. The threshold for dividing strings into “low risk” and “uncategorized risk” is arbitrary

Section 8.3.1 of the Interisle report suggests that one way to set the threshold for dividing strings into “low risk” and “uncategorized risk” could be by reference to the number of queries for existing TLDs that have empty zone files. The report mentions two such existing TLDs: .sj and .bv, both ccTLDs associated with mostly uninhabited Norwegian territories. ICANN simply picked .sj and decided that every proposed string that appeared in the data stream more frequently than .sj (49,842 queries) should be classified as “uncategorized risk”.

The result of this arbitrary selection is that .bio (Rank 281), with 50,000 queries (rounded to the nearest thousand), is on the “uncategorized risk” list and is delayed by 3 to 6 months, whereas .engineering (Rank 282), with 49,000 queries (rounded to the nearest thousand), is on the “low risk” list and can proceed without any significant delay. In other words, .bio may have received only a handful more queries than .engineering: since .bio must have exceeded 49,842 queries and .engineering, rounded to 49,000, had at most 49,499, the gap could be as small as 344 queries. Yet that small difference was enough to make .bio appear more “risky” to ICANN than .engineering.
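The threshold rule and the smallest possible gap between .bio and .engineering can be checked with a few lines of Python. This is a sketch under stated assumptions: the published counts are taken to be rounded half-up to the nearest thousand, and "more frequently than .sj" is read as strictly greater than 49,842. The function names are illustrative only.

```python
THRESHOLD_SJ = 49_842  # queries observed for .sj, as cited above


def category(query_count: int, threshold: int = THRESHOLD_SJ) -> str:
    """Classify a string the way the text describes: strictly above .sj is not 'low risk'."""
    return "uncategorized risk" if query_count > threshold else "low risk"


def rounding_bounds(rounded: int, step: int = 1000) -> tuple:
    """Smallest and largest integer counts that round half-up to `rounded`."""
    half = step // 2
    return rounded - half, rounded + half - 1


def minimum_gap() -> int:
    """Smallest possible difference between the .bio and .engineering counts."""
    bio_lo, _ = rounding_bounds(50_000)   # .bio: 49,500 .. 50,499
    _, eng_hi = rounding_bounds(49_000)   # .engineering: 48,500 .. 49,499
    bio_min = max(bio_lo, THRESHOLD_SJ + 1)  # .bio was above the threshold
    eng_max = min(eng_hi, THRESHOLD_SJ)      # .engineering was not
    return bio_min - eng_max


if __name__ == "__main__":
    print(minimum_gap())  # 344
```

A difference of a few hundred queries out of roughly 50,000 (under 1%) is what separates a 3 to 6 month delay from none, which is the point of calling the cut-off arbitrary.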

Further evidence that this division is unjustified comes from DigiCert, one of the world’s largest certificate authorities (CAs) and a founding member of the CA/Browser Forum. In a presentation held specifically to discuss the Name Collision issue on 22 August 2013, DigiCert’s associate general counsel, Jeremy Rowley, stated that in DigiCert’s opinion placing 20% of all proposed strings in the “uncategorized risk” segment was unnecessary (see Slide 9 of the presentation). In fact, it was stated verbally that only the top 14 proposed gTLDs in the Interisle report are substantially risky. It is an opinion that bears thinking about.

Stay tuned for Part 2 of this post!



Comments

Quality Kevin Murphy – Sep 4, 2013 12:03 AM

The Interisle report does look at the source IP addresses and the second-level domains queried, doesn’t it?


Shweta Sahjwani – Sep 4, 2013 5:29 AM

Hi Kevin, The Interisle report does give counts of source IP addresses for the top 35 ranked proposed TLDs. Not the rest. My intention was to convey that ICANN staff should have taken such counts (for all strings) into consideration before making their sweeping recommendations for hundreds of strings together. Apologies for the incorrect wording. Thanks, Shweta


Lyman Chapin – Sep 4, 2013 7:14 PM

Hi Shweta -

The Interisle report makes the same argument that you do in your point 1 in as many places as we could—for example, in the Executive Summary: “The risk associated with delegating a new TLD label arises from the potentially harmful consequences of name collision, not the name collision itself”; and in Section 8.3.3: “Properly calculating the risk of delegating a proposed TLD in this category [Calculated Risk] would require an investigation of the context(s) in which the corresponding string is currently used and the circumstances under which it might collide with a syntactically identical delegated TLD.”

And a nit: with respect to your point 3, we used the terms “low risk” and “calculated risk” (not “uncategorized risk”).


Shweta Sahjwani – Sep 5, 2013 5:02 AM

Hi Lyman, Point well taken. I wouldn't take anything away from the Interisle team's output. It did a great job within the limited time frame and scope offered to it. I believe that at some level, even ICANN staff is probably aware that query counts should not be the only metric used to make such far reaching decisions. Evidently, my point is reinforced by the fact that the Interisle report also says that there is a requirement for investigating the context(s) in which strings are currently used among other things. Ideally, ICANN staff should have considered this before announcing their "let's delay 279 strings" plan. Thanks, Shweta

