Co-authored by Dr. David Barnett, Brand Monitoring Subject-Matter Expert and Justin Hartland, Global Director of Account Management at CSC.
A domain name consists of two main elements: the second-level domain name to the left of the dot (often a brand name or relevant keywords) and the domain extension or top-level domain (TLD) to the right of the dot. Domain names are the key elements of the human-readable web addresses that let users access pages on the internet, and they are also used to construct email addresses.
There are different types of TLDs, including generic or global TLDs (gTLDs) that were originally intended to describe the site type, such as .COM for company websites or .ORG for charitable organizations. There are also country-code TLDs (ccTLDs) for specific countries, e.g., .CO.UK for the U.K., .FR for France, etc. Finally, there is a range of new gTLDs that have launched since 2013 [1], usually relating to specific content types, business areas, interests, or geographic locations (e.g., .SHOP, .CLUB, .TOKYO). Each TLD is overseen by a registry organization, which manages its infrastructure.
Domain names are associated with the full spectrum of internet content, from legitimate use by brands or individuals, to infringing or criminal activity. CSC has observed that certain TLDs get used more for egregious content.
There are several possible reasons why particular TLDs are more attractive to infringers, including the cost of domain registration, and difficulties in conducting enforcement (takedown) actions against infringing content. TLDs operated by certain registries, like those offering low- or no-cost domain registrations or those with lax registration security policies, are more likely to be used for infringing activities. Additionally, domain extensions lacking well-defined, reliable enforcement routes like .VN (Vietnam) and .RU (Russia) prove to be especially high risk. Other factors are also significant; for example, a country’s wealth affects the levels of technical expertise of internet service providers (ISPs) and, therefore, the likelihood of domains being compromised.
In this two-part blog post, we aim to quantify the threat levels associated with specific domain extensions, i.e., the likelihood that a domain on a particular TLD might be registered for fraudulent purposes.
Determining the overall threat frequency for each TLD is useful in several ways:
For this first post, we analyzed data from CSC’s Fraud Protection services to uncover the TLDs associated with domains used for phishing activity. The analysis covers all sites detected between November 2021 and April 2022 for those TLDs with more than 10 phishing cases and where domain-based phishing cases were recorded (as opposed to subdomain-based). This yielded results for 115 distinct TLDs.
We also consider how frequently domains across each TLD are associated with threatening content. We do this by expressing the raw number of phishing cases as a proportion of the total number of domains registered across the TLD [2]. We then normalize the data, so the value for the highest-threat TLD is 1, with all other values in that dataset scaled accordingly. It's important to note that this value reflects the proportion of malicious domains across each TLD, rather than absolute numbers. Some TLDs see high absolute numbers of infringements simply because they have very large numbers of domain registrations. Table 1 shows the top 20 TLDs represented in CSC's phishing dataset (by absolute numbers), together with the normalized threat frequencies for these TLDs.
TLD | % of total phishing cases | Total no. of regd. domains across TLD | Normalized threat frequency within dataset |
---|---|---|---|
.COM | 45.7% | 221,858,334 | 0.014 |
.ORG | 6.9% | 15,550,733 | 0.031 |
.APP | 6.2% | 1,155,807 | 0.377 |
.NET | 4.8% | 19,773,315 | 0.017 |
.XYZ | 2.5% | 10,841,304 | 0.016 |
.RU | 2.5% | 10,627,033 | 0.016 |
.CO | 2.1% | 4,110,132 | 0.035 |
.CN | 1.7% | 25,147,816 | 0.005 |
.ME | 1.3% | 1,669,800 | 0.054 |
.DEV | 1.2% | 391,929 | 0.222 |
.BR | 1.2% | 5,519,378 | 0.015 |
.TOP | 1.2% | 8,830,142 | 0.009 |
.IO | 1.1% | 923,588 | 0.085 |
.IN | 1.1% | 3,271,337 | 0.023 |
.PAGE | 1.0% | 368,474 | 0.195 |
.ID | 0.9% | 760,240 | 0.080 |
.ICU | 0.8% | 7,956,385 | 0.007 |
.INFO | 0.8% | 7,852,896 | 0.007 |
.DE | 0.7% | 22,881,115 | 0.002 |
.KE | 0.7% | 165,907 | 0.288 |
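The normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not CSC's actual pipeline: it uses each TLD's rounded percentage share of phishing cases from the tables as a stand-in for the raw case counts (which are not given in this post), so the output may differ slightly from the tabulated values. The function name `normalized_threat_frequency` and the data layout are assumptions made for illustration.

```python
# Illustrative subset of the tables: TLD -> (% of total phishing cases, registered domains).
# The percentages are the rounded published values, used here as a proxy for raw case counts.
SAMPLE = {
    ".COM": (45.7, 221_858_334),
    ".APP": (6.21, 1_155_807),
    ".KE": (0.68, 165_907),
    ".GD": (0.05, 3_306),
}


def normalized_threat_frequency(data):
    """Scale cases-per-registered-domain so the highest-threat TLD scores 1."""
    per_domain = {tld: share / domains for tld, (share, domains) in data.items()}
    top = max(per_domain.values())
    return {tld: rate / top for tld, rate in per_domain.items()}


if __name__ == "__main__":
    scores = normalized_threat_frequency(SAMPLE)
    # Print TLDs from highest to lowest normalized threat frequency.
    for tld, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{tld:<5} {score:.3f}")
```

Because every value is divided by the same maximum, using percentage shares instead of raw counts does not change the normalized result; only the rounding of the published percentages does.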
We’ve observed similar patterns in other analyses of threatening content. Interisle’s “Malware Landscape 2022” study found that the top 10 TLDs associated with malware domains also featured a mix of legacy gTLDs (.COM at position one, .NET at five, .ORG at six, and .BIZ at 10), new gTLDs (.XYZ at position two, .CLUB at seven, and .TOP at nine) and ccTLDs (.BR, .IN, and .RU at positions three, four, and eight, respectively) [3]. Eight of these 10 extensions feature in the top 14 of CSC’s phishing list above. Similarly, the Anti-Phishing Working Group’s (APWG’s) “Phishing Activity Trends Report” for Q4 2021 analyzed top phishing TLDs, with a top nine including new gTLDs .XYZ, .BUZZ, and .VIP, and ccTLDs .BR and .IN, alongside legacy gTLDs.
New gTLDs were more than twice as extensively represented in the dataset as would be expected purely based on the total number of domains registered across these extensions. A Q1 2022 study by Agari™ and PhishLabs also showed similar patterns, where the top 10 TLDs abused by phishing (by number of sites) included the new gTLDs .VIP, .XYZ, and .MONSTER, and ccTLDs .BR, .LY, and .TK [5, 6].
Table 2 shows that the pattern is rather different when looking at the top TLDs by their normalized threat frequency: the list is dominated by a distinct set of ccTLDs and a smaller number of new gTLDs, and it excludes many of the more popular TLDs shown previously.
TLD | Normalized threat frequency within dataset | Total no. of regd. domains across TLD | % of total phishing cases |
---|---|---|---|
.GD | 1.000 | 3,306 | 0.05% |
.GY | 0.910 | 4,037 | 0.05% |
.MS | 0.739 | 9,440 | 0.10% |
.ZM | 0.531 | 4,838 | 0.04% |
.APP | 0.377 | 1,155,807 | 6.21% |
.LY | 0.356 | 25,801 | 0.13% |
.KE | 0.288 | 165,907 | 0.68% |
.DEV | 0.222 | 391,929 | 1.24% |
.PAGE | 0.195 | 368,474 | 1.03% |
.UG | 0.187 | 10,810 | 0.03% |
.SN | 0.187 | 9,842 | 0.03% |
.DO | 0.176 | 30,215 | 0.08% |
.BD | 0.127 | 37,465 | 0.07% |
.SBS | 0.120 | 44,222 | 0.08% |
.NP | 0.112 | 57,379 | 0.09% |
.SH | 0.110 | 25,070 | 0.04% |
.NG | 0.097 | 240,668 | 0.33% |
.IO | 0.085 | 923,588 | 1.11% |
.ID | 0.080 | 760,240 | 0.86% |
.SA | 0.079 | 60,246 | 0.07% |
In the second article in this series, we compare these findings with those from additional datasets to produce an overall measure of TLD threat frequency, considering a range of fraudulent uses. We then consider cybersecurity implications, discuss mediation measures, and cover how CSC can help with this process.
How are we counting domains here? The numbers in your “Total no. of regd. domains across TLD” column appear to be way off if we’re talking about a snapshot at a given moment in time. Verisign has never reported .com numbers as high as 221 million. It’s currently around 160 million and .net is around 13 million. Most of the other domain counts appear to be far too high also.
All overall TLD stats are taken from https://domainnamestat.com/statistics/tldtype/all. Even if their numbers turned out to be consistently (for the sake of argument) ~25% too high, this wouldn't affect the overall findings, since all ratios are normalised anyway.
That doesn't appear to be the case. Your number for .ru, for example, is more than double what the registry reports, while your number for .page is more than five times larger than what the registry reports. Meanwhile, your number for .br only appears to be about 10% off.
I can't vouch for the accuracy of their numbers, but even if they're only broadly correct - to, say, an order of magnitude - it won't significantly change the overall conclusions - particularly in Part 2 of the article, where we combine the findings with those from other independent datasets. Where are you getting your stats from?
I get my stats from the registries. Directly in the case of ccTLDs. Vicariously from ICANN in the case of gTLDs. Why are you basing your analysis on domainnamestat.com? Do you know who runs that site or what their methodology is? I certainly don't. This is pretty basic stuff mate.
Many of those TLD counts are wrong. Not ~25% in error. Simply wrong! These are the domain name counts for .COM and .NET as of this morning.
https://www.verisign.com/en_US/channel-resources/domain-registry-products/zone-file/index.xhtml
The .COM is at 160,593,240 and the .NET is at 13,226,928 domain names. The registry reports for the ICANN gTLDs are available from ICANN’s website.
https://www.icann.org/resources/pages/registry-reports
Many ccTLD registries publish their counts on their websites such as DEnic.
https://www.denic.de/
As Kevin said above, it is pretty basic stuff. The .COM has never been at 221M registrations. Some of the figures for domain name counts are multiples of the actual domain name counts for those TLDs. The claim in the footnotes that the statistics are correct as of June 13th, 2022 is simply wrong. Trying to calculate the frequency of abusive registrations in a TLD generally requires the number of domain names in that TLD.
There are other methods of taking samples of a TLD and checking for the occurrence of abusive registrations in that sample. It can provide valid estimates of abuse in a TLD. I’m not sure that I’ve ever run across a method that calculates the frequency of abusive registrations in non-existent domain names.
.COM is at 160,593,420 this morning (apologies for slight error due to lack of coffee). The active figure, the number of domain names in the .COM zone file, is 158,670,053 and in the .NET zone, the figure is 13,029,731. The number of live domain names in a TLD differs from the overall number because some domain names are going through a deletion cycle or have no associated nameservers. The zone files for the gTLDs are available from ICANN’s CZDS website on approval by the registries.
https://czds.icann.org/home
Those zone files are updated daily.
Most of the ccTLDs do not provide access to their zone files but generally do publish live, monthly, quarterly or yearly statistics on the size of their TLDs.
Obviously for our formal domain-monitoring services at CSC we do use the full zone-file data downloaded from the individual registries.
Really what we were looking for, for this part of this study, was a convenient resource where we could find estimates of the overall size of all TLDs in a single place (which was the rationale for going to domainnamestat.com) - it really isn’t intended as anything more than a ‘quick-and-dirty’ estimate (and clearly the numbers have turned out not to be too robust!) but, as long as the figures are broadly correct to the order of magnitude, the overall findings are not significantly affected. (The footnote denotes that the numbers were consistent with those given on the site on 13-Jun-2022.)
The problem is that some of the figures were wrong even where the domain name counts and zone files were public. The ICANN CZDS provides a single site where all (except .AERO and .POST) zone files can be downloaded. If, as in some cases, the counts are double or more of the domain names that are in the actual zone files, then this will affect the frequency calculations and underestimate the level of abusive registrations.
The .INFO had approximately 3.6M registrations in June 2022 and the figure above is claiming it had 7.85M. The .XYZ had approximately 4.25M and the figure above is for 10.84M. The .TOP had approximately 1.75M in June (1.99M by the end of June 2022) and the figure above has it at 8.83M. The .ICU had approximately 1.09M in the zone in June 2022. The figure above is for 7.96M.
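To put rough numbers on this, the following Python sketch compares the article's table figures with the approximate June 2022 counts quoted above. The names are illustrative, the "actual" values are only the rounded approximations given in this comment (not exact zone file counts), and the .ICU figure is assumed to be 1.09M.

```python
# TLD -> (count used in the article's table, approximate June 2022 count quoted above)
COUNTS = {
    ".INFO": (7_852_896, 3_600_000),
    ".XYZ": (10_841_304, 4_250_000),
    ".TOP": (8_830_142, 1_750_000),
    ".ICU": (7_956_385, 1_090_000),  # assumes the quoted "1.09" means 1.09M
}


def overcount_factor(table_count, actual_count):
    """How many times larger the table figure is than the actual count.

    A per-domain abuse rate computed with the table figure is too low
    by this same factor, because the count is the denominator.
    """
    return table_count / actual_count


if __name__ == "__main__":
    for tld, (table_count, actual_count) in COUNTS.items():
        factor = overcount_factor(table_count, actual_count)
        print(f"{tld:<6} overcounted by a factor of {factor:.1f}")
```

Since the factors differ from one TLD to the next, they do not cancel out when the ratios are normalized.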
The ICANN registry reports also seem to have an error on .APP compared to the zone file. (I ETLed the complete ICANN registry reports set from July 2001 to September 2022 as part of comparing the data with the ICANN Open Data Project dataset and for market analysis work.) There is always a difference between the registry gTLD total in the reports and the count in the zone files for active gTLDs but some of the counts in the figures above are multiples of the actual zone file counts. That makes the frequency calculations highly problematic. A lot of abusive registrations shifted from .COM to the heavily discounted new gTLDs. Those abusive registrations typically last for one year and are not renewed because in the heavy discounting model, the first renewal fee is at full fee whereas the discounted fee might only have been $1 or less. There was a very good paper by SIDN Labs (affiliated with the Dutch .NL registry) on this a few years ago. Free or heavily discounted TLDs are always going to attract bad actors because they change the economics of the activity. That is one of the main factors in abusive registrations and DNS Abuse.
These are interesting articles and provide a lot of food for thought.