Part 3 of this series on selecting a back-end registry service provider focuses on Whois and sharing data in new gTLDs (see parts 1 and 2).
If you’ve ever looked up information about a domain name, you’ve used a Whois service. It’s the public system that publishes contact information for domain names and IP addresses, though this article covers only domain name Whois.
In some generic and sponsored Top Level Domains (gTLDs), Whois is run authoritatively by the gTLD. In older gTLDs such as .com and .net, the authoritative Whois service is run by the registrar responsible for the domain name. While some TLD operators run their own infrastructure, when a TLD operator uses a back-end service provider, that provider also provides the TLD Whois service. This public information system is of interest to law enforcement agencies and bodies, attorneys and courts, those studying the commerce of domain names, and those trying to address technical administration issues. It is typically operated as an open system that anyone can query. However, that very well could change over time and in certain circumstances, as I explain later in this article.
How you access Whois
Most people query Whois information through a web-based query page, usually on a registrar’s Whois website or on a gTLD’s own site. The information returned is typically only relevant to the gTLDs the registrar offers, but there are also more generic Whois query tools. (Whois also offers a machine-level interface for queries on port 43, which is what those nice web-based Whois query pages are actually talking to behind the scenes.)
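To make the port 43 interface concrete, here is a minimal Python sketch of a Whois client. It assumes the usual exchange on that port: the client sends the domain name followed by a carriage return and line feed, and the server writes its reply and closes the connection. The function names and the timeout value are illustrative choices, not part of any provider's tooling.

```python
import socket


def build_query(domain: str) -> bytes:
    """Encode a Whois query: the domain name terminated by CRLF."""
    return domain.strip().encode("ascii") + b"\r\n"


def whois_query(domain: str, server: str, port: int = 43,
                timeout: float = 10.0) -> str:
    """Send one query to a Whois server and return the full text reply.

    The server signals the end of the reply by closing the connection,
    so we read until recv() returns no more data.
    """
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_query(domain))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

A call such as whois_query("example.com", "whois.verisign-grs.com") would query the .com registry's Whois server. The output format differs from one registry or registrar to the next, so any parsing built on top of this should be treated as provider-specific.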
What does Whois provide?
The content returned in different registry and registrar implementations of Whois can vary in how the output is displayed, but they are all more or less providing the same area of information. Whois services return contact information by area of responsibility for the domain: technical, administrative, billing or finance, along with the registrant of the domain, the registrar, and registration date. The contact information itself typically contains items such as name, address, email and phone numbers.
Would you like your Whois, thick or thin please?
Thick Whois: Gathered by the registrar during the registration process, this information is stored in the registry of the gTLD operator, which is responsible for ensuring the data is valid. Most gTLDs conduct periodic Whois compliance audits rather than a complete real-time validation of submitted Whois data. Even if a gTLD offers thick Whois, registrars are required to maintain their own Whois service for their domain names. Since the registrar has the ability to update the related Whois information in the actual registry in near real-time, it is expected that the registrar maintain synchronized Whois data output between what their Whois service offers and what the gTLD Whois service offers.
Thin Whois: This really only applies to .com, .net and certain ccTLDs, where the registry’s Whois offers much less information about the domain. Its primary value is to point to the registrar’s Whois service, where one should expect to find the detailed Whois data we see in a thick Whois. In this model, the registrar’s Whois service is authoritative and must remain in compliance with ICANN’s Whois data output requirements.
Why is this model not as desirable? It comes down to compliance monitoring. It’s easier to hold a number of gTLDs accountable for Whois compliance under the thick model than to run periodic audits on many registrars for the many gTLDs they may service.
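The thin model's "pointer" to the registrar can be illustrated with a short Python sketch that pulls the referral out of a thin reply. It assumes the reply carries a "Registrar WHOIS Server:" line, as .com and .net output commonly does. Other registries may label the field differently, and the sample reply below is invented for illustration.

```python
import re
from typing import Optional

# Matches a referral line such as "Registrar WHOIS Server: whois.example".
# The field label is an assumption about the thin registry's output format.
_REFERRAL = re.compile(
    r"^[ \t]*Registrar WHOIS Server:[ \t]*(\S+)[ \t\r]*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_registrar_whois(thin_response: str) -> Optional[str]:
    """Return the registrar Whois host named in a thin reply, if any."""
    match = _REFERRAL.search(thin_response)
    return match.group(1).lower() if match else None


# Hypothetical thin reply, trimmed to the relevant fields.
SAMPLE_THIN_REPLY = (
    "   Domain Name: EXAMPLE.TEST\r\n"
    "   Registrar WHOIS Server: whois.registrar.example\r\n"
    "   Registrar URL: http://registrar.example\r\n"
)

referral = find_registrar_whois(SAMPLE_THIN_REPLY)
```

A client would then send a second query to the host it found in order to retrieve the contact details, which is why the registrar's service is the authoritative one in this model.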
Privacy
Local privacy laws and practices in a global operating environment remain a challenge. Requiring full public Whois output can violate privacy rights of the region/jurisdiction where a registrant operates.
ICANN allows for exceptions to their requirements for thick Whois contact data where local laws contravene those requirements. This means that, theoretically, a gTLD might have to treat the Whois output of a registrant differently based on their residence or in relationship to the corporate home or operating region of the gTLD itself. It’s clear there will be some variation in the way gTLDs approach Whois output as a result of these issues.
Whois proxy services have been offered by registrars for some time now. These are services that provide indirect contact information for those Whois contact areas previously mentioned. For example, instead of putting the real registrant’s email address, the email address in the Whois output simply may be a forwarded email address. It still allows you to reach the registrant, but likely it’s first filtered by the registrar to see if it’s a valid request related to the domain. This product was born out of domain commerce parties mining Whois output for email contacts and incorporating those emails in various email marketing campaigns—some for legitimate products and some not.
Operating a robust Whois service in the new gTLD environment
Operating a solid thick Whois has a number of upcoming challenges. Whois is frequently a target of companies looking to mine the data. This is done by first downloading daily zone files for a given gTLD, which is free to the requestor and an ICANN required provision by gTLDs. These companies then use automated tools to systematically query the list of active domains and collect contact information for commercial purposes. Unfortunately, Whois queries can be quite small in comparison to the large amount of output the reply generates. This means someone mining Whois can readily apply load on the gTLD Whois servers. In short, an unprotected Whois server is easily knocked over with excessive load.
A good back-end registry service provider will have a plan to address this. Most apply a combination of Anycast network-based Whois services with significant infrastructure capability and, most importantly, a source-based rate-limiting system to control how quickly a data miner can submit automated queries. Ask your back-end registry service provider what they can do for you and make sure those capabilities are reflected in the Abuse and Access policies for your Whois service.
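As a rough illustration of source-based rate limiting, the Python sketch below keeps one token bucket per source address. This is one common way to build such a limiter, not a description of any particular provider's system. The class name, the rate and burst parameters, and the injectable clock are all assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class _Bucket:
    tokens: float   # queries this source may still send right now
    updated: float  # clock reading when the bucket was last refilled


class SourceRateLimiter:
    """Token bucket per source: `rate` queries/second, bursts up to `burst`."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self._buckets: Dict[str, _Bucket] = {}

    def allow(self, source: str) -> bool:
        """Return True if a query from `source` should be answered."""
        now = self.clock()
        bucket = self._buckets.get(source)
        if bucket is None:
            # A new source starts with a full bucket.
            bucket = _Bucket(tokens=float(self.burst), updated=now)
            self._buckets[source] = bucket
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = now - bucket.updated
        bucket.tokens = min(float(self.burst),
                            bucket.tokens + elapsed * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False
```

With rate=1.0 and burst=2, a single source can send two queries back to back and then roughly one per second, while other sources are unaffected. A production service would also need to expire idle buckets and share state across Anycast nodes, which this sketch leaves out.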
A Future for Whois
The changing environment of our Internet is bringing great new opportunities but also new challenges for Whois. For example, one problem is that new Internationalized Domain Name (IDN) TLD registries can’t offer contact information in the native characters those IDN registries support in their domains. Another problem is that traditional source based rate-limiting, currently effective against data-miners, is not effective in the burgeoning new IPv6 number space.
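One way to see the IPv6 problem: a limiter keyed on the exact source address assumes each client has only a few addresses, but a single IPv6 subnet can hold a vast number of them, so a data miner can simply rotate addresses. A possible mitigation, sketched below in Python, is to key the limiter on the enclosing prefix rather than the individual address. The /64 default is an assumption for illustration. The right prefix length depends on how address space is delegated, and too coarse a choice can penalize unrelated users.

```python
import ipaddress


def rate_limit_key(source: str, v6_prefix_len: int = 64) -> str:
    """Return the key under which a source address is rate limited.

    IPv4 sources are keyed by their exact address. IPv6 sources are keyed
    by their enclosing prefix (assumed /64 here), so that rotating through
    addresses inside one subnet does not reset the limit.
    """
    addr = ipaddress.ip_address(source)
    if addr.version == 4:
        return str(addr)
    network = ipaddress.ip_network(f"{addr}/{v6_prefix_len}", strict=False)
    return str(network)


# Two different addresses in the same /64 map to the same key.
key_a = rate_limit_key("2001:db8:1:2::a")
key_b = rate_limit_key("2001:db8:1:2:ffff::1")
```

The key returned here would replace the raw source address wherever a per-source limiter looks up its state.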
Whois capabilities under consideration include tiered, permissioned access to Whois services, with output that varies to reflect the different needs of Whois consumers and localized privacy issues. Both consumers and providers alike have expressed an interest in an industry-wide, standardized Whois output structure for some time.
Work is underway in several areas to address a number of these shortcomings in Whois optional functionality. Some recent examples of these efforts include ICANN’s Internationalized Registration Data Working Group (IRD-WG), various ICANN project groups working on specific IDN TLD implementation script issues, the WHOIS-based Extensible Internet Registration Data Service (WEIRDS) discussion list in the Internet Engineering Task Force (IETF), and ICANN’s Whois Survey Working Group (concerned with Whois functional requirements).
The most important message a potential gTLD applicant can take away on Whois is this: Expect that the once “simple” service will become much more complicated. Anticipated new functionality in Whois and integration of that functionality into your related Abuse and Access policies should be addressed by the back-end service provider you are considering.
Excellent write-up, Michael.
Whois is one of the five critical registry functions that a gTLD registry must provide, but is easily overlooked.
A few notes / nits:
“In older gTLDs such as .com and .net, the authoritative Whois service is ...” shared between the registry’s and registrar’s whois servers. The former serves authoritative data like statuses, sponsoring registrar of record, creation, expiry, updated dates of domains and hosts. The registrar serves the contact data for which it is authoritative.
“an unprotected Whois server is easily knocked over with excessive load” - unless the Whois server was implemented in an inefficient manner (allowing computationally or I/O intensive queries) the only real “load” a provider sees is bandwidth usage, which could well turn into a DoS due to the query-response amplification that you mentioned. Respectable providers should have ample safeguards against it.
Great point about rate-limiting being ineffective for IPv6.
I would also highlight an important takeaway for potential applicants - the need to verify local privacy laws to ensure that your back-end service provider is able to customise its Whois service to ensure compliance.