
Unregistered Gems Part 6: Phonemizing Strings to Find Brandable Domains

The UnregisteredGems.com series of articles explores a range of techniques for filtering and searching the universe of unregistered domain names, in order to find examples which may be compelling candidates for entities looking to select a new brand name (and its associated domain). The previous instalment of the series[1] looked at categorising candidate names according to the phonetic characteristics of their constituent consonants, using a simple mapping from each consonant letter to a single corresponding phonetic group.

In this study, I explore the use of a more formal phonetic representation of each string, obtained by converting it to the IPA (International Phonetic Alphabet)[2]. This has a number of advantages over the previous approach, including the ability to handle context-dependent pronunciation of particular characters, the handling of character combinations, and the ability to generalise the approach to strings of arbitrary length and consonant/vowel pattern.

Framework

As in the previous study, the strings are classified according to the phonetic categories of their constituent consonants, with all vowel sounds combined into a single group. This approach rests on the assertion that the consonants make up the core 'structure' of a word, and it avoids having to handle the more complex nature of vowel sounds, such as diphthongs, variations in length (i.e. 'long' vs 'short' sounds), and the effect of the speaker's accent (noting that the IPA conversion tool used is based on American English).

The consonant sounds are divided into the following groupings, again following the framework used in the previous study, with the phoneme symbols taking their usual IPA meanings[3, 4].

Table 1: Groupings assigned to individual consonant phonemes as used in the analysis

| Top-level group | Group | Type | Consonant phonemes |
|---|---|---|---|
| 1 (plosive) | 1A | Bilabial plosive | b, p |
| 1 (plosive) | 1B | Alveolar plosive | d, t, ɾ |
| 1 (plosive) | 1C | Velar plosive | ɡ, k |
| 2 (nasal) | 2A | Bilabial nasal | m |
| 2 (nasal) | 2B | Alveolar nasal | n |
| 2 (nasal) | 2C | Velar nasal | ŋ |
| 3 (fricative) | 3A | Labiodental fricative | f, v |
| 3 (fricative) | 3B | Dental fricative | θ, ð |
| 3 (fricative) | 3C | Alveolar fricative | s, z |
| 3 (fricative) | 3D | Postalveolar fricative | ʃ, ʒ |
| 3 (fricative) | 3E | Glottal fricative | h |
| 4 (approximant) | 4A | Labial-velar approximant | w |
| 4 (approximant) | 4B | Retroflex approximant | ɹ, r[5] |
| 4 (approximant) | 4C | Palatal approximant | j |
| 5 (lateral approximant) | 5A | Alveolar lateral approximant | l |
| 6 (affricate)[6] | 6A | Postalveolar affricate | ʧ, ʤ |

Any string can then be represented as a 'code' (the 'word type'), comprising the top-level group numbers of its consonants in the order in which they appear in the string, with each vowel sound, or sequence of consecutive vowel sounds, denoted simply as 'V'.

For example, the string 'rolex' has the IPA representation 'ɹoʊlɛks', which is assigned word type 4V5V13 (ɹ → 4, oʊ → V, l → 5, ɛ → V, k → 1, s → 3).
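The encoding step can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes an IPA transcription has already been produced by some external tool (the article does not name the one it used), maps consonant symbols to the top-level groups of Table 1, and treats every other symbol (apart from stress, length and syllable marks) as part of a vowel run. The names `CONSONANT_GROUPS`, `AFFRICATE_DIGRAPHS` and `word_type` are hypothetical.

```python
# Minimal sketch (hypothetical names): turn an IPA transcription into a word-type code.
# Consonant symbols -> top-level groups from Table 1 (ASCII "g" included alongside IPA "ɡ").
CONSONANT_GROUPS = {
    "b": 1, "p": 1, "d": 1, "t": 1, "ɾ": 1, "ɡ": 1, "g": 1, "k": 1,          # plosives
    "m": 2, "n": 2, "ŋ": 2,                                                  # nasals
    "f": 3, "v": 3, "θ": 3, "ð": 3, "s": 3, "z": 3, "ʃ": 3, "ʒ": 3, "h": 3,  # fricatives
    "w": 4, "ɹ": 4, "r": 4, "j": 4,                                          # approximants
    "l": 5,                                                                  # lateral approximant
    "ʧ": 6, "ʤ": 6,                                                          # affricate ligatures
}
# Assumption: many IPA tools write affricates as two symbols rather than as ligatures.
AFFRICATE_DIGRAPHS = {"tʃ": 6, "dʒ": 6}
# Assumption: stress, length and syllable marks carry no consonant/vowel information.
IGNORED = {"ˈ", "ˌ", "ː", "."}
TIE_BAR = "\u0361"  # combining tie bar used in some transcriptions of affricates


def word_type(ipa: str) -> str:
    """Return the word-type code for an IPA string, e.g. 'ɹoʊlɛks' -> '4V5V13'."""
    ipa = ipa.replace(TIE_BAR, "")
    code: list[str] = []
    i = 0
    while i < len(ipa):
        digraph = ipa[i:i + 2]
        if digraph in AFFRICATE_DIGRAPHS:
            code.append(str(AFFRICATE_DIGRAPHS[digraph]))
            i += 2
            continue
        symbol = ipa[i]
        i += 1
        if symbol in IGNORED:
            continue
        if symbol in CONSONANT_GROUPS:
            code.append(str(CONSONANT_GROUPS[symbol]))
        elif not code or code[-1] != "V":
            # Any other symbol is treated as a vowel; consecutive vowels collapse to one 'V'.
            code.append("V")
    return "".join(code)


print(word_type("ɹoʊlɛks"))  # 4V5V13
```

Treating "anything that is not a listed consonant" as a vowel is a deliberate simplification: it means r-coloured vowels such as ɚ and ɝ (common in American English transcriptions) fall into the 'V' group rather than group 4.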

Analysis

As in the previous study, it is informative to consider the same set of the 2,000 most popular names with a 5-character second-level domain (SLD, i.e. the part of the domain name to the left of the dot) offered for sale on the domain marketplace Atom.com[7]. By virtue of their inclusion on this marketplace, these names have independently been deemed attractive from a brandability point of view. The aim is to determine any patterns or common word types within this dataset.

There are 627 distinct word types represented in this dataset, noting that there are 7 distinct groups into which the phonemes can be assigned (the six consonant groups plus the vowel group 'V'; cf. 6 in the previous study), and that here there is no upper limit on the length of the word-type code. The top ten are shown in Table 2.

Table 2: Top ten word types represented in the dataset of the 2,000 most popular 5-character (made-up, up to two-syllable) names on the Atom.com domain marketplace

| Word type | No. domains |
|---|---|
| 3V13V | 62 |
| 3V3V | 62 |
| 1V13V | 48 |
| 1V3V | 47 |
| 3V1V | 35 |
| 4V3V | 33 |
| 4V13V | 32 |
| 3V35V | 31 |
| 3V23V | 30 |
| 1V1V | 29 |
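Producing a ranking like Table 2 is a frequency count over the word types of all names in the dataset. The sketch below assumes each name's word type has already been computed from its IPA transcription, and uses Python's `collections.Counter`. The function name `summarise_word_types` is hypothetical, and the toy input uses a few word types given in this article purely for illustration; it is not the Atom.com dataset.

```python
from collections import Counter


def summarise_word_types(word_types: dict[str, str], top_n: int = 10):
    """Return (number of distinct word types, the top_n most common with their counts).

    `word_types` maps each name (SLD) to its precomputed word-type code.
    """
    counts = Counter(word_types.values())
    return len(counts), counts.most_common(top_n)


# Toy input for illustration only (word types taken from examples in the article).
sample = {"rolex": "4V5V13", "vodzy": "3V13V", "sedza": "3V13V", "zeexo": "3V13V"}
distinct, top = summarise_word_types(sample, top_n=2)
print(distinct, top)  # 2 [('3V13V', 3), ('4V5V13', 1)]
```

`Counter.most_common` orders equal counts by first occurrence, so ties such as 3V13V and 3V3V (both 62 in Table 2) are ranked arbitrarily unless an explicit tie-break is added.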

Accordingly, 62 of the 2,000 domains have (SLD) names which fit the jointly most common word type (3V13V) amongst this set of popular domains; these are listed below.

Word type 3V13V:

  • vodzy
  • hixxi
  • xaxxy
  • hydso
  • votvy
  • zeexo
  • soaxy
  • vybsy
  • fiexa
  • zotvo
  • vapzi
  • vegvy
  • xoxxy
  • hooxo
  • virxi
  • huxxo
  • vebsy
  • xaxor
  • huxxa
  • vitvy
  • xuxxo
  • zogzy
  • vapzy
  • veexy
  • vuxxy
  • cerxa
  • hoxor
  • vuxoo
  • cixxa
  • fauxo
  • huxee
  • zirxo
  • hauxa
  • zetza
  • vudzi
  • phexy
  • zuxxa
  • zepvi
  • vycci
  • foxxu
  • vauxa
  • zatva
  • serxa
  • fotvi
  • suxxa
  • zopzy
  • voixi
  • zopvi
  • fexie
  • suxxo
  • sedza
  • cipza
  • cexxi
  • ciexa
  • fudfy
  • vodvi
  • fabvy
  • zepfy
  • huxey
  • vuxxi
  • vibsi
  • zabvi

Discussion

As discussed in the previous instalment, this type of analysis may be a step towards developing a set of 'guidelines' as to which word types (i.e. sound patterns) might constitute the most preferred names from a brandability point of view. If so, these ideas could be used as a basis for filtering large datasets to identify candidate names of interest. One downside to this approach is that, as with phonotactic analysis[8], the framework presented here requires converting each string to a phonetic representation, which is relatively slow computationally. However, unlike phonotactic analysis, this methodology provides a basis for a more granular clustering of candidate names and, provided the preferred word types are correctly selected, may give a more effective 'mapping' between candidate names and their potential desirability.

If, for example, we assume that word type 3V13V is a 'good' pattern for brandable names, it is informative to investigate its use as a filter. For illustration, consider the set of unregistered .com names of the form CVCCV ('C' = consonant, 'V' = vowel) from the original study in this series, using the subset beginning with the letter 's' (a group 3 sound) as an example. There are 6,044 such names, of which 567 (9.4%) are found to be of word type 3V13V[9]. It might be reasonable to assume that at least some of these are candidates for brandability at least as credible as the Atom.com names listed above. Examples of names in this filtered dataset include sagsy, sedsi, sicsy, sodsy, sudci, suqsy, sybzi, sycci, sygzy, syksi and sytzo.
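The filtering step can be sketched as a cheap letter-based pre-filter (shape CVCCV, first letter 's') followed by the slower phonetic check. This is a hypothetical sketch, not the author's pipeline: `cv_shape` and `filter_by_word_type` are illustrative names, treating 'y' as a vowel is an assumption inferred from examples such as "sagsy" and "sybzi", and `to_word_type` stands in for whatever phonemizer-plus-encoding step is actually used. The stub lookup values are for illustration only.

```python
from typing import Callable, Iterable

# Assumption: 'y' counts as a vowel, as the CVCCV examples (e.g. "sagsy", "sybzi") imply.
VOWEL_LETTERS = set("aeiouy")


def cv_shape(name: str) -> str:
    """Map each letter to 'C' or 'V', e.g. 'sagsy' -> 'CVCCV'."""
    return "".join("V" if ch in VOWEL_LETTERS else "C" for ch in name.lower())


def filter_by_word_type(
    names: Iterable[str],
    to_word_type: Callable[[str], str],
    target: str = "3V13V",
    shape: str = "CVCCV",
    prefix: str = "s",
) -> tuple[list[str], float]:
    """Keep names of the given shape and prefix whose word type equals `target`.

    `to_word_type` is a caller-supplied (hypothetical) function that phonemizes a
    name and returns its word-type code; because phonemizing is the slow step,
    it is only applied to names that pass the cheap letter-based pre-filter.
    Returns the matching names and their share of the pre-filtered pool.
    """
    pool = [n for n in names if n.startswith(prefix) and cv_shape(n) == shape]
    hits = [n for n in pool if to_word_type(n) == target]
    share = len(hits) / len(pool) if pool else 0.0
    return hits, share


# Stub lookup standing in for a real phonemizer (illustrative values only).
stub = {"sagsy": "3V13V", "sedsi": "3V13V", "salma": "3V52V"}
hits, share = filter_by_word_type(["sagsy", "sedsi", "salma", "tapsa"], stub.__getitem__)
print(hits, round(share * 100, 1))  # ['sagsy', 'sedsi'] 66.7
```

Applied to the figures quoted above, the share is 567 / 6,044 ≈ 0.094, i.e. the 9.4% reported.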

By David Barnett, Brand Protection Strategist at Stobbs
