At GoDaddy, we have developed a new machine learning system that beats human experts at predicting aftermarket domain name sale prices. This post dives into some of the technical details of how we use deep neural networks for this prediction task. We start by exploring some of the challenges of domain valuation in the aftermarket.
What is a name worth?
This can be a very difficult question to answer, but it is one that domain name investors must answer every day. Comparing domain valuation with stock valuation shows why. With stocks, you have vast amounts of data about a company’s financial performance; with a domain name, you have at most 63 characters. With stocks, there are millions of shares and a liquid market with continuous bid and ask quotes; with a domain name, there is only one domain, and once it is sold it may never be available again. In some sense, buying domain names is closer to buying unique pieces of art. It is a market dominated by experts who have developed the skill to value domain names through years of experience.
A simple answer to the question of what a name is worth is: whatever a buyer is willing to pay for it. In domain name investing, however, there are two types of buyers. The first is the domain name investor. Investors look for good value so they can resell a name, and they typically either register new domain names or buy on auction websites, such as GoDaddy Auctions, where prices are much lower. The second type is an individual or business intending to use the domain name to build a website. If this type of buyer is willing to buy a premium name (as opposed to finding an unregistered one), they may be willing to spend orders of magnitude more than a domain name investor. However, for any specific name, there is a low probability that such a buyer will appear. In some sense, what many domain name investors are trying to estimate is: 1) the probability that there will be a buyer within a specific time period, and 2) how much that buyer will be willing to pay if she or he appears. The investor will pay far less for a name because she or he must use the proceeds from the small percentage of the portfolio that sells to fund the names that don’t sell.
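The two quantities investors estimate combine into an expected value. The sketch below is a simplified illustration, not GoDaddy's model: the function names and the probability, price, and holding-cost figures are all hypothetical, and it ignores commissions and the time value of money.

```python
def expected_revenue(p_sale: float, end_user_price: float) -> float:
    """Expected proceeds from one name: P(end-user buyer appears) x price that buyer pays."""
    return p_sale * end_user_price


def break_even_cost(p_sale: float, end_user_price: float, holding_cost: float = 0.0) -> float:
    """Most an investor could pay for a name and still break even on average.

    Only a small fraction of a portfolio sells, so each sale must also cover the
    names that never sell; the p_sale factor spreads that over the portfolio.
    holding_cost stands in for renewal fees over the same period (an assumption;
    commissions and the time value of money are ignored).
    """
    return expected_revenue(p_sale, end_user_price) - holding_cost


# Hypothetical inputs: a 2% chance that an end user pays $5,000 within the period,
# and $10 of renewal fees per name over that period.
print(break_even_cost(0.02, 5000.0, holding_cost=10.0))
```

With these made-up numbers, a name an end user might buy for $5,000 is worth only about $90 to an investor, which illustrates why investor prices and end-user prices can differ by orders of magnitude.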
Different sellers have distinctive strategies when listing domain names. The following table shows the average ask price of the top ten sellers (by number of priced listings) on Afternic. It also includes a portfolio quality score (normalized predicted sale price) generated by the new GoDaddy Valuation model, which will be discussed in more detail below.
[Table: Average Ask Price, Portfolio Quality, and Ask Price / Quality for the top ten Afternic sellers]
Average ask prices vary by almost two orders of magnitude between large sellers. Some sellers price their domains high: they make fewer sales, but each sale is large. Others focus on volume and price their domains lower. There is a complex trade-off between sales magnitude and sales velocity. This variance in strategy, combined with haggling during the sales process, makes pricing domains especially difficult. Ask each of these sellers what a name is worth and you may get ten different answers.
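The Ask Price / Quality column normalizes each seller's prices by the quality of their portfolio. A minimal sketch of that calculation from raw listings follows; the input format and the normalization (dividing a seller's mean predicted price by the mean predicted price across all listings) are assumptions, since the exact normalization behind the portfolio quality score is not described here.

```python
from collections import defaultdict
from statistics import fmean


def seller_pricing_summary(listings):
    """listings: iterable of (seller, ask_price, predicted_sale_price) tuples.

    Returns {seller: (average ask, portfolio quality, ask / quality)}, where
    portfolio quality is the seller's mean predicted price divided by the mean
    predicted price over all listings (an assumed normalization).
    """
    listings = list(listings)
    overall = fmean(pred for _, _, pred in listings)
    by_seller = defaultdict(list)
    for seller, ask, pred in listings:
        by_seller[seller].append((ask, pred))
    summary = {}
    for seller, rows in by_seller.items():
        avg_ask = fmean(ask for ask, _ in rows)
        quality = fmean(pred for _, pred in rows) / overall
        summary[seller] = (avg_ask, quality, avg_ask / quality)
    return summary


# Hypothetical listings: seller A prices high, seller B prices for volume.
print(seller_pricing_summary([
    ("A", 5000.0, 800.0), ("A", 3000.0, 600.0),
    ("B", 200.0, 400.0), ("B", 100.0, 200.0),
]))
```

A high ask/quality ratio marks a seller who prices aggressively relative to what the names are predicted to sell for, separating pricing strategy from portfolio quality.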
Valuation with machine learning
Machine learning has been applied to domain valuation before. In 2012, GoDaddy released its first attempt at automated domain valuation; the system performed poorly and was not well received. The automated valuation system most popular with domain investors is Estibot, but most investors are not satisfied with its performance either. It is often used to spot outliers in hand-set prices rather than as a primary source of valuations. Part of the issue is the tool's error rate; another major issue is that each seller has their own strategy and view of value, so there is no single valuation that will make every investor happy.
Since May 2016, GoDaddy has been working on a novel approach to domain valuation. It uses deep learning to leverage the vast amount of training data available only to GoDaddy as the world’s largest domain name marketplace. Modern deep learning techniques combined with unrivaled amounts of data have allowed GoDaddy to advance the state of the art in automated domain valuation.
To measure how well the new GoDaddy Valuation works, we asked four industry experts to predict the sale prices of 1,000 Afternic sales from late 2015. These sales were held out from the model during training. We also compared against Estibot and the prior GoDaddy model. The following chart compares the mean error of the human and machine predictions:
Measured by mean error, the new GoDaddy model is 1.3x better than the best of the four human experts, 1.5x better than Estibot, and 1.6x better than the average of all four human experts. Other metrics, such as root mean squared error, median error, explained variance, R² score, and the ability to sort domains by value, lead to similar conclusions.
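For readers who want to run a similar comparison, the metrics mentioned above can be computed in a few lines of standard-library Python. This is a sketch: the post does not say whether errors were measured on raw or log prices, so the code uses raw prices, and reading "1.3x better" as a ratio of mean errors is an assumption. The helper names are illustrative.

```python
import math
import statistics
from typing import Dict, Sequence


def regression_report(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Common error metrics for price predictions (computed on raw prices here)."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mean_true = statistics.fmean(y_true)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mean_abs_error": statistics.fmean(abs_errors),
        "rmse": math.sqrt(ss_res / len(errors)),
        "median_abs_error": statistics.median(abs_errors),
        "r2": 1.0 - ss_res / ss_tot,
        # Explained variance ignores a constant bias in the predictions; R2 does not.
        "explained_variance": 1.0 - statistics.pvariance(errors) / statistics.pvariance(y_true),
    }


def pairwise_order_accuracy(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of pairs with different true prices that the predictions order correctly."""
    correct = total = 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue
            total += 1
            if (y_pred[i] - y_pred[j]) * (y_true[i] - y_true[j]) > 0:
                correct += 1
    return correct / total if total else float("nan")


def improvement_factor(baseline_error: float, model_error: float) -> float:
    """'Nx better' read as baseline error divided by model error (an assumed reading)."""
    return baseline_error / model_error


# Hypothetical sale prices and predictions.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 330.0, 380.0]
print(regression_report(actual, predicted))
print(pairwise_order_accuracy(actual, predicted))
```

The pairwise ordering score captures the "ability to sort domains by value" independently of the price scale, which matters when sellers disagree on absolute prices but agree on which names are better.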
How does it work?
The first step is to tokenize the domain name into a sequence of words. This can be ambiguous: for example, bostonspark.com could be either Bostons Park or Boston Spark. To build a robust tokenizer, we need training data that provides the true tokenization of domain names. To get this data, we crawled the web and used the content of each website to infer the correct tokenization of its domain name. We use this data to build a vector representation of words and a tokenization model that correctly resolves ambiguity and handles obscure words and phrases. The tokenization and word vectors also serve as inputs to our domain valuation model.
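GoDaddy's tokenizer is learned from crawled web content, and its internals are not described here. As a stand-in, the classic dynamic-programming approach to word segmentation illustrates how an ambiguous label like bostonspark can be resolved by scoring every candidate split. The word counts below are hypothetical; in practice they would come from a large corpus such as the crawled site content described above.

```python
import math
from typing import List

# Hypothetical unigram counts; a real system would estimate these from a large corpus.
WORD_COUNTS = {
    "boston": 900, "bostons": 20, "spark": 400, "park": 600,
    "car": 800, "pet": 300, "carpet": 200, "s": 1,
}
TOTAL = sum(WORD_COUNTS.values())
MAX_WORD_LEN = max(len(w) for w in WORD_COUNTS)


def word_logprob(word: str) -> float:
    """Log probability of a word; unknown strings get a penalty that grows with length."""
    count = WORD_COUNTS.get(word)
    if count is None:
        return math.log(1.0 / (TOTAL * 10 ** len(word)))
    return math.log(count / TOTAL)


def segment(label: str) -> List[str]:
    """Highest-scoring split of `label` into words, found by dynamic programming."""
    n = len(label)
    # best[i] = (best total log probability of label[:i], start index of its last word)
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        # Only look back MAX_WORD_LEN characters; single characters are always allowed.
        for start in range(max(0, end - MAX_WORD_LEN), end):
            score = best[start][0] + word_logprob(label[start:end])
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk the back-pointers from the end of the label to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(label[start:end])
        end = start
    return list(reversed(words))


print(segment("bostonspark"))
```

With these counts, Boston Spark wins because boston and spark are both far more frequent than bostons. If the training data showed Bostons Park more often, the same code would choose that split, which is why good training data matters more than the search itself.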
The next issue that the model must solve is modeling the context of a sale. Just as different sellers have drastically different strategies, different marketplaces also have very different pricing trends that can change over time. For example, GoDaddy Auctions, used mostly by domain investors, has an average sale price of $170, while the Afternic reseller network has an average sale price of over $1,500. These differences in sale context can often account for more of the differences in price than the names themselves.
We simultaneously model the sale context's contribution to the price and the domain name's contribution to the price. Modeling sale context lets us train on data from a wide variety of sellers and contexts and extract meaningful insights about domain prices across data sources. It also allows our model to output different price predictions for almost 300 different contexts in which the sale might happen.
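One simple way to picture modeling context and name simultaneously is an additive model in log-price space: each context gets its own offset, and the name gets a score from its features, so the same name can be priced under any context. This is an illustrative sketch on synthetic data, not GoDaddy's model; the two contexts, the two binary features, and their true effects are hypothetical, and plain stochastic gradient descent stands in for whatever training procedure the real system uses.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Row = Tuple[Sequence[float], str, float]  # (name features, sale context, sale price)


def train_additive_model(rows: List[Row], contexts: Sequence[str], n_features: int,
                         epochs: int = 1000, lr: float = 0.05, seed: int = 0):
    """Fit log(price) ~= context_bias[context] + weights . features jointly by SGD."""
    rng = random.Random(seed)
    context_bias: Dict[str, float] = {c: 0.0 for c in contexts}
    weights = [0.0] * n_features
    rows = list(rows)
    for _ in range(epochs):
        rng.shuffle(rows)
        for features, context, price in rows:
            pred = context_bias[context] + sum(w * x for w, x in zip(weights, features))
            err = pred - math.log(price)
            # The same error updates both the context offset and the name weights.
            context_bias[context] -= lr * err
            for k in range(n_features):
                weights[k] -= lr * err * features[k]
    return context_bias, weights


def predict_price(context_bias, weights, features, context) -> float:
    return math.exp(context_bias[context] + sum(w * x for w, x in zip(weights, features)))


# Synthetic data from a known additive model (all numbers hypothetical).
TRUE_CONTEXT = {"auction": math.log(100.0), "reseller": math.log(1000.0)}
TRUE_WEIGHTS = [1.0, 0.5]  # effects of [is_dictionary_word, is_short]
rows = []
for ctx, base in TRUE_CONTEXT.items():
    for x in ([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
        log_price = base + sum(w * xi for w, xi in zip(TRUE_WEIGHTS, x))
        rows.append((x, ctx, math.exp(log_price)))

bias, weights = train_additive_model(rows, list(TRUE_CONTEXT), n_features=2)
name = [1.0, 1.0]
print(predict_price(bias, weights, name, "auction"), predict_price(bias, weights, name, "reseller"))
```

The same name comes out roughly ten times more valuable in the hypothetical reseller context, purely because of the learned context offset. Because offsets and name weights are fit jointly, sales from cheap and expensive venues can share one model of the name; with almost 300 contexts, the offset table would simply have almost 300 entries.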
A large amount of feature engineering also went into building the model. Many of the model’s features come from the words, the characters, and the top-level domain (TLD). These include detailed dictionaries spanning six languages, as well as features that detect places, people, and things using data from external sources such as Wikipedia. We also use information about how similar domains with different TLDs are being used. In total, hundreds of different features go into the model.
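Feature extraction of this kind is mostly bookkeeping. The sketch below shows the flavor of per-word and domain-level features; every word list and feature name, and the other_tlds_in_use input, are hypothetical stand-ins for the six-language dictionaries, Wikipedia-derived entity data, and cross-TLD usage signals described above.

```python
from typing import Dict, List

# Hypothetical resources; the real model uses dictionaries in six languages
# and entity data from external sources such as Wikipedia.
DICTIONARIES = {
    "en": {"boston", "spark", "park", "car", "pet"},
    "es": {"casa", "perro", "parque"},
}
KNOWN_PLACES = {"boston", "paris"}
TLDS = ["com", "net", "org", "io"]


def word_features(token: str) -> List[float]:
    """Per-word features: length, all-digit flag, place flag, one flag per dictionary."""
    return [
        float(len(token)),
        float(token.isdigit()),
        float(token in KNOWN_PLACES),
    ] + [float(token in words) for words in DICTIONARIES.values()]


def domain_features(domain: str, tokens: List[str], other_tlds_in_use: int = 0) -> Dict[str, float]:
    """Domain-level features. Assumes a single-label TLD (e.g. .com, not .co.uk).

    other_tlds_in_use is a hypothetical count of the same label in use under other TLDs.
    """
    label, _, tld = domain.lower().rpartition(".")
    feats = {
        "label_length": float(len(label)),
        "num_tokens": float(len(tokens)),
        "has_digit": float(any(ch.isdigit() for ch in label)),
        "has_hyphen": float("-" in label),
        "other_tlds_in_use": float(other_tlds_in_use),
    }
    for lang, words in DICTIONARIES.items():
        hits = sum(t in words for t in tokens)
        feats[f"dict_{lang}_fraction"] = hits / max(len(tokens), 1)
    for t in TLDS:
        feats[f"tld_{t}"] = float(tld == t)  # one-hot TLD indicator
    return feats


print(domain_features("bostonspark.com", ["boston", "spark"], other_tlds_in_use=3))
print([word_features(t) for t in ["boston", "spark"]])
```

Splitting features this way mirrors the architecture described next: per-word features travel with each word through the sequence model, while domain-level features join later.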
To put all of this together, we use a recurrent neural network (RNN) to process the words and per-word features of a domain. The output of this RNN feeds into a fully connected neural network, which also takes in the sale context and the domain-level features. The following figure shows a simplified structure of our neural network:
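A minimal forward-pass sketch of this layout follows, in dependency-free Python. It assumes a plain Elman RNN cell, a one-hot context vector, one ReLU hidden layer, and a single log-price output; the cell type, layer sizes, and output scale of the production network are not specified here, and the weights below are random rather than trained.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small uniform random weights (untrained; for illustrating the data flow only)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class DomainValuationNet:
    """Simplified sketch: an Elman RNN over per-word vectors feeding a dense network."""

    def __init__(self, word_dim: int, hidden_dim: int, num_contexts: int,
                 domain_dim: int, dense_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.num_contexts = num_contexts
        self.w_xh = rand_matrix(hidden_dim, word_dim, rng)    # word input -> hidden
        self.w_hh = rand_matrix(hidden_dim, hidden_dim, rng)  # previous hidden -> hidden
        self.b_h = [0.0] * hidden_dim
        self.dense_in = hidden_dim + num_contexts + domain_dim
        self.w_1 = rand_matrix(dense_dim, self.dense_in, rng)
        self.b_1 = [0.0] * dense_dim
        self.w_out = rand_matrix(1, dense_dim, rng)[0]
        self.b_out = 0.0

    def encode_words(self, word_vectors: Sequence[Sequence[float]]) -> Vector:
        """Run the RNN over the words; each vector = word embedding + per-word features."""
        h = [0.0] * self.hidden_dim
        for x in word_vectors:
            pre = [a + b + c for a, b, c in zip(matvec(self.w_xh, x), matvec(self.w_hh, h), self.b_h)]
            h = [math.tanh(p) for p in pre]
        return h  # final hidden state summarizes the word sequence

    def predict_log_price(self, word_vectors, context_id: int,
                          domain_features: Sequence[float]) -> float:
        h = self.encode_words(word_vectors)
        context = [1.0 if i == context_id else 0.0 for i in range(self.num_contexts)]
        z = h + context + list(domain_features)  # RNN summary + sale context + domain features
        if len(z) != self.dense_in:
            raise ValueError(f"expected {self.dense_in} dense inputs, got {len(z)}")
        hidden = [max(0.0, a + b) for a, b in zip(matvec(self.w_1, z), self.b_1)]  # ReLU
        return sum(a * b for a, b in zip(self.w_out, hidden)) + self.b_out


net = DomainValuationNet(word_dim=4, hidden_dim=8, num_contexts=3, domain_dim=5, dense_dim=16)
words = [[0.1, 0.3, -0.2, 0.5], [0.4, -0.1, 0.0, 0.2]]  # made-up vectors for two words
for ctx in range(3):
    print(ctx, net.predict_log_price(words, ctx, [0.5, 0.2, 0.0, 0.0, 1.0]))
```

A production model would be built and trained with a deep learning framework; the point here is only the data flow: words pass through the RNN, then the RNN summary, sale context, and domain-level features pass through the dense layers together, so one forward pass per context yields one price prediction per context.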
Coming in 2017
The new GoDaddy Domain Valuation Tool is not yet publicly available. We have given some Premier Service Customers and members of the domain industry press a sneak peek. You should expect to see the new domain valuations in various parts of GoDaddy in 2017. We hope that by providing quality domain valuation tools, we can expand the domain investing community by making it easier to buy and sell aftermarket domain names.
Interested in working on challenging projects like this? Check out the GoDaddy Jobs page to learn about all our current openings across the company.