Chris Ambler

Seamlessly Connecting Domains to Services with Domain Connect 2.0

Domain names are meant to be used. At GoDaddy, we want to empower our customers to build their dreams, and one way we can measure that is by the number of active domains that are out there. Any technology or product that promotes usage is a good thing, since active domains also mean renewals. With so many online services that need domain names, and with Service Providers offering them (for example, Wix and Squarespace for easy website creation, or Microsoft’s Office 365 email offerings), providing an easy way for our customers to attach domain names to services is a huge win.

What’s the Problem?

Often, configuring just the DNS for a domain to be used by these services can get complex, especially for novice users. If I had a dollar for every time I had to explain to a friend or family member what an “A Record” is, and why they need to change it, I’d be flush in ice cream. Simply explaining the reason that a domain needs to be connected is enough to turn a customer off to doing so. Having a simple way for a customer to “connect” a domain name to a service – or even multiple services – without having to worry about the technical aspects is critical. In other words, presenting customers with opportunities to use their domains without exposing them to the confusion of the underlying infrastructure creates a seamless experience and an expectation that things will “just work” every time.

As that expectation becomes mainstream, future developments in the Internet of Things make the implications even more staggering. Imagine if every home cable router came with an easy way to purchase and configure a domain name for the connected home, or if the current augmented reality craze (as seen with Pokémon Go) let you purchase and configure a discoverable domain for your involvement.

Domain Connect 2.0 is a simple way to make connecting domains to services as seamless and easy as possible. Through a standardized protocol, implementation at both the Service Provider and the DNS Provider level is consistent and predictable. Further, by using templates to accomplish the configuration, DNS Providers can ensure that changes have a lower chance of causing problems or of exposing unregulated DNS record manipulation to the outside world. Domain Connect 2.0 is a GoDaddy innovation that has been submitted to the IETF as an open standard, and our implementation as a DNS Provider is underway.

Domain Connect has already been implemented as a simpler, synchronous web-based offering, and the 2.0 version expands the concept with a better authorization strategy as well as an asynchronous RESTful API. In this blog entry about the new version, the 2.0 is implied. Or, implicit val domainConnect: Double = 2.0, as we say in Scala.

So What Are Templates, Anyway?

I called out templates as a benefit, but what are they, and why do we use them? Templates solve a critical problem in configuring DNS: unregulated changes and a lack of predictability are bad. In the simplest implementation, a protocol for making changes to the DNS could simply say, “tell me what records you want me to create,” and a Service Provider could pass in information that essentially says, “please create an A Record with a host name of ‘www’ and a value of ‘192.168.42.42’.” This would work, of course, but could cause some foreseeable problems. What if there were already a host with that name? What if that name was reserved? What if the host name requested was known to cause problems? While these issues still exist in the world of templates, it is easier to know ahead of time that there could be a problem and simply not allow such things in a template. Put another way, templates can have the host names pre-defined and sanitized by both the Service Provider and the DNS Provider. In fact, creating a human touch-point in onboarding new templates means that someone can eyeball them for such problems. Once configured, a template prevents a Service Provider from making arbitrary changes to DNS, either accidentally or deliberately.

Without templates, a Service Provider could request the addition or modification of a DNS record of just about any supported type, and restrictions would be difficult at best. With templates, however, if a DNS provider doesn’t wish to allow a specific (or any) Service Provider to create, say, TXT records, the DNS provider can simply not allow them in any templates. Or it can make per-provider exceptions by requiring that all templates be reviewed and approved before being made available for use.

Finally, templates allow for predictability. A template that sets the records needed to point to a web hosting provider can be used consistently across all connecting domains. The chances of a Service Provider slipping in other record changes are reduced to zero, since those records simply don’t exist in the “host this domain’s web site” template. Standard templates for common configurations can even be shared between Service Providers, making DNS Providers’ jobs easier. Overall, everyone gets a predictability and reliability boost.

In other words, templates are never gonna give you up, never gonna let you down, and never gonna run around and desert you. They are never gonna make you cry, never gonna say goodbye and they are never gonna tell a lie and hurt you.

Now that I’ve thoroughly convinced you that templates are awesome, how do we use them? While the internal implementation of templates is left up to the DNS provider, a standard JSON format is specified to make implementation consistent for all providers (and to encourage the sharing I mentioned just now). Indeed, a central repository of templates is noted as an improvement for future versions of the specification. Each template is identified by the provider and given a unique ID: a composite key that identifies the Service Provider and the service being offered. Templates contain the information needed to make DNS changes, expressed as an array of records or actions for the DNS provider to modify or enact. Records note the DNS record type and any values to modify, including the ability to pass in dynamic data (such as an IP address for an A Record). DNS providers can check templates for suitability to purpose or policy before making them available for use, and can work with service providers to refine templates.
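To make that concrete, here is a minimal sketch of what such a template might look like, written as a Python dictionary for illustration. The field names (providerId, serviceId, records) and the %www% / %m% variable syntax are assumptions for this example, not the normative schema from the specification.

# A hypothetical "web hosting" template from coolprovider.com, expressed as a
# Python dict purely for illustration. Field names are assumptions.
example_template = {
    "providerId": "coolprovider.com",  # the Service Provider
    "serviceId": "hosting",            # the service being offered
    "records": [
        # An A Record for "www"; its address is supplied at connect time
        # through the %www% variable (a query parameter in the apply call).
        {"type": "A", "host": "www", "pointsTo": "%www%", "ttl": 3600},
        # A second A Record for the mobile site, filled from %m%.
        {"type": "A", "host": "m", "pointsTo": "%m%", "ttl": 3600},
    ],
}

A DNS provider reviewing a template like this knows exactly which hosts can be touched (www and m here) and which record types can be created; nothing else is possible through it.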

And as I mentioned, by making this a standard, we allow Service Providers to re-use templates across multiple DNS providers. Similarly, DNS providers can make a template suitable to a particular purpose available to multiple service providers, saving time and effort.

Discoverability

When a customer wishes to connect a domain, the service provider needs to know who the DNS provider is. To do this, Domain Connect specifies that a TXT record be added to the domain’s DNS containing a URL that can be called for discovery. The service provider queries the domain for this TXT record (called “DOMAIN_CONNECT”), which, if present, indicates that the domain is served by a DNS provider that supports the Domain Connect protocol. Given the URL, a service provider can call an API endpoint for protocol discovery:

GET v2/{domain}/settings

This will return a JSON structure that contains the Domain Connect settings for the domain name specified, including the provider name and URLs for the two main methods of using Domain Connect. Rather than bloat every zone with this record, our DNS implementation injects it into responses for all applicable queries, which also allows for rapid change, if necessary, without modifying a massive number of DNS zones.
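As a rough sketch of what discovery might look like from a service provider’s side, here is some Python using the dnspython and requests libraries against a hypothetical domain. The exact layout of the TXT record value is an assumption for illustration.

import dns.resolver  # pip install dnspython
import requests

domain = "example.com"  # hypothetical domain the customer wants to connect

# Look through the domain's TXT records for the Domain Connect record, which
# (if present) carries the base URL of the DNS provider's Domain Connect API.
api_base = None
for rr in dns.resolver.resolve(domain, "TXT"):
    value = b"".join(rr.strings).decode()
    if value.startswith("DOMAIN_CONNECT="):
        api_base = value.split("=", 1)[1]  # e.g. "https://connect.dnsprovider.com"
        break

if api_base is None:
    print("This domain's DNS provider does not support Domain Connect")
else:
    # Ask the DNS provider for its Domain Connect settings for this domain.
    settings = requests.get(f"{api_base}/v2/{domain}/settings").json()
    print(settings)  # provider name plus URLs for the two flows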

Two Ways to Get It Done – Way the First, the Synchronous Web-Based Flow

Domain Connect provides two ways to get the job of connecting a domain done. The first is a one-time, synchronous, web-based HTTP call. This flow is for service providers who want a one-time change to the DNS, and it is very similar to how our first version of Domain Connect works. A user identifies the domain they wish to connect, and the service provider determines the DNS provider through the discovery process. After confirming that the DNS provider supports Domain Connect, as demonstrated previously, the service provider simply calls a known URL and passes in the information necessary to configure the domain to their specification.

v2/domainTemplates/providers/{providerDomain}/services/{serviceName}/apply?[properties]

So a typical call might look like:

https://webconnect.dnsprovider.com/v2/domainTemplates/providers/coolprovider.com/services/hosting/apply?www=192.168.42.42&m=192.168.42.43&domain=example.com

This call indicates that the Service Provider wishes to connect the domain example.com to the service using the template identified by the composite key of the provider (coolprovider.com) and the service they own (hosting). In this example, the template has two variables, “www” and “m”, each of which requires a value (an IP address). These variables are passed as name/value pairs in the query.
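Here is a sketch of how a service provider might assemble that URL; the host comes from the discovery settings, and everything below is hypothetical example data.

from urllib.parse import urlencode

# Values from discovery plus the customer's domain; all hypothetical.
web_connect_url = "https://webconnect.dnsprovider.com"
provider_domain = "coolprovider.com"
service_name = "hosting"
params = {
    "www": "192.168.42.42",   # value for the template's "www" variable
    "m": "192.168.42.43",     # value for the template's "m" variable
    "domain": "example.com",  # the domain being connected
}

# The customer's browser is sent to this URL; the DNS provider then takes
# over to authenticate the customer and ask permission to apply the template.
apply_url = (
    f"{web_connect_url}/v2/domainTemplates/providers/{provider_domain}"
    f"/services/{service_name}/apply?{urlencode(params)}"
)
print(apply_url)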

Once on the website of the DNS provider, the customer is asked to authenticate and give permission for the DNS changes to be applied. The changes are then made via templates, which ensure that the DNS records to be applied are both already known and properly constrained. Once the changes are made or any errors handled, the customer is optionally redirected back to the service provider’s site with confirmation that all went well (or an indication of the error).

[Diagram: the synchronous web-based flow]

Way the Second – The Asynchronous API Flow

The second connection method (and new in Version 2) is an OAuth-based flow combined with a RESTful API. This is intended for service providers who want to make DNS changes asynchronously or who have use cases that require multiple steps. This flow begins like the synchronous flow, authenticating the customer, but instead of immediately applying any DNS changes, the calling service provider is given an access token that allows it to call API functions to apply or remove a DNS change template (or even multiple templates).

This permission gives the service provider the right to apply (or remove) specific templates for a specific domain owned by a specific customer. The service provider may retain this token to apply or remove the template at any time during the token’s lifetime (or until the customer revokes permission, of course).

Service providers who want to use this flow register as an OAuth client with the DNS provider, providing both the templates that will be used and callback URLs that specify where customers are redirected after OAuth authorization is complete. Customers are authenticated much as in the web-based flow, and after they give permission to apply a template to a domain, an OAuth authorization token is issued. This token can be used to request or renew a specific access token for performing API calls. The access token is passed in the Authorization header of API requests.

The API is very simple, and contains endpoints to apply a template, remove a template, or revoke access.

Applying a template is done to a single domain, and the domain is part of the authorization. While the provider ID and service ID are also implied in the authorization, they appear on the path for consistency with the synchronous flow. If they do not match what is in the authorization, an error is returned. The API endpoint should look familiar:

https://connect.dnsprovider.com/v2/domainTemplates/providers/coolprovider.com/services/hosting/apply?www=192.168.42.42&m=192.168.42.43

Since this is an API call, HTTP response codes are used to indicate success or error.
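Assuming an access token has already been obtained through the OAuth flow, applying a template via the API might look roughly like this in Python with the requests library. The POST verb and the parameter handling are assumptions for illustration; the specification has the normative details.

import requests

# All values are hypothetical; the access token came from the OAuth flow.
api_base = "https://connect.dnsprovider.com"
access_token = "access-token-issued-by-the-dns-provider"
provider_domain = "coolprovider.com"
service_name = "hosting"

# Apply the template. The domain is implied by the authorization, so only
# the template variables travel as query parameters.
response = requests.post(
    f"{api_base}/v2/domainTemplates/providers/{provider_domain}"
    f"/services/{service_name}/apply",
    params={"www": "192.168.42.42", "m": "192.168.42.43"},
    headers={"Authorization": f"Bearer {access_token}"},
)

# Success or failure is indicated by the HTTP status code.
response.raise_for_status()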

Reverting a template is a similar API call, but under the hood there is a bit more going on: the DNS provider has to ensure that the template had been previously applied (you can’t revert what you’ve not actually done!) and that doing so won’t break anything else. The specification leaves such implementation details to the individual DNS providers.

Interesting Alternatives

Another possible flow has the connection initiated by the DNS provider, such as suggesting to a customer that they might want to connect their domain to a partner service provider. In this case it’s entirely possible for the whole process to be done at the DNS provider’s site, with the service provider either not called at all or called only to be notified of the connection. Some DNS providers are, in essence, also Service Providers, and Domain Connect can be used 100% internally, giving customers a consistent experience with much less effort.

In cases where a template has dynamic elements, the flow could still be initiated by the DNS provider but then handed off to the service provider to inject the appropriate variables, with the flow continuing as usual.

Future Developments

While this is a 2.0 version, it is really the first iteration of this specification that connects all the dots for templated connection of domains between service providers and DNS providers. During its development, many ideas were put forth for future development. Once adopted across the industry, Domain Connect promises to make the configuration of domains much easier for customers, driving innovation and adoption.

Come Innovate With Us

GoDaddy’s Domain Connect 2.0 is a new innovation and we are looking for awesome engineers to help us build out more cool features for our customers. If you are interested, come join us at GoDaddy Careers.

Premium Results through Elasticsearch

The GoDaddy FIND Team is responsible for services that help suggest domain names. While this sounds reasonably straightforward, it is critical to taking a customer from initial interest to purchasing the perfect domain name, and when it comes to premium names, the higher price point means the suggestions must be right on target. Based on a starting point, be it an initial domain name or even just search terms, we leverage ElasticSearch to quickly and insightfully suggest names to the customer.

We can even take into account customer information like previous purchases, other domains a customer owns, or any other hint. The inventory of premium names comes from a number of different sources including names available for sale and auctions from both internal as well as partner providers. We load this data continuously and put it into ElasticSearch. From this index, our engine can query for good candidates to present to customers.

Why Should You Care?

Like most things here at GoDaddy, this process needs to be fast, accurate, and reliable. Fast and reliable can be accomplished by any modern cache or key/value lookup system. Accurate can be achieved by quite a few robust search systems, many of which don’t win any races when it comes to speed and make “reliable” a nice way to discover humor. ElasticSearch, when used properly, hits all three requirements. Learning how to use ElasticSearch well gave us a solid resource for collecting, analyzing, and serving the data that allows our customers to make sound purchasing decisions.

GoDaddy had to configure ElasticSearch by trial and error, test and measure, and, in some cases, outright guesses to see how everything played out. ElasticSearch, while more mature now, is still somewhat of an emerging technology, and learning how to use it for our specific needs was both a challenge and an opportunity.

[Figure: How we use ElasticSearch at GoDaddy]

What Is ElasticSearch?

Let’s take a closer look at how we use ElasticSearch to help customers find domain names at GoDaddy and examine some of the challenges we faced and the solutions we uncovered.

ElasticSearch is a scalable, distributed search engine that lives on top of Lucene, a Java-based indexing and search engine. ElasticSearch is fast and provides an elegant API-based query system. The first thing we need to understand is how ElasticSearch defines indexes that live in shards across nodes and how it replicates data for reliability. Our first challenge was ensuring that our index was always available and returned results fast enough to support real-time search.

An ElasticSearch index is the data set upon which our searches are performed, and shards are pieces of that data set distributed amongst nodes (individual installations of ElasticSearch, typically one per machine). For the GoDaddy Domain FIND team, we rebuild our index daily while also taking in real-time feeds of auction domains that tell us when names are added and deleted and when prices change. We have set up a Jenkins job to bring in this data and add it to our current index throughout the day without impacting read operations. To do this, we have configured our ElasticSearch cluster to separate the nodes that hold data from the nodes we make available via our API for searches. This way, while we’re taxing the data nodes with new data, the API nodes are not impacted. We even turn off replication during large loads so that each batch loaded does not start a brand new replication operation. These practices have now become somewhat standard with ElasticSearch, but when we were first starting out, it was the Wild West! This strategy came out of much trial and error and reaped the best results.
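As a rough sketch of that replication toggle around a large load, here is what it might look like with the Python ElasticSearch client (older, body-style API). The host, index name, and documents are hypothetical, and the day’s index is assumed to already exist.

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://data-node-1:9200"])  # a hypothetical data node
index = "premium-domains-2016-08-01"             # a hypothetical daily index


def load_premium_inventory():
    # Stand-in for the real feeds of premium names; purely illustrative.
    yield {"domain": "example.com", "price": 2500}


# Turn replication off so each bulk batch doesn't kick off a replica copy.
es.indices.put_settings(index=index, body={"number_of_replicas": 0})

# Bulk-load the inventory into the index on the data node.
helpers.bulk(es, ({"_index": index, "_source": doc}
                  for doc in load_premium_inventory()))

# Turn replication back on; ElasticSearch then copies shards out across
# the data nodes.
es.indices.put_settings(index=index, body={"number_of_replicas": 1})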

Scaling Challenges

We scale our system by adding new nodes, which reduces the number of shards per node (thus reducing load) because shards are re-allocated automatically. Indeed, most of the scaling and distribution in ElasticSearch is done automatically, with hints based on your configuration. Any given node may have all shards or a subset, thereby distributing the load when a search is performed. This is a key benefit of ElasticSearch: it does the hard work for you.

The default number of replicas is one. A replica is a copy of your index, so with one replica you have two complete sets of data: the original plus one replica. ElasticSearch will ensure that a shard and its replica are never stored on the same node (provided you have more than one node, of course). If a node should go down, it will take with it either a shard or its replica, but not both, so you’re covered. And if you have more than two nodes and two replicas, you could suffer a failure of two nodes and still have your data available.

For our system, we chose two data nodes plus two client nodes that hold no data but expose the interfaces for performing queries. This, too, was trial and error: we tried two nodes and we tried six.

The Black Magic

Deciding how many shards and replicas to use is somewhat of an untested black art, both here at GoDaddy and in the wild. Trial and error on the number of shards, if you have the time and patience, is an interesting exercise. We started with the default of five and tested performance, then increased or decreased and remeasured. As ElasticSearch matures, this area will likely receive attention from developers. At the ElastiCon15 conference this year, none of the presenters had the same configuration in terms of size, which was rather telling; each had to determine their configuration individually based on their use cases. One thing to note is that once you create an index with a set number of shards, you cannot change it. You can create a new index, of course, but the one you’ve created has its shard count set. Replica count, however, can be changed at any time, and ElasticSearch will re-allocate shards across nodes appropriately.

For our purposes in the GoDaddy Domain FIND team, we stuck with the default of five shards because our data set does not contain a huge number of documents, and five shards is a decent number. If your document count is high, you would want to consider more shards to split up the data and make queries (and indexing) faster. We also found that, for our use, four nodes provided enough headroom and redundancy, so we configured three replicas.
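As a sketch, creating an index with settings along those lines might look like this (host, index name, and client setup are hypothetical, using the older body-style API). Note again that only the replica count can be adjusted afterwards.

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://client-node-1:9200"])  # a hypothetical client node

# Shard count is fixed at creation time; replica count is not.
es.indices.create(
    index="premium-domains-2016-08-01",
    body={
        "settings": {
            "number_of_shards": 5,    # the default; fine for our data size
            "number_of_replicas": 3,  # as described above
        }
    },
)

# Replicas, unlike shards, can be changed on a live index at any time, and
# ElasticSearch will re-allocate them across nodes.
es.indices.put_settings(
    index="premium-domains-2016-08-01",
    body={"number_of_replicas": 2},
)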

Down the Rabbit Hole: Cross Data-Center Distribution

What about distributing across multiple data centers, like we have? The official answer seems to be, “sure, but it’s not supported.” In theory, if the pipe between your data centers is reasonably wide and reliable, there’s no reason you can’t. In practice, we tried it with nodes in different data centers, and the communication between them got bogged down, causing nodes to overload and time out and taking the whole cluster down! For now, my recommendation is that we avoid doing it. This, too, is something the ElasticSearch developers say they’re working on improving. Honestly, I’ll believe it when I see it.

Monitoring

For monitoring, we use a number of tools, but the one we like best is the very simple “head” plugin.

[Screenshot: the head plugin]

In the image above, we see the output from the head plugin. Each index is spread across six nodes, and each node has five shards. In this case, we have set the number of replicas to five, meaning we have a total of six copies (the original plus five replicas). Primary shards have a bold border; replicas do not. This tool gives us a simple but great visual representation of our cluster and also provides for quick ad-hoc queries and modifications to our indexes.

Once everything is reasonably balanced, queries are fast. Ours tend to be sub-10 milliseconds, which allows ElasticSearch to be used as a real-time, responsive system. While we had a number of challenges in crafting efficient queries, once we got over that hurdle, things have been fast and stable ever since. Word to the wise: don’t put wildcards in your queries that result in huge intermediate results. It’s not pretty.
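As a hypothetical example of the kind of query we run (the index name and field names below are made up for illustration), a plain match query like this stays fast, whereas a leading-wildcard query over the same field is exactly the sort of thing that blows up into huge intermediate results.

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://client-node-1:9200"])  # a hypothetical client node

# Search the premium inventory for names related to the customer's terms.
# "premium-domains" and the field names are illustrative, not our real mapping.
results = es.search(
    index="premium-domains",
    body={
        "query": {"match": {"name": "coffee shop"}},
        "sort": [{"price": {"order": "asc"}}],
        "size": 10,
    },
)

for hit in results["hits"]["hits"]:
    print(hit["_source"]["name"], hit["_source"]["price"])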

ElasticSearch also provides a more heavyweight monitoring solution called Marvel. Our first try at Marvel was less than impressive, as it put too much load on each node and filled the indexes with monitoring information that cluttered things up. I’m told that this has improved dramatically in the past year, and we’re keen to give Marvel another try.

Challenge: Indexing a Live System

What about indexing? For our team, the biggest challenge is that we need to ensure 100% uptime, which includes ensuring that ongoing indexing does not impact read operations. Failing to provide solid premium domain recommendations means money left on the table. So when we rebuild our entire index every night, we do it in a creative way: we index to only one node, which is marked as never being a master node and as containing only data. This is, in essence, a “write” node. We turn off replication, create the new index, and load into it. This operation takes about two hours. Once done, we turn replication back on and let the index be copied to all nodes, including those from which we read. Once that is complete, we tell ElasticSearch about the new index and make it primary using an alias. This gives us zero downtime for reads while keeping the heavy lifting of indexing constrained to one node until it is done. When we add records throughout the day, we do it on that one node and let it push the updates to the read-only nodes.
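The final step of that nightly rebuild, switching reads over to the new index, comes down to an alias flip. A rough sketch (index and alias names are hypothetical):

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://client-node-1:9200"])  # a hypothetical client node

old_index = "premium-domains-2016-07-31"  # yesterday's index
new_index = "premium-domains-2016-08-01"  # tonight's freshly built index

# Once the new index has replicated to the read nodes, atomically repoint
# the alias that our search API queries. Readers never see a gap.
es.indices.update_aliases(body={
    "actions": [
        {"remove": {"index": old_index, "alias": "premium-domains"}},
        {"add": {"index": new_index, "alias": "premium-domains"}},
    ]
})

# Yesterday's index can then be deleted to reclaim space.
es.indices.delete(index=old_index)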

[Figure: ElasticSearch at GoDaddy]

For us at GoDaddy, such strategies make a lot of sense when considering how uptime, responsiveness, and indexing operations can potentially impact read performance.

Shameless Self-Promotion

The fine folk at Elastic created a video wherein they asked a number of people how they’re using ElasticSearch and why they like it. I got to ramble for a while, and they used a couple of clips.

If you’re interested in working on these and other fun problems with us, check out our jobs page.