Welcome to Modern Digital Business!
Aug. 29, 2022

8 Steps to higher-quality DNS systems

DNS is a highly available, highly redundant, highly reliable service that is absolutely essential to your company's applications and business operations. A failure in your DNS system can bring your company's business to a halt, jeopardizing your company's future.

DNS is essential to the operation of all aspects of the internet and modern digital businesses. The problem with DNS is that a very tiny mistake in a configuration file can ripple throughout the entire DNS system and impact every aspect of your company's operations, including your customers' ability to use your products and your company's ability to make money. All of it can be brought to its knees by a very tiny mistake in a single configuration entry. Without solid DNS configuration management in place, you make yourself vulnerable to simple but costly mistakes.

But how do you implement a high-quality DNS hygiene solution? In this episode, I'll give you eight steps to higher-quality DNS systems.

Today on Modern Digital Business.

About Lee

Lee Atchison is a software architect, author, public speaker, and recognized thought leader on cloud computing and application modernization. His most recent book, Architecting for Scale (O’Reilly Media), is an essential resource for technical teams looking to maintain high availability and manage risk in their cloud environments. Lee has been widely quoted in multiple technology publications, including InfoWorld, Diginomica, IT Brief, Programmable Web, CIO Review, and DZone, and has been a featured speaker at events across the globe.

Take a look at Lee's many books, courses, and articles by going to leeatchison.com.

Looking to modernize your application organization?

Check out Architecting for Scale. Currently in its second edition, this book, written by Lee Atchison and published by O'Reilly Media, will help you build high-scale, highly available web applications or modernize your existing applications. Check it out! Available in paperback or on Kindle from Amazon.com or other retailers.


Don't Miss Out!

Subscribe here to catch each new episode as it becomes available.

Want more from Lee? Click here to sign up for our newsletter. You'll receive information about new episodes, new articles, new books and courses from Lee. Don't worry, we won't send you spam and you can unsubscribe at any time.

Mentioned in this episode:

O'Reilly Media - Building a Cloud Roadmap

Have you struggled with cloud migration? Then you'll appreciate my live training course, Building a Cloud Roadmap, presented by O'Reilly Media. Live on October 5th at 9:00 AM PDT. For more information, go to mdb.fm/roadmap or leeatchison.com/roadmap. But hurry, seats are limited.

Transcript
Lee:

DNS is a highly available, highly redundant, highly reliable service that is absolutely essential to your company's applications and business operations. Yet DNS configurations are highly sensitive, and simple mistakes can cause catastrophic problems. But how do you implement a high-quality DNS hygiene solution? In this episode, I'll give you eight steps to higher-quality DNS systems. Are you ready? Let's go.

DNS is a highly available, highly redundant, highly reliable service that is absolutely essential to your company's applications and business operations. A failure in your DNS system can bring your company's business to a halt, jeopardizing your company's future. DNS is essential to the operation of all aspects of the internet and modern digital businesses. The problem with DNS is that a very tiny mistake in a configuration file can ripple throughout the entire DNS system and impact all aspects of your company's operations, including its customers' ability to use the company's products and the company's ability to make money. All of it can be brought to its knees by a very tiny mistake in a single configuration entry. Without solid DNS configuration management in place, you make yourself vulnerable to simple but costly mistakes. That's where problems often occur.

Why are DNS configurations so sensitive to mistakes? The root cause of this sensitivity is that DNS changes are so common and so simple that they are rarely considered risky business operations. In smaller organizations, the development team probably manages its own DNS servers or has some other way to make DNS changes on the fly. As organizations get larger and more complex, the number of DNS servers and the number of people who can make changes to them tend to multiply. With so many people making so many changes, it's not surprising that something occasionally goes wrong. In fact, it would be much more surprising if things didn't go wrong.
DNS outages can be caused by a variety of factors, including human error, software issues, and hardware failures, but the most common cause of DNS outages is incorrect configuration files being deployed to DNS servers. What steps can smaller companies without quality DNS hygiene take in order to put a high-quality DNS management process in place? Here are eight things any company can do to improve its overall DNS quality and keep its applications operational and healthy.

Number one. Manage DNS configuration using revision control. This is the simplest and most basic thing you can do to improve the quality of your DNS infrastructure. At the core, DNS configurations are simply flat text files. Many DNS providers give you a front-end control panel for these configuration files that lets you make changes more easily, and with less actual knowledge of the impact of the changes you are making. Don't use these control panels. Instead, manage your configuration files using the standard flat text file format. Once you have moved to this flat file format, you can easily manage these configuration files using the same revision control program you use for managing your application source code. For most companies, this is some variation of Git. You undoubtedly have processes in place today for managing your source code. Use the same or a similar process for managing your DNS configuration files as well. This simple change will allow many other process improvements to come naturally, such as configuration reviews, approval workflows, and the ability to track when specific changes that may have impacted your application were made. This is an essential foundation for keeping your DNS service operating and error free.

Number two. Review all DNS changes. This falls right behind the first recommendation. Once you're managing your changes using a revision control program, make sure that all changes you make are reviewed and approved.
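To make the review step a little more concrete, here is a minimal Python sketch of what a reviewer might look at once zone files live in revision control: a record-level diff between two versions of a zone file. It assumes a deliberately simplified format with one complete record per line (owner name, TTL, class, type, data). Real zone files also use `$ORIGIN`, multi-line parentheses, and omitted owner names, so a production pipeline would rely on a full parser or a checker such as BIND's `named-checkzone`. The names `Record`, `parse_records`, `diff_zones`, and `review_report` are illustrative only.

```python
from typing import NamedTuple


class Record(NamedTuple):
    name: str
    ttl: int
    rclass: str
    rtype: str
    data: str


def strip_comment(line: str) -> str:
    """Drop a trailing ';' comment, ignoring semicolons inside quoted TXT data."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes  # simplification: escaped quotes are not handled
        elif ch == ";" and not in_quotes:
            return line[:i]
    return line


def parse_records(zone_text: str) -> set[Record]:
    """Parse a simplified zone file with one complete record per line."""
    records: set[Record] = set()
    for raw in zone_text.splitlines():
        line = strip_comment(raw).strip()
        if not line or line.startswith("$"):  # skip blanks and $TTL/$ORIGIN directives
            continue
        name, ttl, rclass, rtype, data = line.split(maxsplit=4)
        records.add(Record(name.lower(), int(ttl), rclass.upper(), rtype.upper(), data))
    return records


def diff_zones(old_text: str, new_text: str) -> tuple[list[Record], list[Record]]:
    """Return (added, removed) records between two versions of a zone file."""
    old, new = parse_records(old_text), parse_records(new_text)
    return sorted(new - old), sorted(old - new)


def review_report(old_text: str, new_text: str) -> str:
    """Format the diff as '-' (removed) and '+' (added) lines for a reviewer."""
    added, removed = diff_zones(old_text, new_text)
    lines = [f"- {' '.join(map(str, r))}" for r in removed]
    lines += [f"+ {' '.join(map(str, r))}" for r in added]
    return "\n".join(lines)


old_zone = """$TTL 3600
www.example.com. 3600 IN A 192.0.2.10
mail.example.com. 3600 IN MX 10 mx1.example.com.
"""
new_zone = """$TTL 3600
www.example.com. 300 IN A 192.0.2.20   ; lower TTL ahead of a migration
mail.example.com. 3600 IN MX 10 mx1.example.com.
"""
print(review_report(old_zone, new_zone))
# - www.example.com. 3600 IN A 192.0.2.10
# + www.example.com. 300 IN A 192.0.2.20
```

Because a TTL change shows up as one removed and one added record, the reviewer sees both the address change and the TTL change side by side instead of hunting through a raw text diff.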
Reviews can be handled just as they are for your application source code, using branches, pull requests, and merges. Establish an approval process for all DNS changes. Make sure at least one person reviews every change before it is incorporated into your production configuration. This review process should include checks for things like syntax errors, incorrect DNS settings, and other potential problems. Problems with DNS configurations can be subtle, so the review should be thorough, methodical, and done by a knowledgeable reviewer.

Number three. Document the intent of all changes. Every change you make should be documented. If you're following the steps above, this happens naturally as part of the code check-in, commit, and pull request process. This documentation will help you later if a problem occurs or an incompatible change is proposed. Understanding why a previous change was made will help you repair problems and avoid future ones.

Number four. Automate the configuration deployment process. Once you have a process in place to manage your configuration files, establish a process to automate the deployment of those configuration file updates to your production DNS system. By automating this process, you reduce the likelihood of an incorrect change being pushed to production, or of a simple human error causing your DNS system to fail or produce bad results. If you find yourself copying and pasting changes from one configuration file to another during a deployment, you're much more likely to make a mistake and introduce a bug into the DNS system. Automatically deploying changes using scripts will make sure the changes are applied in a consistent and reliable manner. Part of the automated system should include an automated rollback mechanism.
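The automated deploy-and-rollback loop described in step four might look something like the following Python sketch. The `DnsProvider` interface, its `current_zone` and `publish_zone` methods, and the `verify` callback are hypothetical stand-ins for whatever API or CLI your provider actually offers, and `InMemoryProvider` exists only so the flow can be tried locally. `basic_checks` runs just two illustrative checks (numeric TTLs, and no CNAME sharing an owner name with other records) rather than full validation.

```python
from typing import Callable, Protocol


class DnsProvider(Protocol):
    """Hypothetical provider interface; map it onto your provider's real API or CLI."""

    def current_zone(self) -> str: ...

    def publish_zone(self, zone_text: str) -> None: ...


def basic_checks(zone_text: str) -> list[str]:
    """A few illustrative pre-deploy checks; not a complete zone-file validator."""
    errors: list[str] = []
    types_by_name: dict[str, set[str]] = {}
    for lineno, raw in enumerate(zone_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith((";", "$")):  # skip blanks, comments, directives
            continue
        fields = line.split(maxsplit=4)
        if len(fields) < 5:
            errors.append(f"line {lineno}: expected name, TTL, class, type, and data")
            continue
        name, ttl, _rclass, rtype, _data = fields
        if not ttl.isdigit():
            errors.append(f"line {lineno}: TTL {ttl!r} is not a number")
        types_by_name.setdefault(name.lower(), set()).add(rtype.upper())
    for name, types in sorted(types_by_name.items()):
        # A CNAME must not share its owner name with other data (RFC 1034);
        # this simplified check does not allow for the DNSSEC exceptions.
        if "CNAME" in types and len(types) > 1:
            errors.append(f"{name}: CNAME coexists with {sorted(types - {'CNAME'})}")
    return errors


def deploy_zone(
    provider: DnsProvider,
    new_zone: str,
    verify: Callable[[str], bool],
) -> bool:
    """Validate, publish, and verify a zone; roll back automatically on failure."""
    errors = basic_checks(new_zone)
    if errors:
        # Refuse to deploy: nothing has been changed in production yet.
        raise ValueError("zone failed checks: " + "; ".join(errors))
    previous = provider.current_zone()  # last known-good configuration
    try:
        provider.publish_zone(new_zone)
        healthy = verify(new_zone)
    except Exception:
        provider.publish_zone(previous)  # roll back, then surface the original error
        raise
    if not healthy:
        provider.publish_zone(previous)  # post-deploy verification failed: roll back
    return healthy


class InMemoryProvider:
    """Stand-in provider for trying the flow locally."""

    def __init__(self, zone_text: str) -> None:
        self.zone = zone_text

    def current_zone(self) -> str:
        return self.zone

    def publish_zone(self, zone_text: str) -> None:
        self.zone = zone_text
```

For example, `deploy_zone(InMemoryProvider(old_zone), new_zone, verify=lambda z: False)` returns `False` and leaves `old_zone` in place, which is the rollback path. Validation runs before anything is published and verification runs after, so a bad change is either rejected outright or undone automatically.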
Rollback may be a natural extension of your revision control process or a separate deployment rollback process, but being able to quickly and effectively undo a change may make the difference between a mistake being a small inconvenience and a massive product outage.

Number five. Grow into a more sophisticated change management system. As your DNS system grows in complexity, you may want to consider putting an entire change management system on top of the simple version control system you've already established. This might include change request forms, requests for authorization, multi-team sign-offs, and other such processes. These processes may seem onerous, but DNS configuration is not a place to slack off on process. A simple DNS change can impact many teams within your organization. Giving those teams input before a change is made, or even before a proposed change is accepted, can save you many headaches later on. The size and complexity of your change management system will naturally be tied to the size and complexity of your organization and the other software management processes you employ.

Number six. Use an independent DNS provider. A high-quality DNS system requires more than configuration management. It requires a high-quality operational environment as well. Many of your existing service providers may offer DNS services that you can easily and inexpensively leverage. In particular, most cloud providers offer DNS services, and usually rather high-quality ones. However, be careful about using a DNS service from a company that provides you with any other services, including other cloud services. The reason why? Well, during a service outage, the most critical tool you need to be operating normally is your DNS system. You need it to help you diagnose and repair most other outages. If your DNS system is also down, your outage will last significantly longer. The reverse is also true.
If you are dealing with a DNS issue, the last thing you want is to also be dealing with an outage caused by another service in your application ecosystem. Avoid these problems by using a high-quality DNS provider that provides only DNS services to you and nothing else. This allows you to isolate your DNS, and problems with your DNS system, from every other service in your application, reducing the likelihood of an extended DNS-related outage. And be careful: make sure the provider you select isn't dependent on service providers, such as cloud providers, that you already rely on. If AWS has an outage, you want your independent DNS provider to keep operating. That won't happen if that provider also depends on AWS. Now, some people run their own DNS systems. If you decide to run your own DNS, make sure you operate it using resources that are independent from the rest of your application. This means operating it in different data centers, availability zones, and even cloud regions than the rest of your application.

Number seven. Separate internal and external DNS. Let's take that last point one step further. You have DNS needs that are internal to your company and external DNS needs that your customers depend on. Your internal DNS provides access to internal documentation, internal systems including email and communications tools, and other internal processes and systems. Your external DNS provides access to your company's applications, products, and services that your customers depend on. Make sure these two DNS needs are handled by different providers. If your external DNS goes down, fixing that problem will be substantially harder if your internal DNS is also down. This is part of what took Facebook so long to fix their application when they went down in October of 2021. Their external DNS went down, and they couldn't diagnose and fix the problem easily because their internal DNS was also down.
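One way to keep steps six and seven honest is to check which providers your zones actually delegate to, for example the nameserver hosts listed by `dig NS example.com`. The Python sketch below assumes you maintain a hypothetical mapping from nameserver domain suffixes to provider names (the `*.example` suffixes are placeholders, not real providers). It flags internal and external DNS that share a provider, DNS hosted by a provider you also use for your application, and nameservers it cannot classify. It cannot tell whether a DNS provider itself depends on your cloud provider; that still has to be confirmed with the vendor.

```python
from typing import Iterable, Optional

# Hypothetical mapping from nameserver domain suffix to provider name.
# Replace these placeholder suffixes with the providers you actually use.
PROVIDER_BY_SUFFIX = {
    "dns-provider-a.example": "DNS Provider A",
    "dns-provider-b.example": "DNS Provider B",
    "cloud-c.example": "Cloud C",
}


def provider_for(ns_host: str, suffix_map: dict[str, str]) -> Optional[str]:
    """Return the provider whose suffix matches ns_host, or None if unknown."""
    host = ns_host.lower().rstrip(".")
    for suffix, provider in suffix_map.items():
        if host == suffix or host.endswith("." + suffix):
            return provider
    return None


def independence_problems(
    external_ns: Iterable[str],
    internal_ns: Iterable[str],
    app_providers: set[str],
    suffix_map: dict[str, str] = PROVIDER_BY_SUFFIX,
) -> list[str]:
    """List ways the setup breaks the isolation advice in steps six and seven."""
    problems: list[str] = []
    zones = {"external": set(external_ns), "internal": set(internal_ns)}
    providers: dict[str, set[str]] = {}
    for label, hosts in zones.items():
        providers[label] = set()
        for host in sorted(hosts):
            provider = provider_for(host, suffix_map)
            if provider is None:
                problems.append(f"{label}: cannot classify nameserver {host}")
            else:
                providers[label].add(provider)
        # DNS should not run on a provider that also hosts the application.
        overlap = providers[label] & app_providers
        if overlap:
            problems.append(f"{label} DNS is hosted by an application provider: {sorted(overlap)}")
    # Internal and external DNS should not share any provider.
    shared = providers["external"] & providers["internal"]
    if shared:
        problems.append(f"internal and external DNS share a provider: {sorted(shared)}")
    return problems


print(independence_problems(
    external_ns=["ns1.dns-provider-a.example.", "ns2.dns-provider-a.example."],
    internal_ns=["ns1.dns-provider-a.example."],
    app_providers={"Cloud C"},
))
# ["internal and external DNS share a provider: ['DNS Provider A']"]
```

Matching on a whole-label suffix (`"." + suffix`) avoids treating a host like `ns1.notcloud-c.example` as belonging to `cloud-c.example`. An explicit mapping is used instead of guessing from the last two labels, because some providers spread their nameservers across several domains.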
Conversely, if your internal DNS goes down, you don't want that problem to bleed out to your external customers. Using different providers, along with different DNS configurations and configuration processes, is extremely valuable for avoiding these sorts of problems.

And lastly, number eight. Duplicate your DNS in another provider. Let's go one final step further. Set up your production DNS using two different providers: use one as the primary provider and the other as a backup provider. This way, if your primary provider goes down for some reason, you may be able to switch your production DNS over to your backup provider quickly. The backup provider should have a complete, operational, and fully tested copy of your DNS configuration set up and running, so it can be put into play quickly if needed. This process will be easier if you have implemented the automated deployment processes we talked about previously. That automation can help ensure that you keep your changes in sync between your primary and backup providers. The worst thing that can happen is for your primary provider to go down, you switch to your backup provider, and you end up with an incomplete or incorrect DNS configuration because you never tested your backup provider setup.

DNS is a critical system that should be designed for high availability and reliability from the start. You also need to think about security when designing your DNS infrastructure. Make sure you have redundant systems in place, and that access to your DNS system is tightly controlled. Finally, monitoring DNS is critical to ensuring your system continues to run smoothly. You need tools that will alert you if problems occur so you can take steps to mitigate the impact as quickly as possible. DNS outages are common occurrences, but they don't have to bring your entire company to a standstill. By using the proper processes and tools, you can minimize the impact of any outages and keep your business running smoothly.
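Keeping the backup provider from step eight trustworthy comes down to regularly comparing it with the primary. Below is a minimal Python sketch of that comparison. It assumes you can export each provider's records as `(name, ttl, type, data)` tuples with fully qualified names; how you export them depends on your providers and is not shown. It skips SOA records by default, since serials and primary nameserver names legitimately differ between providers, and you may need to exclude other provider-specific records (such as the apex NS set) in your own setup. Running a check like this on a schedule, and alerting when `backup_is_ready` returns `False`, is one simple form of the DNS monitoring mentioned above.

```python
from typing import Iterable

# (owner name, TTL, record type, record data), as exported from a provider.
Record = tuple[str, int, str, str]


def normalize(records: Iterable[Record], ignore_types: frozenset = frozenset({"SOA"})) -> set[Record]:
    """Canonicalize names and types so cosmetic differences don't count as drift."""
    out: set[Record] = set()
    for name, ttl, rtype, data in records:
        rtype = rtype.upper()
        if rtype in ignore_types:
            continue  # e.g. SOA serials differ per provider by design
        name = name.lower()
        if not name.endswith("."):
            name += "."  # assumes exported names are fully qualified
        # Record data is left untouched: TXT contents, for example, are case-sensitive.
        out.add((name, ttl, rtype, data))
    return out


def compare_providers(
    primary: Iterable[Record],
    backup: Iterable[Record],
    ignore_types: frozenset = frozenset({"SOA"}),
) -> dict[str, list[Record]]:
    """Report records the backup is missing and records only the backup has."""
    p = normalize(primary, ignore_types)
    b = normalize(backup, ignore_types)
    return {
        "missing_in_backup": sorted(p - b),
        "unexpected_in_backup": sorted(b - p),
    }


def backup_is_ready(primary: Iterable[Record], backup: Iterable[Record]) -> bool:
    """True only when the backup serves exactly the primary's records."""
    report = compare_providers(primary, backup)
    return not report["missing_in_backup"] and not report["unexpected_in_backup"]
```

Reporting both directions matters: a record missing from the backup breaks resolution after a failover, while a stale record present only in the backup can silently send traffic somewhere it no longer belongs.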