Welcome to Modern Digital Business!
Oct. 18, 2022

Tech Tapas Tuesday-"A little bit of tech": Flying Two Mistakes High

What do model airplanes have to do with avoiding application failures?

This is Tech Tapas Tuesday, a "little bit of tech".

Today on Modern Digital Business.

About Lee

Lee Atchison is a software architect, author, public speaker, and recognized thought leader on cloud computing and application modernization. His most recent book, Architecting for Scale (O’Reilly Media), is an essential resource for technical teams looking to maintain high availability and manage risk in their cloud environments. Lee has been widely quoted in multiple technology publications, including InfoWorld, Diginomica, IT Brief, Programmable Web, CIO Review, and DZone, and has been a featured speaker at events across the globe.

Take a look at Lee's many books, courses, and articles by going to leeatchison.com.

Looking to modernize your application organization?

Check out Architecting for Scale. Currently in its second edition, this book, written by Lee Atchison and published by O'Reilly Media, will help you build high-scale, highly available web applications or modernize your existing applications. Check it out! Available in paperback or on Kindle from Amazon.com or other retailers.


Don't Miss Out!

Subscribe here to catch each new episode as it becomes available.

Want more from Lee? Click here to sign up for our newsletter. You'll receive information about new episodes, new articles, new books and courses from Lee. Don't worry, we won't send you spam and you can unsubscribe at any time.

Transcript
Lee:

What do model airplanes have to do with avoiding application failures? It's Tech Tapas Tuesday, let's go.
I learned to fly radio-controlled airplanes when I was a kid, and one of the most important rules I remember was: always keep your airplane at least two mistakes high. You see, when you're learning to fly a model airplane, especially when you begin to attempt acrobatics, you learn this lesson quickly, because mistakes equal lost altitude. You make a mistake, you lose altitude. As you can imagine, losing too much altitude makes for a very bad day for you and your airplane.
So what does this have to do with avoiding application failures? Well, keeping your plane at least two mistakes high means staying high enough that you can recover from two mistakes made at the same time. Imagine you're flying your plane and you make a mistake. You lose altitude. While you're trying to recover, you have to perform a number of tricky maneuvers, such as leveling the plane out, slowing it down, and turning it into the wind. These are critical tasks you need to perform to save your plane. What happens if you make a mistake while you're performing those tasks? You need to make sure you are still high enough that the second mistake doesn't result in a crash.
The same rule of thumb applies when building highly available, high-scale web applications. Say your application has a problem and your website goes down in the middle of the night. After getting paged, you find yourself in a war room with the impacted developers, product owners, and other team members, trying to figure out what to do. You try one thing, then another, then another, desperately trying to fix the problem that caused the application failure in the first place. This is a high-stress situation, one in which it's easy for people to make mistakes, including potentially catastrophic ones. I was once in one of these war rooms when an engineer suddenly put their head down on the table and moaned, "Oh no!"
You see, the engineer had just typed a command that was designed to fix a problem, but instead of typing the correct command, the engineer typed a command that caused a major failure of a critical database, making the entire situation substantially worse. At that moment, our struggling model airplane, our entire company's application, our entire reason for existence as a company, was in serious trouble and headed for a crash.
So how can you make sure you're keeping two mistakes high when you're running a modern digital application? To help avoid damaging application failures, start by making sure you have processes, rules, and procedures for critical problem scenarios that are designed to help the situation without introducing even worse problems. For example:
First, during critical downtime responses, don't allow a lone engineer to execute commands on any production system. Overly stressed engineers can make simple mistakes that lead to even bigger problems. Instead, require that every command be reviewed by at least one other engineer before it is submitted for execution. This simple two-step process can help your team avoid catastrophic mistakes.
Second, create standard processes and procedures for solving common problem scenarios. These are often called playbooks or runbooks. Make sure to use these playbooks during critical periods. They give everyone clear steps to follow and reduce the likelihood of your team making additional mistakes.
And finally, look for and avoid cascading, or double-dependent, problems. A double-dependent problem is a set of problems that combine to make the situation worse than any of the individual problems would be by themselves. It's like leaving your garage door opener in your car in the driveway overnight, then forgetting to lock your car door. Either of those two mistakes by itself isn't a big problem. But when they occur together, you're inviting big trouble.
Finding double-dependent problems can be a challenge, but when you do locate them, resolving them is critical, since they can turn small problems into large problems quickly.
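The first practice described in this episode, requiring a second engineer to review every production command during an incident, is easier to follow under pressure when tooling enforces it rather than memory. Below is a minimal Python sketch of such a gate, assuming a hypothetical setup where commands are proposed as strings and a pluggable `run` callable (an SSH wrapper, a deployment tool, or similar) actually executes them. The `ProposedCommand` class, the `execute` function, and the example command are illustrative names only, not part of any real tool.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical: whatever actually runs a command against production
# (an SSH wrapper, a deployment tool, etc.). Stubbed out in the demo below.
Runner = Callable[[str], str]


@dataclass
class ProposedCommand:
    """A production command that must be reviewed before it may run."""
    author: str
    command: str
    approvals: set[str] = field(default_factory=set)

    def approve(self, reviewer: str) -> None:
        # The core of the rule: the engineer who typed the command
        # can never be the one who signs off on it.
        if reviewer == self.author:
            raise PermissionError(f"{reviewer} cannot approve their own command")
        self.approvals.add(reviewer)


def execute(proposed: ProposedCommand, run: Runner, required_reviewers: int = 1) -> str:
    """Run the command only if enough *other* engineers have approved it."""
    if len(proposed.approvals) < required_reviewers:
        raise PermissionError(
            f"command needs {required_reviewers} reviewer(s), "
            f"has {len(proposed.approvals)}: {proposed.command!r}"
        )
    return run(proposed.command)


if __name__ == "__main__":
    def fake_runner(command: str) -> str:
        # Stand-in for a real executor; it only echoes the command.
        return f"ran: {command}"

    # Hypothetical command string, for illustration only.
    cmd = ProposedCommand(author="alice", command="restart orders-db replica")
    try:
        execute(cmd, fake_runner)
    except PermissionError as err:
        print("blocked:", err)

    cmd.approve("bob")
    print(execute(cmd, fake_runner))
```

The key design choice is that approval is tied to identity: the author of a command never counts as its reviewer, so a stressed engineer working alone cannot push a command through. Raising `required_reviewers` lets a team demand additional sign-offs for especially risky systems.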