Welcome to Modern Digital Business!
Jan. 12, 2023

Testing at Scale with Nate Lee, Co-Founder of SpeedScale

Modern businesses rely on applications, and they rely on continued innovation in those applications to drive their business.

This drive for innovation creates a need for improved techniques for validating that an application will work as expected. But constant innovation means a constant chance for problems, and testing applications at scale is not an easy task. This is where SpeedScale comes into play. SpeedScale assists in stress-testing applications by recreating real-world traffic loads in a test environment.

Today on Modern Digital Business.

Useful Links

About Lee

Lee Atchison is a software architect, author, public speaker, and recognized thought leader on cloud computing and application modernization. His most recent book, Architecting for Scale (O’Reilly Media), is an essential resource for technical teams looking to maintain high availability and manage risk in their cloud environments. Lee has been widely quoted in multiple technology publications, including InfoWorld, Diginomica, IT Brief, Programmable Web, CIO Review, and DZone, and has been a featured speaker at events across the globe.

Take a look at Lee's many books, courses, and articles by going to leeatchison.com.

Looking to modernize your application organization?

Check out Architecting for Scale. Currently in its second edition, this book, written by Lee Atchison and published by O'Reilly Media, will help you build high-scale, highly available web applications or modernize your existing applications. Check it out! Available in paperback or on Kindle from Amazon.com and other retailers.


Don't Miss Out!

Subscribe here to catch each new episode as it becomes available.

Want more from Lee? Click here to sign up for our newsletter. You'll receive information about new episodes, new articles, new books and courses from Lee. Don't worry, we won't send you spam and you can unsubscribe at any time.

Mentioned in this episode:

LinkedIn Learning Courses

Are you looking to become an architect? Or perhaps you're looking to learn how to drive your organization towards better utilization of the cloud? Are you looking for ways to utilize a Cloud Center of Excellence in your organization? I have a whole series of cloud and architecture courses available on LinkedIn Learning. For more information, please go to leeatchison.com/courses or mdb.fm/courses.

Courses by Lee Atchison

Transcript

Lee:

Modern businesses rely on applications, and they rely on continued innovation in those applications to drive their business. But as these applications evolve and become more complicated, testing them also becomes more challenging. Testing applications at scale is not an easy task. Today, we're going to look at a company focused on easing the burden of testing applications at scale. Are you ready? Let's go. Modern businesses rely on applications, and they rely on continued innovation in those applications to drive their business. This drive for innovation creates a need for improved techniques for validating that an application will work as expected. But constant innovation means a constant chance for problems, and testing applications at scale is not an easy task. This is where SpeedScale comes into play. SpeedScale assists in stress-testing applications by recreating real-world traffic loads in a test environment. Nate Lee is co-founder of SpeedScale, and he is my guest today. Nate, welcome to Modern Digital Business.

Nate:

Thanks, Lee. Glad to be here.

Lee:

You know, I think we finally have this worked out. After a couple of delays and an internet outage, I think we're finally going to do this podcast, don't you? What do you think?

Nate:

No, it's always exciting, eventful, leading up to something like this. But yeah, with the power outages, and, you know, we're recording this between Thanksgiving and Christmas, the holiday season, and everybody's kind of hectic. A lot of our customers are retail, so they're going through code freezes and holding their breath, patting their heads and rubbing their bellies, trying to make sure nothing goes down at a critical time.

Lee:

I remember those days at Amazon retail. This time of the year was always very busy, and yeah, a lot of holding your breath. You didn't make many changes, but everyone was really busy. It was a very busy time.

Nate:

Yeah, yeah. Actually, I've got a funny story about Amazon and the holiday rush period. We were talking to, I think it was Heavybit, one of the venture capital firms that specializes in Kubernetes dev tools. And one of the gentlemen was telling us that they were at Amazon working on SRE stuff, and they were like, how are we going to get ready for the holiday season? We have to run, like, a gigantic load test. And it kind of speaks to the genesis of SpeedScale, right? It's very difficult to run these high-traffic situations without a perfect carbon-copy replica of production, because whether you can handle the load depends critically on having production-like hardware. They said, well, what if we run a gigantic sale? We can basically just simulate what we're going to encounter in production during the holiday season. And they were like, yeah, that's a good idea. What are we going to call it? And they decided to call it Prime Day. So Amazon Prime Day, which is a pretty big deal, right? That's really just a veiled dress rehearsal for the Black Friday season and Christmas holiday shopping. But like a few of the ideas that Amazon's put through, it actually ended up being a huge barn burner of an event.

Lee:

Yeah. Prime Day came after I left. I left Amazon in 2011, I think.

Nate:

Okay.

Lee:

Well, definitely one of the things we always used to do was have test days, where it's like: what happens if we take this data center offline? What happens when we cut this cable? We did that sort of testing in production all the time. The theory was everything should continue to work at scale with no issues whatsoever. But we had to do it in production. You know, it's the only way to get that volume of traffic, until we have someone like SpeedScale. Why don't you tell us a little bit about exactly what SpeedScale is and what it does?

Nate:

Yeah, so SpeedScale is a production traffic replication service, and we help engineers simulate production conditions using actual traffic. You know, there's been a long history of these sorts of tools. I think you were referring to Chaos Monkey and the Simian Army, which came from the Netflix days, where they were randomly executing these daemons to take down services and then seeing what fails. And then of course Gremlin's got a productized version, specifically focusing on chaos: running these game days and experiments to take down aspects of the servers. And I think they're tiptoeing around: how do I run these experiments but also not affect production, right? But SpeedScale's approach is slightly different. We actually capture the traffic and then allow you to run that traffic in a safe manner in lower environments. Another way to think about this is shifting left what you're going to encounter in production, but doing it in a safe way in these lower environments.

Lee:

So you record production traffic and then replay it in a staging or a test environment?

Nate:

That's right. A lot of this is possible now because of the advent of cloud environments, right? You can spin up these ephemeral environments; that was always the promise of cloud: just use what you need, and spin up these environments at a moment's notice. And I think the reality of it is, well, these environments are expensive. Their costs can actually skyrocket, and they don't stay ephemeral; we end up keeping them on for long periods of time. And people, especially given the current economic state, are looking for ways to reduce costs.

Lee:

Your customers really are building, or have, modern applications. I'm talking about things like cloud-native applications; they're undoubtedly cloud-based applications, where they can create these replicated environments a lot more easily. So with that sort of mindset, what challenges do you find exist for your customers in managing those applications? What are some of the problems they come to you with?

Nate:

Yeah. You know, I think that's the key qualifier: what do our customers come to us with? There are a variety of challenges in developing in the modern cloud. Security is always of paramount concern, and so is making sure that scale is proper. But our customers typically come to us with the specific challenge of environments, and that's been a common thread we've noticed. Environments themselves aren't the issue. When I say environments, more specifically I mean the data and the downstream constraints of those environments. They can always spin up a carbon-copy replica of production, a full end-to-end environment at a lower scale, right? But even if you do, the problem is that (a) it's expensive, because there are so many moving parts and databases and things like that, and (b) if it's not seeded with the proper data they need in order to exercise their applications, it's really quite useless. And that's exactly where the challenge is. So: how are my clients hitting my app that I'm trying to test? How does my app send these downstream calls to the third-party backends, or the systems of record, or the other internal APIs? And what do those systems need to be seeded with, data-wise, in order to respond accurately? Capturing state and managing idempotence becomes a huge headache, actually. And that's one of the reasons why we developed SpeedScale: we want engineers to be able to come into a self-service portal and understand, okay, what does my app do? How does it behave currently? And then, how do I recreate this situation in a cloud-native environment without a lot of hassle? The current state of the art is usually a conventional tool, something that can actually drive the transactions. On a very simple level, that could be something like Postman or Insomnia. At a more sophisticated level, maybe you're replaying large reams of traffic using something like k6. But again, what we typically hear is that you're running those sorts of transactions and exercising your application in a full staging environment where everybody else is using it at the same time, right? And so you don't know if somebody's pushed an alpha version of an application in and you're getting these errors because somebody is doing tests at the same time you are, or if you truly do have a bug and you should be paying attention to it and fixing it. So, specifically: backend environments, the right source of data, and also simulating the inbound calls into your application. Those are the challenges we typically see in modern cloud development. And it's really about having the right breadth of traffic. If you're focusing on just one area, or one type of transaction, like gold medallion members when you're really trying to test platinum medallion members, you could be missing a lot of code coverage.

Lee:

So I imagine the typical QA development environment is kind of what you were describing, with chaos going on because everyone's doing everything in it. But in, like, a full CI/CD pipeline, where you might have a validate-at-scale test as part of the pipeline, I imagine in that case you could afford to spin up a full-fledged production environment for a short period of time and use something like SpeedScale to test the environment at scale, to make sure nothing behaves in a way you didn't anticipate. But I imagine the problem with that sort of scenario is that as you're making deployments and making changes, the traffic that SpeedScale feeds in will change as time goes on. How do you keep that up to date? Do you constantly capture new traffic and replay that? Is that how you do it?

Nate:

Yeah, yeah. So it's really kind of shifting the paradigm. The way SpeedScale was developed: we've all got backgrounds in companies like New Relic and Observe Inc. and ITKO, which really founded the concept of service virtualization, a fancy way to say service mocking. With that background, we inherently understood that it's really slow to develop these scripts. So we don't actually take a script-based approach in running this traffic. What we do is run traffic snapshots. Since we are capturing all this traffic, we develop a snapshot and generate two things. One is the inbound traffic: we generate a script, if you will, but it's really just a JSON snapshot file, as we call it. There's no actual scripting involved; it's auto-generated from real traffic. A key point here for the listeners is that we are redacting PII as we capture the traffic, because you don't want to be spewing sensitive information when you're replaying it. So data loss prevention is actually a very big piece of this. But anyway, the snapshots are auto-generated, and from the traffic we can also kind of reverse engineer what backends you need in order to exercise a particular service. So not only do we generate a traffic snapshot you can replay for the inbound traffic, but we also generate a mock server in a pod, and that mock server in a pod can be spun up. What this really does is vastly narrow the scope of the environment that you need to spin up. We're actually just spinning up N plus one and N minus one: your API and only its neighbors, instead of the whole full-blown end-to-end environment. So it's like a little microcosm of your API, but your API, for all intents and purposes, thinks it's in a fully integrated end-to-end environment.
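
For illustration, here is a minimal Python sketch of what replaying one entry of such a traffic snapshot could look like. The snapshot format, field names, and URLs are invented for this example; they are not SpeedScale's actual schema.

import json
import urllib.error
import urllib.request

# Hypothetical, simplified snapshot: recorded inbound requests plus the
# status each returned in production (PII already redacted at capture).
SNAPSHOT = """
[
  {"method": "GET", "path": "/api/orders/1234",
   "headers": {"Accept": "application/json"},
   "observed_status": 200}
]
"""

def replay(snapshot_json: str, base_url: str) -> None:
    """Replay each recorded request against a test environment and
    compare the status code with what production returned."""
    for entry in json.loads(snapshot_json):
        req = urllib.request.Request(
            base_url + entry["path"],
            headers=entry["headers"],
            method=entry["method"],
        )
        try:
            with urllib.request.urlopen(req) as resp:
                status = resp.status
        except urllib.error.HTTPError as err:
            status = err.code
        verdict = "match" if status == entry["observed_status"] else "DIFF"
        print(f'{verdict}: {entry["method"]} {entry["path"]} -> {status}')

replay(SNAPSHOT, "http://localhost:8080")  # the service under test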

Lee:

So you're essentially working at the service-by-service level versus the application level. You're not scripting user traffic into the system; you're directing traffic into a particular service, in and out of the service, with the data that goes with it. So you only have to bring up the service and what's around it, and you don't have to bring up the entire application.

Nate:

Well, you really only need to bring up just the app, and SpeedScale's taking care of the rest, really. Yeah. We're generating all the inbound traffic for you; there's no scripting involved. We have what's called the traffic viewer, and you use that to browse the type of traffic you want to invoke. Once you select the traffic that you want to invoke, we take a look at all the traffic around it and say: okay, when you run this call inbound, as a result of that, your application calls, you know, a Dynamo database and these other two APIs, and then you make a call to a third party, let's say Stripe or Google Maps or something like that. And so we will automatically generate a mock server, based on reverse engineering how your app works, and make sure everything is there that you need. So yeah, you've got it. The concept is we're virtualizing your neighbors so that you can do consistent, scientific dry runs of your API as part of CI. But it's also a huge reduction in cloud costs, because you're not spinning up a big end-to-end environment of literally everything that is included in your app every time. And, to be honest, that's also sometimes not possible because of the connections that you do have to third parties. Almost everybody integrates with a payment provider, or maybe a background-check organization, or a mapping or messaging solution that's a third party. And so many times these wires that hang out of the cloud, as I call them, are difficult to simulate. You have to call the vendor and ask for a sandbox. And if you want to do a load test, forget it. That's not going to go to a hundred TPS, right? They're just standing up the sandbox to give you onesie-twosie transactions, whereas,
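
As a rough sketch of that mock-server idea, here is a tiny Python mock that serves responses previously observed from downstream dependencies. The recorded paths and bodies are hypothetical; a generated mock would be built automatically from captured traffic rather than written by hand.

from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical recorded responses for the downstream services this API
# calls (a payment provider, a mapping API), keyed by method and path.
RECORDED = {
    ("GET", "/v1/charges/ch_123"): (200, b'{"status": "succeeded"}'),
    ("GET", "/maps/geocode?q=atlanta"): (200, b'{"lat": 33.7, "lng": -84.4}'),
}

class MockHandler(BaseHTTPRequestHandler):
    """Answer with whatever the real dependency was observed to return."""
    def do_GET(self):
        status, body = RECORDED.get(("GET", self.path), (404, b"{}"))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 9000), MockHandler).serve_forever()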

Lee:

So no performance testing or anything like that.

Nate:

Exactly. Exactly. But if we're simulating that by using traffic to auto-generate a mock server in a pod for you, the possibilities go up.

Lee:

Cool, cool. Now, I can see how this works for APIs, and you can include database activity, such as calls to DynamoDB, since DynamoDB is essentially an API call. But what about native databases, or native data that's stored with the service? You know, like a MySQL database that might be part of the service, or a cache, a Redis cache or something like that. Do you simulate those as well, or what do you do in those cases?

Nate:

Yeah, so for Redis we can actually see the traffic going through it, but we can't simulate it. For other data sources, we have ongoing support in development for things like Postgres and MongoDB. We've got the full list of supported technologies on our documentation page, at docs.speedscale.com. But really, the beauty of being able to provision these backends comes when they're API-based; usually it's all fairly standard. If you communicate with a system of record via API, we can also handle that, something like Elasticsearch, for example. But if it is a local data source, or something like MySQL, sorry, MS SQL, that's got a proprietary, non-open standard, you would probably want to provision that yourself locally as part of that simulated microcosm. With most of these cloud-native environments, you can specify either the environment script or the YAML to properly stand those things up, in addition to the SpeedScale simulations.

Lee:

Makes sense. Let's talk about resiliency a little bit. Resiliency is an interesting aspect when it comes to cloud-based applications, because built into the DNA of the cloud is the idea that the cloud is designed to break, right? The whole fundamental aspect of the cloud is: if a server isn't working right, just terminate it and restart it. And that mindset extends throughout the entire cloud ecosystem, where everything is designed with retries and redundancy built in, so that you can lose components; components can go away and come back, and your entire system as a whole continues to work. What does SpeedScale do to help with that sort of resiliency testing? Are there ways you can simulate those sorts of environments?

Nate:

Yeah, yeah, to an extent. Well, first of all, before I jump into that, I think a lot of people have a false level of comfort with the resiliency that's inherently built into the cloud. What people realize is: oh, look, the startup times of the Lambda serverless instances are actually quite long, so how do we get past that? Or: hey, horizontal pod autoscaling rules actually take quite a while to figure out that a pod is down and then spin up another pod. It waits and retries a couple of times, and meanwhile you're bleeding thousands of dollars because your mobile ordering app is down. So I think it's a bit of a false sense of comfort, or protection, and that's what we can really help simulate. And what we do, again, is capture traffic in order to understand how users run your application. But once we do have that traffic, engineers can multiply it. We're empowering engineers to run these what-if scenarios: what if I had a hundred x traffic? What if I had a thousand x traffic for 30 seconds? And run more of a soak test or a sanity test. This is all available with a few mouse clicks once we have that baseline of traffic, because the traffic captures how your application is exercised, and we've got the necessary backends ready to be spun up in a mock server. So it's kind of a turnkey simulation that you can run. And when people do have DR rules or HPA rules, they can actually verify that things are going to fail over as expected, or scale as expected. Another aspect of resiliency that simulation can help catch is resource consumption. If you're making logic changes to your services, let's say you make a calculation change and for some reason it causes CPU to skyrocket, or you've got a memory leak in your code and memory rises over time. The state of the art in catching issues like that really is to just go ahead and release, pay really close attention to Datadog or New Relic or AppDynamics, and rely on those observability tools to give you an early warning. And then it's all hands on deck reacting, or trying to restart that pod over and over again whenever memory starts creeping up. Those sorts of changes can actually be proactively caught by running these traffic simulations. By simulating the inbound traffic and the mock server pods, those are your controls, and really the only thing that changes is your application as you make changes. And that's another reason not to use these crowded, chaotic staging environments: there's so much noise in the system, other people are doing things, and staging can break quite frequently. I know you've actually written about this,
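
A minimal sketch of the "what if I had 100x traffic for 30 seconds" scenario, in Python. The base rate, multiplier, and target URL are assumptions for illustration; a real traffic generator would replay the full recorded traffic mix rather than hammer a single endpoint.

import concurrent.futures
import time
import urllib.error
import urllib.request

# Hypothetical what-if run: replay one recorded endpoint at 100x its
# observed production rate for 30 seconds.
BASE_RPS = 5
MULTIPLIER = 100
DURATION_S = 30
URL = "http://localhost:8080/api/orders/1234"

def fire() -> int:
    """Send one request; map transport failures to a synthetic 599."""
    try:
        with urllib.request.urlopen(URL, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:
        return 599

def run() -> None:
    target_rps = BASE_RPS * MULTIPLIER
    deadline = time.monotonic() + DURATION_S
    with concurrent.futures.ThreadPoolExecutor(max_workers=64) as pool:
        while time.monotonic() < deadline:
            batch = [pool.submit(fire) for _ in range(target_rps)]
            time.sleep(1)  # pace: one second's worth of traffic per batch
            failures = sum(f.result() >= 500 for f in batch)
            print(f"sent={target_rps} failures={failures}")

if __name__ == "__main__":
    run()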

Lee:

Yep.

Nate:

And so that's another argument for using these production simulations in a very sterilized lab environment, if you will, where the only thing that's changing is your code. It's a way to consistently iterate, experiment, and make changes. And that's another way you can improve your resiliency: you can make sure that you're optimizing all the resources at hand, and that you're not irresponsibly allocating memory and then just hoping that horizontal autoscaling rules or cloud scalability will cover for you. You might not be economical with your code otherwise.

Lee:

Right, right. That makes sense. You can also do controlled failures, right? You can do game-day testing, if you will, during these simulation runs. Your normal traffic works fine, but what happens if three servers go down while that's going on? The DR rules you're talking about certainly cover that, but this is kind of a way of injecting what-if scenarios, and of getting useful information you can feed back to the development org: hey, it didn't quite work the way we expected in this scenario; what if we changed the rules a little bit and adjusted for a higher likelihood of success?

Nate:

That's right. Yeah. We can generate the inbound traffic into just an API, but you can also use that in isolation. You can use our traffic generation capabilities to hit you at the front door, like an ingress or an API gateway, and test your entire application. So you can actually piecemeal out the solutions: we've got the traffic generation piece and the mock server piece. Some people spin up our mocking pod and just leave it on full-time, because they need to simulate the third-party components. That's the cool part about having the traffic patterns as a snapshot: once we do have the traffic, we can play with it. We can start to slow things down. We can say, hey, we're mocking Stripe; what if Stripe goes down? Then we can just tell that traffic replay to be a black hole and not respond. We can also tell it to respond with 20-second latency, and then you can start checking: does my application time out gracefully? Does it wait the whole time? We can also speed up the traffic. I've actually heard of cases of applications failing because the backends get improved and start responding faster, and then the application becomes the bottleneck and starts crashing.
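
For illustration, a small Python sketch of those failure modes for a mocked third party: respond normally, respond after a 20-second delay, or black-hole the request entirely and see how the caller's timeout handling behaves. The mode names, port, and response body are invented for this example.

import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical failure modes for a mocked third party (say, a payments
# API): "normal" replays the recorded response, "slow" adds 20 seconds
# of latency, "blackhole" never answers.
MODE = "slow"
RECORDED_BODY = b'{"status": "succeeded"}'

class ChaosMock(BaseHTTPRequestHandler):
    def do_POST(self):
        if MODE == "blackhole":
            time.sleep(3600)  # hold the connection open; never respond
            return
        if MODE == "slow":
            time.sleep(20)  # does the caller time out gracefully?
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(RECORDED_BODY)

if __name__ == "__main__":
    HTTPServer(("", 9000), ChaosMock).serve_forever()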

Lee:

So it's even a development tool, right? When you're trying to build your application and build the resiliency in, or build in these what-if scenarios, you can take the captured traffic into your development environment and fool around with it and try different things there. I'm assuming these are all rational use cases for SpeedScale, correct?

Nate:

Yeah, exactly. They're out of the box, kind of. And again, just to re-emphasize: while under the hood we are generating JSON and scripts and stuff, there's no scripting involved. It's literally just a UX dashboard where you peruse all the API-level calls that we've been picking up and desensitizing, and you can see basically the ins and outs of all the traffic of a particular API you're trying to test. You tell us, hey, I want to generate a snapshot, and I want this snapshot to have this set of inbound traffic that you're going to rerun, and also this set of mocked traffic that I want to run. And you get this kind of turnkey, ephemeral lab environment that you can run over and over again. If production happens to update, then you can just go out and grab another snapshot of traffic, right? The paradigm's completely changed now. There's no scripting involved, no maintaining the script and updating the script like a normal testing organization would have to do. It's literally: go out and grab a new snapshot, wait two minutes for it to be auto-generated, and then run that new snapshot. Or it can be automated via GitHub or an API call. You can say, hey, grab the last 15 minutes of traffic and run it again. And it can all be done as part of the CI pipeline as well.
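
A hedged sketch of what that kind of CI automation might look like in Python: fetch a snapshot of the last 15 minutes of traffic from a controller API, replay it against the build under test, and fail the pipeline on regressions. The host, endpoints, payloads, and response fields here are hypothetical, not SpeedScale's real API.

import json
import urllib.request

# Hypothetical traffic-controller API used by a CI step.
CONTROLLER = "http://traffic-controller.internal/api"

def post(path: str, payload: dict) -> dict:
    """POST a JSON payload and return the decoded JSON response."""
    req = urllib.request.Request(
        CONTROLLER + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Grab the last 15 minutes of traffic, then replay it at the CI build.
snapshot = post("/snapshots", {"service": "payment-api", "window": "15m"})
report = post("/replays", {
    "snapshot_id": snapshot["id"],
    "target": "http://payment-api.ci:8080",
})
if report["failed"] > 0:
    raise SystemExit(f"replay regressions: {report['failed']}")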

Lee:

Yeah, so one of your use cases is, like you said, CI/CD pipelines. Another use case is development. Another use case, I'm assuming, is QA departments who just want to run what-happens-if scenarios, poking around and making changes dynamically just to see what's going on. Whether that's a QA department, as I said, or the development organization going through a QA process, it doesn't matter; it's a step to validate. So those are like three distinct use cases: an automated pipeline, QA doing exploratory testing, and development using it to harden the application, or even as part of the development process itself. Are there use cases not represented by those three that this is useful for?

Nate:

Yeah, yeah. Within those three use cases, I guess you could break it up into specific phases of testing. The traffic replays can really be curated in a way where you're checking for functionality or contract changes, right? You can look at it more as an integration test. You can also multiply the traffic and look at it more as a load test. That's where the concept gets interesting: load testing at a regular interval as part of CI. I've heard people call it performance assurance; I've heard people call it continuous performance testing. And really the linchpin to all of that is the mocks, because when you're doing load testing, typically everybody has to be finished with their particular piece of the application code, right? And then they have to curate a performance environment that's, you know, one-tenth the size of staging, so they can extrapolate the results and multiply them by ten. Now, if we're mocking the backends, and they're performant and can do a thousand TPS, that constraint really goes away. And now you can understand: well, this one piece, this payment API or this fulfillment API I'm working on, needs to go up to 800 transactions per second. You can do that without having to wait for the full end-to-end environment, without having to tell the DBA, hey, I'm going to be hammering the database, please don't get mad at me. And that can all be done in a self-service way. Now, you've written about all these different microservice teams that are disparate and siloed but all have to be communicating tightly, right? And you've written about the ability for them to have some sort of self-service way to understand how they interconnect with everyone, and also understand the integrations and then spin up these environments. SpeedScale literally does that: it allows somebody to jump into this API or that API and view the traffic, and we'll show them a service map. Then they say, well, this is how I exercise my application, and they can grab just the traffic that's relevant to them. So beyond the CI and development enablement, and the QA what-if testing, they can also take that traffic and point it at different endpoints, so they can do performance benchmarking. One of the stories we've had from a customer: a new Graviton processor came out, and they were like, well, is that really going to be any faster than what we're currently on? So they were able to benchmark it: this is business-as-usual traffic; let's test it on the new Graviton processors. And they did find out that there was an X percent faster throughput. So you can use it to benchmark in a conventional load-testing sense. There's also the use case that I call parity testing, to check for parity when you're doing migrations, like from EC2 to Kubernetes. If your application fundamentally is going to remain the same, but you're just re-platforming, you can capture business-as-usual traffic coming into your EC2 app. And then, once you're done re-platforming, like moving to Kubernetes, you can do a sanity check before you fork all the traffic over and do the grand opening.
You can run the old traffic that you would normally get on EC2 against the Kubernetes platform and ask: am I getting the same response times? Are things scaling properly? Did the functionality get preserved as we moved over? And the last piece, actually, is local development, in particular when you're doing Docker-based development, where you run Docker locally on your laptop, or Docker Compose, Minikube, MicroK8s, that kind of thing. All of these concepts, these mock server pods and traffic generator pods, can actually be spun up locally on your laptop. So now there's an argument for: hey, I don't need the full-blown end-to-end environment. I can just simulate my neighbors, get SpeedScale to generate those pods, and then run them locally on my laptop.
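
As an illustration of that parity check, a short Python sketch that replays the same recorded paths against the old and new deployments and compares status, body, and latency. The hostnames and paths are invented for the example.

import time
import urllib.request

# Hypothetical parity check for a re-platforming: old (EC2) vs. new
# (Kubernetes) deployments, fed the same recorded request paths.
OLD = "http://legacy-app.ec2.internal:8080"
NEW = "http://app.k8s.internal:8080"
PATHS = ["/api/orders/1234", "/api/orders/5678"]  # drawn from a snapshot

def probe(base: str, path: str):
    """Return (status, body, seconds) for one request."""
    start = time.monotonic()
    with urllib.request.urlopen(base + path) as resp:
        return resp.status, resp.read(), time.monotonic() - start

for path in PATHS:
    old_status, old_body, old_t = probe(OLD, path)
    new_status, new_body, new_t = probe(NEW, path)
    ok = old_status == new_status and old_body == new_body
    print(f"{path}: parity={'OK' if ok else 'BROKEN'} "
          f"latency {old_t * 1000:.0f}ms -> {new_t * 1000:.0f}ms")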

Lee:

It's one of the biggest complaints I hear about microservice architectures: the laptop development environment is so difficult to set up and manage, and this is a tool that'll help make that a lot easier.

Nate:

Yep. Yep.

Lee:

Can this be done offline as well, or is it still only an online tool?

Nate:

So we're about to launch a command-line version of this that doesn't require an internet connection. It'll be free, and you can generate those pods and then run them locally, in, like, a Minikube environment, something like that.

Lee:

So you talked a little bit about what motivated you to start SpeedScale, but why you? Why did you start SpeedScale?

Nate:

We were writing device drivers. And Matt had actually developed, what would we call it, basically a visual driver development kit that allowed us to build these drivers more quickly. And then Ken developed a simulator, kind of a stub code harness that you could drop the driver into and it would test the inputs and outputs. So all three of us have been in this mindset of better testing, faster development. Those two got into the observability space, first with Wily, and then New Relic and Observe Inc. Meanwhile, I took a different path. I had actually been with ITKO (actually, Ken worked at ITKO; he's the one who pulled me in), and ITKO had developed this concept of service virtualization back in the SOA days. There was a huge mix of legacy queuing technologies like MQ and TIBCO and AMQP, and SOAP services were just becoming a thing. So developing these service mocks was a hugely complex affair. You had to redirect WAS servers and bounce them, and do a lot of networking, to get these mocks up and running. We had always been enamored with the concept, but really dissatisfied with the process of developing these service mocks, because done properly, they're a huge enabler. They're a huge value-add because they can accelerate the dev process: you can develop in parallel, you can simulate all these conditions, and so on and so forth. But service mocks got kind of a bad reputation, because you usually have to hand-script the responses one by one. If you want a backend to simulate whatever, you have to seed it with the right data; it has to be onesie-twosie programmed to respond. So, this is a long-winded way of saying: when the cloud came about, Kubernetes and cloud data warehouse storage, we realized, oh, we can do this very quickly. There are proxies, there are already network taps we can take advantage of, and then we can use the traffic to train the models. Once the mock pods and the inbound traffic can be simulated, the rest of it is just an orchestration problem. And, you know, with Terraform scripts, Helm charts, and YAML, all that stuff is pretty well known as well. So it was a matter of desire and background, and the cloud-native technology has actually just been a huge enabler.

Lee:

Yeah, and I've known Ken for many years, but I'm so glad I met you guys, and I'm really excited to see what you're going to accomplish as you go along. So the natural question that always comes up at this time of year is: what does next year look like? What are your plans for next year? What are you going to do in 2023, and what does SpeedScale look like in 2023?

Nate:

Yeah, you know, we've been working with some great partners in 2022 and really refining the ergonomics of the product, and it's been a huge developer productivity accelerator. In 2023, we're going to release a free version of SpeedScale. We know what aspects people love; we just wanted to be careful about understanding where the exceptionally useful features are, which of those features could be command-line driven, and which actually need a full-blown UI. So the freemium tool is going to be mostly command-line based. But once you start needing enterprise-level things like single sign-on, more sophisticated redaction, and visual reports, that's when you'd move to a paid tier. We expect the free tier to be a great value-add for engineers that need mocking and traffic generation. And then there's also going to be a lot of momentum around publicizing SpeedScale from a marketing perspective. We hope to really listen to the engineering community, understand where we can provide the most lift, and iterate quickly to develop those features. But already we're getting stories of taking two-week load-testing sprints down to three hours and improving API performance by 30x, and we just want to continue that.

Lee:

So, so if any listener is interested in learning more about SpeedScale, where should they go?

Nate:

Yeah, they can just go to SpeedScale.com, spelled exactly like it sounds, one word. We also have a Slack community, at slack.speedscale.com, where they can talk directly to the founders or the engineers and ask questions. And if they go to SpeedScale.com/free-trial, they're able to download the product and try it locally.

Lee:

And I'll make sure those links are in the show notes as well, so people can see them there. Great. Anything else you want to add before we wrap it up here? We managed to make it all the way through the episode without losing the internet again. That's fantastic.

Nate:

No, no, that was it. It's always a pleasure to talk and, you know, commiserate over the technical problems of the modern cloud with you, Lee. It's always great.

Lee:

Definitely. I love talking with you, Nate. Thank you. My guest today has been Nate Lee, co-founder of SpeedScale. Nate, thank you very much for joining me on Modern Digital Business.


Nate Lee

Co-Founder

Nate holds a BS in CS and an MBA focused on technology from Georgia Tech. Most recently, he was in sales for the digital transformation consultancy Contino. Before that, he served as a Product Manager for an API mocking (service virtualization) tool. He has spent the majority of his career (six years) as a presales engineer working with enterprise DevOps teams.