I want to talk about why database migrations are an anti-pattern. I’ve been a software architect for many years, and right now I’m principal architect at Twin Health. While I was architect at my previous startup, Heartbeat Health, my DevOps engineer was saying, you know, he didn’t understand why we needed to do any database migrations.
Now, up until then, I’d followed the standard pattern where you have a series of migration files, each of which describes a change to the database that you need to apply when you deploy some code. A database migration, for those of you who aren’t technical and are still watching this video, is a little bit of code that changes the format of the data that’s in your database. The way this works is that if you’re deploying new code that expects things in the database to be structured differently, the deploy automatically applies the migrations, which live in the same code base as the rest of your code. This system works fairly well, except it runs into problems when you start to scale to a lot of engineers. It also has the problem that sometimes the change you want to apply to your database takes some time to actually propagate. That can cause downtime, which means that when you deploy the change, you may need to shut everything down temporarily, do the migration, and then restart your application or service.
There are many other problems associated with this way of doing things. For example, what if two different engineers are working on two different migrations that conflict with each other? You could end up in a situation where the migrations aren’t applied correctly. So this DevOps engineer asked me, hey, why are we ever doing migrations? And I thought, why do we do migrations? Do we really need to do this? His idea was that we shouldn’t have to do database migrations at all, and there shouldn’t be any downtime. When I thought about it, I realized it’s actually possible to do this; you just have to be a little bit careful. I’ll talk about what we ended up doing, which worked really well.
So instead of storing data in a standard SQL format, we stored it in an event-sourced format. Basically, the schemas were always the same. We were storing things in an event-sourced way, where you just store the changes.
For example, let’s say you’re building a ticketing system. One event might be, we’re going to create a ticket with this title. Another event might be, we’re going to set the description of the ticket, or we’re going to assign this user to be the owner of this ticket, or we’re going to add a comment to it, or add an attachment, and so on. So each change you make is an event. The events are just stored in a simple table that has all the events related to tickets, and the payload of each event is basically a NoSQL-style JSON document of some kind. If you want to add new fields, you just add new things to your JSON. You don’t have to do a migration at all. You just design your code so that it’s both forward and backward compatible. If old code is reading a new event and doesn’t understand some of the fields you’ve added, you have to design it so that those new fields don’t cause a problem for the old code that doesn’t know about them. That makes your code forward compatible: old code can read new events. It’s also backward compatible, because new code can read old events. It can see that an old event is missing some fields, and it just puts in meaningful defaults.
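To make that concrete, here is a minimal sketch in Python. Everything named in it is an assumption for illustration: the event type names, the `priority` field, the `Ticket` class, and the in-memory `EVENTS` list standing in for the events table. It only shows the general idea of a reader that ignores fields and event types it doesn’t know and fills in defaults for fields that are missing.

```python
import json
from dataclasses import dataclass, field

# Hypothetical stand-in for the single "ticket events" table.
# Each row has a type, a ticket id, and a JSON payload.
EVENTS = [
    {"type": "ticket_created", "ticket_id": 1,
     "payload": json.dumps({"title": "Login fails"})},
    {"type": "description_set", "ticket_id": 1,
     "payload": json.dumps({"description": "500 on submit"})},
    # A newer writer added a "priority" field to the payload.
    {"type": "ticket_created", "ticket_id": 2,
     "payload": json.dumps({"title": "Add export", "priority": "high"})},
    {"type": "comment_added", "ticket_id": 2,
     "payload": json.dumps({"text": "Needed soon"})},
]


@dataclass
class Ticket:
    title: str = ""
    description: str = ""
    priority: str = "normal"  # default used when an old event lacks the field
    comments: list = field(default_factory=list)


def apply_event(ticket, event):
    """Fold one event into the ticket, tolerating unknown fields and types."""
    payload = json.loads(event["payload"])
    kind = event["type"]
    if kind == "ticket_created":
        ticket.title = payload.get("title", "")
        # Missing in old events: keep the default (backward compatible).
        ticket.priority = payload.get("priority", ticket.priority)
    elif kind == "description_set":
        ticket.description = payload.get("description", "")
    elif kind == "comment_added":
        ticket.comments.append(payload.get("text", ""))
    # Any other event type is skipped instead of raising, so this reader
    # keeps working when newer code writes events it has never seen.
    return ticket


def load_ticket(events, ticket_id):
    """Rebuild the current state of one ticket by replaying its events."""
    ticket = Ticket()
    for event in events:
        if event["ticket_id"] == ticket_id:
            apply_event(ticket, event)
    return ticket
```

Replaying ticket 1 gives the default priority because its events never mention one, while ticket 2 picks up the value the newer writer stored. Extra payload keys are never looked at, which is what lets older readers survive new fields.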
Now, what if you can’t do this? If we had to make a breaking change, we would create a new event type. The new event has a completely new schema, whatever you want it to be. And then we would write both the new and the old events at the same time, so that if we had to revert our code to an older version or get rid of the new code, everything would still work, because the old-format events would be there.
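Here is a rough sketch of that dual-write idea in Python. The event type names (`owner_assigned`, `assignee_set_v2`), the payload shapes, and the list used as the event log are all hypothetical. The sketch also assumes the change is a simple "set" operation, so a new reader that sees both the old and the new event for the same change ends up in the same state; events that are not idempotent would need some way to pair the two copies.

```python
import json


def write_assignment(log, ticket_id, user_id, team_id):
    """Record one logical change as both the old and the new event type."""
    # Old format, still written so that older code (or a rollback) keeps working.
    log.append({"type": "owner_assigned", "ticket_id": ticket_id,
                "payload": json.dumps({"user_id": user_id})})
    # New format with a different, incompatible payload shape.
    log.append({"type": "assignee_set_v2", "ticket_id": ticket_id,
                "payload": json.dumps({"assignee": {"user_id": user_id,
                                                    "team_id": team_id}})})


def old_reader(log, ticket_id):
    """Old code: only understands owner_assigned and skips everything else."""
    owner = None
    for event in log:
        if event["ticket_id"] == ticket_id and event["type"] == "owner_assigned":
            owner = json.loads(event["payload"])["user_id"]
    return owner


def new_reader(log, ticket_id):
    """New code: reads the v2 events and falls back to historical old events."""
    owner, team = None, None
    for event in log:
        if event["ticket_id"] != ticket_id:
            continue
        payload = json.loads(event["payload"])
        if event["type"] == "assignee_set_v2":
            owner = payload["assignee"]["user_id"]
            team = payload["assignee"].get("team_id")
        elif event["type"] == "owner_assigned":
            # Written before v2 existed, or as the twin of a v2 event that
            # follows it in the log and overwrites this value.
            owner, team = payload["user_id"], None
    return owner, team


log = []
write_assignment(log, ticket_id=1, user_id="u7", team_id="t2")
print(old_reader(log, 1))   # old code still sees the owner
print(new_reader(log, 1))   # new code sees the richer v2 data
```

Because the old event is written first and the v2 event second, the new reader always finishes on the v2 values, and a log that only contains old events still replays correctly with the team left unset.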
The beauty of this approach is that you can deploy code in any order you want. You could be deploying to, let’s say, a multi-pod Kubernetes cluster, and some of the pods could be running different versions of the code, and it would still work perfectly. As the pods eventually get replaced and you have all new pods, they’re all running the new code. Only after you’re absolutely sure that nothing else is going to be running the old code would you stop writing the old event types. But in general, there’s no rush to do that. You just keep writing the old event types for days, or even weeks. It’s just wasting a little bit of space in the database.
But everything still works perfectly well. There are a lot of other schemes for doing this out there. I highly recommend looking into this topic, because it really is not necessary to do database migrations the way we’ve been doing them in the past. Using some kind of no-migration approach provides a huge number of benefits. You don’t have to worry about downtime. You can deploy things in any order. If you have a distributed system with a lot of horizontal scaling, it all works perfectly. You can just upgrade parts of the system as you go, and eventually it all gets upgraded and everything’s fine. The key is: don’t break backward compatibility and don’t break forward compatibility. This is not that difficult to do. It’s actually much simpler than I thought when my DevOps engineer first told me about this idea.
He didn’t invent what I just said. That was actually my idea, the way we structured it. But he did ask me, you know, why do we have to do migrations? And I was like, maybe we don’t need to. And we came up with this approach. I think there are a lot of different ways you could do it. But regardless, I highly recommend it.