The Big Rewrite: How DockYard used database multitenancy for Ruby to Elixir data migration

This is the first in a series of posts detailing the process of migrating data in order to rebuild a legacy Ruby app from the ground up in Elixir.

In 2021, DockYard assisted a company facing scaling issues for a groundbreaking online service. During the startup phase their team pivoted, adapted, and grew. In the process, they gained the trust of their clientele, grew market and mindshare, and delivered a unique and valuable experience to their users. But after rapid growth as a startup, the service faced a sudden and massive increase in demand—and their existing Rails application couldn’t keep up.

Doing so much and so quickly left them with technical debt and unique problems to address. A critical issue was scaling for sudden and immense onrushes of user activity. They termed this the “thundering herd” problem: How do you handle enormous surges of activity all at once, while still delivering a top-notch experience? With their impressive, rapid success and the thundering herd problem, they found themselves in a very challenging position to continue their meteoric growth.

The company was left with a choice to make: Continue refactoring and making small improvements to stay above water in the short term, or overhaul the existing product to create room for scale and feature delivery in the long term. Both choices came with various pros and cons, but in the end they chose to partner with DockYard and make a leap with a complete re-platform in Elixir.

Spoiler Alert, it was a great success.

In this series of articles, we’re going to hone in on a specific aspect of the rewrite—the data. We’ll document the journey of migrating their data and how we were able to set up and simplify our migration path before we even wrote a line of Elixir.

Part I: From Apartment to Single Family Home

Ground Work

In envisioning how we would move data from the legacy Ruby on Rails app to the new Elixir Phoenix version, we met a challenging series of constraints. Tight timelines and an existing app that was crashing at scale meant we could not wait to solve the data problem until after we hammered out the new data model, much less after we finished the rewrite. Our first step was to decide what we could do to take a bite out of the current woes and set ourselves up for success with a smooth and timely launch.

The legacy app’s data model had a few issues we could address immediately. One was that they were using schema-based multitenancy to keep data siloed and the infamous Apartment gem from Influitive to handle both the siloing and subdomain mapping.

Architecturally, it was a bad fit. The legacy app ended up creating thousands of schemas, putting crushing weight on ActiveRecord, PostgreSQL, and their hosting provider. Attempts to backup the production database would fail. The gem itself had many bugs and was seemingly abandoned and disavowed by its creators, including Brad Roberton, ex-Principal Engineer at Influitive, years prior:

At this point, I cannot recommend going the Postgres schema approach given the headaches we’ve seen [using it in multi-tenancy applications].

This architecture was clearly not going to survive the rewrite. If we could remove the Apartment gem and shift to table-based multitenancy, we could remove a host of bottlenecks and bugs giving ourselves more runway to develop a scalable solution in Elixir.

So…what is this whole multitenancy thing?

Multitenancy

Multitenancy is an architectural concept where a piece of software running in an environment is structured in such a way to allow various aspects of it to be partitioned and islated. This could include data, configuration, processes, features, users, etc. A tenant is simply the idea by which these partitions are grouped. In this case we only needed to consider it at the database level.

There are generally three forms of multitenancy, each with their own pros and cons. In decreasing order of separation they are:

Database Multitenancy

Each tenant has its own database.

Pros:
- There’s maximal resource separation—high usage by one tenant tends not to affect the others
- It’s easier to scale in a targeted fashion
- Security compliance is much easier
- It’s possible to customize tables per tenant
- It’s easy to dump tenant data
Cons:
- Infrastructure/DevOps costs are higher
- Analytics are a pain to aggregate
- Aggregating data across databases is not performant

Schema Multitenancy

Each tenant has its own schema inside a shared database (this is what the client was currently using in this case).

Pros
- It allows you to utilize database roles to maintain security
- It requires a simpler infrastructure
- It’s possible to customize tables per tenant
- It’s easy to dump tenant data
Cons
- After a point, the more schemas there are, the worse the performance is
- After a point, the more tables there are, the worse the performance is
- Overhead of dealing with public and private schemas for a single set of results is higher

Table Multitenancy

Each tenant is isolated at the application level, usually by a foreign key on each table to be tenanted. The database, schema, and tables are shared.

Pros
- It’s easy to conceptualize—you’re just constraining results to an index
- It’s great for administration, migration, and DB level tasks like connection pools
- Analytics are straightforward, with no special aggregation steps
- It’s easy to mix tenanted and untenanted data
- It has the simplest infrastructure
Cons
- Query optimization can get difficult at times with the extra index
- It’s difficult to dump tenant data
- Bugs or naive queries (it’s easy to omit a constraint in code) in the application layer can easily pierce the isolation of tenant data

The Path Forward

The application needed a table multitenancy approach. I’m going to lay out what we did and how we did it, so that anyone else in this situation can save themselves potential heartache in searching through years of ancient Github issues and crawling through countless lines of code.

We should also get some nomenclature out of the way, so we are all up to speed. In this application, the public schema contains tables that are accessible to all tenants. Private schemas group and silo tables accessible only to the specific tenant. Conceptually, tenants are simply the owners of data. Often in table-based architectures, there’s a table in the public schema containing all of the tenants. This could be a user or an organization or something similar.

We divided this phase of the data consolidation project into two parts. Part one is what I lovingly call prefactoring and is what I’ll cover here. Part two is “The Great Migration”, where we remove the Apartment gem and consolidate the private schemas into a single, unified public schema. We’ll get to that in another post.

Prefactoring

What is prefactoring?

If you’re reading this you are likely familiar with the TDD concept of “Red, Green, Refactor.” Red is writing a failing test. Green is making it pass. Refactor is more or less cleaning up and/or optimizing your previous work. While this is highly effective, definitively knowing when to stop refactoring can become tricky.

Over the years, I’ve found that sticking the refactoring step at the front of the process gives a clear design pressure and scope. Prefactoring is asking the question, “What code do I need to change to make the task I am about to do easier?” In the case of this data migration we asked ourselves, “Is it even possible without change?”

Public Schemas

First, we needed to add a tenant_id column to each table in the private schema. However, this is where we met our first major roadblock: Every table’s primary indices were the Rails default integer type, which prevented us from collapsing the schema-based multitenancy to table-based.

The reason why is that, in collating a given table across multiple schemas, there could easily be (and absolutely were) colliding primary keys and, by extension, foreign keys. For example:

---------------- ---------------- ---------------- 
Schema: tenant_1 Schema: tenant_2 Schema: tenant_3 
---------------- ---------------- ---------------- 
Users            Users            Users            
---------------- ---------------- ---------------- 
ID  Name         ID  Name         ID  Name         
---------------- ---------------- ---------------- 
1   Bob          1   Alice        1   Sasha        
2   Ted          2   Claire       2   Leslie

As is often the case in software, it is best to do these things in bite-sized chunks wherever possible. That gives you space to gather feedback on decisions throughout the process. We decided to focus on the central Tenant Rails model in addressing the unique ID issues first. First we add the column:

def change
    add_column :tenants, :_uuid, :uuid
end

As you can see in the above code snippet, we decided to prepend all of our new columns with an underscore. This would allow us to separate out and sort these new columns easily.

In addition, it made it simpler for the development team to ensure they were not writing any code that depended on these new fields. It also lightened our load by making it easy to spot any duplications or missed fields.

Once we added the column, we then added the code to set the value for new records. Sadly, we were unable to install extensions on the production DB, even after following the README and some white knuckle experimentation. Instead of having PostgreSQL do the work, we had to generate UUIDs in Ruby and assign them in a before_save callback:

before_save :generate_uuid
def generate_uuid
    self._uuid ||= SecureRandom.uuid
end

Once we found that this was running smoothly in production, we then wrote Sidekiq jobs to start backfilling the values for existing rows during periods of low site traffic. With a working prototype in hand, we began doing the same for each of the other public schema tables.

Private Schemas

To do the same for the private schemas, we first iterated over the schemas followed by the models. You’ll notice in the below code snippet that we are sleep-ing when iterating through each tenant. This is because we found that in ripping through the schemas, there was a bit of lag between the last write of a tenant and the first write after switching to a new one.

Without sleeping, some tables were being processed in the previous schema and therefore skipped in the current schema. It is definitely something to watch out for and tune to your needs should you find yourself following in our footsteps. 0.2 seconds was the threshold for us. It added a significant amount of time to our later and more intensive runs, but we couldn’t work around it without coming up with a patch for the library itself—which was out of scope.

tenants_to_update.each do |tenant|
    sleep SLEEP_DELAY_SECONDS
    Apartment::Tenant.switch(tenant) do
        private_schema_models_to_update.each do |model|
            _set_private_schema_uuid!(model, tenant)
        end
    end
end

We introduced these changes and monitored them for a bit to make sure all was fine, then we moved to tag each row in each private table with the UUID of their respective tenant owner. The eerily familiar code looked something like this:

def change
    add_column :widgets, :_tenant_uuid, :uuid
end

before_save :set_tenant_uuid
def set_tenant_uuid
    tenant = Apartment::Tenant.current
    if tenant == 'public'
        raise 'Tenant not set'
    else
        self._tenant_uuid ||= tenant._uuid
    end
end

tenants_to_update.each do |tenant|
    sleep SLEEP_DELAY_SECONDS
    Apartment::Tenant.switch(tenant) do
        private_schema_models_to_update.each do |model|
            _set_private_tenant_uuid!(model, tenant)
        end
    end
end

Again, we introduced these changes and monitored them for a bit to make sure all was fine, only to move onto dealing with the foreign keys. Our convention was that we’d take a given foreign key of the form something_id and create a UUID version like so: _something_uuid

add_column :widgets, :_something_uuid, :uuid
add_index :widgets, :_something_uuid

On each model we defined a class method for all of the belong_to’s that we could iterate over the result.

def self._uuid_relationships
    ['something']
end

Then we did the same sort of iterations as above for both public and private schemas. Since the data was already in the database we could simply run an UPDATE query in a batched manner, however we felt necessary:

_uuid_relationships.each do |rel_name|
    sql = <<~SQL
        UPDATE #{table_name}
        SET _#{rel_name}_uuid = #{fk_table_name}._uuid
        FROM #{fk_table_name}
        WHERE #{table_name}.#{rel_name}_id = #{fk_table_name}.id
    SQL
    ActiveRecord::Base.connection.execute(sql)
end

Since these were fast and nothing in the code base depended on these new keys yet, we ran them periodically while we were coding up the next part of this phase of the project: removing the Apartment gem and consolidating the schemas.

Stay tuned for Part 2 in this series!

The Big Rewrite: How DockYard used database multitenancy for Ruby to Elixir data migration - DockYard

Part I: From Apartment to Single Family Home

Ground Work

Multitenancy

Database Multitenancy

Schema Multitenancy

Table Multitenancy

The Path Forward

Prefactoring

Public Schemas

Private Schemas

Stay in the Know

Part I: From Apartment to Single Family Home

Ground Work

Multitenancy

Database Multitenancy

Schema Multitenancy

Table Multitenancy

The Path Forward

Prefactoring

Public Schemas

Private Schemas

Newsletter

Stay in the Know