-
Notifications
You must be signed in to change notification settings - Fork 123
Application Database Migrations
The BloodHound application database was originally built using Gorm auto-migrations. Over time, the need for separate stepwise migrations arose to handle cases that Gorm models could not. Today, Gorm auto-migrations have been removed due to long time pain points and migration errors. In their place, we now implement a base schema that matches the one Gorm auto-migrate created at the time of the transition, and run stepwise migrations on top of it. In the case that a database already exists, we only apply the stepwise migrations that haven't been applied yet, using the semver stored in the database migrations table vs the currenly running API semver to determine which files need to be migrated.
The loss of Gorm's AutoMigrate functionality adds some cognitive overhead. Simply labeling a Gorm model with constraints, changing its fields, or otherwise modifying the schema that should be used with a model will not automatically carry over to the underlying database. To be clear, any destructive operation already wouldn't work, so this cognitive overhead isn't actually new. However, there is a bit more process involved and you will need to dust off your raw SQL skills to write migrations for model changes you make.
Our models currently live in cmd/api/src/model
. The current reality of the app is that models are often multipurpose, meaning you'll find plenty of models that are designed for use as JSON models in the API layer and as Gorm models at the database layer. Any modification to these models must be understood by the engineer making the modifications and the appropriate database schema changes need to be written as a stepwise migration.
Our stepwise migrations can be found under cmd/api/src/database/migration/migrations
. Stepwise migrations are intended to take on the name of the next semver release that they are expected to be a part of. If you are unsure what the previous release semver is, run just show
after pulling the latest main
and it will give you the last tagged version. For example, if the current version is v5.5.0
, and the next release is expected to be a feature release, the stepwise migration should be named v5.6.0
. If the next release were to be a patch release (like a hotfix), it would be v5.5.1
instead. Migration names may be changed later if the release they belong to is moved or requires a different version bump than expected, but the current target should be determined when writing the migration. Afterall, others may also be adding migrations to the same release cycle, so sharing that file will be important.
Stepwise migrations should be written as concise, deterministic, and idempotent SQL targeting Postgres. Testing should be done both manually against the database using a database tool like DBeaver or psql
, as well as through migration (see Testing Migrations for detailed steps).
As stated above, models are currently shared between Gorm and other application needs in many cases. Most importantly, many models are used as API response types. As a result, any changes made to a model should be preceded by careful study of where the model is used and what parts of the application are impacted. If the modification could cause a breaking API change, it will need to be reviewed with the BHE team to assess whether it's warranted or if another approach will be necessary.
In the cases where the intended modification of a model is both necessary and the impact is understood and approved, an assessment of database schema impact is required. Some examples include:
- Adding a field requires a stepwise migration that adds the same field to the database schema, being mindful of naming conventions in Gorm https://gorm.io/docs/conventions.html.
- Removing a field requires a stepwise migration to remove the field from the database as well
- Changing the name of a field will require a corresponding rename operation in the stepwise migration.
- Constraints should no longer be added to the model (as they're only read for AutoMigrate and CreateTable use cases) but instead be explicitly created/modified/deleted with a stepwise migration in all cases.
- New tables will need to include all the metadata fields in addition to the explicit fields. Most models "inherit" (through embedding) fields from a struct that implements a specific type of
id
and embeds theBasic
struct, which adds theCreatedAt
,UpdatedAt
, andDeletedAt
fields. Always parse all embedded structs in your model and include their fields in your table migration.
Follow best judgement, ensure your migrations are idempotent (can be re-run without causing changes or errors), and test thoroughly from the database layer all the way up to API integration testing and manual API endpoint testing.
The example migration PR template linked below is based on the migration found in cmd/api/src/database/migration/test_migrations/v0.1.1.sql
The following is a template to follow for documenting your manual migration testing in your pull request:
- Run
just bh-dev
- Create fresh database on
main
git checkout main
-
just bh-dev down -v
(clear existing volumes) -
just bh-dev
(wait for server to start, then proceed)
- Input example dataset (see below)
- Test the migration directly
- In your database, go to migrations and delete the row with version set to
v999.999.999
- If this is the only row in your database (which is likely), add a new row with the last release tag given from
just show
usingINSERT INTO migrations (major, minor, patch) VALUES (<major>, <minor>, <patch>)
where major/minor/patch are the integers you get from the last release tag.
- If this is the only row in your database (which is likely), add a new row with the last release tag given from
- Stop currently running
bh-dev
(ctrl+c
orjust bh-dev stop
) - Checkout this branch
-
just bh-dev
to start the server and run migrations - If successful, there should be a log that this migration was run, as well as additional entries in the migrations table
- In your database, go to migrations and delete the row with version set to
- Validate the data that was expected to be migrated was successfully migrated and that the data that shouldn't be migrated was not touched.
- Check the schema of the affected table/tables to ensure that it matches the intended new schema post-migration
-- Insert data we expect to be migrated
INSERT INTO migration_test (name, foo)
VALUES ('foo', 'foo'), ('foo', 'bar'), ('foo', 'bar')
-- Insert data we expect to be untouched (canary)
INSERT INTO migration_test (name, foo)
VALUES ('bar', 'baz')
Before Migration:
After Migration: