9.9 KiB
Troubleshooting
Your toolkit
For a general overview, read this documentation
Some common commands:
- docker-compose exec app bash -- gets you into bash
- ./manage.py showmigrations -- shows the current migrations that are finished, all should have [x]
- ./manage.py makemigrations -- makes the migration
- ./manage.py showmigrations -- now you should see the new/updated migration with a [ ]
- ./manage.py migrate [folder name here, ie registrar] -- applies those changes to db for specific folder
- ./manage.py showmigrations -- the migration changes should now have a [x] by it
Scenarios
Scenario 1: Conflicting migrations on local
If you get conflicting migrations on local, you probably have a new migration on your branch and you merged main which had new migrations as well. Do NOT merge migrations together.
Assuming your local migration is 40_local_migration
and the migration from main is 40_some_migration_from_main
:
- Delete
40_local_migration
- Run
docker-compose exec app ./manage.py makemigrations
- Run
docker-compose down
- Run
docker-compose up
- Run
docker-compose exec app ./manage.py migrate
You should end up with 40_some_migration_from_main
, 41_local_migration
Alternatively, assuming that the conflicting migrations are not dependent on each other, you can manually edit the migration file such that your new migration is incremented by one (file name, and definition inside the file) but this approach is not recommended.
Scenario 2: Conflicting migrations on sandbox (can be fixed with GH workflow)
A 500 error on a sanbox after a fresh push usually indicates a migration issue. Most of the time, these migration issues can easily be fixed by simply running the "reset-db" workflow in Github.
For the workflow, select the following inputs before running it; "Use workflow from": Branch-main "Which environment should we flush and re-load data for?" <YOUR_TARGET_SANDBOX>
This is not a cure-all since it simply flushes and re-runs migrations against your sandbox. If running this workflow does not solve your issue, proceed examining the scenarios below.
Scenario 3: Conflicting migrations on sandbox (cannot be fixed with GH workflow)
This occurs when the logs return the following:
Conflicting migrations detected; multiple leaf nodes in the migration graph: (0040_example, 0041_example in base). To fix them run 'python manage.py makemigrations --merge'
This happens when you swap branches on your sandbox that contain diverging leaves (eg: 0040_example, 0041_example). The fix is to go into the sandbox, delete one of these leaves, fake run the preceding migration, hand run the remaining previously conflicting leaf, fake run the last migration:
cf login -a api.fr.cloud.gov --sso
cf ssh getgov-<app>
/tmp/lifecycle/shell
- Find the conflicting migrations:
./manage.py showmigrations
- Delete one of them:
rm registrar/migrations/0041_example.py
/manage.py showmigrations
/manage.py makemigrations
/manage.py migrate
Scenario 3: Migrations ran incorrectly, and migrate no longer works (sandbox)
This has happened when updating user perms (so running a new data migration). Something is off with the update on the sandbox and you need to run that last data migration again:
cf login -a api.fr.cloud.gov --sso
cf run-task getgov-<app> --wait --command 'python manage.py migrate registrar 39_penultimate_miration --fake' --name migrate
cf run-task getgov-<app> --wait --command 'python manage.py migrate' --name migrate
Scenario 4: All migrations refuse to load due to existing duplicates on sandboxes
This typically happens with a DB conflict that prevents 001_initial from loading. For instance, let's say all migrations have ran successfully before, and a zero command is ran to reset everything. This can lead to a catastrophic issue with your postgres database.
To diagnose this issue, you will have to manually delete tables using the psql shell environment. If you are in a production environment and cannot lose that data, then you will need some method of backing that up and reattaching it to the table.
cf login -a api.fr.cloud.gov --sso
- Run
cf connect-to-service -no-client getgov-{environment_name} getgov-{environment_name}-database
to open a SSH tunnel - Run
psql -h localhost -p {port} -U {username} -d {broker_name}
- Open a new terminal window and run
cf ssh getgov-{environment_name}
- Within that window, run
tmp/lifecycle/shell
- Within that window, run
./manage.py migrate
and observe which tables are duplicates
Afterwards, go back to your psql instance. Run the following for each problematic table:
DROP TABLE {table_name} CASCADE
WARNING: this will permanently erase data! Be careful when doing this and exercise common sense.
Then, run ./manage.py migrate
again and repeat step 7 for each table which returns this error.
After these errors are resolved, follow instructions in the other scenarios if applicable.
Scenario 5: Permissions group exist, but my users cannot log onto the sandbox
This is most likely due to fixtures not running or fixtures running before the data creating migration. Simple run fixtures again (WARNING: This applies to dev sandboxes only. We never want to rerun fixtures on a stable environment)
cf login -a api.fr.cloud.gov --sso
cf run-task getgov-<app> --command "./manage.py load" --name fixtures
Scenario 6: Data is corrupted on the sandbox
Example: there are extra columns created on a table by an old migration long since gone from the code. In that case, you may have to tunnel into your DB on the sandbox and hand-delete these columns. See scenario #4 if you are running into duplicate table definitions. Also see this documentation for a good reference here:
cf login -a api.fr.cloud.gov --sso
- Open a new terminal window and run
cf ssh getgov{environment_name}
- Run
tmp/lifecycle/shell
- Run
./manage.py migrate
and observe which tables have invalid column definitions - Run the
\l
command to see all of the databases that are present \c cgawsbrokerprodlgi635s6c0afp8w
(assume cgawsbrokerprodlgi635s6c0afp8w is your DB) ‘\dt’ to see the tablesSELECT * FROM {bad_table};
alter table registrar_domain drop {bad_column};
Scenario 7: Continual 500 error for the registrar + your requests (login, clicking around, etc) are not showing up in the logstream
Example: You are able to log in and access the /admin page, but when you arrive at the registrar you keep getting 500 errors and your log-ins any API calls you make via the UI does not show up in the log stream. And you feel like you’re starting to lose your marbles.
In the CLI, run the command cf routes
If you notice that your route of getgov-<app>.app.cloud.gov
is pointing two apps, then that is probably the major issue of the 500 error. (ie mine was pointing at getgov-<app>.app.cloud.gov
AND cisa-dotgov
)
In the CLI, run the command cf apps
to check that it has an app running called cisa-dotgov
. If so, there’s the error!
Essentially this shows that your requests were being handled by two completely separate applications and that’s why some requests aren’t being located.
To resolve this issue, remove the app named cisa-dotgov
from this space.
Test out the sandbox from there and it should be working!
Debug connectivity
dig https://getgov-.app.cloud.gov (domain information groper, gets DNS nameserver information) curl -v https://getgov-.app.cloud.gov/ --resolve 'getgov-.app.cloud.gov:' (this gets you access to ping to it) You should be able to play around with your sandbox and see from the curl command above that it’s being pinged. This command is basically log stream, but gives you full access to make sure you can ping the sandbox manually https://cisa-corp.slack.com/archives/C05BGB4L5NF/p1697810600723069
Scenario 8: Can’t log into sandbox, permissions do not exist
./manage.py migrate --fake model_name_here file_name_BEFORE_the_most_recent_CREATE_migration
(fake migrate the migration that’s before the last data creation migration -- look for number_create, and then copy the file BEFORE it)./manage.py migrate model_name_here file_name_WITH_create
(run the last data creation migration AND ONLY THAT ONE)./manage.py migrate --fake model_name_here most_recent_file_name
(fake migrate the last migration in the migration list)./manage.py load
(rerun fixtures)
Scenario 9: Inconsistent Migration History
If you see django.db.migrations.exceptions.InconsistentMigrationHistory
error, or when you run ./manage.py showmigrations
it looks like:
[x] 0056_example_migration [ ] 0057_other_migration [x] 0058_some_other_migration
- Go to database-access.md to see the commands on how to access a certain table in the database.
- In this case, we want to remove the migration "history" from the
django_migrations
table - Once you are in the
cgaws...
table, select thedjango_migrations
table with the commandSELECT * FROM django_migrations;
- Find the id of the "history" you want to delete. This will be the one in the far left column. For this example, let's pretend the id is 101.
- Run
DELETE FROM django_migrations WHERE id=101;
where 101 is an example id as seen above. - Go to your shell and run
./manage.py showmigrations
to make sure your migrations are now back to the right state. Most likely you will see several unapplied migrations. - If you still have unapplied migrations, run
./manage.py migrate
. If an error occurs saying one has already been applied, fake that particular migration./manage.py migrate --fake model_name_here migration_number
and then run the normal./manage.py migrate
command to then apply those migrations that come after the one that threw the error.